Low Resource Verse Dataset and Prosodic Feature Integration for Sanskrit ASR
Jaiswal S., Routray G., Rai A., Dwivedi P., Hegde R.M.
Conference paper, Proceedings of the National Conference on Communications, NCC, 2025, DOI Link
Sanskrit, a linguistically rich classical language, presents significant challenges for Automatic Speech Recognition (ASR) due to its intricate phonetic structures, complex morphology, and variability in pronunciation. Existing ASR systems often struggle with these aspects, particularly in verse-based corpora, where prosodic features play a vital role. Additionally, the scarcity of annotated datasets limits advancements in this domain. To address these issues, we introduce Sabdavrndam (), a specialized low-resource dataset of Sanskrit verses enriched with prosodic information. By integrating prosodic features, such as pitch and energy, into the ASR pipeline alongside MFCC features, we aim to enhance recognition performance for verse-based corpora. Our experiments using state-of-the-art ASR models reveal that prosodic features provide valuable contextual information but also introduce additional complexity, impacting error rates. These findings underscore the potential of prosodic features while highlighting the need for more effective integration strategies.
Feature Transformation for Fast and Efficient Learning in Near-Field Source Localization
Dwivedi P., Routray G., Hegde R.M.
Article, IEEE Transactions on Artificial Intelligence, 2025, DOI Link
This work develops an efficient and fast learning method for near-field acoustic source localization using a spherical harmonics (SH) feature transformation. The SH features are derived through the SH decomposition of the microphone array recordings. However, these SH features are often impaired by noise, interference, and reverberation, which hinders localization accuracy. In this context, we propose a feature transformation that leverages the signal subspace of the SH-decomposed signals. The transformation reduces the irregularities in the decomposed SH parameters under noisy and reverberant conditions. The proposed subspace-based feature therefore captures significant directional and range-dependent cues for localization and improves training performance and accuracy with fewer epochs. The efficacy of this fast learning approach is demonstrated by training a convolutional neural network (CNN) to map the input features to localization classes. The performance of the proposed approach is evaluated through exhaustive simulations and experiments and compared with previous methods.
Spherical Harmonics Intensity Features for Improved Near-Field DOA Estimation
Dwivedi P., Routray G., Hegde R.M.
Conference paper, Proceedings of the National Conference on Communications, NCC, 2025, DOI Link
Near-field acoustic direction of arrival (DOA) estimation remains a relatively unexplored challenge in array processing, particularly under noisy and reverberant conditions. This paper presents a novel near-field source localization method leveraging the acoustic intensity vector in the spherical harmonics (SH) domain. Unlike conventional pressure coefficients, SH-based intensity (SH-INT) coefficients enhance the DOA estimation accuracy by improving the distance between the modes at high frequencies. The proposed approach begins by modeling the sound pressure captured by a spherical microphone array (SMA) in the SH domain. The acoustic intensity vector is derived from the SH decomposition of sound pressure and acoustic velocity, effectively encoding directional and energy information. By analyzing the dependency of the intensity vector on location, a convolutional neural network (CNN) is trained to map these features to DOA coordinates (azimuth and elevation), even in challenging reverberant and noisy environments. Comprehensive evaluations conducted through simulations and real speech experiments demonstrate the efficacy of the proposed method. Results reveal a substantial reduction in root mean square error (RMSE) compared to state-of-the-art techniques, highlighting its potential for accurate near-field acoustic localization.
Sparse Bayesian Integrated CNN Framework for Enhanced Acoustic Source Localization
Dwivedi P., Routray G., Hegde R.M.
Conference paper, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2025, DOI Link
This paper presents a novel framework for super-resolution direction of arrival (DOA) estimation of acoustic sources in the spherical harmonics (SH) domain. The proposed approach combines sparse Bayesian learning (SBL) with a convolutional neural network (CNN). The CNN is utilized to classify DOA based on the spherical harmonics decomposition (SHD) of recordings from a spherical microphone array (SMA), providing coarse DOA estimates. These estimates are then refined using SBL, which operates on a densely sampled grid around the CNN-predicted DOA classes to achieve precise localization. The CNN component exhibits robustness in noisy and reverberant environments, while SBL specializes in high-resolution localization of multiple sparse sources. By leveraging the strengths of both methods, the SH-CNN-SBL framework enhances DOA estimation accuracy in challenging conditions. Extensive simulations and real-world experiments are performed to validate the effectiveness of the proposed method in achieving a resolution of 1°.
Improving Source Tracking Accuracy Through Learning-Based Estimation Methods in SH Domain: A Comparative Study
Dwivedi P., Routray G., Jha D.K., Hegde R.M.
Article, IEEE Transactions on Artificial Intelligence, 2024, DOI Link
Acoustic source tracking is significant across applications like surveillance, teleconferencing, and robot audition, yet the complexity introduced by reverberation, background noise, and overlapping sources impedes precise source localization. This article uses learning-based localization methods to introduce a resilient and intelligent acoustic source tracking approach in the spherical harmonics (SHs) domain. The tracking algorithms anticipate moving source locations by leveraging past predictions and direction of arrival (DOA) estimations. The prediction probability is computed through alpha-beta and Kalman filtering applied to the estimated DOAs, which are likelihood probabilities obtained from learning models. Utilizing the spatial attributes of sound sources encoded in SH signals, diverse learning-based frameworks are introduced to capture the intricate relationship between SH features and source locations. Supervised learning is utilized to train the models that minimize localization errors between predicted and ground truth positions. Experimental assessments underscore the efficacy and resilience of our proposed approach, conducted using LOCAlization and TrAcking (LOCATA) data, revealing a substantial enhancement in tracking accuracy compared to baseline methods.
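As a rough illustration of the alpha-beta filtering step mentioned in the abstract above, the following minimal sketch smooths a sequence of azimuth estimates. It is not the authors' implementation: it works on point estimates in degrees rather than on the likelihood probabilities produced by the learning models, and the function name `alpha_beta_track` and the gain values `alpha` and `beta` are assumptions chosen only for illustration.

```python
import numpy as np

def alpha_beta_track(doa_deg, dt=1.0, alpha=0.85, beta=0.005):
    """Smooth a sequence of azimuth estimates (degrees) with an alpha-beta filter.

    Hypothetical helper: the gains and the point-estimate input are
    illustrative assumptions, not values taken from the paper.
    """
    angle = float(doa_deg[0])  # initial state: first measurement
    rate = 0.0                 # initial angular velocity (degrees per unit time)
    smoothed = []
    for z in doa_deg:
        # Predict the angle from the previous state.
        pred = angle + rate * dt
        # Innovation, wrapped to [-180, 180) so 359 -> 1 counts as +2 degrees.
        resid = (z - pred + 180.0) % 360.0 - 180.0
        # Correct the angle and the angular velocity with fixed gains.
        angle = pred + alpha * resid
        rate = rate + (beta / dt) * resid
        smoothed.append(angle % 360.0)
    return np.array(smoothed)
```

A Kalman filter follows the same predict/correct pattern but computes the gains from assumed process and measurement noise covariances instead of fixing them in advance.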
Octant Spherical Harmonics Features for Source Localization Using Artificial Intelligence Based on Unified Learning Framework
Dwivedi P., Routray G., Hegde R.M.
Article, IEEE Transactions on Artificial Intelligence, 2024, DOI Link
Recent advancements in artificial intelligence (AI) have shown potential solutions to acoustic source localization in three-dimensional space. This article proposes a new low-complexity AI-based framework in the spherical harmonics (SH) domain for efficient direction of arrival (DOA) estimation. The SH coefficients are the key features for the DOA estimation and are obtained from the SH decomposition (SHD) of the spherical microphone array (SMA) recordings. Subsequently, the unified convolutional neural network (UCNN) model is trained to estimate the source azimuth and elevation from the phase and magnitude of the SH coefficients. Since the relation between the azimuth and elevation and the phase and magnitude of the SH coefficients is subjective, a high volume of data is required to train the model. In this context, the symmetric properties of the SH basis function are explored to obtain the SH implicit symmetric coefficients (SH-ISCs) that split the 3-D space into octant classes. Within each octant, the phase and magnitude of the SH coefficients exhibit one-to-one correspondence with the source azimuth and elevation and execute the data redundancy. This work is divided into two parts. In the first part, a multiclass support vector machine (M-SVM) is investigated to obtain the octant classes from the SH-ISCs. In the second part, the UCNN model is developed to estimate the DOA angles in each octant class. Further, the proposed technique is more efficient than the baseline learning algorithms in terms of computational and run-time complexity. Impact Statement: DOA estimation is an important task in signal processing that involves determining the angle of arrival of signals in an array of sensors or antennas. AI techniques, such as machine learning and deep learning, can significantly enhance DOA estimation by providing more accurate, efficient, robust, and adaptable solutions. AI algorithms can learn complex patterns and relationships in data, optimize for specific hardware, and adapt to changing signal environments. This work explores the significance of AI in DOA estimation and highlights the potential benefits that AI can bring to this critical task in signal processing. M-SVM and UCNN models are studied in this work. Combining these learning models provides a robust DOA estimation corresponding to the SH features. Performance measured in terms of accuracy, root mean square error, and complexity yields intriguing findings that encourage using the proposed model in real-world scenarios.
Sparsity-driven loudspeaker gain optimization for sound field reconstruction with spherical microphone array
Article, Digital Signal Processing: A Review Journal, 2024, DOI Link
The paper presents a sparsity-driven method utilizing loudspeakers to reconstruct spatial sound fields using measurements obtained from a spherical microphone array (SMA). Employing spherical harmonics decomposition (SHD), the SMA recordings are characterized in the spherical harmonics domain. The gains for the loudspeakers are determined through an optimization problem, equating spherical harmonics pressure coefficients from primary and secondary sources. Furthermore, the sparsity within the loudspeaker feeds is redefined as a constrained sparse optimization problem, integrating linearity and orthogonality constraints. This method effectively reduces the required loudspeakers while maintaining sound field quality. The Bregman iteration method is applied to solve the constrained optimization problem. Rigorous evaluation based on reconstructed sound fields and objective measures highlights significant enhancements compared to least square and compressed sensing methods.
Long-Term Temporal Audio Source Localization using SH-CRNN
Dwivedi P., Hazare S.B., Routray G., Hegde R.M.
Conference paper, 2023 National Conference on Communications, NCC 2023, 2023, DOI Link
Acoustic source localization in a noisy and reverberant environment is still a challenging problem in signal processing. This work develops an improved technique that uses a convolutional recurrent neural network (CRNN) in the spherical harmonics domain for far-field direction of arrival (DOA) estimation. The source signal is recorded using a spherical microphone array (SMA), and the spherical harmonics decomposition (SHD) of the recordings yields the spherical harmonics (SH) pressure coefficients. Subsequently, the SH phase and magnitude coefficients are calculated. The CRNN model is designed and trained with long-term temporal SH magnitude and phase coefficients across all frequencies to classify these features according to the source locations. The proposed technique is assessed by extensive simulations and experimental analysis at various signal-to-noise ratios (SNR) and reverberation times (RT60). The root mean square error (RMSE) is evaluated for the proposed DOA estimation technique, and a comparison with state-of-the-art methods shows a significant improvement in the localization of the audio source.
Sniper Localization using Acoustic Signal Processing based on Time of Arrivals
Rakesh R., Routray G., Dwivedi P., Hegde R.M.
Conference paper, 2023 National Conference on Communications, NCC 2023, 2023, DOI Link
This paper presents a robust sniper localization technique based on the time of arrival of the shock waves, unlike conventional methods that need both the muzzle blast and the shock wave. In the proposed work, a two-dimensional array geometry is considered to avoid solving non-linear equations, as is required for a linear array. Since the bullet's velocity is not constant throughout the trajectory, the deceleration parameter is also considered to improve the accuracy compared to conventional methods. The times of arrival of the shock waves are measured from the generalized cross-correlation with phase transform (GCC-PHAT) between microphone signal recordings. Extensive simulations are carried out to evaluate the proposed method and compare it with the baseline method. The performance of the proposed method is observed to be better than that of the previous method.
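The GCC-PHAT measurement named in the abstract above can be sketched as follows. This is a generic textbook-style formulation for estimating the delay between two microphone signals, not the paper's code; the function name `gcc_phat_delay`, its parameters, and the small constant added to the denominator are assumptions made for illustration.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper for illustration; a positive result means that
    `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)            # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)         # cross-power spectrum
    cross /= np.abs(cross) + 1e-12     # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)      # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically plausible lags (at least one sample).
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

Given the pairwise delays across the array, a localization stage would then solve for the source geometry; that stage depends on the array layout and the bullet deceleration model and is not sketched here.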
LEARNING-BASED MASKING FOR RELIABLE SOURCE LOCALIZATION INTERFERED BY UNDESIRED DIRECTIONAL NOISE
Dwivedi P., Routray G., Hazare S.B., Hegde R.M.
Conference paper, Proceedings of Forum Acusticum, 2023
It is incredibly challenging to simultaneously locate an acoustic source in a noisy, reverberant environment and mitigate directional interference. The proposed study uses a spherical harmonic decomposition method to determine the spherical harmonics phase magnitude (SH-PM) components corresponding to the received spherical microphone array (SMA) signals. Before the SH-PM components are used as input features to the CNN model, binary masking removes directional interference and emphasizes the desired audio source. In this work, the binary mask is estimated using a learning technique so that acceptable and undesired sources can be reliably discriminated through real-time mask estimation. The proposed strategy creates a learning-based mask to enable real-time and reliable filtering of the undesirable source, which makes the overall strategy flexible and adaptable. Extensive simulations on generated datasets evaluate the effectiveness of the proposed strategy. Additionally, the approach is experimentally validated by conducting tests in a live lab setting. The results support the use of the technique in real-world situations.
Learning based method for near field acoustic range estimation in spherical harmonics domain using intensity vectors: Learning based method for near field acoustic range estimation
Dwivedi P., Routray G., Hegde R.M.
Article, Pattern Recognition Letters, 2023, DOI Link
Near-field acoustic range estimation is considered one of the least explored research problems in digital signal processing under noise and reverberant conditions. This letter develops a new learning-based range estimation technique utilizing the spherical harmonics intensity (SH-INT) coefficients. The conventional range estimation in the spherical harmonics (SH) domain relies on the pressure coefficients. However, at high frequencies, these coefficients of different order and range overlap and hinder the accuracy of range estimation. On the contrary, the SH-INT coefficients are well distinguished at high frequencies for various orders and ranges, making these features favorable for accurate range estimation using learning algorithms. Since the SH-INT coefficients in the radial direction are independent of the source signal and vary with range, a convolutional neural network (CNN) model has been adopted to map the SH-INT coefficients with the range classes. The performance of the proposed spherical harmonic intensity (SH-INT) features in the context of near-field range estimation is validated by conducting exhaustive experiments on simulated and real data. Further, the error in near-field source range estimates is characterized using root mean square error (RMSE) criteria. The results are impactful and encourage the use of this method for practical near-field source range estimation applications.
DIVERSITY MINIMIZATION TECHNIQUE FOR MULTIPLE MEASUREMENT VECTOR-BASED SUPER-RESOLUTION SPATIAL AUDIO IMAGING
Routray G., Dwivedi P., Hegde R.M.
Conference paper, Proceedings of Forum Acusticum, 2023
Ambisonics is an efficient spatial sound acquisition and reproduction technique in the spherical harmonic domain. At low frequencies, lower-order ambisonics reproduction is accurate, but at high frequencies, the spatial resolution suffers. An increase in frequency shrinks the radius of the error-free region and degrades the spatial resolution. Higher-order ambisonics (HOA) provides better spatial resolution in this context. However, spatial sound acquisition in HOA is constrained by hardware complexity and storage space, in contrast to low-order ambisonics (B-format). So, it is worthwhile to acquire the sound scene at low order to reduce hardware complexity and storage requirements, and to upscale to a higher order during reproduction to improve the spatial resolution. This work investigates algorithms based on minimizing diversity measures for obtaining higher-order ambisonics from the B-format signals. In particular, we are interested in the FOCUSS (FOCal Underdetermined System Solver) class of algorithms, which is an alternative and complementary approach to the sequential forward method. A more robust regularized FOCUSS algorithm for the sparse inverse problem is also investigated. The performance of the proposed upscaling method is evaluated using the mean square error metric. The subjective evaluation is performed using a listening test and compared with state-of-the-art methods.
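To make the regularized FOCUSS idea in the abstract above concrete, here is a minimal sketch of a regularized multiple-measurement-vector FOCUSS iteration in its commonly described reweighted minimum-norm form. It is an illustration under stated assumptions, not the paper's method: the dictionary `A` is a generic matrix standing in for whatever plane-wave or spherical harmonics dictionary the authors use, and the function name `regularized_mfocuss` and the defaults `p`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np

def regularized_mfocuss(A, B, p=0.8, lam=1e-8, n_iter=50):
    """Row-sparse solution X of A @ X ~= B via regularized M-FOCUSS.

    A: (m, n) dictionary with m < n, B: (m, L) measurement vectors.
    Hypothetical sketch; p in (0, 2] sets the diversity measure and
    lam is the Tikhonov regularization weight.
    """
    m, n = A.shape
    X = np.linalg.pinv(A) @ B                 # minimum-norm initialization
    for _ in range(n_iter):
        row_norm = np.linalg.norm(X, axis=1)  # one norm per dictionary atom
        w2 = row_norm ** (2.0 - p)            # diagonal of W^2, W = diag(c^(1 - p/2))
        AW2 = A * w2                          # A @ diag(w2), via broadcasting
        G = AW2 @ A.T + lam * np.eye(m)       # regularized Gram matrix
        X = AW2.T @ np.linalg.solve(G, B)     # X = W^2 A^T (A W^2 A^T + lam I)^-1 B
    return X
```

Each pass reweights the rows by their current energy, so weak rows shrink toward zero and the solution concentrates on a few dictionary atoms. In an upscaling setting, those atoms would then be re-encoded at the higher ambisonics order.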
Spatial audio reproduction over ad hoc loudspeaker array using near-field compensation in spherical harmonics domain
Article, Digital Signal Processing: A Review Journal, 2023, DOI Link
Spatial audio reproduction using a loudspeaker array introduces the curvature effect, leading to a distorted listening experience when the listener is in the near field. In the near field, the loudspeakers are approximated as point sources (spherical waves) and amplify the mode vectors. Further, the problem becomes more challenging for an irregular loudspeaker arrangement, which causes uneven energy distribution in the reproduction region. In this context, a near-field compensation is applied to the encoded ambisonics coefficients. An optimization problem is formulated such that the loudspeaker gains encoded with the spherical harmonics basis coefficients match the target ambisonics coefficients. Further, the in-phase and quadrature components of the energy localization vector are imposed as constraints to direct maximum energy into the reproduction region. The solution to the optimization problem is obtained using a derivative-free optimization solver. The performance of the proposed methods is evaluated for ITU-R recommended loudspeaker layouts using technical and perceptual evaluation attributes.
Joint DOA Estimation in Spherical Harmonics Domain using Low Complexity CNN
Dwivedi P., Gohil R.P., Routray G., Varanasiy V., Hegde R.M.
Conference paper, SPCOM 2022 - IEEE International Conference on Signal Processing and Communications, 2022, DOI Link
Direction of arrival (DOA) estimation for multi-channel speech enhancement is a challenging problem. In this context, this paper proposes a new method for joint DOA estimation using a low complexity convolutional neural network (CNN) architecture. The spherical harmonic (SH) coefficients of the received speech signal are obtained from the spherical harmonics decomposition (SHD). The magnitude and phase features are extracted from these SH coefficients and combined as a single feature for training the CNN. A single CNN model is trained using these combined features in contrast to two CNN models used in earlier work. Both azimuth and elevation are then obtained for estimation of DOA from this single CNN. Extensive simulations are also conducted for the performance evaluation of the proposed low complexity CNN model. It is observed that the proposed CNN model provides robust DOA estimates at the various signal to noise ratios (SNR) and reverberation times with reduced computational complexity. Performance evaluated in terms of the gross error (GE) and run-time complexity also provides interesting results motivating the use of the proposed model in practical applications.
Binaural Reproduction of HOA Signal using Temporal Convolutional Networks
Routray G., Dwivedi P., Gohil R.P., Hegde R.M.
Conference paper, Proceedings of the International Congress on Acoustics, 2022
In this work, a temporal convolutional network (TCN) based binaural reproduction of higher-order ambisonics (HOA) signals in the spherical harmonics (SH) domain is proposed. The binaural rendering is characterized by the head-related transfer function (HRTF). Since HRTFs cannot be measured for all directions, error-free binaural reproduction is limited. The proposed work presents a data-driven approach to learning binaural cues from the anthropometric parameters and source directions. The task is to estimate masking functions that transform the HOA signals into binaural signals. The learning framework takes the HOA signals as the input along with the anthropometric parameters to generate the binaural signals. In the proposed method, the TCN implicitly learns the HRTF parameters and produces the binaural signal. The performance of the method is evaluated based on the reproduction accuracy and mean square error (MSE). Further, real-time experiments are carried out using the CIPIC HRTF dataset and binaural recordings made with the autogenously developed bionic ears to validate the performance of the proposed method.
Far-field Source Localization in Spherical Harmonics Domain using Acoustic Intensity Vector
Dwivedi P., Routray G., Hegde R.M.
Conference paper, Proceedings of the International Congress on Acoustics, 2022
Source localization in reverberant and noisy environments is still a challenging research problem and has various signal processing applications. This paper proposes a novel far-field source localization method using the acoustic intensity vector in the spherical harmonics domain. The mathematical model for the sound pressure captured by the spherical microphone array (SMA) is first developed in the spherical harmonics domain. Subsequently, the acoustic intensity vector is derived from the spherical harmonics decomposition of the pressure and acoustic velocity. As the acoustic velocity efficiently preserves the directional information, the intensity vector also contains directional and energy information. The dependency of location on the intensity vector is further explored. Since the intensity vector in the azimuth and elevation plane varies with the location, a unified convolutional neural network (CNN) model is selected to map the intensity features to the locations in reverberant and noisy conditions. Extensive simulations and experiments are conducted on both simulated and real speech data to evaluate the performance of the proposed localization method. The results show a significant improvement in localization accuracy and mean square error (MSE) compared with state-of-the-art methods.
Upscaling HOA Signals using Order Recursive Matching Pursuit in Spherical Harmonics Domain
Routray G., Sahu S.K., Hegde R.M.
Conference paper, SPCOM 2022 - IEEE International Conference on Signal Processing and Communications, 2022, DOI Link
Spatial sound acquisition in Higher-Order Ambisonics (HOA) is constrained by hardware complexity and storage space. In contrast, low-order ambisonics (B-format signals) suffers from low spatial resolution. So it is worthwhile to acquire the sound at low order to reduce hardware complexity and storage requirements, and to upscale to a higher order during reproduction to improve the spatial resolution. In this work, a sparse framework is formulated that efficiently uses the Order Recursive Matching Pursuit (ORMP) algorithm for Multiple Measurement Vectors (MMV) to decompose the low-order encoded signal. Subsequently, the upscaled HOA signal is obtained from the decomposed low-order ambisonics to reproduce the spatial audio with high spatial resolution. The performance of the proposed upscaling method is evaluated using metrics such as the Mean Square Error (MSE) in the upscaled signals and the error in the reproduced sound field. The subjective evaluation is carried out using a listening test and compared with state-of-the-art methods.
DOA Estimation using Multiclass-SVM in Spherical Harmonics Domain
Dwivedi P., Routray G., Hegde R.M.
Conference paper, SPCOM 2022 - IEEE International Conference on Signal Processing and Communications, 2022, DOI Link
Direction of arrival (DOA) estimation is still a challenging and fundamental problem in acoustic signal processing. This paper proposes a new method for DOA estimation that utilizes support vector machine (SVM) based classification. The source signal is recorded by the spherical microphone array (SMA) and decomposed into the spherical harmonics domain. The phase and magnitude features are calculated from the spherical harmonics (SH) decomposed signals. A multiclass support vector machine (M-SVM) algorithm is implemented to classify these phase and magnitude features into the DOA classes. Since the SVM is a non-probabilistic and deterministic model, it is computationally faster and has much lower complexity than neural network-based learning models. Extensive simulations are conducted for the performance evaluation of the proposed method. It is observed that the proposed model provides robust DOA estimates at various signal-to-noise ratios (SNR) and reverberation times. Performance evaluated in terms of the root mean square error (RMSE) provides interesting results motivating the use of the proposed model in practical applications.
Hybrid SH-CNN-MP Approach for Super Resolution DOA Estimation
Dwivedi P., Routray G., Hegde R.M.
Conference paper, Conference Record - Asilomar Conference on Signals, Systems and Computers, 2022, DOI Link
This work presents a novel framework for super-resolution direction of arrival (DOA) estimation of acoustic sources in the spherical harmonics (SH) domain. The proposed method is developed in two stages. First, a convolutional neural network (CNN) model is investigated to obtain the DOA classes from the spherical harmonics decomposition (SHD) of the spherical microphone array (SMA) recordings. Subsequently, the matching pursuit (MP) algorithm with a high-resolution search grid corresponding to the DOA classes is applied to the SHD signals to localize the acoustic source. Since the CNN model performs better in noisy and reverberant environments and the MP algorithm uses the orthogonality of the SH basis functions to provide high-resolution localization, the proposed hybrid model takes advantage of both. Extensive simulations and real-time experiments are performed to validate the performance of the proposed model.
Binaural Source Localization in Median Plane using Learning based Method for Robot Audition
Dwivedi P., Routray G., Hegde R.M.
Conference paper, Proceedings of the International Congress on Acoustics, 2022
This article presents a learning-based binaural source localization technique in the median plane and its application to robot audition. Binaural recordings capture the audio signal and the acoustic transfer function from the source to the ears, known as the head-related transfer function (HRTF), which parameterizes spatial cues such as interaural time difference (ITD) and interaural level difference (ILD). ITD and ILD cues are prominent for source localization in the horizontal plane. Since ITD and ILD are nearly equal to zero in the median plane (the ear canals of both ears are colocated), localization there is complex. Therefore, monaural spectral cues such as spectral notches are investigated for median plane source localization. The spectral notch represents the delay between the direct and the reflected wave. As it varies with the elevation angle, a learning-based model is developed to map the spectral notch to the elevation angle. The spectral notch features are extracted from the binaural recording using linear prediction cepstral coefficients (LPCC) and linear prediction residual coefficients (LPRC). Simulations and experiments are carried out using high-spatial-resolution HRTF measurements from the CIPIC dataset to evaluate the performance. The results show a significant improvement in localization accuracy compared with existing methods.
Spherical harmonics domain-based approach for source localization in presence of directional interference
Dwivedi P., Routray G., Hegde R.M.
Article, JASA Express Letters, 2022, DOI Link
This paper presents a learning-based method for source localization in the presence of directional interference under reverberant and noisy conditions. The proposed method operates on the spherical harmonic decomposition of the spherical microphone array recordings to yield spherical harmonics coefficients as the features. An attention mechanism is incorporated through a binary mask that filters out the dominant undesired source components from the features before training. A convolutional neural network is trained to map the phase and magnitude of the filtered coefficients with the location class. Hence, the objective is to develop the binary mask followed by source localization.
Learning based method for robust DOA estimation using co-prime circular conformal microphone array
Gohil R.P., Routray G., Hegde R.M.
Conference paper, 2021 National Conference on Communications, NCC 2021, 2021, DOI Link
Sound source localization in 1-Dimensional (1D) and 2-Dimensional (2D) space is one of the most familiar problems in signal processing. Various types of microphone arrays and their geometries have been explored to find an optimal solution to this problem. The problem becomes more challenging in a reverberant and noisy environment. Localizing the source in both azimuth and elevation increases the complexity further. In this paper, a convolutional neural network (CNN) based learning approach is proposed to estimate the primary source in 2D space. Further, a novel co-prime circular conformal microphone array (C3MA) geometry has been developed for sound acquisition. The generalized cross-correlation with phase transform (GCC-PHAT) features are extracted from the C3MA recordings and used as the input features for training. The experimental results show that the learning-based estimation is more robust than the conventional signal processing approach. The learning-based approach also explores the GCC-PHAT features and can be adapted to an adverse acoustic environment. The proposed algorithm shows significant improvement in the root mean squared error (RMSE) and mean absolute error (MAE) scores compared to the available state-of-the-art methods.
Binaural reproduction of HOA signal using sparse multiple measurement vector projections
Routray G., Dwivedi P., Hegde R.M.
Conference paper, 2021 National Conference on Communications, NCC 2021, 2021, DOI Link
Higher order Ambisonics (HOA) is one of the most promising technologies for the reproduction of spatial audio in terms of spatial resolution. However, binaural reproduction of spatial audio is ubiquitously used in several popular applications like AR. A novel method for binaural reproduction of HOA signals using sparse plane wave expansion is proposed in this paper. Unlike the parametric methods, the proposed method does not require prior information about the number of discrete sources. The plane wave expansion of the encoded signals is obtained in the spherical harmonics domain using multiple measurement vector projections, while upscaling of the input encoded signal is done to preserve the spatial resolution. Head-related transfer function (HRTF) cues are subsequently used to develop the binaural decoder. Unlike the virtual loudspeaker based approach, it provides more accuracy in terms of spatial resolution as it removes the diffuse component. The efficacy of this method is illustrated using objective and subjective evaluations.
Spatial HRTF Interpolation using Spectral Phase Constraints
Srivastava A., Routray G., Hegde R.M.
Conference paper, SPCOM 2020 - International Conference on Signal Processing and Communications, 2020, DOI Link
In this paper, a novel method for spatial Head Related Transfer Function (HRTF) interpolation is proposed. The property of linearity of HRTF spectral phase with frequency is first illustrated. An optimization problem is then formulated to compute the interpolated HRTF phase by imposing linearity constraints on the spectral phase of adjacent spatial angles. A weighted error minimization is carried out in an alternating optimization framework to obtain the solution to this problem. Phase-only reconstruction is used to obtain the HRIR for subsequent rendering of binaural audio. Interpolated HRTFs obtained by using the CIPIC and SYMARE databases are evaluated using statistical analysis. Subjective evaluations are also performed to evaluate the quality of the binaural audio rendered using the proposed method.
Sparse Plane-wave Decomposition for Upscaling Ambisonic Signals
Conference paper, SPCOM 2020 - International Conference on Signal Processing and Communications, 2020, DOI Link
Lower order ambisonics suffers from low spatial resolution, while the hardware complexity of directly recording higher order ambisonics (HOA) is high. This problem can be solved by upscaling the order-1 ambisonics (B-format signals). In this paper, a sparse plane-wave decomposition method using sequential matching pursuit is developed for upscaling the order of ambisonics. The proposed method maintains the same sparsity level across multiple measurements and is computationally efficient. The performance of the proposed method is evaluated based on the error in the encoded signal and the reconstructed sound field, and compared with the state-of-the-art upscaling techniques. Perceptual evaluations are also conducted, which indicate a significant improvement in spatial resolution.
Learning Based DOA Estimation in Adverse Acoustic Environment using Co-prime Circular Microphone Array
Gohil R., Raikar A., Routray G., Hegde R.M.
Conference paper, 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings, 2020
Direction of arrival (DOA) estimation is a well-known research problem. It depends on the microphone array geometry and the acoustic conditions of the room, and becomes more challenging in the presence of noise and reverberation. Many traditional signal processing approaches, such as least squares (LS) based methods, rely on time difference of arrival estimation, which is not robust to adverse acoustic conditions and hampers the DOA estimation. This problem can be solved using learning-based algorithms, which use a large amount of data simulated under similar acoustic conditions. Though much of the work on learning algorithms until now leverages augmentation techniques and deep neural network (DNN) architectures for achieving robustness in DOA estimation, very little attention is given to the feature representation. Robust feature representation can be achieved using certain microphone array geometries. In this work, a framework comprising learning-based DOA estimation along with a circular co-prime microphone array (CCMA) arrangement is proposed. Experimental results show that a robust feature representation is indeed essential for estimating the DOA accurately and gives a significant improvement in terms of root mean squared error (RMSE) and mean absolute error (MAE) scores when compared to other state-of-the-art DNN and signal processing approaches.
Sparse Framework for Reproduction of NFC-HOA
Conference paper, Conference Record - Asilomar Conference on Signals, Systems and Computers, 2020, DOI Link
In this paper, a sparse optimization problem is formulated for the reproduction of higher order ambisonics (HOA) encoded signals. The problem is solved using the least absolute shrinkage and selection operator (Lasso). The proposed solution is efficient enough to reproduce the desired sound field with a reduced number of loudspeakers. Furthermore, this paper highlights the amplification in the ambisonics components due to the near field effect of the secondary source. The near field compensation (NFC) of HOA encoded signals is carried out to reduce the curvature distortion in the reproduced sound field. Experiments on sound field reproduction using the proposed method are conducted using finite distance loudspeaker arrays. Performance improvements are noted when compared to the standard HOA reproduction techniques in terms of sound field reproduction accuracy and normalized error.
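To make the Lasso step in this abstract concrete, the following is a small sketch of one common solver, iterative soft thresholding (ISTA), for the generic objective 0.5 * ||y - A w||^2 + lam * ||w||_1. It is an illustration under assumptions, not the paper's solver: the matrix `A`, the target `y`, the weight `lam`, and the function names are hypothetical, and the toy problem is real-valued, whereas an HOA reproduction problem would use the paper's own (generally complex) signal model.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def lasso_ista(A, y, lam, n_iter=500):
    """Minimise 0.5 * ||y - A w||^2 + lam * ||w||_1 by iterative soft thresholding."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so its reciprocal is a safe (if conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    w = [0.0] * n
    for _ in range(n_iter):
        residual = [y[i] - sum(A[i][j] * w[j] for j in range(n)) for i in range(m)]
        # Gradient step on the quadratic term: w + step * A^T (y - A w).
        moved = [w[j] + step * sum(A[i][j] * residual[i] for i in range(m))
                 for j in range(n)]
        # Proximal step on the l1 term pushes small weights exactly to zero.
        w = [soft_threshold(v, step * lam) for v in moved]
    return w

# Toy usage: with an identity dictionary the solution is the soft-thresholded target,
# so the weakest component is switched off entirely.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.5, -2.0]
print(lasso_ista(A, y, lam=1.0))
```

The l1 penalty is what drives entries of the weight vector to exactly zero; in a loudspeaker setting, zero weights would correspond to loudspeakers that are not needed for the target field.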
Sparsity based framework for spatial sound reproduction in spherical harmonic domain
Conference paper, European Signal Processing Conference, 2018, DOI Link
In this paper, a novel sparsity based framework is proposed for accurate spatial sound field reproduction in the spherical harmonic domain. The proposed framework can effectively reduce the number of loudspeakers required to reproduce the desired sound field using higher order ambisonics (HOA) over a fixed listening area. Although HOA provides accurate reproduction of spatial sound, it has a disadvantage in terms of the restriction on the area of sound reproduction. This area can be increased by increasing the number of loudspeakers used during reproduction. In order to limit the use of a large number of loudspeakers, the sparse nature of the weight vector in the HOA signal model is utilized in this work. The problem of obtaining the weight vector is first formulated as a constrained optimization problem, which is difficult to solve due to the orthogonality property of the spherical harmonic matrix. This problem is therefore reformulated to exploit the sparse nature of the weight vector. The solution is then obtained by using the Bregman iteration method. Experiments on sound field reproduction in free space using the proposed sparsity based method are conducted using loudspeaker arrays. Performance improvements are noted when compared to least squares and compressed sensing methods in terms of sound field reproduction accuracy, subjective, and objective evaluations.
Genetic algorithm based RNN structure for Rayleigh fading MIMO channel estimation
Conference paper, Procedia Engineering, 2012, DOI Link
Multi-Input Multi-Output (MIMO) wireless communication is an emerging area that offers substantial advantages for achieving high data rates with cost-effective design. The spectral efficiency and reliability of MIMO systems greatly depend on the assumption that the transmitter and/or the receiver have perfect knowledge of channel state information (CSI). Hence, it is required to predict the CSI at the receiver and send the updated CSI back to the transmitter. In this paper, the frequency-flat Rayleigh i.i.d. MIMO channel is considered, where the Rayleigh fading channel provides complex and random coefficients. These complex channel parameters are estimated using a recurrent neural network (RNN) structure with complex weights, trained with the split complex real-time recurrent learning (SCRTRL) and fully complex real-time recurrent learning (FCRTRL) algorithms. The RNN with SCRTRL and the RNN with FCRTRL both converge prematurely, which is overcome by the proposed genetic algorithm (GA) based learning process.
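The frequency-flat i.i.d. Rayleigh MIMO channel mentioned in this abstract is conventionally modelled as a matrix of independent zero-mean, unit-variance complex Gaussian coefficients, with the received vector given by y = H x + n. The sketch below generates such a channel and passes one symbol vector through it. The function names, the 2x2 size, and the noise level are assumptions for illustration; the RNN and genetic algorithm estimators from the paper are not reproduced here.

```python
import random

def rayleigh_mimo_channel(n_rx, n_tx, rng):
    """n_rx x n_tx matrix of i.i.d. complex Gaussian coefficients with E[|h|^2] = 1."""
    sigma = 0.5 ** 0.5  # std of the real and imaginary parts, each with variance 1/2
    return [[complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for _ in range(n_tx)]
            for _ in range(n_rx)]

def transmit(H, x, noise_std, rng):
    """Flat-fading model y = H x + n for a single transmitted symbol vector x."""
    y = []
    for row in H:
        # Complex Gaussian noise with std noise_std per real/imaginary component.
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        y.append(sum(h * s for h, s in zip(row, x)) + noise)
    return y

# Toy usage: a 2x2 channel carrying one pair of BPSK symbols.
rng = random.Random(0)
H = rayleigh_mimo_channel(2, 2, rng)
y = transmit(H, [1 + 0j, -1 + 0j], noise_std=0.1, rng=rng)
print(H)
print(y)
```

The magnitude of each coefficient is Rayleigh distributed, which is where the channel model gets its name. A channel estimator is then judged on how well it recovers `H` from known pilot vectors `x` and the observed `y`.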
Rayleigh fading MIMO channel prediction using RNN with genetic algorithm
Conference paper, Communications in Computer and Information Science, 2011, DOI Link
The spectral efficiency and reliability of Multi-Input Multi-Output (MIMO) systems greatly depend on the prediction of channel state information (CSI), such that the transmitter and/or the receiver have perfect knowledge of CSI. Linear predictors used for narrow-band prediction have produced poor results in predicting the correlation coefficients of the channel when the received data has undergone non-linear distortions. Hence, one potential solution to this challenge is artificial neural networks (ANN). In this paper, we used a fully connected recurrent neural network to predict the Rayleigh fading channel coefficients using genetic algorithm based learning, which is compared with split complex real-time and fully complex real-time recurrent learning processes.