Marc Ciufo Green and Damian Murphy,
Sound Source Localisation in Ambisonic Audio Using Peak Clustering,
DCASE Workshop 2019.

Accurate sound source direction-of-arrival and trajectory estimation in 3D is a key component of acoustic scene analysis for many applications, including as part of polyphonic sound event detection systems. Recently, a number of systems have been proposed which perform this function with first-order Ambisonic audio and can work well, though performance typically drops when the polyphony is increased. This paper introduces a novel system for source localisation using spherical harmonic beamforming and unsupervised peak clustering. The performance of the system is investigated using synthetic scenes in first- to fourth-order Ambisonics featuring up to three overlapping sounds. It is shown that use of second-order Ambisonics results in significantly increased performance relative to first-order. Using third- and fourth-order Ambisonics also results in improvements, though these are less pronounced.
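The beamforming step described above can be sketched as a plane-wave decomposition power map followed by a peak pick. The following is a minimal illustration, not the paper's system: it uses complex spherical harmonics via scipy.special.sph_harm, assumes ACN channel ordering, and locates only the global maximum rather than performing the unsupervised peak clustering the paper describes.

```python
import numpy as np
from scipy.special import sph_harm

def pwd_power_map(b, order, az_grid, pol_grid):
    """Steered-power (plane-wave decomposition) map from spherical
    harmonic coefficients b, assumed to be in ACN channel order."""
    AZ, PL = np.meshgrid(az_grid, pol_grid, indexing="ij")
    y = np.zeros(AZ.shape, dtype=complex)
    for n in range(order + 1):
        for m in range(-n, n + 1):
            # ACN index for degree n, order m is n^2 + n + m
            y += np.conj(sph_harm(m, n, AZ, PL)) * b[n * n + n + m]
    return np.abs(y) ** 2

# Encode a single plane wave (azimuth 1.0 rad, polar angle 1.2 rad)
# and recover its direction as the maximum of the power map.
order, az0, pl0 = 3, 1.0, 1.2
b = np.array([sph_harm(m, n, az0, pl0)
              for n in range(order + 1) for m in range(-n, n + 1)])
az_grid = np.linspace(0, 2 * np.pi, 144, endpoint=False)
pol_grid = np.linspace(0.01, np.pi - 0.01, 72)
pmap = pwd_power_map(b, order, az_grid, pol_grid)
i, j = np.unravel_index(np.argmax(pmap), pmap.shape)
est_az, est_pol = az_grid[i], pol_grid[j]
```

With multiple overlapping sources the map exhibits several local maxima, which is where a clustering stage over detected peaks becomes useful.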

Marc Ciufo Green, Damian Murphy, Sharath Adavanne, Tuomas Virtanen,
Acoustic Scene Classification Using Higher-Order Ambisonic Features,
WASPAA 2019.

This paper investigates the potential of using higher-order Ambisonic features to perform acoustic scene classification. We compare the performance of systems trained using first-order and fourth-order spatial features extracted from the EigenScape database. Using both Gaussian mixture model and convolutional neural network classifiers, we show that features extracted from higher-order Ambisonics can yield increased classification accuracies relative to first-order features. Diffuseness-based features seem to describe scenes particularly well relative to direction-of-arrival-based features. With specific feature subsets, however, differences in classification accuracy between first- and fourth-order features become negligible.

Marc Ciufo Green and Damian Murphy,
Environmental Sound Monitoring using Machine Learning on Mobile Devices,
Applied Acoustics, Vol. 159.

This paper reports on a study to assess the feasibility of creating an intuitive environmental sound monitoring system that can be used on-location and return meaningful measurements beyond the standard LAeq. To test this, an iOS app was created using Machine Learning (ML) and Augmented Reality (AR) in conjunction with the Sennheiser AMBEO Smart Headset. The app returns readings indicating the human, natural and mechanical sound content of the local acoustic scene, and implements four virtual sound objects which the user can place in the scene to observe their effect on the readings. Testing at various types of urban location indicates that the app returns meaningful ratings for natural and mechanical sound, though the pattern of variation in the ratings for human sound is less clear. Adding the virtual objects has no significant effect on the readings, with the exception of the car object, which significantly increases mechanical ratings. Results indicate that using ML to provide meaningful on-location sound monitoring is feasible, though the performance of the app could be improved with additional calibration.


Marc Ciufo Green and Damian Murphy,
EigenScape: A Database of Spatial Acoustic Scene Recordings,
Applied Sciences, Special Issue on Sound and Music Computing.

The classification of acoustic scenes and events is an emerging area of research in the field of machine listening. Most of the research conducted so far uses spectral features extracted from monaural or stereophonic audio rather than spatial features extracted from multichannel recordings. This is partly due to the lack thus far of a substantial body of spatial recordings of acoustic scenes. This paper formally introduces EigenScape, a new database of fourth-order Ambisonic recordings of eight different acoustic scene classes. The potential applications of a spatial machine listening system are discussed before detailed information on the recording process and dataset is provided. A baseline spatial classification system using directional audio coding (DirAC) techniques is detailed and results from this classifier are presented. The classifier is shown to give good overall scene classification accuracy across the dataset, with 7 of the 8 scenes classified with greater than 60% accuracy, and an 11% improvement in overall accuracy compared to use of Mel-frequency cepstral coefficient (MFCC) features. Further analysis of the results suggests potential improvements to the classifier. It is concluded that the results validate the new database and show that spatial features can characterise acoustic scenes and as such are worthy of further investigation.
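The DirAC features underlying such a baseline can be illustrated for the first-order (B-format) case: the time-averaged active intensity vector gives a direction-of-arrival estimate, and its ratio to the energy density gives diffuseness. This is a simplified sketch with convention-dependent scaling constants omitted; the exact channel normalisation and per-band processing used in the paper are not reproduced here.

```python
import numpy as np

def dirac_features(W, X, Y, Z):
    """Time-averaged DirAC-style features from first-order (B-format)
    complex STFT bins. Constant scaling factors are omitted, so values
    are correct only up to the channel convention assumed here."""
    # Active intensity vector per bin (proportional to Re{p* v}).
    Ix = np.real(np.conj(W) * X)
    Iy = np.real(np.conj(W) * Y)
    Iz = np.real(np.conj(W) * Z)
    # Energy density, again up to constant factors.
    E = 0.5 * (np.abs(W) ** 2 + np.abs(X) ** 2
               + np.abs(Y) ** 2 + np.abs(Z) ** 2)
    I_mean = np.array([Ix.mean(), Iy.mean(), Iz.mean()])
    # Diffuseness: 0 for a single plane wave, tends to 1 in a diffuse field.
    psi = 1.0 - np.linalg.norm(I_mean) / max(E.mean(), 1e-12)
    azimuth = np.arctan2(I_mean[1], I_mean[0])
    return psi, azimuth
```

For a plane wave encoded as X = W cos(theta), Y = W sin(theta), Z = 0, this returns diffuseness near zero and azimuth near theta; for four uncorrelated channels the diffuseness approaches one.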

Additional materials:

Marc Ciufo Green and Damian Murphy,
Acoustic Scene Classification Using Spatial Features,
DCASE Workshop 2017.

Due to various factors, the vast majority of the research in the field of Acoustic Scene Classification has used monaural or binaural datasets. This paper introduces EigenScape – a new dataset of 4th-order Ambisonic acoustic scene recordings – and presents preliminary analysis of this dataset. The data is classified using a standard Mel-Frequency Cepstral Coefficient – Gaussian Mixture Model system, and the performance of this system is compared to that of a new system using spatial features extracted using Directional Audio Coding (DirAC) techniques. The DirAC features are shown to perform well in scene classification, with some subsets of these features outperforming the MFCC classification. The differences in label confusion between the two systems are especially interesting, as these suggest that certain scenes that are spectrally similar might not necessarily be spatially similar.
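The classification stage of such an MFCC–GMM system can be sketched as fitting one Gaussian mixture per scene class and labelling a recording by the class with the highest log-likelihood over its frames. The sketch below uses scikit-learn, with synthetic feature vectors standing in for actual MFCC or DirAC frames; the number of mixture components and the diagonal covariance choice are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_scene_gmms(features_by_class, n_components=4, seed=0):
    """Fit one GMM per scene class on per-frame feature vectors
    (each value is an array of shape (frames, feature_dim))."""
    return {label: GaussianMixture(n_components, covariance_type="diag",
                                   random_state=seed).fit(frames)
            for label, frames in features_by_class.items()}

def classify(gmms, frames):
    """Label a recording by the class whose GMM gives the highest
    average log-likelihood over the recording's frames."""
    return max(gmms, key=lambda label: gmms[label].score(frames))
```

A usage example with two well-separated synthetic classes: train on 500 frames per class, then classify held-out frames drawn from the same distributions.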


Marc Ciufo Green, John Szymanski and Matt Speed,
Assessing the Suitability of the Magnitude Slope Deviation Detection Criterion for use in Automatic Feedback Control,

Acoustic feedback is a recurrent problem in live sound reinforcement scenarios. Many attempts have been made to produce an automated feedback cancellation system, but none have seen widespread use due to concerns over the accuracy and transparency of feedback howl cancellation. This paper investigates the use of the Magnitude Slope Deviation (MSD) algorithm to intelligently identify feedback howl in live sound scenarios. A new variation on this algorithm is developed, tested, and shown to be much more computationally efficient without compromising detection accuracy. The effect of varying the length of the frequency spectrum history buffer available for analysis is evaluated across various live sound scenarios. The MSD algorithm is shown to be very accurate in detecting howl frequencies amongst the speech and classical music stimuli tested here, but inaccurate in the rock music scenario even when a long history buffer is used. Finally, a new algorithm for setting the depth of howl-cancelling notch filters is proposed and investigated. The algorithm shows promise in keeping frequency attenuation to the minimum required level, though the time taken to cancel howl remains an issue.
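The intuition behind a magnitude-slope criterion can be loosely sketched as follows: a feedback howl grows steadily in level, so over a history buffer its dB magnitude trajectory fits a rising straight line with little deviation, whereas musical or speech content fluctuates. The thresholds and the exact deviation measure below are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def msd_candidates(mag_db_history, slope_min=1.0, msd_max=1.0):
    """Flag frequency bins whose dB magnitude grows steadily across the
    history buffer: a high fitted slope combined with a low mean
    deviation from that fitted line.
    mag_db_history: array of shape (frames, bins), magnitudes in dB."""
    frames, _ = mag_db_history.shape
    t = np.arange(frames)
    # Least-squares line fit per frequency bin (columns fit independently).
    slope, intercept = np.polyfit(t, mag_db_history, 1)
    fit = t[:, None] * slope + intercept
    # Mean absolute deviation of each bin's trajectory from its fit.
    msd = np.mean(np.abs(mag_db_history - fit), axis=0)
    return (slope > slope_min) & (msd < msd_max)
```

On this toy criterion, a bin rising at a steady 2 dB per frame is flagged, while a bin fluctuating randomly by several dB, or a bin holding a constant level, is not. The finding above that longer history buffers help follows naturally: more frames make the line fit, and hence the deviation estimate, more reliable.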

See also: