Created by : Sameer Sharma
NOTE : The dataset I created has not been uploaded.
This project is organised into three parts. In the first part, a dataset is selected, its audio content is analysed, and training data for machine learning is prepared by manually separating speech and non-speech segments from a portion of the dataset. The separated segments are then cut into one-second audio files and stored by category. The second part covers silence detection and feature extraction using digital signal processing techniques, which transform the raw audio into a form suitable for modelling [9]. A model is then selected and trained, and finally evaluated on unseen data.
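The one-second segmentation and silence detection steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `split_into_seconds` and `is_silent`, the RMS-energy criterion, the threshold of 0.01 and the 8 kHz sample rate are all assumptions chosen for the example.

```python
import numpy as np

def split_into_seconds(signal, sample_rate):
    """Cut a 1-D signal into non-overlapping one-second segments.

    Assumption: a trailing remainder shorter than one second is dropped.
    """
    n_full = len(signal) // sample_rate
    return [signal[i * sample_rate:(i + 1) * sample_rate] for i in range(n_full)]

def is_silent(segment, rms_threshold=0.01):
    """Flag a segment as silence when its RMS amplitude is below a threshold.

    The threshold is a placeholder value, not one taken from the project.
    """
    samples = np.asarray(segment, dtype=np.float64)
    rms = np.sqrt(np.mean(samples ** 2))
    return bool(rms < rms_threshold)

# Hypothetical 2.5 s test signal: one second of tone, one second of
# silence, then half a second of tone that is too short to keep.
sample_rate = 8000
t = np.arange(int(2.5 * sample_rate)) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
tone[sample_rate:2 * sample_rate] = 0.0

segments = split_into_seconds(tone, sample_rate)
flags = [is_silent(s) for s in segments]
```

Here `segments` holds two one-second arrays and `flags` marks only the second one as silent. A real pipeline would likely tune the threshold on the data, or use a frame-level energy measure instead of one value per second.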
Three-dimensional feature vectors are used to train three machine learning algorithms: K-nearest neighbours (KNN), AdaBoost and an Artificial Neural Network. The final part of the project uses log filter banks to extract high-dimensional feature vectors, which are then used to train an Artificial Neural Network. Audacity was used for the manual work of listening to the audio and separating the segments.
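As an illustration of the log filter bank features, the sketch below computes log Mel-filterbank energies for a one-second clip using NumPy only. It is a sketch under stated assumptions rather than the project's implementation: the 16 kHz sample rate, 25 ms frames with a 10 ms hop, 512-point FFT, 26 filters, and the helper names `mel_filterbank` and `log_mel_features` are all hypothetical choices.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):       # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - centre)
    return fbank

def log_mel_features(signal, sample_rate, n_filters=26, n_fft=512,
                     frame_len=400, hop=160):
    """Return log mel-filterbank energies, shape (n_frames, n_filters).

    All default parameter values are assumptions made for this example.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (frames are zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(energies + 1e-10)         # small offset avoids log(0)

# Hypothetical one-second clip of noise standing in for real audio.
rng = np.random.default_rng(0)
one_second = rng.standard_normal(16000)
features = log_mel_features(one_second, 16000)
```

With these settings a one-second clip yields a 98 x 26 feature matrix, which could be flattened or fed frame by frame to a neural network. How the project arranges the frames into an input vector is not stated in the text above.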
Maximum accuracies on broadcast news data:
K-Nearest Neighbours with handcrafted features : 87.10 %
AdaBoost with handcrafted features : 86.79 %
Artificial Neural Network with handcrafted features : 88.42 %
Artificial Neural Network with log Mel-filterbank features : 96.12 %