Speech Activity Detection and its Evaluation in Speaker Diarization System
Keywords: Speaker Diarization System; Artificial Neural Network; Gaussian Mixture Model; ROC; DET
In speaker diarization, speech/voice activity detection (SAD) is performed to separate speech, non-speech and silent frames. The zero crossing rate and root mean square value of the frames of audio clips are used to select training data for the silence, speech and non-speech models. The trained models are used by two classifiers, a Gaussian mixture model (GMM) and an artificial neural network (ANN), to classify the speech and non-speech frames of an audio clip. The results of the ANN and GMM classifiers are compared using the receiver operating characteristic (ROC) curve and the detection error tradeoff (DET) graph. It is concluded that the neural network based SAD performs comparatively better than the Gaussian mixture model based SAD.
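As an illustration of the frame-level features mentioned above, the following is a minimal sketch of how the zero crossing rate and root mean square value of a frame could be computed and used to pick candidate training frames. The function names, the threshold values (`rms_silence`, `zcr_speech`) and the decision rule itself are assumptions made for illustration only; the abstract does not state the selection rule that was actually used.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into fixed-length, possibly overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms(frame):
    """Root mean square value of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def label_frame(frame, rms_silence=0.01, zcr_speech=0.25):
    """Assign a candidate training label to one frame.

    Assumed heuristic (not taken from the paper): very low energy is
    treated as silence, energetic frames with a low zero crossing rate
    as speech, and the remaining frames as non-speech.
    Both thresholds are hypothetical and would need tuning on real data.
    """
    if rms(frame) < rms_silence:
        return "silence"
    if zero_crossing_rate(frame) < zcr_speech:
        return "speech"
    return "non-speech"
```

In such a setup, the frames labelled this way would serve only as bootstrap training data for the silence, speech and non-speech models; the final frame-level decision would still be made by the GMM or ANN classifier.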