VoxMovies

A new, challenging speaker recognition domain & dataset

Dataset Examples

Below are example utterances for three identities in VoxMovies, each paired with a contrasting utterance from the same identity in VoxCeleb.
Utterances in VoxMovies show strong variation in emotion, accent and background noise within each identity. This contrasts with the calm, interview-style utterances in VoxCeleb.

About

VoxMovies is an audio dataset containing utterances sourced from movies, with varying emotion, accent and background noise.

To benchmark the performance of speaker recognition systems on this entirely new domain, VoxMovies includes a number of domain adaptation evaluation sets.

856 Speakers

VoxMovies contains speech from speakers in VoxCeleb1 and VoxCeleb2 (speaker recognition training datasets), so the effect of a change in domain can be studied within the same identity.

1,452 Movies

VoxMovies is sourced from key moments in a wide variety of movies from the Condensed Movies dataset. These movies cover many different genres such as comedy, action, romance and horror.

8,905 Utterances

VoxMovies consists of 8,905 audio clips. On average, each identity has utterances from 2.7 different movies, so variation in emotion and background noise is seen within each identity as well as across identities.

Figure: Movie genres featured in VoxMovies

Figure: Utterance lengths

Download and code

The dataset consists of a training and a test partition, plus several domain adaptation evaluation sets. For further details, please see the paper. Evaluation code can be found here.

The audio files are temporarily unavailable from this website.

License

The VoxMovies dataset is available to download for commercial and research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the videos. A complete version of the license can be found here.

Caution: We note that the distribution of identities in the VoxMovies dataset may not be representative of the global human population. Please be mindful of unintended societal, gender, racial and other biases when training, evaluating or deploying models on this data.

Please contact the authors below if you have any queries regarding the dataset.

Publications

Please cite the following if you make use of the dataset.

International Conference on Acoustics, Speech and Signal Processing, 2021


A. Nagrani*, J. S. Chung*, W. Xie, A. Zisserman
Computer Speech and Language, 2019


M. Bain, A. Nagrani, A. Brown, A. Zisserman
Asian Conference on Computer Vision, 2020



* Equal Contribution

Acknowledgements

This work is supported by the EPSRC programme grant Seebibyte EP/M013774/1: Visual Search for the Era of Big Data.