Publications
Please cite the following if you make use of the dataset.
International Conference on Acoustics, Speech and Signal Processing, 2021
@InProceedings{Brown20b,
author = "Andrew Brown and Jaesung Huh and Arsha Nagrani and Joon Son Chung and Andrew Zisserman",
title = "Playing a Part: Speaker Verification at the Movies",
booktitle = "International Conference on Acoustics, Speech, and Signal Processing (ICASSP)",
year = "2021",
}
The goal of this work is to investigate the performance of popular speaker recognition models on speech segments from movies, where actors often intentionally disguise their voice to play a character. We make the following three contributions: (i) We collect a novel, challenging speaker recognition dataset called VoxMovies, with speech for 856 identities from almost 4000 movie clips. VoxMovies contains utterances with varying emotion, accents and background noise, and therefore comprises an entirely different domain from the interview-style, emotionally calm utterances in current speaker recognition datasets such as VoxCeleb; (ii) We provide a number of domain adaptation evaluation sets, and benchmark the performance of state-of-the-art speaker recognition models on these evaluation pairs. We demonstrate that both speaker verification and identification performance drop steeply on this new data, showing the challenge in transferring models across domains; and finally (iii) We show that simple domain adaptation paradigms improve performance, but there is still large room for improvement.
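As a minimal sketch of how verification trial pairs of this kind are commonly scored, the snippet below compares utterance embeddings with cosine similarity and computes the equal error rate (EER). The embed() function and the trial file paths are hypothetical placeholders, not part of any released tooling for this dataset.

import numpy as np
from sklearn.metrics import roc_curve

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two fixed-dimensional speaker embeddings."""
    return float(np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))

def equal_error_rate(labels, scores):
    """EER: the operating point where false-acceptance equals false-rejection."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

# Each trial is (label, utterance_a, utterance_b); embed() stands in for any
# pretrained speaker encoder that maps a waveform to a vector.
# trials = [(1, "voxmovies/id001/clip01.wav", "voxceleb/id001/utt05.wav"), ...]
# scores = [cosine_score(embed(a), embed(b)) for _, a, b in trials]
# labels = [label for label, _, _ in trials]
# print(f"EER: {100 * equal_error_rate(labels, scores):.2f}%")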
Computer Speech and Language, 2019
@Article{Nagrani19,
author = "Arsha Nagrani and Joon~Son Chung and Weidi Xie and Andrew Zisserman",
title = "Voxceleb: Large-scale speaker verification in the wild",
journal = "Computer Science and Language",
year = "2019",
publisher = "Elsevier",
}
The objective of this work is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual dataset collected from open-source media using a fully automated pipeline. Most existing datasets for speaker identification contain samples obtained under quite constrained conditions, and usually require manual annotations, hence are limited in size. We propose a pipeline based on computer vision techniques to create the dataset from open-source media. Our pipeline involves obtaining videos from YouTube; performing active speaker verification using a two-stream synchronization Convolutional Neural Network (CNN); and confirming the identity of the speaker using CNN-based facial recognition. We use this pipeline to curate VoxCeleb, which contains over a million real-world utterances from over 6000 speakers. This is several times larger than any publicly available speaker recognition dataset. Second, we develop and compare different CNN architectures with various aggregation methods and training loss functions that can effectively recognise identities from voice under various conditions. The models trained on our dataset surpass the performance of previous works by a significant margin.
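As one concrete example of the aggregation step mentioned above, here is a minimal sketch of temporal average pooling, the simplest of the common strategies for turning frame-level CNN features into a single utterance-level embedding. It is written in PyTorch for illustration; the tensor sizes are assumptions, not the paper's actual configuration.

import torch
import torch.nn as nn

class TemporalAveragePooling(nn.Module):
    """Average frame-level features (batch, time, dim) over time to obtain a
    single utterance-level speaker embedding."""
    def forward(self, frame_features):
        return frame_features.mean(dim=1)

# Frame-level features from any speaker CNN backbone; the sizes are illustrative.
frames = torch.randn(8, 200, 512)                  # (batch, time steps, feature dim)
utterance_embeddings = TemporalAveragePooling()(frames)
print(utterance_embeddings.shape)                  # torch.Size([8, 512])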
Asian Conference on Computer Vision, 2020
@InProceedings{Bain20,
author = "Max Bain and Arsha Nagrani and Andrew Brown and Andrew Zisserman",
title = "Condensed Movies: Story Based Retrieval with Contextual Embeddings",
booktitle = "Asian Conference on Computer Vision",
year = "2020",
}
Our objective in this work is long-range understanding of the narrative structure of movies. Instead of considering the entire movie, we propose to learn from the 'key scenes' of the movie, providing a condensed look at the full storyline. To this end, we make the following three contributions: (i) We create the Condensed Movies Dataset (CMD) consisting of the key scenes from over 3K movies: each key scene is accompanied by a high-level semantic description of the scene, character face-tracks, and metadata about the movie. The dataset is scalable, obtained automatically from YouTube, and is freely available for anybody to download and use. It is also an order of magnitude larger than existing movie datasets in the number of movies; (ii) We provide a deep network baseline for text-to-video retrieval on our dataset, combining character, speech and visual cues into a single video embedding; and finally (iii) We demonstrate how the addition of context from other video clips improves retrieval performance.
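To make "combining character, speech and visual cues into a single video embedding" concrete, here is a rough sketch that projects each modality into a shared space, fuses by summation, and ranks clips against a text query by cosine similarity. The dimensions, the sum fusion, and the random stand-in features are illustrative assumptions rather than the paper's actual baseline.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleVideoEmbedder(nn.Module):
    """Project each modality (e.g. character, speech, visual features) into a
    shared space and fuse by summation into one L2-normalised video embedding."""
    def __init__(self, modality_dims, embed_dim=256):
        super().__init__()
        self.projections = nn.ModuleList(nn.Linear(d, embed_dim) for d in modality_dims)

    def forward(self, modality_features):
        fused = sum(proj(feat) for proj, feat in zip(self.projections, modality_features))
        return F.normalize(fused, dim=-1)

# Rank 100 clips against a text query by cosine similarity; all features here
# are random stand-ins for real character, speech and visual encoders.
embedder = SimpleVideoEmbedder(modality_dims=[512, 128, 2048])
videos = embedder([torch.randn(100, 512), torch.randn(100, 128), torch.randn(100, 2048)])
query = F.normalize(torch.randn(1, 256), dim=-1)   # stand-in for a text encoder output
ranking = (query @ videos.T).argsort(descending=True)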
* Equal Contribution