VoxSRC Workshop


The VoxSRC Workshop 2020

Welcome to the VoxSRC Workshop 2020! The workshop included presentations from the most exciting and novel submissions to the VoxCeleb Speaker Recognition Challenge (VoxSRC), as well as the announcement of the challenge winners.

The workshop was held in conjunction with Interspeech 2020. It took place on 30th October 2020 and was held entirely online.

Information about all editions of this workshop can be found on this website.

Schedule

The workshop was held from 7pm - 10pm Shanghai time.

7:00pm   Introduction: "VoxCeleb, VoxConverse and VoxSRC", Arsha Nagrani, Joon Son Chung and Andrew Zisserman [slides]
7:25pm   Keynote Talk: Daniel Garcia-Romero, "X-vectors: Neural Speech Embeddings for Speaker Recognition" [video]
8:00pm   Announcements: Leaderboards and winners for Tracks 1, 2 and 3 [slides]
8:05pm   Participant talks from Tracks 1, 2 and 3
         Team JTBD [slides] [video]
         Team xx205 [slides] [video]
         Team DKU-DukeECE [slides] [video]
8:50pm   Coffee Break
9:10pm   Keynote Talk: Shinji Watanabe, "Tackling Multispeaker Conversation Processing based on Speaker Diarization and Multispeaker Speech Recognition" [video]
9:40pm   Announcements: Leaderboard and winners for Track 4 [slides]
9:42pm   Participant talks from Track 4
         Team mandalorian [slides] [video]
         Team landini [slides] [video]
10:00pm  Wrap-up discussion and closing

Participant talks


Tracks 1, 2 and 3

Team JTBD
Team xx205
Team DKU-DukeECE

Track 4

Team mandalorian
Team landini

Technical reports


Team          Track    File
JTBD          1,2,3    arXiv
DKU-DukeECE   1,3,4    arXiv
Veridas       1,2      PDF
xx205         1,2      arXiv
BUT-Omilia    1,2      PDF
DSP           1        PDF
EML           1        arXiv
ID R&D        1        PDF
NSYSU+CHT     1        PDF
NTNU          1        PDF
ShaneRun      1        PDF
SpeakIn       1        PDF
TalTech       1        PDF
clovaai       1        arXiv
Tongji        1        arXiv
Tongji-UG     1        arXiv
Takoyaki      2        PDF
UPC           3        arXiv
Sogou         4        PDF
BUT           4        arXiv
Microsoft     4        arXiv
Huawei        4        PDF

Workshop Registration

Registration for the workshop can be done via Eventbrite. Since spots are limited, please register early! One registration per participant only. The Zoom link for the workshop will only be sent to registered participants.

If you are looking to register for the challenge itself, see our VoxCeleb Speaker Recognition Challenge (VoxSRC) page.

Keynote Speakers



Daniel Garcia-Romero

Title

X-vectors: Neural Speech Embeddings for Speaker Recognition

Abstract

The state-of-the-art in text-independent speaker recognition is represented by DNN embeddings (x-vectors) that summarize speaker characteristics over an entire recording and generalize well beyond the speakers in the training set. In this talk, I will present a behind-the-scenes account of the journey from our first attempt at end-to-end speaker recognition to our most recent x-vector system, which achieved top performance at the NIST SRE19 speaker recognition evaluation. I will discuss the challenges, lessons learned, and motivations behind our decision process. Additionally, I will show the evolution of the DNN architectures and training approaches. Performance results will be provided for conversational telephone speech, audio from videos, and far-field multi-speaker recordings of natural spoken interactions.
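
The core mechanism described in the abstract, a frame-level network whose outputs are pooled over time into a fixed-length utterance embedding, can be sketched in a few lines. The sketch below is only an illustration under assumed layer sizes and a PyTorch implementation, not the speaker's actual system; the names TinyXVector and StatsPooling and all dimensions are hypothetical.

# Illustrative sketch only; architecture and sizes are assumptions, not the keynote's system.
import torch
import torch.nn as nn

class StatsPooling(nn.Module):
    # Collapse the time axis by concatenating the per-channel mean and std.
    def forward(self, x):                        # x: (batch, channels, frames)
        return torch.cat([x.mean(dim=2), x.std(dim=2)], dim=1)

class TinyXVector(nn.Module):
    def __init__(self, feat_dim=30, embed_dim=512):
        super().__init__()
        # 1-D convolutions over time stand in for the TDNN frame-level layers.
        self.frame_layers = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 1500, kernel_size=1), nn.ReLU(),
        )
        self.pool = StatsPooling()
        self.embedding = nn.Linear(2 * 1500, embed_dim)   # the "x-vector"

    def forward(self, feats):                    # feats: (batch, frames, feat_dim)
        frame_level = self.frame_layers(feats.transpose(1, 2))
        return self.embedding(self.pool(frame_level))

# A ~3-second utterance of 30-dim features (300 frames) maps to one 512-dim embedding,
# regardless of the recording length.
model = TinyXVector()
print(model(torch.randn(1, 300, 30)).shape)      # torch.Size([1, 512])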

Biography

Daniel Garcia-Romero is a Senior Research Scientist at the Johns Hopkins University Human Language Technology Center of Excellence. His research interests are in the broad areas of speech processing, deep learning, and multi-modal person identification. For the past few years he has been working on deep neural networks for speaker recognition, language recognition, and diarization. He is a co-inventor of the x-vector embeddings that have set the state of the art in these fields. His previous work includes significant contributions to probabilistic modeling of speaker representations for domain adaptation and noise robustness. Prior to joining JHU, he completed his Ph.D. in Electrical Engineering at the University of Maryland, College Park.

Video





Shinji Watanabe

Title

Tackling Multispeaker Conversation Processing based on Speaker Diarization and Multispeaker Speech Recognition

Abstract

Recently, speech recognition and understanding studies have been shifting their focus from single-speaker automatic speech recognition (ASR) in controlled scenarios to more challenging and realistic multispeaker conversation analysis based on ASR and speaker diarization. The CHiME speech separation and recognition challenge is one of the attempts to tackle these new paradigms. This talk first describes the setup and outcome of the latest CHiME-6 challenge, which focuses on the recognition of multispeaker conversations in a dinner party scenario. The second part of the talk tackles multispeaker conversation analysis with an emerging end-to-end neural architecture. We introduce our recent attempts at end-to-end speaker diarization, including basic concepts, online extensions, and handling unknown numbers of speakers.
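
As a rough illustration of the end-to-end diarization idea mentioned in the abstract, the sketch below (an assumption on our part, not the speaker's implementation) uses a bidirectional LSTM that emits, for every frame, an independent speech-activity probability per speaker, which is what lets overlapped speech be handled directly. The fixed two-speaker output, the class name TinyEEND, and all sizes are illustrative.

# Illustrative sketch only; architecture and sizes are assumptions, not the keynote's system.
import torch
import torch.nn as nn

class TinyEEND(nn.Module):
    def __init__(self, feat_dim=23, hidden=256, num_speakers=2):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_speakers)

    def forward(self, feats):                    # feats: (batch, frames, feat_dim)
        hidden_states, _ = self.encoder(feats)
        # One sigmoid per speaker per frame: overlapping speech simply
        # activates several outputs at once.
        return torch.sigmoid(self.head(hidden_states))

# 500 frames of 23-dim features -> frame-level activities for two speakers.
# Training would use a permutation-invariant loss, since speaker order is arbitrary.
model = TinyEEND()
print(model(torch.randn(1, 500, 23)).shape)      # torch.Size([1, 500, 2])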

Biography

Shinji Watanabe is an Associate Research Professor at Johns Hopkins University, Baltimore, MD. He received his B.S., M.S., and Ph.D. (Dr. Eng.) degrees from Waseda University, Tokyo, Japan. He was a research scientist at NTT Communication Science Laboratories, Kyoto, Japan, from 2001 to 2011, a visiting scholar at the Georgia Institute of Technology, Atlanta, GA, in 2009, and a Senior Principal Research Scientist at Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA, from 2012 to 2017. His research interests include automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and language processing. He has published more than 200 papers in peer-reviewed journals and conferences, and has received several awards, including the best paper award at IEEE ASRU in 2019. He served as an Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing, and has been a member of several technical committees, including the IEEE Signal Processing Society Speech and Language Technical Committee (SLTC) and the Machine Learning for Signal Processing Technical Committee (MLSP).

Video

Organisers

Arsha Nagrani, VGG, University of Oxford
Joon Son Chung, Naver, South Korea
Andrew Zisserman, VGG, University of Oxford
Jaesung Huh, VGG, University of Oxford
Ernesto Coto, VGG, University of Oxford
Andrew Brown, VGG, University of Oxford
Weidi Xie, VGG, University of Oxford
Mitchell McLaren, Speech Technology and Research Laboratory, SRI International, CA
Douglas A Reynolds, Lincoln Laboratory, MIT

Please contact arsha[at]robots[dot]ox[dot]ac[dot]uk if you have any queries, or if you would be interested in sponsoring this challenge.

Sponsors

The VoxCeleb Speaker Verification Challenge and Workshop are proudly sponsored by:

Acknowledgements

This work is supported by the EPSRC programme grant Seebibyte EP/M013774/1: Visual Search for the Era of Big Data.