arxiv:2312.09911

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Published on Dec 15, 2023

Submitted by AK on Dec 18, 2023

#1 Paper of the day

Authors: Xueyao Zhang, Liumeng Xue, Yuancheng Wang, Yicheng Gu, Zihao Fang, Lexiao Zou, Chaoren Wang, Jun Han, Zhizheng Wu

Abstract

Amphion is a toolkit for audio, music, and speech generation. It provides model visualizations, vocoders, and evaluation metrics to support reproducible research and to help researchers get started with training.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Amphion is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and to help junior researchers and engineers get started in audio, music, and speech generation research and development. Amphion offers a unique feature: visualizations of classic models and architectures. We believe these visualizations help junior researchers and engineers gain a better understanding of the models. The North-Star objective of Amphion is to offer a platform for studying the conversion of any input into general audio. Amphion is designed to support individual generation tasks. In addition to these task-specific components, Amphion includes several vocoders and evaluation metrics. A vocoder is an important module for producing high-quality audio signals, while evaluation metrics are critical for ensuring consistent evaluation across generation tasks. In this paper, we provide a high-level overview of Amphion.