AudioSR: Versatile Audio Super-resolution at Scale
Haohe Liu1, Ke Chen2, Qiao Tian3, Wenwu Wang1, Mark D. Plumbley1
1CVSSP, University of Surrey
2University of California San Diego
3Speech, Audio & Music Intelligence (SAMI), Bytedance
Abstract
Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications. Previous methods have limitations such as the limited scope of audio types (e.g., music, speech) and specific bandwidth settings they can handle (e.g., 4 kHz to 8 kHz). We introduce a diffusion-based generative model, AudioSR, that is capable of performing robust audio super-resolution on versatile audio types, including sound effects, music, and speech. Specifically, AudioSR can upsample any input audio signal within the bandwidth range of 2 kHz to 16 kHz to a high-resolution audio signal at 24 kHz bandwidth with a sampling rate of 48 kHz. Extensive objective evaluation on various audio super-resolution benchmarks demonstrates the strong result achieved by the proposed model. In addition, our subjective evaluation shows that AudioSR can acts as a plug-and-play module to enhance the generation quality of a wide range of audio generative models, including AudioLDM, Fastspeech2, and MusicGen.
Audio Super-Resolution across Different Models
Speech:
8 kHz Sample | NVSR-DNN | NVSR-ResUNet | AudioSR | Ground-Truth |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sound Effect:
8 kHz Sample | NVSR-DNN | NVSR-ResUNet | AudioSR | Ground-Truth |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Music:
8 kHz Sample | AudioSR | Ground-Truth |
|
|
|
|
|
|
|
|
|
AudioSR on Improving Generative Models
FastSpeech 2 (Text-to-Speech, 22.05 kHz to 48 kHz):
The "Generation" column is the model output (e.g., from MusicGen) after the pre-processing step introduced in the paper.
Generation | AudioSR Improvement |
|
|
|
|
|
|
AudioLDM (Text-to-Audio, 16 kHz to 48 kHz):
Generation | AudioSR Improvement |
|
|
|
|
|
|
MusicGen (Text-to-Music, 32 kHz to 48 kHz):
Generation | AudioSR Improvement |
|
|
|
|
|
|
More demos are on their way. Stay tuned for more updates.