Abstract: In this work, we propose CleanMel, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance ...
Abstract: Recent advancements in singing voice synthesis have significantly improved the quality of artificial singing voices, raising concerns about their potential misuse in generating deepfake ...
--audio /path/to/song.wav --mode blurred --blur-sigma 3.0 ...
frame_rate (int): Frame rate of the video, in frames per second. Default: 30.
sample_rate (int): Audio sample rate, in Hz. Default: 16000.
num_mels (int): Number of mel bands (channels) in the mel spectrogram.
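Together, these parameters determine how the audio and video streams line up in time. The sketch below is a minimal illustration, not the original code's API. It assumes a hypothetical `AVFeatureConfig` dataclass that mirrors the documented defaults. `num_mels` has no default because the source does not give one, and `hop_length` is an assumed STFT hop size in samples.

```python
from dataclasses import dataclass


@dataclass
class AVFeatureConfig:
    """Hypothetical container mirroring the documented parameters."""
    num_mels: int              # number of mel bands; the source states no default
    frame_rate: int = 30       # video frames per second
    sample_rate: int = 16000   # audio samples per second (Hz)

    def samples_per_video_frame(self) -> float:
        # Audio samples that fall within one video frame.
        return self.sample_rate / self.frame_rate

    def mel_frames_per_video_frame(self, hop_length: int) -> float:
        # Mel-spectrogram frames per video frame, assuming one mel frame
        # every `hop_length` samples (hop_length is an assumed STFT setting).
        if hop_length <= 0:
            raise ValueError("hop_length must be positive")
        return self.samples_per_video_frame() / hop_length


cfg = AVFeatureConfig(num_mels=80)  # 80 is an arbitrary illustrative value
print(cfg.samples_per_video_frame())
print(cfg.mel_frames_per_video_frame(hop_length=160))
```

With the defaults, one video frame spans 16000 / 30 ≈ 533.3 audio samples. A non-integer ratio like this means audio and video frames only re-align periodically (here, every 3 video frames, which span exactly 1600 samples). This is why such pipelines often choose the hop length with the video frame rate in mind.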