
3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications

IEEE International Conference on Acoustics, Speech, and Signal Processing (2024)

Abstract
Target speaker extraction (TSE) aims to isolate a specific voice from a mixture of multiple speakers, relying on a registered sample. Since voiceprint features usually vary greatly, current end-to-end neural networks require large numbers of model parameters, which makes them computationally intensive and impractical for real-time applications, especially on resource-constrained platforms. In this paper, we address the TSE task using a microphone array and introduce a novel three-stage solution that systematically decouples the process: First, a neural network is trained to estimate the direction of the target speaker. Second, with the direction determined, the Generalized Sidelobe Canceller (GSC) is used to extract the target speech. Third, an Inplace Convolutional Recurrent Neural Network (ICRN) acts as a denoising post-processor, refining the GSC output to yield the final separated speech. Our approach delivers superior performance while drastically reducing computational load, setting a new standard for efficient real-time target speaker extraction.
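The second stage can be illustrated with a minimal sketch of a Generalized Sidelobe Canceller for a single frequency bin. This is not the authors' implementation: it assumes a far-field source, a uniform linear array, a target direction that is already known (standing in for the stage-one network output), and closed-form weights computed from a known noise covariance instead of an adaptive filter. The function names `steering_vector` and `gsc_weights` and all parameter values are hypothetical. The direction-estimation network and the ICRN post-processor are not covered.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steering_vector(theta_rad, n_mics, spacing_m, freq_hz):
    """Far-field steering vector for a uniform linear array (assumed geometry)."""
    delays = np.arange(n_mics) * spacing_m * np.cos(theta_rad) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def gsc_weights(d, noise_cov):
    """Closed-form GSC weights for one frequency bin.

    d         : steering vector toward the target, shape (n_mics,)
    noise_cov : noise-plus-interference covariance, shape (n_mics, n_mics)
    Returns the overall weights w, the fixed beamformer w_q, and the
    blocking matrix B.
    """
    n_mics = d.shape[0]

    # Fixed (delay-and-sum) beamformer: unit gain toward the target,
    # because every entry of d has magnitude one.
    w_q = d / n_mics

    # Blocking matrix: orthonormal basis of the subspace orthogonal to d,
    # so that B^H d = 0 and the target does not leak into the lower branch.
    _, _, vh = np.linalg.svd(d.conj()[np.newaxis, :])
    B = vh[1:].conj().T  # shape (n_mics, n_mics - 1)

    # Noise canceller: choose w_a to minimise the output noise power of
    # (w_q - B w_a), which gives w_a = (B^H R B)^{-1} B^H R w_q.
    BhR = B.conj().T @ noise_cov
    w_a = np.linalg.solve(BhR @ B, BhR @ w_q)

    return w_q - B @ w_a, w_q, B


# Hypothetical scenario: 4 microphones, 5 cm spacing, 1 kHz bin,
# target at 60 degrees and one interferer at 120 degrees.
d_target = steering_vector(np.deg2rad(60.0), 4, 0.05, 1000.0)
d_interf = steering_vector(np.deg2rad(120.0), 4, 0.05, 1000.0)
R_noise = 10.0 * np.outer(d_interf, d_interf.conj()) + 0.01 * np.eye(4)

w, w_q, B = gsc_weights(d_target, R_noise)
```

For a multichannel frequency-domain frame `x` in this bin, the extracted output would be `w.conj() @ x`. By construction the weights keep unit gain toward the assumed target direction, and their output noise power is no larger than that of the fixed beamformer alone.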
Key words
target speaker extraction, Direction-of-Arrival estimation, Inplace CRN, GSC