I am a final-year Ph.D. student at the School of Cyber Science and Engineering, Wuhan University, where I specialize in audio signal analysis and multimodal learning. I also earned my Bachelor's degree at Wuhan University in 2020, with a thesis on spoofed speech detection. My research centers on audio event detection and multimodal learning. Currently, I'm working on audio-visual event localization in portrait-mode short videos, while keeping a keen eye on advances in Multimodal Large Language Models (MLLMs).
When not immersed in research, I recharge by strumming my guitar or diving into Dota 2. Check out this page if you'd like to know the human behind the research.
For collaboration or discussion about audio AI, LLMs, multimodal systems, or the occasional off-topic thought, feel free to reach out.
My research begins with audio signal processing and extends to multimodal learning. Currently, I'm focusing on audio-visual event localization in portrait-mode short videos.

My work has appeared at venues including ICASSP and ICIP.