Program
Important Info
Location: Building 2, Room 2.1.1 (Google Maps)
Address: Piazza Leonardo da Vinci 32, 20133 Milano
Virtual: Zoom Webinar (link will be provided via the registration email)
Schedule
Time (CET) | Topic | Authors |
---|---|---|
14:00 | Welcome message | Yuki Mitsufuji, Fabian Stöter |
14:05 | 🏆 SDX'23 Challenge: Summary & Winner Announcements | Giorgio Fabbro, Igor Gadelha, Stefan Uhlich |
14:45 | 🎓 __Keynote:__ Defining "Source" in Audio Source Separation | Gordon Wichern |
- | Oral Session 1: Insights and Lessons Learned from SDX Participants | |
15:15 | Self-refining of Pseudo Labels for Music Source Separation with Noisy Labeled Data | Junghyun Koo, Yunkee Chae, Chang-Bin Jeon, Kyogu Lee |
15:30 | Multi-Resolution and Noise Robust Methods for Audio Source Separation | Nabarun Goswami, Tatsuya Harada |
15:45 | Benchmarks and leaderboards for sound demixing tasks | Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva |
16:00 | BS-RoFormer: The SAMI-ByteDance Music Source Separation System for Sound Demixing Challenge 2023 | Ju-Chiang Wang, Wei-Tsung Lu, Qiuqiang Kong, Yun-Ning Hung |
16:15 | Tencent AI Lab’s CDX 2023 System | Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu |
16:30 | ☕️ Coffee Break (Virtual: Hangout in Breakout Room) | |
17:00 | 🎓 __Keynote:__ Differentiable audio signal processors | Christian Steinmetz |
- | Oral Session 2: Audio Source Separation | |
17:30 | The Mixology Dataset | Michael Clemens |
17:45 | Zero-Shot Duet Singing Voices Separation with Diffusion Models | Chin-Yun Yu, Emilian Postolache, Emanuele Rodolà, György Fazekas |
18:00 | The need for causal, low-latency sound demixing and remixing to improve accessibility | Gerardo Roa Dabike, Michael A. Akeroyd, Scott Bannister, Jon Barker, Trevor J. Cox, Bruno Fazenda, Jennifer Firth, Simone Graetzer, Alinka Greasley, Rebecca Vos, William Whitmer |
18:15 | StemGMD: A Large-Scale Multi-Kit Audio Dataset for Deep Drums Demixing | Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, Augusto Sarti |
18:30 | Panel: Future of SDX Challenge (Things to keep/improve, new tasks, ...) | Stefan Uhlich, Giorgio Fabbro, Fabian Stöter, Igor Gadelha |
19:00 | Social | |
Keynotes
Gordon Wichern (MERL): Defining "Source" in Audio Source Separation
The cocktail party problem, isolating a source of interest within a complex acoustic scene, has long inspired audio source separation research. In the classical setup, it is generally clear that the source of interest is one speaker among several talking simultaneously at the party. However, with the explosion of purely data-driven techniques, it is now possible to separate nearly any type of sound from a wide range of signals, including non-professional ambient recordings, music, movie soundtracks, and industrial machines. This increase in flexibility has created a new challenge: defining how a user specifies the source of interest. To better embrace this ambiguity, I will first describe how we use hierarchical targets for training source separation networks, where the model learns to separate at multiple levels of granularity, e.g., separating all music from a movie soundtrack in addition to isolating the individual instruments. These hierarchical relationships can be further enforced using hyperbolic representations inside the audio source separation network, enabling novel user interfaces and aiding model explainability. Finally, I will discuss how we incorporate the different meanings of "source" into source separation model prompts using qualitative audio features, natural language, or example audio clips.
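The hierarchical-target idea mentioned above can be made concrete with a small sketch. The code below is illustrative only (not from the talk): it assumes stems are aligned mono waveforms stored as PyTorch tensors, builds parent-level targets by summing their child stems, and sums a loss over every level of the hierarchy so a model learns to separate at multiple granularities. All names (`hierarchical_targets`, the stem labels, the stand-in estimates) are hypothetical.

```python
import torch

def hierarchical_targets(stems: dict[str, torch.Tensor],
                         hierarchy: dict[str, list[str]]) -> dict[str, torch.Tensor]:
    """Build parent-level targets by summing their child stems (sketch)."""
    targets = dict(stems)  # leaf-level targets: individual instruments/sources
    for parent, children in hierarchy.items():
        targets[parent] = torch.stack([stems[c] for c in children]).sum(dim=0)
    return targets

# Example: "music" is the sum of its instrument stems; "speech" stays a leaf.
stems = {name: torch.randn(44100) for name in ["drums", "bass", "vocals", "speech"]}
hierarchy = {"music": ["drums", "bass", "vocals"]}
targets = hierarchical_targets(stems, hierarchy)

# Training would penalise errors at every level of the hierarchy, e.g. an L1
# loss summed over all targets (estimates here are stand-ins for model output).
estimates = {k: v + 0.01 * torch.randn_like(v) for k, v in targets.items()}
loss = sum(torch.nn.functional.l1_loss(estimates[k], targets[k]) for k in targets)
```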
Bio: Gordon Wichern is a Senior Principal Research Scientist at Mitsubishi Electric Research Laboratories (MERL) in Cambridge, Massachusetts. He received his B.Sc. and M.Sc. degrees from Colorado State University and his Ph.D. from Arizona State University. Prior to joining MERL, he was a member of the research team at iZotope, where he focused on applying novel signal processing and machine learning techniques to music and post-production software, and before that a member of the Technical Staff at MIT Lincoln Laboratory. He is the Chair of the AES Technical Committee on Machine Learning and Artificial Intelligence (TC-MLAI), and a member of the IEEE Audio and Acoustic Signal Processing Technical Committee (AASP-TC). His research interests span the audio signal processing and machine learning fields, with a recent focus on source separation and sound event detection.
Christian Steinmetz (QMUL): Differentiable audio signal processors
Large-scale deep generative models have enabled new applications in audio creation and processing. However, these methods often require significant compute, which restricts real-time operation; they may introduce artifacts; and they ultimately lack grounding in signal processing operations, limiting controllability and interpretability. Furthermore, many audio production tasks can be addressed with traditional signal processing tools, such as audio effects, but these require expert operation. This motivates differentiable signal processing, which integrates classic signal processing operations into the gradient-based learning environment, enabling data-driven, intelligent operation of these algorithms. This talk will provide an overview of differentiable signal processing techniques, highlighting their benefits and limitations, and offer practical guidance for their implementation. Central to the presentation will be the introduction of DASP — Differentiable Audio Signal Processors, a new open-source tool built in PyTorch. DASP offers a range of differentiable audio effects, and we will detail some potential applications, including blind parameter estimation, virtual analog modeling, automatic equalization, and audio production style transfer. To conclude, we will discuss existing challenges in creating differentiable audio signal processors and suggest potential areas for future exploration.
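To give a flavour of the idea, here is a minimal sketch (not the DASP API) of a differentiable effect, a gain followed by a one-pole low-pass, whose parameters are recovered by gradient descent, illustrating the blind parameter estimation use case named in the abstract. The function name, parameter values, and optimizer settings are all placeholder assumptions.

```python
import torch

def effect(x: torch.Tensor, gain_db: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """Apply gain then a one-pole low-pass; every operation is differentiable."""
    x = x * 10.0 ** (gain_db / 20.0)
    y, prev = [], x.new_zeros(())
    for n in range(x.shape[-1]):  # naive sample-by-sample recursion, fine for a short demo
        prev = alpha * x[n] + (1.0 - alpha) * prev
        y.append(prev)
    return torch.stack(y)

x = torch.randn(512)                                       # short dry signal
target = effect(x, torch.tensor(-6.0), torch.tensor(0.3))  # reference processed audio

# Blind parameter estimation: learn the settings that reproduce the reference.
gain_db = torch.tensor(0.0, requires_grad=True)
raw_alpha = torch.tensor(0.0, requires_grad=True)          # sigmoid keeps alpha in (0, 1)
opt = torch.optim.Adam([gain_db, raw_alpha], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(effect(x, gain_db, torch.sigmoid(raw_alpha)), target)
    loss.backward()
    opt.step()
```

Because the effect is written entirely in differentiable operations, gradients flow from the audio-domain loss back to the effect parameters, the same mechanism that enables the virtual analog modeling and style transfer applications mentioned above.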
Bio: Christian Steinmetz is a PhD researcher with the Centre for Digital Music at Queen Mary University of London, advised by Joshua Reiss. His research centers on applications of machine learning for audio signal processing, with an emphasis on high-fidelity audio and music production. His work has investigated methods for enhancing audio recordings, automatic and assistive systems for audio engineering, as well as applications of machine learning that augment creativity. He has worked as a research scientist intern at Adobe, Meta, Dolby, and Bose. Christian holds a BS in Electrical Engineering and a BA in Audio Technology from Clemson University, as well as an MSc in Sound and Music Computing from the Music Technology Group at Universitat Pompeu Fabra.