Program
Important Info
Location: Building 2, Room 2.1.1 (Google Maps)
Address: Piazza Leonardo da Vinci 32, 20133 Milano
Virtual: Zoom Webinar (link will be provided via the registration email)
Schedule
Time (CET) | Topic | Authors |
---|---|---|
14:00 | Welcome message | Yuki Mitsufuji, Fabian Stöter |
14:05 | 🏆 SDX'23 Challenge: Summary & Winner Announcements | Giorgio Fabbro, Igor Gadelha, Stefan Uhlich |
14:45 | 🎓 __Keynote:__ Defining "Source" in Audio Source Separation | Gordon Wichern |
- | Oral Session 1: Insights and Lessons Learned from SDX Participants | |
15:15 | Self-refining of Pseudo Labels for Music Source Separation with Noisy Labeled Data | Junghyun Koo, Yunkee Chae, Chang-Bin Jeon, Kyogu Lee |
15:30 | Multi-Resolution and Noise Robust Methods for Audio Source Separation | Nabarun Goswami, Tatsuya Harada |
15:45 | Benchmarks and leaderboards for sound demixing tasks | Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva |
16:00 | BS-RoFormer: The SAMI-ByteDance Music Source Separation System for Sound Demixing Challenge 2023 | Ju-Chiang Wang, Wei-Tsung Lu, Qiuqiang Kong, Yun-Ning Hung |
16:15 | Tencent AI Lab’s CDX 2023 System | Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu |
16:30 | ☕️ Coffee Break (Virtual: Hangout in Breakout Room) | |
17:00 | 🎓 __Keynote:__ Differentiable audio signal processors | Christian Steinmetz |
- | Oral Session 2: Audio Source Separation | |
17:30 | The Mixology Dataset | Michael Clemens |
17:45 | Zero-Shot Duet Singing Voices Separation with Diffusion Models | Chin-Yun Yu, Emilian Postolache, Emanuele Rodolà, György Fazekas |
18:00 | The need for causal, low-latency sound demixing and remixing to improve accessibility | Gerardo Roa Dabike, Michael A. Akeroyd, Scott Bannister, Jon Barker, Trevor J. Cox, Bruno Fazenda, Jennifer Firth, Simone Graetzer, Alinka Greasley, Rebecca Vos, William Whitmer |
18:15 | StemGMD: A Large-Scale Multi-Kit Audio Dataset for Deep Drums Demixing | Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, Augusto Sarti |
18:30 | Panel: Future of SDX Challenge (Things to keep/improve, new tasks, ...) | Stefan Uhlich, Giorgio Fabbro, Fabian Stöter, Igor Gadelha |
19:00 | Social | |
Keynotes
Gordon Wichern (MERL): Defining "Source" in Audio Source Separation
The cocktail party problem, isolating a source of interest within a complex acoustic scene, has long inspired audio source separation research. In the classical setup, it is generally clear that the source of interest is one speaker among several talking simultaneously at the party. However, with the explosion of purely data-driven techniques, it is now possible to separate nearly any type of sound from a wide range of signals, including non-professional ambient recordings, music, movie soundtracks, and industrial machines. This increase in flexibility has created a new challenge: defining how a user specifies the source of interest. To better embrace this ambiguity, I will first describe how we use hierarchical targets for training source separation networks, where the model learns to separate at multiple levels of granularity, e.g., separating all music from a movie soundtrack in addition to isolating the individual instruments. These hierarchical relationships can be further enforced using hyperbolic representations inside the audio source separation network, enabling novel user interfaces and aiding model explainability. Finally, I will discuss how we incorporate the different meanings of "source" into source separation model prompts using qualitative audio features, natural language, or example audio clips.
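The hierarchical-target idea mentioned above can be made concrete with a small sketch. The code below is illustrative only (not from the talk): it assumes stems are aligned mono waveforms stored as PyTorch tensors, builds parent-level targets by summing their child stems, and sums a loss over every level of the hierarchy so a model learns to separate at multiple granularities. All names (`hierarchical_targets`, the stem labels, the stand-in estimates) are hypothetical.

```python
import torch

def hierarchical_targets(stems: dict[str, torch.Tensor],
                         hierarchy: dict[str, list[str]]) -> dict[str, torch.Tensor]:
    """Build parent-level targets by summing their child stems (sketch)."""
    targets = dict(stems)  # leaf-level targets: individual instruments/sources
    for parent, children in hierarchy.items():
        targets[parent] = torch.stack([stems[c] for c in children]).sum(dim=0)
    return targets

# Example: "music" is the sum of its instrument stems; "speech" stays a leaf.
stems = {name: torch.randn(44100) for name in ["drums", "bass", "vocals", "speech"]}
hierarchy = {"music": ["drums", "bass", "vocals"]}
targets = hierarchical_targets(stems, hierarchy)

# Training would penalise errors at every level of the hierarchy, e.g. an L1
# loss summed over all targets (estimates here are stand-ins for model output).
estimates = {k: v + 0.01 * torch.randn_like(v) for k, v in targets.items()}
loss = sum(torch.nn.functional.l1_loss(estimates[k], targets[k]) for k in targets)
```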
Bio: Gordon Wichern is a Senior Principal Research Scientist at Mitsubishi Electric Research Laboratories (MERL) in Cambridge, Massachusetts. He received his B.Sc. and M.Sc. degrees from Colorado State University and his Ph.D. from Arizona State University. Prior to joining MERL, he was a member of the research team at iZotope, where he focused on applying novel signal processing and machine learning techniques to music and post-production software, and before that a member of the Technical Staff at MIT Lincoln Laboratory. He is the Chair of the AES Technical Committee on Machine Learning and Artificial Intelligence (TC-MLAI), and a member of the IEEE Audio and Acoustic Signal Processing Technical Committee (AASP-TC). His research interests span the audio signal processing and machine learning fields, with a recent focus on source separation and sound event detection.
Christian Steinmetz (QMUL): Differentiable audio signal processors
Large-scale deep generative models have enabled new applications in audio creation and processing. However, these methods often require significant compute, which restricts real-time operation; they may introduce artifacts; and they ultimately lack grounding in signal processing operations, limiting controllability and interpretability. Furthermore, many audio production tasks can be addressed with traditional signal processing tools, such as audio effects, but these require expert operation. This motivates differentiable signal processing, which integrates classic signal processing operations into the gradient-based learning environment, enabling data-driven, intelligent operation of these algorithms. This talk will provide an overview of differentiable signal processing techniques, highlighting their benefits and limitations, and offer practical guidance for their implementation. Central to the presentation will be the introduction of DASP — Differentiable Audio Signal Processors, a new open-source tool built in PyTorch. DASP offers a range of differentiable audio effects, and we will detail some potential applications, including blind parameter estimation, virtual analog modeling, automatic equalization, and audio production style transfer. To conclude, we will discuss existing challenges in creating differentiable audio signal processors and suggest potential areas for future exploration.
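To give a flavour of the idea, here is a minimal sketch (not the DASP API) of a differentiable effect, a gain followed by a one-pole low-pass, whose parameters are recovered by gradient descent, illustrating the blind parameter estimation use case named in the abstract. The function name, parameter values, and optimizer settings are all placeholder assumptions.

```python
import torch

def effect(x: torch.Tensor, gain_db: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """Apply gain then a one-pole low-pass; every operation is differentiable."""
    x = x * 10.0 ** (gain_db / 20.0)
    y, prev = [], x.new_zeros(())
    for n in range(x.shape[-1]):  # naive sample-by-sample recursion, fine for a short demo
        prev = alpha * x[n] + (1.0 - alpha) * prev
        y.append(prev)
    return torch.stack(y)

x = torch.randn(512)                                       # short dry signal
target = effect(x, torch.tensor(-6.0), torch.tensor(0.3))  # reference processed audio

# Blind parameter estimation: learn the settings that reproduce the reference.
gain_db = torch.tensor(0.0, requires_grad=True)
raw_alpha = torch.tensor(0.0, requires_grad=True)          # sigmoid keeps alpha in (0, 1)
opt = torch.optim.Adam([gain_db, raw_alpha], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(effect(x, gain_db, torch.sigmoid(raw_alpha)), target)
    loss.backward()
    opt.step()
```

Because the effect is written entirely in differentiable operations, gradients flow from the audio-domain loss back to the effect parameters, the same mechanism that enables the virtual analog modeling and style transfer applications mentioned above.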
Bio: Christian Steinmetz is a PhD researcher with the Centre for Digital Music at Queen Mary University of London, advised by Joshua Reiss. His research centers on applications of machine learning for audio signal processing, with an emphasis on high-fidelity audio and music production. His work has investigated methods for enhancing audio recordings, automatic and assistive systems for audio engineering, as well as applications of machine learning that augment creativity. He has worked as a research scientist intern at Adobe, Meta, Dolby, and Bose. Christian holds a BS in Electrical Engineering and a BA in Audio Technology from Clemson University, as well as an MSc in Sound and Music Computing from the Music Technology Group at Universitat Pompeu Fabra.