Programme for WIMP 2019

Thursday 5th September

Welcome Concert

Centrala, Digbeth, 18:00-01:00

The close of the DAFx conference is also the opening of WIMP 2019. All WIMP and DAFx visitors are welcome to attend the evening concert.

The concert will be ticketed, so make sure to bring your WIMP or DAFx registration ticket with you.

More information can be found on the DAFx Website

Friday 6th September


Parkside Foyer, 08:00-13:00

Collect your WIMP ticket for the conference. If you have a DAFx ticket, you will not need a separate WIMP ticket.

Keynote: Lydia Gregory

The Shell, Parkside, 09:20-10:20

Whilst in recent years there have been significant advances in fundamental machine learning research, and in its use in applied fields such as MIR, there are still significant challenges in translating relevant research into commercial solutions and implementing those solutions successfully within organisations. Interestingly, these challenges are mostly not to do with the technology itself, but to do with people, culture, working practices and aligning commercial and research goals. It is about the interaction of the human system of business, with the human system of technology development, and with the requirements and nature of the technology itself. In short, the challenges are, to borrow a term from sociology, about the ‘sociotechnical system’ of machine learning. Working out best practices offers opportunities both for businesses and researchers.


Coffee Break

Parkside Foyer, 10:20-10:40

Paper Session 1

The Shell, Parkside, 10:40-12:00

Chair: Leonardo Gabrielli

Adding the Room to the Mix: Perceptual Aspects of Modal Resonance in Live Audio

Carlo Bolla, Alessandro Palladini, Bruno M. Fazenda

The problem of room acoustics correction in live sound is still an open one. In particular, the audible artefacts caused by low frequency resonances are still a major factor in determining the perceived quality of a live show.

Despite many years of research into room acoustics correction, very little research has been done into how this influences the sound at large venues and the difficulties this causes to the working practices of live sound engineers.

In this work, we show how perceptual models of modal resonance in rooms can be applied to designing a novel room response analysis tool that can be used to intelligently guide the mix process and to design music production tools that facilitate the work of sound engineers.

View PDF

Justification and Theory for a Frequency-Specific Spectral Panning tool based on Duplex Theory

Niall Garry

The following paper is an investigation into duplex theory of sound localization and the justification for a spatial audio application based on the theory. Functions and filters are defined for the time/phase differences and the amplitude/spectral differences between each ear caused by sounds originating from a specific location relative to the listener. These transformations are applied within a standalone application created in MATLAB that can take a monaural sound file, apply a specific spatial characteristic relative to azimuthal angle, elevation angle and head radius, and output a stereo sound file.

View PDF

An Investigation Towards Verbally Controllable Graphic Equalizer for Singing Voices

Seiya Masuda, Eriko Aiba and Tetsuro Kitahara

This paper presents an investigation of the relationship between the equalization of a singing voice and its verbal evaluation. Participants were asked to listen to sound stimuli generated with different equalization settings and evaluate their timbres with respect to 10 words (warmness, presence, showiness, muddiness, mellowness, softness, brightness, lightness, thickness, and clearness). The mapping between the equalization settings and verbal evaluations was obtained with multivariate linear regression. The obtained results support findings described in know-how books for hobby musicians: for example, warmness, muddiness, mellowness, and softness are enhanced when a low pitch range is boosted while showiness, brightness, clearness, and presence are enhanced when a high pitch range is boosted. We also implemented a prototype of a system that estimates an equalization setting from a verbal evaluation vector.

View PDF

Sponsor Demonstrations


Our WIMP sponsors will be giving demonstrations during the lunchtime session.



Lunch will be provided for all our members. Coffee and tea will also be available.

Keynote: Vesa Välimäki

The Shell, Parkside, 13:20-14:20

This keynote talk will give an overview of graphic equalizers (EQ). Any intelligent EQ method must rely on automatic design, which should not deviate much from the target magnitude response. Today we can design highly accurate cascade and parallel graphic EQ filters. However, former designs have been surprisingly inaccurate. A cascade graphic EQ consists of a chain of parametric EQ filters, which may be based on various alternative coefficient formulas. This presentation shows that the choice of the parametric EQ design has a major impact on the accuracy of the graphic EQ, as it determines the interaction between the filter bands. The filter gains must be different from the target gains, and they are optimized using the least squares (LS) method. A parallel graphic EQ filter is more difficult to design accurately than a cascade one, because the design needs to account for the phase response of each band filter. A novel series-to-parallel conversion technique offers a simple solution, as it enables the design of a parallel EQ based on the cascade one. This presentation finally explains a recent idea of controlling the graphic EQ with a neural network: a multilayer perceptron quickly and precisely predicts the filter gains from target gains, replacing the LS optimization.


Coffee Break

Parkside Foyer, 14:20-14:40

Paper Session 2

The Shell, Parkside, 14:40-16:00

Chair: Brecht De Man

Music Reconstruction using Dynamic Episodic Memory

Vidya Rangasayee and Chaofei Fan

In this paper we present a novel approach to music generation that uses episodic memory to reconstruct or denoise a partial input from memory. In particular, we are using an implementation of Sparse Distributed Memory called Dynamic Kanerva Machine (DKM). DKM is a probabilistic generative model that is very effective in denoising. Our model was trained on the Bach Chorales dataset and tested on episodes for which the second half is masked. We show that DKM can display behaviors similar to human episodic memory (e.g., pattern storage and retrieval given a noisy or partial cue), with the model achieving good results in denoising and reconstruction with partial cues. Future work involves more accurate modeling of human episodic memory and using it for music generation.

View PDF

Unsupervised Single Channel Source Separation Autoencoders

André Bergner and Kevin M. Webster

Deep learning models have greatly improved the performance of audio source separation models, and there is an emerging trend towards end-to-end learning for this task, dispensing with traditional STFT preprocessing of the audio signal. However, most of the developments have still focused on supervised training of neural networks, and require a considerable amount of training data. In this paper, we propose an alternative deep learning-based formulation that is completely unsupervised, and additionally requires no pre-training of the neural network. Instead, the source separation task is learned and executed at run time. We demonstrate a proof of concept for this architecture on both synthetic and audio data.

View PDF

Trainable Data Manipulation with Unobserved Instruments

Carl Southall, Ryan Stables and Jason Hockman

Machine learning algorithms are the core components in a wide range of intelligent music production systems. As training data for these tasks is relatively sparse, data augmentation is often used to generate additional training data by slightly altering existing training data. User-defined techniques require a long parameter tuning process and typically use a single set of global variables. To address this, a trainable data manipulation system, termed player vs transcriber, was proposed for the task of automatic drum transcription. This paper expands the player vs transcriber model by allowing unobserved instruments to also be manipulated within the data augmentation and sample addition stages. Results from two evaluations demonstrate that this improves performance and suggest that trainable data manipulation could benefit additional intelligent music production tasks.

View PDF

Keynote: Chris Pike

The Shell, Parkside, 16:00-17:00

The role of a broadcaster has changed rapidly in recent years, expanding from one-to-many radio-communication services to provide a wide range of new media services and experiences, often delivered over the internet. BBC Research & Development has a remit to keep the BBC at the forefront of technological developments, for the benefit of UK audiences. This talk will cover ongoing research towards more intelligent media production, with services that are more responsive to the needs of the audience and experiences that are more immersive, personalisable, and accessible.



We are pleased to introduce our three keynote speakers for WIMP5.

Lydia Gregory FeedForward AI

Lydia’s experience and interest spans music, technology & business. After studying Music at Oxford University, she joined Accenture as a management consultant aligned to technology & financial services, where she worked on delivering technical solutions & data migration projects. She has worked in early-stage music companies, including as Head of Growth at Jukedeck, where she led the team to win awards such as Cannes Innovation Lion 2016 and BIMA Startup of the Year 2017. Lydia has been active in music as a classical singer, touring globally including to China, Japan, Israel & the USA. She has sung on critically-acclaimed recordings released on labels such as Hyperion and Naxos, and performed in many ensemble concerts. She regularly speaks on topics including the relevance of machine learning to the real world, debunking A.I., and building inclusivity into technology. She is particularly interested in using technology to enable human creativity and in the application of machine learning to art and sound. She is a member of the BIMA A.I. Think Tank and a fellow of the RSA, and has recently completed a Postgraduate Certificate in Data Science at University of London, Birkbeck College.

FeedForward AI

Chris Pike BBC R&D

Chris leads the audio team in BBC R&D and the BBC Audio Research Partnership. He is passionate about using technology innovation to enable new creative possibilities in sound production and storytelling. Chris has led the BBC's work on spatial audio for several years, which has led to productions on major brands such as Doctor Who, Planet Earth II and the BBC Proms. He was director of sound on the BBC's first public VR app, The Turning Forest, and worked with Björk to create an augmented reality audio guide for her exhibition at MoMA in New York. As part of his role at the BBC, Chris is active in standardisation bodies, working to ensure open interoperable technology for spatial audio production. He did his PhD with the Audio Lab at the University of York, working on quality evaluation of binaural rendering.


Vesa Välimäki Aalto University, Finland

Vesa Välimäki is a Full Professor of audio signal processing at Aalto University, Espoo, Finland. He received his doctorate from the Helsinki University of Technology, Espoo, Finland, in 1995. In 2008-2009, he was a Visiting Scholar at the Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, Stanford, CA, USA. Currently he is also the Vice Dean for Research in electrical engineering at Aalto University. His research group belongs to the Aalto Acoustics Lab, a multidisciplinary center with excellent facilities for sound-related research at Aalto University. He and his research group are part of the Nordic Sound and Music Computing Network (NordicSMC), which is funded by NordForsk. His research interests include digital filter design, audio effects processing, artificial reverberation, sound synthesis, and signal processing for headphones and loudspeakers.

Prof. Välimäki is a Fellow of the AES (Audio Engineering Society) and a Fellow of the IEEE (Institute of Electrical and Electronics Engineers). Since 2015, he has been a Senior Area Editor of the IEEE/ACM Transactions on Audio, Speech and Language Processing. He was the Chairman of the International Conference on Digital Audio Effects DAFx in 2008, and the Chairman of the Sound and Music Computing Conference SMC in 2017.

Vesa Välimäki

Aalto Acoustics Lab