A University of Washington team recently wondered whether they could recreate the magic that happens between performers and their instruments using only visual cues: in this case, a silent, top-down video of someone playing the piano. The researchers used machine learning to create Audeo, a system that generates audio from silent piano performances. When the team tested Audeo’s music with music-recognition apps such as SoundHound, the apps correctly identified the piece 86% of the time. For comparison, the same apps identified the piece from the source videos’ original audio tracks 93% of the time.
The researchers presented their work on Audeo on December 8th at the NeurIPS 2020 conference.
‘To create music that sounds like it could be played in a musical performance was previously believed to be impossible’, said senior author Eli Shlizerman, an assistant professor in both the Applied Mathematics and the Electrical and Computer Engineering departments. ‘An algorithm needs to figure out the cues, or “features”, in the video frames that are related to generating music, and it needs to “imagine” the sound that’s happening in between the video frames. It requires a system that is both precise and imaginative. The fact that we achieved music that sounded pretty good was a surprise.’
Audeo works in a series of steps to decipher what’s happening in the video and translate it into music. First, it detects which keys are pressed in each video frame to create a diagram over time. Then it translates that diagram into something a music synthesizer could recognize as a sound a piano would make. This second step cleans up the data and adds more information, such as how strongly each key is pressed and for how long.
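The two steps described above might be sketched roughly as follows. Note that the function names and data structures here are illustrative assumptions, not the authors’ actual code: step one stacks per-frame key detections into a piano-roll diagram, and step two converts that diagram into note events carrying duration and a placeholder velocity that a synthesizer could render.

```python
# Hypothetical sketch of the two-step pipeline (illustrative only).

def frames_to_roll(frame_detections, n_keys=88):
    """Step 1: stack per-frame key detections into a piano roll
    (one row per video frame, one column per key; 1 = key pressed)."""
    roll = []
    for pressed in frame_detections:
        roll.append([1 if k in pressed else 0 for k in range(n_keys)])
    return roll

def roll_to_events(roll, default_velocity=64):
    """Step 2: convert the roll into note events with onset frame,
    duration in frames, and an (assumed) loudness/velocity value."""
    events = []
    n_keys = len(roll[0]) if roll else 0
    for key in range(n_keys):
        onset = None
        for t, row in enumerate(roll):
            if row[key] and onset is None:
                onset = t                       # key goes down
            elif not row[key] and onset is not None:
                events.append({"key": key, "onset": onset,
                               "duration": t - onset,
                               "velocity": default_velocity})
                onset = None                    # key released
        if onset is not None:                   # still held at the end
            events.append({"key": key, "onset": onset,
                           "duration": len(roll) - onset,
                           "velocity": default_velocity})
    return events

# Example: key 60 (middle C) held for frames 0-2, key 64 on frame 1 only.
roll = frames_to_roll([{60}, {60, 64}, {60}])
events = roll_to_events(roll)
```

In a real system the per-frame detections would come from a vision model and the velocities from a learned refinement network rather than a constant default.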
‘If we attempt to synthesize music from the first step alone, we would find the quality of the music to be unsatisfactory’, said Shlizerman. ‘The second step is like how a teacher goes over a student composer’s music and helps enhance it’.
The researchers trained and tested the system using YouTube videos of the pianist Paul Barton. The training set consisted of over 172,000 video frames of Barton playing music by well-known composers such as Bach and Mozart. They then tested Audeo on about 19,000 frames of Barton playing different music by these composers and others, such as Scott Joplin.
Once Audeo has generated a transcript of the music, the next step is to feed it to a synthesizer that can translate it into sound. Every synthesizer makes the music sound slightly different, somewhat like changing the ‘instrument’ setting on an electric keyboard. For this study, the researchers used two different synthesizers.
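As a toy illustration of this final translation step: a real synthesizer such as FluidSynth models the piano’s actual timbre, whereas the sketch below simply assigns each key a sine wave. All names and parameters here are hypothetical, chosen only to show how note events become audio samples.

```python
import math

SAMPLE_RATE = 44100  # audio samples per second

def key_to_freq(key):
    """MIDI key number to frequency in Hz (A4 = key 69 = 440 Hz)."""
    return 440.0 * 2 ** ((key - 69) / 12)

def render(events, frame_rate=25):
    """Mix note events (onset/duration given in video frames) into
    a mono list of audio samples using plain sine tones."""
    total_frames = max(e["onset"] + e["duration"] for e in events)
    n = int(total_frames / frame_rate * SAMPLE_RATE)
    samples = [0.0] * n
    for e in events:
        freq = key_to_freq(e["key"])
        amp = e["velocity"] / 127.0            # louder key press -> louder tone
        start = int(e["onset"] / frame_rate * SAMPLE_RATE)
        end = int((e["onset"] + e["duration"]) / frame_rate * SAMPLE_RATE)
        for i in range(start, end):
            samples[i] += amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
    return samples

# One second (25 frames at 25 fps) of A4 at moderate velocity.
audio = render([{"key": 69, "onset": 0, "duration": 25, "velocity": 64}])
```

Swapping this sine generator for a sample-based or neural synthesizer is exactly what changes the ‘instrument’ character of the output.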
‘Fluidsynth makes synthesizer piano sounds that we are familiar with. These are somewhat mechanical-sounding but pretty accurate’, said Shlizerman. ‘We also used PerfNet, a new AI synthesizer that generates richer and more expressive music. But it also generates more noise’.
Audeo was trained and tested only on Paul Barton’s piano videos. According to Shlizerman, further research is needed to determine whether it can transcribe music for any musician or piano.
‘The goal of this study was to see if artificial intelligence could generate music that was played by a pianist in a video recording, though we were not aiming to replicate Paul Barton because he is such a virtuoso’, said Shlizerman. ‘We hope that our study enables novel ways to interact with music. For example, one future application is that Audeo can be extended to a virtual piano with a camera recording just a person’s hands. Also, by placing a camera on top of a real piano, Audeo could potentially assist in new ways of teaching students how to play.’
The co-authors of this paper are Kun Su and Xiulong Liu, both doctoral students in electrical and computer engineering. This research was funded by the Washington Research Foundation Innovation Fund, as well as the Applied Mathematics and Electrical and Computer Engineering departments.
By Marvellous Iwendi.
Source: UW News