Google recently unveiled SoundStream, an end-to-end neural audio codec. Yes, you read that right; it’s a “neural” audio codec. Google claims that it is the first neural audio codec to work on both speech and music while running in real time on a smartphone processor.
Audio codecs are tools that compress audio so that it takes up less storage and less bandwidth. That makes them essential for streaming or transmitting audio over any service, such as online voice and video calls.
Over the past few years, audio codecs have come a long way. The crisp and clear audio that you hear on online calls is one consequence of that. Ideally, the compressed audio should be indistinguishable from the source audio. Compression takes its toll, though, especially at low bitrates. That’s where SoundStream comes in.
What is Google SoundStream?
Earlier this year, Google released Lyra, a neural audio codec for low-bitrate speech. SoundStream builds on that work: it covers Lyra’s low-bitrate speech use case and goes further. It aims to provide high-quality audio and to encode more sound types, including clean speech, noisy speech, and music.
SoundStream is built around a neural network consisting of an encoder, a quantizer, and a decoder. The encoder converts audio into a coded signal, the quantizer compresses that signal, and the decoder converts it back to audio.
Once the model is trained, the encoder and decoder can run on separate clients: the sender encodes and compresses the audio, and the receiver decodes it. This makes it possible to transmit audio at a low bitrate with little loss in quality.
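To make the encoder, quantizer, and decoder split more concrete, here is a toy sketch in Python. It is not SoundStream's actual implementation: the real codec uses trained neural networks, while this example substitutes fixed, hand-written functions and a made-up four-entry codebook. The names FRAME, CODEBOOK, sender, and receiver are all hypothetical and exist only for illustration.

```python
# Toy illustration only: fixed hand-written functions stand in for the
# trained neural encoder, quantizer, and decoder described in the article.
from typing import List, Sequence

FRAME = 2  # samples per frame (hypothetical choice)

# Hypothetical codebook with 4 entries, so each index needs only 2 bits.
CODEBOOK: List[List[float]] = [
    [0.0, 0.0],
    [0.5, 0.5],
    [-0.5, -0.5],
    [0.5, -0.5],
]


def encode(samples: Sequence[float]) -> List[List[float]]:
    """Stand-in encoder: group samples into frames (drops a trailing partial frame)."""
    last_start = len(samples) - FRAME + 1
    return [list(samples[i:i + FRAME]) for i in range(0, last_start, FRAME)]


def quantize(frames: Sequence[Sequence[float]]) -> List[int]:
    """Replace each frame with the index of its nearest codebook vector."""
    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(CODEBOOK)), key=lambda k: squared_distance(frame, CODEBOOK[k]))
        for frame in frames
    ]


def decode(indices: Sequence[int]) -> List[float]:
    """Stand-in decoder: look up each index and concatenate the vectors."""
    samples: List[float] = []
    for k in indices:
        samples.extend(CODEBOOK[k])
    return samples


def sender(samples: Sequence[float]) -> bytes:
    """Runs on the sending client: encode, quantize, and pack the indices."""
    return bytes(quantize(encode(samples)))


def receiver(payload: bytes) -> List[float]:
    """Runs on the receiving client: unpack the indices and decode them."""
    return decode(list(payload))


if __name__ == "__main__":
    audio = [0.1, -0.1, 0.4, 0.6, -0.45, -0.7, 0.6, -0.4]
    payload = sender(audio)
    print(list(payload))      # codebook indices sent over the network
    print(receiver(payload))  # approximate reconstruction of the audio
```

Only the codebook indices cross the network, which is where the compression comes from. The reconstruction is approximate, and in a learned codec the quality depends on how well the encoder, codebook, and decoder were trained together.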
In older audio pipelines, compression and background noise removal were handled by separate modules. SoundStream aims to do both at the same time, within a single model. According to Google, SoundStream at 3 kbps outperforms the popular audio codec Opus at 12 kbps and comes close to another popular codec, EVS, at 9.6 kbps.
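To put the 3 kbps and 12 kbps figures in perspective, the short calculation below works out how much data one minute of audio takes at each bitrate. It is plain arithmetic rather than anything taken from Google's tooling, it assumes 1 kbps means 1,000 bits per second, and the helper name payload_bytes is made up for this example.

```python
def payload_bytes(bits_per_second: int, seconds: int) -> int:
    """Size in bytes of an audio stream at a constant bitrate."""
    return bits_per_second * seconds // 8


if __name__ == "__main__":
    one_minute = 60
    low = payload_bytes(3_000, one_minute)    # 3 kbps stream
    high = payload_bytes(12_000, one_minute)  # 12 kbps stream
    print(low, high, high // low)
```

In other words, a minute of audio at 3 kbps needs a quarter of the data that the same minute needs at 12 kbps.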
Note that popular video conferencing platforms such as Google Meet, as well as streaming services such as YouTube, use the Opus audio codec. Looking at the big picture, we could see drastic changes to online audio transmission soon.
Google has posted various compressed audio samples alongside the original recordings on this website. The results blew my mind. SoundStream’s output was on a different level compared with the other codecs, and the compressed voice samples were quite close to the source. You can listen to the comparisons and judge them for yourself.
In conclusion, SoundStream can compete with current audio codecs and could soon outperform them. Google will release SoundStream as a part of the next version of Lyra. Integrating the two should make both easier for developers to use.