NVIDIA reveals AI model for sound production

Published: 26 Nov 2024 - 12:14 pm | Last Updated: 26 Nov 2024 - 12:15 pm


QNA

Washington: NVIDIA has unveiled a new experimental AI model dedicated to sound, called "Foundational Generative Audio Transformer Opus 1", or Fugatto.

The model can create or modify music, voices and other sounds based on text prompts. It was built by a team of AI researchers from around the world, which strengthened its multi-accent and multilingual capabilities.

"We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at NVIDIA.

Music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments.

Language learning tools could be personalized to use any voice a speaker chooses.

Video game developers could use the model to modify prerecorded assets in their title to fit the changing action as users play the game.

Researchers also found that, with some fine-tuning, the model can accomplish tasks that were not part of its pre-training. It can combine instructions that it was trained on separately, such as generating speech that sounds angry in a specific accent, or the sound of birds singing during a thunderstorm.

The model can also generate sounds that change over time, such as the sound of a train moving through an area.

Fugatto is not the first of its kind: Meta previously launched an open-source AI toolkit for creating sounds from text descriptions, and Google offers its own text-to-music model, MusicLM.