Meta has released its own AI music generator called ‘MusicGen,’ trained on 20,000 hours of music, including 10,000 licensed tracks.
The AI music generator works much like Google’s MusicLM, producing a snippet of roughly 12 seconds of audio from a text prompt. I experimented with MusicLM when it first launched and found it pretty great at generating electronic music and synthwave, but not much else. MusicGen aims to do better across a wider variety of genres.
MusicGen was trained on 20,000 hours of music, comprising 10,000 “high-quality” licensed tracks and 390,000 instrument-only tracks from Shutterstock and Pond5. While the model itself is open source, Meta has not provided the code it used to train it. Instead, pre-trained models are available for download. The results from both MusicGen and MusicLM won’t be putting musicians out of a job anytime soon.
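If you want to try it yourself, the pre-trained checkpoints load through Meta’s audiocraft library. Here’s a minimal sketch, assuming you’ve run `pip install audiocraft` and that the ‘small’ checkpoint name from the initial release is still valid:

```python
# Minimal sketch: generating a short clip with a pre-trained MusicGen
# checkpoint via Meta's audiocraft library. Checkpoint names and defaults
# are assumptions based on the initial release and may have changed.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('small')   # smallest released checkpoint
model.set_generation_params(duration=12)   # roughly the default clip length

# One waveform tensor per text prompt, at model.sample_rate.
wav = model.generate(['ambient chiptune music'])

for idx, clip in enumerate(wav):
    # Writes clip_0.wav, normalizing loudness on the way out.
    audio_write(f'clip_{idx}', clip.cpu(), model.sample_rate, strategy='loudness')
```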
Coaxing an acceptable piece of audio out of a text-to-audio AI means understanding how to describe what you want to hear. A simple prompt like ‘ambient chiptune music’ is so open-ended that re-running it through the generator will produce a wildly different song every time.
Meanwhile, a prompt like “Slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High-pitched bongos with ringing tones. Vocals are relaxed with a laid-back feel, very expressive” helps the model build something that sounds very similar across successive generations. As generative AI progresses, these models will get better at producing sound that is pleasing to the human ear, if a bit soulless.
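You can hear the difference yourself by running both prompts through the same model a few times and comparing how consistent the outputs are. A sketch, reusing the same assumed audiocraft API as above:

```python
# Sketch: re-running a vague prompt vs. a detailed one to compare how
# consistent the results stay across generations. Same assumptions about
# the audiocraft API and checkpoint names as the previous example.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('small')
model.set_generation_params(duration=12)

vague = 'ambient chiptune music'
detailed = ('Slow tempo, bass-and-drums-led reggae song. Sustained electric '
            'guitar. High-pitched bongos with ringing tones. Vocals are '
            'relaxed with a laid-back feel, very expressive.')

for run in range(3):  # three generations per prompt
    wavs = model.generate([vague, detailed])
    for name, clip in zip(('vague', 'detailed'), wavs):
        audio_write(f'{name}_run{run}', clip.cpu(), model.sample_rate,
                    strategy='loudness')

# Listening across runs, the 'detailed' clips should resemble one another
# far more than the 'vague' ones do.
```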
It also means deepfake music is about to get even harder to spot as these models see more use. We’ve already seen viral tracks like “Heart On My Sleeve” dominate social media platforms like TikTok and YouTube. The Big Three labels are discussing new AI provisions to help combat deepfaked music that borrows musical concepts to create a five-finger-discount mash-up of a popular artist’s established sound.