Stability AI Launches Stable Audio 2: Can the Music Generator Beat ‘Mind-Blowing’ Suno 3?

Stability AI, a leading artificial intelligence developer committed to an open-source ethos, this week released Stable Audio 2, a new audio and music generator. It is the first major point release since Stable Audio launched in September, and its improvements intensify competition with rivals such as Suno, Google, and Meta.

“Stable Audio 2.0 enables high-quality, full tracks with coherent musical structure up to three minutes long in 44.1 kHz stereo from a single natural language prompt,” Stability AI announced.
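
For a sense of scale, here is the arithmetic behind those figures, written as a small Python sketch. The 16-bit sample depth is an assumption for illustration (the announcement does not state one), and the model itself does not necessarily work on raw samples internally; the point is only how long the sequences behind a three-minute track are.

```python
# Rough size of the raw audio behind one maximum-length Stable Audio 2 track,
# using the figures from the announcement: 3 minutes, 44.1 kHz, stereo.
# ASSUMPTION: 16-bit (2-byte) PCM samples, chosen only for illustration.

SAMPLE_RATE_HZ = 44_100
DURATION_S = 3 * 60
CHANNELS = 2
BYTES_PER_SAMPLE = 2  # assumed 16-bit PCM


def raw_track_size(sample_rate: int, duration_s: int, channels: int,
                   bytes_per_sample: int) -> tuple[int, int]:
    """Return (total samples across all channels, uncompressed size in bytes)."""
    samples = sample_rate * duration_s * channels
    return samples, samples * bytes_per_sample


samples, size_bytes = raw_track_size(SAMPLE_RATE_HZ, DURATION_S, CHANNELS, BYTES_PER_SAMPLE)
print(f"{samples:,} samples, about {size_bytes / 1e6:.1f} MB uncompressed")
```

That works out to 15,876,000 samples (roughly 31.8 MB of uncompressed 16-bit audio) for a single three-minute track.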

The announcement comes amid a rocky period for Stability AI, with the company reportedly running low on cash before CEO Emad Mostaque resigned two weeks ago.

However, the company continues to push forward in the open-source AI space. In addition to Stable Audio 2, it released a new coding LLM called Stable Code Instruct 3B on March 25, and last year it released an advanced open-source text-to-video generator called Stable Video Diffusion.

Stability AI is also set to release Stable Diffusion 3, a more advanced image generator, later this year.

Among open-source advocates, Stability AI plays a leading role alongside popular names such as Mistral and Nous Research. Big tech companies are also exploring the open-source space, with Meta and Microsoft making significant contributions.

Inside Stable Audio 2

At its core, Stable Audio 2 uses a diffusion transformer (DiT) architecture, the same approach as Stability AI's upcoming Stable Diffusion 3 image generator, marking a change from the U-Net architecture used previously.

DiT and U-Net are both common backbones for diffusion models, which gradually refine random noise into structured data. A transformer, however, is particularly effective at handling long data sequences, whereas U-Net performs well on short generations but is less capable of handling longer, more complex sequences.

Among the major improvements in Stable Audio 2 is audio-to-audio generation, a new feature that lets users transform audio samples they upload, similar to how Stable Diffusion's img2img feature transforms images.

“Users can now upload audio samples and, using natural language prompts, transform these samples into a wide array of sounds,” the announcement said. “This update also expands sound effect creation and style transfer, providing artists and musicians more flexibility, control, and an elevated creative process.”

In other words, Stable Audio 2 doesn't have to start by denoising random noise; instead, it reshapes the original audio file to match the user's request. The result is a generation that follows the prompt while staying close to the reference audio.
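
The general idea behind img2img-style editing (often called SDEdit) can be sketched in a few lines of Python: instead of starting from pure noise, the reference is pushed only part of the way into the noising process, and the model then runs just the remaining denoising steps. This is a toy illustration, not Stability AI's code. The `strength` parameter, the simplified cosine noise schedule, the 100-step sampler, and the sine-wave "whistled melody" are all assumptions, and the trained denoiser itself is not shown.

```python
import math
import random


def cosine_alpha_bar(t: float) -> float:
    """Fraction of signal variance kept at diffusion time t in [0, 1] (simplified cosine schedule)."""
    return math.cos(t * math.pi / 2) ** 2


def noise_reference(reference: list[float], strength: float, rng: random.Random) -> list[float]:
    """Push a clean reference part-way into the forward (noising) process.

    strength = 0 returns the reference unchanged; strength = 1 gives (almost) pure
    Gaussian noise, which is equivalent to ordinary generation from scratch.
    """
    a = cosine_alpha_bar(strength)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in reference]


def correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation, used here as a crude 'how much of the reference survives' score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


TOTAL_STEPS = 100  # hypothetical number of sampler steps

# Stand-in "whistled melody": one second of a 440 Hz tone at a toy 8 kHz sample rate.
melody = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(8000)]
rng = random.Random(0)

for strength in (0.2, 0.5, 0.9):
    start = noise_reference(melody, strength, rng)
    steps = round(strength * TOTAL_STEPS)  # the model would run only these remaining steps
    print(f"strength={strength}: {steps} denoising steps, "
          f"correlation with reference={correlation(melody, start):.2f}")
```

Lower strength leaves more of the original melody in the starting point, which is why the output can follow the prompt yet still resemble the uploaded audio.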

The company says Stable Audio 2 was trained exclusively on a licensed dataset from the AudioSparx music library, and that all artists were given the choice to opt out of Stable Audio's model training, respecting their rights and ensuring they are fairly compensated.

Decrypt tested the model, and the results showed significant improvements over Stable Audio 1.0. The resulting music tracks were more coherent, and generations were longer: twice as long as version 1's 90-second limit.

Stable Audio 2's prompting strategy is similar to Stable Diffusion 1.5's, relying heavily on tags or keywords. Natural language prompts do not give good results.

The model seems better suited to inspiration or background music than to replacing properly trained musicians on marquee songs. In many cases, generations suffer from hallucinations and discordant sounds that stray from the prompt. However, it often generates good riffs that can be reused later.

Stable Audio 2 vs. Suno 3

As impressive as Stable Audio 2 is, especially compared to its predecessor, its capabilities fade quickly next to the sounds and songs created by Suno 3, the flagship update to Suno's audio generator released a month ago. Many AI enthusiasts consider Suno 3 the best model in the AI music space, with Kevin Hutson from Futurepedia calling it “mind-blowing” and MattVidPro calling it a “game changer.”

What makes music interesting, or simply good, is subjective, but Decrypt attempted a side-by-side comparison of Stable Audio 2 and Suno 3 using the same prompts. It's an imperfect approach because the two differ in how they are best prompted: Stable Audio favors keywords, while Suno 3 favors natural language.

We decided to use Stability AI's approach, even though it might put Suno at a disadvantage. Fortunately, Suno 3 was able to understand our instructions effectively, which makes it reasonable to compare their results.

Still, Stable Audio's prompting strategy isn't beginner-friendly: using only keywords and tags limits the creativity and sophistication of the results. A typical Suno prompt, for example, might be “a pop rock song about Decrypt, a media site covering the AI space.” A typical Stable Audio prompt would be something like “Format: Band | Instruments: Drums, Electric Guitar, Bass, Keyboard | Genre: Rock | Subgenre: Heavy Metal.”
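
Because the tag style is so regular, it is easy to assemble programmatically. The Python sketch below is a hypothetical helper, not part of any Stable Audio API; it simply copies the “Key: value” pairs and the “ | ” separator from the example prompt above.

```python
from __future__ import annotations


def build_tag_prompt(fields: dict[str, str | list[str]]) -> str:
    """Join 'Key: value' pairs with ' | ', mirroring the tag-style prompt quoted above.

    List values (for example, several instruments) are joined with ', '.
    Dicts keep insertion order, so the tags appear in the order given.
    """
    parts = []
    for key, value in fields.items():
        if isinstance(value, list):
            value = ", ".join(value)
        parts.append(f"{key}: {value}")
    return " | ".join(parts)


prompt = build_tag_prompt({
    "Format": "Band",
    "Instruments": ["Drums", "Electric Guitar", "Bass", "Keyboard"],
    "Genre": "Rock",
    "Subgenre": "Heavy Metal",
})
print(prompt)
# Format: Band | Instruments: Drums, Electric Guitar, Bass, Keyboard | Genre: Rock | Subgenre: Heavy Metal
```

Keeping the tags in a dictionary makes it simple to swap one field, such as the subgenre, while holding the rest of the prompt fixed for side-by-side tests.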

Right out of the gate, Suno 3 has one big advantage over the competition: in addition to accepting natural language prompts, it can generate lyrics with a large language model (LLM).

In terms of sound quality, Stable Audio 2 rivals Suno 3. But while Stability AI claims the model can generate music up to three minutes long, its tracks are simpler, less creative, and less structurally complex than audio generated by Suno 3. Suno 3's generations typically follow an accurate song structure with natural riffs, choruses, bridges, and variations, so the result sounds like a complete song rather than a background instrumental track.

What's more, the transitions between riffs in Stable Audio's generations are often abrupt. This is in stark contrast to Suno 3, which generally transitions smoothly between different parts of a song, making for a more enjoyable listening experience.

Another significant difference between the two models is generation speed. Suno 3 produces audio faster than Stable Audio 2. While this may come down to server capacity, it's still an important consideration, especially for users who want to generate audio quickly and efficiently.

But there's one thing Stable Audio 2 can do that Suno 3 can't: audio-to-audio generation.

With Stable Audio 2, you can whistle the melody of a song, for example, and Stable Audio will bring your idea to life. This is a level of control that Suno users don't have. While not a deal breaker for us, it could certainly be important to many.

Both Stable Audio and Suno are powerful and worth a try, especially if you have the music-making bug but no musical talent. But Stable Audio may need to reach its third version to come within striking distance of Suno's current generation.

Edited by Ryan Ozawa.
