The best generative AI models: from chatbots to image and video generators
3 weeks ago Benito Santiago
The generative AI landscape turned into a massive battlefield in 2024, with an army of startups challenging a field once dominated by OpenAI.
It seems like everyone and their tech-savvy grandma is vying for a piece of the AI pie, from language models and agentic AIs to image generators, and even an AI meme coin shiller or two.
The benchmarks are shifting faster than we humans can keep up. Barely a week goes by without a shiny new toy hitting the market: an updated LLM, a turbocharged image generator, or a next-generation AI built on some novel training technique.
But here at Decrypt, we've rolled up our sleeves and tried them all.
We kicked the tires, pushed the buttons, and dug into the inner workings and results of some of the most popular AI models, along with some lesser-known ones.
Now that it's clear that OpenAI isn't the only sheriff in town, we've compiled a list of the cream of the crop—the generative AI models that have amazed us, stunned us, and occasionally made us spit out our coffee.
Chatbots
A chatbot is a computer program designed to simulate conversations with human users. It uses natural language processing and artificial intelligence to understand user inputs and generate appropriate responses. People often confuse chatbots with LLMs, or large language models.
Today, chatbots are a little more complex and capable of more than text generation. Now they can browse the web, create and understand images, talk to the user, and more.
Here's our list of the best chatbots you should try:
Gold Medal: OpenAI's ChatGPT
ChatGPT offers a wide range of natural language features, a clean interface, web search, and multimodal capabilities (including reasoning, writing, vision, audio, and image generation) for $20 per month.
Silver Medal: Anthropic's Claude
An intuitive UI with an advanced LLM, featuring split-screen artifacts for logic and code generation. Claude supports very long token contexts and custom agents. However, it lacks web search and image generation and often faces capacity problems, forcing users to switch to a weaker model or accept “concise,” shorter answers. For these reasons, it can't take the top spot yet.
Bronze Medal: Mistral AI's Le Chat
This free platform is powered by Mistral Large and features high-quality Flux image generation and advanced web search, which is the best in our opinion, even beating out SearchGPT. It supports document and image recognition and open-source AI agents. However, Mistral Large isn't as strong as competing LLMs in text quality, making Le Chat ideal for power users willing to trade text quality for features.
Honorable Mentions: Meta AI, Gemini (from Google AI Studio, not the main site), HuggingChat, Satisfied, Grok-2
Large language models
A large language model, or LLM, is an artificial intelligence system trained on large amounts of text data to understand and generate human-like language. You can think of it as a glorified autocomplete. It is designed to predict the most likely next token (think of a word, even if that is an imprecise comparison) in a sequence.
The result is natural, human-sounding text because, well, it resembles what humans would write.
Here's our list of the best LLMs to date:
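To make the "glorified autocomplete" idea concrete, here is a toy sketch in Python. It is not how any of the models below actually work: real LLMs use neural networks over tokens, while this simply counts which word tends to follow which. The function names and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def autocomplete(following, start, length=5):
    """Greedily append the most frequent next word, one word at a time."""
    output = [start]
    for _ in range(length):
        candidates = following.get(output[-1])
        if not candidates:
            break  # the last word was never followed by anything in the corpus
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)


# Hypothetical training text, chosen only to keep the example small.
corpus = "the cat sat on the mat and the cat sat on the sofa"
model = train_bigrams(corpus)
print(autocomplete(model, "cat", length=3))
```

Each step looks only at the last word and picks its most common successor. Swap the word counts for a neural network trained on a large slice of the internet, and words for tokens, and you get the rough intuition behind an LLM.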
Best Generalist: OpenAI's GPT-4o
Although its style may seem predictable, it balances creative writing, coding, and reasoning, and comes with a customizable “canvas” feature. The latest version (as of November 20) achieved the highest rating in the LLM Arena with an Elo score of 1,366, beating the experimental version of Google Gemini released on November 21.
Best for Writing: Anthropic's Claude 3.5 Sonnet
In many areas it matches or beats GPT-4o, with more creative, human-like output, although it is prone to distortion.
Best for Storytelling: LongWriter
Creates 10,000+ word stories in minutes. Need we say more?
Most Versatile: Meta's Llama 3.1
A leading open-source model with extensive customization, LoRA creation, and fine-tuning options, in sizes ranging from 8 billion to 405 billion parameters. Users can run it on their local machines or on cloud servers, depending on their needs. Nvidia has developed a custom version called “Nemotron” that has made some waves in the community and is worth checking out.
Biggest Disappointment: Reflection Llama-3.1 70B
Announced with high expectations, the model claimed to have built in chain-of-thought reasoning that beat GPT-4o. It became a major fiasco, with false benchmarks, hidden API calls to Claude, and a huge controversy.
Image generators
An image generator is basically a model that receives text input and produces an image matching that text. So, for example, you say “a green horse with a dragon's face,” and the model will generate a picture of a green horse with a dragon's face. You can also put in something like “busty waifu,” but that's not what they say.
These are some of the best image generators available today.
Best Generalist: Flux
Flux leads the latest AI models with high customizability, LoRA/ControlNet support, and text generation capabilities. It requires powerful hardware, and its output tends to show heavy bokeh and faint skin detail that users are still trying to overcome.
It comes in three flavors: Pro (closed source, most powerful model), Dev (non-commercial license), and Schnell (open source, distilled version). All three offer excellent image-generating capabilities, and once fine-tunes are taken into account, the ceiling is even higher.
Best for Realism: Recraft v3
It delivers unmatched realism, along with versatile presets and better value than proprietary options like Midjourney.
Although Recraft owns the generations, it offers a free tier with the same quality.
Best for Anime: Midjourney Niji
Unrivaled quality for anime-style images; fine-tuned Stable Diffusion models are the second-best option.
Most Versatile: Stable Diffusion 3.5
Stable Diffusion 3.5 is a major improvement over SD3, with better licensing terms, detailed output, and more support.
It's more resource-efficient than Flux for fine-tuning, and it's a full model (unlike Flux Schnell, which is a distilled version), making it a great choice for custom models.
However, it came out a bit late and was overshadowed by Flux's popularity.
Biggest Flop: SD3 Medium
Everyone expected this new model to beat SDXL and other models to become the new king of image generators. Instead, it turned out to be a weak model, infamous for its poor results and its terrible mistakes when asked to generate people lying on grass.
Video generators
Video generators take image generation one step further. They generate each frame and use it as input for the next, stitching the frames together with visual consistency and at high speed.
This is still a work in progress, and models can only generate a few seconds of video. Below is a list of some of the best that you can try.
Best overall: Kling
A rapidly improving Chinese model that in some cases surpasses Sora. It supports model training and consistently produces high-quality scenes with realism and great versatility in camera movement.
Top Contender: Runway Gen 3
A pioneering video app with strong spatial awareness, but it struggles with fast-moving scenes.
Best for storytelling: ShowRunner
We can't tell you much about this one. However, in private testing, it showed great potential.
Best Open Source: Genmo Mochi 1
It's a great release that beats competitors like Rhymes Allegro and Stable Video Diffusion with superior realism and frame consistency.
Biggest Failure: OpenAI Sora
Heralded with high expectations as a revolutionary “world model” beyond any video generator, it still isn't available today, and its results are hard to judge.
Honorable Mention: Google Veo
Google's Veo was released on December 3. We haven't tried it, but the generations shared by Google are pretty cool. Of course, we're on the waiting list to test drive the model, and you'll be the first to know our thoughts when we get access.
Music generators
Just as video generators create clips, music generators create songs. They differ from sound generators in that their outputs are better suited to melodic pieces than to noises, plain sounds, or sound effects.
Users can write the lyrics manually or rely on a separate LLM to generate them, set a few parameters for the style of the song, and the model will generate matching music from scratch.
These are the best two – plus an open source option.
Best Overall: Suno v4
Excels in vocals and lyrics, stylistic diversity, and long-term consistency. Its predecessor, Suno v3.5, is not free; however, it remains a strong option.
Top Contender: Udio
Suno's biggest competitor. It offers amazing compositional accuracy and is comparable in sound to Suno v4. Some generations surpass Suno v3 in terms of style.
Best Open Source: Stable Audio 2
The open-source scene is not doing much in this area. Stable Audio 2 looks like a great model, but it lags behind its closed-source competitors in all areas. Meta's AudioCraft and MusicGen are alternatives, but they are far from industry-leading. Fine-tuners aren't paying attention to this space, and they are often the people who add the cherry on top that makes open-source models so good.
Edited by Andrew Hayward.