Ideogram AI—a startup founded by former Google engineers with members from prestigious institutions such as UC Berkeley, Carnegie Mellon University, and the University of Toronto—has announced the release of the first full version of its eponymous image generator.
“We are very excited to release Ideogram 1.0,” Ideogram AI said in an official blog post. The model offers state-of-the-art text rendering, unprecedented photorealism, and strong prompt adherence. It also adds a new feature called Magic Prompt, which helps you write detailed prompts for beautiful, creative images.
The release comes with news of an $80 million Series A funding round led by Andreessen Horowitz, with participation from Redpoint Ventures, Pear VC, Index Ventures, and SV Angel.
Happy to share that Ideogram raised $80 million in Series A funding to help people get more creative with generative AI! Thanks to @a16z for leading the round and @Redpoint , @pearvc , @IndexVentures , @svangel for participating!
Ideogram 1.0 will be greatly improved soon!
— Mohammad Norouzi (@mo_norouzi) February 29, 2024
Decrypt was able to test the model, and Ideogram AI's claims are not overstated; a side-by-side comparison is available below. Ideogram 1.0 is a clear improvement over its predecessors, v0.1 and v0.2, excelling in prompt adherence, image quality, and text generation.
The model is not open-source, so visibility into the pipeline is limited and there are no research papers to review. But the results obtained with the model speak for themselves, making it the best model currently available – at least until Stable Diffusion 3 is officially released.
The new model is arguably the most capable image generator when it comes to text, rendering long text strings with fewer errors than Dall-E 3 or Midjourney. Its free tier also gives it an edge over competitors like Dall-E 3 and Midjourney, the latter of which has no free tier. Microsoft Copilot uses Dall-E 3 but only produces square 1:1 images, while Ideogram supports a wider range of aspect ratios.
Ideogram also offers two paid plans with more than 400 generations per month, higher-quality downloads, img2img (which allows modifications or variations of an existing image), and private generations. All lower tiers publicly display the generated images.
Introducing Ideogram 1.0: the most advanced text-to-image model, now available at
It offers state-of-the-art text rendering, unprecedented photorealism, exceptional prompt adherence, and a new feature called Magic Prompt to help with prompting. pic.twitter.com/VOjjulOAJU
— Ideogram (@ideogram_ai) February 28, 2024
Ideogram can understand longer prompts, going toe-to-toe with Stable Diffusion 3 and beating every other image generator in this area.
One of Ideogram's main features is Magic Prompt, which can be toggled on and off. It expands and refines the user's prompt to produce better images, essentially giving the model natural-language understanding similar to Dall-E 3. Ideogram is more versatile, however, because the feature is optional. With ChatGPT Plus, prompt rewriting is always on, which sometimes leads to errors.
Finally, Ideogram is less censored than Midjourney and Dall-E 3, and can still create images of famous people, company logos, and art styles. It doesn't go fully NSFW, but it is more transparent about how it censors requests.
Early testers also seem to prefer Ideogram over other models. “Using the same evaluation protocol as DALL·E 3, we find that human raters prefer Ideogram 1.0 over DALL·E 3 and Midjourney V6 in prompt alignment, image coherence, overall preference, and text rendering quality,” the startup said.
Side-by-Side Comparison: Ideogram vs. Midjourney vs. Dall-E 3
Decrypt tested Ideogram's capabilities and compared them with those of its main competitors, Midjourney and Dall-E 3. Stable Diffusion 3 and Google's top-of-the-line ImageFX are not reviewed here because SD3 has not been released yet and ImageFX is not widely available.
Generating long text strings
Prompt: A futuristic android in a cyberpunk city with a sign that says “Don't be left out of the AI trend: Emerge with Decrypt.”
Ideogram AI rendered both the requested aesthetic and the text, though with one typo: “you” instead of “the.”
Midjourney was unable to generate any coherent text. It focused instead on rendering a detailed futuristic android, which became the main subject of the whole composition. The city is not cyberpunk at all.
Dall-E 3 landed in the middle. It generated a futuristic robot in a properly cyberpunk city, but the sign is missing the word “out.”
Interestingly, Ideogram understood that the robot is in the city and associated with the sign, while Dall-E 3 treated the sign as part of the cityscape.
Long prompts and spatial awareness
Prompt: A cat sitting on top of a TV next to an “Emerge” sign in a captivating and surprising scene, with a futuristic android standing on one side and an astronaut on the other. The walls of the room are decorated with striking pictures of molecules and DNA strands.
Ideogram was by far the best overall. It understood every part of the prompt and rendered the text without any spelling mistakes. It also placed each element correctly: the cat on top of the TV, the sign next to it, and the android and the astronaut on either side. It even understood that the background should show molecules and DNA strands.
Midjourney's result was not realistic but rather surreal. It generated the word “Emerge,” but placed it on the TV instead of on a sign. The cat is next to the TV, not on it. It didn't generate the android and ignored the background instructions, producing a background that better fits the aesthetic of the composition and giving the subject (the cat) more prominence than the overall scene.
Dall-E 3 went with a cartoonish style and did not fully follow the prompt. It showed better spatial awareness and prompt adherence than Midjourney but fell short of Ideogram, and it loses on style. It put the cat on top of the TV but failed to generate the “Emerge” sign next to it. It also didn't generate the android and didn't follow the prompt for the background.
Censorship
Prompt: Hot, sexy woman.
The prompt contains no language that could be considered hate speech or profanity, and nothing explicitly sexual. After all, a “hot, sexy woman” can be fully clothed and not engaged in any sexual activity.
Ideogram AI understood the request and created an image that matches the instructions. Ideogram does have an AI moderator, but it is triggered by more explicit words (say, slang terms for genitalia or words related to nudity), which immediately leads to the generation being censored.
Midjourney and Dall-E 3 both refused to generate the image and flagged the words as banned, even though they would not have led to an NSFW generation.
Ideogram seems to be more targeted in its censorship, and it's possible to see the generated image, NSFW or otherwise questionable, before the app takes it down.
Celebrities and copyrighted images
Prompt: A happy Joe Biden and Vladimir Putin holding hands in front of a wall with the word “Decrypt” written on it.
Ideogram AI generated the image: the text is accurate, the scene is realistic, and the characters are easily recognizable (although not 100% accurate).
Dall-E 3 generated the image, but Biden is not easily recognizable, and Trump is recognizable only by his characteristic hairstyle. The text is inaccurate, and the scene is cartoonish rather than realistic.
Midjourney refused to create the image.
Summary
Free and widely available out of the gate, Ideogram may be the best image generator on the market today. It has excellent natural-language comprehension, superior spatial awareness, and strong prompt adherence. It is also the best at rendering text of any generator currently available.
If aesthetics matter more than prompt adherence and text, Midjourney could remain a strong contender for certain use cases. Robust but heavily censored, Dall-E 3 may still make sense as part of a ChatGPT Plus subscription.
Ideogram AI holds the crown in our image generator toolbox, for now.
Edited by Ryan Ozawa.