Meet Aria: The New Open Source Multimodal AI That's Challenging Big Tech


Artificial intelligence has a new player, and it's completely open source: Aria, a multimodal LLM developed by Tokyo-based Rhymes AI, can process text, code, images, and video in a single architecture.

But it's not just its versatility that should draw your attention; it's also its efficiency. It's not as large as its multimodal counterparts, which makes it more energy- and hardware-friendly.

Rhymes AI achieves this by employing a mixture-of-experts (MoE) framework. This architecture is similar to having a small team of specialists, each trained to excel in specific areas or tasks.

When the model receives a new input, only the relevant experts (or a subset of them) are activated instead of the entire model. Running just the relevant part of the model is lighter than running a single know-it-all model that tries to handle everything.

This makes Aria more efficient because, unlike traditional models that activate all parameters for every task, Aria activates only 3.5 billion of its 24.9 billion parameters per token, reducing computational load and improving performance on specific tasks.
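To make the idea concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not Rhymes AI's implementation: the number of experts, the `top_k` value, the toy "experts," and the random router weights are all assumptions chosen for illustration. Only the 3.5 billion and 24.9 billion figures come from the article.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Hypothetical "expert": a toy function standing in for a feed-forward block.
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route one token vector through only the top_k highest-scoring experts."""
    # Router: one score per expert, computed from the input vector.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Keep the top_k experts; the others are never executed for this token.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        weight = probs[i] / norm  # renormalize over the chosen experts
        out = [o + weight * v for o, v in zip(out, y)]
    return out, chosen


random.seed(0)
NUM_EXPERTS, DIM = 8, 4  # assumed sizes, for illustration only
experts = [make_expert(scale=i + 1) for i in range(NUM_EXPERTS)]
gate_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, used = moe_forward(token, gate_weights, experts, top_k=2)
print(f"experts used for this token: {used}")

# Share of parameters active per token, from the article's figures.
active_fraction = 3.5 / 24.9
print(f"active share of parameters per token: {active_fraction:.1%}")
```

With the article's numbers, roughly 14% of the parameters do work on any given token, which is where the savings in compute come from.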

It also allows for better scalability, as new experts can be added to handle specialized tasks without overloading the system.

It is important to note that Aria is the first multimodal MoE in the open-source arena. There are some MoEs (like Mixtral-8x7B) and some multimodal LLMs (like Pixtral), but Aria is the only model that combines the two architectures.

Aria beats the competition in synthetic benchmarks

In benchmark tests, Aria is beating some open source heavyweights like Pixtral 12B and Llama 3.2-11B.

More surprisingly, it's giving proprietary models like GPT-4o, Gemini-1 Pro, and Claude 3.5 Sonnet a run for their money, with multimodal performance on par with OpenAI's brainchild.

Rhymes AI has released Aria under the Apache 2.0 license, allowing developers and researchers to adapt and build on the model.

It is also a very powerful addition to the expanding pool of open source AI models led by Meta and Mistral, which are catching up to the more popular and widely adopted closed-source models.

Aria's versatility also shines across a variety of tasks.

In the research paper, the team explained how they fed the model an entire financial report and it was able to analyze it accurately, extracting data from the report, calculating profit margins, and providing detailed breakdowns.

When tasked with visualizing weather data, Aria not only extracted the relevant information but also generated Python code to create the graphs, complete with formatting details.

The model's video processing capabilities also look promising. In one evaluation, Aria broke down an hour-long video about Michelangelo's David, identifying 19 distinct scenes with start and end times, titles, and descriptions. This is not simple keyword matching but contextual understanding.

Coding is another area where Aria excels. It can watch video tutorials, extract code snippets, and even debug them. In one example, Aria spotted and fixed a logic error in a code snippet involving nested loops, demonstrating a deep understanding of programming concepts.

Testing the model

Aria is a beefy 25.3 billion parameter model that requires at least an A100 (80GB) GPU to run at half precision, so it's not something you can run and fine-tune on your laptop. However, we tested it on Rhymes AI's demo page, which offers a limited version.
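The hardware requirement follows from simple arithmetic. The sketch below estimates the memory needed for the weights alone, assuming 2 bytes per parameter at half precision and 1 GB = 1e9 bytes. It ignores activations, the attention cache, and framework overhead, so real usage will be higher. The 24GB comparison card is a generic assumption, not a figure from Rhymes AI.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


ARIA_PARAMS = 25.3e9       # parameter count quoted in the article
HALF_PRECISION_BYTES = 2   # fp16/bf16 store each parameter in 2 bytes

weights_gb = weight_memory_gb(ARIA_PARAMS, HALF_PRECISION_BYTES)
print(f"weights at half precision: {weights_gb:.1f} GB")

# Compare against the A100 named in the article and a generic 24GB card (assumed).
for name, vram_gb in [("A100 (80GB)", 80), ("single 24GB consumer GPU", 24)]:
    headroom = vram_gb - weights_gb
    verdict = "fits" if headroom > 0 else "does not fit"
    print(f"{name}: weights {verdict} ({headroom:+.1f} GB after weights)")
```

The weights alone come to about 50.6 GB, which explains why an 80GB card is the practical floor once working memory is added on top.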

Text analysis and processing

First, we tested how well it analyzes documents by feeding it a research paper and simply asking it to explain what the paper was about.

The model's answer was very concise but accurate. It did not hallucinate and kept a conversational tone, showing good retrieval capabilities.

It displays the response in a continuous, long paragraph, which can be tedious for users who prefer short paragraphs.

By comparison, ChatGPT gave a similar answer in terms of the information provided but structured it more clearly, making it easier to read.

Additionally, Rhymes AI's demo site limits PDF uploads to just five pages. ChatGPT can process documents of more than 200 pages.

Claude 3.5 Sonnet, for its part, accepts files under 30MB as long as they don't exceed its token limits.

Code and image understanding

We then combined two instructions, asking the model to analyze a screenshot showing the price performance of the top 10 tokens on CoinMarketCap and then to write code presenting some of that data.

Our prompt was:

Sort the list by best performance in the last 24 hours.

Write Python code to draw a bar chart of the daily and weekly performance of each coin, and draw a line chart showing the current price of Bitcoin and its price yesterday and last week, taking into account the performance data for the past 24 hours and the last seven days.

Aria failed to sort the coins by daily performance and, for some reason, decided that Tron had a positive performance when it was actually falling in value. Its bar chart placed the weekly performance next to the daily bars. The line chart was also flawed: it did not order time correctly on the X-axis.

ChatGPT did better at plotting the timeline correctly, but it did not actually order the coins by performance either. It also showed TRX with a positive daily performance.
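For reference, the two steps both models stumbled on, ranking by 24-hour change and laying out the price timeline oldest to newest, can be sketched in a few lines of Python. The coin symbols and numbers below are hypothetical placeholders, not the values from our CoinMarketCap screenshot, and the helper names are our own. Plotting is left out so the sketch runs with the standard library alone.

```python
# Hypothetical placeholder figures, NOT the values from the screenshot.
coins = [
    {"symbol": "AAA", "price": 100.0, "change_24h": 2.0, "change_7d": 10.0},
    {"symbol": "BBB", "price": 50.0, "change_24h": -1.5, "change_7d": 4.0},
    {"symbol": "CCC", "price": 10.0, "change_24h": 0.5, "change_7d": -3.0},
]


def sort_by_daily_performance(rows):
    """Best 24-hour performer first."""
    return sorted(rows, key=lambda r: r["change_24h"], reverse=True)


def past_price(current_price, pct_change):
    """Price before a move of pct_change percent that ended at current_price."""
    return current_price / (1 + pct_change / 100)


def price_timeline(row):
    """Points for a line chart, oldest first so the X-axis reads left to right."""
    return [
        ("7 days ago", past_price(row["price"], row["change_7d"])),
        ("yesterday", past_price(row["price"], row["change_24h"])),
        ("today", row["price"]),
    ]


ranked = sort_by_daily_performance(coins)
print([c["symbol"] for c in ranked])

for label, value in price_timeline(coins[0]):
    print(f"{label}: {value:.2f}")
```

Note that a coin with a negative 24-hour change, like the placeholder BBB here, lands at the bottom of the ranking, which is the check both models failed with TRX.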

Video understanding

Aria showed a good understanding of video. We uploaded a short clip of a woman moving. She did not speak in the video.

We asked the model to describe the scene and to tell us what the woman said, trying to see whether the model would hallucinate an answer.

Aria understood the task, described the elements of the scene well, and correctly noted that the woman did not change her expression or speak to the camera.

ChatGPT is not capable of understanding video, so it cannot handle this request.

Creative writing

This test was probably the most pleasant surprise. Aria's story was more imaginative than the results from Grok-2 or Claude 3.5 Sonnet, which have been the leaders in our own analyses.

Our prompt: Write a short story about a man named José Lanz who travels back in time, using vivid descriptive language and adapting the story to his cultural background and personality, whatever that may be. He travels from 2150 AD back to 1000 AD. The story should highlight the time travel paradox and how pointless it is to try to solve a problem from the past (or invent a problem) in order to change the current timeline. The future exists as it is only because it was shaped by the events of 1000 AD, which had to happen to give 2150 its current characteristics, something he doesn't realize until he returns to his timeline.

Aria's story of José Lanz, a time-traveling historian from 2150, blends science fiction with historical and philosophical elements. Its ending is not as abrupt as those produced by other models, and although it is not as creative as human writing, it delivers a plot twist instead of a rushed conclusion.

Overall, Aria delivered an engaging and coherent story that was more well-rounded and impactful across a variety of themes than those of its more powerful competitors. It was slightly more immersive but felt rushed due to token limits. For long stories, LongWriter is by far the best model out there.

You can read all the stories by clicking this link.

Overall, Aria is a solid contender that looks promising for its architecture, openness, and ability to scale. If you want to test or train the model, it's available for free on Hugging Face. Remember that you need at least 80GB of VRAM, which means one powerful GPU or three RTX 4090s working together. It's still new, so no quantized versions (less accurate but more efficient) are available yet.

Despite these hardware limitations, new developments like this in the open source space are a big step toward the dream of a fully open source ChatGPT competitor that people can run at home and modify to suit their needs. Let's see where they go next.

Edited by Sebastian Sinclair and Josh Kittner
