Beyond transformers: New AI architectures can transform large language models.
2 seconds ago Benito Santiago
In the past few weeks, researchers from Google and Sakana have unveiled two groundbreaking neural network designs that could revolutionize the AI industry.
These technologies aim to challenge the dominance of transformers, a type of neural network that connects inputs and outputs based on context and that has defined AI for the past six years.
The new approaches are Google's “Titans” and “Transformer Squared,” designed by Sakana, a Tokyo AI startup known for using nature as a model for technological solutions. Indeed, both Google and Sakana tackled the transformer problem by studying the human brain. Their designs use different levels of memory and activate different expert modules independently, instead of engaging the entire model for every problem.
The net result is AI systems that are smarter, faster and more versatile.
For context, the transformer architecture, the technology that gives ChatGPT the “T” in its name, is designed for sequence-to-sequence tasks such as language modeling, translation, and image processing. Transformers rely on “attention mechanisms,” tools for judging how important a concept is based on its context, to model dependencies between input tokens. This lets them process data in parallel rather than sequentially, as recurrent neural networks, the dominant technology in AI before transformers appeared, had to do. This technology gave models context awareness and marked a before-and-after moment in AI development.
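To make the idea of attention concrete, here is a minimal sketch of scaled dot-product attention in plain Python. The three tokens and their 2-dimensional vectors are made up for illustration, and real models use learned projections and many attention heads, so treat this as a toy rather than how any production system is built.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:  # each query is independent, so this loop could run in parallel
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)  # how much this token attends to every other token
        # The output is the weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Made-up 2-d embeddings for three tokens (illustrative only)
tokens = ["bank", "river", "money"]
vectors = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
outputs, weights = attention(vectors, vectors, vectors)

for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each row of weights says how strongly one token looks at the others. In this toy setup, “river” attends more to “bank” than to “money” because their vectors overlap more.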
But despite their impressive success, transformers face significant challenges in terms of flexibility and adaptability. For models to be more flexible and versatile, they also have to be more powerful. And once trained, they cannot be improved unless developers release a new model or users rely on third-party tools. This is why “bigger is better” is the general rule in AI today.
But that could soon change, thanks to Google and Sakana.
Titans: A new memory architecture for dumb AI
Google Research's Titans architecture takes a different approach to improving AI adaptability. Titans focuses on changing how models store and access information rather than modifying how they process it. The architecture introduces a neural long-term memory module that learns to memorize at test time, much as human memory does.
Currently, models read your entire prompt and their own output so far, predict a token, read everything again, predict the next token, and so on until they arrive at an answer. They have an incredible short-term memory, but a poor long-term memory. Ask them to remember things outside the context window, or a very specific detail buried in a lot of noise, and they will likely fail.
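The loop described above can be sketched in a few lines of Python. The `toy_next_token` function is a hypothetical stand-in for a real model, and the prompt and window sizes are invented for illustration; the point is only that whatever falls outside the context window is invisible at prediction time.

```python
def toy_next_token(context):
    """Hypothetical stand-in for a model: it can only answer from what it sees."""
    for token in context:
        if token.startswith("code="):
            return token.split("=", 1)[1]
    return "unknown"

def generate(prompt_tokens, steps, context_window):
    """Autoregressive loop: re-read the visible context, predict one token, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        visible = tokens[-context_window:]  # anything older is dropped
        tokens.append(toy_next_token(visible))
    return tokens[len(prompt_tokens):]

# A "needle" followed by filler and then a question (16 tokens in total)
prompt = ["code=1234"] + ["filler"] * 10 + ["what", "is", "the", "code", "?"]

inside = generate(prompt, steps=1, context_window=32)   # needle still visible
outside = generate(prompt, steps=1, context_window=8)   # needle has scrolled out

print(inside, outside)
```

With a window of 32 tokens the toy model still sees the needle; with a window of 8 it does not, which is the failure mode the paragraph describes.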
Titans, on the other hand, combines three types of memory: short-term memory (similar to traditional transformers), long-term memory (for storing historical context), and persistent memory (for task-specific knowledge). This multi-level approach allows the model to handle sequences more than 2 million tokens long, far beyond what current transformers can process efficiently.
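As a rough illustration of how three kinds of memory could sit side by side, here is a toy Python sketch. The class name, the one-hot keys, and the simple error-correcting update are all assumptions made for this example; they are a stand-in and not the update rule or layout used in the Titans paper.

```python
from collections import deque

class ToyLayeredMemory:
    """Toy illustration of persistent, short-term and long-term memory."""

    def __init__(self, dim, window, task_facts, lr=1.0):
        self.persistent = dict(task_facts)      # fixed, task-specific knowledge
        self.short_term = deque(maxlen=window)  # only the most recent tokens
        # Long-term memory: a small weight matrix that keeps changing at test time
        self.long_term = [[0.0] * dim for _ in range(dim)]
        self.lr = lr

    def recall(self, key):
        """Read from long-term memory: multiply the matrix by the key vector."""
        return [sum(row[j] * key[j] for j in range(len(key))) for row in self.long_term]

    def observe(self, token, key, value):
        """See one token: push it into the window and update long-term memory."""
        self.short_term.append(token)
        error = [p - v for p, v in zip(self.recall(key), value)]
        # Error-correcting (delta-rule) step: nudge the matrix so recall(key) moves toward value
        for i, e in enumerate(error):
            for j, k in enumerate(key):
                self.long_term[i][j] -= self.lr * e * k

mem = ToyLayeredMemory(dim=3, window=4, task_facts={"task": "retrieval"})
mem.observe("needle", key=[1.0, 0.0, 0.0], value=[0.0, 1.0, 0.0])
for _ in range(10):
    # Filler tokens carry nothing worth storing (zero value vector)
    mem.observe("filler", key=[0.0, 0.0, 1.0], value=[0.0, 0.0, 0.0])

still_in_window = "needle" in mem.short_term
recalled = mem.recall([1.0, 0.0, 0.0])
print(still_in_window, recalled)
```

After ten filler tokens the needle has left the short-term window, but the long-term matrix still returns its stored value, while the persistent entries never change.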
According to the research paper, Titans shows significant improvements in various tasks, including language modeling, common-sense reasoning and genomics. The architecture has proven particularly effective in “needle-in-a-haystack” tasks, which require retrieving a specific piece of information from a very long context.
The system simulates how the human brain activates specific regions for different tasks and adjusts networks based on changing needs.
In other words, just as different neurons in your brain are dedicated to specific functions and are activated based on the task you're performing, Titans mimics this idea by incorporating interconnected memory systems. These systems (short-term, long-term, and persistent memories) work together to dynamically store, retrieve, and process information based on the task at hand.
Transformer Squared: Self-adapting AI is here
Two weeks after Google's paper, a team of researchers from Sakana AI and the Institute of Science Tokyo introduced Transformer Squared, a framework that allows AI models to change their behavior in real time based on the task at hand. The system works by selectively adjusting only the singular components of the weight matrices at inference time, making it more efficient than traditional fine-tuning methods.
Transformer Squared “employs a two-pass mechanism: first, a dispatch system identifies the task properties, and then task-specific ‘expert’ vectors, trained using reinforcement learning, are dynamically mixed to obtain targeted behavior for the incoming prompt,” according to the research paper.
It sacrifices inference time (it thinks more) for specialization (knowing which expertise to apply).
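The two-pass idea can be sketched as follows. Everything here is hypothetical: the keyword-based dispatcher, the two task names, and the expert vectors are invented for illustration, and the real system trains its expert vectors with reinforcement learning rather than hard-coding them.

```python
# Hypothetical expert vectors (in the real system these are learned, not hand-written)
EXPERTS = {
    "math": [1.5, 1.0, 0.5],
    "code": [0.5, 1.0, 1.5],
}

# Hypothetical keyword lists standing in for a real task classifier
KEYWORDS = {
    "math": {"sum", "integral", "equation"},
    "code": {"python", "function", "bug"},
}

def dispatch(prompt):
    """First pass: estimate how strongly the prompt matches each task."""
    words = set(prompt.lower().split())
    hits = {task: len(words & kws) for task, kws in KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:  # no signal: fall back to an even blend
        return {task: 1 / len(hits) for task in hits}
    return {task: n / total for task, n in hits.items()}

def mix_experts(weights):
    """Prepare the second pass: blend expert vectors by the dispatch weights."""
    dim = len(next(iter(EXPERTS.values())))
    return [sum(weights[t] * EXPERTS[t][i] for t in EXPERTS) for i in range(dim)]

weights = dispatch("fix the bug in this python function")
z = mix_experts(weights)

mixed_weights = dispatch("python function for the integral")
mixed_z = mix_experts(mixed_weights)

print(weights, z)
print(mixed_weights, [round(v, 3) for v in mixed_z])
```

A pure coding prompt selects the coding expert outright, while a prompt that touches both tasks gets a blend. The blended vector would then be used to adapt the model for the second pass, which actually answers the prompt.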
What makes Transformer Squared particularly innovative is its ability to adapt without requiring extensive retraining. The system uses what the researchers call singular value fine-tuning (SVF), which focuses on modifying only the essential components needed for a specific task. This approach significantly reduces computational demands while maintaining or improving performance compared to existing methods.
In testing, Sakana's Transformer Squared has demonstrated impressive versatility across a variety of tasks and model architectures. The framework has shown particular promise in handling out-of-distribution applications, suggesting it can help AI systems become more flexible and responsive to new situations.
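To give a feel for why tuning only singular values is cheap, here is a small Python sketch. It assumes that SVF keeps a weight matrix's singular vectors fixed and rescales each singular value by one learned number; the 2x2 matrix, the rotation angles, and the vector `z` are invented for illustration, and the decomposition is built by hand rather than computed from a real model.

```python
import math

def rotation(theta):
    """A 2x2 rotation matrix, used here as a ready-made orthogonal factor."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def compose(u, singular_values, v):
    """Rebuild a weight matrix from its factors: U * diag(sigma) * V^T."""
    return matmul(matmul(u, diag(singular_values)), transpose(v))

# Assumed, hand-built decomposition of a toy weight matrix
U, V = rotation(0.3), rotation(1.1)
sigma = [2.0, 0.5]
W = compose(U, sigma, V)

# Hypothetical expert vector: one scalar per singular value
z = [1.2, 0.8]
W_adapted = compose(U, [s * zi for s, zi in zip(sigma, z)], V)

full_params = len(W) * len(W[0])  # entries a full fine-tune would touch
svf_params = len(z)               # entries this sketch tunes
print(full_params, svf_params)
```

For this 2x2 toy the saving is only 4 parameters versus 2, but the gap grows quickly: a matrix with m rows and n columns has m times n entries and at most min(m, n) singular values.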
Here's our attempt at an analogy. Your brain forms new neural connections when you learn a new skill, without having to relearn everything. For example, when you learn to play the piano, your brain doesn't need to rewrite all of its knowledge; it adapts specific neural circuits for that task while keeping its other skills. Sakana's idea was that developers shouldn't need to retrain a model's entire network to adapt it to new tasks.
Instead, the model selectively adjusts certain components (through singular value fine-tuning) to become more efficient at specific tasks while maintaining its overall capabilities.
Overall, the days of AI companies bragging about the sheer size of their models may soon be a thing of the past. If this new generation of neural networks gains traction, future models will no longer need to rely on massive scale to achieve greater versatility and performance.
Today, transformers dominate the landscape, often supported by external tools such as Retrieval-Augmented Generation (RAG) or LoRAs to enhance their capabilities. But it only takes one breakthrough implementation to set the stage for a seismic shift in the fast-moving AI industry, and once that happens, the rest of the field is sure to follow.
Edited by Andrew Hayward.