Apple shakes up open source AI with its MGIE image editor
10 months ago Benito Santiago
After seemingly staying on the sidelines for most of the past year, Apple is starting to shake things up in the field of artificial intelligence, particularly open source AI.
The Cupertino-based tech giant has partnered with the University of California, Santa Barbara to develop an AI model that edits images from natural-language instructions, much as ChatGPT responds to text prompts. Apple calls it Multimodal Large Language Model-Guided Image Editing (MGIE).
MGIE interprets user-supplied text instructions, processing and refining them into precise image-editing commands. A diffusion model then applies those edits while preserving the characteristics of the original image.
Multimodal large language models (MLLMs), capable of processing both text and images, form the basis of the MGIE methodology. Unlike traditional single-mode models that handle only text or only images, MLLMs can follow complex instructions and work in a variety of situations. For example, a model can read a written instruction, examine the elements of a particular photo, then remove an object and generate a new image without that element.
To perform these actions, an AI system must combine several capabilities in a single pipeline: text generation, image generation, segmentation, and CLIP-based analysis.
With MGIE, Apple achieves capabilities similar to OpenAI's ChatGPT Plus, which lets users converse with AI models to create customized images from text input. Users can give detailed instructions in natural language, such as "remove the traffic cone in front," which MGIE translates into image-editing commands and executes.
In other words, users can start with a photo of a person and give them red hair just by saying, "Make this person a redhead." Under the hood, the model interprets the instruction, segments the person's hair, generates prompts such as "red hair, very detailed, photo, ginger tone," and performs the color transformation.
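The two-stage flow described above, refining a terse instruction into an expressive prompt and then applying the edit, can be sketched as a toy pipeline. All function names and the lookup table here are illustrative stand-ins, not Apple's actual API: in the real system, an MLLM derives the expressive prompt and a diffusion model renders the edit.

```python
# Toy sketch of an MGIE-style two-stage editing pipeline.
# Stage 1 is played by a lookup table; the real system uses an MLLM.
# Stage 2 records the edit on a dict; the real system runs a diffusion model.

def derive_expressive_instruction(instruction: str) -> str:
    """Stage 1: rewrite a terse user instruction into a detailed
    prompt suitable for conditioning an image-editing model."""
    expansions = {
        "make this person a redhead":
            "red hair, very detailed, photo, ginger tone",
        "remove the traffic cone in front":
            "street scene without a traffic cone, photorealistic",
    }
    key = instruction.lower().strip().rstrip(".")
    return expansions.get(key, instruction)

def apply_edit(image: dict, prompt: str) -> dict:
    """Stage 2: apply the expressive prompt to the image. Here the
    'image' is just metadata, and applying an edit appends the prompt."""
    edited = dict(image)
    edited["edits"] = list(image.get("edits", [])) + [prompt]
    return edited

photo = {"subject": "person", "edits": []}
prompt = derive_expressive_instruction("Make this person a redhead.")
result = apply_edit(photo, prompt)
print(prompt)           # red hair, very detailed, photo, ginger tone
print(result["edits"])  # ['red hair, very detailed, photo, ginger tone']
```

The design point the sketch captures is the separation of concerns: the language model owns instruction understanding, while the image model only ever sees a fully specified prompt plus the original image.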
Apple's approach is compatible with tools like Stable Diffusion, which can be extended with a simple interface for text-driven image editing. Using third-party tools such as Pix2Pix, users can issue natural-language commands through a Stable Diffusion interface and see the effects on edited images in real time.
Apple reports, however, that its approach is more accurate than comparable methods.
In addition to generative edits, Apple's MGIE can perform common image-editing tasks such as color grading, resizing, rotating, style changes, and drawing.
Why does Apple make it open source?
Apple's open source initiatives are a clear strategic move beyond licensing requirements.
Apple built MGIE on open source models such as LLaVA and Vicuna. Because the licenses of these models restrict commercial use by large corporate entities, Apple was likely obliged to release its work openly on GitHub.
But this also lets Apple tap developer communities around the world to improve the model's strength and flexibility. That kind of collaboration moves faster than Apple working entirely on its own from scratch. The openness also attracts a broader set of ideas and technical skills, allowing MGIE to develop rapidly.
Apple's involvement in the open source community with projects like MGIE gives the brand a boost among developers and technology enthusiasts. This aspect is no secret, with both Meta and Microsoft investing heavily in open source AI.
Releasing MGIE as open source software could also give Apple a head start in shaping the still-evolving industry standards for AI-based image editing. With MGIE, Apple may have given AI artists and developers a solid foundation on which to build the next big thing, with greater accuracy and efficiency than alternatives.
MGIE could also make Apple's own products better: it would not be difficult to take a voice command sent to Siri and use the resulting text to edit a photo on a user's smartphone, computer, or headset.
Tech-savvy AI developers can try MGIE now via the project's GitHub repository.
Edited by Ryan Ozawa.