When Flux entered the scene a few days ago, it quickly gained a reputation as the crown jewel of open source image generators. It matches Midjourney's aesthetic prowess while far outclassing it at prompt comprehension and text rendering. The catch? You'll need a beefy GPU with 24GB of VRAM or more to make it work. That's more horsepower than most gaming rigs, let alone your average work laptop.
But the AI community, never one to back down from a challenge, rolled up its collective sleeves and got to work. Through the magic of quantization, which compresses the model's weights into fewer bits, Flux can be shrunk to a manageable size without sacrificing much of its artistic mojo.
Let's break it down: the original Flux model used full 32-bit precision (FP32), which is like driving a Formula 1 car to the grocery store: overkill for most people. The first round of updates brought us FP16 and FP8 versions, each of which trades a little precision for a huge gain in efficiency. The FP8 version was already a game changer, allowing people with a 6GB GPU (think RTX 2060) to join the party.
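To see why precision matters so much, here's a back-of-the-envelope sketch of the VRAM needed just to store the weights of a 12-billion-parameter model (Flux's reported size) at each precision. Text encoders, activations, and the VAE all add more on top, so treat these as floor values, not exact requirements:

```python
# Rough weight-storage footprint of a 12B-parameter model (Flux's
# reported size) at each precision level. Real VRAM use is higher:
# activations, text encoders and the VAE are not counted here.
PARAMS = 12e9

BYTES_PER_PARAM = {
    "FP32": 4.0,   # full precision
    "FP16": 2.0,   # half precision
    "FP8":  1.0,   # 8-bit float
    "NF4":  0.5,   # 4-bit NormalFloat (ignoring small per-block scale overhead)
}

def weight_gb(precision: str) -> float:
    """Gigabytes needed just to hold the weights at this precision."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9

for p in BYTES_PER_PARAM:
    print(f"{p}: {weight_gb(p):.1f} GB")
```

The jump from FP32's 48 GB down to roughly 6 GB at 4 bits is what turns Flux from datacenter-only into something a mid-range gaming card can hold.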
After disabling shared memory failover for ComfyUI, Flux Schnell (FP8) runs flawlessly on a 6GB RTX 2060. A 512×768 image takes 107.47 seconds at 4 steps (16.86 s/it), no OOM. 1024×1024 takes longer. I'd recommend a higher resolution or higher level… pic.twitter.com/LKe1rWzyQV
— jaldps (@jaldpsd) August 5, 2024
To replicate this, you need to disable Nvidia's System Memory Fallback for Stable Diffusion. That setting lets the GPU offload work from its VRAM into system RAM, which avoids the infamous OOM (out of memory) error but makes generation significantly slower; turning it off keeps everything in VRAM. To change the option, follow this Nvidia tutorial.
But hold on to your hat, because it gets even better.
The true MVPs of the AI world have pushed the envelope even further, releasing 4-bit models. These bad boys use NormalFloat (NF4) quantization, which hits a sweet spot of quality and speed that makes your potato PC feel like it's gotten a turbo boost. Because NF4 degrades quality far less than a naive FP cut would, in practice you get near-FP8 results at higher speed while needing fewer resources.
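In rough strokes, 4-bit block-wise quantization works like this: weights are grouped into small blocks, each block stores one scale factor, and every weight snaps to one of 16 levels. Here's a minimal numpy sketch using a uniform integer grid; real NF4 instead places its 16 levels at quantiles of the normal distribution (which weights roughly follow), and that is where its quality advantage comes from:

```python
import numpy as np

def quantize_4bit(w: np.ndarray, block: int = 64):
    """Block-wise 4-bit absmax quantization (simplified sketch; real NF4
    uses normal-distribution quantile levels, not this uniform grid)."""
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True)   # one scale per block
    scale[scale == 0] = 1.0                        # avoid divide-by-zero
    # snap each weight to one of 16 signed levels (4 bits)
    q = np.clip(np.round(w / scale * 7), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights from 4-bit codes and block scales."""
    return q.astype(np.float32) / 7 * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 64)).astype(np.float32)    # toy weight matrix
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale).reshape(w.shape)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Each weight now costs 4 bits plus a tiny share of its block's scale factor, which is the eightfold shrink versus FP32 that the 4-bit Flux builds exploit.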
So, how do you run this streamlined version of Flux? First, you'll need an interface like SwarmUI, ComfyUI, or Forge. We like ComfyUI for its flexibility, but in our tests Forge provided a 10-20% speed increase over the others, so that's where we're rolling.
Head over to the Forge GitHub repository and download the one-click installer package. It's open source and community-verified, so there's no sketchy business here.
For the NF4 Flux models themselves, Civitai is your one-stop shop. You have two flavors to choose from: Schnell (for speed) and Dev (for quality). Both can be downloaded from this page.
Once you've downloaded everything, it's time to install:
Unpack the Forge archive and open the Forge folder. Run update.bat to pull in all dependencies, then run run.bat to complete the setup.
Now, drop those shiny new Flux models into the \webui\models\stable-diffusion folder in your Forge installation. Refresh the Forge web interface (or restart it if you're feeling old school) and boom – you're in business.
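If you'd rather script that last step, here's a small helper that copies a downloaded checkpoint into the folder named above. The folder layout is taken from this article, and the example paths are purely illustrative:

```python
import shutil
from pathlib import Path

def install_flux_model(checkpoint: str, forge_root: str) -> Path:
    """Copy a downloaded Flux checkpoint into Forge's model folder
    (folder layout as described in this article; adjust if yours differs)."""
    dest_dir = Path(forge_root) / "webui" / "models" / "stable-diffusion"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(checkpoint).name
    shutil.copy2(checkpoint, dest)   # copy2 preserves file timestamps
    return dest

# Usage (hypothetical paths):
# install_flux_model("Downloads/flux1-schnell-nf4.safetensors", "C:/forge")
```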
Tip: To squeeze every last drop of performance out of your resurrected device, dial back the resolution. Instead of going for full SDXL (1024×1024) resolutions, try the more modest SD1.5 sizes (768×768, 512×768 and similar). You can always zoom in later and use Adetailer for those crisp details.
Let's talk numbers: on a humble RTX 2060 with 6GB of VRAM, Flux Schnell can render a 512×768 image in NF4 mode in 30 seconds, compared to the 107 seconds the FP8 version needs. Want to go big? Pushing that bad boy to 1536×1024 takes about five minutes.
Want to go big without breaking your GPU? A better option is to generate with Flux Schnell at SD1.5 resolutions, then send that creation through img2img, upscaling with a standard Stable Diffusion model (SD1.5 or SDXL) at low denoising strength. The entire process takes around 50 seconds and holds its own against Midjourney's output. You'll get impressive, large-scale results without melting your graphics card.
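That generate-small-then-upscale workflow can also be scripted. Forge, like Automatic1111's webui, exposes an HTTP API when launched with the --api flag; the sketch below assumes those endpoints, the default port, and made-up example prompts, so treat it as a starting point rather than gospel:

```python
# Sketch of the generate-small-then-upscale workflow over the HTTP API
# Forge exposes with the --api flag (A1111-style endpoints; endpoint
# names and default port are assumptions, verify against your install).
import json
from urllib import request

BASE = "http://127.0.0.1:7860"

def txt2img_payload(prompt: str, width=512, height=768, steps=4) -> dict:
    """4 steps suits Flux Schnell; Flux Dev wants considerably more."""
    return {"prompt": prompt, "width": width, "height": height,
            "steps": steps}

def img2img_payload(prompt: str, image_b64: str, width=1024, height=1536,
                    denoising_strength=0.3) -> dict:
    """Low denoising strength keeps the upscale close to the original."""
    return {"prompt": prompt, "init_images": [image_b64], "width": width,
            "height": height, "denoising_strength": denoising_strength}

def call(endpoint: str, payload: dict) -> dict:
    """POST a JSON payload to the local Forge API and return the response."""
    req = request.Request(BASE + endpoint,
                          data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)

# Usage (with Forge running locally and --api enabled):
#   small = call("/sdapi/v1/txt2img", txt2img_payload("a red fox, studio light"))
#   big = call("/sdapi/v1/img2img",
#              img2img_payload("a red fox, studio light", small["images"][0]))
```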
The real kicker? Some crazy kids reportedly got Flux Schnell NF4 running on a GTX 1060 with 3GB of VRAM, with Flux Dev taking 7.90 seconds per iteration. We're talking about a GPU that's practically on life support, and here it is generating high-quality AI art. Not too shabby for retirement-eligible hardware.