AI-Generated Images: Experimenting with Flux Schnell on the MacBook Pro M1 Max

AI-generated picture with FLUX.1-schnell. Prompt author: Max Mortillaro.

I recently came across a mention of FLUX.1-schnell on social media. I can't remember the source (apologies), but in any case, according to Hugging Face, FLUX.1-schnell (which I will refer to from now on as Flux Schnell) is "a 12 billion parameter rectified flow transformer capable of generating images from text descriptions".

The solution was created and launched by Black Forest Labs, and has the following key features according to the previous Hugging Face link:

  • Cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives.
  • Trained using latent adversarial diffusion distillation, FLUX.1 [schnell] can generate high-quality images in only 1 to 4 steps.
  • Released under the apache-2.0 licence, the model can be used for personal, scientific, and commercial purposes.

I was curious to try this, because the author of the social media post mentioned that it runs locally, it's open-source, and it's relatively fast. Having no prior experience in text-to-image generation (nor in AI, other than playing extensively with ChatGPT lately), I embarked on a Google search, which led me to this well-written article: https://towards-agi.medium.com/how-to-flux-schnell-locally-on-an-m3-max-macbook-pro-a7b16b6fcd1c

If you're impatient: scroll down to see some outputs.

Although the article mentioned using an M3 Max chip, I decided to try it on my M1 Max because it's all I have to test it with, and because I refuse to accept that my beautiful device is already outdated!

Have a look at the prerequisites though: it takes up a lot of RAM (40 GB according to the article), and may fail if you are a bit too greedy (see below).
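Before downloading anything, it can be worth checking how much memory the machine actually has. The snippet below is a minimal sketch that is not part of the article's script: the helper name total_ram_gib and the 40 GB threshold (taken from the article, not measured by me) are assumptions, and it relies on os.sysconf, which is available on Unix-like systems but may not report these values everywhere.

```python
import os

# Figure quoted from the article; treated here as a rough guide, not a hard limit.
REQUIRED_GIB = 40


def total_ram_gib():
    """Return the physical RAM in GiB, or None if the platform does not expose it."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        # sysconf is missing or does not know these names on this platform.
        return None
    return page_size * page_count / 1024**3


ram = total_ram_gib()
if ram is None:
    print("Could not determine the amount of RAM on this platform.")
elif ram < REQUIRED_GIB:
    print(f"Only {ram:.1f} GiB of RAM found; the article suggests about {REQUIRED_GIB}.")
else:
    print(f"{ram:.1f} GiB of RAM found; that should match the article's guidance.")
```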

Things to Consider

The article is on-point and following the instructions will allow for a seamless experience.

What surprised me on the first run (leaving everything at the defaults in the article's flux_schnell.py file) was how long the initial execution took. I thought this was due to my machine not supporting bfloat16, so I made multiple changes, but it was probably only due to one-time setup, most likely downloading and caching the model weights, since no training happens when you run the script. I don't recall exactly what happens, but the first run was extremely long. Subsequent runs were faster.

Start small first: you may want to downscale the height and width parameters to, for example, 256 pixels, to allow for faster image generation. The article says ~30 seconds on an M3 Max; it was approximately 90 seconds with my M1 Max. At 1920x1080, this goes up to around 2 minutes 30 seconds.
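If you want to measure these timings on your own machine, a small wrapper around the generation call is enough. This is a minimal sketch and not part of the article's script; timed_call is a hypothetical helper name, and the commented usage assumes the pipe and prompt objects from flux_schnell.py.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn with the given arguments and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical usage with the pipeline from the article's script:
# output, seconds = timed_call(
#     pipe,
#     prompt=prompt,
#     guidance_scale=0.,
#     height=256,
#     width=256,
#     num_inference_steps=4,
#     max_sequence_length=256,
# )
# print(f"Generation took {seconds:.1f} s")
```

Running it once at 256x256 and once at the full resolution gives you a like-for-like comparison, as long as you ignore the very first run.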

Batch Generation

The next thing I wanted to experiment with was generating multiple images at once. There is a parameter called num_images_per_prompt that is not included in the script (I've added it below max_sequence_length) and that lets you generate multiple pictures at once. But there are a couple of caveats.

The first one is that, if you keep the default Python script from the article, it will output only a single picture, because it keeps only the first result (.images[0]) and saves it to one file.

Multi-Picture Output Code

I've fixed this by replacing the following (note that num_images_per_prompt is missing in the first block as the parameter wasn't being honored upon execution):

#Generate the image
out = pipe(
  prompt=prompt,
  guidance_scale=0.,
  height=1080,
  width=1920,
  num_inference_steps=4,
  max_sequence_length=256,
).images[0]

#Save the generated image
out.save("flux_image.png") 

with this:

#Generate the image
out = pipe(
  prompt=prompt,
  guidance_scale=0.,
  height=1080,
  width=1920,
  num_inference_steps=4,
  max_sequence_length=256,
  num_images_per_prompt=2,
).images

#Save the generated images
for i, img in enumerate(out):
  # Pass the path directly so that Pillow opens the file in binary mode
  img.save(f"./flux_image{i}.jpg")

This will ensure it creates as many pictures as you have specified in num_images_per_prompt. Of course, this is quick and dirty (I'm not an expert at coding), and re-runs will overwrite the previous pictures if you don't move them somewhere else.
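One way to avoid overwriting earlier pictures is to put a timestamp in each file name. The following is a minimal sketch under that assumption; the helper run_output_paths, the outputs folder, and the naming pattern are all hypothetical choices of mine, not something from the article.

```python
from datetime import datetime
from pathlib import Path


def run_output_paths(count, out_dir="outputs", prefix="flux_image", ext="jpg", now=None):
    """Build one timestamped path per image so that re-runs do not collide."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    folder = Path(out_dir)
    return [folder / f"{prefix}_{stamp}_{i}.{ext}" for i in range(count)]


# Hypothetical usage with the list of images returned by the pipeline:
# paths = run_output_paths(len(out))
# paths[0].parent.mkdir(parents=True, exist_ok=True)
# for img, path in zip(out, paths):
#     img.save(path)
```

Two runs started within the same second would still produce the same names, which is unlikely here given how long a generation takes.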

Adjusting num_images_per_prompt

On my MacBook Pro M1 Max, I noticed that while running batch tests at 256x256 proved no challenge with num_images_per_prompt = 4, increasing the resolution to 1920x1080 caused images 3 and 4 to be unusable, either black or noisy.

Using num_images_per_prompt = 2 turned out to be the most reasonable option.
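If you need more than two pictures at high resolution, one option is to call the pipeline several times with a small batch each time. I have not verified that this avoids the black or noisy images (I don't know their exact cause), so treat the following as a sketch; split_into_batches is a hypothetical helper, and the commented usage assumes the pipe and prompt objects from the article's script.

```python
def split_into_batches(total_images, max_per_call=2):
    """Split a total image count into per-call batch sizes of at most max_per_call."""
    if total_images < 0 or max_per_call < 1:
        raise ValueError("total_images must be >= 0 and max_per_call must be >= 1")
    full_batches, remainder = divmod(total_images, max_per_call)
    return [max_per_call] * full_batches + ([remainder] if remainder else [])


# Hypothetical usage: six pictures generated two at a time.
# images = []
# for batch_size in split_into_batches(6, max_per_call=2):
#     images.extend(
#         pipe(
#             prompt=prompt,
#             guidance_scale=0.,
#             height=1080,
#             width=1920,
#             num_inference_steps=4,
#             max_sequence_length=256,
#             num_images_per_prompt=batch_size,
#         ).images
#     )
```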

What Can You Expect from Flux Schnell

I'm including below some pictures with the prompts I've been using to generate them:

prompt = "A night scene in Menton, Cote d'Azur, showing a midnight blue BMW E34 Touring car parked along a seaside promenade, with a backdrop of the sea. Scene includes palm trees and orange lamp lights."
prompt = "A Lufthansa Airbus A380 parked at Prague Airport Terminal 2, at night, on a slightly foggy winter night."
prompt = "A young, active eastern-european woman at work, holding a coffee mug and wearing glasses. The atmosphere is modern and includes wooden elements, green plants, and a large, sunny window."
prompt = "A casually-dressed man in his forties on the show floor, talking at a corporate keynote event. The scene has large screens showing slides about corporate sustainability."
prompt = "A Turkish Airlines A340 plane over the San Francisco Golden Gate Bridge, with a slightly cloudy weather."
prompt = "A group of about twelve women, dressed in long spring dresses, forming a circle around a bonfire and celebrating a spring ritual. The scene is set on a meadow, at dusk, and the moon is full."
prompt = "A cyberpunk-style scene set at night, in the future, showing a heavily tuned Alfa Romeo 75 car. The scene includes neon lights, stores of various kinds, large-sided ads, and inscriptions that are a blend between chinese and latin alphabets."

Flux Schnell vs. Midjourney

Throughout the entire exercise, I was sharing the outputs and prompts with my older son Alex, a seasoned graphic designer who works a lot with Midjourney. He also creates music and has a Spotify artist account, so check him out!

In multiple instances, Flux Schnell outpaced Midjourney, although I do not have all the pictures that Alex generated on his side. We used the same prompts, so you can compare for yourself, at least for the women around the bonfire (except Alex introduced a variation - white dresses and pointy caps - of which he informed me later), and the Cyberpunk Alfa Romeo.

prompt = "A group of about twelve women, in white dress with pointy cap, forming a circle around a bonfire and celebrating a spring ritual. The scene is set on a meadow, at dusk, and the moon is full."

The result for this prompt proved to be underwhelming to say the least. Alex also generated a black and white version which is rather dreadful:

prompt = "A cyberpunk-style scene set at night, in the future, showing a heavily tuned Alfa Romeo 75 car. The scene includes neon lights, stores of various kinds, large-sided ads, and inscriptions that are a blend between chinese and latin alphabets."

There is a stark contrast between the first and second prompts. The car pictures are pretty good, with #2 and #4 being my favorites. But the mood is definitely different.

Closing Comments

I will let you judge the pictures generated by Flux Schnell, but in my opinion, this is pretty impressive for something that runs locally on a MacBook Pro M1 Max.

I generally found the output to match the prompts, although you may want to be very specific in some cases:

  • Expect that prompts requiring text generation may not work properly. The A380 with Lufthansa markings is a notable exception.
  • The one with women around the bonfire required a lot of tweaking to ensure the output would be good enough. The picture you see is the first run (and turned out to be the best run). This is by the way related to my wife's coaching and spiritual activities business. I was exploring whether the tool is capable of generating realistic-enough thumbnails, taking into account the specificities of what she does.
  • You may have to be specific to get what you desire, and you may have to tweak your prompt to get the best output. Depending on the size of your picture, each attempt may take some time, so you had better be specific upfront.

I hope you enjoyed reading this and wish you a lot of fun!