The Latest News In AI

Lately there’s been so much news in AI that I have a hard time keeping up, but I’ll try to keep it short-ish.

Red Panda

Not long ago there was a lot of hype around Red Panda, with claims that it is revolutionary and the future of generative AI. Tests had reportedly shown that Red Panda was superior to other models.

I tried it a bit back then, and wasn’t that impressed by it. However, one of the key features of Red Panda was said to be its ability to incorporate text in the images, and not just a word or two but longer texts. That isn’t exactly what I have been wishing for, so I put Red Panda out of my mind for a while.

But when I was going to write this, I needed to do some more testing, and here comes a warning. There are a LOT of fake websites claiming to offer image generation with Red Panda. Don’t do as I did, and pay $10 for a fake service. After some research I believe that the only website that offers image generation with the actual Red Panda model is Recraft.ai. They will also provide you with free credits to let you try it out. Here are some comparisons.

Based on the text criterion, the REAL Red Panda is obviously superior, but when it comes to the details and quality of the image I don’t think Red Panda is better than Flux Ultra, for example.

Read more about video generation in the guide: Creating AI Videos step-by-step

Shuttle 3.1 Aesthetic

Shuttle AI released their latest model, Shuttle 3.1 Aesthetic, which is a text-to-image AI model designed to create detailed and aesthetic images from textual prompts in just 4 to 6 steps. It offers enhanced performance in image quality, typography, understanding complex prompts, and resource efficiency.

I personally like this model a lot, not only because it has a built-in 6-step mode but also because the enhanced details are really noticeable. And the great thing for those of us who have been running Flux locally is that you can use your Flux workflows for this model and only change the model itself. No need for extra clip models, special samplers or other things that many other models require.

The model is available in fp16 at 24GB (download link: Shuttle fp16) and in fp8 at 12GB (download link: Shuttle fp8).

I’m not sure what the minimum spec requirements are, but I can run the fp16 model with a triple clip setup, 42.75GB in total, on my computer with 12GB VRAM and 32GB system RAM.
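As a rough sanity check on those file sizes, here is a minimal sketch of the arithmetic. The parameter count of roughly 12 billion is an assumption inferred from the 24GB fp16 file (2 bytes per weight), not a figure published in this post, and the helper name `checkpoint_size_gb` is made up for illustration.

```python
def checkpoint_size_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: about 12 billion parameters, inferred from the 24GB fp16 file.
ASSUMED_PARAMS = 12e9

fp16_gb = checkpoint_size_gb(ASSUMED_PARAMS, 2)  # fp16 stores 2 bytes per weight
fp8_gb = checkpoint_size_gb(ASSUMED_PARAMS, 1)   # fp8 stores 1 byte per weight

print(f"fp16: {fp16_gb:.0f} GB, fp8: {fp8_gb:.0f} GB")
```

Halving the bytes per weight halves the file, which matches the 24GB and 12GB downloads. Both still exceed or match 12GB of VRAM once anything else is loaded, so running them on a card like mine presumably relies on offloading part of the weights to system RAM.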

Prompt

In a secluded glade embraced by the lush, vibrant greenery of a tropical jungle, a mesmerizing waterfall cascades gracefully down moss-covered rocks, creating a symphony of soothing sounds that harmonize with the gentle rustle of leaves in the cool night breeze. At midnight, the sky above is a vast expanse of deep indigo, dotted with countless shimmering stars that twinkle like diamonds against the darkness. Their soft, silvery light dances upon the crystal-clear water as it tumbles and splashes into a serene pool below, creating a mirror-like surface that reflects the celestial beauty above. Lush ferns and vibrant orchids frame the scene, their petals adorned with tiny droplets of water that sparkle like jewels in the moonlight. The air is still, infused with the sweet, intoxicating scent of blooming night flowers, adding an enchanting aroma to the magical atmosphere. As the water flows, it produces a melodious sound that resonates with the tranquility of the night, inviting anyone nearby to bask in the harmonious fusion of nature's splendor.

Seed: 430095508386181

Also check out Flux AI compared to PixArt and Stable Diffusion

Google Gemini 2.0 Flash

A lot of you probably remember when Gemini 1.0 was released, and it looked like this.

Gemini 1.0 was so hardcoded for “diversity” that it was almost impossible to get it to create a white man.

And Google was being ridiculed by people everywhere.

Now Google has released Gemini 2.0 Flash, and I just learned about this today, so I haven’t had much time to test it. As far as I can tell, it’s a text-based AI model and doesn’t generate images at all (I might be wrong, since I haven’t had much time to research). I asked it to describe the waterfall image created with Shuttle, and to guess which AI model was used to create it.

The answer:

Certainly! Let’s break down this enchanting image.

Overall Impression:

The image presents a serene and magical tropical scene at night, illuminated by a large, full moon. It evokes a sense of tranquility and wonder, blending lush greenery with the soft glow of moonlight. The cascading waterfall adds movement and a calming sound to the composition.

Key Elements:

Moon: A large, prominent full moon dominates the upper-left quadrant, casting a soft, cool glow over the scene. The moon is highly detailed and appears realistic, with subtle lunar surface features.

Sky: The night sky is a deep blue, dotted with numerous twinkling stars. The sky transitions from a deeper tone near the horizon to a lighter, more ethereal shade around the moon. There are some subtle wisps of clouds adding texture.

Waterfalls: A series of small waterfalls cascades down mossy rocks, forming a gentle descent. The water is depicted with realistic motion blur, giving a sense of movement and flow.

Pool: At the base of the waterfalls, a pool of still, reflective water collects. The water reflects the moonlight and sky, adding depth to the scene.

Rocks: Moss-covered rocks are scattered throughout the scene, both in and around the waterfall and pool. They provide texture and depth to the composition.

Vegetation: Lush tropical vegetation surrounds the scene. This includes palm trees, ferns, various leafy plants, and distinctive purple flowers that stand out prominently. The vegetation is dense and detailed, with varying shades of green.

Lighting: The scene is primarily illuminated by the moonlight, which casts soft shadows and highlights on the vegetation and rocks. The light is diffused and creates a peaceful, dreamlike atmosphere. The moon’s reflection on the water also adds to the overall luminosity.

Color Palette: The color palette consists mainly of cool hues—blues and greens—accented by the purple of the flowers and the white of the moon. There are subtle hints of warmer tones in the moss and vegetation.

Style and Artistic Feel:

The image has a photorealistic quality, yet it also possesses a painterly touch. This creates a somewhat hyperreal aesthetic, characteristic of many modern AI-generated artworks. It’s a fantastical scene that leans into an idealistic vision of nature.

Possible Generative AI:

Based on the style, detail, and overall aesthetic, the image was most likely generated using one of the advanced text-to-image AI models, such as:

Midjourney: This is a strong candidate due to the blend of realism and artistic interpretation, and its consistent quality when generating fantasy scenes. The soft, luminous light and painterly style are frequently seen in Midjourney outputs.

DALL-E 2 (or DALL-E 3): The high level of detail and photorealistic approach is characteristic of DALL-E models. It is also very capable of creating realistic-looking images, and is known for its ability to generate fantastical images with a realistic style.

Stable Diffusion: This model is also capable of generating detailed scenes with a lot of control over the style. While the output from Stable Diffusion can be stylistically diverse, its potential for photorealism and painterly effects also makes it a viable candidate.

Reasoning for the Choice:

While it’s impossible to definitively name the AI without explicit information, the blend of hyperrealism, painterly style, and the fantastical elements, particularly the quality of lighting and natural textures, are strong indicators that one of these top generative AIs was likely used. Midjourney’s output has become very recognizable, and I lean towards that as the most likely.

If you have any other questions or want to explore different aspects of the image, feel free to ask!

Now here is the cool thing: you can use Google Gemini 2.0 Flash on your own computer.

So I gave Gemini the image I created earlier with Shuttle 3.1 and asked it to analyze the image and give me a prompt that would generate a similar image. The prompt I got was the following:

Generate a high-resolution, photorealistic image of a tropical waterfall at night under a full moon. The scene should feature a multi-tiered waterfall cascading over moss-covered rocks into a clear pool. The sky should be a dark blue with many visible stars and a large, bright full moon. The surrounding vegetation should be lush and dense, including palm trees, ferns, and other tropical plants. Add pinkish-purple flowers scattered throughout the scene, particularly in the foreground. The lighting should be soft and diffused, with the moon as the primary light source. The overall mood should be serene, magical, and tranquil. The perspective should be eye-level, looking up the waterfall. Use a painterly style with a focus on detail and realism.

The images are pretty damn near identical, and that’s impressive, I think.

LTX Video

Lightricks released their video generation model LTX, which can be used to create videos locally on your computer.

According to Lightricks, LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768×512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content. They provide a model for both text-to-video and image+text-to-video use cases.
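To make the "faster than they can be watched" claim concrete, here is a minimal sketch of what it means in numbers. The function names and the example generation times are hypothetical and only illustrate the comparison; they are not measurements of LTX-Video.

```python
def clip_duration_s(num_frames: int, fps: int = 24) -> float:
    """Playback length in seconds of a clip with num_frames frames."""
    return num_frames / fps


def is_faster_than_realtime(num_frames: int, generation_time_s: float, fps: int = 24) -> bool:
    """True if generating the clip took less time than watching it would."""
    return generation_time_s < clip_duration_s(num_frames, fps)


# A 120-frame clip at 24 FPS plays for 5 seconds.
frames = 120
print(clip_duration_s(frames))

# Hypothetical timings: 4 s would beat real-time, 6 s would not.
print(is_faster_than_realtime(frames, generation_time_s=4.0))
print(is_faster_than_realtime(frames, generation_time_s=6.0))
```

Whether you actually hit real-time at home will depend on your GPU, so time a generation yourself and plug in the number.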

You can try it on their Hugging Face space: Video Generation with LTX Video

Check out Free animations in a few steps
