Stable Cascade: The Successor of Stable Diffusion

GitHub hosts millions of projects from developers all over the world, and surprising new ones appear among them every day. The latest one I came across is Stable Cascade, an open-source model for generating images from text.

Image generation from text is a challenging and fascinating task that has many applications in art, entertainment, education, and more. However, most existing models for this task are either slow, expensive, or limited in their capabilities. Stable Cascade aims to overcome these limitations and achieve impressive results in both quality and efficiency.

Stable Cascade is a model developed by Stability AI, a research lab focused on building stable and scalable AI systems. It is built on the Würstchen architecture, a diffusion-based generative model. The main difference between Stable Cascade and other models, such as Stable Diffusion (also from Stability AI), is that Stable Cascade works in a much smaller latent space, resulting in faster inference and cheaper training.

Why is this important? The smaller the latent space, the less computation and memory are required to generate images! Stable Cascade achieves a spatial compression factor of 42, meaning it can encode a 1024×1024 image down to a 24×24 latent while maintaining crisp reconstructions. The text-conditional model is then trained in this highly compressed latent space, which allows it to learn complex and diverse mappings from text to images cheaply.
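To make those numbers concrete, here is a rough back-of-the-envelope sketch (my own illustration, not code from the project), reading the compression factor as a per-side spatial downsampling ratio:

```python
# Rough arithmetic behind the claimed compression factor of 42:
# a 1024x1024 image maps to a 24x24 latent grid.

def latent_side(image_side: int, factor: int = 42) -> int:
    """Side length of the latent grid for a given image side length."""
    return image_side // factor

print(latent_side(1024))      # 1024 // 42 = 24

# For comparison, Stable Diffusion's VAE downsamples by a factor of 8,
# so the same 1024x1024 image would become a much larger 128x128 latent:
print(latent_side(1024, 8))   # 128
```

A 24×24 grid has roughly 28 times fewer spatial positions than a 128×128 one, which is where the savings in compute and memory come from.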

Without going into too much detail, Stable Cascade consists of three models: Stage A, Stage B, and Stage C, forming a cascade for generating images, hence the name “Stable Cascade”. Stage C is the text-conditional diffusion model: it generates a small latent conditioned on the text prompt. Stage B then translates that compressed latent into the latent space of Stage A, and Stage A, an autoencoder, decodes the result into the final high-resolution image. The whole process can be seen in the figure below.
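The inference order can be sketched as a simple chain of functions. This is a hypothetical illustration of the data flow only (the stage functions, shapes, and the assumed 4× upsampling of Stage A are my own stand-ins, not the real API):

```python
# Hypothetical sketch of Stable Cascade's inference order: C -> B -> A.
# Latents and images are represented as (name, height, width) tuples.

def stage_c(prompt: str) -> tuple:
    """Text-conditional diffusion in the highly compressed 24x24 latent space."""
    return ("latent_c", 24, 24)

def stage_b(latent_c: tuple) -> tuple:
    """Expands the tiny Stage C latent into Stage A's larger latent space
    (assumed here to be a 256x256 grid)."""
    _, h, w = latent_c
    scale = 256 // 24  # assumed upsampling ratio between the two latent spaces
    return ("latent_a", h * scale, w * scale)

def stage_a(latent_a: tuple) -> tuple:
    """Autoencoder decoder back to pixel space (assumed 4x spatial upsampling)."""
    _, h, w = latent_a
    return ("image", h * 4, w * 4)

image = stage_a(stage_b(stage_c("a photo of a cat")))
print(image)
```

The key point is the direction of the cascade: the text prompt only enters at Stage C, in the smallest latent space, and the two later stages just decompress.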

Stable Cascade achieves impressive results, both in visual quality and in quantitative evaluations, and it is fast, even though its largest model contains 1.4 billion more parameters than Stable Diffusion XL.

It is also flexible and versatile, supporting various extensions and features such as finetuning, LoRA, ControlNet, IP-Adapter, and LCM. These extensions let users control and manipulate the generated images in different ways, for example changing the style, color, pose, or lighting.

If you are interested in trying out Stable Cascade, check out the official codebase or the notebooks at https://github.com/Stability-AI/StableCascade/tree/master/inference, which provide more information and ready-to-use code.

Subscribe for the latest breakthroughs and innovations shaping the world!
