Diffusion Transformer – Did you know?

Hunyuan-DiT: Unlocking the Power of Chinese Text-to-Image Generation

May 18, 2024

AI, Generative AI, Image Generation, Open Source

CLIP, Diffusion Transformer, DiT, Github, Hugging Face, Hunyuan-DiT, multi-turn dialogue, Open Source, RoPE, T5, Tencent, VAE

Imagine typing a few lines of text, perhaps a verse from a Tang Dynasty poem or a description of a bustling Hong Kong street market, and watching as a stunningly realistic image materializes on your screen. This is the power of Hunyuan-DiT, a cutting-edge AI model developed by Tencent that excels in generating images from…

VASA-1: Generator of Talking Faces with Audio by Microsoft

Apr 24, 2024

AI, Audio-Driven Animation, Generative AI, Human Image Animation, Video Generation

Diffusion Transformer, Microsoft, Talking Face Generation, VASA-1

Microsoft has introduced a new AI model called VASA-1, capable of generating remarkably realistic talking faces from a single image and an audio clip. This technology has the potential to change how we interact with computers and with each other in the digital world, reaching new levels of realism while running in real time. However, this is just an…

Stable Audio 2.0: Revolutionizing AI-Generated Music with Coherent Structure and Style

Apr 3, 2024

AI, Generative AI, LLM, Song Generation

Audio to Audio, Diffusion Transformer, Latent Diffusion, Music Generation, Song Generation, Stability AI, Stable Audio, Text to Audio

Like Suno, Stable Audio 2.0 marks a significant leap forward in AI-powered music generation. The model goes beyond its predecessor by producing high-fidelity, full-length musical pieces (up to 3 minutes) with a coherent structure, including intro, development, and outro sections. It also introduces audio-to-audio generation, empowering users to…