April 22nd, 2024

Stable Diffusion 3, VASA-1, Meta AI, Juggernaut XL v10, first ban from AI tools

Suuup - this is the SinkIn Newsletter, a 5-minute read made at sinkin.ai to cover the most interesting stuff in the Image AI world. We scroll, so you don’t have to.

Stable Diffusion 3 API Launch

Stability AI just made Stable Diffusion 3 available through its API, and the model weights are expected to follow soon. Stability AI claims SD3 equals or outperforms DALL-E 3 and Midjourney v6 in typography and prompt adherence. The Stable Diffusion community has been testing it heavily, and the general sentiment is positive, especially around prompt alignment, image detail and text rendering. People are excited about the finetunes that will be built on this foundation.

Prompt: awesome artwork with a wizard on the top of a mountain, he’s creating the big text “Stable Diffusion 3 API” with magic, magic text, at dawn, sunrise

Microsoft Introduces VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Microsoft just introduced VASA-1, a framework that generates realistic, audio-driven talking faces from a single image and a speech clip. It produces lip movements closely synced to the audio and captures a wide range of facial expressions and head motions, making virtual characters look more realistic and lively. VASA-1 uses a novel model that generates facial dynamics and head movements in a face latent space, and Microsoft reports that it significantly outperforms existing methods. It can generate high-quality 512x512 video in real time at up to 40 FPS with minimal latency, which opens the door to live interactions with lifelike avatars that mimic human conversational behavior.

single portrait photo + speech audio = hyper-realistic talking face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements, generated in real time

Meta Launches Meta AI with Image and Animation Generation

Meta launched Meta AI, a ChatGPT-like assistant powered by its open-source LLM, Llama 3. The assistant includes an "Imagine" feature that creates images from text. It can also animate a generated image with one click, producing a 1-second video at 512x512 resolution. The animation is quite smooth and reminds us of Emu Video, which Meta announced last year.

Juggernaut XL v10 Update

The popular SDXL finetune Juggernaut XL just dropped version 10. This update brings better prompt adherence, higher-quality training images, and improved recognition of shot types such as full-body shots, midshots and portraits. Both SFW and NSFW versions are available. Because its training data was captioned with a GPT-4 Vision captioning tool, this version works well with simple, straightforward prompts. It can also render text in images, but only reliably for short words.

Sex offender banned from using AI tools in landmark UK case

Anthony Dover, a sex offender convicted of creating more than 1,000 indecent images of children, has been banned by a UK court from using any AI creation tools. The landmark ruling aims to prevent AI from being misused to generate sexual abuse imagery, and it marks a significant step in tackling AI's role in producing such illegal content. The case reflects a growing legal response to AI-generated deepfake images and to ongoing efforts to guard against their misuse.

Meme of the Day

That’s it for today, hope they tasted like a good latte!