May 6th, 2024

China’s Sora competitor, Midjourney CEO’s Prediction, InstantFamily, StoryDiffusion, Stylus

For those of you who are new, this is the SinkIn Newsletter, a 5-minute read made at sinkin.ai to cover the most interesting stuff in the Image AI world.

We scroll, so you don’t have to.

Chinese tech firm ShengShu-AI and Tsinghua University on Saturday unveiled the text-to-video AI model Vidu, said to be the first in China that's on par with Sora. Launched at the ongoing Zhongguancun Forum in Beijing, Vidu can generate a 16-second 1080p video clip with one click. It is built on a self-developed architecture called Universal Vision Transformer (U-ViT), which allows it to simulate the real physical world with multi-camera view generation.

Showcase Video of Vidu

Are you buckled up for it?
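Vidu's exact architecture isn't public, but the U-ViT idea mentioned above (which traces back to Tsinghua's earlier diffusion-transformer research) treats everything, the diffusion timestep, the text condition, and the image patches, as plain tokens in one transformer, with U-Net-style long skip connections between shallow and deep layers. Here's a loose, assumption-heavy PyTorch sketch of that general design; all names and sizes are invented for illustration.

```python
# Illustrative U-ViT-style backbone: all inputs become tokens, and long
# skip connections link shallow and deep transformer blocks, U-Net style.
# Vidu's real model is not public; everything here is a simplified guess.
import torch
import torch.nn as nn

class UViTSketch(nn.Module):
    def __init__(self, dim=512, depth=12, heads=8):
        super().__init__()
        def block():
            return nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        half = depth // 2
        self.in_blocks = nn.ModuleList(block() for _ in range(half))
        self.mid_block = block()
        self.out_blocks = nn.ModuleList(block() for _ in range(half))
        # One linear per long skip, to fuse shallow features into deep layers.
        self.skip_fuse = nn.ModuleList(nn.Linear(2 * dim, dim) for _ in range(half))

    def forward(self, time_tok, text_tok, patch_tok):
        # Concatenate all modalities into one token sequence: [time | text | patches].
        x = torch.cat([time_tok, text_tok, patch_tok], dim=1)  # (B, 1+T+N, dim)
        skips = []
        for blk in self.in_blocks:
            x = blk(x)
            skips.append(x)
        x = self.mid_block(x)
        for blk, fuse in zip(self.out_blocks, self.skip_fuse):
            x = fuse(torch.cat([x, skips.pop()], dim=-1))  # long skip connection
            x = blk(x)
        return x  # per-token output; a real model would unpatchify to predict noise
```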

InstantFamily is an approach to multi-ID image generation introduced by researchers from SK Telecom. It leverages a masked cross-attention mechanism and a multimodal embedding stack to preserve and precisely control multiple identities within a single image. In experiments, InstantFamily shows strong identity preservation, achieving state-of-the-art results in both single-ID and multi-ID scenarios.

A photo of seven men on Mars
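The core trick is the masked cross-attention: each identity embedding is only allowed to influence the image region where that person should appear. Below is a rough single-head sketch of that mechanism; the tensor names, shapes, and mask handling are our assumptions, not the authors' code.

```python
# Sketch of masked cross-attention for multi-ID generation: image positions
# query identity embeddings, but identity k is visible only inside its mask.
import torch
import torch.nn.functional as F

def masked_cross_attention(img_tokens, id_embeds, id_masks):
    # img_tokens: (B, N, C) latent image tokens (N = H*W spatial positions)
    # id_embeds:  (B, K, C) one face/identity embedding per person
    # id_masks:   (B, K, N) 1 where identity k may appear, 0 elsewhere
    scale = img_tokens.size(-1) ** 0.5
    scores = img_tokens @ id_embeds.transpose(-2, -1) / scale      # (B, N, K)
    # Positions outside identity k's mask must not attend to identity k.
    scores = scores.masked_fill(id_masks.transpose(-2, -1) == 0, float("-inf"))
    attn = torch.nan_to_num(F.softmax(scores, dim=-1))  # all-masked rows -> 0
    return attn @ id_embeds                              # (B, N, C)
```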

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

StoryDiffusion is a new framework that improves the consistency of content across a series of images generated by diffusion models. It introduces Consistent Self-Attention, a method that improves the uniformity of generated images and integrates seamlessly with existing pretrained text-to-image models. It also introduces the Semantic Motion Predictor, which creates smooth video transitions by predicting motion in semantic space between images. This enables the generation of stable long-range videos, a notable improvement over methods that predict motion only in latent space.
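In spirit, Consistent Self-Attention is training-free: while generating a batch of story frames, each frame's self-attention also attends to tokens sampled from the other frames, which nudges characters toward one shared appearance. Here's a simplified single-head sketch under those assumptions; the real method hooks into the attention layers of a pretrained text-to-image UNet rather than running standalone.

```python
# Simplified Consistent Self-Attention: each frame attends to its own
# tokens plus a random sample of tokens from the other frames in the batch.
import torch
import torch.nn.functional as F

def consistent_self_attention(x, sample_ratio=0.5):
    # x: (B, N, C) tokens for B frames that should share characters; B >= 2
    B, N, C = x.shape
    assert B >= 2, "needs multiple frames to share tokens across"
    n_ref = int(N * sample_ratio)
    outs = []
    for i in range(B):
        # Pool tokens from every *other* frame and sample a reference subset.
        others = torch.cat([x[j] for j in range(B) if j != i], dim=0)
        idx = torch.randperm(others.size(0))[:n_ref]
        kv = torch.cat([x[i], others[idx]], dim=0)   # own tokens + shared refs
        scores = x[i] @ kv.T / C ** 0.5              # (N, N + n_ref)
        outs.append(F.softmax(scores, dim=-1) @ kv)
    return torch.stack(outs)                          # (B, N, C), cross-frame consistent
```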

Stylus is a method designed to enhance the generation of high-fidelity, custom images by efficiently selecting and automatically composing task-specific adapters (aka LoRAs). Stylus operates in three stages: it first refines adapter descriptions and embeddings, then retrieves relevant adapters based on a prompt's keywords, and finally composes them to best match the prompt's requirements. In evaluations, Stylus is preferred roughly twice as often as the base model by both human and multimodal-model judges.
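A toy end-to-end version of that three-stage flow might look like the sketch below. The embedding function, adapter catalog, and score-proportional weights are all placeholders we made up; per the paper, Stylus actually uses a vision-language model to rewrite adapter descriptions (stage one) and more careful composition logic than this.

```python
# Toy Stylus-style pipeline: refine (assumed done) -> retrieve -> compose.
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder encoder (hash-seeded random vector); a real system
    # would use a sentence-embedding model here.
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % 2**32
    v = np.random.default_rng(seed).normal(size=384)
    return v / np.linalg.norm(v)

# Stage 1 (assumed already done): adapters with refined descriptions.
ADAPTERS = {
    "pixel-art-lora": "crisp retro pixel art, 8-bit game sprites",
    "film-grain-lora": "analog film grain, soft 35mm photo texture",
    "anime-style-lora": "clean anime line art, flat cel shading",
}
ADAPTER_VECS = {name: embed(desc) for name, desc in ADAPTERS.items()}

def stylus_select(prompt: str, top_k: int = 2) -> dict:
    # Stage 2: retrieve the adapters most similar to the prompt.
    q = embed(prompt)
    sims = {name: float(q @ v) for name, v in ADAPTER_VECS.items()}
    picked = sorted(sims, key=sims.get, reverse=True)[:top_k]
    # Stage 3: compose -- naive score-proportional LoRA weights.
    total = sum(sims[n] for n in picked) or 1.0
    return {n: round(sims[n] / total, 3) for n in picked}

print(stylus_select("a knight sprite in retro pixel art"))
```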

Meme of the Day

What'd you think of today's edition?
