April 27th, 2024

HiDiffusion, VideoGigaGAN, Align Your Steps, Hyper-SD, Predicting Political Views

Happy Friday - This is SinkIn Newsletter, a 5-minute read made at sinkin.ai to cover the most interesting stuff in the Image AI world. We scroll, so you don’t have to.

HiDiffusion is a new framework that can be integrated into various pretrained diffusion models to generate images up to 4096×4096 resolution at 1.5-6× the speed of previous methods, with better visual quality. It introduces Resolution-Aware U-Net (RAU-Net) that dynamically adjusts the feature map size to resolve object duplication, and incorporates Modified Shifted Window Multi-head Self-Attention (MSW-MSA) that utilizes optimized window attention to reduce computation. What’s even cooler: the integration only takes one line of code.

Faster generation, with better image details.
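
To get a feel for why window attention cuts computation, here is a minimal back-of-the-envelope sketch. It is not HiDiffusion's code: the function name `attention_pairs`, the 128×128 feature map, and the 64×64 window are all illustrative assumptions. It simply counts how many query-key pairs self-attention has to score with and without windows.

```python
def attention_pairs(height, width, window=None):
    """Count query-key pairs scored by self-attention on a height x width feature map.

    Hypothetical helper for illustration only; not part of any library.
    """
    tokens = height * width
    if window is None:
        # Global attention: every token attends to every other token.
        return tokens * tokens

    win_h, win_w = window
    if height % win_h or width % win_w:
        raise ValueError("window size must evenly divide the feature map")

    # Window attention: tokens only attend to tokens inside the same window.
    num_windows = (height // win_h) * (width // win_w)
    tokens_per_window = win_h * win_w
    return num_windows * tokens_per_window ** 2


# Assumed sizes, chosen only to make the arithmetic easy to follow.
global_cost = attention_pairs(128, 128)
windowed_cost = attention_pairs(128, 128, window=(64, 64))
print(global_cost // windowed_cost)  # prints 4
```

With these assumed sizes, splitting the map into four windows cuts the pair count by a factor of four. The savings in the actual method depend on the window sizes and layers the authors chose.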

Adobe has unveiled VideoGigaGAN, a new AI model designed to significantly enhance video quality by upscaling videos up to eight times their original resolution. Unlike other Video Super Resolution methods, VideoGigaGAN combines the sharpness and detail of Generative Adversarial Networks (GANs) with fewer common artifacts like flickering, delivering clearer and more detailed video outputs. Although it is only a research preview and not yet available for consumer use, Adobe's demonstration suggests that VideoGigaGAN can produce highly natural results, making it difficult to tell that the videos were enhanced by AI.

VideoGigaGAN can improve video resolution by 8x without introducing flickering or other visual distortions.

Nvidia introduces "Align Your Steps", an approach to optimizing the sampling schedules of diffusion models for high-quality outputs. They leverage methods from stochastic calculus and find optimal schedules specific to different solvers, trained diffusion models and datasets.

They ran rigorous quantitative experiments on standard image generation benchmarks and found that these schedules consistently improve image quality. In a user study of text-to-image models, images generated with these schedules were preferred, on average, twice as often.

Hyper-SD from ByteDance is a new state-of-the-art diffusion model acceleration technique. It delivers high-quality images with low-step inference. In 1-step inference, it surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aes Score. Try their 1-step SDXL demo here.

Visual Comparison between Hyper-SD and Other Methods

A study showed that AI could predict people's political orientations from their facial images. Conducted by Stanford's Michal Kosinski, the study involved scanning the faces of 591 participants under controlled conditions to create a numerical "fingerprint" that could predict political views. This discovery underscores the potential for misuse of such technology, as it can quickly analyze many people at low cost, posing a significant threat to privacy.

Meme of the Day

Austrian political party uses AI to generate a "manly" picture of their candidate:

The AI photo vs. how he actually looks

That’s it for today, have a lovely weekend!
