April 17th, 2024

DALL-E as Battlefield Tool, Coachella animation, Perturbed-Attention Guidance, Pixart Sigma, InstantMesh

Hello - this is SinkIn Newsletter, a newsletter made by sinkin.ai. There are interesting things going on in the image AI world every day. We try to capture them in a 5-minute read, so you can quickly stay up to speed with the latest trends and breakthroughs.

Microsoft has proposed leveraging OpenAI's DALL-E, an AI image generation tool, for U.S. military applications, despite OpenAI's stated mission to avoid harm and weaponry development. This suggestion surfaced amidst OpenAI’s recent shift away from prohibiting military work, marking a potential pivot in the company's ethical stance. The proposal detailed using DALL-E's synthetic image generation to train military battle management systems, aiding in target recognition and coordination during combat. While OpenAI denies involvement or selling tools for these purposes, the discussion raises questions about the ethical implications of AI's role in military operations and the potential indirect contribution to conflict.

This is the ComfyUI workflow used to create an animation video played at Coachella (check out the video below). The workflow uses two IPAdapters and an alpha mask to separate the subject from the background, so you can control each independently. You’ll also find a video walkthrough of the workflow on the Civitai page.

The Coachella animation video
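
For readers new to alpha masks, here is a minimal sketch of the general idea: a per-pixel mask decides how much of the subject layer versus the background layer ends up in the result. This is not the ComfyUI workflow itself, which applies its mask inside the node graph. The function name `composite` and the tiny 2x2 grayscale "images" are hypothetical, chosen only for illustration.

```python
def composite(subject, background, alpha):
    """Blend two same-sized grayscale images per pixel.

    alpha is a mask with values in [0, 1]: 1.0 keeps the subject pixel,
    0.0 keeps the background pixel, and values in between mix the two.
    """
    if not (len(subject) == len(background) == len(alpha)):
        raise ValueError("subject, background and alpha must have the same height")
    out = []
    for s_row, b_row, a_row in zip(subject, background, alpha):
        if not (len(s_row) == len(b_row) == len(a_row)):
            raise ValueError("subject, background and alpha must have the same width")
        out.append([a * s + (1.0 - a) * b for s, b, a in zip(s_row, b_row, a_row)])
    return out


# Hypothetical 2x2 example: a white subject over a black background.
subject = [[1.0, 1.0], [1.0, 1.0]]
background = [[0.0, 0.0], [0.0, 0.0]]
alpha = [[1.0, 0.0], [0.5, 0.0]]
result = composite(subject, background, alpha)
```

Because the mask and its inverse split every pixel between the two layers, changing the background layer leaves the fully masked subject pixels untouched, which is the kind of independence the workflow is after.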

Recently implemented in ComfyUI, Perturbed-Attention Guidance (PAG) aims to improve prompt adherence and compositional coherence without sacrificing image quality. Its appeal is that it brings structure to complex scenes while preserving image fidelity. Check out the user-shared basic pipeline settings and the A/B image examples. Experiment with the recommended checkpoints for different styles, and see how PAG, along with the optional AutomaticCFG, can make your AI-generated images more coherent.
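
To give a feel for what the guidance step does, here is a minimal sketch under stated assumptions. As we understand the PAG paper, the sampler makes an extra noise prediction with perturbed self-attention and then pushes the final prediction away from it, in the same spirit as classifier-free guidance (CFG) pushes away from the unconditional prediction. The function name `guided_noise`, its parameters, and the toy numbers are hypothetical, and this is not the ComfyUI node's code. Note that many tools express CFG as `uncond + scale * (cond - uncond)`, which corresponds to `cfg_weight = scale - 1` here.

```python
def guided_noise(eps_cond, eps_uncond, eps_perturbed, cfg_weight, pag_scale):
    """Combine three noise predictions into one guided prediction.

    eps_cond      : prediction with the prompt (normal attention)
    eps_uncond    : prediction without the prompt (used by CFG)
    eps_perturbed : prediction with the prompt but perturbed self-attention
    cfg_weight    : extra weight on the CFG direction (0 disables CFG)
    pag_scale     : weight on the PAG direction (0 disables PAG)
    """
    if not (len(eps_cond) == len(eps_uncond) == len(eps_perturbed)):
        raise ValueError("all predictions must have the same length")
    return [
        c + cfg_weight * (c - u) + pag_scale * (c - p)
        for c, u, p in zip(eps_cond, eps_uncond, eps_perturbed)
    ]


# Toy numbers standing in for flattened noise predictions (hypothetical).
eps_cond = [1.0, 2.0]
eps_uncond = [0.0, 1.0]
eps_perturbed = [0.5, 2.0]

guided = guided_noise(eps_cond, eps_uncond, eps_perturbed, cfg_weight=2.0, pag_scale=3.0)
no_guidance = guided_noise(eps_cond, eps_uncond, eps_perturbed, cfg_weight=0.0, pag_scale=0.0)
```

With both weights at zero the function returns the plain conditional prediction, so the two scales can be read as independent dials on top of ordinary sampling.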

Generate high-fidelity 4K images from text prompts using PixArt-Sigma, a state-of-the-art diffusion model. PixArt-Sigma achieves excellent alignment with prompts. It does so efficiently, evolving from PixArt-alpha through a process termed weak-to-strong training - leveraging higher quality data and an improved attention mechanism. With just 0.6 billion parameters, PixArt-Sigma reaches new heights in text-to-image generation.

prompt: orange cat wrapped in white bandages and black dog wrapped in red bandages sitting on a bench on top of a hill filled with round stones, photo, cinematic

The InstantMesh framework from Tencent ARC presents an efficient solution for converting a single image into a 3D mesh, combining a conventional multiview diffusion model with a sparse-view reconstruction model for improved generation quality and training scalability. The system is designed to produce varied 3D assets quickly, with generation times averaging around 10 seconds. Comparisons on public datasets indicate that InstantMesh outperforms current image-to-3D methods. To encourage broader use and community contributions, the team behind InstantMesh has released the code, weights, and a demo application. Check out the HuggingFace demo.

Meme of the Day

That’s it for today, see you next time!
