May 16th, 2024

GPT-4o's Powerful Image Capabilities

Suuup - this is the SinkIn Newsletter, a 5-minute read made at sinkin.ai to cover the most interesting stuff in the Image AI world.

OpenAI just launched GPT-4o. While the presentation mainly focused on audio and video capabilities, the “Exploration of capabilities” section on the announcement page showcases its powerful image capabilities. The model can create visual narratives with continuity, convert photos to caricatures, edit images iteratively, follow extremely complex instructions, synthesize 3D objects, and more. These image features are not available in ChatGPT yet, and it’s unclear when they will be rolled out.

Samples of what GPT-4o can do

IC-Light is a project for manipulating the illumination of images. The name "IC-Light" stands for "Imposing Consistent Light". Its GitHub repository offers models for text-conditioned and background-conditioned relighting that aim for consistent illumination, which makes it a good fit for precise lighting adjustments. You can play with the demo here. It’s also available as a ComfyUI node.

Prompt: beautiful woman, detailed face, light and shadow

Tencent's Hunyuan-DiT is an open-source text-to-image model supporting both Chinese and English. It features fine-grained language understanding and iterative image generation through multi-turn dialogues. Tencent claims Hunyuan-DiT is the best open-source text-to-image model and beats SD 3. However, community testing shows mixed results.

At Google I/O 2024, Google unveiled Veo, an AI model that generates HD videos from text, image, or video prompts. Similar to OpenAI's Sora, Veo can produce 1080p videos over a minute long and edit videos via text commands. It maintains visual consistency and creates detailed scenes with cinematic effects. Veo is initially available through VideoFX on Google's AI Test Kitchen and comes with watermarking and safety filters meant to mitigate risks.

OpenAI is considering allowing AI-generated explicit content, including pornography, within responsible and age-appropriate limits. While maintaining a firm ban on deepfakes and prioritizing safety and legality, especially for children, the company is exploring the potential for not-safe-for-work (NSFW) content through its technologies. This has sparked debate and criticism, with some arguing it could undermine OpenAI's mission of developing safe and beneficial AI.

Meme of the Day

That’s it for today. Hope these stories were as refreshing as a nice bathroom sink!
