Meta introduces AI models for video generation, image editing

Social media company Meta is developing new AI-based tools for Facebook and Instagram users

Social media giant Meta has introduced its latest artificial intelligence (AI) models for content editing and generation, according to a blog post on Nov. 16.

The company is introducing two AI-powered generative models. The first, Emu Video, builds on Meta’s previous Emu model and can generate video clips from text and image inputs. The second, Emu Edit, focuses on image manipulation and promises more precise image editing.

The models are still in the research stage, but Meta says their initial results show potential use cases for creators, artists and animators alike.

*Meta displays its new generative model Emu Edit. Source: Meta*

According to Meta’s blog post, Emu Video was trained with a “factorized” approach that divides the process into two steps, allowing the model to respond to different inputs:

“We’ve split the process into two steps: first, generating images conditioned on a text prompt, and then generating video conditioned on both the text and the generated image. This “factorized” or split approach to video generation lets us train video generation models efficiently.”

The same model can also “animate” images based on a text prompt. According to Meta, instead of relying on a “deep cascade of models,” Emu Video uses only two diffusion models to generate four-second-long videos at 512×512 resolution and 16 frames per second.
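To make the two-step structure and the quoted output figures concrete, here is a minimal Python sketch. It is not Meta's code: the names `VideoSpec`, `text_to_image`, `image_to_video` and `generate_video` are hypothetical, and the two stage functions are placeholders that return blank frames rather than running any diffusion model. Only the resolution, duration and frame rate come from Meta's description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoSpec:
    """Output figures quoted by Meta: 512x512, four seconds, 16 fps."""
    width: int = 512
    height: int = 512
    seconds: int = 4
    fps: int = 16

    @property
    def num_frames(self) -> int:
        # 4 seconds * 16 frames per second
        return self.seconds * self.fps


def text_to_image(prompt, spec):
    """Hypothetical step 1: stand-in for an image conditioned on the text prompt."""
    return [[0] * spec.width for _ in range(spec.height)]


def image_to_video(prompt, image, spec):
    """Hypothetical step 2: stand-in for video conditioned on the text and the image."""
    return [image for _ in range(spec.num_frames)]


def generate_video(prompt, spec=VideoSpec()):
    # Factorized flow: text -> image, then (text, image) -> video.
    image = text_to_image(prompt, spec)
    return image_to_video(prompt, image, spec)


frames = generate_video("a dog surfing a wave")
print(len(frames), "frames of", len(frames[0]), "x", len(frames[0][0]), "pixels")
```

Under those assumptions, a four-second clip at 16 frames per second works out to 64 frames, each 512 pixels on a side.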

Emu Edit, focused on image manipulation, will allow users to remove or add image backgrounds, perform color and geometry transformations, and make both local and global edits.

“We argue that the primary objective shouldn’t just be about producing a “believable” image. Instead, the model should focus on precisely altering only the pixels relevant to the edit request,” Meta noted, claiming its model is able to precisely follow instructions:

“For instance, when adding the text “Aloha!” to a baseball cap, the cap itself should remain unchanged.”

Meta trained Emu Edit using computer vision tasks with a dataset of 10 million synthesized images, each with an input image and a description of the task, as well as the targeted output image. “We believe it’s the largest dataset of its kind to date,” the company said.

Meta’s newly released Emu model was trained using 1.1 billion pieces of data, including photos and captions shared by users on Facebook and Instagram, CEO Mark Zuckerberg revealed during the Meta Connect event in September.

Regulators are closely scrutinizing Meta’s AI-based tools, prompting the technology company to take a cautious approach to deployment. Recently, Meta disclosed it won’t allow political campaigns and advertisers to use its AI tools to create ads on Facebook and Instagram. The platform’s general advertising rules, however, do not include any provisions addressing AI specifically.