OpenAI, the company renowned for ChatGPT, has now unveiled Sora, its latest artificial intelligence (AI)-powered text-to-video generation model.
Positioned as a significant advancement in the field, Sora is capable of generating videos up to 60 seconds in length, surpassing its competitors such as Google's Lumiere.
Sora is currently accessible to red teamers and select content creators, and OpenAI plans to include Coalition for Content Provenance and Authenticity (C2PA) metadata when it deploys the model in its products.
In a post on X, OpenAI wrote, "Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions."
Notably, Sora's claimed maximum video length is more than ten times that of its competitors: Google's Lumiere generates 5-second videos, while Runway AI and Pika 1.0 produce 4-second and 3-second videos, respectively.
OpenAI's X account and CEO Sam Altman showcased multiple videos generated by Sora, along with the prompts used for their creation.
The resulting videos exhibit high detail and seamless motion, setting Sora apart from other video generators. The model can generate intricate scenes with multiple characters, diverse camera angles, specific motion types, and accurate subject and background details. According to OpenAI, this capability stems from the model understanding not only what the prompt asks for but also "how those things exist in the physical world."
Sora is a diffusion model built on a transformer architecture similar to that of the GPT models. Its training data, which consists of collections of videos and images, is represented as patches that play a role similar to tokens in text-generating models. OpenAI used this visual data to train the model across various durations, resolutions, and aspect ratios. Additionally, Sora can transform a still image into a video.
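To make the patch analogy concrete, the following is a minimal illustrative sketch of cutting a video into fixed-size spacetime patches. It is not OpenAI's implementation: the function name `video_to_patches`, the patch sizes, and the choice to work directly on raw pixels are all assumptions made for illustration, and NumPy is used only for array handling.

```python
import numpy as np

def video_to_patches(video, pt=2, ph=4, pw=4):
    """Split a video array of shape (T, H, W, C) into flattened patches.

    pt, ph, pw are hypothetical patch sizes (frames, height, width).
    Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    t, h, w, c = video.shape
    if t % pt or h % ph or w % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Carve each axis into (number of blocks, block size).
    blocks = video.reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
    # Move the block indices to the front and the within-block axes to the back.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch, loosely analogous to one token in a text model.
    return blocks.reshape(-1, pt * ph * pw * c)

# Clips with different durations, resolutions, and aspect ratios all become
# sequences of same-sized patches; only the sequence length differs.
wide = np.zeros((8, 16, 32, 3))  # 8 frames, 16x32 pixels, RGB
tall = np.zeros((4, 32, 16, 3))  # 4 frames, 32x16 pixels, RGB
wide_patches = video_to_patches(wide)
tall_patches = video_to_patches(tall)
print(wide_patches.shape)  # (128, 96)
print(tall_patches.shape)  # (64, 96)
```

Because every patch has the same size regardless of the clip it came from, a single sequence model can in principle consume inputs of varying duration, resolution, and aspect ratio, which is the property the article attributes to Sora's training data.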
However, OpenAI acknowledges the model's current limitations. The company states, "The current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene and may not understand specific instances of cause and effect."
To prevent misuse for creating deepfakes or harmful content, OpenAI is developing tools to detect misleading content. The incorporation of C2PA metadata in generated videos is part of the company's commitment to responsible AI use, following its recent adoption of the same metadata for images generated by the DALL-E 3 model. OpenAI is also collaborating with red teamers, specifically domain experts in areas such as misinformation, hateful content, and bias, who will adversarially test the model.