The AI program Sora generated a video featuring this artificial woman based on a text prompt (Image credit: Sora/OpenAI)
OpenAI has unveiled its latest artificial intelligence system, a program called Sora that can transform text descriptions into photorealistic videos. The video generation model is spurring excitement about advancing AI technology, along with growing concern that deepfake videos could worsen misinformation and disinformation during a pivotal election year worldwide.
The Sora AI model can currently create videos up to 60 seconds long using either text instructions alone or text combined with an image. One demonstration video starts with a text prompt that describes how “a stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage”. Other examples include a dog frolicking in the snow, vehicles driving along roads and more fantastical scenarios such as sharks swimming in midair between city skyscrapers.
“As with other techniques in generative AI, there is no reason to believe that text-to-video will not continue to rapidly improve, moving us closer and closer to a time when it will be difficult to distinguish the fake from the real,” says Hany Farid at the University of California, Berkeley. “This technology, if combined with AI-powered voice cloning, could open up an entirely new front when it comes to creating deepfakes of people saying and doing things they never did.”
Sora is based in part on OpenAI’s preexisting technologies, such as the image generator DALL-E and the GPT large language models. Text-to-video AI models have lagged somewhat behind those other technologies in terms of realism and accessibility, but the Sora demonstration is an “order of magnitude more believable and less cartoonish” than what has come before, says Rachel Tobac, co-founder of SocialProof Security, a white-hat hacking organisation focused on social engineering.
To achieve this higher level of realism, Sora combines two different AI approaches. The first is a diffusion model similar to those used in AI image generators such as DALL-E. These models learn to gradually convert randomised image pixels into a coherent image. The second AI technique is called “transformer architecture” and is used to contextualise and piece together sequential data. For example, large language models use transformer architecture to assemble words into generally comprehensible sentences. In this case, OpenAI broke down video clips into visual “spacetime patches” that Sora’s transformer architecture could process.
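To get a feel for what a “spacetime patch” is, the idea can be sketched in a few lines of NumPy: a video is carved into small blocks that span both space and time, and each block is flattened into one token for the transformer. The patch sizes below are illustrative assumptions, not Sora’s actual parameters, which OpenAI has not published.

```python
import numpy as np

def spacetime_patches(video, t=2, p=16):
    """Split a video tensor (frames, height, width, channels) into
    flattened 'spacetime patches' -- the token sequence a transformer
    would process. The patch sizes (t frames, p x p pixels) are
    illustrative, not Sora's real configuration."""
    T, H, W, C = video.shape
    assert T % t == 0 and H % p == 0 and W % p == 0
    # Carve the video into non-overlapping t x p x p blocks...
    blocks = video.reshape(T // t, t, H // p, p, W // p, p, C)
    # ...group the block indices together, then flatten each block
    # into a single token vector.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)
    return blocks.reshape(-1, t * p * p * C)

# An 8-frame, 64x64 RGB clip becomes 4*4*4 = 64 tokens of length 1536.
tokens = spacetime_patches(np.zeros((8, 64, 64, 3)))
print(tokens.shape)  # (64, 1536)
```

The appeal of this representation is that, once video is a sequence of tokens, the same transformer machinery that scales so well for language can be applied to clips of varying length and resolution.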
Sora’s videos still contain plenty of mistakes, such as a walking human’s left and right legs swapping places, a chair randomly floating in midair or a bitten cookie magically having no bite mark. Still, Jim Fan, a senior research scientist at NVIDIA, took to the social media platform X to praise Sora as a “data-driven physics engine” that can simulate worlds.
The fact that Sora’s videos still display some strange glitches when depicting complex scenes with lots of movement suggests that such deepfake videos will be detectable for now, says Arvind Narayanan at Princeton University. But he also cautioned that in the long run “we will need to find other ways to adapt as a society”.
OpenAI has held off on making Sora publicly available while it performs “red team” exercises where experts try to break the AI model’s safeguards in order to assess its potential for misuse. The select group of people currently testing Sora are “domain experts in areas like misinformation, hateful content and bias”, says an OpenAI spokesperson.
This testing is vital because artificial videos could let bad actors generate false footage in order to, for instance, harass someone or sway a political election. Misinformation and disinformation fuelled by AI-generated deepfakes ranks as a major concern in academia, business, government and other sectors.
“Sora is absolutely capable of creating videos that could trick everyday folks,” says Tobac. “Video does not need to be perfect to be believable, as many people still don’t realise that video can be manipulated as easily as pictures.”
AI companies will need to collaborate with social media networks and governments to handle the scale of misinformation and disinformation likely to occur once Sora becomes open to the public, says Tobac. Defences could include implementing unique identifiers, or “watermarks”, for AI-generated content.
When asked if OpenAI has any plans to make Sora more widely available in 2024, the OpenAI spokesperson described the company as “taking several important safety steps ahead of making Sora available in OpenAI’s products”. For instance, the company already uses automated processes aimed at preventing its commercial AI models from generating depictions of extreme violence, sexual content, hateful imagery and real politicians or celebrities. With elections taking place around the world this year, those safety steps will be crucial.