AI can now make creepy videos of people using just ONE photo – but Microsoft won’t release tool over impersonation fears

The new AI tech can animate a single image into a realistic video with audio syncing

MICROSOFT has revealed new AI that can make creepy videos of people using just one photo – but won’t release the tool over impersonation fears.

The technology can create synchronised animated clips of a person talking or singing with a single snap of their face and an audio track.


The computer giant’s Research Asia team unveiled the VASA-1 model this week and says it could in future even power virtual avatars that appear to say whatever the creator wants.

“It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviours,” says an accompanying research paper.

VASA – short for Visual Affective Skills Animator – can analyse a static image alongside audio to generate a realistic video with lip syncing, facial expressions and head movements.

Unlike some of Microsoft’s other research projects, however, it cannot clone or simulate voices.

The company – co-founded by billionaire Bill Gates – claims the model is a significant improvement on previous speech animation methods in terms of realism, expressiveness and efficiency.

In February, an AI model called EMO: Emote Portrait Alive, from Alibaba’s Institute for Intelligent Computing research group, used an approach similar to VASA-1’s, called Audio2Video.

Microsoft researchers trained their tech on the VoxCeleb2 dataset created in 2018 by a team from the University of Oxford.

That dataset claims to hold over a million “utterances” from 6,112 celebrities taken from videos uploaded to YouTube.

VASA-1 can reportedly generate videos at a resolution and frame rate that would not look out of place in real-time applications like video conferencing.

A research page released as part of the launch showcases the tool in use, with people singing and speaking, as well as showing how the model can be controlled.

Mona Lisa is even seen rapping.

The researchers are adamant that their intention with the tool is not to enhance deepfaking.

The site reads: “We are exploring visual affective skill generation for virtual, interactive characters, NOT impersonating any person in the real world.

“This is only a research demonstration and there’s no product or API release plan.”

The researchers are instead touting the potential for it to be used in education and even to provide companionship.

They are, however, refusing to release the code that powers the model.

Microsoft is not the only team developing similar technology, with increased realism and availability likely only a matter of time.
