There is no dearth of AI image-generation tools. Over the past few months, we have covered DALL-E 2 and Stable Diffusion. Recently, Meta released its AI-powered video creation system, and now Google has made one too. It is called Google Imagen Video AI, and like the other systems mentioned above, it can produce videos from natural language input.
Imagen Video AI has its roots in the Imagen program, which, like DALL-E 2 and Stable Diffusion, turns text input into images. Google's tool was trained on a combination of image-text and video-text datasets to produce artistic videos. Even though the results look murky at best, they spark hope for the future of AI's involvement in the creative industry.
Google Imagen Video AI: How do the videos look?
Google Imagen Video AI is trained on an "internal dataset" made up of 14 million videos and 60 million still images. On top of that, it used 400 million images from the LAION-400M open dataset. The result of this huge training dataset is a tool that can produce five-second video clips based on the inputs you supply.
Unlike image prompts, where a line or two will do, a video prompt needs to be highly descriptive. Each shot has to be different, and unless you spell out every detail, the results won't be pleasing. Google has shared a few sample clips, like a panda eating a bamboo shoot and ships battling a storm at sea. The sea video also shows the camera changing angles, an effect you must explicitly mention in the prompt if you want to see it in the video.
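To see why video prompts demand so much more detail, here is a minimal sketch of how one might assemble a descriptive prompt from individual shot elements. Imagen Video has no public API, so the function and its parameters are entirely hypothetical, invented here for illustration:

```python
# Hypothetical helper illustrating how much more detail a video prompt
# needs than an image prompt. The function and its parameter names are
# invented for this sketch; Imagen Video offers no public API.

def build_video_prompt(subject, action, setting, camera=None, style=None):
    """Assemble a descriptive text-to-video prompt from shot details."""
    parts = [f"{subject} {action} {setting}"]
    if camera:
        # Camera motion has to be spelled out, or it won't appear in the clip.
        parts.append(f"camera {camera}")
    if style:
        parts.append(f"in the style of {style}")
    return ", ".join(parts)

# A bare image-style prompt versus a fully described video prompt:
short_prompt = "ships in a storm"
long_prompt = build_video_prompt(
    subject="several wooden ships",
    action="battling towering waves",
    setting="in a storm at sea",
    camera="slowly panning upward to reveal the horizon",
)
print(long_prompt)
```

The point of the sketch is simply that every shot-level detail, including camera movement, ends up as explicit text in the prompt; nothing is inferred for you.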
Unlike DALL-E 2, which produced some rich images, Google Imagen Video AI has much more ground to cover. Beyond static images, it has to deal with 3-D objects and how they move and interact with other objects in the video. So the current samples aren't that impressive. However, it is a product from Google, a company that can pour far more R&D into improving the tool. Google has no plans to release it for public use yet, as there is still a lot to fix.