Video generation models – a far-fetched dream?
The first wave of generative AI brought remarkable advances in text-to-text and text-to-image models, putting photorealistic images in the spotlight. Models such as DALL-E have continually enhanced their capabilities, producing increasingly lifelike images. The next anticipated leap lies in video generation models, including text-to-video, image-to-video, and audio-to-video, a progression already hinted at in 2023. Text-to-video conversion faces significant challenges, including the following:
- The computational demands of ensuring spatial and temporal frame consistency, which make training such models unaffordable for most researchers (a toy illustration follows this list).
- A lack of high-quality multimodal datasets for training these models.
- The complexity of describing videos effectively enough for models to learn from them, which often requires a series of detailed prompts or narratives.
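To make the consistency challenge concrete, here is a minimal, hypothetical sketch in PyTorch of a naive temporal consistency penalty over generated frames. The tensor shapes and the raw frame-difference loss are illustrative assumptions on our part; real models rely on far more sophisticated machinery, such as optical flow or temporal attention, and the point here is only that every additional frame adds to the memory and compute bill:

```python
import torch

def temporal_consistency_loss(frames: torch.Tensor) -> torch.Tensor:
    """Toy penalty on abrupt changes between adjacent generated frames.

    frames: (batch, time, channels, height, width). A real text-to-video
    model would use optical flow or learned motion priors rather than raw
    frame differences; this sketch only illustrates the idea.
    """
    diffs = frames[:, 1:] - frames[:, :-1]  # frame-to-frame differences
    return diffs.abs().mean()

# Toy usage: 2 clips of 16 RGB frames at 64x64 resolution.
# Even this tiny example holds 2 * 16 * 3 * 64 * 64 values in memory,
# hinting at why full-scale video training is so expensive.
clips = torch.randn(2, 16, 3, 64, 64)
print(temporal_consistency_loss(clips))
```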
Despite these limitations, video generation techniques have progressed steadily, from GANs and variational autoencoders (VAEs) to Transformers and Stable Diffusion. Popular video generation models include offerings from Runway ML, Stable Video Diffusion from Stability AI, Moonshot from Salesforce, and VideoPoet from Google.
Sora, from OpenAI, is the most recent, offering complex scene generation and advanced language comprehension. We provided more details on this model in Chapter 1.
Video generation models possess profound capabilities, with the potential to influence society as they evolve and mature. This influence becomes particularly critical during election seasons, when the information landscape can significantly shape public opinion and democratic outcomes. Such power also carries the risk of severe consequences if the models are not deployed responsibly. It is therefore imperative to establish robust ethical guidelines and safeguards, especially during such sensitive periods, to ensure that these technologies are used beneficially and do not undermine the integrity of democratic processes.
Can AI smell?
We have learned that AI can hear, see, and speak. But can AI smell too? Recent research has shown significant progress in AI's ability to "smell." Various studies have explored how AI can analyze and interpret odors, a task that has traditionally been challenging due to the complexity and subjective nature of olfaction:
- AI model outperforms humans in describing odors: A study demonstrated that an AI model was more accurate than human panelists in predicting the smell of different molecules. The model was particularly effective at identifying pairs of structurally dissimilar molecules that had similar smells, as well as characterizing a variety of odor properties, such as odor strength, for a large number of potential scent molecules. https://techxplore.com/news/2023-08-closer-digitizing-odors-human-panelists.html.
- AI in detecting illnesses through breath analysis: Laboratories have been using machines such as gas chromatography-mass spectrometers (GC-MS) to detect substances in the air, including volatile organic compounds present in human breath. These compounds can indicate various illnesses, including cancers. AI, particularly deep learning networks, is being adapted to analyze these compounds more efficiently, significantly speeding up the process of identifying patterns in breath samples that indicate certain diseases. https://www.smithsonianmag.com/innovation/artificial-intelligence-may-be-able-to-smell-illnesses-in-human-breath-180969286/.
- Artificial networks learning to smell like the brain: Research at MIT built an artificial smell network inspired by the fruit fly's olfactory system, comprising an input layer, a compression layer, and an expansion layer (see the first sketch after this list). The network organized itself and processed odor information in a manner strikingly similar to the fruit fly brain, demonstrating AI's potential to mimic biological olfactory systems. https://news.mit.edu/2021/artificial-networks-learn-smell-like-the-brain-1018.
- AI "nose" predicts smells from molecular structures: AI technology has been developed to predict the smell of chemicals based on their molecular structures. This advancement is significant because it opens up the possibility of designing new synthetic scents and provides insights into how the human brain interprets smell. https://phys.org/news/2023-09-ai-nose-molecular.html.
- Training AI to understand and map odors: Researchers have trained a neural network with thousands of compounds and corresponding smell labels from perfumery databases. The AI was able to create a "principal odor map" that visually shows the relationships between different smells (see the second sketch after this list). When tested, the AI's predictions of how a new molecule would smell were found to be more accurate than those of human panelists. https://www.popsci.com/science/teach-ai-how-to-smell/.
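The first sketch below is a toy analogue of the input-compression-expansion architecture described in the MIT item. The layer sizes, the sparsity level, and the top-k trick are illustrative assumptions on our part, not details taken from the paper:

```python
import torch
import torch.nn as nn

class FlyOlfactoryNet(nn.Module):
    """Toy analogue of a fruit-fly-inspired smell network.

    Layer sizes are illustrative assumptions (they are not taken from
    the MIT paper): a small receptor input, a compression layer, and a
    much wider expansion layer with sparse activations.
    """
    def __init__(self, n_receptors=50, n_compress=25, n_expand=2500):
        super().__init__()
        self.compress = nn.Linear(n_receptors, n_compress)  # compression layer
        self.expand = nn.Linear(n_compress, n_expand)       # expansion layer

    def forward(self, odor: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.compress(odor))
        x = self.expand(x)
        # Keep only the top 5% of activations, a crude stand-in for the
        # sparse, high-dimensional odor code seen in insect brains.
        k = max(1, int(0.05 * x.shape[-1]))
        topk = torch.topk(x, k, dim=-1)
        return torch.zeros_like(x).scatter_(-1, topk.indices, topk.values)

net = FlyOlfactoryNet()
odor_input = torch.rand(1, 50)  # simulated receptor activations
code = net(odor_input)
print(f"active units: {(code != 0).sum().item()} of {code.shape[-1]}")
```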
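The second sketch mimics the odor-mapping idea on synthetic data. The fingerprints and labels are hypothetical stand-ins, not the perfumery databases used in the study: we train a small multi-label network to predict odor words from molecular fingerprints, then project its hidden representation to two dimensions to obtain a rough analogue of a "principal odor map":

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.decomposition import PCA

# Hypothetical stand-in data: binary molecular fingerprints (presence or
# absence of substructures) paired with multi-label smell descriptors.
rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, size=(200, 64))  # 200 molecules
labels = rng.integers(0, 2, size=(200, 5))         # 5 odor words each

# Train a small multi-label network: fingerprint -> odor descriptors.
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
model.fit(fingerprints, labels)

# Treat the hidden layer as an odor embedding and project it to 2D,
# yielding a toy "odor map" where nearby points smell alike (per the model).
hidden = np.maximum(fingerprints @ model.coefs_[0] + model.intercepts_[0], 0)
odor_map = PCA(n_components=2).fit_transform(hidden)
print(odor_map[:3])  # 2D map coordinates of the first three molecules
```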
This section primarily focused on multimodal capabilities and how, as they mature, they will enhance our communication with AI. In the next section, we will discuss how these multimodal capabilities can foster creativity and innovation in industry-specific generative AI applications.