Integrating generative AI with intelligent edge devices
As we progress into 2024, the fusion of generative AI with intelligent edge devices is poised to revolutionize the technology landscape. Examples of edge devices include smartphones, tablets, autonomous vehicles, medical devices, wearables, and IoT devices such as smart thermostats and cameras. Small language models (SLMs) are becoming a pivotal component of edge computing, offering a new dimension of smart, localized processing, because LLMs present real challenges when run on edge devices. LLMs need to be optimized before they are deployed to edge devices for several reasons:
- Limited resources: Edge devices typically have constrained computational resources, including CPU, GPU, memory, and storage. The largest models demand substantial resources for both storage (more than 500 GB) and computation.
- Energy efficiency: Running large models can consume significant power, which is critical for battery-operated devices. Optimizations aim to reduce the energy consumption of these models.
- Latency: For real-time applications, it’s crucial to have low latency. Large models can lead to slower inference times, so optimizing the model can help meet the latency requirements of the application.
- Bandwidth: Deploying large models or updating them over the network can consume significant bandwidth, which might be limited or costly in some edge environments.
- Cost: Computational resources on edge devices are not only limited but also potentially more expensive. Optimizing models can reduce the overall cost of deployment and operation.
There are several techniques for achieving this kind of efficiency in LLMs. One method, known as knowledge distillation (also called teacher-student training), trains a smaller student model to emulate the outputs of a larger teacher model, producing a far more compact model. Another method, quantization, shrinks the model size and speeds up inference by decreasing the precision of its weights and activations, while largely preserving accuracy.
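Both techniques are easy to sketch in code. The following is a minimal sketch of the classic distillation objective (Hinton et al., 2015) in PyTorch; the function name, tensor arguments, and hyperparameter values are illustrative assumptions rather than a fixed recipe:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soften the teacher's distribution with the temperature so the
    # student can learn from the relative probabilities of wrong classes.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The KL term pulls the student toward the teacher; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures.
    kd_loss = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)
    return alpha * kd_loss + (1 - alpha) * ce_loss
```

Quantization can be even simpler to apply after training. Here is a minimal sketch using PyTorch's post-training dynamic quantization, assuming `model` is an existing float32 `torch.nn.Module`:

```python
import torch

# Convert the weights of all Linear layers to int8; they are
# dequantized on the fly during inference, cutting model size
# roughly 4x, often with little loss of accuracy.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```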
The Rabbit R1, a standalone device with a 2.88-inch touchscreen that was announced at CES this year, is an early example of the integration of generative AI on edge devices.
Other important emerging trends and 2024–2025 predictions
The following trends and predictions are derived from our comprehensive research and experience, as well as insights shared by leading industry experts:
- LLMs optimized for structured data: LLMs excel in comprehending and generating natural language text, benefiting from extensive training on diverse textual sources, such as books and web pages. Yet, their proficiency in interpreting structured, tabular data remains less developed. Nevertheless, this domain is witnessing burgeoning research, with promising advancements anticipated in 2024 and beyond. A notable initiative in this trajectory is Table-GPT by Microsoft, which signifies a concerted effort to enhance LLMs’ capabilities in processing tabular data by specifically fine-tuning them on such datasets (https://arxiv.org/abs/2310.09263).
- Maturity of LLMOps: In 2023, the focus was predominantly on developing and transitioning Proofs of Concept (PoCs) into production environments. As we progress, the emphasis will shift toward refining and streamlining large language model operations (LLMOps) by leveraging automation and enhancing efficiency. This next phase is poised to attract increased investment from organizations, signaling a commitment to optimize and scale the operational aspects of these advanced AI systems.
- Building products with agentive AI: In Chapter 6, we delved into frameworks for autonomous agents, such as AutoGen, and explored groundbreaking research and applications in this arena. These innovative developments showcase AI systems autonomously interacting and executing tasks (a minimal agent sketch follows this list). As we move through 2024 and the years that follow, we anticipate a surge in products that integrate agentive actions, marking a significant evolution in how AI enhances user productivity.
- Increasing context window: We can expect ongoing progress in the realm of context window capabilities. Google recently unveiled the Gemini 1.5 model, which boasts an impressive context window of 1 million tokens.
- More AI-generated influencers: The popularity of virtual AI avatars is growing, as seen with figures such as Lil Miquela on Instagram, who has millions of followers and partnerships with big brands such as Chanel, Prada, and Calvin Klein, despite being a digital creation. We will continue to see more AI influencers gain popularity in the future.
- Real-time AI: Real-time responsiveness matters a great deal for user experience. As compute costs come down, we will see LLM architectures evolve to deliver faster responses. An example we saw in 2023 was Krea AI's real-time image generation.
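To make the agentive AI trend concrete, here is a minimal two-agent sketch using AutoGen's 0.2-style Python API (the `pyautogen` package). The agent names, configuration values, and task message are illustrative assumptions, the API may differ across AutoGen versions, and an OpenAI API key is assumed to be available in the environment:

```python
from autogen import AssistantAgent, UserProxyAgent

# Model configuration; the model name here is an assumption.
llm_config = {"config_list": [{"model": "gpt-4"}]}

# The assistant plans and writes code; the user proxy executes that
# code locally and feeds the results (or errors) back to the assistant.
assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",  # run fully autonomously, no human turns
    code_execution_config={"work_dir": "scratch", "use_docker": False},
)

# Kick off the loop: the two agents converse until the task is done.
user_proxy.initiate_chat(
    assistant,
    message="Write and test a Python function that validates email addresses.",
)
```

The interesting design point is the loop itself: rather than a single prompt-response exchange, the agents iterate autonomously, executing generated code and refining it based on the observed output, which is exactly the agentive behavior the products we anticipate will build on.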