About tinyAI Forum on Generative AI on the Edge
tinyML Foundation Presents:
A tinyAI Virtual Forum on Generative AI and Foundation Models on the Edge
Dates: March 27-28, 2024
Time: 7-10am US Pacific Time
Format: LIVE and interactive virtual event (no pre-recorded sessions)
Foundation models and generative AI have enabled a new generation of cloud AI applications. Meanwhile, embedded machine learning and tinyML are bringing AI to the edge of the network. As hardware and algorithms become more efficient, these fields are starting to collide.
For the first time ever, this Virtual Forum brought together experts on edge AI, foundation models, and generative AI, bridging connections that help us chart a new frontier for AI. How can we deploy foundation models to the edge? And what becomes possible when we do?
Foundation models are general purpose deep learning models, trained on huge, unlabelled datasets, that can be applied to many tasks. Generative AI models are those designed to produce realistic data: from writing and images to speech, audio, and signals. Edge AI is the deployment of AI to physical devices, where digital hardware meets the real world.
This Forum reviewed the state-of-the-art at the intersection of these fast-evolving fields. It showcased the potentially transformative impacts that foundation models can bring to the edge—including in hardware, software, tooling, applications, and AI design methodologies.
Two broad themes were discussed:
- The current status and future outlook for foundation models and generative AI on resource-constrained embedded devices, and their role within the edge-to-cloud continuum.
- The foundation model and generative AI tools, methodologies, and techniques that can assist tinyML and edge AI developers and designers, including EDA tools, assisted labeling, and synthetic data.
A key goal for the Forum was to be generative itself, inspiring the AI community towards collaborative work!
Schedule
Day 1 (PST)
7:00 am to 7:05 am
Welcome
Session Moderator: Davis SAWYER, Co-founder, Deeplite
7:05 am to 7:40 am
Visual Language Models for Edge AI 2.0
Song HAN, Associate Professor, MIT EECS
7:40 am to 8:00 am
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Zechun LIU, Research Scientist, Meta Reality Labs
8:00 am to 8:20 am
GenAI and the Transformation of Edge Computing: Leveraging Heterogeneous Circuits for Innovation
Jose Angel Miranda CALERO, Post-doc research assistant, Embedded Systems Laboratory, EPFL
Abstract (English)
The integration of Generative AI (GenAI) into edge devices is pushing the boundaries of technology, from enhancing smartphone photography in real time to enabling immediate voice interactions in smart home systems. This presentation will explore the forefront of GenAI applications and their deployment in embedded systems. The challenge of incorporating GenAI into these systems is significant, as it involves local processing and generation of complex data within the constraints of limited computational resources and energy efficiency. To overcome these challenges, the development of specialized accelerators and the adoption of open-source frameworks are critical. A notable solution in this space involves the use of advanced, heterogeneous integrated circuits, which facilitate the rapid development of flexible and reconfigurable solutions. Such systems, characterized by their versatile processing units, offer the necessary balance between adaptability and computational power essential for GenAI tasks. Illustrative examples, including wearable health monitors providing instant diagnostic feedback, demonstrate the practicality of embedding GenAI in edge devices with appropriate strategies. This approach, focusing on the optimization of these sophisticated integrated circuits, outlines a method to navigate beyond the limitations of existing systems, thereby expanding the potential of GenAI in our daily lives.
8:20 am to 8:40 am
Generative AI on the Edge for Connected Vehicles and Mobility
Alok RANJAN, Research Architect, Bosch
Abstract (English)
Background:
Generative Artificial Intelligence (GenAI) is disrupting the technology landscape of several industries, and many research groups have picked up this fast-evolving trend, demonstrating GenAI's capability to unlock new value. From text to images, audio, video, new content and design generation, and synthetic data, it has been extensively explored by early adopters. Although solutions like ChatGPT have become household names, GenAI adoption in industry domains beyond custom chatbots and automation is still a work in progress due to open questions around security and privacy.
Most recently, the automotive industry has been undergoing a transformation toward PACE (Personalization, Autonomous, Connected, and Electrified), and AI-based services are further fueling these technical advancements. From passenger vehicles to commercial vehicles, more advanced sensors now generate high-volume data that is leveraged to offer connected services. In-vehicle connected features and personalization continue to advance, and new feature integrations are in high demand. With the advent of GenAI and its capabilities, it is now feasible to offer hyper-personalization features that take the customer experience to the next level.
Thrust areas:
Although the recent advancements in the GenAI domain have been widely discussed by the community, it is worth mentioning that the majority of applications, from training and retraining to fine-tuning, are largely cloud-native. Traditional cloud-based architectures are limited by network bandwidth, data privacy, latency, and similar constraints, which could be addressed using technologies like Edge Artificial Intelligence (Edge AI) and TinyML. Edge AI has been realized in certain connected-vehicle use cases such as object detection and classification, prognosis, and gesture recognition with control, among others. GenAI on the edge, within vehicular systems or connected vehicles, is yet to become mainstream in the automotive industry. This is motivated by the fact that current architectures, optimization strategies, and fine-tuning on custom business data need research advancements from the edge-ecosystem perspective.
In this presentation, we will discuss best practices and some of the most recent hybrid edge-cloud architectures that have been realized to bring GenAI to the edge for connected vehicles and mobility. In particular, we will first present specific use cases and how GenAI helps offer hyper-personalization services. We will then discuss the advantages and benefits for the edge ecosystem. Future research directions where the community could help advance the domain from the vehicular-ecosystem perspective will also be presented, covering topics such as specialized hardware architectures, optimization strategies, and privacy-preserving techniques like edge federated learning in hybrid architectures.
ViT@Edge: Distilled Vision Transformer based Foundation Model for Efficient Edge Deployment
Hasib-Al RASHID, Ph.D. Student, University of Maryland
Abstract (English)
I. MOTIVATION AND PROBLEM FORMULATION
The rise of large-scale foundational models built on transformer architectures has revolutionized AI capabilities across image recognition (Vision Transformers – ViTs [1]) and natural language processing (e.g., ChatGPT [2]). While these models demonstrate remarkable performance, their massive size and computational requirements present a fundamental obstacle to their deployment on resource-constrained edge devices. For instance, ViT-base [1] contains 86 million parameters, resulting in a 344 MB model – far too large for embedded systems. Our goal is to develop innovative compression techniques that drastically reduce the footprint of foundational transformer models, enabling their widespread adoption in edge and tinyML applications without compromising their breakthrough capabilities.
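The storage arithmetic behind these figures is easy to check. A minimal sketch (the 86 million parameter count and the 344 MB fp32 figure come from the abstract above; the int8 and int4 rows simply apply the standard bits-per-weight scaling and are illustrative):

```python
PARAMS_VIT_BASE = 86_000_000  # ViT-base parameter count, per the abstract

def model_footprint_mb(n_params, bits_per_param):
    """Storage needed for the weights alone, in megabytes."""
    return n_params * bits_per_param / 8 / 1e6

for bits, label in [(32, "fp32"), (8, "int8"), (4, "int4")]:
    print(f"{label}: {model_footprint_mb(PARAMS_VIT_BASE, bits):.0f} MB")
```

At 32-bit precision the weights alone reach the 344 MB cited above, which is why compression, not just faster hardware, is the bottleneck for embedded deployment.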
II. PROPOSED VIT@EDGE
ViT@Edge proposes a novel solution that leverages the strengths of Transformer models within edge computing environments. Transformers, known for their foundational role in various domains such as NLP, Computer Vision, and multimodal areas, offer superior capabilities in modeling long-range dependencies. However, their complexity and the quadratic computational demand of their self-attention mechanism present significant challenges for real-world, industrial deployment, particularly in resource-constrained settings. On the other hand, while Convolutional Neural Networks (CNNs) are celebrated for their efficiency and practicality, especially in industrial applications due to their translation-equivariance bias, they fall short in capturing global information. This is where ViT@Edge comes into play, merging the global information processing power of Transformers with the efficiency and practical deployment capabilities of CNNs. Given the distinct advantages and limitations of both Vision Transformer models and CNNs, exploring Knowledge Distillation (KD) between these two diverse architectures emerges as a fascinating area of study [3]–[5]. KD provides a pathway for distilling knowledge between large CNN models and ViT (Vision Transformer) models, obviating the necessity for extensive labeled datasets to supplement the inductive bias of the latter. This solution aims to address the computational hurdles of traditional Transformer models, making them viable for edge computing applications without sacrificing the comprehensive data understanding that Transformers provide. This approach not only optimizes computational resources but also maintains the adaptability and performance excellence of foundational models in various applications.
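The combined logit-level and representation-level distillation objective can be sketched in plain Python. This is a toy numerical illustration under common KD conventions (Hinton-style temperature-scaled KL on logits plus an MSE feature term); the function names, weighting, and inputs are illustrative, not the actual ViT@Edge training code:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits,
                      student_feats, teacher_feats,
                      temperature=4.0, alpha=0.5):
    """Logit-level KD (KL between softened distributions) combined
    with representation-level KD (MSE between intermediate features)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL(p_t || p_s), scaled by T^2 as in Hinton-style distillation
    logit_loss = (temperature ** 2) * sum(
        pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    # MSE between teacher and student intermediate representations
    feat_loss = sum((t - s) ** 2 for t, s in
                    zip(teacher_feats, student_feats)) / len(teacher_feats)
    return alpha * logit_loss + (1 - alpha) * feat_loss

loss = distillation_loss(
    student_logits=[1.0, 0.2, -0.5],
    teacher_logits=[2.0, 0.1, -1.0],
    student_feats=[0.3, -0.1, 0.8, 0.0],
    teacher_feats=[0.4, -0.2, 0.9, 0.1],
)
print(round(loss, 4))
```

The feature term is what the abstract calls representation-level distillation: it forces the student to match the teacher's internal activations, not just its final predictions.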
III. EXPECTED VIT@EDGE RESULTS
We have shown in [6] that, combining vanilla knowledge distillation with uniform 8-bit quantization, we achieved a 296× memory reduction for a CNN-based multimodal pose classification task. Our real-time deployment on a Raspberry Pi 4B achieves a power efficiency of 303.93 GOP/s/W. With the proposed ViT@Edge, we expect similar or even greater memory compression and power efficiency when deploying to real-time edge processors/devices.
IV. ACTIONS CALL FOR EFFICIENT EDGE DEPLOYMENT
Our approach enhances CNNs by infusing them with global insights from vision transformers through representation-level distillation. This method surpasses traditional logit-based distillation, tapping into the deeper, interdependent knowledge within transformer representations. It is particularly effective for models trained via self-supervised methods, offering a versatile and task-agnostic solution. This task aims to:
• Conduct a comprehensive study of KD between Transformers and CNNs to leverage the strengths of both architectures.
• Develop methodologies for effective knowledge transfer from vision-transformer models to CNNs, focusing on both logits and representation levels.
• Explore and evaluate the impact of transferring global information from vision transformers to CNNs on various applications.
The exploration of Knowledge Distillation between Transformer models and Convolutional Neural Networks serves as a bridge to amalgamate the strengths of both architectures, paving the way for innovative solutions in various domains.
8:40 am to 9:00 am
Solve edge AI problems with foundation models
Daniel SITUNAYAKE, Founding tinyML Engineer, Edge Impulse
Abstract (English)
Generative AI has created a “moment” in technology: suddenly, everyone is aware of what machine learning can do. This talk goes beyond the hype to reveal the four key capabilities of foundation models, and how we can use them to solve real engineering problems at the edge—even when the models seem too big to fit.
9:00 am to 9:20 am
Running an LLM on a Raspberry Pi
Pete WARDEN, CEO, Useful Sensors
Abstract (English)
Large language models like ChatGPT are a great fit for edge devices, but it can be hard to figure out how to get started running them outside of the cloud. In this talk Pete will cover the basics of deploying an open source LLM on a Raspberry Pi 5, explaining some of the options and tradeoffs involved, and discuss the use cases that this sort of system can support. He’ll also discuss what customer problems these models can help with, and how to ensure the inevitable hallucinations don’t pose a safety risk!
9:20 am to 9:40 am
Using Generative Models to Improve Generative Models
Yubei CHEN, Assistant Professor, UC Davis
9:40 am to 10:00 am
Optimizing Large Language Model (LLM) Inference for Arm CPUs
Dibakar GOPE, Principal Engineer, Machine Learning & AI, Arm
10:00 am to 10:20 am
Toward a Foundation Model for Efficient Damage Assessment Following Natural Disasters
Maryam RAHNEMOONFAR, Associate Professor of Computer Science and Engineering, Lehigh University
Abstract (English)
Natural disasters caused by climate change are becoming more frequent and severe. These disasters pose a threat to human health, infrastructure, and natural systems. In order to respond and recover quickly and effectively after a natural disaster such as a hurricane, wildfire, or flooding, access to aerial images is crucial for the response team. Small Unmanned Aerial Vehicles (UAVs) with cost-effective sensors are a great solution for collecting thousands of images with high flexibility and easy maneuverability for rapid response and recovery. Furthermore, UAVs can access hard-to-reach areas and perform data-gathering tasks that are unsafe or impossible for humans. Combining multiple data modalities such as vision, language, and radar data is a promising technique for damage assessment. However, applying deep learning methods to radar time-series images is challenging due to the lack of sufficient training data compared to optical images. Moreover, optical data can be difficult to use due to their limitations in adverse weather conditions. Traditional analyses provide some insights into the data, but the complexity, scale, and multimodal nature of the data require advanced, intelligent solutions. In this presentation, I will discuss some of our current innovative solutions, such as generative models for multimodal imagery and explainable and interactive models for multimodal vision and language perception. Our goal is to provide an accurate damage assessment with multimodal data after a natural disaster and facilitate rapid response and recovery. I will also discuss our current efforts toward developing a foundation model that can be transferred to any robot-based multimodal downstream task with very few labeled data.
10:20 am to 10:30 am
The Voice of AI: Harnessing Voice Meta Data for GenAI Solutions
Rick RADIA, Head of Product, MyVoice AI
Tom BARKER, Senior Machine Learning Engineer, MyVoice AI
Day 2 (PST)
7:00 am to 7:05 am
Welcome + Recap of Day 1
Session Moderator: Davis SAWYER, Co-founder, Deeplite
7:05 am to 7:50 am
Panel: "GenAI is accelerating the Edge"
Participants:
Pete Bernard from EDGECELSIOR
Max Petrenko from AWS
Rajeev Muralidhar from AWS
Sally Ward-Foxton from EE Times
7:50 am to 8:10 am
On-Device Generative AI
Fatih PORIKLI, Sr. Director, Technology, Qualcomm Technologies, Inc.
Abstract (English)
Generative AI emerges as a transformative force, capable of creating new content such as text, code, images, video, audio, or other data, while handling complex dialogues and reasoning about problems. This disruptive technology is reshaping traditional approaches across various domains, from search algorithms and content creation to automation and problem solving, and redefining the user interface to computing devices. Its impact transcends industries, promising substantial advancements in utility, productivity, and efficiency. As the adoption of generative AI accelerates at an unprecedented pace, driving its computational demands to surge, on-device processing becomes more important than ever.
In this presentation, you’ll learn about:
- The pivotal role of on-device AI deployment
- Full-stack AI optimizations to make on-device AI possible and efficient
- Advanced techniques like distillation, quantization, and speculative decoding
- How we deploy generative AI models on device with examples, including LVMs, LLMs, LMMs
- Qualcomm Technologies’ role in scaling on-device generative AI
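Of the techniques listed, speculative decoding is the least widely known. A toy greedy version conveys the idea (illustrative only; production implementations accept or reject draft tokens against the models' probability distributions, and the `target`/`draft` lambdas here are stand-ins for a large and a small language model):

```python
def speculative_decode(target_step, draft_step, prompt, k=4, max_new=12):
    """Greedy speculative decoding sketch: a cheap draft model proposes
    k tokens at a time; the expensive target model verifies them and
    keeps the longest agreeing prefix plus one corrected token, so
    several tokens can be accepted per verification pass."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_step(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target model verifies each proposed position.
        accepted, ctx = [], list(out)
        for t in proposal:
            expected = target_step(ctx)
            if t == expected:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(expected)  # keep the target's correction
                break
        out.extend(accepted)
    return out[len(prompt):][:max_new]

# Toy 'models': next token = (last + 1) mod 10; the draft drifts after a 7.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: (ctx[-1] + 1) % 10 if ctx[-1] != 7 else 9
print(speculative_decode(target, draft, [0], k=4, max_new=12))
```

Because the greedy acceptance rule always falls back to the target's own choice on disagreement, the output matches what the target model alone would produce, only with fewer sequential target-model calls.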
8:10 am to 8:30 am
Architecture 2.0: Prompt Engineering and Foundation Models for Edge AI Hardware Design
Vijay Janapa REDDI, Associate Professor, Harvard University
8:30 am to 8:50 am
Empowering Collaborative Chip Design: Leveraging Generative AI for Custom Accelerators and Edge AI Innovation
Mohamed KASSEM, CTO and Co-Founder, Efabless
Abstract (English)
In this presentation, we delve into how an open-source SoC chip-design platform stimulates collaborative chip design, leveraging generative AI to optimize custom chip designs and to develop tailored accelerators for small machine learning models.
Through collaborative community efforts, we explore the transformative impact of generative AI and custom accelerators, while addressing challenges and future directions in deploying these technologies at the edge.
Join us to discover the convergence of generative AI, custom chip design, and edge AI innovation, and explore new horizons in AI-driven hardware design.
8:50 am to 9:10 am
A technology game changer: How GenAI Will Reshape Learning
Gian Marco IODICE, Team and Tech Lead in the Machine Learning Group, Arm
Milja LAAKSO, Programme Specialist, UNICEF Learning Innovation Hub
Abstract (English)
How can the tech industry come together and harness the rapidly evolving fields of AI, including GenAI, to benefit millions of children around the world who are not learning? As of 2022, 70% of 10-year-olds in low- and middle-income countries are unable to read and understand a simple statement by the end of primary school. The promise of frontier technologies as a catalyst to help children learn is no longer a “nice to have” but an emergency need for reaching ALL children everywhere. Yet as the world has been captivated by the rapid advancements in GenAI, millions of children still miss out on even the benefits of the internet due to lack of electricity, connectivity, and access to devices. This session will explore how technologies combined with AI can be a game changer that could finally bridge the digital divide and help address the learning crisis. This session is a collaborative effort fueled by the innovative partnership between Arm and the UNICEF Global Learning Innovation Hub–Office of Innovation to inspire the tech industry. Together, Arm and UNICEF aim to catalyze change on the global education stage, setting the foundation for alternative futures where every child embarks on a fascinating adventure of learning. Don’t miss the chance to join this impactful session shaping the narrative of education for generations to come.
9:10 am to 9:30 am
Inspired by ‘Her’: AI interaction models we’d like to see in the world
Savannah KUNOVSKY, Managing Director, IDEO
Abstract (English)
When humans design paradigm-shifting technologies, they often create interaction models similar to those of the technologies that came before. Take a look at ChatGPT or other generative AI systems, and you’ll notice they look a lot like a Google search bar. We’re still in the awkward, early phases of AI, where the technology is so new that mainstream interaction models align with what we’re already familiar with—even if they don’t make the most of the new technology. In light of this, we’ve envisioned alternative AI interaction models, drawing inspiration from the film “Her” and the principles of calm technology to propose new ideas for AI interfaces.
9:30 am to 10:00 am
From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges
Sai Krishna Revanth VARUMA, Doctoral Candidate in Computer Science, University of South Carolina
Abstract (English)
Generative Artificial Intelligence (AI) has shown tremendous prospects in all aspects of technology, including design. However, due to its heavy demand on resources, it is usually trained on large computing infrastructure and often made available as a cloud-based service. In this position paper, we consider the potential, challenges, and promising approaches for generative AI for design on the edge, i.e., in resource-constrained settings where memory, compute, energy (battery) and network connectivity may be limited. Adapting generative AI for such settings involves overcoming significant hurdles, primarily in how to streamline complex models to function efficiently in low-resource environments. This necessitates innovative approaches in model compression, efficient algorithmic design, and perhaps even leveraging edge computing. The objective is to harness the power of generative AI in creating bespoke solutions for design problems, such as medical interventions, farm equipment maintenance, and educational material design, tailored to the unique constraints and needs of remote areas. These efforts could democratize access to advanced technology and foster sustainable development, ensuring universal accessibility and environmental consideration of AI-driven design benefits.
AI/ML for Embodied Systems at the Edge: Generative Models, LLMs and Beyond
Andreas ANDREOU, Professor, Johns Hopkins University
Abstract (English)
Problem:
Operation and decision making at the EDGE for embodied applications such as autonomous robotics requires real-time local processing with extreme energy efficiency and low latency. Hardware must perform insightful information extraction from sensing signals to symbols, and distill knowledge into models necessary for reasoning and inference to generate action signals where computations are done by COTS or specialized hardware at the edge. A canonical model of processing in embodied systems is shown in the diagram on the right.
The Role of Generative AI in Robust Reasoning:
Generative AI such as Large Language Models (LLMs) can play a key role in real-time learning and contextual modeling, so that the machine can subsequently perform robust reasoning and produce the desired behavior (actions). Despite the recent explosion of advances in Large Language Models for text (ChatGPT) and images (DALL-E, Midjourney), the computational structures “under the hood” of generative AI can have a broad impact. For example, graphical models such as the diffusion models in DALL-E, or Deep Belief Networks (DBNs), are generative in nature, consisting of multiple layers of nodes connected as Markov random fields where sampling plays a central role. The latter are computationally intensive, necessitating micro-architectural components available on custom hardware such as the SpiNNaker SoC Arm M4 chip multiprocessor or FPGAs. Action recognition without a camera at the edge requires recognizing actions from low-dimensional data (time series of micro-Doppler acoustic signatures), while training on signatures from high-dimensional signals that can be generated from Kinect 3D point-cloud data or a physical model (Knowledge). Continuous life-long learning to keep the Knowledge up to date necessitates generative AI and sampling in Conditional Restricted Boltzmann Machines (CRBMs) or Conditional Deep Belief Networks (CDBNs). An example is shown on the side where a micro-Doppler time series of an action (top) is used to seed a CDBN and the CDBN “hallucinates” the time series of the response (bottom).
Generative AI for next Generation AI machines:
The natural-language prompting of ChatGPT can also be employed to produce synthesizable Verilog for chip design. We have employed ChatGPT-4 for natural-language-driven hardware design. The AI-generated design is a synthesizable and functional Verilog description of an entire programmable Spiking Neuron Array, including an SPI interface, for neurocomputing at the edge. The design was verified in simulation using handcrafted testbenches and is currently being fabricated in SkyWater 130nm CMOS technology through Tiny Tapeout 5 using an open-source EDA flow.
LLM Pipelines: Seamless Integration on Embedded Devices
Enzo RUEDAS, AI Engineering Student, NXP Semiconductors
Abstract (English)
Large Language Models (LLMs) and broader Generative Artificial Intelligence have gained increasing prominence in the AI landscape. Various initiatives, including Hugging Face and libraries such as GGML, have played a crucial role in facilitating the accessibility and development of LLMs. Nevertheless, deploying such models on embedded devices remains extremely challenging, given the inherent constraints of computational power and memory. NXP’s LLM Pipelines project aims to enhance user experience with LLMs on embedded devices, facilitating more accessible deployment and improving human-machine interactions.
This presentation details our solutions for improving LLM porting through quantization and fine-tuning. In particular, our experiments focus on high-end NXP MPUs, such as:
– i.MX 8M Plus featuring a 4x Arm Cortex-A53 Processor and a Neural Processing Unit (NPU)
– i.MX 93 featuring a 2x Arm Cortex-A55 Processor and an NPU
– i.MX 95 featuring a 6x Arm Cortex-A55 Processor and an NPU
When deploying AI models in resource-constrained environments, machine learning quantization techniques offer several significant benefits, including reductions in model size and memory footprint, as well as faster execution time. However, most integer quantization techniques can result in significant accuracy drops, especially in auto-regressive models. The LLM Pipelines project features advanced quantization algorithms, encompassing model compression, dynamic quantization, and the latest post-training static quantization techniques. Our presentation will focus on comparing these different approaches.
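As a rough illustration of why the dynamic-versus-static choice matters, here is a minimal per-tensor affine quantization sketch in plain Python (not NXP's pipeline; the values are invented, and the static calibration range is deliberately too narrow so that clipping error shows up):

```python
def affine_qparams(lo, hi, bits=8):
    """Scale and zero-point mapping [lo, hi] onto the signed int range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard degenerate range
    zero_point = round(qmin - lo / scale)
    return scale, zero_point, qmin, qmax

def quantize(xs, lo, hi, bits=8):
    """Quantize to ints, then dequantize to measure round-trip error."""
    scale, zp, qmin, qmax = affine_qparams(lo, hi, bits)
    q = [min(max(round(x / scale) + zp, qmin), qmax) for x in xs]
    deq = [(v - zp) * scale for v in q]
    return q, deq

acts = [0.02, -0.4, 1.3, 0.75, -0.9, 0.0]

# Dynamic: range computed from this tensor at run time.
_, deq_dyn = quantize(acts, min(acts), max(acts))

# Static: range fixed ahead of time from a calibration set; if the
# calibrated range is too narrow, outliers clip and error grows.
_, deq_stat = quantize(acts, -0.5, 0.5)

err = lambda deq: max(abs(a - b) for a, b in zip(acts, deq))
print(f"dynamic max error: {err(deq_dyn):.4f}")
print(f"static  max error: {err(deq_stat):.4f}")
```

Dynamic quantization adapts the range per tensor and so keeps round-trip error within half a quantization step, at the cost of computing ranges at run time; static quantization is cheaper at inference but depends entirely on how representative the calibration data was.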
On the other hand, most use cases for embedded LLMs necessitate specialization, either to limit computational costs and usage or to mitigate hallucinations and biases. For example, a car assistant should focus on assisting the driver with vehicle-related tasks, avoiding unrelated topics like politics. Using Retrieval Augmented Generation (RAG), we explore various fine-tuning scenarios for the smart assistant, utilizing user-manual knowledge or even interacting with machine sensors. This presentation will address different RAG-related challenges, including constraining the input prompt to meet hardware requirements and handling out-of-topic queries.
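The two RAG constraints just mentioned, fitting the prompt into a hardware budget and refusing out-of-topic queries, can be sketched as a toy in plain Python (word-overlap scoring stands in for real embedding retrieval, and the manual snippets and token budget are invented for illustration):

```python
def score(query, chunk):
    """Word-overlap relevance score (a stand-in for embedding similarity)."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c)

def build_prompt(query, chunks, budget_tokens=40):
    """Retrieve the most relevant manual chunk and fit the prompt into a
    fixed token budget, as an on-device RAG system must."""
    best = max(chunks, key=lambda c: score(query, c))
    if score(query, best) == 0:
        return None  # out-of-topic query: refuse rather than hallucinate
    prompt = f"Answer using only this context: {best} Question: {query}"
    return " ".join(prompt.split()[:budget_tokens])  # crude truncation

manual = [
    "To reset the tire pressure warning, hold the SET button for three seconds.",
    "The infotainment system pairs with phones over Bluetooth from the Settings menu.",
]
print(build_prompt("How do I reset the tire pressure warning?", manual))
print(build_prompt("What do you think about politics?", manual))
```

A real deployment would use an embedding model and a tokenizer-accurate budget, but the shape of the problem is the same: retrieval selects what fits, and a relevance threshold gates what the assistant will answer at all.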
10:00 am to 10:10 am
Edge of Tomorrow: Unleashing the Power of Small LLMs for Generative AI at the Edge
Mallik P. MOTURI, VP Product and Business Development, Syntiant
Abstract (English)
This talk dives into the innovations crafting small, efficient Large Language Models (LLMs) for edge devices. By distilling the essence of LLMs, we achieve real-time processing, circumventing cloud latency and bandwidth issues. We’ll explore advancements in LLM architectures and the strategic optimizations that enable their miniaturization without sacrificing performance. The focus will be on how these developments not only reduce on-chip memory demands but also pave the way for bespoke hardware, tailored to enhance edge AI’s capabilities. Addressing the economic impact, we’ll discuss the direct correlation between memory footprint and chip cost, and how innovations in this realm are crucial for affordable AI deployment.
Schedule subject to change without notice.
Committee
Davis SAWYER
Chair
Deeplite
Evgeni GOUSEV
Qualcomm Research, USA
Gian Marco IODICE
Arm
Tinoosh MOHSENIN
Johns Hopkins University
Danilo PAU
STMicroelectronics
Max PETRENKO
Amazon
Daniel SITUNAYAKE
Edge Impulse
Speakers
Andreas ANDREOU
Johns Hopkins University
Tom BARKER
MyVoice AI
Peter BERNARD
EDGECELSIOR
Jose Angel Miranda CALERO
Embedded Systems Laboratory, EPFL
Yubei CHEN
UC Davis
Aizip
Dibakar GOPE
Arm
Song HAN
MIT EECS
NVIDIA
Gian Marco IODICE
Arm
Mohamed KASSEM
Efabless
Savannah KUNOVSKY
IDEO
Milja LAAKSO
UNICEF Learning Innovation Hub
Zechun LIU
Meta Reality Labs
Terry MERSCHAT
Useful Sensors Inc.
Mallik P. MOTURI
Syntiant
Rajeev MURALIDHAR
Amazon Web Services
Max PETRENKO
Amazon
Fatih PORIKLI
Qualcomm Technologies, Inc.
Rick RADIA
MyVoice AI
Maryam RAHNEMOONFAR
Lehigh University
Alok RANJAN
Bosch
Hasib-Al RASHID
University of Maryland
Vijay Janapa REDDI
Harvard University
MLCommons
Enzo RUEDAS
NXP Semiconductors
Daniel SITUNAYAKE
Edge Impulse
Mohamed SHALAN
Efabless Corporation
Sai Krishna Revanth VARUMA
University of South Carolina
Sally WARD-FOXTON
EE Times
Pete WARDEN
Useful Sensors