Research Symposium - April 22, 2024

The tinyML Research Symposium serves as a flagship venue for research at the intersection of machine learning applications, algorithms, software, and hardware in deeply embedded machine learning systems.


About

The Symposium will be held in conjunction with the tinyML Summit 2024, the premier annual gathering of senior-level technical experts and decision makers representing the fast-growing global tinyML community.

The tinyML Research Symposium will focus on emerging tinyML technology, research, and theory that may come to market in the coming decade, while the tinyML Summit focuses on the technology and solutions available now or in the immediate future. Join us in harnessing the power of the future, today. By attending the 2024 tinyML Summit and becoming part of our dynamic community, you are not just an observer; you are a catalyst unleashing a world of possibilities for tinyML.

Venue

Hyatt Regency San Francisco Airport

1333 Bayshore Highway, Burlingame, CA 94010

Contact us

Rosina HABERL

Technical Program Committee

Farshad Akbari Infineon
Shaahin Angizi New Jersey Institute of Technology
Zain Asgar Stanford
Kshitij Bhardwaj LLNL
Petrut Bogdan Innatera
Alessio Burrello Politecnico di Torino
Francesco Conti University of Bologna
Federico Corradi TU Eindhoven
Marco Donato Tufts
Elisabetta Farella Fondazione Bruno Kessler
Gourav Datta Amazon
Jeremy Holleman University of North Carolina, Charlotte
Foroozan Karimzadeh Georgia Tech
Hana Khamfroush University of Kentucky
Qianyun Lu NXP Semiconductors
Niall Lyons Infineon
Hajar Mousannif Cadi Ayyad University, Morocco
Ankita Nayak Qualcomm
Guilherme Paim KU Leuven
Danilo Pau STMicroelectronics
Christian Peters Bosch
Arman Roohi University of Nebraska
Avik Santra Infineon
Theo Theocharides University of Cyprus
Xiaoxuan Yang University of Virginia

Schedule

7:30 am to 9:00 am

Registration

9:00 am to 9:15 am

Welcome and Opening Statement

Session Moderator: Tinoosh MOHSENIN, Associate Professor, Johns Hopkins University

9:15 am to 10:00 am

Keynote by Prof. Borivoje Nikolic

Session Moderator: Tinoosh MOHSENIN, Associate Professor, Johns Hopkins University

When Considering New Hardware Ideas, Build Complete ML Systems

Borivoje NIKOLIC, Professor, UC Berkeley

Abstract (English)

The need for higher efficiency in running ML applications drives the development of new approaches to their execution in hardware.  However, without considering the complete system implementation, it is often hard to envision where the performance bottlenecks are. To build and evaluate complete solutions we have developed the Chipyard framework, an integrated SoC design, simulation, and implementation environment for specialized compute systems. Chipyard includes configurable, composable, open-source, generator-based IP blocks that can be used across multiple stages of the hardware development flow while maintaining design intent and integration consistency. Through cloud-hosted or on-premises FPGA-accelerated simulation and rapid ASIC implementation, Chipyard enables continuous validation of physically realizable customized systems.  We will showcase the use of this framework for accelerating ML workloads.

10:00 am to 11:20 am

tinyML Algorithms

Session Moderator: Foroozan Karimzadeh, Postdoctoral Fellow, Georgia Institute of Technology

MicroHD: An Accuracy-Driven Optimization of Hyperdimensional Computing Algorithms for TinyML systems

Flavio PONZINA, Postdoctoral Researcher, University of California San Diego

Abstract (English)

Hyperdimensional computing (HDC) is emerging as a promising AI approach that can effectively target TinyML applications thanks to its lightweight computing and memory requirements. Previous works on HDC showed that limiting the standard 10k dimensions of the hyperdimensional space to much lower values is possible, further reducing HDC resource requirements. Similarly, other studies demonstrated that binary values can be used as elements of the generated hypervectors, leading to significant efficiency gains at the cost of some degree of accuracy degradation. Nevertheless, current optimization attempts do not concurrently co-optimize HDC hyper-parameters, and accuracy degradation is not directly controlled, resulting in sub-optimal HDC models whose output quality is unacceptable for several applications.
In this work, we propose MicroHD, a novel accuracy-driven HDC optimization approach that iteratively tunes HDC hyper-parameters, reducing memory and computing requirements while ensuring user-defined accuracy levels. The proposed method can be applied to HDC implementations using different encoding functions, demonstrates good scalability for larger HDC workloads, and achieves compression and efficiency gains up to 200x when compared to baseline implementations for accuracy degradations lower than 1%.
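The hyper-parameters MicroHD tunes (the hypervector dimensionality and element precision) can be pictured with a minimal binary HDC classifier. The following Python sketch is illustrative only; the dimension D, the record-based encoding, the toy data, and all names are assumptions, not the MicroHD implementation.

    # Minimal binary hyperdimensional-computing (HDC) classifier sketch.
    # Illustrative only: D, the record-based encoding and the toy data are
    # assumptions, not the MicroHD implementation.
    import numpy as np

    rng = np.random.default_rng(0)
    D = 2048            # hyperdimensional space size; the kind of knob MicroHD tunes
    N_FEATURES = 16
    N_LEVELS = 8        # quantization levels for feature values

    # Random but fixed ID and level hypervectors with binary {0, 1} elements.
    id_hvs = rng.integers(0, 2, size=(N_FEATURES, D), dtype=np.uint8)
    level_hvs = rng.integers(0, 2, size=(N_LEVELS, D), dtype=np.uint8)

    def encode(x):
        """Record-based encoding: bind (XOR) each feature ID with its level
        hypervector, then bundle all bound pairs with a majority vote."""
        levels = np.clip((x * N_LEVELS).astype(int), 0, N_LEVELS - 1)
        bound = np.bitwise_xor(id_hvs, level_hvs[levels])          # (N_FEATURES, D)
        return (bound.sum(axis=0) > N_FEATURES // 2).astype(np.uint8)

    def train(samples, labels, n_classes):
        """Class prototypes = majority bundle of the encoded training samples."""
        acc = np.zeros((n_classes, D), dtype=np.int32)
        counts = np.zeros(n_classes, dtype=np.int32)
        for x, y in zip(samples, labels):
            acc[y] += encode(x)
            counts[y] += 1
        return (acc > (counts[:, None] / 2)).astype(np.uint8)

    def predict(prototypes, x):
        """Nearest class prototype by Hamming distance."""
        hv = encode(x)
        return int(np.argmin(np.count_nonzero(prototypes != hv, axis=1)))

    # Toy usage: two Gaussian blobs in [0, 1]^16.
    X0 = np.clip(rng.normal(0.3, 0.05, (50, N_FEATURES)), 0, 1)
    X1 = np.clip(rng.normal(0.7, 0.05, (50, N_FEATURES)), 0, 1)
    protos = train(np.vstack([X0, X1]), [0] * 50 + [1] * 50, n_classes=2)
    print(predict(protos, X0[0]), predict(protos, X1[0]))          # expected: 0 1

Shrinking D or the element precision reduces memory and compute roughly proportionally, which is why accuracy-driven tuning of these knobs matters.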

Towards a tailored mixed-precision sub-8bit quantization scheme for Gated Recurrent Units using Genetic Algorithms

Riccardo MICCINI, Phd Student, Technical University of Denmark

Abstract (English)

Despite the recent advances in model compression techniques for deep neural networks, deploying such models on ultra-low-power embedded devices still proves challenging.
In particular, quantization schemes for Gated Recurrent Units (GRU) are difficult to tune due to their dependence on an internal state, preventing them from fully benefiting from sub-8bit quantization.
In this work, we propose a modular integer quantization scheme for GRUs where the bit width of each operator can be selected independently.  We then employ Genetic Algorithms (GA) to explore the vast search space of possible bit widths, simultaneously optimising for model size and accuracy.  We evaluate our methods on four different sequential tasks and demonstrate that mixed-precision solutions exceed homogeneous-precision ones in terms of Pareto efficiency.  In our results, we achieve a model size reduction between 25% and 55% while maintaining an accuracy comparable with the 8-bit homogeneous equivalent.
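To make the search concrete, here is a small genetic-algorithm sketch over per-operator bit widths. The chromosome layout, the fitness function, and the accuracy proxy are invented placeholders; the paper's GA evaluates a quantized GRU on real tasks.

    # Genetic-algorithm search over per-operator bit widths (illustrative sketch).
    # The chromosome, fitness and accuracy proxy are assumptions; the actual work
    # evaluates a quantized GRU on real sequential tasks.
    import random

    random.seed(0)
    N_OPERATORS = 12                  # e.g. GRU gate matmuls, activations, state
    BIT_CHOICES = [2, 3, 4, 5, 6, 7, 8]
    PARAMS_PER_OP = 10_000            # toy parameter count per operator

    def model_size_bits(chromosome):
        return sum(b * PARAMS_PER_OP for b in chromosome)

    def accuracy_proxy(chromosome):
        # Placeholder: assume accuracy saturates as bit widths grow.
        return sum(min(b, 6) for b in chromosome) / (6 * len(chromosome))

    def fitness(chromosome):
        # Scalarized objective; a real search would keep a Pareto front instead.
        size_penalty = model_size_bits(chromosome) / (8 * PARAMS_PER_OP * N_OPERATORS)
        return accuracy_proxy(chromosome) - 0.5 * size_penalty

    def crossover(a, b):
        cut = random.randrange(1, N_OPERATORS)
        return a[:cut] + b[cut:]

    def mutate(c, rate=0.1):
        return [random.choice(BIT_CHOICES) if random.random() < rate else b for b in c]

    population = [[random.choice(BIT_CHOICES) for _ in range(N_OPERATORS)]
                  for _ in range(20)]
    for generation in range(30):
        population.sort(key=fitness, reverse=True)
        parents = population[:10]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(10)]
        population = parents + children

    best = max(population, key=fitness)
    print("best bit widths:", best, "size (bits):", model_size_bits(best))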

Tiny Graph Neural Networks for Radio Resource Management

Ahmad GHASEMI, Lecturer, University of Massachusetts Amherst

Abstract (English)

The surge in demand for efficient radio resource management has necessitated the development of sophisticated yet compact neural network architectures. In this paper, we introduce a novel approach to Graph Neural Networks (GNNs) tailored for radio resource management by presenting a new architecture: the Low Rank Message Passing Graph Neural Network (LR-MPGNN). The cornerstone of LR-MPGNN is the implementation of a low-rank approximation technique that substitutes the conventional linear layers with their low-rank counterparts. This innovative design significantly reduces the model size and the number of parameters without compromising the system’s performance. We evaluate the performance of the proposed LR-MPGNN model based on several key metrics: model size, number of parameters, weighted sum rate as a metric for performance evaluation, and the distribution of eigenvalues of weight matrices. Our extensive evaluations demonstrate that the LR-MPGNN model achieves a sixtyfold decrease in model size, and the number of model parameters can be reduced by up to 98%. Performance-wise, the LR-MPGNN demonstrates robustness with a marginal 2% reduction in the best-case scenario in the normalized weighted sum rate compared to the original MPGNN model. Additionally, the distribution of eigenvalues of the weight matrices in the LR-MPGNN model is more uniform and spans a wider range, suggesting a strategic redistribution of weights.
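The low-rank substitution at the heart of the approach can be sketched as factoring a dense weight matrix into two thin factors. The shapes, the rank, and the SVD-based initialisation below are illustrative assumptions, not the LR-MPGNN code.

    # Low-rank replacement of a linear layer, in the spirit of LR-MPGNN:
    # W (d_out x d_in) is approximated by B @ A with rank r << min(d_out, d_in).
    # Shapes, rank and SVD-based initialisation are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out, rank = 256, 256, 8

    W = rng.normal(size=(d_out, d_in))          # dense weights of the original layer

    # Truncated SVD gives the best rank-r approximation in the Frobenius norm.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    B = U[:, :rank] * S[:rank]                  # (d_out, rank)
    A = Vt[:rank, :]                            # (rank, d_in)

    x = rng.normal(size=(d_in,))
    y_dense = W @ x
    y_lowrank = B @ (A @ x)                     # two thin matmuls instead of one dense

    params_dense = d_out * d_in
    params_lowrank = rank * (d_in + d_out)
    print(f"parameter reduction: {params_dense / params_lowrank:.1f}x")
    print(f"relative output error: "
          f"{np.linalg.norm(y_dense - y_lowrank) / np.linalg.norm(y_dense):.3f}")

The parameter count drops from d_out * d_in to r * (d_in + d_out), which is where the reported model-size reductions come from when the rank is small.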

Wet TinyML: Chemical Neural Network Using Gene Regulation and Cell Plasticity

Samitha SOMATHILAKA, Visiting Research Scholar, University of Nebraska-Lincoln

Abstract (English)

This paper extends the previously introduced Gene Regulatory Neural Network (GRNN) concept towards Wet TinyML, where chemical-based neural network-like structures naturally found in biological cells are extracted for computing. These structures are based on the gene regulatory network, which is transformed into a GRNN by estimating interaction weights. It is proven that GRNNs can be used for conventional computing by employing an application-based search process that matches the application’s requirements to a GRNN subnetwork. In this study, cell plasticity (adaptability to new data) is incorporated to improve search diversity in order to match different applications. Further, the low energy consumption of the GRNN found in this study, along with its physical scale, positions it as a strong contender in the tinyML domain. Finally, as an example application, it is shown that cell plasticity drives mathematical regression evolution, expanding the search space and enabling it to match dynamic system applications. The concept of Wet TinyML can pave the way for the emergence of chemical-based, energy-efficient and miniature Biological AI.

11:20 am to 11:50 am

Break & Networking

11:50 am to 12:20 pm

Short papers

Session Moderator: Danilo PAU, Technical Director, IEEE & ST Fellow, System Research and Applications, STMicroelectronics

Comparing Classic Machine Learning Techniques with Deep Learning for TinyML Human Activity Recognition

Bruno MONTANARI, Master's in Embedded AI, STMicroelectronics & Centro Universitario FEI

Abstract (English)

Human Activity Recognition (HAR) has a significant role in people’s lives by providing relevant information through small sensors and MCUs in daily objects. The analysis and classification of human movements, such as running, walking, climbing up and down stairs, etc., is widely used in healthcare, sports, well-being, and security. The challenge is to perform HAR locally using ultra-low-power MCUs with low inference time, potentially applying On-Device Training for customization and model evolution in a secure way. Although deep learning using 1-D Convolutional Neural Networks (CNN) has achieved high accuracy and performance in HAR, the same can be said about classic machine learning techniques, validated on PCs and smartphones. These techniques can also be implemented on MCUs, thanks to recent improvements in available frameworks. This work compares the inference time, memory density, and power consumption of classic machine learning techniques (Principal Component Analysis with Support Vector Machine, Decision Tree Regression (DTR), and Random Forest) with a deep learning approach (1-D CNN) on an Arm® Cortex®-M33 MCU, demonstrating that using PCA as input to both classic ML and deep learning models can greatly benefit the performance of HAR on MCUs. While the usefulness of PCA depends on the dataset dimensions, it can significantly improve classification performance and reduce memory consumption and inference time. The DTR emerges as a promising alternative for power-efficient, low-inference-time applications.
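The PCA-plus-classic-classifier pipeline the paper compares against deep learning can be sketched in a few lines with scikit-learn. The synthetic windows, component count, and classifier hyper-parameters below are placeholders; the actual work runs quantized equivalents of these models on a Cortex-M33 MCU.

    # PCA front-end feeding classic classifiers (illustrative sketch with synthetic
    # accelerometer windows; the dataset, window size and hyper-parameters are
    # placeholders, not the paper's setup).
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    # Toy stand-in for flattened 3-axis accelerometer windows (128 samples x 3 axes).
    X = rng.normal(size=(600, 384))
    y = rng.integers(0, 4, size=600)          # 4 hypothetical activity classes

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

    pca_svm = make_pipeline(StandardScaler(), PCA(n_components=16), SVC(kernel="rbf"))
    pca_tree = make_pipeline(StandardScaler(), PCA(n_components=16),
                             DecisionTreeClassifier(max_depth=8))

    for name, model in [("PCA+SVM", pca_svm), ("PCA+DT", pca_tree)]:
        model.fit(X_tr, y_tr)
        print(name, "accuracy:", round(model.score(X_te, y_te), 3))

Reducing each window to a handful of principal components is what shrinks both the classifier inputs and the memory needed on the MCU.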

Energy-Aware FPGA Implementation of Spiking Neural Network with LIF Neurons

Mozhgan NAVARDI, PhD Student, Johns Hopkins University

Abstract (English)

Tiny Machine Learning (TinyML) has become a growing field in on-device processing for Internet of Things (IoT) applications, capitalizing on AI algorithms that are optimized for their low complexity and energy efficiency. These algorithms are designed to minimize power and memory footprints, making them ideal for the constraints of IoT devices. Within this domain, Spiking Neural Networks (SNNs) stand out as a cutting-edge solution for TinyML, owing to their event-driven processing paradigm which offers an efficient method of handling dataflow. This paper presents a novel SNN architecture based on the 1st Order Leaky Integrate-and-Fire (LIF) neuron model to efficiently deploy vision-based ML algorithms on TinyML systems. A hardware-friendly LIF design is also proposed, and implemented on a Xilinx Artix-7 FPGA. To evaluate the proposed model, a collision avoidance dataset is considered as a case study. The proposed SNN model is compared to the state-of-the-art works and Binarized Convolutional Neural Network (BCNN) as a baseline. The results show the proposed approach is 86% more energy efficient than the baseline.
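The first-order LIF neuron the architecture builds on is a leaky accumulator with a threshold and reset. The software sketch below uses floating point and arbitrary constants; the paper's hardware-friendly FPGA version would use fixed-point arithmetic and its own parameters.

    # First-order Leaky Integrate-and-Fire (LIF) neuron update (software sketch;
    # the decay constant, threshold and reset scheme are illustrative assumptions).
    import numpy as np

    def lif_step(v, input_current, beta=0.9, v_threshold=1.0):
        """One discrete-time LIF update:
        v[t+1] = beta * v[t] + I[t]; spike and reset when v crosses the threshold."""
        v = beta * v + input_current
        spike = (v >= v_threshold).astype(np.float32)
        v = v - spike * v_threshold          # soft reset by subtraction
        return v, spike

    # Drive one neuron with a constant input current and record its spike train.
    v = np.zeros(1, dtype=np.float32)
    spikes = []
    for t in range(20):
        v, s = lif_step(v, input_current=0.3)
        spikes.append(int(s[0]))
    print(spikes)   # periodic spiking once the membrane potential accumulates

Because the neuron only produces work when it spikes, the resulting dataflow is sparse and event-driven, which is the source of the energy savings reported for SNN hardware.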

12:20 pm to 1:20 pm

Lunch & Networking

1:20 pm to 2:05 pm

Keynote by Houman Homayoun

Revolutionizing Digital Health Research: Bringing Smart Devices with Integrated AI to the Wild

Houman HOMAYOUN, Co-founder, HealtheTile Corporation

Abstract (English)

In the rapidly evolving landscape of digital health, wearable devices have become abundant and ubiquitous, providing an unprecedented opportunity for Artificial Intelligence (AI) to play a pivotal role in transforming how we approach medical research and patient care. My talk will first delve into the dynamic intersection of digital health using wearable devices and AI, focusing on its capacity to uncover hidden patterns in complex data sets. Despite the promising advancements, researchers in this field face three critical challenges: raw data access, model deployment, and end-to-end model testing and evaluation.
Firstly, access to raw physiological data, particularly in naturalistic, uncontrolled environments, remains a significant hurdle, limiting the potential for comprehensive analysis and insight generation. Secondly, deploying tiny Machine Learning (ML) models on resource-constrained devices poses a unique set of technological challenges, especially in balancing efficiency and functionality. Lastly, the testing and evaluation of these solutions, particularly in assessing their effectiveness in terms of power, performance, and algorithm trade-offs, need to occur in naturalistic scenarios and on real platforms to ensure practical applicability.
In my talk, I will address these challenges and offer solutions developed through collaborative efforts between UC Davis and HealtheTile teams. We have developed an ecosystem that aims to revolutionize digital health research and ease the burden on patients and research sites. This ecosystem allows for seamless research conduct in clinical settings and various everyday environments, addressing the pivotal challenges of raw physiological data access, efficient ML model deployment, and comprehensive end-to-end testing and evaluation of the solution. Our integrated approach not only empowers researchers to navigate the complex landscape of digital health but also sets the stage for significant advancements in patient-centric care and precise medical research.

2:05 pm to 3:05 pm

tinyML Hardware and Systems

Session Moderator: Zain ASGAR, Adjunct Professor Of Computer Science, Stanford University

Boosting keyword spotting through on-device learnable user speech characteristics

Lukas CAVIGELLI, Principal Researcher, Huawei

Abstract (English)

Keyword spotting systems for always-on TinyML-constrained applications require on-site tuning to boost the accuracy of offline trained classifiers when deployed in unseen inference conditions. Adapting to the speech peculiarities of target users requires many in-domain samples, often unavailable in real-world scenarios. Furthermore, current on-device learning techniques rely on computationally intensive and memory-hungry backbone update schemes, unfit for always-on, battery-powered devices. In this work, we propose a novel on-device learning architecture, composed of a pretrained backbone and a user-aware embedding learning the user’s speech characteristics. The so-generated features are fused and used to classify the input utterance. For domain shifts generated by unseen speakers, we measure error rate reductions of up to 19% (from 30.1% to 24.3%) on the 35-class problem of the Google Speech Commands dataset, through the inexpensive update of the user projections. We moreover demonstrate the few-shot learning capabilities of our proposed architecture in sample- and class-scarce learning conditions. With 23.7k parameters and 1 MFLOP per epoch required for on-device training, our system is feasible for TinyML applications aimed at battery-powered microcontrollers.
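The key idea, updating only a small user embedding while the backbone stays frozen, can be sketched as follows. The dimensions, the fusion by concatenation, and the training loop are assumptions for illustration, not the architecture described in the talk.

    # On-device adaptation sketch: a frozen keyword-spotting backbone fused with a
    # small learnable user embedding; only the user embedding is updated on-device.
    # Dimensions, fusion by concatenation and the toy data are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    FEAT_DIM, USER_DIM, N_CLASSES = 64, 16, 35

    # Pretrained, frozen parameters (random stand-ins here).
    backbone_W = rng.normal(scale=0.1, size=(FEAT_DIM, 40))      # MFCC frame -> features
    classifier_W = rng.normal(scale=0.1, size=(N_CLASSES, FEAT_DIM + USER_DIM))

    # The only on-device trainable state: this user's embedding.
    user_embedding = np.zeros(USER_DIM)

    def forward(mfcc_frame):
        feat = np.tanh(backbone_W @ mfcc_frame)                  # frozen backbone features
        fused = np.concatenate([feat, user_embedding])           # fuse user characteristics
        return classifier_W @ fused                              # keyword logits

    def adapt(mfcc_frame, label, lr=0.05):
        """One SGD step on the user embedding only (softmax cross-entropy)."""
        logits = forward(mfcc_frame)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        grad_logits = probs.copy()
        grad_logits[label] -= 1.0
        grad_user = classifier_W[:, FEAT_DIM:].T @ grad_logits   # gradient w.r.t. embedding
        user_embedding[:] = user_embedding - lr * grad_user

    # Usage: adapt with a handful of in-domain utterances from the target speaker.
    for _ in range(10):
        adapt(rng.normal(size=40), label=3)
    print("adapted embedding norm:", float(np.linalg.norm(user_embedding)))

Updating only the USER_DIM-sized vector keeps the memory and compute of on-device training far below a full backbone update, which is what makes the scheme viable on battery-powered devices.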

Simulating Battery-Powered TinyML Systems Optimised using Reinforcement Learning in Image-Based Anomaly Detection

Jared PING, Ph.D. Student, University of the Witwatersrand

Abstract (English)

Advances in Tiny Machine Learning (TinyML) have bolstered the creation of smart industry solutions, including smart agriculture, healthcare and smart cities. Whilst related research contributes to enabling TinyML solutions on constrained hardware, there is a need to amplify real-world applications by optimising energy consumption in battery-powered systems. The work presented extends and contributes to TinyML research through the optimisation of battery-powered image-based anomaly detection Internet of Things (IoT) systems. Whilst previous work in this area has yielded the capabilities of on-device inferencing and training, there has yet to be an investigation into optimising the management of such capabilities using machine learning approaches, such as Reinforcement Learning (RL), to improve the deployment battery life of such systems. Using modelled simulations, the battery life effects of an RL algorithm are benchmarked against static and dynamic optimisation approaches, with the foundation laid for a hardware benchmark to follow. It is shown that using RL within a TinyML-enabled IoT system to optimise the system operations, including cloud anomaly processing and on-device training, yields an improved battery life of 22.86% and 10.86% compared to static and dynamic optimisation approaches respectively. The proposed solution can be deployed to resource-constrained hardware, given its low memory footprint of 800 B, which could be further reduced. This further facilitates the real-world deployment of such systems, including key sectors such as smart agriculture.
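As a loose illustration of using RL to manage device operations under a battery budget, the sketch below trains a tabular Q-learning agent that picks an action per time step given a discretized battery level. The states, actions, energy costs, and rewards are invented for illustration and are not the paper's formulation or simulation model.

    # Tabular Q-learning sketch for battery-aware operation scheduling.
    # States, actions, costs and rewards are invented placeholders.
    import numpy as np

    rng = np.random.default_rng(0)
    ACTIONS = ["sleep", "infer_on_device", "offload_to_cloud"]
    ENERGY_COST = {"sleep": 0.1, "infer_on_device": 1.0, "offload_to_cloud": 2.5}
    DETECTION_REWARD = {"sleep": 0.0, "infer_on_device": 0.8, "offload_to_cloud": 1.0}
    N_BATTERY_BINS = 10

    Q = np.zeros((N_BATTERY_BINS, len(ACTIONS)))

    def battery_bin(level):
        return min(int(level / 100.0 * N_BATTERY_BINS), N_BATTERY_BINS - 1)

    alpha, gamma, epsilon = 0.1, 0.95, 0.1
    for episode in range(500):
        battery = 100.0
        while battery > 1.0:
            s = battery_bin(battery)
            a = rng.integers(len(ACTIONS)) if rng.random() < epsilon else int(Q[s].argmax())
            battery -= ENERGY_COST[ACTIONS[a]]
            reward = DETECTION_REWARD[ACTIONS[a]] - 0.01 * ENERGY_COST[ACTIONS[a]]
            s_next = battery_bin(max(battery, 0.0))
            Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

    print("greedy action per battery bin:", [ACTIONS[int(i)] for i in Q.argmax(axis=1)])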

CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware

Souvik KUNDU, Research Scientist, Intel AI Lab

Abstract (English)

With the recent growth in demand for large-scale deep neural networks, compute-in-memory (CiM) has emerged as a prominent solution to alleviate bandwidth and on-chip interconnect bottlenecks that constrain von Neumann architectures. However, the construction of CiM hardware poses a challenge as any specific memory hierarchy in terms of cache sizes and memory bandwidth at different interfaces may not be ideally matched to any neural network’s attributes, such as tensor dimension and arithmetic intensity, thus leading to suboptimal and under-performing systems. Despite the success of neural architecture search (NAS) techniques in yielding efficient sub-networks for a given hardware metric budget (e.g., DNN execution time or latency), they assume the hardware configuration to be frozen, often yielding sub-optimal sub-networks for a given budget. In this paper, we present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for CiM architectures, creating a Pareto-optimal frontier of downstream task accuracy and execution metrics (e.g., latency). The proposed framework can comprehend the complex interplay between a sub-network’s performance and the CiM hardware configuration choices, including bandwidth, processing element size, and memory size. Exhaustive experiments on different model architectures from both CNN and Transformer families demonstrate the efficacy of CiMNet in finding co-optimized sub-networks and CiM hardware configurations. Specifically, for similar ImageNet classification accuracy as the baseline ViT-B, optimizing only the model architecture increases performance (or reduces workload execution time) by 1.7x, while optimizing both the model architecture and the hardware configuration increases it by 3.1x. We believe CiMNet provides a novel paradigm of algorithm-hardware co-design for arriving at near-optimal and synergistic DNN algorithms and hardware.
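A joint search over (sub-network, hardware configuration) pairs can be pictured with a tiny grid and toy cost models, keeping the Pareto front of accuracy versus execution time. The analytical models below are invented placeholders, not CiMNet's predictors.

    # Exhaustive joint search over a tiny (sub-network, CiM configuration) grid,
    # keeping a Pareto front of accuracy vs. latency. Cost models are toy placeholders.
    import itertools

    DEPTHS = [6, 9, 12]                # candidate sub-network depths
    WIDTHS = [256, 384, 512]           # candidate sub-network widths
    PE_SIZES = [64, 128, 256]          # processing-element array sizes
    BANDWIDTHS = [8, 16, 32]           # memory-interface bandwidths (GB/s)

    def accuracy(depth, width):
        # Toy accuracy proxy: bigger models score higher.
        return 0.70 + 0.01 * depth + 0.0002 * width

    def latency(depth, width, pe, bw):
        compute = depth * width * width / (pe * 1e3)      # toy compute-bound term
        traffic = depth * width * 4 / (bw * 1e3)          # toy data-movement term
        return compute + traffic

    candidates = [((d, w, pe, bw), accuracy(d, w), latency(d, w, pe, bw))
                  for d, w, pe, bw in itertools.product(DEPTHS, WIDTHS, PE_SIZES, BANDWIDTHS)]

    def pareto_front(points):
        """Keep the points not dominated in (higher accuracy, lower latency)."""
        front = [p for p in points
                 if not any(a2 >= p[1] and l2 <= p[2] and (a2, l2) != (p[1], p[2])
                            for _, a2, l2 in points)]
        return sorted(front, key=lambda p: p[2])

    for cfg, acc, lat in pareto_front(candidates)[:5]:
        print(f"(depth, width, PEs, BW)={cfg}  acc~{acc:.3f}  latency~{lat:.2f}")

Fixing the hardware columns of this grid and searching only the model columns is the frozen-configuration setting the abstract argues against; searching both jointly exposes the better trade-off points.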

3:05 pm to 3:35 pm

Break & Networking

3:35 pm to 4:35 pm

tinyML Applications

Session Moderator: Petrut BOGDAN, Neuromorphic Architect, Innatera

TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Hardware

Hasib-Al RASHID, Ph.D. Student, University of Maryland

Abstract (English)

Traditional machine learning models often require powerful hardware, making them unsuitable for deployment on resource-limited devices. Tiny Machine Learning (tinyML) has emerged as a promising approach for running machine learning models on these devices, but integrating multiple data modalities into tinyML models still remains a challenge due to increased complexity, latency, and power consumption. This paper proposes TinyVQA, a novel multimodal deep neural network for visual question answering tasks that can be deployed on resource-constrained tinyML hardware. TinyVQA leverages a supervised attention-based model to learn how to answer questions about images using both vision and language modalities. Distilled knowledge from the supervised attention-based VQA model trains the memory-aware, compact TinyVQA model, and a low bit-width quantization technique is employed to further compress the model for deployment on tinyML devices. The TinyVQA model was evaluated on the FloodNet dataset, which is used for post-disaster damage assessment. The compact model achieved an accuracy of 79.5%, demonstrating the effectiveness of TinyVQA for real-world applications. Additionally, the model was deployed on a Crazyflie 2.0 drone equipped with an AI deck and GAP8 microprocessor. The TinyVQA model achieved a low latency of 56 ms and consumed 693 mW while deployed on the tiny drone, showcasing its suitability for resource-constrained embedded systems.
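The basic structure of a compact VQA head, projecting each modality and fusing them before classifying over a fixed answer set, can be sketched as below. The shapes, the element-wise fusion, and the answer vocabulary are assumptions; the paper's attention distillation and low-bit quantization are omitted.

    # Minimal multimodal fusion for VQA over a fixed answer set (illustrative only;
    # shapes, fusion scheme and answer vocabulary are assumptions).
    import numpy as np

    rng = np.random.default_rng(0)
    IMG_DIM, TXT_DIM, FUSED_DIM, N_ANSWERS = 128, 64, 32, 10

    W_img = rng.normal(scale=0.1, size=(FUSED_DIM, IMG_DIM))
    W_txt = rng.normal(scale=0.1, size=(FUSED_DIM, TXT_DIM))
    W_out = rng.normal(scale=0.1, size=(N_ANSWERS, FUSED_DIM))

    def answer_logits(image_features, question_features):
        img = np.tanh(W_img @ image_features)        # project the vision modality
        txt = np.tanh(W_txt @ question_features)     # project the language modality
        fused = img * txt                            # cheap element-wise fusion
        return W_out @ fused                         # scores over the fixed answer set

    logits = answer_logits(rng.normal(size=IMG_DIM), rng.normal(size=TXT_DIM))
    print("predicted answer id:", int(np.argmax(logits)))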

SpokeN-100: A Cross-Lingual Benchmarking Dataset for the Classification of Spoken Numbers in Different Languages

René GROH, Ph.D. Student, Friedrich-Alexander-Universität Erlangen-Nürnberg

Abstract (English)

Benchmarking plays a pivotal role in assessing and enhancing the performance of compact deep learning models designed for execution on resource-constrained devices, such as microcontrollers. Our study introduces a novel, entirely artificially generated benchmarking dataset tailored for speech recognition, representing a core challenge in the field of tiny deep learning. SpokeN-100 consists of the numbers 0 to 99 spoken by 32 different speakers in four different languages, namely English, Mandarin, German and French, resulting in 12,800 audio samples. We determine auditory features and use UMAP as a dimensionality reduction method to show the diversity and richness of the dataset. To highlight the use case of the dataset, we introduce two benchmark tasks: given an audio sample, classify (i) the used language and/or (ii) the spoken number. We optimized state-of-the-art deep neural networks and performed an evolutionary neural architecture search to find tiny architectures optimized for the 32-bit ARM Cortex-M4 nRF52840 microcontroller. Our results represent the first benchmark data achieved for SpokeN-100.
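A feature-extraction-plus-UMAP analysis of an audio dataset, in the spirit of the diversity study described above, could look like the sketch below. The directory layout, MFCC settings, and UMAP parameters are placeholders, not the authors' exact pipeline.

    # MFCC features followed by a 2-D UMAP projection for an audio dataset.
    # File paths, MFCC settings and UMAP parameters are illustrative placeholders.
    import glob
    import numpy as np
    import librosa                     # pip install librosa
    import umap                        # pip install umap-learn

    def mfcc_features(path, sr=16000, n_mfcc=20):
        audio, _ = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
        # Summarize each utterance by per-coefficient mean and standard deviation.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    files = sorted(glob.glob("spoken_100/**/*.wav", recursive=True))  # hypothetical layout
    features = np.stack([mfcc_features(f) for f in files])

    embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(features)
    print(embedding.shape)             # (n_samples, 2): ready for a scatter plot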

Scheduled Knowledge Acquisition on Lightweight Vector Symbolic Architectures for Brain-Computer Interfaces

Yejia LIU, PhD Student, University of California Riverside

Abstract (English)

Brain-computer interfaces (BCIs) are typically designed to be lightweight and responsive in real time to provide users with timely feedback. Classical feature engineering is computationally efficient but has low accuracy, whereas recent deep neural networks (DNNs) improve accuracy but are computationally expensive and incur high latency. As a promising alternative, the low-dimensional computing (LDC) classifier based on vector symbolic architecture (VSA) achieves a small model size yet higher accuracy than classical feature engineering methods. However, its accuracy still lags behind that of modern DNNs, making it challenging to process complex brain signals.
To improve the accuracy of a small model, knowledge distillation is a popular method. However, maintaining a constant level of distillation between the teacher and student models may not be the best strategy for a growing student across its progressive learning stages.
In this work, we propose a simple scheduled knowledge distillation method based on curriculum data order that enables the student to gradually build knowledge from the teacher model, controlled by an α scheduler. Meanwhile, we employ the LDC/VSA classifier as the student model to enhance on-device inference efficiency for tiny BCI devices that demand low latency. Empirical results demonstrate that our approach achieves a better tradeoff between accuracy and hardware efficiency than other methods.
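The role of the α scheduler can be illustrated with a distillation loss whose weight on the teacher's soft targets decays over training. The linear schedule, temperature, and toy logits below are assumptions; the paper's curriculum data ordering and LDC/VSA student are not shown here.

    # Scheduled knowledge distillation sketch: alpha weights the teacher's soft
    # targets and follows a schedule over training. The linear schedule, the
    # temperature and the toy logits are illustrative assumptions.
    import numpy as np

    def softmax(z, temperature=1.0):
        z = z / temperature
        e = np.exp(z - z.max())
        return e / e.sum()

    def alpha_schedule(step, total_steps, alpha_start=0.9, alpha_end=0.1):
        """Linearly decay the distillation weight as the student matures."""
        t = step / max(total_steps - 1, 1)
        return alpha_start + t * (alpha_end - alpha_start)

    def distillation_loss(student_logits, teacher_logits, label, alpha, temperature=4.0):
        p_student = softmax(student_logits)
        p_student_T = softmax(student_logits, temperature)
        p_teacher_T = softmax(teacher_logits, temperature)
        hard = -np.log(p_student[label] + 1e-12)                     # cross-entropy with label
        soft = np.sum(p_teacher_T * (np.log(p_teacher_T + 1e-12)     # KL(teacher || student)
                                     - np.log(p_student_T + 1e-12)))
        return alpha * (temperature ** 2) * soft + (1 - alpha) * hard

    rng = np.random.default_rng(0)
    TOTAL_STEPS = 5
    for step in range(TOTAL_STEPS):
        a = alpha_schedule(step, TOTAL_STEPS)
        loss = distillation_loss(rng.normal(size=10), rng.normal(size=10), label=3, alpha=a)
        print(f"step {step}: alpha={a:.2f} loss={loss:.3f}")

Early in training the student mostly imitates the teacher's soft targets; as α decays, the hard labels dominate, which is the scheduling idea the abstract describes.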

Schedule subject to change without notice.

Committee

Tinoosh MOHSENIN

General Co-Chair

Johns Hopkins University

Paul WHATMOUGH

General Co-Chair

Qualcomm

Wolfgang FURTNER

Program Co-Chair

Infineon Technologies

Charlotte FRENKEL

Program Co-Chair

Delft University of Technology

Boris MURMANN

Steering Committee

Stanford University

Eiman KANJO

Publication Chair

Imperial College London

Marian VERHELST

KU Leuven

Vijay JANAPA REDDI

Harvard University

Speakers

Borivoje NIKOLIC

Morning Keynote

UC Berkeley

Houman HOMAYOUN

Afternoon Keynote

HealtheTile Corporation

Lukas CAVIGELLI

Huawei

Ahmad GHASEMI

University of Massachusetts Amherst

René GROH

Friedrich-Alexander-Universität Erlangen-Nürnberg

Souvik KUNDU

Intel AI Lab

Yejia LIU

University of California Riverside

Riccardo MICCINI

Technical University of Denmark

Bruno MONTANARI

STMicroelectronics & Centro Universitario FEI

Mozhgan NAVARDI

Johns Hopkins University

Jared PING

University of the Witwatersrand

Flavio PONZINA

University of California San Diego

Hasib-Al RASHID

University of Maryland

Samitha SOMATHILAKA

University of Nebraska-Lincoln

Sponsors
