Published in 16th Annual IEEE Conference on Networking, Sensing and Control, 2010
Networked applications with heterogeneous sensors are a growing source of data. Such applications use machine learning (ML) to make real-time predictions. Currently, features from all sensors are collected in a centralized cloud-based tier to form the whole feature vector for ML prediction. This approach incurs high communication cost, which wastes energy and often bottlenecks the network. In this work, we study how the inference computation of several popular ML models can be factored over a hierarchy of IoT devices to reduce communication by computing partial inference results locally on devices beyond the edge. We introduce exact factoring algorithms for some models that preserve accuracy, and present approximations for others that offer high accuracy while reducing communication. Measurements on a common IoT device show that energy use and latency can be reduced by up to 63% and 67%, respectively, without reducing accuracy relative to sending all data to the cloud.
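To make the factoring idea concrete, here is a minimal sketch for one model family where exact factoring is possible: a linear model whose feature vector is partitioned across devices. Each device sends a single scalar partial score instead of its raw features, and the cloud sums the partials and adds the bias, recovering the centralized prediction exactly. The function and variable names (`device_partial`, `cloud_aggregate`, the example shards) are illustrative assumptions, not the paper's implementation.

```python
def device_partial(weights, features):
    """Computed on-device: partial dot product over the locally held features."""
    return sum(w * x for w, x in zip(weights, features))

def cloud_aggregate(partials, bias):
    """Computed in the cloud: sum the scalar partials and add the bias."""
    return sum(partials) + bias

# Example: a 6-dimensional feature vector split across 3 devices.
weights  = [[0.5, -1.2], [2.0, 0.1], [-0.3, 0.7]]   # per-device weight shards
features = [[1.0,  2.0], [0.5, 4.0], [ 3.0, 1.0]]   # per-device sensor readings
bias = 0.25

# Each device transmits one number instead of its full feature shard.
partials = [device_partial(w, x) for w, x in zip(weights, features)]
prediction = cloud_aggregate(partials, bias)

# Centralized reference: full dot product over the concatenated vector.
flat_w = [w for shard in weights for w in shard]
flat_x = [x for shard in features for x in shard]
centralized = sum(w * x for w, x in zip(flat_w, flat_x)) + bias
assert abs(prediction - centralized) < 1e-12  # exact factoring: results agree
```

Communication drops from one message per feature to one scalar per device; for nonlinear models (e.g., with feature interactions spanning devices), such a decomposition is generally only approximate, which motivates the paper's approximation algorithms.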