A comparative evaluation of systems for scalable linear algebra-based analytics

Published in Proceedings of the VLDB Endowment, 2018

The growing use of statistical and machine learning (ML) algorithms to analyze large datasets has given rise to new systems to scale such algorithms. But implementing new scalable algorithms in low-level languages is a painful process, especially for enterprise and scientific users. To mitigate this issue, a new breed of systems expose high-level bulk linear algebra (LA) primitives that are scalable. By composing such LA primitives, users can write analysis algorithms in a higher-level language, while the system handles scalability issues. But there is little work on a unified comparative evaluation of the scalability, efficiency, and effectiveness of such “scalable LA systems.” We take a major step towards filling this gap. We introduce a suite of LA-specific tests based on our analysis of the data access and communication patterns of LA workloads and their use cases. Using our tests, we perform a comprehensive empirical comparison of a few popular scalable LA systems: MADlib, MLlib, SystemML, ScaLAPACK, SciDB, and TensorFlow using both synthetic data and a large real-world dataset. Our study has revealed several scalability bottlenecks, unusual performance trends, and even bugs in some systems. Our findings have already led to improvements in SystemML, with other systems’ developers also expressing interest. All of our code and data scripts are available for download at

Recommended citation:

Hierarchical and Distributed Machine Learning Inference Beyond the Edge

Published in 16th Annual IEEE Conference on Networking Sensing and Control, 2010

Networked applications with heterogeneous sensors are a growing source of data. Such applications use machine learning (ML) to make real-time predictions. Currently, features from all sensors are collected in a centralized cloud-based tier to form the whole feature vector for ML prediction. This approach has high communication cost, which wastes energy and often bottlenecks the network. In this work, we study how the inference computation of several popular ML models can be factored over a hierarchy of IoT devices to reduce communication by computing partial inference results locally on devices beyond the edge. We introduce exact factoring algorithms for some models which preserve accuracy and present approximations for others that offer high accuracy while reducing communication. Measurements on a common IoT device show that energy use and latency can be reduced by up to 63% and 67% respectively without reducing accuracy relative to sending all data to the cloud.

Download here