Saurabh Agarwal

(University of Wisconsin-Madison)
hosted by Laurent Bindschaedler

"Reducing Data Movement to Accelerate Machine Learning"

(MPI-SWS in cooperation with the Department of Computer Science)

Training and inference of machine learning models have become dominant workloads in data centers. In this talk, I will first show how existing system designs can make communication a bottleneck, specifically in the context of distributed training of ML models. I will then introduce Bagpipe, a system that improves the training throughput of recommendation models by reducing the overhead of remote embedding accesses. Bagpipe builds an oracular cache with the aid of our novel lookahead algorithm and achieves up to 5.6x higher training throughput while providing the same convergence and reproducibility guarantees as synchronous training. Finally, I will present CHAI (Clustered Head Attention for Inference), a new inference-time method that reduces the memory-bandwidth bottleneck of LLM inference. CHAI dynamically removes redundant heads in multi-head attention, improving LLM inference latency by up to 1.7x and reducing the size of the KV cache by up to 20%.
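To make the lookahead idea concrete, here is a minimal, hypothetical simulation (the function name and structure are illustrative, not Bagpipe's actual code): with advance knowledge of upcoming batches, a cache of fixed capacity can evict the embedding ID whose next use is farthest in the future, in the style of Belady's optimal replacement policy, and count how many remote fetches remain.

```python
def plan_cache(future_batches, capacity):
    """Illustrative sketch: simulate an embedding cache that uses a
    lookahead window over future batches as an oracle for eviction.

    future_batches: list of batches, each a list of embedding IDs.
    Returns the number of (prefetchable) remote fetches incurred."""
    cache = set()
    fetches = 0
    for t, batch in enumerate(future_batches):
        for emb_id in batch:
            if emb_id in cache:
                continue  # cache hit: no remote access needed
            fetches += 1  # would trigger a remote embedding read
            if len(cache) >= capacity:
                # Evict the cached ID whose next use is farthest away
                # (Belady-style), as revealed by the lookahead window.
                def next_use(x):
                    for dt, later in enumerate(future_batches[t + 1:]):
                        if x in later:
                            return dt
                    return float("inf")  # never used again
                cache.discard(max(cache, key=next_use))
            cache.add(emb_id)
    return fetches
```

For example, with batches `[[1, 2], [1, 3], [2, 4]]` and capacity 2, the oracle keeps ID 2 resident across the window and incurs 4 fetches, whereas an LRU cache would evict it and incur 5.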
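For intuition about the CHAI direction, here is a rough, hypothetical sketch (not CHAI's actual algorithm): heads whose attention patterns are similar are grouped, and each head reuses its cluster representative's attention weights, so fewer score matrices need to be computed and kept. The greedy clustering rule below is an assumption made for brevity.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def clustered_attention(q, k, v, n_clusters):
    """Illustrative sketch of clustered head attention.

    q, k, v: arrays of shape (heads, seq, dim).
    Heads with similar attention-weight matrices share one set of
    weights, so only n_clusters score matrices are retained."""
    heads, seq, dim = q.shape
    # Per-head attention weights (used here to measure head similarity).
    scores = np.einsum("hqd,hkd->hqk", q, k) / np.sqrt(dim)
    probs = softmax(scores)
    # Assumed-for-illustration clustering: the first n_clusters heads act
    # as representatives; every head joins the closest representative.
    reps = list(range(n_clusters))
    assign = [min(reps, key=lambda r: np.abs(probs[h] - probs[r]).sum())
              for h in range(heads)]
    # Each head reuses its representative's attention weights.
    out = np.stack([probs[assign[h]] @ v[h] for h in range(heads)])
    return out
```

With `n_clusters` equal to the number of heads this reduces to standard multi-head attention; shrinking `n_clusters` trades a small approximation for fewer score matrices to compute and cache.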

Bio: Saurabh is a fifth-year PhD student at the University of Wisconsin-Madison. He works in the area of systems for machine learning, building new systems for emerging ML workloads to make training and inference faster, more scalable, and more efficient. His work has been published at MLSys, NeurIPS, ICML, SOSP, and EuroSys.


Time: Wednesday, 08.05.2024, 15:00
Place: MPI-SWS Saarbrücken, E 1 5, room 002
Video: https://zoom.us/j/96681414048?pwd=ZEllbHNBYUl1ZGRTVGozZjVYSXBOQT09

Download the event as an iCal file and import it into your calendar.