Breakthrough Low-Latency, High-Energy-Efficiency LLM Inference Performance Using NorthPole
Rathinakumar Appuswamy, Michael V. Debole, Brian Taba, Steven K. Esser, Andrew S. Cassidy, Arnon Amir, Alexander Andreopoulos, Deepika Bablani, Pallab Datta, Jeffrey A. Kusnitz, Nathaniel J. McClatchey, Neil McGlohon, Jeffrey L. McKinstry, Tapan K. Nayak, Daniel F. Smith, Rafael Sousa, Ignacio Terrizzano, Filipp Akopyan, Peter J. Carlson, Rajamohan Gandhasri, Guillaume J. Garreau, Nelson M. Gonzalez, Megumi Ito, Jennifer L. Klamo, Yutaka Nakamura, Carlos Ortega Otero, William P. Risk, Jun Sawada, Kai Schleupen, Jay Sivagnaname, Matthew Stallone, Takanori Ueda, Myron D. Flickner, John V. Arthur, Rameswar Panda, David D. Cox, Dharmendra S. Modha

2024 IEEE High Performance Extreme Computing Conference (HPEC), 2024
Keywords: AI accelerators, large language model