
Breakthrough Low-Latency, High-Energy-Efficiency LLM Inference Performance Using NorthPole

Rathinakumar Appuswamy, Michael V. Debole, Brian Taba, Steven K. Esser, Andrew S. Cassidy, Arnon Amir, Alexander Andreopoulos, Deepika Bablani, Pallab Datta, Jeffrey A. Kusnitz, Nathaniel J. McClatchey, Neil McGlohon, Jeffrey L. McKinstry, Tapan K. Nayak, Daniel F. Smith, Rafael Sousa, Ignacio Terrizzano, Filipp Akopyan, Peter J. Carlson, Rajamohan Gandhasri, Guillaume J. Garreau, Nelson M. Gonzalez, Megumi Ito, Jennifer L. Klamo, Yutaka Nakamura, Carlos Ortega Otero, William P. Risk, Jun Sawada, Kai Schleupen, Jay Sivagnaname, Matthew Stallone, Takanori Ueda, Myron D. Flickner, John V. Arthur, Rameswar Panda, David D. Cox, Dharmendra S. Modha

2024 IEEE High Performance Extreme Computing Conference (HPEC)

Keywords
AI accelerators, large language model