MInference 1.0: Accelerating Pre-filling for Long-Context LLMs Via Dynamic Sparse Attention

NeurIPS 2024 (2024)

Cited 67 | Views 56

Keywords

LLMs Inference, Long-Context LLMs, Dynamic Sparse Attention, Efficient Inference
