The latest paper from DeepSeek introduces NSA, a natively trainable sparse attention mechanism for ultra-fast long-context training and inference.
Kimi proposes a new attention mechanism, MoBA, which applies Mixture-of-Experts (MoE) principles to attention and improves the efficiency of LLMs in long-context scenarios without sacrificing performance. A sketch of the shared idea behind both papers follows below.
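Both NSA and MoBA build on the same high-level idea: instead of attending to every token, each query is routed to a small set of key/value blocks. The NumPy sketch below illustrates that general blockwise top-k selection under assumed details (mean-pooled block summaries, a fixed block size, hard selection); it is not DeepSeek's or Kimi's actual implementation, which rely on trainable gating and hardware-aligned kernels for their speedups.

```python
# Minimal NumPy sketch of blockwise top-k sparse attention (single head, no
# batching). Hypothetical details: mean-pooled block summaries, fixed block
# size, hard top-k selection. Not the NSA or MoBA implementation.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blockwise_topk_attention(q, k, v, block_size=16, top_k=2):
    """q: (Tq, d) queries; k, v: (Tk, d) keys/values. Returns (Tq, d)."""
    Tq, d = q.shape
    Tk = k.shape[0]
    n_blocks = (Tk + block_size - 1) // block_size
    # Summarize each key block by its mean vector (an assumed, simple choice).
    summaries = np.stack([k[b * block_size:(b + 1) * block_size].mean(axis=0)
                          for b in range(n_blocks)])          # (n_blocks, d)
    block_scores = q @ summaries.T / np.sqrt(d)               # (Tq, n_blocks)
    chosen = np.argsort(-block_scores, axis=1)[:, :top_k]     # blocks per query

    out = np.zeros_like(q)
    for i in range(Tq):
        # Gather the token indices of the blocks selected for this query.
        idx = np.concatenate([np.arange(b * block_size,
                                        min((b + 1) * block_size, Tk))
                              for b in chosen[i]])
        scores = q[i] @ k[idx].T / np.sqrt(d)   # attend only within chosen blocks
        out[i] = softmax(scores) @ v[idx]
    return out

# Tiny smoke test on random data.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 32))
k = rng.normal(size=(64, 32))
v = rng.normal(size=(64, 32))
print(blockwise_topk_attention(q, k, v).shape)   # -> (4, 32)
```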
Frank F. Xu, Yufan Song, Boxuan Li, Yuxuan Tang, Kritanjali Jain, Mengxue Bao, Zora Z. Wang, Xuhui Zhou, Zhitong Guo, Murong Cao, Mingyang Yang, Hao Yang Lu,
Computing Research Repository (2024)
Popular Viewed Papers & Topics
This paper introduces SparQ Attention, a technique that significantly reduces the memory bandwidth requirements of generative large language models during inference, thereby improving inference throughput.
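As a rough illustration of the bandwidth-saving idea (simplified; the paper's actual algorithm differs in details such as reweighting by the attention mass of the skipped positions), the NumPy sketch below approximates attention scores from only the largest-magnitude query components, then fetches full keys and values for just the top-scoring positions:

```python
# Rough NumPy sketch of the bandwidth-reduction idea for one decoding step.
# The parameters r and top_k and the score-only approximation are assumptions
# for illustration, not the published algorithm.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def sparq_like_attention(q, K, V, r=8, top_k=16):
    """q: (d,) current query; K, V: (T, d) cached keys/values."""
    d = q.shape[0]
    # 1) Pick the r query components with the largest magnitude.
    comp = np.argsort(-np.abs(q))[:r]
    # 2) Approximate scores by reading only those r columns of the key cache.
    approx = q[comp] @ K[:, comp].T / np.sqrt(d)   # (T,)
    keep = np.argsort(-approx)[:top_k]             # positions to fetch in full
    # 3) Exact attention over the small fetched subset only.
    scores = q @ K[keep].T / np.sqrt(d)
    return softmax(scores) @ V[keep]

rng = np.random.default_rng(1)
q = rng.normal(size=64)
K = rng.normal(size=(512, 64))
V = rng.normal(size=(512, 64))
print(sparq_like_attention(q, K, V).shape)   # -> (64,)
```

Because only a few columns of the key cache and a few full key/value rows are read per step, the amount of data transferred from memory per generated token drops substantially, which is where the throughput gain comes from.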
Scaling up vision models has become a practical way to obtain more powerful visual representations. But is "bigger" always "better"? This paper examines cases in which larger vision models may not be necessary.