Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models
Zhenghao Lin, Zihao Tang, Xiao Liu, Yeyun Gong, Yi Cheng, Qi Chen, Hang Li, Ying Xin, Ziyue Yang, Kailai Yang, Yu Yan, Xiao Liang, Shuai Lu, Yiming Huang, Zheheng Luo, Lei Qu, Xuan Feng, Yaoxiang Wang, Yuqing Xia, Feiyang Chen, Yuting Jiang, Yasen Hu, Hao Ni, Binyang Li, Guoshuai Zhao, Jui-Hao Chiang, Zhongxin Guo, Chen Lin, Kun Kuang, Wenjie Li, Yelong Shen, Jian Jiao, Peng Cheng, Mao Yang
CoRR (2025)