The latest paper from DeepSeek introduces a new attention mechanism, NSA: a natively trainable sparse attention mechanism for ultra-fast long-context training and inference.
Kimi proposed a new attention mechanism, MoBA, which applies Mixture-of-Experts (MoE) principles to attention and improves the efficiency of LLMs in long-context scenarios without sacrificing performance.
Frank F. Xu, Yufan Song, Boxuan Li, Yuxuan Tang, Kritanjali Jain, Mengxue Bao, Zora Z. Wang, Xuhui Zhou, Zhitong Guo, Murong Cao, Mingyang Yang, Hao Yang Lu, et al.
Computing Research Repository (2024)
Selected papers in the past 7 days
This paper introduces SparQ Attention, a technique that significantly reduces the memory bandwidth requirements of generative large language models during inference, thereby improving inference throughput.
Generative large language models (LLMs) have opened up numerous novel
possibilities, but due to their significant computational requirements their
ubiquitous use remains challenging. Some of the most useful applications
require processing large numbers of samples at a time and using long contexts,
both significantly increasing the memory communication load of the models. We
introduce SparQ Attention, a technique for increasing the inference throughput
of LLMs by reducing the memory bandwidth requirements within the attention
blocks through selective fetching of the cached history. Our proposed technique
can be applied directly to off-the-shelf LLMs during inference, without
requiring any modification to the pre-training setup or additional fine-tuning.
By evaluating Llama 2 and Pythia models on a wide range of downstream tasks, we
show that SparQ Attention can decrease attention memory bandwidth requirements
by up to eight times without any loss in accuracy.
CoRR (2023)
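The abstract leaves the selection rule implicit, but the core idea, cheaply approximating attention scores and then fetching only the most relevant cached keys and values, can be sketched as follows. This is a minimal PyTorch illustration for a single decode step, not the authors' implementation; it omits refinements such as SparQ's mean-value correction term.

```python
import torch

@torch.no_grad()
def sparq_attention(q, K, V, r=16, k=64):
    """Sketch of SparQ-style selective fetching for one decode step.

    q: (d,) current query; K, V: (seq, d) cached keys and values.
    Only r columns of K and k rows of K/V are read from memory.
    """
    d = q.shape[-1]
    # Step 1: approximate the scores using only the r largest-magnitude
    # query components, so just r columns of the key cache are fetched.
    idx = q.abs().topk(r).indices
    approx = (q[idx] @ K[:, idx].T) / d**0.5
    # Step 2: fetch full keys and values only for the top-k positions.
    top = approx.topk(min(k, K.shape[0])).indices
    scores = (q @ K[top].T) / d**0.5
    return torch.softmax(scores, dim=-1) @ V[top]
```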
Scaling up vision models has become the practical route to more powerful visual representations. But is bigger always better? This paper examines the point beyond which larger vision models may not be necessary.
Scaling up the size of vision models has been the de facto standard to obtain
more powerful visual representations. In this work, we discuss the point beyond
which larger vision models are not necessary. First, we demonstrate the power
of Scaling on Scales (S^2), whereby a pre-trained and frozen smaller vision
model (e.g., ViT-B or ViT-L), run over multiple image scales, can outperform
larger models (e.g., ViT-H or ViT-G) on classification, segmentation, depth
estimation, Multimodal LLM (MLLM) benchmarks, and robotic manipulation.
Notably, S^2 achieves state-of-the-art performance in detailed MLLM
understanding on the V* benchmark, surpassing models such as GPT-4V. We examine
the conditions under which S^2 is preferable to scaling up model size. While
larger models have the advantage of better
generalization on hard examples, we show that features of larger vision models
can be well approximated by those of multi-scale smaller models. This suggests
most, if not all, of the representations learned by current large pre-trained
models can also be obtained from multi-scale smaller models. Our results show
that a multi-scale smaller model has comparable learning capacity to a larger
model, and pre-training smaller models with S^2 can match or even exceed the
advantage of larger models. We release a Python package that can apply S^2 on
any vision model with one line of code:
https://github.com/bfshi/scaling_on_scales.
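The released package advertises one-line application, but its exact API is not shown here, so the following is a hypothetical sketch of the core multi-scale idea only: run one frozen encoder at several input resolutions, pool the resulting feature maps to a common grid, and concatenate them channel-wise. (The actual S^2 method additionally splits larger scales into base-size sub-images before encoding.)

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def s2_features(encoder, image, sizes=(224, 448), grid=7):
    """Toy Scaling on Scales: one frozen model, several input scales.

    Assumes `encoder` maps (B, 3, H, W) images to (B, C, h, w) feature
    maps; this interface is a placeholder, not the s2 package itself.
    """
    feats = []
    for size in sizes:
        x = F.interpolate(image, size=(size, size),
                          mode="bilinear", align_corners=False)
        f = encoder(x)
        # Pool every scale to the same grid so the maps align spatially.
        feats.append(F.adaptive_avg_pool2d(f, output_size=grid))
    # Channel-wise concat: (B, C * len(sizes), grid, grid).
    return torch.cat(feats, dim=1)
```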
In this work we systematically review the recent advancements in code
processing with language models, covering 50+ models, 30+ evaluation tasks, and
500 related works. We break down code processing models into general language
models represented by the GPT family and specialized models that are
specifically pretrained on code, often with tailored objectives. We discuss the
relations and differences between these models, and highlight the historical
transition of code modeling from statistical models and RNNs to pretrained
Transformers and LLMs, the same course that NLP itself has taken. We also
discuss code-specific features such as ASTs, CFGs, and unit tests,
along with their application in training code language models, and identify key
challenges and potential future directions in this domain. We keep the survey
open and updated in a GitHub repository at
https://github.com/codefuse-ai/Awesome-Code-LLM.
Recent advancements in open-world 3D object generation have been remarkable,
with image-to-3D methods offering superior fine-grained control over their
text-to-3D counterparts. However, most existing models fall short in
simultaneously providing rapid generation speeds and high fidelity to input
images - two features essential for practical applications. In this paper, we
present One-2-3-45++, an innovative method that transforms a single image into
a detailed 3D textured mesh in approximately one minute. Our approach aims to
fully harness the extensive knowledge embedded in 2D diffusion models and
priors from valuable yet limited 3D data. This is achieved by initially
finetuning a 2D diffusion model for consistent multi-view image generation,
followed by elevating these images to 3D with the aid of multi-view conditioned
3D native diffusion models. Extensive experimental evaluations demonstrate that
our method can produce high-quality, diverse 3D assets that closely mirror the
original input image. Our project webpage:
https://sudo-ai-3d.github.io/One2345plus_page.
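Read as a pipeline, the method is two sampling stages chained together. The sketch below uses placeholder interfaces (mv_diffusion, native_3d, and their sample methods are hypothetical stand-ins for the finetuned 2D diffusion model and the multi-view-conditioned 3D diffusion model):

```python
def image_to_mesh(image, mv_diffusion, native_3d):
    """Schematic of the One-2-3-45++ two-stage pipeline.

    All components and method names here are hypothetical placeholders.
    """
    # Stage 1: a finetuned 2D diffusion model lifts one input image to a
    # set of consistent multi-view renderings (e.g. fixed camera poses).
    views = mv_diffusion.sample(image)
    # Stage 2: a 3D-native diffusion model, conditioned on those views,
    # generates the textured mesh directly.
    return native_3d.sample(condition=views)
```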
Red-teaming is a common practice for mitigating unsafe behaviors in Large
Language Models (LLMs), which involves thoroughly assessing LLMs to identify
potential flaws and addressing them with responsible and accurate responses.
While effective, manual red-teaming is costly, and existing automatic
red-teaming typically discovers safety risks without addressing them. In this
paper, we propose a Multi-round Automatic Red-Teaming (MART) method, which
incorporates both automatic adversarial prompt writing and safe response
generation, significantly increasing red-teaming scalability and the safety of
the target LLM. Specifically, an adversarial LLM and a target LLM interplay
with each other in an iterative manner, where the adversarial LLM aims to
generate challenging prompts that elicit unsafe responses from the target LLM,
while the target LLM is fine-tuned with safety aligned data on these
adversarial prompts. In each round, the adversarial LLM crafts better attacks
on the updated target LLM, while the target LLM also improves itself through
safety fine-tuning. On adversarial prompt benchmarks, the violation rate of an
LLM with limited safety alignment drops by up to 84.7% after 4 rounds of MART,
achieving comparable performance to LLMs with extensive adversarial prompt
writing. Notably, model helpfulness on non-adversarial prompts remains stable
throughout iterations, indicating the target LLM maintains strong performance
on instruction following.
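The round structure is a simple alternating loop. Below is a schematic with every model operation passed in as a callable; all of these names are hypothetical placeholders, including the safety judge is_unsafe, which stands in for whatever evaluator the authors use:

```python
def mart(write_attack, respond, is_unsafe,
         finetune_target, finetune_adversary, seed_prompts, rounds=4):
    """Schematic of Multi-round Automatic Red-Teaming (MART).

    Every callable argument is a hypothetical placeholder for a model
    operation (generation, response, safety judging, fine-tuning).
    """
    prompts = list(seed_prompts)
    for _ in range(rounds):
        # The adversary rewrites each prompt into a harder attack.
        attacks = [write_attack(p) for p in prompts]
        # Attacks that elicit unsafe responses count as successes.
        unsafe = [a for a in attacks if is_unsafe(respond(a))]
        # The target is safety fine-tuned on the successful attacks and
        # the adversary is rewarded for them, so both improve each round.
        finetune_target(unsafe)
        finetune_adversary(unsafe)
        prompts = attacks
    return prompts
```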
In this work, we propose FastCoT, a model-agnostic framework based on
parallel decoding without any further training of an auxiliary model or
modification to the LLM itself. FastCoT uses a size-varying context window
whose size changes with position to conduct parallel decoding and
auto-regressive decoding simultaneously, thus fully utilizing GPU computation
resources. In FastCoT, the parallel decoding part provides the LLM with a quick
glance of the future composed of approximate tokens, which could lead to faster
answers compared to regular autoregressive decoding used by causal
transformers. We also provide an implementation of parallel decoding within
LLMs that supports KV-cache generation and batch processing. Through extensive
experiments, we demonstrate that FastCoT saves inference time by nearly 20%
with only a negligible performance drop compared to the regular approach.
Additionally, we show that the context window size exhibits considerable
robustness for different tasks.
CoRR (2023)
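One common form of parallel decoding that matches the "quick glance" description is Jacobi-style iteration: guess a whole window of future tokens at once, then refine all of them with each forward pass. A toy sketch, assuming a Hugging-Face-style model that returns .logits, and ignoring FastCoT's size-varying window and KV-cache handling:

```python
import torch

@torch.no_grad()
def jacobi_glance(model, prefix_ids, window=8, iters=2, pad_id=0):
    """Toy parallel-decoding 'glance': approximate future tokens.

    prefix_ids: 1-D LongTensor of the confirmed context. The returned
    window of tokens is approximate, unlike autoregressive decoding.
    """
    guess = torch.full((window,), pad_id, dtype=torch.long)
    p = prefix_ids.shape[0]
    for _ in range(iters):
        seq = torch.cat([prefix_ids, guess])
        logits = model(seq.unsqueeze(0)).logits[0]
        # Position i's logits predict token i+1, so one forward pass
        # updates every guessed token in the window simultaneously.
        guess = logits[p - 1 : p - 1 + window].argmax(dim=-1)
    return guess
```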