Towards Greener Yet Powerful Code Generation via Quantization: An Empirical Study

SIGACT News (ACM), 2023

AWS AI Labs

Abstract
ML-powered code generation aims to help developers write code more productively by intelligently generating code blocks from natural language prompts. Recently, large pretrained deep learning models have pushed the boundary of code generation and achieved impressive performance. However, the huge number of model parameters poses a significant challenge to their adoption in a typical software development environment, where a developer might use a standard laptop or mid-size server to develop code. Such large models incur significant costs in memory, latency, dollars, and carbon footprint. Model compression is a promising approach to address these challenges. We identify quantization as one of the most promising compression techniques for code generation, as it avoids expensive retraining. Because quantization represents model parameters with lower-bit integers (e.g., int8), both model size and runtime latency benefit. We empirically evaluate quantized models on code generation tasks along different dimensions: (i) resource usage and carbon footprint, (ii) accuracy, and (iii) robustness. Through systematic experiments, we find a code-aware quantization recipe that can run even a 6-billion-parameter model on a regular laptop without significant degradation in accuracy or robustness. We find that the recipe is readily applicable to the code summarization task as well.
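For readers unfamiliar with post-training quantization, the sketch below applies a generic int8 dynamic-quantization pass to a public CodeGen checkpoint with PyTorch; the model ID, layer selection, and decoding settings are illustrative assumptions, and the paper's code-aware recipe may differ in its exact configuration.

# A minimal sketch of post-training int8 quantization without retraining,
# assuming PyTorch's dynamic quantization API and a Hugging Face CodeGen
# checkpoint; not the authors' exact recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/codegen-350M-mono"  # small checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# Replace fp32 Linear weights with int8 weights; activations are quantized
# on the fly at inference time, so no retraining or calibration is needed.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = quantized.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))

The same pattern scales to larger checkpoints on CPU; only the Linear layers are converted, which is where most of the parameter mass in a transformer lives.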
Keywords
Quantization, Code Generation, Large Language Models, Generative AI, Model Hosting

Key points: Greener yet powerful code generation through quantization, allowing large pretrained deep learning models to be applied to code generation tasks without sacrificing accuracy or robustness.

Method: Through systematic experiments, the authors identify a code-aware quantization recipe that can run even a 6-billion-parameter model on an ordinary laptop.

Experiments: Quantized models are evaluated on code generation tasks along several dimensions: resource usage and carbon footprint, accuracy, and robustness. The recipe is also found to apply to the code summarization task.
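As a rough, self-contained sketch of the kind of before-and-after comparison described above (serialized model size and per-forward latency), the toy stack of Linear layers below stands in for a real code-generation model; the helper functions and measurement setup are illustrative assumptions, not the authors' benchmark.

# Illustrative comparison of fp32 vs. dynamically quantized int8 resource use.
# The Linear stack is a toy stand-in for a transformer's Linear-heavy compute.
import io
import time
import torch

def size_mb(model: torch.nn.Module) -> float:
    # Serialize the state_dict; this also captures packed int8 weights.
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getbuffer().nbytes / 1024 ** 2

def mean_latency_s(model: torch.nn.Module, x: torch.Tensor, runs: int = 20) -> float:
    # Average wall-clock time of a forward pass on CPU.
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs

fp32_model = torch.nn.Sequential(*[torch.nn.Linear(1024, 1024) for _ in range(8)]).eval()
int8_model = torch.quantization.quantize_dynamic(
    fp32_model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(16, 1024)
print(f"fp32: {size_mb(fp32_model):6.1f} MB, {mean_latency_s(fp32_model, x):.4f} s/forward")
print(f"int8: {size_mb(int8_model):6.1f} MB, {mean_latency_s(int8_model, x):.4f} s/forward")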