An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2021 (CCF A; SCI Zone 2)
Tsinghua Univ | Alibaba Grp
Abstract
Coarse-grained reconfigurable architectures (CGRAs) are increasingly employed as domain-specific accelerators due to their efficiency and flexibility. A CGRA typically relies on its compiler to perform task scheduling. The longstanding problem with such static scheduling is that it extracts insufficient parallelism from irregular workloads: over-serialization and workload imbalance lead to severe resource underutilization and performance loss. To counteract these limitations, it is essential to exploit dynamic parallelism automatically and manage hardware resources adaptively. However, existing dynamic scheduling mechanisms, e.g., work stealing, often reschedule aggressively for instant performance at the cost of efficiency, which is unfavorable to CGRAs that emphasize efficiency and few reconfigurations. This article proposes an elastic task scheduling scheme that enables lightweight dynamic scheduling in CGRAs. Tasks are rescheduled at runtime according to the classic tagged-token dataflow paradigm to enable dynamic task-level parallelism. Meanwhile, tasks are dynamically resized according to their run-time throughputs via duplication, combination, and substitution operators for balanced multitask execution. We implement the elastic task scheduling scheme on a well-known reconfigurable architecture, the triggered instruction architecture (TIA). Evaluation on the MachSuite benchmarks shows that the proposed scheme is effective in improving performance and energy efficiency. The average speedup is 2× over the baseline. The design also attains a 57 percent improvement in area-normalized performance and 49 percent better energy efficiency. Compared with work stealing, a state-of-the-art dynamic scheduling method, on the same substrate, our scheme achieves a 1.6× speedup and 1.6× better energy efficiency.
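As a rough illustration of the tagged-token dataflow paradigm mentioned in the abstract, the following Python sketch shows the general firing rule: each token carries a tag (for example, an iteration or task-instance identifier), and a node fires only once all operands with the same tag have arrived, so independent instances can complete out of order. This is a generic textbook-style sketch, not the paper's implementation; the names `TaggedTokenNode`, `receive`, and `run_demo` are hypothetical.

```python
from collections import defaultdict


class TaggedTokenNode:
    """Hypothetical dataflow node: fires when all operands sharing a tag are present."""

    def __init__(self, op, arity=2):
        self.op = op
        self.arity = arity
        # tag -> {input port: value}; holds operands waiting for their partners
        self.waiting = defaultdict(dict)

    def receive(self, tag, port, value):
        """Store a token; return (tag, result) if the node fires, else None."""
        slot = self.waiting[tag]
        slot[port] = value
        if len(slot) == self.arity:
            # All operands for this tag have arrived: fire and free the slot.
            del self.waiting[tag]
            args = [slot[p] for p in range(self.arity)]
            return tag, self.op(*args)
        return None


def run_demo():
    add = TaggedTokenNode(lambda a, b: a + b)
    # Tokens from two task instances (tags 0 and 1) arrive interleaved.
    arrivals = [(1, 0, 10), (0, 0, 1), (0, 1, 2), (1, 1, 20)]
    fired = []
    for tag, port, value in arrivals:
        result = add.receive(tag, port, value)
        if result is not None:
            fired.append(result)
    return fired


if __name__ == "__main__":
    print(run_demo())
```

Because matching is done per tag, the instance with tag 0 fires as soon as its second operand arrives, even though a token for tag 1 arrived first. The sketch does not model the duplication, combination, or substitution operators, whose details are not given in the abstract.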
Key words
Task schedule, elastic task schedule, reconfigurable architectures, dynamic issue, dynamic mapping