
An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures

IEEE Transactions on Parallel and Distributed Systems (TPDS), 2021 | CCF A | SCI Zone 2

Tsinghua Univ | Alibaba Grp

Abstract
Coarse-grained reconfigurable architectures (CGRAs) are increasingly employed as domain-specific accelerators due to their efficiency and flexibility. A CGRA typically relies on compilers to perform task scheduling. The longstanding problem of static scheduling is that it suffers from insufficient parallelism when handling irregularities, owing to over-serialization and workload imbalance, which leads to severe resource underutilization and performance loss. To counteract the limitations of static scheduling in CGRAs, it is essential to exploit dynamic parallelism automatically and manage hardware resources adaptively. However, existing dynamic scheduling mechanisms, e.g., work stealing, often reschedule aggressively for instant performance at the cost of efficiency, which is unfavorable to CGRAs that emphasize efficiency and few reconfigurations. This article proposes an elastic task scheduling scheme that enables lightweight dynamic scheduling in CGRAs. Tasks are rescheduled at runtime according to the classic tagged-token dataflow paradigm to enable dynamic task-level parallelism. Meanwhile, tasks are dynamically resized according to runtime throughput via duplication, combination, and substitution operators for balanced multitask execution. We implement the elastic task scheduling scheme on a well-known reconfigurable architecture, the triggered instruction architecture (TIA). Evaluation on the MachSuite benchmarks shows that the proposed scheme is effective in improving performance and energy efficiency. The average speedup is 2× over the baseline. Our design also attains a 57 percent improvement in area-normalized performance and 49 percent better energy efficiency. Compared with a state-of-the-art dynamic scheduling method, work stealing, on the same substrate, our scheme achieves a 1.6× speedup and 1.6× better energy efficiency.
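The tagged-token dataflow paradigm mentioned in the abstract can be illustrated with a small software model. The sketch below is not the paper's hardware design; the class name `TaggedTokenNode` and its `receive` method are hypothetical, chosen only for illustration. It shows the general firing rule: an operator executes as soon as all operands carrying the same tag (for example, the same loop iteration) have arrived, regardless of arrival order.

```python
from collections import defaultdict


class TaggedTokenNode:
    """Hypothetical model of one dataflow operator with tag matching."""

    def __init__(self, op, arity=2):
        self.op = op
        self.arity = arity
        # Matching store: tag -> {input port: operand value}
        self.waiting = defaultdict(dict)

    def receive(self, tag, port, value):
        """Store a token; fire and return (tag, result) once the tag is complete."""
        slots = self.waiting[tag]
        slots[port] = value
        if len(slots) == self.arity:
            del self.waiting[tag]  # free the matching-store entry
            args = [slots[p] for p in range(self.arity)]
            return tag, self.op(*args)
        return None  # still waiting for a partner token


add = TaggedTokenNode(lambda a, b: a + b)
results = []
# Tokens from two iterations (tags 0 and 1) arrive interleaved and out of order.
for tag, port, value in [(1, 0, 10), (0, 1, 2), (0, 0, 1), (1, 1, 20)]:
    fired = add.receive(tag, port, value)
    if fired is not None:
        results.append(fired)

print(results)
```

In this model, tag 0 fires first because its second operand arrives before tag 1 is complete, which is the kind of dynamic, data-driven ordering that a static schedule cannot express.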
Key words
Task schedule, elastic task schedule, reconfigurable architectures, dynamic issue, dynamic mapping


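Key points: This paper proposes an elastic task scheduling scheme for coarse-grained reconfigurable architectures (CGRAs). Tasks are rescheduled at runtime according to the classic tagged-token dataflow paradigm and dynamically resized for balanced multitask execution, with the aim of improving the efficiency and performance of CGRAs.

Method: The scheme manages tasks and hardware resources adaptively at runtime to achieve lightweight dynamic scheduling.

Experiments: The authors implemented the scheme on the triggered instruction architecture (TIA) and evaluated it on the MachSuite benchmark suite. The results show an average speedup of 2.2× over the baseline, a 57% improvement in area-normalized performance, and a 49% improvement in energy efficiency. Compared with the state-of-the-art work-stealing mechanism on the same substrate, the scheme achieves a 1.6× speedup and a 1.6× improvement in energy efficiency.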
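The idea of resizing tasks according to runtime throughput can be sketched with a toy model. This is only an illustration under strong assumptions, not the algorithm from the paper: the function `rebalance` is hypothetical, it models only the duplication operator (combination and substitution are omitted), and it assumes a task's aggregate throughput scales linearly with its number of copies.

```python
def rebalance(throughput, copies, free_units):
    """Greedy sketch: give each free unit to the current bottleneck task.

    throughput: items per cycle delivered by a single copy of each task (assumed).
    copies:     current number of copies of each task.
    free_units: number of idle processing units available for duplication.
    """
    copies = dict(copies)  # do not mutate the caller's mapping
    for _ in range(free_units):
        # Bottleneck = task with the lowest aggregate throughput right now.
        slowest = min(copies, key=lambda t: throughput[t] * copies[t])
        copies[slowest] += 1
    return copies


# Hypothetical three-stage pipeline in which "compute" is four times slower.
per_copy = {"load": 4.0, "compute": 1.0, "store": 4.0}
start = {"load": 1, "compute": 1, "store": 1}
balanced = rebalance(per_copy, start, free_units=3)
print(balanced)
```

With three free units, all of them go to the slow stage, bringing its aggregate throughput in line with its neighbors under the linear-scaling assumption.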