An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures

Longlong Chen,Jianfeng Zhu,Yangdong Deng,Zhaoshi Li,Jian Chen,Xiaowei Jiang,Shouyi Yin,Shaojun Wei,Leibo Liu

IEEE Transactions on Parallel and Distributed Systems (TPDS)（2021）CCF ASCI 2区

Cited 10|Views54

Abstract

Coarse-grained reconfigurable architectures (CGRAs) are increasingly employed as domain-specific accelerators due to their efficiency and flexibility. A CGRA typically relies on compilers to perform task scheduling. The longstanding problem of static scheduling is that it suffers from insufficient parallelism in handling irregularities due to over-serialization and workload imbalance, which leads to severe resource underutilization and performance loss. To counteract the limitations of static scheduling in CGRAs, it is essential to exploit dynamic parallelism automatically and manage hardware resources adaptively. However, existing dynamic scheduling mechanisms, e.g., work stealing, often reschedule aggressively for instant performance but sacrifice efficiency, which is unfavorable to CGRAs that emphasize efficiency and fewer reconfigurations. This article proposes an elastic task scheduling scheme that enables lightweight dynamic scheduling in CGRAs. Tasks are rescheduled at runtime according to the classic tagged-token dataflow paradigm to enable dynamic task-level parallelism. Meanwhile, tasks are dynamically resized according to run-time throughputs via duplication, combination, and substitution operators for balanced multitask execution. We implement the elastic task scheduling scheme on a well-known reconfigurable architecture - triggered instruction architecture (TIA). Evaluation on the MachSuite benchmarks shows that the proposed scheme is effective in improving performance and energy efficiency. The average speedup is 2× over the baseline. Also, our design attains a 57 percent improvement in the area-normalized performance and a 49 percent better energy efficiency. Compared with a state-of-the-art dynamic scheduling method, our scheme achieves 1.6× speedup and 1.6× energy efficiency than work-stealing mechanism on the same substrate.

Translated text

Key words

Task schedule,elastic task schedule,reconfigurable architectures,dynamic issue,dynamic mapping

求助PDF

上传PDF

Bibtex

AI Read Science

AI Summary

AI Summary is the key point extracted automatically understanding the full text of the paper, including the background, methods, results, conclusions, icons and other key content, so that you can get the outline of the paper at a glance.

Example

Background

Key content

Introduction

Methods

Results

Related work

Fund

Key content

Pretraining has recently greatly promoted the development of natural language processing (NLP)
We show that M6 outperforms the baselines in multimodal downstream tasks, and the large M6 with 10 parameters can reach a better performance
We propose a method called M6 that is able to process information of multiple modalities and perform both single-modal and cross-modal understanding and generation
The model is scaled to large model with 10 billion parameters with sophisticated deployment, and the 10 -parameter M6-large is the largest pretrained model in Chinese
Experimental results show that our proposed M6 outperforms the baseline in a number of downstream tasks concerning both single modality and multiple modalities We will continue the pretraining of extremely large models by increasing data to explore the limit of its performance

Upload PDF to Generate Summary

Must-Reading Tree

Example

Generate MRT to find the research sequence of this paper