On the Communication Complexity of 3D FFTs and Its Implications for Exascale
ACM International Conference on Supercomputing (ICS), 2012, CCF B
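As brief background on the title's subject (a standard result paraphrased here, not text taken from this page): the classical Hong–Kung lower bound states that an n-point FFT computed with a fast memory of size Z must move

Q(n; Z) = \Omega\!\left(\frac{n \log n}{\log Z}\right)

words between memory levels. Exascale analyses of 3D FFTs build on bounds of this form to argue that time to solution is limited by memory and network bandwidth rather than by floating-point throughput.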
- Pretraining has recently driven great advances in natural language processing (NLP)
- We show that M6 outperforms the baselines in multimodal downstream tasks, and that the large M6 with 10 billion parameters reaches even better performance
- We propose M6, a method that can process information from multiple modalities and perform both single-modal and cross-modal understanding and generation
- The model is scaled up to 10 billion parameters with sophisticated deployment, and the 10-billion-parameter M6-large is the largest pretrained model in Chinese
- Experimental results show that our proposed M6 outperforms the baseline in a number of downstream tasks concerning both single and multiple modalities. We will continue the pretraining of extremely large models by increasing the data to explore the limit of their performance

Toward A Theory of Algorithm-Architecture Co-Design
Cited by 2
Fast Multipole Preconditioners for Sparse Matrices Arising from Elliptic Equations
Cited by 18
Towards a Performance-Portable FFT Library for Heterogeneous Computing
Cited by 17
An Approach to Selecting Thread + Process Mixes for Hybrid MPI + OpenMP Applications
Cited by 3
An Approach for Energy Efficient Execution of Hybrid Parallel Programs
Cited by 13
Combining Power and Performance Modeling for Application Analysis: A Case Study Using Aspen
Cited by 1
PANORAMA: An Approach to Performance Modeling and Diagnosis of Extreme-Scale Workflows
Cited by 38
Modeling the Energy-Time Performance of MIC Architecture System
Cited by 1
A Study of Power-Performance Modeling Using a Domain-Specific Language
Cited by 2
A Framework for Scalable Biophysics-Based Image Analysis
Cited by 21
Aspen-based Performance and Energy Modeling Frameworks
Cited by 6
FFT, FMM, and Multigrid on the Road to Exascale: Performance Challenges and Opportunities
Cited by 14
CuQ-RTM: A CUDA-based Code Package for Stable and Efficient Q-compensated Reverse Time Migration
Cited by 44
GPU Acceleration of Extreme Scale Pseudo-Spectral Simulations of Turbulence Using Asynchronism
Cited by 48
The Landscape of Exascale Research: A Data-Driven Literature Analysis
Cited by 45
The Transplantation Technology of Communication Intensive Applications on Heterogeneous Clusters
Cited by 0
Accelerating Multi-Process Communication for Parallel 3-D FFT
Cited by 4
Performance Analysis of Parallel FFT on Large Multi-GPU Systems
Cited by 4
A Framework for Low Communication Approaches for Large Scale 3D Convolution
Cited by 0