- Pretraining has recently driven rapid progress in natural language processing (NLP)
- We show that M6 outperforms the baselines on multimodal downstream tasks, and that the large M6 with 10 billion parameters achieves even better performance
- We propose M6, a method that processes information from multiple modalities and performs both single-modal and cross-modal understanding and generation (a minimal sketch of this unified design follows the list)
- The model is scaled up to 10 billion parameters with sophisticated deployment, and the 10-billion-parameter M6-large is the largest pretrained model in Chinese
- Experimental results show that the proposed M6 outperforms the baselines on a number of downstream tasks involving both single and multiple modalities. We will continue pretraining extremely large models on more data to explore the limits of their performance
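
To make the cross-modal point above concrete, here is a minimal sketch of a unified multimodal transformer in the spirit described: image patch features and text tokens are projected into one shared embedding space, concatenated into a single sequence, and processed by one transformer with separate heads for understanding (classification) and generation (token prediction). All class names, dimensions, and heads are illustrative assumptions, not the actual M6 implementation.

```python
# Illustrative sketch only: a single transformer over a joint image+text sequence.
# Names and dimensions are hypothetical; this is not the real M6 code.
import torch
import torch.nn as nn


class UnifiedMultimodalEncoder(nn.Module):
    def __init__(self, vocab_size=30000, d_model=512, n_heads=8,
                 n_layers=6, patch_dim=2048, max_len=512):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        # Project pre-extracted image patch features (e.g. from a vision backbone)
        # into the same embedding space as the text tokens.
        self.patch_proj = nn.Linear(patch_dim, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Two illustrative heads: one for understanding (e.g. image-text matching),
        # one for generation (predicting text tokens).
        self.cls_head = nn.Linear(d_model, 2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image_patches, text_ids):
        # image_patches: (batch, n_patches, patch_dim); text_ids: (batch, n_tokens)
        img_tok = self.patch_proj(image_patches)
        txt_tok = self.text_embed(text_ids)
        seq = torch.cat([img_tok, txt_tok], dim=1)        # one joint sequence
        pos = torch.arange(seq.size(1), device=seq.device)
        hidden = self.encoder(seq + self.pos_embed(pos))
        n_img = img_tok.size(1)
        cls_logits = self.cls_head(hidden[:, 0])          # understanding head
        lm_logits = self.lm_head(hidden[:, n_img:])       # generation head over text positions
        return cls_logits, lm_logits


if __name__ == "__main__":
    model = UnifiedMultimodalEncoder()
    patches = torch.randn(2, 16, 2048)           # toy image patch features
    tokens = torch.randint(0, 30000, (2, 20))    # toy text token ids
    cls_logits, lm_logits = model(patches, tokens)
    print(cls_logits.shape, lm_logits.shape)     # (2, 2) and (2, 20, 30000)
```

Because both modalities share one encoder, the same backbone can serve single-modal and cross-modal tasks by swapping or masking parts of the input sequence; scaling such a model to billions of parameters additionally requires model-parallel deployment, which is outside the scope of this sketch.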
