A Multimodal Generative AI Copilot for Human Pathology

Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Melissa Zhao, Aaron K. Chow, Kenji Ikemura, Ahrong Kim, Dimitra Pouli, Ankush Patel, Amr Soliman, Chengkuan Chen, Tong Ding, Judy J. Wang, Georg Gerber, Ivy Liang, Long Phi Le, Anil V. Parwani, Luca L. Weishaupt, Faisal Mahmood

Nature (2024), SCI Tier 1

Harvard Med Sch | Ohio State Univ | Mayo Clin

Abstract

The field of computational pathology[1,2] has witnessed remarkable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders[3,4]. However, despite the explosive growth of generative artificial intelligence (AI), there has been limited study on building general purpose, multimodal AI assistants and copilots[5] tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology. We build PathChat by adapting a foundational vision encoder for pathology, combining it with a pretrained large language model and finetuning the whole system on over 456,000 diverse visual language instructions consisting of 999,202 question-answer turns. We compare PathChat against several multimodal vision language AI assistants and GPT4V, which powers the commercially available multimodal general purpose AI assistant ChatGPT-4[7]. PathChat achieved state-of-the-art performance on multiple-choice diagnostic questions from cases of diverse tissue origins and disease models. Furthermore, using open-ended questions and human expert evaluation, we found that overall PathChat produced more accurate and pathologist-preferable responses to diverse queries related to pathology. As an interactive and general vision-language AI Copilot that can flexibly handle both visual and natural language inputs, PathChat can potentially find impactful applications in pathology education, research, and human-in-the-loop clinical decision making.

About the Authors

The authors of this paper are Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Melissa Zhao, Aaron K. Chow, Kenji Ikemura, Ahrong Kim, Dimitra Pouli, Ankush Patel, Amr S. Soliman, Chengkuan Chen, Tong Ding, Judy J. Wang, Georg K. Gerber, Ivy Liang, Long Phi Le, Anil V. Parwani, Luca L. Weishaupt, and Faisal Mahmood. Their research areas include medical image analysis, digital pathology, computational pathology, feature extraction, whole-slide imaging, cellular microscopy, cancer epidemiology, and health policy. Their affiliations include Harvard University, Massachusetts General Hospital, The Ohio State University, and Tufts University.

Paper Outline

Abstract
- The field of computational pathology has made significant advancements in recent years, thanks to the availability of digital slide scanning, artificial intelligence research, large datasets, and high-performance computing resources.
- Researchers have utilized deep learning to address various tasks, including cancer typing, grading, metastasis detection, survival prediction, treatment response prediction, tumor origin prediction, mutation prediction, and biomarker selection.
- General visual encoders have provided improvements for many tasks in computational pathology, but they do not fully reflect the important role of natural language in pathology.
- The rise of multi-modal large language models (MLLMs) and generative AI has opened up new frontiers in computational pathology, emphasizing natural language and human interaction as key components of AI model design and user experience.
- This paper develops PathChat, a multi-modal generative AI copilot for human pathology driven by a custom-tuned MLLM.
- PathChat excels at analyzing pathology cases and at answering both open-ended and multiple-choice questions, outperforming other MLLMs and commercial solutions.
- PathChat is expected to play a significant role in pathology education, research, and clinical decision-making.
Introduction
- Progress in the field of computational pathology
- The importance of natural language in pathology
- The emergence of multi-modal generative AI
- Potential applications of PathChat
PathChat: A Multi-Modal Generative AI Co-Pilot for Human Pathology
- PathChat model design
- PathChat model training
- PathChat dataset
Demonstration of PathChat in Various Use Cases
- Pathological case analysis
- Answering open-ended questions
- Multiple-choice diagnosis
Discussion
- Advantages of PathChat
- Limitations of PathChat
- Future directions for PathChat
PathChat Dataset
- Dataset source
- Dataset filtering
- Dataset structure
PathChat Model Design
- Model architecture
- Model training
Expert-Curated Pathology Q&A Benchmark
- Benchmark design
- Benchmark evaluation
MLLM Evaluation
- Evaluation methods
- Evaluation results
GPT4V Evaluation
- Evaluation methods
- Evaluation results
Statistical Analysis
- Statistical methods
- Statistical results
Computational Hardware and Software
- Computational hardware
- Computational software
Data Availability
- Data source
- Data access

Key Questions

Q: What specific research methods were used in the paper?
- Dataset Construction:
  - Constructed the PathChat dataset containing 456,916 instructions and 999,202 question-answer turns, covering various formats (such as multi-turn dialogues, multiple-choice questions, and short answers) and drawn from diverse sources including image captions, PubMed Open Access educational articles, pathology case reports, and regions extracted from whole-slide images (WSIs).
  - Built PathQABench, an expert-curated pathology visual question answering benchmark with 105 H&E WSI cases, covering 11 tissue types and 54 diagnoses.
- Model Design:
  - Used the UNI model as the vision encoder and aligned its image representation space with pathology text through vision-language pretraining on 1.18 million pathology image-caption pairs.
  - Connected the vision encoder to the 13-billion-parameter pretrained Llama 2 language model through a multi-modal projection module to form the complete MLLM architecture.
  - Fine-tuned the MLLM on a curated dataset of over 450,000 instructions to build the PathChat model.
- Model Evaluation:
  - Evaluated PathChat's performance on multiple-choice questions and open-ended questions using the PathQABench dataset.
  - Compared PathChat with LLaVA 1.5, LLaVA-Med, and GPT4V.
  - Assessed model answers to open-ended questions through expert evaluation and binary correct/wrong labels.
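The model design described above (vision encoder, projection module, language model) can be illustrated at the level of tensor shapes. The following is a minimal sketch, not the authors' implementation: it assumes the projection module is a single linear map, and the names `make_linear`, `project`, and `build_llm_input`, as well as all sizes, are hypothetical and chosen only for illustration.

```python
import random

def make_linear(d_in, d_out, seed=0):
    """Random (d_in x d_out) weights standing in for a learned projector (assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_out)] for _ in range(d_in)]

def project(tokens, weights):
    """Map each vision token (length d_in) into the LLM embedding space (length d_out)."""
    d_out = len(weights[0])
    return [
        [sum(x * weights[i][j] for i, x in enumerate(tok)) for j in range(d_out)]
        for tok in tokens
    ]

def build_llm_input(image_tokens, text_embeddings, weights):
    """Place projected image tokens ahead of the text embeddings in one sequence."""
    return project(image_tokens, weights) + text_embeddings

# Toy sizes for illustration only; real encoder and LLM widths are far larger.
D_VISION, D_LLM = 4, 6
N_IMAGE_TOKENS, N_TEXT_TOKENS = 3, 5

rng = random.Random(1)
image_tokens = [[rng.random() for _ in range(D_VISION)] for _ in range(N_IMAGE_TOKENS)]
text_embeddings = [[rng.random() for _ in range(D_LLM)] for _ in range(N_TEXT_TOKENS)]

W = make_linear(D_VISION, D_LLM)
sequence = build_llm_input(image_tokens, text_embeddings, W)
print(len(sequence), len(sequence[0]))  # 8 6
```

The point of the sketch is only that the projector changes the width of each image token so that image and text tokens can share one input sequence for the language model.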
Q: What are the main research findings and outcomes?
- PathChat outperformed LLaVA 1.5, LLaVA-Med, and GPT4V on both multiple-choice questions and open-ended questions.
- PathChat is capable of analyzing and describing significant morphological details in histological images and answering questions requiring pathological and general biomedical background knowledge.
- PathChat supports interactive multi-turn dialogues, which can assist humans in complex diagnostic work.
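A multiple-choice comparison like the one summarized above reduces to scoring each model's selected options against an answer key. The sketch below assumes exact-match scoring; the model names, predictions, and answer key are made-up placeholders, not results from the paper.

```python
def accuracy(predictions, answer_key):
    """Fraction of questions where the chosen option matches the key."""
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer key must have the same length")
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)

# Hypothetical answer key and model outputs, for illustration only.
answer_key = ["A", "C", "B", "D", "A"]
model_outputs = {
    "model_x": ["A", "C", "B", "A", "A"],
    "model_y": ["B", "C", "D", "D", "A"],
}

scores = {name: accuracy(preds, answer_key) for name, preds in model_outputs.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The same function applies to the open-ended setting if each expert judgment is recorded as a binary correct/wrong label and compared against a key of all-correct labels.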
Q: What are the current limitations of this research?
- PathChat's performance is slightly lower than GPT4V's in the "clinical" and "ancillary testing" categories.
- PathChat may reflect "past scientific consensus" rather than the latest information.
- PathChat needs further improvement and validation to ensure consistent and correct identification of invalid queries and to avoid providing unexpected or incorrect outputs.
- PathChat currently operates on selected regions of a WSI rather than whole slides; future versions will need to accept entire gigapixel WSIs or multiple WSIs as input.
- PathChat requires further study to support more specialized tasks such as precise counting or localization of objects and integration with other tools (e.g., digital slide viewers or electronic health records).
