A Multimodal Generative AI Copilot for Human Pathology
Nature (2024), SCI Zone 1
Harvard Med Sch | Ohio State Univ | Mayo Clin
The authors of this paper are Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Melissa Zhao, Aaron K. Chow, Kenji Ikemura, Ahrong Kim, Dimitra Pouli, Ankush Patel, Amr S. Soliman, Chengkuan Chen, Tong Ding, Judy Huei-yu Wang, Georg K. Gerber, Ivy Liang, Long Phi Le, Anil Parwani, L. Weishaupt, and Faisal Mahmood. Their research areas span medical image analysis, digital pathology, computational pathology, feature extraction, whole-slide imaging, cellular microscopy, cancer epidemiology, and health policy. Their affiliations include Harvard University, Massachusetts General Hospital, The Ohio State University, and Tufts University.
Abstract
- The field of computational pathology has advanced significantly in recent years, thanks to the availability of digital slide scanning, artificial intelligence research, large datasets, and high-performance computing resources.
- Researchers have used deep learning to address a range of tasks, including cancer subtyping, grading, metastasis detection, survival prediction, treatment response prediction, tumor origin prediction, mutation prediction, and biomarker screening.
- General visual encoders have provided improvements for many tasks in computational pathology, but they do not fully reflect the important role of natural language in pathology.
- The rise of multi-modal large language models (MLLMs) and generative AI has opened up new frontiers in computational pathology, emphasizing natural language and human interaction as key components of AI model design and user experience.
- This paper develops PathChat, a multi-modal generative AI co-pilot for human pathology driven by a custom-tuned MLLM.
- PathChat excels at analyzing pathology cases and answering both open-ended and multiple-choice questions, outperforming other MLLMs and commercial solutions.
- PathChat is expected to play a significant role in pathology education, research, and clinical decision-making.
Introduction
- Progress in the field of computational pathology
- The importance of natural language in pathology
- The emergence of multi-modal generative AI
- Potential applications of PathChat
PathChat: A Multi-Modal Generative AI Co-Pilot for Human Pathology
- PathChat model design
- PathChat model training
- PathChat dataset
Demonstration of PathChat in Various Use Cases
- Pathological case analysis
- Answering open-ended questions
- Multiple-choice diagnosis
Discussion
- Advantages of PathChat
- Limitations of PathChat
- Future directions for PathChat
PathChat Dataset
- Dataset source
- Dataset filtering
- Dataset structure
PathChat Model Design
- Model architecture
- Model training
Expert-Curated Pathology Q&A Benchmark
- Benchmark design
- Benchmark evaluation
MLLM Evaluation
- Evaluation methods
- Evaluation results
GPT4V Evaluation
- Evaluation methods
- Evaluation results
Statistical Analysis
- Statistical methods
- Statistical results
Computational Hardware and Software
- Computational hardware
- Computational software
Data Availability
- Data source
- Data access
Q: What specific research methods were used in the paper?
- Dataset Construction:
- Constructed the PathChat dataset of 456,916 instructions comprising 999,202 question-and-answer turns in various formats (multi-turn dialogues, multiple-choice questions, and short answers), drawn from diverse sources including image captions, PubMed Open Access educational articles, pathology case reports, and regions extracted from WSIs.
- Built PathQABench, an expert-curated pathology visual question answering benchmark of 105 H&E WSI cases covering 11 tissue types and 54 diagnoses.
- Model Design:
- Used the UNI model as the visual encoder and aligned its image representation space with pathology text through vision-language pre-training on 1.18 million pathology image-caption pairs.
- Connected the visual encoder to the 13-billion-parameter pre-trained Llama 2 language model through a multi-modal projection module to form the complete MLLM architecture.
- Fine-tuned the MLLM on a curated dataset of over 450,000 instructions to build the PathChat model.
- Model Evaluation:
- Evaluated PathChat's performance on multiple-choice questions and open-ended questions using the PathQABench dataset.
- Compared PathChat with LLaVA 1.5, LLaVA-Med, and GPT4V.
- Assessed model answers to open-ended questions through expert evaluation, using binary correct/incorrect labels.
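The model design described above (visual encoder, projection module, language model) can be illustrated with a toy data-flow sketch. This is only an illustration under stated assumptions: the summary does not specify the form of the projection module or the token ordering, so the single linear map, the choice to place image tokens before text tokens, the tiny dimensions, and all function names here are hypothetical.

```python
import random


def linear(x, weight):
    """Multiply a row vector x (length d_in) by a d_in x d_out matrix."""
    d_out = len(weight[0])
    return [sum(x[i] * weight[i][j] for i in range(len(x))) for j in range(d_out)]


def project_image_tokens(patch_features, weight):
    """Map each visual-encoder feature into the language model's embedding size.

    Assumption: one linear layer stands in for the projection module.
    """
    return [linear(feature, weight) for feature in patch_features]


def build_llm_input(patch_features, text_embeddings, weight):
    """Form one token sequence: projected image tokens, then text tokens.

    Assumption: image tokens come first; the source does not state the order.
    """
    return project_image_tokens(patch_features, weight) + text_embeddings


rng = random.Random(0)
D_VISION, D_LLM = 4, 6  # toy sizes, far smaller than any real model

weight = [[rng.uniform(-1, 1) for _ in range(D_LLM)] for _ in range(D_VISION)]
patch_features = [[rng.uniform(-1, 1) for _ in range(D_VISION)] for _ in range(3)]
text_embeddings = [[rng.uniform(-1, 1) for _ in range(D_LLM)] for _ in range(5)]

sequence = build_llm_input(patch_features, text_embeddings, weight)
print(len(sequence), len(sequence[0]))  # 8 6
```

The point of the sketch is the shape bookkeeping: after projection, image features and text embeddings share one dimension, so the language model can consume them as a single sequence.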
Q: What are the main research findings and outcomes?
- PathChat outperformed LLaVA 1.5, LLaVA-Med, and GPT4V on both multiple-choice questions and open-ended questions.
- PathChat is capable of analyzing and describing significant morphological details in histological images and answering questions requiring pathological and general biomedical background knowledge.
- PathChat supports interactive multi-turn dialogues, which can assist humans in complex diagnostic work.
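A comparison like the one above comes down to a per-model accuracy computed from binary correct/incorrect labels. The sketch below shows that computation only; the model names and labels are made-up placeholders, not results from the paper, and the function names are hypothetical.

```python
def accuracy(labels):
    """Fraction of answers graded correct (1 = correct, 0 = incorrect)."""
    if not labels:
        raise ValueError("no labels to score")
    return sum(labels) / len(labels)


def rank_models(graded):
    """Return (model, accuracy) pairs sorted from best to worst."""
    scores = {name: accuracy(labels) for name, labels in graded.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder grades for two hypothetical models on four questions.
graded = {
    "model_a": [1, 1, 0, 1],
    "model_b": [1, 0, 0, 1],
}

ranking = rank_models(graded)
print(ranking)  # [('model_a', 0.75), ('model_b', 0.5)]
```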
Q: What are the current limitations of this research?
- PathChat's performance is slightly lower than GPT4V's in the "clinical" and "ancillary testing" categories.
- Because it was trained on retrospectively collected data, PathChat may reflect "past scientific consensus" rather than the latest information.
- PathChat needs further improvement and validation to ensure consistent and correct identification of invalid queries and to avoid providing unexpected or incorrect outputs.
- PathChat currently works on selected regions rather than whole slides; supporting entire gigapixel WSIs or multiple WSIs as input is left to future work.
- Further work is needed for PathChat to support more specialized tasks, such as precise counting or localization of objects, and to integrate with other tools (e.g., digital slide viewers or electronic health records).