Evaluating AI Models Trained with Varying Amounts of Expert Feedback for Chronic Graft-Versus-Host Disease Skin Assessment in Photos of Patients with Diverse Skin Tones
Journal of Clinical and Translational Science (2025)
Vanderbilt University Medical Center | National Institute of Arthritis and Musculoskeletal and Skin Diseases | Mayo Clinic | Fred Hutchinson Cancer Center | Center for Cancer Research
Abstract
Objectives/Goals: Manual skin assessment in chronic graft-versus-host disease (cGVHD) can be time-consuming and inconsistent (differences of >20% in affected area), even for experts. Building on previous work, we explore methods that use unmarked photos to train artificial intelligence (AI) models, aiming to improve performance by expanding and diversifying the training data without additional burden on experts.

Methods/Study Population: As in many medical imaging projects, we have a small number of expert-marked patient photos (N = 36 patients, n = 360 photos) and many unmarked photos (N = 337, n = 25,842). Dark skin (Fitzpatrick type 4+) is underrepresented in both sets: 11% of patients in the marked set and 9% in the unmarked set. In addition, a set of 20 expert-marked photos from 20 patients, 20% with dark skin types, was withheld from training to assess model performance. Our gold-standard markings were manual contours drawn around affected skin by a trained expert. Three AI training methods were tested. Our established baseline uses only the small number of marked photos (supervised method). The semi-supervised method uses a mix of marked and unmarked photos with human feedback. The self-supervised method uses only unmarked photos without any human feedback.

Results/Anticipated Results: We evaluated performance by comparing predicted skin areas with expert markings. The error was the absolute difference between the percentage areas marked by the AI model and the expert, where lower is better. Across all test patients, the median error was 19% (interquartile range 6–34) for the supervised method and 10% (5–23) for the semi-supervised method, which incorporated unmarked photos from 83 patients. On dark skin types, the median error was 36% (18–62) for supervised and 28% (14–52) for semi-supervised, compared with a median error on light skin of 18% (5–26) for supervised and 7% (4–17) for semi-supervised. Self-supervised training, using all 337 unmarked patients, is expected to further improve performance and consistency due to increased data diversity. Full results will be presented at the meeting.

Discussion/Significance of Impact: By automating skin assessment for cGVHD, AI could improve accuracy and consistency compared with manual methods. If translated to clinical use, this would ease clinical burden and scale to large patient cohorts. Future work will focus on ensuring equitable performance across all skin types, providing fair and accurate assessments for every patient.
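The abstract defines its error metric as the absolute difference between the AI-predicted and expert-marked affected-area percentages, summarized per cohort as a median and interquartile range. Below is a minimal sketch of that metric, assuming binary per-pixel masks for visible skin and for affected skin; the function names, mask conventions, and toy values are illustrative, not the authors' implementation.

```python
import numpy as np

def affected_area_percent(affected_mask: np.ndarray, skin_mask: np.ndarray) -> float:
    """Percent of visible skin marked as cGVHD-affected in one photo (boolean masks)."""
    skin_pixels = skin_mask.sum()
    if skin_pixels == 0:
        return 0.0
    return 100.0 * np.logical_and(affected_mask, skin_mask).sum() / skin_pixels

def area_error(pred_mask: np.ndarray, expert_mask: np.ndarray, skin_mask: np.ndarray) -> float:
    """Absolute difference between AI and expert affected-area percentages; lower is better."""
    return abs(affected_area_percent(pred_mask, skin_mask)
               - affected_area_percent(expert_mask, skin_mask))

# Toy example: one 100x100 photo where the model over-segments relative to the expert.
skin = np.ones((100, 100), dtype=bool)
expert = np.zeros_like(skin); expert[:30, :] = True   # expert marks 30% of the skin
pred = np.zeros_like(skin);   pred[:40, :] = True     # model predicts 40% of the skin
print(area_error(pred, expert, skin))                 # -> 10.0 percentage points

# Per-patient errors would then be summarized as a median and interquartile range,
# as reported in the abstract (e.g., 19% [6-34] for the supervised baseline).
errors = np.array([area_error(pred, expert, skin)])   # placeholder per-patient error list
median_error = np.median(errors)
q1, q3 = np.percentile(errors, [25, 75])
```

Note that this metric compares aggregate area percentages rather than pixel-wise overlap, so a prediction can score well even if the marked regions do not coincide spatially; the expert contours described in the abstract would be needed for overlap-based evaluation.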