Evaluating AI Models Trained with Varying Amounts of Expert Feedback for Chronic Graft-Versus-Host Disease Skin Assessment in Photos of Patients with Diverse Skin Tones
Journal of Clinical and Translational Science (2025)
Vanderbilt University Medical Center | National Institute of Arthritis and Musculoskeletal and Skin Diseases | Mayo Clinic | Fred Hutchinson Cancer Center | Center for Cancer Research
Abstract
Objectives/Goals: Manual skin assessment in chronic graft-versus-host disease (cGVHD) can be time-consuming and inconsistent (differences of >20% in affected area), even for experts. Building on previous work, we explore methods that use unmarked photos to train artificial intelligence (AI) models, aiming to improve performance by expanding and diversifying the training data without additional burden on experts.

Methods/Study Population: As in many medical imaging projects, we have a small number of expert-marked patient photos (N = 36 patients, n = 360 photos) and many unmarked photos (N = 337, n = 25,842). Dark skin (Fitzpatrick type 4+) is underrepresented in both sets: 11% of patients in the marked set and 9% in the unmarked set. In addition, a set of 20 expert-marked photos from 20 patients, 20% with dark skin types, was withheld from training to assess model performance. Our gold-standard markings were manual contours drawn around affected skin by a trained expert. Three AI training methods were tested. Our established baseline uses only the small number of marked photos (supervised method). The semi-supervised method uses a mix of marked and unmarked photos with human feedback. The self-supervised method uses only unmarked photos without any human feedback.

Results/Anticipated Results: We evaluated performance by comparing predicted skin areas with expert markings. The error was the absolute difference between the percentage areas marked by the AI model and the expert, where lower is better. Across all test patients, the median error was 19% (interquartile range 6–34) for the supervised method and 10% (5–23) for the semi-supervised method, which incorporated unmarked photos from 83 patients. On dark skin types, the median error was 36% (18–62) for supervised and 28% (14–52) for semi-supervised, compared with a median error on light skin of 18% (5–26) for supervised and 7% (4–17) for semi-supervised. Self-supervised training, using all 337 unmarked patients, is expected to further improve performance and consistency due to increased data diversity. Full results will be presented at the meeting.

Discussion/Significance of Impact: By automating skin assessment for cGVHD, AI could improve accuracy and consistency compared with manual methods. If translated to clinical use, this would ease clinical burden and scale to large patient cohorts. Future work will focus on ensuring equitable performance across all skin types, providing fair and accurate assessments for every patient.
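The abstract defines its error metric as the absolute difference between the AI-predicted and expert-marked affected-area percentages, summarized per cohort as a median and interquartile range. Below is a minimal sketch of that metric, assuming binary per-pixel masks for visible skin and for affected skin; the function names, mask conventions, and toy values are illustrative, not the authors' implementation.

```python
import numpy as np

def affected_area_percent(affected_mask: np.ndarray, skin_mask: np.ndarray) -> float:
    """Percent of visible skin marked as cGVHD-affected in one photo (boolean masks)."""
    skin_pixels = skin_mask.sum()
    if skin_pixels == 0:
        return 0.0
    return 100.0 * np.logical_and(affected_mask, skin_mask).sum() / skin_pixels

def area_error(pred_mask: np.ndarray, expert_mask: np.ndarray, skin_mask: np.ndarray) -> float:
    """Absolute difference between AI and expert affected-area percentages; lower is better."""
    return abs(affected_area_percent(pred_mask, skin_mask)
               - affected_area_percent(expert_mask, skin_mask))

# Toy example: one 100x100 photo where the model over-segments relative to the expert.
skin = np.ones((100, 100), dtype=bool)
expert = np.zeros_like(skin); expert[:30, :] = True   # expert marks 30% of the skin
pred = np.zeros_like(skin);   pred[:40, :] = True     # model predicts 40% of the skin
print(area_error(pred, expert, skin))                 # -> 10.0 percentage points

# Per-patient errors would then be summarized as a median and interquartile range,
# as reported in the abstract (e.g., 19% [6-34] for the supervised baseline).
errors = np.array([area_error(pred, expert, skin)])   # placeholder per-patient error list
median_error = np.median(errors)
q1, q3 = np.percentile(errors, [25, 75])
```

Note that this metric compares aggregate area percentages rather than pixel-wise overlap, so a prediction can score well even if the marked regions do not coincide spatially; the expert contours described in the abstract would be needed for overlap-based evaluation.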