University of Maryland School of Medicine Baltimore, Maryland, United States
Background/Case Studies: Artificial Intelligence (AI) has evolved rapidly, yet its role in transfusion medicine practice remains unclear. Several classes of AI models are under investigation in the medical literature, including Machine Learning (statistical models requiring manual data processing), Deep Learning (neural networks that automate processing), and Natural Language Processing (NLP). NLP uses computational modeling of language to build systems that integrate speech, enable database interaction, and model human-to-human communication. ChatGPT is a widely accessible NLP-based platform that rapidly processes user queries and, given specific text prompts, can return computational results. This study aimed to evaluate ChatGPT's efficacy as an NLP model in predicting perioperative RBC transfusion risk in on-pump cardiac surgery patients, using the established risk stratification model proposed by Zhang et al.
Study Design/Methods: Forty hypothetical patient case scenarios (20 male, 20 female) were randomly created to reflect the prevalence of clinical conditions seen in hospitalized patients, incorporating the eight risk variables identified by Zhang et al. as predictors of perioperative red blood cell (RBC) transfusion risk in on-pump cardiac surgery patients: age, sex, anemia severity, NYHA class III/IV heart failure, body surface area (BSA), prior cardiac surgery, emergency surgery status, and surgical procedure type. Each scenario was input into ChatGPT using two methods: first without any initial training commands (pre-training), and second with training prompts (post-training) that incorporated the Zhang et al. point-value system. Accuracy rates and risk stratification scores were then calculated.
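The scoring workflow described above can be sketched as a simple point-based stratifier. Note that the point values, BSA threshold, and score cut-offs below are hypothetical placeholders for illustration only; the actual weights of the Zhang et al. model are not reproduced in this abstract.

```python
# Hedged sketch of a point-based perioperative RBC transfusion risk
# stratifier. The predictor names mirror the eight Zhang et al. variables,
# but every point value and cut-off below is a HYPOTHETICAL placeholder.

HYPOTHETICAL_POINTS = {
    "age_over_65": 2,
    "female_sex": 1,
    "moderate_or_severe_anemia": 3,
    "nyha_iii_iv": 2,
    "low_bsa": 2,            # e.g., BSA below some published threshold
    "prior_cardiac_surgery": 2,
    "emergency_surgery": 3,
    "complex_procedure": 2,  # e.g., combined or redo procedures
}

def risk_score(patient: dict) -> int:
    """Sum the points for every predictor flagged True in the patient dict."""
    return sum(pts for var, pts in HYPOTHETICAL_POINTS.items() if patient.get(var))

def stratify(score: int) -> str:
    """Map a total score onto low/intermediate/high risk (illustrative cut-offs)."""
    if score >= 10:
        return "high"
    if score >= 5:
        return "intermediate"
    return "low"

case = {"age_over_65": True, "moderate_or_severe_anemia": True, "emergency_surgery": True}
print(stratify(risk_score(case)))  # 2 + 3 + 3 = 8 -> "intermediate"
```

In the post-training method, a prompt encoding such a point table would be supplied to ChatGPT before the case scenarios, so the model's task reduces to applying the table rather than inferring risk from general knowledge.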
Results/Findings: The case distribution comprised 6 high-risk, 8 intermediate-risk, and 26 low-risk patients. The average patient age was 60.18 ± 16.2 years, and the average BSA was 1.60 ± 0.18 m². In the pre-training method, ChatGPT's overall accuracy in predicting perioperative RBC transfusion risk was 60% (50% for high-risk cases, 50% for intermediate-risk cases, and 65.4% for low-risk cases), with an average stratification score of 12.2 ± 5.66. In the post-training method, the average risk score was 11.15 ± 3.52, with 67.5% overall accuracy (33.3% high-risk, 12.5% intermediate-risk, and 92.3% low-risk).

Conclusions: Overall, ChatGPT's Natural Language Processing model performed poorly in predicting perioperative risk stratification for hypothetical on-pump cardiac surgery patients. This underscores the need for larger, more comprehensive studies to evaluate its efficacy and suggests that other learning algorithms may perform better. Caution is therefore warranted when employing ChatGPT and other NLP platforms for perioperative RBC transfusion risk assessment.
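The overall and per-stratum accuracy rates reported in the results are simple proportions of ChatGPT predictions that agree with the reference (Zhang et al.) stratum. A minimal sketch, using illustrative toy data rather than the study's actual predictions:

```python
# Minimal sketch of the accuracy calculation: overall and per-stratum
# accuracy as the proportion of predictions matching the reference stratum.
from collections import defaultdict

def accuracy_by_stratum(reference: list, predicted: list) -> dict:
    """Return per-stratum and overall proportion agreement."""
    totals, correct = defaultdict(int), defaultdict(int)
    for ref, pred in zip(reference, predicted):
        totals[ref] += 1
        if ref == pred:
            correct[ref] += 1
    result = {stratum: correct[stratum] / totals[stratum] for stratum in totals}
    result["overall"] = sum(correct.values()) / len(reference)
    return result

# Illustrative toy data (not the study's data):
ref  = ["low", "low", "low", "high", "intermediate"]
pred = ["low", "low", "high", "high", "low"]
print(accuracy_by_stratum(ref, pred))  # overall 3/5 = 0.6
```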