Although the categorization of ultrasound using the Breast Imaging Reporting and Data System (BI-RADS) has become widespread worldwide, the problem of inter-observer variability remains. To maintain uniformity in diagnostic accuracy, we have developed a system in which artificial intelligence (AI) can distinguish whether a static image obtained using a breast ultrasound represents BI-RADS3 or lower or BI-RADS4a or higher to determine the medical management that should be performed on a patient whose breast ultrasound shows abnormalities. To establish and validate the AI system, a training dataset consisting of 4028 images containing 5014 lesions and a test dataset consisting of 3166 images containing 3656 lesions were collected and annotated. We selected a setting that maximized the area under the curve (AUC) and minimized the difference in sensitivity and specificity by adjusting the internal parameters of the AI system, achieving an AUC, sensitivity, and specificity of 0.95, 91.2%, and 90.7%, respectively. Furthermore, based on 30 images extracted from the test data, the diagnostic accuracy of 20 clinicians and the AI system was compared, and the AI system was found to be significantly superior to the clinicians (McNemar test, p < 0.001). Although deep-learning methods to categorize benign and malignant tumors using breast ultrasound have been extensively reported, our work represents the first attempt to establish an AI system to classify BI-RADS3 or lower and BI-RADS4a or higher successfully, providing important implications for clinical actions. These results suggest that the AI diagnostic system is sufficient to proceed to the next stage of clinical application.
ASJC Scopus subject areas