A risk score derived from a form of artificial intelligence (AI) outperformed breast density as a predictor of future breast cancer risk, data from a retrospective chart review showed.
Evaluation of mammograms by means of an age-adjusted risk score resulted in a numerically higher odds ratio (OR 1.56) for future cancer development than did either total area of dense tissue (OR 1.31) or percentage density (OR 1.18). Area under the receiver operating characteristic curve (AUC), a performance measure, was significantly higher for the risk score (P<0.001).
Assessment by the risk score also led to significantly fewer false-negative results, Karin Dembrower, MD, of the Karolinska Institute in Stockholm, and colleagues reported in Radiology.
“The deep neural network overall was better than density-based models, and it did not have the same bias as the density-based models,” Dembrower said in a statement. “Its predictive accuracy was not negatively affected by more aggressive cancer subtypes.”
In a separate study published simultaneously in Radiology, computer models developed by scientists and engineers at Google Health performed as well as radiologists for identifying four types of lung abnormalities on x-rays. The models and radiologists had similar accuracy for identifying pneumothorax, lung nodule or mass, airspace opacity, and fracture.
Network vs Density
The risk-prediction model is not the first for breast cancer, but existing models exclude imaging information. Age and mammographic density are two of the strongest predictors of breast cancer risk and are easily obtainable in a screening setting, Dembrower and co-authors noted.
A number of other mathematically defined image features have associations with breast cancer risk. However, such “human-specified” features still might not capture all relevant risk-associated information, the authors continued. Use of a deep neural network may offer a solution.
Deep neural networks in radiology evolve by training on a large set of input parameters representing images on a pixel-by-pixel basis. Several previous studies employed deep neural networks to detect tumors on mammograms and for risk prediction. Dembrower and colleagues hypothesized that the networks’ flexibility might increase the yield of useful information from mammograms as compared with models based on breast density.
The investigators developed a network with data for women with breast cancer diagnoses from 2008 to 2012. The primary analysis included 278 women with newly diagnosed breast cancer during 2013-2014, and a control group of 2,005 healthy women.
A deep learning risk score (0=none, 1=greatest risk), breast density area, and percentage density were calculated for each patient, using data from the earliest available mammogram. Data inputs for the deep neural network consisted of mammographic images, age at mammography, and the image-acquisition parameters of exposure, tube current, breast thickness, and compression force.
Women with breast cancer diagnoses were older at the time of mammography (55.7 vs 54.6 years, P<0.001), had a larger area of density (38.2 vs 34.2 cm², P<0.001), and had a higher percentage density (25.6% vs 24.0%, P<0.001).
Though numerically higher, the breast cancer odds ratio with the deep learning risk score did not differ significantly from the ORs associated with dense area or percentage density. The age-adjusted risk score did have a significantly higher AUC (0.65) as compared with dense area (0.60) or percentage density (0.57).
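For readers less familiar with AUC, it can be read as the probability that a randomly chosen woman who developed cancer received a higher score than a randomly chosen control. The following is a minimal sketch of that calculation, not the authors' code; the function name `auc` and the example scores are hypothetical and are not data from the study.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Rank-based AUC: the share of case/control pairs in which the case
    has the higher score, with tied pairs counted as half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores on the article's 0-to-1 scale (illustration only).
cases = [0.9, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
example_auc = auc(cases, controls)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which puts the reported values of 0.57 to 0.65 in context.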
The risk score was associated with a false-negative rate of 31%, as compared with 36% for dense area (P=0.006) and 39% for percentage density (P<0.001). The difference was most pronounced for patients with aggressive cancers, the authors noted.
The findings suggest that mammograms harbor risk indicators that are not captured by assessment of breast density, Manisha Bahl, MD, of Massachusetts General Hospital in Boston, said in an accompanying editorial.
“Herein lies the power of DL [deep learning]: to discover useful features that may not be discernible by the most experienced and skillful breast imagers and/or that are not currently known,” Bahl wrote.
The findings are consistent with those of previously reported studies of DL models, she noted. Collectively, the studies “demonstrate that image-based DL models offer promise as more accurate predictors of breast cancer risk than density-based models and existing epidemiology-based models.”
Noninferiority in Lung Dx
The study of lung abnormalities incorporated information from almost 900,000 radiographic images: 760,000 from a multicity hospital network in India and 112,000 from the publicly available ChestX-ray14 dataset, reported Shravya Shetty, MS, engineering lead at Google Health in Palo Alto, California.
Training included 657,954 images with labels developed from a combination of natural language processing and expert review. The labels were adjudicated by a panel of radiologists, who helped produce a consensus rate of 97%.
Testing of the DL models involved 1,818 images from the hospital network and 1,962 from the public dataset, and the primary outcome was noninferiority of the models versus radiologist review. The hospital dataset consisted of 88 images positive for pneumothorax, 322 for nodule or mass, 444 for opacity, and 257 for fracture. The public dataset comprised 195 images positive for pneumothorax, 295 for nodule or mass, 1,135 for opacity, and 72 for fracture.
The investigators compared the models and radiology review with respect to AUC, sensitivity, specificity, and positive predictive value. Each outcome was calculated separately for the two datasets and for each of the four lung conditions represented in the images.
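Sensitivity, specificity, and positive predictive value all derive from the same four counts of correct and incorrect calls. The sketch below shows the standard definitions. The class and function names are hypothetical, and the split of calls is invented for illustration. Only the totals of 88 pneumothorax-positive images among 1,818 hospital test images come from the article, and the sketch assumes every one of those images counts as either positive or negative for pneumothorax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # abnormal images flagged as abnormal
    fp: int  # normal images flagged as abnormal
    tn: int  # normal images called normal
    fn: int  # abnormal images called normal


def sensitivity(c: ConfusionCounts) -> float:
    """Share of truly abnormal images that were flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of truly normal images that were called normal."""
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Share of flagged images that were truly abnormal."""
    return c.tp / (c.tp + c.fp)


# Hypothetical reading of 1,818 images, 88 of them positive (invented split).
reading = ConfusionCounts(tp=80, fp=30, tn=1700, fn=8)
summary = {
    "sensitivity": sensitivity(reading),
    "specificity": specificity(reading),
    "ppv": positive_predictive_value(reading),
}
```

Unlike sensitivity and specificity, positive predictive value depends on how common the condition is in the test set, which is one reason to report each outcome separately for each dataset.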
Overall, the models’ performance was on par with that of the radiologists, Shetty and colleagues reported. Most of the comparisons met statistical criteria for noninferiority.
The AUC values associated with the models for evaluation of the hospital images were 0.95 for pneumothorax, 0.72 for nodule/mass, 0.91 for airspace opacity, and 0.86 for fracture. Corresponding values for the public images were 0.94, 0.91, 0.94, and 0.81. The AUC values met or exceeded mean values achieved with expert radiology review.
The author of an editorial that accompanied this study called the results a “welcome addition to the increasing corpus of evidence” regarding the promise of AI. Even so, the study was “essentially a feasibility study,” wrote Paul Chang, MD, of the University of Chicago.
“Whereas important, these proof-of-concept studies are not sufficient,” he said. “In order for AI to ‘become real,’ we need studies that evaluate how AI will work with respect to real-world considerations, such as generalizability, measurable impact on efficiency, reduced variability, and even outcomes.”
AI in medicine remains a work in progress, said Sarah Cate, MD, of Mount Sinai Health System in New York City, who was not affiliated with either study.
“These studies are supportive of including AI to help in the diagnosis of breast and lung disease,” she told MedPage Today via email. “However, there have not been prospective randomized trials evaluating these modalities, which is how advances are included as routine parts of care.”
“AI certainly can help in reading films, such as mammograms, and x-rays, because certain abnormalities can be detected by a computer,” Cate added. “We see this in [computer-aided detection]. However, this cannot entirely replace a well-trained radiologist, which we know is superior to computer programs.”
Dembrower and co-authors reported having no relevant relationships with industry.
Shetty and many of the study co-authors are Google employees, and none of the authors disclosed relationships aside from those with Google.