bims-arihec Biomed News
on Artificial intelligence in healthcare
Issue of 2020–03–15
fifteen papers selected by
Céline Bélanger, Cogniges Inc.



  1. Front Cardiovasc Med. 2020 ;7 17
      Cardiac magnetic resonance (CMR) imaging is an important tool for the non-invasive assessment of cardiovascular disease. However, CMR suffers from long acquisition times due to the need of obtaining images with high temporal and spatial resolution, different contrasts, and/or whole-heart coverage. In addition, both cardiac and respiratory-induced motion of the heart during the acquisition need to be accounted for, further increasing the scan time. Several undersampling reconstruction techniques have been proposed during the last decades to speed up CMR acquisition. These techniques rely on acquiring less data than needed and estimating the non-acquired data exploiting some sort of prior information. Parallel imaging and compressed sensing undersampling reconstruction techniques have revolutionized the field, enabling 2- to 3-fold scan time accelerations to become standard in clinical practice. Recent scientific advances in CMR reconstruction hinge on the thriving field of artificial intelligence. Machine learning reconstruction approaches have been recently proposed to learn the non-linear optimization process employed in CMR reconstruction. Unlike analytical methods for which the reconstruction problem is explicitly defined into the optimization process, machine learning techniques make use of large data sets to learn the key reconstruction parameters and priors. In particular, deep learning techniques promise to use deep neural networks (DNN) to learn the reconstruction process from existing datasets in advance, providing a fast and efficient reconstruction that can be applied to all newly acquired data. However, before machine learning and DNN can realize their full potentials and enter widespread clinical routine for CMR image reconstruction, there are several technical hurdles that need to be addressed. In this article, we provide an overview of the recent developments in the area of artificial intelligence for CMR image reconstruction. 
The underlying assumptions of established techniques such as compressed sensing and low-rank reconstruction are briefly summarized, while a greater focus is given to recent advances in dictionary learning and deep learning based CMR reconstruction. In particular, approaches that exploit neural networks as implicit or explicit priors are discussed for 2D dynamic cardiac imaging and 3D whole-heart CMR imaging. Current limitations, challenges, and potential future directions of these techniques are also discussed.
    Keywords:  AI; cardiac MRI; deep learning; dictionary learning; reconstruction; undersampling
    DOI:  https://doi.org/10.3389/fcvm.2020.00017
  2. J Oral Pathol Med. 2020 Mar 11.
      Oral cancer is easily detectable by physical (self) examination. However, many cases of oral cancer are detected late, which causes unnecessary morbidity and mortality. Screening of high-risk populations seems beneficial, but these populations are commonly located in regions with limited access to healthcare. The advent of information technology and its modern derivative Artificial Intelligence (AI) promises to improve oral cancer screening but to date, few efforts have been made to apply these techniques and relatively little research has been conducted to retrieve meaningful information from AI data. In this paper, we discuss the promise of AI to improve the quality and reach of oral cancer screening and its potential effect on improving mortality and unequal access to health care around the world.
    Keywords:  Machine learning; artificial intelligence; early detection; oral squamous cell carcinoma
    DOI:  https://doi.org/10.1111/jop.13013
  3. BMJ Open Diabetes Res Care. 2020 Mar;pii: e001055. [Epub ahead of print]8(1):
       OBJECTIVE: Medication adherence plays a key role in type 2 diabetes (T2D) care. Identifying patients with high risks of non-compliance helps individualized management, especially for China, where medical resources are relatively insufficient. However, models with good predictive capabilities have not been studied. This study aims to assess multiple machine learning algorithms and screen out a model that can be used to predict patients' non-adherence risks.
    METHODS: A real-world registration study was conducted at Sichuan Provincial People's Hospital from 1 April 2018 to 30 March 2019. Data of patients with T2D on demographics, disease and treatment, diet and exercise, mental status, and treatment adherence were obtained by face-to-face questionnaires. The medication possession ratio was used to evaluate patients' medication adherence status. Thirty machine learning algorithms were applied for modeling, including Bayesian network, Neural Net, support vector machine, and so on, and balanced sampling, data imputation, binning, and methods of feature selection were evaluated by the area under the receiver operating characteristic curve (AUC). We used two-way cross-validation to ensure the accuracy of model evaluation, and we performed an a posteriori test on the sample size based on the trend of AUC as the sample size increases.
    RESULTS: A total of 401 patients out of 630 candidates were investigated, of which 85 were evaluated as poor adherence (21.20%). A total of 16 variables were selected as potential variables for modeling, and 300 models were built based on 30 machine learning algorithms. Among these algorithms, the AUC of the best capable one was 0.866±0.082. Imputing, oversampling and larger sample size will help improve predictive ability.
    CONCLUSIONS: An accurate and sensitive adherence prediction model based on real-world registration data was established after evaluating data filling, balanced sampling, and so on, which may provide a technical tool for individualized diabetes care.
    Keywords:  adherence; personality; prediction and prevention; type 2 diabetes
    DOI:  https://doi.org/10.1136/bmjdrc-2019-001055
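    Illustration for entry 3 (not taken from the paper): the medication possession ratio is conventionally the total days of medication supplied divided by the days in the observation period. The sketch below assumes that definition; the function name, the refill values, and the 0.8 adherence cutoff are hypothetical, and the abstract does not state which threshold the authors used.

```python
def medication_possession_ratio(days_supplied, period_days):
    """Total days of supply dispensed divided by the length of the period."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return sum(days_supplied) / period_days


# Hypothetical patient: four 30-day refills over a 180-day observation window.
fills = [30, 30, 30, 30]
mpr = medication_possession_ratio(fills, 180)

# Assumed cutoff; a commonly used convention, not a value from the study.
ADHERENCE_CUTOFF = 0.8
is_adherent = mpr >= ADHERENCE_CUTOFF

print(round(mpr, 3), is_adherent)
```

    A label derived this way could serve as the binary outcome that a classifier is trained to predict.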
  4. JAMA Dermatol. 2020 Mar 11.
       Importance: The use of artificial intelligence (AI) is expanding throughout the field of medicine. In dermatology, researchers are evaluating the potential for direct-to-patient and clinician decision-support AI tools to classify skin lesions. Although AI is poised to change how patients engage in health care, patient perspectives remain poorly understood.
    Objective: To explore how patients conceptualize AI and perceive the use of AI for skin cancer screening.
    Design, Setting, and Participants: A qualitative study using a grounded theory approach to semistructured interview analysis was conducted in general dermatology clinics at the Brigham and Women's Hospital and melanoma clinics at the Dana-Farber Cancer Institute. Forty-eight patients were enrolled. Each interview was independently coded by 2 researchers with interrater reliability measurement; reconciled codes were used to assess code frequency. The study was conducted from May 6 to July 8, 2019.
    Main Outcomes and Measures: Artificial intelligence concept, perceived benefits and risks of AI, strengths and weaknesses of AI, AI implementation, response to conflict between human and AI clinical decision-making, and recommendation for or against AI.
    Results: Of 48 patients enrolled, 26 participants (54%) were women; mean (SD) age was 53.3 (21.7) years. Sixteen patients (33%) had a history of melanoma, 16 patients (33%) had a history of nonmelanoma skin cancer only, and 16 patients (33%) had no history of skin cancer. Twenty-four patients were interviewed about a direct-to-patient AI tool and 24 patients were interviewed about a clinician decision-support AI tool. Interrater reliability ratings for the 2 coding teams were κ = 0.94 and κ = 0.89. Patients primarily conceptualized AI in terms of cognition. Increased diagnostic speed (29 participants [60%]) and health care access (29 [60%]) were the most commonly perceived benefits of AI for skin cancer screening; increased patient anxiety was the most commonly perceived risk (19 [40%]). Patients perceived both more accurate diagnosis (33 [69%]) and less accurate diagnosis (41 [85%]) to be the greatest strength and weakness of AI, respectively. The dominant theme that emerged was the importance of symbiosis between humans and AI (45 [94%]). Seeking biopsy was the most common response to conflict between human and AI clinical decision-making (32 [67%]). Overall, 36 patients (75%) would recommend AI to family members and friends.
    Conclusions and Relevance: In this qualitative study, patients appeared to be receptive to the use of AI for skin cancer screening if implemented in a manner that preserves the integrity of the human physician-patient relationship.
    DOI:  https://doi.org/10.1001/jamadermatol.2019.5014
  5. Eur Radiol. 2020 Mar 11.
       OBJECTIVES: Pneumothorax is the most common and potentially life-threatening complication arising from percutaneous lung biopsy. We evaluated the performance of a deep learning algorithm for detection of post-biopsy pneumothorax in chest radiographs (CRs), in consecutive cohorts reflecting actual clinical situation.
    METHODS: We retrospectively included post-biopsy CRs of 1757 consecutive patients (1055 men, 702 women; mean age of 65.1 years) undergoing percutaneous lung biopsies from three institutions. A commercially available deep learning algorithm analyzed each CR to identify pneumothorax. We compared the performance of the algorithm with that of radiology reports made in the actual clinical practice. We also conducted a reader study, in which the performance of the algorithm was compared with those of four radiologists. Performances of the algorithm and radiologists were evaluated by area under receiver operating characteristic curves (AUROCs), sensitivity, and specificity, with reference standards defined by thoracic radiologists.
    RESULTS: Pneumothorax occurred in 17.5% (308/1757) of cases, out of which 16.6% (51/308) required catheter drainage. The AUROC, sensitivity, and specificity of the algorithm were 0.937, 70.5%, and 97.7%, respectively, for identification of pneumothorax. The algorithm exhibited higher sensitivity (70.2% vs. 55.5%, p < 0.001) and lower specificity (97.7% vs. 99.8%, p < 0.001), compared with those of radiology reports. In the reader study, the algorithm exhibited lower sensitivity (77.3% vs. 81.8-97.7%) and higher specificity (97.6% vs. 81.7-96.0%) than the radiologists.
    CONCLUSION: The deep learning algorithm appropriately identified pneumothorax in post-biopsy CRs in consecutive diagnostic cohorts. It may assist in accurate and timely diagnosis of post-biopsy pneumothorax in clinical practice.
    KEY POINTS: • A deep learning algorithm can identify chest radiographs with post-biopsy pneumothorax in multicenter consecutive cohorts reflecting actual clinical situation. • The deep learning algorithm has a potential role as a surveillance tool for accurate and timely diagnosis of post-biopsy pneumothorax.
    Keywords:  Artificial intelligence; Deep learning; Needle biopsy; Pneumothorax; Thoracic radiography
    DOI:  https://doi.org/10.1007/s00330-020-06771-3
  6. J Clin Med. 2020 Mar 07. pii: E724. [Epub ahead of print]9(3):
      To bridge the translational gap between recent discoveries of distinct molecular phenotypes of pancreatic cancer and tangible improvements in patient outcome, there is an urgent need to develop strategies and tools informing and improving the clinical decision process. Radiomics and machine learning approaches can offer non-invasive whole tumor analytics for clinical imaging data-based classification. The retrospective study assessed baseline computed tomography (CT) from 207 patients with proven pancreatic ductal adenocarcinoma (PDAC). Following expert level manual annotation, Pyradiomics was used for the extraction of 1474 radiomic features. The molecular tumor subtype was defined by immunohistochemical staining for KRT81 and HNF1a as quasi-mesenchymal (QM) vs. non-quasi-mesenchymal (non-QM). A Random Forest machine learning algorithm was developed to predict the molecular subtype from the radiomic features. The algorithm was then applied to an independent cohort of histopathologically unclassifiable tumors with distinct clinical outcomes. The classification algorithm achieved a sensitivity, specificity and ROC-AUC (area under the receiver operating characteristic curve) of 0.84 ± 0.05, 0.92 ± 0.01 and 0.93 ± 0.01, respectively. The median overall survival for predicted QM and non-QM tumors was 16.1 and 20.9 months, respectively, log-rank-test p = 0.02, hazard ratio (HR) 1.59. The application of the algorithm to histopathologically unclassifiable tumors revealed two groups with significantly different survival (8.9 and 39.8 months, log-rank-test p < 0.001, HR 4.33). The machine learning-based analysis of preoperative (CT) imaging allows the prediction of molecular PDAC subtypes highly relevant for patient survival, allowing advanced pre-operative patient stratification for precision medicine applications.
    Keywords:  molecular subtypes; pancreatic cancer; radiomics
    DOI:  https://doi.org/10.3390/jcm9030724
  7. Eur Radiol. 2020 Mar 11.
       BACKGROUND AND PURPOSE: Recent studies have highlighted the importance of isocitrate dehydrogenase (IDH) mutational status in stratifying biologically distinct subgroups of gliomas. This study aimed to evaluate whether MRI-based radiomic features could improve the accuracy of survival predictions for lower grade gliomas over clinical and IDH status.
    MATERIALS AND METHODS: Radiomic features (n = 250) were extracted from preoperative MRI data of 296 lower grade glioma patients from databases at our institutional (n = 205) and The Cancer Genome Atlas (TCGA)/The Cancer Imaging Archive (TCIA) (n = 91) datasets. For predicting overall survival, random survival forest models were trained with radiomic features; non-imaging prognostic factors including age, resection extent, WHO grade, and IDH status on the institutional dataset, and validated on the TCGA/TCIA dataset. The performance of the random survival forest (RSF) model and incremental value of radiomic features were assessed by time-dependent receiver operating characteristics.
    RESULTS: The radiomics RSF model identified 71 radiomic features to predict overall survival, which were successfully validated on TCGA/TCIA dataset (iAUC, 0.620; 95% CI, 0.501-0.756). Relative to the RSF model from the non-imaging prognostic parameters, the addition of radiomic features significantly improved the overall survival prediction accuracy of the random survival forest model (iAUC, 0.627 vs. 0.709; difference, 0.097; 95% CI, 0.003-0.209).
    CONCLUSION: Radiomic phenotyping with machine learning can improve survival prediction over clinical profile and genomic data for lower grade gliomas.
    KEY POINTS: • Radiomics analysis with machine learning can improve survival prediction over the non-imaging factors (clinical and molecular profiles) for lower grade gliomas, across different institutions.
    Keywords:  Glioma; Machine learning; Prognosis; Survival
    DOI:  https://doi.org/10.1007/s00330-020-06737-5
  8. Obstet Gynecol. 2020 Mar 10.
       OBJECTIVE: To predict a woman's risk of postpartum hemorrhage at labor admission using machine learning and statistical models.
    METHODS: Predictive models were constructed and compared using data from 10 of 12 sites in the U.S. Consortium for Safe Labor Study (2002-2008) that consistently reported estimated blood loss at delivery. The outcome was postpartum hemorrhage, defined as an estimated blood loss of at least 1,000 mL. Fifty-five candidate risk factors routinely available on labor admission were considered. We used logistic regression with and without lasso regularization (lasso regression) as the two statistical models, and random forest and extreme gradient boosting as the two machine learning models to predict postpartum hemorrhage. Model performance was measured by C statistics (ie, concordance index), calibration, and decision curves. Models were constructed from the first phase (2002-2006) and externally validated (ie, temporally) in the second phase (2007-2008). Further validation was performed combining both temporal and site-specific validation.
    RESULTS: Of the 152,279 assessed births, 7,279 (4.8%, 95% CI 4.7-4.9) had postpartum hemorrhage. All models had good-to-excellent discrimination. The extreme gradient boosting model had the best discriminative ability to predict postpartum hemorrhage (C statistic: 0.93; 95% CI 0.92-0.93), followed by random forest (C statistic: 0.92; 95% CI 0.91-0.92). The lasso regression model (C statistic: 0.87; 95% CI 0.86-0.88) and logistic regression (C statistic: 0.87; 95% CI 0.86-0.87) had lower-but-good discriminative ability. The above results held with validation across both time and sites. Decision curve analysis demonstrated that, although all models provided superior net benefit when clinical decision thresholds were between 0% and 80% predicted risk, the extreme gradient boosting model provided the greatest net benefit.
    CONCLUSION: Postpartum hemorrhage on labor admission can be predicted with excellent discriminative ability using machine learning and statistical models. Further clinical application is needed, which may assist health care providers to be prepared and triage at-risk women.
    DOI:  https://doi.org/10.1097/AOG.0000000000003759
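    Illustration for entry 8 (not taken from the paper): for a binary outcome, the C statistic is the share of (event, non-event) pairs in which the event received the higher predicted risk, with ties counted as one half. The function name and the toy risks below are hypothetical and serve only to show the calculation.

```python
def c_statistic(predicted_risks, outcomes):
    """Concordance index for a binary outcome (1 = event, 0 = no event)."""
    events = [r for r, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for event_risk in events:
        for non_event_risk in non_events:
            if event_risk > non_event_risk:
                concordant += 1.0      # pair ranked correctly
            elif event_risk == non_event_risk:
                concordant += 0.5      # tie counts as half
    return concordant / (len(events) * len(non_events))


# Toy data: five births, two with the outcome.
risks = [0.9, 0.3, 0.6, 0.2, 0.6]
labels = [1, 0, 1, 0, 0]
c_value = c_statistic(risks, labels)
print(round(c_value, 3))
```

    The pairwise loop is quadratic in the number of records, so on a cohort the size of the one reported here a rank-based computation would be the practical choice.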
  9. J Magn Reson Imaging. 2020 Mar 13.
       BACKGROUND: Approximately one-fourth of all cancer metastases are found in the brain. MRI is the primary technique for detection of brain metastasis, planning of radiotherapy, and the monitoring of treatment response. Progress in tumor treatment now requires detection of new or growing metastases at the small subcentimeter size, when these therapies are most effective.
    PURPOSE: To develop a deep-learning-based approach for finding brain metastasis on MRI.
    STUDY TYPE: Retrospective.
    SEQUENCE: Axial postcontrast 3D T1 -weighted imaging.
    FIELD STRENGTH: 1.5T and 3T.
    POPULATION: A total of 361 scans of 121 patients were used to train and test the Faster region-based convolutional neural network (Faster R-CNN): 1565 lesions in 270 scans of 73 patients for training; 488 lesions in 91 scans of 48 patients for testing. From the 48 outputs of Faster R-CNN, 212 lesions in 46 scans of 18 patients were used for training the RUSBoost algorithm (MatLab) and 276 lesions in 45 scans of 30 patients for testing.
    ASSESSMENT: Two radiologists diagnosed and supervised annotation of metastases on brain MRI as ground truth. These data were used to produce a 2-step pipeline consisting of a Faster R-CNN for detecting abnormal hyperintensity that may represent brain metastasis and a RUSBoost classifier to reduce the number of false-positive foci detected.
    STATISTICAL TESTS: The performance of the algorithm was evaluated by using sensitivity, false-positive rate, and receiver operating characteristic (ROC) curves. The detection performance was assessed both per-metastasis and per-slice.
    RESULTS: Testing on held-out brain MRI data demonstrated 96% sensitivity and 20 false-positive metastases per scan. The results showed an 87.1% sensitivity and 0.24 false-positive metastases per slice. The area under the ROC curve was 0.79.
    CONCLUSION: Our results showed that deep-learning-based computer-aided detection (CAD) had the potential of detecting brain metastases with high sensitivity and reasonable specificity.
    LEVEL OF EVIDENCE: 3 TECHNICAL EFFICACY STAGE: 2.
    Keywords:  Faster R-CNN; RUSBoost; brain metastases; deep learning
    DOI:  https://doi.org/10.1002/jmri.27129
  10. J Thorac Imaging. 2020 Mar 12.
       PURPOSE: The purpose of this study was to evaluate the accuracy of a novel fully automated deep learning (DL) algorithm implementing a recurrent neural network (RNN) with long short-term memory (LSTM) for the detection of coronary artery calcium (CAC) from coronary computed tomography angiography (CCTA) data.
    MATERIALS AND METHODS: Under an IRB waiver and in HIPAA compliance, a total of 194 patients who had undergone CCTA were retrospectively included. Two observers independently evaluated the image quality and recorded the presence of CAC in the right (RCA), the combination of left main and left anterior descending (LM-LAD), and left circumflex (LCx) coronary arteries. Noncontrast CACS scans were allowed to be used in cases of uncertainty. Heart and coronary artery centerline detection and labeling were automatically performed. Presence of CAC was assessed by an RNN-LSTM. The algorithm's overall and per-vessel sensitivity, specificity, and diagnostic accuracy were calculated.
    RESULTS: CAC was absent in 84 and present in 110 patients. As regards CCTA, the median subjective image quality, signal-to-noise ratio, and contrast-to-noise ratio were 3.0, 13.0, and 11.4. A total of 565 vessels were evaluated. On a per-vessel basis, the algorithm achieved a sensitivity, specificity, and diagnostic accuracy of 93.1% (confidence interval [CI], 84.3%-96.7%), 82.76% (CI, 74.6%-89.4%), and 86.7% (CI, 76.8%-87.9%), respectively, for the RCA, 93.1% (CI, 86.4%-97.7%), 95.5% (CI, 88.77%-98.75%), and 94.2% (CI, 90.2%-94.6%), respectively, for the LM-LAD, and 89.9% (CI, 80.2%-95.8%), 90.0% (CI, 83.2%-94.7%), and 89.9% (CI, 85.0%-94.1%), respectively, for the LCx. The overall sensitivity, specificity, and diagnostic accuracy were 92.1% (CI, 92.1%-95.2%), 88.9% (CI, 84.9%-92.1%), and 90.3% (CI, 88.0%-90.0%), respectively. When accounting for image quality, the algorithm achieved a sensitivity, specificity, and diagnostic accuracy of 76.2%, 87.5%, and 82.2%, respectively, for poor-quality data sets and 93.3%, 89.2% and 90.9%, respectively, when data sets rated adequate or higher were combined.
    CONCLUSION: The proposed RNN-LSTM demonstrated high diagnostic accuracy for the detection of CAC from CCTA.
    DOI:  https://doi.org/10.1097/RTI.0000000000000491
  11. PLoS One. 2020 ;15(3): e0229226
      In medicine, a misdiagnosis or the absence of specialists can affect the patient's health, leading to unnecessary tests and increasing the costs of healthcare. In particular, the lack of specialists in otolaryngology in third world countries forces patients to seek medical attention from general practitioners, who might not have enough training and experience to make a correct diagnosis in this field. To tackle this problem, we propose and test a computer-aided system based on machine learning models and image processing techniques for otoscopic examination, as a support for a more accurate diagnosis of ear conditions at primary care before specialist referral; in particular, for myringosclerosis, earwax plug, and chronic otitis media. To characterize the tympanic membrane and ear canal for each condition, we implemented three different feature extraction methods: color coherence vector, discrete cosine transform, and filter bank. We also considered three machine learning algorithms: support vector machine (SVM), k-nearest neighbor (k-NN) and decision trees to develop the ear condition predictor model. To conduct the research, our database included 160 images as the testing set and 720 images as the training and validation sets of 180 patients. We repeatedly trained the learning models using the training dataset and evaluated them using the validation dataset to obtain the feature extraction method and learning model that produce the highest validation accuracy. The results showed that the SVM and k-NN presented the best performance, followed by the decision trees model. Finally, we performed a classification stage (i.e., diagnosis) using testing data, where the SVM model achieved an average classification accuracy of 93.9%, average sensitivity of 87.8%, average specificity of 95.9%, and average positive predictive value of 87.7%. The results show that this system might be used by general practitioners as a reference to make better decisions in the diagnosis of ear pathologies.
    DOI:  https://doi.org/10.1371/journal.pone.0229226
  12. J Affect Disord. 2020 Feb 28. pii: S0165-0327(19)33545-1. [Epub ahead of print]268 118-126
       BACKGROUND: Depressive disturbances in Parkinson's disease (dPD) have been identified as the most important determinant of quality of life in patients with Parkinson's disease (PD). Prediction models to triage patients at risk of depression early in the disease course are needed for prognosis and stratification of participants in clinical trials.
    METHODS: One machine learning algorithm called extreme gradient boosting (XGBoost) and the logistic regression technique were applied for the prediction of clinically significant depression (defined as the 15-item Geriatric Depression Scale [GDS-15] ≥ 5) using a prospective cohort study of 312 drug-naïve patients with newly diagnosed PD during 2-year follow-up from the Parkinson's Progression Markers Initiative (PPMI) database. Established models were assessed with out-of-sample validation and the whole sample was divided into training and testing samples by the ratio of 7:3.
    RESULTS: Both the XGBoost model and the logistic regression model achieved good discrimination and calibration. Two PD-specific factors (age at onset, duration) and four nonspecific factors (baseline GDS-15 score, State Trait Anxiety Inventory [STAI] score, Rapid Eye Movement Sleep Behavior Disorder Screening Questionnaire [RBDSQ] score, and history of depression) were identified as important predictors by both models.
    LIMITATIONS: Access to several variables was limited by the database.
    CONCLUSIONS: In this longitudinal study, we developed promising tools to provide personalized estimates of depression in early PD and studied the relative contribution of PD-specific and nonspecific predictors, constituting a substantial addition to the current understanding of dPD.
    Keywords:  Depression; Machine learning; Parkinson's disease; Prediction model
    DOI:  https://doi.org/10.1016/j.jad.2020.02.046
  13. Schizophr Res. 2020 Mar 05. pii: S0920-9964(20)30023-2. [Epub ahead of print]
    GROUP Investigators
       OBJECTIVE: The main goal of the study was to predict individual patients' future mental healthcare consumption, thereby enhancing the design of an efficient demand-oriented mental healthcare system by focusing on a patient population associated with intensive mental healthcare consumption. Factors that affect the mental healthcare consumption of service users with non-affective psychosis were identified, and subsequently used in a prognostic model to predict future healthcare consumption.
    METHOD: This study was a secondary analysis of an existing dataset from the GROUP study. Based on mental healthcare consumption, patients with non-affective psychosis were divided into two groups: low (N = 579) and high (N = 488) intensive mental healthcare consumers. Three different techniques from the field of machine learning were applied to cross-sectional data to identify risk factors: logistic regression, classification tree and a random forest. Subsequently, the same techniques were applied longitudinally in order to predict future healthcare consumption.
    RESULTS: Identified variables that affected healthcare consumption were the number of psychotic episodes, paid employment, engagement in social activities, previous healthcare consumption, and met needs. Analyses showed that the random forest method is best suited to model risk factors, and that these relations predict future healthcare consumption (AUC 0.71, PPV 0.65).
    CONCLUSIONS: Machine learning techniques provide valuable information for identifying risk factors in psychosis. They may thus help clinicians optimize allocation of mental healthcare resources by predicting future healthcare consumption.
    Keywords:  Machine learning; Non-affective psychosis; Predicting mental healthcare use
    DOI:  https://doi.org/10.1016/j.schres.2020.01.008
  14. JAMA Netw Open. 2020 Mar 02. 3(3): e200772
       Importance: Predicting infarct size and location is important for decision-making and prognosis in patients with acute stroke.
    Objectives: To determine whether a deep learning model can predict final infarct lesions using magnetic resonance images (MRIs) acquired at initial presentation (baseline) and to compare the model with current clinical prediction methods.
    Design, Setting, and Participants: In this multicenter prognostic study, a specific type of neural network for image segmentation (U-net) was trained, validated, and tested using patients from the Imaging Collaterals in Acute Stroke (iCAS) study from April 14, 2014, to April 15, 2018, and the Diffusion Weighted Imaging Evaluation for Understanding Stroke Evolution Study-2 (DEFUSE-2) study from July 14, 2008, to September 17, 2011 (reported in October 2012). Patients underwent baseline perfusion-weighted and diffusion-weighted imaging and MRI at 3 to 7 days after baseline. Patients were grouped into unknown, minimal, partial, and major reperfusion status based on 24-hour imaging results. Baseline images acquired at presentation were inputs, and the final true infarct lesion at 3 to 7 days was considered the ground truth for the model. The model calculated the probability of infarction for every voxel, which can be thresholded to produce a prediction. Data were analyzed from July 1, 2018, to March 7, 2019.
    Main Outcomes and Measures: Area under the curve, Dice score coefficient (DSC) (a metric from 0-1 indicating the extent of overlap between the prediction and the ground truth; a DSC of ≥0.5 represents significant overlap), and volume error. Current clinical methods were compared with model performance in subgroups of patients with minimal or major reperfusion.
    Results: Among the 182 patients included in the model (97 women [53.3%]; mean [SD] age, 65 [16] years), the deep learning model achieved a median area under the curve of 0.92 (interquartile range [IQR], 0.87-0.96), DSC of 0.53 (IQR, 0.31-0.68), and volume error of 9 (IQR, -14 to 29) mL. In subgroups with minimal (DSC, 0.58 [IQR, 0.31-0.67] vs 0.55 [IQR, 0.40-0.65]; P = .37) or major (DSC, 0.48 [IQR, 0.29-0.65] vs 0.45 [IQR, 0.15-0.54]; P = .002) reperfusion for which comparison with existing clinical methods was possible, the deep learning model had comparable or better performance.
    Conclusions and Relevance: The deep learning model appears to have successfully predicted infarct lesions from baseline imaging without reperfusion information and achieved comparable performance to existing clinical methods. Predicting the subacute infarct lesion may help clinicians prepare for decompression treatment and aid in patient selection for neuroprotective clinical trials.
    DOI:  https://doi.org/10.1001/jamanetworkopen.2020.0772
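    Illustration for entry 14 (not taken from the paper): the abstract describes thresholding per-voxel infarction probabilities into a prediction and scoring overlap with the Dice score coefficient, which is twice the overlap divided by the sum of the two mask sizes. The sketch below uses flat Python lists as stand-ins for image volumes; the function names, the toy values, and the 0.5 cutoff are assumptions.

```python
def threshold(probabilities, cutoff=0.5):
    """Turn per-voxel probabilities into a binary mask (cutoff is assumed)."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def dice_score(prediction, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks of equal length."""
    if len(prediction) != len(truth):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, t in zip(prediction, truth) if p and t)
    total = sum(prediction) + sum(truth)
    # Two empty masks agree perfectly; define Dice as 1.0 in that case.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy example: six voxels, three of them truly infarcted.
voxel_probabilities = [0.9, 0.8, 0.4, 0.1, 0.7, 0.2]
ground_truth = [1, 1, 1, 0, 0, 0]

predicted_mask = threshold(voxel_probabilities)
dsc = dice_score(predicted_mask, ground_truth)
print(round(dsc, 3))
```

    In this toy case two of the three predicted voxels overlap the three true voxels, so the score is 4/6, above the 0.5 level the abstract treats as significant overlap.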
  15. Skeletal Radiol. 2020 Mar 13.
       OBJECTIVE: To clinically validate a fully automated deep convolutional neural network (DCNN) for detection of surgically proven meniscus tears.
    MATERIALS AND METHODS: One hundred consecutive patients were retrospectively included, who underwent knee MRI and knee arthroscopy in our institution. All MRI were evaluated for medial and lateral meniscus tears by two musculoskeletal radiologists independently and by DCNN. Included patients were not part of the training set of the DCNN. Surgical reports served as the standard of reference. Statistics included sensitivity, specificity, accuracy, ROC curve analysis, and kappa statistics.
    RESULTS: Fifty-seven percent (57/100) of patients had a tear of the medial and 24% (24/100) of the lateral meniscus, including 12% (12/100) with a tear of both menisci. For medial meniscus tear detection, sensitivity, specificity, and accuracy were for reader 1: 93%, 91%, and 92%, for reader 2: 96%, 86%, and 92%, and for the DCNN: 84%, 88%, and 86%. For lateral meniscus tear detection, sensitivity, specificity, and accuracy were for reader 1: 71%, 95%, and 89%, for reader 2: 67%, 99%, and 91%, and for the DCNN: 58%, 92%, and 84%. Sensitivity for medial meniscus tears was significantly different between reader 2 and the DCNN (p = 0.039), and no significant differences existed for all other comparisons (all p ≥ 0.092). The AUC-ROC of the DCNN was 0.882, 0.781, and 0.961 for detection of medial, lateral, and overall meniscus tear. Inter-reader agreement was very good for the medial (kappa = 0.876) and good for the lateral meniscus (kappa = 0.741).
    CONCLUSION: DCNN-based meniscus tear detection can be performed in a fully automated manner with a similar specificity but a lower sensitivity in comparison with musculoskeletal radiologists.
    Keywords:  Artificial intelligence; Data accuracy; Magnetic resonance imaging; Neural networks (computer); Tibial meniscus injuries
    DOI:  https://doi.org/10.1007/s00256-020-03410-2
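    Illustration for entry 15 (not taken from the paper): sensitivity, specificity, and accuracy all follow from the four cells of a confusion matrix. The counts below are hypothetical; they were chosen only to be consistent with the reported 57 medial tears among 100 patients and the DCNN's rounded 84%, 88%, and 86%, and the abstract itself does not give the cell counts.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)               # detected tears / all tears
    specificity = tn / (tn + fp)               # correct negatives / all intact menisci
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 57 surgically proven medial tears, 43 intact menisci.
sens, spec, acc = diagnostic_metrics(tp=48, fn=9, tn=38, fp=5)
print(round(sens * 100), round(spec * 100), round(acc * 100))
```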