bims-arihec Biomed News
on Artificial intelligence in healthcare
Issue of 2020-02-23
24 papers selected by
Céline Bélanger, Cogniges Inc.



  1. Circ Arrhythm Electrophysiol. 2020 Feb 16.
      Background - Deep learning algorithms derived in homogeneous populations may be poorly generalizable and have the potential to reflect, perpetuate, and even exacerbate racial/ethnic disparities in health and healthcare. In this study we aimed to (1) assess whether the performance of a deep learning algorithm designed to detect low left ventricular ejection fraction (LVEF) using the 12-lead electrocardiogram (ECG) varies by race/ethnicity, and (2) determine whether its performance is determined by the derivation population or by racial variation in the ECG. Methods - We performed a retrospective cohort analysis that included 97,829 patients with paired ECGs and echocardiograms. We tested model performance by race/ethnicity for a convolutional neural network (CNN) designed to identify patients with an LVEF ≤35% from the 12-lead ECG. Results - The CNN, which was previously derived in a homogeneous population (derivation cohort N=44,959; 96.2% non-Hispanic white), demonstrated consistent performance in detecting low LVEF across a range of racial/ethnic subgroups in a separate testing cohort (N=52,870): non-Hispanic white (N=44,524, AUC 0.931), Asian (N=557, AUC 0.961), Black/African American (N=651, AUC 0.937), Hispanic/Latino (N=331, AUC 0.937), and American Indian/Native Alaskan (N=223, AUC 0.938). In secondary analyses, a separate neural network was able to discern racial subgroup category (Black/African American, AUC 0.84; non-Hispanic white, AUC 0.76, in a five-class classifier), and a network trained only in non-Hispanic whites from the original derivation cohort performed similarly well across racial/ethnic subgroups in the testing cohort, with an AUC of at least 0.930 in all subgroups. Conclusions - Our study demonstrates that while ECG characteristics vary by race, this did not impact the ability of a CNN to predict low LVEF from the ECG. We recommend reporting performance across diverse ethnic, racial, age, and gender groups for all new AI tools to ensure responsible use of AI in medicine.
    Keywords:  artificial intelligence; machine learning
    DOI:  https://doi.org/10.1161/CIRCEP.119.007988
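    The subgroup analysis above comes down to computing a separate ROC AUC for each racial/ethnic stratum of the test cohort. A minimal sketch of that kind of stratified evaluation with scikit-learn, using hypothetical column names and synthetic data rather than the study's cohort:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical test-set predictions: true label (LVEF <= 35%), model
# probability, and self-reported race/ethnicity for each patient.
df = pd.DataFrame({
    "y_true": np.random.randint(0, 2, 1000),
    "y_score": np.random.rand(1000),
    "group": np.random.choice(
        ["Non-Hispanic white", "Asian", "Black/African American",
         "Hispanic/Latino", "American Indian/Native Alaskan"], 1000),
})

# AUC reported separately for each subgroup, as the study recommends.
for name, sub in df.groupby("group"):
    auc = roc_auc_score(sub["y_true"], sub["y_score"])
    print(f"{name:35s} n={len(sub):4d}  AUC={auc:.3f}")
```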
  2. Eur Radiol. 2020 Feb 17.
      Artificial intelligence (AI) has the potential to significantly disrupt the way radiology will be practiced in the near future, but several issues need to be resolved before AI can be widely implemented in daily practice. These include the role of the different stakeholders in the development of AI for imaging, the ethical development and use of AI in healthcare, the appropriate validation of each developed AI algorithm, the development of effective data sharing mechanisms, regulatory hurdles for the clearance of AI algorithms, and the development of AI educational resources for both practicing radiologists and radiology trainees. This paper details these issues and presents possible solutions based on discussions held at the 2019 meeting of the International Society for Strategic Studies in Radiology. KEY POINTS: • Radiologists should be aware of the different types of bias commonly encountered in AI studies, and understand their possible effects. • Methods for effective data sharing to train, validate, and test AI algorithms need to be developed. • It is essential for all radiologists to gain an understanding of the basic principles, potentials, and limits of AI.
    Keywords:  Artificial intelligence; Bioethics; Data; Education; Regulation
    DOI:  https://doi.org/10.1007/s00330-020-06672-5
  3. Eur J Vasc Endovasc Surg. 2020 Feb 13. pii: S1078-5884(20)30065-4. [Epub ahead of print]
      
    DOI:  https://doi.org/10.1016/j.ejvs.2020.01.019
  4. CMAJ Open. 2020 Jan-Mar;8(1): E90-E95
       BACKGROUND: As artificial intelligence (AI) approaches in research increase and AI becomes more integrated into medicine, there is a need to understand perspectives from members of the Canadian public and medical community. The aim of this project was to investigate current perspectives on ethical issues surrounding AI in health care.
    METHODS: In this qualitative study, adult patients with meningioma and their caregivers were recruited consecutively (August 2018-February 2019) from a neurosurgical clinic in Toronto. Health care providers caring for these patients were recruited through snowball sampling. Based on a nonsystematic literature search, we constructed 3 vignettes that sought participants' views on hypothetical issues surrounding potential AI applications in health care. The vignettes were presented to participants in interviews, which lasted 15-45 minutes. Responses were transcribed and coded for concepts, frequency of response types and larger concepts emerging from the interview.
    RESULTS: We interviewed 30 participants: 18 patients, 7 caregivers and 5 health care providers. For each question, a variable number of responses were recorded. The majority of participants endorsed nonconsented use of health data but advocated for disclosure and transparency. Few patients and caregivers felt that allocation of health resources should be done via computerized output, and a majority stated that it was inappropriate to delegate such decisions to a computer. Almost all participants felt that selling health data should be prohibited, and a minority stated that less privacy is acceptable for the goal of improving health. Certain caveats were identified, including the desire for deidentification of data and use within trusted institutions.
    INTERPRETATION: In this preliminary study, patients and caregivers reported a mixture of hopefulness and concern around the use of AI in health care research, whereas providers were generally more skeptical. These findings provide a point of departure for institutions adopting health AI solutions to consider the ethical implications of this work by understanding stakeholders' perspectives.
    DOI:  https://doi.org/10.9778/cmajo.20190151
  5. J Med Educ Curric Dev. 2019 Jan-Dec;6: 2382120519889348
      Discussions surrounding the future of artificial intelligence (AI) in healthcare often cause consternation among healthcare professionals. These feelings may stem from a lack of formal education on AI and on how to lead AI implementation in medical systems. To address this, our academic medical center hosted an educational summit exploring how to become a leader of AI in healthcare. This article presents three lessons learned from hosting this summit, providing guidance for developing medical curricula on the topic of AI in healthcare.
    Keywords:  AI; artificial intelligence; augmented intelligence; leadership
    DOI:  https://doi.org/10.1177/2382120519889348
  6. J Med Imaging (Bellingham). 2020 Jan;7(1): 016502
      We present a roadmap for integrating artificial intelligence (AI)-based image analysis algorithms into existing radiology workflows such that (1) radiologists can significantly benefit from enhanced automation in various imaging tasks due to AI, and (2) radiologists' feedback is utilized to further improve the AI application. This is achieved by establishing three maturity levels where (1) research enables the visualization of AI-based results/annotations by radiologists without generating new patient records; (2) production allows the AI-based system to generate results stored in an institution's picture-archiving and communication system; and (3) feedback equips radiologists with tools for editing the AI inference results for periodic retraining of the deployed AI systems, thereby allowing continuous organic improvement of AI-based radiology-workflow solutions. A case study (i.e., detection of brain metastases with T1-weighted contrast-enhanced three-dimensional MRI) illustrates the deployment details of a particular AI-based application according to the aforementioned maturity levels. It is shown that the given AI application significantly improves with feedback coming from radiologists; the number of incorrectly detected brain metastases (false positives) decreases from 14.2 to 9.12 per patient with the number of subsequently annotated datasets increasing from 93 to 217 as a result of radiologist adjudication.
    Keywords:  AI-based image analysis; digital imaging and communications in medicine; picture archiving and communication system; radiology workflow
    DOI:  https://doi.org/10.1117/1.JMI.7.1.016502
  7. Healthc Inform Res. 2020 Jan;26(1): 20-33
     Objectives: The study aimed to develop and compare predictive models, based on supervised machine learning algorithms, for predicting prolonged length of stay (LOS) in hospitalized patients diagnosed with five different chronic conditions.
    Methods: An administrative claim dataset (2008-2012) of a regional network of nine hospitals in the Tampa Bay area, Florida, USA, was used to develop the prediction models. Features were extracted from the dataset using the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes. Five learning algorithms, namely, decision tree C5.0, linear support vector machine (LSVM), k-nearest neighbors, random forest, and multi-layered artificial neural networks, were used to build the model with semi-supervised anomaly detection and two feature selection methods. Issues with the unbalanced nature of the dataset were resolved using the Synthetic Minority Over-sampling Technique (SMOTE).
    Results: LSVM with wrapper feature selection performed moderately well for all patient cohorts. Using SMOTE to counter data imbalances triggered a tradeoff between the model's sensitivity and specificity, which can be masked under a similar area under the curve. The proposed aggregate rank selection approach resulted in a balanced performing model compared to other criteria. Finally, factors such as comorbidity conditions, source of admission, and payer types were associated with the increased risk of a prolonged LOS.
    Conclusions: Prolonged LOS is mostly associated with pre-intraoperative clinical and patient socioeconomic factors. Accurate patient identification with the risk of prolonged LOS using the selected model can provide hospitals a better tool for planning early discharge and resource allocation, thus reducing avoidable hospitalization costs.
    Keywords:  Chronic Disease; Discharge Planning; Inpatients; Length of Stay; Machine Learning
    DOI:  https://doi.org/10.4258/hir.2020.26.1.20
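    The imbalance handling described above pairs SMOTE oversampling with a linear SVM. A minimal sketch with imbalanced-learn and scikit-learn, using synthetic stand-in data and omitting the wrapper feature selection step (all names and settings are assumptions, not the authors' configuration):

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Synthetic stand-in for the ICD-9-CM feature matrix (prolonged LOS is rare).
X, y = make_classification(n_samples=2000, n_features=40, weights=[0.9, 0.1],
                           random_state=0)

# SMOTE is applied only to the training folds inside the pipeline, so the
# sensitivity/specificity trade-off it introduces is not leaked into validation.
clf = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("lsvm", LinearSVC(C=1.0, max_iter=10000)),
])

print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```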
  8. Eur Radiol. 2020 Feb 17.
       PURPOSE: This study aimed to validate a deep learning model's diagnostic performance in using computed tomography (CT) to diagnose cervical lymph node metastasis (LNM) from thyroid cancer in a large clinical cohort and to evaluate the model's clinical utility for resident training.
    METHODS: The performance of eight deep learning models was validated using 3838 axial CT images from 698 consecutive patients with thyroid cancer who underwent preoperative CT imaging between January and August 2018 (3606 and 232 images from benign and malignant lymph nodes, respectively). Six trainees viewed the same patient images (n = 242), and their diagnostic performance and confidence level (5-point scale) were assessed before and after computer-aided diagnosis (CAD) was included.
    RESULTS: The overall area under the receiver operating characteristic curve (AUROC) of the eight deep learning algorithms was 0.846 (range 0.784-0.884). The best-performing model was Xception, with an AUROC of 0.884. The diagnostic accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of Xception were 82.8%, 80.2%, 83.0%, 83.0%, and 80.2%, respectively. After introduction of the CAD system, underperforming trainees received more help from artificial intelligence than higher-performing trainees (p = 0.046), and overall confidence levels increased significantly, from 3.90 to 4.30 (p < 0.001).
    CONCLUSION: The deep learning-based CAD system used in this study for CT diagnosis of cervical LNM from thyroid cancer was clinically validated with an AUROC of 0.884. This approach may serve as a training tool to help resident physicians to gain confidence in diagnosis.
    KEY POINTS: • A deep learning-based CAD system for CT diagnosis of cervical LNM from thyroid cancer was validated using data from a clinical cohort. The AUROC for the eight tested algorithms ranged from 0.784 to 0.884. • Of the eight models, the Xception algorithm was the best performing model for the external validation dataset with 0.884 AUROC. The accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were 82.8%, 80.2%, 83.0%, 83.0%, and 80.2%, respectively. • The CAD system exhibited potential to improve diagnostic specificity and accuracy in underperforming trainees (3 of 6 trainees, 50.0%). This approach may have clinical utility as a training tool to help trainees to gain confidence in diagnoses.
    Keywords:  Deep learning; Lymphatic metastasis; Thyroid neoplasms; Tomography, X-ray computed
    DOI:  https://doi.org/10.1007/s00330-019-06652-4
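    As a hedged illustration of how an Xception-style binary classifier for axial CT patches might be assembled with Keras (input size, classification head, and training settings are assumptions, not the authors' configuration):

```python
import tensorflow as tf

# Xception backbone pretrained on ImageNet, with a binary head for
# benign vs. metastatic lymph node patches (sizes are illustrative).
base = tf.keras.applications.Xception(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3),
    pooling="avg")

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auroc")])
# model.fit(train_ds, validation_data=val_ds, epochs=20)  # datasets not shown
```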
  9. Circulation. 2020 Feb 14.
      Background: Myocardial perfusion reflects the macro- and microvascular coronary circulation. Recent developments in quantitation using cardiovascular magnetic resonance (CMR) perfusion permit automated measurement clinically. We explored the prognostic significance of stress myocardial blood flow (MBF) and myocardial perfusion reserve (MPR, the ratio of stress to rest MBF). Methods: A two-center study of patients with both suspected and known coronary artery disease referred clinically for perfusion assessment. Image analysis was performed automatically using a novel artificial intelligence approach deriving global and regional stress and rest MBF and MPR. Cox proportional hazards models adjusting for comorbidities and CMR parameters sought associations of stress MBF and MPR with death and major adverse cardiovascular events (MACE), including myocardial infarction, stroke, heart failure hospitalization, late (>90 day) revascularization, and death. Results: 1049 patients were included, with a median follow-up of 605 (interquartile range 464-814) days. There were 42 (4.0%) deaths and 188 MACE in 174 (16.6%) patients. Stress MBF and MPR were independently associated with both death and MACE. For each 1 mL/g/min decrease in stress MBF, the adjusted hazard ratios (HR) for death and MACE were 1.93 (95% CI 1.08-3.48, P=0.028) and 2.14 (95% CI 1.58-2.90, P<0.0001), respectively, even after adjusting for age and comorbidity. For each 1-unit decrease in MPR, the adjusted HRs for death and MACE were 2.45 (95% CI 1.42-4.24, P=0.001) and 1.74 (95% CI 1.36-2.22, P<0.0001), respectively. In patients without regional perfusion defects on clinical read and no known macrovascular coronary artery disease (n=783), MPR remained independently associated with death and MACE, with stress MBF remaining associated with MACE only. Conclusions: In patients with known or suspected coronary artery disease, reduced MBF and MPR, measured automatically inline using artificial intelligence quantification of CMR perfusion mapping, are strong, independent predictors of adverse cardiovascular outcomes.
    Keywords:  cardiovascular magnetic resonance; inline perfusion quantification
    DOI:  https://doi.org/10.1161/CIRCULATIONAHA.119.044666
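    The survival analysis above uses covariate-adjusted Cox proportional hazards models. A minimal sketch of such a model with the lifelines package, on a hypothetical analysis table (column names and data are placeholders; the hazard ratios reported per 1-unit decrease are the reciprocal of the fitted HR per 1-unit increase):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 1000
# Hypothetical analysis table: follow-up time, MACE indicator, stress MBF,
# and a couple of adjustment covariates.
df = pd.DataFrame({
    "time_days": rng.exponential(600, n),
    "mace": rng.integers(0, 2, n),
    "stress_mbf": rng.normal(2.0, 0.6, n),
    "age": rng.normal(62, 10, n),
    "diabetes": rng.integers(0, 2, n),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time_days", event_col="mace")
# Hazard ratios with 95% confidence intervals for each covariate.
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```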
  10. Gastrointest Endosc. 2020 Feb 18. pii: S0016-5107(20)30132-2. [Epub ahead of print]
       BACKGROUND AND AIMS: Protruding lesions of the small bowel vary in wireless capsule endoscopy (WCE) images, and their automatic detection may be difficult. We aimed to develop and test a deep learning-based system to automatically detect protruding lesions of various types in WCE images.
    METHODS: We trained a deep convolutional neural network (CNN) using 30,584 WCE images of protruding lesions from 292 patients. We evaluated CNN performance by calculating the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, using an independent set of 17,507 test images from 93 patients, including 7,507 images of protruding lesions from 73 patients.
    RESULTS: The developed CNN analyzed the 17,507 images in 530.462 seconds. The AUC for detection of protruding lesions was 0.911 (95% confidence interval [CI], 0.9069-0.9155). The sensitivity and specificity of the CNN were 90.7% (95% CI, 90.0%-91.4%) and 79.8% (95% CI, 79.0%-80.6%), respectively, at the optimal probability-score cut-off value of 0.317. In subgroup analysis by category of protruding lesion, the sensitivities were 86.5%, 92.0%, 95.8%, 77.0%, and 94.4% for the detection of polyps, nodules, epithelial tumors, submucosal tumors, and venous structures, respectively. In individual patient analysis (n = 73), the detection rate of protruding lesions was 98.6%.
    CONCLUSION: We developed and tested a new computer-aided system based on a CNN to automatically detect various protruding lesions in WCE images. Patient-level analyses with larger cohorts and efforts to achieve better diagnostic performance are necessary in further studies.
    Keywords:  artificial intelligence; convolutional neural network; deep learning; protruding lesion; wireless capsule endoscopy
    DOI:  https://doi.org/10.1016/j.gie.2020.01.054
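    The "optimal cut-off value of 0.317 for probability score" is the kind of operating point usually chosen from the ROC curve, for example by maximizing Youden's J. A minimal sketch with scikit-learn, using synthetic scores and labels in place of the study's data:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical per-image CNN probability scores and ground-truth labels.
labels = np.random.randint(0, 2, 17507)
scores = np.clip(labels * 0.4 + np.random.rand(17507) * 0.6, 0, 1)

fpr, tpr, thresholds = roc_curve(labels, scores)
j = tpr - fpr                      # Youden's J statistic at each threshold
best = np.argmax(j)

print("AUC        :", roc_auc_score(labels, scores))
print("cut-off    :", thresholds[best])
print("sensitivity:", tpr[best])
print("specificity:", 1 - fpr[best])
```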
  11. Dig Endosc. 2020 Feb 16.
     OBJECTIVES: The prognosis for pharyngeal cancer is relatively poor, as it is usually diagnosed at an advanced stage. Although the recent development of narrow-band imaging (NBI) and increased awareness among endoscopists have enabled detection of superficial pharyngeal cancer, these techniques are still not widespread worldwide. Meanwhile, artificial intelligence (AI)-based deep learning has led to significant advancements in various medical fields. Here, we demonstrate the diagnostic ability of AI-based detection of pharyngeal cancer from endoscopic images obtained during esophagogastroduodenoscopy.
    METHODS: We retrospectively collected 5,403 training images of pharyngeal cancer from 202 superficial cancers and 45 advanced cancers from the Cancer Institute Hospital, Tokyo, Japan. Using these images, we developed an AI-based diagnosing system with convolutional neural networks. We prepared 1,912 validation images from 35 patients with 40 pharyngeal cancers and 40 patients without pharyngeal cancer to evaluate our system.
    RESULTS: Our AI-based diagnosing system correctly detected all pharyngeal cancer lesions (40/40) in the patients with cancer, including three lesions smaller than 10 mm. On a per-image basis, the AI-based system detected pharyngeal cancers in images obtained via NBI with a sensitivity of 85.6%, significantly higher than that for images obtained via white-light imaging (70.1%). The novel diagnosing system took only 28 s to analyze the 1,912 validation images.
    CONCLUSIONS: The novel AI-based diagnosing system detected pharyngeal cancer with high sensitivity. Therefore, it could facilitate early detection, thereby leading to better prognosis and quality of life for patients with pharyngeal cancers in the near future.
    Keywords:  Artificial intelligence; Convolutional neural network; Deep learning; Pharyngeal cancer
    DOI:  https://doi.org/10.1111/den.13653
  12. Lancet. 2020 Feb 15. pii: S0140-6736(20)30294-4. [Epub ahead of print] 395(10223): 485
      
    DOI:  https://doi.org/10.1016/S0140-6736(20)30294-4
  13. J Am Coll Radiol. 2020 Feb 14. pii: S1546-1440(20)30028-4. [Epub ahead of print]
       OBJECTIVES: Performance of recently developed deep learning models for image classification surpasses that of radiologists. However, there are questions about model performance consistency and generalization in unseen external data. The purpose of this study is to determine if the high performance of deep learning on mammograms can be transferred to external data with a different data distribution.
    MATERIALS AND METHODS: Six deep learning models (three published models with high performance and three models designed by us) were evaluated on four different mammogram data sets, including three public (Digital Database for Screening Mammography, INbreast, and Mammographic Image Analysis Society) and one private data set (UKy). The models were trained and validated on either Digital Database for Screening Mammography alone or a combined data set that included Digital Database for Screening Mammography. The models were then tested on the three external data sets. The area under the receiver operating characteristic curve was used to evaluate model performance.
    RESULTS: The three published models reported validation area under the receiver operating characteristic curve scores between 0.88 and 0.95 on the validation data set. Our models achieved between 0.71 (95% confidence interval [CI]: 0.70-0.72) and 0.79 (95% CI: 0.78-0.80) area under the receiver operating characteristic curve on the same validation data set. However, when all six models were evaluated on the three external test data sets, performance by the same criterion decreased significantly, to between 0.44 (95% CI: 0.43-0.45) and 0.65 (95% CI: 0.64-0.66).
    CONCLUSION: Our results demonstrate performance inconsistency across the data sets and models, indicating that the high performance of deep learning models on one data set cannot be readily transferred to unseen external data sets, and these models need further assessment and validation before being applied in clinical practice.
    Keywords:  Deep learning; mammogram; performance inconsistency
    DOI:  https://doi.org/10.1016/j.jacr.2020.01.006
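    One way to make the cross-dataset comparison above explicit is to evaluate a fixed, frozen model on each external test set and report a bootstrap confidence interval for the AUC. A hedged sketch (dataset names, labels, and scores are placeholders, not the study's data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc(y_true, y_score, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for the ROC AUC."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) < 2:   # need both classes in resample
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    return np.percentile(aucs, [2.5, 97.5])

# Hypothetical external test sets: name -> (labels, frozen-model scores).
external_sets = {
    "INbreast": (np.random.randint(0, 2, 400), np.random.rand(400)),
    "MIAS": (np.random.randint(0, 2, 300), np.random.rand(300)),
}
for name, (y, s) in external_sets.items():
    lo, hi = bootstrap_auc(y, s)
    print(f"{name}: AUC {roc_auc_score(y, s):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```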
  14. Eur Radiol. 2020 Feb 21.
       OBJECTIVES: Classification of histologic subgroups has significant prognostic value for lung adenocarcinoma patients who undergo surgical resection. However, clinical histopathology assessment is generally performed on only a small portion of the overall tumor from biopsy or surgery. Our objective is to identify a noninvasive quantitative imaging biomarker (QIB) for the classification of histologic subgroups in lung adenocarcinoma patients.
    METHODS: We retrospectively collected and reviewed 1313 CT scans of patients with resected lung adenocarcinomas from two geographically distant institutions who were seen between January 2014 and October 2017. Three study cohorts, the training, internal validation, and external validation cohorts, were created, within which lung adenocarcinomas were divided into two disease-free-survival (DFS)-associated histologic subgroups, the mid/poor and good DFS groups. A comprehensive machine learning- and deep learning-based analytical system was adopted to identify reproducible QIBs and help to understand QIBs' significance.
    RESULTS: Intensity-Skewness, a QIB quantifying tumor density distribution, was identified as the optimal biomarker for predicting histologic subgroups. Intensity-Skewness achieved high AUCs (95% CI) of 0.849 (0.813-0.881), 0.820 (0.781-0.856), and 0.863 (0.827-0.895) on the training, internal validation, and external validation cohorts, respectively. A criterion of Intensity-Skewness ≤ 1.5, which indicated high tumor density, showed high specificity of 96% (sensitivity 46%) and 99% (sensitivity 53%) in predicting the mid/poor DFS group in the training and external validation cohorts, respectively.
    CONCLUSIONS: A QIB derived from routinely acquired CT was able to predict lung adenocarcinoma histologic subgroups, providing a noninvasive method that could potentially benefit personalized treatment decision-making for lung cancer patients.
    KEY POINTS: • A noninvasive imaging biomarker, Intensity-Skewness, which describes the distortion of the pixel-intensity distribution within lesions on CT images, was identified as a biomarker to predict disease-free-survival-associated histologic subgroups in lung adenocarcinoma. • An Intensity-Skewness of ≤ 1.5 has high specificity in predicting the mid/poor disease-free survival histologic patient group in both the training cohort and the external validation cohort. • Intensity-Skewness can be computed automatically with high reproducibility and robustness.
    Keywords:  Adenocarcinoma of lung; Deep learning; Histological types of neoplasms; Machine learning; Tomography, X-ray computed
    DOI:  https://doi.org/10.1007/s00330-020-06663-6
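    Intensity-Skewness, as described above, is the skewness of the Hounsfield-unit intensity distribution within the segmented tumor. A minimal sketch of how such a feature might be computed with SciPy (the toy volume, mask, and function name are assumptions, not the authors' implementation):

```python
import numpy as np
from scipy.stats import skew

def intensity_skewness(ct_volume: np.ndarray, tumor_mask: np.ndarray) -> float:
    """Skewness of HU values inside the tumor mask (Fisher definition)."""
    voxels = ct_volume[tumor_mask > 0]
    return float(skew(voxels))

# Hypothetical usage with a toy volume and mask.
ct = np.random.normal(-300, 150, size=(64, 64, 64))   # HU-like values
mask = np.zeros_like(ct)
mask[20:40, 20:40, 20:40] = 1

s = intensity_skewness(ct, mask)
# The paper's rule: Intensity-Skewness <= 1.5 (denser tumors) predicted the
# mid/poor disease-free-survival group with high specificity.
print(s, "mid/poor-DFS group" if s <= 1.5 else "good-DFS group")
```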
  15. Pituitary. 2020 Feb 15.
       PURPOSE: This study was designed to develop a computer-aided diagnosis (CAD) system based on a convolutional neural network (CNN) to diagnose patients with pituitary tumors.
    METHODS: We included adult patients clinically diagnosed with pituitary adenoma (pituitary adenoma group) and adult individuals without pituitary adenoma (control group). After pre-processing, all the MRI data were randomly divided into training and testing datasets in a ratio of 8:2 to build and evaluate the CNN model. Multiple CNNs with the same structure were applied to the different types of MR images, and a comprehensive diagnosis was performed based on the classification results for the different image types using an equal-weighted majority voting strategy. Finally, we assessed the diagnostic performance of the CAD system by accuracy, sensitivity, specificity, positive predictive value, and F1 score.
    RESULTS: We enrolled 149 participants with 796 MR images and used data augmentation to create 7960 new images. The proposed CAD method showed remarkable diagnostic performance, with an overall accuracy of 91.02%, sensitivity of 92.27%, specificity of 75.70%, positive predictive value of 93.45%, and F1-score of 92.67% when each MRI type was evaluated separately. In the comprehensive diagnosis, the CAD achieved better performance, with accuracy, sensitivity, and specificity of 96.97%, 94.44%, and 100%, respectively.
    CONCLUSION: The CAD system could accurately diagnose patients with pituitary tumors based on MR images. In future work, we will improve this CAD system by enlarging the dataset and evaluating its performance on an external dataset.
    Keywords:  Artificial intelligence; Convolutional neural network; Diagnose; Magnetic resonance imaging; Pituitary adenoma
    DOI:  https://doi.org/10.1007/s11102-020-01032-4
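    The "equal-weighted majority voting" step combines the per-sequence CNN decisions into one patient-level call. A minimal sketch, assuming one binary prediction per MRI sequence type (sequence names and the tie-breaking rule are assumptions):

```python
from collections import Counter

def majority_vote(per_sequence_preds: dict) -> int:
    """Equal-weighted majority vote over per-sequence CNN predictions
    (1 = pituitary adenoma, 0 = no adenoma); ties default to positive."""
    votes = Counter(per_sequence_preds.values())
    return int(votes[1] >= votes[0])

# Hypothetical per-sequence outputs for one patient.
preds = {"T1_sagittal": 1, "T1_coronal": 1, "T2_coronal": 0, "T1_contrast": 1}
print(majority_vote(preds))   # -> 1 (adenoma)
```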
  16. J Am Coll Cardiol. 2020 Feb 25. pii: S0735-1097(20)30003-6. [Epub ahead of print] 75(7): 722-733
       BACKGROUND: Hypertrophic cardiomyopathy (HCM) is an uncommon but important cause of sudden cardiac death.
    OBJECTIVES: This study sought to develop an artificial intelligence approach for the detection of HCM based on 12-lead electrocardiography (ECG).
    METHODS: A convolutional neural network (CNN) was trained and validated using digital 12-lead ECG from 2,448 patients with a verified HCM diagnosis and 51,153 non-HCM age- and sex-matched control subjects. The ability of the CNN to detect HCM was then tested on a different dataset of 612 HCM and 12,788 control subjects.
    RESULTS: In the combined datasets, mean age was 54.8 ± 15.9 years for the HCM group and 57.5 ± 15.5 years for the control group. After training and validation, the area under the curve (AUC) of the CNN in the validation dataset was 0.95 (95% confidence interval [CI]: 0.94 to 0.97) at the optimal probability threshold of 11% for having HCM. When applying this probability threshold to the testing dataset, the CNN's AUC was 0.96 (95% CI: 0.95 to 0.96) with sensitivity 87% and specificity 90%. In subgroup analyses, the AUC was 0.95 (95% CI: 0.94 to 0.97) among patients with left ventricular hypertrophy by ECG criteria and 0.95 (95% CI: 0.90 to 1.00) among patients with a normal ECG. The model performed particularly well in younger patients (sensitivity 95%, specificity 92%). In patients with HCM with and without sarcomeric mutations, the model-derived median probabilities for having HCM were 97% and 96%, respectively.
    CONCLUSIONS: ECG-based detection of HCM by an artificial intelligence algorithm can be achieved with high diagnostic performance, particularly in younger patients. This model requires further refinement and external validation, but it may hold promise for HCM screening.
    Keywords:  artificial intelligence; diagnostic performance; electrocardiogram; hypertrophic cardiomyopathy
    DOI:  https://doi.org/10.1016/j.jacc.2019.12.030
  17. Expert Rev Cardiovasc Ther. 2020 Feb 18.
      Introduction: With the increase in the number of patients with cardiovascular diseases, better risk-prediction models for cardiovascular events are needed. Statistical-based risk-prediction models for cardiovascular events (CVEs) are available, but they lack the ability to predict individual-level risk. Machine learning (ML) methods are especially equipped to handle complex data and provide accurate risk-prediction models at the individual level. Areas covered: In this review, the authors summarize the literature comparing the performance of machine learning methods to that of traditional, statistical-based models in predicting CVEs. They provide a brief summary of ML methods and then discuss risk-prediction models for CVEs such as major adverse cardiovascular events, heart failure and arrhythmias. Expert opinion: Current evidence supports the superiority of ML methods over statistical-based models in predicting CVEs. Statistical models are applicable at the population level and are subject to overfitting, while ML methods can provide an individualized risk level for CVEs. Further prospective research on ML-guided treatments to prevent CVEs is needed.
    Keywords:  Machine Learning; artificial intelligence; cardiovascular events; prediction
    DOI:  https://doi.org/10.1080/14779072.2020.1732208
  18. J Am Med Inform Assoc. 2020 Feb 17. pii: ocaa004. [Epub ahead of print]
       OBJECTIVE: Reliable longitudinal risk prediction for hospitalized patients is needed to provide quality care. Our goal is to develop a generalizable model capable of leveraging clinical notes to predict healthcare-associated diseases 24-96 hours in advance.
    METHODS: We developed a reCurrent Additive Network for Temporal RIsk Prediction (CANTRIP) to predict the risk of hospital-acquired (occurring ≥48 hours after admission) acute kidney injury, pressure injury, or anemia ≥24 hours before it is implicated by the patient's chart, labs, or notes. We rely on the MIMIC III critical care database and extract distinct positive and negative cohorts for each disease. We retrospectively determine the date of event using structured and unstructured criteria and use it as a form of indirect supervision to train and evaluate CANTRIP to predict disease risk using clinical notes.
    RESULTS: Our experiments indicate that CANTRIP, operating on text alone, obtains 74%-87% area under the curve and 77%-85% specificity. Baseline shallow models showed lower performance on all metrics, while a bidirectional long short-term memory model obtained the highest sensitivity at the cost of significantly lower specificity and precision.
    DISCUSSION: Proper model architecture allows clinical text to be successfully harnessed to predict nosocomial disease, outperforming shallow models and obtaining similar performance to disease-specific models reported in the literature.
    CONCLUSION: Clinical text on its own can provide a competitive alternative to traditional structured features (eg, lab values, vital signs). CANTRIP is able to generalize across nosocomial diseases without disease-specific feature extraction and is available at https://github.com/h4ste/cantrip.
    Keywords:  artificial intelligence; clinical; decision support systems; deep learning; machine learning; medical informatics; natural language processing
    DOI:  https://doi.org/10.1093/jamia/ocaa004
  19. Clin Neurol Neurosurg. 2020 Feb 03. pii: S0303-8467(20)30061-5. [Epub ahead of print] 192: 105718
     OBJECTIVES: Machine learning and artificial intelligence (AI) are rapidly growing in capability and are increasingly applied to model outcomes and complications within medicine. In spinal surgery, post-operative surgical site infections (SSIs) are a rare yet morbid complication. This paper applied AI to predict SSIs after posterior spinal fusions.
    PATIENTS AND METHODS: 4046 posterior spinal fusions were identified at a single academic center. A deep neural network (DNN) classification model was trained using 35 unique input variables. The model was trained and tested using cross-validation, in which the data were randomly partitioned into training (n = 3034) and testing (n = 1012) datasets. Stepwise multivariate regression was further used to identify actual model weights based on predictions from our trained model.
    RESULTS: The overall rate of infection was 1.5%. The mean area under the curve (AUC), representing the accuracy of the model, across all 300 iterations was 0.775 (95% CI 0.767-0.782), with a median AUC of 0.787. The positive predictive value (PPV), representing how well the model predicted SSI when a patient had SSI, over all predictions was 92.56%, with a negative predictive value (NPV), representing how well the model predicted absence of SSI when a patient did not have SSI, of 98.45%. In analyzing relative model weights, the five highest-weighted variables were Congestive Heart Failure, Chronic Pulmonary Failure, Hemiplegia/Paraplegia, Multilevel Fusion, and Cerebrovascular Disease. Notable factors that were protective against infection were ICU Admission, increasing Charlson Comorbidity Score, Race (White), and being male. Minimally invasive surgery (MIS) was also determined to be mildly protective.
    CONCLUSION: Machine learning and artificial intelligence are relevant and impressive tools that should be employed in the clinical decision making for patients. The variables with the largest model weights were primarily comorbidity related with the exception of multilevel fusion. Further study is needed, however, in order to draw any definitive conclusions.
    Keywords:  Artificial intelligence; Spine surgery; Surgical site infection
    DOI:  https://doi.org/10.1016/j.clineuro.2020.105718
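    The reported PPV and NPV follow directly from the confusion matrix of the trained classifier. A minimal sketch of those calculations with scikit-learn, using synthetic labels and predictions in place of the study's data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical test-set labels (SSI yes/no) and DNN predictions.
y_true = np.random.binomial(1, 0.015, 1012)          # ~1.5% infection rate
y_pred = np.random.binomial(1, 0.02, 1012)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
ppv = tp / (tp + fp) if (tp + fp) else float("nan")   # precision for SSI
npv = tn / (tn + fn) if (tn + fn) else float("nan")
print(f"PPV={ppv:.3f}  NPV={npv:.3f}")
```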
  20. Radiol Artif Intell. 2019 May 08. 1(3): 180091
       Purpose: To investigate the feasibility of using a deep learning-based approach to detect an anterior cruciate ligament (ACL) tear within the knee joint at MRI by using arthroscopy as the reference standard.
    Materials and Methods: A fully automated deep learning-based diagnosis system was developed by using two deep convolutional neural networks (CNNs) to isolate the ACL on MR images followed by a classification CNN to detect structural abnormalities within the isolated ligament. With institutional review board approval, sagittal proton density-weighted and fat-suppressed T2-weighted fast spin-echo MR images of the knee in 175 subjects with a full-thickness ACL tear (98 male subjects and 77 female subjects; average age, 27.5 years) and 175 subjects with an intact ACL (100 male subjects and 75 female subjects; average age, 39.4 years) were retrospectively analyzed by using the deep learning approach. Sensitivity and specificity of the ACL tear detection system and five clinical radiologists for detecting an ACL tear were determined by using arthroscopic results as the reference standard. Receiver operating characteristic (ROC) analysis and two-sided exact binomial tests were used to further assess diagnostic performance.
    Results: The sensitivity and specificity of the ACL tear detection system at the optimal threshold were 0.96 and 0.96, respectively. In comparison, the sensitivity of the clinical radiologists ranged between 0.96 and 0.98, while the specificity ranged between 0.90 and 0.98. There was no statistically significant difference in diagnostic performance between the ACL tear detection system and clinical radiologists at P < .05. The area under the ROC curve for the ACL tear detection system was 0.98, indicating high overall diagnostic accuracy.
    Conclusion: There was no significant difference between the diagnostic performance of the ACL tear detection system and clinical radiologists for determining the presence or absence of an ACL tear at MRI.
    DOI:  https://doi.org/10.1148/ryai.2019180091
  21. J Thorac Imaging. 2020 Feb 19.
       PURPOSE: The purpose of this study was to validate the accuracy of an artificial intelligence (AI) prototype application in determining bone mineral density (BMD) from chest computed tomography (CT), as compared with dual-energy x-ray absorptiometry (DEXA).
    MATERIALS AND METHODS: In this Institutional Review Board-approved study, we analyzed the data of 65 patients (57 female; mean age, 67.4 y) who underwent both DEXA and chest CT (mean time between scans: 1.31 y). From the DEXA studies, T-scores for L1-L4 (lumbar vertebrae 1 to 4) were recorded. Patients were then divided on the basis of their T-scores into normal control, osteopenic, or osteoporotic groups. An AI algorithm based on wavelet features, AdaBoost, and local geometry constraints independently localized thoracic vertebrae from the chest CT studies and automatically computed average Hounsfield unit (HU) values with kVp-dependent spectral correction. Pearson correlation was used to evaluate the relationship between the T-scores and HU values, and the Mann-Whitney U test was used to compare the HU values of normal control versus osteoporotic patients.
    RESULTS: Overall, the DEXA-determined T-scores and AI-derived HU values showed a moderate correlation (r=0.55; P<0.001). This 65-patient population was divided into 3 subgroups on the basis of their T-scores. The mean T-scores for the 3 subgroups (normal control, osteopenic, osteoporotic) were 0.77±1.50, -1.51±0.04, and -3.26±0.59, respectively. The mean DEXA-determined L1-L4 BMD measures were 1.13±0.16, 0.88±0.06, and 0.68±0.06 g/cm², respectively. The mean AI-derived attenuation values were 145±42.5, 136±31.82, and 103±16.28 HU, respectively. Using these AI-derived HU values, a significant difference was found between the normal control patients and the osteoporotic group (P=0.045).
    CONCLUSION: Our results show that this AI prototype can successfully determine BMD in moderate correlation with DEXA. Combined with other AI algorithms directed at evaluating cardiac and lung diseases, this prototype may contribute to future comprehensive preventative care based on a single chest CT.
    DOI:  https://doi.org/10.1097/RTI.0000000000000484
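    The statistical comparisons above map onto two standard SciPy calls. A hedged sketch with synthetic paired measurements (the subgroup thresholds shown are the usual T-score definitions, not necessarily the authors' grouping code):

```python
import numpy as np
from scipy.stats import pearsonr, mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical paired measurements per patient.
t_scores = rng.normal(-1.0, 1.5, 65)                       # DEXA L1-L4 T-scores
hu_values = 130 + 15 * t_scores + rng.normal(0, 20, 65)    # AI-derived HU

r, p = pearsonr(t_scores, hu_values)
print(f"Pearson r={r:.2f}, p={p:.3g}")

# Compare HU between normal-control and osteoporotic subgroups.
normal = hu_values[t_scores >= -1.0]
osteoporotic = hu_values[t_scores <= -2.5]
u, p_u = mannwhitneyu(normal, osteoporotic, alternative="two-sided")
print(f"Mann-Whitney U p={p_u:.3g}")
```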
  22. Gait Posture. 2020 Feb 10. pii: S0966-6362(20)30068-0. [Epub ahead of print] 77: 257-263
     BACKGROUND: Progressive supranuclear palsy (PSP), a neurodegenerative condition, may be difficult to discriminate clinically from idiopathic Parkinson's disease (PD). It is critical that we are able to do this accurately, and as early as possible, so that future disease-modifying therapies for PSP may be deployed at a stage when they are likely to have maximal benefit. Analysis of gait and related tasks is one possible means of discrimination.
    RESEARCH QUESTION: Here we investigate a wearable sensor array coupled with machine learning approaches as a means of disease classification.
    METHODS: 21 participants with PSP, 20 with PD, and 39 healthy control (HC) subjects performed a two-minute walk, a static sway test, and a timed up-and-go task while wearing an array of six inertial measurement units. The data were analysed to determine which features discriminated PSP from PD and PSP from HC. Two machine learning algorithms were applied: Logistic Regression (LR) and Random Forest (RF).
    RESULTS: 17 features containing independent information were identified in the combined dataset. The RF classifier outperformed the LR classifier and allowed discrimination of PSP from PD with 86% sensitivity and 90% specificity, and of PSP from HC with 90% sensitivity and 97% specificity. Using data from the single lumbar sensor alone resulted in only a modest reduction in classification accuracy, which could be restored using 3 sensors (lumbar, right arm, and foot). However, for maximum specificity the full six-sensor array was needed.
    SIGNIFICANCE: A wearable sensor array coupled with machine learning methods can accurately discriminate PSP from PD. Choice of array complexity depends on context; for diagnostic purposes a high specificity is needed suggesting the more complete array is advantageous, while for subsequent disease tracking a simpler system may suffice.
    Keywords:  Gait; Inertial sensor array; Machine learning; Parkinson’s disease; Progressive supranuclear palsy; Wearables
    DOI:  https://doi.org/10.1016/j.gaitpost.2020.02.007
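    A minimal sketch of the classifier comparison described above with scikit-learn, using a synthetic stand-in for the 17 gait features (the feature extraction, cross-validation scheme, and hyperparameters are assumptions, not the authors' pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical feature matrix: 41 participants (21 PSP, 20 PD), 17 gait features.
X = rng.normal(size=(41, 17))
y = np.array([1] * 21 + [0] * 20)    # 1 = PSP, 0 = PD

models = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy")
    print(f"{name}: balanced accuracy {acc.mean():.2f} ± {acc.std():.2f}")
```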