bims-arihec Biomed News
on Artificial intelligence in healthcare
Issue of 2019-11-17
twenty papers selected by
Céline Bélanger, Cogniges Inc.



  1. JAMA. 2019 Nov 12. 322(18): 1806-1816
      In recent years, many new clinical diagnostic tools have been developed using complicated machine learning methods. Irrespective of how a diagnostic tool is derived, it must be evaluated using a 3-step process of deriving, validating, and establishing the clinical effectiveness of the tool. Machine learning-based tools should also be assessed for the type of machine learning model used and its appropriateness for the input data type and data set size. Machine learning models also generally have additional prespecified settings called hyperparameters, which must be tuned on a data set independent of the validation set. On the validation set, the outcome against which the model is evaluated is termed the reference standard. The rigor of the reference standard must be assessed, such as against a universally accepted gold standard or expert grading.
    DOI:  https://doi.org/10.1001/jama.2019.16489
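    A minimal sketch of the tune-then-validate separation described above, assuming scikit-learn and entirely synthetic data; the model, features, and parameter grid are illustrative, not those of any particular diagnostic tool.

        # Tune hyperparameters on data kept independent of the final validation set.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import GridSearchCV, train_test_split

        X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

        # Hold out a validation set that plays no part in hyperparameter tuning.
        X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

        # Tune hyperparameters by cross-validation within the development data only.
        grid = GridSearchCV(RandomForestClassifier(random_state=0),
                            param_grid={"n_estimators": [100, 300], "max_depth": [3, None]},
                            scoring="roc_auc", cv=5)
        grid.fit(X_dev, y_dev)

        # Evaluate once against the untouched validation set, whose labels stand in
        # for the reference standard.
        print("validation AUC:", roc_auc_score(y_val, grid.predict_proba(X_val)[:, 1]))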
  2. Metabolomics. 2019 Nov 15. 15(12): 150
       INTRODUCTION: Metabolomics is increasingly being used in the clinical setting for disease diagnosis, prognosis and risk prediction. Machine learning algorithms are particularly important in the construction of multivariate metabolite prediction models. Historically, partial least squares (PLS) regression has been the gold standard for binary classification. Nonlinear machine learning methods such as random forests (RF), kernel support vector machines (SVM) and artificial neural networks (ANN) may be better suited to modelling possible nonlinear metabolite covariance, and thus may provide better predictive models.
    OBJECTIVES: We hypothesise that for binary classification using metabolomics data, non-linear machine learning methods will provide superior generalised predictive ability when compared to linear alternatives, in particular when compared with the current gold standard PLS discriminant analysis.
    METHODS: We compared the general predictive performance of eight archetypal machine learning algorithms across ten publicly available clinical metabolomics data sets. The algorithms were implemented in the Python programming language. All code and results have been made publicly available as Jupyter notebooks.
    RESULTS: There was only marginal improvement in predictive ability for SVM and ANN over PLS across all data sets. RF performance was comparatively poor. Out-of-bag bootstrap confidence intervals provided a measure of uncertainty in model prediction, and showed that the quality of the metabolomics data had a greater influence on generalised performance than the choice of model.
    CONCLUSION: The size of the data set, and choice of performance metric, had a greater influence on generalised predictive performance than the choice of machine learning algorithm.
    Keywords:  Artificial neural network; Jupyter; Machine learning; Metabolomics; Open source; Partial least squares; Random forest; Support vector machines
    DOI:  https://doi.org/10.1007/s11306-019-1612-4
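    The authors' own code is available as Jupyter notebooks; the fragment below is only a generic sketch of the kind of comparison described, treating PLS-DA as a regression on the 0/1 label and comparing it with nonlinear classifiers by cross-validated AUC, using scikit-learn and synthetic data rather than the published metabolomics sets.

        # Generic PLS-DA versus nonlinear classifier comparison (illustrative only).
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=200, n_features=150, n_informative=20, random_state=1)
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

        # PLS-DA: regress the binary label and use the continuous prediction as a score.
        pls_scores = cross_val_predict(PLSRegression(n_components=2), X, y, cv=cv).ravel()
        print("PLS-DA AUC:", round(roc_auc_score(y, pls_scores), 3))

        for name, clf in [("SVM (RBF kernel)", SVC()),
                          ("random forest", RandomForestClassifier(random_state=1))]:
            auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc").mean()
            print(name, "AUC:", round(auc, 3))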
  3. Diabetologia. 2019 Nov 12.
       AIMS/HYPOTHESIS: Corneal confocal microscopy is a rapid non-invasive ophthalmic imaging technique that identifies peripheral and central neurodegenerative disease. Quantification of corneal sub-basal nerve plexus morphology, however, requires either time-consuming manual annotation or a less-sensitive automated image analysis approach. We aimed to develop and validate an artificial intelligence-based, deep learning algorithm for the quantification of nerve fibre properties relevant to the diagnosis of diabetic neuropathy and to compare it with a validated automated analysis program, ACCMetrics.
    METHODS: Our deep learning algorithm, which employs a convolutional neural network with data augmentation, was developed for the automated quantification of the corneal sub-basal nerve plexus for the diagnosis of diabetic neuropathy. The algorithm was trained using a high-end graphics processing unit on 1698 corneal confocal microscopy images; for external validation, it was further tested on 2137 images. The algorithm was developed to identify total nerve fibre length, branch points, tail points, number and length of nerve segments, and fractal numbers. Sensitivity analyses were undertaken to determine the AUC for ACCMetrics and our algorithm for the diagnosis of diabetic neuropathy.
    RESULTS: The intraclass correlation coefficients for our algorithm were superior to those for ACCMetrics for total corneal nerve fibre length (0.933 vs 0.825), mean length per segment (0.656 vs 0.325), number of branch points (0.891 vs 0.570), number of tail points (0.623 vs 0.257), number of nerve segments (0.878 vs 0.504) and fractals (0.927 vs 0.758). In addition, our proposed algorithm achieved an AUC of 0.83, specificity of 0.87 and sensitivity of 0.68 for the classification of participants without (n = 90) and with (n = 132) neuropathy (defined by the Toronto criteria).
    CONCLUSIONS/INTERPRETATION: These results demonstrated that our deep learning algorithm provides rapid and excellent localisation performance for the quantification of corneal nerve biomarkers. This model has potential for adoption into clinical screening programmes for diabetic neuropathy.
    DATA AVAILABILITY: The publicly shared cornea nerve dataset (dataset 1) is available at http://bioimlab.dei.unipd.it/Corneal%20Nerve%20Tortuosity%20Data%20Set.htm and http://bioimlab.dei.unipd.it/Corneal%20Nerve%20Data%20Set.htm.
    Keywords:  Corneal confocal microscopy; Corneal nerve; Deep learning; Diabetic neuropathy; Image processing and analysis; Image segmentation; Ophthalmic imaging; Small nerve fibres
    DOI:  https://doi.org/10.1007/s00125-019-05023-4
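    The authors' network itself is not reproduced here; the snippet below is only a minimal illustration of the general pattern the methods describe, a small convolutional network preceded by on-the-fly data augmentation, written with TensorFlow/Keras and a hypothetical input size and binary output.

        # Illustrative CNN-with-augmentation pattern (not the published architecture).
        import tensorflow as tf

        augment = tf.keras.Sequential([
            tf.keras.layers.RandomFlip("horizontal_and_vertical"),
            tf.keras.layers.RandomRotation(0.1),
        ])

        model = tf.keras.Sequential([
            tf.keras.Input(shape=(384, 384, 1)),             # hypothetical image size
            augment,                                          # augmentation applied during training
            tf.keras.layers.Conv2D(16, 3, activation="relu"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Conv2D(32, 3, activation="relu"),
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(1, activation="sigmoid"),   # e.g. neuropathy yes/no
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=[tf.keras.metrics.AUC()])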
  4. JAMA Ophthalmol. 2019 Nov 14.
       Importance: Techniques that properly identify patients in whom ocular hypertension (OHTN) is likely to progress to open-angle glaucoma can assist clinicians with deciding on the frequency of monitoring and the potential benefit of early treatment.
    Objective: To test whether Kalman filtering (KF), a machine learning technique, can accurately forecast mean deviation (MD), pattern standard deviation, and intraocular pressure values 5 years into the future for patients with OHTN.
    Design, Setting, and Participants: This cohort study was a secondary analysis of data from patients with OHTN from the Ocular Hypertension Treatment Study, performed between February 1994 and March 2009. Patients underwent tonometry and perimetry every 6 months for up to 15 years. A KF (KF-OHTN) model was trained, validated, and tested to assess how well it could forecast MD, pattern standard deviation, and intraocular pressure at up to 5 years, and the forecasts were compared with results from the actual trial. Kalman filtering for OHTN was compared with a previously developed KF for patients with high-tension glaucoma (KF-HTG) and 3 traditional forecasting algorithms. Statistical analysis for the present study was performed between May 2018 and May 2019.
    Main Outcomes and Measures: Prediction error and root-mean-square error at 12, 24, 36, 48, and 60 months for MD, pattern standard deviation, and intraocular pressure.
    Results: Among 1407 eligible patients (2806 eyes), 809 (57.5%) were female and the mean (SD) age at baseline was 57.5 (9.6) years. For 2124 eyes with sufficient measurements, KF-OHTN forecast MD values 60 months into the future within 0.5 dB of the actual value for 696 eyes (32.8%), 1.0 dB for 1295 eyes (61.0%), and 2.5 dB for 1980 eyes (93.2%). Among the 5 forecasting algorithms tested, KF-OHTN achieved the lowest root-mean-square error (1.72 vs 1.85-4.28) for MD values 60 months into the future. For the subset of eyes that progressed to open-angle glaucoma, KF-OHTN and KF-HTG forecast MD values 60 months into the future within 1 dB of the actual value for 30 eyes (68.2%; 95% CI, 54.4%-82.0%) and achieved the lowest root-mean-square error among all models.
    Conclusions and Relevance: These findings suggest that machine learning algorithms such as KF can accurately forecast MD, pattern standard deviation, and intraocular pressure 5 years into the future for many patients with OHTN. These algorithms may aid clinicians in managing OHTN in their patients.
    DOI:  https://doi.org/10.1001/jamaophthalmol.2019.4190
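    The study's Kalman filter (its state vector, transition model, and noise terms) is specific to the OHTS data; below is only a textbook one-dimensional constant-velocity filter run over a hypothetical series of 6-monthly MD measurements, to show the predict-update-forecast mechanics in NumPy.

        # Toy constant-velocity Kalman filter over 6-monthly MD values (illustrative only).
        import numpy as np

        dt = 0.5                                    # years between visits
        F = np.array([[1.0, dt], [0.0, 1.0]])       # state transition for [MD, MD slope]
        H = np.array([[1.0, 0.0]])                  # only MD is observed
        Q = np.eye(2) * 0.01                        # process noise (assumed)
        R = np.array([[0.5]])                       # measurement noise (assumed)

        x = np.array([[0.0], [0.0]])                # initial state estimate
        P = np.eye(2)                               # initial state covariance

        measurements = [-0.2, -0.4, -0.3, -0.7, -0.9]   # hypothetical MD values (dB)
        for z in measurements:
            x, P = F @ x, F @ P @ F.T + Q                         # predict
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)          # Kalman gain
            x = x + K @ (np.array([[z]]) - H @ x)                 # update with new MD
            P = (np.eye(2) - K @ H) @ P

        # Forecast 5 years (10 six-month steps) ahead with the transition model alone.
        x_future = np.linalg.matrix_power(F, 10) @ x
        print("forecast MD at +5 years:", round(float(x_future[0, 0]), 2))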
  5. Am J Ophthalmol. 2019 Nov 12. pii: S0002-9394(19)30543-4. [Epub ahead of print]
       PURPOSE: To compare the diagnostic performance of human gradings versus predictions provided by a machine-to-machine (M2M) deep learning (DL) algorithm trained to quantify retinal nerve fiber layer (RNFL) damage on fundus photographs.
    DESIGN: Evaluation of a machine learning algorithm.
    METHODS: An M2M DL algorithm trained with RNFL thickness parameters from spectral-domain optical coherence tomography was applied to a subset of 490 fundus photos of 490 eyes of 370 subjects graded by two glaucoma specialists for the probability of glaucomatous optic neuropathy (GON) and for estimates of cup-to-disc (C/D) ratios. Spearman correlations with standard automated perimetry (SAP) global indices were compared between the human gradings and the M2M DL-predicted RNFL thickness values. The area under the receiver operating characteristic curve (AUC) and the partial AUC for the region of clinically meaningful specificity (85-100%) were used to compare the ability of each output to discriminate eyes with repeatable glaucomatous SAP defects from eyes with normal fields.
    RESULTS: The M2M DL-predicted RNFL thickness had a significantly stronger absolute correlation with SAP mean deviation (rho=0.54) than the probability of GON given by human graders (rho=0.48; P<0.001). The partial AUC for the M2M DL algorithm was significantly higher than that for the probability of GON by human graders (partial AUC = 0.529 vs. 0.411, respectively; P=0.016).
    CONCLUSION: An M2M DL algorithm performed as well as, if not better than, human graders at detecting eyes with repeatable glaucomatous visual field loss. This DL algorithm could potentially replace human graders in population screening efforts for glaucoma.
    DOI:  https://doi.org/10.1016/j.ajo.2019.11.006
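    A detail worth noting is the partial AUC restricted to specificities of 85-100%, i.e., false-positive rates up to 15%. The sketch below shows that computation with scikit-learn's max_fpr option on made-up scores; scikit-learn returns a standardized (McClish-corrected) partial AUC, so its scaling need not match the values reported in the article.

        # Full AUC and partial AUC over the high-specificity region (FPR <= 0.15).
        import numpy as np
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        y_true = rng.integers(0, 2, size=200)               # hypothetical labels
        scores = y_true * 0.5 + rng.normal(size=200)        # hypothetical predictions

        print("full AUC:   ", round(roc_auc_score(y_true, scores), 3))
        # max_fpr=0.15 restricts the curve to specificities of 85-100%.
        print("partial AUC:", round(roc_auc_score(y_true, scores, max_fpr=0.15), 3))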
  6. J Crit Care. 2019 Nov 02. pii: S0883-9441(19)30751-8. 55: 73-78. [Epub ahead of print]
       PURPOSE: To develop and compare the predictive performance of machine-learning algorithms to estimate the risk of quality-adjusted life year (QALY) lower than or equal to 30 days (30-day QALY).
    MATERIAL AND METHODS: Six machine-learning algorithms were applied to predict 30-day QALY for 777 patients enrolled in a prospective cohort study conducted in the intensive care units (ICUs) of two public Brazilian hospitals specialized in cancer care. The predictors were 37 characteristics collected at ICU admission. Discrimination was evaluated using the area under the receiver operating characteristic (AUROC) curve. Sensitivity, 1-specificity, and true/false positive and negative cases were measured at different estimated probability cutoff points (30%, 20% and 10%). Calibration was evaluated with the GiViTI calibration belt and test.
    RESULTS: Except for basic decision trees, the adjusted predictive models were nearly equivalent, presenting good results for discrimination (AUROC curves over 0.80). Artificial neural networks and gradient boosted trees achieved the overall best calibration, implying an accurately predicted probability for 30-day QALY.
    CONCLUSIONS: Except for basic decision trees, the predictive models derived from different machine-learning algorithms discriminated 30-day QALY risk well. Regarding calibration, the artificial neural network model showed the best ability to estimate 30-day QALY in critically ill oncologic patients admitted to ICUs.
    Keywords:  Clinical decision-making; Critically ill patients; Intensive care unit; Machine learning; Prognosis; Quality of life
    DOI:  https://doi.org/10.1016/j.jcrc.2019.10.015
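    A hedged sketch of the cutoff-based operating points described in the methods (sensitivity and 1-specificity at predicted-probability thresholds of 30%, 20% and 10%), plus a simple observed-versus-predicted calibration check; the GiViTI calibration belt itself is usually computed with its R implementation and is not reproduced here. Data and predictions are synthetic.

        # Sensitivity / 1-specificity at fixed probability cutoffs, plus a crude
        # calibration check (illustrative stand-in for the GiViTI belt).
        import numpy as np
        from sklearn.calibration import calibration_curve

        rng = np.random.default_rng(1)
        p_hat = rng.uniform(0, 1, size=777)            # hypothetical predicted risks
        y = rng.binomial(1, p_hat)                     # hypothetical 30-day QALY outcome

        for cutoff in (0.30, 0.20, 0.10):
            pred = p_hat >= cutoff
            sens = np.mean(pred[y == 1])               # true positive rate
            one_minus_spec = np.mean(pred[y == 0])     # false positive rate
            print(f"cutoff {cutoff:.0%}: sensitivity={sens:.2f}, 1-specificity={one_minus_spec:.2f}")

        # Observed event rate versus mean predicted risk in ten probability bins.
        obs, pred_mean = calibration_curve(y, p_hat, n_bins=10)
        print(np.round(pred_mean, 2))
        print(np.round(obs, 2))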
  7. Mult Scler J Exp Transl Clin. 2019 Oct-Dec. 5(4): 2055217319885983
       Background: Enhanced prediction of progression in secondary progressive multiple sclerosis (SPMS) could improve clinical trial design. Machine learning (ML) algorithms are methods for training predictive models with minimal human intervention.
    Objective: To evaluate individual and ensemble model performance built using decision tree (DT)-based algorithms compared to logistic regression (LR) and support vector machines (SVMs) for predicting SPMS disability progression.
    Methods: SPMS participants (n = 485) enrolled in a 2-year placebo-controlled (negative) trial assessing the efficacy of MBP8298 were classified as progressors if a 6-month sustained increase in Expanded Disability Status Scale (EDSS) (≥1.0 or ≥0.5 for a baseline of ≤5.5 or ≥6.0 respectively) was observed. Variables included EDSS, Multiple Sclerosis Functional Composite component scores, T2 lesion volume, brain parenchymal fraction, disease duration, age, and sex. Area under the receiver operating characteristic curve (AUC) was the primary outcome for model evaluation.
    Results: Three DT-based models had greater AUCs (61.8%, 60.7%, and 60.2%) than independent and ensemble SVM (52.4% and 51.0%) and LR (49.5% and 51.1%).
    Conclusion: SPMS disability progression was best predicted by non-parametric ML. If confirmed, ML could select those with highest progression risk for inclusion in SPMS trial cohorts and reduce the number of low-risk individuals exposed to experimental therapies.
    Keywords:  Artificial intelligence; decision support techniques; disease progression; machine learning; prognosis; secondary progressive multiple sclerosis
    DOI:  https://doi.org/10.1177/2055217319885983
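    The individual decision-tree-based learners and ensembling scheme are study-specific; as a generic illustration of the "individual versus ensemble" comparison, the sketch below pools three tree-based classifiers by soft voting and compares them with logistic regression by cross-validated AUC, using scikit-learn and synthetic data.

        # Generic individual-versus-ensemble comparison (illustrative, not the study code).
        from sklearn.datasets import make_classification
        from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                                      VotingClassifier)
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=485, n_features=10, random_state=2)  # synthetic stand-in

        trees = [("decision tree", DecisionTreeClassifier(max_depth=3, random_state=2)),
                 ("random forest", RandomForestClassifier(random_state=2)),
                 ("gradient boosting", GradientBoostingClassifier(random_state=2))]
        models = dict(trees)
        models["tree ensemble (soft voting)"] = VotingClassifier(trees, voting="soft")
        models["logistic regression"] = LogisticRegression(max_iter=1000)

        for name, model in models.items():
            auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
            print(f"{name}: cross-validated AUC = {auc:.3f}")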
  8. Ann Thorac Surg. 2019 Nov 07. pii: S0003-4975(19)31612-1. [Epub ahead of print]
       BACKGROUND: This review article provides an overview of artificial intelligence (AI) and machine learning (ML) as it relates to cardiovascular healthcare.
    METHODS: An overview of the terminology and algorithms used in ML as it relates to healthcare is provided by the author. Articles published up to August 1, 2019 in the field of AI and ML in cardiovascular medicine are also reviewed and placed in the context of the potential role these approaches will have in clinical practice in the future.
    RESULTS: AI is a broader term referring to the ability of machines to perform intelligent tasks, and ML is a subset of AI that refers to the ability of machines to learn independently and make accurate predictions. An expanding body of literature has been published using ML in cardiovascular healthcare. ML has been applied in the settings of automated imaging interpretation, natural language processing and data extraction from electronic health records, and predictive analytics. Examples include automated interpretation of chest x-rays, electrocardiograms, echocardiograms, and angiography; identification of patients with early heart failure from clinical notes evaluated by ML; and prediction of mortality or complications following percutaneous or surgical cardiovascular procedures.
    CONCLUSIONS: Although there is an expanding body of literature on AI and ML in cardiovascular medicine, the role these fields will play in clinical practice remains to be defined. In particular, there is a promising role in automated imaging interpretation, automated data extraction and quality control, and clinical risk prediction, although these techniques require further refinement and evaluation.
    Keywords:  artificial intelligence; cardiovascular; machine learning
    DOI:  https://doi.org/10.1016/j.athoracsur.2019.09.042
  9. Diagn Interv Imaging. 2019 Nov 11. pii: S2211-5684(19)30234-7. [Epub ahead of print]
       OBJECTIVE: To assess the diagnostic value of machine learning-based texture feature analysis of late gadolinium enhancement images on cardiac magnetic resonance imaging (MRI) for assessing the presence of ventricular tachyarrhythmia (VT) in patients with hypertrophic cardiomyopathy.
    MATERIALS AND METHODS: This retrospective study included 64 patients with hypertrophic cardiomyopathy who underwent cardiac MRI and 24-hour Holter monitoring within 1 year before cardiac MRI. There were 42 men and 22 women with a mean age of 48.13±13.06 (SD) years (range: 20-70 years). Quantitative textural features were extracted via manually placed regions of interest in areas with high and intermediate signal intensity on late gadolinium-chelate enhanced images. Feature selection and dimension reduction were performed. The diagnostic performances of machine learning classifiers including support vector machines, Naive Bayes, k-nearest-neighbors, and random forest for predicting the presence of VT were assessed using the results of 24-hour Holter monitoring as the reference test. All machine learning models were assessed with and without the application of the synthetic minority over-sampling technique (SMOTE).
    RESULTS: Of the 64 patients with hypertrophic cardiomyopathy, 21/64 (32.8%) had VT. Of eight machine learning models investigated, k-nearest-neighbors with SMOTE exhibited the best diagnostic accuracy for the presence or absence of VT. k-nearest-neighbors with SMOTE correctly identified 40/42 (95.2%) VT-positive patients and 40/43 (93.0%) VT-negative patients, yielding 95.2% sensitivity (95% CI: 82.5%-99.1%), 93.0% specificity (95% CI: 79.8%-98.1%) and 94.1% accuracy (95% CI: 88.8%-98%).
    CONCLUSION: Machine learning-based texture analysis of late gadolinium-chelate enhancement-positive areas is a promising tool for the classification of hypertrophic cardiomyopathy patients with and without VT.
    Keywords:  Artificial intelligence; Cardiomyopathy; Hypertrophic; Machine learning; Tachycardia; Texture analysis; Ventricular
    DOI:  https://doi.org/10.1016/j.diii.2019.10.005
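    The radiomic texture-feature extraction itself is not shown; the snippet below only illustrates the SMOTE-plus-k-nearest-neighbours step with the imbalanced-learn package on synthetic features, with SMOTE placed inside the cross-validation pipeline so that oversampled cases never leak into the test folds.

        # SMOTE oversampling combined with k-NN inside a CV pipeline (synthetic features).
        from imblearn.over_sampling import SMOTE
        from imblearn.pipeline import Pipeline          # applies SMOTE to training folds only
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        X, y = make_classification(n_samples=64, n_features=20, weights=[0.67, 0.33],
                                   random_state=3)      # ~1/3 minority class, as in the cohort

        clf = Pipeline([("smote", SMOTE(random_state=3)),
                        ("knn", KNeighborsClassifier(n_neighbors=5))])
        print("cross-validated AUC:",
              round(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean(), 3))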
  10. Eur J Heart Fail. 2019 Nov 12.
       BACKGROUND: Predicting mortality is important in patients with heart failure (HF). However, current strategies for predicting risk are only modestly successful, likely because they are derived from statistical analysis methods that fail to capture prognostic information in large data sets containing multi-dimensional interactions.
    METHODS AND RESULTS: We used a machine learning algorithm to capture correlations between patient characteristics and mortality. A model was built by training a boosted decision tree algorithm to relate a subset of the patient data with a very high or very low mortality risk in a cohort of 5822 hospitalized and ambulatory patients with HF. From this model we derived a risk score that accurately discriminated between low and high risk of death by identifying eight variables (diastolic blood pressure, creatinine, blood urea nitrogen, haemoglobin, white blood cell count, platelets, albumin, and red blood cell distribution width). This risk score had an area under the curve (AUC) of 0.88 and was predictive across the full spectrum of risk. External validation in two separate HF populations gave AUCs of 0.84 and 0.81, which were superior to those obtained with two available risk scores in these same populations.
    CONCLUSIONS: Using machine learning and readily available variables, we generated and validated a mortality risk score in patients with HF that was more accurate than other risk scores to which it was compared. These results support the use of this machine learning approach for the evaluation of patients with HF and in other settings where predicting risk has been challenging.
    Keywords:  Heart failure; Machine learning; Outcomes
    DOI:  https://doi.org/10.1002/ejhf.1628
  11. Ann Thorac Surg. 2019 Nov 07. pii: S0003-4975(19)31620-0. [Epub ahead of print]
       BACKGROUND: This study evaluated the predictive utility of a machine learning algorithm in estimating operative mortality risk in cardiac surgery.
    METHODS: Index adult cardiac operations performed between 2011 and 2017 at a single institution were included. The primary outcome was operative mortality. Extreme gradient boosting (XGBoost) models were developed and evaluated using 10-fold cross-validation with 1000-replication bootstrapping. Model performance was assessed using multiple measures including precision, recall, calibration plots, area under the receiver operating characteristic curve (c-index), accuracy, and F1 score.
    RESULTS: A total of 11,190 patients were included (7,048 isolated coronary artery bypass grafting [CABG], 2,507 isolated valves, and 1,635 CABG plus valves). The Society of Thoracic Surgeons predicted risk of mortality (STS-PROM) was 3.2% ± 5.0%. Actual operative mortality was 2.8%. There was moderate correlation (r=0.652) in predicted risk between XGBoost versus STS-PROM for the overall cohort and weak correlation (r=0.473) in predicted risk between the models specifically in patients with operative mortality. XGBoost demonstrated improvements in all measures of model performance when compared to STS-PROM in the validation cohorts: mean average precision (0.221 XGBoost versus 0.180 STS-PROM), c-index (0.808 XGBoost versus 0.795 STS-PROM), calibration (mean observed:expected mortality: XGBoost 0.993 versus 0.956 STS-PROM), accuracy (1-3% improvement across discriminatory thresholds of 3-10% risk), and F1 score (0.281 XGBoost versus 0.230 STS-PROM).
    CONCLUSIONS: Machine learning algorithms such as XGBoost have promise in predictive analytics in cardiac surgery. The modest improvements in model performance demonstrated in the current study warrant further validation in larger cohorts of patients.
    Keywords:  aortic valve replacement; artificial intelligence; coronary artery bypass grafts (CABG); database; mitral valve; outcomes
    DOI:  https://doi.org/10.1016/j.athoracsur.2019.09.049
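    The institutional feature set and tuning are not public; below is only a generic sketch of evaluating an XGBoost classifier on a rare outcome with 10-fold cross-validation and a few of the reported measures (c-index, average precision, F1), assuming the xgboost Python package and synthetic data with an event rate of roughly 3%.

        # Generic 10-fold evaluation of an XGBoost classifier (illustrative only).
        from sklearn.datasets import make_classification
        from sklearn.metrics import average_precision_score, f1_score, roc_auc_score
        from sklearn.model_selection import StratifiedKFold, cross_val_predict
        from xgboost import XGBClassifier

        X, y = make_classification(n_samples=2000, n_features=30, weights=[0.97, 0.03],
                                   random_state=4)       # ~3% events, like operative mortality

        model = XGBClassifier(n_estimators=300, max_depth=3, learning_rate=0.05,
                              eval_metric="logloss")
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=4)
        p = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

        print("c-index (AUROC):  ", round(roc_auc_score(y, p), 3))
        print("average precision:", round(average_precision_score(y, p), 3))
        print("F1 at 0.5 cutoff: ", round(f1_score(y, p >= 0.5, zero_division=0), 3))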
  12. Eur Radiol. 2019 Nov 14.
       OBJECTIVES: To evaluate an artificial intelligence (AI)-based, automatic coronary artery calcium (CAC) scoring software, using a semi-automatic software as a reference.
    METHODS: This observational study included 315 consecutive, non-contrast-enhanced calcium scoring computed tomography (CSCT) scans. A semi-automatic and an automatic software obtained the Agatston score (AS), the volume score (VS), the mass score (MS), and the number of calcified coronary lesions. Semi-automatic and automatic analysis times were recorded, including a manual double-check of the automatic results. Statistical analyses included Spearman's rank correlation coefficient (ρ), intra-class correlation (ICC), Bland-Altman plots, weighted kappa analysis (κ), and the Wilcoxon signed-rank test.
    RESULTS: The correlation and agreement for the AS, VS, and MS were ρ = 0.935, 0.932, 0.934 (p < 0.001) and ICC = 0.996, 0.996, 0.991 (p < 0.001), respectively. The correlation and agreement for the number of calcified lesions were ρ = 0.903 and ICC = 0.977 (p < 0.001), respectively. The Bland-Altman mean differences and 1.96 SD upper and lower limits of agreement for the AS, VS, and MS were -8.2 (-115.1 to 98.2), -7.4 (-93.9 to 79.1), and -3.8 (-33.6 to 25.9), respectively. Agreement in risk category assignment was 89.5%, with κ = 0.919 (p < 0.001). The median analysis time for the semi-automatic and automatic methods was 59 s (IQR 35-100) and 36 s (IQR 29-49), respectively (p < 0.001).
    CONCLUSIONS: There was excellent correlation and agreement between the automatic and semi-automatic software for the three CAC scores and the number of calcified lesions. Risk category classification was accurate but showed a tendency toward overestimation bias. The automatic method was also less time-consuming.
    KEY POINTS: • Coronary artery calcium (CAC) scoring is an excellent candidate for artificial intelligence (AI) development in a clinical setting. • An AI-based, automatic software obtained CAC scores with excellent correlation and agreement compared with a conventional method but was less time-consuming.
    Keywords:  Artificial intelligence; Coronary artery disease; Multidetector computed tomography; Software
    DOI:  https://doi.org/10.1007/s00330-019-06489-x
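    All of the agreement statistics listed in the methods are standard; a hedged sketch of computing them in Python on made-up paired calcium scores is shown below, assuming SciPy for Spearman's ρ and the Wilcoxon test, the pingouin package for the ICC, scikit-learn for the quadratic-weighted kappa, and a hand-rolled Bland-Altman calculation.

        # Agreement statistics on hypothetical paired Agatston scores (illustrative only).
        import numpy as np
        import pandas as pd
        import pingouin as pg                              # assumed available for the ICC
        from scipy.stats import spearmanr, wilcoxon
        from sklearn.metrics import cohen_kappa_score

        rng = np.random.default_rng(5)
        semi = rng.gamma(2.0, 100.0, size=315)             # semi-automatic scores
        auto = semi + rng.normal(-8, 50, size=315)         # automatic scores

        rho, _ = spearmanr(semi, auto)
        print("Spearman rho:", round(rho, 3))
        print("Wilcoxon p:  ", round(wilcoxon(semi, auto).pvalue, 4))

        # Bland-Altman bias and 95% limits of agreement.
        diff = auto - semi
        bias, sd = diff.mean(), diff.std(ddof=1)
        print("bias:", round(bias, 1), "LoA:", round(bias - 1.96 * sd, 1), "to", round(bias + 1.96 * sd, 1))

        # Two-way ICC (pingouin expects long format).
        long = pd.DataFrame({"scan": np.tile(np.arange(315), 2),
                             "rater": ["semi"] * 315 + ["auto"] * 315,
                             "score": np.concatenate([semi, auto])})
        print(pg.intraclass_corr(long, targets="scan", raters="rater", ratings="score"))

        # Weighted kappa on coarse risk categories (cut-points are illustrative).
        bins = [0, 100, 400, np.inf]
        kappa = cohen_kappa_score(np.digitize(semi, bins), np.digitize(auto, bins),
                                  weights="quadratic")
        print("weighted kappa:", round(kappa, 3))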
  13. J Am Med Inform Assoc. 2019 Dec 01. 26(12): 1600-1608
       OBJECTIVE: To evaluate the feasibility of a convolutional neural network (CNN) with word embedding to identify the type and severity of patient safety incident reports.
    MATERIALS AND METHODS: A CNN with word embedding was applied to identify 10 incident types and 4 severity levels. Model training and validation used data sets (n_type = 2860, n_severity = 1160) collected from a statewide incident reporting system. Generalizability was evaluated using an independent hospital-level reporting system. CNN architectures were examined by varying layer size and hyperparameters. Performance was evaluated by F score, precision, and recall, and compared with binary support vector machine (SVM) ensembles on 3 testing data sets (type/severity: n_benchmark = 286/116, n_original = 444/4837, n_independent = 6000/5950).
    RESULTS: A CNN with 6 layers was the most effective architecture, outperforming SVMs with better generalizability to identify incidents by type and severity. The CNN achieved high F scores (> 85%) across all test data sets when identifying common incident types including falls, medications, pressure injury, and aggression. When identifying common severity levels (medium/low), CNN outperformed SVMs, improving F scores by 11.9%-45.1% across all 3 test data sets.
    DISCUSSION: Automated identification of incident reports using machine learning is challenging because of a lack of large labelled training data sets and the unbalanced distribution of incident classes. The standard classification strategy is to build multiple binary classifiers and pool their predictions. CNNs can extract hierarchical features and assist in addressing class imbalance, which may explain their success in identifying incident report types.
    CONCLUSION: A CNN with word embedding was effective in identifying incidents by type and severity, providing better generalizability than SVMs.
    Keywords:  clinical incident reports; multiple classification; neural networks; patient safety; text classification; word embedding
    DOI:  https://doi.org/10.1093/jamia/ocz146
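    The tuned six-layer architecture is in the paper; the fragment below is only the generic word-embedding-plus-1D-convolution pattern for short-text classification, written in Keras with a hypothetical vocabulary size, sequence length, and the ten incident types mentioned above.

        # Generic word-embedding + Conv1D text classifier (illustrative architecture only).
        import tensorflow as tf

        vocab_size, max_len, n_types = 20000, 200, 10        # hypothetical settings

        model = tf.keras.Sequential([
            tf.keras.Input(shape=(max_len,)),                        # integer word indices
            tf.keras.layers.Embedding(vocab_size, 128),              # word embedding layer
            tf.keras.layers.Conv1D(128, 5, activation="relu"),
            tf.keras.layers.GlobalMaxPooling1D(),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dropout(0.5),
            tf.keras.layers.Dense(n_types, activation="softmax"),    # 10 incident types
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])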
  14. Cancers (Basel). 2019 Nov 08. 11(11): pii: E1751. [Epub ahead of print]
      Objective: Early reports indicate that individuals with type 2 diabetes mellitus (T2DM) may have a greater incidence of breast malignancy than patients without T2DM. The aim of this study was to investigate the effectiveness of three different models for predicting the risk of breast cancer in patients with T2DM of different characteristics.
    Study design and methodology: From 2000 to 2012, data on 636,111 newly diagnosed female T2DM patients were available in Taiwan's National Health Insurance Research Database. These data were used to create a risk prediction model for breast cancer in patients with T2DM. We also collected data on potential predictors of breast cancer so that adjustments for their effect could be made in the analysis. The Synthetic Minority Oversampling Technique (SMOTE) was utilized to augment the data for small population samples. The data were randomly split into training and test sets at a ratio of about 39:1. Logistic Regression (LR), Artificial Neural Network (ANN) and Random Forest (RF) models were evaluated using recall, accuracy, F1 score and area under the receiver operating characteristic curve (AUC).
    Results: The AUCs of the LR, ANN, and RF models were 0.834, 0.865, and 0.959, respectively; the RF model had the largest AUC of the three.
    Conclusions: Although the LR, ANN, and RF models all showed high accuracy in predicting the risk of breast cancer in Taiwanese women with T2DM, the RF model performed best.
    Keywords:  artificial neural network; breast cancer; logistic regression; random forest; type II diabetes mellitus
    DOI:  https://doi.org/10.3390/cancers11111751
  15. JAMA Oncol. 2019 Nov 14.
       Importance: Diagnosing the site of origin for cancer is a pillar of disease classification that has directed clinical care for more than a century. Even in an era of precision oncologic practice, in which treatment is increasingly informed by the presence or absence of mutant genes responsible for cancer growth and progression, tumor origin remains a critical factor in tumor biologic characteristics and therapeutic sensitivity.
    Objective: To evaluate whether data derived from routine clinical DNA sequencing of tumors could complement conventional approaches to enable improved diagnostic accuracy.
    Design, Setting, and Participants: A machine learning approach was developed to predict tumor type from targeted panel DNA sequence data obtained at the point of care, incorporating both discrete molecular alterations and inferred features such as mutational signatures. This algorithm was trained on 7791 tumors representing 22 cancer types selected from a prospectively sequenced cohort of patients with advanced cancer.
    Results: The correct tumor type was predicted for 5748 of the 7791 patients (73.8%) in the training set as well as 8623 of 11 644 patients (74.1%) in an independent cohort. Predictions were assigned probabilities that reflected empirical accuracy, with 3388 cases (43.5%) representing high-confidence predictions (>95% probability). Informative molecular features and feature categories varied widely by tumor type. Genomic analysis of plasma cell-free DNA yielded accurate predictions in 45 of 60 cases (75.0%), suggesting that this approach may be applied in diverse clinical settings including as an adjunct to cancer screening. Likely tissues of origin were predicted from targeted tumor sequencing in 95 of 141 patients (67.4%) with cancers of unknown primary site. Applying this method prospectively to patients under active care enabled genome-directed reassessment of diagnosis in 2 patients initially presumed to have metastatic breast cancer, leading to the selection of more appropriate treatments, which elicited clinical responses.
    Conclusions and Relevance: These results suggest that the application of artificial intelligence to predict tissue of origin in oncologic practice can act as a useful complement to conventional histologic review to provide integrated pathologic diagnoses, often with important therapeutic implications.
    DOI:  https://doi.org/10.1001/jamaoncol.2019.3985
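    A distinctive element is that each prediction carries a probability and only calls above 95% are treated as high confidence; the hedged sketch below shows that thresholding step with a generic multiclass classifier over synthetic binary-alteration-style features, not the authors' model.

        # High-confidence (>95% probability) class calls from a generic classifier.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=5000, n_features=300, n_informative=60,
                                   n_classes=5, n_clusters_per_class=1, random_state=6)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6)

        clf = RandomForestClassifier(n_estimators=300, random_state=6).fit(X_train, y_train)
        proba = clf.predict_proba(X_test)

        confident = proba.max(axis=1) > 0.95           # flag high-confidence predictions only
        calls = clf.classes_[proba.argmax(axis=1)]
        print("high-confidence fraction:", round(confident.mean(), 3))
        print("accuracy on high-confidence calls:",
              round((calls[confident] == y_test[confident]).mean(), 3))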
  16. J Am Med Inform Assoc. 2019 Nov 09. pii: ocz153. [Epub ahead of print]
       OBJECTIVE: We implement 2 different multitask learning (MTL) techniques, hard parameter sharing and cross-stitch, to train a word-level convolutional neural network (CNN) specifically designed for automatic extraction of cancer data from unstructured text in pathology reports. We show the importance of learning related information extraction (IE) tasks leveraging shared representations across the tasks to achieve state-of-the-art performance in classification accuracy and computational efficiency.
    MATERIALS AND METHODS: Multitask CNN (MTCNN) attempts to tackle document information extraction by learning to extract multiple key cancer characteristics simultaneously. We trained our MTCNN to perform 5 information extraction tasks: (1) primary cancer site (65 classes), (2) laterality (4 classes), (3) behavior (3 classes), (4) histological type (63 classes), and (5) histological grade (5 classes). We evaluated the performance on a corpus of 95 231 pathology documents (71 223 unique tumors) obtained from the Louisiana Tumor Registry. We compared the performance of the MTCNN models against single-task CNN models and 2 traditional machine learning approaches, namely support vector machine (SVM) and random forest classifier (RFC).
    RESULTS: MTCNNs offered superior performance across all 5 tasks in terms of classification accuracy as compared with the other machine learning models. Based on retrospective evaluation, the hard parameter sharing and cross-stitch MTCNN models correctly classified 59.04% and 57.93% of the pathology reports respectively across all 5 tasks. The baseline models achieved 53.68% (CNN), 46.37% (RFC), and 36.75% (SVM). Based on prospective evaluation, the percentages of correctly classified cases across the 5 tasks were 60.11% (hard parameter sharing), 58.13% (cross-stitch), 51.30% (single-task CNN), 42.07% (RFC), and 35.16% (SVM). Moreover, hard parameter sharing MTCNNs outperformed the other models in computational efficiency by using about the same number of trainable parameters as a single-task CNN.
    CONCLUSIONS: The hard parameter sharing MTCNN offers superior classification accuracy for automated coding support of pathology documents across a wide range of cancers and multiple information extraction tasks while maintaining similar training and inference time as those of a single task-specific model.
    Keywords:  cancer pathology reports; convolutional neural network; deep learning; information extraction; multitask learning; natural language processing
    DOI:  https://doi.org/10.1093/jamia/ocz153
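    Hard parameter sharing simply means all tasks share a common trunk and diverge only at the output heads; a minimal Keras sketch of that layout follows, with a hypothetical vocabulary and sequence length and the five task sizes quoted above.

        # Minimal hard-parameter-sharing layout: one shared text encoder, five task heads.
        import tensorflow as tf

        vocab_size, max_len = 30000, 1500                 # hypothetical settings
        tasks = {"site": 65, "laterality": 4, "behavior": 3, "histology": 63, "grade": 5}

        inputs = tf.keras.Input(shape=(max_len,))
        x = tf.keras.layers.Embedding(vocab_size, 128)(inputs)
        x = tf.keras.layers.Conv1D(256, 5, activation="relu")(x)     # shared ("hard") parameters
        x = tf.keras.layers.GlobalMaxPooling1D()(x)

        outputs = {name: tf.keras.layers.Dense(n, activation="softmax", name=name)(x)
                   for name, n in tasks.items()}

        model = tf.keras.Model(inputs, outputs)
        model.compile(optimizer="adam",
                      loss={name: "sparse_categorical_crossentropy" for name in tasks})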
  17. Curr Opin Urol. 2019 Nov 12.
       PURPOSE OF REVIEW: This review aims to draw a road-map to the use of artificial intelligence in an era of robotic surgery and highlight the challenges inherent to this process.
    RECENT FINDINGS: Conventional mechanical robots function by transmitting the actions of the surgeon's hands to the surgical target through the tremor-filtered movements of surgical instruments. Similarly, the next iteration of surgical robots conforms human-initiated actions to a personalized surgical plan leveraging 3D digital segmentation generated prior to surgery. The advancements in cloud computing, big data analytics, and artificial intelligence have led to increased research and development of intelligent robots in all walks of human life. Inspired by the successful application of deep learning, several surgical companies are joining hands with tech giants to develop intelligent surgical robots. We hereby highlight key steps in the handling and analysis of big data to build, define, and deploy deep-learning models for building autonomous robots.
    SUMMARY: Despite tremendous growth of autonomous robotics, their entry into the operating room remains elusive. It is time that surgeons actively collaborate for the development of the next generation of intelligent robotic surgery.
    DOI:  https://doi.org/10.1097/MOU.0000000000000692
  18. J Med Radiat Sci. 2019 Nov 10.
      Artificial intelligence (AI) is heralded as the most disruptive technology to health services in the 21st century. Many commentary articles published in both the general and health media recognise that medical imaging is at the forefront of these changes due to our large digital data footprint. Radiomics is transforming medical images into mineable high-dimensional data to optimise clinical decision-making; however, some would argue that AI could infiltrate workplaces with very few ethical checks and balances. In this commentary article, we describe how AI is beginning to change medical imaging services and the innovations that are on the horizon. We explore how AI and its various forms, including machine learning, will challenge the way medical imaging is delivered, from workflow, image acquisition and image registration to interpretation. Diagnostic radiographers will need to learn to work alongside our 'virtual colleagues', and we argue that vital changes to entry-level and advanced curricula, together with national professional capabilities, are needed to ensure machine-learning tools are used in the safest and most effective manner for our patients.
    DOI:  https://doi.org/10.1002/jmrs.369
  19. Prev Med. 2019 Nov 06. pii: S0091-7435(19)30362-7. 130: 105886. [Epub ahead of print]
      This study evaluated the prediction performance of three different machine learning (ML) techniques in predicting opioid misuse among U.S. adolescents. Data were drawn from the 2015-2017 National Survey on Drug Use and Health (N = 41,579 adolescents, ages 12-17 years) and analyzed in 2019. Prediction models were developed using three ML algorithms: artificial neural networks, distributed random forest, and gradient boosting machine. The performance of the ML prediction models was compared with that of penalized logistic regression. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used as metrics of prediction performance. We used the AUPRC as the primary measure of prediction performance, given that it is considered more informative than the AUROC for assessing binary classifiers on an imbalanced outcome variable. The overall rate of opioid misuse among U.S. adolescents was 3.7% (n = 1521). Prediction performance was similar across the four models (AUROC values ranged from 0.809 to 0.815). In terms of the AUPRC, the distributed random forest showed the best prediction performance (0.172), followed by penalized logistic regression (0.162), gradient boosting machine (0.160), and artificial neural networks (0.157). Findings suggest that machine learning can be a promising approach, especially for predicting outcomes with rare cases (i.e., when the binary outcome variable is heavily imbalanced) such as adolescent opioid misuse.
    Keywords:  Distributed random forest; Machine learning; Opioid misuse; Penalized logistic regression; Substance use
    DOI:  https://doi.org/10.1016/j.ypmed.2019.105886
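    The preference for AUPRC over AUROC with a rare outcome (3.7% prevalence here) can be illustrated with a short sketch; average_precision_score in scikit-learn is a common stand-in for the area under the precision-recall curve, and the data below are synthetic with roughly 4% positives.

        # AUROC versus AUPRC (average precision) on a rare-outcome problem.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import average_precision_score, roc_auc_score
        from sklearn.model_selection import cross_val_predict

        X, y = make_classification(n_samples=20000, n_features=25, weights=[0.96, 0.04],
                                   random_state=7)
        p = cross_val_predict(RandomForestClassifier(random_state=7), X, y,
                              cv=5, method="predict_proba")[:, 1]

        print("AUROC:", round(roc_auc_score(y, p), 3))
        print("AUPRC (average precision):", round(average_precision_score(y, p), 3))
        print("chance-level AUPRC (prevalence):", round(y.mean(), 3))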