Mol Oncol. 2022 Nov 04.
Somatic mutations in mitochondrial DNA (mtDNA) play important roles in the initiation and progression of cancer. Although next-generation sequencing (NGS) of paired tumor and control samples has become common practice for identifying tumor-specific mtDNA mutations, the unique nature of mtDNA and NGS-associated sequencing bias can lead to false-positive or false-negative somatic mutation calls. Moreover, in some clinical scenarios matched control tissues are unavailable for comparison. A new approach for accurately identifying somatic mtDNA variants, particularly in the absence of matched controls, is therefore greatly needed. In this study, ground-truth mtDNA variants, orthogonally validated using triple-paired tumor, adjacent non-tumor, and blood samples, were used to develop mitoSomatic, a random-forest-based machine learning tool. We demonstrated that mitoSomatic achieved area under the curve (AUC) values above 0.99 for identifying somatic mtDNA variants without a paired control in three tumor types. mitoSomatic was also applicable to non-tumor tissues, such as adjacent non-tumor and blood samples, suggesting the flexibility of its classification capability. Furthermore, analysis of the triple-paired samples identified a small group of variants of uncertain somatic or germline origin, and applying mitoSomatic significantly facilitated the prediction of their likely source. Finally, a control-free evaluation of the public pan-cancer NGS dataset with mitoSomatic revealed a substantial number of variants that were probably misclassified by conventional tumor-control comparison, further emphasizing the practical usefulness of mitoSomatic. Taken together, our study demonstrates that mitoSomatic is valuable for accurately identifying somatic mtDNA variants in mtDNA NGS data without paired controls, in both tumor and non-tumor tissues.
Keywords: machine learning; mitochondrial DNA; next-generation sequencing; somatic mutations