bims-sicarn Biomed News
on scRNA-seq
Issue of 2025–04–13
thirty papers selected by
Anna Zawada, International Centre for Translational Eye Research



  1. Cell Rep. 2025 Apr 03. pii: S2211-1247(25)00291-8. [Epub ahead of print]44(4): 115520
      DNA methylation and hydroxymethylation are extensively reprogrammed during mammalian early embryogenesis, and studying their regulatory functions requires comprehensive DNA hydroxymethylation maps at base resolution. Here, we develop single-cell 5-hydroxymethylcytosine (5hmC) chemical-assisted C-to-T conversion-enabled sequencing (schmC-CATCH), a method leveraging selective 5hmC labeling for a quantitative, base-resolution, genome-wide landscape of the DNA hydroxymethylome in mouse gametes and preimplantation embryos spanning from the zygote to blastocyst stage. We revealed that, in addition to late zygotic stages, onset of ten-eleven translocation (TET)-mediated DNA hydroxymethylation initiates immediately after fertilization and is characterized by the distinct 5hmC patterns on the parental genomes shaped by TET3 demethylase. We identified persistent clusters of 5hmC hotspots throughout early embryonic stages, which are highly associated with young retroelements. 5hmC is also associated with different regulatory elements, indicating a potential regulatory function during early embryogenesis. Collectively, our work elucidates the dynamics of active DNA demethylation during mouse preimplantation development and provides a valuable resource for functional studies of epigenetic reprogramming in early embryos.
    Keywords:  CP: Developmental biology; CP: Molecular biology; DNA hydroxymethylation; TET3; bisulfite-free method; mammalian early embryo; single cell sequencing
    DOI:  https://doi.org/10.1016/j.celrep.2025.115520
  2. Genome Biol. 2025 Apr 07. 26(1): 86
      BACKGROUND: Advancements in RNA sequencing have expanded our ability to study gene expression profiles of biological samples in bulk tissue and single cells. Deconvolution of bulk data with single-cell references provides the ability to study relative cell-type proportions, but most methods assume a reference is present for every cell type in bulk data. This is not true in all circumstances: cell types can be missing in single-cell profiles for many reasons. In this study, we examine the impact of missing cell types on deconvolution methods.
    RESULTS: Using paired single-cell and single-nucleus data, we simulate realistic scenarios where cell types are missing, since single-nucleus RNA sequencing captures cell types that would otherwise be absent from a single-cell counterpart. This allows us to study the effects of missing cell types on deconvolution. We evaluate three different methods and find that performance is influenced by both the number and similarity of missing cell types. Additionally, missing cell-type profiles can be recovered from residuals using a simple non-negative matrix factorization strategy. We also analyze real bulk data of cancerous and non-cancerous samples and observe results consistent with simulation, namely that expression patterns from cell types likely to be missing appear in residuals.
    CONCLUSIONS: We expect our results to provide a starting point for those developing new deconvolution methods and to help them better account for missing cell types. Our results suggest that deconvolution methods should consider the possibility of missing cell types.
    DOI:  https://doi.org/10.1186/s13059-025-03506-9
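    The residual idea in entry 2 can be illustrated with a toy example. This is only a sketch under stated assumptions, not one of the three methods the authors evaluate: it uses unconstrained ordinary least squares with two reference profiles, and the profiles, proportions and names (REF_A, REF_B, MISSING_C) are invented for illustration. A real deconvolution method would typically also constrain proportions to be non-negative.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fit_two_references(bulk, ref_a, ref_b):
    """Least-squares fit of bulk ~ a*ref_a + b*ref_b via the 2x2 normal equations."""
    aa, ab, bb = dot(ref_a, ref_a), dot(ref_a, ref_b), dot(ref_b, ref_b)
    ya, yb = dot(ref_a, bulk), dot(ref_b, bulk)
    det = aa * bb - ab * ab
    a = (ya * bb - ab * yb) / det
    b = (aa * yb - ab * ya) / det
    return a, b


# Hypothetical expression profiles over four genes (illustrative values only).
REF_A = [10.0, 0.0, 1.0, 0.0]      # cell type present in the reference
REF_B = [0.0, 10.0, 1.0, 0.0]      # cell type present in the reference
MISSING_C = [0.0, 0.0, 1.0, 10.0]  # cell type absent from the reference
TRUE_PROPS = (0.5, 0.3, 0.2)

# Simulated bulk sample: a mixture of all three cell types.
bulk = [
    TRUE_PROPS[0] * a + TRUE_PROPS[1] * b + TRUE_PROPS[2] * c
    for a, b, c in zip(REF_A, REF_B, MISSING_C)
]

# Deconvolve with the incomplete reference, then inspect what is left over.
prop_a, prop_b = fit_two_references(bulk, REF_A, REF_B)
residual = [y - prop_a * a - prop_b * b for y, a, b in zip(bulk, REF_A, REF_B)]
top_gene = max(range(len(residual)), key=lambda g: residual[g])
```

    In this toy setting the largest residual falls on the gene that only the missing cell type expresses, which is the kind of signal a factorization of residuals across many samples could pick up.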
  3. Genome Res. 2025 Apr 10. pii: gr.279955.124. [Epub ahead of print]
      Reprogramming cell state transitions provides the potential for cell engineering and regenerative therapy for many diseases. Finding the reprogramming transcription factors (TFs) and their combinations that can direct the desired state transition is crucial for the task. Computational methods have been developed to identify such reprogramming TFs. However, most of them can only generate a ranked list of individual TFs and ignore the identification of TF combinations. Even for individual reprogramming TF identification, current methods often fail to put the real effective reprogramming TFs at the top of their rankings. To address these challenges, we developed TFcomb, a computational method that leverages single-cell multiomics data to identify reprogramming TFs and TF combinations that can direct cell state transitions. We modeled the task of finding reprogramming TFs and their combinations as an inverse problem to enable searching for answers in very high dimensional space, and used Tikhonov regularization to guarantee the generalization ability of solutions. For the coefficient matrix of the model, we designed a graph attention network to augment gene regulatory networks built with single-cell RNA-seq and ATAC-seq data. Benchmarking experiments on data of human embryonic stem cells demonstrated superior performance of TFcomb against existing methods for identifying individual TFs. We curated datasets of multiple cell reprogramming cases and demonstrated that TFcomb can efficiently identify reprogramming TF combinations from a vast pool of potential combinations. We applied TFcomb on a dataset of mouse hair follicle development and found key TFs in cell differentiation. All experiments showed that TFcomb is powerful in identifying reprogramming TFs and TF combinations from single-cell datasets to empower future cell engineering.
    DOI:  https://doi.org/10.1101/gr.279955.124
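    For readers unfamiliar with the regularization named in entry 3, the textbook form of a Tikhonov-regularized linear inverse problem is sketched below. The symbols are assumptions made for illustration (A as a coefficient matrix, b as the target change in cell state, x as the unknown TF contributions, and lambda > 0 as the regularization weight); the abstract does not give the exact objective used in TFcomb, which may differ.

```latex
\hat{x}_{\lambda}
  = \arg\min_{x} \; \lVert A x - b \rVert_2^2 + \lambda \lVert x \rVert_2^2
  = \left( A^{\top} A + \lambda I \right)^{-1} A^{\top} b ,
  \qquad \lambda > 0 .
```

    The penalty term keeps the solution well defined even when A has far more columns than rows, which is the usual situation when searching a high-dimensional space of candidate factors.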
  4. bioRxiv. 2025 Mar 25. pii: 2025.03.21.644670. [Epub ahead of print]
      Age-related macular degeneration (AMD) is a leading cause of vision loss worldwide. Genome-wide association studies (GWAS) of AMD have identified dozens of risk loci that may house disease targets. However, variants at these loci are largely noncoding, making it difficult to assess their function and whether they are causal. Here, we present a single-cell gene expression and chromatin accessibility atlas of human retinal pigment epithelium (RPE) and choroid to systematically analyze both coding and noncoding variants implicated in AMD. We employ HiChIP and Activity-by-Contact modeling to map enhancers in these tissues and predict cell and gene targets of risk variants. We further perform allele-specific self-transcribing active regulatory region sequencing (STARR-seq) to functionally test variant activity in RPE cells, including in the context of complement activation. Our work nominates new pathogenic variants and mechanisms in AMD and offers a rich and accessible resource for studying diseases of the RPE and choroid.
    DOI:  https://doi.org/10.1101/2025.03.21.644670
  5. Brief Bioinform. 2025 Mar 04. pii: bbaf138. [Epub ahead of print]26(2):
      Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity by providing gene expression data at the single-cell level. Unlike bulk RNA-seq, scRNA-seq allows identification of different cell types within a given tissue, leading to a more nuanced comprehension of cell functions. However, the analysis of scRNA-seq data presents challenges due to its sparsity and high dimensionality. Bioinformatics, which plays an important role in the analysis of large-scale data, has been widely applied to scRNA-seq data. To address these challenges, we introduce the scMUG computational pipeline, which incorporates gene functional module information to enhance scRNA-seq clustering analysis. The pipeline includes data preprocessing, cell representation generation, cell-cell similarity matrix construction, and clustering analysis. The scMUG pipeline also introduces a novel similarity measure that combines local density and global distribution in the latent cell representation space. As far as we can tell, this is the first attempt to integrate gene functional associations into scRNA-seq clustering analysis. We curated nine human scRNA-seq datasets to evaluate our scMUG pipeline. With the help of gene functional information and the novel similarity measure, the clustering results from the scMUG pipeline provide deep insights into functional relationships between gene expression patterns and cellular heterogeneity. In addition, the scMUG pipeline achieves clustering performance comparable to or better than other state-of-the-art methods. All source code for scMUG has been deposited in a GitHub repository with instructions for reproducing all results (https://github.com/degiminnal/scMUG).
    Keywords:  autoencoder; clustering analysis; gene functional modules; scRNA-seq
    DOI:  https://doi.org/10.1093/bib/bbaf138
  6. Sci Rep. 2025 Apr 10. 15(1): 12299
      The neocortical development process includes cell proliferation, differentiation, migration, and maturation, supported by precise genetic regulation. To understand these processes at the cellular and molecular levels, it is necessary to characterize the fundamental anatomical structures by gene expression. However, markers established in the adult brain sometimes behave differently in the fetal brain, actively changing during development. Spatial transcriptomics is a powerful analytical method that enables sequence analysis while retaining spatial information. However, a deeper understanding of these data requires computational estimation, including integration with single-cell transcriptome data and aggregation of spots at the single-cell cluster level. The application of such analysis to biomarker discovery has only begun recently, and its application to the developing fetal brain is largely unexplored. In this study, we performed a spatial transcriptome analysis of the developing mouse brain to investigate spatio-temporal regulation of gene expression during development. Using these data, we conducted an integrated study with publicly available mouse data sets. Our data-driven analysis identified novel molecular markers of the choroid plexus, piriform cortex, and thalamus. Furthermore, we identified a novel molecular marker that distinguishes the dorsal endopiriform nucleus (DEn) within the claustrum/DEn complex at developmental stages.
    DOI:  https://doi.org/10.1038/s41598-025-95496-8
  7. Nat Protoc. 2025 Apr 09.
      Bacterial single-cell transcriptomics is revolutionizing our understanding of cell-to-cell variation within bacterial populations and enables gene expression profiling in complex microbial communities. Using the eukaryotic multiple annealing and dC-tailing-based quantitative single-cell RNA-sequencing (scRNA-seq) (MATQ-seq) approach, we have developed a robust bacterial scRNA-seq protocol, which integrates index sorting, random priming and rRNA depletion. This method stands out for its high rate of cell retention and its suitability for experiments with limited input material, offering a reliable method even for small sample sizes. Here we provide a step-by-step protocol covering the entire process of generating single-bacteria transcriptomes, including experimental and computational analysis. It involves (i) single-cell isolation via fluorescence-activated cell sorting (FACS) and cell lysis, (ii) reverse transcription and cDNA amplification using robotic liquid handling, (iii) rRNA depletion, (iv) indexing and sequencing, and (v) data processing steps to start comprehensive data analysis. Using model organisms such as Salmonella enterica, we show that the method achieves a retention rate of 95%, defined as the rate of initially sorted cells converted into effective sequencing libraries. This substantially surpasses other available protocols. The method robustly detects 300-600 genes per cell, highlighting its effectiveness in capturing a broad transcriptomic profile. The entire procedure from FACS-based single-cell isolation to raw data generation spans ~5 d. As MATQ-seq has already been proven robust in several bacterial species, it holds promise for the establishment of a streamlined microbial scRNA-seq platform.
    DOI:  https://doi.org/10.1038/s41596-025-01157-5
  8. Nature. 2025 Apr 09.
      The mammalian nucleus is compartmentalized by diverse subnuclear structures. These subnuclear structures, marked by nuclear bodies and histone modifications, are often cell-type specific and affect gene regulation and 3D genome organization. Understanding their relationships rests on identifying the molecular constituents of subnuclear structures and mapping their associations with specific genomic loci and transcriptional levels in individual cells, all in complex tissues. Here, we introduce two-layer DNA seqFISH+, which enables simultaneous mapping of 100,049 genomic loci, together with the nascent transcriptome for 17,856 genes and subnuclear structures in single cells. These data enable imaging-based chromatin profiling of diverse subnuclear markers and can capture their changes at genomic scales ranging from 100-200 kilobases to approximately 1 megabase, depending on the marker and DNA locus. By using multi-omics datasets in the adult mouse cerebellum, we showed that repressive chromatin regions are more variable by cell type than are active regions across the genome. We also discovered that RNA polymerase II-enriched foci were locally associated with long, cell-type-specific genes (longer than 200 kilobases) in a manner distinct from that of nuclear speckles. Furthermore, our analysis revealed that cell-type-specific regions of heterochromatin marked by histone H3 trimethylated at lysine 27 (H3K27me3) and histone H4 trimethylated at lysine 20 (H4K20me3) are enriched at specific genes and gene clusters, respectively, and shape radial chromosomal positioning and inter-chromosomal interactions in neurons and glial cells. Together, our results provide a single-cell high-resolution multi-omics view of subnuclear structures, associated genomic loci and their effects on gene regulation, directly within complex tissues.
    DOI:  https://doi.org/10.1038/s41586-025-08838-x
  9. Res Sq. 2025 Mar 27. pii: rs.3.rs-6081101. [Epub ahead of print]
      Chronic inflammation is a well-established risk factor for cancer, but the underlying molecular mechanisms remain unclear. Using a mouse model of colitis, we demonstrate that colonic stem cells retain an epigenetic memory of inflammation following disease resolution, characterized by a cumulative gain of activator protein 1 (AP-1) transcription factor activity. Further, we develop SHARE-TRACE, a method that enables simultaneous profiling of gene expression, chromatin accessibility and clonal history in single cells, enabling high resolution tracking of epigenomic memory. This reveals that inflammatory memory is propagated cell-intrinsically and inherited through stem cell lineages, with certain clones demonstrating dramatically stronger memory than others. Finally, we show that colitis primes stem cells for amplified expression of regenerative gene programs following oncogenic mutation that accelerate tumor growth. This includes a subpopulation of tumors that have exceptionally high AP-1 activity and the additional upregulation of pro-oncogenic programs. Together, our findings provide a mechanistic link between chronic inflammation and malignancy, revealing how long-lived epigenetic alterations in regenerative tissues may contribute to disease susceptibility and suggesting potential therapeutic strategies to mitigate cancer risk in patients with chronic inflammatory conditions.
    DOI:  https://doi.org/10.21203/rs.3.rs-6081101/v1
  10. Nat Commun. 2025 Apr 11. 16(1): 3435
      The dynamic three-dimensional spatial conformations of chromosomes demonstrate complex structural variations across single cells, which play pivotal roles in modulating single-cell-specific transcription and epigenetic landscapes. The high rates of missing contacts in single-cell chromatin contact maps pose significant challenges for reconstructing high-resolution spatial chromatin configurations. We develop a data-driven algorithm, Tensor-FLAMINGO, based on a low-rank tensor completion strategy. Applied to a diverse panel of single-cell chromatin datasets, Tensor-FLAMINGO generates 10kb- and 30kb-resolution spatial chromosomal architectures across individual cells. Tensor-FLAMINGO achieves superior accuracy in reconstructing 3D chromatin structures, recovering missing contacts, and delineating cell clusters. The unprecedented high-resolution characterization of single-cell genome folding enables expanded identification of single-cell-specific long-range chromatin interactions, multi-way spatial hubs, and the mechanisms of disease-associated GWAS variants. Beyond the sparse 2D contact maps, the complete 3D chromatin conformations open an avenue to understand the dynamics of spatially coordinated molecular processes across different cells.
    DOI:  https://doi.org/10.1038/s41467-025-58674-w
  11. Front Oncol. 2025;15: 1535504
       Background: The main treatments for ovarian cancer are surgery, chemotherapy, radiotherapy, and targeted therapy. Targeted therapy is a new treatment method that has emerged in recent years and relies on specific molecular targets to treat cancer. Succinic acid is a key intermediate product in the tricarboxylic acid cycle. Research has shown that succinic acid has antioxidant properties and can alleviate oxidative stress in cells and tissues. These findings indicate the potential application of succinic acid in antioxidant therapy and the prevention of oxidative damage. This study explored the potential targets and therapeutic mechanisms of succinic acid in ovarian cancer.
    Methods: Using bioinformatics and single-cell sequencing technology, the hub genes related to succinic acid and ovarian cancer and the frequency and gene expression patterns of different cell types in ovarian cancer patients and normal individuals were analyzed.
    Results: The frequency of immune cells, including B cells, CD4+ cells, CD8+ cells, macrophages, and plasma cells, was significantly increased in ovarian cancer patients, and the frequency of other cell types, such as endothelial cells, NK cells, and pericytes/SMCs, was decreased. Further research revealed three key hub genes: SPP1, SLPI, and CD9. The expression patterns of these genes in ovarian cancer were closely related to different cell types. SPP1 was expressed mainly in macrophages, SLPI was expressed in epithelial cells, and CD9 was expressed in pericytes/SMCs and epithelial cells. SPP1, SLPI, and CD9 and their mechanisms of action may be potential targets for the treatment of ovarian cancer with succinic acid.
    Conclusions: This study investigated the potential therapeutic targets and mechanisms of succinic acid in ovarian cancer and the differences in immune cell infiltration and gene expression patterns, providing important insights for future tumor immunotherapy research.
    Keywords:  SPP1; immune cell infiltration; ovarian cancer; single-cell RNA sequencing; succinic acid
    DOI:  https://doi.org/10.3389/fonc.2025.1535504
  12. Ann Appl Stat. 2023 Dec;17(4): 3426-3449
      Categorizing individual cells into one of many known cell type categories, also known as cell type annotation, is a critical step in the analysis of single-cell genomics data. The current process of annotation is time-intensive and subjective, which has led to different studies describing cell types with labels of varying degrees of resolution. While supervised learning approaches have provided automated solutions to annotation, there remains a significant challenge in fitting a unified model for multiple datasets with inconsistent labels. In this article, we propose a new multinomial logistic regression estimator which can be used to model cell type probabilities by integrating multiple datasets with labels of varying resolution. To compute our estimator, we solve a nonconvex optimization problem using a blockwise proximal gradient descent algorithm. We show through simulation studies that our approach estimates cell type probabilities more accurately than competitors in a wide variety of scenarios. We apply our method to ten single-cell RNA-seq datasets and demonstrate its utility in predicting fine resolution cell type labels on unlabeled data as well as refining cell type labels on data with existing coarse resolution annotations. Finally, we demonstrate that our method can lead to novel scientific insights in the context of a differential expression analysis comparing peripheral blood gene expression before and after treatment with interferon-β. An R package implementing the method is available at https://github.com/keshav-motwani/IBMR and the collection of datasets we analyze is available at https://github.com/keshav-motwani/AnnotatedPBMC.
    Keywords:  Integrative analysis; cell type annotation; group lasso; multinomial logistic regression; nonconvex optimization; single-cell genomics
    DOI:  https://doi.org/10.1214/23-aoas1769
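    One way to picture how a single multinomial model can learn from labels of differing resolution (entry 12) is to score a coarse label by the total probability of the fine cell types it contains. The sketch below assumes that construction; the abstract does not spell out the likelihood, and the cell-type names, the COARSE_TO_FINE mapping and the function names are hypothetical. The penalty and the blockwise proximal gradient optimizer mentioned in the abstract are omitted.

```python
import math

# Hypothetical fine-resolution categories and one coarse label that groups two of them.
FINE_TYPES = ["CD4 T", "CD8 T", "B", "NK"]
COARSE_TO_FINE = {"T cell": ["CD4 T", "CD8 T"]}


def softmax(scores):
    """Convert per-category scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_probability(fine_probs, label):
    """Probability of an observed label: sum over the fine types it covers."""
    members = COARSE_TO_FINE.get(label, [label])
    return sum(fine_probs[FINE_TYPES.index(t)] for t in members)


def negative_log_likelihood(cells):
    """cells: list of (fine-type scores, observed label at any resolution)."""
    return -sum(
        math.log(label_probability(softmax(scores), label))
        for scores, label in cells
    )


# A cell labelled only as "T cell" and a cell labelled at fine resolution
# both contribute to the same objective.
example_cells = [([0.0, 0.0, 0.0, 0.0], "T cell"), ([0.0, 0.0, 0.0, 0.0], "B")]
example_nll = negative_log_likelihood(example_cells)
```

    Under this assumption, datasets annotated only at coarse resolution still constrain the fine-resolution probabilities, because they rule out fine types outside the observed group.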
  13. bioRxiv. 2025 Mar 28. pii: 2025.03.26.645419. [Epub ahead of print]
      Genetic disruption of SETD1A markedly increases the risk for schizophrenia. To elucidate the underlying mechanisms, we generated isogenic organoid models of the developing human cerebral cortex harboring a SETD1A loss-of-function schizophrenia risk mutation. Employing chromatin profiling combined with RNA sequencing, we identified high-confidence SETD1A target genes, analyzed the impact of the mutation on SETD1A binding and transcriptional regulation and validated key findings with orthogonal approaches. Disruption of SETD1A function disturbs the finely tuned temporal gene expression in the excitatory neuron lineage, yielding an aberrant transcriptional program that compromises key regulatory and metabolic pathways essential for neurodevelopmental transitions. Although overall SETD1A binding remains unchanged in mutant neurons, we identified localized alterations in SETD1A binding that correlate with shifts in H3K4me3 levels and gene expression. These changes are enriched at enhancer regions, suggesting that enhancer-regulated genes are especially vulnerable to SETD1A reduction. Notably, target genes with enhancer-bound SETD1A are primarily linked to neuronal functions while those with promoter-bound SETD1A are enriched for basic cellular functions. By mapping the SETD1A binding landscape in excitatory neurons of the human fetal frontal cortex and integrating multimodal neuroimaging and genetic datasets, we demonstrate that the genomic context of SETD1A binding differentially correlates with macroscale brain organization and establish a link between SETD1A-bound enhancers, schizophrenia-associated brain alterations and genetic susceptibility. Our study advances our understanding of the role of SETD1A binding patterns in schizophrenia pathogenesis, offering insights that may guide future therapeutic strategies.
    DOI:  https://doi.org/10.1101/2025.03.26.645419
  14. Elife. 2025 Apr 07. pii: RP102819. [Epub ahead of print]13
      The formation of the mammalian brain requires regionalization and morphogenesis of the cranial neural plate, which transforms from an epithelial sheet into a closed tube that provides the structural foundation for neural patterning and circuit formation. Sonic hedgehog (SHH) signaling is important for cranial neural plate patterning and closure, but the transcriptional changes that give rise to the spatially regulated cell fates and behaviors that build the cranial neural tube have not been systematically analyzed. Here, we used single-cell RNA sequencing to generate an atlas of gene expression at six consecutive stages of cranial neural tube closure in the mouse embryo. Ordering transcriptional profiles relative to the major axes of gene expression predicted spatially regulated expression of 870 genes along the anterior-posterior and mediolateral axes of the cranial neural plate and reproduced known expression patterns with over 85% accuracy. Single-cell RNA sequencing of embryos with activated SHH signaling revealed distinct SHH-regulated transcriptional programs in the developing forebrain, midbrain, and hindbrain, suggesting a complex interplay between anterior-posterior and mediolateral patterning systems. These results define a spatiotemporally resolved map of gene expression during cranial neural tube closure and provide a resource for investigating the transcriptional events that drive early mammalian brain development.
    Keywords:  brain development; developmental biology; mouse; mouse embryo; neural plate; neural tube closure; patterning; single-cell RNA sequencing
    DOI:  https://doi.org/10.7554/eLife.102819
  15. JCI Insight. 2025 Apr 08. pii: e185758. [Epub ahead of print]10(7):
      Using transcriptomic profiling at single-cell resolution, we investigated cell-intrinsic and cell-extrinsic signatures associated with pathogenesis and inflammation-driven fibrosis in both adult and pediatric patients with localized scleroderma (LS). We performed single-cell RNA-Seq on adult and pediatric patients with LS and healthy controls. We then analyzed the single-cell RNA-Seq data using an interpretable factor analysis machine learning framework, significant latent factor interaction discovery and exploration (SLIDE), which moves beyond predictive biomarkers to infer latent factors underlying LS pathophysiology. SLIDE is a recently developed latent factor regression-based framework that comes with rigorous statistical guarantees regarding identifiability of the latent factors, corresponding inference, and FDR control. We found distinct differences in the characteristics and complexity in the molecular signatures between adult and pediatric LS. SLIDE identified cell type-specific determinants of LS associated with age and severity and revealed insights into signaling mechanisms shared between LS and systemic sclerosis (SSc), as well as differences in onset of the disease in the pediatric compared with adult population. Our analyses recapitulate known drivers of LS pathology and identify cellular signaling modules that stratify LS subtypes and define a shared signaling axis with SSc.
    Keywords:  Autoimmune diseases; Autoimmunity; Bioinformatics; Immunology
    DOI:  https://doi.org/10.1172/jci.insight.185758
  16. Nat Microbiol. 2025 Apr 07.
      Microbial genome-wide association studies (GWAS) have uncovered numerous host genetic variants associated with gut microbiota. However, links between host genetics, the gut microbiome and specific cellular contexts remain unclear. Here we use a computational framework, scBPS (single-cell Bacteria Polygenic Score), to integrate existing microbial GWAS and single-cell RNA-sequencing profiles of 24 human organs, including the liver, pancreas, lung and intestine, to identify host tissues and cell types relevant to gut microbes. Analysing 207 microbial taxa and 254 host cell types, scBPS-inferred cellular enrichments confirmed known biology such as dominant communications between gut microbes and the digestive tissue module and liver epithelial cell compartment. scBPS also identified a robust association between Collinsella and the central-veinal hepatocyte subpopulation. We experimentally validated the causal effects of Collinsella on cholesterol metabolism in mice through single-nuclei RNA sequencing on liver tissue to identify relevant cell subpopulations. Mechanistically, oral gavage of Collinsella modulated cholesterol pathway gene expression in central-veinal hepatocytes. We further validated our approach using independent microbial GWAS data, alongside single-cell and bulk transcriptomic analyses, demonstrating its robustness and reproducibility. Together, scBPS enables a systematic mapping of the host-microbe crosstalk by linking cell populations to their interacting gut microbes.
    DOI:  https://doi.org/10.1038/s41564-025-01978-w
  17. Commun Biol. 2025 Apr 04. 8(1): 561
      Single-cell RNA sequencing (scRNA-seq) is an important technique for obtaining biological insights at cellular resolution, with scRNA-seq batch integration a key step before downstream statistical analysis. Despite the plethora of methods proposed, achieving reliable batch correction while preserving the heterogeneity of biological signals that define cell type continues to pose a challenge. To address this, we propose scCRAFT, an autoencoder model that separates cell-type-related signals from batch effects for reliable multi-batch integration. scCRAFT integrates three key loss components: a reconstruction loss for observation reconstruction, a multi-domain adaptation loss to eliminate batch effects, and an innovative dual-resolution triplet loss to preserve intra-batch biological heterogeneity, introduced as an effective mechanism to counteract the over-correction effect of domain adaptation loss amid heterogeneous cell distributions across batches. We show that scCRAFT effectively manages unbalanced batches, rare cell types, and batch-specific cell phenotypes in simulations, and surpasses state-of-the-art methods in a diverse set of real datasets.
    DOI:  https://doi.org/10.1038/s42003-025-07988-y
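    The triplet loss mentioned in entry 17 builds on a standard construction, sketched here in its generic margin form. This is not the scCRAFT implementation: how that method selects anchors, positives and negatives at two resolutions is not described in the abstract, and the function names and example embeddings below are invented for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic margin-based triplet loss.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is; otherwise it grows linearly.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings: the positive shares the anchor's cluster,
# the negatives come from a different cluster within the same batch.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
easy_negative = [3.0, 4.0]  # already far away, so it contributes no loss
hard_negative = [0.0, 1.5]  # too close to the anchor, so it is penalized

easy_loss = triplet_loss(anchor, positive, easy_negative)
hard_loss = triplet_loss(anchor, positive, hard_negative)
```

    Minimizing a loss of this shape pulls cells of the same group together and pushes different groups apart, which is the general mechanism by which a triplet term can counteract over-mixing during batch correction.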
  18. Brief Bioinform. 2025 Mar 04. pii: bbaf157. [Epub ahead of print]26(2):
      Single-cell multi-omics technologies have revolutionized the study of cell states and functions by simultaneously profiling multiple molecular layers within individual cells. However, existing methods for integrating these data struggle to preserve critical feature information and fail to exploit known regulatory knowledge, which is essential for understanding cell functions. This limitation hinders their ability to provide comprehensive and accurate insights into cells. Here, we propose FactVAE, an innovative factorized variational autoencoder designed for the robust and accurate understanding of single-cell multi-omics data. FactVAE integrates the factorization principle into the variational autoencoder framework, ensuring the preservation of feature information while leveraging the non-linear capture of sample information by neural networks. Additionally, known regulatory knowledge is incorporated during model training, and a knowledge transfer strategy is employed for cell embedding optimization and data augmentation. Comparative analyses of single-cell multi-omics datasets from different protocols and the spatial multi-omics dataset demonstrate that FactVAE not only outperforms benchmark methods in clustering performance but also generates augmented data that reveals the clearest cell-type-specific motif expression. Moreover, the feature embeddings captured by FactVAE enable the inference of potential and reliable gene regulatory relationships. Overall, FactVAE's superior performance and strong scalability make it a promising new solution for single-cell multi-omics data analysis.
    Keywords:  factorization; single-cell multi-omics data; variational autoencoder (VAE)
    DOI:  https://doi.org/10.1093/bib/bbaf157
  19. Cell Mol Life Sci. 2025 Apr 06. 82(1): 139
      Hepatocytes are crucial for drug screening, disease modeling, and clinical transplantation, yet generating functional hepatocytes in vitro is challenging due to the difficulty of establishing their authentic gene regulatory networks (GRNs). We have previously developed a two-step lineage reprogramming strategy to generate functionally competent human induced hepatocytes (hiHeps), providing an effective model for studying the establishment of hepatocyte-specific GRNs. In this study, we utilized high-throughput single-cell RNA sequencing (scRNA-seq) to explore the cell-fate transition and the establishment of hepatocyte-specific GRNs involved in the two-step reprogramming process. Our findings revealed that the late stage of the reprogramming process mimics the natural trajectory of liver development, exhibiting similar transcriptional waves of developmental genes. CD24 and DLK1 were identified as surface markers enriching two distinct hepatic progenitor populations respectively. Lipid metabolism emerged as a key enhancer of hiHeps maturation. Furthermore, transcription factors HNF4A and HHEX were identified as pivotal gatekeepers directing cell fate decisions between hepatocytes and intestinal cells. Collectively, this study provides valuable insights into the establishment of hepatocyte-specific GRNs during hiHeps induction at single-cell resolution, facilitating more efficient production of functional hepatocytes for therapeutic applications.
    Keywords:  Gene regulatory networks; Human induced hepatocytes; Lineage reprogramming; Lipid metabolism; Single-cell RNA sequencing
    DOI:  https://doi.org/10.1007/s00018-025-05677-x
  20. Brief Bioinform. 2025 Mar 04. pii: bbaf136. [Epub ahead of print]26(2):
      The development of single-cell and spatial transcriptomics has revolutionized our capacity to investigate cellular properties, functions, and interactions in both cellular and spatial contexts. Despite this progress, the analysis of single-cell and spatial omics data remains challenging. First, single-cell sequencing data are high-dimensional and sparse, and are often contaminated by noise and uncertainty, obscuring the underlying biological signal. Second, these data often encompass multiple modalities, including gene expression, epigenetic modifications, metabolite levels, and spatial locations. Integrating these diverse data modalities is crucial for enhancing prediction accuracy and biological interpretability. Third, while the scale of single-cell sequencing has expanded to millions of cells, high-quality annotated datasets are still limited. Fourth, the complex correlations of biological tissues make it difficult to accurately reconstruct cellular states and spatial contexts. Traditional feature engineering approaches struggle with the complexity of biological networks, while deep learning, with its ability to handle high-dimensional data and automatically identify meaningful patterns, has shown great promise in overcoming these challenges. Besides systematically reviewing the strengths and weaknesses of advanced deep learning methods, we have curated 21 datasets from nine benchmarks to evaluate the performance of 58 computational methods. Our analysis reveals that model performance can vary significantly across different benchmark datasets and evaluation metrics, providing a useful perspective for selecting the most appropriate approach based on a specific application scenario. We highlight three key areas for future development, offering valuable insights into how deep learning can be effectively applied to transcriptomic data analysis in biological, medical, and clinical settings.
    Keywords:  deep learning; single-cell; spatial transcriptomics
    DOI:  https://doi.org/10.1093/bib/bbaf136
  21. bioRxiv. 2025 Mar 28. pii: 2025.03.25.645278. [Epub ahead of print]
      Biological systems are composed of diverse, interconnected cell types, yet capturing both their functional dynamics and molecular identities at high spatiotemporal resolution remains challenging. While electrophysiological measurements provide real-time insights into cellular activities, they cannot fully describe the molecular architecture and states of the measured cells. Conversely, transcriptomics reveals cell gene expression patterns but does not capture functional states. Bridging these modalities is essential for a holistic understanding of the molecular mechanisms driving functional changes. In this study, we introduce in situ graphene-sequencing (graphene-seq), a unique platform that seamlessly integrates chronic electrophysiology with imaging-based, spatially resolved 3D transcriptomics, overcoming longstanding limitations of current multimodal approaches. This system leverages stretchable mesh nanoelectronics for long-term, single-cell-level interfacing and incorporates transparent graphene/PEDOT:PSS electrodes, enabling seamless integration of electrical recordings and optical imaging. By combining electrophysiology with high-throughput, imaging-based in situ sequencing, this platform allows comprehensive multimodal, spatially resolved analysis of cell microenvironment within spatially heterogeneous tissues. We validate in situ graphene-seq by charting multimodal profiles of human-induced pluripotent stem cell-derived cardiomyocyte and endothelial cell co-cultures, examining how spatial heterogeneity in cell composition influences both electrophysiological activity and gene expression. This scalable, integrated approach offers a powerful tool for studying the complex interplay between cellular function and molecular identity. It also provides insights into how tissue microenvironments shape cell behavior and molecular states, advancing applications in regenerative medicine, stem cell therapy, and disease modeling.
    DOI:  https://doi.org/10.1101/2025.03.25.645278
  22. JCI Insight. 2025 Apr 08. pii: e187072. [Epub ahead of print]
      The inflammatory response after myocardial infarction (MI) is a precisely regulated process that greatly affects subsequent wound healing and remodeling. However, understanding of this process is still limited. Macrophages are critically involved in inflammation resolution after MI. Krüppel-like factor 9 (Klf9) is a C2H2 zinc finger-containing transcription factor that has been implicated in glucocorticoid regulation of macrophages. However, the contribution of Klf9 to macrophage phenotype and function in the context of MI remains unclear. Our study revealed that Klf9 deficiency results in higher mortality and cardiac rupture rate, as well as a considerable deterioration in cardiac function. Single-cell RNA sequencing and flow cytometry analyses reveal that, compared to WT mice, Klf9-/- mice display excessive neutrophil infiltration, insufficient macrophage infiltration, and a reduced proportion of monocyte-derived CD206+ macrophages post-MI. Moreover, the expression of IFN-γ-STAT1 pathway genes in Klf9-/- cardiac macrophages is dysregulated, characterized by insufficient expression at day 1 post-MI and excessive expression at day 3 post-MI. Mechanistically, Klf9 directly binds to the promoter of the Stat1 gene, regulating its transcription. Overall, these findings indicate that Klf9 beneficially influences wound healing after MI by modulating macrophage recruitment and differentiation through regulation of the IFN-γ-STAT1 signaling pathway.
    Keywords:  Cardiology; Cardiovascular disease; Fibrosis; Inflammation; Macrophages
    DOI:  https://doi.org/10.1172/jci.insight.187072
  23. EMBO J. 2025 Apr 04.
      The mammary epithelium derives from multipotent mammary stem cells (MaSCs) that engage into differentiation during embryonic development. However, adult MaSCs maintain the ability to reactivate multipotency in non-physiological contexts. We previously reported that Notch1 activation in committed basal cells triggers a basal-to-luminal cell fate switch in the mouse mammary gland. Here, we report conservation of this mechanism and found that in addition to the mammary gland, constitutive Notch1 signaling induces a basal-to-luminal cell fate switch in adult cells of the lacrimal gland, the salivary gland, and the prostate. Since the lineage transition is progressive in time, we performed single-cell transcriptomic analysis on index-sorted mammary cells at different stages of lineage conversion, generating a temporal map of changes in cell identity. Combining single-cell analyses with organoid assays, we demonstrate that cell proliferation is indispensable for this lineage conversion. We also reveal the individual transcriptional landscapes underlying the cellular plasticity switching of committed mammary cells in vivo with spatial and temporal resolution. Given the roles of Notch signaling in cancer, these results may help to better understand the mechanisms that drive cellular transformation.
    Keywords:  Epithelial Stem Cells; Lineage Conversion; Notch1 Signaling; Plasticity
    DOI:  https://doi.org/10.1038/s44318-025-00424-1
  24. Cancer Cell. 2025 Apr 07. pii: S1535-6108(25)00126-6. [Epub ahead of print]
      Brain metastases (BrMs) remain a major clinical and therapeutic challenge in patients with metastatic cancers. However, advances in our understanding of BrM have been hampered by the constrained sample size and resolution of BrM profiling studies. Here, we perform integrative single-cell RNA sequencing analysis on 108 BrM samples and 111 primary tumor (PT) samples to investigate the characteristics and remodeling of cell states and composition across cancer lineages and subsets. Recurring and enriched features of malignant cells are increased chromosomal instability, marked proliferative and angiogenic hallmarks, and adoption of a neural-like BrM-associated metaprogram. Immunosuppressive myeloid and stromal subsets dominate the BrM tumor microenvironment and are associated with poor prognosis and resistance to immunotherapy. Furthermore, five distinct BrM ecotypes are identified, correlating with specific histopathological patterns and clinical characteristics. This work defines hallmarks of BrM biology across cancer types and suggests that shared dependencies may exist, which may be exploited clinically.
    Keywords:  brain metastases; central nervous system; chromosomal instability; ecotype; hallmarks of cancer; metastatic tumor cell; neuronal-like cell state; pan-cancer; single-cell RNA sequencing; tumor microenvironment
    DOI:  https://doi.org/10.1016/j.ccell.2025.03.025
  25. bioRxiv. 2025 Mar 30. pii: 2025.03.26.645511. [Epub ahead of print]
      Single-cell analysis has refined our understanding of cellular heterogeneity in glioma, yet RNA alternative splicing (AS), a critical layer of transcriptome regulation, remains underexplored at single-cell resolution. Here, we present a pan-glioma single-cell AS analysis in both tumor and immune cells through integrating seven SMART-seq2 datasets of human gliomas. Our analysis reveals lineage-specific AS across glioma cellular states, with the most divergent AS landscapes between mesenchymal- and neuronal-like glioma cells, exemplified by AS in TCF12 and PTBP2. Comparison between core and peripheral glioma cells highlights AS-redox co-regulation of cytoskeleton organization. Further analysis of glioma-infiltrating immune cells reveals potential isoform-level regulation of protein glycosylation in regulatory T cells and a link between MS4A7 AS in macrophages and clinical response to anti-PD-1 therapy. This study emphasizes the role of AS in glioma cellular heterogeneity, highlighting the importance of an isoform-centric approach to better understand the complex biological processes driving tumorigenesis.
    DOI:  https://doi.org/10.1101/2025.03.26.645511
  26. Front Oncol. 2025;15: 1553722
       Background: Glucose metabolism reprogramming provides significant insights into the development and progression of malignant tumors. This study aims to explore the temporal-spatial evolution of glucose metabolism in HCC using single-cell sequencing and spatial transcriptomics (ST), and to validate G6PD as a potential therapeutic target for HCC.
    Methods: We collected single-cell sequencing data from 7 HCC and adjacent non-cancerous tissues from the GSE149614 database, and ST data from 4 HCC tissues from the HRA000437 database. Pseudotime analysis was performed on the single-cell data, while ST data was used to analyze spatial metabolic activity. High-throughput sequencing and experiments, including wound healing, CCK-8, and transwell assays, were conducted to validate the role and regulatory mechanisms of G6PD in HCC.
    Results: Our study identified a progressive upregulation of PPP-related genes during tumorigenesis. ST analysis revealed elevated PPP metabolic scores in the central and intermediate tumor regions compared to the peripheral zones. High-throughput sequencing and experimental validation further suggested that G6PD-mediated regulation of HCC cell proliferation, migration, and invasion is likely associated with glutathione metabolism and ROS production. Finally, Cox regression analysis confirmed G6PD as an independent prognostic factor for overall survival in HCC patients.
    Conclusion: Our study provides novel insights into the changes in glucose metabolism in HCC from both temporal and spatial perspectives. We experimentally demonstrated that G6PD regulates proliferation, migration, and invasion in HCC and propose G6PD as a prognostic marker and therapeutic metabolic target for HCC.
    Keywords:  bioinformatics; carbohydrate metabolism; hepatocellular carcinoma; metabolic reprogramming; prognostic biomarker
    DOI:  https://doi.org/10.3389/fonc.2025.1553722
  27. bioRxiv. 2025 Mar 28. pii: 2025.03.24.644582. [Epub ahead of print]
       Background: Approximately 15-20% of head and neck cancer squamous cell carcinoma (HNSCC) patients respond favorably to immune checkpoint blockade (ICB). Previous single-cell RNA-Seq (scRNA-Seq) studies identified immune features, including macrophage subset ratios and T-cell subtypes, in HNSCC ICB response. However, the spatial features of HNSCC-infiltrated immune cells in response to ICB treatment need to be better characterized.
    Methods: Here, we perform a systematic evaluation of cell interactions between immune cell types within the tumor microenvironment in spatial omics data, using complementary techniques from both 10X Visium spot-based spatial transcriptomics and Nanostring CosMx single-cell spatial omics with an RNA gene panel including 435 ligands and receptors. In this study, we used integrated bioinformatics analyses to identify cellular neighborhoods of co-localizing cell types in single-cell spatial transcriptomics and proteomics data. In addition, we used both publicly available scRNA-Seq and in-house spatial RNA-Seq data to identify spatially constrained Ligand-Receptor interactions in Responder patients.
    Results: With 522,399 single cells profiled with both RNA and protein from 26 patients, in addition to spot-resolved spatial RNA-Seq from 8 patients treated with ICB, together with bioinformatics analysis of publicly available single-cell and bulk RNA-Seq, we have identified spatial and cell-type-specific context-dependent differences in myeloid and T cell interactions between Responders and Non-Responders. We further defined the cellular neighborhoods and the sources of chemokine CXCL9/10-CXCR3 interactions in Responders, which are emerging targets in ICB, as well as CXCL16-CXCR6, CCL4/5-CCR5, and other underappreciated potential markers and targets for ICB response in HNSCC. In addition, we have contributed a rich data resource of cell-cell Ligand-Receptor interactions for the immunotherapy and HNSCC research community.
    Discussion: Our work provides a comprehensive single-cell and spatial atlas of immune cell interactions that correlate with response to ICB in HNSCC. We showcase how integrating multiple technologies and bioinformatics approaches can provide new insights into potential immune-based biomarkers of ICB response. Our results suggested refining future studies using preclinical animal models in a more context-specific manner to elucidate potential underlying mechanisms that lead to improved ICB responses.
    What is already known on this topic: Most cancer patients still do not experience clinical benefits from immune checkpoint blockade (ICB), necessitating the development of response biomarkers and new immunotherapeutic targets.
    What this study adds: Here, we use integrated high-dimensional omics and bioinformatics approaches to identify immune cell-cell interaction markers associated with ICB response in patients with Head and neck squamous cell carcinoma.
    How this study might affect research practice or policy: We identified spatial and cell-type specificity of Ligand-Receptor interactions between myeloid and T cells in ICB Responder patients that may help inform further mechanistic studies and biomarker development.
    DOI:  https://doi.org/10.1101/2025.03.24.644582
  28. Blood Sci. 2025 Jun;7(2): e00226
      The combined analysis of dual diseases can provide new insights into pathogenic mechanisms, identify novel biomarkers, and support the development of targeted therapeutic strategies. Polycythemia vera (PV) is a chronic myeloproliferative neoplasm associated with a risk of acute myeloid leukemia (AML) transformation. However, the chronic nature of disease transformation complicates longitudinal high-throughput sequencing studies of patients with PV before and after AML transformation. This study aimed to develop a diagnostic model for malignant transformation of chronic proliferative diseases, addressing the challenges of early detection and intervention. Integrated public datasets of PV and AML were analyzed to identify differentially expressed genes (DEGs) and construct a weighted correlation network. Machine-learning algorithms were used to screen genes for potential biomarkers, leading to the development of diagnostic models. Clinical specimens were collected to validate gene expression. cMAP and molecular docking were used to predict potential drugs. In vitro experiments were performed to assess drug efficacy in PV and AML cells. CIBERSORT and single-cell RNA-sequencing (scRNA-seq) analyses were used to explore the impact of hub genes on the tumor microenvironment (TME). We identified 24 genes shared between PV and AML, which were enriched in immune-related pathways. Lactoferrin (LTF) and G protein-coupled receptor 65 (GPR65) were integrated into a nomogram with robust predictive power. The predicted drug vemurafenib inhibited proliferation and increased apoptosis in PV and AML cells. TME analysis linked these biomarkers to macrophages. Clinical samples were used to confirm LTF and GPR65 expression levels. We identified shared genes between PV and AML and developed a diagnostic nomogram that offers a novel avenue for the diagnosis and clinical management of AML-related PV.
    Keywords:  Acute myeloid leukemia; Bioinformatics analysis; Biomarker; Hub genes; Machine learning; Polycythemia vera
    DOI:  https://doi.org/10.1097/BS9.0000000000000226
  29. bioRxiv. 2025 Mar 30. pii: 2025.03.24.645122. [Epub ahead of print]
      Neurons in the dorsal root ganglion (DRG) receive and transmit sensory information from the tissues they innervate and from the external environment. Upper cervical (C1-C2) DRGs are functionally unique as they receive input from the neck, head, and occipital cranial dura, the latter two of which are also innervated by the trigeminal ganglion (TG). The C2 DRG also plays an important role in neck pain, a common and disabling disorder that is poorly understood. Advanced transcriptomic approaches have significantly improved our ability to characterize RNA expression patterns at single-cell resolution in the DRG and TG, but no previous studies have characterized the C2 DRG. Our aim was to use single-nucleus and spatial transcriptomic approaches to create a molecular map of C2 DRGs from patients undergoing arthrodesis surgery with ganglionectomy. Patients with acute (<3 months) or chronic (≥3 months) neck pain were enrolled and completed patient-reported outcomes and quantitative sensory testing prior to surgery. C2 DRGs were characterized with bulk, single nucleus, and spatial RNA sequencing technologies from 22 patients. Through a comparative analysis to published datasets of the lumbar DRG and TG, neuronal clusters identified in both TG and DRG were identified in the C2 DRG. Therefore, our study definitively characterizes the molecular composition of human C2 neurons and establishes their similarity with unique characteristics of subsets of TG neurons. We identified differentially expressed genes in endothelial, fibroblast and myelinating Schwann cells associated with chronic pain, including FGFBP2, C8orf34 and EFNA1 which have been identified in previous genome and transcriptome wide association studies (GWAS/TWAS). Our work establishes an atlas of the human C2 DRG and identifies altered gene expression patterns associated with chronic neck pain. This work establishes a foundation for the exploration of painful disorders in humans affecting the cervical spine.
    DOI:  https://doi.org/10.1101/2025.03.24.645122
  30. medRxiv. 2025 Mar 28. pii: 2025.03.27.25324777. [Epub ahead of print]
      Diverticular disease is a common and morbid complex phenotype influenced by both innate and environmental risk factors. We performed the largest genome-wide association study meta-analysis for diverticular disease, identifying 126 novel loci. Employing multiple downstream analytic strategies, including tissue and pathway enrichment, statistical fine-mapping, allele-specific expression, protein quantitative trait loci and drug-target investigations, and linkage disequilibrium score regression, we prioritized causal genes and produced several lines of evidence linking diverticular disease to connective tissue biology and colonic motility. We substantiated these findings by integrating single-cell RNA sequencing data, showing that prioritized diverticular disease-associated genes are enriched for expression in colonic smooth muscle, fibroblasts, and interstitial cells of Cajal. In quantitative analysis of surgical specimens, we found a substantial reduction in the density of elastin present in the sigmoid colon in severe diverticulitis.
    DOI:  https://doi.org/10.1101/2025.03.27.25324777