bims-rednas Biomed News
on Repetitive DNA sequences
Issue of 2025–04–20
ten papers selected by
Anna Zawada, International Centre for Translational Eye Research



  1. Int J Mol Sci. 2025 Mar 21. pii: 2850. [Epub ahead of print]26(7):
      Inherited neurological disorders, such as spinocerebellar ataxia (SCA) and fragile X (FraX), are frequently caused by short tandem repeat (STR) expansions. The detection and assessment of STRs is important for diagnostics and prognosis. We tested the abilities of nanopore long-read sequencing (LRS) using a custom panel including the nine most common SCA-related genes and FraX, and created a raw-data-to-report workflow. Using known STR lengths for 23 loci in 12 patients, a pipeline was validated to detect and report STR lengths. In addition, we assessed the capability to detect SNVs, indels, and the methylation status in the same test. Of the 23 loci, 22 were concordant with known STR lengths, while for the remaining locus, one of three replicates differed, indicating an artefact. All positive control STRs were detected as likely pathogenic, with no additional findings after a visual assessment of repeat motifs. Out of 226 SNV and indel variants, two were false positives and one was a false negative (accuracy 98.7%). In all FMR1 controls, a methylation status could be determined. In conclusion, LRS is suitable as a diagnostic workflow for STR analysis in neurological disorders and can be generalized to other diseases. The addition of SNV/indel and methylation detection promises to allow for a one-test-fits-all workflow.
    Keywords:  Oxford Nanopore Technologies; SNV; fragile X syndrome; indel; long read sequencing; methylation; neurological disorders; short tandem repeat; spinocerebellar ataxia
    DOI:  https://doi.org/10.3390/ijms26072850
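    The accuracy quoted in the abstract above can be reproduced from the reported counts. This is only a sketch: the abstract does not define the metric, so the function below assumes accuracy means the share of the 226 variants that were neither false positive nor false negative. The function name is hypothetical.

```python
def accuracy(total_variants: int, false_positives: int, false_negatives: int) -> float:
    """Share of variants that were called correctly (assumed definition)."""
    if total_variants <= 0:
        raise ValueError("total_variants must be positive")
    errors = false_positives + false_negatives
    return (total_variants - errors) / total_variants


# Counts taken from the abstract: 226 variants, 2 false positives, 1 false negative.
snv_indel_accuracy = accuracy(226, 2, 1)
print(f"{snv_indel_accuracy:.1%}")  # prints 98.7%
```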
  2. Brain. 2025 Apr 16. pii: awaf134. [Epub ahead of print]
      Partial phenotypic overlap has been suggested between multiple system atrophy (MSA) and spinocerebellar ataxia 27B, the autosomal dominant ataxia caused by an intronic GAA•TTC repeat expansion in FGF14. This study investigated the frequency of FGF14 GAA•TTC repeat expansion in clinically diagnosed and pathologically confirmed multiple system atrophy cases. We screened 657 multiple system atrophy cases (193 clinically diagnosed and 464 pathologically confirmed) and 1,003 controls. The FGF14 repeat locus was genotyped using long-range PCR and bidirectional repeat-primed PCRs, and expansions were confirmed with targeted long-read Oxford Nanopore Technologies sequencing. We identified 19 multiple system atrophy cases carrying an FGF14 GAA≥250 expansion (2.89%, n=19/657), a significantly higher frequency than in controls (1.40%, n=12/1,003) (p=0.04). Long-read Oxford Nanopore Technologies sequencing confirmed repeat sizes and polymorphisms detected by PCR, with high concordance (Pearson's r=0.99, p<0.0001). Seven multiple system atrophy patients had a pathogenic FGF14 GAA≥300 expansion (five pathologically confirmed and two clinically diagnosed) and 12 had an intermediate GAA250-299 expansion (six pathologically confirmed and six clinically diagnosed). A similar proportion of cerebellar-predominant and parkinsonism-predominant multiple system atrophy cases had FGF14 expansions. Multiple system atrophy patients carrying an FGF14 GAA≥250 expansion exhibited severe gait ataxia, autonomic dysfunction and parkinsonism in keeping with an MSA phenotype, with a faster progression to falls (p=0.03) and regular wheelchair use (p=0.02) compared to the multiple system atrophy cases without FGF14 GAA expansion. The length of the GAA•TTC repeat expansion correlated inversely with survival in multiple system atrophy patients (r = -0.67; p=0.02), but not with age of onset. Therefore, screening for FGF14 GAA•TTC repeat expansion should be considered for multiple system atrophy patients with rapid loss of mobility and for complete diagnostic accuracy at inclusion in disease-modifying multiple system atrophy drug trials.
    Keywords:  FGF14 GAA ataxia; MSA; SCA27B; multiple system atrophy; spinocerebellar ataxia 27B
    DOI:  https://doi.org/10.1093/brain/awaf134
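    The size categories used in this abstract (pathogenic at 300 or more GAA repeats, intermediate at 250 to 299) amount to a simple threshold rule. The sketch below encodes only those two cut-offs; the function name and the label for shorter alleles are hypothetical, and the code is not the authors' genotyping pipeline.

```python
def classify_fgf14_gaa(repeat_count: int) -> str:
    """Bin an FGF14 GAA repeat count using the cut-offs quoted in the abstract."""
    if repeat_count < 0:
        raise ValueError("repeat_count must be non-negative")
    if repeat_count >= 300:
        return "pathogenic"
    if repeat_count >= 250:
        return "intermediate"
    return "below 250"  # label chosen for illustration only


# Carrier frequency among MSA cases, from the counts in the abstract.
msa_carrier_frequency = 19 / 657
print(f"{msa_carrier_frequency:.2%}")  # prints 2.89%
print(classify_fgf14_gaa(275))         # prints intermediate
```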
  3. Nucleic Acids Res. 2025 Apr 10. pii: gkaf298. [Epub ahead of print]53(7):
      Non-canonical (non-B) DNA structures (e.g. bent DNA, hairpins, G-quadruplexes (G4s), and Z-DNA), which form at certain sequence motifs (e.g. A-phased repeats and inverted repeats), have emerged as important regulators of cellular processes and drivers of genome evolution. Yet, they have been understudied due to their repetitive nature and potentially inaccurate sequences generated with short-read technologies. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched at the genomic regions added to T2T assemblies and occupy 9%-15%, 9%-11%, and 12%-38% of autosomes and chromosomes X and Y, respectively. G4s and Z-DNA are enriched at promoters and enhancers, as well as at origins of replication. Repetitive sequences harbor more non-B DNA motifs than non-repetitive sequences, especially in the short arms of acrocentric chromosomes. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role of non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest their novel functions in previously inaccessible genomic regions.
    DOI:  https://doi.org/10.1093/nar/gkaf298
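    To illustrate what a sequence-motif prediction of a non-B structure can look like, the sketch below scans for the commonly used G-quadruplex consensus of four runs of at least three guanines separated by loops of 1 to 7 bases. The abstract does not say which annotation software or motif definitions the authors used, so the pattern, the function name, and the forward-strand-only scan are all assumptions made for illustration.

```python
import re

# Widely used G4 consensus: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (an assumption here,
# not necessarily the definition used in the study).
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(sequence: str) -> list:
    """Return (start, end, motif) for non-overlapping G4 motifs on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


print(find_g4_motifs("TTGGGAGGGTGGGAGGGTT"))
# prints [(2, 17, 'GGGAGGGTGGGAGGG')]
```

    Coordinates are 0-based with an exclusive end. A C-rich motif on the opposite strand would need the same scan on the reverse complement.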
  4. bioRxiv. 2025 Apr 03. pii: 2025.03.29.646092. [Epub ahead of print]
      Transposable elements (TEs) are vital components of eukaryotic genomes and have played a critical role in genome evolution. Although most TEs are silenced in the mammalian genome, increasing evidence suggests that certain TEs are actively involved in gene regulation during early developmental stages. However, the extent to which human TEs drive gene transcription in adult tissues remains largely unexplored. In this study, we systematically analyzed 17,329 human transcriptomes to investigate how TEs influence gene transcription across 47 adult tissues. Our findings reveal that TE-derived transcripts are broadly expressed in human tissues, contributing to both housekeeping functions and tissue-specific gene regulation. We identified sex-specific expression of TE-derived transcripts regulated by sex hormones in breast tissue between females and males. Our results demonstrated that TE-derived alternative transcription initiation significantly enhances the variety of translated protein products, e.g., changes in the N-terminal peptide length of WNT2B caused by TE-derived transcription result in isoform-specific subcellular localization. Additionally, we identified 68 human-specific TE-derived transcripts associated with metabolic processes and environmental adaptation. Together, these findings highlight the pivotal evolutionary role of TEs in shaping the human transcriptome, demonstrating how conserved and human-specific TEs contribute to transcriptional and translational innovation in human genome evolution.
    DOI:  https://doi.org/10.1101/2025.03.29.646092
  5. Cold Spring Harb Perspect Biol. 2025 Apr 15. pii: a041697. [Epub ahead of print]
      Telomerase ribonucleoprotein (RNP) plays a crucial role in maintaining telomere length by processively adding telomeric repeats to the 3' ends of chromosomes. Telomerase activation is linked to cancer, while mutations that compromise telomerase function result in diseases such as dyskeratosis congenita. The synthesis of telomeric repeats necessitates two core telomerase components: telomerase reverse transcriptase (TERT) and telomerase RNA (TER). However, cellular telomerase holoenzymes encompass a diverse range of protein factors, both constitutively and transiently interacting. These factors are integral to telomerase assembly or regulation at telomeres. This review emphasizes recent advancements in structural studies of telomerase holoenzymes and their associated factors from Tetrahymena thermophila, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and humans. These studies have significantly deepened our molecular understanding not only of the mechanism underlying telomeric repeat synthesis but also of the biological roles of telomerase-associated proteins.
    DOI:  https://doi.org/10.1101/cshperspect.a041697
  6. Clin Exp Nephrol. 2025 Apr 17.
       BACKGROUND: Autosomal-dominant tubulointerstitial kidney disease caused by MUC1 (ADTKD-MUC1) is a rare disorder characterized by progressive kidney dysfunction. Pathogenic variants in MUC1 are difficult to detect owing to the variable number tandem repeat region. To address this issue, VNtyper-Kestrel, a bioinformatics pipeline for short-read sequencing data, was recently developed. In this study, the performance of VNtyper-Kestrel for detecting MUC1 variants in clinical settings was evaluated.
    METHODS: We used VNtyper-Kestrel to retrospectively analyze short-read sequencing data for 209 individuals with suspected ADTKD who were previously evaluated through long-read sequencing. Data from a panel including ~180 genes and an ADTKD-specific panel were used. In addition, the pipeline was applied to 976 patients with suspected hereditary kidney diseases other than ADTKD and positive cases were validated using long-read sequencing. Accuracy was assessed by comparisons with the results of long-read sequencing.
    RESULTS: Using VNtyper-Kestrel, we identified MUC1 variants in 16 of 19 confirmed cases of ADTKD-MUC1. Three initially negative cases were reanalyzed using the ADTKD-specific panel, yielding positive detection results with high confidence. We obtained two low-confidence positive results from 190 cases of suspected ADTKD and 10 low-confidence positive results among 976 non-ADTKD cases; however, all were classified as false positives upon long-read sequencing validation.
    CONCLUSIONS: VNtyper-Kestrel demonstrated high sensitivity in identifying MUC1 variants when sequencing coverage was adequate, supporting its potential as a rapid and cost-effective screening tool. However, confirmatory long-read sequencing is needed in uncertain cases. Optimizing coverage and refining patient selection criteria could improve the clinical utility of this approach.
    Keywords:  ADTKD; MUC1; VNtyper; Variable number tandem repeat
    DOI:  https://doi.org/10.1007/s10157-025-02675-y
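    The counts in the RESULTS paragraph translate into a first-pass sensitivity and into false-positive fractions for the two cohorts. The sketch below is a reading of those counts, not a calculation reported by the authors: it assumes the 190 suspected-ADTKD cases and the 976 non-ADTKD cases are all true negatives for MUC1 variants, and the helper name is hypothetical.

```python
def fraction(count: int, total: int) -> float:
    """Plain proportion with a guard against an empty denominator."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


# 16 of 19 confirmed ADTKD-MUC1 cases were detected before reanalysis.
first_pass_sensitivity = fraction(16, 19)

# Low-confidence calls later classified as false positives by long-read sequencing.
false_positive_suspected_adtkd = fraction(2, 190)
false_positive_non_adtkd = fraction(10, 976)

print(f"first-pass sensitivity: {first_pass_sensitivity:.1%}")
print(f"false positives, suspected ADTKD: {false_positive_suspected_adtkd:.2%}")
print(f"false positives, non-ADTKD: {false_positive_non_adtkd:.2%}")
```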
  7. Commun Biol. 2025 Apr 16. 8(1): 616
      Catfish represent a diverse lineage with a variable number of chromosomes and complex relationships with humans. Although certain species pose significant invasive threats to native fish populations, comprehensive genomic investigations into the evolutionary adaptations that contribute to their invasion success are lacking. To address this gap, our study presents a high-quality genome assembly of the Amazon sailfin catfish (Pterygoplichthys pardalis), a member of the armored catfish family, along with a comprehensive comparative genomic analysis. By utilizing conserved genomic regions across different catfish species, we reconstructed the 29 ancestral chromosomes of catfish, including two microchromosomes (28 and 29) that show different fusion and breakage patterns across species. Our analysis shows that the Amazon sailfin catfish genome is notably larger (1.58 Gb) with more than 40,000 coding genes. The genome expansion was linked to early repetitive sequence expansions and recent gene duplications. Several expanded genes are associated with immune functions, including antigen recognition domains like the Ig-v-set domain and the tandem expansion of the CD300 gene family. We also identified specific insertions in CNEs (conserved non-coding elements) near genes involved in cellular processes and neural development. Additionally, rapidly evolving and positively selected genes in the Amazon sailfin catfish genome were found to be associated with collagen formation. Moreover, we identified multiple positively selected codons in hoxb9, which may lead to functional alterations. These findings provide insights into molecular adaptations in an invasive catfish that may underlie its widespread invasion success.
    DOI:  https://doi.org/10.1038/s42003-025-08029-4
  8. BMC Genom Data. 2025 Apr 18. 26(1): 30
       BACKGROUND: The comprehensive annotation of repeated sequences in genomes is an essential prerequisite for studying the dynamics of these sequences over time and their involvement in gene regulation. Currently, the diversity of repeated sequences in Citrus genomes is only partially characterized because the annotations have been performed using heterogeneous bioinformatics tools, each with its specificity and dedicated only to the annotation of transposable elements.
    RESULTS: We combined complementary repeat-finding programs including REPET, CAULIFINDER, and TAREAN, to enable the identification of all types of repetitive sequences found in plant genomes, including transposable elements, endogenous caulimovirids, and satellite DNAs. A fine-grained annotation method was first developed to create a consensus sequence library of repeated sequences identified in the genome assemblies of C. medica, C. micrantha, C. reticulata, and C. maxima, the four ancestral parental species involved in the formation of economically valuable cultivated Citrus varieties. A second, faster annotation method was developed to enrich the dataset by adding new repeated sequences retrieved from genome assemblies of other Citrus species and closely related species belonging to the Aurantioideae subfamily. The final reference library contains 3,091 consensus sequences, of which 94.5% are transposable elements. The diversity of endogenous caulimovirids was characterized for the first time within the genus Citrus, contributing 160 consensus sequences to the final reference library. Finally, 10 satellite DNAs were also identified.
    CONCLUSION: Combining multiple repeat detection methods enables the comprehensive annotation of all repeated sequences in Citrus genomes. Using the final reference library reported in this work will improve our understanding of the dynamics of repeated sequences during Citrus speciation, particularly following the genome duplication and hybridization events that led to modern cultivars. The exploration of repeat position insertions along chromosomes using the developed web interface, RepeatLoc Citrus, will also make it possible to further investigate the role of transposable elements and endogenous caulimovirids in genome structure and gene regulation in Citrus species.
    Keywords:  Citrus; Endogenous caulimovirids; Genome annotation; Satellite DNAs; Transposable elements
    DOI:  https://doi.org/10.1186/s12863-025-01321-6
  9. Front Plant Sci. 2025; 16: 1573967
      Modern sugarcane cultivars, derived from interspecific hybridization between S. officinarum and S. spontaneum, have complex genetic backgrounds, and the lack of SSR markers for them limits the genetic improvement of sugarcane. In this study, we searched for and identified SSR loci within the genomes of 14 Poaceae plants. Notably, a significant positive correlation (r = 0.958) was detected between genome size and the number of SSRs. We identified SSR loci in the whole genome of XTT22, a modern sugarcane cultivar. A total of 1,054,918 SSR loci were identified, with a frequency of 123 loci/Mb and an average of 1 SSR locus per 8.11 kb, with Chr1 having the highest content and frequency of SSR loci. Among different repeat types, mononucleotide repeats (620,901) and dinucleotide repeats (238,261) were the most numerous, accounting for 81.45% of the total number of SSR loci, and the number of SSRs decreased as the repeat motif length increased. Based on the above SSR loci, 910,519 primer pairs were obtained, and 459 polymorphic SSR markers were screened. The polymorphism rate of SSR markers among different SSR repeat types ranged from 81.97% to 97.90%, and the pentanucleotide repeat type had the highest number of SSR markers. To test the universality of the developed SSR markers in sugarcane and its related species, 24 polymorphic SSR markers were randomly selected for verification in 33 sugarcane and related species, and they amplified 134 alleles in total. Each pair of primers amplified 1-11 alleles, with an average of 5.58 alleles per pair. This study is the first to systematically develop SSR molecular markers for modern sugarcane cultivars at the genome-wide level, which not only enriches the number of existing SSR markers of modern sugarcane cultivars, but also provides important molecular markers to support the molecular marker-assisted breeding of sugarcane.
    Keywords:  SSR markers; genome-wide; modern sugarcane cultivars; molecular breeding; polymorphism
    DOI:  https://doi.org/10.3389/fpls.2025.1573967
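    Genome-wide SSR identification of the kind described above is often done by scanning for perfect tandem repeats of 1 to 6 nt motifs. The sketch below shows one regex-based way to do that. The abstract does not name the search tool or its thresholds, so the minimum repeat counts, the function names, and the filtering rule are assumptions for illustration only.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 nt); not from the source.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_redundant(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    n = len(motif)
    return any(motif == motif[:k] * (n // k) for k in range(1, n) if n % k == 0)


def find_ssrs(sequence: str, min_repeats: dict = MIN_REPEATS) -> list:
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for motif_len, min_rep in min_repeats.items():
        # A captured motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_redundant(motif):
                continue  # already reported under the shorter motif
            hits.append((m.start(), motif, len(m.group()) // motif_len))
    return sorted(hits)


def loci_per_mb(n_loci: int, sequence_length_bp: int) -> float:
    """SSR density expressed as loci per megabase."""
    return n_loci / (sequence_length_bp / 1_000_000)


example = "CG" + "AT" * 7 + "CG" + "A" * 12 + "CG"
print(find_ssrs(example))  # prints [(2, 'AT', 7), (18, 'A', 12)]
```

    Start positions are 0-based. Compound or imperfect SSRs, which dedicated tools usually handle, are outside the scope of this sketch.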
  10. Genome Res. 2025 Apr 14. 35(4): 593-598
      Long-read sequencing technologies, particularly those from Pacific Biosciences and Oxford Nanopore Technologies, are revolutionizing genome research by providing high-resolution insights into complex and repetitive regions of the human genome that were previously inaccessible. These advances have been particularly enabling for the comprehensive detection of genomic structural variants (SVs), which is critical for linking genotype to phenotype in population-scale and rare disease studies, as well as in cancer. Recent developments in sequencing throughput and computational methods, such as pangenome graphs and haplotype-resolved assemblies, are paving the way for the future inclusion of long-read sequencing in clinical cohort studies and disease diagnostics. DNA methylation signals directly obtained from long reads enhance the utility of single-molecule long-read sequencing technologies by enabling molecular phenotypes to be interpreted, and by allowing the identification of the parent of origin of de novo mutations. Despite this recent progress, challenges remain in scaling long-read technologies to large populations due to cost, computational complexity, and the lack of tools to facilitate the efficient interpretation of SVs in graphs. This perspective provides a succinct review on the current state of long-read sequencing in genomics by highlighting its transformative potential and key hurdles, and emphasizing future opportunities for advancing the understanding of human genetic diversity and diseases through population-scale long-read analysis.
    DOI:  https://doi.org/10.1101/gr.280120.124