bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2025–09–28
three papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. Eur J Hum Genet. 2025 Sep 26.
      Autosomal Dominant Polycystic Kidney Disease (ADPKD), caused by pathogenic variants in PKD1 and PKD2, is the most common monogenic cause of kidney failure. Approximately 10% of ADPKD patients remain undiagnosed after coding-region-focused genomic testing. Non-coding variants in regulatory regions are not an established cause of disease in ADPKD. We performed regulatory region analysis in a primary cohort of undiagnosed ADPKD patients (n = 20) and then extended this analysis to patients with undiagnosed cystic kidney disease within the Australian KidGen cohort (n = 42) and the Genomics England rare disease cohort (n = 1320). Through this genomic analysis, we identified two rare, potentially disease-causing variants in the PKD1 5' untranslated region (UTR). We then designed a PKD1 5'UTR-luciferase translation assay to characterise these variants in vitro, which showed that a PKD1 variant, c.-69dupG, reduced the translation efficiency of the main PKD1 open reading frame by ~87% compared to wildtype (p < 0.0001). The human PKD1 5'UTR contains two upstream open reading frames (uORFs). Using our model, we knocked out the uORFs of the wildtype PKD1 5'UTR sequence, which increased expression of wildtype polycystin-1 (130%, p < 0.0001). We show that PKD1 5'UTR variants are a currently overlooked rare cause of disease in ADPKD and that analysis of this region should be included in variant analysis pathways to increase diagnostic rates. In addition, we show that manipulation of the wildtype 5'UTR sequence can increase polycystin-1 expression, providing insights into the regulation of PKD1 and suggesting new approaches for therapeutic intervention in this haploinsufficient disease.
    DOI:  https://doi.org/10.1038/s41431-025-01949-z
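For readers less familiar with uORFs, the Python sketch below shows one simple way to list them in a 5'UTR: every ATG that begins inside the UTR is extended codon by codon to the first in-frame stop (TAA, TAG or TGA), and uORFs whose stop lies past the UTR, or that have no stop at all, are flagged as overlapping the main ORF. It also includes a helper that applies an HGVS-style c.-Ndup (c.-1 being the nucleotide immediately 5' of the main ATG), so uORF calls can be compared before and after a one-base duplication. The function names (`find_uorfs`, `apply_5utr_dup`), the toy sequences, and the ATG-only start rule (no near-cognate starts, no Kozak scoring) are assumptions for illustration; this is not the assay or pipeline used in the paper, and the PKD1 sequence is not reproduced here.

```python
# Hypothetical sketch: list upstream ORFs (uORFs) in a 5'UTR.
# Assumptions: only ATG starts are considered (no near-cognate codons, no Kozak
# scoring), and the input is the sense strand; U is converted to T.
from dataclasses import dataclass
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class UORF:
    start: int            # 0-based position of the uATG within utr + cds
    end: Optional[int]    # 0-based position just past the in-frame stop; None if no stop found
    overlaps_cds: bool    # True if the stop lies beyond the UTR or is missing


def find_uorfs(utr: str, cds: str = "") -> list[UORF]:
    """Return every ATG-initiated ORF whose start codon begins inside the UTR."""
    seq = (utr + cds).upper().replace("U", "T")
    uorfs = []
    for i in range(len(utr)):
        if seq[i:i + 3] != "ATG":
            continue
        end = None
        # Walk in frame from the codon after the uATG until the first stop codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                end = j + 3
                break
        uorfs.append(UORF(i, end, end is None or end > len(utr)))
    return uorfs


def apply_5utr_dup(utr: str, n: int) -> str:
    """Duplicate the UTR base at HGVS position c.-n (c.-1 = base just 5' of the main ATG)."""
    if not 1 <= n <= len(utr):
        raise ValueError("c.-n must fall inside the supplied 5'UTR")
    idx = len(utr) - n
    return utr[:idx + 1] + utr[idx] + utr[idx + 1:]


if __name__ == "__main__":
    utr, cds = "GGATGAAATAGCC", "ATGCCC"   # made-up toy sequences, not PKD1
    print(find_uorfs(utr, cds))
    # Toy case: duplicate the A at c.-8 and rescan.
    print(find_uorfs(apply_5utr_dup(utr, 8), cds))
```

In this made-up sequence, duplicating the A at c.-8 shifts the single uORF out of frame with its original stop, so it is then reported as overlapping the main ORF. Whether anything similar applies to c.-69dupG is not stated in the abstract.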
  2. Database (Oxford). 2025 Jan 18. pii: baaf045. [Epub ahead of print]
      In 2018, we analysed the three main repositories for the human proteome: Ensembl/GENCODE, RefSeq, and UniProtKB. At that time the three gene sets disagreed on the coding status of one of every eight annotated coding genes, and our results suggested that as many as 4234 of these genes might not be correctly classified. Here, we have repeated the analysis with updated versions of the three reference gene sets. Superficially, little appears to have changed. The three sets annotate 21 871 coding genes, slightly fewer than previously, and still disagree on the status of 2603 annotated genes, almost one in eight. However, we show that collaborations between the three reference gene sets have led to greater consensus. Reference catalogues have agreed on the coding status of another 249 genes since the last analysis while at least 700 genes have been reclassified. We still find that there are >2000 coding genes with at least one potential non-coding feature to indicate that they may not be coding genes. This includes a large majority of the 2603 genes for which annotators do not agree on coding status. In total, we believe that as many as 3000 genes may be misclassified as coding and could be annotated as non-coding genes, pseudogenes, or cancer antigens.
    DOI:  https://doi.org/10.1093/database/baaf045
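The agreement analysis described in the Database paper comes down to set operations over the genes each catalogue calls coding. The minimal Python sketch below assumes each reference set can be reduced to a set of gene identifiers it annotates as protein coding; the gene IDs are made-up toy data, not real Ensembl/GENCODE, RefSeq or UniProtKB annotations, and the helper names are hypothetical. Treating "disputed" genes as those called coding by at least one catalogue but not by all three is a simplification; the authors' exact definitions may differ. The final line checks the abstract's own arithmetic at face value: 2603 of 21 871 is about 0.119, roughly one gene in 8.4, consistent with "almost one in eight".

```python
# Hypothetical sketch: compare coding-status calls across three reference gene sets.
# The gene IDs below are made-up toy data, NOT real annotations.
from collections import Counter

toy_coding_sets = {
    "Ensembl/GENCODE": {"G1", "G2", "G3", "G5"},
    "RefSeq": {"G1", "G2", "G3", "G4"},
    "UniProtKB": {"G1", "G2", "G4"},
}


def compare_coding_sets(coding_sets: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split genes into those every set calls coding and those only some sets do."""
    called_by_any = set().union(*coding_sets.values())
    called_by_all = set.intersection(*coding_sets.values())
    return {
        "any": called_by_any,
        "all": called_by_all,
        "disputed": called_by_any - called_by_all,
    }


def support_counts(coding_sets: dict[str, set[str]]) -> Counter:
    """Count how many reference sets call each gene coding (1 to len(coding_sets))."""
    counts = Counter()
    for genes in coding_sets.values():
        counts.update(genes)
    return counts


if __name__ == "__main__":
    summary = compare_coding_sets(toy_coding_sets)
    print(sorted(summary["disputed"]))
    print(support_counts(toy_coding_sets))
    # Arithmetic check using the figures quoted in the abstract.
    print(f"{2603 / 21871:.3f} of genes disputed, about 1 in {21871 / 2603:.1f}")
```

The per-gene support counts make it easy to separate genes backed by two of three catalogues from those backed by only one, which is a natural first filter when looking for candidates that may be misclassified as coding.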
  3. bioRxiv. 2025 Sep 15. pii: 2021.07.04.451082. [Epub ahead of print]
      Ribosome profiling is a valuable method for measuring changes in a cell's translational program. The approach can report how efficiently mRNA coding sequences are translated and pinpoint positions along mRNAs where ribosomes slow down or arrest. It can also reveal when translation takes place outside coding regions, often with important regulatory consequences. While many useful software tools have emerged to facilitate analysis of these data, packages can become complex and challenging to adapt to specialized needs. We therefore introduce ribofootPrinter, a suite of Python tools designed to offer an accessible and modifiable set of code for analysis of data from ribosome profiling and related types of small RNA sequencing experiments. Alignments are made to a simplified transcriptome to keep the code intuitive, and multiple normalization options facilitate interpretation of meta-analysis, particularly outside coding regions. We demonstrate how mapping short reads to the transcriptome increases the frequency of matches to multiple sites, and we provide multimapper identifier files to highlight these regions. Overall, the tool can carry out sophisticated analyses while remaining simple enough to be readily understood and adapted.
    DOI:  https://doi.org/10.1101/2021.07.04.451082
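The ribofootPrinter abstract's point that short reads mapped to a transcriptome match multiple sites more often can be illustrated with a toy k-mer index. The Python sketch below is not ribofootPrinter's code and does not reproduce its multimapper identifier file format; it records every position at which each length-k substring occurs across a set of transcript sequences and flags positions whose k-mer occurs more than once. Reads are assumed to be exact, sense-strand matches with no mismatches, and the transcript names, sequences and helper names are invented for illustration. Because a (k+1)-mer repeated at two sites implies a repeated k-mer at the same start positions, the number of flagged positions can never increase as k grows, which is why short footprints are the harder case.

```python
# Hypothetical sketch: flag transcript positions where a read of length k would multimap.
# Assumptions: exact matches only, sense strand only, toy transcript sequences.
from collections import defaultdict


def kmer_sites(transcripts: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map each k-mer to every (transcript, 0-based start) where it occurs."""
    sites = defaultdict(list)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            sites[seq[i:i + k]].append((name, i))
    return sites


def multimapper_positions(transcripts: dict[str, str], k: int) -> dict[str, set[int]]:
    """Return {transcript: start positions whose k-mer occurs at more than one site}."""
    flagged = defaultdict(set)
    for locations in kmer_sites(transcripts, k).values():
        if len(locations) > 1:
            for name, pos in locations:
                flagged[name].add(pos)
    return dict(flagged)


if __name__ == "__main__":
    toy = {"t1": "ACGTAC", "t2": "GTACGT"}   # invented sequences
    for k in (3, 4, 5):
        hits = multimapper_positions(toy, k)
        # Total number of flagged start positions, then the per-transcript detail.
        print(k, sum(len(p) for p in hits.values()), hits)
```

On this toy pair every position is ambiguous at k = 3, half are at k = 4, and none are at k = 5; real transcriptomes are far larger, but the same monotonic trend holds.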