bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2025–01–12
seven papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. Genomics. 2025 Jan 02. pii: S0888-7543(25)00003-5. [Epub ahead of print] 110987
      X-ray irradiation induces widespread changes in gene expression. Positioned at the bottom of the central dogma, translational regulation responds swiftly to environmental stimuli, fine-tuning protein levels. However, the global view of mRNA translation following X-ray exposure remains unclear. In this study, we systematically investigated X-ray-induced translational alteration using ribosome profiling. Our study revealed a temporary translation inhibition in HEK293T cells following X-ray treatment. A subset of mRNAs experienced translational upregulation by bypassing upstream open reading frames (uORFs). The upregulated genes were enriched in the MAPK signaling pathway, such as MAPKBP1. Suppression of MAPKBP1 inhibited X-ray-induced cell apoptosis. Furthermore, we identified the induction of novel peptides encoded by small open reading frames (smORFs) within long non-coding RNAs (lncRNAs) upon X-ray treatment. Overall, our findings provide a comprehensive overview of the translational landscape within eukaryotic cells following X-ray treatment, offering new insights into DNA damage response.
    Keywords:  Cell apoptosis; Ribosome profiling; X-ray; mRNA translation; smORF; uORF
    DOI:  https://doi.org/10.1016/j.ygeno.2025.110987
  2. Cells. 2024 Dec 18. pii: 2090. [Epub ahead of print]13(24):
      Small Open Reading Frames (smORFs) of less than 100 codons remain mostly uncharacterised. About a thousand smORFs per genome encode peptides and microproteins about 70-80 aa long, often containing recognisable protein structures and markers of translation, and these are referred to as short Coding Sequences (sCDSs). The characterisation of individual sCDSs has provided examples of smORFs' function and conservation, but we cannot infer the functionality of all other metazoan smORFs from these. sCDS function has been characterised at a genome-wide scale in yeast and bacteria, showing that hundreds can produce a phenotype, but attempts in metazoans have been less successful. Either most sCDSs are not functional, or classic experimental techniques do not work with smORFs due to their shortness. Here, we combine extensive proteomics with bioinformatics and genetics in order to detect and corroborate sCDS function in Drosophila. Our studies nearly double the number of sCDSs with detected peptides and microproteins and an experimentally corroborated function. Finally, we observe a correlation between proven sCDS protein function and bioinformatic markers such as conservation and GC content. Our results support that sCDS peptides and microproteins act as membrane-related regulators of canonical proteins, regulators whose functions are best understood at the cellular level, and whose mutants produce little, if any, overt morphological phenotypes.
    Keywords:  Drosophila melanogaster; autophagy regulation; embryogenesis; functional genomics; microproteins; proteomics; ribosome profiling; sCDS (short coding sequences); smORFs (small open reading frames)
    DOI:  https://doi.org/10.3390/cells13242090
  3. Mol Ther Nucleic Acids. 2025 Mar 11. 36(1): 102406
      Upstream open reading frames (uORFs) are cis-regulatory motifs that are predicted to occur in the 5' UTRs of the majority of human protein-coding transcripts and are typically associated with translational repression of the downstream primary open reading frame (pORF). Interference with uORF activity provides a potential mechanism for targeted upregulation of the expression of specific transcripts. It was previously reported that steric block antisense oligonucleotides (ASOs) can bind to and mask uORF start codons to inhibit translation initiation, and thereby disrupt uORF-mediated gene regulation. Given the relative maturity of the oligonucleotide field, such a uORF blocking mechanism might have widespread therapeutic utility. Here, we re-synthesized three of the most potent ASOs targeting the RNASEH1 uORF described in a study by Liang et al. and investigated their potential for RNASEH1 protein upregulation, with care taken to replicate the conditions of the original study. No upregulation (of endogenous or reporter protein expression) was observed with any of the oligonucleotides tested at doses ranging from 25 to 300 nM. Conversely, we observed downregulation of expression in some instances. We conclude that previously described RNASEH1 uORF-targeting steric block ASOs are incapable of upregulating pORF protein expression in our hands.
    Keywords:  MT: Oligonucleotides: Therapies and Applications; RNASEH1; antisense oligonucleotides; steric block ASO; uORF; upstream open reading frame
    DOI:  https://doi.org/10.1016/j.omtn.2024.102406
  4. RNA Biol. 2025 Dec;22(1): 1-12
      Mutations in the PKD1 coding sequence and abnormal PKD1 expression levels contribute to the development of autosomal-dominant polycystic kidney disease, the most common genetic disorder. Regulation of PKD1 expression by factors located in the promoter and 3´ UTR has been extensively studied. Less is known about its regulation by 5´ UTR elements. In this study, we investigated the effects of uORFs and uORF-affecting variants by combining bioinformatic analyses, luciferase reporter assays, RT-qPCR and immunoblotting experiments. Our analyses demonstrate that PKD1 mRNA contains two evolutionarily conserved translation-inhibitory uORFs. uORF1 is translatable, and uORF2 is likely not translatable. The 5´ UTR and uORFs do not modulate downstream protein output under endoplasmic reticulum stress and oxidative stress conditions. Some of the uORF-perturbing variants in the SNP database are predicted to affect gene translation. Luciferase reporter assays and RT-qPCR results reveal that rs2092942382 and rs1596636969 increase, while rs2092942900 decreases, main gene translation without affecting transcription. Antisense oligos targeting the uORFs reduce luciferase protein levels without altering luciferase mRNA levels. Our results establish PKD1 as a novel target of uORF-mediated translational regulation and suggest that mutations that perturb uORFs may dysregulate PKD1 protein levels.
    Keywords:  5´ UTR; ADPKD; PKD1; SNP; uORF
    DOI:  https://doi.org/10.1080/15476286.2024.2448387
  5. NAR Genom Bioinform. 2025 Mar;7(1): lqae186
      Small proteins (≤100 amino acids) play important roles across all life forms, ranging from unicellular bacteria to higher organisms. In this study, we have developed SProtFP, a machine learning-based method for functional annotation of prokaryotic small proteins into selected functional categories. SProtFP uses independent artificial neural networks (ANNs) trained using a combination of physicochemical descriptors for classifying small proteins into antitoxin type 2, bacteriocin, DNA-binding, metal-binding, ribosomal protein, RNA-binding, type 1 toxin and type 2 toxin proteins. We have also trained a model for identification of small open reading frame (smORF)-encoded antimicrobial peptides (AMPs). Comprehensive benchmarking of SProtFP revealed an average area under the receiver operating characteristic curve (ROC-AUC) of 0.92 during 10-fold cross-validation and ROC-AUCs of 0.94 and 0.93 on held-out balanced and imbalanced test sets, respectively. Utilizing our method to annotate bacterial isolates from the human gut microbiome, we could identify thousands of remote homologs of known small protein families and assign putative functions to uncharacterized proteins. This highlights the utility of SProtFP for large-scale functional annotation of microbiome datasets, especially in cases where sequence homology is low. SProtFP is freely available at http://www.nii.ac.in/sprotfp.html and can be combined with genome annotation tools such as ProsmORF-pred to uncover the functional repertoire of novel small proteins in bacteria.
    DOI:  https://doi.org/10.1093/nargab/lqae186
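The ROC-AUC figures quoted in item 5 can be read as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below computes the metric that way on made-up labels and scores. It is a generic illustration of the metric only, not the SProtFP benchmarking code; the function name `roc_auc` and the toy data are assumptions introduced here.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher meaning "more positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two negatives and two positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; rank-based formulations give the same value more efficiently on large test sets.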
  6. Insects. 2024 Nov 30. pii: 950. [Epub ahead of print]15(12):
      Background: Transposable elements (TEs) and noncoding sequences are major components of the genome, yet their functional contributions to long noncoding RNAs (lncRNAs) are not well understood. Although many lncRNAs originating from TEs (TE-lncRNAs) have been identified across various organisms, their characteristics and regulatory roles, particularly in insects, remain largely unexplored. This study integrated multi-omics data to investigate TE-lncRNAs in D. melanogaster, focusing on the influence of transposons across different omics levels. Results: We identified 16,118 transposons overlapping with lncRNA sequences that constitute 2119 TE-lncRNAs (40.4% of all lncRNAs) using 256 public RNA-seq samples and 15 lncRNA-seq samples of Drosophila S2 cells treated with heavy metals. Of these, 67.2% of TE-lncRNAs contain more than one TE. The LTR/Gypsy family was the most common transposon insertion. Transposons preferred to insert into promoters, transcription start sites, and intronic regions, especially in chromosome ends. Compared with lncRNAs, TE-lncRNAs showed longer lengths, lower conservation, and lower levels but a higher specificity of expression. Multi-omics data analysis revealed positive correlations between transposon insertions and chromatin openness at the pre-transcriptional level. Notably, a total of 516 TE-lncRNAs provided transcription factor binding sites through transposon insertions. The regulatory network of a key transcription factor was rewired by transposons, potentially recruiting other transcription factors to exert regulatory functions under heavy metal stress. Additionally, 99 TE-lncRNAs were associated with m6A methylation modification sites, and 115 TE-lncRNAs potentially provided candidate small open reading frames through transposon insertions. Conclusions: Our data analysis demonstrated that TEs contribute to the regulation of lncRNAs. TEs not only promote the transcriptional regulation of lncRNAs, but also facilitate their post-transcriptional and epigenetic regulation.
    Keywords:  Drosophila; TE-lncRNA; heavy metal; long noncoding RNA; transposable element
    DOI:  https://doi.org/10.3390/insects15120950
  7. Res Sq. 2024 Dec 19. pii: rs.3.rs-5390104. [Epub ahead of print]
      Background: A nucleotide sequence can be translated in three reading frames from 5' to 3', producing distinct protein products. Many examples of RNA translation in two reading frames (dual coding) have been identified so far. Results: We report simultaneous translation of mRNA transcripts derived from the SRD5A1 locus in all three reading frames that results in the synthesis of long proteins. This occurs due to initiation at three nearby AUG codons occurring in all three reading frames. Only one of the three proteoforms contains the conserved catalytic domain of SRD5A1, produced either from the second or the third AUG codon depending on the transcript. Paradoxically, ribosome profiling data and expression reporters indicate that the most efficient translation produces catalytically inactive proteoforms. While phylogenetic analysis suggests that the long triple decoding region is specific to primates, occurrence of nearby AUGs in all three reading frames is ancestral to placental mammals. This suggests that their evolutionary significance belongs to regulation of translation rather than the biological role of their products. By analysing multiple publicly available ribosome profiling data and with gene expression assays carried out in different cellular environments, we show that relative expression of these proteoforms is mutually dependent and varies across environments, supporting this conjecture. A remarkable feature of triple decoding is its resistance to indel mutations, with apparent implications for clinical interpretation of genomic variants. Conclusion: We argue for the importance of identification, characterisation and annotation of productive RNA translation irrespective of the presumed biological roles of the products of this translation.
    DOI:  https://doi.org/10.21203/rs.3.rs-5390104/v1
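As a rough illustration of the three forward reading frames discussed in item 7 (and of the AUG-initiated uORFs and smORFs in the items above), the sketch below scans an RNA sequence frame by frame and reports each open reading frame that starts at an AUG and ends at an in-frame stop codon. It is a minimal toy, not the analysis used in any of the listed papers: the function name `orfs_by_frame`, the `min_codons` parameter, and the example sequence are assumptions introduced here, and real analyses also weigh start-codon context, non-AUG starts, and ribosome profiling evidence.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def orfs_by_frame(rna, min_codons=1):
    """Find AUG-initiated ORFs in the three forward reading frames.

    Returns a dict mapping frame (0, 1, 2) to a list of tuples
    (start, end, n_codons), where start is the 0-based index of the AUG,
    end is the index just past the stop codon, and n_codons counts
    codons from the AUG up to, but not including, the stop codon.
    ORFs that run off the end of the sequence without a stop are skipped.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    found = {0: [], 1: [], 2: []}
    for frame in range(3):
        start = None  # index of the first AUG in the currently open ORF
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    found[frame].append((start, i + 3, n_codons))
                start = None
    return found


# Toy sequence (hypothetical): one complete two-codon ORF in frame 0.
example = orfs_by_frame("AUGAAAUGA")
print(example)
```

Filtering the returned tuples by `n_codons` (for example, fewer than 100 codons) is one simple way to restrict such a scan to smORF-sized candidates.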