bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2024‒03‒10
seven papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. Nat Commun. 2024 Mar 02. 15(1): 1932
      Studies have revealed dozens of functional peptides in putative 'noncoding' regions and raised the question of how many proteins are encoded by noncanonical open reading frames (ORFs). Here, we comprehensively annotate genome-wide translated ORFs across five eukaryotes (human, mouse, zebrafish, worm, and yeast) by analyzing ribosome profiling data. We develop a logistic regression model named PepScore based on ORF features (expected length, encoded domain, and conservation) to calculate the probability that the encoded peptide is stable in humans. Systematic ectopic expression validates PepScore and shows that stable complex-associating microproteins can be encoded in 5'/3' untranslated regions and overlapping coding regions of mRNAs, in addition to annotated noncoding RNAs. Stable noncanonical proteins follow conventional rules and localize to different subcellular compartments. Inhibition of proteasomal/lysosomal degradation pathways can stabilize some peptides, especially those with moderate PepScores, but cannot rescue the expression of short ones with low PepScores, suggesting they are directly degraded by cellular proteases. The majority of human noncanonical peptides with high PepScores show longer lengths but low conservation across species/mammals, and hundreds contain trait-associated genetic variants. Our study presents a statistical framework to identify stable noncanonical peptides in the genome and provides a valuable resource for functional characterization of noncanonical translation during development and disease.
    DOI:  https://doi.org/10.1038/s41467-024-46240-9
  2. Proteomics. 2024 Mar 08. e2300105
      Peptides have a plethora of activities in biological systems that can potentially be exploited biotechnologically. Several peptides are used clinically, as well as in industry and agriculture. The increase in available 'omics data has recently provided a large opportunity for mining novel enzymes, biosynthetic gene clusters, and molecules. While these data primarily consist of DNA sequences, other types of data provide important complementary information. Because peptides are small, the approaches proven successful at discovering novel proteins of canonical size cannot be naïvely applied to their discovery. Peptides can be encoded directly in the genome as short open reading frames (smORFs), or they can be derived from larger proteins by proteolysis. Both of these peptide classes pose challenges, as simple methods for their prediction result in large numbers of false positives. Similarly, functional annotation of larger proteins, traditionally based on sequence similarity to infer orthology and then transferring functions between characterized proteins and uncharacterized ones, cannot be applied to short sequences. The use of these techniques is much more limited, and alternative approaches based on machine learning are used instead. Here, we review the limitations of traditional methods as well as the alternative methods that have recently been developed for discovering novel bioactive peptides, with a focus on prokaryotic genomes and metagenomes.
    Keywords:  bioinformatics; biomedicine; data mining < bioinformatics; diseases < biomedicine; infectious
    DOI:  https://doi.org/10.1002/pmic.202300105
  3. Nat Commun. 2024 Mar 07. 15(1): 2091
      Identifying open reading frames (ORFs) being translated is not a trivial task. ProTInSeq is a technique designed to characterize proteomes by sequencing transposon insertions engineered to express a selection marker when they occur in-frame within a protein-coding gene. In the bacterium Mycoplasma pneumoniae, ProTInSeq identifies 83% of its annotated proteins, along with 5 proteins and 153 small ORF-encoded proteins (SEPs; ≤100 aa) that were not previously annotated. Moreover, ProTInSeq can be utilized for detecting translational noise, as well as for relative quantification and transmembrane topology estimation of fitness and non-essential proteins. By integrating various identification approaches, the number of initially annotated SEPs in this bacterium increases from 27 to 329, with a quarter of them predicted to possess antimicrobial potential. Herein, we describe a methodology complementary to Ribo-Seq and mass spectrometry that can identify SEPs while providing other insights in a proteome with a flexible and cost-effective DNA ultra-deep sequencing approach.
    DOI:  https://doi.org/10.1038/s41467-024-46112-2
  4. Adv Protein Chem Struct Biol. 2024; pii: S1876-1623(23)00101-3. [Epub ahead of print] 139: 289-334
      Studies focusing on characterizing circRNAs with the potential to translate into peptides are quickly advancing. They are helping to elucidate the roles played by circRNAs in several biological processes, especially in the emergence and development of diseases. While various tools are accessible for predicting coding regions within linear sequences, none have demonstrated accurate open reading frame detection in circular sequences, such as circRNAs. Here, we present cirCodAn, a novel tool designed to predict coding regions in circRNAs. We evaluated the performance of cirCodAn using datasets of circRNAs with strong translation evidence and showed that cirCodAn outperformed the other tools available to perform a similar task. Our findings demonstrate the applicability of cirCodAn to identify coding regions in circRNAs, which reveals its potential for use in future research focusing on elucidating the biological roles of circRNAs and their encoded proteins. cirCodAn is freely available at https://github.com/denilsonfbar/cirCodAn.
    Keywords:  CircRNA; Coding region prediction; Generalized hidden Markov model; Probabilistic models; Transcriptomics
    DOI:  https://doi.org/10.1016/bs.apcsb.2023.11.012
  5. Eur J Hum Genet. 2024 Mar 04.
      More than 50% of patients with primary familial brain calcification (PFBC), a rare neurological disorder, remain genetically unexplained. While some causative genes are yet to be identified, variants in non-coding regions of known genes may represent a source of missed diagnoses. We hypothesized that 5'-Untranslated Region (UTR) variants introducing an AUG codon may initiate mRNA translation and result in a loss of function in some of the PFBC genes. After reannotation of exome sequencing data of 113 unrelated PFBC probands, we identified two upstream AUG-introducing variants in the 5'UTR of PDGFB. One, NM_002608.4:c.-373C>G, segregated with PFBC in the family. It was predicted to create an upstream open reading frame (ORF). The other one, NM_002608.4:c.-318C>T, was found in a simplex case. It was predicted to result in an ORF overlapping the natural ORF with a frameshift. In a GFP reporter assay, both variants were associated with a dramatic decrease in GFP levels, and, after restoring the reading frame with the GFP sequence, the c.-318C>T variant was associated with a strong initiation of translation as measured by western blotting. Overall, we found upstream AUG-introducing variants in the 5'UTR of PDGFB in 2/113 (1.7%) undiagnosed PFBC cases. Such variants thus represent a source of putative pathogenic variants.
    DOI:  https://doi.org/10.1038/s41431-024-01580-4
  6. RNA. 2024 Mar 05. pii: rna.079903.123. [Epub ahead of print]
      Despite being predicted to lack coding potential, cytoplasmic long non-coding (lnc)RNAs can associate with ribosomes. However, the landscape and biological relevance of lncRNA translation remain poorly studied. In yeast, cytoplasmic Xrn1-sensitive lncRNAs (XUTs) are targeted by Nonsense-Mediated mRNA Decay (NMD), suggesting a translation-dependent degradation process. Here, we report that XUTs are pervasively translated, which impacts their decay. We show that XUTs globally accumulate upon translation elongation inhibition, but not when initial ribosome loading is impaired. Ribo-Seq confirmed ribosome binding to XUTs and identified actively translated 5'-proximal small ORFs. Mechanistically, the NMD-sensitivity of XUTs mainly depends on the 3'-untranslated region length. Finally, we show that the peptide resulting from the translation of an NMD-sensitive XUT reporter exists in NMD-competent cells. Our work highlights the role of translation in the post-transcriptional metabolism of XUTs. We propose that XUT-derived peptides could be exposed to natural selection, while NMD restricts XUT levels.
    Keywords:  NMD; Xrn1; lncRNA; translation
    DOI:  https://doi.org/10.1261/rna.079903.123
  7. Proteomics Clin Appl. 2024 Mar 05. e2300128
      PURPOSE: Micropeptides are an emerging class of proteins that play critical roles in cell signaling. Here, we describe the discovery of a novel micropeptide, dubbed slitharin (Slt), in conditioned media from cardiosphere-derived cells (CDCs), a therapeutic cardiac stromal cell type.
    EXPERIMENTAL DESIGN: We performed mass spectrometry of peptide-enriched fractions from the conditioned media of CDCs and a therapeutically inert cell type (human dermal fibroblasts). We then evaluated the therapeutic capacity of the candidate peptide using an in vitro model of cardiomyocyte injury and a rat model of myocardial infarction.
    RESULTS: We identified a novel 24-amino acid micropeptide (dubbed Slitharin [Slt]) with a non-canonical leucine start codon, arising from long intergenic non-coding (LINC) RNA 2099. Neonatal rat ventricular myocytes (NRVMs) exposed to Slt were protected from hypoxic injury in vitro compared to a vehicle or scrambled control. Transcriptomic analysis of cardiomyocytes exposed to Slt reveals cytoprotective capacity, putatively through regulation of stress-induced MAPK-ERK. Slt also exerted cardioprotective effects in rats with myocardial infarction, as shown by reduced infarct size 48 h post-injury.
    CONCLUSIONS AND CLINICAL RELEVANCE: Thus, Slt is a non-coding RNA-derived micropeptide, identified in the extracellular space, with a potential cardioprotective function.
    Keywords:  cardiomyocytes; cardiosphere-derived cells; micropeptide; myocardial infarction
    DOI:  https://doi.org/10.1002/prca.202300128