bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2021‒03‒28
eight papers selected by
Thomas Martinez
Salk Institute for Biological Studies


  1. J Proteome Res. 2021 Mar 24.
      The identification of proteins below approximately 70-100 amino acids in bottom-up proteomics is still a challenging task due to the limited number of peptides generated by proteolytic digestion. This includes the short open reading frame-encoded peptides (SEPs), which are a subset of the small proteins that were not previously annotated or that are alternatively encoded. Here, we systematically investigated the use of multiple proteases (trypsin, chymotrypsin, LysC, LysargiNase, and GluC) in GeLC-MS/MS analysis to improve the sequence coverage and the number of identified peptides for small proteins, with a focus on SEPs, in the archaeon Methanosarcina mazei. Combining the data of all proteases, we identified 63 small proteins and an additional 28 SEPs with at least two unique peptides, while only 55 small proteins and 22 SEPs could be identified using trypsin only. For 27 small proteins and 12 SEPs, complete sequence coverage was achieved. Moreover, for five SEPs, incorrectly predicted translation start points or potential in vivo proteolytic processing were identified, confirming the data of a previous top-down proteomics study of this organism. The results clearly show that a multi-protease approach improves the identification and molecular characterization of small proteins and SEPs. LC-MS data: ProteomeXchange PXD023921.
    Keywords:  LC−MS; alternative open reading frames; bottom-up; peptidomics; sORF; smORF; small open reading frames; terminomics
    DOI:  https://doi.org/10.1021/acs.jproteome.1c00115
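      Why a second or third protease helps with small proteins can be illustrated with an in silico digest. The sketch below is not the authors' workflow: it assumes simplified textbook cleavage rules with no missed cleavages, a hypothetical detectable peptide length window of 6 to 30 residues, and a made-up 40-residue sequence. The names `digest`, `peptides`, and `covered` are illustrative only.

```python
import re

# Simplified textbook cleavage rules (assumptions: full specificity, no missed
# cleavages). Each pattern matches the zero-width position where the chain is cut.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",        # C-terminal to K/R, not before P
    "LysC": r"(?<=K)",                   # C-terminal to K
    "GluC": r"(?<=E)",                   # C-terminal to E
    "chymotrypsin": r"(?<=[FWY])(?!P)",  # C-terminal to F/W/Y, not before P
    "LysargiNase": r"(?=[KR])",          # N-terminal to K/R
}


def digest(sequence, protease):
    """Return (start, end) spans of the peptides from a complete digest."""
    cuts = {m.start() for m in re.finditer(CLEAVAGE_RULES[protease], sequence)}
    bounds = sorted(cuts | {0, len(sequence)})
    return list(zip(bounds, bounds[1:]))


def peptides(sequence, protease):
    """Return the peptide strings from a complete digest."""
    return [sequence[s:e] for s, e in digest(sequence, protease)]


def covered(sequence, proteases, min_len=6, max_len=30):
    """Residue indices covered by peptides inside the assumed length window."""
    positions = set()
    for protease in proteases:
        for s, e in digest(sequence, protease):
            if min_len <= e - s <= max_len:
                positions.update(range(s, e))
    return positions


# Hypothetical 40-residue small protein, invented for illustration.
protein = "MSKEELLRKAVDGFWTPEKNRLYAEGKTIDEVWKRSGHLE"
for name in CLEAVAGE_RULES:
    fraction = len(covered(protein, [name])) / len(protein)
    print(f"{name:13s} {fraction:.0%}")
combined = len(covered(protein, CLEAVAGE_RULES)) / len(protein)
print(f"{'combined':13s} {combined:.0%}")
```

      Because each protease cuts at different residues, peptides that are too short or too long with one enzyme can fall inside the detectable window with another, so the combined coverage can only match or exceed the trypsin-only coverage.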
  2. Nat Commun. 2021 03 09. 12(1): 1515
      Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5'UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.
    DOI:  https://doi.org/10.1038/s41467-021-21812-1
  3. Methods Mol Biol. 2021; 2252: 27-55
      Knowledge of translation start sites is crucial for annotation of genes in bacterial genomes. However, systematic mapping of start codons in bacterial genes has mainly relied on predictions based on protein conservation and mRNA sequence features which, although useful, are not always accurate. We recently found that the pleuromutilin antibiotic retapamulin (RET) is a specific inhibitor of translation initiation that traps ribosomes specifically at start codons, and we used it in combination with ribosome profiling to map start codons in the Escherichia coli genome. This genome-wide strategy, named Ribo-RET, not only verifies the position of start codons in already annotated genes but also enables identification of previously unannotated open reading frames and reveals the presence of internal start sites within genes. Here, we provide a detailed Ribo-RET protocol for E. coli. Ribo-RET can be adapted for mapping the start codons of the protein-coding sequences in a variety of bacterial species.
    Keywords:  Alternative proteome; Bacterial translation; Pleuromutilin; Retapamulin; Ribo-Seq; Ribosome profiling; Start codons; Translation initiation
    DOI:  https://doi.org/10.1007/978-1-0716-1150-0_2
  4. Peptides. 2021 Mar 17. pii: S0196-9781(21)00037-1. [Epub ahead of print] 170529
      The rat angiotensin type 1a receptor (AT1aR) is a peptide hormone G protein-coupled receptor (GPCR) that plays a key role in electrolyte homeostasis and blood pressure control. There is a highly conserved short open reading frame (sORF) in exon 2 (E2) that is downstream from exon 1 (E1) and upstream of the AT1aR coding region located in exon 3 (E3). To determine the role of this E2 sORF in AT1aR signaling, human embryonic kidney-293 (HEK293) cells were transfected with plasmids containing AT1aR cDNA with either an intact or disrupted E2 sORF. The intact sORF attenuated the efficacy of angiotensin (Ang) II (p < 0.001) and sarcosine1,Ile4,Ile8-Ang II (SII) (p < 0.01) to activate AT1aR signaling through extracellular signal-regulated kinases 1/2 (ERK1/2). A time course showed that agonist-induced AT1aR-mediated ERK1/2 activation was slower in the presence of the intact compared to the disrupted sORF [Ang II: p < 0.01 and SII: p < 0.05]. Ang II-induced ERK1/2 activation was completely inhibited by the protein kinase C (PKC) inhibitor Ro 31-8220 regardless of whether the sORF was intact or disrupted. Flow cytometric analyses suggested the intact sORF improved cell survival; the percentage of live cells increased (p < 0.05) while the percentage of early apoptotic cells decreased (p < 0.01) in cells transfected with the AT1aR plasmid containing the intact sORF. These findings have implications for the regulation of AT1Rs in physiological and pathological conditions and warrant investigation of sORFs in the 5' leader sequence (5'LS) of other GPCRs.
    Keywords:  PEP7; biased agonist; endocytosis; posttranscriptional regulation; receptor internalization; β-arrestin
    DOI:  https://doi.org/10.1016/j.peptides.2021.170529
  5. Methods Mol Biol. 2021; 2252: 313-329
      The identification of upstream open reading frames (uORFs) using ribosome profiling data is complicated by several factors such as the noise inherent to the procedure, the substantial increase in potential translation initiation sites (and false positives) when one includes non-canonical start codons, and the paucity of molecularly validated uORFs. Here we present uORF-seqr, a novel machine learning algorithm that uses ribosome profiling data, in conjunction with RNA-seq data, as well as transcript aware genome annotation files to identify statistically significant AUG and near-cognate codon uORFs.
    Keywords:  Alternative translation initiation site; Near-cognate codon; Non-canonical start codon; Translational regulation; Upstream open reading frame; uORF
    DOI:  https://doi.org/10.1007/978-1-0716-1150-0_15
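      The growth in candidate initiation sites when near-cognate codons are allowed can be seen in a few lines. This is not uORF-seqr and uses no ribosome profiling data: it is a minimal sequence-only sketch that assumes near-cognate codons are the nine single-nucleotide variants of AUG (written as DNA, ATG), and that only uORFs terminating inside the 5' leader are reported. The leader sequence and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def near_cognate_codons(start="ATG"):
    """Return the nine codons that differ from the start codon at one position."""
    codons = set()
    for i in range(3):
        for base in "ACGT":
            if base != start[i]:
                codons.add(start[:i] + base + start[i + 1:])
    return codons


def candidate_uorfs(leader, starts):
    """List (start, end, start_codon) for ORFs that begin at a codon in `starts`
    and end at the first in-frame stop codon inside the leader.

    Candidates with no in-frame stop inside the leader (for example uORFs that
    run into the main coding sequence) are skipped in this sketch.
    """
    hits = []
    for i in range(len(leader) - 2):
        codon = leader[i:i + 3]
        if codon not in starts:
            continue
        for j in range(i + 3, len(leader) - 2, 3):
            if leader[j:j + 3] in STOP_CODONS:
                hits.append((i, j + 3, codon))
                break
    return hits


# Hypothetical 5' leader, invented for illustration.
leader = "CTGGCCATGCCCTAAGG"
aug_only = candidate_uorfs(leader, {"ATG"})
with_near_cognates = candidate_uorfs(leader, {"ATG"} | near_cognate_codons())
print(len(aug_only), "candidate(s) with AUG only")
print(len(with_near_cognates), "candidate(s) with AUG and near-cognate codons")
```

      Every additional candidate from a sequence scan like this is a potential false positive, which is why the abstract pairs the scan with ribosome profiling and RNA-seq evidence and a statistical test.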
  6. Methods Mol Biol. 2021; 2252: 331-346
      Ribosome profiling, or Ribo-seq, provides precise information about the position of actively translating ribosomes. It can be used to identify open reading frames (ORFs) that are translated in a given sample. The RiboTaper pipeline and the ORFquant R package leverage the periodic distribution of such ribosomes along the ORF to perform a test for translation that is insensitive to aperiodic noise and to provide a statistically robust measure of translation. In addition to accounting for complex loci with overlapping ORFs, ORFquant is also able to use Ribo-seq as a tool for distinguishing actively translated transcripts from non-translated ones, within a given gene locus.
    Keywords:  Genomics; Open reading frame; Periodicity; Ribo-seq; Ribosome; Sequencing; Translation
    DOI:  https://doi.org/10.1007/978-1-0716-1150-0_16
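      The idea of using codon periodicity to separate translation from aperiodic noise can be sketched with a single Fourier coefficient. This is not the statistical test implemented in RiboTaper or ORFquant: it is a plain estimate of how much of the variance in a per-nucleotide count profile sits at a period of three nucleotides. The function name and the example profiles are hypothetical.

```python
import cmath


def periodicity_score(psite_counts):
    """Fraction (0 to 1) of profile variance carried by the 3-nt periodic component."""
    n = len(psite_counts) - len(psite_counts) % 3  # trim to whole codons
    signal = psite_counts[:n]
    if n == 0:
        return 0.0
    mean = sum(signal) / n
    variance_sum = sum((x - mean) ** 2 for x in signal)
    if variance_sum == 0:
        return 0.0  # a flat profile has no periodic component
    # Discrete Fourier coefficient at frequency 1/3 cycles per nucleotide.
    coeff = sum(
        (x - mean) * cmath.exp(-2j * cmath.pi * k / 3)
        for k, x in enumerate(signal)
    )
    # The factor 2 counts the conjugate frequency bin; by Parseval's theorem
    # the total power over all bins equals n * variance_sum.
    return 2 * abs(coeff) ** 2 / (n * variance_sum)


# Invented profiles: one with most counts in a single frame, one alternating.
translated_like = [9, 1, 2] * 20
aperiodic_like = [5, 0] * 30
print(f"frame-biased profile: {periodicity_score(translated_like):.2f}")
print(f"alternating profile:  {periodicity_score(aperiodic_like):.2f}")
```

      A score near 1 means the counts repeat with codon spacing, as expected over a translated ORF; signal with other periodicities or none contributes little to this frequency, which is the sense in which a periodicity-based measure is insensitive to aperiodic noise.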
  7. Methods Mol Biol. 2021; 2252: 295-312
      Ribosome profiling has been instrumental in leading to important discoveries in several fields of life sciences. Here we describe a computational approach that enables identification of translation events on a genome-wide scale from ribosome profiling data. Periodic fragment sizes indicative of active translation are selected without supervision for each library. Our workflow allows mapping of the whole translational landscape of a given cell, tissue, or organism, under varying conditions, and can be used to expand the search for novel, uncharacterized open reading frames, such as regulatory upstream translation events. Through a detailed workflow example, we show how to perform qualitative and quantitative analysis of translatomes.
    Keywords:  Bayesian; Open reading frame; Ribosome profiling; Translation
    DOI:  https://doi.org/10.1007/978-1-0716-1150-0_14
  8. Methods. 2021 Mar 19. pii: S1046-2023(21)00077-3. [Epub ahead of print]
      Recently, a large number of circular RNAs (circRNAs) were discovered in eukaryotes, some of which were reported to be translated in a cap-independent fashion. However, the study of circRNA translation is still not trivial. Here we describe two distinct systems to generate translatable circRNAs containing validated open reading frames (ORFs) to analyze their translation in living cells. The first system is a plasmid reporter containing a single exon with split GFP fragments in reverse order, which can be efficiently back-spliced to generate a circRNA encoding intact GFP. The second system is a self-splicing reporter containing an intact Renilla luciferase (Rluc) ORF and the flanking split group I introns in reverse order, which can produce circRNAs through in vitro self-splicing of the precursor RNAs. Both circRNA systems can serve as platforms for mechanistic studies of circRNA translation, and also as reliable systems to measure the activity of IRES-mediated translation.
    Keywords:  Back-splicing; Cap-independent Translation; Circular RNA; Internal ribosomal entry sites; Self-splicing
    DOI:  https://doi.org/10.1016/j.ymeth.2021.03.011