bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2023‒03‒19
five papers selected by
Thomas Farid Martínez
University of California, Irvine


  1. Nat Ecol Evol. 2023 Mar 16.
      Genes and translated open reading frames (ORFs) that emerged de novo from previously non-coding sequences provide species with opportunities for adaptation. When aberrantly activated, some human-specific de novo genes and ORFs have disease-promoting properties, for instance driving tumour growth. Thousands of putative de novo coding sequences have been described in humans, but we still do not know what fraction of those ORFs has readily acquired a function. Here, we discuss the challenges and controversies surrounding the detection, mechanisms of origin, annotation, validation and characterization of de novo genes and ORFs. Through manual curation of literature and databases, we provide a comprehensive table of most de novo genes reported in humans to date. We re-evaluate each locus by tracing the enabling mutations and list proposed disease associations, protein characteristics and supporting evidence for translation and protein detection. This work will support future explorations of de novo genes and ORFs in humans.
    DOI:  https://doi.org/10.1038/s41559-023-02014-y
  2. Elife. 2023 Mar 15. pii: e78299. [Epub ahead of print] 12
      Increasing numbers of small proteins with diverse physiological roles are being identified and characterized in both prokaryotic and eukaryotic systems, but the origins and evolution of these proteins remain unclear. Recent genomic sequence analyses in several organisms suggest that new functions encoded by small open reading frames (sORFs) may emerge de novo from noncoding sequences. However, experimental data demonstrating whether and how randomly generated sORFs can confer beneficial effects on cells are limited. Here we show that de novo small proteins (≤ 50 amino acids in length) selected from random sequence libraries can rescue Escherichia coli cells lacking the conditionally essential SerB enzyme by up-regulating hisB expression. The recovered small proteins are hydrophobic and exert their rescue effect by binding to the 5' regulatory region of the his operon mRNA, suggesting that protein binding promotes structural rearrangements of the RNA that allow increased hisB expression. This study adds RNA regulatory elements to the known interaction partners of de novo proteins isolated from random sequence libraries, and provides further experimental evidence that small proteins with selective benefits can originate from the expression of nonfunctional sequences.
    Keywords:  E. coli; evolutionary biology; infectious disease; microbiology
    DOI:  https://doi.org/10.7554/eLife.78299
  3. Semin Cell Dev Biol. 2023 Mar 14. pii: S1084-9521(23)00059-9. [Epub ahead of print]
      The importance of translation fidelity has been apparent since the discovery of the genetic code. It is commonly believed that translation deviating from the main coding region must be avoided at all times inside cells. However, ribosome profiling and mass spectrometry have revealed pervasive noncanonical translation. Both the scope and the origin of this translational "noise" are only beginning to be appreciated. Although largely overlooked, translational "noise" is associated with a wide range of cellular functions, such as producing unannotated protein products. Furthermore, translational "noise" is dynamic and responsive to stress conditions, highlighting its beneficial role in stress adaptation. Mechanistic investigation of translational "noise" will provide better insight into translational regulation. Ultimately, these events are not "noise" at all but represent a signature of cellular activity under pathophysiological conditions. Deciphering translational "noise" holds therapeutic and diagnostic potential for a wide spectrum of human diseases.
    Keywords:  Alternative initiation; Frameshifting; Reinitiation; Ribosome; Start codon; Stop codon; Translation
    DOI:  https://doi.org/10.1016/j.semcdb.2023.03.004
  4. Comput Biol Med. 2023 Mar 11. pii: S0010-4825(23)00238-X. [Epub ahead of print] 157 106773
      Recently, small open reading frames (sORFs) in long noncoding RNA (lncRNA) have been shown to encode small peptides that can help elucidate the mechanisms of growth and development in organisms. Because machine learning-based computational methods are less costly than biological experiments, they can be used to identify sORFs and provide a basis for experimental follow-up. However, few computational methods and data resources have been developed for identifying sORFs in plant lncRNA. In addition, machine learning models produce underperforming classifiers when faced with class imbalance. In this study, a variant of SMOTE based on weighted cosine distance (WCDSMOTE), which interacts with feature selection, is proposed to synthesize minority-class samples, and weighted edited nearest neighbor (WENN) is applied to clean majority-class samples. Together these form the hybrid sampling method WCDSMOTE-ENN for handling imbalanced datasets with multi-angle features. A heterogeneous classifier ensemble then performs the classification. The result is sORFplnc, a computational method based on class-imbalance learning for identifying sORFs with coding potential in plant lncRNA. Experimental results show that sORFplnc outperforms existing computational methods in identifying sORFs with coding potential. We anticipate that this work can serve as a reference for related research and contribute to agriculture and biomedicine.
    Keywords:  Class-imbalance learning; Ensemble learning; Feature selection; Hybrid resampling; lncRNA; sORFs
    DOI:  https://doi.org/10.1016/j.compbiomed.2023.106773
  5. J Proteome Res. 2023 Mar 16.
      The incidence rate of atrial fibrillation (AF) has remained high in recent years. Despite intensive efforts to study the pathologic changes of AF, the molecular mechanism of disease development remains unclear. Microproteins, ribosomally translated gene products of small open reading frames (sORFs), have been found to play crucial biological roles, yet they have received little attention in AF research. In this work, we recruited 65 AF patients and 65 healthy subjects for microproteomic profiling. Through differential analysis and cross-validation between independent datasets, four microproteins were identified as significantly different: three annotated and one novel. Additionally, we built diagnostic models using either microproteins or global proteins with machine learning methods and found that the microprotein-based model achieved excellent performance, comparable to that of the global-protein model. Our results confirm abnormal expression of microproteins in AF and may provide new perspectives for mechanistic studies of AF.
    Keywords:  APCO1; LC/MS/MS; atrial fibrillation; cardiac homeostasis; machine learning; microproteins; serum proteomics
    DOI:  https://doi.org/10.1021/acs.jproteome.2c00622
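The hybrid resampling in item 4 (sORFplnc) builds on two standard steps: SMOTE-style oversampling, which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, and edited nearest neighbour (ENN) cleaning, which drops majority samples whose neighbours mostly carry a different label. The sketch below is a minimal, unweighted illustration of those two steps using cosine distance for the neighbour search. It is not the authors' WCDSMOTE-ENN: their feature weights and the interaction with feature selection are not described in the abstract. The function names (`smote_cosine`, `enn_clean`), parameters (`k`, `n_new`, `seed`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def nearest(idx: int, points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k points closest to points[idx] (excluding itself) by cosine distance."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: cosine_distance(points[idx], points[j]))
    return others[:k]


def smote_cosine(minority: Sequence[Vector], n_new: int, k: int = 3, seed: int = 0) -> List[Vector]:
    """Synthesize n_new samples by moving a random minority sample part of the way
    toward one of its k nearest minority neighbours (plain SMOTE, cosine neighbours)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest(i, minority, k))
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def enn_clean(X: Sequence[Vector], y: Sequence[int], majority_label: int,
              k: int = 3) -> Tuple[List[Vector], List[int]]:
    """Edited nearest neighbour (unweighted): drop majority samples whose k nearest
    neighbours in the whole dataset vote for a different label."""
    keep_X: List[Vector] = []
    keep_y: List[int] = []
    for i, (x, label) in enumerate(zip(X, y)):
        if label == majority_label:
            votes = Counter(y[j] for j in nearest(i, X, k))
            if votes.most_common(1)[0][0] != label:
                continue  # neighbours disagree, so remove this majority sample
        keep_X.append(list(x))
        keep_y.append(label)
    return keep_X, keep_y


# Toy usage: class 1 is the minority; the last majority point sits among class 1.
minority = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.3]]
majority = [[0.1, 1.0], [0.2, 0.9], [0.0, 1.0], [0.95, 0.2]]
new = smote_cosine(minority, n_new=3)
X = minority + new + majority
y = [1] * (len(minority) + len(new)) + [0] * len(majority)
X_clean, y_clean = enn_clean(X, y, majority_label=0)
```

In this toy run the majority point [0.95, 0.2] is removed because its three nearest neighbours are all class 1, while the other majority points and all minority samples are kept. Cosine distance ignores vector magnitude, which may suit normalized composition features, but that choice here is an assumption for illustration.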