bims-strubi Biomed News
on Advances in structural biology
Issue of 2021–09–12
seventeen papers selected by
Alessandro Grinzato, European Synchrotron Radiation Facility



  1. Annu Rev Biomed Data Sci. 2020 Jul;3 163-190
      Single-particle electron cryomicroscopy (cryo-EM) is an increasingly popular technique for elucidating the three-dimensional structure of proteins and other biologically significant complexes at near-atomic resolution. It is an imaging method that does not require crystallization and can capture molecules in their native states. In single-particle cryo-EM, the three-dimensional molecular structure needs to be determined from many noisy two-dimensional tomographic projections of individual molecules, whose orientations and positions are unknown. The high level of noise and the unknown pose parameters are two key elements that make reconstruction a challenging computational problem. Even more challenging is the inference of structural variability and flexible motions when the individual molecules being imaged are in different conformational states. This review discusses computational methods for structure determination by single-particle cryo-EM and their guiding principles from statistical inference, machine learning, and signal processing that also play a significant role in many other data science applications.
    Keywords:  Electron cryomicroscopy; conformational heterogeneity; contrast transfer function; image alignment and classification; statistical estimation; three-dimensional tomographic reconstruction
    DOI:  https://doi.org/10.1146/annurev-biodatasci-021020-093826
  2. Front Mol Biosci. 2021 ;8 716973
      Proteins interact to form complexes. Predicting the quaternary structure of protein complexes is useful for protein function analysis, protein engineering, and drug design. However, few user-friendly tools leveraging the latest deep learning technology for inter-chain contact prediction and the distance-based modelling to predict protein quaternary structures are available. To address this gap, we develop DeepComplex, a web server for predicting structures of dimeric protein complexes. It uses deep learning to predict inter-chain contacts in a homodimer or heterodimer. The predicted contacts are then used to construct a quaternary structure of the dimer by the distance-based modelling, which can be interactively viewed and analysed. The web server is freely accessible and requires no registration. It can be easily used by providing a job name and an email address along with the tertiary structure for one chain of a homodimer or two chains of a heterodimer. The output webpage provides the multiple sequence alignment, predicted inter-chain residue-residue contact map, and predicted quaternary structure of the dimer. DeepComplex web server is freely available at http://tulip.rnet.missouri.edu/deepcomplex/web_index.html.
    Keywords:  deep learning; distance-based modeling; inter-chain contact prediction; protein complex structure prediction; protein interaction; protein quaternary structure prediction
    DOI:  https://doi.org/10.3389/fmolb.2021.716973
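Editor's sketch: the inter-chain contact map mentioned in the abstract above can be illustrated with a few lines of NumPy. This is not DeepComplex code; the function name, the one-representative-atom-per-residue input, the 8 Å cutoff and the toy coordinates are all assumptions chosen for illustration.

```python
import numpy as np

def interchain_contact_map(coords_a, coords_b, cutoff=8.0):
    """Boolean (n_a, n_b) map: True where a residue of chain A lies within
    `cutoff` angstroms of a residue of chain B (cutoff value is an assumption)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    # Pairwise Euclidean distances via broadcasting: (n_a, 1, 3) - (1, n_b, 3)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return dist < cutoff

# Hypothetical toy coordinates in angstroms, one representative atom per residue
chain_a = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]
chain_b = [[0.0, 6.0, 0.0], [20.0, 0.0, 0.0]]

contacts = interchain_contact_map(chain_a, chain_b)
print(contacts.astype(int))
```

A predictor outputs probabilities for such a map rather than the map itself; the thresholded distances here only show what the target of the prediction looks like.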
  3. J Cheminform. 2021 Sep 08. 13(1): 65
       BACKGROUND: Predicting protein-ligand binding sites is a fundamental step in understanding the functional characteristics of proteins, which plays a vital role in elucidating different biological functions and is a crucial step in drug discovery. A protein exhibits its true nature after binding to its interacting molecule known as a ligand that binds only in the favorable binding site of the protein structure. Different computational methods exploiting the features of proteins have been developed to identify the binding sites in the protein structure, but none seems to provide promising results, and therefore, further investigation is required.
    RESULTS: In this study, we present a deep learning model PUResNet and a novel data cleaning process based on structural similarity for predicting protein-ligand binding sites. From the whole scPDB (an annotated database of druggable binding sites extracted from the Protein DataBank) database, 5020 protein structures were selected to address this problem, which were used to train PUResNet. With this, we achieved better and justifiable performance than the existing methods while evaluating two independent sets using distance, volume and proportion metrics.
    Keywords:  Binding site prediction; Convolutional neural network; Data cleaning; Deep residual network; Ligand binding sites
    DOI:  https://doi.org/10.1186/s13321-021-00547-7
  4. Commun Biol. 2021 Sep 07. 4(1): 1044
      In cryo-electron microscopy (cryo-EM) data collection, locating a target object is error-prone. Here, we present a machine learning-based approach with a real-time object locator named yoneoLocr using YOLO, a well-known object detection system. Implementation shows its effectiveness in rapidly and precisely locating carbon holes in single particle cryo-EM and in locating crystals and evaluating electron diffraction (ED) patterns in automated cryo-electron crystallography (cryo-EX) data collection. The proposed approach will advance high-throughput and accurate data collection of images and diffraction patterns with minimal human operation.
    DOI:  https://doi.org/10.1038/s42003-021-02577-1
  5. Chem Commun (Camb). 2021 Sep 10.
      Solvation is a controlling factor for the structure and function of proteins. This article addresses the effects of solvation from an energetic perspective for the fluctuations and cosolvent-induced changes in protein structures and the equilibrium of aggregate formation for a peptide. A theoretical framework to analyze the solvation effects with an explicit solvent is introduced by adopting the energy-representation theory of solvation, and the connection of the solvation free energy to the protein structure and the aggregation tendency is quantitatively described in combination with all-atom molecular dynamics simulations. The interaction components that govern the solvation effects on the structural variations of proteins are further identified through correlation analysis, and a computational scheme to assess the shift of an aggregation equilibrium due to the addition of a cosolvent is provided.
    DOI:  https://doi.org/10.1039/d1cc03395f
  6. J Chem Theory Comput. 2021 Sep 07.
      Coarse-grained molecular dynamics provides a means for simulating the assembly and interactions of macromolecular complexes at a reduced level of representation, thereby allowing both longer timescale and larger sized simulations. Here, we describe an enhanced fragment-based protocol for converting macromolecular complexes from coarse-grained to atomistic resolution, for further refinement and analysis. While the focus is upon systems that comprise an integral membrane protein embedded in a phospholipid bilayer, the technique is also suitable for membrane-anchored and soluble protein/nucleotide complexes. Overall, this provides a method for generating an accurate and well-equilibrated atomic-level description of a macromolecular complex. The approach is evaluated using a diverse test set of 11 system configurations of varying size and complexity. Simulations are assessed in terms of protein stereochemistry, conformational drift, lipid/protein interactions, and lipid dynamics.
    DOI:  https://doi.org/10.1021/acs.jctc.1c00295
  7. Front Mol Biosci. 2021 ;8 715972
      Modern proteins have been shown to share evolutionary relationships via subdomain-sized fragments. The assembly of such fragments through duplication and recombination events led to the complex structures and functions we observe today. We previously implemented a pipeline that identified more than 1,000 of these fragments that are shared by different protein folds and developed a web interface to analyze and search for them. This resource, named Fuzzle, helps structural and evolutionary biologists to identify and analyze conserved parts of a protein, and it also provides protein engineers with building blocks, for example to design proteins by fragment combination. Here, we describe a new version of this web resource that was extended to include ligand information. This addition is a significant asset to the database since protein fragments that bind specific ligands can now be identified and analyzed. Often the mode of ligand binding is conserved in proteins, thereby supporting a common evolutionary origin. The same can now be explored for subdomain-sized fragments within this database. This ligand binding information can also be used in protein engineering to graft binding pockets into other protein scaffolds or to transfer functional sites via recombination of a specific fragment. Fuzzle 2.0 is freely available at https://fuzzle.uni-bayreuth.de/2.0.
    Keywords:  flavodoxin-like fold; periplasmic binding protein; protein design; protein evolution; protein fragment; web server
    DOI:  https://doi.org/10.3389/fmolb.2021.715972
  8. BMC Bioinformatics. 2021 Sep 08. 22(1): 428
       BACKGROUND: RNA regulates a variety of biological functions by interacting with other molecules. The ligand often binds in the RNA pocket to trigger structural changes or functions. Thus, it is essential to explore and visualize the RNA pocket to elucidate the structural and recognition mechanism for the RNA-ligand complex formation.
    RESULTS: In this work, we developed a user-friendly bioinformatics tool, RPocket. This database provides the geometrical size, centroid, shape, and secondary structure elements of each RNA pocket, together with RNA-ligand interaction information and functional sites. We extracted 240 RNA pockets from 94 non-redundant RNA-ligand complex structures. We developed RPDescriptor to calculate the pocket geometrical properties quantitatively. The geometrical information was then subjected to RNA-ligand binding analysis by incorporating the sequence, secondary structure, and geometrical combinations. This new approach takes advantage of both the atom-level precision of the structure and the nucleotide-level tertiary interactions. The results show that the higher-level topological pattern indeed improves the tertiary structure prediction. We also proposed a potential mechanism for RNA-ligand complex formation. The electrostatic interactions are responsible for long-range recognition, while the van der Waals and hydrophobic contacts are responsible for short-range binding and optimization. These interaction pairs can be considered as distance constraints to guide complex structural modeling and drug design.
    CONCLUSION: RPocket database would facilitate RNA-ligand engineering to regulate the complex formation for biological or medical applications. RPocket is available at http://zhaoserver.com.cn/RPocket/RPocket.html .
    Keywords:  Drug discovery; Pocket database; RNA-ligand interaction; Structure prediction
    DOI:  https://doi.org/10.1186/s12859-021-04349-4
  9. Nat Commun. 2021 Sep 10. 12(1): 5364
      Ribosomes comprise a large (LSU) and a small subunit (SSU) which are synthesized independently in the nucleolus before being exported into the cytoplasm, where they assemble into functional ribosomes. Individual maturation steps have been analyzed in detail using biochemical methods, light microscopy and conventional electron microscopy (EM). In recent years, single particle analysis (SPA) has yielded molecular resolution structures of several pre-ribosomal intermediates. It falls short, however, of revealing the spatiotemporal sequence of ribosome biogenesis in the cellular context. Here, we present our study on native nucleoli in Chlamydomonas reinhardtii, in which we follow the formation of LSU and SSU precursors by in situ cryo-electron tomography (cryo-ET) and subtomogram averaging (STA). By combining both positional and molecular data, we reveal gradients of ribosome maturation within the granular component (GC), offering a new perspective on how the liquid-liquid-phase separation of the nucleolus supports ribosome biogenesis.
    DOI:  https://doi.org/10.1038/s41467-021-25413-w
  10. Bioinformatics. 2021 Sep 06. pii: btab640. [Epub ahead of print]
       MOTIVATION: Intrinsically disordered protein regions interact with proteins, nucleic acids and lipids. Regions that bind lipids are implicated in a wide spectrum of cellular functions and several human diseases. Motivated by the growing amount of experimental data for these interactions and lack of tools that can predict them from the protein sequence, we develop DisoLipPred, the first predictor of the disordered lipid-binding residues (DLBRs).
    RESULTS: DisoLipPred relies on a deep bidirectional recurrent network that implements three innovative features: transfer learning, a bypass module that sidesteps predictions for putative structured residues, and expanded inputs that cover physicochemical properties associated with the protein-lipid interactions. Ablation analysis shows that these features drive the predictive quality of DisoLipPred. Tests on an independent test dataset and the yeast proteome reveal that DisoLipPred generates accurate results and that none of the related existing tools can be used to indirectly identify DLBRs. We also show that DisoLipPred's predictions complement the results generated by predictors of the transmembrane regions. Altogether, we conclude that DisoLipPred provides high-quality predictions of DLBRs that complement the currently available methods.
    AVAILABILITY: DisoLipPred's webserver is available at http://biomine.cs.vcu.edu/servers/DisoLipPred/.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab640
  11. J Chem Theory Comput. 2021 Sep 10.
      RNA molecules can easily adopt alternative structures in response to different environmental conditions. As a result, a molecule's energy landscape is rough and can exhibit a multitude of deep basins. In the absence of a high-resolution structure, small-angle X-ray scattering (SAXS) data can narrow down the conformational space available to the molecule and be used in conjunction with physical modeling to obtain high-resolution putative structures to be further tested by experiments. Because of the low resolution of these data, it is natural to implement the integration of SAXS data into simulations using a coarse-grained representation of the molecule, allowing for much wider searches and faster evaluation of SAXS theoretical intensity curves than with atomistic models. We present here the theoretical framework and the implementation of a simulation approach based on our coarse-grained model HiRE-RNA combined with SAXS evaluations "on-the-fly", leading the simulation toward conformations agreeing with the scattering data, starting from partially folded structures such as those that can easily be obtained from secondary structure prediction-based tools. We show on three benchmark systems how our approach can successfully achieve high-resolution structures with remarkable similarity to the native structure, recovering not only the overall shape, as imposed by SAXS data, but also the details of initially missing base pairs.
    DOI:  https://doi.org/10.1021/acs.jctc.1c00441
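Editor's sketch: one textbook way to evaluate a theoretical SAXS intensity curve from bead coordinates is the Debye formula, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij)/(q r_ij). The code below assumes point-like beads with q-independent form factors; it is not the HiRE-RNA implementation, and the function name and toy inputs are hypothetical.

```python
import numpy as np

def debye_intensity(coords, q_values, form_factors=None):
    """Debye-formula scattering intensity for point-like beads.

    Assumption: each bead has a constant (q-independent) form factor,
    defaulting to 1.0 for every bead.
    """
    r = np.asarray(coords, dtype=float)
    q = np.asarray(q_values, dtype=float)
    f = np.ones(len(r)) if form_factors is None else np.asarray(form_factors, dtype=float)
    # All pairwise bead-bead distances, shape (n, n)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi*x)/(pi*x), so sinc(q*d/pi) = sin(q*d)/(q*d), and 1 at q*d = 0
    kernel = np.sinc(q[:, None, None] * dist[None, :, :] / np.pi)
    # Weighted double sum over bead pairs for every q value
    return np.einsum("i,j,qij->q", f, f, kernel)

# Hypothetical two-bead system, 10 angstroms apart
beads = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
q_grid = np.array([0.0, 0.1, 0.2])  # inverse angstroms
intensity = debye_intensity(beads, q_grid)
print(intensity)
```

The double sum scales quadratically with the number of beads, which is one reason a coarse-grained representation makes repeated evaluation during a simulation much cheaper than an atomistic one.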
  12. J Chem Theory Comput. 2021 Sep 08.
      The binding kinetic properties of potential drugs may significantly influence their subsequent clinical efficacy. Predictions of these properties based on computer simulations provide a useful alternative to their expensive and time-consuming experimental counterparts, even at an early drug discovery stage. Herein, we perform scaled molecular dynamics (ScaledMD) simulations on a set of 27 ligands of HSP90 belonging to more than seven chemical series to estimate their relative residence times. We introduce two new techniques for the analysis and the classification of the simulated unbinding trajectories. The first technique, which helps in estimating the limits of the free energy well around the bound state, and the second one, based on a new contact map fingerprint, allow the description and the comparison of the paths that lead to unbinding. Using these analyses, we find that ScaledMD's relative residence time generally enables the identification of the slowest unbinders. We propose an explanation for the underestimation of the residence times of a subset of compounds, and we investigate how the biasing in ScaledMD can affect the mechanistic insights that can be gained from the simulations.
    DOI:  https://doi.org/10.1021/acs.jctc.1c00453
  13. Chem Rev. 2021 Sep 10.
      Mass spectrometry (MS) is increasingly being used to probe the structure and dynamics of proteins and the complexes they form with other macromolecules. There are now several specialized MS methods, each with unique sample preparation, data acquisition, and data processing protocols. Collectively, these methods are referred to as structural MS and include cross-linking, hydrogen-deuterium exchange, hydroxyl radical footprinting, native, ion mobility, and top-down MS. Each of these provides a unique type of structural information, ranging from composition and stoichiometry through to residue level proximity and solvent accessibility. Structural MS has proved particularly beneficial in studying protein classes for which analysis by classic structural biology techniques proves challenging such as glycosylated or intrinsically disordered proteins. To capture the structural details for a particular system, especially larger multiprotein complexes, more than one structural MS method with other structural and biophysical techniques is often required. Key to integrating these diverse data are computational strategies and software solutions to facilitate this process. We provide a background to the structural MS methods and briefly summarize other structural methods and how these are combined with MS. We then describe current state of the art approaches for the integration of structural MS data for structural biology. We quantify how often these methods are used together and provide examples where such combinations have been fruitful. To illustrate the power of integrative approaches, we discuss progress in solving the structures of the proteasome and the nuclear pore complex. We also discuss how information from structural MS, particularly pertaining to protein dynamics, is not currently utilized in integrative workflows and how such information can provide a more accurate picture of the systems studied. We conclude by discussing new developments in the MS and computational fields that will further enable in-cell structural studies.
    DOI:  https://doi.org/10.1021/acs.chemrev.1c00356
  14. Molecules. 2021 Aug 24. pii: 5124. [Epub ahead of print]26(17):
      In silico target fishing, whose aim is to identify possible protein targets for a query molecule, is an emerging approach used in drug discovery due to its wide variety of applications. This strategy allows the clarification of the mechanism of action and biological activities of compounds whose target is still unknown. Moreover, target fishing can be employed for the identification of off-targets of drug candidates, thus recognizing and preventing their possible adverse effects. For these reasons, target fishing has increasingly become a key approach for polypharmacology, drug repurposing, and the identification of new drug targets. While experimental target fishing can be lengthy and difficult to implement, due to the plethora of interactions that may occur for a single small molecule with different protein targets, an in silico approach can be quicker, less expensive, more efficient for specific protein structures, and thus easier to employ. Moreover, the possibility to use it in combination with docking and virtual screening studies, as well as the increasing number of web-based tools that have been recently developed, make target fishing a more appealing method for drug discovery. It is especially worth underlining the increasing implementation of machine learning in this field, both as a main target fishing approach and as a further development of already applied strategies. This review reports on the main in silico target fishing strategies, belonging to both ligand-based and receptor-based approaches, developed and applied in recent years, with particular attention to the different web tools freely accessible to the scientific community for performing target fishing studies.
    Keywords:  docking; machine learning; molecular similarity; reverse screening; target fishing
    DOI:  https://doi.org/10.3390/molecules26175124
  15. Phys Chem Chem Phys. 2021 Sep 09.
      Two families of organic molecules with different backbones have been considered. The first family is based on a macrolactam-like unit that is constrained in a particular conformation. The second family is composed of a substituted central phenyl that allows a larger mobility for its substituents. They share, however, a common feature: three amide moieties (within the cycle for the macrolactam-like molecule and as substituents for the phenyl) that permit hydrogen bonding when molecules are stacked. In this study we propose a computational protocol to unravel the ability of the different families to self-assemble into organic nanotubes. Starting from the monomer and going towards larger assemblies like dimers, trimers, and pentamers, we applied the different protocols to rationalize the behavior of the different assemblies. Both structures and thermodynamics were investigated to give a complete picture of the process. Thanks to the combination of a quantum mechanics approach and molecular dynamics simulations along with the use of tailored tools (noncovalent interaction visualization) and techniques (umbrella sampling), we have been able to differentiate the two families and highlight the best candidate for self-assembling purposes.
    DOI:  https://doi.org/10.1039/d1cp02675e
  16. J Chem Theory Comput. 2021 Sep 08.
      Coarse-grained modeling can be used to explore general theories that are independent of specific chemical detail. In this paper, we present cg_openmm, a Python-based simulation framework for modeling coarse-grained hetero-oligomers and screening them for structural and thermodynamic characteristics of cooperative secondary structures. cg_openmm facilitates the building of coarse-grained topology and random starting configurations, setup of GPU-accelerated replica exchange molecular dynamics simulations with the OpenMM software package, and features a suite of postprocessing thermodynamic and structural analysis tools. In particular, native contact analysis, heat capacity calculations, and free energy of folding calculations are used to identify and characterize cooperative folding transitions and stable secondary structures. In this work, we demonstrate the capabilities of cg_openmm on a simple 1-1 Lennard-Jones coarse-grained model, in which each residue contains 1 backbone and 1 side-chain bead. By scanning both nonbonded and bonded force-field parameter spaces at the coarse-grained level, we identify and characterize sets of parameters which result in the formation of stable helices through cooperative folding transitions. Moreover, we show that the geometries and stabilities of these helices can be tuned by manipulating the force-field parameters.
    DOI:  https://doi.org/10.1021/acs.jctc.1c00528
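Editor's sketch: the heat capacity calculations mentioned above are commonly based on the fluctuation relation C_V = (⟨E²⟩ − ⟨E⟩²) / (k_B T²), with a peak in C_V versus temperature signalling a cooperative transition. The code below is a minimal NumPy illustration of that relation applied to potential energies (so it gives only the configurational part); it does not use or reproduce the cg_openmm API, and all function names and toy data are hypothetical.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)

def heat_capacity(energies, temperature):
    """Configurational heat capacity from energy fluctuations:
    C_V = Var(E) / (k_B * T^2), with energies in kJ/mol sampled at one temperature."""
    e = np.asarray(energies, dtype=float)
    return e.var() / (K_B * temperature ** 2)

def peak_temperature(temperatures, energy_samples):
    """Return the temperature with the largest C_V, plus the full C_V curve."""
    cv = np.array([heat_capacity(e, t) for t, e in zip(temperatures, energy_samples)])
    return temperatures[int(np.argmax(cv))], cv

# Hypothetical energies from three replicas; the middle one fluctuates most,
# as expected near a folding transition
temps = [250.0, 300.0, 350.0]
samples = [[0.0, 1.0], [0.0, 10.0], [0.0, 2.0]]
t_peak, cv_curve = peak_temperature(temps, samples)
print(t_peak, cv_curve)
```

In practice, replica exchange data are usually reweighted across temperatures before computing such a curve; the per-replica variance shown here is only the simplest estimator.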
  17. Biopolymers. 2021 Sep 09. e23471
      Extant fold-switching proteins remodel their secondary structures and change their functions in response to cellular stimuli, regulating biological processes and affecting human health. Despite their biological importance, these proteins remain understudied. Predictive methods are needed to expedite the process of discovering and characterizing more of these shapeshifting proteins. Most previous approaches require a solved structure or all-atom simulations, greatly constraining their use. Here, we propose a high-throughput sequence-based method for predicting extant fold switchers that transition from α-helix in one conformation to β-strand in the other. This method leverages two previous observations: (a) α-helix ↔ β-strand prediction discrepancies from JPred4 are a robust predictor of fold switching, and (b) the fold-switching regions (FSRs) of some extant fold switchers have different secondary structure propensities when expressed by themselves (isolated FSRs) than when expressed within the context of their parent protein (contextualized FSRs). Combining these two observations, we ran JPred4 on 99 fold-switching proteins and found strong correspondence between predicted and experimentally observed α-helix ↔ β-strand discrepancies. To test the overall robustness of this finding, we randomly selected regions of proteins not expected to switch folds (single-fold proteins) and found significantly fewer predicted α-helix ↔ β-strand discrepancies. Combining these discrepancies with the overall percentage of predicted secondary structure, we developed a classifier to identify extant fold switchers (Matthews correlation coefficient of .71). Although this classifier had a high false-negative rate (7/17), its false-positive rate was very low (2/136), suggesting that it can be used to predict a subset of extant fold switchers from a multitude of available genomic sequences.
    Keywords:  bioinformatics; fold-switching proteins; metamorphic proteins; protein folding
    DOI:  https://doi.org/10.1002/bip.23471
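Editor's sketch: the Matthews correlation coefficient is computed from a confusion matrix as MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The example below assumes that the quoted rates, 7/17 false negatives and 2/136 false positives, describe a single test set with 17 fold switchers and 136 single-fold proteins. That reading is an assumption, and the coefficient reported in the abstract may come from a different evaluation, so the value printed here need not match it.

```python
import math

def matthews_cc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when any marginal sum is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Counts derived from the abstract under the single-test-set assumption
fn, positives = 7, 17     # false negatives out of assumed fold switchers
fp, negatives = 2, 136    # false positives out of assumed single-fold proteins
tp = positives - fn
tn = negatives - fp

mcc = matthews_cc(tp, fp, fn, tn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} MCC={mcc:.2f}")
```

Because MCC uses all four cells of the confusion matrix, it stays informative on an imbalanced set like this one, where the negatives outnumber the positives eight to one.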