Cell. 2025 Nov 12. pii: S0092-8674(25)01191-2. [Epub ahead of print]
Intrinsically disordered regions (IDRs) of proteins are defined by molecular grammars, which refer to IDR-specific non-random amino acid compositions and non-random patterning of distinct pairs of amino acid types. Here, we introduce grammars inferred using NARDINI+ (GIN) as a resource that uncovers IDR-specific and IDRome-spanning grammars. Using GIN-enabled analyses, we find that specific IDR features and GIN clusters are associated with distinct biological processes, intracellular localization preferences, specialized molecular functions, and functionalization as assessed by cellular fitness correlations. IDRs with exceptional grammars, defined as sequences with high-scoring non-random features, are harbored in proteins and complexes that enable spatial and temporal sorting of biochemical activities within the nucleus. Overall, GIN can be used to extract sequence-function relationships of individual IDRs or clusters of IDRs, to redesign extant IDRs or design de novo IDRs, to perform evolutionary analyses through the lens of molecular grammars and GIN clusters, and to make sense of IDR-specific disease-associated mutations.
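To make the notion of non-random composition and non-random patterning of residue pairs concrete, the following is a minimal Python sketch. It is not the NARDINI+ procedure and does not reproduce GIN. The function names (`composition`, `adjacency_count`, `patterning_zscore`), the nearest-neighbor adjacency statistic, and the composition-preserving shuffle null model are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def composition(seq):
    """Return the fraction of each amino acid type in the sequence."""
    counts = Counter(seq)
    n = len(seq)
    return {aa: c / n for aa, c in counts.items()}


def adjacency_count(seq, group_a, group_b):
    """Count neighboring positions pairing a group_a residue with a group_b residue."""
    return sum(
        1
        for x, y in zip(seq, seq[1:])
        if (x in group_a and y in group_b) or (x in group_b and y in group_a)
    )


def patterning_zscore(seq, group_a, group_b, n_shuffles=1000, seed=0):
    """Z-score of the observed adjacency count against shuffled sequences.

    Illustrative assumption: the null model is a set of random shuffles that
    preserve the amino acid composition of the input sequence.
    """
    rng = random.Random(seed)
    observed = adjacency_count(seq, group_a, group_b)
    residues = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        null.append(adjacency_count(residues, group_a, group_b))
    mean = sum(null) / len(null)
    std = (sum((v - mean) ** 2 for v in null) / len(null)) ** 0.5
    if std == 0:
        return 0.0  # no variation in the null, so the score is undefined; report 0
    return (observed - mean) / std


if __name__ == "__main__":
    positive, negative = {"K", "R"}, {"D", "E"}
    blocky = "KKKKKKKKEEEEEEEE"       # oppositely charged residues segregated
    alternating = "KEKEKEKEKEKEKEKE"  # oppositely charged residues well mixed
    print(composition(blocky))
    print(patterning_zscore(blocky, positive, negative))
    print(patterning_zscore(alternating, positive, negative))
```

Under these assumptions, the two toy sequences have identical compositions but opposite patterning: the segregated sequence has fewer positive-negative neighbors than its shuffles (negative z-score), and the alternating sequence has more (positive z-score).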
Keywords: RNA polymerase; biomolecular condensates; cancer; intrinsically disordered regions; molecular grammars; subcellular localization; transcriptional regulation