bims-crepig Biomed News
on Chromatin regulation and epigenetics in cell fate and cancer
Issue of 2023‒04‒02
twenty-one papers selected by
Connor Rogerson
University of Cambridge


  1. Nat Commun. 2023 Mar 29. 14(1): 1753
      During meiotic prophase I, spermatocytes must balance transcriptional activation with homologous recombination and chromosome synapsis, biological processes requiring extensive changes to chromatin state. We explored the interplay between chromatin accessibility and transcription through prophase I of mammalian meiosis by measuring genome-wide patterns of chromatin accessibility, nascent transcription, and processed mRNA. We find that Pol II is loaded on chromatin and maintained in a paused state early during prophase I. In later stages, paused Pol II is released in a coordinated transcriptional burst mediated by the transcription factors A-MYB and BRDT, resulting in a ~3-fold increase in transcription. Transcriptional activity is temporally and spatially segregated from key steps of meiotic recombination: double strand breaks show evidence of chromatin accessibility earlier during prophase I and at distinct loci from those undergoing transcriptional activation, despite shared chromatin marks. Our findings reveal mechanisms underlying chromatin specialization in either transcription or recombination in meiotic cells.
    DOI:  https://doi.org/10.1038/s41467-023-37408-w
  2. Nucleic Acids Res. 2023 Mar 29. pii: gkad227. [Epub ahead of print]
      Many transcription factors (TFs) localize in nuclear clusters of locally increased concentrations, but how TF clustering is regulated and how it influences gene expression is not well understood. Here, we use quantitative microscopy in living cells to study the regulation and function of clustering of the budding yeast TF Gal4 in its endogenous context. Our results show that Gal4 forms clusters that overlap with the GAL loci. Cluster number, density and size are regulated in different growth conditions by the Gal4-inhibitor Gal80 and Gal4 concentration. Gal4 truncation mutants reveal that Gal4 clustering is facilitated by, but does not completely depend on DNA binding and intrinsically disordered regions. Moreover, we discover that clustering acts as a double-edged sword: self-interactions aid TF recruitment to target genes, but recruited Gal4 molecules that are not DNA-bound do not contribute to, and may even inhibit, transcription activation. We propose that cells need to balance the different effects of TF clustering on target search and transcription activation to facilitate proper gene expression.
    DOI:  https://doi.org/10.1093/nar/gkad227
  3. Cell Rep. 2023 Mar 29. pii: S2211-1247(23)00291-7. [Epub ahead of print] 42(4): 112280
      In metazoan cells, DNA replication initiates from thousands of genomic loci scattered throughout the genome called DNA replication origins. Origins are strongly associated with euchromatin, particularly open genomic regions such as promoters and enhancers. However, over a third of transcriptionally silent genes are associated with DNA replication initiation. Most of these genes are bound and repressed by the Polycomb repressive complex-2 (PRC2) through the repressive H3K27me3 mark. This is the strongest overlap observed for a chromatin regulator with replication origin activity. Here, we asked whether Polycomb-mediated gene repression is functionally involved in recruiting DNA replication origins to transcriptionally silent genes. We show that the absence of EZH2, the catalytic subunit of PRC2, results in increased DNA replication initiation, specifically in the vicinity of EZH2 binding sites. The increase in DNA replication initiation does not correlate with transcriptional de-repression or the acquisition of activating histone marks but does correlate with loss of H3K27me3 from bivalent promoters.
    Keywords:  CP: Molecular biology; DNA replication initiation; DNA replication origins; EZH2 knockout; H3K27me3; PRC2; SNS-seq; bivalent promoters; chromatin; polycomb
    DOI:  https://doi.org/10.1016/j.celrep.2023.112280
  4. Nat Biotechnol. 2023 Mar 27.
      Transcription factor binding across the genome is regulated by DNA sequence and chromatin features. However, it is not yet possible to quantify the impact of chromatin context on transcription factor binding affinities. Here, we report a method called binding affinities to native chromatin by sequencing (BANC-seq) to determine absolute apparent binding affinities of transcription factors to native DNA across the genome. In BANC-seq, a concentration range of a tagged transcription factor is added to isolated nuclei. Concentration-dependent binding is then measured per sample to quantify apparent binding affinities across the genome. BANC-seq adds a quantitative dimension to transcription factor biology, which enables stratification of genomic targets based on transcription factor concentration and prediction of transcription factor binding sites under non-physiological conditions, such as disease-associated overexpression of (onco)genes. Notably, whereas consensus DNA binding motifs for transcription factors are important to establish high-affinity binding sites, these motifs are not always strictly required to generate nanomolar-affinity interactions in the genome.
    DOI:  https://doi.org/10.1038/s41587-023-01715-w
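
      The abstract above describes measuring concentration-dependent binding to quantify apparent affinities. As a rough illustration only, the sketch below fits a single-site saturation curve, signal = s_max * c / (Kd + c), to a titration series at one locus. The model choice, the function name fit_apparent_kd, the grid-search fitting, and the synthetic numbers are all assumptions made for this example; they are not taken from the BANC-seq paper or its software.

```python
def fit_apparent_kd(concs, signals, kd_min=1e-2, kd_max=1e5, steps=2000):
    """Fit signal = s_max * c / (kd + c) to a titration series.

    Hypothetical helper: scans kd on a log-spaced grid and, for each
    candidate kd, solves for the least-squares amplitude s_max in
    closed form. Returns the (kd, s_max) pair with the smallest
    sum of squared errors.
    """
    best = None
    for i in range(steps + 1):
        # Log-spaced candidate between kd_min and kd_max.
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        # Predicted fractional occupancy at each concentration.
        occ = [c / (kd + c) for c in concs]
        denom = sum(o * o for o in occ)
        if denom == 0:
            continue
        # Least-squares amplitude for this kd.
        s_max = sum(o * s for o, s in zip(occ, signals)) / denom
        sse = sum((s - s_max * o) ** 2 for o, s in zip(occ, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, s_max)
    _, kd, s_max = best
    return kd, s_max


# Synthetic, noise-free example (values are made up for illustration).
concs = [1, 3, 10, 30, 100, 300, 1000]  # added factor concentration, nM
true_kd, true_smax = 50.0, 200.0
signals = [true_smax * c / (true_kd + c) for c in concs]

kd, s_max = fit_apparent_kd(concs, signals)
print(f"apparent Kd ~ {kd:.1f} nM, s_max ~ {s_max:.1f}")
```

      Running such a fit per genomic site would give one apparent Kd per site, which is the kind of quantity that allows sites to be ranked by the concentration at which they become occupied. Real data would need normalization and a noise-aware fit, which this sketch omits.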
  5. Sci Adv. 2023 Mar 31. 9(13): eadg1123
      Biomolecular condensates participate in the regulation of gene transcription, yet the relationship between nuclear condensation and transcriptional activation remains elusive. Here, we devised a biotinylated CRISPR-dCas9-based optogenetic method, light-activated macromolecular phase separation (LAMPS), to enable inducible formation, affinity purification, and multiomic dissection of nuclear condensates at the targeted genomic loci. LAMPS-induced condensation at enhancers and promoters activates endogenous gene transcription by chromatin reconfiguration, causing increased chromatin accessibility and de novo formation of long-range chromosomal loops. Proteomic profiling of light-induced condensates by dCas9-mediated affinity purification uncovers multivalent interaction-dependent remodeling of macromolecular composition, resulting in the selective enrichment of transcriptional coactivators and chromatin structure proteins. Our findings support a model whereby the formation of nuclear condensates at native genomic loci reconfigures chromatin architecture and multiprotein assemblies to modulate gene transcription. Hence, LAMPS facilitates mechanistic interrogation of the relationship between nuclear condensation, genome structure, and gene transcription in living cells.
    DOI:  https://doi.org/10.1126/sciadv.adg1123
  6. Nat Commun. 2023 Mar 30. 14(1): 1787
      MYC is a well characterized oncogenic transcription factor in prostate cancer, and CTCF is the main architectural protein of three-dimensional genome organization. However, the functional link between the two master regulators has not been reported. In this study, we find that MYC rewires prostate cancer chromatin architecture by interacting with CTCF protein. Through combining the H3K27ac, AR and CTCF HiChIP profiles with CRISPR deletion of a CTCF site upstream of MYC gene, we show that MYC activation leads to profound changes of CTCF-mediated chromatin looping. Mechanistically, MYC colocalizes with CTCF at a subset of genomic sites, and enhances CTCF occupancy at these loci. Consequently, the CTCF-mediated chromatin looping is potentiated by MYC activation, resulting in the disruption of enhancer-promoter looping at neuroendocrine lineage plasticity genes. Collectively, our findings define the function of MYC as a CTCF co-factor in three-dimensional genome organization.
    DOI:  https://doi.org/10.1038/s41467-023-37544-3
  7. Cell Rep. 2023 Mar 30. pii: S2211-1247(23)00334-0. [Epub ahead of print] 42(4): 112323
      Special AT-rich sequence binding protein 1 (SATB1) has long been proposed to act as a global chromatin loop organizer in T cells. However, the exact functions of SATB1 in spatial genome organization remain elusive. Here we show that the depletion of SATB1 in human and murine T cells leads to transcriptional dysregulation for genes involved in T cell activation, as well as alterations of 3D genome architecture at multiple levels, including compartments, topologically associating domains, and loops. Importantly, SATB1 extensively colocalizes with CTCF throughout the genome. Depletion of SATB1 leads to increased chromatin contacts among and across the SATB1/CTCF co-occupied sites, thereby affecting the transcription of critical regulators of T cell activation. The loss of SATB1 does not affect CTCF occupancy but significantly reduces the retention of CTCF in the nuclear matrix. Collectively, our data show that SATB1 contributes to 3D genome organization by constraining chromatin topology surrounding CTCF-binding sites.
    Keywords:  3D genome architecture; CP: Molecular biology; CTCF; SATB1; T cell activation; nuclear matrix; transcriptional regulation
    DOI:  https://doi.org/10.1016/j.celrep.2023.112323
  8. Development. 2023 Mar 30. pii: dev.201229. [Epub ahead of print]
      Transcriptional networks governing cardiac precursor cell (CPC) specification are incompletely understood due in part to limitations in distinguishing CPCs from non-cardiac mesoderm in early gastrulation. We leveraged detection of early cardiac lineage transgenes within a granular single cell transcriptomic time course of mouse embryos to identify emerging CPCs and describe their transcriptional profiles. Mesp1, a transiently-expressed mesodermal transcription factor (TF), is canonically described as an early regulator of cardiac specification. However, we observed perdurance of CPC transgene-expressing cells in Mesp1 mutants, albeit mis-localized, prompting us to investigate the scope of Mesp1's role in CPC emergence and differentiation. Mesp1 mutant CPCs failed to robustly activate markers of cardiomyocyte maturity and critical cardiac TFs, yet they exhibited transcriptional profiles resembling cardiac mesoderm progressing towards cardiomyocyte fates. Single cell chromatin accessibility analysis defined a Mesp1-dependent developmental breakpoint in cardiac lineage progression at a shift from mesendoderm transcriptional networks to those necessary for cardiac patterning and morphogenesis. These results reveal Mesp1-independent aspects of early CPC specification and underscore a Mesp1-dependent regulatory landscape required for progression through cardiogenesis.
    Keywords:  cardiac development; cardiac specification; gastrulation; gene regulation; mouse embryo
    DOI:  https://doi.org/10.1242/dev.201229
  9. Mol Cell. 2023 Mar 22. pii: S1097-2765(23)00166-1. [Epub ahead of print]
      Enhancer clusters overlapping disease-associated mutations in Pierre Robin sequence (PRS) patients regulate SOX9 expression at genomic distances over 1.25 Mb. We applied optical reconstruction of chromatin architecture (ORCA) imaging to trace 3D locus topology during PRS-enhancer activation. We observed pronounced changes in locus topology between cell types. Subsequent analysis of single-chromatin fiber traces revealed that these ensemble-average differences arise through changes in the frequency of commonly sampled topologies. We further identified two CTCF-bound elements, internal to the SOX9 topologically associating domain, which promote stripe formation, are positioned near the domain's 3D geometric center, and bridge enhancer-promoter contacts in a series of chromatin loops. Ablation of these elements results in diminished SOX9 expression and altered domain-wide contacts. Polymer models with uniform loading across the domain and frequent cohesin collisions recapitulate this multi-loop, centrally clustered geometry. Together, we provide mechanistic insights into architectural stripe formation and gene regulation over ultra-long genomic ranges.
    Keywords:  3D genome architecture; CTCF; enhancer; gene regulation; loop extrusion; stripe-associated structural element
    DOI:  https://doi.org/10.1016/j.molcel.2023.03.009
  10. Genome Biol. 2023 Mar 29. 24(1): 61
      Epigenetic modifications of histones are associated with development and pathogenesis of disease. Existing approaches cannot provide insights into long-range interactions and represent the average chromatin state. Here we describe BIND&MODIFY, a method using long-read sequencing for profiling histone modifications and transcription factors on individual DNA fibers. We use recombinant fused protein A-M.EcoGII to tether methyltransferase M.EcoGII to protein binding sites to label neighboring regions by methylation. Aggregated BIND&MODIFY signal matches bulk ChIP-seq and CUT&TAG. BIND&MODIFY can simultaneously measure histone modification status, transcription factor binding, and CpG 5mC methylation at single-molecule resolution and also quantifies correlation between local and distal elements.
    Keywords:  CTCF; CpG methylation; Epigenetics; H3K27me3; Histone modification; Methyltransferase; m6A
    DOI:  https://doi.org/10.1186/s13059-023-02896-y
  11. Elife. 2023 Mar 30. pii: e83810. [Epub ahead of print] 12
      Transcription by RNA Polymerase II (Pol II) is initiated by the hierarchical assembly of the Pre-Initiation Complex onto promoter DNA. Decades of research have shown that the TATA-box binding protein (TBP) is essential for Pol II loading and initiation. Here, we report instead that acute depletion of TBP in mouse embryonic stem cells has no global effect on ongoing Pol II transcription. In contrast, acute TBP depletion severely impairs RNA Polymerase III initiation. Furthermore, Pol II transcriptional induction occurs normally upon TBP depletion. This TBP-independent transcription mechanism is not due to a functional redundancy with the TBP paralog TRF2, though TRF2 also binds to promoters of transcribed genes. Rather, we show that the TFIID complex can form and, despite having reduced TAF4 and TFIIA binding when TBP is depleted, the Pol II machinery is sufficiently robust in sustaining TBP-independent transcription.
    Keywords:  chromosomes; gene expression; mouse
    DOI:  https://doi.org/10.7554/eLife.83810
  12. Elife. 2023 Mar 27. pii: e79380. [Epub ahead of print] 12
      Histone acetylation is a pivotal epigenetic modification that controls chromatin structure and regulates gene expression. It plays an essential role in modulating zygotic transcription and cell lineage specification of developing embryos. While the outcomes of many inductive signals have been described to require enzymatic activities of histone acetyltransferases and deacetylases (HDACs), the mechanisms by which HDACs confine the utilization of the zygotic genome remain to be elucidated. Here, we show that histone deacetylase 1 (Hdac1) progressively binds to the zygotic genome from mid blastula and onward. The recruitment of Hdac1 to the genome at blastula is instructed maternally. Cis-regulatory modules (CRMs) bound by Hdac1 possess epigenetic signatures underlying distinct functions. We highlight a dual function model of Hdac1 where Hdac1 not only represses gene expression by sustaining a histone hypoacetylation state on inactive chromatin, but also maintains gene expression through participating in dynamic histone acetylation-deacetylation cycles on active chromatin. As a result, Hdac1 maintains differential histone acetylation states of bound CRMs between different germ layers and reinforces the transcriptional program underlying cell lineage identities, both in time and space. Taken together, our study reveals a comprehensive role for Hdac1 during early vertebrate embryogenesis.
    Keywords:  developmental biology; xenopus
    DOI:  https://doi.org/10.7554/eLife.79380
  13. Mol Cell. 2023 Mar 24. pii: S1097-2765(23)00163-6. [Epub ahead of print]
      Nucleosomes drastically limit transcription factor (TF) occupancy, while pioneer transcription factors (PFs) somehow circumvent this nucleosome barrier. In this study, we compare nucleosome binding of two conserved S. cerevisiae basic helix-loop-helix (bHLH) TFs, Cbf1 and Pho4. A cryo-EM structure of Cbf1 in complex with the nucleosome reveals that the Cbf1 HLH region can electrostatically interact with exposed histone residues within a partially unwrapped nucleosome. Single-molecule fluorescence studies show that the Cbf1 HLH region facilitates efficient nucleosome invasion by slowing its dissociation rate relative to DNA through interactions with histones, whereas the Pho4 HLH region does not. In vivo studies show that this enhanced binding provided by the Cbf1 HLH region enables nucleosome invasion and ensuing repositioning. These structural, single-molecule, and in vivo studies reveal the mechanistic basis of dissociation rate compensation by PFs and how this translates to facilitating chromatin opening inside cells.
    Keywords:  chromatin biology; cryoelectron microscopy single-particle analysis; dissociation rate compensation mechanism; gene regulation; nucleosome-depleted regions; pioneer transcription factors; single-molecule measurement
    DOI:  https://doi.org/10.1016/j.molcel.2023.03.006
  14. Sci Adv. 2023 Mar 31. 9(13): eabo3789
      Cell fate transitions observed in embryonic development involve changes in three-dimensional genomic organization that provide proper lineage specification. Whether similar events occur within tumor cells and contribute to cancer evolution remains largely unexplored. We modeled this process in the pediatric cancer Ewing sarcoma and investigated high-resolution looping and large-scale nuclear conformation changes associated with the oncogenic fusion protein EWS-FLI1. We show that chromatin interactions in tumor cells are dominated by highly connected looping hubs centered on EWS-FLI1 binding sites, which directly control the activity of linked enhancers and promoters to establish oncogenic expression programs. Conversely, EWS-FLI1 depletion led to the disassembly of these looping networks and a widespread nuclear reorganization through the establishment of new looping patterns and large-scale compartment configuration matching those observed in mesenchymal stem cells, a candidate Ewing sarcoma progenitor. Our data demonstrate that major architectural features of nuclear organization in cancer cells can depend on single oncogenes and are readily reversed to reestablish latent differentiation programs.
    DOI:  https://doi.org/10.1126/sciadv.abo3789
  15. Genes Dev. 2023 Mar 29.
      Individual elements within a superenhancer can act in a cooperative or temporal manner, but the underlying mechanisms remain obscure. We recently identified an Irf8 superenhancer, within which different elements act at distinct stages of type 1 classical dendritic cell (cDC1) development. The +41-kb Irf8 enhancer is required for pre-cDC1 specification, while the +32-kb Irf8 enhancer acts to support subsequent cDC1 maturation. Here, we found that compound heterozygous Δ32/Δ41 mice, lacking the +32- and +41-kb enhancers on different chromosomes, show normal pre-cDC1 specification but, surprisingly, completely lack mature cDC1 development, suggesting cis dependence of the +32-kb enhancer on the +41-kb enhancer. Transcription of the +32-kb Irf8 enhancer-associated long noncoding RNA (lncRNA) Gm39266 is also dependent on the +41-kb enhancer. However, cDC1 development in mice remained intact when Gm39266 transcripts were eliminated by CRISPR/Cas9-mediated deletion of lncRNA promoters and when transcription across the +32-kb enhancer was blocked by premature polyadenylation. We showed that chromatin accessibility and BATF3 binding at the +32-kb enhancer were dependent on a functional +41-kb enhancer located in cis. Thus, the +41-kb Irf8 enhancer controls the subsequent activation of the +32-kb Irf8 enhancer in a manner that is independent of associated lncRNA transcription.
    Keywords:  IRF8; dendritic cell development; enhancer cooperation; enhancer-associated lncRNA; superenhancer
    DOI:  https://doi.org/10.1101/gad.350339.122
  16. Nat Protoc. 2023 Mar 29.
      Micro Capture-C (MCC) is a chromatin conformation capture (3C) method for visualizing reproducible three-dimensional contacts of specified regions of the genome at base pair resolution. 3C methods are an established family of techniques that use proximity ligation to assay the topology of chromatin. MCC can generate data at substantially higher resolution than previous techniques through multiple refinements of the 3C method. Using a sequence-agnostic nuclease, the maintenance of cellular integrity and full sequencing of the ligation junctions, MCC achieves subnucleosomal levels of resolution, which can be used to reveal transcription factor binding sites analogous to DNase I footprinting. Gene-dense regions, close-range enhancer-promoter contacts, individual enhancers within super-enhancers and multiple other types of loci or regulatory regions that were previously challenging to assay with conventional 3C techniques are readily observed using MCC. MCC requires training in common molecular biology techniques and bioinformatics to perform the experiment and analyze the data. The protocol can be expected to be completed in a 3-week timeframe for experienced molecular biologists.
    DOI:  https://doi.org/10.1038/s41596-023-00817-8
  17. Nat Commun. 2023 Mar 28. 14(1): 1736
      Arabidopsis telomeric repeat binding factors (TRBs) can bind telomeric DNA sequences to protect telomeres from degradation. TRBs can also recruit Polycomb Repressive Complex 2 (PRC2) to deposit tri-methylation of H3 lysine 27 (H3K27me3) over certain target loci. Here, we demonstrate that TRBs also associate and colocalize with JUMONJI14 (JMJ14) and trigger H3K4me3 demethylation at some loci. The trb1/2/3 triple mutant and the jmj14-1 mutant show an increased level of H3K4me3 over TRB and JMJ14 binding sites, resulting in up-regulation of their target genes. Furthermore, tethering TRBs to the promoter region of genes with an artificial zinc finger (TRB-ZF) successfully triggers target gene silencing, as well as H3K27me3 deposition and H3K4me3 removal. Interestingly, JMJ14 is predominantly recruited to ZF off-target sites with low levels of H3K4me3, which is accompanied by TRB-ZF-triggered H3K4me3 removal at these loci. These results suggest that TRB proteins coordinate PRC2 and JMJ14 activities to repress target genes via H3K27me3 deposition and H3K4me3 removal.
    DOI:  https://doi.org/10.1038/s41467-023-37263-9
  18. Mol Cancer Res. 2023 Mar 28. pii: MCR-22-0745. [Epub ahead of print]
      Mutations in Fms-like tyrosine kinase 3 (FLT3) are common drivers in acute myeloid leukemia (AML), yet FLT3 inhibitors only provide modest clinical benefit. Prior work has shown that inhibitors of lysine-specific demethylase 1 (LSD1) enhance kinase inhibitor activity in AML. Here we show that combined LSD1 and FLT3 inhibition induces synergistic cell death in FLT3-mutant AML. Multi-omic profiling revealed that the drug combination disrupts STAT5, LSD1, and GFI1 binding at the MYC blood super-enhancer, suppressing super-enhancer accessibility as well as MYC expression and activity. The drug combination simultaneously results in the accumulation of repressive H3K9me1 methylation, an LSD1 substrate, at MYC target genes. We validated these findings in 72 primary AML samples, with nearly every sample demonstrating synergistic responses to the drug combination. Collectively, these studies reveal how epigenetic therapies augment the activity of kinase inhibitors in FLT3-ITD AML. Implications: This work establishes the synergistic efficacy of combined FLT3 and LSD1 inhibition in FLT3-ITD AML by disrupting STAT5 and GFI1 binding at the MYC blood-specific super-enhancer complex.
    DOI:  https://doi.org/10.1158/1541-7786.MCR-22-0745
  19. Nat Biotechnol. 2023 Mar 27.
      Metacells are cell groupings derived from single-cell sequencing data that represent highly granular, distinct cell states. Here we present single-cell aggregation of cell states (SEACells), an algorithm for identifying metacells that overcome the sparsity of single-cell data while retaining heterogeneity obscured by traditional cell clustering. SEACells outperforms existing algorithms in identifying comprehensive, compact and well-separated metacells in both RNA and assay for transposase-accessible chromatin (ATAC) modalities across datasets with discrete cell types and continuous trajectories. We demonstrate the use of SEACells to improve gene-peak associations, compute ATAC gene scores and infer the activities of critical regulators during differentiation. Metacell-level analysis scales to large datasets and is particularly well suited for patient cohorts, where per-patient aggregation provides more robust units for data integration. We use our metacells to reveal expression dynamics and gradual reconfiguration of the chromatin landscape during hematopoietic differentiation and to uniquely identify CD4 T cell differentiation and activation states associated with disease onset and severity in a Coronavirus Disease 2019 (COVID-19) patient cohort.
    DOI:  https://doi.org/10.1038/s41587-023-01716-9
  20. Genome Biol. 2023 Mar 27. 24(1): 56
      BACKGROUND: The largest sequence-based models of transcription control to date are obtained by predicting genome-wide gene regulatory assays across the human genome. This setting is fundamentally correlative, as those models are exposed during training solely to the sequence variation between human genes that arose through evolution, questioning the extent to which those models capture genuine causal signals.
    RESULTS: Here we confront predictions of state-of-the-art models of transcription regulation against data from two large-scale observational studies and five deep perturbation assays. The most advanced of these sequence-based models, Enformer, by and large, captures causal determinants of human promoters. However, models fail to capture the causal effects of enhancers on expression, notably in medium to long distances and particularly for highly expressed promoters. More generally, the predicted impact of distal elements on gene expression predictions is small and the ability to correctly integrate long-range information is significantly more limited than the receptive fields of the models suggest. This is likely caused by the escalating class imbalance between actual and candidate regulatory elements as distance increases.
    CONCLUSIONS: Our results suggest that sequence-based models have advanced to the point that in silico study of promoter regions and promoter variants can provide meaningful insights and we provide practical guidance on how to use them. Moreover, we foresee that it will require significantly more and particularly new kinds of data to train models accurately accounting for distal elements.
    Keywords:  Deep learning; Enhancer; Gene expression; Promoter; Transcription; Variant effect
    DOI:  https://doi.org/10.1186/s13059-023-02899-9
  21. Mol Cell. 2023 Mar 16. pii: S1097-2765(23)00159-4. [Epub ahead of print]
      The expansion of introns within mammalian genomes poses a challenge for the production of full-length messenger RNAs (mRNAs), with increasing evidence that these long AT-rich sequences present obstacles to transcription. Here, we investigate RNA polymerase II (RNAPII) elongation at high resolution in mammalian cells and demonstrate that RNAPII transcribes faster across introns. Moreover, we find that this acceleration requires the association of U1 snRNP (U1) with the elongation complex at 5' splice sites. The role of U1 to stimulate elongation rate through introns reduces the frequency of both premature termination and transcriptional arrest, thereby dramatically increasing RNA production. We further show that changes in RNAPII elongation rate due to AT content and U1 binding explain previous reports of pausing or termination at splice junctions and the edge of CpG islands. We propose that U1-mediated acceleration of elongation has evolved to mitigate the risks that long AT-rich introns pose to transcript completion.
    Keywords:  CpG island; RNA polymerase II; U1 snRNP; co-transcriptional splicing; elongation factors; elongation rate; long genes; nascent RNA; sequence content; transcription regulation
    DOI:  https://doi.org/10.1016/j.molcel.2023.03.002