Although many driver mutations are thought to promote carcinogenesis via abnormal splicing, the landscape of splicing-associated variants (SAVs) remains unknown due to the complexity of splicing abnormalities. Here, we developed a statistical framework to systematically identify SAVs that disrupt or newly create splice site motifs and applied it to matched whole-exome and transcriptome sequencing data from 8,976 samples across 31 cancer types, generating a catalog of 14,438 SAVs. This large collection of SAVs enabled us to characterize their genomic features, underlying mutational processes, and influence on cancer driver genes. In fact, ∼50% of the identified SAVs either disrupted noncanonical splice sites (non-GT-AG dinucleotides), including the third and fifth intronic bases of donor sites, or newly created splice sites. Mutational signature analysis revealed that tobacco smoking is more strongly associated with SAVs, whereas ultraviolet exposure has less impact. SAVs were remarkably enriched in cancer-related genes, and as many as 14.7% of samples harbored at least one SAV affecting these genes, particularly tumor suppressors. In addition to intron retention, whose association with tumor suppressor inactivation has been previously reported, exon skipping and alternative splice site usage caused by SAVs frequently affected tumor suppressors. Finally, we described high-resolution distributions of SAVs along the gene and their splicing outcomes in commonly disrupted genes, including TP53, PIK3R1, GATA3, and CDKN2A, offering genetic clues for understanding their functional properties. Collectively, our findings delineate a comprehensive portrait of SAVs and provide novel insights into transcriptional deregulation in cancer.