Other Variant Based Features

Summary


Distance From a Splice Junction

Previous studies have shown that several features, including ESE density, evolutionary constraint and codon bias, appear to be affected by proximity to the splice junctions.

This circumstantial evidence supports the view that the boundaries of exons are splicing regulatory 'hotspots' containing elements that are more critical to splicing than those located towards the center of the exon, possibly because of their proximity to the splice machinery positioned at the splice junctions.

When we compared known SAVs with HapMap SNPs (hSNPs), we found an enrichment of SAVs in regions at the peripheries of the exon, closer to the splice junctions (P=0.005), suggesting that SAVs may act by affecting regulatory elements located in these splicing 'hotspots'.

Nevertheless, over a quarter of the SAVs are located within the central sections of the exon, suggesting that while variants located at the peripheries of the exon are likely to have the greatest effect on splicing, other elements important for splicing may be found at positions across the exon.

As SAVs tend to be enriched at the boundaries of exons, we report the 'minimum distance from the splice junction': the shortest distance between the variant and either splice site. To make this value comparable between exons of widely different sizes, we also report the 'Minimum distance as proportion of exon length': the minimum distance divided by half the length of the exon (i.e. the distance from either splice junction to the midpoint of the exon). Values that are significantly lower than the mean for hSNPs are shaded red, while those significantly greater than the mean for hSNPs are shaded blue.
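The two distance values described above can be illustrated with a minimal sketch. It is not Skippy's own code: the function name, the inclusive coordinate convention, and the choice to count a variant on the first or last base of the exon as distance 0 are all assumptions made for illustration.

```python
def splice_distance_features(variant_pos, exon_start, exon_end):
    """Return (minimum distance, minimum distance as proportion of exon length).

    Assumptions (not taken from Skippy): coordinates are inclusive and on the
    same strand-independent scale, and a variant on the first or last exonic
    base has distance 0 from the adjacent splice junction.
    """
    if not exon_start <= variant_pos <= exon_end:
        raise ValueError("variant must lie within the exon")

    # Distance to each exon boundary; the smaller one is reported.
    dist_to_start = variant_pos - exon_start
    dist_to_end = exon_end - variant_pos
    min_dist = min(dist_to_start, dist_to_end)

    # Normalize by half the exon length so exons of different sizes compare.
    exon_length = exon_end - exon_start + 1
    proportion = min_dist / (exon_length / 2)
    return min_dist, proportion


# Example: a 100-base exon (positions 100..199) with a variant at 110.
example_dist, example_prop = splice_distance_features(110, 100, 199)
```

Under these assumptions, a proportion near 0 places the variant at the edge of the exon and a value near 1 places it close to the midpoint.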


Regulatory Constraint (RC) Score

Splicing, and the exonic splicing regulatory elements that control it, are generally conserved across mammals. Sequences important for splicing should therefore be detectable by their greater evolutionary conservation, as is certainly the case for intronic factors.

For splicing factors located in coding sequence, the use of conservation to measure constraint is not as clear-cut. To be detectable, the constraint on the sequence due to splicing has to be decoupled from pre-existing protein-coding constraint. One solution is to measure conservation at synonymous positions, which are normally considered to be neutrally evolving; indeed, a number of studies have demonstrated that exonic splicing regulatory elements increase selective constraint on synonymous positions.

In Skippy, we created a scoring function, the Regulatory Constraint (RC) Score, which measures how likely the observed level of conservation in a region surrounding the variant is, given the underlying codons. For any variant position in a coding exon, a region of 5 bases flanking the variant is extracted from 4-way mammalian alignments (human, mouse, rat and dog; see Figure below). The triplet codon status of each column in the alignment is determined, and each fully conserved column is assigned a score that depends on this status and on how likely that position is to be conserved genome-wide, as given by the conservation likelihood matrix (see Figure below). Scores are summed across all columns, and the mean for the region is obtained by dividing the total score by the total number of columns.
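The scoring procedure can be sketched roughly as follows. This is an illustration only: the weights are hypothetical placeholders and not the conservation likelihood matrix used by Skippy, and "codon status" is simplified here to the position of the column within its codon (1, 2 or 3), which is an assumption. The function and variable names are likewise invented for this example.

```python
# HYPOTHETICAL weights, NOT Skippy's conservation likelihood matrix.
# Keyed by codon position; third positions are weighted higher here only to
# illustrate that conservation where it is less expected scores more.
HYPOTHETICAL_WEIGHTS = {1: 0.5, 2: 0.5, 3: 2.0}


def rc_score(alignment, codon_positions, weights=HYPOTHETICAL_WEIGHTS):
    """Mean per-column score over an alignment window around a variant.

    alignment       -- list of equal-length aligned sequences, one per species
    codon_positions -- codon position (1, 2 or 3) of each alignment column
    """
    n_cols = len(codon_positions)
    if n_cols == 0 or any(len(seq) != n_cols for seq in alignment):
        raise ValueError("alignment rows and codon_positions must match in length")

    total = 0.0
    for col, codon_pos in enumerate(codon_positions):
        bases = {seq[col].upper() for seq in alignment}
        # Only fully conserved columns (one identical base in every species)
        # contribute; gaps or ambiguous characters never count as conserved.
        if len(bases) == 1 and bases <= set("ACGT"):
            total += weights[codon_pos]

    # Mean over all columns in the window, conserved or not.
    return total / n_cols


# Toy 4-species window of three columns (one codon); the last column differs
# in the fourth species, so only codon positions 1 and 2 score.
toy_alignment = ["ATG", "ATG", "ATG", "ATA"]
toy_score = rc_score(toy_alignment, [1, 2, 3])
```

Dividing by the number of columns, rather than by the number of conserved columns, follows the description above: a window with few conserved columns receives a low mean score.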


In our study, we found that our set of SAVs has a significantly greater mean RC score than a distribution of randomly sampled sets of hSNPs of the same size (see Figure above).
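A comparison of this kind can be sketched as a simple resampling procedure. The text above does not state how significance was calculated, so the empirical p-value below, the number of resamples and all names are assumptions made for illustration rather than the method used in the study.

```python
import random
from statistics import mean


def resampled_mean_distribution(hsnp_scores, sample_size, n_resamples=1000, seed=0):
    """Means of randomly sampled hSNP sets, each the same size as the SAV set."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        mean(rng.sample(hsnp_scores, sample_size))  # sampling without replacement
        for _ in range(n_resamples)
    ]


def empirical_p_value(observed_mean, null_means):
    """One-sided empirical p-value: fraction of resampled means >= observed.

    The +1 terms avoid reporting a p-value of exactly zero.
    """
    exceed = sum(1 for m in null_means if m >= observed_mean)
    return (exceed + 1) / (len(null_means) + 1)


# Toy data only: invented hSNP scores and an invented SAV mean.
toy_hsnp_scores = [0.1 * i for i in range(50)]
toy_null = resampled_mean_distribution(toy_hsnp_scores, sample_size=5)
toy_p = empirical_p_value(10.0, toy_null)
```

With real data, `hsnp_scores` would hold the RC scores of the hSNPs and `sample_size` would be the number of SAVs in the set.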

It is important to note, however, that the RC score is influenced by other genomic contexts, which you should be aware of when interpreting it. In particular, the RC score is correlated with distance from a splice junction (see plot). This agrees with other studies showing that the edges of exons are generally better conserved than the interior, possibly due to splicing constraints.

In Skippy, we provide mean hSNP RC scores so that the score for your variant can be meaningfully compared with those of common polymorphisms. We control for the effect of distance from a splice junction by comparing the variant only to a distribution of hSNPs whose distance proportions from the splice junction are similar to that of the variant. Scores shaded red are greater than the mean hSNP score and scores shaded blue are lower than the mean hSNP score.
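The distance-matched comparison can be sketched as follows. How Skippy defines "similar distance proportions" is not stated here, so the fixed tolerance window, the function name and the data layout are assumptions made for illustration.

```python
from statistics import mean


def shade_rc_score(variant_score, variant_proportion, hsnps, tolerance=0.1):
    """Compare a variant's RC score with distance-matched hSNPs.

    hsnps     -- list of (distance_proportion, rc_score) pairs
    tolerance -- ASSUMED half-width of the matching window; not Skippy's value

    Returns (matched hSNP mean, shade), where shade is "red" if the variant
    scores above the mean, "blue" if below, and None otherwise.
    """
    # Keep only hSNPs at a similar relative distance from the splice junction.
    matched = [
        score
        for proportion, score in hsnps
        if abs(proportion - variant_proportion) <= tolerance
    ]
    if not matched:
        return None, None  # no comparable hSNPs, so no shading

    background = mean(matched)
    if variant_score > background:
        shade = "red"
    elif variant_score < background:
        shade = "blue"
    else:
        shade = None
    return background, shade


# Toy data only: the third hSNP lies near the exon centre and is excluded.
toy_hsnps = [(0.1, 1.0), (0.15, 2.0), (0.9, 10.0)]
toy_background, toy_shade = shade_rc_score(2.0, 0.1, toy_hsnps)
```

Matching on distance proportion before averaging keeps a variant near the exon edge from being compared with hSNPs from the less conserved exon interior.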