
ssSNPer: web interface to identify statistically similar SNPs (ssSNPs) in the HapMap
 

Please note: this web interface will typically analyse files up to ~10 Megabytes in size. This limit is primarily due to the web server restrictions we have in place: the server will time out before larger files have finished uploading. To run an ssSNPer analysis on larger files, please try the Linux command line ssSNPer [updated 29/03/07].

This web interface allows users to simply upload a file specifying a test SNP rs# and a file containing HapMap SNP genotype data to determine the number and location of SNPs in the surrounding region which are statistically similar and could therefore produce similar (including more significant) association results.

Statistical similarity is measured by the pairwise linkage disequilibrium (LD) r2 coefficient. The output first includes a graph providing an overview of the statistical similarity between the test SNP and surrounding HapMap SNPs. Below the graph is a list of the surrounding SNPs which have at least 50% similarity [i.e., r2>=0.5], sorted first from highest to lowest r2 and then according to map order [i.e., from pter to qter]. Map distances are given in bp from the test SNP. Clicking on an rs# opens the SNP's dbSNP Cluster Report in a new window. The final section reports the r2 values between the test SNP and all HapMap SNPs across the submitted region in map order; these data are suitable for plotting along with other map landmarks (e.g., genes, conserved regions, etc.).
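The ordering of the ssSNP list can be illustrated with a minimal Python sketch. It is not the ssSNPer code itself: the records, the rs# labels, the test SNP position and the use of a signed distance (negative towards pter) are all assumptions made for illustration.

```python
# Hypothetical records: (rs#, map position in bp, r2 with the test SNP).
TEST_SNP_POSITION = 1_000_000  # assumed position of the test SNP

snps = [
    ("rsA", 990_000, 0.62),
    ("rsB", 1_004_500, 1.00),
    ("rsC", 1_020_000, 0.31),
    ("rsD", 995_000, 1.00),
    ("rsE", 1_050_000, 0.85),
]


def similar_snps(records, test_position, threshold=0.5):
    """Return (rs#, distance in bp, r2) for SNPs with r2 >= threshold.

    Records are sorted from highest to lowest r2; ties are broken by
    map order (ascending position, i.e. pter to qter).
    """
    kept = [rec for rec in records if rec[2] >= threshold]
    kept.sort(key=lambda rec: (-rec[2], rec[1]))
    return [(rs, pos - test_position, r2) for rs, pos, r2 in kept]


result = similar_snps(snps, TEST_SNP_POSITION)
for rs, distance, r2 in result:
    print(f"{rs}\t{distance:+d}\t{r2:.2f}")
```

Here rsC is dropped because its r2 is below 0.5, and the two SNPs with r2 = 1.0 are listed in map order.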

SNPs with r2 values of 1.0 are "statistically indistinguishable" (siSNPs) (also previously termed "genetically indistinguishable", giSNPs) and will produce identical association results.  SNPs with r2 values >=0.8 can be considered to be of high enough similarity [i.e., SNP 1 has 80% power to predict the genotypes of SNP 2 and vice versa] to produce similar and perhaps more significant association results compared to the test SNP.  Therefore, researchers should consider investigating ssSNPs (especially with r2>=0.8) and their surrounding plausibly causative region(s) in addition to the region(s) initially implicated by the test SNP.
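For readers unfamiliar with the r2 coefficient, the following Python sketch computes it from the four two-SNP haplotype frequencies using the standard definition r2 = D^2 / [p(1-p)q(1-q)], where D = f11 - pq. The function name and the argument layout are assumptions chosen for illustration and are not part of ssSNPer.

```python
def r_squared(f11, f12, f21, f22):
    """Pairwise LD r2 from two-SNP haplotype frequencies.

    fij is the frequency of the haplotype carrying allele i at SNP 1
    and allele j at SNP 2; the four frequencies must sum to 1.
    """
    total = f11 + f12 + f21 + f22
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p = f11 + f12  # frequency of allele 1 at SNP 1
    q = f11 + f21  # frequency of allele 1 at SNP 2
    denominator = p * (1 - p) * q * (1 - q)
    if denominator == 0:
        raise ValueError("r2 is undefined when a SNP is monomorphic")
    d = f11 - p * q  # linkage disequilibrium coefficient D
    return d * d / denominator


# Only two haplotypes exist, so each SNP fully predicts the other.
indistinguishable = r_squared(0.3, 0.0, 0.0, 0.7)
# All four haplotypes equally frequent: the SNPs are independent.
independent = r_squared(0.25, 0.25, 0.25, 0.25)
print(round(indistinguishable, 6), round(independent, 6))
```

The first call corresponds to a pair of statistically indistinguishable SNPs (r2 of 1.0, up to floating-point rounding); the second shows two SNPs with no statistical similarity.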

The maximum region size for a HapMap genotype data dump file is 5 Mb via the HapMap database "Browse Project Data" online web interface. Although it is unlikely that users will want to examine larger regions [i.e., significant LD will not extend over such distances], the ssSNPer web interface should easily accommodate larger datasets encompassing 10,000s of SNPs.

ssSNPer is also useful for choosing replacement SNPs in the late stages of association study design.  More specifically, the ssSNPs identified can be preferentially chosen with respect to their r2 values to replace previously selected SNPs which do not fulfil particular assay design requirements.

Please note: to allow maximum flexibility of user-supplied HapMap data [i.e., either unrelated, trio or a mixture of unrelated and trio data], ssSNPer calculates r2 values using ALL individuals and ignores familial relationships. For trio data, where only founder genotypes are typically utilised, this approach may result in negligible differences in r2 values (up to ~0.05).

Please use the following reference when reporting results obtained via this web interface:
Nyholt DR (2006) ssSNPer: identifying statistically similar SNPs to aid interpretation of genetic association studies. Bioinformatics 22(23):2960-2961.

Some users may also be interested in a novel application of a modified command line ssSNPer, which was used to identify Minor Histocompatibility Antigens (mHags). For more details, see http://www.umcutrecht.nl/subsite/dcch/Research/Hemato-Oncology/Identification-of-novel-GvT-associated-minor-H-antigens.htm
 

The ssSNPer interface takes two files as input:

1) a plain [ascii/ansi] text file containing the HapMap SNP rs# you wish to investigate (e.g., named "rs1800630snp.txt").
[NB: this file should contain only ONE line listing the SNP rs# (e.g., "rs1800630").]

2) a HapMap SNP genotype data dump file for the region surrounding the SNP you wish to investigate (e.g., named "rs1800630dumpedregion.txt").
[NB: See below for step-by-step instruction for obtaining this file.]
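The single-line requirement for the first input file can be checked locally before uploading. The Python sketch below is an optional convenience and not part of ssSNPer: the function name, the pattern used to recognise an rs# ("rs" followed by digits) and the decision to ignore blank lines are all assumptions.

```python
import re
import tempfile
from pathlib import Path

# Assumed form of an rs#: "rs" followed by one or more digits.
RS_PATTERN = re.compile(r"rs\d+")


def read_test_snp(path):
    """Return the single rs# listed in the file, or raise ValueError."""
    text = Path(path).read_text(encoding="ascii")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(
            f"expected exactly one rs#, found {len(lines)} non-blank lines"
        )
    if not RS_PATTERN.fullmatch(lines[0]):
        raise ValueError(f"not an rs#: {lines[0]!r}")
    return lines[0]


# Demonstration using a temporary copy of the example file.
with tempfile.TemporaryDirectory() as tmp:
    snp_file = Path(tmp) / "rs1800630snp.txt"
    snp_file.write_text("rs1800630\n", encoding="ascii")
    test_snp = read_test_snp(snp_file)

print(test_snp)
```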

These files are run through a modified version of Gonçalo Abecasis' LDMAX program, part of the GOLD (Command Line Tools) package [gold-1.1.0.tar.gz]. LDMAX calculates r2 values from haplotype frequencies estimated via the expectation-maximization (EM) algorithm of Excoffier and Slatkin (1995), Mol Biol Evol 12:921-7.
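To show the idea behind EM-based haplotype frequency estimation, here is a minimal Python sketch for a single pair of biallelic SNPs. It is a textbook two-locus EM written for illustration, not the LDMAX source code; the genotype coding (count of allele "1" at each SNP), the function names and the example data are assumptions, and missing genotypes are not handled.

```python
from collections import Counter

HAPLOTYPES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def em_haplotype_freqs(genotypes, max_iter=1000, tol=1e-10):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), each the count (0, 1 or 2) of allele 1.
    Only double heterozygotes (1, 1) have ambiguous phase.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    fixed = Counter()  # haplotype counts that need no phase inference
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        fixed[(int(g1 >= 1), int(g2 >= 1))] += 1
        fixed[(int(g1 == 2), int(g2 == 2))] += 1
    n_hap = 2 * len(genotypes)
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting values
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is 11/00.
        cis = freqs[(1, 1)] * freqs[(0, 0)]
        trans = freqs[(1, 0)] * freqs[(0, 1)]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = {
            (1, 1): fixed[(1, 1)] + n_double_het * p_cis,
            (0, 0): fixed[(0, 0)] + n_double_het * p_cis,
            (1, 0): fixed[(1, 0)] + n_double_het * (1 - p_cis),
            (0, 1): fixed[(0, 1)] + n_double_het * (1 - p_cis),
        }
        # M-step: new frequencies from the expected haplotype counts.
        updated = {h: expected[h] / n_hap for h in HAPLOTYPES}
        change = max(abs(updated[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = updated
        if change < tol:
            break
    return freqs


def r2_from_freqs(freqs):
    """r2 = D^2 / [p(1-p)q(1-q)] from the estimated frequencies."""
    p = freqs[(1, 1)] + freqs[(1, 0)]
    q = freqs[(1, 1)] + freqs[(0, 1)]
    d = freqs[(1, 1)] - p * q
    return d * d / (p * (1 - p) * q * (1 - q))


# Hypothetical genotypes for two SNPs that always change together.
example = [(2, 2)] * 3 + [(1, 1)] * 4 + [(0, 0)] * 3
estimated = em_haplotype_freqs(example)
example_r2 = r2_from_freqs(estimated)
print(round(example_r2, 4))
```

In this example every individual has the same genotype at both SNPs, so the EM iterations assign essentially all double heterozygotes to the 11/00 phase and the estimated r2 approaches 1.0.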

The following links show example input files:
rs1800630snp.txt
rs1800630dumpedregion.txt

The following link shows the output obtained using the example input files:
rs1800630results.pdf

To run a detailed ssSNPer analysis on a single SNP:

Specify the location of your file containing the test SNP rs# (e.g., "rs1800630snp.txt"):

Specify the location of your HapMap SNP genotype data dump file (e.g., "rs1800630dumpedregion.txt"):

Please be patient: our web server may be busy and, depending upon the size of your test region, your query may take a few minutes (the example files take ~90 seconds).

To run a less detailed ssSNPer analysis on a list of SNPs:
Output is restricted to the section listing ssSNPs with r2>=0.5.

Specify the location of your file containing the test SNP rs# list (e.g., "rs1800630snplist.txt"):

Specify the location of your HapMap SNP genotype data dump file (e.g., "rs1800630dumpedregion.txt"):

Please be patient: our web server may be busy and, depending upon the number of test SNPs and the size of your test region, your query may take a few minutes (the example files take ~3 minutes).

 

Step-by-step instructions for downloading HapMap SNP genotype data:

1) Go to the International HapMap Project webpage at http://www.hapmap.org/


 

2) Follow the "Browse Project Data" link (http://www.hapmap.org/cgi-perl/gbrowse/gbrowse) under the "Project Data" heading listed on the left.
 

3) Enter the SNP rs# (e.g., "rs1800630") you wish to investigate in the Landmark or Region box. You should also make sure to use the latest data release available in the drop down Data Source window menu.

4) Click the Search button.


 

IMPORTANT: the SNP for which you wish to identify ssSNPs (test SNP) must be listed under "Genotyped SNPs" (as shown above for rs1800630); this indicates that HapMap SNP genotype data exist, and for which population(s). [i.e., if your test SNP does not have HapMap SNP genotype data available, then you (obviously) cannot calculate its pairwise r2 with surrounding SNPs.]
 

5) Select the size of region surrounding the SNP you wish to investigate via the Scroll/Zoom window menu (e.g., to investigate 500 kb either side select "Show 1 Mbp"). Size options include {21 bp, 100 bp, 500 bp, 1 kbp, 2 kbp, 5 kbp, 10 kbp, 20 kbp, 40 kbp, 100 kbp, 200 kbp, 750 kbp, 1 Mbp, 2 Mbp, 5 Mbp}.
 

6) Making sure "Download SNP genotype data" is selected in the Reports & Analysis drop down window menu, click the Configure button.

7) Select the Population you wish to investigate and the preferred method of data download (I suggest "Save to Disk").
 

8) Click the Go button.
 


 

9) Click the "Save" button and specify the file name (e.g., "rs1800630dumpedregion.txt") and the location where you wish to save the file. The downloaded file is suitable for analysis via the ssSNPer web interface.
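As an aid for step 5, the short Python sketch below picks the smallest listed window that covers a desired distance on either side of the test SNP. It assumes the browser centres the window on the SNP, as the "500 kb either side" example in step 5 suggests; the function name is hypothetical.

```python
# Size options from step 5, expressed in bp.
SIZE_OPTIONS_BP = [
    21, 100, 500, 1_000, 2_000, 5_000, 10_000, 20_000, 40_000,
    100_000, 200_000, 750_000, 1_000_000, 2_000_000, 5_000_000,
]


def smallest_window(flank_bp):
    """Return the smallest size option covering flank_bp on each side."""
    needed = 2 * flank_bp
    for size in SIZE_OPTIONS_BP:
        if size >= needed:
            return size
    raise ValueError("requested region exceeds the 5 Mbp maximum")


window_500kb = smallest_window(500_000)
window_300kb = smallest_window(300_000)
print(window_500kb, window_300kb)
```

For 500 kb either side this returns 1,000,000 bp ("Show 1 Mbp"); for 300 kb either side the next available option is 750 kbp.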
 

Special thanks to David Smyth for invaluable assistance in debugging this web interface.

Page last updated March 26, 2006.


 
The Genetic Epidemiology Laboratory
The Queensland Institute of Medical Research
Tel: +61-7-3362 0258
Email: daleN@qimr.edu.au