VIGNETTES: Single-sample Genotype Calling with the Whole Genome Sampling Assay

Date:
2008-06-22
This is an experimental workflow for generating genotype calls one sample at a time. Do your own due diligence in using this workflow and interpreting the results. Do not use it for final, optimal genotype calling; it is intended for applications such as quality assurance, including sample tracking, where a single-sample approach may be desirable from a workflow perspective. Again, DO NOT use these single-sample genotype calls for your association study.

Introduction

Accurate determination of SNP genotypes is best accomplished by clustering samples together in batches. Having multiple samples processed at the same time and under similar circumstances enables genotype clustering algorithms to estimate the positions and variances of the genotype clusters. This is particularly important when the clusters are positioned away from their average locations.

That said, there are still situations where it is useful to call genotypes based on a single sample. Examples include situations where very few samples were processed, or when one wants to do a check of sample identity via a quick check on genotypes without going through a full clustering. When using the single-sample analysis workflow it is important to bear in mind the risk that the accuracy of the calls may be lower, perhaps substantially, than what is attainable via the recommended multiple sample clustering.

The single-sample workflow described here uses the BRLMM-P genotyping algorithm, which has two key enabling features: it lets the user generate a set of SNP-specific priors, and it has a special single-sample calling mode.

Training the required parameters

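Single-sample genotype calling requires three things: a target probe intensity distribution (the quantile sketch), estimates of probe (feature) effects, and a set of SNP-specific priors.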

The target probe intensity distribution can be created with the --write-sketch option to apt-probeset-genotype. This option causes the derived quantile sketch to be written to a file with the filename suffix ".normalization-target.txt" in the output directory. Estimates of probe (feature) effects can also be generated with apt-probeset-genotype, using the --feat-effects option. The resulting feature effects are written to a file with the suffix ".feature-response.txt" in the output directory.

To generate a set of SNP-specific priors, see the documentation on clustering without priors and building priors. Even if a set of priors is already available (for example, if you are using one of the catalog products), it may be useful to allow your own samples to influence the priors: start with the provided priors, do a clustering run with your own samples, then save the resulting posteriors to serve as priors for future runs. This can be accomplished with the --write-models option to apt-probeset-genotype.

An example of using the above three options in a run using the GenomeWideSNP_6 product is shown below.

apt-probeset-genotype \
  --cdf-file             lib/GenomeWideSNP_6.cdf \
  --special-snps         lib/GenomeWideSNP_6.specialSNPs \
  --read-models-brlmmp   lib/GenomeWideSNP_6.brlmm-p.models \
  --chrX-probes          lib/GenomeWideSNP_6.chrXprobes \
  --chrY-probes          lib/GenomeWideSNP_6.chrYprobes \
  --set-gender-method    cn-probe-chrXY-ratio \
  --analysis             brlmm-p-plus \
  --out-dir              single_sample_parameter_dir \
  --write-models \
  --write-sketch \
  --summaries \
  --feat-effects \
  --cel-files            my_cel_file_list.txt

Note that the analysis string should match the one that was used to create the supplied models file. For the GenomeWideSNP_6 product, the models file GenomeWideSNP_6.brlmm-p.models is supplied as part of the library file package found on the product page. It was created with the analysis string that is aliased to "brlmm-p-plus".
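Before moving on to single-sample calling, it can be worth confirming that the training run actually produced all three parameter files. The following is a minimal sketch, not part of APT: check_params is a hypothetical helper, and the file names are the ones used in the calling example in this vignette, which assume the "brlmm-p-plus" analysis alias. Adjust them if your analysis string differs.

```shell
#!/bin/sh
# check_params DIR
# Hypothetical helper (not an APT command): report whether the three
# single-sample parameter files exist and are non-empty in DIR.
# The file names below are assumptions taken from this vignette's examples.
check_params() {
  dir=$1
  status=0
  for f in \
    brlmm-p-plus.plier.feature-response.txt \
    quant-norm.normalization-target.txt \
    brlmm-p-plus.snp-posteriors.txt
  do
    if [ -s "$dir/$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      status=1
    fi
  done
  # Non-zero return value if any file is missing or empty.
  return $status
}
```

For example, `check_params single_sample_parameter_dir` prints one line per file and returns a non-zero status if any of them is absent.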

Genotype calling

Following on from the example of how to train the required parameters, here is an example of how to use them to perform single-sample genotype calling:

apt-probeset-genotype \
  --cdf-file             lib/GenomeWideSNP_6.cdf \
  --special-snps         lib/GenomeWideSNP_6.specialSNPs \
  --chrX-probes          lib/GenomeWideSNP_6.chrXprobes \
  --chrY-probes          lib/GenomeWideSNP_6.chrYprobes \
  --set-gender-method    cn-probe-chrXY-ratio \
  --analysis             quant-norm.sketch=50000,pm-only,brlmm-p.CM=2.bins=100.mix=1.bic=2.HARD=3.SB=0.45.KX=1.KH=1.5.KXX=0.5.KAH=-0.6.KHB=-0.6.transform=MVA.AAM=2.0.BBM=-2.0.AAV=0.06.BBV=0.06.ABV=0.06.copyqc=0.000001.wobble=0.05.MS=0.05 \
  --out-dir              my_output_dir \
  --use-feat-eff         single_sample_parameter_dir/brlmm-p-plus.plier.feature-response.txt  \
  --target-sketch        single_sample_parameter_dir/quant-norm.normalization-target.txt  \
  --read-models-brlmmp   single_sample_parameter_dir/brlmm-p-plus.snp-posteriors.txt \
  --cel-files            my_cel_file_list.txt

Note that the long analysis string is the same as the one aliased to "brlmm-p-plus", which was used in training the SNP-specific priors, with one critical difference: it sets "CM=2". CM stands for Calling Method, and setting it to 2 enables the single-sample calling mode.
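To process samples strictly one at a time, each run needs its own CEL list file. The sketch below splits a multi-sample list into one list per sample. It assumes the list file has the header line "cel_files" followed by one CEL path per line, and that the CEL files use the ".CEL" extension. write_single_lists and the ".cel_list.txt" suffix are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# write_single_lists LIST OUTDIR
# Hypothetical helper: split a CEL list (assumed format: header line
# "cel_files", then one CEL path per line) into one list file per sample.
write_single_lists() {
  list=$1
  outdir=$2
  mkdir -p "$outdir"
  # Skip the header line, then emit one single-sample list per CEL path.
  tail -n +2 "$list" | while IFS= read -r cel || [ -n "$cel" ]; do
    [ -n "$cel" ] || continue          # ignore blank lines
    name=$(basename "$cel" .CEL)       # sample name without the extension
    printf 'cel_files\n%s\n' "$cel" > "$outdir/$name.cel_list.txt"
  done
}
```

Each resulting list can then be passed to --cel-files in the single-sample command above, with a separate --out-dir per sample so that the outputs do not overwrite one another.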

As a test, you can analyze a collection of CEL files in batch mode, save the feature effects, quantile sketch and SNP-specific priors, and then re-call all the samples in single-sample mode. The two modes should give the same genotype calls, apart from a small number of differences caused by rounding when the parameters are written out to text files.
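One way to carry out this comparison is to count the discordant calls between the two output files. The following is a rough sketch under stated assumptions: each calls file is tab-delimited, comment lines start with "#", the column header row starts with "probeset_id", and both files list the samples in the same column order. compare_calls is a hypothetical helper, and the paths of the two calls files are left as arguments because they depend on the analysis name and output directory of each run.

```shell
#!/bin/sh
# compare_calls BATCH_CALLS SINGLE_CALLS
# Hypothetical helper: prints "<compared> <discordant>" counted over all
# probeset/sample cells present in both files.
# Assumes tab-delimited files with the same sample column order.
compare_calls() {
  awk -F'\t' '
    /^#/ { next }                     # skip header/comment lines
    $1 == "probeset_id" { next }      # skip the column header row
    NR == FNR {                       # first file: remember every call
      for (i = 2; i <= NF; i++) batch[$1, i] = $i
      next
    }
    {                                 # second file: compare cell by cell
      for (i = 2; i <= NF; i++) {
        if (($1, i) in batch) {
          compared++
          if (batch[$1, i] != $i) discordant++
        }
      }
    }
    END { printf "%d %d\n", compared, discordant }
  ' "$1" "$2"
}
```

A discordant count that is a very small fraction of the compared count is consistent with the rounding effect described above; a large fraction suggests that the parameter files or the analysis string do not match the training run.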

Affymetrix Power Tools (APT) Release apt-1.10.1

Generated on Mon Nov 3 12:21:42 2008 for Affymetrix Power Tools by  doxygen 1.5.3