= Illumina SNP Normalization =

tQN is a strategy using quantile normalization to improve the quality of data from Illumina Infinium Whole-Genome Genotyping SNP Beadchips, described in

''Normalization of Illumina Infinium whole-genome SNP data improves copy number estimates and allelic intensity ratios''[[BR]]
J. Staaf, J. Vallon-Christersson, D. Lindgren, G. Juliusson, R. Rosenquist, M. Höglund, Å. Borg, and M. Ringnér[[BR]]
''BMC Bioinformatics'' '''9''':409 (2008)[[BR]]
[http://www.biomedcentral.com/1471-2105/9/409/abstract Abstract] [http://www.biomedcentral.com/1471-2105/9/409 Full text]

=== News ===

 * Sept. 29, 2009. Added support for an additional beadchip. A tQN normalized cluster file for human1M-omnia, based on Illumina's !HapMap data set, has been released. Download [http://cbbp.thep.lu.se/~markus/software/tQN/human1M-omnia_tQN_clusters.zip?format=raw here].
 * Sept. 11, 2009. Added support for an additional beadchip. A tQN normalized cluster file for human660w-quad, based on Illumina's !HapMap data set, has been released. Download [http://cbbp.thep.lu.se/~markus/software/tQN/human660w-quad_tQN_clusters.zip?format=raw here].
 * Mar. 13, 2009. tQN 1.1.2 released. This is a minor bug-fix release. ''split_samples.pl'' has been modified to replace commas (,) with points (.) in data columns, because !BeadStudio exports data with decimal commas for some language settings, which caused problems for tQN.
 * Jan. 15, 2009. tQN 1.1.1 released. This is a minor bug-fix release that handles NA as well as NaN for missing values in files from !BeadStudio. !BeadStudio itself only outputs NaN, but reformatting in, for example, Excel may change NaN to NA, which previously led to erroneous results from tQN.
 * Dec. 16, 2008. Added support for an additional beadchip. A tQN normalized cluster file for human1M-duo, based on Illumina's !HapMap data set, has been released.
 * Dec. 3, 2008. Added support for additional beadchips.
   tQN normalized cluster files for humancnv370-quad and human610-quad, based on Illumina's !HapMap data sets, have been released.
 * Oct. 17, 2008. tQN 1.1.0 released. New features include support for an output format for [wiki:se.lu.onk.BAFsegmentation BAFsegmentation] and changed BAF and log R ratio calculations for SNPs having only homozygous clusters in !HapMap samples.

=== Future plans ===

 * Please provide feedback.

=== License ===

The tQN software is available as a stand-alone software package, and will become available as a plug-in to BASE as the handling of SNP arrays in BASE is developed. Both versions are available under the [http://www.gnu.org/copyleft/gpl.html GNU General Public License].

=== Download tQN ===

[http://cbbp.thep.lu.se/~markus/software/tQN/tQN-1.1.2.zip?format=raw Download the latest stand-alone release (tQN-1.1.2)]. This release includes tQN cluster files for humanhap300, humanhap300-duo, humancnv370-duo, humancnv370-quad, humanhap550, human610-quad, and human1M-duo.

[http://cbbp.thep.lu.se/~markus/software/tQN/human660w-quad_tQN_clusters.zip?format=raw Download tQN cluster file for human660w-quad]. This additional cluster file is not part of the tQN-1.1.2 release. To use tQN with human660w-quad beadchips, unzip the archive and put the resulting txt-file in the subdirectory ''lib'' of your tQN installation.

[http://cbbp.thep.lu.se/~markus/software/tQN/human1M-omnia_tQN_clusters.zip?format=raw Download tQN cluster file for human1M-omnia]. This additional cluster file is not part of the tQN-1.1.2 release. To use tQN with human1M-omnia beadchips, unzip the archive and put the resulting txt-file in the subdirectory ''lib'' of your tQN installation.

=== Supplemental data ===

 * Infinium data for the 6 breast cancers hybridized on Illumina !HumanHap 550K !BeadChips used to evaluate tQN are available in NCBI's Gene Expression Omnibus with accession [http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11977 GSE11977].
=== How to use tQN ===

''Requirements''

tQN is written in R with a Perl wrapper, so both R and Perl are required. Required Perl modules are: File::Spec, Getopt::Long, IO::File and Pod::Usage (http://www.cpan.org). Required R package is limma (http://www.bioconductor.org).

''Installation''

Download and unzip the file available under the section ''Download tQN'' on this page.

OS X or Linux: The programs should run as they are. You need R and perl in your path.

Windows: Depending on how you have installed R and Perl on your system you may have to edit the variables ''$R_command'' and ''$R_windows'' at the beginning of the file ''tQN_normalize_samples.pl''. ''$R_windows'' should likely contain the full path to the R script interpreter on your system. Also comment out (with an initial #) the ''$R_command'' used on OS X and Linux systems. For example, we have successfully used tQN using !ActivePerl on a Windows system with the following ''$R_windows'' and ''$R_command'':

{{{
# Mac OS X and Linux
#my $R_command="R --vanilla --no-save --slave < tQN.R";
###
# Windows
# Note that we are using Rscript, which is a part of the R distribution.
my $R_windows=File::Spec->canonpath('C:/"Program Files"/R/R-2.7.0/bin/Rscript');
my $R_command="$R_windows --vanilla tQN.R";
}}}

''Input data format''

tQN is applied to data exported from !BeadStudio. For a set of samples, the file exported from !BeadStudio should be tab-delimited in the following format:

||Name||Chr||Position||sample1.X||sample1.Y||sample2.X||sample2.Y||sample3.X||sample3.Y||...||
||rs12354060||1||10004||0.04424883||1.818238||0.03157751||1.632767||0.04973672||1.770216||...||
||rs2691310||1||46844||0.7046126||1.305445||0.8322142||1.271329||0.8042333||1.151523||...||
||...||...||...||...||...||...||...||...||...||...||

The data extracted from !BeadStudio needs to be split into a separate file for each sample using the script ''split_samples.pl''.
{{{
perl split_samples.pl --data_file=example/example_beadstudio_data.txt
}}}

where ''example_beadstudio_data.txt'' is a file exported from !BeadStudio in the format described above. With some language settings, !BeadStudio exports files with commas (,) as decimal separators; ''split_samples.pl'' replaces all commas in the data columns with points (.) in its output files. The script generates one file per sample, together with a file ''sample_names.txt'', in the tQN subdirectory ''extracted''. These files are used when tQN is run and can be deleted once the samples are normalized.

''Performing tQN''

In the tQN directory, run tQN with the following command:

{{{
perl tQN_normalize_samples.pl --beadchip=humancnv370-duo
}}}

This command performs tQN on the samples in the tQN subdirectory ''extracted'' that are listed in the file ''sample_names.txt''. To perform tQN on a subset of samples, edit ''sample_names.txt'' accordingly. tQN requires that points (.) are used as decimal separators.

The normalized data are stored in the tQN subdirectory ''normalized''. For each sample, there is a file with tQN normalized data. A file ''tQN_beadstudio.txt'' is also generated, containing tQN B allele frequencies and log R ratios for all samples in a format suitable for import into !BeadStudio using its import sub-column process.

tQN also supports generating data for further analysis with PennCNV, QuantiSNP and BAFsegmentation. Running tQN with the following command:

{{{
perl tQN_normalize_samples.pl --beadchip=humancnv370-duo --output_format=PennCNV
}}}

generates one data file per sample in the tQN subdirectory ''normalized'' for further analysis using PennCNV.
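Because tQN requires points as decimal separators, it can be useful to check a !BeadStudio export for decimal commas before processing it. The following is a minimal sketch and not part of the tQN distribution: the function name `count_decimal_commas` is hypothetical, and the sketch assumes the tab-delimited layout described under ''Input data format'' (one header row; Name, Chr and Position in the first three columns; X and Y intensities from column 4 onwards).

```shell
#!/bin/sh
# Hypothetical helper, not shipped with tQN.
# Counts the data fields that contain a comma in a tab-delimited export
# laid out as: Name, Chr, Position, then one X and one Y column per sample.
count_decimal_commas() {
    awk -F '\t' '
        NR > 1 {                        # skip the header row
            for (i = 4; i <= NF; i++)   # intensity columns start at column 4
                if ($i ~ /,/) n++
        }
        END { print n + 0 }             # print 0 when no comma was found
    ' "$1"
}
```

For example, `count_decimal_commas example/example_beadstudio_data.txt` prints the number of affected fields. A non-zero count suggests the file was exported with decimal commas, which ''split_samples.pl'' from tQN 1.1.2 onwards converts to points.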
Alternatives for ''--output_format'' are ''QuantiSNP'', which generates one data file per sample for further analysis with QuantiSNP; ''BAFsegmentation'', which generates files for further analysis with BAFsegmentation; and ''!BeadStudio'', the default, which generates the ''tQN_beadstudio.txt'' file with data for all samples. Every beadchip type that has a cluster file in the tQN subdirectory ''lib'' is supported by tQN and can be given as the value of ''--beadchip''. For PennCNV and QuantiSNP, SNPs with missing values in either B allele frequencies or log R ratios after normalization are excluded from the respective output files.

''tQN B allele frequencies and log R ratios''

CNV probes are not normalized by tQN; for these probes the original X and Y intensities from !BeadStudio are kept. tQN may result in a slightly smaller number of SNPs having data, because some SNPs have not been genotyped in the !HapMap samples used to generate the cluster files. A detailed description of how tQN X and Y intensities are turned into tQN BAF and tQN log R ratios is available [http://cbbp.thep.lu.se/~markus/software/tQN/ReadMe_calculation_BAF_Log_R_Ratio.pdf?format=raw here].

''Supported beadchip types''

tQN cluster files for additional beadchip types can be generated upon request.

=== Contact ===

If you have suggestions, comments or bug reports, please send an email to johan.staaf@med.lu.se