= BAFsegmentation =

BAFsegmentation is a method to identify regions of allelic imbalance from B allele frequencies obtained from SNP arrays, described in

''Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays'' [[BR]]
J. Staaf, D. Lindgren, J. Vallon-Christersson, A. Isaksson, H. Göransson, G. Juliusson, R. Rosenquist, M. Höglund, Å. Borg, M. Ringnér [[BR]]
''Genome Biology'' '''9''':R136 (2008)[[BR]]
[http://genomebiology.com/2008/9/9/R136/abstract Abstract] [http://genomebiology.com/2008/9/9/R136 Full text]

=== News ===

 * Jun. 1, 2010. BAFsegmentation 1.2.0 release. Added support for paired tumor-normal samples.
 * Mar. 13, 2009. BAFsegmentation 1.1.2 release. This is a minor bug-fix release. Fixes include errors in the documentation (the default has never been to remove CNV probes) and the handling of chromosomes that have no probes at all. In addition, ''split_samples.pl'' can now handle language settings for which !BeadStudio generates files in which a comma denotes the decimal point.
 * Feb. 6, 2009. BAFsegmentation 1.1.1 release. This is a minor bug-fix release fixing a bug in bookmark files generated for import into !BeadStudio.
 * Oct. 21, 2008. The experimental dilution series is now also available as a !BeadStudio project and as a tab-delimited table exported from !BeadStudio. See supplemental data below.
 * Oct. 17, 2008. BAFsegmentation 1.1.0 released. New features include:
   * Improved plotting, including a new across-assays plot of regions of allelic imbalance for each chromosome and whole-genome plots for each assay.
   * Improved removal of noisy and non-informative homozygous SNPs.
   * Support for segmenting BAF data normalized with [wiki:se.lu.onk.IlluminaSNPNormalization tQN] (tQN version 1.1.0 or higher).

=== Future plans ===

 * Adaptation of BAFsegmentation to samples having 100% tumor content, for example, cell lines.
 * Integration of log R ratio into calling of regions and their type.
=== License ===

The BAFsegmentation software is available as a stand-alone software package, and will become available as a plug-in to BASE as the handling of SNP arrays in BASE is developed. Both versions are available under the [http://www.gnu.org/copyleft/gpl.html GNU General Public License].

=== Download BAFsegmentation ===

[http://cbbp.thep.lu.se/~markus/software/BAFsegmentation/BAFsegmentation-1.2.0.zip?format=raw Download the latest stand-alone release (BAFsegmentation 1.2.0).]

=== Supplemental Data ===

 * The plots referred to in additional data file 4 in the manuscript are available [http://cbbp.thep.lu.se/~markus/publications/papers/BAFsegmentation_supplemental_data.zip here].
 * The simulated data set used in the publication is available [http://cbbp.thep.lu.se/~markus/software/BAFsegmentation/SimulatedTumorData.zip?format=raw here].
 * Infinium data for four matched tumor-normal pairs and a dilution series of a tumor cell line mixed with its paired normal cell line are available in NCBI's Gene Expression Omnibus with accession [http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11976 GSE11976].
 * The Infinium data for the dilution series of a tumor cell line mixed with its paired normal cell line are also available as a [http://cbbp.thep.lu.se/~markus/software/BAFsegmentation/CRL2324_dilutionSeries_BeadStudio.zip?format=raw BeadStudio project] and as a [http://cbbp.thep.lu.se/~markus/software/BAFsegmentation/CRL2324_dilutionSeries_TableExport.zip?format=raw tab-delimited text file] exported from !BeadStudio.

=== How to use BAFsegmentation ===

''Recommendations''

For Infinium data, we recommend using BAFsegmentation with data normalized using [wiki:se.lu.onk.IlluminaSNPNormalization tQN]. BAFsegmentation benefits from the symmetrical B allele frequencies obtained with tQN.

''Requirements''

BAFsegmentation is written in R with a Perl wrapper, so both R and Perl are required.
The required Perl modules are File::Spec, Getopt::Long, IO::File and Pod::Usage (http://www.cpan.org). The required R package is DNAcopy; the recommended version is 1.14.0 (http://www.bioconductor.org).

''Installation''

Download and unzip the file available under the section ''Download BAFsegmentation'' on this page.

OS X or Linux: The programs should run as they are. You need R and Perl in your path.

Windows: Depending on how you have installed R and Perl on your system, you may have to edit the variables ''$R_command'' and ''$R_windows'' at the beginning of the file ''BAF_segment_samples.pl''. ''$R_windows'' should likely contain the full path to the R script interpreter on your system. Also comment out (with an initial #) the ''$R_command'' used on OS X and Linux systems. For example, we have successfully run BAFsegmentation with !ActivePerl on a Windows system with the following ''$R_windows'' and ''$R_command'':

{{{
###
# Mac OS X and Linux
#my $R_command="R --vanilla --no-save --slave < BAF_segment.R";
###
# Windows
# Note that we are using ''Rscript'', which is a part of the R distribution.
my $R_windows=File::Spec->canonpath('C:/"Program Files"/R/R-2.7.0/bin/Rscript');
my $R_command="$R_windows --vanilla BAF_segment.R";
}}}

''Input data format''

__Unpaired samples__

BAFsegmentation is applied to data for a set of samples in a file that should be tab-delimited in the following format:

||Name||Chr||Position||sample1.B Allele Freq||sample1.Log R Ratio||sample2.B Allele Freq||sample2.Log R Ratio||sample3.B Allele Freq||sample3.Log R Ratio||...||
||rs12354060||1||10004||1||0.110391||1||-0.05188531||1||0.07706165||...||
||rs2691310||1||46844||0.5519782||0.2984372||0.4636427||0.3640218||0.4393658||0.2589271||...||
||...||...||...||...||...||...||...||...||...||...||

For Illumina arrays, data can be exported in this format directly from !BeadStudio. The data need to be split into a separate file for each sample using the script ''split_samples.pl''.
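
To make the column layout above concrete, here is a minimal Python sketch that groups the columns of such a file by sample. It is not part of BAFsegmentation and does not reproduce how ''split_samples.pl'' works; the function name `split_by_sample` and the returned structure are assumptions made for this illustration. It also assumes the unpaired layout shown above and points (.) as decimal separators.

```python
import csv
import io
from collections import defaultdict


def split_by_sample(handle):
    """Group a wide, tab-delimited SNP table into per-sample rows.

    Hypothetical helper for illustration only; not the logic of split_samples.pl.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)

    # Columns 0-2 are Name, Chr, Position; the rest are "<sample>.<measurement>".
    # rpartition splits on the last dot, so sample names may themselves contain dots.
    columns = []
    for idx in range(3, len(header)):
        sample, _, measurement = header[idx].rpartition(".")
        columns.append((idx, sample, measurement.strip()))

    per_sample = defaultdict(list)
    for row in reader:
        # Chr stays a string because chromosomes such as X and Y are not numeric.
        name, chrom, position = row[0], row[1], int(row[2])
        values = defaultdict(dict)
        for idx, sample, measurement in columns:
            values[sample][measurement] = float(row[idx])
        for sample, v in values.items():
            per_sample[sample].append(
                (name, chrom, position, v["B Allele Freq"], v["Log R Ratio"])
            )
    return dict(per_sample)


if __name__ == "__main__":
    # Two SNP rows for one sample, taken from the table above.
    example = (
        "Name\tChr\tPosition\tsample1.B Allele Freq\tsample1.Log R Ratio\n"
        "rs12354060\t1\t10004\t1\t0.110391\n"
        "rs2691310\t1\t46844\t0.5519782\t0.2984372\n"
    )
    for sample, rows in split_by_sample(io.StringIO(example)).items():
        print(sample, rows)
```

Each sample maps to a list of (Name, Chr, Position, B allele frequency, log R ratio) tuples, which mirrors the idea of one data set per sample.
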
In the BAFsegmentation directory, run ''split_samples.pl'' with the following command:

{{{
perl split_samples.pl --data_file=example/example_beadstudio_data.txt
}}}

where ''example_beadstudio_data.txt'' is a file exported from !BeadStudio in the format described above. With some language settings !BeadStudio exports files with commas (,) as decimal points; ''split_samples.pl'' replaces all commas (,) in data columns with points (.) in its output files. This script will generate one file per sample together with a file ''sample_names.txt'' in the BAFsegmentation subdirectory ''extracted''. These files are used when BAFsegmentation is run and can be deleted once the samples are segmented.

__Paired tumor-normal samples__

BAFsegmentation can be applied to paired tumor-normal samples essentially as for unpaired samples. The main difference is that the genotypes for the normal samples are required. BAFsegmentation is applied to data for a set of paired tumor-normal samples in a file that should be tab-delimited in the following format:

||Name||Chr||Position||sample1.GType||sample1.B Allele Freq||sample1.Log R Ratio||sample2.GType||sample2.B Allele Freq||sample2.Log R Ratio||...||
||rs12354060||1||10004||AA||1||0.110391||AB||1||-0.05188531||...||
||rs2691310||1||46844||AB||0.5519782||0.2984372||BB||0.4636427||0.3640218||...||
||...||...||...||...||...||...||...||...||...||...||

For Illumina arrays, data can be exported in this format directly from !BeadStudio. The data need to be split into a separate file for each sample using the script ''split_samples.pl''. In the BAFsegmentation directory, run ''split_samples.pl'' with the following command:

{{{
perl split_samples.pl --data_file=example/example_beadstudio_paired_data.txt
}}}

where ''example_beadstudio_paired_data.txt'' is a file exported from !BeadStudio in the format described above.
With some language settings !BeadStudio exports files with commas (,) as decimal points; ''split_samples.pl'' replaces all commas (,) in data columns with points (.) in its output files. This script will generate one file per sample together with a file ''sample_names.txt'' in the BAFsegmentation subdirectory ''extracted''. These files are used when BAFsegmentation is run and can be deleted once the samples are segmented.

__Mixing paired and unpaired samples__

''Performing BAFsegmentation''

In the BAFsegmentation directory, run BAFsegmentation with the following command:

{{{
perl BAF_segment_samples.pl
}}}

This command will perform BAFsegmentation on the samples in the BAFsegmentation subdirectory ''extracted'' that are specified in the file ''sample_names.txt''. If you want to perform BAFsegmentation on a subset of samples, you can edit ''sample_names.txt'' accordingly. Note that BAFsegmentation requires points (.) as decimal points.

BAFsegmentation can be run with different settings. To get an overview of the parameters, run BAFsegmentation with the following command:

{{{
perl BAF_segment_samples.pl --help
}}}

To run BAFsegmentation on data normalized with tQN, use the following command:

{{{
perl BAF_segment_samples.pl --input_directory=path/to/tQN/normalized
}}}

where ''path/to/tQN'' is the path to your tQN directory, in which you have a directory ''normalized'' with your tQN-normalized data. Note that tQN is used with X and Y intensities. Please look at [wiki:se.lu.onk.IlluminaSNPNormalization tQN] for further instructions on how to use tQN and prepare your data for use with tQN.

''Results''

 * The segmented regions identified as allelic imbalance are stored in the file ''AI_regions.txt'' in the BAFsegmentation subdirectory ''segmented''. In addition, an XML file ''AI_regions.xml'' with the regions is produced. This XML file can be imported as a bookmark file into the Illumina !BeadStudio software for visualization and further analysis of the identified regions.
 * In the BAFsegmentation subdirectory ''plots'', the following !PostScript files are generated:
   * A file for each sample with three plots per chromosome: a BAF plot with non-informative homozygous SNPs removed, an mBAF plot with non-informative homozygous SNPs removed and with the segmentation line superimposed, and a log R ratio plot of all SNPs with the average log R ratios within mBAF segments superimposed.
   * A file for each sample with two plots for the whole genome: a plot with segmented mBAF and a plot with average log R ratios within mBAF segments.
   * A file with a plot for each chromosome of regions of allelic imbalance across all assays.

=== Contact ===

If you have comments, please send an email to johan.staaf ...at... med.lu.se