= BAFsegmentation =

BAFsegmentation is a method to identify regions of allelic imbalance from B allele frequencies obtained from SNP arrays, described in

''Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays''[[BR]]
J. Staaf, D. Lindgren, J. Vallon-Christersson, A. Isaksson, H. Göransson, G. Juliusson, R. Rosenquist, M. Höglund, Å. Borg, M. Ringnér[[BR]]
''Genome Biology'' '''9''':R136 (2008)[[BR]]
[http://genomebiology.com/2008/9/9/R136/abstract Abstract] [http://genomebiology.com/2008/9/9/R136 Full text]

=== News ===
 * Dec. 12, 2011. We are currently working on BAFsegmentation 2.0, whose main aim is to improve results for noisier samples and samples with high purity (e.g. cell lines). We have published a new paper, Staaf et al. Breast Cancer Research, 2011 "Landscape of somatic allelic imbalances and copy number alterations in HER2-amplified breast cancer", for which we have made a pre-release with 2.0 functionality available: [#pre-release BAFsegmentation 2.0pre].
 * Jun. 1, 2010. BAFsegmentation 1.2.0 release. Added support for analysing paired tumor-normal samples.
 * Mar. 13, 2009. BAFsegmentation 1.1.2 release. This is a minor bug-fix release. Fixes include errors in the documentation (the default has never been to remove CNV probes), the handling of chromosomes that have no probes, and support in ''split_samples.pl'' for language settings under which !BeadStudio generates files with a comma as the decimal point.
 * Feb. 6, 2009. BAFsegmentation 1.1.1 release. This is a minor bug-fix release fixing a bug in bookmark files generated for import into !BeadStudio.
 * Oct. 21, 2008. The experimental dilution series is now also available as a !BeadStudio project and as a tab-delimited table exported from !BeadStudio. See supplemental data below.
 * Oct. 17, 2008. BAFsegmentation 1.1.0 released.
   New features include:
   * Improved plotting, including a new across-assays plot of regions of allelic imbalance for each chromosome and whole genome plots for each assay.
   * Improved removal of noisy and non-informative homozygous SNPs.
   * Support for segmenting BAF data normalized with [wiki:se.lu.onk.IlluminaSNPNormalization tQN] (tQN version 1.1.0 or higher).

=== Future plans ===
 * Adaptation of BAFsegmentation to samples having 100% tumor content, for example, cell lines.
 * Integration of the log R ratio into the calling of regions and their types.

=== License ===
The BAFsegmentation software is available as a stand-alone software package, and will become available as a plug-in to BASE as the handling of SNP arrays in BASE is developed. Both versions are available under the [http://www.gnu.org/copyleft/gpl.html GNU General Public License].

=== Download BAFsegmentation ===
[http://cbbp.thep.lu.se/~markus/software/BAFsegmentation/BAFsegmentation-1.2.0.zip?format=raw Download the latest stand-alone release (BAFsegmentation 1.2.0).]

=== Supplemental Data ===
 * The plots referred to in additional data file 4 in the manuscript are available [http://cbbp.thep.lu.se/~markus/publications/papers/BAFsegmentation_supplemental_data.zip here].
 * The simulated data set used in the publication is available [http://cbbp.thep.lu.se/~markus/software/BAFsegmentation/SimulatedTumorData.zip?format=raw here].
 * Infinium data for four matched tumor-normal pairs and a dilution series of a tumor cell line mixed with its paired normal cell line are available in NCBI's Gene Expression Omnibus with accession [http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11976 GSE11976].
 * The Infinium data for the dilution series of a tumor cell line mixed with its paired normal cell line are also available as a [http://cbbp.thep.lu.se/~markus/software/BAFsegmentation/CRL2324_dilutionSeries_BeadStudio.zip?format=raw BeadStudio project] and as a [http://cbbp.thep.lu.se/~markus/software/BAFsegmentation/CRL2324_dilutionSeries_TableExport.zip?format=raw tab-delimited text file] exported from !BeadStudio.

=== How to use BAFsegmentation ===

''Recommendations''

For Infinium data, we recommend using BAFsegmentation with data normalized using [wiki:se.lu.onk.IlluminaSNPNormalization tQN]. BAFsegmentation benefits from the symmetrical B allele frequencies obtained with tQN.

''Requirements''

BAFsegmentation is written in R with a Perl wrapper, so both R and Perl are required. The required Perl modules are File::Spec, Getopt::Long, IO::File and Pod::Usage (http://www.cpan.org). The required R package is DNAcopy; the recommended version is 1.14.0 (http://www.bioconductor.org).

''Installation''

Download and unzip the file available under the section ''Download BAFsegmentation'' on this page.

OS X or Linux: The programs should run as they are. You need R and Perl in your path.

Windows: Depending on how you have installed R and Perl on your system, you may have to edit the variables ''$R_command'' and ''$R_windows'' at the beginning of the file ''BAF_segment_samples.pl''. ''$R_windows'' should likely contain the full path to the R script interpreter on your system. Also comment out (with an initial #) the ''$R_command'' used on OS X and Linux systems. For example, we have successfully run BAFsegmentation with !ActivePerl on a Windows system with the following ''$R_windows'' and ''$R_command'':
{{{
###
# Mac OS X and Linux
#my $R_command="R --vanilla --no-save --slave < BAF_segment.R";
###
# Windows
# Note that we are using ''Rscript'', which is a part of the R distribution.
my $R_windows=File::Spec->canonpath('C:/"Program Files"/R/R-2.7.0/bin/Rscript');
my $R_command="$R_windows --vanilla BAF_segment.R";
}}}

''Input data format''

__Unpaired samples__

BAFsegmentation is applied to data for a set of samples in a file that should be tab-delimited in the following format:
||Name||Chr||Position||sample1.B Allele Freq||sample1.Log R Ratio||sample2.B Allele Freq||sample2.Log R Ratio||sample3.B Allele Freq||sample3.Log R Ratio||...||
||rs12354060||1||10004||1||0.110391||1||-0.05188531||1||0.07706165||...||
||rs2691310||1||46844||0.5519782||0.2984372||0.4636427||0.3640218||0.4393658||0.2589271||...||
||...||...||...||...||...||...||...||...||...||...||

For Illumina arrays, data can be exported in this format directly from !BeadStudio. The data need to be split into a separate file for each sample using the script ''split_samples.pl''. In the BAFsegmentation directory, run ''split_samples.pl'' with the following command:
{{{
perl split_samples.pl --data_file=example/example_beadstudio_data.txt
}}}
where ''example_beadstudio_data.txt'' is a file exported from !BeadStudio in the format described above. With some language settings, !BeadStudio exports files with commas (,) as decimal points; ''split_samples.pl'' replaces all commas in the data columns with points (.) in the files it writes. This script will generate one file per sample together with a file ''sample_names.txt'' in the BAFsegmentation subdirectory ''extracted''. These files are used when BAFsegmentation is run and can be deleted once the samples are segmented.

__Paired tumor-normal samples__

BAFsegmentation can be applied to paired tumor-normal samples essentially as for unpaired samples. The main difference is that the genotypes for the normal samples are required.
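
Because the paired layout carries a per-sample ''GType'' column that the unpaired layout lacks, it can help to confirm which layout an export has before splitting it. The following is a minimal sketch, assuming a POSIX shell with `head` and `awk`; the function name `detect_layout` is hypothetical and not part of BAFsegmentation, and the check looks only at the header row, not at the data.

```shell
# Hypothetical helper, not part of BAFsegmentation: print "unpaired", "paired",
# or "unknown" according to the header row of a tab-delimited export.
detect_layout() {
  head -n 1 "$1" | awk -F '\t' '
    # Return 1 if fields start..NF form groups of n columns whose names end in
    # the given suffixes and share one non-empty sample-name prefix per group.
    function layout_ok(start, suffixes, n,    i, j, name, prefix, len) {
      if (NF < start || (NF - start + 1) % n != 0) return 0
      for (i = start; i <= NF; i += n) {
        prefix = ""
        for (j = 0; j < n; j++) {
          name = $(i + j)
          len = length(name) - length(suffixes[j + 1])
          if (len < 1 || substr(name, len + 1) != suffixes[j + 1]) return 0
          if (j == 0) prefix = substr(name, 1, len)
          else if (substr(name, 1, len) != prefix) return 0
        }
      }
      return 1
    }
    {
      sub(/\r$/, "")   # tolerate Windows line endings
      if ($1 != "Name" || $2 != "Chr" || $3 != "Position") { print "unknown"; exit }
      split(".B Allele Freq,.Log R Ratio", two, ",")
      split(".GType,.B Allele Freq,.Log R Ratio", three, ",")
      if (layout_ok(4, two, 2)) print "unpaired"
      else if (layout_ok(4, three, 3)) print "paired"
      else print "unknown"
    }'
}

# Example call on the file used elsewhere on this page:
# detect_layout example/example_beadstudio_data.txt
```

A result of "unknown" only means the header does not match either layout as described on this page; it says nothing about whether ''split_samples.pl'' itself would accept the file.
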
BAFsegmentation is applied to data for a set of paired tumor-normal samples in a file that should be tab-delimited in the following format:
||Name||Chr||Position||sample1.GType||sample1.B Allele Freq||sample1.Log R Ratio||sample2.GType||sample2.B Allele Freq||sample2.Log R Ratio||...||
||rs12354060||1||10004||AA||1||0.110391||AB||1||-0.05188531||...||
||rs2691310||1||46844||AB||0.5519782||0.2984372||BB||0.4636427||0.3640218||...||
||...||...||...||...||...||...||...||...||...||...||

For Illumina arrays, data can be exported in this format directly from !BeadStudio. The data need to be split into a separate file for each sample using the script ''split_samples.pl''. In the BAFsegmentation directory, run ''split_samples.pl'' with the following command:
{{{
perl split_samples.pl --data_file=example/example_beadstudio_paired_data.txt
}}}
where ''example_beadstudio_paired_data.txt'' is a file exported from !BeadStudio in the format described above. With some language settings, !BeadStudio exports files with commas (,) as decimal points; ''split_samples.pl'' replaces all commas in the data columns with points (.) in the files it writes. This script will generate one file per sample together with a file ''sample_names.txt'' in the BAFsegmentation subdirectory ''extracted''. These files are used when BAFsegmentation is run and can be deleted once the samples are segmented.

''Performing BAFsegmentation''

In the BAFsegmentation directory, run BAFsegmentation with the following command:
{{{
perl BAF_segment_samples.pl
}}}
This command will perform BAFsegmentation on the samples in the BAFsegmentation subdirectory ''extracted'' that are specified in the file ''sample_names.txt''. If you want to perform BAFsegmentation on a subset of samples, you can edit ''sample_names.txt'' accordingly. Note that BAFsegmentation requires points (.) for decimal points.

BAFsegmentation can be run with different settings.
To get an overview of the parameters, run BAFsegmentation with the following command:
{{{
perl BAF_segment_samples.pl --help
}}}

To run BAFsegmentation on data normalized with tQN, use the following command:
{{{
perl BAF_segment_samples.pl --input_directory=path/to/tQN/normalized
}}}
where ''path/to/tQN'' is the path to your tQN directory, which contains a directory ''normalized'' with your tQN-normalized data. Note that tQN is used with X and Y intensities. Please look at [wiki:se.lu.onk.IlluminaSNPNormalization tQN] for further instructions on how to use tQN and prepare your data for use with tQN.

To run BAFsegmentation on paired tumor-normal samples, a file named ''normal_sample_names.txt'' has to be created and put in the directory ''extracted''. The file ''normal_sample_names.txt'' should be tab-separated in the following format:
||!FilenameAssay||!FilenameNormal||
||Sample1_extracted.txt||Normal1_extracted.txt||
||Sample2_extracted.txt||Normal2_extracted.txt||
||...||...||

This file lets the user provide the required mapping between tumor and normal samples. A single normal sample can be used for multiple tumor samples. It is also possible to mix paired and unpaired samples in one analysis: samples not present in ''normal_sample_names.txt'' are analysed as unpaired tumor samples.

''Results''

 * The segmented regions identified as allelic imbalance are stored in the file ''AI_regions.txt'' in the BAFsegmentation subdirectory ''segmented''. In addition, an XML file ''AI_regions.xml'' containing the regions is produced. This XML file can be imported as a bookmark file into the Illumina !BeadStudio software for visualization and further analysis of the identified regions.
 * In the BAFsegmentation subdirectory ''plots'', the following !PostScript files are generated:
   * A file for each sample with three plots per chromosome: a BAF plot with non-informative homozygous SNPs removed, an mBAF plot with non-informative homozygous SNPs removed and with the segmentation line superimposed, and a log R ratio plot of all SNPs with the average log R ratio within each mBAF segment superimposed.
   * A file for each sample with two plots for the whole genome: a plot with segmented mBAF and a plot with average log R ratios within mBAF segments.
   * A file with a plot for each chromosome of regions of allelic imbalance across all assays.

=== BAFsegmentation 2.0 pre-release === #pre-release
This version of the BAFsegmentation software corresponds to the version used in Staaf et al. Breast Cancer Research, 2011 "Landscape of somatic allelic imbalances and copy number alterations in HER2-amplified breast cancer". The main change in this version compared to earlier versions is that a double segmentation is performed, aimed at improving results for noisier samples and samples with high purity (e.g. cell lines).

This version requires different input formats and some additional variables compared to 1.x versions. Notably, the data need to be pre-compiled using R into RData objects as described in the zip file. This version works on a per data set basis, allowing multiple data sets to be specified with different parameters.

We are planning to integrate this new version into the same framework as older versions, with potential additional changes to improve performance, and release it as BAFsegmentation 2.0.

[http://cbbp.thep.lu.se/~markus/software/BAFsegmentation/BAFsegmentation-2.0pre-snapshot111209.zip?format=raw Download the 2.0 pre-release (BAFsegmentation 2.0pre-snapshot111209).]

=== Contact ===
If you have comments, please send an email to johan.staaf ...at... med.lu.se