BAFsegmentation
BAFsegmentation is a method for identifying regions of allelic imbalance from B allele frequencies obtained from SNP arrays. It is described in:
Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays
J. Staaf, D. Lindgren, J. Vallon-Christersson, A. Isaksson, H. Göransson, G. Juliusson, R. Rosenquist, M. Höglund, Å. Borg, M. Ringnér
Genome Biology 9:R136 (2008)
Abstract
Full text
News
- Oct. 17, 2008. BAFsegmentation 1.1.0 released. New features include:
  - Improved plotting, including a new across-assays plot of regions of allelic imbalance for each chromosome and whole-genome plots for each assay.
  - Improved removal of noisy and non-informative homozygous SNPs.
  - Support for segmenting BAF data normalized with tQN (tQN version 1.1.0 or higher).
Future plans
- Adaptation of BAFsegmentation to samples with 100% tumor content, for example cell lines.
- Integration of log R ratio into calling of regions and their type.
License
BAFsegmentation is available as a stand-alone software package, and will become available as a plug-in to BASE as the handling of SNP arrays in BASE is developed. Both versions are available under the GNU General Public License.
Download BAFsegmentation
Download the latest stand-alone release (BAFsegmentation 1.1.0).
Supplemental Data
- The plots referred to in additional data file 4 in the manuscript are available here.
- The simulated data set used in the publication is available here.
- Infinium data for four matched tumor-normal pairs and a dilution series of a tumor cell line mixed with its paired normal cell line are available in NCBI's Gene Expression Omnibus with accession GSE11976.
How to use BAFsegmentation
Recommendations
For Infinium data, we recommend using BAFsegmentation with data normalized using tQN. BAFsegmentation benefits from the symmetrical B allele frequencies obtained with tQN.
Requirements
BAFsegmentation is written in R with a Perl wrapper, so both R and Perl are required. The required Perl modules are File::Spec, Getopt::Long, IO::File and Pod::Usage (http://www.cpan.org). The required R package is DNAcopy; the recommended version is 1.14.0 (http://www.bioconductor.org).
Installation
Download and unzip the file available under the section Download BAFsegmentation on this page.
OS X or Linux: The programs should run as they are. You need R and perl in your path.
Windows: Depending on how R and Perl are installed on your system, you may have to edit the variables $R_command and $R_windows at the beginning of the file BAF_segment_samples.pl. $R_windows should most likely contain the full path to the R script interpreter on your system. Also comment out (with an initial #) the $R_command used on OS X and Linux systems. For example, we have successfully run BAFsegmentation with ActivePerl on a Windows system using the following $R_windows and $R_command:
###
# Mac OS X and Linux
#my $R_command="R --vanilla --no-save --slave < BAF_segment.R";
###
# Windows
# Note that we are using ''Rscript'', which is a part of the R distribution.
my $R_windows=File::Spec->canonpath('C:/"Program Files"/R/R-2.7.0/bin/Rscript');
my $R_command="$R_windows --vanilla BAF_segment.R";
Input data format
BAFsegmentation is applied to data for a set of samples in a file that should be tab-delimited in the following format:
| Name | Chr | Position | sample1.B Allele Freq | sample1.Log R Ratio | sample2.B Allele Freq | sample2.Log R Ratio | sample3.B Allele Freq | sample3.Log R Ratio | ... |
|---|---|---|---|---|---|---|---|---|---|
| rs12354060 | 1 | 10004 | 1 | 0.110391 | 1 | -0.05188531 | 1 | 0.07706165 | ... |
| rs2691310 | 1 | 46844 | 0.5519782 | 0.2984372 | 0.4636427 | 0.3640218 | 0.4393658 | 0.2589271 | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
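To make the column layout concrete, here is a minimal Python sketch that reads a table in the format described above and groups the B allele frequency and log R ratio columns by sample. It is not part of BAFsegmentation and is not how split_samples.pl is implemented; the function name read_wide_table and the in-memory record layout are hypothetical, chosen only for illustration.

```python
import csv
import io

BAF_SUFFIX = ".B Allele Freq"
LRR_SUFFIX = ".Log R Ratio"


def read_wide_table(handle):
    """Group a wide, tab-delimited table into per-sample lists of SNP records.

    Hypothetical helper for illustration; not part of BAFsegmentation.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[:3] != ["Name", "Chr", "Position"]:
        raise ValueError("expected the first three columns to be Name, Chr, Position")

    # Map each sample name to the column indices of its BAF and log R ratio values.
    columns = {}
    for index, title in enumerate(header[3:], start=3):
        if title.endswith(BAF_SUFFIX):
            columns.setdefault(title[: -len(BAF_SUFFIX)], {})["baf"] = index
        elif title.endswith(LRR_SUFFIX):
            columns.setdefault(title[: -len(LRR_SUFFIX)], {})["lrr"] = index

    incomplete = [name for name, cols in columns.items() if set(cols) != {"baf", "lrr"}]
    if incomplete:
        raise ValueError(f"samples missing a column: {incomplete}")

    samples = {name: [] for name in columns}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Chr stays a string so that values such as X and Y are preserved.
        name, chrom, position = row[0], row[1], int(row[2])
        for sample, cols in columns.items():
            samples[sample].append(
                (name, chrom, position, float(row[cols["baf"]]), float(row[cols["lrr"]]))
            )
    return samples


# Two samples and two SNPs, using values from the example table.
example = (
    "Name\tChr\tPosition\tsample1.B Allele Freq\tsample1.Log R Ratio"
    "\tsample2.B Allele Freq\tsample2.Log R Ratio\n"
    "rs12354060\t1\t10004\t1\t0.110391\t1\t-0.05188531\n"
    "rs2691310\t1\t46844\t0.5519782\t0.2984372\t0.4636427\t0.3640218\n"
)
tables = read_wide_table(io.StringIO(example))
print(sorted(tables))
print(tables["sample2"][1])
```

A check like this can catch a missing or misnamed column before the file is handed to the scripts, since each sample is expected to contribute exactly one B Allele Freq column and one Log R Ratio column.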
For Illumina arrays, data can be exported in this format directly from BeadStudio. The data then need to be split into a separate file for each sample using the script split_samples.pl. In the BAFsegmentation directory, run split_samples.pl with the following command:
perl split_samples.pl --data_file=example/example_beadstudio_data.txt
where example_beadstudio_data.txt is a file exported from BeadStudio in the format described above.
This script generates one file per sample, together with a file sample_names.txt, in the subdirectory named extracted inside the BAFsegmentation directory. These files are used when BAFsegmentation is run and can be deleted once the samples have been segmented.
Performing BAFsegmentation
In the BAFsegmentation directory, run BAFsegmentation with the following command:
perl BAF_segment_samples.pl
This command performs BAFsegmentation on the samples that are listed in the file sample_names.txt and stored in the subdirectory named extracted inside the BAFsegmentation directory. To perform BAFsegmentation on a subset of samples, edit sample_names.txt accordingly.
BAFsegmentation can be run with different settings. To get an overview of parameters run BAFsegmentation with the following command:
perl BAF_segment_samples.pl --help
To run BAFsegmentation on data normalized with tQN use the following command:
perl BAF_segment_samples.pl --input_directory=path/to/tQN/normalized
where path/to/tQN is the path to your tQN directory, which contains a subdirectory named normalized holding your tQN-normalized data. Note that tQN is used with X and Y intensities. Please see tQN for further instructions on how to use tQN and how to prepare your data for it.
Results
- The segmented regions identified as allelic imbalance are stored in the file AI_regions.txt in the subdirectory named segmented inside the BAFsegmentation directory. An XML file with the same regions, AI_regions.xml, is also produced. This XML file can be imported as a bookmark file into the Illumina BeadStudio software for visualization and further analysis of the identified regions.
- In the BAFsegmentation subdirectory plots, the following postscript files are generated:
  - A file for each sample with three plots per chromosome: a BAF plot with non-informative homozygous SNPs removed; an mBAF plot with non-informative homozygous SNPs removed and the segmentation line superimposed; and a log R ratio plot of all SNPs, with the average log R ratio within each mBAF segment superimposed.
  - A file for each sample with two plots for the whole genome: a plot of segmented mBAF and a plot of the average log R ratios within mBAF segments.
  - A file with a plot for each chromosome showing regions of allelic imbalance across all assays.
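The plots listed here use mBAF, the mirrored B allele frequency introduced in the publication, in which BAF values are reflected around 0.5. The Python sketch below illustrates that transformation together with the removal of non-informative homozygous SNPs. It is only an illustration and not the BAFsegmentation implementation, which is written in R. The function names are hypothetical, and the cutoff of 0.97 is an assumed value used for the example; run BAF_segment_samples.pl --help to see the settings your version actually uses.

```python
def mirrored_baf(baf):
    """Reflect a B allele frequency around 0.5 so that values lie in [0.5, 1]."""
    return abs(baf - 0.5) + 0.5


def informative_mbaf(bafs, homozygous_cutoff=0.97):
    """Return mBAF values, dropping SNPs at or above the cutoff.

    The default cutoff is an assumption made for this sketch: SNPs with
    mBAF close to 1 are treated as homozygous and therefore non-informative.
    """
    mbafs = (mirrored_baf(b) for b in bafs)
    return [m for m in mbafs if m < homozygous_cutoff]


# BAF values of 0 and 1 are homozygous calls; 0.25 and 0.75 mirror to the same mBAF.
bafs = [1.0, 0.5519782, 0.0, 0.25, 0.75]
print(informative_mbaf(bafs))
```

After mirroring, a region of allelic imbalance appears as a single band of mBAF values above 0.5 rather than two bands placed symmetrically around 0.5, which is what makes the signal suitable for segmentation.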
Contact
If you have comments please send an email to johan.staaf@…
Attachments (5)
- BAFsegmentation-2.0pre-snapshot111209.zip (20.5 MB): BAFsegmentation 2.0pre-snapshot 111209
- BAFsegmentation-1.2.0.zip (2.3 MB): BAFsegmentation 1.2.0
- CRL2324_dilutionSeries_TableExport.zip (74.9 MB): CRL2324 dilutionSeries TableExport
- CRL2324_dilutionSeries_BeadStudio.zip (54.9 MB): CRL2324 dilutionSeries BeadStudio
- SimulatedTumorData.zip (220.4 MB): Simulated Tumor Data