$Id: README 2173 2013-12-12 09:30:30Z jari $

= About `Normalization package for BASE` =

The `Normalization package for BASE (net.sf.basedb.normalization)`
plug-in set is a compilation of normalisers for expression data. See
``Documentation`` below for further information about the different
plug-ins in this package. Common to most of the plug-ins provided with
this package is that they work on bioassay sets with either 1-channel
or 2-channel data. The algorithms operate on expression values; for
2-channel data, the ratio ch1/ch2 is used.

`Normalization package for BASE` is free software. See the file
license.txt for copying conditions.

The package was created, and is maintained, by Martin Svensson and
Jari Häkkinen.


== Downloading ==

`Normalization package for BASE` can be obtained from

  http://baseplugins.thep.lu.se/wiki/PluginDownload


= Installation =

Installation instructions can be found in the 'INSTALL' file.


= Documentation =


== Average normalization ==

This plug-in scales the expression values for an assay by a factor,
''S'', equal to the ratio of either i) the geometric mean of the
expression values of all spots in the bioassay set divided by the
assay average, or ii) a user-defined value divided by the assay
average.

The new expression values become ''S'' times the original expression
values. The user can choose between the geometric and the arithmetic
mean when calculating the averages.
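
The scaling can be sketched in Python as follows. This is only an
illustration, not the plug-in's code: the function names, the
`reference` and `geometric` parameters, and the list-of-lists data
layout are assumptions, and the sketch assumes that the selected kind
of mean is used for both the reference and the assay average.

```python
import math

def mean(values, geometric=True):
    """Geometric or arithmetic mean of strictly positive values."""
    if geometric:
        return math.exp(sum(math.log(v) for v in values) / len(values))
    return sum(values) / len(values)

def average_normalize(assays, reference=None, geometric=True):
    """Scale each assay by S = reference / assay average.

    assays: list of lists of expression values, one list per assay.
    reference: user-defined value, or None to use the average over all
    spots in the bioassay set (hypothetical parameter name).
    """
    if reference is None:
        all_spots = [v for assay in assays for v in assay]
        reference = mean(all_spots, geometric)
    normalized = []
    for assay in assays:
        s = reference / mean(assay, geometric)
        normalized.append([s * v for v in assay])
    return normalized
```

After scaling, every assay has the same average, namely the reference
value.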

Background subtraction and proper filtration have to be done before
running this plug-in.


== qQuantile normalization ==

The current implementation of qQuantile normalization supports only
1-channel arrays.

The qQuantile normalization is inspired by the 'Cubic Spline'
normalization in Illumina BeadStudio and the work by Workman et al.,
http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=12225587

In qQuantile normalization, all assays (including the target) are
sorted by increasing intensity. The sorted list of probe intensities
is partitioned into q groups, and each of these q groups is adjusted
(normalized) against the corresponding target group. After
normalization the intensity distribution of each assay will be
approximately the same as the target distribution. q is calculated as
q=max(10,min(100,target_size/10)). The program will stop if the number
of well-defined expression values in the target or any of the assays
in the set is smaller than q.
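
The choice of q and the partitioning step can be illustrated with a
short Python sketch. The function names are hypothetical, and both the
integer division and the exact group boundaries are assumptions. How
each group is then adjusted against its target group is not specified
here, so it is left out.

```python
def number_of_groups(target_size):
    """q = max(10, min(100, target_size / 10)); integer division assumed."""
    return max(10, min(100, target_size // 10))

def partition_sorted(intensities, q):
    """Sort intensities and split them into q nearly equal groups."""
    if len(intensities) < q:
        # Mirrors the rule that the program stops in this case.
        raise ValueError("fewer well-defined values than groups")
    ordered = sorted(intensities)
    n = len(ordered)
    # Group i covers the index range [i*n//q, (i+1)*n//q).
    return [ordered[i * n // q:(i + 1) * n // q] for i in range(q)]
```

With these assumptions a target of 500 values gives q = 50, while
targets below 100 values or above 1000 values are clamped to q = 10
and q = 100 respectively.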

The target is defined by selecting a subset of the assays in the
bioassay set, and the target expression values are the medians of
probe intensities over the bioassay set. Probes with no well-defined
measurements in the bioassay set are simply ignored in the target
calculation.
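
A minimal Python sketch of the target calculation is given below. The
function name and the data layout are assumptions, as is the use of
None or NaN to mark a measurement that is not well defined.

```python
import math
from statistics import median

def target_values(assays):
    """Per-probe median over the assays selected for the target.

    assays: list of equally long lists; assays[a][p] is the intensity
    of probe p in assay a, with None or NaN marking a missing value.
    Probes without any well-defined measurement are left out.
    """
    target = []
    for probe in zip(*assays):
        defined = [v for v in probe
                   if v is not None and not math.isnan(v)]
        if defined:
            target.append(median(defined))
    return target
```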

Since the normalization calculations are based on geometric means and
performed in log space, the intensities must be strictly positive.
Rather than expecting the user of qQuantile normalization to remove
zero and negative intensities, the underlying algorithm silently
ignores them.
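
The effect of ignoring such values can be shown with a small Python
sketch of a geometric mean computed in log space. The function name is
hypothetical, and returning None for an empty selection is an
assumption made for this illustration.

```python
import math

def geometric_mean(intensities):
    """Geometric mean in log space, skipping zero and negative values.

    Returns None when no strictly positive intensity is available.
    """
    logs = [math.log(v) for v in intensities if v > 0]
    if not logs:
        return None
    return math.exp(sum(logs) / len(logs))
```

For the input 1, 0, -3, 4 only the values 1 and 4 contribute.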

The bioassay set to be normalized must contain non-logarithmic values
since this plug-in takes the logarithm of all values before performing
the normalization.

Background subtraction and proper filtration should be done on the
bioassay set before running this plug-in.


== Quantile normalization ==

In quantile normalization the data of each assay are sorted in
ascending order of expression value and added to a matrix as a
column. Each matrix row will then contain a mix of probes (also known
as reporters or genes) determined by their rank. For each row in the
matrix, the expression values are replaced with the row average
(geometric or arithmetic, selectable by the user). Finally, each
assay is reordered into its original order to restore a standard
expression matrix where each row represents one probe. Assays are not
mixed.
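
The procedure above can be sketched in Python as follows. This is an
illustration only, not the plug-in's code: the function name, the
`geometric` flag, and the list-of-lists layout are assumptions, and
ties are simply resolved by the stable sort order.

```python
import math

def quantile_normalize(assays, geometric=False):
    """Quantile normalize equally long assays without missing values.

    assays: list of lists; assays[a][p] is the value of probe p in
    assay a. Returns new lists in the original probe order.
    """
    n = len(assays[0])
    if any(len(a) != n for a in assays):
        raise ValueError("all assays must have the same number of probes")
    # orders[a][r] is the probe index holding rank r in assay a.
    orders = [sorted(range(n), key=lambda i, a=a: a[i]) for a in assays]
    normalized = [[0.0] * n for _ in assays]
    for rank in range(n):
        row = [a[o[rank]] for a, o in zip(assays, orders)]
        if geometric:
            avg = math.exp(sum(math.log(v) for v in row) / len(row))
        else:
            avg = sum(row) / len(row)
        # Write the row average back at each probe's original position.
        for out, o in zip(normalized, orders):
            out[o[rank]] = avg
    return normalized
```

After normalization all assays contain the same set of values, and
only the order of the values differs between assays.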

Background subtraction and proper filtration should be done on the
bioassay set before running this plug-in. The bioassay set must not
contain any missing values.


----------------------------------------------------------------------
{{{
Copyright (C) 2008 Jari Häkkinen, Martin Svensson
Copyright (C) 2009 Jari Häkkinen

This file is part of the Normalizers plug-in package for BASE
(net.sf.based.normalizers). The package is available at
http://baseplugins.thep.lu.se/ BASE main site is
http://base.thep.lu.se/

This is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 3
of the License, or (at your option) any later version.

The software is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
}}}