Opened 7 years ago

Closed 7 years ago

## #541 closed Request (fixed)

# net.sf.basedb.normalizers: Normalizers should store normalized data in proper base

| Reported by: | Jari Häkkinen | Owned by: | olle |
|---|---|---|---|
| Priority: | major | Milestone: | Normalization package v1.1 |
| Component: | net.sf.basedb.normalizers | Keywords: | |
| Cc: | | | |

### Description

QuantileNormalizer always stores data non-logged, irrespective of the format the data was in before normalization. Data should be stored back to BASE in the same base (untransformed, log-2, or log-10) as before the transform.

### Change History (14)

### comment:1 Changed 7 years ago by

Status: | new → assigned |
---|

### comment:2 Changed 7 years ago by

Background discussion (Thanks to Nicklas Nordborg for detailed information about this):

- Originally data in BASE was stored in untransformed format, and the average method used was arithmetic mean (the latter differs from the current recommendation for untransformed data, which is to use geometric mean).
- A flag indicating whether stored data is untransformed, log-2, or log-10 values was introduced in BASE Ticket #1120 (The dynamic part of BASE should keep track whether intensity data is in log space or not). See that ticket for a lengthy discussion on the use and storage of transformed data in BASE. Some additions were:

a. A `BioAssaySet` has a `getIntensityTransform()` method that returns an `IntensityTransform` `enum` object, whose public `Formula.AverageMethod getAverage()` method returns `Formula.AverageMethod.ARITHMETIC_MEAN` or `Formula.AverageMethod.GEOMETRIC_MEAN`.

b. Class `VirtualColumn` was extended with static methods `VirtualColumn channelIntensity(int channel)` and `VirtualColumn channelRaw(int channel)`. The former performs a reverse transformation if needed, as determined by the transformation flag, in order to return untransformed data, while the latter returns data the way it was stored in the database, i.e. data in log-2 and log-10 format is returned in that format.

- The public `Formula.AverageMethod getAverage()` method of enum class `IntensityTransform` does, however, return `Formula.AverageMethod.ARITHMETIC_MEAN` for data stored in untransformed format (according to the flag), and `Formula.AverageMethod.GEOMETRIC_MEAN` for data stored in log-2 or log-10 format. This is the opposite of what this ticket recommends for data fetched in untransformed form.
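To make the difference between the two `VirtualColumn` accessors concrete, here is a minimal sketch. It does not use BASE: `Transform` is a hypothetical, simplified stand-in for `IntensityTransform`, and `ColumnReaders.raw` / `ColumnReaders.intensity` are hypothetical helpers that only model what `channelRaw()` and `channelIntensity()` return for a single stored value.

```java
// Hypothetical, simplified stand-in for BASE's IntensityTransform enum.
// It only models the three flag values discussed in this ticket.
enum Transform {
    NONE, LOG2, LOG10;

    /** Converts a stored value back to an untransformed (linear) intensity. */
    double unTransform(double stored) {
        switch (this) {
            case LOG2:  return Math.pow(2.0, stored);
            case LOG10: return Math.pow(10.0, stored);
            default:    return stored;
        }
    }
}

// Hypothetical helpers modelling the two VirtualColumn accessors.
final class ColumnReaders {
    private ColumnReaders() {}

    /** Like channelRaw(): returns the value exactly as stored in the database. */
    static double raw(double stored) {
        return stored;
    }

    /** Like channelIntensity(): reverse-transforms the value when the flag says it is logged. */
    static double intensity(double stored, Transform flag) {
        return flag.unTransform(stored);
    }
}
```

For a spot stored as `3.0` with the flag set to `LOG2`, `raw(3.0)` returns `3.0`, while `intensity(3.0, Transform.LOG2)` returns `8.0`.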

### comment:3 Changed 7 years ago by

Traceability note:

- BASE Ticket #1120 (The dynamic part of BASE should keep track whether intensity data is in log space or not) added several new methods and items concerning work with transformed/untransformed data.
- Ticket #228 (net.sf.basedb.normalizers: Normalizers should use ask BASE whether data is logged or not) is concerned with setting a proper default average method for normalization, based on if the data is stored in transformed or untransformed format.

### comment:4 Changed 7 years ago by

Problem discussion:

- Class `QuantileNormalization` in `src/net/sf/basedb/plugins/QuantileNormalization.java` extends `AbstractNormalizationPlugin`, and the latter contains several methods to access data, all using method `VirtualColumn.channelIntensity(int channel)` to retrieve the data from the database. For transformed data in log-2 or log-10 format, this means that the data is returned in untransformed format, and will be stored as such after normalization.
- Replacing the `VirtualColumn.channelIntensity(int channel)` calls in class `AbstractNormalizationPlugin` with `VirtualColumn.channelRaw(int channel)` calls would result in the data being returned as stored in the database, without any reverse transformation, and therefore stored after normalization in the same format as when fetched. For this to work, it is essential that a proper averaging method is used: the geometric mean is undefined for negative values and collapses to zero if any value is zero, and log-transformed intensities are zero or negative whenever the untransformed intensity is at most 1.
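The averaging problem can be shown with a small, generic sketch. The class `Averages` is hypothetical and is not BASE's `Formula.AverageMethod` implementation; it computes the geometric mean the usual way, as `exp(mean(ln x))`, which is why negative log values break it.

```java
final class Averages {
    private Averages() {}

    /** Plain arithmetic mean. */
    static double arithmeticMean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /**
     * Geometric mean computed as exp(mean(ln x)).
     * Math.log returns NaN for negative input and -Infinity for zero,
     * so the result is only meaningful when every value is positive.
     */
    static double geometricMean(double[] values) {
        double logSum = 0.0;
        for (double v : values) {
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }
}
```

With log-2 values `{-1.0, 0.0, 3.0}` the geometric mean is `NaN`, while the arithmetic mean is `2/3`. The same spots untransformed are `{0.5, 1.0, 8.0}`, whose geometric mean is `2^(2/3)`: averaging untransformed data with the geometric mean is equivalent to averaging log data with the arithmetic mean.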

### comment:5 Changed 7 years ago by

Problem discussion update:

- BASE Ticket #1792 (Incorrect average method specified in IntensityTransform) swapped the values returned by public method `Formula.AverageMethod getAverage()` in enum class `IntensityTransform` for data flagged as stored in untransformed format relative to transformed (log-2 or log-10) format (see changeset [6564]):
  - `Formula.AverageMethod.GEOMETRIC_MEAN` for data stored in untransformed format
  - `Formula.AverageMethod.ARITHMETIC_MEAN` for data stored in transformed format

This would solve the problem of the wrong default averaging method being shown for `QuantileNormalizer`. However, there are benefits to letting the plug-in itself determine the default averaging method, based on the intensity transform information for the `BioAssaySet` to work on.

Class `IntensityTransform` was extended with new public method `double transform(double value)` in changeset [6365], which can be used to store data in the same transform as the source data.
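A minimal sketch of what a `transform`/`unTransform` pair has to satisfy, assuming the three formats named in this ticket. The `Transform` enum below is a hypothetical stand-in, not the BASE class, and its `unTransform` method is an illustrative assumption; the point is only that transforming an untransformed value restores the stored format.

```java
// Hypothetical stand-in for IntensityTransform with a transform/unTransform pair.
enum Transform {
    NONE, LOG2, LOG10;

    /** Maps an untransformed (linear) value into this format, e.g. before storing it. */
    double transform(double linear) {
        switch (this) {
            case LOG2:  return Math.log(linear) / Math.log(2.0);
            case LOG10: return Math.log10(linear);
            default:    return linear;
        }
    }

    /** Inverse of transform(): maps a stored value back to linear scale. */
    double unTransform(double stored) {
        switch (this) {
            case LOG2:  return Math.pow(2.0, stored);
            case LOG10: return Math.pow(10.0, stored);
            default:    return stored;
        }
    }
}
```

For any format `t` and stored value `x`, `t.transform(t.unTransform(x))` gives back `x` up to floating-point rounding, which is the property a normalizer relies on when it untransforms, normalizes, and stores the result.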

### comment:6 Changed 7 years ago by

Design update for `QuantileNormalization`:

- Data stored in logarithmic format should be untransformed before averaging, and then transformed back to logarithmic format before results are stored. The first part is already implemented by the use of static method `VirtualColumn channelIntensity(int channel)` in class `AbstractNormalizationPlugin`, while public method `double transform(double value)` in class `IntensityTransform` should be used to transform the normalized result back before storing it.
- Since averaging is performed on untransformed data, the default averaging method should always be set to `Formula.AverageMethod.GEOMETRIC_MEAN`.
- The help text for selecting the averaging method in class `AbstractNormalizationPlugin` should be updated to avoid references to the format the data is stored in, since averaging is performed on untransformed data.
- The help text for `Quantile normalization` in `META-INF/extensions.xml` should be extended with information that data stored in logarithmic format will be untransformed before averaging, and then transformed back to logarithmic format before results are stored.
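The design above (untransform, average with the geometric mean, transform back) can be sketched end to end. This is a generic textbook quantile normalization written under stated assumptions, not the plug-in's code: `Transform` and `QuantileSketch` are hypothetical names, each assay is a plain `double[]`, and ties are ordered by a stable sort. The plug-in's own tie handling and data access through BASE may differ.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for IntensityTransform (not the BASE class).
enum Transform {
    NONE, LOG2, LOG10;

    double unTransform(double v) {
        return this == LOG2 ? Math.pow(2.0, v) : this == LOG10 ? Math.pow(10.0, v) : v;
    }

    double transform(double v) {
        return this == LOG2 ? Math.log(v) / Math.log(2.0) : this == LOG10 ? Math.log10(v) : v;
    }
}

final class QuantileSketch {
    private QuantileSketch() {}

    /**
     * Quantile-normalizes assays whose values are stored in format t.
     * stored[a][s] is spot s of assay a; the result uses the same format t.
     */
    static double[][] normalize(double[][] stored, Transform t) {
        int nAssays = stored.length;
        int nSpots = stored[0].length;
        double[][] linear = new double[nAssays][nSpots];
        Integer[][] order = new Integer[nAssays][];

        for (int a = 0; a < nAssays; a++) {
            if (stored[a].length != nSpots) {
                throw new IllegalArgumentException("Assay " + a + " has " + stored[a].length
                        + " spots, expected " + nSpots);
            }
            // Step 1: untransform, as channelIntensity() would.
            for (int s = 0; s < nSpots; s++) {
                linear[a][s] = t.unTransform(stored[a][s]);
            }
            // Sort spot indices by linear intensity (stable for ties).
            final double[] row = linear[a];
            Integer[] idx = new Integer[nSpots];
            for (int s = 0; s < nSpots; s++) {
                idx[s] = s;
            }
            Arrays.sort(idx, Comparator.comparingDouble(i -> row[i]));
            order[a] = idx;
        }

        // Step 2: geometric mean across assays at each rank.
        double[] rankMean = new double[nSpots];
        for (int r = 0; r < nSpots; r++) {
            double logSum = 0.0;
            for (int a = 0; a < nAssays; a++) {
                logSum += Math.log(linear[a][order[a][r]]);
            }
            rankMean[r] = Math.exp(logSum / nAssays);
        }

        // Step 3: give each spot its rank mean, transformed back to format t.
        double[][] result = new double[nAssays][nSpots];
        for (int a = 0; a < nAssays; a++) {
            for (int r = 0; r < nSpots; r++) {
                result[a][order[a][r]] = t.transform(rankMean[r]);
            }
        }
        return result;
    }
}
```

For two log-2 assays `{1, 3}` and `{3, 5}` the untransformed values are `{2, 8}` and `{8, 32}`; the rank means are `4` and `16`, so both assays come back as log-2 values `{2, 4}`, i.e. in the same format they were read in.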

### comment:7 Changed 7 years ago by

(In [2165]) Refs #228. Refs #541. Quantile normalization updated to untransform logarithmic data before normalization, and then transform it back to logarithmic format before results are stored. Default averaging method will always be set to `Formula.AverageMethod.GEOMETRIC_MEAN`:

- Class/file `QuantileNormalizer.java` in `src/net/sf/basedb/plugins/` in package `net.sf.basedb.normalizers` updates:

a. Private method `RequestInformation getConfiguredJobParameters()` updated to always set the default averaging method to `Formula.AverageMethod.GEOMETRIC_MEAN`.

b. Private method `BioAssaySet normalize(DbControl dc, BioAssaySet source, Job job, ProgressReporter progress)` updated to call public method `double transform(double value)` in class `IntensityTransform` to transform the normalized result back before storing it. The error message reported when the number of spots differs between two `BioAssay` sets was updated to include the names of the sets and the number of spots in each. Also minor updates to increase the clarity of the code.

- Class/file `AbstractNormalizationPlugin.java` in `src/net/sf/basedb/plugins/` in package `net.sf.basedb.normalizers`: help text for selecting the averaging method updated to avoid references to the format the data is stored in, since averaging is performed on untransformed data.
- XML file `extensions.xml` in `META-INF` in package `net.sf.basedb.normalizers`: help text for `Quantile normalization` updated with information that data stored in logarithmic format will be untransformed before averaging, and then transformed back to logarithmic format before results are stored.

### comment:8 Changed 7 years ago by

Design update for `AverageNormalization` (mirrors the update for `QuantileNormalization`):

- Data stored in logarithmic format should be untransformed before averaging, and then transformed back to logarithmic format before results are stored. The first part is already implemented by the use of static method `VirtualColumn channelIntensity(int channel)` in class `AbstractNormalizationPlugin`, while public method `double transform(double value)` in class `IntensityTransform` should be used to transform the normalized result back before storing it.
- Since averaging is performed on untransformed data, the default averaging method should always be set to `Formula.AverageMethod.GEOMETRIC_MEAN`.
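A minimal sketch of the same pattern applied to average normalization. The ticket does not spell out how `refValue` and `minIntensity` are used, so the semantics here are an assumption: each assay is scaled so that the geometric mean of its untransformed intensities (over spots with intensity at least `minIntensity`) equals `refValue`, and the scaled values are transformed back to the source format. `Transform` and `AverageSketch` are hypothetical names, not BASE classes.

```java
// Hypothetical stand-in for IntensityTransform (not the BASE class).
enum Transform {
    NONE, LOG2, LOG10;

    double unTransform(double v) {
        return this == LOG2 ? Math.pow(2.0, v) : this == LOG10 ? Math.pow(10.0, v) : v;
    }

    double transform(double v) {
        return this == LOG2 ? Math.log(v) / Math.log(2.0) : this == LOG10 ? Math.log10(v) : v;
    }
}

final class AverageSketch {
    private AverageSketch() {}

    /**
     * Assumed semantics: scale one assay so that the geometric mean of its
     * untransformed intensities (spots with intensity >= minIntensity) equals
     * refValue, then return the result in the source format t.
     */
    static double[] normalize(double[] stored, Transform t, double refValue, double minIntensity) {
        double logSum = 0.0;
        int used = 0;
        for (double v : stored) {
            double linear = t.unTransform(v);
            // Only positive intensities are valid inputs to a geometric mean.
            if (linear > 0.0 && linear >= minIntensity) {
                logSum += Math.log(linear);
                used++;
            }
        }
        if (used == 0) {
            throw new IllegalArgumentException("No spots with intensity >= " + minIntensity);
        }
        double scale = refValue / Math.exp(logSum / used);

        double[] result = new double[stored.length];
        for (int i = 0; i < stored.length; i++) {
            // Scale on the linear scale, then transform back before storing.
            result[i] = t.transform(t.unTransform(stored[i]) * scale);
        }
        return result;
    }
}
```

For a log-2 assay stored as `{1, 3}` (untransformed `{2, 8}`, geometric mean `4`) and `refValue = 1`, the scale factor is `0.25` and the stored result is `{-1, 1}`, still in log-2 format.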

### comment:9 Changed 7 years ago by

(In [2166]) Refs #228. Refs #541. Average normalization updated to untransform logarithmic data before normalization, and then transform it back to logarithmic format before results are stored. Default averaging method will always be set to `Formula.AverageMethod.GEOMETRIC_MEAN`:

- Class/file `AverageNormalization.java` in `src/net/sf/basedb/plugins/` in package `net.sf.basedb.normalizers` updates:

a. Private method `RequestInformation getConfiguredJobParameters()` updated to always set the default averaging method to `Formula.AverageMethod.GEOMETRIC_MEAN`.

b. Private method `BioAssaySet normalize(DbControl dc, BioAssaySet source, Job job, float refValue, float minIntensity, ProgressReporter progress)` updated to call public method `double transform(double value)` in class `IntensityTransform` to transform the normalized result back before storing it.

### comment:10 Changed 7 years ago by

Design note:

- When data stored in logarithmic format is untransformed before averaging, and then transformed back to logarithmic format before results are stored, the transformation information for the result `BioAssaySet` must be set explicitly by calling its public `void setIntensityTransform(IntensityTransform transform)` method. If this is not done, methods using the result `BioAssaySet` will treat the logarithmic data as untransformed data.
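The consequence of forgetting the flag can be illustrated with a minimal sketch. `Transform` and `ResultAssaySet` are hypothetical stand-ins, not BASE classes; `setIntensityTransform` only mirrors the method name above, and the `NONE` default is an assumption chosen to show what happens when the flag is never set.

```java
// Hypothetical stand-in for IntensityTransform (not the BASE class).
enum Transform {
    NONE, LOG2, LOG10;

    double unTransform(double v) {
        return this == LOG2 ? Math.pow(2.0, v) : this == LOG10 ? Math.pow(10.0, v) : v;
    }
}

// Hypothetical stand-in for a result BioAssaySet. The NONE default is an
// assumption used to illustrate what happens when the flag is never set.
final class ResultAssaySet {
    private final double[] stored;
    private Transform transform = Transform.NONE;

    ResultAssaySet(double[] stored) {
        this.stored = stored.clone();
    }

    void setIntensityTransform(Transform transform) {
        this.transform = transform;
    }

    /** Models channelIntensity(): untransforms according to the stored flag. */
    double intensity(int spot) {
        return transform.unTransform(stored[spot]);
    }
}
```

A normalized value stored as log-2 `3.0` reads back as `3.0` while the flag is left unset, and as the intended untransformed `8.0` only after `setIntensityTransform(Transform.LOG2)` has been called.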

### comment:11 Changed 7 years ago by

(In [2167]) Refs #541. Rank invariant normalization updated to store data in the same format as before the normalization (untransformed, log-2, or log-10). (Note that rank invariant normalization is currently only implemented for one-channel data):

- Class/file `RankInvariantNormalization.java` in `src/net/sf/basedb/plugins/` in package `net.sf.basedb.normalizers` updated in private method `BioAssaySet normalize1Ch(DbControl dc, BioAssaySet source, List<?> masterAssays, int numIteration, Job job, ProgressReporter progress)` to call public method `double transform(double value)` in class `IntensityTransform` to transform the normalized result back before storing it.

### comment:13 Changed 7 years ago by

### comment:14 Changed 7 years ago by

Resolution: | → fixed |
---|---|

Status: | assigned → closed |

Ticket closed as the requested functionality has been added.


Ticket accepted.