#228 closed Request (fixed)
net.sf.basedb.normalizers: Normalizers should ask BASE whether data is logged or not
Reported by: | Jari Häkkinen | Owned by: | olle |
---|---|---|---|
Priority: | major | Milestone: | ZZ Normalization package v1.1 |
Component: | net.sf.basedb.normalizers | Keywords: | |
Cc: |
Description
and use that information appropriately. Either set a proper default selection of average to use or simply use proper averaging without asking the user how to average.
Logged values should use arithmetic averages by default, non-logged should use geometric average by default.
Adding the above functionality requires BASE 2.12.
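Illustration (not plugin code): a minimal, self-contained sketch of why the two defaults correspond to each other. The arithmetic mean of log-2 values, mapped back with 2^x, equals the geometric mean of the untransformed values. The class and method names are hypothetical and exist only for this example.

```java
import java.util.Arrays;

// Hypothetical demo class; nothing here is part of BASE or the normalizers package.
final class AveragingSketch {
    /** Arithmetic mean: sum of the values divided by their count. */
    static double arithmeticMean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Geometric mean, computed through logarithms; requires positive values. */
    static double geometricMean(double[] values) {
        double logSum = 0.0;
        for (double v : values) {
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }

    static double log2(double v) {
        return Math.log(v) / Math.log(2.0);
    }

    public static void main(String[] args) {
        double[] raw = {100.0, 400.0, 1600.0};
        double[] logged = Arrays.stream(raw).map(AveragingSketch::log2).toArray();

        // Both numbers are approximately 400 (up to floating-point rounding).
        System.out.println("geometric mean of raw values:       " + geometricMean(raw));
        System.out.println("2^(arithmetic mean of log2 values): " + Math.pow(2.0, arithmeticMean(logged)));
    }
}
```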
Change History (19)
comment:1 by , 15 years ago
Status: | new → assigned |
---|
comment:2 by , 15 years ago
comment:3 by , 15 years ago
(In [1144]) References #228. Updated the method that calculates a bioassay set's global average in AverageNormalizationPlugin to support a user-defined average method. Documentation related to recent changes has also been updated. The updated plugins execute as expected, but the normalized data has not yet been verified.
comment:4 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:6 by , 11 years ago
Background discussion (Thanks to Nicklas Nordborg for detailed information about this):
- Originally data in BASE was stored in untransformed format, and the average method used was arithmetic mean (the latter differs from the current recommendation for untransformed data, which is to use geometric mean).
- A flag indicating if stored data was untransformed, stored as log-2, or log-10 values, was introduced in BASE Ticket #1120 (The dynamic part of BASE should keep track whether intensity data is in log space or not). See this ticket for a lengthy discussion on the use and storage of transformed data in BASE. Some additions were:
a. A `BioAssaySet` has a `getIntensityTransform()` method that returns an `IntensityTransform` enum object, whose public `Formula.AverageMethod getAverage()` method returns `Formula.AverageMethod.ARITHMETIC_MEAN` or `Formula.AverageMethod.GEOMETRIC_MEAN`.
b. Class `VirtualColumn` was extended with static methods `VirtualColumn channelIntensity(int channel)` and `VirtualColumn channelRaw(int channel)`. The former performs a reverse transformation if needed, determined by the transformation flag, in order to return untransformed data, while the latter returns data the way it was stored in the database, i.e. data in log-2 and log-10 format is returned in that format.
- The public `Formula.AverageMethod getAverage()` method of enum class `IntensityTransform` does, however, return `Formula.AverageMethod.ARITHMETIC_MEAN` for data stored in untransformed format (according to the flag), and `Formula.AverageMethod.GEOMETRIC_MEAN` for data stored in log-2 or log-10 format. This is the opposite of what this ticket recommends for data fetched in untransformed form.
comment:7 by , 11 years ago
Traceability note:
- BASE Ticket #1120 (The dynamic part of BASE should keep track whether intensity data is in log space or not) added several new methods and items concerning work with transformed/untransformed data.
- Ticket #541 (net.sf.basedb.normalizers: Normalizers should store normalized data in proper base) is concerned with how data should be stored after normalization.
comment:8 by , 11 years ago
Problem discussion:
- The desired default option for averaging method can be obtained by switching the results returned by public method `Formula.AverageMethod getAverage()` in enum class `IntensityTransform` for data flagged as stored in untransformed format, relative to transformed (log-2 or log-10) format:
  - `Formula.AverageMethod.GEOMETRIC_MEAN` for data stored in untransformed format
  - `Formula.AverageMethod.ARITHMETIC_MEAN` for data stored in transformed format
  However, since the `getAverage()` method is used in several other classes, a thorough investigation is needed before such a change is made, since there is a risk that functionality is broken in these other classes.
- Another solution is to update class `QuantileNormalization` in private method `RequestInformation getConfiguredJobParameters()` to check the intensity transform set for the `BioAssaySet`, and select the default averaging method based on that. This makes the choice of default averaging method independent of the value returned by the `getAverage()` method.
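Illustration of the second alternative (a sketch only, not the plugin code): the plug-in picks its own default from the transform of the source data instead of relying on `getAverage()`. The two enums below are simplified stand-ins for `Formula.AverageMethod` and `IntensityTransform`; the transform constant names and the helper `defaultAverageFor` are assumptions made for this example.

```java
// Hypothetical demo class with stand-in types; not the real BASE classes.
final class DefaultAverageSketch {
    /** Stand-in for Formula.AverageMethod, limited to the two constants named in this ticket. */
    enum AverageMethod { ARITHMETIC_MEAN, GEOMETRIC_MEAN }

    /** Stand-in for IntensityTransform; the constant names are assumed. */
    enum Transform { NONE, LOG2, LOG10 }

    /**
     * Default proposed in this ticket when data is averaged in the format it is stored in:
     * geometric mean for untransformed data, arithmetic mean for log-2 or log-10 data.
     */
    static AverageMethod defaultAverageFor(Transform storedAs) {
        return storedAs == Transform.NONE
            ? AverageMethod.GEOMETRIC_MEAN
            : AverageMethod.ARITHMETIC_MEAN;
    }

    public static void main(String[] args) {
        for (Transform t : Transform.values()) {
            System.out.println(t + " -> " + defaultAverageFor(t));
        }
    }
}
```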
comment:9 by , 11 years ago
Problem discussion update:
- BASE Ticket #1792 (Incorrect average method specified in IntensityTransform) switched the results returned by public method `Formula.AverageMethod getAverage()` in enum class `IntensityTransform` for data flagged as stored in untransformed format, relative to transformed (log-2 or log-10) format (see changeset [6564]):
  - `Formula.AverageMethod.GEOMETRIC_MEAN` for data stored in untransformed format
  - `Formula.AverageMethod.ARITHMETIC_MEAN` for data stored in transformed format
  This would solve the problem of the wrong default averaging method being shown for `QuantileNormalizer`. However, there are benefits to letting the plug-in itself determine the default averaging method, based on the intensity transform information for the `BioAssaySet` to work on.
Class `IntensityTransform` was extended with new public method `double transform(double value)` in changeset [6365], that stores data in the same transform as the source data.
comment:10 by , 11 years ago
(In [2154]) Refs #228. Classes/files `AverageNormalization.java` and `QuantileNormalization.java` updated to determine the default averaging method based on the intensity transform information for the `BioAssaySet` to work on, instead of what is returned from the `Formula.AverageMethod getAverage()` method for the `IntensityTransform` of the latter:
- Private method `RequestInformation getConfiguredJobParameters()` is updated to check the intensity transform set for the `BioAssaySet`, and select the default averaging method based on that. This makes the choice of default averaging method independent of the value returned by the `getAverage()` method.
comment:11 by , 11 years ago
Design update for `QuantileNormalizer`:
- Data stored in logarithmic format should be untransformed before averaging, and then transformed back to logarithmic format before results are stored. The first part is already implemented by use of static method `VirtualColumn channelIntensity(int channel)` in class `AbstractNormalizationPlugin`, while public method `double transform(double value)` in class `IntensityTransform` should be used to transform the normalized result back before storing results.
- Since averaging is performed on untransformed data, default averaging method should always be set to `Formula.AverageMethod.GEOMETRIC_MEAN`.
- Help text for selecting averaging method in class `AbstractNormalizationPlugin` should be updated to avoid references to the format data is stored in, since averaging is performed on untransformed data.
- Help text for `Quantile normalization` in `META-INF/extensions.xml` should be extended with information that data stored in logarithmic format will be untransformed before averaging, and then transformed back to logarithmic format before results are stored.
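Illustration of the round trip described in this design update (a sketch under stated assumptions, not the code in `QuantileNormalization.java`): data stored as log-2 is untransformed, quantile-normalized with a geometric mean per rank, and transformed back to log-2. The quantile step is the textbook rank-averaging variant and assumes positive intensities, at least one assay, and no special handling of ties; all class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class; plain arrays stand in for bioassays.
final class QuantileRoundTripSketch {
    static double log2(double v) {
        return Math.log(v) / Math.log(2.0);
    }

    /** Quantile-normalizes untransformed assays: each rank gets the geometric mean over all assays. */
    static double[][] quantileNormalize(double[][] assays) {
        int spots = assays[0].length;
        double[] logSumPerRank = new double[spots];
        Integer[][] order = new Integer[assays.length][];
        for (int a = 0; a < assays.length; a++) {
            final double[] values = assays[a];
            if (values.length != spots) {
                throw new IllegalArgumentException(
                    "Assay " + a + " has " + values.length + " spots, expected " + spots);
            }
            // Spot indexes of this assay, sorted by ascending intensity.
            Integer[] idx = new Integer[spots];
            for (int i = 0; i < spots; i++) {
                idx[i] = i;
            }
            Arrays.sort(idx, (x, y) -> Double.compare(values[x], values[y]));
            order[a] = idx;
            for (int rank = 0; rank < spots; rank++) {
                logSumPerRank[rank] += Math.log(values[idx[rank]]);
            }
        }
        double[][] result = new double[assays.length][spots];
        for (int a = 0; a < assays.length; a++) {
            for (int rank = 0; rank < spots; rank++) {
                // Geometric mean of all assays' values at this rank.
                result[a][order[a][rank]] = Math.exp(logSumPerRank[rank] / assays.length);
            }
        }
        return result;
    }

    /** Round trip for data stored as log-2: untransform, normalize, transform back. */
    static double[][] normalizeLog2Stored(double[][] stored) {
        double[][] raw = new double[stored.length][];
        for (int a = 0; a < stored.length; a++) {
            raw[a] = Arrays.stream(stored[a]).map(v -> Math.pow(2.0, v)).toArray();
        }
        double[][] normalized = quantileNormalize(raw);
        for (int a = 0; a < normalized.length; a++) {
            normalized[a] = Arrays.stream(normalized[a]).map(QuantileRoundTripSketch::log2).toArray();
        }
        return normalized;
    }

    public static void main(String[] args) {
        double[][] stored = { {1.0, 3.0, 2.0}, {4.0, 2.0, 6.0} };
        System.out.println(Arrays.deepToString(normalizeLog2Stored(stored)));
    }
}
```

For this input the normalized log-2 values are approximately 1.5, 4.5, 3.0 for the first assay and 3.0, 1.5, 4.5 for the second, i.e. the per-rank arithmetic means of the stored log-2 values, which is what a geometric mean on untransformed data amounts to.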
comment:12 by , 11 years ago
(In [2165]) Refs #228. Refs #541. Quantile normalization updated to untransform logarithmic data before normalization, and then transform it back to logarithmic format before results are stored. Default averaging method will always be set to `Formula.AverageMethod.GEOMETRIC_MEAN`:
- Class/file `QuantileNormalization.java` in `src/net/sf/basedb/plugins/` in package `net.sf.basedb.normalizers` updates:
a. Private method `RequestInformation getConfiguredJobParameters()` updated to always set the default averaging method to `Formula.AverageMethod.GEOMETRIC_MEAN`.
b. Private method `BioAssaySet normalize(DbControl dc, BioAssaySet source, Job job, ProgressReporter progress)` updated to call public method `double transform(double value)` in class `IntensityTransform` to transform the normalized result back before storing it. Error message when number of spots differs between two `BioAssay` sets updated to report the names of the latter and the number of spots in each one. Also minor updates in order to increase clarity of code.
- Class/file `AbstractNormalizationPlugin.java` in `src/net/sf/basedb/plugins/` in package `net.sf.basedb.normalizers` updated in help text for selecting averaging method by avoiding reference to the format data is stored in, since averaging is performed on untransformed data.
- XML file `extensions.xml` in `META-INF` in package `net.sf.basedb.normalizers` updated in help text for `Quantile normalization` by adding information that data stored in logarithmic format will be untransformed before averaging, and then transformed back to logarithmic format before results are stored.
comment:13 by , 11 years ago
Design update for `AverageNormalization` (mirrors update for `QuantileNormalization`):
- Data stored in logarithmic format should be untransformed before averaging, and then transformed back to logarithmic format before results are stored. The first part is already implemented by use of static method `VirtualColumn channelIntensity(int channel)` in class `AbstractNormalizationPlugin`, while public method `double transform(double value)` in class `IntensityTransform` should be used to transform the normalized result back before storing results.
- Since averaging is performed on untransformed data, default averaging method should always be set to `Formula.AverageMethod.GEOMETRIC_MEAN`.
comment:14 by , 11 years ago
(In [2166]) Refs #228. Refs #541. Average normalization updated to untransform logarithmic data before normalization, and then transform it back to logarithmic format before results are stored. Default averaging method will always be set to `Formula.AverageMethod.GEOMETRIC_MEAN`:
- Class/file `AverageNormalization.java` in `src/net/sf/basedb/plugins/` in package `net.sf.basedb.normalizers` updates:
a. Private method `RequestInformation getConfiguredJobParameters()` updated to always set the default averaging method to `Formula.AverageMethod.GEOMETRIC_MEAN`.
b. Private method `BioAssaySet normalize(DbControl dc, BioAssaySet source, Job job, float refValue, float minIntensity, ProgressReporter progress)` updated to call public method `double transform(double value)` in class `IntensityTransform` to transform the normalized result back before storing it.
comment:15 by , 11 years ago
Design note:
- When data stored in logarithmic format is untransformed before averaging, and then transformed back to logarithmic format before results are stored, transformation information for the result `BioAssaySet` must be set explicitly by calling its public `void setIntensityTransform(IntensityTransform transform)` method. If this is not done, methods using the result `BioAssaySet` will treat the data in logarithmic format as original (untransformed) data.
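Illustration of the pitfall in this design note (a sketch with stand-in types, not BASE code): a result set that holds log-2 values but is still flagged as untransformed is read back incorrectly. `ResultSet`, its `intensity` accessor, the `Transform` constants, and the untransformed default are all assumptions made for this example; only the method names `getIntensityTransform` and `setIntensityTransform` come from this ticket.

```java
// Hypothetical demo class; ResultSet is a simplified stand-in for a result BioAssaySet.
final class ResultTransformSketch {
    /** Stand-in for IntensityTransform; constant names and untransform() are assumed. */
    enum Transform {
        NONE {
            double untransform(double stored) { return stored; }
        },
        LOG2 {
            double untransform(double stored) { return Math.pow(2.0, stored); }
        };

        abstract double untransform(double stored);
    }

    static final class ResultSet {
        private final double[] stored;
        // Assumption for this sketch: a new result set is flagged as untransformed.
        private Transform transform = Transform.NONE;

        ResultSet(double[] stored) {
            this.stored = stored.clone();
        }

        Transform getIntensityTransform() {
            return transform;
        }

        void setIntensityTransform(Transform transform) {
            this.transform = transform;
        }

        /** Untransformed intensity, derived from the stored value and the flag. */
        double intensity(int spot) {
            return transform.untransform(stored[spot]);
        }
    }

    public static void main(String[] args) {
        // The stored value 3.0 is a log-2 value, i.e. an intensity of 8.
        ResultSet result = new ResultSet(new double[] {3.0});
        System.out.println("flag not set: " + result.intensity(0));

        result.setIntensityTransform(Transform.LOG2);
        System.out.println("flag set:     " + result.intensity(0));
    }
}
```

Without the explicit call, the stored log-2 value 3.0 is reported as the intensity itself; after the call it is read back as 8.0.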
comment:17 by , 11 years ago
comment:18 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Ticket closed as the requested functionality has been added.
comment:19 by , 4 years ago
Milestone: | Normalization package v1.1 → ZZ Normalization package v1.1 |
---|
Milestone renamed
(In [1143]) References #228 Added parameter to define which average method to use when running the plugin. The functionality is not tested yet.