Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#461 closed enhancement (fixed)

Sample processing report generator

Reported by: olle
Owned by: olle
Priority: major
Milestone: Reggie v2.11
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description (last modified by olle)

Reggie should be updated to include a sample processing report generator, producing a report similar to the one produced today for the SCAN-B project using an R script. The latter report consists of a number of box plots showing the statistical distribution of various quantities related to the sample processing. The box plots show the statistics per quarter and per month for the selected time period.

Change History (39)

comment:1 by olle, 11 years ago

Status: new → assigned

Ticket accepted.

comment:2 by olle, 11 years ago

Traceability note:

  • The report generator was introduced in Ticket #339 (Report generator). It included a sample count report.
  • A consent count report was introduced in Ticket #426 (Consent count report generator).
  • A patient count report was introduced in Ticket #433 (Patient count report generator).
  • An overview report was introduced in Ticket #438 (Overview report).
  • A missing sample data report was introduced in Ticket #439 (Missing sample data report).
  • Common table report utilities were broken out from the sample count report code and placed in their own class in Ticket #459 (Common table report utilities should be placed in its own class).
Last edited 11 years ago by olle

comment:3 by olle, 11 years ago

User interface design:

  • The previous reports (sample count, consent count, patient count, overview, and missing sample data) were all placed under the sub-header "Report generator". Since they are all concerned with the original samples and their sources, this sub-header will be renamed "Sample source report".
  • The new report will be named "SCAN-B quarter/month report", and placed under a new sub-header "Sample processing report".
  • The interface for the SCAN-B quarter/month report will be based on that for the previous reports, with three modifications:

    1. The view type selection will be extended with an option "Quarter + Month", that will produce plots for the selected quantities first by quarter and then by month.
    2. The view type "Auto" option will never return "Week", since this is normally too short a period for items of interest in sample processing. However, the user may still be able to manually select "Week" from the view type select box.
    3. A new selection box "Chart data" will be added, with options for all of the quantities, for which plots were produced by the original R script. An option "All" will also be available (and the default), that will produce plots for all of the quantities. If view type is set to "Quarter + Month", and chart data to "All", first plots will be displayed for all quantities per quarter, and then by month.
Last edited 11 years ago by olle

comment:4 by olle, 11 years ago

Description: modified (diff)

Added basic info on the R script box plots.

Last edited 11 years ago by olle

comment:5 by olle, 11 years ago

Description: modified (diff)

Typo fixed.

Last edited 11 years ago by olle

comment:6 by olle, 11 years ago

Short description of the original sample processing report (produced by the R script):

  1. The report contains quarterly and monthly statistics for the following quantities:
    a. Original quantity used for SCANB specimens
    b. Quantity tissue used for SCANB specimens
    c. Histology piece quantity used for SCANB specimens
    d. Remaining tissue quantity for SCANB specimens
    e. Total quantity DNA used for SCANB extractions
    f. Total quantity RNA used for SCANB extractions
    g. DNA yield (µg/mg tissue) SCANB
    h. RNA yield (µg/mg tissue) SCANB
    i. DNA yield corrected for lysate volume (µg/mg tissue) SCANB
    j. RNA yield corrected for lysate volume (µg/mg tissue) SCANB
    k. RNA QC SCANB, RQS otherwise RQS~RIN
    l. Minutes to RNAlater
  2. The R script uses input from two files, containing data for samples and extracts, respectively. Samples of subtypes Specimen, Histology, and Case are used, and extracts of subtypes DNA, RNA, Lysate, and RNAQC.
  3. The R script discards samples with no known original quantity and samples from biopsies.
  4. The R script creates box plots for the statistics of each quantity. The center box is bounded by the 25th and 75th percentiles and contains a marker for the 50th percentile (the median). The whiskers on each box extend to the smallest and largest data values that lie within 1.5 times the distance between the 25th and 75th percentiles (the interquartile range, IQR) from the bottom and top of the box.
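
Illustration of the whisker rule in item 4, as a minimal Java sketch. The class and method names are hypothetical and are not taken from the R script or from Reggie. The 25th and 75th percentiles are passed in as already computed values. Letting a whisker fall back to the box edge when no data value lies within reach is an assumption of the sketch.

```java
final class WhiskerSketch {
    /**
     * Returns {lowerWhisker, upperWhisker}: the smallest and largest data
     * values that lie within 1.5 times the box height from the box edges.
     * If no value qualifies on one side, that whisker stays at the box edge
     * (an assumption of this sketch).
     */
    static double[] whiskers(double[] data, double pct25, double pct75) {
        double reach = 1.5 * (pct75 - pct25);
        double lowLimit = pct25 - reach;
        double highLimit = pct75 + reach;
        double low = pct25;
        double high = pct75;
        for (double v : data) {
            if (v >= lowLimit && v < low) {
                low = v;   // new smallest value still within reach
            }
            if (v <= highLimit && v > high) {
                high = v;  // new largest value still within reach
            }
        }
        return new double[] { low, high };
    }
}
```

Values beyond the whiskers (such as a single very large measurement) do not stretch the whiskers. Whether and how such values are drawn is not covered by the sketch.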
Version 0, edited 11 years ago by olle

comment:7 by olle, 11 years ago

Design description.

A preliminary version of the SCAN-B quarter/month report in Reggie will be based on the following components:

  • A servlet class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/. The servlet will perform the main data processing and calculate the statistics. This reduces the bandwidth need, since only the statistics values have to be sent to the receiving JavaScript in a JSON object, rather than the whole data set, which would have been necessary if the statistics calculation were performed by the script.
  • A JSP script scanbquartermonthreportgenerator.jsp in reggie/resources/. The JSP script will manage the GUI and display the created box plots.
  • A JavaScript utility file boxplot.js in reggie/resources/. The utility file will take statistics data in JSON format as input and create a boxplot in an HTML 5 canvas element.
  • JSP script index.jsp in reggie/resources/ is updated in section "Reggie reports" by having sub-header "Report generator" renamed "Sample source report" and addition of a new sub-header "Sample processing report", with an item "SCAN-B quarter/month report".

Statistics calculation:

  • Percentiles will be calculated by linear interpolation between values for nearest items, in case no item exactly corresponds to the percentile.
  • Statistics for a quantity will only be performed if at least 5 items exist for the time period in question.
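
The two rules above can be sketched in Java as follows. This is an illustration only, not the code in ScanBQuarterMonthReportServlet. The class and method names are hypothetical, and the position convention pct/100 * (n - 1) on the sorted values is an assumption, since the ticket does not state which interpolation convention is used.

```java
import java.util.Arrays;
import java.util.List;

final class PercentileSketch {
    /** Minimum number of items required before statistics are calculated. */
    static final int MIN_ITEMS = 5;

    /**
     * Returns the requested percentile (0-100) of the values, or null when
     * fewer than MIN_ITEMS values are available for the period.
     */
    static Float percentile(List<Float> values, double pct) {
        if (values == null || values.size() < MIN_ITEMS) {
            return null;
        }
        float[] sorted = new float[values.size()];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = values.get(i);
        }
        Arrays.sort(sorted);
        // Assumed convention: fractional zero-based position in the sorted array.
        double pos = pct / 100.0 * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = (int) Math.ceil(pos);
        double frac = pos - lower;
        // Linear interpolation between the two nearest items.
        return (float) (sorted[lower] + frac * (sorted[upper] - sorted[lower]));
    }
}
```

With six values 1 to 6, the 25th percentile falls at position 1.25, between the second and third items, which gives 2.25 under this convention.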

In order to create the statistics, a quantity value and an associated date must be selected for each chart type. In some cases the choice is straightforward, while in others more than one option is available. The following table shows the quantity values and dates that are used in the first preliminary version of the report. The status column comments on how well the generated plot and displayed data reproduce the printed plots created by the original procedure using the R script.

Chart data | Quantity used | Date used | Status
Original tissue quantity | sample.getOriginalQuantity()/1000.0f | Annotation value QIACUBE_DATE of extract from lysate from sample | OK
Quantity tissue used | extract.getCreationEvent().getUsedQuantity(sample)/1000.0f for extracts from sample (may be more than one extract/sample) | Annotation value QIACUBE_DATE of extract from lysate from sample | OK
Histology piece quantity | sample.getOriginalQuantity()/1000.0f for histology sample | Annotation value PARTITION_DATE of histology sample | Extra items, partial resemblance
Remaining tissue quantity | sample.getRemainingQuantity()/1000.0f | Annotation value QIACUBE_DATE of extract from lysate from sample | OK
Total quantity DNA used | extract.getOriginalQuantity() for DNA extract | Annotation value QIACUBE_DATE of DNA extract | OK
Total quantity RNA used | extract.getOriginalQuantity() for RNA extract | Annotation value QIACUBE_DATE of RNA extract | OK
DNA yield (µg/mg tissue) | extract.getOriginalQuantity() for DNA extract divided by extract.getCreationEvent().getUsedQuantity(sample)/1000.0f for extracts from sample | Annotation value QIACUBE_DATE of DNA extract | OK
RNA yield (µg/mg tissue) | extract.getOriginalQuantity() for RNA extract divided by extract.getCreationEvent().getUsedQuantity(sample)/1000.0f for extracts from sample | Annotation value QIACUBE_DATE of RNA extract | OK
DNA yield corrected for lysate volume (µg/mg tissue) | 2 x extract.getOriginalQuantity() for DNA extract divided by extract.getCreationEvent().getUsedQuantity(sample)/1000.0f for extracts from sample | Annotation value QIACUBE_DATE of DNA extract | Resemblance OK
RNA yield corrected for lysate volume (µg/mg tissue) | 2 x extract.getOriginalQuantity() for RNA extract divided by extract.getCreationEvent().getUsedQuantity(sample)/1000.0f for extracts from sample | Annotation value QIACUBE_DATE of RNA extract | Resemblance OK
RNA QC SCANB, RQS otherwise RQS~RIN | Annotations CA_RQS and BA_RIN for RNA extract. When no RQS value exists, the value RIN*0.7698785 + 1.607572 is used. | Annotation value QIACUBE_DATE of RNA extract | Missing items, order of magnitude OK
Minutes to RNAlater | Difference between annotation values RNALATER_DATETIME and SAMPLING_DATETIME for specimen sample, converted to minutes | Annotation value QIACUBE_DATE of extract from lysate from sample | Extra items, order of magnitude OK
Last edited 11 years ago by olle

comment:8 by olle, 11 years ago

(In [1820]) Refs #461. Preliminary version of SCAN-B quarter/month report:

  1. XML file servlets.xml in reggie/META-INF/ updated with entry for new servlet net.sf.basedb.reggie.servlet.ScanBQuarterMonthReportServlet.
  2. JSP file index.jsp in reggie/resources/ updated by renaming of entry "Report generator" to "Sample source report" and adding new entry "Sample processing report".
  3. JavaScript file boxplot.js in reggie/resources/ added. It contains functions for producing simple box plots in an HTML canvas element from input data in a JSON object.
  4. JSP file samplereportgenerator.jsp in reggie/resources/ updated by renaming of header "Report generator" to "Sample source report".
  5. JSP file scanbquartermonthreportgenerator.jsp added. It manages the GUI for the SCAN-B quarter/month report.
  6. Java class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ added. It collects the data and performs the statistical calculations for the report.

comment:9 by Nicklas Nordborg, 11 years ago

(In [1822]) References #461: Sample processing report generator

Added a new icon for the plots.

comment:10 by olle, 11 years ago

(In [1823]) Refs #461. JSP file scanbquartermonthreportgenerator.jsp in reggie/resources/ updated by fix of bug (variable definition was erroneously commented out).

comment:11 by olle, 11 years ago

(In [1825]) Refs #461. Java class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated to report the number of remaining tissue items > 1 mg. The number is reported in the top right sub-title, next to the total number of items.

comment:12 by Nicklas Nordborg, 11 years ago

(In [1829]) References #461: Sample processing report generator

Added an SQL script that can be used to get rid of sensitive personal information after cloning the production server database.

comment:13 by Nicklas Nordborg, 11 years ago

(In [1830]) References #461: Sample processing report generator

Fixed the cleanup SQL script. Must use .. id in (.. since there are of course multiple values to change.

comment:14 by olle, 11 years ago

(In [1837]) Refs #461. Java class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated in report for min to RNAlater:

  1. Samples where time of day for sampling or RNAlater equals 00:00:00 are now excluded.
  2. Samples where min to RNAlater is 0 or negative are now included.
  3. Name, sampling date & time, and RNAlater date & time are now reported in a list for samples where min to RNAlater is 0 or negative.
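
The selection rules in items 1 and 2 can be sketched as follows. This is a hedged illustration using java.time types. The servlet itself reads the values from the RNALATER_DATETIME and SAMPLING_DATETIME annotations, and the class and method names below are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

final class RnaLaterSketch {
    /**
     * Returns the minutes from sampling to RNAlater, or null when the entry
     * is excluded because either time of day equals 00:00:00 (taken here to
     * mean that no time of day was recorded).
     */
    static Long minutesToRnaLater(LocalDateTime sampling, LocalDateTime rnaLater) {
        if (sampling == null || rnaLater == null) {
            return null;
        }
        if (sampling.toLocalTime().equals(LocalTime.MIDNIGHT)
                || rnaLater.toLocalTime().equals(LocalTime.MIDNIGHT)) {
            return null;
        }
        // Zero and negative differences are kept, so that they can be reported.
        return Duration.between(sampling, rnaLater).toMinutes();
    }

    /** True for values that should appear in the list of special samples. */
    static boolean isZeroOrNegative(long minutes) {
        return minutes <= 0;
    }
}
```

Keeping zero and negative values in the statistics, while listing the affected samples separately, makes suspicious registrations visible instead of silently dropping them.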

comment:15 by olle, 11 years ago

(In [1840]) Refs #461. Java class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated in report for histology piece quantity:

  1. The date used for a histology sample is changed from the partition date to the QiaCube date of extract from lysate from parent sample of histology sample.

comment:16 by olle, 11 years ago

(In [1842]) Refs #461. Java class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated in report for RNA QC, plus other minor changes:

  1. RNA QC report updated by complete rewriting of code. Previous code contained two major errors: checks were made against a list that was continuously updated, and the RQS value was erroneously used as the RIN value. The updated code has the following major changes:
    a. RQS and RIN values are checked to be > 0 (-100 is used as a flag for bad data).
    b. The selection process is now based on the grand parent sample, i.e. only one RNA QC value is used for a single sample. If a valid RQS value exists, it is used, otherwise a valid RIN value.
  2. The number of decimals used in reported values has been set to 1.
  3. Removal of variables used for test and debugging purposes.
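
A minimal sketch of the RNA QC value selection described in item 1, combined with the RIN conversion given in the table in comment:7. The class and method names are hypothetical, and a null argument stands for a missing annotation value, which is an assumption of the sketch.

```java
final class RnaQcSketch {
    // Conversion coefficients quoted in the chart data table of this ticket.
    static final double SLOPE = 0.7698785;
    static final double INTERCEPT = 1.607572;

    /**
     * Returns the RNA QC value for one sample: a valid RQS value if present,
     * otherwise a valid RIN value converted to the RQS scale, otherwise null.
     * Values must be > 0 to be valid (-100 is used as a flag for bad data).
     */
    static Double rnaQcValue(Double rqs, Double rin) {
        if (rqs != null && rqs > 0) {
            return rqs;
        }
        if (rin != null && rin > 0) {
            return rin * SLOPE + INTERCEPT;
        }
        return null;
    }
}
```

For example, a sample with RQS flagged as -100 and RIN 10.0 would contribute 10.0 * 0.7698785 + 1.607572, which is approximately 9.31.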

comment:17 by olle, 11 years ago

(In [1843]) Refs #461. Sample processing report is updated to allow appended text after the plot section:

  1. JSP file scanbquartermonthreportgenerator.jsp in reggie/resources/ updated by retrieving optional text to append from JSON object with key "appendedInfo".
  2. Java class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated to add information on number of samples with negative or zero min to RNAlater value as appended info text. This appended info is only added if "min to RNAlater" is included in the selected chart data.

comment:18 by olle, 11 years ago

(In [1845]) Refs #461. Boxplot routines for drawing y-axis scale markers now distinguish between the value used for placing the marker and the text displayed at the marker. Besides allowing general text to be used, this solves a problem in the previous design, where modifying values to a fixed number of decimals for display also affected the placement of the marker. Now a precise value should be used for placement of the marker, and a truncated value may be used as display text:

  1. JavaScript file boxplot.js in reggie/resources/ updated in functions drawScaleMarkerYLeft(...) and drawScaleMarkerYRight(...) with new argument markerText, for the text to be displayed at the marker. Function createBoxPlot(boxPlotJsonObject, ...) updated to retrieve the text to be displayed at the markers for dotted horizontal guidelines from the JSON object for guide lines.
  2. Java class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated with two new private convenience methods JSONObject createValueWithText(Float value, int numberOfDecimalsShown) and JSONObject createValueWithText(Float value, String text). Both create a JSONObject with keys "value" and "text". Private method JSONObject createJSONPlotStatistics(...) updated to use new method JSONObject createValueWithText(Float value, int numberOfDecimalsShown) when creating JSONObjects for the dotted horizontal guidelines.
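
The idea of separating placement value and display text can be sketched as below. To keep the example self-contained, a java.util.Map is used as a stand-in for JSONObject. The method names and the keys "value" and "text" come from the change description above, while the class name, the formatting pattern, and the handling of null values are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class ValueWithTextSketch {
    /** Keeps the precise value and adds a text rounded to the given decimals. */
    static Map<String, Object> createValueWithText(Float value, int numberOfDecimalsShown) {
        String text = "";
        if (value != null) {
            // Locale.ROOT gives a decimal point regardless of server locale.
            text = String.format(Locale.ROOT, "%." + numberOfDecimalsShown + "f", value);
        }
        return createValueWithText(value, text);
    }

    /** Keeps the precise value and pairs it with an arbitrary display text. */
    static Map<String, Object> createValueWithText(Float value, String text) {
        Map<String, Object> json = new LinkedHashMap<>();
        json.put("value", value); // used for placing the marker
        json.put("text", text);   // shown next to the marker
        return json;
    }
}
```

A guideline at 2.96 would then be drawn at 2.96 but labelled "3.0" when one decimal is requested.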

comment:19 by olle, 11 years ago

Revised overview of the SCAN-B quarter/month report in Reggie:

Chart data | Quantity used | Date used | Status
Original tissue quantity | sample.getOriginalQuantity()/1000.0f | Annotation value QIACUBE_DATE of extract from lysate from sample | OK
Quantity tissue used | extract.getCreationEvent().getUsedQuantity(sample)/1000.0f for extracts from sample (may be more than one extract/sample) | Annotation value QIACUBE_DATE of extract from lysate from sample | OK
Histology piece quantity | sample.getOriginalQuantity()/1000.0f for histology sample | Annotation value QIACUBE_DATE of extract from lysate from parent sample of histology sample | OK
Remaining tissue quantity | sample.getRemainingQuantity()/1000.0f | Annotation value QIACUBE_DATE of extract from lysate from sample | OK
Total quantity DNA used | extract.getOriginalQuantity() for DNA extract | Annotation value QIACUBE_DATE of DNA extract | OK
Total quantity RNA used | extract.getOriginalQuantity() for RNA extract | Annotation value QIACUBE_DATE of RNA extract | OK
DNA yield (µg/mg tissue) | extract.getOriginalQuantity() for DNA extract divided by extract.getCreationEvent().getUsedQuantity(sample)/1000.0f for extracts from sample | Annotation value QIACUBE_DATE of DNA extract | OK
RNA yield (µg/mg tissue) | extract.getOriginalQuantity() for RNA extract divided by extract.getCreationEvent().getUsedQuantity(sample)/1000.0f for extracts from sample | Annotation value QIACUBE_DATE of RNA extract | OK
DNA yield corrected for lysate volume (µg/mg tissue) | 2 x extract.getOriginalQuantity() for DNA extract divided by extract.getCreationEvent().getUsedQuantity(sample)/1000.0f for extracts from sample | Annotation value QIACUBE_DATE of DNA extract | OK
RNA yield corrected for lysate volume (µg/mg tissue) | 2 x extract.getOriginalQuantity() for RNA extract divided by extract.getCreationEvent().getUsedQuantity(sample)/1000.0f for extracts from sample | Annotation value QIACUBE_DATE of RNA extract | OK
RNA QC SCANB, RQS otherwise RQS~RIN | Only one value per grand parent sample (sample of lysate of RNA extract of RNA QC extract) is used. Annotations CA_RQS and BA_RIN for RNA parent extract of RNA QC. Only valid RQS and RIN values are used, i.e. value > 0. When no RQS value exists, the value RIN*0.7698785 + 1.607572 is used. | Annotation value QIACUBE_DATE of parent RNA extract | OK
Minutes to RNAlater | Difference between annotation values RNALATER_DATETIME and SAMPLING_DATETIME for specimen sample, converted to minutes. Entries with time set to "00:00:00" are discarded. | Annotation value QIACUBE_DATE of extract from lysate from sample | OK

comment:20 by olle, 11 years ago

(In [1846]) Refs #461. Java class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated for appended info on special samples if "min to RNAlater" is included in the chart selection:

  1. The information is now displayed in tables instead of in lists.
  2. The sample info now includes a link to the case summary for the sample.

comment:21 by olle, 11 years ago

(In [1847]) Refs #461. Java class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated in private method String createCaseSummaryButton(DbControl dc, String caseName) to use more general links to JSP page and icon image.

comment:22 by olle, 11 years ago

(In [1848]) Refs #461. Java class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated in private method String createCaseSummaryButton(DbControl dc, String caseName) to append case name after icon image in case summary link.

comment:23 by olle, 11 years ago

(In [1849]) References #462: Implement a print function that can print complete pages

Implemented a JavaScript function that opens a popup window and copies the content of a specified HTML tag (given by the id) to the popup window.

openPrintWindow(ID, printElementId, pageTitle, pageOrientation, printNote)

The change also includes printing changes made in separate branch for #425 in [1723].

comment:24 by olle, 11 years ago

Note: The update of JSP file scanbquartermonthreportgenerator.jsp in reggie/resources/ in change set [1849] included removal of unused div tags for canvas objects, that contained a style setting without closing '"' character. If these div tags should be desired in the future, the style setting should be corrected.

comment:25 by olle, 11 years ago

(In [1850]) Refs #461. Java class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated in private method String createCaseSummaryButton(DbControl dc, String caseName) to generate code for an HTML link, instead of a button, in order for the mouse cursor to indicate that the element could be clicked on. The "onerror" code in the icon "img" tag was removed, since it wasn't parsed correctly.

comment:26 by Nicklas Nordborg, 11 years ago

(In [1851]) References #461 and #462.

  • Added print functionality to the "Sample source report" function.
  • Minor adjustments to the print template to center-align the printed output on the page if possible.
  • Hide the "Case summary" links on the "Samples processing statistics" report when printing.
  • Force page break before the "Appended info" section since it seems like it often breaks in the middle of the table.
  • Get rid of hardcoded line-breaks between images in the "Samples processing statistics" so that scaling down a printout may create a 2-column layout. Or even 3 or 4 columns, but this creates very small plots.

comment:27 by Nicklas Nordborg, 11 years ago

(In [1852]) References #461: Sample processing report generator

  • Disable debug output.
  • Encode HTML tags in the debug output since it would corrupt the debug display and generate strange "Syntax error" and "Illegal character" warnings from the browser.

comment:28 by Nicklas Nordborg, 11 years ago

(In [1853]) References #461: Sample processing report generator

  • Changed AJAX request to an asynchronous request.
  • Added a "loading" animation while waiting for the plots.

comment:29 by olle, 11 years ago

(In [1854]) Refs #461. JavaScript file boxplot.js in reggie/resources/ updated in function createBoxPlot(...) to determine the number of decimals used for y-axis scale marker numbers. The algorithm inspects the decimal parts of all scale marker values and selects the number of decimals needed to display the largest decimal part with one digit. However, if the first decimals after rounding are ".000" for all the decimal parts, no decimals will be printed.

comment:30 by olle, 11 years ago

(In [1855]) Refs #461. SCAN-B quarter/month report updated to allow statistics of data from a single site. Default selection is "all sites together", which will produce the same plots as before.

comment:31 by olle, 11 years ago

(In [1856]) Refs #461. JavaScript file boxplot.js in reggie/resources/ updated in function createBoxPlot(...) in algorithm to determine the number of decimals used for y-axis scale marker numbers. The scale values are now rounded to 3 decimals before inspection of the decimal parts, to fix cases like e.g. 2.99999993, which previously would have been displayed with one decimal as "3.0", instead of "3".
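
One possible reading of the decimal selection described in [1854] and [1856] is sketched below. The actual code is JavaScript in boxplot.js. This Java version is an interpretation only, and the class and method names are hypothetical. It assumes that "displayed with one digit" means the first non-zero decimal position of the largest decimal part, after rounding all values to 3 decimals.

```java
final class ScaleDecimalsSketch {
    /** Returns the number of decimals (0-3) to print for the scale markers. */
    static int decimalsFor(double[] markerValues) {
        long largestFraction = 0; // largest decimal part, in thousandths
        for (double v : markerValues) {
            // Round to 3 decimals first, so that 2.99999993 counts as 3.000.
            long thousandths = Math.round(Math.abs(v) * 1000.0);
            largestFraction = Math.max(largestFraction, thousandths % 1000);
        }
        if (largestFraction == 0) {
            return 0; // all decimal parts are ".000": print integers
        }
        if (largestFraction >= 100) {
            return 1; // first decimal is non-zero
        }
        if (largestFraction >= 10) {
            return 2; // first non-zero digit is the second decimal
        }
        return 3;
    }
}
```

Working on integer thousandths avoids repeated floating-point comparisons on the decimal parts.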

comment:32 by Nicklas Nordborg, 11 years ago

(In [1857]) References #461: Sample processing report generator

Use 1 or 2 decimals for the mean, sd, range and percentile values depending on the range on the y axis.

comment:33 by Nicklas Nordborg, 11 years ago

(In [1858]) References #461: Sample processing report generator

Do not display seconds for RNAlater timestamps in the "appended info" table since they are only recorded with minutes. Refactored code to use DateToStringConverter instead of methods in the ReportTableUtilServlet (which has been removed).

comment:34 by olle, 11 years ago

Design comment:

When a specific chart site is selected for the SCAN-B quarter/month report, the filtering in class ScanBQuarterMonthReportServlet is applied as follows:

  1. When the samples are processed, a site filter is applied when creating the lists List<Sample> sampleTrimmedList and List<Sample> sampleHistologyList.
  2. When the extracts are processed, only extracts with parent samples in sampleTrimmedList are placed in the list List<Extract> extractLysateList.
  3. When DNA and RNA extracts are processed, only extracts with parent extracts in extractLysateList are placed in the lists List<Extract> extractDnaList and List<Extract> extractRnaList, respectively. The id values of the extract and grand parent sample are placed in HashMap<Integer, Integer> extractIdSampleIdHashMap.
  4. When RNA QC extracts are processed, only RNA QC extracts with parent extracts used as keys in HashMap<Integer, Integer> extractIdSampleIdHashMap are placed in the List<Extract> extractRnaQcList.

Since the lists sampleTrimmedList, sampleHistologyList, extractDnaList, extractRnaList, and extractRnaQcList are the ones used when the statistics data is calculated, all items will have a reference to a sample, so that only data from the selected site is included. Future changes to the way items are selected for these and other lists used for the statistics should ensure that the site filtering works for all lists.
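
The filter chain in steps 1 to 4 can be illustrated with a small self-contained sketch. The Item class is a hypothetical stand-in for the BASE Sample and Extract classes, and none of the method names are taken from ScanBQuarterMonthReportServlet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SiteFilterSketch {
    /** Hypothetical stand-in for a sample or an extract. */
    static final class Item {
        final int id;
        final int parentId; // id of the parent item, 0 if none
        final String site;  // only set for samples
        Item(int id, int parentId, String site) {
            this.id = id;
            this.parentId = parentId;
            this.site = site;
        }
    }

    /** Step 1: keep samples from the selected site (null means all sites). */
    static List<Item> filterBySite(List<Item> samples, String site) {
        List<Item> kept = new ArrayList<>();
        for (Item s : samples) {
            if (site == null || site.equals(s.site)) kept.add(s);
        }
        return kept;
    }

    /** Steps 2 to 4: keep children whose parent id is among the allowed ids. */
    static List<Item> filterByParent(List<Item> children, Set<Integer> allowedParentIds) {
        List<Item> kept = new ArrayList<>();
        for (Item c : children) {
            if (allowedParentIds.contains(c.parentId)) kept.add(c);
        }
        return kept;
    }

    static Set<Integer> ids(List<Item> items) {
        Set<Integer> result = new HashSet<>();
        for (Item i : items) result.add(i.id);
        return result;
    }

    /** Step 3: map each extract id to its grandparent sample id via the lysate. */
    static Map<Integer, Integer> extractIdToSampleId(List<Item> extracts, List<Item> lysates) {
        Map<Integer, Integer> lysateToSample = new HashMap<>();
        for (Item l : lysates) lysateToSample.put(l.id, l.parentId);
        Map<Integer, Integer> result = new HashMap<>();
        for (Item e : extracts) {
            Integer sampleId = lysateToSample.get(e.parentId);
            if (sampleId != null) result.put(e.id, sampleId);
        }
        return result;
    }
}
```

Because each list is derived from the previous one, the site filter only has to be applied once, at the sample level. A list built from unfiltered data would bypass it, which is the risk pointed out above.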

comment:35 by olle, 11 years ago

Resolution: fixed
Status: assigned → closed

Ticket closed as the first version of the SCAN-B quarter/month report has been added.

comment:36 by olle, 11 years ago

Design update:

  • The SCAN-B quarter/month report should obtain the sites for the chart site menu dynamically from server data, instead of having the data hard-coded in JSP file scanbquartermonthreportgenerator.jsp in reggie/resources/.

comment:37 by olle, 11 years ago

(In [1997]) Refs #461. The SCAN-B quarter/month report is updated to obtain the sites for the chart site menu dynamically from server data, instead of having the data hard-coded in JSP file scanbquartermonthreportgenerator.jsp in reggie/resources/:

  1. JSP file scanbquartermonthreportgenerator.jsp in reggie/resources/ updated in function gotoStep2() to call new function getSites() to obtain a JSON object with site data from the server, and construct the site-specific part of the chart site menu dynamically. New function getSites() makes an Ajax call to servlet ScanBQuarterMonthReportServlet to obtain the JSON site data.
  2. Class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated in protected method void doGet(HttpServletRequest req, HttpServletResponse resp) to respond to command getsites by calling new private method JSONObject createSitesList(JSONObject json) and returning the JSON object. New private method JSONObject createSitesList(JSONObject json) returns a JSON object containing JSON site data.

comment:38 by olle, 11 years ago

(In [2008]) Refs #461. The SCAN-B quarter/month report is updated to avoid problems when no data is available for the site/quantity/time period selected for the plot:

  1. Class/file ScanBQuarterMonthReportServlet.java in reggie/src/net/sf/basedb/reggie/servlet/ updated in private method JSONObject createJSONPlotStatistics(...) to check if variable floatPct25 differs from null, before using the value in a conditional.
  2. JavaScript file reports/boxplot.js in reggie/resources/ updated in function createBoxPlot(boxPlotJsonObject, canvas, draw_area_wdt, draw_area_hgt, draw_scale_factor) to check if an element in boxPlotJsonObject.valueGuideLinesY differs from null, before using the value.

comment:39 by olle, 11 years ago

(In [2016]) Fixes #498. Refs #461. SCAN-B sites file Site.java in reggie/src/net/sf/basedb/reggie/ updated with new entry for "Uppsala", with prefix "88" and start date 2013-10-01.
