Opened 8 years ago

Closed 8 years ago

#898 closed task (fixed)

INCA import should create trimmed tab-delimited file with synonymized data

Reported by: olle Owned by: olle
Priority: major Milestone: Reggie v4.6
Component: net.sf.basedb.reggie Keywords:
Cc:

Description (last modified by olle)

INCA import should create a trimmed tab-delimited file with synonymized data for analysis outside Reggie/BASE. The new file should be based on the tab-delimited file used for INCA import, with the following differences:

  1. Accrued entries, i.e. those with personal numbers, should have the latter exchanged for the patient item names ("PAT" + 6 figures) in the SCAN-B database.
  2. Apart from columns with headers "PersonalNo" and "PAT_ID", only data in columns corresponding to INCA annotation types should be included.
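
The two rules above can be sketched in Java roughly as follows. This is an illustration only, not the code in IncaServlet: the class and method names, the keepHeaders set standing in for the INCA annotation type columns, and the personalNoToPatientName map are all assumptions. Blanking a personal number that cannot be mapped is a choice made here so that no personal number reaches the output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IncaOutputSketch
{
    /**
     * Trims and synonymizes tab-delimited lines; the first line is the header.
     * keepHeaders (hypothetical) lists the headers that correspond to INCA
     * annotation types. "PersonalNo" and "PAT_ID" are always kept.
     */
    static List<String> trimAndSynonymize(List<String> lines,
        Set<String> keepHeaders, Map<String, String> personalNoToPatientName)
    {
        String[] headers = lines.get(0).split("\t", -1);
        List<Integer> keptIndexes = new ArrayList<>();
        int personalNoIndex = -1;
        for (int i = 0; i < headers.length; i++)
        {
            boolean isPersonalNo = headers[i].equals("PersonalNo");
            if (isPersonalNo) personalNoIndex = i;
            if (isPersonalNo || headers[i].equals("PAT_ID") || keepHeaders.contains(headers[i]))
            {
                keptIndexes.add(i);
            }
        }

        List<String> out = new ArrayList<>();
        for (int row = 0; row < lines.size(); row++)
        {
            String[] cells = lines.get(row).split("\t", -1);
            List<String> kept = new ArrayList<>();
            for (int i : keptIndexes)
            {
                String value = i < cells.length ? cells[i] : "";
                if (row > 0 && i == personalNoIndex && !value.isEmpty())
                {
                    // Accrued entry: exchange the personal number for the patient
                    // item name. Unmapped numbers are blanked (assumption).
                    value = personalNoToPatientName.getOrDefault(value, "");
                }
                kept.add(value);
            }
            out.add(String.join("\t", kept));
        }
        return out;
    }
}
```

Splitting with limit -1 keeps trailing empty cells, so lines without a personal number keep their column alignment.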

Change History (11)

comment:1 by olle, 8 years ago

Status: new → assigned

Ticket accepted.

comment:2 by olle, 8 years ago

Traceability note:

  • Creation of a csv file to be used when requesting information from the INCA database was introduced in Ticket #487 (Export information intended for INCA).
  • Import of an INCA csv file was introduced in Ticket #525 (Import data from INCA).
  • INCA import was updated in Ticket #896 (INCA import should include laterality mapping column).

comment:3 by olle, 8 years ago

Description: modified (diff)

Ticket description updated:
a. Clarifying that personal numbers should be exchanged for SCAN-B patient item names.
b. Column "PERSNR" is not needed in the output file, since column "PersonalNo" is included.

comment:4 by olle, 8 years ago

Functional specification of first version of support for an INCA import output CSV file:

  • The first version of support for an INCA import output CSV file will have the following specification:
    a. An INCA import output CSV file in tab-delimited format should be automatically created, when an INCA import is performed.
    b. Accrued entries, i.e. those with personal numbers, should have the latter exchanged for the patient item names ("PAT" + 6 figures) in the SCAN-B database.
    c. Apart from columns with headers "PersonalNo" and "PAT_ID", only data in columns corresponding to INCA annotation types should be included.
    d. The user should be able to download the created INCA import output file by clicking on a button.
    e. INCA statistics should be updated to allow an INCA import output file to be used as input, i.e. it should be able to identify an accrued entry mapped to a SCAN-B case item from the patient item name, instead of the personal number.

Design update overview:

  1. The functionality for creation and download of a tab-delimited file will be based on that for INCA statistics.
  2. The INCA import output CSV file will be created by servlet IncaServlet in a new private method void createIncaImportOutputFile(DbControl dc, ...), that at import will be called just before changes are committed to the database. This will decrease the risk of the creation of the output file interfering with the INCA import itself.
  3. A new hash map mapping patient item names to patient (BioSource) database ID values will be created.
  4. Private inner class LineDatabaseMappingResult will be updated with new attribute for mapping raw line numbers to patient database ID values for accrued entries.
  5. Private inner classes RawIncaCase and IncaCase will be updated with new attributes for patient database ID and temporary patient id from INCA import file.
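
As a rough illustration of how the patient item name map from item 3 could let INCA statistics accept either kind of value in column "PersonalNo", consider the sketch below. It is not the servlet code: the class name, the method resolvePatientId, and the personalNoBioSourceIdHM map are hypothetical. Only the "PAT" + 6 digits name format and the idea of a map from patient item name to BioSource ID come from this ticket.

```java
import java.util.Map;
import java.util.regex.Pattern;

final class PatientLookupSketch
{
    // "PAT" followed by six digits, as stated in the ticket description.
    private static final Pattern PATIENT_NAME = Pattern.compile("PAT\\d{6}");

    /**
     * Returns the patient (BioSource) database ID for a "PersonalNo" column
     * value, or null if the value is empty or cannot be mapped.
     */
    static Integer resolvePatientId(String personalNoColumnValue,
        Map<String, Integer> patItemNameBioSourceIdHM,
        Map<String, Integer> personalNoBioSourceIdHM)
    {
        if (personalNoColumnValue == null || personalNoColumnValue.isEmpty())
        {
            return null; // Non-accrued entry
        }
        if (PATIENT_NAME.matcher(personalNoColumnValue).matches())
        {
            // Value from an INCA import output file: look up by patient item name.
            return patItemNameBioSourceIdHM.get(personalNoColumnValue);
        }
        // Value from an original INCA import file: look up by personal number.
        return personalNoBioSourceIdHM.get(personalNoColumnValue);
    }
}
```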

comment:5 by olle, 8 years ago

(In [4026]) Refs #898. First version of support for an INCA import output CSV file in tab-delimited format:

  1. JSP file import-inca.jsp in resources/personal/ updated by adding new output CSV file download button with id "downloadoutputcsvfile".
  2. Javascript file import-inca.js in resources/personal/ updated.
    a. String constants REPORT_TYPE_IMPORT and REPORT_TYPE_IMPORT_OUTPUT_CSV defined.
    b. Function initPage() updated by coupling new output CSV file download button to new function downloadOutputCsvFile(). Also, updated function checkForReportFile(reportType) called with argument reportType set to string constants REPORT_TYPE_IMPORT and REPORT_TYPE_IMPORT_OUTPUT_CSV, respectively.
    c. Function initializeStep2(response) updated to hide or show new output CSV file download button, depending on whether only a simple check is performed.
    d. Function checkForReportFile() updated with argument reportType, whose value is set to attribute "reporttype", when performing a GET request to servlet IncaServlet with command "CheckForIncaReportFile".
    e. Function reportFileDownloadButtonDisplay(response) updated to control the download button corresponding to the report type.
    f. Function downloadReportFile() updated to set value of attribute "reporttype" to string constant REPORT_TYPE_IMPORT, instead of explicit string value, when performing a GET request to servlet IncaServlet with command "DownloadIncaReportFile".
    g. New function downloadOutputCsvFile() is identical to function downloadReportFile(), except that attribute reporttype is set to value of string constant REPORT_TYPE_IMPORT_OUTPUT_CSV.
  3. Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
    a. New static final String constants REPORT_TYPE_IMPORT_OUTPUT_CSV and INCA_IMPORT_OUTPUT_CSV_FILENAME defined.
    b. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated for command "DownloadIncaReportFile" to perform a file copy using streams, instead of a PrintWriter object, in order to ensure that character encoding is unchanged.
    c. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" to call private method List<IncaEntryLine> fetchIncaEntryLines(int tempPatIdClmIndex, int personalNoClmIndex, int lateralityDescriptionClmIndex, List<String> lines, boolean accruedEntries) to obtain a list of non-accrued INCA entry lines. A raw line number patient ID hash map is obtained from a LineDatabaseMappingResult object. Created RawIncaCase items are updated with patient ID and temporary patient id. At import, these values are transferred to created IncaCase objects. New private method void createIncaImportOutputFile(DbControl dc, List<IncaCase> incaCaseList, List<AnnotationType> incaAnnoTypeList, HashMap<Integer,AnnotationType> incaAnnoIdAnnoTypeHM, List<Integer> fileHeaderIndexList, List<String> fileHeaderList, List<IncaEntryLine> nonAccruedIncaEntryLines) is called at import.
    d. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "IncaStatistics" to call new private method HashMap<String,Integer> fetchPatientItemNameBioSourceIdHashMap(DbControl dc, SimpleProgressReporter progress, float progressBiosourceMappingFraction, int progressOffset) to obtain a hash map mapping patient item names to patient (BioSource) database ID values. This hash map is used as new argument when calling updated private method lineDatabaseMappingForStatistics(DbControl dc, ..., HashMap<String,Integer> patItemNameBioSourceIdHM, ...).
    e. New private method void createIncaImportOutputFile(DbControl dc, List<IncaCase> incaCaseList, List<AnnotationType> incaAnnoTypeList, HashMap<Integer,AnnotationType> incaAnnoIdAnnoTypeHM, List<Integer> fileHeaderIndexList, List<String> fileHeaderList, List<IncaEntryLine> nonAccruedIncaEntryLines) added. It creates an INCA import output file in CSV format with columns corresponding to INCA annotation types, plus a "PersonalNo" column with patient item name for accrued entries, that could be mapped to SCAN-B case items, and a "PAT_ID" column with temporary patient id in the import file.
    f. New private method HashMap<String,Integer> fetchPatientItemNameBioSourceIdHashMap(DbControl dc, SimpleProgressReporter progress, float progressBiosourceMappingFraction, int progressOffset) added. It returns a hash map mapping patient item name to biosource id.
    g. Private method LineDatabaseMappingResult lineDatabaseMapping(DbControl dc, ...) updated to obtain a hash map mapping raw line numbers to patient database ID values, and add it to the returned LineDatabaseMappingResult object.
    h. Private method LineDatabaseMappingForStatisticsResult lineDatabaseMappingForStatistics(DbControl dc, ...) updated with new argument HashMap<String,Integer> patItemNameBioSourceIdHM, that is used to accept patient item names instead of personal numbers in column "PersonalNo". Also updated to obtain a hash map mapping raw line numbers to patient database ID values, and add to the LineDatabaseMappingResult object, that in turn is added to the returned LineDatabaseMappingForStatisticsResult object.
    i. Private method String fetchReportFileName(String reportType) updated to return constant INCA_IMPORT_OUTPUT_CSV_FILENAME for argument reportType equal to constant REPORT_TYPE_IMPORT_OUTPUT_CSV.
    j. Private inner class LineDatabaseMappingResult updated by adding new private attribute HashMap<Integer,Integer> rawLineNumberPatientIdHM with public accessor methods.
    k. Private inner class RawIncaCase updated by adding new private attributes int patientId and String tempPatientId with public accessor methods.
    l. Private inner class IncaCase updated by adding new private attributes int patientId and String tempPatientId with public accessor methods.
    m. Some typos fixed.
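
The change to "DownloadIncaReportFile" in item 3b rests on the fact that a byte-for-byte stream copy never decodes the file, whereas a PrintWriter re-encodes characters using the response character encoding. A minimal sketch of such a copy is shown below. The class and method names are made up for illustration; in the servlet the output stream would presumably come from the response's getOutputStream().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopySketch
{
    /**
     * Copies all bytes from in to out without interpreting them as characters,
     * so the character encoding of the source file is left unchanged.
     * Returns the number of bytes copied.
     */
    static long copy(InputStream in, OutputStream out) throws IOException
    {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1)
        {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```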

comment:6 by olle, 8 years ago

Test:

  • Setup:
    a. A modified INCA import file with data in tab-delimited format was used for INCA import on a local SCAN-B database.
    b. Personal numbers had been exchanged for faked ones used in the local SCAN-B database.
    c. The file contained 586 header columns with 117 unknown INCA headers, resulting in 468 columns plus an extra ignored personal number column.
    d. All lines had valid data, and no personal number was mapped to more than two lateralities.
    e. The file contained 7232 lines of data, 4819 with personal numbers, 2413 without.
    f. INCA import created an INCA import output file with 468 columns, 7232 lines of data, 4819 with personal numbers, 2413 without.
    g. The INCA statistics application was run with the original INCA import file and the INCA import output file as input, respectively. The statistics test was performed both for all cancer types, and invasive cancer.
    h. The statistics results for both files were then compared for each cancer type separately. For a successful test result, the statistics results for both files should be identical.
  • Result:
    a. The original INCA import file and the INCA import output file produced identical statistics, when compared for each cancer type separately. The test therefore passed successfully.

Note: INCA statistics is currently more forgiving than INCA import regarding entry lines with bad data values, or with the same personal number mapped to more than two lateralities. If the INCA import file contains entries of this kind, the statistics will differ, since those entries will not appear in the INCA import output file, but will be included in the statistics for the original file.

comment:7 by olle, 8 years ago

(In [4027]) Refs #898. INCA import updated to display button to download INCA import output CSV file after performed INCA import:

  1. Javascript file import-inca.js in resources/personal/ updated in function submissionResults(response) to display button to download INCA import output CSV file after performed INCA import.

comment:8 by olle, 8 years ago

Milestone: Reggie v4.x → Reggie v4.6

Milestone changed to Reggie 4.6.

comment:9 by olle, 8 years ago

Functional specification update:

  • A button should be added by which the user can delete the INCA import output CSV file. Even though the INCA import output CSV file doesn't contain personal numbers, it still contains a lot of sensitive data, so it might be desirable to keep it on the server for as short a time as possible (although it should in principle be protected from unauthorized access there).
  • INCA import should be updated to hide buttons for downloading or deleting an INCA import output CSV file, when the latter has been deleted.

Design update:

  1. JSP file import-inca.jsp in resources/personal/ updated by adding new INCA import output CSV file delete button with id "deleteoutputcsvfile".
  2. Javascript file import-inca.js in resources/personal/ updated.
    a. Function initPage() updated by coupling new INCA import output CSV file delete button to new function deleteOutputCsvFile(). Also, multiple calls to function checkForReportFile(reportType) exchanged for call to new function checkForReportFiles().
    b. Function initializeStep2(response) updated to call new function checkForReportFiles() to manage display of INCA report file and INCA import output file buttons.
    c. New function checkForReportFiles() calls new function checkForReportFiles2(reportTypes) with argument array of key strings for INCA report file and INCA import output file.
    d. New function checkForReportFiles2(reportTypes) calls servlet IncaServlet with new command "CheckForIncaReportFiles" (note plural "s" in "Files") with attribute "reporttypes" set to value of argument reportTypes and callback function to new function reportFileButtonsDisplay(response).
    e. New function reportFileButtonsDisplay(response) shows buttons related to INCA report file and INCA import output file, depending on the response indicating that the respective file exists, or not.
    f. New function deleteOutputCsvFile() calls servlet IncaServlet with new command "DeleteIncaReportFile" with attribute "reporttype" set to value of string constant REPORT_TYPE_IMPORT_OUTPUT_CSV and callback function to function submissionResults(response).
    g. Function submissionResults(response) updated to call new function checkForReportFiles() to manage display of INCA report file and INCA import output file buttons.
  3. Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
    a. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated for command "CheckForIncaReportFile" by calling new private method boolean checkForReportFile(String reportType) to check if a specific report file exists.
    b. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated with new command "CheckForIncaReportFiles". It calls new private method boolean checkForReportFile(String reportType) with argument reportType set to key strings in array attribute "reportTypes" to check if the specific report files exist. The results are returned in a JSON object with the report types as keys.
    c. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by including value of request parameter "importcheckonly" in returned JSON object for key "importCheckOnly".
    d. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated with new command "DeleteIncaReportFile". It is currently restricted to deleting an INCA import output CSV file.
    e. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated by including value of request parameter "cmd" in returned JSON object for key "cmd".
    f. New private method boolean checkForReportFile(String reportType) added. It checks if a report file of the given type exists.
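
The server-side part of this design (one existence check per report type, a combined result keyed by report type, and a delete command restricted to the output CSV file) could look roughly like the sketch below. It is a simplified stand-in, not IncaServlet itself: the key string values, the file names, the reportDir field, and the use of a plain Map in place of a JSON object are all assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReportFileSketch
{
    // Key strings and file names are placeholders, not the real values.
    static final String REPORT_TYPE_IMPORT = "import";
    static final String REPORT_TYPE_IMPORT_OUTPUT_CSV = "importoutputcsv";

    private final Path reportDir;

    ReportFileSketch(Path reportDir)
    {
        this.reportDir = reportDir;
    }

    /** Maps a report type to its file, or returns null for an unknown type. */
    Path fetchReportFile(String reportType)
    {
        if (REPORT_TYPE_IMPORT.equals(reportType)) return reportDir.resolve("inca-import-report.txt");
        if (REPORT_TYPE_IMPORT_OUTPUT_CSV.equals(reportType)) return reportDir.resolve("inca-import-output.csv");
        return null;
    }

    /** Checks if the report file of the given type exists. */
    boolean checkForReportFile(String reportType)
    {
        Path file = fetchReportFile(reportType);
        return file != null && Files.isRegularFile(file);
    }

    /** Checks several report types; the result is keyed by report type. */
    Map<String, Boolean> checkForReportFiles(List<String> reportTypes)
    {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String type : reportTypes)
        {
            result.put(type, checkForReportFile(type));
        }
        return result;
    }

    /** Deletes the output CSV file only; returns true if a file was removed. */
    boolean deleteReportFile(String reportType) throws IOException
    {
        if (!REPORT_TYPE_IMPORT_OUTPUT_CSV.equals(reportType)) return false;
        return Files.deleteIfExists(fetchReportFile(reportType));
    }
}
```

On the client side, the per-type booleans are what would let reportFileButtonsDisplay(response) show or hide each button independently.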

comment:10 by olle, 8 years ago

(In [4032]) Refs #898. INCA import updated by adding a button by which the user can delete an INCA import output CSV file. Buttons for downloading or deleting an INCA import output CSV file are hidden, when the latter file has been deleted:

  1. JSP file import-inca.jsp in resources/personal/ updated by adding new INCA import output CSV file delete button with id "deleteoutputcsvfile".
  2. Javascript file import-inca.js in resources/personal/ updated.
    a. Function initPage() updated by coupling new INCA import output CSV file delete button to new function deleteOutputCsvFile(). Also, multiple calls to function checkForReportFile(reportType) exchanged for call to new function checkForReportFiles().
    b. Function initializeStep2(response) updated to call new function checkForReportFiles() to manage display of INCA report file and INCA import output file buttons.
    c. New function checkForReportFiles() calls new function checkForReportFiles2(reportTypes) with argument array of key strings for INCA report file and INCA import output file.
    d. New function checkForReportFiles2(reportTypes) calls servlet IncaServlet with new command "CheckForIncaReportFiles" (note plural "s" in "Files") with attribute "reporttypes" set to value of argument reportTypes and callback function to new function reportFileButtonsDisplay(response).
    e. New function reportFileButtonsDisplay(response) shows buttons related to INCA report file and INCA import output file, depending on the response indicating that the respective file exists, or not.
    f. New function deleteOutputCsvFile() calls servlet IncaServlet with new command "DeleteIncaReportFile" with attribute "reporttype" set to value of string constant REPORT_TYPE_IMPORT_OUTPUT_CSV and callback function to function submissionResults(response).
    g. Function submissionResults(response) updated to call new function checkForReportFiles() to manage display of INCA report file and INCA import output file buttons.
  3. Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
    a. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated for command "CheckForIncaReportFile" by calling new private method boolean checkForReportFile(String reportType) to check if a specific report file exists.
    b. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated with new command "CheckForIncaReportFiles". It calls new private method boolean checkForReportFile(String reportType) with argument reportType set to key strings in array attribute "reportTypes" to check if the specific report files exist. The results are returned in a JSON object with the report types as keys.
    c. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by including value of request parameter "importcheckonly" in returned JSON object for key "importCheckOnly".
    d. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated with new command "DeleteIncaReportFile". It is currently restricted to deleting an INCA import output CSV file.
    e. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated by including value of request parameter "cmd" in returned JSON object for key "cmd".
    f. New private method boolean checkForReportFile(String reportType) added. It checks if a report file of the given type exists.

comment:11 by Nicklas Nordborg, 8 years ago

Resolution: fixed
Status: assigned → closed