Opened 8 years ago
Closed 8 years ago
#898 closed task (fixed)
INCA import should create trimmed tab-delimited file with synonymized data
Reported by: | olle | Owned by: | olle |
---|---|---|---|
Priority: | major | Milestone: | Reggie v4.6 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description (last modified by )
INCA import should create a trimmed tab-delimited file with synonymized data for analysis outside Reggie/BASE. The new file should be based on the tab-delimited file used for INCA import, with the following differences:
- Accrued entries, i.e. those with personal numbers, should have the latter exchanged for the patient item names ("PAT" + 6 figures) in the SCAN-B database.
- Apart from columns with headers "PersonalNo" and "PAT_ID", only data in columns corresponding to INCA annotation types should be included.
Change History (11)
comment:1 by , 8 years ago
Status: | new → assigned |
---|
comment:2 by , 8 years ago
Traceability note:
- Creation of a csv file to be used when requesting information from the INCA database was introduced in Ticket #487 (Export information intended for INCA).
- Import of an INCA csv file was introduced in Ticket #525 (Import data from INCA).
- INCA import was updated in Ticket #896 (INCA import should include laterality mapping column).
comment:3 by , 8 years ago
Description: | modified (diff) |
---|
Ticket description updated:
a. Clarifying that personal numbers should be exchanged for SCAN-B patient item names.
b. Column "PERSNR" is not needed in the output file, since column "PersonalNo" is included.
comment:4 by , 8 years ago
Functional specification of first version of support for an INCA import output CSV file:
- The first version of support for an INCA import output CSV file will have the following specification:
a. An INCA import output CSV file in tab-delimited format should be automatically created, when an INCA import is performed.
b. Accrued entries, i.e. those with personal numbers, should have the latter exchanged for the patient item names ("PAT" + 6 figures) in the SCAN-B database.
c. Apart from columns with headers "PersonalNo" and "PAT_ID", only data in columns corresponding to INCA annotation types should be included.
d. The user should be able to download the created INCA import output file by clicking on a button.
e. INCA statistics should be updated to allow an INCA import output file to be used as input, i.e. it should be able to identify an accrued entry mapped to a SCAN-B case item from the patient item name, instead of the personal number.
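Items b and c above can be illustrated with a minimal sketch. It is not the code in IncaServlet: the class name IncaTrimSketch, the method trimAndSynonymize and its parameters are hypothetical, and blanking a personal number that has no mapping is an assumption made here so that no personal number can reach the output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Reggie: trims and synonymizes tab-delimited lines. */
final class IncaTrimSketch
{
    /**
     * @param lines header line followed by data lines, all tab-delimited
     * @param incaHeaders headers that correspond to INCA annotation types
     * @param personalNoToPatientName personal number -> patient item name ("PAT" + 6 figures)
     */
    static List<String> trimAndSynonymize(List<String> lines, Set<String> incaHeaders,
        Map<String, String> personalNoToPatientName)
    {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) return out;

        // Decide which columns to keep from the header line.
        String[] headers = lines.get(0).split("\t", -1);
        List<Integer> keep = new ArrayList<>();
        int personalNoIndex = -1;
        for (int i = 0; i < headers.length; i++)
        {
            String h = headers[i];
            if (h.equals("PersonalNo")) personalNoIndex = i;
            if (h.equals("PersonalNo") || h.equals("PAT_ID") || incaHeaders.contains(h))
            {
                keep.add(i);
            }
        }

        for (int n = 0; n < lines.size(); n++)
        {
            String[] cells = lines.get(n).split("\t", -1);
            List<String> row = new ArrayList<>();
            for (int i : keep)
            {
                String value = i < cells.length ? cells[i] : "";
                // Data lines only: exchange the personal number for the patient item name.
                // Assumption: an unmapped personal number is blanked rather than copied.
                if (n > 0 && i == personalNoIndex && !value.isEmpty())
                {
                    value = personalNoToPatientName.getOrDefault(value, "");
                }
                row.add(value);
            }
            out.add(String.join("\t", row));
        }
        return out;
    }
}
```

Lines without a personal number (non-accrued entries) pass through with an empty "PersonalNo" cell, so the number of data lines is unchanged.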
Design update overview:
- The functionality for creation and download of a tab-delimited file will be based on that for INCA statistics.
- The INCA import output CSV file will be created by servlet IncaServlet in a new private method void createIncaImportOutputFile(DbControl dc, ...), that at import will be called just before changes are committed to the database. This will decrease the risk of the creation of the output file interfering with the INCA import itself.
- A new hash map mapping patient item names to patient (BioSource) database ID values will be created.
- Private inner class LineDatabaseMappingResult will be updated with new attribute for mapping raw line numbers to patient database ID values for accrued entries.
- Private inner classes RawIncaCase and IncaCase will be updated with new attributes for patient database ID and temporary patient id from INCA import file.
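The lookup needed for specification item e (accepting a patient item name in place of a personal number) might look like the sketch below. It assumes that the "PAT" + 6 figures format is enough to tell the two kinds of value apart. The class PatientKeySketch, the method names and the map personalNoBioSourceIdHM are hypothetical; only the name patItemNameBioSourceIdHM is taken from the ticket.

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Reggie. */
final class PatientKeySketch
{
    // Patient item names are "PAT" followed by exactly six digits.
    private static final Pattern PATIENT_ITEM_NAME = Pattern.compile("PAT\\d{6}");

    static boolean isPatientItemName(String value)
    {
        return value != null && PATIENT_ITEM_NAME.matcher(value).matches();
    }

    /**
     * Resolves the value of a "PersonalNo" cell to a patient (BioSource) database ID.
     * Returns null for an empty cell (non-accrued entry) or an unknown key.
     */
    static Integer resolveBioSourceId(String cell,
        Map<String, Integer> patItemNameBioSourceIdHM,
        Map<String, Integer> personalNoBioSourceIdHM)
    {
        if (cell == null || cell.isEmpty()) return null;
        if (isPatientItemName(cell)) return patItemNameBioSourceIdHM.get(cell);
        return personalNoBioSourceIdHM.get(cell);
    }
}
```

With a lookup of this kind, the same statistics code can read either the original import file or the synonymized output file.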
comment:5 by , 8 years ago
(In [4026]) Refs #898. First version of support for an INCA import output CSV file in tab-delimited format:
- JSP file import-inca.jsp in resources/personal/ updated by adding new output CSV file download button with id "downloadoutputcsvfile".
- Javascript file import-inca.js in resources/personal/ updated:
a. String constants REPORT_TYPE_IMPORT and REPORT_TYPE_IMPORT_OUTPUT_CSV defined.
b. Function initPage() updated by coupling new output CSV file download button to new function downloadOutputCsvFile(). Also, updated function checkForReportFile(reportType) is now called twice, with argument reportType set to string constants REPORT_TYPE_IMPORT and REPORT_TYPE_IMPORT_OUTPUT_CSV, respectively.
c. Function initializeStep2(response) updated to hide or show new output CSV file download button, depending on whether only a simple check is performed.
d. Function checkForReportFile() updated with argument reportType, the value of which is used for attribute "reporttype" when performing a GET request to servlet IncaServlet with command "CheckForIncaReportFile".
e. Function reportFileDownloadButtonDisplay(response) updated to control the download button corresponding to the report type.
f. Function downloadReportFile() updated to set value of attribute "reporttype" to string constant REPORT_TYPE_IMPORT, instead of an explicit string value, when performing a GET request to servlet IncaServlet with command "DownloadIncaReportFile".
g. New function downloadOutputCsvFile() is identical to function downloadReportFile(), except that attribute reporttype is set to value of string constant REPORT_TYPE_IMPORT_OUTPUT_CSV.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. New static final String constants REPORT_TYPE_IMPORT_OUTPUT_CSV and INCA_IMPORT_OUTPUT_CSV_FILENAME defined.
b. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated for command "DownloadIncaReportFile" to perform a file copy using streams, instead of a PrintWriter object, in order to ensure that character encoding is unchanged.
c. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" to call private method List<IncaEntryLine> fetchIncaEntryLines(int tempPatIdClmIndex, int personalNoClmIndex, int lateralityDescriptionClmIndex, List<String> lines, boolean accruedEntries) to obtain a list of non-accrued INCA entry lines. A hash map from raw line number to patient ID is obtained from a LineDatabaseMappingResult object. Created RawIncaCase items are updated with patient ID and temporary patient id. At import, these values are transferred to created IncaCase objects. New private method void createIncaImportOutputFile(DbControl dc, List<IncaCase> incaCaseList, List<AnnotationType> incaAnnoTypeList, HashMap<Integer,AnnotationType> incaAnnoIdAnnoTypeHM, List<Integer> fileHeaderIndexList, List<String> fileHeaderList, List<IncaEntryLine> nonAccruedIncaEntryLines) is called at import.
d. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "IncaStatistics" to call new private method HashMap<String,Integer> fetchPatientItemNameBioSourceIdHashMap(DbControl dc, SimpleProgressReporter progress, float progressBiosourceMappingFraction, int progressOffset) to obtain a hash map mapping patient item names to patient (BioSource) database ID values. This hash map is used as new argument when calling updated private method lineDatabaseMappingForStatistics(DbControl dc, ..., HashMap<String,Integer> patItemNameBioSourceIdHM, ...).
e. New private method void createIncaImportOutputFile(DbControl dc, List<IncaCase> incaCaseList, List<AnnotationType> incaAnnoTypeList, HashMap<Integer,AnnotationType> incaAnnoIdAnnoTypeHM, List<Integer> fileHeaderIndexList, List<String> fileHeaderList, List<IncaEntryLine> nonAccruedIncaEntryLines) added. It creates an INCA import output file in CSV format with columns corresponding to INCA annotation types, plus a "PersonalNo" column with patient item name for accrued entries that could be mapped to SCAN-B case items, and a "PAT_ID" column with temporary patient id in the import file.
f. New private method HashMap<String,Integer> fetchPatientItemNameBioSourceIdHashMap(DbControl dc, SimpleProgressReporter progress, float progressBiosourceMappingFraction, int progressOffset) added. It returns a hash map mapping patient item name to biosource id.
g. Private method LineDatabaseMappingResult lineDatabaseMapping(DbControl dc, ...) updated to obtain a hash map mapping raw line numbers to patient database ID values, and add it to the returned LineDatabaseMappingResult object.
h. Private method LineDatabaseMappingForStatisticsResult lineDatabaseMappingForStatistics(DbControl dc, ...) updated with new argument HashMap<String,Integer> patItemNameBioSourceIdHM, that is used to accept patient item names instead of personal numbers in column "PersonalNo". Also updated to obtain a hash map mapping raw line numbers to patient database ID values, and add it to the LineDatabaseMappingResult object, that in turn is added to the returned LineDatabaseMappingForStatisticsResult object.
i. Private method String fetchReportFileName(String reportType) updated to return constant INCA_IMPORT_OUTPUT_CSV_FILENAME for argument reportType equal to constant REPORT_TYPE_IMPORT_OUTPUT_CSV.
j. Private inner class LineDatabaseMappingResult updated by adding new private attribute HashMap<Integer,Integer> rawLineNumberPatientIdHM with public accessor methods.
k. Private inner class RawIncaCase updated by adding new private attributes int patientId and String tempPatientId with public accessor methods.
l. Private inner class IncaCase updated by adding new private attributes int patientId and String tempPatientId with public accessor methods.
m. Some typos fixed.
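The reason for copying with streams in the "DownloadIncaReportFile" command (servlet item b above) is that a byte-for-byte copy never decodes or re-encodes the text, whereas a PrintWriter writes characters in the response's encoding. A minimal sketch of such a copy follows; the class and method names are hypothetical and the actual code in IncaServlet may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper, not part of Reggie. */
final class StreamCopySketch
{
    /**
     * Copies all bytes from in to out without interpreting them as characters,
     * so the file's original character encoding is preserved.
     * Returns the number of bytes copied. The caller closes both streams.
     */
    static long copy(InputStream in, OutputStream out) throws IOException
    {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1)
        {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

In a servlet, the output stream would come from HttpServletResponse.getOutputStream() rather than getWriter().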
comment:6 by , 8 years ago
Test:
- Setup:
a. A modified INCA import file with data in tab-delimited format was used for INCA import on a local SCAN-B database.
b. Personal numbers had been exchanged for faked ones used in the local SCAN-B database.
c. The file contained 586 header columns with 117 unknown INCA headers, resulting in 468 columns plus an extra ignored personal number column.
d. All lines had valid data, and no lines with same personal numbers mapped to more than two lateralities.
e. The file contained 7232 lines of data, 4819 with personal numbers, 2413 without.
f. INCA import created an INCA import output file with 468 columns, 7232 lines of data, 4819 with personal numbers, 2413 without.
g. The INCA statistics application was run with the original INCA import file and the INCA import output file as input, respectively. The statistics test was performed both for all cancer types, and invasive cancer.
h. The statistics results for both files were then compared for each cancer type separately. For a successful test result, the statistics results for both files should be identical.
- Result:
a. The original INCA import file and the INCA import output file produced identical statistics, when compared for each cancer type separately. The test therefore passed successfully.
Note: INCA statistics is currently more forgiving than INCA import regarding entry lines that have bad data values or that map the same personal number to more than two lateralities. If the INCA import file contains entries of this kind, the statistics will differ, since these entries will be included in the statistics for the import file but will not appear in the INCA import output file.
comment:7 by , 8 years ago
(In [4027]) Refs #898. INCA import updated to display button to download INCA import output CSV file after performed INCA import:
- Javascript file import-inca.js in resources/personal/ updated in function submissionResults(response) to display button to download INCA import output CSV file after performed INCA import.
comment:9 by , 8 years ago
Functional specification update:
- A button should be added by which the user can delete the INCA import output CSV file. Even though the INCA import output CSV file doesn't contain personal numbers, it still contains a lot of sensitive data, so it might be desirable to keep it on the server for as short a time as possible (although it should in principle be protected from unauthorized access there).
- INCA import should be updated to hide buttons for downloading or deleting an INCA import output CSV file, when the latter has been deleted.
Design update:
- JSP file import-inca.jsp in resources/personal/ updated by adding new INCA import output CSV file delete button with id "deleteoutputcsvfile".
- Javascript file import-inca.js in resources/personal/ updated:
a. Function initPage() updated by coupling new INCA import output CSV file delete button to new function deleteOutputCsvFile(). Also, multiple calls to function checkForReportFile(reportType) exchanged for call to new function checkForReportFiles().
b. Function initializeStep2(response) updated to call new function checkForReportFiles() to manage display of INCA report file and INCA import output file buttons.
c. New function checkForReportFiles() calls new function checkForReportFiles2(reportTypes) with argument array of key strings for INCA report file and INCA import output file.
d. New function checkForReportFiles2(reportTypes) calls servlet IncaServlet with new command "CheckForIncaReportFiles" (note plural "s" in "Files") with attribute "reporttypes" set to value of argument reportTypes and callback function to new function reportFileButtonsDisplay(response).
e. New function reportFileButtonsDisplay(response) shows buttons related to INCA report file and INCA import output file, depending on the response indicating that the respective file exists, or not.
f. New function deleteOutputCsvFile() calls servlet IncaServlet with new command "DeleteIncaReportFile" with attribute "reporttype" set to value of string constant REPORT_TYPE_IMPORT_OUTPUT_CSV and callback function to function submissionResults(response).
g. Function submissionResults(response) updated to call new function checkForReportFiles() to manage display of INCA report file and INCA import output file buttons.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated for command "CheckForIncaReportFile" by calling new private method boolean checkForReportFile(String reportType) to check if a specific report file exists.
b. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated with new command "CheckForIncaReportFiles". It calls new private method boolean checkForReportFile(String reportType) with argument reportType set to key strings in array attribute "reportTypes" to check if the specific report files exist. The results are returned in a JSON object with the report types as keys.
c. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by including value of request parameter "importcheckonly" in returned JSON object for key "importCheckOnly".
d. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated with new command "DeleteIncaReportFile". It is currently restricted to deleting an INCA import output CSV file.
e. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated by including value of request parameter "cmd" in returned JSON object for key "cmd".
f. New private method boolean checkForReportFile(String reportType) added. It checks if a report file of right type exists.
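The "CheckForIncaReportFiles" command described in servlet item b can be sketched as one existence check per requested report type, with the results collected under the report types as keys. The sketch below returns a plain Map in place of the JSON object. The class name, the directory argument and the map from report type to file name are hypothetical stand-ins for whatever IncaServlet uses internally.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Reggie. */
final class ReportFileCheckSketch
{
    /**
     * Returns, for each requested report type, whether its report file exists
     * as a regular file in dir. A report type without a known file name is
     * reported as not existing.
     */
    static Map<String, Boolean> checkForReportFiles(Path dir,
        Map<String, String> fileNameByReportType, List<String> reportTypes)
    {
        // LinkedHashMap keeps the keys in the order they were requested.
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String reportType : reportTypes)
        {
            String fileName = fileNameByReportType.get(reportType);
            boolean exists = fileName != null && Files.isRegularFile(dir.resolve(fileName));
            result.put(reportType, exists);
        }
        return result;
    }
}
```

On the client side, a response of this shape lets a single callback show or hide the buttons for each report type in one pass.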
comment:10 by , 8 years ago
(In [4032]) Refs #898. INCA import updated by adding a button by which the user can delete an INCA import output CSV file. Buttons for downloading or deleting an INCA import output CSV file are hidden, when the latter file has been deleted:
- JSP file import-inca.jsp in resources/personal/ updated by adding new INCA import output CSV file delete button with id "deleteoutputcsvfile".
- Javascript file import-inca.js in resources/personal/ updated:
a. Function initPage() updated by coupling new INCA import output CSV file delete button to new function deleteOutputCsvFile(). Also, multiple calls to function checkForReportFile(reportType) exchanged for call to new function checkForReportFiles().
b. Function initializeStep2(response) updated to call new function checkForReportFiles() to manage display of INCA report file and INCA import output file buttons.
c. New function checkForReportFiles() calls new function checkForReportFiles2(reportTypes) with argument array of key strings for INCA report file and INCA import output file.
d. New function checkForReportFiles2(reportTypes) calls servlet IncaServlet with new command "CheckForIncaReportFiles" (note plural "s" in "Files") with attribute "reporttypes" set to value of argument reportTypes and callback function to new function reportFileButtonsDisplay(response).
e. New function reportFileButtonsDisplay(response) shows buttons related to INCA report file and INCA import output file, depending on the response indicating that the respective file exists, or not.
f. New function deleteOutputCsvFile() calls servlet IncaServlet with new command "DeleteIncaReportFile" with attribute "reporttype" set to value of string constant REPORT_TYPE_IMPORT_OUTPUT_CSV and callback function to function submissionResults(response).
g. Function submissionResults(response) updated to call new function checkForReportFiles() to manage display of INCA report file and INCA import output file buttons.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated for command "CheckForIncaReportFile" by calling new private method boolean checkForReportFile(String reportType) to check if a specific report file exists.
b. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated with new command "CheckForIncaReportFiles". It calls new private method boolean checkForReportFile(String reportType) with argument reportType set to key strings in array attribute "reportTypes" to check if the specific report files exist. The results are returned in a JSON object with the report types as keys.
c. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by including value of request parameter "importcheckonly" in returned JSON object for key "importCheckOnly".
d. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated with new command "DeleteIncaReportFile". It is currently restricted to deleting an INCA import output CSV file.
e. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated by including value of request parameter "cmd" in returned JSON object for key "cmd".
f. New private method boolean checkForReportFile(String reportType) added. It checks if a report file of right type exists.
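The restriction in servlet item d, that "DeleteIncaReportFile" only deletes an INCA import output CSV file, amounts to a guard on the report type before any file is touched. A minimal sketch under stated assumptions: the constant's string value, the class name and the method signature are hypothetical, and only the constant name REPORT_TYPE_IMPORT_OUTPUT_CSV comes from the ticket.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Reggie. */
final class ReportFileDeleteSketch
{
    // The actual string value used in IncaServlet is not given in the ticket.
    static final String REPORT_TYPE_IMPORT_OUTPUT_CSV = "importoutputcsv";

    /**
     * Deletes the INCA import output CSV file in dir.
     * Returns true only if the report type is the output CSV type and a file
     * was actually deleted. Any other report type is refused without touching
     * the file system.
     */
    static boolean deleteReportFile(Path dir, String reportType, String outputCsvFileName)
        throws IOException
    {
        if (!REPORT_TYPE_IMPORT_OUTPUT_CSV.equals(reportType))
        {
            return false;
        }
        return Files.deleteIfExists(dir.resolve(outputCsvFileName));
    }
}
```

After a successful delete, a following existence check reports the file as missing, which is what lets the page hide the download and delete buttons.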
comment:11 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Ticket accepted.