Opened 11 years ago
Closed 9 years ago
#525 closed task (fixed)
Import data from INCA
Reported by: | Nicklas Nordborg | Owned by: | olle |
---|---|---|---|
Priority: | major | Milestone: | Reggie v4.4 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description
We get data from INCA at regular intervals in the form of a tab-separated file. Some of that data should be imported and attached to various items as annotations. Most of them should be related to patient, case, specimen or blood.
Specific details about what data should be attached to what item need to be defined.
Change History (71)
comment:1 by , 9 years ago
Milestone: | Reggie v3.x → Reggie v4.x |
---|
comment:4 by , 9 years ago
Traceability note:
- Creation of a csv file to be used when requesting information from the INCA database was introduced in Ticket #487 (Export information intended for INCA).
comment:5 by , 9 years ago
Background info:
- INCA is an abbreviation of the Swedish expression "Informationsnätverk för cancervården", loosely translated as "Information Network for Cancer Care".
- INCA database extracts for the SCAN-B project are normally obtained from "RCC syd" (an abbreviation of "Regionalt Cancercentrum syd", "Regional Cancer Center - south"), which manages information on cancer patients in southern Sweden.
comment:6 by , 9 years ago
Background info on INCA export files:
- Traditionally, INCA export data has been retrieved in two files in spreadsheet format, that were then saved in tab-separated format. Inspection of two such example files, here called "INCA_file_a" and "INCA_file_b", revealed the following information:
- The first line in each file contained column header names, and was followed by lines with values.
- All column header names were unique in each file.
- 5 column header names were identical in both files.
- Column header names and column values might contain Swedish national characters 'å', 'ä', and 'ö'.
- Most column header names consist of Swedish words (or abbreviations of such).
- Some column values containing text might include tab characters (file "INCA_file_a" contained 337 such lines). These tab characters must be removed before the spreadsheet file is stored in tab-separated format, otherwise some lines will appear to contain too many columns, and there is no simple way to find which columns should be joined. Replacing the internal tab characters with empty strings is recommended, unless the tab character is used to separate two strings, in which case a space character should be used.
- Some column values containing text might include line feed characters (<LF>). Unlike the internal tab characters, there are ways to combine lines, that are too short, until a line of correct size results, but it is still recommended that the internal line feed characters are removed before the spreadsheet file is stored in tab-separated format.
- The files together contained 75 (45 + 30) column pairs, where the column header names in each pair consisted of a common unique identifier plus suffix "_Beskrivning" and "_Värde", respectively, corresponding to "_Description" and "_Value" in English. The value column contained an integer value, that was presumably stored in the database, while the description column contained a Swedish description of the property encoded by that particular value. The columns in a pair did not always come in the same order. File "INCA_file_b" contained two "_Description" columns, without a corresponding "_Value" column, where the contents in both cases consisted of either "Höger", "Vänster", or no value, corresponding to English "Right", "Left", or no value.
- Each line contained data for a single patient, but data for one patient might appear in more than one line.
- The files contained data for all patients in the requested time interval, but only personal numbers for patients, for which data had been requested in the csv file sent to INCA. However, all data lines had a temporary unique patient ID, which did not correspond to a value in the INCA database, but was added to the export file in order to identify entries related to the same patient.
- The sizes of the two INCA import example files in tab-separated format were 7.33 MB and 2.31 MB, respectively, indicating that there should be no problem holding the import data in memory on the server.
INCA example file | # Column headers | # Value lines | # Value lines for requested patients |
INCA_file_a | 145 | 9425 | 6522 |
INCA_file_b | 146 | 9425 | 6522 |
Both files (all columns) | 291 | 9425 | 6522 |
Both files (unique columns) | 286 | 9425 | 6522 |
The first version of INCA data import should only import data for patients, for which data had been requested in the csv file sent to INCA, i.e., those with personal numbers in the INCA export files.
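The line-combining approach mentioned above for values with internal line feeds can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the expected column count is taken from the header line, and the removed line feed is replaced by an empty string.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: merge physical lines that were split by line feeds inside a cell. */
final class IncaLineJoiner {

    /** Counts tab-separated columns; trailing empty columns are counted too. */
    static int columnCount(String line) {
        return line.split("\t", -1).length;
    }

    /**
     * Appends consecutive physical lines to each other (without a separator)
     * until the combined line has at least the expected number of columns.
     */
    static List<String> joinShortLines(List<String> physicalLines, int expectedColumns) {
        List<String> logicalLines = new ArrayList<>();
        StringBuilder pending = null;
        for (String line : physicalLines) {
            if (pending == null) {
                pending = new StringBuilder(line);
            } else {
                pending.append(line);
            }
            if (columnCount(pending.toString()) >= expectedColumns) {
                logicalLines.add(pending.toString());
                pending = null;
            }
        }
        if (pending != null) {
            // Still too short at end of input; kept so that a later check can report it.
            logicalLines.add(pending.toString());
        }
        return logicalLines;
    }
}
```

A line feed inside the last column cannot be detected this way, since the first part of the line already has the expected column count. This is one reason why removing the line feeds in the spreadsheet is still the safer route.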
A description of the variables in the INCA database from 2014-01-01 was available. This together with inspection of data in the two example export files gave the following result:
INCA example file | # Column headers | # Date columns | # String columns | # Integer columns | # Boolean columns | # Float columns |
INCA_file_a | 145 | 5 | 59 | 65 | 16 | 0 |
INCA_file_b | 146 | 23 | 37 | 32 | 50 | 4 |
Both files (all columns) | 291 | 28 | 96 | 97 | 66 | 4 |
Both files (unique columns) | 286 | 28 | 94 | 94 | 66 | 4 |
Note regarding Boolean columns: If the INCA variable description described a variable as being of type "Kryssruta" (check box), or described as being set to value "sant" (true) at a specific event, the variable is regarded as Boolean. However, if the type is described as a list of values 0 and 1, corresponding to "Nej" (no), and "Ja" (yes), respectively, the type is regarded as Integer.
Types of the columns represented in both example files:
Column headers in both example files | Value type | Comment |
PATID | Integer | Temporary patient id |
PersonalNo | String | Personal number (for requested patients only) |
A030Sida_Beskrivning | String | Laterality "Höger" (Right), "Vänster" (Left) |
A030Sida_Värde | Integer | Laterality 1 = Right, 2 = Left |
KON_VALUE | Integer | "Kön" (Sex) 1 = Male, 2 = Female |
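The "_Beskrivning"/"_Värde" pairing described earlier in this comment can be checked directly from the header line. The sketch below lists description columns that lack a value column; the class and method names are hypothetical, and the non-ASCII letter is written as a Unicode escape only to keep the source independent of file encoding.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: find "_Beskrivning" headers that have no matching "_Värde" header. */
final class IncaHeaderPairs {
    static final String DESCRIPTION_SUFFIX = "_Beskrivning";
    static final String VALUE_SUFFIX = "_V\u00e4rde"; // "_Värde"

    /** Returns the common identifiers of description columns without a value column. */
    static List<String> descriptionsWithoutValue(List<String> headers) {
        Set<String> allHeaders = new HashSet<>(headers);
        List<String> unpaired = new ArrayList<>();
        for (String header : headers) {
            if (header.endsWith(DESCRIPTION_SUFFIX)) {
                String base = header.substring(0, header.length() - DESCRIPTION_SUFFIX.length());
                if (!allHeaders.contains(base + VALUE_SUFFIX)) {
                    unpaired.add(base);
                }
            }
        }
        return unpaired;
    }
}
```

Because the lookup uses a set of all headers, the order of the two columns in a pair does not matter.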
Inspection of the example files indicated the following:
Variable type | Value in variable description | Value in example files |
Boolean variables related to check boxes | checked = true | 1 for checked = true, null (blank) for unchecked = false |
Boolean variables not related to check boxes | 1 for true, 0 for false | 1 for true, 0 for false |
Date | "YYYYMMDD" format | "YYYY-MM-DD" format |
Float | Decimal comma, 2 decimals | Decimal point, 2 decimals |
Note: Sweden, like most countries in central Europe, historically used a decimal comma, but after computers were used more regularly, technical and natural sciences converted in the 1970's to using a decimal point.
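The value formats observed in the example files suggest a small per-type conversion step. The following is a minimal sketch, not the Reggie implementation: the class, enum and parameter names are hypothetical, a blank check-box cell is assumed to mean false, and both decimal comma and decimal point are accepted for floats as a precaution.

```java
import java.time.LocalDate;

/** Sketch: convert a raw cell string to a typed value. */
final class IncaValueParser {
    enum ColumnType { DATE, STRING, INT, BOOLEAN, FLOAT }

    /**
     * @param raw      cell content from the tab-separated file
     * @param type     value type of the column
     * @param checkBox true if a Boolean column stems from a check box
     * @return typed value, or null for a blank cell (except check boxes)
     */
    static Object parse(String raw, ColumnType type, boolean checkBox) {
        String s = raw == null ? "" : raw.trim();
        if (s.isEmpty()) {
            // Assumption: blank check-box cell means "unchecked".
            return (type == ColumnType.BOOLEAN && checkBox) ? Boolean.FALSE : null;
        }
        switch (type) {
            case DATE:
                return LocalDate.parse(s); // ISO format "YYYY-MM-DD"
            case INT:
                return Integer.valueOf(s);
            case FLOAT:
                return Float.valueOf(s.replace(',', '.'));
            case BOOLEAN:
                if (s.equals("1")) return Boolean.TRUE;
                if (s.equals("0")) return Boolean.FALSE;
                throw new IllegalArgumentException("Not a Boolean value: " + s);
            default:
                return s;
        }
    }
}
```

Malformed dates or numbers raise the standard parse exceptions, which a file check could catch and report per line.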
comment:7 by , 9 years ago
Possible discrepancies between SCAN-B and INCA data:
Possible causes for a SCAN-B patient entry not appearing in an INCA export file:
- The patient was operated at a site not belonging to "RCC syd" (an abbreviation of "Regionalt Cancercentrum syd", "Regional Cancer Center - south"), that manages information on cancer patients in southern Sweden. Two of nine SCAN-B sites, Uppsala and Jönköping, belong to this category at the time of writing.
- SCAN-B site Halmstad belongs to "RCC syd", but some of the patients operated there do not themselves belong to the region covered by INCA, and their records are therefore not sent there.
- There may be a delay of several months, before a patient record is sent to be included in INCA, while SCAN-B gets the referral form with specimen a few days after the operation.
Possible causes for an INCA patient entry not appearing in the SCAN-B database:
- The patient may have retracted the consent to participate in the SCAN-B study in the time period between records having been requested from INCA and having been received.
comment:8 by , 9 years ago
Recommended procedure for creating a tab-separated *.csv file suitable for INCA import into BASE from an *.xlsx INCA export file in spreadsheet format. The instructions are written for Apache OpenOffice Calc 3.4.1, but should be regarded as guidelines for use of other programs:
- Open INCA export file *.xlsx in OpenOffice.org Calc.
- Replace all tab characters by empty strings:
a. Menu "Edit" -> "Select All".
b. Menu "Edit" -> "Find & Replace...".
c. Click button "More Options", select "Regular expressions" in opened sub-window, in order to allow special characters to be represented with escape character "\".
d. Search for "\t".
e. Replace with "" (blank field).
f. Click button "Replace All".
g. Close "Find & Replace" dialog.
- Replace all newline characters by empty strings:
a. Menu "Edit" -> "Select All".
b. Menu "Edit" -> "Find & Replace...".
c. Click button "More Options", select "Regular expressions" in opened sub-window, in order to allow special characters to be represented with escape character "\".
d. Search for "\n".
e. Replace with "" (blank field).
f. Click button "Replace All".
g. Close "Find & Replace" dialog.
- Save edited file as *.csv file in tab-delimited format:
a. Menu "File" -> "Save As...".
b. In "Save As" dialog, select directory to save created file in.
c. For "Save as type:" select "Text CSV (.csv) (*.csv)".
d. For "File name:" change file extension to ".csv", if not already done by "Automatic file name extension".
e. Click button "Save".
f. In extra dialog, select "Keep Current Format" (not "Save in ODF Format").
g. In "Export Text File" dialog, for "Character set" select "Unicode (UTF-8)".
h. In "Export Text File" dialog, for "Field delimiter" select "{Tab}".
i. In "Export Text File" dialog, for "Text delimiter" select "" (blank field).
j. In "Export Text File" dialog, select check box option "Save cell content as shown" (all other check box options unselected).
k. In "Export Text File" dialog, click button "OK".
- Close OpenOffice.org Calc window.
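If the cell values can be processed programmatically instead, the two replace steps above amount to a simple string clean-up. The sketch below is hypothetical (class and method names are not from Reggie) and also removes carriage returns, which is an assumption for files with Windows line endings. The second method illustrates the variant from the earlier background comment, where a tab that separates two strings becomes a space.

```java
/** Sketch: remove tab and line break characters from a single cell value. */
final class IncaCellCleaner {

    /** Removes every tab, carriage return and line feed, as in the manual procedure. */
    static String removeControlBreaks(String cell) {
        return cell.replaceAll("[\\t\\r\\n]", "");
    }

    /** Variant: tabs between two non-blank characters become one space, the rest is removed. */
    static String tabsToSpaceBetweenWords(String cell) {
        String spaced = cell.replaceAll("(?<=\\S)\\t+(?=\\S)", " ");
        return removeControlBreaks(spaced);
    }
}
```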
comment:9 by , 9 years ago
Design discussion:
Apart from technical considerations, there are some special issues regarding INCA import that affect the software design:
- The INCA export files contain sensitive data, that can be traced to a specific patient via the personal number, so it is preferable not to require the files to be uploaded to BASE before import.
- The INCA import is special in that it does not affect existing item properties or annotations, except the new dedicated INCA annotations. It was therefore decided to let it be implemented by a specific Java servlet, IncaServlet, instead of the existing ImportServlet.
- Full import of the data in the two example export files will require ~285 new annotations in BASE (286 unique columns minus one or two used for mapping the data to existing BASE items). Even though they have to be added to the BASE database, it is preferable to keep the Reggie source code independent of the details of the annotations. This can be done if the INCA annotations are given names that can be mapped to the column header names.
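A name-based mapping of this kind could look roughly like the sketch below. All names are hypothetical, the rule "prefix plus header name" is an assumption at this point in the discussion, and the set of existing annotation type names is passed in as plain strings rather than queried from BASE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch: derive annotation type names from column headers. */
final class IncaAnnotationNames {

    /** Annotation type name for a column header (assumed rule: prefix + header). */
    static String annotationName(String prefix, String header) {
        return prefix + header;
    }

    /**
     * Returns headers for which no annotation type with the derived name exists.
     * Headers used only for mapping data to existing items are skipped.
     */
    static List<String> headersWithoutAnnotationType(List<String> headers,
            Set<String> mappingHeaders, Set<String> existingAnnotationNames, String prefix) {
        List<String> missing = new ArrayList<>();
        for (String header : headers) {
            if (mappingHeaders.contains(header)) {
                continue;
            }
            if (!existingAnnotationNames.contains(annotationName(prefix, header))) {
                missing.add(header);
            }
        }
        return missing;
    }
}
```

With this arrangement the source code only needs to know the prefix and the few mapping headers; adding a new INCA column then requires a new annotation type in the database but no code change.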
comment:10 by , 9 years ago
Design discussion:
- It was decided to perform the INCA data import in a single session, using the complete set of INCA data files as input, since this allows a check to be made, whether some INCA data annotation types are missing in a specific import session.
- The INCA import wizard should perform an initial check of each INCA data file, after which the results are presented to the user. It should be possible to initially select a simple file check, that skips an intricate database consistency check (part "C." in the table below), and therefore can be performed much faster.
- Properties of the INCA file checks:
a. If critical problems are encountered, import should be blocked.
b. If problems with individual headers/data lines are encountered, the corresponding data columns/lines might be skipped during import; it is then the user's decision whether to fix the problems in the data file, or proceed with import of the eligible data.
c. Basic results from the file check should be presented in the web form. In addition, it should be possible to open/download a text file with more detailed information from the file check by clicking on a button. The file should include the information presented in the web form, but also optional information on which headers or data lines had problems.
d. In the report file, due to the sensitive type of information in the INCA data file, temporary patient ID values should be used instead of personal numbers to identify entries in the INCA file.
- The INCA data file check should include four parts:
Check (Information) | Comment |
A. Basic check | |
Number of header columns | At least 3 key headers required (checked later). |
Number of lines of data | |
Number of lines with internal line feeds | Wizard should remove the internal line feeds before import |
Number of lines with too many columns | None accepted |
Number of lines with too few columns | None accepted |
B. Internal data check | |
Number of duplicate header columns | None accepted |
Temporary patient ID column index | Column required |
Personal number column index | Column required |
Laterality description column index | Column required |
Number of unknown headers | Columns skipped at import |
Number of data lines with personal no. | Required for import |
Number of personal no. with more than 2 lines | Data lines skipped at import |
Number of personal no.s with many identical lateralities | Data lines skipped at import |
C. Database consistency check | (Only lines with personal no. processed) |
Number of data lines with personal no. not in database | Data lines skipped at import |
Number of patient lateralities without database reference | Data lines skipped at import |
D. Database consistency check II | (All files together) |
Number of missing INCA headers | INCA headers skipped at import |
- INCA import annotation types:
a. All annotation types are coupled to Case items.
b. Data in all columns in the two INCA example files except the temporary patient ID and the two mapping columns "PersonalNo" and "A030Sida_Beskrivning" should be imported to annotations. The personal number in the INCA data is used together with the laterality "A030Sida_Beskrivning" value for mapping an INCA entry to a Case entry in BASE, so these two columns do not need annotations of their own.
c. The name of the annotation type corresponding to a data column should equal prefix "INCA_" plus the name of the header for the column.
d. The value type of an annotation type should be one of Type.DATE, Type.STRING, Type.INT, Type.BOOLEAN, or Type.FLOAT, corresponding to the type of the corresponding column in the INCA data file, according to the description of the variables in the INCA database from 2014-01-01.
e. INCA example file two contained two headers, "BN20_Sida_Beskrivning" and "BP20_Sida_Beskrivning", without the corresponding "value" headers, "BN20_Sida_Värde" and "BP20_Sida_Värde", respectively. In order to be able to check if some INCA data annotation types are missing in an import session, it was decided not to define annotation types for the latter two "_Värde" columns.
f. Columns corresponding to list values in the INCA variable description should be mapped to annotation types with value options set to the available values. However, except for the "A030Sida_Beskrivning" column used for laterality mapping, value options should only be set for integer "_Värde" columns, since the strings corresponding to these values in the INCA data files are not guaranteed to exactly match the description strings in the INCA variable description.
g. All INCA annotation types should belong to a new "INCA" annotation type category.
h. Two extra annotation types, not coupled to columns in the INCA data file, should be added, one for the date that the INCA data was exported from the database, and one for the last date an INCA import was made for a Case item. These two annotation types should not have prefix "INCA_", and should not belong to the new "INCA" annotation type category, since they do not correspond to INCA import file headers, and should be excluded when checking if some INCA data annotation types are missing in an import session. However, they should belong to the "Case" annotation type category.
i. At import, a data line in the INCA data file should be mapped to the Case item corresponding to a patient with the same personal number as in the line, and where the Case item has a laterality matching the laterality description in the line.
j. An INCA annotation should only be updated if the value from the INCA data file at import differs from the current annotation value. If the new value is null, corresponding to an empty cell in the INCA export spreadsheet file, the corresponding annotation should be removed, if existing.
k. Annotations for the extra annotation types for the date that the INCA data was exported from the database, and the last date an INCA import was made for a Case item, should be updated, even if no INCA annotations for a Case item have been updated, to indicate that the INCA annotation values for the case item equal those of the latest INCA data file.
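The update rule in item j above can be expressed as a small decision function. This is a sketch only: the class, enum and method names are hypothetical, and values are compared with plain object equality rather than through the BASE annotation API.

```java
import java.util.Objects;

/** Sketch: decide what to do with one annotation for one imported value. */
final class IncaAnnotationUpdate {
    enum Action { KEEP, UPDATE, REMOVE }

    /**
     * @param currentValue value stored in the annotation, or null if none exists
     * @param newValue     value from the INCA data file, or null for an empty cell
     */
    static Action decide(Object currentValue, Object newValue) {
        if (newValue == null) {
            // Empty cell: remove an existing annotation, otherwise nothing to do.
            return currentValue == null ? Action.KEEP : Action.REMOVE;
        }
        return Objects.equals(currentValue, newValue) ? Action.KEEP : Action.UPDATE;
    }
}
```

Counting the UPDATE and REMOVE results per Case item would also give the numbers needed for a summary line after import, while the two date annotations from item k are written regardless of the outcome.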
comment:11 by , 9 years ago
Functional specification update:
- First version of the INCA data import will be based on the following functionality:
a. A new "INCA import" entry will be added to section "Personal information wizards", sub-section "Export/import information to/from external registers", and will require a PatientCurator role to be used.
b. Step 1 of the INCA import wizard will have two input fields, one for the INCA export date, and one for selecting the files containing the INCA data in tab-delimited format. Two buttons should exist; one for performing a (fast) simple file check, and a "Next" button for a more complete check.
c. Step 2 should present the results after a performed check. Test results specific for a simple file, should be presented for each selected file. It should be possible to download a file with more detailed check results. In order to perform an import, the complete check must be performed. If the complete check does not find any fatal errors, an "Import" button should appear.
d. After import has been performed, a summary report line should be shown.
Design update:
- JSP file index.jsp in resources/ updated with new "INCA import" entry in section "Personal information wizards", sub-section "Export/import information to/from external registers". The INCA import entry is linked to new JSP file import-inca.jsp in resources/personal/, and requires a PatientCurator role to be used.
- New JSP file import-inca.jsp in resources/personal/ added. It is linked to new javascript file import-inca.js in resources/personal/.
- New javascript file import-inca.js in resources/personal/ added.
a. Functions for performing file checks or importing data append the selected files to a FormData object, which is sent to command "ImportInca" in java servlet IncaServlet in a POST request using Reggie wizard function Wizard.asyncJsonRequest(url, callback, method, postdata), with new function initializeStep2(response) as callback function.
b. Function initializeStep2(response) uses the JSON data in the response to dynamically build a table containing the results reported by the servlet. It displays a button linked to function downloadReportFile() for downloading a file with more detailed results. A path to a temporary file with more detailed results is read from the response, and stored in a hidden input field.
c. Function downloadReportFile() calls command window.open(url) for a URL to command "DownloadIncaImportReportFile" in servlet IncaServlet. A path to a temporary file is added to the URL as parameter with name "tmpFilePath".
- Javascript file reggie-2.js in resources updated in function Wizard.asyncJsonRequest(url, callback, method, postdata) by not adding a request header for postdata, if the latter is an instance of FormData. This was needed to work with Firefox web browser, that only adds request header with needed "boundary" info, if the request header is not set explicitly.
- Data access object class/file Annotationtype.java in src/net/sf/basedb/reggie/dao/ updated:
a. New data sample annotation types INCA_EXPORT_DATE and INCA_IMPORT_DATE defined.
- Java servlet class/file InstallServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doGet(HttpServletRequest req, HttpServletResponse resp) to include the new sample annotation types INCA_EXPORT_DATE and INCA_IMPORT_DATE, and add them to Subtype.CASE annotation type category.
- New java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ added.
a. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) supports command "DownloadIncaImportReportFile", which retrieves a path to a temporary file from the request parameter "tmpFilePath", after which it sends the file contents to a PrintWriter object, for download by the user.
b. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) supports command "ImportInca", that performs a check on retrieved files, and optionally imports the data to the database.
- XML configuration file servlets.xml in META-INF updated by adding new java servlet class IncaServlet to the servlet list.
comment:12 by , 9 years ago
(In [3786]) Refs #525. First version of the INCA data import, with the functionality and design described in comment:11.
comment:13 by , 9 years ago
(In [3801]) Refs #525. Java servlet class IncaServlet refactored to make code more readable. Redundant statements, unnecessary checks, and code residues from test output during development have been removed. Some comments have been added:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca", by removal of redundant statements, unnecessary checks, and code residues from test output during development. Some comments have been added.
comment:14 by , 9 years ago
Note on INCA annotations:
- INCA file columns "A030PrepNr" and "A090VPrepNr" contain PAD values, which are regarded as sensitive data, similar to name and personal number. The corresponding annotation types "INCA_A030PrepNr" and "INCA_A090VPrepNr" should therefore require PatientCurator role to be inspected.
- Test import of anonymized data on a local system for 75% of the lines intended for import in the two available INCA files revealed a number of discrepancies between the INCA variable description from 2014-01-01 and the INCA files (numbers quoted in the comment column are for the original INCA files, including items not intended for import):
Column | Odd value in file | Corresponds to | Comment |
A040MKlass_Värde | 20 | "MX Fjärrmetastaser kan ej bedömas" | Value '20' is deprecated according to the variable description, but 892 items found in indata files. |
A090PadTyp_Värde | 97 | "PAD ej utförd" | According to the variable description, this should be represented by '3', not '97', but 67 items with '97' found, none with '3'. |
A080OrsKompLgl2_Värde | 10 | "Axillutrymning efter SN pga tumördata (t ex positiv SN)" | Not included in variable description, but 157 items found. |
A030PatKod | D5 | | Only value that is not an integer (9311 integer values). Variable description does not specify that the value should be an integer, so the annotation should be of value type Type.STRING. |
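Discrepancies like those in the table above can be found by comparing each column's values with the options listed in the variable description. The sketch below is hypothetical throughout: the class and method names are illustrative, and the allowed options are passed in as strings rather than read from annotation types in BASE.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Sketch: count column values that are not among the documented options. */
final class IncaOptionCheck {

    /** Returns each unexpected value with its number of occurrences; blanks are ignored. */
    static Map<String, Integer> countUnexpected(List<String> columnValues, Set<String> allowed) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : columnValues) {
            if (value == null || value.isEmpty() || allowed.contains(value)) {
                continue;
            }
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }
}
```

The resulting counts correspond to the "items found" figures quoted in the comment column.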
comment:15 by , 9 years ago
(In [3816]) Refs #525. Java servlet class IncaServlet refactored to make code more readable. Redundant statements, unnecessary checks, and code residues from test output during development have been removed. Some comments have been added:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. ItemQuery for INCA annotation types updated to include types shared to current project and to logged-in user. The latter is needed in order to include INCA annotation types with values representing PAD-numbers, and which therefore are shared to the PatientCurator group.
b. Exception when trying to open a file is now re-thrown after a log message has been written.
comment:16 by , 9 years ago
Tests of initial version of INCA import:
Test setup:
- Tests were performed on a local system with an anonymized subset of the SCAN-B data. Two test files were prepared, having the same format as the two example INCA files. The two test files contained personal numbers (faked) and lateralities from the anonymized local database, while the other columns contained data from the two example files, with the exception of the two columns representing PAD numbers, which were filled with faked values. Each test file contained 4703 lines with valid input for INCA import on the local system.
Things learned from initial tests:
- The first test runs failed due to "java.lang.OutOfMemoryError: GC overhead limit exceeded" ("GC" = Java garbage collection). Apache Tomcat 8 was re-configured by increasing Java max heap memory from 1024MB to 2048MB.
- When many annotation changes were made, the final "commit" step, when Hibernate updates the database with the changes made in the corresponding items in the program, took a long time. Often the import crashed during this step, for various reasons.
- Processing of the first test file took much longer than the second, as the former contains many more non-blank entries. Sometimes this led to error "java.io.EOFException: Unexpected EOF read on the socket" when trying to read the second test file afterwards.
- After a successful initial import, in which many INCA annotations received values, re-import of the same test files was tested. The expected result was that no INCA annotations should be updated, but the INCA date annotations should, as the test was performed on another day than the initial import. However, the test was also performed in order to check if the data processing took longer than for the initial import, as the program now had to retrieve values for a lot of annotations, in order to check if the latter needed to be updated. The tests confirmed the suspicion that this was indeed the case.
Recommendations for changes in the INCA import, based on initial tests:
- In order to make the import more stable, data for all files should be read before any of the data is processed. This should be possible, since the test files (7.5MB and 2.4MB, respectively) are small enough for the contents to be stored in the memory of a modern computer.
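The recommendation above can be sketched as a separate buffering step that drains every upload stream before any processing starts. The class and method names are hypothetical, the uploads are represented as a map from file name to input stream (how the servlet obtains the streams is not shown), and UTF-8 encoding is assumed.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: read all uploaded files into memory before processing any of them. */
final class IncaFileBuffer {

    /** Returns the lines of every upload, keyed by file name, in upload order. */
    static Map<String, List<String>> readAll(Map<String, InputStream> uploads) {
        Map<String, List<String>> buffered = new LinkedHashMap<>();
        for (Map.Entry<String, InputStream> upload : uploads.entrySet()) {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(upload.getValue(), StandardCharsets.UTF_8));
            // The stream is fully consumed here; closing it is left to the caller.
            buffered.put(upload.getKey(), reader.lines().collect(Collectors.toList()));
        }
        return buffered;
    }
}
```

Reading everything up front means the request body has been consumed before the slow database work begins, which is the situation the socket EOF error points to.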
comment:17 by , 9 years ago
(In [3817]) Refs #525. INCA import hopefully made more stable, by reading data for all input files before processing them. Also some other changes in order to increase readability of the code:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. Data for all input files are now read before the data is processed. The change requires the introduction of a number of extra list items to store input information for the individual files for later processing.
b. Name of AnnotationType variable changed to "at", since the latter is used in several other programs.
comment:18 by , 9 years ago
(In [3818]) Refs #525. INCA import refactored, in order to put code sections in more logical order:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. Code for input of request parameters moved to top of code section.
b. Mapping of biosource ID to personal number moved outside of loop for processing file data, since the map is independent of the input files.
comment:19 by , 9 years ago
(In [3819]) Refs #525. First (experimental) attempt at implementing support for an annotation snapshot manager for INCA annotations:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. A new SnapshotManager object manager is created by calling method getSnapshotManager().
b. Retrieval of annotation values, where the annotation types have an Annotationtype (note lowercase 't' in "type") representation, is prepared for using a SnapshotManager, but not yet activated (calls are commented out).
c. A HashMap<Integer,AnnotationTypeFilter> object is used to store a snapshot filter for ID values of used AnnotationType items.
d. Retrieval of INCA annotation values is now performed by calling new private method Object fetchAnnotationValue(DbControl dc, AnnotationType at, AnnotationSet as, HashMap<Integer,AnnotationTypeFilter> atIdSnapshotFilterHM, SnapshotManager manager, Annotatable item).
e. New private method Object fetchAnnotationValue(DbControl dc, AnnotationType at, AnnotationSet as, HashMap<Integer,AnnotationTypeFilter> atIdSnapshotFilterHM, SnapshotManager manager, Annotatable item) added. If a snapshot manager and snapshot filter exist, it obtains the value by calling method findAnnotations(dc, item, snapshotFilter, false), otherwise new private method Object fetchAnnotationValue(AnnotationType at, AnnotationSet as) is called. If a new snapshot filter needs to be created, it is stored in HashMap atIdSnapshotFilterHM.
f. New private method Object fetchAnnotationValue(AnnotationType at, AnnotationSet as) added. It retrieves the annotation value in the same manner as previously used.
g. Output log messages added for tests, in order to check how far the application has run, if terminated prematurely, and to get time estimates for different parts of the code for performance checks.
comment:20 by , 9 years ago
Design discussion:
When INCA data is supplied in multiple files, the files normally contain rows for the same set of personal numbers and lateralities. The majority of these rows correspond to uni-lateral cases, where a personal number uniquely identifies a case item in the database. This makes it possible to speed up the case mapping step in the file check stage for the INCA files following the first, provided that the unique case ID and laterality values in the SCAN-B database are stored for the personal numbers in the first INCA file.
Design update:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. Two new hash maps created for storing database case ID and laterality for a given personal number. These hash maps are checked first for the case ID and laterality of a given personal number, before the database/snapshot manager is accessed for the values. If the values need to be obtained from the database/snapshot manager, they are stored for future use in the new hash maps with the personal number as key.
b. Commands for obtaining laterality and INCA date annotations for a case item are updated to use a snapshot manager.
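The lookup-then-store pattern in item a can be sketched as below. This is only an illustration under assumptions, not the servlet code: the class name PersonalNumberCache and the slowLookup callback are hypothetical, the callback stands in for the database/snapshot manager access, and it is assumed to return null when a personal number does not map to a unique value. One instance would be used for case ID and another for laterality.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: caches values keyed by personal number. */
final class PersonalNumberCache<V> {
    private final Map<String, V> cache = new HashMap<>();
    // Stand-in for the (slow) database/snapshot manager lookup.
    private final Function<String, V> slowLookup;

    PersonalNumberCache(Function<String, V> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Checks the hash map first and only falls back to the slow lookup on a miss. */
    V get(String personalNumber) {
        V value = cache.get(personalNumber);
        if (value == null) {
            value = slowLookup.apply(personalNumber);
            // Assumption: null means "no unique value", which is not cached.
            if (value != null) {
                cache.put(personalNumber, value);
            }
        }
        return value;
    }
}
```

With this shape, rows in the second and later INCA files that refer to an already seen personal number are resolved from memory.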
comment:21 by , 9 years ago
(In [3830]) Refs #525. Attempt to speed up file check/case mapping stage for extra INCA files, by storing case ID and database laterality for personal numbers in the first INCA file, when these values are unique. Also more use of annotation snapshot manager:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. Two new hash maps created for storing database case ID and laterality for a given personal number. These hash maps are checked first for the case ID and laterality of a given personal number, before the database/snapshot manager is accessed for the values. If the values need to be obtained from the database/snapshot manager, they are stored for future use in the new hash maps with the personal number as key.
b. Commands for obtaining laterality and INCA date annotations for a case item are updated to use a snapshot manager.
comment:22 by , 9 years ago
Design discussion:
- The procedure for speeding up the mapping step, which exploits the fact that the INCA input files normally contain lines for the same cases, could also be used for the import step, by importing all INCA annotations for a single case in the same sub-step instead of doing it file by file. While some speed improvement might be possible, the main gain would be that the single commit step can be divided into a number of commits, each covering a subset of the cases. The advantage of using several commits would be to avoid program crashes caused by the Java heap memory reaching its maximum limit. If something goes wrong during the import step, some cases would have been updated and others not, but for any single case the INCA annotations would be either all up-to-date or all unchanged. Using several commits while processing file by file might instead leave some cases with only part of their INCA annotations up-to-date.
- Allowing all INCA annotations for a single case to be committed in the same sub-step requires collecting the INCA data for a case from all input INCA files. In order to make the code more readable, this should be done using a number of inner helper classes of the data access object type.
- If some headers of columns to be imported exist in more than one file, it is desirable that the corresponding INCA annotation for a case item is updated only once (or that the update is counted once, if the column values are identical in all files).
Design update:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. A new hash map is created for storing annotation type for given annotation type ID.
b. Code is re-written to collect INCA annotations for a given case from all input INCA files, before import is performed. The import is then performed, one case at a time. In order to make the code more readable, use is made of new inner helper classes UnprocessedIncaFile, IncaFile, RawIncaCase, IncaCase, and IncaAnnoItem.
c. New inner private class UnprocessedIncaFile added. It stores filename, list of header strings, and list of data lines for an INCA input file. Note that not all data lines might contain mapping information allowing import to the SCAN-B database.
d. New inner private class IncaFile added. It stores filename, list of header strings, list of indexes for columns to be imported, and list of RawIncaCase items for an INCA input file.
e. New inner private class RawIncaCase added. It stores database ID and INCA import line for a case item.
f. New inner private class IncaCase added. It stores database ID, list of IncaAnnoItem objects, and list of database ID values for used annotation types for a case item.
g. New inner private class IncaAnnoItem added. It stores the database ID of the annotation type and the value string to be imported.
h. Some variable names have been updated, in order to make them more consistent.
i. Test version writes a log message with time stamp for every 100th case item that is processed for import, in order to check performance/stability.
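A minimal sketch of how per-case collection across files might look is given below. Only the class names IncaCase and IncaAnnoItem come from the description above. The field names, the wrapper class IncaCaseCollector and the rule that the first value seen for an annotation type wins are assumptions made for the illustration. The real helper classes are private inner classes of the servlet and may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: gathers INCA values per case from all input files. */
final class IncaCaseCollector {

    /** Annotation type ID plus the value string to be imported. */
    static final class IncaAnnoItem {
        final int annotationTypeId;
        final String value;

        IncaAnnoItem(int annotationTypeId, String value) {
            this.annotationTypeId = annotationTypeId;
            this.value = value;
        }
    }

    /** Case database ID, its annotation items, and the annotation type IDs already used. */
    static final class IncaCase {
        final int caseId;
        final List<IncaAnnoItem> annoItems = new ArrayList<>();
        final Set<Integer> usedAnnotationTypeIds = new LinkedHashSet<>();

        IncaCase(int caseId) {
            this.caseId = caseId;
        }
    }

    private final Map<Integer, IncaCase> casesById = new LinkedHashMap<>();

    /** Adds a value for a case; a column shared by several files is only stored once. */
    void addValue(int caseId, int annotationTypeId, String value) {
        IncaCase incaCase = casesById.computeIfAbsent(caseId, IncaCase::new);
        if (incaCase.usedAnnotationTypeIds.add(annotationTypeId)) {
            incaCase.annoItems.add(new IncaAnnoItem(annotationTypeId, value));
        }
    }

    /** Cases in the order they were first seen, ready to be imported one at a time. */
    List<IncaCase> cases() {
        return new ArrayList<>(casesById.values());
    }
}
```

Collecting first and importing afterwards is what allows all annotations for one case to end up in the same commit.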
comment:23 by , 9 years ago
(In [3836]) Refs #525. INCA import re-written to collect INCA data for each case from all input INCA files before import, and then perform the import one case at a time:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. A new hash map is created for storing annotation type for given annotation type ID.
b. Code is re-written to collect INCA annotations for a given case from all input INCA files, before import is performed. The import is then performed, one case at a time. In order to make the code more readable, use is made of new inner helper classes UnprocessedIncaFile, IncaFile, RawIncaCase, IncaCase, and IncaAnnoItem.
c. New inner private class UnprocessedIncaFile added. It stores filename, list of header strings, and list of data lines for an INCA input file. Note that not all data lines might contain mapping information allowing import to the SCAN-B database.
d. New inner private class IncaFile added. It stores filename, list of header strings, list of indexes for columns to be imported, and list of RawIncaCase items for an INCA input file.
e. New inner private class RawIncaCase added. It stores database ID and INCA import line for a case item.
f. New inner private class IncaCase added. It stores database ID, list of IncaAnnoItem objects, and list of database ID values for used annotation types for a case item.
g. New inner private class IncaAnnoItem added. It stores the database ID of the annotation type and the value string to be imported.
h. Some variable names have been updated, in order to make them more consistent.
i. Test version writes a log message with time stamp for every 100th case item that is processed for import, in order to check performance/stability.
comment:24 by , 9 years ago
Functional specification update:
- It is desirable that the steps needed when preparing an INCA spreadsheet file for import, i.e. creating a file in *.csv format with tab column separators, are as few and simple as possible. The operation described above 2016-02-09 contains 5 steps, where steps 2 and 3 concern replacing internal line feed and tab characters with empty strings. The text parsing procedure can already handle cells with internal line feed characters, as the number of columns in a line is known from the header line. If step 4 is modified to saving the *.csv file with "Text delimiter" set to a double quote character '"', instead of an empty string (blank), it should be possible for the program to identify the internal tab characters and replace them with spaces in the input data. (In the old instruction, internal tab characters were replaced by empty strings, but if the former are used to separate two words, it is safer to replace them with spaces.)
Design update:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca" in the section reading an input INCA *.csv file:
a. A single helper TrimmedLineItem data access object is created. It will be reused, and allows new private method TrimmedLineItem tabDoubleQuoteTrim(TrimmedLineItem trimmedLineItem) to return both a trimmed line and current status of a flag indicating if the text processing is inside a section enclosed by double quotes.
b. Boolean flag insideDoubleQuotes indicates if the text processing is inside a section enclosed by double quotes, and is initialized to false at the start of each new file.
c. Each raw line read from the input file is processed by new private method TrimmedLineItem tabDoubleQuoteTrim(TrimmedLineItem trimmedLineItem), before being split into columns, using tab characters as column separators.
d. New private method TrimmedLineItem tabDoubleQuoteTrim(TrimmedLineItem trimmedLineItem) added. It replaces tabs with spaces in sections enclosed by double quotes, and returns a TrimmedLineItem object containing the possibly modified line, together with current status of the flag indicating if the text processing is inside a section enclosed by double quotes.
e. New inner private helper class TrimmedLineItem of data access object type added. It contains a boolean flag indicating if the text processing is inside a section enclosed by double quotes, and a string containing the line for processing.
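The behaviour described for tabDoubleQuoteTrim can be sketched as follows. The method and class names are taken from the description above, but the body is a guess at one straightforward way to do it. The wrapper class TabQuoteTrimSketch, the field names, and the choice to keep the double quote characters in the output are assumptions, not the committed implementation.

```java
/** Hypothetical sketch of replacing tabs inside double-quoted sections with spaces. */
final class TabQuoteTrimSketch {

    /** Reusable holder for a line and the quote state carried over between lines. */
    static final class TrimmedLineItem {
        boolean insideDoubleQuotes = false; // reset to false at the start of each file
        String line = "";
    }

    /**
     * Replaces tab characters with spaces while inside a double-quoted section.
     * The flag survives between calls, so a quoted cell containing a line feed
     * is handled correctly when the next raw line is passed in.
     */
    static TrimmedLineItem tabDoubleQuoteTrim(TrimmedLineItem item) {
        StringBuilder out = new StringBuilder(item.line.length());
        boolean inside = item.insideDoubleQuotes;
        for (int i = 0; i < item.line.length(); i++) {
            char c = item.line.charAt(i);
            if (c == '"') {
                inside = !inside; // a doubled quote toggles twice and leaves the state unchanged
                out.append(c);
            } else if (c == '\t' && inside) {
                out.append(' ');  // internal tab, not a column separator
            } else {
                out.append(c);
            }
        }
        item.line = out.toString();
        item.insideDoubleQuotes = inside;
        return item;
    }
}
```

After this step, the remaining tab characters in a line are column separators, so splitting on tabs is safe.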
comment:25 by , 9 years ago
(In [3840]) Refs #525. INCA import updated to allow the program to handle internal line feed and tab characters, provided the *.csv file is saved with "Text delimiter" set to a double quote character '"', instead of an empty string (blank):
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca" in the section reading an input INCA *.csv file:
a. A single helper TrimmedLineItem data access object is created. It will be reused, and allows new private method TrimmedLineItem tabDoubleQuoteTrim(TrimmedLineItem trimmedLineItem) to return both a trimmed line and current status of a flag indicating if the text processing is inside a section enclosed by double quotes.
b. Boolean flag insideDoubleQuotes indicates if the text processing is inside a section enclosed by double quotes, and is initialized to false at the start of each new file.
c. Each raw line read from the input file is processed by new private method TrimmedLineItem tabDoubleQuoteTrim(TrimmedLineItem trimmedLineItem), before being split into columns, using tab characters as column separators.
d. New private method TrimmedLineItem tabDoubleQuoteTrim(TrimmedLineItem trimmedLineItem) added. It replaces tabs with spaces in sections enclosed by double quotes, and returns a TrimmedLineItem object containing the possibly modified line, together with current status of the flag indicating if the text processing is inside a section enclosed by double quotes.
e. New inner private helper class TrimmedLineItem of data access object type added. It contains a boolean flag indicating if the text processing is inside a section enclosed by double quotes, and a string containing the line for processing.
comment:26 by , 9 years ago
(In [3841]) Refs #525. INCA import updated by removing debug output of lines modified by private method TrimmedLineItem tabDoubleQuoteTrim(TrimmedLineItem trimmedLineItem), when reading input files:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca" in the section reading an input INCA *.csv file:
a. Debug output of lines modified by private method TrimmedLineItem tabDoubleQuoteTrim(TrimmedLineItem trimmedLineItem) removed.
comment:27 by , 9 years ago
Updated recommended procedure for creating a tab-separated *.csv file suitable for INCA import into BASE from an *.xlsx INCA export file in spreadsheet format. The instructions are written for Apache OpenOffice Calc 3.4.1 or LibreOffice 5.1.1.3, but should be regarded as guidelines for use of other programs:
- Open INCA export file *.xlsx in OpenOffice.org Calc.
- Save edited file as *.csv file in tab-delimited format:
a. Menu "File" -> "Save As...".
b. In "Save As" dialog, select directory to save created file in.
c. For "Save as type:" select "Text CSV (.csv) (*.csv)".
d. For "File name:" change file extension to ".csv", if not already done by "Automatic file name extension".
e. Click button "Save".
f. In extra dialog, select "Keep Current Format" (not "Save in ODF Format").
g. In "Export Text File" dialog, for "Character set" select "Unicode (UTF-8)".
h. In "Export Text File" dialog, for "Field delimiter" select "{Tab}".
i. In "Export Text File" dialog, for "Text delimiter" select """ (double quote).
j. In "Export Text File" dialog, select check box option "Save cell content as shown" (all other check box options unselected).
k. In "Export Text File" dialog, click button "OK".
- Close OpenOffice.org Calc window.
Note: The previous instructions from 2016-02-09 should still produce valid input files, but require more work.
comment:28 by , 9 years ago
comment:29 by , 9 years ago
comment:30 by , 9 years ago
Design discussion:
- Tests have shown that imports with a large number of annotation changes (a full INCA import might include >~ 500000), might terminate prematurely during the commit step, as the Java heap memory is exhausted. In order to stabilize the import, a commit can be performed after a fixed number of case items have been processed.
- Conversion of a value string from an INCA import file to the expected value type of the corresponding INCA annotation type might throw a net.sf.basedb.core.InvalidDataException, if the string content doesn't match the expected type. In order to obtain more information if this happens, the exception should be caught, and the full contents of the parsed import line for the case in question should be logged.
Design update:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. Commit is now performed after each 100th case item has been processed. Debug output is written to the log file, in order to check the time spent in different parts of the program.
b. If a net.sf.basedb.core.InvalidDataException is thrown when trying to convert a value string from an INCA file column to the value type of the corresponding INCA annotation type, the exception is now caught, and the full contents of the parsed import line for the case in question are written to the log.
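The commit-after-every-100th-case scheme in item a can be illustrated with the sketch below. It is not the servlet code: ChunkedImport, importInChunks and the two callbacks are hypothetical names, where importOne stands in for the per-case annotation update and commit stands in for the database commit. The final commit for a trailing partial chunk is an assumption about how leftover cases would be handled.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: process items and commit after every chunkSize items. */
final class ChunkedImport {

    /**
     * Imports all cases, calling commit after each full chunk and once more
     * for a trailing partial chunk. Returns the number of commits performed.
     */
    static <T> int importInChunks(List<T> cases, int chunkSize,
                                  Consumer<? super T> importOne, Runnable commit) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int commits = 0;
        int sinceLastCommit = 0;
        for (T incaCase : cases) {
            importOne.accept(incaCase);
            sinceLastCommit++;
            if (sinceLastCommit == chunkSize) {
                commit.run();          // keeps the amount of uncommitted work bounded
                commits++;
                sinceLastCommit = 0;
            }
        }
        if (sinceLastCommit > 0) {
            commit.run();              // commit the remaining cases
            commits++;
        }
        return commits;
    }
}
```

Because each case is imported completely before the chunk boundary is checked, a failure between commits leaves every case either fully updated or untouched.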
comment:31 by , 9 years ago
(In [3854]) Refs #525. INCA import updated in order to try to stabilize the application, and to gain more information if it terminates prematurely due to conversion errors:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. Commit is now performed after each 100th case item has been processed. Debug output is written to the log file, in order to check the time spent in different parts of the program.
b. If a net.sf.basedb.core.InvalidDataException is thrown when trying to convert a value string from an INCA file column to the value type of the corresponding INCA annotation type, the exception is now caught, and the full contents of the parsed import line for the case in question are written to the log.
comment:33 by , 9 years ago
Bug found:
- The current implementation of INCA import with multiple commits does not work past the first commit, unless some components of stored annotation type objects are initialized when the annotation type object is fetched from the database. (Unfortunately, the first test with the new code was performed with input *.csv files in which all columns except "PATID" and the mapping columns were empty, in an effort to clear all previous INCA annotations in the database. Since all INCA annotations should be removed in this case, the program only needs to check whether previous annotations exist, not their values, in order to determine how many changes have been made, so the bug did not show up.)
Design update:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. In order for stored annotation types to work past a first commit, values for enumerable annotation types, as well as collections "itemTypes" and "options", are initialized when an annotation type is fetched from the database.
comment:34 by , 9 years ago
(In [3855]) Refs #525. Bug fixed in INCA import with multiple commits, to make it work past the first commit. It now initializes some components of stored annotation type objects, when the annotation type object is fetched from the database. The program now also re-throws a caught net.sf.basedb.core.InvalidDataException after a log message has been written:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. In order for stored annotation types to work past a first commit, values for enumerable annotation types, as well as collections "itemTypes" and "options", are initialized when an annotation type is fetched from the database.
b. A caught net.sf.basedb.core.InvalidDataException is now re-thrown after a log message has been written.
comment:35 by , 9 years ago
Functional specification update:
- INCA import should display a report file download button when entering the application, if a report file exists.
- INCA report file should contain start and end times for the import/test. Currently one time value is included, which corresponds to the end of the import/test.
Design update:
- In addition to code changes to implement the functional specification updates above, some code updates have been made in order to make the code more complete and increase clarity.
- Outermost Ant build file build.xml in / updated to set BASE version to 3.8.0, since the recommended way of adding INCA annotation types is through use of an annotation type importer, which was introduced in that BASE version. (This also opens the possibility to use planned additions to the annotation API in BASE 3.8.0 in the INCA import code.)
- Javascript file import-inca.js in resources/personal/ updated:
a. Function initPage() updated to not hide and disable the button for downloading a report file, but instead call new function checkForReportFile().
b. New function checkForReportFile() added. It calls servlet IncaServlet with new command "CheckForIncaImportReportFile" in a "Get" request with callback function reportFileDownloadButtonDisplay(response).
c. New function reportFileDownloadButtonDisplay(response) added. It retrieves a boolean flag indicating an existing report file from the servlet response, and an optional path to the report file. If a report file exists, the path for the file is stored in hidden input field reportFilePath and the button for downloading the report file is shown, otherwise the button is disabled and hidden.
d. Function initializeStep2(response) updated to call "Wizard.setCurrentStep(2)" at start of the function.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated with new command "CheckForIncaImportReportFile". It calls new private convenience method String fetchReportFilePath() to obtain the path for an optional report file, and returns a JSON object with the path for key "reportFilePath", and a boolean flag indicating if a report file exists for key "incaReportFileExists".
b. New private convenience method String fetchReportFilePath() added. It returns the path for an optional report file.
c. The name to use for the report file is now stored in private String INCA_IMPORT_REPORT_FILENAME.
d. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca". A time stamp is obtained at the start of the method, and is stored in JSONObject jsonIncaFilePropDetails for key "incaImportStart".
e. Private method String createIncaImportReportFile(JSONArray jsonIncaFilePropDetailsArr, List<String> missingIncaHeadersList, String message) updated to call new private convenience method String fetchReportFilePath() to obtain the path for the report file, and to retrieve the value for the import start time stamp from JSONObject jsonIncaFilePropDetails. The start and end times of the import/test are now written at the beginning of the report.
f. References to the report file in the code have been updated to avoid referring to it as a temporary file, since the file is not removed when the import/test is finished.
comment:36 by , 9 years ago
(In [3856]) Refs #525. INCA import updated:
a. A report file download button is now displayed, when entering the application, if a report file exists.
b. The INCA report file now contains start and end times for the import/test. (Previously one time value was included, which corresponded to the end of the import/test.)
c. In addition, some code updates have been made in order to make the code more complete and increase clarity.
- Outermost Ant build file build.xml in / updated to set BASE version to 3.8.0, since the recommended way of adding INCA annotation types is through use of an annotation type importer, which was introduced in that BASE version. (This also opens the possibility to use planned additions to the annotation API in BASE 3.8.0 in the INCA import code.)
- Javascript file import-inca.js in resources/personal/ updated:
a. Function initPage() updated to not hide and disable the button for downloading a report file, but instead call new function checkForReportFile().
b. New function checkForReportFile() added. It calls servlet IncaServlet with new command "CheckForIncaImportReportFile" in a "Get" request with callback function reportFileDownloadButtonDisplay(response).
c. New function reportFileDownloadButtonDisplay(response) added. It retrieves a boolean flag indicating an existing report file from the servlet response, and an optional path to the report file. If a report file exists, the path for the file is stored in hidden input field reportFilePath and the button for downloading the report file is shown, otherwise the button is disabled and hidden.
d. Function initializeStep2(response) updated to call "Wizard.setCurrentStep(2)" at start of the function.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated with new command "CheckForIncaImportReportFile". It calls new private convenience method String fetchReportFilePath() to obtain the path for an optional report file, and returns a JSON object with the path for key "reportFilePath", and a boolean flag indicating if a report file exists for key "incaReportFileExists".
b. New private convenience method String fetchReportFilePath() added. It returns the path for an optional report file.
c. The name to use for the report file is now stored in private String INCA_IMPORT_REPORT_FILENAME.
d. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca". A time stamp is obtained at the start of the method, and is stored in JSONObject jsonIncaFilePropDetails for key "incaImportStart".
e. Private method String createIncaImportReportFile(JSONArray jsonIncaFilePropDetailsArr, List<String> missingIncaHeadersList, String message) updated to call new private convenience method String fetchReportFilePath() to obtain the path for the report file, and to retrieve the value for the import start time stamp from JSONObject jsonIncaFilePropDetails. The start and end times of the import/test are now written at the beginning of the report.
f. References to the report file in the code have been updated to avoid referring to it as a temporary file, since the file is not removed when the import/test is finished.
comment:37 by , 9 years ago
(In [3857]) Refs #525. Bug fix: INCA import updated in javascript, to make it compatible with changes in parameter names in servlet IncaServlet:
- Javascript file import-inca.js in resources/personal/ updated in function downloadReportFile() to use parameter name "reportFilePath" instead of "tmpFilePath" for the report file path, when calling servlet IncaServlet with command "DownloadIncaImportReportFile".
comment:38 by , 9 years ago
Design update:
- INCA import should be updated in the commit step to use the AnnotationBatcher API introduced in BASE 3.8.0 in BASE Ticket #2000 (Batch API for annotation handling). This will decrease use of heap memory and improve commit speed.
- The new AnnotationBatcher cannot be used in the same session as standard database requests for the same item. Since values of a number of case annotations like laterality etc. are needed in the test part preceding the import part, a new DbControl item has to be created for use in the import part.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. A list of database ID values for INCA annotation types is stored after the database query.
b. Before the import step, dc.commit() is called for the DbControl item used in the input file test steps, after which a new DbControl item is created for use with the AnnotationBatcher batcher item. INCA annotation type lists and hash maps that are to be used in the import step are re-created using annotation type items created from the stored ID list using the new DbControl item. The INCA annotation types plus the INCA export and import date annotation types are added to the batcher.
c. For each case mapped to the INCA import, batcher is set to use the current case item, then loaded with the INCA annotations to be updated (using the setValue() method), after which the INCA export and import date annotations are loaded.
d. A single command dc.commit() is then called.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated by removing private methods Object fetchAnnotationValue(AnnotationType at, AnnotationSet as) and Object fetchAnnotationValue(DbControl dc, AnnotationType at, AnnotationSet as, HashMap<Integer,AnnotationTypeFilter> atIdSnapshotFilterHM, SnapshotManager manager, Annotatable item), as they are no longer used.
comment:39 by , 9 years ago
(In [3858]) Refs #525. INCA import updated in the commit step to use the AnnotationBatcher API introduced in BASE 3.8.0 in BASE Ticket #2000 (Batch API for annotation handling). This will decrease use of heap memory and improve commit speed. The new AnnotationBatcher cannot be used in the same session as standard database requests for the same item. Since values of a number of case annotations like laterality etc. are needed in the test part preceding the import part, a new DbControl item has to be created for use in the import part:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. A list of database ID values for INCA annotation types is stored after the database query.
b. Before the import step, dc.commit() is called for the DbControl item used in the input file test steps, after which a new DbControl item is created for use with the AnnotationBatcher batcher item. INCA annotation type lists and hash maps that are to be used in the import step are re-created using annotation type items created from the stored ID list using the new DbControl item. The INCA annotation types plus the INCA export and import date annotation types are added to the batcher.
c. For each case mapped to the INCA import, batcher is set to use the current case item, then loaded with the INCA annotations to be updated (using the setValue() method), after which the INCA export and import date annotations are loaded.
d. A single command dc.commit() is then called.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated by removing private methods Object fetchAnnotationValue(AnnotationType at, AnnotationSet as) and Object fetchAnnotationValue(DbControl dc, AnnotationType at, AnnotationSet as, HashMap<Integer,AnnotationTypeFilter> atIdSnapshotFilterHM, SnapshotManager manager, Annotatable item), as they are no longer used.
comment:40 by , 9 years ago
Milestone: | Reggie v4.x → Reggie v4.4 |
---|
comment:41 by , 9 years ago
comment:42 by , 9 years ago
(In [3867]) Refs #525. INCA import updated in report file management. The report file path is only needed on the servlet side, so all references to it in JSP/Javascript are removed:
- JSP file import-inca.jsp in resources/personal/ updated by removing unused hidden input field.
- Javascript file import-inca.js in resources/personal/ updated by removing unused references to report file path.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated for command "CheckForIncaImportReportFile" to not return report file path.
b. Protected method void doGet(HttpServletRequest req, HttpServletResponse resp) updated for command "DownloadIncaImportReportFile" to call private method String fetchReportFilePath() to obtain report file path.
c. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" to not return report file path.
comment:43 by , 9 years ago
Design update:
- Use of snapshot manager when getting annotation values resulted in significantly different import times, depending on whether the snapshot cache had to be updated or not. The snapshot manager is therefore no longer used.
- INCA importer should check that each potential import value matches the value type of the annotation type it is to be imported to. If not, data for the case corresponding to the line with the offending value should be skipped in all import files.
- Javascript file import-inca.js in resources/personal/ updated in function initializeStep2(response) to include a line with the number of data lines with bad values in the database consistency check tables for import files.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. Snapshot manager removed.
b. List added for storing case ID values for lines with bad values.
c. Loop over import files updated, if full check is to be performed, by storing info on bad values, and storing case ID values for lines with bad values.
d. Import step updated when collecting data for each case ID, to skip cases corresponding to lines with bad data in any of the import files.
e. Log output during case mapping step reduced to every 1000 cases, instead of every 100 cases.
f. Private method String createIncaImportReportFile(JSONArray jsonIncaFilePropDetailsArr, List<String> missingIncaHeadersList, String message) updated to include the number of data lines with bad values found for each import file, and details about any found bad value.
comment:44 by , 9 years ago
(In [3875]) Refs #525. INCA import updated:
a. Use of snapshot manager when getting annotation values resulted in significantly different import times, depending on whether the snapshot cache had to be updated or not. The snapshot manager is therefore no longer used.
b. INCA importer now checks that each potential import value matches the value type of the annotation type it is to be imported to. If not, data for the case corresponding to the line with the offending value is skipped in all import files.
- Javascript file import-inca.js in resources/personal/ updated in function initializeStep2(response) to include a line with the number of data lines with bad values in the database consistency check tables for import files.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. Snapshot manager removed.
b. List added for storing case ID values for lines with bad values.
c. Loop over import files updated, if full check is to be performed, by storing info on bad values, and storing case ID values for lines with bad values.
d. Import step updated when collecting data for each case ID, to skip cases corresponding to lines with bad data in any of the import files.
e. Log output during case mapping step reduced to every 1000 cases, instead of every 100 cases.
f. Private method String createIncaImportReportFile(JSONArray jsonIncaFilePropDetailsArr, List<String> missingIncaHeadersList, String message) updated to include the number of data lines with bad values found for each import file, and details about any found bad value.
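Illustration of the bad value check described above. This is a minimal, self-contained sketch and not the code in IncaServlet.java: the enum ValueKind, the class BadValueCheck, and both method names are hypothetical stand-ins, and treating empty values as acceptable is an assumption. The real importer works with BASE annotation type items.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the value type of an annotation type.
enum ValueKind { INTEGER, FLOAT, STRING }

class BadValueCheck
{
    // Returns true if the raw column value can be parsed as the given kind.
    // Empty values are treated as "no value" and accepted (an assumption).
    static boolean isValid(String raw, ValueKind kind)
    {
        if (raw == null || raw.trim().isEmpty()) return true;
        try
        {
            switch (kind)
            {
                case INTEGER: Integer.parseInt(raw.trim()); return true;
                case FLOAT: Float.parseFloat(raw.trim()); return true;
                default: return true;
            }
        }
        catch (NumberFormatException e)
        {
            return false;
        }
    }

    // Collects the case IDs of all lines that contain at least one bad value.
    // caseIds.get(i) is the case ID mapped to rows.get(i).
    static Set<Integer> findBadCaseIds(List<Integer> caseIds, List<String[]> rows, ValueKind[] kinds)
    {
        Set<Integer> bad = new LinkedHashSet<>();
        for (int i = 0; i < rows.size(); i++)
        {
            String[] row = rows.get(i);
            for (int c = 0; c < kinds.length && c < row.length; c++)
            {
                if (!isValid(row[c], kinds[c]))
                {
                    bad.add(caseIds.get(i));
                    break;
                }
            }
        }
        return bad;
    }
}
```

The returned set plays the role of the list of case IDs to exclude: during the import step, any case whose ID is in the set would be skipped.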
comment:45 by , 9 years ago
(In [3882]) Refs #525. INCA importer updated in the bad value check for INCA annotation types of integer value type, to check if an enumeration is specified, and if so, check if the supplied value is included in the enumeration. If not, data for the case corresponding to the line with the offending value is skipped in all import files, and a note on the offending value is included in the report file.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" to check if an INCA annotation type of integer value type specifies an enumeration, and if so, check if the supplied value is included in the enumeration. If not, data for the case corresponding to the line with the offending value is skipped in all import files. The offending value together with the enumeration list is added to the data stored for use by the report file.
b. Private method String createIncaImportReportFile(JSONArray jsonIncaFilePropDetailsArr, List<String> missingIncaHeadersList, String message) updated to include the optional enumeration list in the data reported for found bad values.
comment:46 by , 9 years ago
(In [3886]) Refs #525. INCA importer updated in the bad value check for INCA annotation types, to not specify an optional enumeration in the report file if the offending value is of the wrong value type, or if no enumeration exists (previously value null was reported in these cases):
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" in the bad value check, to set enumeration to null if the offending value is of the wrong value type.
b. Private method String createIncaImportReportFile(JSONArray jsonIncaFilePropDetailsArr, List<String> missingIncaHeadersList, String message) updated to only include an optional enumeration list in the data reported for a found bad value, if the enumeration list differs from null.
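Illustration of the enumeration check from [3882] together with the report refinement above. A minimal sketch under stated assumptions: the class EnumCheck, the method check(), and the wording of the report notes are all hypothetical, and a null list stands for "no enumeration specified".

```java
import java.util.List;

class EnumCheck
{
    // Returns null if the value is acceptable, otherwise a note for the report.
    // 'allowed' is null when the annotation type specifies no enumeration.
    static String check(String raw, List<Integer> allowed)
    {
        int value;
        try
        {
            value = Integer.parseInt(raw.trim());
        }
        catch (NumberFormatException e)
        {
            // Wrong value type: the enumeration is not mentioned in the note.
            return "Bad value '" + raw + "': not an integer";
        }
        if (allowed != null && !allowed.contains(value))
        {
            // Right type but not an allowed value: list the enumeration.
            return "Bad value '" + raw + "': not in enumeration " + allowed;
        }
        return null;
    }
}
```

For example, check("7", Arrays.asList(1, 2, 3)) yields a note that lists the enumeration, whereas check("x", Arrays.asList(1, 2, 3)) yields a note without it.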
comment:47 by , 9 years ago
Design discussion:
- It has been decided that future INCA input files should be converted to a tab-separated *.csv file directly at INCA, using a specially designed program, taking the "raw" INCA output and the SCAN-B request file (referred to as "INCA export") as input. See wiki page INCA XML to CSV converter and Ticket #881 (Implement INCA XML to CSV converter) for more information.
- The new INCA export procedure will affect the INCA importer, as several properties of the INCA import file will change:
- The number of columns will change (normally it will increase).
- The headers of some columns that existed in previous INCA files have changed. Specifically, some columns used to map lines in the INCA file to SCAN-B case items are affected.
- Internal newline and tab characters are now handled by the conversion program, and the INCA importer has to be adapted to this in order for the imported values to be consistent with other BASE exporters. The conversion program will encode newline, tab, and backslash characters as \n, \t, and \\, respectively. The INCA importer should decode these sequences back to the original characters when an INCA annotation is updated.
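Illustration of the decoding step described above. This is a hand-written sketch for illustration only; the class EscapeDecoder and its method decode() are hypothetical, and leaving unknown escape sequences unchanged is an assumption.

```java
class EscapeDecoder
{
    // Decodes the two-character sequences \n, \t and \\ back to
    // newline, tab and backslash. Other sequences are left unchanged.
    static String decode(String s)
    {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++)
        {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length())
            {
                char next = s.charAt(i + 1);
                if (next == 'n') { sb.append('\n'); i++; continue; }
                if (next == 't') { sb.append('\t'); i++; continue; }
                if (next == '\\') { sb.append('\\'); i++; continue; }
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

A single left-to-right pass is used on purpose: decoding with three successive string replacements could misread an encoded backslash followed by the letter n as an encoded newline.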
Changed column headers used for case mapping:
Column contents | Old header | New header |
---|---|---|
Temporary patient ID | PATID | PAT_ID |
Personal number | PersonalNo | PERSNR |
The new INCA file contains 255 columns (excluding the PAT_ID and PERSNR mapping columns) without a corresponding INCA annotation type.
Missing INCA columns in new file:
Header |
---|
U070EndoAnn |
U070EndoArom |
U070EndoBehPg |
U070EndoEjAkt |
U070EndoSlutDat |
U070EndoTam |
comment:48 by , 9 years ago
(In [3888]) Refs #525. INCA importer updated to accept alternative headers "PAT_ID" and "PERSNR" for temporary patient ID and personal number, respectively:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Alternative header strings "PAT_ID" and "PERSNR" added to list of headers for unimported columns.
b. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" when searching for mapping columns, to test alternative header columns "PAT_ID" and "PERSNR", if no temporary patient ID column was found for the original header.
comment:49 by , 9 years ago
(In [3889]) Refs #525. INCA importer updated to use an object of the BASE TabCrLfEncodeDecoder class to decode escaped characters "\n", "\t", and "\\" back to the original special characters before updating an INCA annotation:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Unused imports removed.
b. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" in import section to use an object of the BASE TabCrLfEncodeDecoder class to decode escaped characters "\n", "\t", and "\\" back to the original special characters before updating an INCA annotation.
comment:50 by , 9 years ago
(In [3895]) Refs #525. INCA importer updated to accept alternative headers for temporary patient ID and personal number independently of each other. "PATID" is tested first for temporary patient ID and "PersonalNo" for personal number; if a column is found for an INCA variable, the other alternative is not tested. An optional hyphen in the personal number is now removed before mapping to case items, since the personal numbers in the SCAN-B database do not contain hyphens.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca":
a. Alternative headers for temporary patient ID and personal number are now tested independently of each other. "PATID" is tested first for temporary patient ID and "PersonalNo" for personal number; if a column is found for an INCA variable, the other alternative is not tested.
b. An optional hyphen in the personal number is now removed before mapping to case items.
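Illustration of the independent header lookup and the hyphen removal described above. A minimal sketch, not the code in IncaServlet.java: the class KeyColumns and both method names are hypothetical, and the personal number in the usage note is a made-up example.

```java
import java.util.List;

class KeyColumns
{
    // Returns the index of the first header alternative found in the header
    // list, or -1 if none is present. Alternatives are tested in order, so
    // later alternatives are not tested once a column has been found.
    static int findColumn(List<String> headers, String... alternatives)
    {
        for (String alt : alternatives)
        {
            int idx = headers.indexOf(alt);
            if (idx >= 0) return idx;
        }
        return -1;
    }

    // Removes hyphens from a personal number before it is used for mapping.
    static String normalizePersonalNo(String pno)
    {
        return pno == null ? null : pno.trim().replace("-", "");
    }
}
```

With a header line containing "PAT_ID" and "PersonalNo", findColumn(headers, "PATID", "PAT_ID") and findColumn(headers, "PersonalNo", "PERSNR") each find their column, since the two lookups do not depend on each other. normalizePersonalNo("19121212-1212") gives "191212121212".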
comment:51 by , 9 years ago
(In [3900]) Refs #525. INCA importer updated by refactoring file upload:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by obtaining list of unprocessed INCA files by calling new private method List<UnprocessedIncaFile> fetchUnprocessedIncaFiles(HttpServletRequest req). Number of lines with line feeds, too many columns, and too few columns, respectively, in each file are now obtained from the corresponding UnprocessedIncaFile object, instead of from local lists.
b. New private method List<UnprocessedIncaFile> fetchUnprocessedIncaFiles(HttpServletRequest req) added. It uploads files posted in an HttpServletRequest and returns a list of UnprocessedIncaFile objects (one per uploaded file).
c. Inner private class UnprocessedIncaFile updated with integer attributes for number of lines with line feeds, too many columns, and too few columns, respectively, together with public accessor methods.
comment:52 by , 9 years ago
(In [3901]) Refs #525. INCA importer updated by refactoring mapping of personal numbers to biosource id:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by obtaining hash map of personal number to biosource id by calling new private method HashMap<String,Integer> fetchPersonalNumberBioSourceIdHashMap(DbControl dc).
b. New private method HashMap<String,Integer> fetchPersonalNumberBioSourceIdHashMap(DbControl dc) added. It returns a hash map mapping personal number to biosource id.
comment:53 by , 9 years ago
(In [3902]) Refs #525. INCA importer updated by removing unused variable List<String> excludePnoList:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca" by removing variable List<String> excludePnoList, which is not used (variable List<Integer> excludeLineList is used instead).
comment:54 by , 9 years ago
(In [3903]) Refs #525. INCA importer updated by refactoring finding key column indexes:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by obtaining key column indexes by calling new private method JSONObject fetchKeyColumnIndexes(List<String> headerList, JSONObject jsonIncaFileProp).
b. New private method JSONObject fetchKeyColumnIndexes(List<String> headerList, JSONObject jsonIncaFileProp) added. It updates an input INCA file property JSONObject with information on key column indexes, obtained from a list of column headers.
comment:55 by , 9 years ago
(In [3908]) Refs #525. INCA importer updated by refactoring collection of potential INCA import lines for an unprocessed INCA file:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by obtaining list of potential INCA import lines for an unprocessed INCA file by calling new private method List<PotentialIncaImportLine> fetchPotentialIncaImportLines(int tempPatIdClmIndex, int personalNoClmIndex, int lateralityDescriptionClmIndex, List<String> lines). Using a single list of PotentialIncaImportLine objects should also be safer than relying on a number of synchronized lists.
b. New private method List<PotentialIncaImportLine> fetchPotentialIncaImportLines(int tempPatIdClmIndex, int personalNoClmIndex, int lateralityDescriptionClmIndex, List<String> lines) added. It returns a list of PotentialIncaImportLine objects for lines with personal number.
c. New inner private class PotentialIncaImportLine added. It is a data access object class with string attributes for personal number, laterality, temporary patient ID, and data line, respectively, together with public accessor methods.
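Illustration of collecting lines into a single list of objects instead of several synchronized lists. A simplified sketch under stated assumptions: the classes ImportLine and ImportLineCollector are hypothetical stand-ins for PotentialIncaImportLine and fetchPotentialIncaImportLines(...), whose actual code is not shown in this ticket, and a negative column index is taken to mean "column not found".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the data access object described above.
class ImportLine
{
    private final String personalNo;
    private final String laterality;
    private final String tempPatId;
    private final String dataLine;

    ImportLine(String personalNo, String laterality, String tempPatId, String dataLine)
    {
        this.personalNo = personalNo;
        this.laterality = laterality;
        this.tempPatId = tempPatId;
        this.dataLine = dataLine;
    }

    public String getPersonalNo() { return personalNo; }
    public String getLaterality() { return laterality; }
    public String getTempPatId() { return tempPatId; }
    public String getDataLine() { return dataLine; }
}

class ImportLineCollector
{
    // Returns one ImportLine per tab-separated data line that has a personal number.
    static List<ImportLine> collect(int patIdx, int pnoIdx, int latIdx, List<String> lines)
    {
        List<ImportLine> result = new ArrayList<>();
        for (String line : lines)
        {
            String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
            String pno = column(cols, pnoIdx);
            if (pno.isEmpty()) continue; // lines without personal number are skipped
            result.add(new ImportLine(pno, column(cols, latIdx), column(cols, patIdx), line));
        }
        return result;
    }

    // Returns the trimmed column value, or an empty string if the index is out of range.
    static String column(String[] cols, int idx)
    {
        return idx >= 0 && idx < cols.length ? cols[idx].trim() : "";
    }
}
```

Since all values for a line travel together in one object, removing or reordering entries cannot leave the personal number, laterality, and data line lists out of step with each other.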
comment:56 by , 9 years ago
(In [3909]) Refs #525. INCA importer updated by removing unused or redundant variables numPatientIdWithMoreThanTwoLines and numPatientIdWithManyIdenticalLateralityLines:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by removing unused or redundant variables numPatientIdWithMoreThanTwoLines and numPatientIdWithManyIdenticalLateralityLines.
b. Private method String createIncaImportReportFile(JSONArray jsonIncaFilePropDetailsArr, List<String> missingIncaHeadersList, String message) updated by getting number of personal numbers with identical laterality lines from correct JSON key "numPersonalNoWithManyIdenticalLateralityLines", instead of previously used "numPatientIdWithManyIdenticalLateralityLines" (the numbers were equal).
comment:57 by , 9 years ago
(In [3910]) Refs #525. INCA importer updated by refactoring internal laterality check of potential INCA import lines for an unprocessed INCA file:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by performing internal laterality check of potential INCA import lines for an unprocessed INCA file by calling new private method InternalLateralityCheckResult internalLateralityCheck(List<PotentialIncaImportLine> potentialIncaImportLines).
b. New private method InternalLateralityCheckResult internalLateralityCheck(List<PotentialIncaImportLine> potentialIncaImportLines) added. It performs an internal laterality check on a list of potential INCA import lines, and returns an InternalLateralityCheckResult object with results of the check.
c. New inner private class InternalLateralityCheckResult added. It is a data access object class with attributes for JSONArray jsonPatientIdWithMoreThanTwoLines, JSONArray jsonPatientIdWithManyIdenticalLateralityLines, and List<Integer> excludeLineList, respectively, together with public accessor methods.
comment:58 by , 9 years ago
(In [3915]) Refs #525. INCA importer updated by refactoring database mapping and data value check of potential INCA import lines for an unprocessed INCA file:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by performing database mapping and data value check of potential INCA import lines for an unprocessed INCA file by calling new private method LineDatabaseMappingResult lineDatabaseMapping(DbControl dc, List<PotentialIncaImportLine> potentialIncaImportLines, ...).
b. New private method LineDatabaseMappingResult lineDatabaseMapping(DbControl dc, List<PotentialIncaImportLine> potentialIncaImportLines, HashMap<String,Integer> pnoBioSourceIdHM, HashMap<String,String> incaLateralityHM, List<Integer> excludeLineList, List<Integer> importHeaderIndexList, List<String> headerList, HashMap<String,AnnotationType> incaAnnoNameAnnoTypeHM, int fileNo) added. It performs a database mapping and data value check on a list of potential INCA import lines, and returns a LineDatabaseMappingResult object with results of the check.
c. New inner private class LineDatabaseMappingResult added. It is a data access object class with attributes for HashMap<Integer,Integer> rawLineNumberCaseIdHM, List<Integer> excludeLineList, List<Integer> excludeCaseIdList, int numPersonalNoWithoutDatabaseReference, int numPatientLateralitiesWithoutDatabaseReference, JSONArray jsonPatientIdForPersonalNoWithoutDatabaseReference, JSONArray jsonPatientLateralitiesWithoutDatabaseReference, and JSONArray jsonBadValueLines, together with public accessor methods.
comment:59 by , 9 years ago
(In [3917]) Refs #525. INCA importer updated by using a sample query to obtain sample item[s] for a patient item, in order to speed up the mapping of cases to patients (it now takes ~70-80% of the original time):
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Private method LineDatabaseMappingResult lineDatabaseMapping(DbControl dc, List<PotentialIncaImportLine> potentialIncaImportLines, ...) updated to use a sample query to obtain sample item[s] for a patient item, instead of using Case.findByPatient(dc, patient).
comment:60 by , 9 years ago
Design update:
- Reggie API has been updated with support for a progress reporter in Ticket #883 (Add support for progress reporting to the Reggie wizard API). The INCA importer should use this, as many operations take a long time to finish.
- JSP file import-inca.jsp in resources/personal/ updated by adding a <div id="wizard-progess"></div> div tag below the <div id="wizard-status"></div> tag.
- Javascript file import-inca.js in resources/personal/ updated in calls of Wizard.showLoadingAnimation(...) by adding a second argument 'inca-import-progress' with the name of the progress reporter.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by creating a SimpleProgressReporter item progress and storing it with the chosen name in the current session control. Calls to progress.display(...) are made at regular intervals. In order to make the progress percent values as representative as possible, the fraction of time spent on mapping and value checking was estimated from test runs, and is adjusted depending on whether just a data check is performed, or if it is to be followed by an import.
b. Private method LineDatabaseMappingResult lineDatabaseMapping(DbControl dc, List<PotentialIncaImportLine> potentialIncaImportLines, ...) updated with new arguments int numFiles, SimpleProgressReporter progress, float progressTestFraction, and int progressOffset, which are used to calculate and report progress percentage values.
comment:61 by , 9 years ago
(In [3920]) Refs #525. INCA importer updated to use progress reporter:
- JSP file import-inca.jsp in resources/personal/ updated by adding a <div id="wizard-progess"></div> div tag below the <div id="wizard-status"></div> tag.
- Javascript file import-inca.js in resources/personal/ updated in calls of Wizard.showLoadingAnimation(...) by adding a second argument 'inca-import-progress' with the name of the progress reporter.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by creating a SimpleProgressReporter item progress and storing it with the chosen name in the current session control. Calls to progress.display(...) are made at regular intervals. In order to make the progress percent values as representative as possible, the fraction of time spent on mapping and value checking was estimated from test runs, and is adjusted depending on whether just a data check is performed, or if it is to be followed by an import.
b. Private method LineDatabaseMappingResult lineDatabaseMapping(DbControl dc, List<PotentialIncaImportLine> potentialIncaImportLines, ...) updated with new arguments int numFiles, SimpleProgressReporter progress, float progressTestFraction, and int progressOffset, which are used to calculate and report progress percentage values.
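Illustration of how a phase fraction and an offset can be combined into an overall progress percentage. The formula is an assumption made for illustration and is not taken from IncaServlet.java; the class ProgressMath and its method percent() are hypothetical, and the example numbers are made up.

```java
class ProgressMath
{
    // offset:   percent already completed by earlier phases
    // fraction: estimated share (0..1) of the total time spent in this phase
    // done:     number of items processed so far in this phase
    // total:    number of items to process in this phase
    static int percent(int offset, float fraction, int done, int total)
    {
        if (total <= 0) return offset;
        float phasePercent = 100f * fraction * done / total;
        return Math.min(100, offset + Math.round(phasePercent));
    }
}
```

With an offset of 10 and an estimated fraction of 0.6, processing 500 of 1000 lines gives percent(10, 0.6f, 500, 1000) = 40, and the phase ends at 70. Choosing a smaller fraction when an import follows the data check leaves room for the import phase in the remaining percent range.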
comment:62 by , 9 years ago
(In [3921]) Refs #525. INCA importer updated in progress reporter to report progress during patient mapping, and to make progress reporting more continuous:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" for progress reporter to report progress during patient mapping, and to make progress reporting more continuous. Variable float progressBiosourceMappingFraction is used to store an estimate of the fraction of time spent mapping database patient items to personal numbers.
b. Private method HashMap<String,Integer> fetchPersonalNumberBioSourceIdHashMap(DbControl dc) updated with new arguments SimpleProgressReporter progress, float progressBiosourceMappingFraction, and int progressOffset, which are used to calculate and report progress percentage values.
comment:63 by , 9 years ago
(In [3922]) Refs #525. INCA importer updated in progress reporter to report progress at start and end of actual import phase:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca" for progress reporter to report progress at start and end of actual import phase.
comment:64 by , 9 years ago
Design discussion:
- Change set [3840] (2016-04-13) updated the INCA importer to handle internal line feed and tab characters, provided that the *.csv file was saved with "Text delimiter" set to a double quote character '"' instead of an empty string (blank). This functionality was kept in the code when the importer was adapted in change set [3889] etc. (2016-04-28) for use with a single INCA input file, obtained by feeding a special program with a raw INCA XML file and a SCAN-B *.csv request file. The idea was to have an INCA importer that could be used with both kinds of files, since replacing tabs with spaces inside sections within double quotes should not change anything if internal tabs had already been replaced by "\t" by the special conversion program.
Unfortunately, inspection of new INCA files has revealed the (rather non-standard) custom of prefixing an entry value in a comment with a single double quote '"'. This leads to all tabs separating columns being replaced by spaces until the next double quote is found, resulting in chaos when lines are concatenated to obtain the number of columns in the header line. This custom would not have worked with the original *.csv file creation procedure either, unless original double quotes were escaped. As new INCA import files will be created using the special program at INCA, the functionality added in change set [3840] will be removed.
Design update:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" when reading an input INCA *.csv file to no longer replace tabs with spaces in sections enclosed by double quotes.
b. Private method TrimmedLineItem tabDoubleQuoteTrim(TrimmedLineItem trimmedLineItem) removed, since it is no longer needed.
c. Inner private helper class TrimmedLineItem of data access object type removed, since it is no longer needed.
comment:65 by , 9 years ago
(In [3923]) Refs #525. INCA importer updated to no longer replace tabs with spaces in sections enclosed by double quotes, since it is not needed using the new special program to produce INCA *.csv input files, and can cause problems, if an odd number of double quotes are used inside a column entry:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" when reading an input INCA *.csv file to no longer replace tabs with spaces in sections enclosed by double quotes.
b. Private method TrimmedLineItem tabDoubleQuoteTrim(TrimmedLineItem trimmedLineItem) removed, since it is no longer needed.
c. Inner private helper class TrimmedLineItem of data access object type removed, since it is no longer needed.
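Illustration of why an odd number of double quotes breaks the tab replacement. This is a reconstruction for demonstration purposes, not the removed method tabDoubleQuoteTrim(...); the class QuoteTabDemo and its methods are hypothetical.

```java
class QuoteTabDemo
{
    // Replaces tabs with spaces inside sections enclosed by double quotes.
    // An unmatched double quote leaves the "inside quotes" state switched on
    // for the rest of the line.
    static String replaceTabsInQuotes(String line)
    {
        StringBuilder sb = new StringBuilder(line.length());
        boolean inQuotes = false;
        for (char c : line.toCharArray())
        {
            if (c == '"') inQuotes = !inQuotes;
            sb.append(inQuotes && c == '\t' ? ' ' : c);
        }
        return sb.toString();
    }

    // Counts tab-separated columns, keeping trailing empty columns.
    static int columnCount(String line)
    {
        return line.split("\t", -1).length;
    }
}
```

For a line with four columns where the second value starts with a single double quote and has no closing quote, the two column separators after the quote are turned into spaces, so the processed line appears to have only two columns. With balanced quotes, only the tab inside the quoted value is replaced and the column count is as intended.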
comment:66 by , 9 years ago
(In [3924]) Refs #525. INCA importer updated in JSP file to only allow a single *.csv file to be selected for import:
- JSP file import-inca.jsp in resources/personal/ updated by no longer allowing multiple files to be selected in the "importfile" file selection field. Various text strings modified to be consistent with selection of a single import file.
comment:67 by , 9 years ago
(In [3925]) Refs #525. INCA importer updated to only use a single *.csv file for import. Unused functionality is removed, in order to simplify the code and user interface:
- Javascript file import-inca.js in resources/personal/ updated:
a. Function initializeStep2(response) updated to obtain a JSONObject fileProp from JSON key "incaFileProperties", instead of a JSONArray filePropArray from JSON key "incaFilePropertiesArray".
b. Function createTableHeader() updated to have no argument.
c. Functions fetchTableRowStatus(...), createTableRow(...), and createTableRow2(...) updated to take a JSONObject as first argument instead of a JSONArray.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by using the first (= the only) input file for import. Data are now stored for later use in JSONObjects instead of JSONArrays.
b. Private method LineDatabaseMappingResult lineDatabaseMapping(DbControl dc, List<PotentialIncaImportLine> potentialIncaImportLines, ...) updated by removing arguments int fileNo and int numFiles, since they are no longer needed for the progress report calculation.
c. Private method String createIncaImportReportFile(JSONArray jsonIncaFilePropDetailsArr, List<String> missingIncaHeadersList, String message) updated by exchanging first argument JSONArray jsonIncaFilePropDetailsArr for JSONObject jsonIncaFilePropDetails.
comment:68 by , 9 years ago
(In [3926]) Refs #525. INCA importer updated by removing or commenting out debug output to server log file, unless the information concerns a caught exception or other error, not reported otherwise:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated by removing or commenting out debug output to server log file, unless the information concerns a caught exception or other error, not reported otherwise.
comment:69 by , 9 years ago
(In [3927]) Refs #525. INCA importer updated in method creating INCA import report file:
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated:
a. Protected method void doPost(HttpServletRequest req, HttpServletResponse resp) updated for command "ImportInca" by not expecting private method createIncaImportReportFile(JSONObject jsonIncaFilePropDetails, List<String> missingIncaHeadersList, String message) to return a string with the report file path (the information was never used).
b. Private method createIncaImportReportFile(JSONObject jsonIncaFilePropDetails, List<String> missingIncaHeadersList, String message) updated to no longer return a string with the report file path, but instead have type void. The file contents are also updated to refer only to a single INCA file.
comment:70 by , 9 years ago
(In [3928]) Refs #525. INCA importer updated in preparation for release of first version:
- JSP file index.jsp in resources/ updated by removing entries "experimental not-implemented" from class description for "inca-import" <span> tag.
- Java servlet class/file IncaServlet.java in src/net/sf/basedb/reggie/servlet/ updated in protected method void doPost(HttpServletRequest req, HttpServletResponse resp) for command "ImportInca" by moving code for creating a DbControl item and checking role permissions to the top, in order to increase similarity with other servlets.
comment:71 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Ticket closed as first version of INCA importer has been implemented.
Milestone renamed