Opened 6 years ago

Closed 6 years ago

#1104 closed enhancement (fixed)

Re-factor the INCA import

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.21
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

We want to add more information to the output files generated by the INCA importer. For example, for cases that have a laterality mismatch we would like the SCAN-B id to be listed (right now only the internal INCA id is listed), so that the mismatch can be corrected manually.

In the current implementation it is hard to collect more data in a structured way since the importer uses multiple maps and arrays and data is copied between them in a way that is not easy to understand.

A re-factoring is needed, with a simpler and more static model where we can add information as needed without copying or duplicating it between different data structures.
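A minimal sketch of what such a model might look like, assuming one object per data line of the INCA file that every processing step annotates in place. The names `IncaLine`, `IncaFile`, `Flag`, `scanBId` and the flag values other than `MISSING_PERSONAL_NO` are hypothetical and not taken from the Reggie source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flag values; only MISSING_PERSONAL_NO is named in this ticket.
enum Flag { OK, MISSING_PERSONAL_NO, LATERALITY_MISMATCH }

// One object per data line. Later steps add information here
// instead of copying values into separate maps and arrays.
class IncaLine {
    final int lineNo;                 // 1-based position in the input file
    final String incaId;              // internal INCA id
    final Map<String, String> values = new LinkedHashMap<>(); // column name -> raw value
    String scanBId;                   // set once the line is matched to a case (assumption)
    Flag flag = Flag.OK;

    IncaLine(int lineNo, String incaId) {
        this.lineNo = lineNo;
        this.incaId = incaId;
    }
}

// Holds all lines in input order; statistics are derived from the lines
// themselves rather than kept in a parallel structure.
class IncaFile {
    private final List<IncaLine> lines = new ArrayList<>();

    IncaLine addLine(String incaId) {
        IncaLine line = new IncaLine(lines.size() + 1, incaId);
        lines.add(line);
        return line;
    }

    List<IncaLine> lines() {
        return lines;
    }

    long count(Flag flag) {
        return lines.stream().filter(l -> l.flag == flag).count();
    }
}
```

With a model like this, listing the SCAN-B id for a laterality mismatch would only require reading `scanBId` from the flagged line when writing the report.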

Change History (16)

comment:1 by Nicklas Nordborg, 6 years ago

(In [5256]) References #1104: Re-factor the INCA import

Updated the initial parsing of the INCA file data. No big changes yet, except for moving some existing data structures to new places to avoid some copy steps. Future changes are expected to change the data structures as well.

comment:2 by Nicklas Nordborg, 6 years ago

(In [5258]) References #1104: Re-factor the INCA import

Updated parsing of pre-defined columns.

comment:3 by Nicklas Nordborg, 6 years ago

(In [5260]) References #1104: Re-factor the INCA import

Mapping columns to annotation types.

comment:4 by Nicklas Nordborg, 6 years ago

(In [5261]) References #1104: Re-factor the INCA import

Initial parsing of data lines, including a first check of whether a personal number is present.

comment:5 by Nicklas Nordborg, 6 years ago

(In [5262]) References #1104: Re-factor the INCA import

Replaced duplicate laterality checking with new implementation.

comment:6 by Nicklas Nordborg, 6 years ago

(In [5263]) References #1104: Re-factor the INCA import

Started a new implementation for collecting statistics (for JSON) in a single place. The report in the GUI is also re-designed.

comment:7 by Nicklas Nordborg, 6 years ago

(In [5267]) References #1104: Re-factor the INCA import

Started to re-factor data value check and patient/case mapping.

comment:8 by Nicklas Nordborg, 6 years ago

(In [5268]) References #1104: Re-factor the INCA import

Started with new implementation of report file generation.

comment:9 by Nicklas Nordborg, 6 years ago

(In [5269]) References #1104: Re-factor the INCA import

Cleaning up and documenting new code. Removing lots of old code.

comment:10 by Nicklas Nordborg, 6 years ago

(In [5270]) References #1104: Re-factor the INCA import

Changes to the flow in the GUI to make the wizard behave more like other wizards.

comment:11 by Nicklas Nordborg, 6 years ago

(In [5273]) References #1104: Re-factor the INCA import

Updated patient/case mapping and progress reporting. More details in the report file.

comment:12 by Nicklas Nordborg, 6 years ago

(In [5274]) References #1104: Re-factor the INCA import

Improved reporting about mapped and unmapped INCA annotations.

comment:13 by Nicklas Nordborg, 6 years ago

(In [5275]) References #1104: Re-factor the INCA import

Re-implemented the method for creating the output CSV file. Due to earlier changes it now outputs data lines in the same order as they appear in the input file, with a one-to-one relationship between input and output lines. A 'Flag' column has been introduced to mark lines that have been excluded from the import. This means that the output file may now contain lines that were not present in earlier versions (e.g. all lines that have a flag value other than MISSING_PERSONAL_NO).
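The one-to-one output with a flag column could be sketched roughly as follows. This is only an illustration: the class name `FlaggedCsvWriter`, the position of the 'Flag' column, the comma separator and the use of an empty flag for imported lines are all assumptions, not details confirmed by the changeset.

```java
import java.util.List;

class FlaggedCsvWriter {
    // Quote a value if it contains the separator, a quote or a line break.
    static String escape(String value) {
        if (value == null) {
            return "";
        }
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Writes exactly one output line per input row, in input order.
    // flags.get(i) marks row i; an empty flag means the row was imported (assumption).
    static String write(List<String> header, List<List<String>> rows, List<String> flags) {
        if (rows.size() != flags.size()) {
            throw new IllegalArgumentException("Need exactly one flag per row");
        }
        StringBuilder out = new StringBuilder("Flag");
        for (String h : header) {
            out.append(',').append(escape(h));
        }
        out.append('\n');
        for (int i = 0; i < rows.size(); i++) {
            out.append(escape(flags.get(i)));
            for (String v : rows.get(i)) {
                out.append(',').append(escape(v));
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Because excluded lines are flagged rather than dropped, the output can be compared line by line against the input file.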

The report file now reports the PAT_ID value instead of personal number.

comment:14 by Nicklas Nordborg, 6 years ago

(In [5279]) References #1104: Re-factor the INCA import

Check for and report cases and blood items for data lines that can be matched against a patient but not a case. Typically there are two such use cases:

  • The INCA file says RIGHT and the database LEFT (=something has been incorrectly registered)
  • We have a patient with only a BLOOD item. This is expected and can manually be solved by registering a Case with NoSpecimen.
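The two situations above could be told apart with a check along these lines. This is a hedged sketch, not the importer's actual logic: `CaseMatcher`, `UnmatchedReason` and the input parameters are hypothetical names, and it assumes we already know which lateralities are registered for the patient and whether a blood item exists.

```java
import java.util.List;

enum Laterality { LEFT, RIGHT }

// Hypothetical reasons why a line matches a patient but no case.
enum UnmatchedReason { LATERALITY_MISMATCH, BLOOD_ONLY, UNKNOWN }

class CaseMatcher {
    // inFile: laterality given on the INCA data line
    // registeredCases: lateralities of the cases registered for the patient
    // hasBlood: true if the patient has a blood item
    static UnmatchedReason explain(Laterality inFile, List<Laterality> registeredCases, boolean hasBlood) {
        if (registeredCases.isEmpty()) {
            // No case at all: expected when only blood has been registered.
            return hasBlood ? UnmatchedReason.BLOOD_ONLY : UnmatchedReason.UNKNOWN;
        }
        if (!registeredCases.contains(inFile)) {
            // Cases exist, but none on the side stated in the INCA file.
            return UnmatchedReason.LATERALITY_MISMATCH;
        }
        return UnmatchedReason.UNKNOWN;
    }
}
```

Reporting the reason together with the existing case and blood items gives the person reviewing the report what is needed to decide on a manual fix.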

The report files are now stored in the static cache instead of in the userfiles directory. The current session id is used in the path so that the files can only be accessed by the current user.
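As an illustration of the session-scoped path, the location of a report file might be derived as below. The class `ReportPaths`, the `inca-reports` sub-directory and the allowed session id characters are assumptions made for this sketch; the actual layout of the static cache is not described in this ticket.

```java
import java.nio.file.Path;

class ReportPaths {
    // Builds <cacheRoot>/inca-reports/<sessionId>/<fileName> and rejects
    // input that would resolve to a location outside the session directory.
    static Path reportFile(Path cacheRoot, String sessionId, String fileName) {
        if (!sessionId.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("Invalid session id");
        }
        Path dir = cacheRoot.resolve("inca-reports").resolve(sessionId);
        Path file = dir.resolve(fileName).normalize();
        if (!file.startsWith(dir)) {
            throw new IllegalArgumentException("File name escapes the session directory");
        }
        return file;
    }
}
```

Since the session id is part of the path, a request from another session would resolve to a different directory.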

Added a help text explaining the difference between the simple and full file check.

The output CSV file is now created by the full file check (instead of waiting for the actual import).

comment:15 by Nicklas Nordborg, 6 years ago

(In [5280]) References #1104: Re-factor the INCA import

Cleaning up some unused code and some minor changes to the report file.

comment:16 by Nicklas Nordborg, 6 years ago

Resolution: fixed
Status: new → closed

The import wizard should work. The statistics wizard does not and has been disabled. It will be fixed in a later release. See #1108.
