Opened 8 years ago

Closed 8 years ago

#881 closed task (fixed)

Implement INCA XML to CSV converter

Reported by: Nicklas Nordborg Owned by:
Priority: blocker Milestone: INCA XML to CSV converter 1.0
Component: net.sf.basedb.inca Keywords:
Cc:

Description

We should implement a simple standalone Java program that converts an XML file exported from INCA to a tab-separated CSV file.

The XML file is expected to include data that we are not allowed to access and must be filtered.

The filter is defined by the INCA export file from Reggie. This file includes the personal numbers that we are allowed to access.

Entries in the XML file that have a matching entry in the SCANB file should be fully written to the CSV file. XML entries that don't match should still be written to the CSV file, except that certain fields should be "blanked" out. The fields to blank are defined by a "blacklist" file.
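The filtering rule described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name `RowFilter`, the method `apply`, and the representation of a row as a map from column name to value are hypothetical and are not taken from the converter source.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; names and row representation are assumptions.
class RowFilter {
    private final Set<String> allowedPersonalNumbers; // read from the SCANB file
    private final Set<String> blacklistedFields;      // read from the blacklist file

    RowFilter(Set<String> allowedPersonalNumbers, Set<String> blacklistedFields) {
        this.allowedPersonalNumbers = allowedPersonalNumbers;
        this.blacklistedFields = blacklistedFields;
    }

    // Returns a copy of the row. If the personal number is not in the
    // allowed set, every blacklisted field present in the row is blanked.
    Map<String, String> apply(String personalNumber, Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>(row);
        if (!allowedPersonalNumbers.contains(personalNumber)) {
            for (String field : blacklistedFields) {
                if (out.containsKey(field)) {
                    out.put(field, "");
                }
            }
        }
        return out;
    }
}
```

Rows are never dropped in this sketch; a non-matching row keeps all its columns and only loses the values of the blacklisted ones.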

Change History (15)

comment:1 by Nicklas Nordborg, 8 years ago

Milestone: INCA XML to CSV converter 1.0

comment:2 by Nicklas Nordborg, 8 years ago

(In [3881]) References #881: Implement INCA XML to CSV converter

Added

  • Folder structure for source code
  • JAR file manifest
  • build.xml
  • License and readme files
  • Eclipse project files

comment:3 by Nicklas Nordborg, 8 years ago

(In [3883]) References #881: Implement INCA XML to CSV converter

Added parser for reading the file exported from SCANB.

comment:4 by Nicklas Nordborg, 8 years ago

(In [3884]) References #881: Implement INCA XML to CSV converter

Open a file selection dialog for selecting the SCANB file unless the path to it has been given on the command line.
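A minimal sketch of this fallback, assuming the SCANB path is passed as the first command line argument (the ticket does not say which position or option is used). The class name `ScanBFileSelector` is hypothetical.

```java
import java.io.File;
import javax.swing.JFileChooser;

// Hypothetical helper; the argument position is an assumption.
class ScanBFileSelector {
    // Returns the file given on the command line, or else the file picked
    // in a dialog. Returns null if the user cancels the dialog.
    static File select(String[] args) {
        if (args.length > 0) {
            return new File(args[0]);
        }
        JFileChooser chooser = new JFileChooser();
        chooser.setDialogTitle("Select SCANB file");
        int result = chooser.showOpenDialog(null);
        return result == JFileChooser.APPROVE_OPTION ? chooser.getSelectedFile() : null;
    }
}
```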

comment:5 by Nicklas Nordborg, 8 years ago

(In [3885]) References #881: Implement INCA XML to CSV converter

Added a SAX parser implementation for parsing the INCA XML file. It currently only generates some debug output.
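For illustration, a SAX handler that only records debug events might look like the sketch below. It uses the standard `javax.xml.parsers` and `org.xml.sax` APIs; the class name `DebugSaxHandler` and the event format are assumptions, and nothing here reflects the actual element names in an INCA export.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical debug handler; it collects one entry per start/end tag.
class DebugSaxHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // discard whitespace seen before this tag
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        events.add("end:" + qName + (value.isEmpty() ? "" : "=" + value));
        text.setLength(0);
    }

    // Parses an XML string and returns the recorded events.
    static List<String> parse(String xml) {
        DebugSaxHandler handler = new DebugSaxHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.events;
    }
}
```

A real parser would replace the event list with calls to a row writer once a full row has been read.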

comment:6 by Nicklas Nordborg, 8 years ago

(In [3887]) References #881: Implement INCA XML to CSV converter

Added a writer for the CSV file. The IncaXmlParser parser will simply call the writer every time a full row has been completed. The writer checks with the ScanBParser if the row should be fully accepted or if the blacklisted columns should be masked.

The writer will also encode newline, tab and backslash characters as \n, \t and \\. This is compatible with the TabCrLfEncoderDecoder implementation in BASE and should make it easy to parse the CSV file in Reggie.
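The escaping could be sketched as below. This is not the BASE implementation; the class name `TabCrLfEscaper` is hypothetical, only the three characters named above are handled, and mapping `null` to an empty string is an assumption.

```java
// Hypothetical escaper covering only newline, tab and backslash.
class TabCrLfEscaper {
    static String encode(String value) {
        if (value == null) {
            return ""; // assumption: missing values become empty cells
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // backslash -> two backslashes
                case '\n': sb.append("\\n"); break;  // newline -> backslash + n
                case '\t': sb.append("\\t"); break;  // tab -> backslash + t
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Escaping the backslash itself is what keeps the encoding reversible: a literal backslash followed by `n` in the data cannot be confused with an encoded newline.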

comment:7 by Nicklas Nordborg, 8 years ago

(In [3890]) References #881: Implement INCA XML to CSV converter

Added "Save as" dialog for setting the output CSV file. A default filename is generated by replacing the .xml from the INCA XML file with .csv and placing it in the same folder as the SCANB CSV file.
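One way to derive such a default name is sketched below. The class name `DefaultOutputName` is hypothetical, and two details are assumptions: the `.xml` suffix is matched case-insensitively, and `.csv` is simply appended if the input name has no `.xml` suffix.

```java
import java.nio.file.Path;

// Hypothetical helper for suggesting the output file.
class DefaultOutputName {
    static Path suggest(Path incaXml, Path scanbCsv) {
        String name = incaXml.getFileName().toString();
        // Strip a trailing ".xml" (any case) before adding ".csv".
        if (name.regionMatches(true, name.length() - 4, ".xml", 0, 4)) {
            name = name.substring(0, name.length() - 4);
        }
        // Place the result next to the SCANB file.
        Path folder = scanbCsv.toAbsolutePath().getParent();
        return folder.resolve(name + ".csv");
    }
}
```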

Added some counters to the parsers/writer and display some information about what happened on stdout and in a popup information dialog. The dialog is only used if the user also selected files using the GUI.

comment:8 by Nicklas Nordborg, 8 years ago

(In [3891]) References #881: Implement INCA XML to CSV converter

Personal numbers in INCA are stored with a '-' separator, but in SCANB they are not.

We replace the '-' with nothing and store the value as "PersonalNo". Both columns are included in the CSV file. The "PersonalNo" is always the first column.
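A rough sketch of this step, assuming a row is held as an ordered map from column name to value. The class name `PersonalNoColumn` is hypothetical, and the name of the original INCA column is passed in as a parameter because the ticket does not state it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the row representation is an assumption.
class PersonalNoColumn {
    // Removes the '-' separator so the value can be compared with SCANB.
    static String normalize(String incaValue) {
        return incaValue.replace("-", "");
    }

    // Returns a new row with "PersonalNo" as the first column, followed by
    // all original columns, including the one with the '-' separator.
    static Map<String, String> withPersonalNo(String incaColumn, Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>();
        String original = row.get(incaColumn);
        out.put("PersonalNo", original == null ? "" : normalize(original));
        out.putAll(row);
        return out;
    }
}
```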

comment:9 by Nicklas Nordborg, 8 years ago

(In [3892]) References #881: Implement INCA XML to CSV converter

Renamed JAR/TAR file to start with IncaXml2Csv instead of inca-xml2csv.

comment:10 by Nicklas Nordborg, 8 years ago

(In [3893]) References #881: Implement INCA XML to CSV converter

Ignore the new file names.

comment:11 by Nicklas Nordborg, 8 years ago

(In [3894]) References #881: Implement INCA XML to CSV converter

Arrgggh... difficult to get the file name correct.

comment:12 by Nicklas Nordborg, 8 years ago

(In [3896]) References #881: Implement INCA XML to CSV converter

Updated README and once again changed the filename of the JAR file.

comment:13 by Nicklas Nordborg, 8 years ago

(In [3897]) References #881: Implement INCA XML to CSV converter

Changes to make the README more "tracified".

comment:14 by Nicklas Nordborg, 8 years ago

(In [3898]) References #881: Implement INCA XML to CSV converter

Changes to make the README more "tracified".

comment:15 by Nicklas Nordborg, 8 years ago

Resolution: fixed
Status: new → closed