Changes between Version 1 and Version 2 of Ticket #525, comment 22


Timestamp: Apr 12, 2016, 9:24:34 AM
Author: olle

Design discussion:
 a. The procedure for speeding up the mapping step, which exploits the fact that the INCA input files normally contain lines for the same cases, could also be used for the import step: all INCA annotations for a single case would be imported in the same sub-step, instead of processing the input file by file. While some speed improvement might be possible, the main advantage is that the single commit step could then be divided into a number of commits, each concerning a subset of the cases. Using several commits would avoid program crashes caused by the Java heap memory reaching its maximum limit. If something goes wrong during the import step, some cases would have been updated and others not, but for any single case the INCA annotations would be either all up-to-date or all unchanged. In contrast, using several commits while processing file by file might leave some cases with only part of their INCA annotations up-to-date.
 b. Allowing all INCA annotations for a single case to be committed in the same sub-step requires collecting the INCA data for that case from all input INCA files. To make the code more readable, this should be done using a number of inner helper classes of the data access object type.
 c. If the header of a column to be imported exists in more than one file, the corresponding INCA annotation for a case item should be updated only once (or the update counted only once, if the column values are identical in all files).
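The batching idea in point a can be illustrated with a minimal Java sketch, assuming the cases have already been collected so that every batch contains complete cases. The class `BatchedCommitSketch`, its methods `partition` and `importInBatches`, and the `importAndCommit` callback are hypothetical names, not part of Reggie or BASE; the callback stands in for code that updates all INCA annotations for the cases in one batch, commits, and rolls back on failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: import cases in batches, with one commit per batch. */
public class BatchedCommitSketch {

    /** Splits the case IDs into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    /**
     * Passes each batch to importAndCommit (hypothetical callback: update all annotations
     * for the cases in the batch, then commit; assumed to roll back itself on failure).
     * Returns the number of batches committed before the first failure.
     */
    public static <T> int importInBatches(List<T> caseIds, int batchSize,
                                          Consumer<List<T>> importAndCommit) {
        int committed = 0;
        for (List<T> batch : partition(caseIds, batchSize)) {
            try {
                importAndCommit.accept(batch);
            } catch (RuntimeException e) {
                // Stop at the first failure; earlier batches stay committed.
                // A real implementation would log or rethrow the exception.
                return committed;
            }
            committed++;
        }
        return committed;
    }
}
```

Because each batch holds only complete cases, a failure in a later batch leaves the earlier batches committed and, given the rollback assumption, every case either fully updated or untouched. The batch size trades the number of commits against the amount of pending changes held before each commit.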
Design update:
 1. Java servlet class/file `IncaServlet.java` in `src/net/sf/basedb/reggie/servlet/` updated in protected method `void doPost(HttpServletRequest req, HttpServletResponse resp)` for command "`ImportInca`":
   a. A new hash map is created for storing the annotation type for a given annotation type ID.
   b. The code is rewritten to collect the INCA annotations for a given case from all input INCA files before the import is performed. The import is then performed one case at a time. To make the code more readable, the new inner helper classes `UnprocessedIncaFile`, `IncaFile`, `RawIncaCase`, `IncaCase`, and `IncaAnnoItem` are used.
   c. New inner private class `UnprocessedIncaFile` added. It stores the filename, the list of header strings, and the list of data lines for an INCA input file. Note that not all data lines necessarily contain mapping information allowing import to the SCAN-B database.
   d. New inner private class `IncaFile` added. It stores the filename, the list of header strings, the list of indexes for the columns to be imported, and the list of `RawIncaCase` items for an INCA input file.
   e. New inner private class `RawIncaCase` added. It stores the database ID and the INCA import line for a case item.
   f. New inner private class `IncaCase` added. It stores the database ID, the list of `IncaAnnoItem` objects, and the list of database ID values for the annotation types used for a case item.
   g. New inner private class `IncaAnnoItem` added. It stores the database ID of the annotation type and the value string to be imported.
   h. Some variable names have been updated to make them more consistent.
   i. The test version writes a log message with a time stamp for every 100 case items processed for import, in order to check performance and stability.
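The helper classes described above can be sketched in Java as follows. This is an illustration under assumptions, not the actual Reggie code: the field types (`int` database IDs, tab-separated data lines), the `annotationTypeIdByHeader` map, and the methods `collect` and `importAll` are hypothetical. Records require Java 16 or later. Here the list of used annotation type IDs in `IncaCase` is what keeps each annotation type from being updated more than once per case, in the spirit of point c of the design discussion.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the INCA import helper classes (not the actual Reggie code). */
public class IncaImportSketch {

    /** Raw file content; not all data lines necessarily map to a case item. */
    public record UnprocessedIncaFile(String filename, List<String> headers, List<String> dataLines) {}

    /** Mapped file: headers, indexes of the columns to import, and the mapped case lines. */
    public record IncaFile(String filename, List<String> headers, List<Integer> importColumns,
                           List<RawIncaCase> cases) {}

    /** Database ID of a case item and its raw INCA import line. */
    public record RawIncaCase(int caseId, String line) {}

    /** Database ID of an annotation type and the value string to import. */
    public record IncaAnnoItem(int annotationTypeId, String value) {}

    /** All annotation values for one case item, collected from every input file. */
    public static class IncaCase {
        public final int caseId;
        public final List<IncaAnnoItem> items = new ArrayList<>();
        public final List<Integer> usedAnnotationTypeIds = new ArrayList<>();

        public IncaCase(int caseId) {
            this.caseId = caseId;
        }

        /** Adds a value unless this annotation type is already set for the case (first value wins). */
        public void add(int annotationTypeId, String value) {
            if (!usedAnnotationTypeIds.contains(annotationTypeId)) {
                usedAnnotationTypeIds.add(annotationTypeId);
                items.add(new IncaAnnoItem(annotationTypeId, value));
            }
        }
    }

    /** Merges all files into one IncaCase per case ID; data lines are assumed tab-separated. */
    public static Map<Integer, IncaCase> collect(List<IncaFile> files,
                                                 Map<String, Integer> annotationTypeIdByHeader) {
        Map<Integer, IncaCase> byCase = new LinkedHashMap<>();
        for (IncaFile file : files) {
            for (RawIncaCase raw : file.cases()) {
                String[] fields = raw.line().split("\t", -1);
                IncaCase incaCase = byCase.computeIfAbsent(raw.caseId(), IncaCase::new);
                for (int col : file.importColumns()) {
                    Integer typeId = annotationTypeIdByHeader.get(file.headers().get(col));
                    if (typeId != null && col < fields.length) {
                        incaCase.add(typeId, fields[col]);
                    }
                }
            }
        }
        return byCase;
    }

    /** Processes one case at a time and logs a time stamp for every 100 cases. */
    public static int importAll(Map<Integer, IncaCase> byCase) {
        int processed = 0;
        for (IncaCase incaCase : byCase.values()) {
            // Placeholder: the real code would update the annotations in incaCase.items here.
            processed++;
            if (processed % 100 == 0) {
                System.out.println(Instant.now() + ": " + processed + " cases processed");
            }
        }
        return processed;
    }
}
```

Collecting everything per case first, rather than writing values file by file, is what makes a per-case (or per-batch) commit possible, since all values for a case are known before any of them is written.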