Opened 6 months ago

Closed 2 months ago

#938 closed enhancement (fixed)

Release exporter should export files that can be imported on a different server

Reported by: nicklas
Owned by: nicklas
Priority: critical
Milestone: Reggie v4.10
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

The tab-separated files that are currently exported by the Release exporter are not very suitable to use when importing. It would be better to divide the data in a way that allows for importing an entire chain of items from raw bioassay up to biosource from a single file. This would make it easier to handle the multiple transactions that are needed (there is too much data to be able to import it in a single transaction).

The current idea is to produce one JSON-formatted file with data for each item chain. The files are named after the raw bioassay that is the starting point. The files should contain all annotations and file references that are needed on the importing side.

Change History (59)

comment:1 Changed 6 months ago by nicklas

(In [4346]) References #938: Release exporter should export files that can be imported on a different server

Added a first (and very simple) version of a JSON writer. It creates one JSON file for each raw bioassay which holds an array of all items in the cohort chain up to patient. So far, only the name, type and subtype of each item is exported.
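
As an illustration only, the one-file-per-raw-bioassay layout described above could be sketched as follows. The dictionary keys (`name`, `type`, `subtype`) and the function name are assumptions made for this sketch; the actual Reggie writer is Java code and its JSON field names are not stated in this ticket.

```python
import json


def chain_to_json(chain):
    """Return (filename, JSON text) for one item chain.

    Assumption: `chain` is a list of dicts ordered from the raw bioassay
    (first element) up to the patient (last element).
    """
    raw_bioassay = chain[0]
    # The file is named after the raw bioassay that starts the chain.
    filename = raw_bioassay["name"] + ".json"
    # Only name, type and subtype are exported in this first version.
    items = [
        {"name": item["name"], "type": item["type"], "subtype": item.get("subtype")}
        for item in chain
    ]
    return filename, json.dumps(items, indent=2)
```

Keeping each chain in its own file means the importing side can process one file per transaction.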

comment:2 Changed 6 months ago by nicklas

  • Status changed from new to assigned

comment:3 Changed 6 months ago by nicklas

(In [4365]) References #938: Release exporter should export files that can be imported on a different server

The JSON writer now creates index.json before starting the export. This file contains some information about the list that is exported and is intended to be used by the relax importer to extract some information.

comment:4 Changed 6 months ago by nicklas

(In [4367]) References #938: Release exporter should export files that can be imported on a different server

The release exporter now creates the exportcomplete file to indicate that the export has been completed.

comment:5 Changed 6 months ago by nicklas

(In [4369]) References #938: Release exporter should export files that can be imported on a different server

Exporting the parent item name for each item.

comment:6 Changed 6 months ago by nicklas

(In [4371]) References #938: Release exporter should export files that can be imported on a different server

Re-factored the JSON export so that each CohortWriter implementation is responsible for generating its own part of the JSON array.

comment:7 Changed 6 months ago by nicklas

(In [4378]) References #938: Release exporter should export files that can be imported on a different server

Added support for exporting annotation types and annotation values. All annotation type definitions that are part of the release are exported to "annotationtypes.json" so that they can automatically be re-created on the importing server. The patient and library writers have been used for testing.

comment:8 Changed 5 months ago by nicklas

(In [4379]) References #938: Release exporter should export files that can be imported on a different server

Started to refactor the cohort annotation export to make it more flexible and easier to use for "virtual" annotation types.

comment:9 Changed 5 months ago by nicklas

(In [4380]) References #938: Release exporter should export files that can be imported on a different server

Some more changes to make it possible to format dates (and other values) as we want.

comment:10 Changed 5 months ago by nicklas

(In [4382]) References #938: Release exporter should export files that can be imported on a different server

Exporting an additional file, files.json, with paths to the JSON files for the cohort items. This should make things easier on the import side, which only has to read files.json instead of connecting via SSH and executing an ls command.
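
A rough sketch of how an importer might use this file is shown below. It assumes that files.json holds a flat JSON array of paths relative to the export root; the ticket does not specify the actual layout, and the function name is hypothetical.

```python
import json
from pathlib import PurePosixPath


def item_files(files_json_text, export_root):
    """Return the full paths of the per-chain JSON files listed in files.json.

    Assumption: the file content is a JSON array of relative paths.
    """
    relative_paths = json.loads(files_json_text)
    root = PurePosixPath(export_root)
    return [str(root / path) for path in relative_paths]
```

With such a list in hand, the importer can fetch each file directly without listing the remote directory.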

comment:11 Changed 5 months ago by nicklas

(In [4385]) References #938: Release exporter should export files that can be imported on a different server

Added some basic support for exporting file information. Implemented for FASTQ files on the merged level and FPKM files on the rawbioassay level.

The actual files are not copied. It has not yet been decided what the file structure will look like on the relax side, so the file export will probably change.

comment:12 Changed 5 months ago by nicklas

(In [4389]) References #938: Release exporter should export files that can be imported on a different server

Added support for exporting annotation values as project-specific annotations. Use RNAQC and RNAQC date as a test case on the RNA item since we always use the latest information here.

comment:13 Changed 5 months ago by nicklas

(In [4394]) References #938: Release exporter should export files that can be imported on a different server

Exporting the registration date.

comment:14 Changed 5 months ago by nicklas

(In [4396]) References #938: Release exporter should export files that can be imported on a different server

Exporting the creation date.

comment:15 Changed 5 months ago by nicklas

(In [4398]) References #938: Release exporter should export files that can be imported on a different server

RNAQC date needs to be formatted as a date.

comment:16 Changed 5 months ago by nicklas

(In [4402]) References #938: Release exporter should export files that can be imported on a different server

Exporting the DataFilesFolder annotation. We want to make this a project-specific annotation on the relax side since the idea is to create a new top-folder for every release.

comment:17 Changed 5 months ago by nicklas

(In [4404]) References #938: Release exporter should export files that can be imported on a different server

Exporting more metadata about files.

comment:18 Changed 5 months ago by nicklas

(In [4407]) References #938: Release exporter should export files that can be imported on a different server

Started to implement support for exporting other "type" definitions than annotation types... The first other case is to export data file types.

While it works as it is, I think we need to rename a few things to make them more generic (e.g. CohortAnnotationType).

comment:19 Changed 5 months ago by nicklas

(In [4408]) References #938: Release exporter should export files that can be imported on a different server

Renamed the CohortAnnotationType class to CohortTypeDef and other changes related to this.

Type definitions are now exported to typedefs.json instead of annotationtypes.json.

comment:20 Changed 5 months ago by nicklas

(In [4410]) References #938: Release exporter should export files that can be imported on a different server

Exporting FPKM file type.

comment:21 Changed 5 months ago by nicklas

(In [4412]) References #938: Release exporter should export files that can be imported on a different server

Exporting files linked to items with any-to-any links.

comment:22 Changed 5 months ago by nicklas

(In [4420]) References #938: Release exporter should export files that can be imported on a different server

Added a plugin parameter that asks for the release version to create. Before the parameters are accepted, the file server is checked to make sure that the given version number doesn't already correspond to a directory in the release archive.

Added a ScriptWriter class that is responsible for creating the bash scripts required for syncing the released data files between the project archive and the release archive. At the moment it only creates the mkdirs.sh script, which creates the directory structure that is needed for the release.
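
To give an idea of what generating such a script can look like, here is a minimal sketch. It is not the ScriptWriter implementation (which is Java); the function name, the `set -e` line and the directory arguments are assumptions made for illustration.

```python
import shlex


def mkdirs_script(release_root, subdirs):
    """Return the text of a bash script that creates the release directory tree."""
    lines = ["#!/bin/bash", "set -e"]  # stop at the first failing command
    for sub in sorted(set(subdirs)):
        # Quote each path so that spaces or shell metacharacters are safe.
        lines.append("mkdir -p " + shlex.quote(f"{release_root}/{sub}"))
    return "\n".join(lines) + "\n"
```

Using `mkdir -p` makes the script safe to run more than once, since existing directories are not treated as errors.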

comment:23 Changed 5 months ago by nicklas

(In [4421]) References #938: Release exporter should export files that can be imported on a different server

Added support for creating the link script for linking to previously released files. An actual check for finding existing files has not been implemented yet (we simply assume that all files exist in release 0.9).

A temporary (for debugging) 'cat.sh' script has been created in place of the 'rsync.sh' script that creates dummy files.

comment:24 Changed 5 months ago by nicklas

(In [4422]) References #938: Release exporter should export files that can be imported on a different server

The rsync script is now being created. It seems to work, but we still need to implement checks for which files already exist (and should be linked) and which need to be copied.

The code is also a bit messy now when it comes to finding the correct path to sync to/from. We are converting back and forth between external and scan-b id too many times...

comment:25 Changed 5 months ago by nicklas

(In [4423]) References #938: Release exporter should export files that can be imported on a different server

Added support for checking for existing released files. To find existing files we first find the top-level directory for each release. Then for each release, we find all REAL files (ignoring symlinks). We store the result in a map that allows us to find the release version a file appears in.
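
The lookup described above could be sketched roughly as follows. This is an assumption-laden illustration: it takes an already collected listing as `(version, relative_path, is_symlink)` tuples rather than scanning the file server, and both function names are hypothetical.

```python
def index_released_files(entries):
    """Map each relative file path to the release version holding the real file.

    Assumption: `entries` is an iterable of (version, relative_path, is_symlink)
    tuples, for example parsed from a listing of the release archive.
    """
    released = {}
    for version, rel_path, is_symlink in entries:
        if is_symlink:
            continue  # a link only points at a real file in another release
        released.setdefault(rel_path, version)  # first release seen wins
    return released


def plan_sync(wanted_paths, released):
    """Split the wanted files into those to link and those to copy."""
    to_link = {p: released[p] for p in wanted_paths if p in released}
    to_copy = [p for p in wanted_paths if p not in released]
    return to_link, to_copy
```

Files found in the map can be linked to the release that already holds them, and everything else has to be copied.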

comment:26 Changed 5 months ago by nicklas

(In [4427]) References #938: Release exporter should export files that can be imported on a different server

Cleaning up the script generation code.

  • Moved common parts to functions
  • Scripts are created with execute permission set
  • More checks and error handling

comment:27 Changed 5 months ago by nicklas

(In [4428]) References #938: Release exporter should export files that can be imported on a different server

Removed the auto-generated UUID since it has been fully replaced by the release version.

comment:28 Changed 5 months ago by nicklas

(In [4430]) References #938: Release exporter should export files that can be imported on a different server

Added "overwrite" option when exporting to a remote server.

When checking for existing files, the current release directory is ignored. This is needed when the "overwrite" option is in effect, since otherwise we would create symbolic links that point to themselves.

The 'rsync' script uses information from the ProjectArchive item to generate a default value for the PROJECTARCHIVE parameter.

comment:29 Changed 5 months ago by nicklas

(In [4432]) References #938: Release exporter should export files that can be imported on a different server

Files that have been marked for deletion are not included in the export.

comment:30 Changed 5 months ago by nicklas

(In [4434]) References #938: Release exporter should export files that can be imported on a different server

Fixed incorrect description for plugin parameter.

comment:31 Changed 5 months ago by nicklas

(In [4435]) References #938: Release exporter should export files that can be imported on a different server

Exporting platform and platform variant for raw bioassays.

comment:32 Changed 5 months ago by nicklas

(In [4437]) References #938: Release exporter should export files that can be imported on a different server

Adding and changing exported properties and annotations.

  • For all items: no dates
  • Case: only "Yes" is allowed for consent (in case we accidentally include a "No" in the export)
  • Specimen: OriginalQuantityMilliGram, DaysToLab, MinutesToRNALater, BiopsyType, SpecimenType, Laterality, NofPieces, LinkedSpecimen

comment:33 Changed 5 months ago by nicklas

(In [4438]) References #938: Release exporter should export files that can be imported on a different server

Added support for white- and blacklisting files that should be included in the export.
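
The ticket does not say how the lists are matched, so the following is only a sketch under stated assumptions: patterns are shell-style globs, the blacklist takes precedence, and an empty whitelist means "include everything". The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def is_included(filename, whitelist=(), blacklist=()):
    """Decide whether a file should be part of the export."""
    # Assumption: a blacklist match always excludes the file.
    if any(fnmatchcase(filename, pattern) for pattern in blacklist):
        return False
    # Assumption: a non-empty whitelist restricts the export to matching files.
    if whitelist:
        return any(fnmatchcase(filename, pattern) for pattern in whitelist)
    return True
```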

comment:34 Changed 5 months ago by nicklas

(In [4439]) References #938: Release exporter should export files that can be imported on a different server

Including some files at the aligned level in the sync scripts (no JSON).

comment:35 Changed 5 months ago by nicklas

(In [4442]) References #938: Release exporter should export files that can be imported on a different server

Re-factored loading of the related "GoodStain" sample so that it can be used with other writers than the StainedWriter.

comment:36 Changed 5 months ago by nicklas

(In [4443]) References #938: Release exporter should export files that can be imported on a different server

Added histology scores to the export for specimen.

comment:37 Changed 5 months ago by nicklas

(In [4444]) References #938: Release exporter should export files that can be imported on a different server

Fixed an UnsupportedOperationException issue with converting dates that are coming from the SQL server.

comment:38 Changed 4 months ago by nicklas

(In [4448]) References #938: Release exporter should export files that can be imported on a different server

Exporting some more annotations for RNA and Library items.

comment:39 Changed 4 months ago by nicklas

(In [4461]) References #962 and #938. The release exporter is now exporting the reference date and source. The date is converted to a year-only value (integer).

comment:40 Changed 4 months ago by nicklas

(In [4462]) References #938. Added support for exporting annotation type definitions with units. Tested with specimen writer.

comment:41 Changed 4 months ago by nicklas

(In [4464]) References #938. Added "His" prefix to annotations related to histology score. "HisName" is also exported to indicate the existence of a histology item even if there are no scores.

comment:42 Changed 4 months ago by nicklas

(In [4465]) References #938. Added Lysate and Qiacube annotations to the RNA writer. The date annotations should be converted to a "batch index" that is not related to the actual date, except that they should sort in the same order.

comment:43 Changed 4 months ago by nicklas

(In [4466]) References #938. Batch index annotations are now created with random and unique proxy values that are mapped to an index value after all items have been exported. The mapping is saved in the batch-index-lookup.json file which is used on the importing side to map the proxy to the batch index value.
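
A minimal sketch of the proxy idea follows. It assumes that items sharing a date share a proxy, that dates are given in a sortable form such as ISO strings, and that the index is 1-based; none of this is confirmed by the ticket, and the class name is hypothetical.

```python
import json
import uuid


class BatchIndexer:
    """Hands out random proxies during export and resolves them afterwards."""

    def __init__(self):
        self._proxy_by_date = {}

    def proxy_for(self, date):
        """Return the random proxy for a date, creating it on first use."""
        if date not in self._proxy_by_date:
            self._proxy_by_date[date] = uuid.uuid4().hex
        return self._proxy_by_date[date]

    def lookup(self):
        """Map each proxy to an index that sorts in the same order as the dates."""
        ordered = sorted(self._proxy_by_date)
        return {self._proxy_by_date[d]: i for i, d in enumerate(ordered, start=1)}

    def lookup_json(self):
        """Serialize the mapping, e.g. for a batch-index-lookup.json file."""
        return json.dumps(self.lookup(), indent=2)
```

The index can only be assigned once all dates are known, which is why the items carry a proxy during export and the mapping is written at the end.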

comment:44 Changed 4 months ago by nicklas

(In [4468]) References #938. Started to refactor the CohortAnnotationTypeFactory to make it easier (=less code) to create annotation type definitions for the export. Used by the Patient, Case, Specimen and RNA exporters which should now be complete (except that BASE has no unit for concentration that we need for the NdConc annotation).

comment:45 Changed 4 months ago by nicklas

(In [4469]) References #938. More re-factoring. The CohortAnnotationTypeFactory is now CohortTypeDefFactory and can be used to create both annotation type definitions and file type definitions.

comment:46 Changed 4 months ago by nicklas

(In [4470]) References #938. Library annotations should now be complete. Added unit to the NDConc annotation for RNA.

comment:47 Changed 4 months ago by nicklas

(In [4471]) References #938. Added unit to the library molarity annotation.

comment:48 Changed 4 months ago by nicklas

(In [4472]) References #938. Merged sequences annotations should now be completed.

comment:49 Changed 4 months ago by nicklas

(In [4473]) References #938. Cufflinks annotations should now be complete.

comment:50 Changed 4 months ago by nicklas

(In [4474]) References #938. Re-factoring so that similar methods take their parameters in the same order.

comment:51 Changed 4 months ago by nicklas

(In [4475]) References #938. Added support for exporting INCA annotations. The annotations to export must be added to the "INCA_Release" category. The annotations are currently exported as is since there is no support for creating masked or re-calculated values.

comment:52 Changed 3 months ago by nicklas

(In [4479]) References #938. Implemented support for converting INCA values to some other value. The currently implemented rules:

  • IncaExportDate: Converted to year+quarter
  • INCA_A030DiaDat: Converted to year
  • All other INCA date annotations: Converted to number of days relative to INCA_A030DiaDat.
  • INCA_A000Alder: Converted to the nearest higher 5-year value.
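
The rules above can be sketched as small conversion functions. The function names, the "2017Q2" output format for year+quarter, and the choice to leave exact multiples of 5 unchanged are assumptions of this sketch, not something the ticket specifies.

```python
import math
from datetime import date


def year_quarter(d):
    """IncaExportDate: year plus quarter (output format is an assumption)."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def year_only(d):
    """INCA_A030DiaDat: keep only the year."""
    return d.year


def days_relative(d, diagnosis_date):
    """Other INCA dates: signed number of days relative to the diagnosis date."""
    return (d - diagnosis_date).days


def age_group(age, step=5):
    """INCA_A000Alder: round up to a multiple of `step`.

    Assumption: an age that is already a multiple of `step` is left as is.
    """
    return math.ceil(age / step) * step
```

For example, `days_relative(date(2017, 3, 1), date(2017, 2, 1))` gives 28, so the exported value says how long after diagnosis an event happened without revealing either date.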

comment:53 Changed 3 months ago by nicklas

(In [4480]) References #938. Exporting Site as an annotation.

comment:54 Changed 3 months ago by nicklas

(In [4481]) References #938. Implemented an option for whether the expression matrix (and related files) should be created. Implemented an option for whether JSON files should be created.

When doing a local export, both options are available for user configuration. The matrix export is enabled by default while the JSON export is disabled.

When doing a remote export, only the matrix export option is available (disabled by default). JSON file export can't be turned off.

The progress bar should now also go from 0 to 100% regardless of which options are selected.

comment:55 Changed 3 months ago by nicklas

(In [4484]) References #938. Added "SamplingDate" as a specimen annotation. It is calculated as the number of days between the "ReferenceDate" on the case item and the "SamplingDateTime" annotation.

Added unit to INCA dates that are converted to number of days since the reference date.

comment:56 Changed 3 months ago by nicklas

(In [4485]) References #938. The release export plug-in now ends with a "Done" message.

comment:57 Changed 3 months ago by nicklas

(In [4513]) References #938. Added support for re-naming annotation types that are exported.

comment:58 Changed 2 months ago by nicklas

(In [4518]) References #938. Fixed echo commands in scripts so that variables are output by value instead of by name.

comment:59 Changed 2 months ago by nicklas

  • Resolution set to fixed
  • Status changed from assigned to closed