Opened 8 years ago

Closed 7 years ago

#938 closed enhancement (fixed)

Release exporter should export files that can be imported on a different server

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: critical
Milestone: Reggie v4.10
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

The tab-separated files that are currently exported by the Release exporter are not very suitable to use when importing. It would be better to divide the data in a way that allows for importing an entire chain of items from raw bioassay up to biosource from a single file. This would make it easier to handle the multiple transactions that are needed (there is too much data to be able to import it in a single transaction).

The current idea is to produce one JSON-formatted file with data for each item chain. The files are named after the raw bioassay that is the starting point. The files should contain all annotations and file references that are needed on the importing side.

Change History (59)

comment:1 by Nicklas Nordborg, 8 years ago

(In [4346]) References #938: Release exporter should export files that can be imported on a different server

Added a first (and very simple) version of a JSON writer. It creates one JSON file for each raw bioassay which holds an array of all items in the cohort chain up to patient. So far, only the name, type and subtype of each item is exported.

comment:2 by Nicklas Nordborg, 8 years ago

Status: new → assigned

comment:3 by Nicklas Nordborg, 8 years ago

(In [4365]) References #938: Release exporter should export files that can be imported on a different server

The JSON writer now creates index.json before starting the export. This file contains some information about the list that is exported and is intended to be used by the relax importer to extract some information.

comment:4 by Nicklas Nordborg, 8 years ago

(In [4367]) References #938: Release exporter should export files that can be imported on a different server

The release exporter now creates the exportcomplete file to indicate that the export has been completed.

comment:5 by Nicklas Nordborg, 8 years ago

(In [4369]) References #938: Release exporter should export files that can be imported on a different server

Exporting the parent item name for each item.

comment:6 by Nicklas Nordborg, 8 years ago

(In [4371]) References #938: Release exporter should export files that can be imported on a different server

Re-factored the JSON export so that each CohortWriter implementation is responsible for generating its own part of the JSON array.

comment:7 by Nicklas Nordborg, 8 years ago

(In [4378]) References #938: Release exporter should export files that can be imported on a different server

Added support for exporting annotation types and annotation values. All annotation type definitions that are part of the release are exported to "annotationtypes.json" so that they can automatically be re-created on the importing server. The patient and library writers have been used for testing.

comment:8 by Nicklas Nordborg, 8 years ago

(In [4379]) References #938: Release exporter should export files that can be imported on a different server

Started to refactor the cohort annotation export to make it more flexible and easier to use for "virtual" annotation types.

comment:9 by Nicklas Nordborg, 8 years ago

(In [4380]) References #938: Release exporter should export files that can be imported on a different server

Some more changes to make it possible to format dates (and other values) as we want.

comment:10 by Nicklas Nordborg, 8 years ago

(In [4382]) References #938: Release exporter should export files that can be imported on a different server

Exporting an additional file, files.json, with paths to the JSON files for the cohort items. This should make things easier on the import side, which only has to read files.json instead of connecting via SSH and executing an ls command.
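As a rough illustration of the idea (not the Reggie code, which is Java), an index of the per-item JSON files could be written and read as follows. The function names and the layout (a flat JSON array of paths relative to the export directory) are assumptions; the ticket does not specify the format of files.json.

```python
import json
from pathlib import Path
from typing import Iterable, List


def write_files_index(export_dir: Path, item_files: Iterable[Path]) -> Path:
    """Write files.json listing the given item files (hypothetical layout)."""
    # Paths are stored relative to the export directory so that the index
    # stays valid after the directory is copied to another server.
    relative = sorted(Path(f).relative_to(export_dir).as_posix() for f in item_files)
    index_path = export_dir / "files.json"
    index_path.write_text(json.dumps(relative, indent=2), encoding="utf-8")
    return index_path


def read_files_index(export_dir: Path) -> List[str]:
    """Importing side: one file read instead of a remote directory listing."""
    return json.loads((export_dir / "files.json").read_text(encoding="utf-8"))
```

The importer then only needs the one well-known file name to discover everything else in the export.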

comment:11 by Nicklas Nordborg, 8 years ago

(In [4385]) References #938: Release exporter should export files that can be imported on a different server

Added some basic support for exporting file information. Implemented for FASTQ files on the merged level and FPKM files on the rawbioassay level.

The actual files are not copied. It has not yet been decided what the file structure is going to look like on the relax side, so the file export will probably change.

comment:12 by Nicklas Nordborg, 8 years ago

(In [4389]) References #938: Release exporter should export files that can be imported on a different server

Added support for exporting annotation values as project-specific annotations. Use RNAQC and RNAQC date as a test case on the RNA item since we always use the latest information here.

comment:13 by Nicklas Nordborg, 8 years ago

(In [4394]) References #938: Release exporter should export files that can be imported on a different server

Exporting the registration date.

comment:14 by Nicklas Nordborg, 8 years ago

(In [4396]) References #938: Release exporter should export files that can be imported on a different server

Exporting the creation date.

comment:15 by Nicklas Nordborg, 8 years ago

(In [4398]) References #938: Release exporter should export files that can be imported on a different server

RNAQC date needs to be formatted as a date.

comment:16 by Nicklas Nordborg, 8 years ago

(In [4402]) References #938: Release exporter should export files that can be imported on a different server

Exporting the DataFilesFolder annotation. We want to make this a project-specific annotation on the relax side since the idea is to create a new top-folder for every release.

comment:17 by Nicklas Nordborg, 8 years ago

(In [4404]) References #938: Release exporter should export files that can be imported on a different server

Exporting more metadata about files.

comment:18 by Nicklas Nordborg, 7 years ago

(In [4407]) References #938: Release exporter should export files that can be imported on a different server

Started to implement support for exporting other "type" definitions than annotation types... The first other case is to export data file types.

While it works as it is, I think we need to rename a few things to make them more generic (e.g. CohortAnnotationType, etc.)

comment:19 by Nicklas Nordborg, 7 years ago

(In [4408]) References #938: Release exporter should export files that can be imported on a different server

Renamed the CohortAnnotationType class to CohortTypeDef and other changes related to this.

Type definitions are now exported to typedefs.json instead of annotationtypes.json.

comment:20 by Nicklas Nordborg, 7 years ago

(In [4410]) References #938: Release exporter should export files that can be imported on a different server

Exporting FPKM file type.

comment:21 by Nicklas Nordborg, 7 years ago

(In [4412]) References #938: Release exporter should export files that can be imported on a different server

Exporting files linked to items with any-to-any links.

comment:22 by Nicklas Nordborg, 7 years ago

(In [4420]) References #938: Release exporter should export files that can be imported on a different server

Added a plugin parameter that asks for the release version to create. Before accepting the parameters, the file server is checked to make sure that the given version number doesn't already correspond to a directory in the release archive.

Added a ScriptWriter class that is responsible for creating the bash scripts that are required for syncing the released data files between the project archive and the release archive. At the moment it only creates the mkdirs.sh script, which creates the directory structure that is needed for the release.

comment:23 by Nicklas Nordborg, 7 years ago

(In [4421]) References #938: Release exporter should export files that can be imported on a different server

Added support for creating the link script for linking to previously released files. An actual check for finding existing files has not been implemented yet (we simply assume that all files exist in release 0.9).

A temporary 'cat.sh' script (for debugging) that creates dummy files has been created in place of the 'rsync.sh' script.

comment:24 by Nicklas Nordborg, 7 years ago

(In [4422]) References #938: Release exporter should export files that can be imported on a different server

The rsync script is now being created. It seems to work but we still need to implement checks for which files already exist (and should be linked) and which need to be copied.

The code is also a bit messy now when it comes to finding the correct path to sync to/from. We are converting back and forth between external and scan-b id too many times...

comment:25 by Nicklas Nordborg, 7 years ago

(In [4423]) References #938: Release exporter should export files that can be imported on a different server

Added support for checking for existing released files. To find existing files we first find the top-level directory for each release. Then for each release, we find all REAL files (ignoring symlinks). We store the result in a map that allows us to find the release version a file appears in.
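The lookup described above can be sketched roughly as follows. This is an illustration only: it walks a local directory tree, whereas the real check runs against the file server, and the function name and the optional `skip_release` parameter are hypothetical.

```python
import os
from typing import Dict, Optional


def find_released_files(archive_root: str,
                        skip_release: Optional[str] = None) -> Dict[str, str]:
    """Map each relative file path to the release version holding the real file."""
    found: Dict[str, str] = {}
    # Assumption: every top-level directory in the archive is one release.
    for release in sorted(os.listdir(archive_root)):
        release_dir = os.path.join(archive_root, release)
        if not os.path.isdir(release_dir) or release == skip_release:
            continue
        for dirpath, _dirnames, filenames in os.walk(release_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                if os.path.islink(full):
                    # A symlink only points at a file owned by another release.
                    continue
                found[os.path.relpath(full, release_dir)] = release
    return found
```

A file that is present in the map can be linked from the release it appears in; a file that is missing must be copied from the project archive.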

comment:26 by Nicklas Nordborg, 7 years ago

(In [4427]) References #938: Release exporter should export files that can be imported on a different server

Cleaning up the script generation code.

  • Moved common parts to functions
  • Scripts are created with execute permission set
  • More checks and error handling

comment:27 by Nicklas Nordborg, 7 years ago

(In [4428]) References #938: Release exporter should export files that can be imported on a different server

Removed the auto-generated UUID since it has been fully replaced by the release version.

comment:28 by Nicklas Nordborg, 7 years ago

(In [4430]) References #938: Release exporter should export files that can be imported on a different server

Added "overwrite" option when exporting to a remote server.

When checking for existing files, the current release directory is ignored. This is needed when the "overwrite" option is in effect, since otherwise we would create symbolic links that point to themselves.

The 'rsync' script uses information from the ProjectArchive item to generate a default value for the PROJECTARCHIVE parameter.

comment:29 by Nicklas Nordborg, 7 years ago

(In [4432]) References #938: Release exporter should export files that can be imported on a different server

Files that have been marked for deletion are not included in the export.

comment:30 by Nicklas Nordborg, 7 years ago

(In [4434]) References #938: Release exporter should export files that can be imported on a different server

Fixed incorrect description for plugin parameter.

comment:31 by Nicklas Nordborg, 7 years ago

(In [4435]) References #938: Release exporter should export files that can be imported on a different server

Exporting platform and platform variant for raw bioassays.

comment:32 by Nicklas Nordborg, 7 years ago

(In [4437]) References #938: Release exporter should export files that can be imported on a different server

Adding and changing exported properties and annotations.

  • For all items: no dates
  • Case: Only "Yes" is allowed for consent (in case we accidentally include a No in the export)
  • Specimen: OriginalQuantityMilliGram, DaysToLab, MinutesToRNALater, BiopsyType, SpecimenType, Laterality, NofPieces, LinkedSpecimen

comment:33 by Nicklas Nordborg, 7 years ago

(In [4438]) References #938: Release exporter should export files that can be imported on a different server

Added support for white- and blacklisting files that should be included in the export.

comment:34 by Nicklas Nordborg, 7 years ago

(In [4439]) References #938: Release exporter should export files that can be imported on a different server

Including some files at the aligned level in the sync scripts (no JSON).

comment:35 by Nicklas Nordborg, 7 years ago

(In [4442]) References #938: Release exporter should export files that can be imported on a different server

Re-factored loading of the related "GoodStain" sample so that it can be used with other writers than the StainedWriter.

comment:36 by Nicklas Nordborg, 7 years ago

(In [4443]) References #938: Release exporter should export files that can be imported on a different server

Added histology scores to the export for specimen.

comment:37 by Nicklas Nordborg, 7 years ago

(In [4444]) References #938: Release exporter should export files that can be imported on a different server

Fixed an UnsupportedOperationException issue with converting dates that are coming from the SQL server.

comment:38 by Nicklas Nordborg, 7 years ago

(In [4448]) References #938: Release exporter should export files that can be imported on a different server

Exporting some more annotations for RNA and Library items.

comment:39 by Nicklas Nordborg, 7 years ago

(In [4461]) References #962 and #938. The release exporter is now exporting the reference date and source. The date is converted to a year-only value (integer).

comment:40 by Nicklas Nordborg, 7 years ago

(In [4462]) References #938. Added support for exporting annotation type definitions with units. Tested with specimen writer.

comment:41 by Nicklas Nordborg, 7 years ago

(In [4464]) References #938. Added "His" prefix to annotations related to histology score. "HisName" is also exported to indicate the existence of a histology item even if there are no scores.

comment:42 by Nicklas Nordborg, 7 years ago

(In [4465]) References #938. Added Lysate and Qiacube annotations to the RNA writer. The date annotations should be converted to a "batch index" that is not related to the actual date, except that they should sort in the same order.

comment:43 by Nicklas Nordborg, 7 years ago

(In [4466]) References #938. Batch index annotations are now created with random and unique proxy values that are mapped to an index value after all items have been exported. The mapping is saved in the batch-index-lookup.json file which is used on the importing side to map the proxy to the batch index value.
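A minimal sketch of the proxy approach, for illustration only. The class name, the use of UUIDs as proxies, sharing one proxy between identical dates, and starting the index at 1 are all assumptions that the ticket does not confirm.

```python
import datetime as dt
import json
import uuid
from typing import Dict


class BatchIndexer:
    """Hands out random proxies for dates and resolves them to indexes later."""

    def __init__(self) -> None:
        self._proxy_by_date: Dict[dt.date, str] = {}

    def proxy_for(self, date: dt.date) -> str:
        # The same date always gets the same proxy; the proxy itself
        # carries no information about the date.
        if date not in self._proxy_by_date:
            self._proxy_by_date[date] = uuid.uuid4().hex
        return self._proxy_by_date[date]

    def lookup(self) -> Dict[str, int]:
        # Call after all items have been exported: the earliest date gets
        # index 1, so indexes sort in the same order as the dates.
        ordered = sorted(self._proxy_by_date)
        return {self._proxy_by_date[d]: i for i, d in enumerate(ordered, start=1)}

    def to_json(self) -> str:
        # Content for a lookup file such as batch-index-lookup.json.
        return json.dumps(self.lookup(), indent=2)
```

The proxy is needed because the final index of a date is only known once every date in the release has been seen; the importing side replaces each proxy with its index using the lookup file.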

comment:44 by Nicklas Nordborg, 7 years ago

(In [4468]) References #938. Started to refactor the CohortAnnotationTypeFactory to make it easier (=less code) to create annotation type definitions for the export. Used by the Patient, Case, Specimen and RNA exporters which should now be complete (except that BASE has no unit for concentration that we need for the NdConc annotation).

comment:45 by Nicklas Nordborg, 7 years ago

(In [4469]) References #938. More re-factoring. The CohortAnnotationTypeFactory is now CohortTypeDefFactory and can be used to create both annotation type definitions and file type definitions.

comment:46 by Nicklas Nordborg, 7 years ago

(In [4470]) References #938. Library annotations should now be complete. Added unit to the NDConc annotation for RNA.

comment:47 by Nicklas Nordborg, 7 years ago

(In [4471]) References #938. Added unit to the library molarity annotation.

comment:48 by Nicklas Nordborg, 7 years ago

(In [4472]) References #938. Merged sequences annotations should now be completed.

comment:49 by Nicklas Nordborg, 7 years ago

(In [4473]) References #938. Cufflinks annotations should now be complete.

comment:50 by Nicklas Nordborg, 7 years ago

(In [4474]) References #938. Re-factoring to make similar methods use parameters in the same order.

comment:51 by Nicklas Nordborg, 7 years ago

(In [4475]) References #938. Added support for exporting INCA annotations. The annotations to export must be added to the "INCA_Release" category. The annotations are currently exported as is since there is no support for creating masked or re-calculated values.

comment:52 by Nicklas Nordborg, 7 years ago

(In [4479]) References #938. Implemented support for converting INCA values to some other value. The currently implemented rules:

  • IncaExportDate: Converted to year+quarter
  • INCA_A030DiaDat: Converted to year
  • All other INCA date annotations: Converted to number of days relative to INCA_A030DiaDat.
  • INCA_A000Alder: Converted to the nearest higher 5-year value.
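The rules above could be sketched like this. The function names are hypothetical, the "2016Q2" output format for year+quarter is an assumption, and so is the choice to leave an age that is already a multiple of 5 unchanged.

```python
import datetime as dt


def to_year_quarter(d: dt.date) -> str:
    """IncaExportDate rule: keep only year and quarter (format is assumed)."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def to_year(d: dt.date) -> int:
    """INCA_A030DiaDat rule: keep only the year."""
    return d.year


def days_relative_to(d: dt.date, diagnosis_date: dt.date) -> int:
    """Other INCA dates: signed number of days from the diagnosis date."""
    return (d - diagnosis_date).days


def round_up_to_5(age: int) -> int:
    """INCA_A000Alder rule: round up to the next multiple of 5."""
    return -(-age // 5) * 5


diagnosis = dt.date(2016, 2, 1)
print(to_year_quarter(dt.date(2016, 5, 17)))            # 2016Q2
print(to_year(diagnosis))                               # 2016
print(days_relative_to(dt.date(2016, 3, 1), diagnosis)) # 29
print(round_up_to_5(63))                                # 65
```

Each conversion discards precision on purpose, so that exact dates and ages do not appear in the exported data.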

comment:53 by Nicklas Nordborg, 7 years ago

(In [4480]) References #938. Exporting Site as an annotation.

comment:54 by Nicklas Nordborg, 7 years ago

(In [4481]) References #938. Implemented an option for whether the expression matrix (and related files) should be created or not. Implemented an option for whether JSON files should be created or not.

When doing a local export, both options are available for user configuration. The matrix export is enabled by default while the JSON export is disabled.

When doing a remote export, only the matrix export option is available (disabled by default). JSON file export can't be turned off.

The progress bar should now also go from 0 to 100% regardless of which options are selected.

comment:55 by Nicklas Nordborg, 7 years ago

(In [4484]) References #938. Added "SamplingDate" as a specimen annotation. It is calculated as the number of days between the "ReferenceDate" on the case item and the "SamplingDateTime" annotation.

Added unit to INCA dates that are converted to number of days since the reference date.

comment:56 by Nicklas Nordborg, 7 years ago

(In [4485]) References #938. The release export plug-in now ends with a "Done" message.

comment:57 by Nicklas Nordborg, 7 years ago

(In [4513]) References #938. Added support for re-naming annotation types that are exported.

comment:58 by Nicklas Nordborg, 7 years ago

(In [4518]) References #938. Fixed echo commands in scripts so that variables are output by value instead of by name.

comment:59 by Nicklas Nordborg, 7 years ago

Resolution: fixed
Status: assigned → closed