Opened 2 months ago

Last modified 3 hours ago

#938 assigned enhancement

Release exporter should export files that can be imported on a different server

Reported by: nicklas
Owned by: nicklas
Priority: critical
Milestone: Reggie v4.10
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

The tab-separated files that are currently exported by the Release exporter are not very suitable to use when importing. It would be better to divide the data in a way that allows for importing an entire chain of items from raw bioassay up to biosource from a single file. This would make it easier to handle the multiple transactions that are needed (there is too much data to be able to import it in a single transaction).

The current idea is to produce one JSON-formatted file with data for each item chain. The files are named after the raw bioassay that is the starting point. The files should contain all annotations and file references that are needed on the importing side.
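As a rough illustration of the "one file per item chain" idea, here is a minimal Python sketch (not the Reggie code, which is presumably Java). The function name `write_chain`, the key names (`name`, `type`, `parent`) and the sample items are assumptions made up for the example; the ticket does not define the actual JSON layout.

```python
import json
import tempfile
from pathlib import Path

def write_chain(export_dir, chain):
    """Write one item chain to '<raw bioassay name>.json' and return the path.

    `chain` is a list of dicts ordered from the raw bioassay up to the
    biosource. The key names used here are illustrative only.
    """
    raw_bioassay = chain[0]  # the chain starts at the raw bioassay
    path = Path(export_dir) / (raw_bioassay["name"] + ".json")
    with open(path, "w", encoding="utf-8") as out:
        json.dump(chain, out, indent=2)
    return path

# Hypothetical chain; names and types are invented for the example
chain = [
    {"name": "rba-001", "type": "RAWBIOASSAY", "parent": "lib-001"},
    {"name": "lib-001", "type": "EXTRACT", "parent": "sample-001"},
    {"name": "sample-001", "type": "SAMPLE", "parent": "patient-001"},
    {"name": "patient-001", "type": "BIOSOURCE", "parent": None},
]
export_dir = tempfile.mkdtemp()
chain_file = write_chain(export_dir, chain)
```

Because every file is self-contained, the importing side can handle each file in its own transaction.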

Change History (41)

comment:1 Changed 2 months ago by nicklas

(In [4346]) References #938: Release exporter should export files that can be imported on a different server

Added a first (and very simple) version of a JSON writer. It creates one JSON file for each raw bioassay which holds an array of all items in the cohort chain up to patient. So far, only the name, type and subtype of each item is exported.

comment:2 Changed 2 months ago by nicklas

  • Status changed from new to assigned

comment:3 Changed 8 weeks ago by nicklas

(In [4365]) References #938: Release exporter should export files that can be imported on a different server

The JSON writer now creates index.json before starting the export. This file contains some information about the list that is exported and is intended to be used by the relax importer to extract some information.

comment:4 Changed 8 weeks ago by nicklas

(In [4367]) References #938: Release exporter should export files that can be imported on a different server

The release exporter now creates the exportcomplete file to indicate that the export has been completed.

comment:5 Changed 8 weeks ago by nicklas

(In [4369]) References #938: Release exporter should export files that can be imported on a different server

Exporting the parent item name for each item.

comment:6 Changed 8 weeks ago by nicklas

(In [4371]) References #938: Release exporter should export files that can be imported on a different server

Re-factored the JSON export so that each CohortWriter implementation is responsible for generating its own part of the JSON array.

comment:7 Changed 7 weeks ago by nicklas

(In [4378]) References #938: Release exporter should export files that can be imported on a different server

Added support for exporting annotation types and annotation values. All annotation type definitions that are part of the release are exported to "annotationtypes.json" so that they can automatically be re-created on the importing server. The patient and library writers have been used for testing.

comment:8 Changed 7 weeks ago by nicklas

(In [4379]) References #938: Release exporter should export files that can be imported on a different server

Started to refactor the cohort annotation export to make it more flexible and easier to use for "virtual" annotation types.

comment:9 Changed 7 weeks ago by nicklas

(In [4380]) References #938: Release exporter should export files that can be imported on a different server

Some more changes to make it possible to format dates (and other values) as we want.

comment:10 Changed 7 weeks ago by nicklas

(In [4382]) References #938: Release exporter should export files that can be imported on a different server

Exporting an additional file, files.json, with paths to the JSON files for the cohort items. This should make things easier on the import side, which only has to read files.json instead of connecting via SSH and executing an ls command.
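A minimal Python sketch of how an importer might use files.json together with the exportcomplete marker. It assumes that files.json is a JSON array of paths relative to the export directory, which this ticket does not specify; the function name `list_chain_files` and the sample file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def list_chain_files(export_dir):
    """Return the paths of the per-chain JSON files listed in files.json.

    Assumption: files.json is a JSON array of paths relative to the
    export directory. Raises RuntimeError if the 'exportcomplete'
    marker file is missing.
    """
    export_dir = Path(export_dir)
    if not (export_dir / "exportcomplete").exists():
        raise RuntimeError("export has not been completed")
    with open(export_dir / "files.json", encoding="utf-8") as f:
        return [export_dir / rel for rel in json.load(f)]

# Build a tiny fake export directory for the example
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "files.json").write_text(
    json.dumps(["rba-001.json", "rba-002.json"]), encoding="utf-8")
(demo_dir / "exportcomplete").touch()
chain_files = list_chain_files(demo_dir)
```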

comment:11 Changed 7 weeks ago by nicklas

(In [4385]) References #938: Release exporter should export files that can be imported on a different server

Added some basic support for exporting file information. Implemented for FASTQ files on the merged level and FPKM files on the rawbioassay level.

The actual files are not copied. It has not yet been decided what the file structure is going to look like on the relax side, so the file export will probably change.

comment:12 Changed 7 weeks ago by nicklas

(In [4389]) References #938: Release exporter should export files that can be imported on a different server

Added support for exporting annotation values as project-specific annotations. Use RNAQC and RNAQC date as a test case on the RNA item since we always use the latest information here.

comment:13 Changed 6 weeks ago by nicklas

(In [4394]) References #938: Release exporter should export files that can be imported on a different server

Exporting the registration date.

comment:14 Changed 6 weeks ago by nicklas

(In [4396]) References #938: Release exporter should export files that can be imported on a different server

Exporting the creation date.

comment:15 Changed 6 weeks ago by nicklas

(In [4398]) References #938: Release exporter should export files that can be imported on a different server

RNAQC date needs to be formatted as a date.

comment:16 Changed 5 weeks ago by nicklas

(In [4402]) References #938: Release exporter should export files that can be imported on a different server

Exporting the DataFilesFolder annotation. We want to make this a project-specific annotation on the relax side since the idea is to create a new top-folder for every release.

comment:17 Changed 5 weeks ago by nicklas

(In [4404]) References #938: Release exporter should export files that can be imported on a different server

Exporting more metadata about files.

comment:18 Changed 5 weeks ago by nicklas

(In [4407]) References #938: Release exporter should export files that can be imported on a different server

Started to implement support for exporting "type" definitions other than annotation types. The first such case is data file types.

While it works as it is, I think we need to rename a few things to make them more generic (e.g. CohortAnnotationType).

comment:19 Changed 5 weeks ago by nicklas

(In [4408]) References #938: Release exporter should export files that can be imported on a different server

Renamed the CohortAnnotationType class to CohortTypeDef and other changes related to this.

Type definitions are now exported to typedefs.json instead of annotationtypes.json.

comment:20 Changed 5 weeks ago by nicklas

(In [4410]) References #938: Release exporter should export files that can be imported on a different server

Exporting FPKM file type.

comment:21 Changed 5 weeks ago by nicklas

(In [4412]) References #938: Release exporter should export files that can be imported on a different server

Exporting files linked to items with any-to-any links.

comment:22 Changed 5 weeks ago by nicklas

(In [4420]) References #938: Release exporter should export files that can be imported on a different server

Added a plugin parameter that asks for the release version to create. Before accepting the parameters, the file server is checked to make sure that the given version number doesn't already correspond to a directory in the release archive.

Added a ScriptWriter class that should be responsible for creating the bash scripts that are required for syncing the released data files between the project archive and the release archive. At the moment it only creates the mkdirs.sh script, which creates the directory structure that is needed for the release.
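A minimal Python sketch of what generating such a script could look like. This is not the ScriptWriter implementation; the function name `mkdirs_script`, the archive path and the directory names are assumptions for the example.

```python
import shlex

def mkdirs_script(release_root, directories):
    """Return the text of a bash script that creates each directory
    below release_root (hypothetical layout)."""
    lines = ["#!/bin/bash", "set -e"]
    # Sort and de-duplicate so the script is stable between runs
    for d in sorted(set(directories)):
        lines.append("mkdir -p " + shlex.quote(release_root + "/" + d))
    return "\n".join(lines) + "\n"

# Hypothetical release root and item directories
script = mkdirs_script("/archive/release/1.0", ["A/lib1", "A/lib2", "A/lib1"])
```

Using `mkdir -p` means the script can be re-run safely, and `shlex.quote` guards against paths that contain spaces or shell metacharacters.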

comment:23 Changed 4 weeks ago by nicklas

(In [4421]) References #938: Release exporter should export files that can be imported on a different server

Added support for creating the link script for linking to previously released files. An actual check for finding existing files has not been implemented yet (we simply assume that all files exist in release 0.9).

A temporary 'cat.sh' script (for debugging) that creates dummy files is generated in place of the 'rsync.sh' script.

comment:24 Changed 4 weeks ago by nicklas

(In [4422]) References #938: Release exporter should export files that can be imported on a different server

The rsync script is now being created. It seems to work, but we still need to implement checks for which files already exist (and should be linked) and which need to be copied.

The code is also a bit messy now when it comes to finding the correct path to sync to/from. We are converting back and forth between external and scan-b id too many times...

comment:25 Changed 4 weeks ago by nicklas

(In [4423]) References #938: Release exporter should export files that can be imported on a different server

Added support for checking for existing released files. To find existing files we first find the top-level directory for each release. Then for each release, we find all REAL files (ignoring symlinks). We store the result in a map that allows us to find the release version a file appears in.
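The lookup described above can be sketched in Python roughly as follows. This is an illustration against a local directory tree, not the Reggie code, and it assumes that every sub-directory of the archive root is one release; the function name `find_released_files` and the file names are hypothetical.

```python
import os
import tempfile

def find_released_files(archive_root):
    """Map each real file (path relative to its release directory)
    to the release version it is stored in. Symlinks are skipped."""
    released = {}
    for release in sorted(os.listdir(archive_root)):
        release_dir = os.path.join(archive_root, release)
        if not os.path.isdir(release_dir):
            continue
        for dirpath, _dirnames, filenames in os.walk(release_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                if os.path.islink(full):
                    continue  # a link refers to a file in another release
                released[os.path.relpath(full, release_dir)] = release
    return released

# Fake archive: release 1.0 links to a file that really lives in 0.9
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "0.9", "sample1"))
os.makedirs(os.path.join(root, "1.0", "sample1"))
real = os.path.join(root, "0.9", "sample1", "reads.fastq.gz")
with open(real, "w") as f:
    f.write("dummy")
os.symlink(real, os.path.join(root, "1.0", "sample1", "reads.fastq.gz"))
with open(os.path.join(root, "1.0", "sample1", "new.fpkm"), "w") as f:
    f.write("dummy")
released = find_released_files(root)
```

A file found in the map can be linked from the release it appears in; a file that is missing from the map has to be copied.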

comment:26 Changed 4 weeks ago by nicklas

(In [4427]) References #938: Release exporter should export files that can be imported on a different server

Cleaning up the script generation code.

  • Moved common parts to functions
  • Scripts are created with execute permission set
  • More checks and error handling

comment:27 Changed 4 weeks ago by nicklas

(In [4428]) References #938: Release exporter should export files that can be imported on a different server

Removed the auto-generated UUID since it has been fully replaced by the release version.

comment:28 Changed 4 weeks ago by nicklas

(In [4430]) References #938: Release exporter should export files that can be imported on a different server

Added "overwrite" option when exporting to a remote server.

When checking for existing files, the current release directory is ignored. This is needed when the "overwrite" option is in effect, since otherwise we would create symbolic links that point to themselves.

The 'rsync' script uses information from the ProjectArchive item to generate a default value for the PROJECTARCHIVE parameter.

comment:29 Changed 4 weeks ago by nicklas

(In [4432]) References #938: Release exporter should export files that can be imported on a different server

Files that have been marked for deletion are not included in the export.

comment:30 Changed 4 weeks ago by nicklas

(In [4434]) References #938: Release exporter should export files that can be imported on a different server

Fixed incorrect description for plugin parameter.

comment:31 Changed 4 weeks ago by nicklas

(In [4435]) References #938: Release exporter should export files that can be imported on a different server

Exporting platform and platform variant for raw bioassays.

comment:32 Changed 3 weeks ago by nicklas

(In [4437]) References #938: Release exporter should export files that can be imported on a different server

Adding and changing exported properties and annotations.

  • For all items: no dates
  • Case: Only "Yes" is allowed for consent (in case we accidentally include a "No" in the export)
  • Specimen: OriginalQuantityMilliGram, DaysToLab, MinutesToRNALater, BiopsyType, SpecimenType, Laterality, NofPieces, LinkedSpecimen

comment:33 Changed 3 weeks ago by nicklas

(In [4438]) References #938: Release exporter should export files that can be imported on a different server

Added support for white- and blacklisting files that should be included in the export.
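One way such a filter could work is sketched below in Python. The commit message does not say how the lists are expressed, so glob patterns, the rule that the blacklist takes precedence, and the name `is_included` are all assumptions.

```python
from fnmatch import fnmatch

def is_included(filename, whitelist=(), blacklist=()):
    """Decide if a file should be part of the export.

    Assumed rules: a blacklist match always excludes the file, and an
    empty whitelist means that everything else is allowed.
    """
    if any(fnmatch(filename, pattern) for pattern in blacklist):
        return False
    return not whitelist or any(fnmatch(filename, p) for p in whitelist)

# Hypothetical patterns and file names
only_fastq = ["*.fastq.gz"]
no_temp = ["*.tmp.*"]
keep = is_included("reads.fastq.gz", only_fastq, no_temp)
skip = is_included("reads.tmp.fastq.gz", only_fastq, no_temp)
```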

comment:34 Changed 3 weeks ago by nicklas

(In [4439]) References #938: Release exporter should export files that can be imported on a different server

Including some files at the aligned level in the sync scripts (no JSON).

comment:35 Changed 3 weeks ago by nicklas

(In [4442]) References #938: Release exporter should export files that can be imported on a different server

Re-factored loading of the related "GoodStain" sample so that it can be used with other writers than the StainedWriter.

comment:36 Changed 3 weeks ago by nicklas

(In [4443]) References #938: Release exporter should export files that can be imported on a different server

Added histology scores to the export for specimen.

comment:37 Changed 3 weeks ago by nicklas

(In [4444]) References #938: Release exporter should export files that can be imported on a different server

Fixed an UnsupportedOperationException issue with converting dates that are coming from the SQL server.

comment:38 Changed 2 weeks ago by nicklas

(In [4448]) References #938: Release exporter should export files that can be imported on a different server

Exporting some more annotations for RNA and Library items.

comment:39 Changed 3 days ago by nicklas

(In [4461]) References #962 and #938. The release exporter is now exporting the reference date and source. The date is converted to a year-only value (integer).
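A minimal Python sketch of the year-only conversion, for illustration. The function name `to_year` and the handling of a missing date are assumptions, and the sample date is invented.

```python
from datetime import date

def to_year(value):
    """Return only the year of a date as an int, or None if there is no date."""
    return value.year if value is not None else None

# Hypothetical reference date
reference_year = to_year(date(2017, 5, 23))
```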

comment:40 Changed 3 days ago by nicklas

(In [4462]) References #938. Added support for exporting annotation type definitions with units. Tested with specimen writer.

comment:41 Changed 3 hours ago by nicklas

(In [4464]) References #938. Added "His" prefix to annotations related to histology score. "HisName" is also exported to indicate the existence of a histology item even if there are no scores.
