Opened 8 years ago
Closed 8 years ago
#938 closed enhancement (fixed)
Release exporter should export files that can be imported on a different server
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | critical | Milestone: | Reggie v4.10 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description
The tab-separated files that the Release exporter currently exports are not well suited for importing. It would be better to divide the data so that an entire chain of items, from raw bioassay up to biosource, can be imported from a single file. This would make it easier to handle the multiple transactions that are needed (there is too much data to import in a single transaction).
The current idea is to produce one JSON-formatted file with data for each item chain. The files are named after the raw bioassay that is the starting point. The files should contain all annotations and file references that are needed on the importing side.
Change History (59)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
Status: | new → assigned |
---|
comment:3 by , 8 years ago
(In [4365]) References #938: Release exporter should export files that can be imported on a different server
The JSON writer now creates index.json before starting the export. This file contains some information about the list that is exported and is intended to be read by the relax importer.
comment:4 by , 8 years ago
comment:5 by , 8 years ago
comment:6 by , 8 years ago
comment:7 by , 8 years ago
(In [4378]) References #938: Release exporter should export files that can be imported on a different server
Added support for exporting annotation types and annotation values. All annotation type definitions that are part of the release are exported to "annotationtypes.json" so that they can automatically be re-created on the importing server. The patient and library writers have been used for testing.
comment:8 by , 8 years ago
comment:9 by , 8 years ago
comment:10 by , 8 years ago
(In [4382]) References #938: Release exporter should export files that can be imported on a different server
Exporting an additional file, files.json, with paths to the JSON files for the cohort items. This should make things easier on the import side, which just has to read files.json instead of connecting via SSH and executing an ls command.
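A minimal sketch of the files.json idea, in Python for brevity (Reggie itself is Java). The layout of files.json is not described in this ticket; the sketch assumes a flat JSON array of relative paths, and the function names and the set of skipped files are hypothetical.

```python
import json
from pathlib import Path

# Metadata files that are not cohort item files (assumed list).
SKIP = {"files.json", "index.json", "annotationtypes.json"}

def write_files_index(export_dir):
    """Export side: list all cohort item JSON files in files.json."""
    export_dir = Path(export_dir)
    paths = sorted(
        p.relative_to(export_dir).as_posix()
        for p in export_dir.rglob("*.json")
        if p.name not in SKIP
    )
    (export_dir / "files.json").write_text(json.dumps(paths, indent=2), encoding="utf-8")
    return paths

def read_files_index(export_dir):
    """Import side: read the list instead of listing the directory remotely."""
    text = (Path(export_dir) / "files.json").read_text(encoding="utf-8")
    return json.loads(text)
```

With this, the importer only needs to fetch a single known file to discover everything else in the export.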
comment:11 by , 8 years ago
(In [4385]) References #938: Release exporter should export files that can be imported on a different server
Added some basic support for exporting file information. Implemented for FASTQ files on the merged level and FPKM files on the rawbioassay level.
The actual files are not copied. It has not been decided yet what the file structure will look like on the relax side, so the file export will probably change.
comment:12 by , 8 years ago
comment:13 by , 8 years ago
comment:14 by , 8 years ago
comment:15 by , 8 years ago
comment:16 by , 8 years ago
comment:17 by , 8 years ago
comment:18 by , 8 years ago
(In [4407]) References #938: Release exporter should export files that can be imported on a different server
Started to implement support for exporting "type" definitions other than annotation types. The first such case is exporting data file types.
While it works as it is, I think we need to rename a few things to make them more generic (e.g. CohortAnnotationType, etc.)
comment:19 by , 8 years ago
comment:20 by , 8 years ago
comment:21 by , 8 years ago
comment:22 by , 8 years ago
(In [4420]) References #938: Release exporter should export files that can be imported on a different server
Added a plugin parameter that asks for the release version to create. Before accepting the parameters, the file server is checked to make sure that the given version number doesn't already correspond to a directory in the release archive.
Added a ScriptWriter class that should be responsible for creating the bash scripts that are required for syncing the released data files between the project archive and release archive. At the moment it only creates the mkdirs.sh script, which creates the directory structure that is needed for the release.
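As an illustration only, generating a script like mkdirs.sh could look roughly like the sketch below (Python rather than the Java used by ScriptWriter). The function name and the exact script contents are assumptions; the ticket does not show what the real script looks like.

```python
import shlex

def mkdirs_script(directories):
    """Return the text of a bash script that creates the given directories."""
    lines = ["#!/bin/bash", "set -e"]
    # De-duplicate and sort so the generated script is stable between runs.
    for directory in sorted(set(directories)):
        # shlex.quote protects paths that contain spaces or shell metacharacters.
        lines.append(f"mkdir -p {shlex.quote(directory)}")
    return "\n".join(lines) + "\n"
```

Using mkdir -p means that parent directories are created as needed and that re-running the script is harmless.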
comment:23 by , 8 years ago
(In [4421]) References #938: Release exporter should export files that can be imported on a different server
Added support for creating the link script for linking to previously released files. An actual check for finding existing files has not been implemented yet (we simply assume that all files exist in release 0.9).
A temporary 'cat.sh' script (for debugging), which creates dummy files, is generated in place of the 'rsync.sh' script.
comment:24 by , 8 years ago
(In [4422]) References #938: Release exporter should export files that can be imported on a different server
The rsync script is now being created. It seems to work, but we still need to implement checks for which files already exist (and should be linked) and which need to be copied.
The code is also a bit messy now when it comes to finding the correct path to sync to/from. We are converting back and forth between external and scan-b id too many times...
comment:25 by , 8 years ago
(In [4423]) References #938: Release exporter should export files that can be imported on a different server
Added support for checking for existing released files. To find existing files we first find the top-level directory for each release. Then for each release, we find all REAL files (ignoring symlinks). We store the result in a map that allows us to find the release version a file appears in.
comment:26 by , 8 years ago
comment:27 by , 8 years ago
comment:28 by , 8 years ago
(In [4430]) References #938: Release exporter should export files that can be imported on a different server
Added "overwrite" option when exporting to a remote server.
When checking for existing files, the current release directory is ignored. This is needed when the "overwrite" option is in effect, since otherwise we would create symbolic links that point to themselves.
The 'rsync' script uses information from the ProjectArchive item to generate a default value for the PROJECTARCHIVE parameter.
comment:29 by , 8 years ago
comment:30 by , 8 years ago
comment:31 by , 8 years ago
comment:32 by , 8 years ago
(In [4437]) References #938: Release exporter should export files that can be imported on a different server
Adding and changing exported properties and annotations.
For all items: no dates
Case: Only "Yes" is allowed for consent (in case we accidentally include a No in the export)
Specimen: OriginalQuantityMilliGram, DaysToLab, !MinutesToRNALater, BiopsyType, SpecimenType, !Laterality, NofPieces, LinkedSpecimen
comment:33 by , 8 years ago
comment:34 by , 8 years ago
comment:35 by , 8 years ago
comment:36 by , 8 years ago
comment:37 by , 8 years ago
comment:38 by , 8 years ago
comment:39 by , 8 years ago
comment:40 by , 8 years ago
comment:41 by , 8 years ago
comment:42 by , 8 years ago
comment:43 by , 8 years ago
(In [4466]) References #938. Batch index annotations are now created with random and unique proxy values that are mapped to an index value after all items have been exported. The mapping is saved in the batch-index-lookup.json file which is used on the importing side to map the proxy to the batch index value.
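A rough sketch of the proxy idea, in Python. The ticket does not say how the proxy values are generated or how the final index values are chosen; this sketch assumes UUID-based proxies and numbers them 1..N in shuffled order. The class and method names are hypothetical.

```python
import random
import uuid

class BatchIndexProxies:
    """Hands out one random, unique proxy value per batch during the export."""

    def __init__(self):
        self._proxy_by_batch = {}

    def proxy_for(self, batch_key):
        # The same batch always gets the same proxy value.
        if batch_key not in self._proxy_by_batch:
            self._proxy_by_batch[batch_key] = uuid.uuid4().hex
        return self._proxy_by_batch[batch_key]

    def build_lookup(self, rng=None):
        """After the export: map every proxy to an index value (1..N)."""
        proxies = list(self._proxy_by_batch.values())
        (rng or random).shuffle(proxies)
        return {proxy: index for index, proxy in enumerate(proxies, start=1)}
```

The dictionary returned by build_lookup() is what would be serialized to batch-index-lookup.json, so that the importing side can replace each proxy with its index value.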
comment:44 by , 8 years ago
(In [4468]) References #938. Started to refactor the CohortAnnotationTypeFactory to make it easier (=less code) to create annotation type definitions for the export. Used by the Patient, Case, Specimen and RNA exporters which should now be complete (except that BASE has no unit for concentration that we need for the NdConc annotation).
comment:45 by , 8 years ago
comment:46 by , 8 years ago
comment:47 by , 8 years ago
comment:48 by , 8 years ago
comment:49 by , 8 years ago
comment:50 by , 8 years ago
comment:51 by , 8 years ago
comment:52 by , 8 years ago
(In [4479]) References #938. Implemented support for converting INCA values to some other value. The currently implemented rules:
- IncaExportDate: Converted to year+quarter
- INCA_A030DiaDat: Converted to year
- All other INCA date annotations: Converted to number of days relative to INCA_A030DiaDat.
- INCA_A000Alder: Converted to the nearest higher 5-year value.
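The rules above can be sketched as small conversion functions (Python, illustration only). The output format for year+quarter and the handling of ages that are already a multiple of 5 are not stated in the ticket; both are assumptions here, and all function names are hypothetical.

```python
from datetime import date

def to_year_quarter(d):
    """IncaExportDate: keep only year and quarter (format is an assumption)."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

def to_year(d):
    """INCA_A030DiaDat: keep only the year."""
    return d.year

def days_relative_to_diagnosis(d, diagnosis_date):
    """Other INCA dates: number of days relative to INCA_A030DiaDat."""
    return (d - diagnosis_date).days

def round_age_up(age, step=5):
    """INCA_A000Alder: round up to a multiple of 5.

    Assumption: an age that is already a multiple of 5 is left unchanged.
    """
    return -(-age // step) * step
```

For example, a date of 2016-06-01 with a diagnosis date of 2016-05-17 becomes 15 days, and an age of 62 becomes 65.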
comment:54 by , 8 years ago
(In [4481]) References #938. Implemented an option for whether the expression matrix (and related files) should be created or not. Implemented an option for whether JSON files should be created or not.
When doing a local export, both options are available for user configuration. The matrix export is enabled by default while the JSON export is disabled.
When doing a remote export, only the matrix export option is available (disabled by default). JSON file export can't be turned off.
The progress bar should now also go from 0 to 100% regardless of which options are selected.
comment:55 by , 8 years ago
comment:56 by , 8 years ago
comment:57 by , 8 years ago
comment:58 by , 8 years ago
comment:59 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
(In [4346]) References #938: Release exporter should export files that can be imported on a different server
Added a first (and very simple) version of a JSON writer. It creates one JSON file for each raw bioassay which holds an array of all items in the cohort chain up to patient. So far, only the name, type and subtype of each item is exported.
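To illustrate the shape of such a file, here is a minimal sketch in Python (the actual writer is part of the Java code in Reggie). The function name, the dictionary keys and the example item names are hypothetical; only the idea of one file per raw bioassay holding an array with name, type and subtype comes from the comment above.

```python
import json
from pathlib import Path

def write_chain(out_dir, chain):
    """Write one JSON file for an item chain, named after its raw bioassay.

    `chain` is a list of dicts ordered from raw bioassay up to patient.
    """
    raw_bioassay = chain[0]
    path = Path(out_dir) / f"{raw_bioassay['name']}.json"
    # Only name, type and subtype are exported in this first version.
    items = [
        {"name": item["name"], "type": item["type"], "subtype": item.get("subtype")}
        for item in chain
    ]
    path.write_text(json.dumps(items, indent=2), encoding="utf-8")
    return path
```

Because every chain ends up in its own file, the importing side can process one file per transaction.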