Opened 6 years ago
Closed 6 years ago
#1099 closed task (fixed)
The cohort exporter should have support for exporting downstream items
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | Relax v1.4 |
Component: | net.sf.basedb.relax | Keywords: | |
Cc: | | | |
Description
The cohort exporter (see #987) currently takes an item list and exports data tables for all items and their parent items.
The functionality should be extended to also support exporting the child items. The major use case is that it should be possible to select a list of biosource items (=all items in a given release) and then export all child items.
One file for each child subtype should be exported (including splitting of StringTie and Cufflinks). Since the number of child items varies, the number of lines in each file will of course also vary, and it will not be possible to use the line number as a key for matching data across different files. It is of course possible to use item names and split on '.' to find substrings that can be used for matching, but this tends to be difficult since not all items follow the same path in the lab (for example, NeoPrep libraries go directly from RNA to Lib, while other libraries have mRNA and cDNA in between). I think the exported files should at least have a "Parent name" column to make the mapping a little easier.
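A minimal sketch of the proposed downstream output, assuming a hypothetical `Item` record (name, subtype, parent) instead of the real BASE/Relax classes: child items are grouped into one in-memory "file" per subtype, and each row starts with the item name followed by the parent name, so that rows can be matched by name rather than by line number.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical item model for illustration only; not the BASE/Relax API.
record Item(String name, String subtype, Item parent) {}

class ChildExportSketch {
    /**
     * Groups child items by subtype (one "file" per subtype) and emits one
     * tab-separated row per item: the item name first, then the parent name.
     */
    static Map<String, List<String>> exportBySubtype(List<Item> children) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (Item child : children) {
            List<String> rows = files.computeIfAbsent(child.subtype(), k -> new ArrayList<>());
            String parentName = child.parent() == null ? "" : child.parent().name();
            rows.add(child.name() + "\t" + parentName);
        }
        return files;
    }
}
```

Since each subtype gets its own list, the row counts differ between files, which is exactly why the parent name column is needed.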
Change History (15)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
(In [5220]) References #1099: The cohort exporter should have support for exporting downstream items
Added a "Parent item name" column to all exported files. The columns have been re-organized so that the current item is always the first column, followed by the parent item name and the root item name. For raw bioassays the library name is also included.
comment:3 by , 6 years ago
(In [5221]) References #1099: The cohort exporter should have support for exporting downstream items
Using the raw data type as the subtype for raw bioassays splits the output into different files for each raw data type. But this causes a problem if the source list is a raw bioassay list with different raw data types, since there is a requirement that all subtypes are equal (the same problem would occur when mixing different subtypes of, for example, extracts).
I think we need to re-work the upwards export to allow different subtypes at the same level (for example, we could start with a list containing both DNA and RNA items). Of course, this would also mean that files exported from the bottom up no longer have the same number of rows, and it will not be possible to match items by row number. We would then also need to check that no duplicates are written to a higher-level file (for example, the same lysate should only be written once even if we have both an RNA and a DNA item in the source list).
I think this may even be an advantage, since item matching must then always be done by name, no matter whether the export is from the top or from the bottom. Mistakes resulting from using row-number matching on the wrong type of export can then be avoided.
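Matching by name could look roughly like this. `ChildRow`, `ParentRow` and `matchByName` are hypothetical names used only for illustration: the sketch indexes the parent file by item name and looks up each child's parent name value, so the result does not depend on row order or row count.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parsed rows: one line from a child file and one from its parent file.
record ChildRow(String name, String parentName) {}
record ParentRow(String name, String subtype) {}

class NameMatchSketch {
    /**
     * Matches child rows to parent rows via the parent name column instead of
     * by row number. Returns child name -> parent row; children whose parent
     * is missing from the parent file are left out.
     */
    static Map<String, ParentRow> matchByName(List<ChildRow> children, List<ParentRow> parents) {
        Map<String, ParentRow> byName = new HashMap<>();
        for (ParentRow p : parents) {
            byName.put(p.name(), p);
        }
        Map<String, ParentRow> matched = new HashMap<>();
        for (ChildRow c : children) {
            ParentRow p = byName.get(c.parentName());
            if (p != null) {
                matched.put(c.name(), p);
            }
        }
        return matched;
    }
}
```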
comment:4 by , 6 years ago
(In [5222]) References #1099: The cohort exporter should have support for exporting downstream items
Each item should now appear on only one row in a file, whether we are exporting top-down or bottom-up.
The "Root item" column was also removed since it will not work in bottom-up mode unless rows are duplicated. Also removed some other RawBioAssay-specific columns that are not needed.
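One way to get the "one row per item" behavior in bottom-up mode is sketched below. It assumes a hypothetical `Item` record and that item names are unique within a subtype: every item passed on the way from a source item up to the root is added to an insertion-ordered set for its subtype, so a shared parent (such as a lysate with both an RNA and a DNA child in the source list) is written only once.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical item model for illustration only; not the BASE/Relax API.
record Item(String name, String subtype, Item parent) {}

class BottomUpExportSketch {
    /**
     * Walks from each source item up to the root and records every item it
     * passes, grouped by subtype. The LinkedHashSet keeps first-seen order and
     * ensures each item name is added at most once per subtype file.
     */
    static Map<String, Set<String>> collectAncestors(List<Item> sources) {
        Map<String, Set<String>> files = new LinkedHashMap<>();
        for (Item item : sources) {
            for (Item cur = item; cur != null; cur = cur.parent()) {
                files.computeIfAbsent(cur.subtype(), k -> new LinkedHashSet<>()).add(cur.name());
            }
        }
        return files;
    }
}
```

Keeping the rows unique per file is also what makes a "Root item" column impractical here: a single lysate row could belong to several source items.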
One remaining issue with splitting RawBioAssays into different files is that all annotation types are included in all files even if they are not used. For example, the PILOT_* annotations should only be included for Cufflinks data. This can probably be solved by treating the RawDataType as a subtype that is matched against an annotation type category.
comment:5 by , 6 years ago
(In [5223]) References #1099: The cohort exporter should have support for exporting downstream items
Annotation types are now restricted by category (=raw data type) also for RawBioAssay items. The only problem now is that nothing is exported since the importer is not creating categories, but that should be relatively easy to fix (can also be done manually as a workaround).
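The category restriction can be pictured as a simple filter. The `AnnotationType` record and the annotation names in the test are made up for illustration: an annotation type is only exported for a raw data type listed among its categories, so types without any category produce no columns at all, matching the "nothing is exported" situation described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical annotation type model: a name plus the categories it belongs to.
record AnnotationType(String name, Set<String> categories) {}

class CategoryFilterSketch {
    /**
     * Returns the names of the annotation types whose categories include the
     * given raw data type. Types with no categories are never selected.
     */
    static List<String> columnsFor(String rawDataType, List<AnnotationType> types) {
        List<String> columns = new ArrayList<>();
        for (AnnotationType t : types) {
            if (t.categories().contains(rawDataType)) {
                columns.add(t.name());
            }
        }
        return columns;
    }
}
```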
comment:6 by , 6 years ago
Regarding categories for RawBioAssay annotations: these can't be set by the importer, since the exported file contains no information about which raw data type an annotation is used by. They must be set by the exporter plug-in in Reggie. See #1100.
comment:7 by , 6 years ago
(In [5227]) References #1099: The cohort exporter should have support for exporting downstream items
Export the top item name (=patient name) in all files. I think this could be helpful in a lot of cases and also makes the specimen export much more useful since it contains the names of specimen, case and patient in a single place.
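Finding the top item amounts to following parent links until an item without a parent is reached. This sketch uses a hypothetical two-field `Item` record and assumes the patient is the only item in the chain with no parent; the item names in the test are made up.

```java
// Hypothetical item model for illustration only; not the BASE/Relax API.
record Item(String name, Item parent) {}

class TopItemSketch {
    /** Follows parent links until an item without a parent (e.g. the patient) is reached. */
    static String topItemName(Item item) {
        Item cur = item;
        while (cur.parent() != null) {
            cur = cur.parent();
        }
        return cur.name();
    }
}
```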
comment:8 by , 6 years ago
(In [5228]) References #1099: The cohort exporter should have support for exporting downstream items
All "name" columns are now exported in camel case (e.g. "SpecimenName" instead of "Specimen name").
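The header conversion could be done with a small helper like the hypothetical `toCamelCase` below. It capitalizes each space-separated word and joins them, which gives the leading-capital form used in the "SpecimenName" example; this is an illustration, not the exporter's actual code.

```java
class ColumnNameSketch {
    /** Converts a spaced header such as "Specimen name" into "SpecimenName". */
    static String toCamelCase(String header) {
        StringBuilder sb = new StringBuilder();
        for (String word : header.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for an empty or all-blank header
            }
            sb.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
        }
        return sb.toString();
    }
}
```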
comment:9 by , 6 years ago
comment:10 by , 6 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:11 by , 6 years ago
comment:12 by , 6 years ago
comment:13 by , 6 years ago
comment:14 by , 6 years ago
comment:15 by , 6 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
(In [5219]) References #1099: The cohort exporter should have support for exporting downstream items
The basic functionality is in place. Child items are loaded and data files are created and filled with data. There is not yet any parent name column and raw bioassays are all in a single file.