Opened 5 years ago

Closed 5 years ago

#1099 closed task (fixed)

The cohort exporter should have support for exporting downstream items

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Relax v1.4
Component: net.sf.basedb.relax
Keywords:
Cc:

Description

The cohort exporter (see #987) currently takes an item list and exports data tables for all items and their parent items.

The functionality should be extended with support for also exporting the child items. The major use case is that it should be possible to select a list of biosource items (=all items in a given release) and then export all child items.

One file should be exported for each child subtype (including separate files for StringTie and Cufflinks). Since the number of child items varies, the number of lines in each file will also vary, and it will not be possible to use the line number as a key for matching data across files. It is possible to split item names on '.' to find substrings that can be used for matching, but this tends to be difficult since not all items follow the same path in the lab (for example, NeoPrep libraries go directly from RNA to Lib, while other libraries have mRNA and cDNA in between). I think the exported files should at least have a "Parent name" column to make the mapping a little easier.
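The proposed file layout can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual Relax code: the `Item` class and the `group_children_by_subtype` function are hypothetical stand-ins, and the item names are made up. Each child item is routed to a file keyed by its subtype (for raw bioassays, the raw data type, so StringTie and Cufflinks end up in separate files), and every row carries the parent name so that rows can be matched by name rather than by line number.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a BASE item (not the real BASE API)."""
    name: str
    subtype: str  # for raw bioassays: the raw data type, e.g. "Cufflinks"
    parent: Optional["Item"] = None


def group_children_by_subtype(children):
    """Return {subtype: [(item name, parent name), ...]}, one entry per output file.

    The parent name column lets files with different row counts be
    joined on names instead of on line numbers.
    """
    files = defaultdict(list)
    for child in children:
        parent_name = child.parent.name if child.parent is not None else ""
        files[child.subtype].append((child.name, parent_name))
    return dict(files)


# Made-up example: one library with a Cufflinks and a StringTie raw bioassay.
rna = Item("RNA-1", "RNA")
lib = Item("Lib-1", "Library", rna)
cufflinks = Item("Raw-1", "Cufflinks", lib)
stringtie = Item("Raw-2", "StringTie", lib)

for subtype, rows in group_children_by_subtype([lib, cufflinks, stringtie]).items():
    print(subtype, rows)
```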

Change History (15)

comment:1 by Nicklas Nordborg, 5 years ago

(In [5219]) References #1099: The cohort exporter should have support for exporting downstream items

The basic functionality is in place. Child items are loaded and data files are created and filled with data. There is not yet any parent name column and raw bioassays are all in a single file.

comment:2 by Nicklas Nordborg, 5 years ago

(In [5220]) References #1099: The cohort exporter should have support for exporting downstream items

Added a "Parent item name" column to all exported files. The columns have been reorganized so that the current item is always the first column, followed by the parent item name and the root item name. For raw bioassays the library name is also included.

comment:3 by Nicklas Nordborg, 5 years ago

(In [5221]) References #1099: The cohort exporter should have support for exporting downstream items

Using the raw data type as the subtype for raw bioassays splits the output into a separate file for each raw data type. But this causes a problem if the source list is a raw bioassay list with different raw data types, since all subtypes are required to be equal (the same problem would occur when mixing different subtypes of, for example, extracts).

I think we need to rework the upward export to allow different subtypes at the same level (for example, we could start with a list containing both DNA and RNA items). Of course, this would also mean that files exported from the bottom up no longer have the same number of rows, and it will not be possible to match items by row number. We would then also need to check that no duplicates are written to a higher-level file (for example, the same lysate should be written only once even if we have both an RNA and a DNA in the source list).

I think this may even be an advantage, since item matching must then always be done by name, no matter if the export is top-down or bottom-up. Mistakes resulting from using row-number matching on the wrong type of export can then be avoided.
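A bottom-up export that allows mixed subtypes in the source list and writes each ancestor only once might look like the sketch below. It is an illustrative Python sketch, not the Relax implementation; `Item` and `export_bottom_up` are hypothetical, and items are compared by value, which assumes no two distinct items share the same name, subtype and parent. Because the whole parent chain is written the first time an item is seen, the walk can stop as soon as it reaches an item that has already been written.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a BASE item (not the real BASE API)."""
    name: str
    subtype: str
    parent: Optional["Item"] = None


def export_bottom_up(source_items):
    """Walk from each source item up to the root, writing each item at most once.

    Returns {subtype: [(item name, parent name), ...]}. Source items may have
    different subtypes at the same level (e.g. DNA and RNA), so the files do
    not share row numbers and must be matched on names.
    """
    files = defaultdict(list)
    written = set()  # compared by value: assumes (name, subtype, parent) is unique
    for item in source_items:
        current = item
        while current is not None and current not in written:
            written.add(current)
            parent_name = current.parent.name if current.parent is not None else ""
            files[current.subtype].append((current.name, parent_name))
            current = current.parent
        # Stopping on an already-written item is safe: its ancestors were
        # written together with it the first time it was reached.
    return dict(files)


# Made-up example: a DNA and an RNA extracted from the same lysate.
lysate = Item("Lys-1", "Lysate")
dna = Item("DNA-1", "DNA", lysate)
rna = Item("RNA-1", "RNA", lysate)
files = export_bottom_up([dna, rna])
print(files["Lysate"])  # the shared lysate is written only once
```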

comment:4 by Nicklas Nordborg, 5 years ago

(In [5222]) References #1099: The cohort exporter should have support for exporting downstream items

Each item should now appear on only one row in a file, whether we are exporting top-down or bottom-up.

The "Root item" column was also removed since it will not work in bottom-up mode unless rows are duplicated. Also removed some other RawBioAssay-specific columns that are not needed.

One remaining issue with splitting RawBioAssays into different files is that all annotation types are included in all files, even if they are not used. For example, the PILOT_* annotations should only be included for Cufflinks data. This can probably be solved by treating the RawDataType as a subtype that is matched against an annotation type category.

comment:5 by Nicklas Nordborg, 5 years ago

(In [5223]) References #1099: The cohort exporter should have support for exporting downstream items

Annotation types are now restricted by category (=raw data type) also for RawBioAssay items. The only problem now is that nothing is exported since the importer is not creating categories, but that should be relatively easy to fix (can also be done manually as a workaround).
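The category restriction can be illustrated with a small filter. This Python sketch is an assumption about the matching rule, not Relax's code: `AnnotationType` and `annotation_columns` are hypothetical, and the annotation type names are made up (only the PILOT_* prefix comes from the ticket). An annotation type is exported for a given raw data type only if one of its categories matches it, which also shows why nothing is exported while the categories are missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationType:
    """Hypothetical stand-in for a BASE annotation type."""
    name: str
    categories: frozenset = frozenset()


def annotation_columns(subtype, annotation_types):
    """Keep only annotation types that have a category matching the subtype.

    For raw bioassays the subtype is the raw data type, so an annotation
    type without a matching category is not exported at all.
    """
    return [at.name for at in annotation_types if subtype in at.categories]


# Made-up annotation types for illustration.
types = [
    AnnotationType("PILOT_EXAMPLE", frozenset({"Cufflinks"})),
    AnnotationType("SHARED_EXAMPLE", frozenset({"Cufflinks", "StringTie"})),
    AnnotationType("UNCATEGORIZED_EXAMPLE"),
]
print(annotation_columns("StringTie", types))
```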

comment:6 by Nicklas Nordborg, 5 years ago

Regarding categories for RawBioAssay annotations: these can't be set by the importer, since the exported file contains no information about which raw data type an annotation is used by. They must be set by the exporter plug-in in Reggie. See #1100.

comment:7 by Nicklas Nordborg, 5 years ago

(In [5227]) References #1099: The cohort exporter should have support for exporting downstream items

Export the top item name (=patient name) in all files. I think this could be helpful in a lot of cases and also makes the specimen export much more useful since it contains the names of specimen, case and patient in a single place.

comment:8 by Nicklas Nordborg, 5 years ago

(In [5228]) References #1099: The cohort exporter should have support for exporting downstream items

All "name" columns are now exported in camel case (e.g. "SpecimenName" instead of "Specimen name").
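The header renaming can be expressed as a one-line transformation. This is a hypothetical Python helper: the ticket gives only the "Specimen name" example, so the exact rule applied to other headers is an assumption. Each space-separated word has its first letter upper-cased and the words are joined, leaving any existing capitals untouched.

```python
def to_camel_case(label: str) -> str:
    """Join space-separated words, capitalizing the first letter of each.

    Only the first letter of each word is changed, so existing capitals
    (e.g. "RNA") are preserved.
    """
    return "".join(word[:1].upper() + word[1:] for word in label.split())


for label in ["Specimen name", "Parent item name"]:
    print(label, "->", to_camel_case(label))
```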

comment:9 by Nicklas Nordborg, 5 years ago

(In [5229]) References #1099: The cohort exporter should have support for exporting downstream items

The cohort exporter now also works with child items if the start list is a list with derived bioassays.

comment:10 by Nicklas Nordborg, 5 years ago

Owner: set to Nicklas Nordborg
Status: new → assigned

comment:11 by Nicklas Nordborg, 5 years ago

(In [5238]) References #1099: The cohort exporter should have support for exporting downstream items

Added a parameter for controlling if child items should be exported or not.

comment:12 by Nicklas Nordborg, 5 years ago

(In [5241]) References #1099: The cohort exporter should have support for exporting downstream items

The installation wizard is now creating categories for Cufflinks and StringTie annotations.

comment:13 by Nicklas Nordborg, 5 years ago

(In [5244]) References #1099: The cohort exporter should have support for exporting downstream items

Changes to the Level3 exporter due to changes made in functionality that is shared with the cohort exporter.

comment:14 by Nicklas Nordborg, 5 years ago

(In [5246]) References #1099: The cohort exporter should have support for exporting downstream items

Added a check to the cohort and level3 exporters that verifies that all items in the selected list are shared to the current project.
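A check of this kind can be sketched as below. It is a hypothetical Python illustration, not BASE's sharing API: `verify_shared_to_project` and the `shared_to` mapping (item name to the set of projects it is shared to) are assumptions made for the example. All unshared items are collected first so that a single error message can list every offending item.

```python
def verify_shared_to_project(item_names, shared_to, project):
    """Raise PermissionError if any listed item is not shared to `project`.

    `shared_to` maps an item name to the set of project names it is shared
    to; this is a hypothetical stand-in for BASE's sharing model.
    """
    unshared = [name for name in item_names if project not in shared_to.get(name, set())]
    if unshared:
        raise PermissionError(
            f"{len(unshared)} item(s) not shared to project {project!r}: " + ", ".join(unshared)
        )


# Made-up example: Item-2 is not shared to ProjectA.
sharing = {"Item-1": {"ProjectA"}, "Item-2": set()}
try:
    verify_shared_to_project(["Item-1", "Item-2"], sharing, "ProjectA")
except PermissionError as err:
    print(err)
```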

comment:15 by Nicklas Nordborg, 5 years ago

Resolution: fixed
Status: assigned → closed