Opened 6 years ago

Closed 6 years ago

#1016 closed task (fixed)

Implement Stringtie step in the Hisat pipeline

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.15
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

This is similar to the Cufflinks step in the legacy pipeline. It will produce some different but similar files. No data should be imported into the database. We need to define a new rawdata type.

Change History (16)

comment:1 by Nicklas Nordborg, 6 years ago

Status: new → assigned

comment:2 by Nicklas Nordborg, 6 years ago

(In [4659]) References #1016: Implement Stringtie step in the Hisat pipeline

Installing an item list (Stringtie Pipeline) for AlignedSequences that should be processed with Stringtie.

comment:3 by Nicklas Nordborg, 6 years ago

(In [4660]) References #1016: Implement Stringtie step in the Hisat pipeline

The auto-confirm after Hisat now adds the resulting AlignedSequences item to the Stringtie list if it passes the rules for auto-confirmation.

The manual "Confirm Hisat alignment" wizard has also been updated to add items to the Stringtie list.

comment:4 by Nicklas Nordborg, 6 years ago

(In [4661]) References #1016: Implement Stringtie step in the Hisat pipeline

Started with the "Start Stringtie" wizard. It is possible to select all parameters needed to be able to create the job. The actual job/script generation is still an empty skeleton.

The code for selecting Protocol, Software and Array design currently lists items related to the Cufflinks pipeline. We need a way to separate the items, similar to the AlignmentType annotation used to separate Tophat and Hisat. This should work well for software and protocol. The array design may need more thought since it is currently linked to the "Sequencing/Expression-like" platform item defined by BASE. This is a rather Cufflinks-centric approach and requires a GTF file for the array design, and FPKM files for the "raw data".

comment:5 by Nicklas Nordborg, 6 years ago

(In [4662]) References #1016: Implement Stringtie step in the Hisat pipeline

Added ExpressionType annotation type which can take values 'Cufflinks' or 'Stringtie'. It should be used on protocol and software items (feature extraction) to make it possible to filter out software and protocols for the stringtie/cufflinks wizards.
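
The filtering idea can be sketched as follows. This is only an illustration in Python, not the Reggie/BASE code: the `Item` class, the `filter_by_expression_type` helper and the item names are hypothetical, and the annotation values are the ones mentioned in this comment.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a protocol or software item."""
    name: str
    annotations: dict = field(default_factory=dict)


def filter_by_expression_type(items, expression_type):
    """Keep only items whose ExpressionType annotation matches exactly.

    Items without the annotation are excluded.
    """
    return [
        item for item in items
        if item.annotations.get("ExpressionType") == expression_type
    ]


# Hypothetical items; the names are made up for the example.
software = [
    Item("cufflinks-software", {"ExpressionType": "Cufflinks"}),
    Item("stringtie-software", {"ExpressionType": "Stringtie"}),
    Item("unannotated-software"),
]

stringtie_software = filter_by_expression_type(software, "Stringtie")
stringtie_names = [item.name for item in stringtie_software]
```

With a filter like this, a wizard would list only the items annotated for its own pipeline, and items without the annotation would not appear in either wizard.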

Added auto-confirmation code for starting Stringtie. It should work once the script generation is implemented in StringtieJobCreator.

comment:6 by Nicklas Nordborg, 6 years ago

(In [4663]) References #1016: Implement Stringtie step in the Hisat pipeline

"Stringtie" should be "StringTie". Changed in a lot of places.

comment:7 by Nicklas Nordborg, 6 years ago

(In [4664]) References #1016: Implement Stringtie step in the Hisat pipeline

A StringTie script is now generated and submitted to the cluster. Generated files are linked back to BASE.

The raw bioassay item that is created currently uses the "cufflinks" raw data type and platform. This must be changed, since the resulting item now shows up in a lot of places in Reggie that expect Cufflinks data.

comment:8 by Nicklas Nordborg, 6 years ago

(In [4665]) References #1016: Implement Stringtie step in the Hisat pipeline

Changes to all places in Reggie that work with raw bioassays to make sure that they all have a proper filter on "rawDataType=cufflinks" where this matters.

Hopefully this should prevent wizards that need Cufflinks data from working with other data. It may still be possible to manually select a different raw bioassay, but this will most likely result in some kind of error message.

Note that the StringTie wizard still creates a "cufflinks" raw bioassay. This needs to be solved, and then we can re-check whether the other wizards should be able to work with StringTie data.

comment:9 by Nicklas Nordborg, 6 years ago

(In [4666]) References #1016: Implement Stringtie step in the Hisat pipeline

Updated to require BASE 3.11.3 since we need the changes in http://base.thep.lu.se/ticket/2108 to be able to link StringTie raw bioassays with jobs and the Open Grid Cluster.

comment:10 by Nicklas Nordborg, 6 years ago

(In [4667]) References #1016: Implement Stringtie step in the Hisat pipeline

Extended the Rawdatatype class with more information about platform variant. All places that need filtering on raw data type have been updated again. This time it was also possible to fix filtering and searching for a matching ArrayDesign.

Added a STRINGTIE raw data type. So far, this needs to be registered manually as a variant of the 'Sequencing' platform and must have an external id = 'sequencing.stringtie'. The 'GTF' file should be added as a file type.

An array design must also be created manually and a GTF file should be attached to it (this is checked before it can be used in the wizards).
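
The check described above might look roughly like this. It is a minimal sketch in Python and does not use the BASE API: the `ArrayDesign` class, its fields and the `usable_for_stringtie` helper are assumptions made for the example. Only the external id 'sequencing.stringtie' and the 'GTF' file type come from this comment.

```python
from dataclasses import dataclass, field

# External id of the platform variant, as stated in the comment above.
STRINGTIE_VARIANT_ID = "sequencing.stringtie"


@dataclass
class ArrayDesign:
    """Hypothetical stand-in for an array design item."""
    name: str
    variant_id: str
    attached_file_types: set = field(default_factory=set)


def usable_for_stringtie(design):
    """True if the design belongs to the StringTie variant and has a GTF file."""
    return (
        design.variant_id == STRINGTIE_VARIANT_ID
        and "GTF" in design.attached_file_types
    )


# Hypothetical designs; names and the cufflinks variant id are made up.
designs = [
    ArrayDesign("design-with-gtf", "sequencing.stringtie", {"GTF"}),
    ArrayDesign("design-without-gtf", "sequencing.stringtie"),
    ArrayDesign("other-variant", "sequencing.cufflinks", {"GTF"}),
]

usable_names = [d.name for d in designs if usable_for_stringtie(d)]
```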

comment:11 by Nicklas Nordborg, 6 years ago

(In [4668]) References #1016: Implement Stringtie step in the Hisat pipeline

More changes to filters in different places to make manual selection more reliable.

comment:12 by Nicklas Nordborg, 6 years ago

(In [4669]) References #1016: Implement Stringtie step in the Hisat pipeline

Preliminarily using the "Generic raw data" file type for the "gene.tsv" file created by StringTie. Changing it to a more specific type should be relatively easy if we decide to do that. The "Generic raw data" file type needs to be manually linked to the "StringTie" platform (see [4667]).

Also added a step that parses the "gene.tsv" file and counts the number of unique "Gene ID" values in it. This number is stored in the RawBioAssay.numFileSpots property and the "Valid" flag is set on the file.
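
The counting step can be sketched as below. This is an illustration in Python rather than the Reggie implementation. It assumes that "gene.tsv" is tab-separated with a header row containing a "Gene ID" column; the `count_unique_gene_ids` function and the sample rows are made up for the example.

```python
import csv
import io


def count_unique_gene_ids(lines, column="Gene ID"):
    """Count distinct non-empty values in one column of a tab-separated stream."""
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"missing column: {column!r}")
    # A set keeps one entry per gene even if the id occurs on several rows.
    return len({row[column] for row in reader if row[column]})


# Made-up sample data; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "Gene ID\tGene Name\tFPKM\n"
    "G1\tA\t1.0\n"
    "G2\tB\t2.5\n"
    "G1\tA\t0.3\n"
)

num_file_spots = count_unique_gene_ids(sample)
```

In this sketch the result would be the value stored as numFileSpots. Raising an error on a missing "Gene ID" column is one possible way to avoid marking an unexpected file as valid.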

comment:13 by Nicklas Nordborg, 6 years ago

(In [4670]) References #1016: Implement Stringtie step in the Hisat pipeline

Added StringTie confirmation wizard and auto-confirmation support.

comment:14 by Nicklas Nordborg, 6 years ago

(In [4673]) References #1016: Implement Stringtie step in the Hisat pipeline

The installation wizard will now create the required platform variant for StringTie.

comment:15 by Nicklas Nordborg, 6 years ago

(In [4674]) References #1016: Implement Stringtie step in the Hisat pipeline

Added raw data type to case summary.

comment:16 by Nicklas Nordborg, 6 years ago

Resolution: fixed
Status: assigned → closed