Opened 7 years ago
Closed 7 years ago
#1016 closed task (fixed)
Implement Stringtie step in the Hisat pipeline
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | Reggie v4.15 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description
This is similar to the Cufflinks step in the legacy pipeline. It will produce files that are similar to, but not the same as, those produced by Cufflinks. No data should be imported into the database. We need to define a new raw data type.
Change History (16)
comment:1 by , 7 years ago
Status: | new → assigned |
---|
comment:2 by , 7 years ago
comment:3 by , 7 years ago
(In [4660]) References #1016: Implement Stringtie step in the Hisat pipeline
The auto-confirmation step after Hisat now adds the resulting AlignedSequences item to the Stringtie list if the item passes the auto-confirmation rules.
The manual "Confirm Hisat alignment" wizard has also been updated to add items to the Stringtie list.
comment:4 by , 7 years ago
(In [4661]) References #1016: Implement Stringtie step in the Hisat pipeline
Started on the "Start Stringtie" wizard. It is now possible to select all parameters needed to create the job. The actual job/script generation is still an empty skeleton.
The code for selecting Protocol, Software and Array design currently lists items related to the Cufflinks pipeline. We need a way to separate the items, similar to the AlignmentType annotation used to separate Tophat and Hisat. This should work well for software and protocols. The array design may need more thought since it is currently linked to the "Sequencing/Expression-like" platform item defined by BASE. This is a rather Cufflinks-centric approach that requires a GTF file for the array design and FPKM files for the "raw data".
comment:5 by , 7 years ago
(In [4662]) References #1016: Implement Stringtie step in the Hisat pipeline
Added an ExpressionType annotation type that can take the values 'Cufflinks' or 'Stringtie'. It should be used on protocol and software items (feature extraction) so that the Stringtie and Cufflinks wizards can filter the software and protocols they list.
Added auto-confirmation code for starting Stringtie. It should work once the script generation is implemented in StringtieJobCreator.
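The filtering idea behind the ExpressionType annotation can be sketched with plain Java collections. This is a minimal illustration under stated assumptions, not the Reggie code: the `Item` record, its `expressionType` field and the `ExpressionTypeFilter.forPipeline` helper are hypothetical stand-ins for BASE protocol/software items and their annotation value, and the real wizards query BASE instead of filtering an in-memory list.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a protocol or software item and its ExpressionType annotation value. */
record Item(String name, String expressionType) {}

class ExpressionTypeFilter {
    /**
     * Keeps only the items whose ExpressionType value equals the requested one
     * (e.g. "Stringtie" or "Cufflinks"). Items without a value (null) are excluded.
     */
    static List<Item> forPipeline(List<Item> items, String expressionType) {
        return items.stream()
                .filter(item -> expressionType.equals(item.expressionType()))
                .collect(Collectors.toList());
    }
}
```

In this sketch, unannotated items never match, so a wizard would only list items that have been explicitly tagged for its pipeline; whether Reggie treats unannotated items the same way is not stated in this ticket.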
comment:6 by , 7 years ago
comment:7 by , 7 years ago
(In [4664]) References #1016: Implement Stringtie step in the Hisat pipeline
A StringTie script is now generated and submitted to the cluster. Generated files are linked back to BASE.
The raw bioassay item that is created currently uses the "cufflinks" raw data type and platform. This must be changed, since the resulting item now shows up in a lot of places in reggie that expect Cufflinks data.
comment:8 by , 7 years ago
(In [4665]) References #1016: Implement Stringtie step in the Hisat pipeline
Changed all places in reggie that work with raw bioassays so that they filter on "rawDataType=cufflinks" where this matters.
Hopefully this prevents wizards that need Cufflinks data from working with other data. It may still be possible to manually select a different raw bioassay, but this will most likely result in some kind of error message.
Note that the StringTie wizard still creates a "cufflinks" raw bioassay. This needs to be solved; then we can re-check whether the other wizards should be able to work with StringTie data.
comment:9 by , 7 years ago
(In [4666]) References #1016: Implement Stringtie step in the Hisat pipeline
Updated to require BASE 3.11.3 since we need the changes in http://base.thep.lu.se/ticket/2108 to be able to link StringTie raw bioassays with jobs and the Open Grid Cluster.
comment:10 by , 7 years ago
(In [4667]) References #1016: Implement Stringtie step in the Hisat pipeline
Extended the Rawdatatype class with more information about the platform variant. All places that need filtering on raw data type have been updated again. This time it was also possible to fix filtering and searching for a matching ArrayDesign.
Added a STRINGTIE raw data type. For now, it needs to be registered manually as a variant of the 'Sequencing' platform and must have the external id 'sequencing.stringtie'. The 'GTF' file should be added as a file type.
An array design must also be created manually, and a GTF file must be attached to it (this is checked before the array design can be used in the wizards).
comment:11 by , 7 years ago
comment:12 by , 7 years ago
(In [4669]) References #1016: Implement Stringtie step in the Hisat pipeline
For now, the "generic raw data" file type is used for the "gene.tsv" file created by StringTie. Changing it to a more specific type should be relatively easy if we decide to do that. The "Generic raw data" file type needs to be manually linked to the "StringTie" platform (see [4667]).
Also added a step that parses the "gene.tsv" file and counts the number of unique "Gene ID" values in it. This number is stored in the RawBioAssay.numFileSpots property, and the "Valid" flag is set on the file.
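The counting step can be sketched as a small Java method. This is a hedged illustration, not the code from [4669]: the class name `GeneTsvCounter` and the method `countUniqueGeneIds` are hypothetical, and it assumes that "gene.tsv" is tab-separated with a header line containing a "Gene ID" column. Storing the result in RawBioAssay.numFileSpots and setting the "Valid" flag are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GeneTsvCounter {
    /** Header name assumed to identify the gene column in gene.tsv. */
    static final String GENE_ID_COLUMN = "Gene ID";

    /**
     * Counts distinct values in the "Gene ID" column of a tab-separated file
     * whose first line is a header. Returns 0 for an empty file and throws
     * an IOException if the header has no "Gene ID" column.
     */
    static int countUniqueGeneIds(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        String header = reader.readLine();
        if (header == null) return 0;

        // Locate the column by name instead of assuming a fixed position.
        List<String> columns = Arrays.asList(header.split("\t", -1));
        int geneIdIndex = columns.indexOf(GENE_ID_COLUMN);
        if (geneIdIndex < 0) {
            throw new IOException("Column not found: " + GENE_ID_COLUMN);
        }

        // A set keeps each gene id once, even if it appears on several lines.
        Set<String> geneIds = new HashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) continue; // skip blank lines
            String[] fields = line.split("\t", -1);
            if (geneIdIndex < fields.length) {
                geneIds.add(fields[geneIdIndex]);
            }
        }
        return geneIds.size();
    }
}
```

Taking a `Reader` rather than a file path keeps the sketch independent of how the file is fetched (from the cluster or from a BASE file item), and makes it easy to test with a `StringReader`.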
comment:13 by , 7 years ago
comment:14 by , 7 years ago
comment:15 by , 7 years ago
comment:16 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
(In [4659]) References #1016: Implement Stringtie step in the Hisat pipeline
Installing an item list (Stringtie Pipeline) for AlignedSequences that should be processed with Stringtie.