Implement new secondary analysis pipeline
Reported by: Nicklas Nordborg | Owned by: Nicklas Nordborg
The new secondary analysis pipeline should be implemented. It starts with MergedSequences. The first step is similar to the old Mask+Align step but uses Hisat instead of Tophat. The post-processing scripts that are run afterwards to collect statistics should not have to be changed, but this needs to be verified. Just as for the original pipeline, there is a breakpoint after the alignment step, which means we have to store result files back to the project archive.
Auto-confirm rules should be considered, but we may want to start with manual confirmation.
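To make the alignment step concrete, here is a minimal sketch of how the Hisat (hisat2) invocation could be assembled. The index path, read file names, output name, and thread count are hypothetical placeholders, not values from the actual pipeline configuration; the flags themselves are standard hisat2 options.

```python
# Sketch of the new alignment step: hisat2 replaces Tophat.
# All paths and parameter values are hypothetical placeholders.

def hisat2_command(index, fastq1, fastq2, out_sam, threads=8):
    """Build the hisat2 argument list for a paired-end alignment."""
    return [
        "hisat2",
        "-p", str(threads),   # number of alignment threads
        "-x", index,          # basename of the HISAT2 index
        "-1", fastq1,         # mate 1 reads (from MergedSequences)
        "-2", fastq2,         # mate 2 reads
        "-S", out_sam,        # SAM output, to be stored in the project archive
    ]

cmd = hisat2_command("/indexes/hg38/genome",
                     "sample_R1.fastq.gz", "sample_R2.fastq.gz",
                     "sample.sam")
```

The SAM/BAM file produced here is what gets stored back to the project archive at the breakpoint, and it is also the input the existing statistics post-processing scripts would run on.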
The second step is to calculate expression values with Stringtie. We need a new raw data type to be able to separate this data from the Cufflinks data. We need to investigate what kinds of files Stringtie produces, which of them we should define as file types, and which should only be generically linked.
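As a starting point for that investigation, here is a sketch of the Stringtie invocation and its main output files. The sample paths are hypothetical; the options (`-e`, `-G`, `-o`, `-A`, `-b`) are standard Stringtie flags, and the handling proposal in the dictionary is only a tentative suggestion, not a decision.

```python
# Sketch of the Stringtie expression step. Paths are hypothetical
# placeholders; the flags are standard Stringtie options.

def stringtie_command(bam, annotation_gtf, out_gtf, abundance_tab, ballgown_dir):
    """Build the stringtie argument list for expression estimation."""
    return [
        "stringtie", bam,
        "-e",                  # only estimate expression for known transcripts
        "-G", annotation_gtf,  # reference annotation (GTF)
        "-o", out_gtf,         # output transcripts with expression values (GTF)
        "-A", abundance_tab,   # per-gene abundance table (tab-separated)
        "-b", ballgown_dir,    # Ballgown input tables (*.ctab)
    ]

# Tentative proposal for which outputs become defined file types and
# which are only generically linked -- exactly what needs investigating.
OUTPUT_HANDLING = {
    "transcripts.gtf": "file type",
    "gene_abundances.tab": "file type",
    "ballgown/*.ctab": "generic link",
}
```

The GTF and gene-abundance outputs are the obvious candidates for defined file types since downstream steps would read them; the Ballgown tables may only need generic links.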
Since the new pipeline is going to live alongside the legacy pipeline, we need a way to separate items. We can, for example, define new subtypes for items belonging to the new pipeline. While this makes the new pipeline relatively easy to implement, there are some drawbacks in other areas:
- The current structure (RawBioAssay -> AlignedSequences -> MaskedSequences -> MergedSequences) is built into a lot of other places, such as the case summary, the yellow label wizard, and the release exporter. Introducing new subtypes will require changes in several other places to make them behave as we want.
- We also need new subtypes for protocols and software.
- If we add more pipelines in the future there is going to be a "subtype explosion", which will make the first point above even more complex to handle.
Another possibility is to keep and re-use the current subtypes, perhaps with an annotation indicating which pipeline an item belongs to. We still need to check the case summary, yellow label wizard, etc., but I think fewer changes are needed. More care is needed when implementing the new pipeline, since we really have to be sure that items are not mixed up and suddenly start being processed by the incorrect pipeline.
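The safety concern with the annotation approach can be sketched as a guard that every job runs before processing an item. This is plain illustrative code, not the BASE API; the annotation name "pipeline" and the pipeline labels are hypothetical, as is the assumption that unannotated items default to the legacy pipeline.

```python
# Minimal sketch (not BASE API) of the guard the annotation-based
# approach needs: every job verifies the item's pipeline annotation
# before processing, so items cannot drift into the wrong pipeline.
# Annotation name, labels, and the legacy-by-default rule are assumptions.

LEGACY = "Tophat/Cufflinks"
NEW = "Hisat/Stringtie"

def assert_pipeline(item_annotations, expected):
    """Raise if the item is not annotated for the expected pipeline."""
    actual = item_annotations.get("pipeline", LEGACY)  # unannotated = legacy
    if actual != expected:
        raise ValueError(
            f"Item belongs to pipeline '{actual}'; refusing '{expected}' step")

# A Stringtie job accepts an item annotated for the new pipeline,
# but refuses an unannotated (legacy) item.
assert_pipeline({"pipeline": NEW}, NEW)
```

Making the check fail loudly (an exception rather than a silent skip) is what keeps an accidentally mislabeled item from being processed by both pipelines.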