Opened 4 years ago

Closed 4 years ago

#1266 closed task (fixed)

Run prepDE.py in the StringTie pipeline

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.27.4
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

The prepDE.py script (http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#deseq) is part of the StringTie package and produces files with hypothetical read counts for genes and transcripts. It uses information from the transcript.gtf file produced by StringTie:

echo SAMPLEID stringtie/transcript.gtf > input.lst
prepDE.py -i input.lst \
  -g stringtie/gene_count.csv \
  -t stringtie/transcript_count.csv

Note! We are investigating what to do with the --length parameter. The default value is 75, but we have used different read lengths at different times (e.g. 2x50, 2x75, 2x100).
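
As a rough illustration of the command above, the following Python sketch writes input.lst and assembles the prepDE.py argument list, with the length passed only when it is known. The function name build_prepde_command and the workdir layout are assumptions made for this example and are not taken from the Reggie code; only the -i, -g, -t and -l options come from prepDE.py itself.

```python
from pathlib import Path
from typing import List, Optional


def build_prepde_command(sample_id: str, workdir: Path,
                         read_length: Optional[int] = None) -> List[str]:
    """Write input.lst and return the prepDE.py argument list (hypothetical helper)."""
    stringtie_dir = workdir / "stringtie"
    input_list = workdir / "input.lst"

    # input.lst has one line per sample: "<sample id> <path to the GTF file>"
    input_list.write_text(f"{sample_id} {stringtie_dir / 'transcript.gtf'}\n")

    cmd = [
        "prepDE.py",
        "-i", str(input_list),
        "-g", str(stringtie_dir / "gene_count.csv"),
        "-t", str(stringtie_dir / "transcript_count.csv"),
    ]
    # Without -l, prepDE.py falls back to its default length of 75.
    if read_length is not None:
        cmd += ["-l", str(read_length)]
    return cmd
```

The returned list can be handed to subprocess.run(cmd, check=True) in an environment where prepDE.py is on the PATH.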

The gene_count.csv and transcript_count.csv files should also be included in the release export.

Change History (8)

comment:1 by Nicklas Nordborg, 4 years ago

In 6001:

References #1266: Run prepDE.py in the StringTie pipeline

Now implemented in the script. We use the alignment name (with external specimen id) as SAMPLEID. Output is stored in gene_count.csv and transcript_count.csv.

comment:2 by Nicklas Nordborg, 4 years ago

In 6002:

References #1266: Run prepDE.py in the StringTie pipeline

Added gene_count.csv and transcript_count.csv to files that are exported by the release exporter.

comment:3 by Nicklas Nordborg, 4 years ago

In 6003:

References #1266: Run prepDE.py in the StringTie pipeline

Better to use the name of the StringTie item instead of the alignment.

comment:4 by Nicklas Nordborg, 4 years ago

In 6004:

References #1266: Run prepDE.py in the StringTie pipeline

Added a wizard for running prepDE.py on all existing StringTie raw bioassays that don't have count data files. Only the GUI has been implemented so far.

comment:5 by Nicklas Nordborg, 4 years ago

In 6005:

References #1266: Run prepDE.py in the StringTie pipeline

Added PrepDEJobCreator which generates jobs for running prepDE.py for existing StringTie raw bioassays. The wizard simply submits jobs in batches of 500 for all existing raw bioassays that don't have count data already.
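
The batching described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the PrepDEJobCreator code (which is part of Reggie and written in Java); the names pending_raw_bioassays and batches, and the dictionary used to mark which raw bioassays already have count data, are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence


def pending_raw_bioassays(has_counts: Dict[str, bool]) -> List[str]:
    """Return the names of raw bioassays that have no count data yet."""
    return [name for name, done in has_counts.items() if not done]


def batches(items: Sequence[str], size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each batch would then be submitted as one round of jobs, for example `for batch in batches(pending_raw_bioassays(state)): ...`.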

comment:6 by Nicklas Nordborg, 4 years ago

In 6006:

References #1266: Run prepDE.py in the StringTie pipeline

Included prepDE jobs in the auto-confirmation handling by adding a temporary any-to-any link "gene_count.csv" that points to the current job instead of the actual file. This causes raw bioassays that have been scheduled for processing to no longer be included in the count, and there is no risk that prepDE is scheduled more than once for the same raw bioassay. If all goes well, the link is replaced with the actual result file; otherwise PrepDEAutoConfirmer will remove the link and it will be possible to try again.
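
The placeholder-link idea can be modelled with a plain dictionary. The sketch below is a simplified in-memory illustration under that assumption; it does not use the BASE any-to-any link API, and the function names and the ("job", id) / ("file", name) tuples are invented for the example.

```python
from typing import Dict, Optional, Tuple

LINK_NAME = "gene_count.csv"
# A link target is either ("job", job_id) or ("file", file_name).
Links = Dict[str, Tuple[str, str]]


def needs_prepde(links: Links) -> bool:
    """A raw bioassay is a candidate only if it has no link at all."""
    return LINK_NAME not in links


def schedule_prepde(links: Links, job_id: str) -> None:
    """Point the link at the job so the item cannot be scheduled twice."""
    if not needs_prepde(links):
        raise RuntimeError("prepDE already scheduled or completed")
    links[LINK_NAME] = ("job", job_id)


def confirm_prepde(links: Links, result_file: Optional[str]) -> None:
    """On success replace the placeholder with the file, on failure remove it."""
    if result_file is not None:
        links[LINK_NAME] = ("file", result_file)
    else:
        links.pop(LINK_NAME, None)
```

Because the same link name serves as both the "in progress" marker and the final result, a single existence check covers both cases.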

comment:7 by Nicklas Nordborg, 4 years ago

In 6007:

References #1266: Run prepDE.py in the StringTie pipeline

Added the -l parameter to the prepDE command line. The length is calculated from the read string used in the demux by adding all T values. If the data comes from more than one sequencing run and has been demuxed with different settings, the average value is used.
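
A minimal sketch of this calculation, assuming the read string follows the Picard-style read structure notation of a number followed by a letter (for example 50T8B50T). The exact read string format used by Reggie is not shown in this ticket, so the parsing is an assumption, as are the function names and the rounding of the average to a whole number.

```python
import re
from typing import Iterable

# Assumed format: segments of "<count><type letter>", e.g. "50T8B50T".
_SEGMENT = re.compile(r"(\d+)([A-Za-z])")


def template_length(read_string: str) -> int:
    """Sum the counts of all T (template) segments in one read string."""
    return sum(int(count)
               for count, kind in _SEGMENT.findall(read_string)
               if kind.upper() == "T")


def prepde_length(read_strings: Iterable[str]) -> int:
    """Average the template lengths over all sequencing runs."""
    lengths = [template_length(rs) for rs in read_strings]
    if not lengths:
        raise ValueError("at least one read string is required")
    # Rounding to an integer is an assumption made for this sketch.
    return round(sum(lengths) / len(lengths))
```

With these assumptions, a single run demuxed as 50T8B50T gives 100, and a mix of 50T8B50T and 75T8B75T gives 125.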

comment:8 by Nicklas Nordborg, 4 years ago

Resolution: fixed
Status: new → closed