Opened 4 years ago
Closed 3 years ago
#1299 closed task (fixed)
Import of FASTQ files from external lab — at Version 13
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | Reggie v4.32 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description
This is related to #1295 and is the second step after importing information about the specimen. The import in #1295 should create all items from the specimen down to DemuxedSequences. The DemuxedSequences will be different and only represent a single library instead of an entire flow cell. See [6215].
It is probably a good idea to use the item-list pattern to control the flow of items between #1295 and this import.
It is expected that the FASTQ files have not been processed after the demux except for adapter trimming that has replaced reads with N instead of removing reads from the FASTQ files.
The import functionality needs to make those FASTQ files "compatible" with the FASTQ files we get from the regular demux wizard. This means we have to replicate the following steps:
- Bowtie to estimate the fragment size and standard deviation
- Trimmomatic step 1 to remove the N reads. This is an alternate version of step 1 in the regular demux and will allow us to get values for the PF_READS and ADAPTER_READS annotations.
- Trimmomatic step 2 to remove low-quality reads. This is exactly the same as the second step in the regular demux wizard.
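The ticket does not say how the Bowtie output is turned into a fragment size estimate. A minimal sketch, assuming paired-end alignments in SAM format, is to take the absolute template length (TLEN, column 9) of properly paired first-in-pair reads and compute the mean and sample standard deviation. The function name `fragment_size_stats` is hypothetical and this is not Reggie's actual code.

```python
import statistics
from typing import Iterable, Tuple

# SAM FLAG bits (from the SAM format specification)
FLAG_PROPER_PAIR = 0x2
FLAG_FIRST_IN_PAIR = 0x40


def fragment_size_stats(sam_lines: Iterable[str]) -> Tuple[float, float]:
    """Hypothetical helper: mean and sample stdev of fragment sizes from SAM records."""
    sizes = []
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip empty lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])  # TLEN column
        # Count each pair once (first mate only) and ignore improper pairs
        if flag & FLAG_PROPER_PAIR and flag & FLAG_FIRST_IN_PAIR and tlen != 0:
            sizes.append(abs(tlen))  # TLEN is negative when the mate is leftmost
    if len(sizes) < 2:
        raise ValueError("need at least two properly paired reads")
    return statistics.mean(sizes), statistics.stdev(sizes)
```

Only the first mate is counted because both mates of a pair carry the same fragment length (with opposite signs), so counting both would double every observation.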
Auto-confirmation should be supported, continuing with the regular secondary analysis workflow (e.g. the Legacy pipeline and Hisat alignment). The same checks (for individual items) that are used in the regular demux wizard should be made.
A manual confirmation wizard is also needed with options for accepting, retrying the import or flagging the RNA.
Change History (13)
comment:1 by , 4 years ago
comment:12 by , 3 years ago
In [6215]:
References #1295: Registration of specimen handled by external lab
Re-designed the importer to stop at DemuxedSequences instead of MergedSequences.
There will be one DemuxedSequences for each library instead of one for each flow cell. The DemuxedSequences item will be named after the library and use an 'x' suffix. The suffix is NOT propagated to the child MergedSequences, which use 'g' as before. For example:
Lib: 1234.r.lib Demux: 1234.r.lib.x Merge: 1234.r.lib.g
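The naming rule above can be sketched as follows. The function names are hypothetical; the point is that the MergedSequences name is derived from the library name, so the 'x' suffix of the DemuxedSequences is dropped rather than carried over.

```python
DEMUX_SUFFIX = ".x"   # DemuxedSequences from an external lab
MERGED_SUFFIX = ".g"  # MergedSequences, same as before


def demux_name(library_name: str) -> str:
    """Hypothetical helper: DemuxedSequences name for a library."""
    return library_name + DEMUX_SUFFIX


def merged_name_from_demux(demux: str) -> str:
    """Hypothetical helper: strip the 'x' suffix so it is NOT propagated, then add 'g'."""
    if not demux.endswith(DEMUX_SUFFIX):
        raise ValueError(f"not an external-lab demux name: {demux}")
    library_name = demux[: -len(DEMUX_SUFFIX)]
    return library_name + MERGED_SUFFIX
```

For example, `merged_name_from_demux(demux_name("1234.r.lib"))` gives `1234.r.lib.g`, matching the example in the changeset message.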
The DemuxedSequences will also have information about the original FASTQ files that we received from the external lab.
The MergedSequences item is now created by the FASTQ import script. This change makes it easier to redo a failed FASTQ import since we can just delete the MergedSequences and try again.
comment:13 by , 3 years ago
Description: | modified |
---|---|
Resolution: | → fixed |
Status: | new → closed |
In 6179: