Opened 4 years ago

Closed 3 years ago

#1299 closed task (fixed)

Import of FASTQ files from external lab — at Version 13

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.32
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description (last modified by Nicklas Nordborg)

This is related to #1295 and is the second step after importing information about the specimen. The import in #1295 should create all items from the specimen down to DemuxedSequences. The DemuxedSequences will differ from those created by the regular demux and only represent a single library instead of an entire flow cell. See [6215].

It will probably be a good idea to use the pattern with an item list to control the flow of items between #1295 and this import.

It is expected that the FASTQ files have not been processed after the demux except for adapter trimming that has replaced reads with N instead of removing reads from the FASTQ files.

The import functionality needs to make those FASTQ files "compatible" with the FASTQ files we get from the regular demux wizard. This means we have to replicate the following steps:

  • Bowtie to estimate the fragment size and standard deviation
  • Trimmomatic step 1 to remove the N reads. This is an alternate version of step 1 in the regular demux and will allow us to get values for the PF_READS and ADAPTER_READS annotations.
  • Trimmomatic step 2 to remove low-quality reads. This is exactly the same as the second step in the regular demux wizard.
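How the alternate step 1 relates to the read counts can be sketched in Python. This is only an illustration and not the Reggie script: the function name `count_masked_reads` is hypothetical, and the sketch assumes that a masked read consists entirely of `N`, that PF_READS corresponds to the total number of reads before filtering, and that ADAPTER_READS corresponds to the number of masked reads. The ticket confirms none of these definitions.

```python
import io


def count_masked_reads(fastq):
    """Return (total_reads, masked_reads) for a FASTQ text stream.

    Assumes plain 4-line FASTQ records. A read counts as masked when
    its sequence is non-empty and consists only of 'N' (an assumption).
    """
    total = 0
    masked = 0
    while True:
        header = fastq.readline()
        if not header:
            break  # end of stream
        sequence = fastq.readline().strip()
        fastq.readline()  # '+' separator line
        fastq.readline()  # quality line
        total += 1
        if sequence and set(sequence) == {"N"}:
            masked += 1
    return total, masked


# Three reads: one normal, one fully masked, one with a single N.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nNNNN\n+\n####\n"
    "@read3\nACNT\n+\nII#I\n"
)
print(count_masked_reads(example))
```

Only the fully masked read is counted; a read with a single `N` is left for the later quality filtering.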

Auto-confirmation should be supported and continue with the regular secondary analysis workflow (e.g. the Legacy pipeline and Hisat alignment). The same checks (for individual items) that are used in the regular demux wizard should be made.

A manual confirmation wizard is also needed with options for accepting, retrying the import or flagging the RNA.

Change History (13)

comment:1 by Nicklas Nordborg, 4 years ago

In 6179:

References #1299: Import of FASTQ files from external lab

Started to implement a wizard for importing FASTQ files. The wizard follows the usual pattern for most secondary analysis wizards that start something. The first step includes a selection list of merged sequences items that are in the "FASTQ import pipeline" list. There is no "Select manually" functionality since it is typically not possible to run this wizard unless there are FASTQ files available. How to access the FASTQ files is not yet resolved, but it is probably a responsibility for #1295 to solve this.

comment:2 by Nicklas Nordborg, 4 years ago

In 6180:

References #1299: Import of FASTQ files from external lab

Started to implement a wizard for importing FASTQ files. The wizard follows the usual pattern for most secondary analysis wizards that start something. The first step includes a selection list of merged sequences items that are in the "FASTQ import pipeline" list. There is no "Select manually" functionality since it is typically not possible to run this wizard unless there are FASTQ files available. How to access the FASTQ files is not yet resolved, but it is probably a responsibility for #1295 to solve this.

comment:3 by Nicklas Nordborg, 4 years ago

In 6181:

References #1299: Import of FASTQ files from external lab

Implemented ImportFastqJobCreator that generates a script and submits it to the analysis cluster. The script re-uses many settings from the <demux> section in reggie-config.xml since we are trying to reproduce the final steps of the regular demux script in a compatible way. The main difference is the first Trimmomatic step, which instead uses settings from the <step-1-import> tag with a default setting of TRAILING:3 MINLEN:2. This should give us compatible numbers for the ADAPTER_READS annotation, assuming that the FASTQ files we get have either trimmed or masked adapter sequences. There are still some unresolved issues with the script. The most important one is that we don't know exactly which FASTQ files are associated with which merged item. The current implementation only works if there is exactly one pair of FASTQ files available in the import directory.
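The effect of the default TRAILING:3 MINLEN:2 setting on a single read can be illustrated with a small Python sketch. It is not Trimmomatic's implementation and ignores paired-end handling. The function names are hypothetical, and the sketch assumes Phred+33 quality encoding and that masked `N` bases carry quality 2 (`#`), which this ticket does not state.

```python
def trim_trailing(sequence, quality, threshold=3, phred_offset=33):
    """Cut bases off the 3' end while their quality is below threshold."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - phred_offset < threshold:
        end -= 1
    return sequence[:end], quality[:end]


def survives_step1(sequence, quality, min_len=2):
    """Return True if the read is kept after TRAILING:3 followed by MINLEN:2."""
    trimmed, _ = trim_trailing(sequence, quality)
    return len(trimmed) >= min_len


# A fully masked read with quality '#' (Phred 2) is trimmed to length 0.
print(survives_step1("NNNNNN", "######"))
# A normal read with one low-quality trailing base only loses that base.
print(survives_step1("ACGTAC", "IIIII#"))
```

Under these assumptions a fully masked read is dropped by MINLEN, while a regular read only loses its low-quality tail.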

comment:4 by Nicklas Nordborg, 4 years ago

In 6182:

References #1299: Import of FASTQ files from external lab

Implemented ImportFastqJobCreator that generates a script and submits it to the analysis cluster. The script re-uses many settings from the <demux> section in reggie-config.xml since we are trying to reproduce the final steps of the regular demux script in a compatible way. The main difference is the first Trimmomatic step, which instead uses settings from the <step-1-import> tag with a default setting of TRAILING:3 MINLEN:2. This should give us compatible numbers for the ADAPTER_READS annotation, assuming that the FASTQ files we get have either trimmed or masked adapter sequences. There are still some unresolved issues with the script. The most important one is that we don't know exactly which FASTQ files are associated with which merged item. The current implementation only works if there is exactly one pair of FASTQ files available in the import directory.

comment:5 by Nicklas Nordborg, 4 years ago

In 6184:

References #1299: Import of FASTQ files from external lab

Implemented auto-confirmation for FASTQ import.

comment:6 by Nicklas Nordborg, 4 years ago

In 6185:

References #1299: Import of FASTQ files from external lab

Implemented auto-confirmation for FASTQ import.

comment:7 by Nicklas Nordborg, 4 years ago

In 6186:

References #1299: Import of FASTQ files from external lab

Unrelated minor fixes to speed up auto-confirmation after SSP and report creation.

comment:8 by Nicklas Nordborg, 4 years ago

In 6187:

References #1299: Import of FASTQ files from external lab

Started with a manual confirmation wizard for FASTQ import. It should work when just confirming a success, but failures and other cases need more testing and options.

comment:9 by Nicklas Nordborg, 4 years ago

In 6188:

References #1299: Import of FASTQ files from external lab

Started with a manual confirmation wizard for FASTQ import. It should work when just confirming a success, but failures and other cases need more testing and options.

comment:10 by Nicklas Nordborg, 4 years ago

In 6339:

References #1299: Import of FASTQ files from external lab

Progress report for both Trimmomatic steps.

comment:11 by Nicklas Nordborg, 4 years ago

In 6340:

References #1299: Import of FASTQ files from external lab

Progress reporting was inserted at incorrect locations in the script.

comment:12 by Nicklas Nordborg, 3 years ago

In [6215]:

References #1295: Registration of specimen handled by external lab

Re-designed the importer to stop at DemuxedSequences instead of MergedSequences.

There will be one DemuxedSequences for each library instead of one for each flow cell. The DemuxedSequences item will be named after the library and use an 'x' suffix. The suffix is NOT propagated to the child MergedSequences, which uses 'g' as before. For example:

Lib: 1234.r.lib
Demux: 1234.r.lib.x
Merge: 1234.r.lib.g
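The naming rule can be written as a minimal Python sketch. The function names are hypothetical and not taken from the Reggie code. The sketch assumes that the suffix is appended with a dot separator, as in the example above.

```python
def demux_name(library_name):
    """Name of the per-library DemuxedSequences item ('x' suffix)."""
    return library_name + ".x"


def merged_name(library_name):
    """Name of the child MergedSequences item ('g' suffix).

    Derived from the library name, not from the demux name, so the
    'x' suffix is not propagated.
    """
    return library_name + ".g"


print(demux_name("1234.r.lib"))
print(merged_name("1234.r.lib"))
```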

The DemuxedSequences will also have information about the original FASTQ files that we received from the external lab.

The MergedSequences item is now created by the FASTQ import script. This change makes it easier to redo a failed FASTQ import since we can just delete the MergedSequences and try again.

comment:13 by Nicklas Nordborg, 3 years ago

Description: modified (diff)
Resolution: fixed
Status: new → closed