Opened 5 years ago

Closed 5 years ago

#1146 closed task (fixed)

Demux for the MIPs pipeline

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.23
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

This is probably similar to the RNAseq demux in some sense, but details are different.

Change History (28)

comment:1 by Nicklas Nordborg, 5 years ago

Status: new → accepted

comment:2 by Nicklas Nordborg, 5 years ago

In 5479:

References #1146: Demux for the MIPs pipeline

The existing demux wizard has been updated to accept a parameter for which pipeline to work with. This currently works for selecting the correct sequencing runs only.

However, help texts and instructions are probably a bit RNA-seq specific so it is not certain that this wizard should be re-used for the MIPs pipeline. There is also not yet any filter for selecting protocol and software items.

If the wizard is completed it will always create a script for RNA-seq demuxing so it will not work with MIPs.

comment:3 by Nicklas Nordborg, 5 years ago

In 5480:

References #1146: Demux for the MIPs pipeline

Added "Pipeline" annotation to demuxed and merged sequences subtypes and to software and protocol items for demux and merge.

comment:4 by Nicklas Nordborg, 5 years ago

In 5485:

References #1146 and #1142. Added role MIPsSecondaryAnalysis.

comment:5 by Nicklas Nordborg, 5 years ago

In 5486:

References #1146: Demux for the MIPs pipeline

Started to prepare for MIPs demux by moving some of the existing code to a new class RnaSeqDemuxJobCreator and adding a new (empty) class MipsDemuxJobCreator. The old DemuxJobCreator is intended to be used for things that are common for both pipelines.

comment:6 by Nicklas Nordborg, 5 years ago

In 5487:

References #1146: Demux for the MIPs pipeline

Implemented parts of the demux script for MIPs. It is so far only the two picard steps (ExtractIlluminaBarcodes and IlluminaBasecallsToFastq) and the merge step that are the same as for the RNAseq pipeline. The extra Bowtie and Trimmomatic steps have been removed. A new section <demux-mips> has been added to reggie-config.xml.

The script will most likely not work correctly for the MIPs data since we are currently only generating the demux and multiplex data files with a single barcode.

comment:7 by Nicklas Nordborg, 5 years ago

In 5489:

References #1146: Demux for the MIPs pipeline

Added a second annotation for barcode sequences. The BarcodeFilesForDemuxExporter has been updated to also work with two barcodes.
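As an illustration of what a two-barcode export can look like, here is a minimal Python sketch (not the Reggie implementation, which is Java). It assumes the tab-delimited barcode file layout that picard's ExtractIlluminaBarcodes accepts, with the column headers barcode_sequence_1, barcode_sequence_2, barcode_name and library_name. The function name, the dictionary keys and the sample library are hypothetical.

```python
import csv
import io


def write_barcode_file(libraries, out):
    """Write one tab-delimited row per library, with two barcode columns."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(
        ["barcode_sequence_1", "barcode_sequence_2", "barcode_name", "library_name"]
    )
    for lib in libraries:
        writer.writerow(
            [lib["barcode1"], lib["barcode2"], lib["barcode_name"], lib["name"]]
        )


# Made-up example data, for illustration only.
libs = [
    {"name": "Lib1", "barcode1": "ACGTACGT", "barcode2": "TTGCAAGC", "barcode_name": "BC-01"},
]
buf = io.StringIO()
write_barcode_file(libs, buf)
print(buf.getvalue(), end="")
```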

The first attempt to run a "real" demux seemed to work. No complaints or errors from picard. At least some data seems to be assigned to libraries (barcodes were randomly selected from a list without knowledge of actual barcodes).

The job completion handler is still the one from the RNAseq pipeline, which fails due to missing files.

comment:8 by Nicklas Nordborg, 5 years ago

In 5490:

References #1146: Demux for the MIPs pipeline

Implemented job completion handler for the MIPs demux. It is very similar to the RNAseq counterpart:

  • It will parse out statistics from the demultiplex_metrics.txt (READS, PF_READS). Percentage rules for warnings are not implemented.
  • Skipped tiles are checked
  • FASTQ files are linked to files in the BASE file system
  • Things related to Trimmomatic and estimated fragment sizes are skipped
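The statistics step above can be sketched as follows. This is a Python illustration only, assuming the metrics file is tab-delimited with '#' comment lines followed by a header row. It looks columns up by header name rather than using a regular expression, which is a swapped-in technique and not what the handler does. The function names and the sample values are made up.

```python
def parse_metrics(text):
    """Return the data rows of a tab-delimited metrics file as dictionaries."""
    rows = []
    header = None
    for line in text.splitlines():
        # Skip blank lines and '#' comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if header is None:
            header = cols  # first non-comment line holds the column names
            continue
        rows.append(dict(zip(header, cols)))
    return rows


def read_counts(text):
    """Map barcode name to (READS, PF_READS)."""
    return {
        row["BARCODE_NAME"]: (int(row["READS"]), int(row["PF_READS"]))
        for row in parse_metrics(text)
    }


# Made-up sample; a real file has more columns.
SAMPLE = "\n".join([
    "## METRICS CLASS\tExample",
    "BARCODE_NAME\tLIBRARY_NAME\tREADS\tPF_READS",
    "BC-01\tLib1\t1000\t950",
    "BC-02\tLib2\t2000\t1800",
    "",
])
print(read_counts(SAMPLE))
```

Looking columns up by name tolerates extra columns appearing between the ones of interest, as long as the header row is present.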

comment:9 by Nicklas Nordborg, 5 years ago

In 5491:

References #1146 and #1142.

Auto-confirmation of a MIPs sequencing run should now be able to start the correct demux job.

comment:10 by Nicklas Nordborg, 5 years ago

In 5492:

References #1146: Demux for the MIPs pipeline

Re-factored a lot of the demux script generating code. Things that are the same have been moved to the abstract DemuxJobCreator class. Typically, the common parts are those that create all child items that will receive the demuxed data (MergedSequences).

The job completion handlers for importing results back also have some common functionality. This will be fixed later.

comment:11 by Nicklas Nordborg, 5 years ago

In 5493:

References #1146: Demux for the MIPs pipeline

Added a wizard for confirming demux of MIPs sequencings. It is expected to be different from the RNA-seq counterpart so a copy of the existing wizard was made and put in the mipsanalysis folder. For now, only the columns related to Trimmomatic and fragment sizes have been removed and there are no rules for displaying warnings.

comment:12 by Nicklas Nordborg, 5 years ago

In 5494:

References #1146: Demux for the MIPs pipeline

The demux auto-confirm has been disabled for MIPs since we do not yet know what it should check or what comes after.

Starting the demux as part of the auto-confirm after sequencing has also been disabled for MIPs since we may want to handle this a few times manually before going auto.

comment:13 by Nicklas Nordborg, 5 years ago

In 5526:

References #1130, #1135, #1142, #1146. Fixing and getting rid of some 'TODO' entries in the code.

comment:14 by Nicklas Nordborg, 5 years ago

In 5534:

References #1161 and #1146. The MIPs demux is now running with picard 2.20. Changes that were needed:

  • Use new parameter syntax (including values in reggie-config.xml)
  • No TILE_LIMIT for ExtractIlluminaBarcodes when debugging
  • Parsing the demultiplex_metrics.txt file required a different regular expression since there are extra columns in places
  • Wrapping picard output with 'stdwrap.sh' to make sure that messages go to stdout or stderr depending on exit code
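The parameter syntax change in the first item can be illustrated with a small Python sketch that rewrites legacy KEY=value picard arguments into the newer "-KEY value" form. The helper name and the example values are hypothetical; the actual values in reggie-config.xml are not shown in this ticket.

```python
def to_new_syntax(legacy_args):
    """Convert ['KEY=value', ...] into ['-KEY', 'value', ...]."""
    new_args = []
    for arg in legacy_args:
        key, sep, value = arg.partition("=")
        if not sep:
            # Not a KEY=value pair; pass it through unchanged.
            new_args.append(arg)
            continue
        new_args.extend(["-" + key, value])
    return new_args


# Illustrative values only.
print(to_new_syntax(["LANE=1", "MAX_MISMATCHES=1"]))
```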

comment:15 by Nicklas Nordborg, 5 years ago

In 5538:

References #1146: Demux for the MIPs pipeline

Demux parameters have been updated to the values provided in the sample script from AK.

comment:16 by Nicklas Nordborg, 5 years ago

In 5539:

References #1146: Demux for the MIPs pipeline

Added support for handling UMI fastq files. We merge from all lanes just as for R1 and R2 and then use the 'fqpaste.pl' script to merge the two UMI files into a single one. At the moment only the final merged file is kept (the example script seems to save all 3 files).

The 'fqpaste.pl' is currently handled as if it is a "pipeline-script".
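What 'fqpaste.pl' does internally is not described in this ticket. The Python sketch below shows one plausible reading of "merge the two UMI files to a single one": joining the sequence and quality strings record by record. That interpretation, the function names and the sample records are all assumptions.

```python
def read_fastq(text):
    """Split FASTQ text into (header, sequence, quality) tuples."""
    lines = text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must have 4 lines per record")
    return [(lines[i], lines[i + 1], lines[i + 3]) for i in range(0, len(lines), 4)]


def paste_umi(text_a, text_b):
    """Join two UMI FASTQ texts record by record (assumed behaviour)."""
    records_a = read_fastq(text_a)
    records_b = read_fastq(text_b)
    if len(records_a) != len(records_b):
        raise ValueError("UMI files must contain the same number of records")
    out = []
    for (header, seq_a, qual_a), (_, seq_b, qual_b) in zip(records_a, records_b):
        # Keep the header of the first file; concatenate bases and qualities.
        out += [header, seq_a + seq_b, "+", qual_a + qual_b]
    return "\n".join(out) + "\n"


# Made-up records, for illustration only.
print(paste_umi("@r1\nACGT\n+\nIIII\n", "@r1\nTTAA\n+\nFFFF\n"), end="")
```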

comment:17 by Nicklas Nordborg, 5 years ago

In 5542:

References #1146: Demux for the MIPs pipeline

Added a filter and a check in the "Start Hisat" and "Start Tophat/Cufflinks" wizards to make sure that only items intended for the RNAseq pipeline are selected.

comment:18 by Nicklas Nordborg, 5 years ago

In 5545:

References #1146 and #1162. Implemented a sequencing-cycles-to-read-string converter for the MIPs pipeline that builds on the example demux script from AK. Need to verify that this is what should actually be used. The example script seems to be from a MiSeq run but we should use HiSeq or NextSeq.

The current implementation converts X reads to 4M(X-4)T and uses all index reads for the barcode (B).
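A minimal Python sketch of the conversion described above, using picard read-structure codes (M for molecular barcode, T for template, B for sample barcode). The function name and the input representation, a list of (cycles, is_index) pairs, are hypothetical, and the cycle counts in the example are illustrative.

```python
def to_read_structure(reads, umi_length=4):
    """Build a read-structure string from (cycles, is_index) pairs."""
    parts = []
    for cycles, is_index in reads:
        if is_index:
            # All cycles of an index read are used for the barcode.
            parts.append(f"{cycles}B")
        else:
            if cycles <= umi_length:
                raise ValueError("read is too short to hold the UMI")
            # The first bases are the UMI, the rest is template.
            parts.append(f"{umi_length}M{cycles - umi_length}T")
    return "".join(parts)


# Illustrative paired-end, dual-index run.
print(to_read_structure([(151, False), (8, True), (8, True), (151, False)]))
```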

comment:19 by Nicklas Nordborg, 5 years ago

In 5551:

References #1146: Demux for the MIPs pipeline

Updated the "Confirm demux" wizard for MIPs to prepare it for selecting items for the next analysis step (which is not yet defined).

comment:20 by Nicklas Nordborg, 5 years ago

In 5582:

References #1146 and #1142. Changed the MIPsSecondaryAnalysis role to a group. This should make it behave similarly to the SecondaryAnalysis group that is used for RNAseq.

comment:21 by Nicklas Nordborg, 5 years ago

In 5585:

References #1146: Demux for the MIPs pipeline

Allow barcode names to include '.' and/or '-'.

comment:22 by Nicklas Nordborg, 5 years ago

In 5586:

References #1146: Demux for the MIPs pipeline

Changed the MIPs demux to not merge fastq files from multiple lanes. File name pattern is now: <lib-name>_<flowcell-id>_L<lane>_. Typically, 3 files are created with suffixes R1.fastq.gz, R2.fastq.gz and UMI.fastq.gz.
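The file name pattern can be sketched in Python as below. The function name and the example values are hypothetical, and whether the lane number is zero-padded is not stated in the ticket, so it is used as given.

```python
def fastq_names(lib_name, flowcell_id, lane):
    """Return the per-lane FASTQ file names for one library."""
    prefix = f"{lib_name}_{flowcell_id}_L{lane}_"
    return [prefix + suffix for suffix in ("R1.fastq.gz", "R2.fastq.gz", "UMI.fastq.gz")]


# Made-up library and flow cell names.
print(fastq_names("Lib1", "FC123", 2))
```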

comment:23 by Nicklas Nordborg, 5 years ago

In 5587:

References #1146: Demux for the MIPs pipeline

Now generates a "read group" file for each set of files as well. It contains the following fields:

  • RG: <flowcell-id>.<lane>
  • SM: Name of specimen (SCAN-B) or topmost DNA (external) item without prefix
  • LB: Name of library without prefix
  • PU: <flowcell-id>.<lane>.<barcode1>-<barcode2>
  • DT: Date sequencing was started in yyyy-mm-dd format
  • PL: ILLUMINA (taken from reggie-config.xml)
  • CN: BRCAlab (taken from reggie-config.xml)
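The field list above can be assembled as in the following Python sketch. The on-disk layout of the "read group" file is not specified in this ticket, so the sketch only builds the field values. The function name and the sample arguments are hypothetical, and the PL and CN defaults stand in for the values read from reggie-config.xml.

```python
from datetime import date


def read_group_fields(flowcell_id, lane, sample, library, barcode1, barcode2,
                      started, platform="ILLUMINA", center="BRCAlab"):
    """Return the read group fields as a dictionary, in the listed order."""
    unit = f"{flowcell_id}.{lane}"
    return {
        "RG": unit,
        "SM": sample,
        "LB": library,
        "PU": f"{unit}.{barcode1}-{barcode2}",
        "DT": started.isoformat(),  # yyyy-mm-dd
        "PL": platform,
        "CN": center,
    }


# Made-up values, for illustration only.
fields = read_group_fields("FC123", 2, "Sample1", "Lib1", "ACGTACGT", "TTGCAAGC",
                           date(2019, 6, 3))
for key, value in fields.items():
    print(f"{key}: {value}")
```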


comment:24 by Nicklas Nordborg, 5 years ago

In 5588:

References #1146: Demux for the MIPs pipeline

The "read group" file is now attached as an any-to-any link and not as a FASTQ file. Also report number of reads per lane in the file description.

comment:25 by Nicklas Nordborg, 5 years ago

In 5589:

References #1146: Demux for the MIPs pipeline

Added fqpaste.pl to the pipeline scripts. It is used by the MIPs demux script to combine UMI fastq files.

comment:26 by Nicklas Nordborg, 5 years ago

In 5592:

References #1146: Demux for the MIPs pipeline

Implemented support for reverse-complementing the second barcode sequence. This is needed because different sequencers read the second index in different orientations.

See https://support.illumina.com/content/dam/illumina-support/documents/documentation/system_documentation/miseq/indexed-sequencing-overview-guide-15057455-05.pdf

Note that we have decided to store barcode sequences according to NextSeq specifications so we need to reverse-complement when demuxing data from HiSeq 2000/2500, MiSeq, etc.
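Reverse-complementing a barcode is a small operation; a Python sketch is shown below. The function names are hypothetical, and the decision of which sequencers need the conversion is left to the caller as a flag rather than encoded here.

```python
# Complement table for upper-case DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def barcode2_for_demux(stored_barcode2, needs_reverse_complement):
    """Return the second barcode in the orientation the sequencer expects."""
    if needs_reverse_complement:
        return reverse_complement(stored_barcode2)
    return stored_barcode2


print(reverse_complement("AACG"))
```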

comment:27 by Nicklas Nordborg, 5 years ago

In 5606:

References #1146: Demux for the MIPs pipeline

Changed default 'smp' option to 8-16 since it will run a bit faster when the list of barcodes is long.

comment:28 by Nicklas Nordborg, 5 years ago

Resolution: fixed
Status: accepted → closed