Opened 3 months ago

Closed 3 months ago

#1613 closed enhancement (fixed)

Include FastQ Screen in the demux step

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v5.3
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description (last modified by Nicklas Nordborg)

FastQ Screen (https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/) can be used to check that sequences match the expected organism.

We should include this as a QC step that is part of the regular demux step and also in the FASTQ import step for RNAseq data. Maybe we can also implement it for the WGS import step.

To begin with, we should use the default configuration of databases (see attached image).

We probably want to check the results and issue warnings, etc. The details have not yet been decided.

Attachments (2)

test_screen.png (7.3 KB) - added by Nicklas Nordborg 3 months ago.
mouse_screen.png (7.3 KB) - added by Nicklas Nordborg 3 months ago.


Change History (16)

by Nicklas Nordborg, 3 months ago

Attachment: test_screen.png added

comment:1 by Nicklas Nordborg, 3 months ago

Description: modified (diff)

by Nicklas Nordborg, 3 months ago

Attachment: mouse_screen.png added

comment:2 by Nicklas Nordborg, 3 months ago

Description: modified (diff)

comment:3 by Nicklas Nordborg, 3 months ago

In 7854:

References #1613: Include FastQ Screen in the demux step

Updated the RNAseq demux container:

  • Added FastQ Screen
  • Changed from Rocky 8.4 to 9.0
  • Changed from (old) Miniconda to Mambaforge
  • Updated Picard, Bowtie and Trimmomatic to newer versions
  • Updated to Java 17



comment:4 by Nicklas Nordborg, 3 months ago

In 7855:

References #1613: Include FastQ Screen in the demux step

Updated the demux script to run FastQ Screen. It produces three files: one HTML page, one image and one tab-separated text file. Unfortunately the HTML page contains unsafe JavaScript. In this case it doesn't help to relax the Content Security Policy (https://base.thep.lu.se/ticket/2327), so we simply don't save the HTML file.

comment:5 by Nicklas Nordborg, 3 months ago

In 7856:

References #1613: Include FastQ Screen in the demux step

The new version of Picard throws an ArrayIndexOutOfBoundsException when the length of the barcodes in the barcodes/multiplex files doesn't match what the read-string says. The barcodes we have are 7 bases long, but the sequencer has only sequenced 6 cycles of the barcodes.

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 6 out of bounds for length 6
  at picard.util.SingleBarcodeDistanceMetric.hammingDistance(SingleBarcodeDistanceMetric.java:80)
  at picard.illumina.DistanceMetric$1.distance0(DistanceMetric.java:35)
  at picard.illumina.DistanceMetric.distance(DistanceMetric.java:70)
  at picard.illumina.BarcodeExtractor.calculateBarcodeMatch(BarcodeExtractor.java:156)
  at picard.illumina.BarcodeExtractor.<init>(BarcodeExtractor.java:68)
  at picard.illumina.ExtractBarcodesProgram.createBarcodeExtractor(ExtractBarcodesProgram.java:109)
  at picard.illumina.ExtractIlluminaBarcodes.doWork(ExtractIlluminaBarcodes.java:179)
  at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:281)
  at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:105)
  at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:115)

The fix is to only export the first 6 bases of the barcodes.
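The truncation described above can be sketched as follows. This is a minimal illustration and not the code from the changeset: the function name `truncate_barcodes` is hypothetical, and the collision check is an added assumption (shortening barcodes could in principle make two of them identical).

```python
def truncate_barcodes(barcodes, cycles):
    """Return each barcode cut down to the first `cycles` bases.

    `cycles` is the number of barcode cycles the sequencer actually read
    (6 in the case described in this ticket).
    """
    if cycles < 1:
        raise ValueError("cycles must be a positive integer")
    truncated = [barcode[:cycles] for barcode in barcodes]
    # Assumption, not from the ticket: refuse to continue if truncation
    # makes two previously distinct barcodes indistinguishable.
    if len(set(truncated)) != len(set(barcodes)):
        raise ValueError("barcodes are no longer unique after truncation")
    return truncated


# Example with invented 7-base barcodes exported as 6 bases.
exported = truncate_barcodes(["ACGTACG", "TTGCAAT"], 6)
```

With barcodes of the same length as the sequenced cycles, Picard compares strings of equal length and the out-of-bounds index in the Hamming distance calculation should no longer occur.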

comment:6 by Nicklas Nordborg, 3 months ago

In 7857:

References #1613: Include FastQ Screen in the demux step

The path to the Trimmomatic adapters also needs to be updated.

comment:7 by Nicklas Nordborg, 3 months ago

In 7858:

References #1613: Include FastQ Screen in the demux step

Added FRACTION_HUMAN and FRACTION_RRNA annotations. Values are parsed from the fastq_screen.txt and imported at the end of the job.
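A rough sketch of how such values could be extracted from the tab-separated output is shown below. It rests on several assumptions that are not confirmed by this ticket: that the file has a header row starting with `Genome` and a `%Unmapped` column, that the databases are labelled `Human` and `rRNA`, and that the fraction is defined as the share of reads that aligned (1 - %Unmapped/100). The function name and the sample values are invented for illustration.

```python
def parse_fastq_screen(text):
    """Map genome name -> fraction of reads that aligned (0..1).

    Assumed layout: comment lines start with '#', the header row starts
    with 'Genome', and summary lines such as '%Hit_no_genomes' start with '%'.
    """
    fractions = {}
    header = None
    for line in text.splitlines():
        cols = line.split("\t")
        if not cols[0].strip():
            continue  # blank line
        if cols[0] == "Genome":
            header = cols
            continue
        if header is None or cols[0].startswith(("#", "%")):
            continue  # version comment or summary line
        record = dict(zip(header, cols))
        # Assumed definition of "fraction": everything that is not unmapped.
        fractions[record["Genome"]] = 1.0 - float(record["%Unmapped"]) / 100.0
    return fractions


# Invented sample data with only the columns this sketch needs.
SAMPLE = (
    "#Fastq_screen version: (elided)\n"
    "Genome\t#Reads_processed\t#Unmapped\t%Unmapped\n"
    "Human\t100000\t4000\t4.00\n"
    "rRNA\t100000\t97500\t97.50\n"
    "\n"
    "%Hit_no_genomes: 1.50\n"
)
fractions = parse_fastq_screen(SAMPLE)
annotations = {
    "FRACTION_HUMAN": fractions["Human"],
    "FRACTION_RRNA": fractions["rRNA"],
}
```

Looking columns up by header name rather than by position keeps the sketch tolerant of extra columns, which matters here because the exact column set is an assumption.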

comment:8 by Nicklas Nordborg, 3 months ago

In 7859:

References #1613: Include FastQ Screen in the demux step

Display the fractions of human and rRNA alignments in the manual confirmation wizard. Also includes a link to the FastQ Screen image.

comment:9 by Nicklas Nordborg, 3 months ago

In 7860:

References #1613: Include FastQ Screen in the demux step

The FastQ Screen results are now included in the case summary.

comment:10 by Nicklas Nordborg, 3 months ago

In 7861:

References #1613: Include FastQ Screen in the demux step

The FASTQ import wizard has been updated to also use Fastq Screen.

comment:11 by Nicklas Nordborg, 3 months ago

In 7862:

References #1613: Include FastQ Screen in the demux step

Added Fastq Screen to the WGS import container.

comment:12 by Nicklas Nordborg, 3 months ago

In 7863:

References #1613: Include FastQ Screen in the demux step

Updated the WGS FASTQ importer to also run FastQ Screen.

To improve performance, the Python script for splitting the FASTQ file has been modified to also output a subset with 1/1000 of the reads to a temporary file that is used by FastQ Screen. Otherwise FastQ Screen would scan the complete FASTQ file twice: once to count the reads and once to extract the subset. 1/1000 should give us a subset of around 400-500 thousand reads.
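The single-pass subsetting idea can be sketched as below. This is not the actual splitting script: the function name `write_subset` is hypothetical, the splitting logic itself is omitted, and picking every 1000th record (rather than, say, a random sample) is an assumption. Uncompressed four-line FASTQ records are also assumed.

```python
import io
from itertools import islice


def write_subset(fastq_in, subset_out, every=1000):
    """Copy every `every`-th FASTQ record to `subset_out` in a single pass.

    Returns (total_records, kept_records). The real script would also write
    each record to its split output file at the point marked below.
    """
    total = kept = 0
    while True:
        record = list(islice(fastq_in, 4))  # header, sequence, '+', qualities
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        # ... writing the record to the split output files is omitted ...
        if total % every == 0:
            subset_out.writelines(record)
            kept += 1
        total += 1
    return total, kept


# Small in-memory demonstration with 2500 invented reads.
demo_in = io.StringIO("".join(f"@r{i}\nACGT\n+\nIIII\n" for i in range(2500)))
demo_out = io.StringIO()
demo_counts = write_subset(demo_in, demo_out, every=1000)
```

A deterministic every-Nth selection picks the same record positions in each file, so running it on the R1 and R2 files of a paired-end sample would keep the pairs in sync. A subset of 400-500 thousand reads at 1/1000 corresponds to input files of roughly 400-500 million reads.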

comment:13 by Nicklas Nordborg, 3 months ago

In 7864:

References #1613: Include FastQ Screen in the demux step

Added the FastQ Screen results to the WGS FASTQ import confirmation wizard and case summary.

comment:14 by Nicklas Nordborg, 3 months ago

Resolution: fixed
Status: new → closed