Opened 6 months ago

Closed 5 months ago

#1231 closed task (fixed)

Add support for sequencing with NovaSeq

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.26
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

Sequencing with a NovaSeq is similar to sequencing with a NextSeq or HiSeq, but there are some differences that affect various parts of our pipeline:

  • Different tags are used in RunParameters.xml. We use this file and a few other files for progress reporting and for detecting when a sequencing run has been completed. We also use this file to extract some information after the sequencing run, which is then used when demuxing.
  • Dual indexes are used for barcoding libraries. We already have support for this in the MIPs pipeline, so it should be relatively easy to implement for the RNASeq pipeline as well.

Change History (19)

comment:1 Changed 6 months ago by Nicklas Nordborg

Summary: Add support for sequencing with !NovaSeq → Add support for sequencing with NovaSeq

comment:2 Changed 6 months ago by Nicklas Nordborg

In 5855:

References #1231: Add support for sequencing with NovaSeq

Added NovaSeq as a valid value for the FlowCellType annotation.

Progress reporting during sequencing uses the novaseq_status.sh pipeline script (to be added). This will extract more or less the same values as the nextseq_status.sh and hiseq_status.sh scripts. When it detects that sequencing is complete it will update annotations and trigger auto-confirmation (if enabled).

Note that auto-confirmation will not work correctly yet since it will not get the read string correct. The manual "Sequencing ended" wizard also doesn't work as expected.

There is also no way to manually set a flow cell to the NovaSeq type. To get into this track, the Custom type must first be selected and then manually changed via the regular BASE edit functionality.

comment:3 Changed 6 months ago by Nicklas Nordborg

In 5856:

References #1231: Add support for sequencing with NovaSeq

Added novaseq_status.sh script.

comment:4 Changed 6 months ago by Nicklas Nordborg

In 5857:

References #1231: Add support for sequencing with NovaSeq

The "Register sequencing ended" wizard has been updated to work with a NovaSeq run.

comment:5 Changed 6 months ago by Nicklas Nordborg

In 5862:

References #1231: Add support for sequencing with NovaSeq

Fixes some minor issues with the "Check data files" functionality. A read string is generated in the same manner as for NextSeq.

The output doesn't contain tile information so we do not display a warning for this.

The 'genseq_check_illumina_dir.pl' script doesn't support cbcl files so this option is disabled.

NOTE! The current picard version has a bug that causes CheckIlluminaDirectory to fail if the read string contains S. A suggested fix has been submitted to the picard developers:
https://github.com/broadinstitute/picard/issues/1485
https://github.com/nnordborg/picard/commit/0e87c4c0ed04dda492151c28bb3a6de1be3ec17f

If the fix is not accepted, we either have to use our own modified picard version or use an alternate read string where the S segments are merged with the T segments.
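
The alternate read string mentioned above could be produced roughly as follows. This is only a sketch in Python, not code from Reggie: the function name merge_skips is hypothetical, and it assumes the read string uses the Picard read-structure notation (a number followed by T, B, S or M for each segment). A skip segment is folded into the template segment that follows it, or, failing that, into the one that precedes it.

```python
import re


def merge_skips(read_structure: str) -> str:
    """Fold each skip (S) segment into an adjacent template (T) segment."""

    def add(match: re.Match) -> str:
        # Sum the two segment lengths and report them as one template segment.
        return f"{int(match.group(1)) + int(match.group(2))}T"

    # Leading skips first: "3S47T" becomes "50T".
    merged = re.sub(r"(\d+)S(\d+)T", add, read_structure)
    # Then trailing skips: "47T3S" becomes "50T".
    merged = re.sub(r"(\d+)T(\d+)S", add, merged)
    return merged
```

With this sketch, merge_skips("3S47T8B8B50T") gives "50T8B8B50T". The total number of cycles is unchanged, and a read string without S is returned as-is.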

comment:6 Changed 6 months ago by Nicklas Nordborg

In 5865:

References #1231: Add support for sequencing with NovaSeq

Demuxing NovaSeq data should now work.

Introduced the BarcodeSet annotation to be used on barcodes for grouping possible barcodes that belong together. The main reason for this is so that we can output a logical set of barcodes for the UNUSED tag when demuxing to help us catch errors with incorrectly barcoded libraries.

There are currently two possible values for the BarcodeSet annotation in the RNAseq pipeline:

  • TruSeqSingle: Used by the regular RNA-seq pipeline that is sequenced on a NextSeq
  • TruSeqUniqueDual: Used by the "external" pipeline that is sequenced on a NovaSeq

The MIPs pipeline currently doesn't need this annotation.
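
To illustrate how a barcode set can drive the UNUSED output when demuxing, here is a minimal Python sketch. It is not the Reggie implementation: the BARCODE_SETS table, the barcode names and the sequences are all placeholders, and in practice the set members would come from the BarcodeSet annotation in the database.

```python
# Hypothetical barcode sets; names and sequences are placeholders, not kit data.
BARCODE_SETS = {
    "TruSeqSingle": {"S01": "AAAAAA", "S02": "CCCCCC", "S03": "GGGGGG"},
    "TruSeqUniqueDual": {
        "D01": "AAAAAAAA-TTTTTTTT",
        "D02": "CCCCCCCC-GGGGGGGG",
    },
}


def unused_barcodes(barcode_set, used_names):
    """Return (name, sequence) pairs from the set that no library in the run uses."""
    used = set(used_names)
    members = BARCODE_SETS[barcode_set]
    return [(name, seq) for name, seq in sorted(members.items()) if name not in used]
```

Reads that end up on one of the returned barcodes would indicate a library that was barcoded with something other than what was registered.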

comment:7 Changed 6 months ago by Nicklas Nordborg

In 5866:

References #1231: Add support for sequencing with NovaSeq

The regular workflow should only select barcodes from the 'TRUSEQ_SINGLE' barcode set.

comment:8 Changed 6 months ago by Nicklas Nordborg

Status: new → accepted

comment:9 Changed 6 months ago by Nicklas Nordborg

In 5867:

References #1231: Add support for sequencing with NovaSeq

Started to implement a wizard for registering a sample sheet from external sequencing. A lot of checks are made to try to make sure that only the expected libraries are registered.

The wizard expects that library items have been pre-created and placed on an "External library plate" (it is not possible to do that with a wizard at the moment). The libraries should be without a 'creation date'.

The new wizard will set the 'creation date' to the date found in the sample sheet file (it is possible to manually change this), and associate the library with a barcode. Actual barcode sequences in the sample sheet are verified against the database.
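
The barcode verification step can be pictured with the following Python sketch. It is an illustration under assumptions, not the wizard's code: the function name check_sample_sheet is hypothetical, the known_barcodes dictionary stands in for the database lookup, and the sample sheet data section is assumed to have index and index2 columns next to Sample_ID.

```python
import csv
import io


def check_sample_sheet(data_section, known_barcodes):
    """Match each sample sheet row to a registered barcode.

    known_barcodes maps (index, index2) sequence pairs to barcode names.
    Returns (matches, errors): Sample_ID -> barcode name, plus error messages.
    """
    matches, errors = {}, []
    for row in csv.DictReader(io.StringIO(data_section)):
        key = (row["index"].strip().upper(), row["index2"].strip().upper())
        name = known_barcodes.get(key)
        if name is None:
            errors.append(f"{row['Sample_ID']}: unknown barcode {key[0]}+{key[1]}")
        elif name in matches.values():
            errors.append(f"{row['Sample_ID']}: barcode {name} used more than once")
        else:
            matches[row["Sample_ID"]] = name
    return matches, errors
```

A row is only accepted when both index sequences together identify a registered barcode that no earlier row has claimed.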

The wizard will then create a single "Pooled library" item for all the libraries as well as a single "Flow cell" item and "Sequencing run". It will try to put in as much information as possible to make it possible for the "auto-confirmation" function to detect when data is available in the run archive and continue with demux and other analysis.

It should also be possible to use the manual "Sequencing ended" wizard.

At the moment, they don't work since some information is not yet available. We can probably parse out what we need from the "RunParameters.xml" file; it just has to be done in a slightly different order.

comment:10 Changed 6 months ago by Nicklas Nordborg

In 5868:

References #1231: Add support for sequencing with NovaSeq

Implemented a counter for the number of libraries that are waiting for external sequencing.

comment:11 Changed 6 months ago by Nicklas Nordborg

In 5869:

References #1231: Add support for sequencing with NovaSeq

Fixed auto-confirmation and the manual "Register sequencing ended" wizard so that they are compatible with the external NovaSeq pipeline.

On the library and pool level there are lots of annotations with missing values. Most of them are related to lab-specific things such as concentrations, volumes, etc.

Several items have an empty 'XxxxOperator' annotation. We could maybe use this to specify the external lab (e.g. 'CTG').

We need to check if some of the missing annotations are used by the release exporter, and if so, if they are affecting things. For example, we use dates to order items to get a "batch index". We would not like this to break.

comment:12 Changed 6 months ago by Nicklas Nordborg

In 5870:

References #1231: Add support for sequencing with NovaSeq

Added annotations for storing "External plate position" and "External operator". External plate position is stored on the library item, and the external operator is used for library, library plate and pooled library. The external operator is also stored on flow cell and sequencing run, but re-uses the existing annotations (ClusterOperator and SequencingOperator).

The comment field from the wizard is now saved as description on the library plate, flow cell and sequencing run.

comment:13 Changed 6 months ago by Nicklas Nordborg

In 5871:

References #1231: Add support for sequencing with NovaSeq

Fixes to make the release exporter work with data from external sequencing. The library exporter needs a date to generate a batch index. The code has been updated to use the registration date as a last resort to make sure that a date always exists.

The external sequencing wizard also sets PlateProcessResult=Successful on the library plate, otherwise no libraries on that plate will be included in the release.
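
The date fallback for the batch index might look like the Python sketch below. The field names processing_date and registration_date are hypothetical, and the rule used here (number the distinct dates in ascending order, starting from 1) is an assumption about how dates are turned into a batch index, not the exporter's actual logic.

```python
from datetime import date


def effective_date(library):
    """Use the preferred date if set, otherwise fall back to the registration date."""
    for key in ("processing_date", "registration_date"):
        value = library.get(key)
        if value is not None:
            return value
    raise ValueError(f"no usable date for {library['name']}")


def batch_indexes(libraries):
    """Assign a 1-based batch index per library by ordering the distinct dates."""
    dates = sorted({effective_date(lib) for lib in libraries})
    index_of = {d: i + 1 for i, d in enumerate(dates)}
    return {lib["name"]: index_of[effective_date(lib)] for lib in libraries}
```

Because every item has a registration date, the fallback guarantees that the ordering never has to deal with a missing value.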

comment:14 Changed 6 months ago by Nicklas Nordborg

In 5885:

References #1231: Add support for sequencing with NovaSeq

Added a function for generating a fake sample sheet. This will make it easier to debug the "Register external sequencing" wizard.

comment:15 Changed 6 months ago by Nicklas Nordborg

In 5889:

References #1231 and #1232.

Added more columns to the exported sample template Excel file. The major difference is that the "Sample ID" column now contains the internal ID of the Library item, while the "Name" column now contains the name of either the pre-normalized RNA or the Library item (trying to match it with what is on the tube labels).

This change also affects the "Register external sequencing" wizard, which now assumes that the "Sample_ID" column contains the ID of the Library item. The "Sample_Name" column is not needed, but it is used to display a warning if the name doesn't match the first part of the Library name.
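
The name check described above amounts to a prefix comparison. A minimal Python sketch, with a hypothetical function name and made-up item names:

```python
def name_warning(sample_name, library_name):
    """Return a warning text, or None when Sample_Name is a prefix of the library name."""
    if library_name.startswith(sample_name.strip()):
        return None
    return f"Sample_Name '{sample_name}' does not match library '{library_name}'"
```

For example, name_warning("RNA1", "RNA1.lib") returns None, while name_warning("RNA2", "RNA1.lib") returns a warning text. An empty Sample_Name never triggers a warning in this sketch, in line with the column being optional.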

comment:16 Changed 5 months ago by Nicklas Nordborg

In 5893:

References #1231: Add support for sequencing with NovaSeq

Adding UDI #33 to #96 to the installation wizard.

comment:17 Changed 5 months ago by Nicklas Nordborg

In 5896:

References #1231: Add support for sequencing with NovaSeq

Updated path to Picard 2.22.3.

comment:18 Changed 5 months ago by Nicklas Nordborg

In 5898:

References #1231: Add support for sequencing with NovaSeq

Removed debug output.

comment:19 Changed 5 months ago by Nicklas Nordborg

Resolution: fixed
Status: accepted → closed