#1464 closed task (fixed)

New pipeline for DNA whole genome sequencing

Reported by:	Nicklas Nordborg	Owned by:	Nicklas Nordborg
Priority:	major	Milestone:	Reggie v4.45
Component:	net.sf.basedb.reggie	Keywords:
Cc:

Description

Typically, samples come in pairs from the same patient: one blood DNA sample and one (or more) tumor DNA samples. However, the initial parts of the pipeline treat samples one by one. The first step is to create a MergedSequences item that represents the sequenced data (the FASTQ files) and link it back to the source DNA or Blood DNA item.

A single sample may have multiple pairs of FASTQ files, for example if demultiplexing was done per lane of the flow cell or if multiple sequencing runs were needed to get the desired amount of data. In other cases, everything has already been merged into a single pair of FASTQ files. This should not be an issue at the MergedSequences level, but downstream analysis needs to be aware of it.
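One way to handle both situations is to group the files of a sample by sequencing run and lane, and then match each R1 file with its R2 mate. The sketch below assumes that the run ID, lane and read number (1 or 2) are already known for every file, for example parsed from the file name. `FastqFile`, `FastqPair` and `FastqPairing` are hypothetical illustration types, not part of BASE or Reggie.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one FASTQ file; readNumber is 1 (R1) or 2 (R2).
record FastqFile(String runId, int lane, int readNumber, String path) {}

// A matched R1/R2 pair from one sequencing run and lane.
record FastqPair(String runId, int lane, FastqFile r1, FastqFile r2) {}

final class FastqPairing {
    private FastqPairing() {}

    // Groups files by (run, lane) and matches R1 with R2.
    // Throws if a group has a duplicate file or is missing one of the mates.
    static List<FastqPair> pair(List<FastqFile> files) {
        Map<String, FastqFile[]> groups = new TreeMap<>();
        for (FastqFile f : files) {
            if (f.readNumber() != 1 && f.readNumber() != 2) {
                throw new IllegalArgumentException("Unexpected read number: " + f.path());
            }
            String key = f.runId() + ":" + f.lane();
            FastqFile[] mates = groups.computeIfAbsent(key, k -> new FastqFile[2]);
            int idx = f.readNumber() - 1;
            if (mates[idx] != null) {
                throw new IllegalArgumentException("Duplicate R" + f.readNumber() + " for " + key);
            }
            mates[idx] = f;
        }
        List<FastqPair> pairs = new ArrayList<>();
        for (Map.Entry<String, FastqFile[]> e : groups.entrySet()) {
            FastqFile r1 = e.getValue()[0];
            FastqFile r2 = e.getValue()[1];
            if (r1 == null || r2 == null) {
                throw new IllegalArgumentException("Unpaired FASTQ file for " + e.getKey());
            }
            pairs.add(new FastqPair(r1.runId(), r1.lane(), r1, r2));
        }
        return pairs;
    }
}
```

A sample demultiplexed per lane over two lanes yields two pairs, while a fully merged sample yields a single pair. In both cases the MergedSequences item can simply reference the whole list, and downstream steps decide how to use it.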

The directory structure and naming convention for the FASTQ files should be similar to the solution used in the RNA pipeline, but the filenames must also account for lanes and multiple sequencing runs.
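The RNA pipeline's convention is not spelled out in this ticket, so the helper below is only a hypothetical sketch of how lane and run information could be folded into a filename. It uses the flow cell ID to keep files from different sequencing runs apart (assuming each run uses its own flow cell), and lane/read tokens resembling the `_L001_R1` style of Illumina bcl2fastq output. `FastqNames.fastqName` and its parameters are illustrative names only.

```java
import java.util.Locale;

// Hypothetical naming helper; the real scheme should follow the RNA pipeline.
final class FastqNames {
    private FastqNames() {}

    // Builds <sample>[_<flowCellId>][_L<lane, 3 digits>]_R<read>.fastq.gz
    // A null/empty flowCellId and lane <= 0 ("lanes already merged") drop
    // the corresponding parts; both conventions are assumptions of this sketch.
    static String fastqName(String sample, String flowCellId, int lane, int read) {
        if (read != 1 && read != 2) {
            throw new IllegalArgumentException("read must be 1 or 2, got " + read);
        }
        StringBuilder sb = new StringBuilder(sample);
        if (flowCellId != null && !flowCellId.isEmpty()) {
            sb.append('_').append(flowCellId);
        }
        if (lane > 0) {
            // Locale.ROOT keeps the digits ASCII regardless of the JVM locale
            sb.append(String.format(Locale.ROOT, "_L%03d", lane));
        }
        sb.append("_R").append(read).append(".fastq.gz");
        return sb.toString();
    }
}
```

For example, `fastqName("S1", "HABCDXX", 2, 1)` gives `S1_HABCDXX_L002_R1.fastq.gz`, and a fully merged file `fastqName("S1", null, 0, 2)` gives `S1_R2.fastq.gz`. If sample names may contain underscores, a different separator (or a per-sample directory) would make the names easier to parse back.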

Due to the large amount of data, the FASTQ files will be stored on a separate server (casa28), so we need a new FileServer item in BASE that points to this location.

The MergedSequences items should be linked back to the originating DNA/Blood DNA item. There should be at least one intermediary aliquot that can be used for attaching extra information (for example, the Barcode).

If possible, it would also be nice to create intermediary FlowCell and SequencingRun items.
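The intended linkage can be illustrated with a small in-memory model. This is a hypothetical sketch, not the BASE or Reggie API: `ExtractNode`, `RunNode` and `MergedSequencesNode` are illustration-only classes. The merged sequence data points at the aliquot it was made from (which carries the barcode), the aliquot points at the DNA/Blood DNA item, and the optional sequencing runs and flow cells are kept as a separate list. The flow cell is deliberately not a parent in this sketch, since one flow cell may hold aliquots from several samples, so walking up through it would not lead to a single DNA item.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extract chain: DNA / Blood DNA -> aliquot(s).
final class ExtractNode {
    final String name;
    final boolean isDnaSource; // true for the DNA / Blood DNA item itself
    final ExtractNode parent;  // null for the DNA / Blood DNA item
    String barcode;            // optional extra info, typically set on the aliquot

    ExtractNode(String name, boolean isDnaSource, ExtractNode parent) {
        this.name = name;
        this.isDnaSource = isDnaSource;
        this.parent = parent;
    }
}

// Hypothetical optional intermediary: one sequencing run on one flow cell.
final class RunNode {
    final String name;
    final String flowCellId;

    RunNode(String name, String flowCellId) {
        this.name = name;
        this.flowCellId = flowCellId;
    }
}

// Hypothetical stand-in for a MergedSequences item.
final class MergedSequencesNode {
    final String name;
    final ExtractNode aliquot;                 // link back toward the DNA
    final List<RunNode> runs = new ArrayList<>(); // optional, may stay empty

    MergedSequencesNode(String name, ExtractNode aliquot) {
        this.name = name;
        this.aliquot = aliquot;
    }

    // Follows the extract chain up to the originating DNA / Blood DNA item.
    ExtractNode originatingDna() {
        for (ExtractNode e = aliquot; e != null; e = e.parent) {
            if (e.isDnaSource) {
                return e;
            }
        }
        throw new IllegalStateException("No DNA/Blood DNA ancestor for " + name);
    }
}
```

Keeping the runs as a list on the merged item also covers the case where several sequencing runs contributed data to the same sample.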