Opened 3 months ago

Closed 3 months ago

#1566 closed task (fixed)

Implement a wizard for restoring FASTQ files from aligned BAM files

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: major Milestone: Reggie v4.53
Component: net.sf.basedb.reggie Keywords:
Cc:

Description

In the WGS pipeline, the alignment BAM files also contain all unaligned reads. This makes it possible to restore the original FASTQ files from the BAM with the Picard SamToFastq tool. The restored FASTQ files will be sorted in a different order, but that should not matter as long as all reads are kept.
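As a rough illustration of the restore step, the sketch below builds a Picard SamToFastq command line for a paired-end BAM file. The function name, the file names and the location of picard.jar are assumptions made for this example; the script that the wizard actually generates is not shown in this ticket and may use other options.

```python
def sam_to_fastq_command(bam, fastq1, fastq2, picard_jar="picard.jar"):
    """Build a Picard SamToFastq command for a paired-end BAM file.

    Hypothetical helper: the jar location and the file names are
    placeholders, not values taken from Reggie.
    """
    return [
        "java", "-jar", str(picard_jar), "SamToFastq",
        f"INPUT={bam}",                # BAM with aligned and unaligned reads
        f"FASTQ={fastq1}",             # first read of each pair
        f"SECOND_END_FASTQ={fastq2}",  # second read of each pair
    ]


cmd = sam_to_fastq_command("sample.bam", "sample_R1.fastq", "sample_R2.fastq")
print(" ".join(cmd))
```

The command is returned as a list so that it can be passed to subprocess.run() without shell quoting problems, or joined into a line of a cluster job script.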

The idea is to save disk space by not keeping both FASTQ files and BAM files on the production server. We do not intend to implement a wizard that deletes FASTQ files. This needs to be done manually. The file items that represent the deleted FASTQ files should NOT be removed from the database, but they could perhaps be marked as OFFLINE?

The wizard can be used on any alignment in the WGS pipeline, but it should not overwrite already existing FASTQ files.
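One way to honor the no-overwrite rule is to check the target paths before anything is written. This is only a sketch of that idea: ensure_no_overwrite is a hypothetical helper, and whether the wizard performs the check in the web client, in the generated script, or in both is not stated in this ticket.

```python
from pathlib import Path


def ensure_no_overwrite(targets):
    """Raise FileExistsError if any target FASTQ file already exists.

    Hypothetical guard; 'targets' is any iterable of file paths.
    """
    existing = [str(t) for t in targets if Path(t).exists()]
    if existing:
        raise FileExistsError(
            "Refusing to overwrite existing FASTQ files: " + ", ".join(existing)
        )
```

Calling the guard with the planned output paths before starting the restore means that a half-finished run never replaces a file that is already in place.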

Change History (8)

comment:1 by Nicklas Nordborg, 3 months ago

In 7572:

References #1566: Implement a wizard for restoring FASTQ files from aligned BAM files

Started with a wizard that allows the user to select aligned sequences.

comment:2 by Nicklas Nordborg, 3 months ago

In 7573:

References #1566: Implement a wizard for restoring FASTQ files from aligned BAM files

Created a wizard that scans the project archive and marks missing FASTQ files as OFFLINE.
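The scan can be pictured as a comparison between the FASTQ paths registered in the database and the files actually present in the archive. The sketch below assumes the registered paths are available as a list of paths relative to an archive root; the function name and the "ONLINE" label are hypothetical, and how the OFFLINE state is stored on the file items is not shown in this ticket.

```python
from pathlib import Path


def scan_fastq_status(archive_root, relative_paths):
    """Map each registered FASTQ path to 'ONLINE' or 'OFFLINE'.

    A path is OFFLINE when no regular file exists at that location
    below archive_root. The labels are illustrative only.
    """
    root = Path(archive_root)
    return {
        rel: "ONLINE" if (root / rel).is_file() else "OFFLINE"
        for rel in relative_paths
    }
```

Because the function reports both states, the same scan can also pick up files that have been moved back or restored after an earlier scan marked them as missing.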

comment:3 by Nicklas Nordborg, 3 months ago

In 7574:

References #1566: Implement a wizard for restoring FASTQ files from aligned BAM files

The Restore FASTQ files wizard can now generate a script and submit it to the cluster. It seems to work, but there is still some error handling and other checks to implement.

comment:4 by Nicklas Nordborg, 3 months ago

In 7575:

References #1566: Implement a wizard for restoring FASTQ files from aligned BAM files

The debug option now only uses part of chr21 from the bam file.

comment:5 by Nicklas Nordborg, 3 months ago

In 7580:

References #1566: Implement a wizard for restoring FASTQ files from aligned BAM files

Added a check for the number of reads in the restored FASTQ file. It should match the existing count that we have stored in the READS annotation.
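A read-count check of this kind could look like the sketch below. It assumes that each FASTQ record is exactly four lines and that the expected count is supplied by the caller in the same unit as the file being checked; how the READS annotation counts paired reads is not stated in this ticket. Both function names are hypothetical.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming four lines per record."""
    path = Path(path)
    # Read gzip-compressed files transparently based on the file suffix.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        lines = sum(1 for _ in handle)
    if lines % 4 != 0:
        raise ValueError(f"{path}: line count {lines} is not a multiple of 4")
    return lines // 4


def reads_match(path, expected_reads):
    """Return True if the restored file holds the expected number of reads."""
    return count_fastq_reads(path) == expected_reads
```

A mismatch would indicate that reads were lost or duplicated during the restore, so the restored file should not be trusted as a replacement for the original.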

comment:6 by Nicklas Nordborg, 3 months ago

In 7581:

References #1566: Implement a wizard for restoring FASTQ files from aligned BAM files

The scanning wizard can now also scan for FASTQ files that have been restored. It may be that we just move FASTQ files to a different server and move them back in case we need them.

comment:7 by Nicklas Nordborg, 3 months ago

In 7586:

References #1566: Implement a wizard for restoring FASTQ files from aligned BAM files

Added a button that loads all aligned items that have at least one missing FASTQ file.

comment:8 by Nicklas Nordborg, 3 months ago

Resolution: fixed
Status: new → closed