Opened 9 months ago

Closed 9 months ago

#1335 closed enhancement (fixed)

Calculate average read length in FASTQ files after Trimmomatic

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.33
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

In the legacy pipeline we need the average inner distance between R1 and R2. Typically we calculate it from the average fragment size (FragementSizeAvg on MergedItem) and the read length used in the sequencing. This usually works well enough, but when sequencing 2x150 with an average fragment size of 160-180bp, many reads run through into the adapter and are then trimmed in the FASTQ files. In test data the average read length in the FASTQ files is ~120bp. We should therefore use the average read length instead of the sequencing read length when calculating the average inner distance.
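
To illustrate the size of the effect, here is a minimal sketch. It assumes the common definition of inner distance as the fragment size minus the lengths of both reads; the ticket does not spell out the exact formula used in the pipeline, so the function name and the formula are assumptions. The 170bp fragment size is simply a value picked from the 160-180bp range mentioned above.

```python
def inner_distance(fragment_size_avg, read_length_r1, read_length_r2):
    """Average inner distance between R1 and R2.

    Assumed definition: fragment size minus the length of both reads.
    A negative value means the two reads overlap.
    """
    return fragment_size_avg - (read_length_r1 + read_length_r2)


# Illustrative numbers: 2x150 sequencing, 170bp average fragment size.
nominal = inner_distance(170, 150, 150)   # using the sequencing read length
trimmed = inner_distance(170, 120, 120)   # using ~120bp average read length after trimming

print(nominal, trimmed)
```

Under these assumptions the two estimates differ by 60bp, which is why the trimmed read length is the better input.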

Change History (5)

comment:1 Changed 9 months ago by Nicklas Nordborg

In 6419:

References #1335: Calculate average read length in FASTQ files after Trimmomatic

Added a readlength_averager.awk script which can be used to calculate the average read length in FASTQ files.
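
The awk script itself is not reproduced in this ticket. As a rough sketch of the same idea in Python, assuming standard four-line FASTQ records (header, sequence, "+", quality) with no wrapped sequence lines, the average can be computed as below. The function names are hypothetical and not part of Reggie.

```python
import gzip


def average_read_length(lines):
    """Return the mean sequence length over an iterable of FASTQ lines.

    Assumes four lines per record, with the sequence on the second line.
    Returns 0.0 if there are no records.
    """
    total = 0
    count = 0
    for index, line in enumerate(lines):
        if index % 4 == 1:  # second line of each record holds the sequence
            total += len(line.rstrip("\r\n"))
            count += 1
    return total / count if count else 0.0


def average_read_length_in_file(path):
    """Convenience wrapper that also handles gzip-compressed FASTQ files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return average_read_length(handle)
```

Running this once on the R1 file and once on the R2 file would give one average per read direction.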

comment:2 Changed 9 months ago by Nicklas Nordborg

In 6420:

References #1335: Calculate average read length in FASTQ files after Trimmomatic

Added a step in the demux script and in the FASTQ import step that calculates the average read length in the final FASTQ files.

comment:3 Changed 9 months ago by Nicklas Nordborg

In 6421:

References #1335: Calculate average read length in FASTQ files after Trimmomatic

The calculated values are imported into the annotations ReadLengthAvgR1 and ReadLengthAvgR2, which are stored on the MergedItem.

comment:4 Changed 9 months ago by Nicklas Nordborg

In 6424:

References #1335: Calculate average read length in FASTQ files after Trimmomatic

The ReadLengthAvgR1 and ReadLengthAvgR2 annotations are now included in the release export.

comment:5 Changed 9 months ago by Nicklas Nordborg

Resolution: fixed
Status: new → closed