Opened 3 months ago

Closed 3 months ago

Last modified 3 months ago

#1325 closed defect (fixed)

Calculating the average fragment size needs fixing due to Hisat bug

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: major Milestone: Reggie v4.32
Component: net.sf.basedb.reggie Keywords:
Cc:

Description (last modified by Nicklas Nordborg)

It seems that Hisat 2.1 has a bug that in some cases outputs a negative TLEN for both reads in a pair. This affects the singlecolumnaverager.awk pipeline script that calculates the average fragment size. The script only considers reads with TLEN>0 and TLEN<500, so it misses both reads in pairs affected by the Hisat bug.
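To show why affected pairs disappear, here is a minimal Python sketch of the selection rule described above. It assumes the script receives one TLEN value per read. The function name `original_filter` and the example values are hypothetical, and the real script is written in awk, which is not shown in this ticket.

```python
def original_filter(tlens, max_len=500):
    """Keep only reads with 0 < TLEN < max_len, as described in the ticket."""
    return [t for t in tlens if 0 < t < max_len]

# A normal pair: one read has a positive TLEN, its mate the negated value.
normal_pair = [200, -200]
# A pair hit by the Hisat 2.1 bug: both reads report a negative TLEN.
buggy_pair = [-50, -50]

print(original_filter(normal_pair))  # [200] -> the fragment is counted once
print(original_filter(buggy_pair))   # []    -> the fragment is lost entirely
```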

It seems that the bug only affects fragments that are shorter than the read length. This shows up as a sharp edge in the distribution of TLEN values (>0); see the attached image.

The solution is to let singlecolumnaverager.awk consider both positive and negative TLEN values and use the absolute value in the calculations. The final average and standard deviation should be unchanged. However, each fragment is now counted once for each of its two reads, so the number of fragments must be divided by 2.
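The corrected calculation can be sketched in Python as follows. This is a minimal illustration, not the actual awk change. It assumes that both reads of every pair are present in the input and that the standard deviation uses the population formula (dividing by n). Under that formula, counting every value twice leaves both the mean and the standard deviation unchanged. The name `fragment_stats` is hypothetical.

```python
import math

def fragment_stats(tlens, max_len=500):
    """Return (fragment count, mean, population SD) of fragment size using |TLEN|.

    Assumes both reads of each pair are in the input, so every fragment
    contributes two equal |TLEN| values (including pairs where Hisat 2.1
    reported a negative TLEN for both reads).
    """
    values = [abs(t) for t in tlens if 0 < abs(t) < max_len]
    if not values:
        return 0, 0.0, 0.0
    n = len(values)
    mean = sum(values) / n
    # Population SD: duplicating every value does not change it.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # Each fragment was counted once per read, so halve the read count.
    return n // 2, mean, sd

# Two normal pairs and one pair affected by the Hisat 2.1 bug.
fragments, mean, sd = fragment_stats([200, -200, -50, -50, 300, -300])
print(fragments)  # 3
```

With a sample standard deviation (dividing by n - 1), doubling the values would shift the result slightly. The population formula keeps the "same average and standard deviation" property stated above.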

Hisat 2.2 also fixes the problem.

https://github.com/DaehwanKimLab/hisat2/issues/205

Attachments (1)

fragment-size-abs-tlen-hisat2.2.png (34.3 KB) - added by Nicklas Nordborg 3 months ago.


Change History (5)

Changed 3 months ago by Nicklas Nordborg

comment:1 Changed 3 months ago by Nicklas Nordborg

Description: modified (diff)

comment:2 Changed 3 months ago by Nicklas Nordborg

Description: modified (diff)

comment:3 Changed 3 months ago by Nicklas Nordborg

Resolution: fixed
Status: new → closed

In 6378:

Fixes #1325: Calculating the average fragment size needs fixing due to Hisat bug

comment:4 Changed 3 months ago by Nicklas Nordborg

Description: modified (diff)