Opened 4 years ago

Last modified 4 years ago

#1253 closed defect

Change the name in VCF file created by variant calling — at Version 2

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.27.2
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description (last modified by Nicklas Nordborg)

The current variant calling pipeline passes the name of the parent Library item as the name (the -N parameter to VarDict) to use in the VCF file.

This can be a problem for downstream tools because the name is not unique across VCF files: some Library items have been part of more than one pool, so several VCF files can carry the same name.

Instead, we should use the name of the Alignment item.
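To find out which existing files share a name, the sample columns of the VCF header lines can be compared. The following is only a sketch: the function name list_duplicate_sample_names is made up for illustration, and it assumes that the name appears as a sample column (column 10 and onwards) on the #CHROM header line and that compressed files end with .gz and can be read with bgzip.

```shell
# Hypothetical helper (not part of Reggie): print every sample name that
# occurs more than once in the #CHROM header lines of the given VCF files.
list_duplicate_sample_names() {
  for vcf in "$@"; do
    # Assumption: compressed files end with .gz and were written by bgzip.
    case $vcf in
      *.gz) bgzip -d -c "$vcf" ;;
      *)    cat "$vcf" ;;
    esac |
      # Columns 1-9 of the #CHROM line are fixed; sample names start at column 10.
      awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
  done | sort | uniq -d
}
```

Calling the function with a list of VCF files prints one line for each name that is used more than once. An empty result means that all names are unique.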

Existing VCF files should be fixed manually (with the help of some clever grep/sed/awk/... commands). Basically, we can run sed 's/S0123456.l.r.m.c.lib/S0123456.l.r.m.c.lib.g.k2.a/' to replace all library names (S0123456.l.r.m.c.lib) with alignment names (S0123456.l.r.m.c.lib.g.k2.a). Note that the library name is a prefix of the alignment name, so running the command a second time on the same file would append the suffix again; each file should be processed only once. The variants-filtered.vcf file is not compressed, so the replacement can be done in-place with the -i flag to sed. The raw and annotated files are compressed with bgzip, so they must be decompressed before the replacement and re-compressed afterwards, and the result should probably be stored in a temporary file before overwriting the original file.
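The manual fix could be scripted roughly as follows. This is a sketch under stated assumptions and not the commands that were actually used: the function name rename_sample_in_vcf is made up, compressed files are assumed to end with .gz, and the names are assumed to contain only letters, digits, dots and underscores. Unlike the plain sed command, the dots in the library name are escaped and the name is replaced only when no further name character follows it, so a second run leaves an already fixed file unchanged.

```shell
# Hypothetical helper (not part of Reggie): replace a library name with an
# alignment name in a single VCF file, plain or bgzip-compressed.
# Usage: rename_sample_in_vcf FILE LIBRARY_NAME ALIGNMENT_NAME
rename_sample_in_vcf() {
  vcf=$1; lib=$2; aln=$3
  # Escape the dots so that they match literally instead of any character.
  lib_re=$(printf '%s\n' "$lib" | sed 's/\./\\./g')
  # Replace the library name only when it is followed by a character that
  # cannot be part of a name (or by end of line).
  expr="s/${lib_re}([^A-Za-z0-9._]|\$)/${aln}\\1/g"
  # Work in a temporary directory; the original is overwritten last.
  tmp=$(mktemp -d) || return 1
  rc=0
  case $vcf in
    *.gz)
      # Assumption: .gz files were written by bgzip.
      bgzip -d -c "$vcf" > "$tmp/in.vcf" &&
        sed -E "$expr" "$tmp/in.vcf" > "$tmp/out.vcf" &&
        bgzip -c "$tmp/out.vcf" > "$tmp/result" || rc=1 ;;
    *)
      sed -E "$expr" "$vcf" > "$tmp/result" || rc=1 ;;
  esac
  # Overwrite the original only if every step succeeded.
  if [ "$rc" -eq 0 ]; then
    cat "$tmp/result" > "$vcf" || rc=1
  fi
  rm -rf "$tmp"
  return "$rc"
}
```

For the uncompressed file this would be called as rename_sample_in_vcf variants-filtered.vcf S0123456.l.r.m.c.lib S0123456.l.r.m.c.lib.g.k2.a. Each step writes to a separate temporary file and is checked before the next one runs, so a failed decompression cannot end up overwriting the original with an empty file. Any index file that exists for a compressed VCF would have to be regenerated afterwards.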

Change History (2)

comment:1 by Nicklas Nordborg, 4 years ago

In 5964:

References #1253: Change the name in VCF file created by variant calling

The alignment name is now used instead of the library name. It has also been changed in the PDF created by the mutation signature script.

comment:2 by Nicklas Nordborg, 4 years ago

Description: modified (diff)