#1329 closed defect (fixed)
HaplotypeCaller is "inventing" new variants
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | Reggie v4.32.1 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description
The output from HaplotypeCaller sometimes contains results for more variants than we submitted in the reference VCF file. There is probably some logic to it, but we would like to keep only the results for the variants that we asked for. For example, for ESR1 we test two variants that are located next to each other:
chr6:152098787 T›A chr6:152098788 A›C
HaplotypeCaller may also output results for the combined variant:
chr6:152098787 TA›AC
On the protein level, this results in three different variants: Y539N, Y539T and Y539S, so this could very well be important, but it is not what we asked for.
Since we are annotating the results with the TYPE annotation from the reference VCF, it should be relatively easy to remove entries that have no TYPE annotation from the final result.
I think it should be possible to use
grep "\(^#\|TYPE=\)"
to filter out the lines without a TYPE annotation. We need to keep all header lines (starting with #) and all other lines containing TYPE=.
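
As a rough illustration of the same filter, here is a minimal Python sketch. It is not the code used in Reggie. The function name, the sample records and the TYPE value "example" are all made up for this illustration; only the two ESR1 positions and the combined TA/AC variant come from the description above. Unlike the grep pattern, which matches TYPE= anywhere on the line, the sketch only looks for a TYPE key in the INFO column (the 8th column of a VCF record).

```python
def keep_requested_variants(vcf_lines):
    """Keep header lines and records whose INFO column has a TYPE key.

    Hypothetical helper for illustration; not part of Reggie.
    """
    kept = []
    for line in vcf_lines:
        # Header lines (## meta lines and the #CHROM line) are always kept.
        if line.startswith("#"):
            kept.append(line)
            continue
        columns = line.rstrip("\n").split("\t")
        # INFO is the 8th tab-separated column of a VCF record.
        info = columns[7] if len(columns) > 7 else ""
        # INFO entries are separated by ';' and are either KEY=value or flags.
        keys = [entry.split("=", 1)[0] for entry in info.split(";")]
        if "TYPE" in keys:
            kept.append(line)
    return kept


# Made-up sample input: two requested variants and one combined variant
# that carries no TYPE annotation.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr6\t152098787\t.\tT\tA\t.\t.\tTYPE=example\n",
    "chr6\t152098787\t.\tTA\tAC\t.\t.\tDP=100\n",
    "chr6\t152098788\t.\tA\tC\t.\t.\tDP=100;TYPE=example\n",
]

filtered = keep_requested_variants(sample)
for line in filtered:
    print(line, end="")
```

With this sample, both header lines and the two single-base records are kept, and the combined TA/AC record is dropped because its INFO column has no TYPE key.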