Opened 3 years ago

Closed 3 years ago

#1324 closed task (fixed)

Implement support for indexing VCF files from the targeted genotyping

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Variant Search v1.2
Component: net.sf.basedb.varsearch
Keywords:
Cc:

Description

See Reggie ticket #1323. The VCF files should be more or less compatible with the existing ones from the regular variant calling. But there are some differences:

  • Each VariantCall may have multiple VCF files from targeted genotyping. They should all be indexed in the same index.
  • The AF (Allele frequency) and VD (Variant Depth) fields are not added by the HaplotypeCaller, but they can be calculated from the AD and DP fields.
  • The VCF files also contain results for homozygous genotypes. They should be indexed since it may be useful to query everything, but some functions and information need to be changed since not all entries are variants.
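
The AF/VD calculation mentioned in the second bullet can be sketched roughly as follows. This is only an illustration, not the code from the changesets: the class and method names are hypothetical, and it assumes that VD is the depth of the first ALT allele (the second value in AD) and that AF is VD divided by DP.

```java
// Hypothetical helper, not part of net.sf.basedb.varsearch.
final class AlleleStats {
    private AlleleStats() {}

    /**
     * Parses a VCF AD value such as "12,8" into per-allele read depths.
     * The REF depth comes first, followed by one depth per ALT allele.
     * Missing values (".") are not handled in this sketch.
     */
    static int[] parseAd(String ad) {
        String[] parts = ad.split(",");
        int[] depths = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            depths[i] = Integer.parseInt(parts[i].trim());
        }
        return depths;
    }

    /** VD: reads supporting the first ALT allele (assumption), or 0 if AD has no ALT entry. */
    static int variantDepth(int[] ad) {
        return ad.length > 1 ? ad[1] : 0;
    }

    /** AF: VD divided by DP (assumption); 0 when DP is not positive. */
    static double alleleFrequency(int[] ad, int dp) {
        return dp > 0 ? (double) variantDepth(ad) / dp : 0.0;
    }
}
```

For example, with AD=12,8 and DP=20 this sketch gives VD=8 and AF=0.4.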

Change History (12)

comment:1 by Nicklas Nordborg, 3 years ago

In 6371:

References #1324: Implement support for indexing VCF files from the targeted genotyping

Added an item list for handling targeted genotyping.

comment:2 by Nicklas Nordborg, 3 years ago

In 6372:

References #1324: Implement support for indexing VCF files from the targeted genotyping

Added an index for the targeted genotyping. It will not find any files to index yet.

comment:3 by Nicklas Nordborg, 3 years ago

In 6373:

References #1324: Implement support for indexing VCF files from the targeted genotyping

Added VcfFileLocator with two different implementations for finding the VCF files that should be indexed. This should work with the 3 different indexes (but support for more than one file has not yet been implemented).

comment:4 by Nicklas Nordborg, 3 years ago

In 6374:

References #1324: Implement support for indexing VCF files from the targeted genotyping

Added VcfFileLocator with two different implementations for finding the VCF files that should be indexed. This should work with the 3 different indexes (but support for more than one file has not yet been implemented).

comment:5 by Nicklas Nordborg, 3 years ago

In 6375:

References #1324: Implement support for indexing VCF files from the targeted genotyping

Indexing multiple VCF files should now work.

comment:6 by Nicklas Nordborg, 3 years ago

In 6376:

References #1324: Implement support for indexing VCF files from the targeted genotyping

AF and VD are calculated from the AD annotation if needed.

comment:7 by Nicklas Nordborg, 3 years ago

In 6377:

References #1324: Implement support for indexing VCF files from the targeted genotyping

Indexing is now aware of different genotypes. A './.' call is not indexed since it means that there is no data. '0/0' is only indexed if the 'indexAllGenotypes' flag is enabled. Some counting functions have been updated to consider only actual variants, and there are new functions that consider all genotypes.
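
A minimal sketch of these rules, assuming the genotype is available as the raw GT string from the VCF file. The class and method names are hypothetical; only the 'indexAllGenotypes' flag name is taken from the commit message. The sketch also accepts phased genotypes ('|'), which is an assumption.

```java
// Hypothetical helper, not part of net.sf.basedb.varsearch.
final class GenotypeRules {
    private GenotypeRules() {}

    /** Splits a non-empty GT value such as "0/1" or "1|1" into allele tokens. */
    private static String[] alleles(String gt) {
        return gt.split("[/|]");
    }

    /** True if every allele is missing, for example "./.". */
    static boolean isNoCall(String gt) {
        for (String allele : alleles(gt)) {
            if (!allele.equals(".")) {
                return false;
            }
        }
        return true;
    }

    /** True if at least one allele is a called non-reference allele. */
    static boolean isVariant(String gt) {
        for (String allele : alleles(gt)) {
            if (!allele.equals(".") && !allele.equals("0")) {
                return true;
            }
        }
        return false;
    }

    /** No-calls are never indexed; non-variant calls only if indexAllGenotypes is set. */
    static boolean shouldIndex(String gt, boolean indexAllGenotypes) {
        if (isNoCall(gt)) {
            return false;
        }
        return isVariant(gt) || indexAllGenotypes;
    }
}
```

With this sketch, '0/1' and '1/1' are always indexed, '0/0' only when the flag is enabled, and './.' never.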

comment:8 by Nicklas Nordborg, 3 years ago

In 6379:

References #1324: Implement support for indexing VCF files from the targeted genotyping

Minor changes in the hit details dialog since not all genotypes are variants.

comment:9 by Nicklas Nordborg, 3 years ago

In 6380:

References #1324: Implement support for indexing VCF files from the targeted genotyping

When querying the targeted genotypes index we should only return results with a variant unless the query explicitly asks for other genotypes.
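
The default behaviour could look roughly like the sketch below. It filters an in-memory list purely for illustration; the actual change presumably restricts the index query itself. The Hit class, the includeAllGenotypes parameter and the other names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of net.sf.basedb.varsearch.
final class TargetedGenotypeQuery {
    private TargetedGenotypeQuery() {}

    /** Minimal stand-in for a search hit; only the fields needed here. */
    static final class Hit {
        final String sample;
        final String genotype;

        Hit(String sample, String genotype) {
            this.sample = sample;
            this.genotype = genotype;
        }
    }

    /** True if at least one allele in the GT string is a called non-reference allele. */
    static boolean isVariant(String gt) {
        for (String allele : gt.split("[/|]")) {
            if (!allele.equals(".") && !allele.equals("0")) {
                return true;
            }
        }
        return false;
    }

    /** Returns only variant hits unless the caller explicitly asks for all genotypes. */
    static List<Hit> filter(List<Hit> hits, boolean includeAllGenotypes) {
        List<Hit> result = new ArrayList<>();
        for (Hit hit : hits) {
            if (includeAllGenotypes || isVariant(hit.genotype)) {
                result.add(hit);
            }
        }
        return result;
    }
}
```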

comment:10 by Nicklas Nordborg, 3 years ago

In 6381:

References #1324: Implement support for indexing VCF files from the targeted genotyping

Ignore 'p.?' when indexing and displaying hits, since it is not very interesting.

comment:11 by Nicklas Nordborg, 3 years ago

In 6395:

References #1324: Implement support for indexing VCF files from the targeted genotyping

If the genotype is './.' it is not a variant.

comment:12 by Nicklas Nordborg, 3 years ago

Resolution: fixed
Status: new → closed