Opened 3 years ago

Last modified 3 years ago

#1290 closed task

Implement a variant search extension — at Version 1

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Variant Search v1.0
Component: net.sf.basedb.varsearch
Keywords:
Cc:

Description (last modified by Nicklas Nordborg)

The variant calling pipeline in Reggie (see #1199) produces VCF files with a large number of variants. In theory it is "easy" to search for variants with grep or SnpSift, but in practice a search takes several hours because of the large number of files in the system. Some kind of indexing is needed before searching becomes realistic.

I have run some initial tests with Apache Lucene (https://lucene.apache.org/), indexing a few of the annotations in the annotated and filtered VCF files. The results are very promising: searching for variants in a gene or at a specific location typically takes only a few milliseconds. The results can be returned as a list of raw bioassay IDs, which means that it should be possible to include this functionality in the regular table listing.
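
To illustrate why an index makes these lookups fast, here is a minimal sketch of the idea using a plain in-memory inverted index in Python. It is not Lucene and not the extension's actual code. The `VariantIndex` class, its method names, and the `GENE=` key in the INFO column are hypothetical, chosen only for illustration; the real annotated VCF files may store gene names differently. Each search term (a gene name or a chromosome/position pair) maps directly to the set of raw bioassay IDs whose VCF file contains a matching variant, so a search is a dictionary lookup instead of a scan over every file.

```python
from collections import defaultdict


class VariantIndex:
    """Toy inverted index: search term -> set of raw bioassay IDs.

    Hypothetical illustration only; a real index would be persisted
    (for example by Lucene) instead of being held in memory.
    """

    def __init__(self):
        self._by_gene = defaultdict(set)
        self._by_position = defaultdict(set)

    def add_vcf(self, raw_bioassay_id, vcf_lines):
        """Index the variants of one VCF file under the given raw bioassay ID."""
        for line in vcf_lines:
            # Skip blank lines and VCF header lines.
            if not line.strip() or line.startswith("#"):
                continue
            # Standard VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
            fields = line.rstrip("\n").split("\t")
            chrom, pos, info = fields[0], int(fields[1]), fields[7]
            self._by_position[(chrom, pos)].add(raw_bioassay_id)
            for entry in info.split(";"):
                key, _, value = entry.partition("=")
                # ASSUMPTION: the gene name is stored as GENE=<name> in INFO.
                if key == "GENE" and value:
                    self._by_gene[value.upper()].add(raw_bioassay_id)

    def search_gene(self, gene):
        """Return the sorted raw bioassay IDs with a variant in the gene."""
        return sorted(self._by_gene.get(gene.upper(), ()))

    def search_position(self, chrom, pos):
        """Return the sorted raw bioassay IDs with a variant at chrom:pos."""
        return sorted(self._by_position.get((chrom, pos), ()))


# Made-up example data: two raw bioassays with one and two variants.
index = VariantIndex()
index.add_vcf(101, [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr17\t7674220\t.\tC\tT\t50\tPASS\tGENE=TP53",
])
index.add_vcf(102, [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr17\t7674220\t.\tC\tT\t61\tPASS\tGENE=TP53",
    "chr13\t32340300\t.\tG\tA\t47\tPASS\tGENE=BRCA2",
])

print(index.search_gene("tp53"))                # [101, 102]
print(index.search_position("chr13", 32340300))  # [102]
```

Because each search returns a plain list of raw bioassay IDs, the result could in principle be used as a filter on an existing table listing, which is the integration suggested above.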

This functionality could of course be integrated into Reggie, but since we also copy the VCF files to Relax, it would be nice if we could implement this as a separate extension that works in both the Reggie and Relax environments.

Change History (1)

comment:1 by Nicklas Nordborg, 3 years ago

Component: not classified → net.sf.basedb.varsearch
Description: modified (diff)
Milestone: Variant Search v1.0
Owner: changed from Jari Häkkinen to Nicklas Nordborg
Status: new → accepted