Changes between Initial Version and Version 1 of Ticket #1290


Timestamp:
Jan 22, 2021, 1:59:06 PM (10 months ago)
Author:
Nicklas Nordborg
  • Ticket #1290

    • Property Status changed from new to accepted
    • Property Owner changed from Jari Häkkinen to Nicklas Nordborg
    • Property Component changed from not classified to net.sf.basedb.varsearch
    • Property Milestone set to Variant Search v1.0
  • Ticket #1290 – Description

    initial  v1
    1             The variant calling pipeline in Reggie (see #1199) produces VCF files with lots of variants. In theory it is "easy" to search for variants with `grep` or `SnpSif`, but this takes a really long time (several hours) due to the large number of files in the system. Some kind of indexing is needed before searching is realistic.
             1    The variant calling pipeline in Reggie (see #1199) produces VCF files with lots of variants. In theory it is "easy" to search for variants with `grep` or `SnpSift`, but this takes a really long time (several hours) due to the large number of files in the system. Some kind of indexing is needed before searching is realistic.
    2        2
    3        3    I have made some initial tests with Apache Lucene (https://lucene.apache.org/) and indexed a few of the annotations in the annotated and filtered VCF files. The results are very promising. Searching for variants in a gene or at a specific location typically takes only a few milliseconds. The results can be returned as a list of raw bioassay IDs, which means that it should be possible to include this functionality in the regular table listing.
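
The indexing idea in the description can be illustrated with a small sketch. This is not Lucene and not the code of the Variant Search extension; it is a plain in-memory inverted index in Python that only shows why a lookup by gene or by location is fast once the files have been scanned a single time. The `GENE=` key in the INFO column, the `VariantIndex` class, the sample VCF rows, and the ids 101 and 102 are all assumptions made up for illustration.

```python
from collections import defaultdict


class VariantIndex:
    """Toy inverted index: search term -> set of raw bioassay ids."""

    def __init__(self):
        self.by_gene = defaultdict(set)      # gene name -> ids
        self.by_location = defaultdict(set)  # (chrom, pos) -> ids

    def add_vcf(self, bioassay_id, vcf_lines):
        """Index the data lines of one VCF file under the given id."""
        for line in vcf_lines:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and VCF header lines
            fields = line.rstrip("\n").split("\t")
            # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
            chrom, pos, info = fields[0], int(fields[1]), fields[7]
            self.by_location[(chrom, pos)].add(bioassay_id)
            for entry in info.split(";"):
                key, _, value = entry.partition("=")
                if key == "GENE" and value:  # hypothetical annotation key
                    self.by_gene[value].add(bioassay_id)

    def search_gene(self, gene):
        """Return the sorted ids of all files with a variant in the gene."""
        return sorted(self.by_gene.get(gene, ()))

    def search_location(self, chrom, pos):
        """Return the sorted ids of all files with a variant at chrom:pos."""
        return sorted(self.by_location.get((chrom, pos), ()))


# Invented sample data: two files, identified by made-up raw bioassay ids.
index = VariantIndex()
index.add_vcf(101, [
    "##fileformat=VCFv4.2",
    "chr17\t7674220\t.\tC\tT\t50\tPASS\tGENE=TP53",
])
index.add_vcf(102, [
    "chr17\t7674220\t.\tC\tT\t60\tPASS\tGENE=TP53",
    "chr13\t32340300\t.\tG\tA\t45\tPASS\tGENE=BRCA2",
])

print(index.search_gene("TP53"))                  # [101, 102]
print(index.search_location("chr13", 32340300))   # [102]
```

Each search is a dictionary lookup rather than a scan over every file, which is the same trade-off the ticket describes: pay the indexing cost once, then answer queries with a list of ids that a table listing could filter on. A real index would also need persistence and incremental updates, which this sketch leaves out.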