How to use the Variant Search extension
Introduction
The current search functionality is integrated directly in BASE. Go to the raw bioassays list page and add the Variant (filtered) column to the table using the Columns... dialog, or right-click on the title row and enable it from the popup menu.
By default, the column displays Yes or No depending on whether the VCF file has been indexed. If Yes, the number of variants is also displayed.
Searching
The variant database uses Apache Lucene instead of PostgreSQL. Thus, the search field for this column behaves differently from other search fields in BASE.
It is possible to search for raw bioassays based on whether they have been indexed and on the number of variants:
%
: Find all indexed raw bioassays
<>% or =
: Find all raw bioassays that are not indexed
=0
: Find indexed raw bioassays with 0 variants
>=500
: Find indexed raw bioassays with at least 500 variants
Note! This syntax is similar to the regular BASE syntax, but it is, for example, not possible to use a list of values: =1|2 (will not work)
Searching for variants
A typical search in a Lucene database consists of a field name and query string separated by a colon (:). Here are a few examples:
gene:TP53
: Find variants in the TP53 gene
chr17:7675189
: Find variants at the specified genomic location
c:423C>G
: Find variants at the given transcript position
p:C141W
: Find variants causing the specified protein change
cosmic:COSV52706449
: Find variants for the COSMIC ID
rsid:rs1057519977
: Find variants for the dbSNP ID
gt:1/1
: Find variants that are homozygous
dp:>=100
: Find variants that have been sequenced to a depth of 100 or more
vd:>=100
: Find variants where at least 100 alternate alleles have been found
af:>0.8
: Find variants where the alternate allele frequency is larger than 0.8
effect:synonymous
: Find variants where the predicted effect is synonymous
Tip! The prefix can in many cases be skipped since it is almost always possible to automatically make an educated guess about the field with some pattern matching on the query string. The gene: field is the default and will be used if a field can't be automatically guessed. Here are the same examples as above:
TP53
: Since gene: is the default field
423C>G
: Numbers followed by [ACGT] and >
C141W
: Single letters with numbers in between
COSV52706449
: 'COSV' followed by numbers
rs1057519977
: 'rs' followed by numbers
1/1
: 0 or 1 with forward slash in between
Note! If a query does not return the expected result, always try again with explicit field prefixes. The following fields are never auto-detected:
dp
vd
af
effect
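The auto-detection described above can be pictured as simple pattern matching on the query string. The following Python sketch is only an illustration: the regular expressions and the names `PATTERNS`, `guess_field` and `add_prefix` are assumptions derived from the examples on this page, not the extension's actual implementation.

```python
import re

# Hypothetical patterns, written from the examples on this page.
# The real detection rules in the extension may differ.
PATTERNS = [
    ("cosmic", re.compile(r"COSV\d+")),         # 'COSV' followed by numbers
    ("rsid", re.compile(r"rs\d+")),             # 'rs' followed by numbers
    ("gt", re.compile(r"[01]/[01]")),           # 0 or 1 with a slash in between
    ("c", re.compile(r"\d+[ACGT]>[ACGT]")),     # numbers, a base, '>' and a base
    ("p", re.compile(r"[A-Z]\d+[A-Z]")),        # single letters around numbers
]


def guess_field(term: str) -> str:
    """Return the field a bare query term would most likely belong to."""
    for field, pattern in PATTERNS:
        if pattern.fullmatch(term):
            return field
    # gene: is the default when nothing else matches.
    return "gene"


def add_prefix(term: str) -> str:
    """Add a guessed field prefix unless the term already has one."""
    if ":" in term:
        return term
    return f"{guess_field(term)}:{term}"


for example in ["TP53", "423C>G", "C141W", "COSV52706449", "rs1057519977", "1/1"]:
    print(example, "->", add_prefix(example))
```

Under these assumed patterns a gene name such as A2M would be read as a protein change, which is one reason to fall back to explicit prefixes when a result looks wrong.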
It is possible to specify multiple conditions with AND and OR.
TP53 AND 423C>G
TP53 AND (423C>G OR 742C>T)
The AND and OR keywords are case-sensitive, and OR is implicit if no operator is specified. Thus, the query string TP53 and 423C>G will not return the expected results since it is interpreted as: gene:TP53 OR gene:and OR c:423C>G.
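To see why a lowercase operator changes the meaning of a query, the sketch below expands a query string the way the text describes: only uppercase AND and OR count as operators, every other word is treated as a search term, and OR is inserted between adjacent terms. The functions `guess_field` and `expand` are hypothetical helpers for illustration and use a deliberately reduced field guess; this is not how Lucene or the extension parses queries internally.

```python
import re


def guess_field(term: str) -> str:
    # Reduced, assumed field guess: transcript changes go to c:,
    # everything else falls back to the default gene: field.
    if re.fullmatch(r"\d+[ACGT]>[ACGT]", term):
        return "c"
    return "gene"


def expand(query: str) -> str:
    """Show the explicit form of a query with implicit OR made visible."""
    tokens = re.findall(r"[()]|[^\s()]+", query)
    out = []
    prev_is_operand = False  # True after a term or a closing parenthesis
    for tok in tokens:
        if tok in ("AND", "OR"):  # case-sensitive: 'and' is NOT an operator
            out.append(tok)
            prev_is_operand = False
        elif tok == ")":
            out.append(tok)
            prev_is_operand = True
        else:
            if prev_is_operand:
                out.append("OR")  # no operator given, so OR is implied
            if tok == "(":
                out.append(tok)
                prev_is_operand = False
            else:
                out.append(tok if ":" in tok else f"{guess_field(tok)}:{tok}")
                prev_is_operand = True
    return " ".join(out)


print(expand("TP53 AND 423C>G"))
print(expand("TP53 and 423C>G"))
```

The first call keeps the AND operator, while the second turns the word "and" into a gene search term joined with OR.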
For more information about Lucene query syntax see https://lucene.apache.org/core/8_8_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package.description
Indexed fields
The variant database currently contains the following indexed fields:
gene:
: Gene name from the ncbiRefSeq annotation and the SnpEff Gene_Name column. Case-insensitive.
type:
: Type of variant. Can be one of SNV, Insertion, Deletion or Complex.
cosmic:
: ID of the variant in the COSMIC database.
rsid:
: ID of the variant in the dbSNP database.
c:
: Transcript change. Taken from the cosmic_CDS annotation and the SnpEff HGVS.c column.
p:
: Expected amino-acid change in the protein. Taken from the cosmic_AA annotation and the SnpEff HGVS.p column. Note! COSMIC uses single-letter notation for amino acids, while SnpEff uses three-letter notation. To make things simple, all values have been converted to single-letter notation in the index. See https://www.hgvs.org/mutnomen/codon.html for a table with translations.
ref:
: Reference allele.
alt:
: Alternate allele.
chrom:
: Name of the chromosome (e.g. chr1, chr2, etc.).
pos:
: Location within the chromosome. The location is also indexed using the chromosome name as a field name. Thus, the query string chrom:chr17 AND pos:7675189 can be written as chr17:7675189 (recommended).
effect:
: Expected effect of the variant as determined by SnpEff. There is a table with a list of possible values. The _variant suffix is ignored for those values that have it.
gt:
: Genotype of the variant. Can be one of 0/1, 1/0 or 1/1.
dp:
: Depth at the variant location.
vd:
: Depth of the alternate allele at the variant location.
af:
: Allele frequency of the alternate allele.
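Since the p: field only contains single-letter amino-acid notation, a protein change copied from a SnpEff annotation has to be converted before it can be used in a search. The Python sketch below shows one way to do that conversion by hand. The function name `to_single_letter` is hypothetical, the table covers only the 20 standard amino acids, and dropping the leading "p." is an assumption based on the p:C141W example; the extension's own conversion may handle more cases.

```python
import re

# Three-letter to single-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_single_letter(change: str) -> str:
    """Convert e.g. 'p.Cys141Trp' to 'C141W'; other text is left as-is."""
    # Assumption: the index stores values without the HGVS 'p.' prefix.
    value = change[2:] if change.startswith("p.") else change
    # Replace every known three-letter code, keep unknown ones untouched.
    return re.sub(
        r"[A-Z][a-z]{2}",
        lambda m: THREE_TO_ONE.get(m.group(0), m.group(0)),
        value,
    )


print("p:" + to_single_letter("p.Cys141Trp"))
```

Values that are already in single-letter notation, such as C141W, pass through unchanged.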