Opened 5 years ago
Last modified 5 years ago
#1225 closed enhancement
Update databases used in variant calling pipeline — at Version 5
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | blocker | Milestone: | Reggie v4.25 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description
The aim is to update all databases that have been updated since the original variant calling pipeline was set up. Most of the work is done outside of Reggie. Some changes and information can be found here: http://onk-wiki.bmc.lu.se/trac/scanbprim/browser/scanbprim/support-files/variant-calling
dbSNP updated to version 153
It contains a lot more variants than before. Some of the fields have been removed. We no longer annotate with CDA, G5 or G5A. See also #1222.
- http://www.ensembl.info/2019/08/29/coming-soon-to-an-ensembl-near-you-dbsnp-2-0/
- https://www.ncbi.nlm.nih.gov/variation/docs/snp2_human_variation_vcf/
COSMIC updated to version 90
They have made major changes to ID assignment and how samples are reported. This affected the custom scripts for calculating mutation frequencies. This was solved by matching ID+GENE from the VCF to ID+GENE in the sample mutation table. The end result should be compatible with older versions of COSMIC.
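The ID+GENE matching described above can be sketched roughly as follows. This is only an illustration, not the actual custom script: the field names (`id`, `gene`, `sample`), the function names and the example values are all hypothetical, and the real COSMIC files have a different layout that would need to be parsed first.

```python
def count_samples(sample_rows):
    """Count distinct samples per (mutation ID, gene) key.

    sample_rows: iterable of dicts with the hypothetical keys
    'id', 'gene' and 'sample' taken from the sample mutation table.
    """
    samples_per_key = {}
    for row in sample_rows:
        key = (row["id"], row["gene"])
        samples_per_key.setdefault(key, set()).add(row["sample"])
    return {key: len(samples) for key, samples in samples_per_key.items()}


def annotate(vcf_records, counts, total_samples):
    """Look up each VCF record by its (ID, GENE) key and attach a frequency."""
    annotated = []
    for rec in vcf_records:
        n = counts.get((rec["id"], rec["gene"]), 0)
        annotated.append({**rec, "count": n, "freq": n / total_samples})
    return annotated


# Made-up example data (not real COSMIC identifiers)
sample_rows = [
    {"id": "M1", "gene": "GENE_A", "sample": "S1"},
    {"id": "M1", "gene": "GENE_A", "sample": "S2"},
    {"id": "M1", "gene": "GENE_B", "sample": "S3"},
    {"id": "M2", "gene": "GENE_C", "sample": "S1"},
]
vcf_records = [
    {"id": "M1", "gene": "GENE_A"},
    {"id": "M3", "gene": "GENE_D"},
]
counts = count_samples(sample_rows)
annotated = annotate(vcf_records, counts, total_samples=4)
```

Using the pair as the lookup key means that the same ID reported for two different genes is counted separately, which is presumably why the ID alone is not enough after the change in ID assignment.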
gnomAD updated to version 2.1.1
The major change is a large increase in file size because many more annotations have been added to the VCF files. Most annotations are related to variant frequencies in different populations. The big files are impractical, so we create smaller files by removing all annotations that we don't need. The annotations we keep are:
- Exomes: AF, popmax, AF_popmax, AF_female, AF_nfe
- Genomes: AF, AF_female, AF_male, AF_nfe
- https://gnomad.broadinstitute.org/about
- https://macarthurlab.org/2018/10/17/gnomad-v2-1/
- https://gnomad.broadinstitute.org/faq
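As a rough illustration of the gnomAD annotation trimming described above, the following sketch keeps only a whitelist of INFO keys on each VCF data line. It is an assumption about how such a filter could look, not the script actually used; the function names are hypothetical, and for simplicity the `##INFO` header lines of removed annotations are passed through unchanged.

```python
# INFO keys kept for the exomes file, as listed in the ticket
KEEP_EXOMES = ("AF", "popmax", "AF_popmax", "AF_female", "AF_nfe")


def trim_info(info, keep):
    """Return an INFO string that contains only the keys in 'keep'."""
    kept = []
    for entry in info.split(";"):
        key = entry.split("=", 1)[0]  # flag entries have no '=' and are their own key
        if key in keep:
            kept.append(entry)
    # The VCF format uses '.' for an empty INFO column
    return ";".join(kept) if kept else "."


def trim_vcf_line(line, keep):
    """Trim the INFO column (8th column) of a data line; keep header lines as-is."""
    if line.startswith("#"):
        return line
    cols = line.rstrip("\n").split("\t")
    cols[7] = trim_info(cols[7], keep)
    return "\t".join(cols) + "\n"
```

The function could be applied line by line while streaming a decompressed VCF, for example with `gzip.open(path, "rt")` from the Python standard library. The genomes file would use its own whitelist (AF, AF_female, AF_male, AF_nfe).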
Swegen updated to version 20180409
We decided to use the newer hg38 version (swegen_frequencies_fixploidy_GRCh38_20190204.vcf.gz) instead of the hg19 version (swegen_frequencies_hg19_20180409.tar).
SCAN-B tumor and normal samples databases
These two databases need to be updated after the release and installation of Reggie 4.25. See #1208 and Reggie 4.25 update notes for more information.
All other databases
The other databases have not been updated:
- Danish genome
- Low complexity regions
- RNA edit databases
- UCSC RefGene and Repeats
Change History (5)
comment:1 by , 5 years ago
Description: | modified (diff) |
---|
comment:2 by , 5 years ago
Description: | modified (diff) |
---|
comment:3 by , 5 years ago
Description: | modified (diff) |
---|
comment:4 by , 5 years ago
comment:5 by , 5 years ago
Description: | modified (diff) |
---|
In 5816: