Opened 7 years ago
Closed 7 years ago
#1001 closed enhancement (fixed)
Add HaplotypeCaller to Hisat align step
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | Reggie v4.13 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description (last modified by )
In the Hisat alignment pipeline we should also run HaplotypeCaller to check genotypes of 51 known SNPs.
The script below has been used during the testing phase. We need to re-arrange so that some things go as options into reggie-config.xml.
We also need to check that error handling is working as expected.
snp.list
and snp.vcf
are two files containing information about the 51 SNPs we are interested in. We should investigate if we can do without the snp.list
.
[UPDATE] No, we need it, since otherwise the HaplotypeCaller will check the entire BAM file which will take about one 1 hour instead of ~10 seconds.
[UPDATE 2] It seems like the -L
parameter also accepts a VCF-file. Thus we can (re-)use the snp.vcf
and there is no need for the snp.list
file.
JAVA=/usr/bin/java JAR=GenomeAnalysisTK.jar REF=hg38.analysisSet.fa DBSNP=snp.vcf INTERVALS=snp.list INPUT_BAM=$1 OUTPUT_VCF=$2 OPTIONS="-stand_call_conf 20 --filter_reads_with_N_cigar --genotyping_mode GENOTYPE_GIVEN_ALLELES --annotation AlleleBalance" ${JAVA} -jar ${JAR} -T HaplotypeCaller -R ${REF} -L ${INTERVALS} --dbsnp ${DBSNP} --alleles ${DBSNP} ${OPTIONS} -I ${INPUT_BAM} -o ${OUTPUT_VCF}
The VCF file that is produced should either be uploaded to BASE or we should extract enough information from it to be able to do pair-wise comparisons later on (but this can also be done the first time a sample is included to the pair-wise all-against-all comparison).
Some information should be extracted immediately and added as annotations on the AlignedSequences item:
- GT_COUNT: Number of genotypes that we could get from the aligned data. If this value is too low we will not be able to pair-wise comparisons. Typically, we will get 45 or more from most data files. The main reason for getting a lower count is that there are not enough aligned pairs (excluding duplicates).
- GT_HET_COUNT: Numer of genotypes that are heterozygous. An unexptected high count (>35) may indicate a contamination.
Change History (10)
comment:1 by , 7 years ago
Description: | modified (diff) |
---|
comment:2 by , 7 years ago
Description: | modified (diff) |
---|
comment:3 by , 7 years ago
Description: | modified (diff) |
---|
comment:4 by , 7 years ago
comment:5 by , 7 years ago
(In [4619]) References #1001: Add HaplotypeCaller to Hisat align step
Added VCF MIME type and file subtype. The VCF file is also copied to the BASE files server after the job has finished.
comment:6 by , 7 years ago
(In [4620]) References #1001: Add HaplotypeCaller to Hisat align step
Added QC_GenoTypeCount
and QC_GenoTypeHET
annotation types.
The "Confirm Hisat alignment" wizard has been updated to display the genotype QC values. No warnings are issued due to low count or high HET values.
Added parser implementation for extracting information from VCF files. A popup dialog can be opened from the "Confirm Hisat alignment" wizard to display some data from the VCF.
comment:7 by , 7 years ago
(In [4621]) References #1001: Add HaplotypeCaller to Hisat align step
Add genotype QC information to the case summary and a link to the VCF statistics dialog.
comment:8 by , 7 years ago
(In [4622]) References #1001: Add HaplotypeCaller to Hisat align step
VCF statistics is now imported after the Hisat alignment job has ended.
comment:9 by , 7 years ago
Status: | new → assigned |
---|
comment:10 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
(In [4618]) References #1001: Add HaplotypeCaller to Hisat align step
The VCF file should now be created. No data is parsed out of it yet.