Opened 7 years ago
Closed 7 years ago
#1009 closed task (fixed)
Genotype quality control wizard
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | Reggie v4.15 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description
Since #1001 we are creating a VCF file with genotypes for 51 SNPs. A wizard should be implemented that uses this information for quality control. There are basically two things that we can check by comparing two VCF files:
- If the samples are from different patients the genotypes should be different.
- If the samples are from the same patient the genotypes should be similar.
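As a rough illustration of the two checks above, the sketch below counts how many genotypes differ between two samples. The class and method names and the map representation (SNP id to a genotype string such as "0/1") are assumptions made for this example and are not taken from Reggie; the comparison rules themselves were not settled when the ticket was written.

```java
import java.util.Map;

// Hypothetical helper, not part of Reggie: counts genotype mismatches between two samples.
final class GenotypeComparison {

    /** Number of SNPs called in both samples and how many of those calls differ. */
    static final class Result {
        final int compared;
        final int mismatches;

        Result(int compared, int mismatches) {
            this.compared = compared;
            this.mismatches = mismatches;
        }
    }

    /**
     * Compares genotype calls (SNP id -> genotype string, e.g. "0/1") of two samples.
     * SNPs without a call in both samples are skipped.
     */
    static Result countMismatches(Map<String, String> a, Map<String, String> b) {
        int compared = 0;
        int mismatches = 0;
        for (Map.Entry<String, String> entry : a.entrySet()) {
            String other = b.get(entry.getKey());
            if (other == null) {
                continue; // no call for this SNP in the other sample
            }
            compared++;
            if (!other.equals(entry.getValue())) {
                mismatches++;
            }
        }
        return new Result(compared, mismatches);
    }
}
```

Samples from the same patient would be expected to give a low mismatch count, and samples from different patients a high one.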
The wizard is intended to be run manually at regular intervals, typically once for each library plate. When run, it should compare the VCF files from the selected samples with each other AND with the VCF files for all other samples that have already been checked. The wizard should NOT compare against unselected and unchecked samples.
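The selection rule can be sketched as follows: every selected sample is paired with every other selected sample and with every already-checked sample, while unselected and unchecked samples never appear. `ComparisonPlanner`, `Pair` and `plan` are hypothetical names, and the sketch assumes the two input lists do not overlap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Reggie: lists the comparisons a wizard run would make.
final class ComparisonPlanner {

    /** Two sample names that should be compared with each other. */
    static final class Pair {
        final String first;
        final String second;

        Pair(String first, String second) {
            this.first = first;
            this.second = second;
        }
    }

    /**
     * Builds the comparison list for one run.
     * Assumes 'selected' and 'checked' contain no sample in common.
     */
    static List<Pair> plan(List<String> selected, List<String> checked) {
        List<Pair> pairs = new ArrayList<>();
        for (int i = 0; i < selected.size(); i++) {
            String sample = selected.get(i);
            // Selected samples are compared with each other, each pair once...
            for (int j = i + 1; j < selected.size(); j++) {
                pairs.add(new Pair(sample, selected.get(j)));
            }
            // ...and with every sample that has already been checked.
            for (String other : checked) {
                pairs.add(new Pair(sample, other));
            }
        }
        return pairs;
    }
}
```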
Since we are starting out with lots of existing and unchecked samples, the wizard should sort them in library plate order to make it easy to go through all of them in a controlled way.
The exact details of how to compare the VCF files, and the parameters for situations that should generate warnings, are not yet settled. We also need to think about how to store the warnings, since most will be of a nature that can't be solved immediately (for example, a sample may need to be re-processed).
Change History (12)
comment:1 by , 7 years ago
comment:2 by , 7 years ago
Status: | new → assigned |
---|
comment:3 by , 7 years ago
(In [4646]) References #1009: Genotype quality control wizard
Started with the second step of the QC wizard. It will load genotypes of the selected alignments and compare them to all existing alignments annotated with QC_GENOTYPE_STATUS=Checked (and to each other).
Some warnings are issued (currently based on hard-coded rules and limits) and displayed for the user.
The user can currently only select whether the alignments should be annotated with QC_GENOTYPE_STATUS=Checked or QC_GENOTYPE_STATUS=Disabled. There is a 'Flag' and 'Comment' option in the wizard, but they currently don't do anything. The only logic implemented so far is that alignments with a low number of genotypes are set to "Disabled". Warnings will be lost.
comment:4 by , 7 years ago
comment:5 by , 7 years ago
(In [4648]) References #1009: Genotype quality control wizard
Added QC_GenoTypeComment annotation for storing comments related to the genotype checking.
The wizard will now load and display more information about the alignments: LibPlate, ALIGNED_PAIRS.
Warning messages have more context about the other alignment. The "View genotypes" dialog has been modified with support for viewing two alignments at the same time. Warning messages that are related to a comparison are linked to this dialog.
Introduced a "MEDIUM MISMATCH" warning level for alignments from the same patient that have between 5 and 15 mismatches and where most of the mismatches have one end with GQ under 50. This seems to capture false warnings that are due to quality problems or low number of reads.
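A minimal sketch of how such a rule could be expressed is shown below. It is not the code from [4648]: the class and method names are made up, and it assumes that "between 5 and 15" is inclusive and that "most" means more than half of the mismatches.

```java
import java.util.List;

// Hypothetical sketch of the MEDIUM MISMATCH rule described above; not Reggie's code.
final class MediumMismatchRule {

    static final int MIN_MISMATCHES = 5;  // assumed inclusive
    static final int MAX_MISMATCHES = 15; // assumed inclusive
    static final int LOW_GQ_LIMIT = 50;

    /** One mismatching SNP with the GQ of the genotype call in each alignment. */
    static final class Mismatch {
        final int gqA;
        final int gqB;

        Mismatch(int gqA, int gqB) {
            this.gqA = gqA;
            this.gqB = gqB;
        }

        /** True if at least one of the two calls has GQ under the limit. */
        boolean hasLowGqEnd() {
            return Math.min(gqA, gqB) < LOW_GQ_LIMIT;
        }
    }

    static boolean isMediumMismatch(boolean samePatient, List<Mismatch> mismatches) {
        if (!samePatient) {
            return false; // the rule only applies to alignments from the same patient
        }
        int total = mismatches.size();
        if (total < MIN_MISMATCHES || total > MAX_MISMATCHES) {
            return false;
        }
        long lowGq = mismatches.stream().filter(Mismatch::hasLowGqEnd).count();
        return lowGq * 2 > total; // "most" is read as more than half (assumption)
    }
}
```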
comment:6 by , 7 years ago
comment:7 by , 7 years ago
comment:8 by , 7 years ago
Milestone: | Reggie v4.14 → Reggie v4.15 |
---|
comment:9 by , 7 years ago
(In [4671]) References #1009: Genotype quality control wizard
Several changes to this wizard. The major change is that limits for mismatches are now based on percentages instead of absolute numbers.
High HET values are handled a bit differently now that we believe this is due to contamination. Since a high HET percentage should trigger a re-run of the same sample, the current alignment should NOT be disabled. We want to compare it with the re-run some time later. Two alignments with a high HET are, however, only compared to each other if they belong to the same patient.
The wizard will also show a bit more information about the mismatches. Mismatches where the genotypes have a low GQ value are much more common, and unless there are also mismatches with high GQ they may be ignored.
comment:10 by , 7 years ago
comment:11 by , 7 years ago
(In [4681]) References #1009: Genotype quality control wizard
Removed the "experimental" status of the wizard.
Some minor changes to limits when comparing genotypes. HIGH MISMATCH would now mostly be used for swapped or contaminated samples. Mismatches in the "fuzzy" area are tagged with MEDIUM MISMATCH.
comment:12 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
(In [4642]) References #1009: Genotype quality control wizard
Started with the "Genotype quality control" wizard. It has been added to the index page under the "Hisat" section.
The annotation type QC_GenotypeStatus was added to keep track of alignments that have already been checked (or disabled). The first step of the wizard will display alignments waiting to be checked (they have no QC_GenotypeStatus annotation). The alignments are sorted by library plate and at most 250 are listed at a time. The VCF statistics have also been moved from the HisatServlet to the GenotypeServlet.
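The first-step listing described in [4642] could look roughly like the sketch below: keep alignments that have no QC_GenotypeStatus value, sort them by library plate, and return at most 250. The `Alignment` class and its fields are stand-ins invented for this example (Reggie reads the values from BASE items and annotations), and plain string ordering of plate names is an assumption.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the first wizard step; not Reggie's code.
final class UncheckedAlignments {

    static final int MAX_ITEMS = 250;

    /** Simplified stand-in for an alignment item and two of its properties. */
    static final class Alignment {
        final String name;
        final String libPlate;
        final String qcGenotypeStatus; // null means "not yet checked"

        Alignment(String name, String libPlate, String qcGenotypeStatus) {
            this.name = name;
            this.libPlate = libPlate;
            this.qcGenotypeStatus = qcGenotypeStatus;
        }
    }

    /** Returns the next batch of alignments waiting to be checked, in library plate order. */
    static List<Alignment> nextBatch(List<Alignment> all) {
        return all.stream()
            .filter(a -> a.qcGenotypeStatus == null)
            .sorted(Comparator.comparing((Alignment a) -> a.libPlate)
                .thenComparing((Alignment a) -> a.name))
            .limit(MAX_ITEMS)
            .collect(Collectors.toList());
    }
}
```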