Opened 11 months ago

Closed 8 months ago

#1009 closed task (fixed)

Genotype quality control wizard

Reported by: nicklas Owned by: nicklas
Priority: major Milestone: Reggie v4.15
Component: net.sf.basedb.reggie Keywords:
Cc:

Description

Since #1001 we are creating a VCF file with genotypes for 51 SNPs. A wizard should be implemented that uses this information for quality control. There are basically two things that we can check by comparing two VCF files:

  • If the samples are from different patients the genotypes should be different.
  • If the samples are from the same patient the genotypes should be similar.
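
The two checks above can be sketched roughly as follows. This is only an illustration, not the wizard's implementation: the function names, the dictionary representation (SNP id mapped to a normalized genotype string such as "0/1") and the two limits are all assumptions, since the ticket states that the actual rules were not yet settled.

```python
def count_mismatches(a, b):
    """Count SNPs genotyped in both samples whose genotypes differ.

    a and b map SNP id -> genotype string (hypothetical representation).
    Returns (number of mismatches, number of shared SNPs).
    """
    shared = [snp for snp in a if snp in b]
    mismatches = sum(1 for snp in shared if a[snp] != b[snp])
    return mismatches, len(shared)


def check_pair(a, b, same_patient, max_same=0.10, min_diff=0.30):
    """Return a warning text, or None if the pair looks as expected.

    max_same and min_diff are made-up limits used for illustration only.
    """
    mismatches, shared = count_mismatches(a, b)
    if shared == 0:
        return None  # nothing to compare
    fraction = mismatches / shared
    if same_patient and fraction > max_same:
        return "same patient, but genotypes differ"
    if not same_patient and fraction < min_diff:
        return "different patients, but genotypes are similar"
    return None
```

Only SNPs that are genotyped in both files are counted, so a sample with few calls does not automatically look like a mismatch.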

The wizard is intended to be run manually at regular intervals, typically once for each library plate. When run, it should compare the VCF files from the selected samples with each other AND with the VCF files for all other samples that have already been checked. The wizard should NOT compare against unselected and unchecked samples.
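
A minimal sketch of which pairs this rule produces, assuming samples are identified by simple names. The function name pairs_to_compare is hypothetical.

```python
from itertools import combinations


def pairs_to_compare(selected, checked):
    """List the sample pairs that one wizard run should compare.

    selected: samples picked in the current run
    checked:  samples already checked in earlier runs
    Unselected, unchecked samples are never passed in, so they are never compared.
    """
    pairs = list(combinations(selected, 2))                # selected vs. each other
    pairs += [(s, c) for s in selected for c in checked]   # selected vs. checked
    return pairs
```

For example, pairs_to_compare(["A", "B"], ["C"]) gives the pairs (A, B), (A, C) and (B, C). Two already checked samples are not compared to each other again.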

Since we are starting out with lots of existing and unchecked samples, the wizard should sort them in library plate order to make it easy to go through all of them in a controlled way.

The exact details of how to compare the VCF files, and the parameters for situations that should generate warnings, are not yet settled. We also need to think about how to store the warnings, since most will be of a nature that can't be solved immediately (for example, a sample may need to be re-processed).

Change History (12)

comment:1 Changed 11 months ago by nicklas

(In [4642]) References #1009: Genotype quality control wizard

Started with the "Genotype quality control" wizard. It has been added to the index page under the "Hisat" section.

The annotation type QC_GenotypeStatus was added to keep track of alignments that have already been checked (or disabled).

The first step of the wizard will display alignments waiting to be checked (they have no QC_GenotypeStatus annotation). The alignments are sorted by library plate, and at most 250 are displayed at a time.
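
The selection in the first step could look something like the sketch below. The dictionary keys ("name", "lib_plate", "QC_GenotypeStatus") and the function name next_batch are assumptions made for illustration. The real wizard loads alignments from BASE.

```python
def next_batch(alignments, limit=250):
    """Return the next alignments waiting for genotype QC.

    An alignment is waiting if it has no QC_GenotypeStatus annotation.
    The result is sorted by library plate and capped at `limit` items.
    """
    waiting = [a for a in alignments if a.get("QC_GenotypeStatus") is None]
    waiting.sort(key=lambda a: a["lib_plate"])
    return waiting[:limit]
```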

The VCF statistics have also been moved from the HisatServlet to the GenotypeServlet.

comment:2 Changed 10 months ago by nicklas

  • Status changed from new to assigned

comment:3 Changed 10 months ago by nicklas

(In [4646]) References #1009: Genotype quality control wizard

Started with the second step of the QC wizard. It will load genotypes of the selected alignments and compare them to all existing alignments annotated with QC_GENOTYPE_STATUS=Checked (and to each other).

Some warnings are issued (currently based on hard-coded rules and limits) and displayed for the user.

The user can currently only select if the alignments should be annotated with QC_GENOTYPE_STATUS=Checked or QC_GENOTYPE_STATUS=Disabled. There is a 'Flag' and 'Comment' option in the wizard, but they currently don't do anything. The only logic implemented so far is that alignments with a low number of genotypes are set to "Disabled". Warnings are not stored anywhere yet and will be lost.

comment:4 Changed 10 months ago by nicklas

(In [4647]) References #1009: Genotype quality control wizard

Adding file that should have been included in [4646].

comment:5 Changed 10 months ago by nicklas

(In [4648]) References #1009: Genotype quality control wizard

Added QC_GenoTypeComment annotation for storing comments related to the genotype checking.

The wizard will now load and display more information about the alignments: LibPlate, ALIGNED_PAIRS.

Warning messages now have more context about the other alignment. The "View genotypes" dialog has been modified with support for viewing two alignments at the same time. Warning messages that are related to a comparison are linked to this dialog.

Introduced a "MEDIUM MISMATCH" warning level for alignments from the same patient that have between 5 and 15 mismatches and where most of the mismatches have one end with GQ under 50. This seems to capture false warnings that are due to quality problems or low number of reads.
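
As an illustration, the rule described above might be expressed like this. It is a sketch under assumptions: the bounds are taken as inclusive, "most" is read as more than half, and the function name and input format are hypothetical.

```python
def is_medium_mismatch(mismatch_gqs, same_patient, gq_limit=50):
    """Decide if a comparison falls in the MEDIUM MISMATCH level.

    mismatch_gqs: one (gq_a, gq_b) tuple per mismatching SNP, holding the
    GQ value of the genotype in each of the two alignments.
    """
    if not same_patient:
        return False
    n = len(mismatch_gqs)
    if not 5 <= n <= 15:
        return False
    # A mismatch is "low quality" if at least one end has GQ under the limit
    low = sum(1 for gq_a, gq_b in mismatch_gqs if min(gq_a, gq_b) < gq_limit)
    return low > n / 2
```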

comment:6 Changed 10 months ago by nicklas

(In [4649]) References #1009: Genotype quality control wizard

Added "Flagged Alignment" item list. It is used to store alignments that are flagged due to genotype QC checks.

comment:7 Changed 9 months ago by nicklas

(In [4652]) References #1009: Genotype quality control wizard

Flagged this wizard as an experimental feature (disabled by default). It can be enabled by changing the flag in reggie-config.xml.

comment:8 Changed 9 months ago by nicklas

  • Milestone changed from Reggie v4.14 to Reggie v4.15

comment:9 Changed 8 months ago by nicklas

(In [4671]) References #1009: Genotype quality control wizard

Several changes to this wizard. The major change is that limits for mismatches are now based on percentages instead of absolute numbers.

High HET values are handled a bit differently now that we believe they are due to contamination. Since a high HET percentage should trigger a re-run of the same sample, the current alignment should NOT be disabled. We want to compare it with the re-run some time later. Two alignments with a high HET percentage are, however, only compared to each other if they belong to the same patient.
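
A rough sketch of the HET handling described above. The limit of 60 percent, the function names and the dictionary keys are all assumptions. The ticket does not state the actual limit.

```python
def het_percentage(genotypes):
    """Percentage of genotype calls that are heterozygous.

    genotypes maps SNP id -> genotype string such as "0/1" or "1|1".
    """
    calls = [gt.replace("|", "/").split("/") for gt in genotypes.values()]
    if not calls:
        return 0.0
    het = sum(1 for alleles in calls if len(set(alleles)) > 1)
    return 100.0 * het / len(calls)


def should_compare(a, b, high_het_limit=60.0):
    """Two high-HET alignments are compared only if they share a patient.

    a and b are dicts with hypothetical keys "patient" and "het_pct".
    """
    both_high = a["het_pct"] > high_het_limit and b["het_pct"] > high_het_limit
    if both_high:
        return a["patient"] == b["patient"]
    return True
```

Keeping the high-HET alignment enabled is what makes the later comparison with the re-run possible.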

The wizard will also show a bit more information about the mismatches. Mismatches where the genotypes have a low GQ value are much more common, and unless there are also mismatches with high GQ they may be ignored.

comment:10 Changed 8 months ago by nicklas

(In [4678]) References #1009: Genotype quality control wizard

Layout changes to the compare dialog. Added link to "Case summary" in the alignment list.

comment:11 Changed 8 months ago by nicklas

(In [4681]) References #1009: Genotype quality control wizard

Removed the "experimental" status of the wizard.

Some minor changes to limits when comparing genotypes. HIGH MISMATCH is now mostly used for swapped or contaminated samples. Mismatches in the "fuzzy" area are tagged with MEDIUM MISMATCH.

comment:12 Changed 8 months ago by nicklas

  • Resolution set to fixed
  • Status changed from assigned to closed