Opened 6 months ago

Closed 5 months ago

Last modified 5 months ago

#1536 closed task (fixed)

Implement variant calling for paired WGS

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.50
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

The plan is to use GATK Mutect2 (https://gatk.broadinstitute.org/hc/en-us/articles/13832710384155-Mutect2) for variant calling on tumor/normal pairs.

The general workflow is outlined here: https://gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels-

Before we can implement the whole pipeline there are a number of things that need to be prepared:

  1. Our BAM files are not fully "analysis-ready" since GATK recommends that a "base quality recalibration" step is performed (https://gatk.broadinstitute.org/hc/en-us/articles/360035535912). Our tests show that this step roughly doubles the size of the BAM file, so we will keep our BAM files as they are and run the recalibration step as part of the variant calling (it should take about 3-4 hours for a pair of BAMs with 30x coverage).
  2. Several steps in the pipeline require one or more VCF files with information about known variants. The suggestion is to use, for example, GnomAD or dbSNP, but they will need to be properly prepared and maybe filtered before use.
  3. A "panel-of-normals" data set needs to be created by running Mutect2 in tumor-only mode on a number of normal samples and then building a combined VCF from that information (https://gatk.broadinstitute.org/hc/en-us/articles/13832769396635-CreateSomaticPanelOfNormals-BETA-). Since we have more than one library preparation protocol, we should create a separate "panel-of-normals" for each protocol.
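As a rough illustration of the third item, the panel-of-normals build can be sketched as three groups of GATK calls. This is only a sketch that assembles the command lines without running them. The function name, the file names and the use of a GenomicsDB workspace are assumptions based on the GATK documentation linked above, not the actual Reggie script.

```python
from typing import List


def panel_of_normals_commands(
    reference: str,
    normal_bams: List[str],
    intervals: str,
    workspace: str = "pon_db",
    output: str = "pon.vcf.gz",
) -> List[List[str]]:
    """Build (but do not run) the GATK commands for a panel-of-normals."""
    commands = []
    vcfs = []
    for index, bam in enumerate(normal_bams, start=1):
        vcf = f"normal{index}.vcf.gz"  # placeholder output name
        vcfs.append(vcf)
        # Step 1: Mutect2 in tumor-only mode on each normal sample.
        commands.append(["gatk", "Mutect2", "-R", reference, "-I", bam,
                         "--max-mnp-distance", "0", "-O", vcf])
    # Step 2: collect the per-sample calls in a GenomicsDB workspace.
    db_import = ["gatk", "GenomicsDBImport", "-R", reference, "-L", intervals,
                 "--genomicsdb-workspace-path", workspace]
    for vcf in vcfs:
        db_import += ["-V", vcf]
    commands.append(db_import)
    # Step 3: build the combined panel-of-normals VCF.
    commands.append(["gatk", "CreateSomaticPanelOfNormals", "-R", reference,
                     "-V", f"gendb://{workspace}", "-O", output])
    return commands
```

With one panel per library preparation protocol, the function would be called once per protocol with that protocol's normal samples and a distinct output name.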

Change History (34)

comment:1 by Nicklas Nordborg, 6 months ago

In 7386:

References #1536: Implement variant calling for paired WGS

Added a container definition for the GATK and related software that is needed for the variant calling.

comment:2 by Nicklas Nordborg, 6 months ago

Milestone: Reggie v4.x → Reggie v4.50
Status: new → assigned

comment:3 by Nicklas Nordborg, 6 months ago

In 7398:

References #1536: Implement variant calling for paired WGS

Added an item list for tumor items that should have variant calling.

comment:4 by Nicklas Nordborg, 6 months ago

In 7400:

References #1536: Implement variant calling for paired WGS

Started to implement the "Start variant calling" wizard. It should be possible to select paired tumor/normal samples but the registration doesn't do anything yet.

comment:5 by Nicklas Nordborg, 6 months ago

In 7406:

References #1536: Implement variant calling for paired WGS

Started to implement the variant calling script. Jobs can be submitted and the first steps (base recalibration and Mutect2) in the pipeline seem to work.
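For orientation, the two first steps roughly correspond to the GATK calls below. This is a hedged sketch that only assembles argument lists. All file names, the `f1r2.tar.gz` output and the function names are placeholders, and the real script may pass additional options.

```python
from typing import List


def recalibration_commands(reference: str, bam: str, known_sites: str,
                           out_bam: str) -> List[List[str]]:
    """Base quality recalibration: build a model, then apply it."""
    table = out_bam + ".recal.table"  # placeholder name for the model
    return [
        ["gatk", "BaseRecalibrator", "-R", reference, "-I", bam,
         "--known-sites", known_sites, "-O", table],
        ["gatk", "ApplyBQSR", "-R", reference, "-I", bam,
         "--bqsr-recal-file", table, "-O", out_bam],
    ]


def mutect2_command(reference: str, tumor_bam: str, normal_bam: str,
                    normal_sample: str, germline: str, pon: str,
                    out_vcf: str) -> List[str]:
    """Mutect2 in tumor/normal mode on the recalibrated BAM files."""
    return ["gatk", "Mutect2", "-R", reference,
            "-I", tumor_bam, "-I", normal_bam,
            "-normal", normal_sample,         # sample name of the normal
            "--germline-resource", germline,  # e.g. a prepared GnomAD VCF
            "--panel-of-normals", pon,
            "--f1r2-tar-gz", "f1r2.tar.gz",   # input for orientation bias
            "-O", out_vcf]
```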

comment:6 by Nicklas Nordborg, 6 months ago

In 7407:

References #1536: Implement variant calling for paired WGS

Added the steps for calculating contamination, tumor segmentation and orientation bias.

comment:7 by Nicklas Nordborg, 6 months ago

In 7408:

References #1536: Implement variant calling for paired WGS

Added the filter step.
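The contamination, segmentation, orientation bias and filter steps map onto the GATK tools in the somatic workflow linked in the description. A minimal sketch of how the calls fit together, assuming placeholder file names throughout (the function name and the table names are made up for illustration):

```python
from typing import List


def post_mutect2_commands(reference: str, tumor_bam: str, normal_bam: str,
                          common_variants: str, unfiltered_vcf: str,
                          filtered_vcf: str) -> List[List[str]]:
    """Build (but do not run) the steps between Mutect2 and the final VCF."""
    commands = []
    # Pileup summaries at common variant sites, for tumor and normal.
    for bam, table in ((tumor_bam, "tumor-pileups.table"),
                       (normal_bam, "normal-pileups.table")):
        commands.append(["gatk", "GetPileupSummaries", "-I", bam,
                         "-V", common_variants, "-L", common_variants,
                         "-O", table])
    # Contamination estimate plus tumor segmentation.
    commands.append(["gatk", "CalculateContamination",
                     "-I", "tumor-pileups.table",
                     "-matched", "normal-pileups.table",
                     "--tumor-segmentation", "segments.table",
                     "-O", "contamination.table"])
    # Orientation bias model from the F1R2 counts written by Mutect2.
    commands.append(["gatk", "LearnReadOrientationModel",
                     "-I", "f1r2.tar.gz",
                     "-O", "read-orientation-model.tar.gz"])
    # The filter step combines all of the above.
    commands.append(["gatk", "FilterMutectCalls", "-R", reference,
                     "-V", unfiltered_vcf,
                     "--contamination-table", "contamination.table",
                     "--tumor-segmentation", "segments.table",
                     "--ob-priors", "read-orientation-model.tar.gz",
                     "-O", filtered_vcf])
    return commands
```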

comment:8 by Nicklas Nordborg, 6 months ago

In 7409:

References #1536: Implement variant calling for paired WGS

Added an environment variable, CommonGATKOptions, that is used in all calls to GATK so that we don't have to repeat --QUIET --verbosity WARNING in all other options.

Also changed the panel-of-normals variant call script so that it uses the same file names, etc. as in the paired variant call script.

comment:9 by Nicklas Nordborg, 6 months ago

In 7410:

References #1536: Implement variant calling for paired WGS

Some result files are now saved back to the project archive, and there is now code for linking the files and importing annotations.

comment:10 by Nicklas Nordborg, 6 months ago

In 7411:

References #1536: Implement variant calling for paired WGS

Introducing DNA/Paired/VariantCall as a new value for the pipeline annotation.

Updated the Case summary so that the RNA variant calling and DNA variant calling have different sections.

comment:11 by Nicklas Nordborg, 6 months ago

In 7412:

References #1536: Implement variant calling for paired WGS

Files ending with .table now get a text/plain MIME type and are also included in the View-as-table extension.

comment:12 by Nicklas Nordborg, 6 months ago

In 7413:

References #1536: Implement variant calling for paired WGS

It is nice that the VCF parser and viewing options that already exist in Reggie do not fail with the VCF files from the variant calling. But they do not display the information fully or correctly either, so we need to make a number of changes.

First up is a way to filter VCF files based on the value in the FILTER column. We may need to use a filter that only loads variants with the value PASS.
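The idea can be sketched in a few lines. The function below is a hypothetical illustration in Python and not the parser code in Reggie. It keeps header lines and only those data lines whose FILTER column (the seventh column in a VCF) matches the requested value.

```python
from typing import Iterable, Iterator, Optional


def filter_vcf_lines(lines: Iterable[str],
                     required_filter: Optional[str] = "PASS") -> Iterator[str]:
    """Yield header lines and the data lines with a matching FILTER value.

    Passing required_filter=None disables the filter.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines
            continue
        columns = line.rstrip("\n").split("\t")
        # FILTER is the 7th column (index 6) of a VCF data line.
        if required_filter is None or columns[6] == required_filter:
            yield line
```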

comment:13 by Nicklas Nordborg, 6 months ago

In 7414:

References #1536: Implement variant calling for paired WGS

Added support for phased genotypes (which are written as 0|1 instead of 0/1) and the PS tag in the format field that groups genotypes in the same phase set.
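A small sketch of what this involves, written as a standalone Python function rather than the actual Reggie parser (the `Genotype` type and the function name are made up for illustration):

```python
from typing import List, NamedTuple, Optional


class Genotype(NamedTuple):
    alleles: List[str]        # allele indexes as written in the GT value
    phased: bool              # True for 0|1, False for 0/1
    phase_set: Optional[str]  # value of the PS tag, if present


def parse_genotype(format_field: str, sample_field: str) -> Genotype:
    """Parse the GT and PS values from one sample column of a VCF line."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    gt = values.get("GT", ".")
    phased = "|" in gt
    # Both separators are accepted; only the phased flag differs.
    alleles = gt.replace("|", "/").split("/")
    phase_set = values.get("PS")
    if phase_set == ".":
        phase_set = None  # "." means missing in VCF
    return Genotype(alleles, phased, phase_set)
```

Genotypes that share the same `phase_set` value can then be grouped together when displayed.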

comment:14 by Nicklas Nordborg, 6 months ago

In 7415:

References #1536: Implement variant calling for paired WGS

MergeVcfs is a Picard tool and doesn't have the same parameters as the other GATK tools. The verbosity option must be written as VERBOSITY.
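One way to keep this difference in a single place is to choose the option name per tool. The sketch below is hypothetical: the set of Picard tools and the function name are assumptions, and it only changes the spelling of the verbosity option since that is the only difference noted here.

```python
from typing import List

# Assumption: tools that come from Picard and need the upper-case option.
PICARD_TOOLS = {"MergeVcfs"}


def common_options(tool: str) -> List[str]:
    """Return the quiet/verbosity options with the spelling the tool expects."""
    verbosity = "--VERBOSITY" if tool in PICARD_TOOLS else "--verbosity"
    return ["--QUIET", verbosity, "WARNING"]
```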

comment:15 by Nicklas Nordborg, 6 months ago

In 7416:

References #1536: Implement variant calling for paired WGS

Renaming some result files and preparing for the annotation step. There will now be two VCF files that are saved. The "raw" variant calls with filtering information are saved as variants-raw.vcf.gz. This file is filtered with bcftools so that only variants with FILTER=PASS remain. The filtered file is annotated (not yet implemented) and saved as variants-somatic.vcf.gz, which will be the primary result file.

comment:16 by Nicklas Nordborg, 6 months ago

In 7422:

References #1536: Implement variant calling for paired WGS

Added support for VCF files with results for paired normal/tumor samples. The implementation is very simple and assumes that the last 2 columns in the VCF have genotype data for the normal and tumor (in that order).

Statistics in the VcfData class are collected for the tumor only, but each entry can be associated with the genotype information for the normal sample.
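The column assumption can be illustrated with a short Python sketch. This is not the VcfData implementation, and the function name is hypothetical. It simply reads the last two columns of a data line as normal and tumor, in that order.

```python
from typing import Dict, Tuple


def split_paired_samples(line: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (normal, tumor) genotype fields from a paired VCF data line."""
    columns = line.rstrip("\n").split("\t")
    # 8 fixed columns + FORMAT + at least two sample columns.
    if len(columns) < 11:
        raise ValueError("expected a FORMAT column and two sample columns")
    keys = columns[8].split(":")
    normal = dict(zip(keys, columns[-2].split(":")))  # second to last
    tumor = dict(zip(keys, columns[-1].split(":")))   # last column
    return normal, tumor
```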

comment:17 by Nicklas Nordborg, 6 months ago

In 7423:

References #1536: Implement variant calling for paired WGS

Added a new dialog for viewing variants from the WGS pipeline. The main difference is a column with data for the normal.

comment:18 by Nicklas Nordborg, 5 months ago

In 7424:

References #1536: Implement variant calling for paired WGS

Implemented annotation of the VCF file with vcfanno and snpEff in a way that is similar to the RNAseq variant calling. This will make it a lot easier to use the already existing tools for viewing variants and it may also be possible to implement support in the variant search extension.

It is also possible to run the Funcotator step from GATK but we are not using those annotations.

comment:19 by Nicklas Nordborg, 5 months ago

In 7425:

References #1536: Implement variant calling for paired WGS

We need the stderrwrap.sh script also.

comment:20 by Nicklas Nordborg, 5 months ago

In 7426:

References #1536: Implement variant calling for paired WGS

Display the non-coding Cosmic ID if it exists. If the "TYPE" annotation is missing, we derive it from the REF and ALT information.
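A possible rule for deriving the type is to compare the lengths of the two alleles. The sketch below is an assumption about how this could be done. The labels SNP, MNP, INS and DEL are illustrative and may not match the values used in Reggie, and a single ALT allele is assumed.

```python
def derive_type(ref: str, alt: str) -> str:
    """Guess the variant type from REF and a single ALT allele."""
    if len(ref) == len(alt):
        # Same length: single or multiple nucleotide substitution.
        return "SNP" if len(ref) == 1 else "MNP"
    # Different length: ALT longer is an insertion, shorter a deletion.
    return "INS" if len(alt) > len(ref) else "DEL"
```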

comment:21 by Nicklas Nordborg, 5 months ago

In 7427:

References #1536: Implement variant calling for paired WGS

The dialog needs to be a bit larger due to the columns with normal genotype information.

comment:22 by Nicklas Nordborg, 5 months ago

In 7428:

References #1536: Implement variant calling for paired WGS

Added a code section for removing ##contig entries that are not used in the variant calling from the VCF file. E.g., we only keep chr1 to chrY.
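The cleanup amounts to dropping header lines for contigs outside the kept set. A minimal Python sketch, assuming that "chr1 to chrY" means chr1-chr22 plus chrX and chrY (the names below are hypothetical and the real script may do this differently):

```python
import re
from typing import Iterable, Iterator

# Assumption: the contigs that are used in the variant calling.
KEEP = {f"chr{name}" for name in list(range(1, 23)) + ["X", "Y"]}
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def clean_contig_headers(lines: Iterable[str]) -> Iterator[str]:
    """Drop ##contig header lines for contigs that are not in KEEP."""
    for line in lines:
        match = CONTIG_ID.match(line)
        if match and match.group(1) not in KEEP:
            continue  # unused contig, e.g. an alt or decoy sequence
        yield line
```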

comment:23 by Nicklas Nordborg, 5 months ago

In 7433:

References #1536: Implement variant calling for paired WGS

Use unique names for dialog windows in the case summary so that it is possible to open more than one.

comment:24 by Nicklas Nordborg, 5 months ago

In 7434:

References #1536: Implement variant calling for paired WGS

Limit the length of HGVS.c values since they can sometimes get very long.
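Something along these lines would do. This is only a sketch: the function name and the maximum length of 40 characters are arbitrary assumptions, not the values used in Reggie.

```python
def shorten(value: str, max_length: int = 40) -> str:
    """Truncate long values and mark the cut with a trailing '...'."""
    if len(value) <= max_length:
        return value
    return value[:max_length - 3] + "..."
```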

comment:25 by Nicklas Nordborg, 5 months ago

In 7435:

References #1536: Implement variant calling for paired WGS

Added a counter for unconfirmed variant calls. Noticed that the RNAseq variant calling was missing a filter on the pipeline annotation and could potentially also load variant calls from other pipelines.

comment:26 by Nicklas Nordborg, 5 months ago

In 7436:

References #1536: Implement variant calling for paired WGS

Started to implement the manual confirmation wizard. It is more or less the same as the corresponding wizard in the RNAseq variant calling pipeline.

Also realized that there is no link back to the paired normal alignment that was used in the variant call so this is added as an any-to-any link.

Registration of the results is not yet implemented.

comment:27 by Nicklas Nordborg, 5 months ago

In 7437:

References #1536: Implement variant calling for paired WGS

Registration should now work.

comment:28 by Nicklas Nordborg, 5 months ago

In 7438:

References #1536: Implement variant calling for paired WGS

Implemented auto-confirmation for the variant calling.

comment:29 by Nicklas Nordborg, 5 months ago

In 7439:

References #1536: Implement variant calling for paired WGS

Implemented support for having different panel-of-normals. Paths to the VCF files with the normal information are configured in reggie-config.xml and linked to a given ExternalOperator on the Library item:

<panel-of-normals operator="Sanger">
	export PON="${BaseDir}/panel-of-normals/sanger.vcf.gz"
</panel-of-normals>
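To illustrate how such entries can be matched against the ExternalOperator of a library, here is a hedged Python sketch. It is not how Reggie reads its configuration. The `<config>` wrapper element in the usage note and the function name are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def panel_of_normals_by_operator(config_xml: str) -> Dict[str, str]:
    """Map each operator attribute to the body of its panel-of-normals entry."""
    root = ET.fromstring(config_xml)
    result = {}
    for element in root.iter("panel-of-normals"):
        operator = element.get("operator")
        # The body is a shell fragment that exports the PON variable.
        result[operator] = (element.text or "").strip()
    return result
```

Called with the entry above wrapped in a `<config>` element, the result maps "Sanger" to the `export PON=...` line, which could then be included in the job script for libraries with that operator.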

comment:30 by Nicklas Nordborg, 5 months ago

In 7440:

References #1536: Implement variant calling for paired WGS

Display the panel-of-normals in the confirmation wizard and in the case summary.

comment:31 by Nicklas Nordborg, 5 months ago

In 7441:

References #1536: Implement variant calling for paired WGS

Updated configuration settings.

comment:32 by Nicklas Nordborg, 5 months ago

In 7442:

References #1536: Implement variant calling for paired WGS

Added a check in the "Build panel-of-normals" wizard that displays a warning message if not all items in the selected item list have the same Library.ExternalOperator annotation.

comment:33 by Nicklas Nordborg, 5 months ago

Resolution: fixed
Status: assigned → closed

comment:34 by Nicklas Nordborg, 5 months ago

In 7443:

References #1536: Implement variant calling for paired WGS

Use GRCh38.p13 with snpEff so that we have the same version as Gencode 41.
