Opened 4 years ago

Closed 4 years ago

#1208 closed task (fixed)

Implement wizard for building database of variant frequencies in SCAN-B samples

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.25
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

The annotation/filtering step in the variant calling pipeline uses a database with counts and frequencies for variants that have been found in the SCAN-B data. The current database was built from the release 3 data. It would be nice to have a wizard for building a new database. The input would be an item list with alignments. A script is generated that takes the raw variant calling file together with the patient and counts all variants. For each variant it should count the total number of times it has been seen and the number of patients it has been seen in. The results are saved to a VCF file that is compatible with the existing database.
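The counting described above can be sketched roughly as follows. This is only an illustration of the idea, not the script that the wizard generates. The function name `count_variants` and the record layout `(patient, chrom, pos, ref, alt)` are assumptions made for the example.

```python
from collections import defaultdict


def count_variants(records):
    """Count observations and distinct patients per variant.

    `records` is an iterable of (patient, chrom, pos, ref, alt) tuples.
    This layout is hypothetical and chosen only for the sketch.
    Returns a dict mapping (chrom, pos, ref, alt) to
    (total_observations, number_of_patients).
    """
    total = defaultdict(int)      # variant -> number of times seen
    patients = defaultdict(set)   # variant -> patients it was seen in
    for patient, chrom, pos, ref, alt in records:
        key = (chrom, pos, ref, alt)
        total[key] += 1
        patients[key].add(patient)
    return {key: (total[key], len(patients[key])) for key in total}
```

Keeping a set of patients per variant means that a patient with several alignments contributes only once to the patient count while still contributing every observation to the total.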

Change History (14)

comment:1 by Nicklas Nordborg, 4 years ago

Status: new → accepted

comment:2 by Nicklas Nordborg, 4 years ago

In 5763:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Started to implement the wizard. It is possible to select an item list, but nothing will happen after registration.

comment:3 by Nicklas Nordborg, 4 years ago

In 5764:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Added a quick check that the selected item list seems to contain alignments with a 'variants-raw.vcf.gz' file.

comment:4 by Nicklas Nordborg, 4 years ago

In 5765:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Started with the code for submitting the statistics script to the cluster. The framework is ready and it will submit an empty script that doesn't do anything.

comment:5 by Nicklas Nordborg, 4 years ago

In 5766:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Implemented export step for creating a file with patient and VCF file information. This file is intended to be used by the statistics script.
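A statistics script could read such a file along these lines. The actual layout of the exported file is not described in this ticket, so the sketch assumes a tab-separated file with the patient in the first column and the VCF path in the second. The function name `read_manifest` is hypothetical.

```python
import csv


def read_manifest(handle):
    """Read (patient, vcf_path) pairs from a tab-separated file.

    Assumed layout (not confirmed by the ticket): one line per VCF,
    patient name in column 1, path to the VCF file in column 2.
    Empty lines and lines starting with '#' are skipped.
    """
    entries = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        entries.append((row[0], row[1]))
    return entries
```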

comment:6 by Nicklas Nordborg, 4 years ago

In 5767:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Added a python script to the pipeline scripts repository that calculates variant statistics for a list of VCF files.

comment:7 by Nicklas Nordborg, 4 years ago

In 5768:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Started to implement the script for calculating the variant statistics. Most of the work is done by the mut_stats.py python script. After that we need to sort and index the result. The final VCF file is currently left in the working directory which is obviously not a good place.
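The sorting step can be illustrated with a small sketch. The ticket does not say which tool the generated script uses for sorting and indexing, so this only shows the ordering that an index requires: header lines first, then records ordered by chromosome and position. The function name `sort_vcf_lines` is hypothetical, and chromosomes are compared as plain strings here, which is a simplification.

```python
def sort_vcf_lines(lines):
    """Return VCF lines with headers first and records sorted.

    Records are ordered by (chromosome, position). Chromosome names
    are compared as plain strings, which is a simplification made
    for this sketch.
    """
    header = [line for line in lines if line.startswith("#")]
    body = [line for line in lines if line and not line.startswith("#")]

    def sort_key(line):
        chrom, pos = line.split("\t", 2)[:2]
        return (chrom, int(pos))

    return header + sorted(body, key=sort_key)
```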

comment:8 by Nicklas Nordborg, 4 years ago

In 5769:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Added progress reporting to the python script that collects variant statistics.

comment:9 by Nicklas Nordborg, 4 years ago

In 5770:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Progress reporting is now supported by the python script.

comment:10 by Nicklas Nordborg, 4 years ago

In 5771:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Added a logfile to get some summary statistics.

comment:11 by Nicklas Nordborg, 4 years ago

In 5772:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Added a new job subtype so that we can get a callback when the job has ended. The callback will get some information from the log file and generate a message.

comment:12 by Nicklas Nordborg, 4 years ago

In 5773:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Result files are now saved to the job folder, which should preserve them after the job has finished.

comment:13 by Nicklas Nordborg, 4 years ago

In 5815:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Updated the wizard to ask if the list contains tumor or normal samples. This option affects only the file name of the VCF that is created:

  • Tumors: scanb-tumors.vcf.gz
  • Normals: scanb-normals.vcf.gz
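The mapping above could be expressed as a simple lookup. The two file names are taken from this comment. The keys `"tumor"` and `"normal"` and the function name `output_filename` are assumptions made for the sketch and are not taken from the Reggie code.

```python
# File names as listed in the comment; the dictionary keys are hypothetical.
OUTPUT_NAMES = {
    "tumor": "scanb-tumors.vcf.gz",
    "normal": "scanb-normals.vcf.gz",
}


def output_filename(sample_type):
    """Return the name of the result VCF for the given sample type."""
    try:
        return OUTPUT_NAMES[sample_type]
    except KeyError:
        raise ValueError(f"Unknown sample type: {sample_type!r}") from None
```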


comment:14 by Nicklas Nordborg, 4 years ago

Resolution: fixed
Status: accepted → closed