Opened 4 years ago

Closed 3 years ago

#1208 closed task (fixed)

Implement wizard for building database of variant frequencies in SCAN-B samples

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: major Milestone: Reggie v4.25
Component: net.sf.basedb.reggie Keywords:
Cc:

Description

The annotation/filtering step in the variant calling pipeline uses a database with counts and frequencies of variants that have been found in the SCAN-B data. The current database was built from the release 3 data. It would be nice to have a wizard for building a new database. The input would be an item list with alignments. The wizard generates a script that takes the raw variant calling file for each alignment, together with the patient it belongs to, and counts all variants. For each variant it should count the total number of times it has been seen and the number of patients. The result is saved to a VCF file that is compatible with the existing database.
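The counting described above can be sketched as follows. This is a minimal illustration, not the actual pipeline script: it assumes a variant is keyed by (CHROM, POS, REF, ALT) with multi-allelic ALT values split per allele, and the INFO keys `SEEN` and `PATIENTS` are hypothetical placeholders, since the ticket does not say which fields the existing database uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

# A variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per ALT allele from VCF text lines (headers skipped)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            yield (chrom, int(pos), ref, allele)


def count_variants(
    samples: Iterable[Tuple[str, Iterable[Variant]]]
) -> Dict[Variant, Tuple[int, int]]:
    """Return {variant: (times_seen, number_of_patients)}.

    `samples` yields (patient_id, variants) pairs, one per VCF file;
    several files may belong to the same patient.
    """
    seen: Dict[Variant, int] = defaultdict(int)
    patients: Dict[Variant, Set[str]] = defaultdict(set)
    for patient_id, variants in samples:
        for variant in variants:
            seen[variant] += 1
            patients[variant].add(patient_id)
    return {v: (seen[v], len(patients[v])) for v in seen}


def to_vcf_line(variant: Variant, counts: Tuple[int, int]) -> str:
    """Format one result record; the INFO keys are hypothetical."""
    chrom, pos, ref, alt = variant
    seen, n_patients = counts
    info = f"SEEN={seen};PATIENTS={n_patients}"
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", ".", info])
```

Keeping a set of patient IDs per variant is what separates "times seen" from "number of patients" when one patient contributes several alignments.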

Change History (14)

comment:1 Changed 4 years ago by Nicklas Nordborg

Status: new → accepted

comment:2 Changed 4 years ago by Nicklas Nordborg

In 5763:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Started to implement the wizard. It is possible to select an item list, but nothing will happen after registration.

comment:3 Changed 4 years ago by Nicklas Nordborg

In 5764:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Added a quick check that the selected item list seems to contain alignments with a 'variants-raw.vcf.gz' file.
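The check above can be sketched like this. The real check is part of the Reggie wizard; this Python version only illustrates the logic and assumes a hypothetical mapping from alignment name to the file names stored with it. Only the file name `variants-raw.vcf.gz` comes from the ticket.

```python
from typing import Dict, List

# File name taken from the ticket; the input structure below is hypothetical.
REQUIRED_FILE = "variants-raw.vcf.gz"


def alignments_missing_raw_vcf(alignments: Dict[str, List[str]]) -> List[str]:
    """Return the names of alignments that lack the raw VCF file.

    `alignments` maps an alignment name to the file names stored with it.
    """
    return sorted(
        name for name, files in alignments.items() if REQUIRED_FILE not in files
    )


def looks_valid(alignments: Dict[str, List[str]]) -> bool:
    """Quick check: the list is non-empty and every alignment has the file."""
    return bool(alignments) and not alignments_missing_raw_vcf(alignments)
```

Returning the offending names, rather than just a boolean, lets a wizard tell the user which items to remove from the list.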

comment:4 Changed 4 years ago by Nicklas Nordborg

In 5765:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Started with the code for submitting the statistics script to the cluster. The framework is ready and it will submit an empty script that doesn't do anything.

comment:5 Changed 4 years ago by Nicklas Nordborg

In 5766:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Implemented export step for creating a file with patient and VCF file information. This file is intended to be used by the statistics script.
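A file like the one described above could look roughly as follows. The ticket does not specify the layout, so the tab-separated format with a `patient`/`vcf` header is purely an assumption made for this sketch, as are the function names.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Hypothetical layout: tab-separated, one row per VCF file, header "patient\tvcf".


def format_patient_file(rows: Iterable[Tuple[str, str]]) -> str:
    """Serialize (patient_id, vcf_path) pairs to tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["patient", "vcf"])
    writer.writerows(rows)
    return buf.getvalue()


def parse_patient_file(text: str) -> List[Tuple[str, str]]:
    """Read the pairs back, skipping the header and blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader, None)  # skip the header row
    return [(row[0], row[1]) for row in reader if row]
```

One row per VCF file (instead of one row per patient) keeps the format simple when a patient has several alignments; the statistics step can group by the first column.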

comment:6 Changed 4 years ago by Nicklas Nordborg

In 5767:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Added a python script to the pipeline scripts repository that calculates variant statistics for a list of VCF files.

comment:7 Changed 4 years ago by Nicklas Nordborg

In 5768:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Started to implement the script for calculating the variant statistics. Most of the work is done by the mut_stats.py python script. After that we need to sort and index the result. The final VCF file is currently left in the working directory, which is obviously not a good place.
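The sorting step mentioned above can be illustrated with a small sketch. This is not mut_stats.py or the actual job script; it assumes a natural chromosome order (1-22, then X, Y, M), whereas a real pipeline would more likely follow the contig order of the reference genome. Indexing is typically done afterwards with bgzip and tabix from htslib, which require coordinate-sorted input; that step is not shown.

```python
from typing import List, Tuple


def chrom_key(chrom: str) -> Tuple[int, int, str]:
    """Assumed natural order: numeric chromosomes first, then X, Y, M/MT."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    order = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    return (1, order.get(name.upper(), 99), name)


def sort_vcf_lines(lines: List[str]) -> List[str]:
    """Keep header lines first, then sort records by chromosome and position."""
    header = [line for line in lines if line.startswith("#")]
    body = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line: str) -> Tuple[Tuple[int, int, str], int]:
        chrom, pos = line.split("\t", 2)[:2]
        return (chrom_key(chrom), int(pos))

    return header + sorted(body, key=key)
```

A plain string sort would put `chr10` before `chr2`, which is why the sketch parses the numeric part of the chromosome name.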

comment:8 Changed 4 years ago by Nicklas Nordborg

In 5769:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Added progress reporting to the python script that collects variant statistics.
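Progress reporting of this kind can be sketched as below. How the cluster job actually picks up progress is not described in the ticket, so the message format (`NN% Processed i of n files`) and the default of writing to stderr are assumptions made for illustration only.

```python
import sys
from typing import Callable, Sequence


def _to_stderr(line: str) -> None:
    # Default sink for progress messages (an assumption for this sketch).
    print(line, file=sys.stderr, flush=True)


def format_progress(done: int, total: int) -> str:
    """Hypothetical message format: integer percent plus file counts."""
    percent = 100 * done // total if total else 100
    return f"{percent}% Processed {done} of {total} files"


def process_with_progress(
    paths: Sequence[str],
    handle: Callable[[str], None],
    emit: Callable[[str], None] = _to_stderr,
    every: int = 10,
) -> None:
    """Call `handle` on each path, emitting progress every `every` files and at the end."""
    total = len(paths)
    for i, path in enumerate(paths, start=1):
        handle(path)
        if i % every == 0 or i == total:
            emit(format_progress(i, total))
```

Reporting only every `every` files keeps the output small when thousands of VCF files are processed, while the final message always reaches 100%.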

comment:9 Changed 4 years ago by Nicklas Nordborg

In 5770:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Progress reporting is now supported by the python script.

comment:10 Changed 4 years ago by Nicklas Nordborg

In 5771:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Added a logfile to get some summary statistics.

comment:11 Changed 4 years ago by Nicklas Nordborg

In 5772:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Added a new job subtype so that we can get a callback when the job has ended. The callback will get some information from the log file and generate a message.
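The callback step above could work roughly like this. The actual callback is part of the Reggie extension; this Python sketch only shows the idea, and it assumes a hypothetical log format of `Key: value` lines with the keys `Files`, `Patients` and `Variants`, none of which are confirmed by the ticket.

```python
from typing import Dict, Iterable


def parse_summary_log(lines: Iterable[str]) -> Dict[str, str]:
    """Collect 'Key: value' lines from the log; other lines are ignored."""
    summary: Dict[str, str] = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep:
            summary[key.strip()] = value.strip()
    return summary


def build_message(summary: Dict[str, str]) -> str:
    """Turn the (hypothetical) summary keys into a short status message."""
    files = summary.get("Files", "?")
    patients = summary.get("Patients", "?")
    variants = summary.get("Variants", "?")
    return (f"Processed {files} VCF files from {patients} patients; "
            f"found {variants} unique variants.")
```

Falling back to `?` for missing keys means an incomplete log still produces a message instead of failing the callback.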

comment:12 Changed 4 years ago by Nicklas Nordborg

In 5773:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Result files are now saved to the job folder, which should preserve them after the job has finished.

comment:13 Changed 3 years ago by Nicklas Nordborg

In 5815:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Updated the wizard to ask if the list contains tumor or normal samples. This option only affects the file name of the VCF file that is created:

  • Tumors: scanb-tumors.vcf.gz
  • Normals: scanb-normals.vcf.gz
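The mapping from the wizard option to the output file name amounts to a simple lookup. The file names come from the list above; the function name and the option values `tumor`/`normal` are hypothetical.

```python
# File names from the ticket; option values and function name are hypothetical.
OUTPUT_NAMES = {
    "tumor": "scanb-tumors.vcf.gz",
    "normal": "scanb-normals.vcf.gz",
}


def output_file_name(sample_type: str) -> str:
    """Return the VCF file name for the selected sample type."""
    try:
        return OUTPUT_NAMES[sample_type.lower()]
    except KeyError:
        raise ValueError(f"Unknown sample type: {sample_type!r}") from None
```

Rejecting unknown values explicitly avoids silently writing a tumor database under the normal-sample name, or vice versa.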


comment:14 Changed 3 years ago by Nicklas Nordborg

Resolution: fixed
Status: accepted → closed