Opened 9 months ago
Last modified 7 months ago
#1575 closed task
Impute genotypes from OncoArray data — at Version 28
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | Reggie v4.54 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description (last modified by )
We have some samples that have been genotyped on the OncoArray platform. It should be possible to implement an analysis that imputes genotypes at many more positions.
We have already made some tests with Shapeit5 (https://odelaneau.github.io/shapeit5/) and Impute5 (https://jmarchini.org/software/#impute-5), which seem to work well.
Update: We decided to use Beagle instead. See https://faculty.washington.edu/browning/beagle/beagle.html
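A minimal Python sketch of how the command line for one Beagle run per chromosome could be assembled. It assumes Beagle 5's `key=value` arguments (`gt`, `ref`, `map`, `chrom`, `out`, `nthreads`); the helper name `beagle_command`, the file names and the heap size are hypothetical and not taken from Reggie. Check the arguments against the Beagle documentation for the version actually used.

```python
def beagle_command(jar, gt_vcf, ref_panel, genetic_map, out_prefix,
                   chrom, threads=8, max_heap="16g"):
    """Return the argv list for one (hypothetical) Beagle 5 run.

    gt_vcf      -- VCF with the OncoArray genotypes (GT field)
    ref_panel   -- reference panel for this chromosome (VCF or bref3)
    genetic_map -- PLINK-format genetic map
    out_prefix  -- Beagle writes <out_prefix>.vcf.gz and <out_prefix>.log
    """
    return [
        "java", f"-Xmx{max_heap}", "-jar", str(jar),
        f"gt={gt_vcf}",
        f"ref={ref_panel}",
        f"map={genetic_map}",
        f"chrom={chrom}",          # restrict the run to one chromosome
        f"out={out_prefix}",
        f"nthreads={threads}",
    ]
```

The list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file paths.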
Reference files (genetic maps and the 1000 Genomes reference panel) can be downloaded from: https://github.com/odelaneau/shapeit4/tree/master/maps and https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/
Update: This reference data set does not include all variants that we need for the PRS313 calculation. Instead, we will use the Phase 3 reference (https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) and convert it to hg38.
Once we have the imputed genotypes, we can use them to calculate a Polygenic Risk Score (PRS). See ticket #1576.
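The PRS itself is handled in #1576, but its core arithmetic is a weighted sum over the imputed variants, PRS = Σ β_i · d_i, where β_i is the per-allele weight and d_i is the dosage (0 to 2) of the effect allele. The sketch below assumes biallelic variants, ALT-allele dosages taken from the imputation output, and a hypothetical weight table (`Weight`, `polygenic_score` and the variant IDs are illustrative names); it is not the PRS313 implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weight:
    """One row of a hypothetical PRS weight table."""
    variant_id: str
    effect_allele: str
    beta: float  # per-allele effect size, e.g. a log odds ratio


def polygenic_score(weights, alt_dosages, alt_alleles):
    """Sum beta * effect-allele dosage over all weighted variants.

    alt_dosages -- variant_id -> imputed dosage of the ALT allele (0..2)
    alt_alleles -- variant_id -> ALT allele at that site
    Assumes biallelic sites where the effect allele is either REF or ALT.
    Returns (score, ids of weighted variants missing from the data).
    """
    score = 0.0
    missing = []
    for w in weights:
        if w.variant_id not in alt_dosages:
            missing.append(w.variant_id)
            continue
        d = alt_dosages[w.variant_id]
        if w.effect_allele != alt_alleles[w.variant_id]:
            d = 2.0 - d  # effect allele is REF: count REF copies instead
        score += w.beta * d
    return score, missing
```

Reporting missing variants rather than silently treating them as zero dosage makes it visible when the reference panel lacks PRS variants, which is the problem noted in the update above.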
Change History (28)
comment:1 by , 9 months ago
comment:6 by , 9 months ago
In 7604:
References #1575: Impute genotypes from OncoArray data
Implemented the phasing step with ShapeIT. Since the program does not scale well with multiple threads, we instead start one process per chromosome, running at most as many processes in parallel as there are assigned threads. The wrapper code is very similar to the code in the WGS variant calling. A full run of this step takes about 1.5 hours (using 8 threads).
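The scheduling described above (one process per chromosome, with no more processes running at once than the number of assigned threads) can be sketched in Python with a bounded thread pool. This only illustrates the idea under stated assumptions: Reggie's actual wrapper is Java code, and `run_per_chromosome` and the `task` callback are hypothetical names. Each task is assumed to start one external phasing process and wait for it.

```python
from concurrent.futures import ThreadPoolExecutor


def run_per_chromosome(chromosomes, task, threads):
    """Run task(chrom) for every chromosome, at most `threads` at a time.

    Each task is expected to launch and wait for one external process
    (for example with subprocess.run), so the Python threads only
    supervise the processes. Returns a dict chrom -> task result, in
    input order. If a task fails, the exception of the first failing
    chromosome (in input order) is re-raised after all tasks finish.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {chrom: pool.submit(task, chrom) for chrom in chromosomes}
        return {chrom: f.result() for chrom, f in futures.items()}
```

In practice `task` would call `subprocess.run(..., check=True)` with the phasing command for that chromosome. Since chromosomes differ a lot in size, the total run time is bounded below by the largest ones, regardless of the number of threads.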
comment:28 by , 7 months ago
Description: modified
In 7599: