Opened 3 months ago

Closed 7 weeks ago

#1346 closed task (fixed)

Implement support for OncoArray SNP data

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.34
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

This ticket is about importing the OncoArray SNP data that we already have for 1995 blood samples.

We already have BloodDNA items in BASE that represent the aliquots used for this. A typical name is 1234567.b.d.x1.x1. The SNP data only has 1234567 as the identifier, so we need to check that they match up.
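The match-up check could be sketched roughly as follows. This is only an illustration: the regular expression assumes that every BloodDNA name is a numeric id followed by .b.d and optional .xN suffixes, and the function names are hypothetical, not part of Reggie or BASE.

```python
import re

# Assumed name pattern: <numeric id>.b.d followed by zero or more .xN parts.
BLOOD_DNA_NAME = re.compile(r"^(\d+)\.b\.d(?:\.x\d+)*$")


def snp_id_from_blood_dna(name):
    """Return the leading numeric identifier of a BloodDNA name, or None."""
    m = BLOOD_DNA_NAME.match(name)
    return m.group(1) if m else None


def check_matches(blood_dna_names, snp_ids):
    """Compare SNP identifiers against BloodDNA names.

    Returns three values:
      matched   - dict of SNP id -> the single BloodDNA name that shares it
      missing   - list of SNP ids with no BloodDNA item
      ambiguous - dict of SNP id -> list of names when more than one matches
    """
    by_id = {}
    for name in blood_dna_names:
        key = snp_id_from_blood_dna(name)
        if key is not None:
            by_id.setdefault(key, []).append(name)

    matched = {s: by_id[s][0] for s in snp_ids if len(by_id.get(s, [])) == 1}
    missing = [s for s in snp_ids if s not in by_id]
    ambiguous = {s: by_id[s] for s in snp_ids if len(by_id.get(s, [])) > 1}
    return matched, missing, ambiguous
```

Anything that ends up in missing or ambiguous would need a manual look before the import.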

The plan is to also represent the lab work in the data structure in BASE. The BloodDNA items should be linked to PhysicalBioAssay items representing the actual SNP chip used. A new subtype BeadChip should be created with the naming convention BeadChipNNNN, where NNNN is a counter (similar to how FlowCells are named). Annotations on a BeadChip:

  • ChipType: (eg. OncoArray500K)
  • Barcode: (numerical, eg. 10001187003)
  • more... ?

The Barcode will make it possible to find the data files related to the chip, since it is expected to be part of a directory name in a given folder structure. Each BeadChip has 24 locations and should be linked to the 24 BloodDNA items used on the chip. To find the correct data files for each sample we need information about the location. Locations are named with row+column coordinates (R01C01... R12C02). Theoretically we can construct a location string from the index (1-24), but it may be better to store this as an annotation on the BloodDNA items.
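Constructing the location string from the index could look something like the sketch below. The ordering is an assumption: indexes 1-12 are taken to run down the rows of column 1 and indexes 13-24 down column 2. If the chip is numbered the other way around, the mapping changes, which is one reason storing the location as an annotation may be safer. The function name is hypothetical.

```python
ROWS = 12  # R01..R12
COLS = 2   # C01..C02


def location_from_index(index, rows=ROWS, cols=COLS):
    """Map a 1-based index to a location string such as R01C01.

    Assumes column-major ordering: 1..12 fill column 1, 13..24 fill column 2.
    """
    if not 1 <= index <= rows * cols:
        raise ValueError(f"index must be in 1..{rows * cols}, got {index}")
    col, row = divmod(index - 1, rows)
    return f"R{row + 1:02d}C{col + 1:02d}"


# All 24 locations on one chip, in the assumed order.
ALL_LOCATIONS = [location_from_index(i) for i in range(1, ROWS * COLS + 1)]
```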

The scanning of a BeadChip is represented by a DerivedBioAssay Scan item. Dates, Scanner ID, etc. can be extracted from the data we have and should be imported as annotations or linked Hardware items. The DataFilesFolder annotation will point to a folder with the scan data. We will need the *.idat files in the next step.

The scanned data (*.idat) will be analyzed by iaap-cli (https://support.illumina.com/downloads/iaap-genotyping-cli.html) to produce a set of 24 GTC files. The result from this step will be represented by DerivedBioAssay GenotypeCall items (one for each sample). We use the regular naming convention by adding a .gt suffix (but the x1 parts are removed). Example: 1234567.b.d.gt. The GTC files will be stored in the <project-archive> using a similar convention to what we already have (eg. ../12/1234567.b/d.gt). Other useful metadata from the genotype calling can be stored as annotations:

  • Call rate
  • GC10, GC50
  • etc...
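The GenotypeCall naming convention described above could be sketched as below. It assumes that any .xN part of the BloodDNA name is dropped (the ticket only mentions x1), and the function name is hypothetical.

```python
import re


def genotype_call_name(blood_dna_name):
    """Derive a GenotypeCall name from a BloodDNA name.

    Drops the .xN parts (assumed to cover more than just x1)
    and appends the .gt suffix.
    """
    parts = blood_dna_name.split(".")
    kept = [p for p in parts if not re.fullmatch(r"x\d+", p)]
    return ".".join(kept) + ".gt"
```

With the example name from this ticket, 1234567.b.d.x1.x1 becomes 1234567.b.d.gt.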

Downstream analysis

The GTC files can be used for extracting information to other formats. For example, it is possible to export tab-separated files or convert to VCF files. This will be addressed in another ticket.

Wizards

We do not plan to implement wizards for this. Reggie will simply create/define item types, annotation types, etc. that are needed. Batch importers will be used to import data and batch exporters will be used to get data into scripts that are manually created.

Change History (14)

comment:1 Changed 3 months ago by Nicklas Nordborg

In 6461:

References #1346: Implement support for OncoArray SNP data

Added BeadChip and Scan item subtype and several new annotation types related to those items (BeadChipID, BeadChipType, BeadChipPosition and ScanDate).

DNA/Genotyping is a new value for the Pipeline annotation.

comment:2 Changed 3 months ago by Nicklas Nordborg

In 6475:

References #1346: Implement support for OncoArray SNP data

Added GenotypeCall derived bioassay type and related file and software type.

comment:3 Changed 3 months ago by Nicklas Nordborg

In 6476:

References #1346: Implement support for OncoArray SNP data

Added more annotation types for GenotypeCall items for storing data extracted from the GTC files.

comment:4 Changed 3 months ago by Nicklas Nordborg

In 6477:

References #1346: Implement support for OncoArray SNP data

Added Scanner subtype.

comment:5 Changed 3 months ago by Nicklas Nordborg

In 6478:

References #1346: Implement support for OncoArray SNP data

Added DAO files for BeadChip, Scan and GenotypeCall items.

comment:6 Changed 3 months ago by Nicklas Nordborg

In 6479:

References #1346: Implement support for OncoArray SNP data

Added GenotypeCall section to the Case summary.

comment:7 Changed 7 weeks ago by Nicklas Nordborg

In 6511:

References #1346: Implement support for OncoArray SNP data

Added QC_GenotypeCount and QC_GenotypeHET_PCT to the GenotypeCall annotation category.

comment:8 Changed 7 weeks ago by Nicklas Nordborg

In 6512:

References #1346: Implement support for OncoArray SNP data

Added QC_GenotypeStatus to the annotation category.

comment:9 Changed 7 weeks ago by Nicklas Nordborg

In 6513:

References #1346: Implement support for OncoArray SNP data

Refactored handling of quality scores to make it easier to handle output from different programs. For example, HaplotypeCaller outputs scores in the range 0-99 where <50 is low and 99 is high, while GenomeStudio (and iaap-cli) output scores in the range 0-1 where <0.2 is low and >=0.9 is high.
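As a rough illustration of the idea (not the actual Reggie code), each program could carry its own thresholds so that scores are classified on a common scale. The class name, the constants and the "medium" label for scores between the two thresholds are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityScale:
    """Score thresholds for one genotyping program."""
    program: str
    low_below: float  # scores below this value are "low"
    high_from: float  # scores at or above this value are "high"

    def classify(self, score):
        if score < self.low_below:
            return "low"
        if score >= self.high_from:
            return "high"
        return "medium"  # assumed label for everything in between


# Thresholds as given in the commit message above.
HAPLOTYPE_CALLER = QualityScale("HaplotypeCaller", low_below=50, high_from=99)
GENOME_STUDIO = QualityScale("GenomeStudio/iaap-cli", low_below=0.2, high_from=0.9)
```

Code that displays or filters on quality then only needs to deal with the labels, not with the raw ranges of each program.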

comment:10 Changed 7 weeks ago by Nicklas Nordborg

In 6514:

References #1346: Implement support for OncoArray SNP data

Refactored handling of quality scores to make it easier to handle output from different programs. For example, HaplotypeCaller outputs scores in the range 0-99 where <50 is low and 99 is high, while GenomeStudio (and iaap-cli) output scores in the range 0-1 where <0.2 is low and >=0.9 is high.

comment:11 Changed 7 weeks ago by Nicklas Nordborg

In 6515:

References #1346: Implement support for OncoArray SNP data

Added the JSP/JS files for the manual genotype QC check.

comment:12 Changed 7 weeks ago by Nicklas Nordborg

In 6516:

References #1346: Implement support for OncoArray SNP data

Hide the manual genotype check for non-administrators.

comment:13 Changed 7 weeks ago by Nicklas Nordborg

In 6517:

References #1346: Implement support for OncoArray SNP data

Added genotype QC counts and HET percentage to the case summary and a link to the qc_genotype.vcf file.

comment:14 Changed 7 weeks ago by Nicklas Nordborg

Resolution: fixed
Status: new → closed