#887 closed task

Release export wizard — at Version 2

Reported by:	Nicklas Nordborg	Owned by:	Nicklas Nordborg
Priority:	critical	Milestone:	Reggie v4.5
Component:	net.sf.basedb.reggie	Keywords:
Cc:

Description (last modified by Nicklas Nordborg)

Implement a wizard for creating all files that should be included in a release. The wizard should take an item list with raw bioassays as input and produce the files described below. Unless noted otherwise, all files are tab-separated text files.

Transcript data (in folder dataTables/transcriptDataTable):
- tidmatrix.features.txt: Array design features with some annotations. The first line is a header line:
  - id, geneSymbol, refSeq, protAcc, description, chr, entrez.
  - Rows are sorted by ID.
  - All raw bioassays in the input list must use the same array design.
- tidmatrix_data.txt: FPKM values for all raw bioassays. Each row represents a feature and each column a raw bioassay.
  - The first line is a header line with raw bioassay names.
  - The first column contains the feature ID.
  - Rows are in the same order as in tidmatrix.features.txt.
- tidmatrix_FPKM_conf_hi.txt, tidmatrix_FPKM_conf_lo.txt, tidmatrix_FPKM_status.txt: More data files similar to the tidmatrix_data.txt file but with the FPKM_conf_hi, FPKM_conf_lo and FPKM_status values.
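
As an illustration of the transcript tables above, here is a minimal Python sketch (not the Reggie implementation, which is Java). The input structures (`features` as dicts, `bioassays` with `name`, `arrayDesign` and `fpkm` keys), the function name, the plain string sort on ID, and the `id` label in the first header cell of the data file are all assumptions made for the example; the ticket does not specify them.

```python
import io

# Column order taken from the ticket's description of tidmatrix.features.txt.
FEATURE_COLUMNS = ["id", "geneSymbol", "refSeq", "protAcc",
                   "description", "chr", "entrez"]


def write_transcript_tables(features, bioassays, features_out, data_out):
    """Write the feature table and the FPKM matrix with matching row order.

    features:  list of dicts keyed by FEATURE_COLUMNS (hypothetical shape)
    bioassays: list of dicts with 'name', 'arrayDesign' and
               'fpkm' (feature id -> value) (hypothetical shape)
    """
    # All raw bioassays must use the same array design.
    designs = {b["arrayDesign"] for b in bioassays}
    if len(designs) != 1:
        raise ValueError("raw bioassays use different array designs: %s"
                         % sorted(designs))

    # Rows are sorted by feature ID (assumed: plain string comparison).
    ordered = sorted(features, key=lambda f: f["id"])

    features_out.write("\t".join(FEATURE_COLUMNS) + "\n")
    for f in ordered:
        features_out.write("\t".join(str(f[c]) for c in FEATURE_COLUMNS) + "\n")

    # Header with raw bioassay names; the leading 'id' label is an assumption.
    data_out.write("\t".join(["id"] + [b["name"] for b in bioassays]) + "\n")
    for f in ordered:  # same row order as the feature table
        values = [str(b["fpkm"][f["id"]]) for b in bioassays]
        data_out.write("\t".join([f["id"]] + values) + "\n")
```

The same loop could produce the FPKM_conf_hi, FPKM_conf_lo and FPKM_status files by swapping the value that is looked up per bioassay.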

Gene data (in folder dataTables/geneDataTable):
- genematrix_data.txt: Sum of FPKM values per gene symbol.
  - The first line is a header line with raw bioassay names.
  - The first column is the gene symbol (in alphabetical order).
- is.NM.gene.txt: TRUE/FALSE flag for each gene indicating if the refSeq ID starts with NM_ or not.
  - No header line.
  - First column is the line number in this file; add 1 to get the corresponding line number in genematrix_data.txt, which has a header line.
  - Second column is TRUE or FALSE.
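
The line-number relationship between is.NM.gene.txt and genematrix_data.txt can be sketched as follows. This is an illustrative Python example, not the wizard's code. The ticket does not say how the flag is decided when a gene symbol has several refSeq IDs; the sketch assumes the flag is TRUE if any of them starts with NM_. The function names and the input mapping are hypothetical.

```python
def is_nm_gene_rows(gene_refseqs):
    """Build the rows of is.NM.gene.txt (no header line).

    gene_refseqs: dict mapping gene symbol -> list of refSeq IDs
                  (hypothetical input shape)
    """
    rows = []
    # Gene symbols in alphabetical order, matching genematrix_data.txt.
    for line_no, symbol in enumerate(sorted(gene_refseqs), start=1):
        # Assumption: TRUE if at least one refSeq ID starts with "NM_".
        flag = any(r.startswith("NM_") for r in gene_refseqs[symbol])
        rows.append("%d\t%s" % (line_no, "TRUE" if flag else "FALSE"))
    return rows


def genematrix_line(is_nm_line_no):
    """Map a line number in is.NM.gene.txt to genematrix_data.txt.

    genematrix_data.txt has one header line, so the offset is +1.
    """
    return is_nm_line_no + 1
```

With this layout, the gene on line 1 of is.NM.gene.txt is found on line 2 of genematrix_data.txt.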

Cohort data (in folder cohortTables):
- TODO

README files
- TODO

Change History (2)

comment:1 by Nicklas Nordborg, 9 years ago

Description:	modified (diff)

comment:2 by Nicklas Nordborg, 9 years ago

Description:	modified (diff)

For efficient calculations it is desirable to process the data gene symbol by gene symbol. Thus, the data must come sorted in gene symbol order.
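
A minimal Python sketch of why sorted input helps: when rows arrive in gene symbol order, each gene's FPKM sum can be finished and written out before the next gene is read, so only one gene is held in memory at a time. The row shape (a gene symbol plus one FPKM value per raw bioassay) and the function name are assumptions for the example, not the actual Reggie code.

```python
from itertools import groupby
from operator import itemgetter


def sum_fpkm_per_gene(rows):
    """Yield (gene_symbol, summed FPKM values) one gene at a time.

    rows: iterable of (gene_symbol, [fpkm per raw bioassay]) that is
          already sorted by gene symbol (hypothetical input shape).
          groupby only merges adjacent rows, so unsorted input would
          produce the same gene more than once.
    """
    for symbol, group in groupby(rows, key=itemgetter(0)):
        totals = None
        for _, values in group:
            if totals is None:
                totals = list(values)
            else:
                # Column-wise sum across the transcripts of this gene.
                totals = [t + v for t, v in zip(totals, values)]
        yield symbol, totals
```

Because the generator emits genes in the order they arrive, the output rows are in alphabetical gene symbol order whenever the input is.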
