Opened 9 years ago
Last modified 8 years ago
#887 closed task
Release export wizard — at Version 2
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | critical | Milestone: | Reggie v4.5 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description
Implement a wizard for creating all files that should be included in a release. The wizard should take an item list with raw bioassays as input and produce a number of files. Unless noted otherwise, all files are tab-separated text files.
- Transcript data (in folder dataTables/transcriptDataTable):
  - tidmatrix.features.txt: Array design features with some annotations. The first line is a header line: id, geneSymbol, refSeq, protAcc, description, chr, entrez.
    - Rows are sorted by ID.
    - All raw bioassays in the input list must use the same array design.
  - tidmatrix_data.txt: FPKM values for all raw bioassays. Each row represents a feature and each column a raw bioassay.
    - The first line is a header line with raw bioassay names.
    - The first column contains the feature ID.
    - Same order of rows as in tidmatrix.features.txt.
  - tidmatrix_FPKM_conf_hi.txt, tidmatrix_FPKM_conf_lo.txt, tidmatrix_FPKM_status.txt: More data files similar to the tidmatrix_data.txt file but with the FPKM_conf_hi, FPKM_conf_lo and FPKM_status values.
- Gene data (in folder dataTables/geneDataTable):
  - genematrix_data.txt: Sum of FPKM values per gene symbol.
    - The first line is a header line with raw bioassay names.
    - The first column is the gene symbol (in alphabetical order).
  - is.NM.gene.txt: TRUE/FALSE flag for each gene indicating whether the refSeq ID starts with NM_.
    - No header line.
    - First column is the line number in this file (add 1 to get the corresponding line number in genematrix_data.txt).
    - Second column is TRUE or FALSE.
- Cohort data (in folder cohortTables):
  - TODO
- README files
  - TODO
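As an illustration of the transcript file layout described above, here is a minimal Python sketch (not the Reggie implementation, which is Java-based). The function name `write_transcript_tables` and the in-memory input structures are hypothetical. Two details are assumptions that the ticket does not specify: IDs are sorted as plain strings, and the header line of tidmatrix_data.txt starts with an empty cell so that bioassay names line up with their columns.

```python
import io

# Column order taken from the header line listed for tidmatrix.features.txt.
FEATURE_COLUMNS = ["id", "geneSymbol", "refSeq", "protAcc",
                   "description", "chr", "entrez"]


def write_transcript_tables(features, fpkm, bioassay_names,
                            features_out, data_out):
    """Write the features table and the FPKM matrix with identical row order.

    features       -- list of dicts keyed by FEATURE_COLUMNS (hypothetical input)
    fpkm           -- dict: feature id -> dict: bioassay name -> FPKM value
    bioassay_names -- list of raw bioassay names, one per data column
    """
    # Assumption: IDs are compared as plain strings.
    ordered = sorted(features, key=lambda f: f["id"])

    features_out.write("\t".join(FEATURE_COLUMNS) + "\n")
    # Assumption: empty first header cell above the feature ID column.
    data_out.write("\t".join([""] + list(bioassay_names)) + "\n")

    for feature in ordered:
        fid = feature["id"]
        features_out.write(
            "\t".join(str(feature[c]) for c in FEATURE_COLUMNS) + "\n")
        values = [str(fpkm[fid][name]) for name in bioassay_names]
        data_out.write("\t".join([fid] + values) + "\n")


# Small usage example with made-up values.
example_features = [
    {"id": "T2", "geneSymbol": "G2", "refSeq": "NR_2", "protAcc": "",
     "description": "second", "chr": "2", "entrez": "200"},
    {"id": "T1", "geneSymbol": "G1", "refSeq": "NM_1", "protAcc": "P1",
     "description": "first", "chr": "1", "entrez": "100"},
]
example_fpkm = {"T1": {"s1": 1.5, "s2": 0.0}, "T2": {"s1": 2.0, "s2": 3.25}}
features_buf, data_buf = io.StringIO(), io.StringIO()
write_transcript_tables(example_features, example_fpkm, ["s1", "s2"],
                        features_buf, data_buf)
print(data_buf.getvalue(), end="")
```

Writing both files inside one loop is one simple way to guarantee that tidmatrix_data.txt has the same row order as tidmatrix.features.txt. The conf_hi, conf_lo and status files could be produced the same way from their respective values.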
Change History (2)
comment:1 by , 9 years ago
Description: | modified (diff) |
---|
comment:2 by , 9 years ago
Description: | modified (diff) |
---|
For efficient calculations it is desirable to process the data gene symbol by gene symbol. Thus, the data must come sorted in gene symbol order.
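A minimal Python sketch of why sorted input helps: when transcript rows arrive in gene symbol order, each gene can be summed and written as soon as its last row has been read, without holding the whole matrix in memory. The names `gene_rows` and `write_gene_tables` and the tuple layout of the input are hypothetical. Two details are assumptions that the ticket does not specify: a gene is flagged TRUE if any of its refSeq IDs starts with NM_, and the header line of genematrix_data.txt starts with an empty cell.

```python
from itertools import groupby
from operator import itemgetter


def gene_rows(transcript_rows):
    """Yield (gene_symbol, summed_fpkm, is_nm) for each gene.

    transcript_rows -- iterable of (gene_symbol, ref_seq, fpkm_values)
                       tuples, already sorted by gene_symbol.
    """
    # groupby only merges adjacent rows, which is why sorted input is required.
    for symbol, group in groupby(transcript_rows, key=itemgetter(0)):
        sums = None
        is_nm = False
        for _, ref_seq, values in group:
            if sums is None:
                sums = list(values)
            else:
                sums = [a + b for a, b in zip(sums, values)]
            # Assumption: one NM_ transcript is enough to flag the gene.
            is_nm = is_nm or ref_seq.startswith("NM_")
        yield symbol, sums, is_nm


def write_gene_tables(transcript_rows, bioassay_names, data_out, flag_out):
    """Write genematrix_data.txt and is.NM.gene.txt style output."""
    # Assumption: empty first header cell above the gene symbol column.
    data_out.write("\t".join([""] + list(bioassay_names)) + "\n")
    # The flag file has no header, so its line n matches data line n + 1.
    for line_no, (symbol, sums, is_nm) in enumerate(
            gene_rows(transcript_rows), start=1):
        data_out.write("\t".join([symbol] + [str(v) for v in sums]) + "\n")
        flag_out.write(f"{line_no}\t{'TRUE' if is_nm else 'FALSE'}\n")
```

Because `gene_rows` is a generator, only the rows of the gene currently being summed are in memory at any time.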