Opened 9 years ago
Last modified 8 years ago
#887 closed task
Release export wizard — at Version 3
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | critical | Milestone: | Reggie v4.5 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description
Implement a wizard for creating all files that should be included in a release. The wizard should take an item list with raw bioassays as input and produce the files listed below. Unless noted, all files are tab-separated text files.
- Transcript data (in folder dataTables/transcriptDataTable):
  - tidmatrix.features.txt: Array design features with some annotations. The first line is a header line: id, geneSymbol, refSeq, protAcc, description, chr, entrez.
    - Rows are sorted by ID.
    - All raw bioassays in the input list must use the same array design.
  - tidmatrix_data.txt: FPKM values for all raw bioassays. Each row represents a feature and each column a raw bioassay.
    - The first line is a header line with raw bioassay names.
    - The first column contains the feature ID.
    - Same order of rows as in tidmatrix.features.txt.
  - tidmatrix_FPKM_conf_hi.txt, tidmatrix_FPKM_conf_lo.txt, tidmatrix_FPKM_status.txt: More data files similar to the tidmatrix_data.txt file but with the FPKM_conf_hi, FPKM_conf_lo and FPKM_status values.
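The relationship between the feature file and the matrix files can be sketched as follows. This is only an illustration in Python, not the Reggie implementation: the function names, the dict-based input and the `values` lookup are hypothetical, IDs are assumed to sort as strings, and the matrix header is assumed to hold only the raw bioassay names (one field fewer than the data rows).

```python
import csv
import io

# Column order taken from the header line described in the ticket.
FEATURE_COLUMNS = ["id", "geneSymbol", "refSeq", "protAcc", "description", "chr", "entrez"]


def write_features(out, features):
    """Write the feature table sorted by ID and return the IDs in row order.

    `features` is a list of dicts keyed by FEATURE_COLUMNS (hypothetical input shape).
    Missing annotations are written as empty cells.
    """
    rows = sorted(features, key=lambda f: f["id"])
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FEATURE_COLUMNS)
    for feature in rows:
        writer.writerow([feature.get(col, "") for col in FEATURE_COLUMNS])
    return [feature["id"] for feature in rows]


def write_matrix(out, feature_ids, bioassay_names, values):
    """Write one matrix file using the row order returned by write_features.

    `values` maps (feature_id, bioassay_name) to a cell value (assumed layout).
    The same function could serve the FPKM, conf_hi, conf_lo and status files.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(bioassay_names)  # header: raw bioassay names only (assumption)
    for fid in feature_ids:
        writer.writerow([fid] + [values[(fid, name)] for name in bioassay_names])
```

Reusing the ID list returned by `write_features` for every matrix file is one simple way to guarantee that all files share the same row order.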
- Gene data (in folder dataTables/geneDataTable):
  - genematrix_data.txt: Sum of FPKM values per gene symbol.
    - The first line is a header line with raw bioassay names.
    - The first column is the gene symbol (in alphabetical order).
  - is.NM.gene.txt: TRUE/FALSE flag for each gene indicating if the refSeq ID starts with NM_ or not.
    - No header line.
    - First column is the line number in this file (add 1 to get the corresponding line number in genematrix_data.txt, which has a header line).
    - Second column is TRUE or FALSE.
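A minimal sketch of the is.NM.gene.txt layout, to make the line-number offset concrete. The function names and the `refseq_by_gene` mapping are hypothetical, and the sketch assumes a single refSeq ID per gene symbol, which the ticket does not specify.

```python
import io


def write_is_nm_flags(out, gene_symbols, refseq_by_gene):
    """Write one line per gene: line number, then TRUE/FALSE.

    `gene_symbols` must be in the same order as the rows of genematrix_data.txt.
    `refseq_by_gene` maps gene symbol -> refSeq ID (assumed one ID per gene).
    """
    for line_no, symbol in enumerate(gene_symbols, start=1):
        is_nm = refseq_by_gene.get(symbol, "").startswith("NM_")
        out.write(f"{line_no}\t{'TRUE' if is_nm else 'FALSE'}\n")


def matrix_line_number(flag_line_no):
    """Line in genematrix_data.txt for a line in is.NM.gene.txt.

    The matrix file has a header line and the flag file does not, hence +1.
    """
    return flag_line_no + 1
```

With made-up identifiers, a gene mapped to `NM_0001` would be written as `TRUE` and one mapped to `NR_0002` as `FALSE`.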
- Cohort data (in folder cohortTables): A set of tab-separated files with data for each raw bioassay and the parent items it is derived from. Each file starts with a header line. Each row contains data for one raw bioassay. The first column (rba) is always the name of the raw bioassay.
  - cohortRawbioassay.txt: Data from the raw bioassay level. Columns:
    - ID: Internal ID in BASE
    - Name: Name of raw bioassay
    - Platform: Name of platform (Sequencing)
    - Raw.data.type: Name of raw data type (cufflinks)
    - Has.data: Flag indicating if there is raw data for this raw bioassay or not (TRUE/FALSE)
    - Db.spots: Number of raw data entries
    - Array.design: Name of the array design
    - Software: Name of the software used to generate the raw data
    - Import.date: Date the raw data was created (in YYYY-MM-DD format)
    - AnalysisResult..A.: Successful/Failed
    - DataFilesFolder..A.: Path to folder in project archive file server where data files are located
    - FPKM.tracking.file..F.: Path to the isoforms.fpkm_tracking file in the BASE file system
  - cohortAligned: Data from the AlignedSequences parent item. Columns:
    - TODO
  - cohortMasked.txt: Data from the MaskedSequences parent item. Columns:
    - TODO
  - cohortMerged.txt: Data from the MergedSequences parent item. Columns:
    - TODO
  - cohortSequencing.txt: Data from the SequencingRun parent item. Columns:
    - TODO
  - cohortLibrary.txt: Data from the Library parent item. Columns:
    - TODO
  - cohortRNA.txt: Data from the RNA parent item. Columns:
    - TODO
  - cohortLysate.txt: Data from the Lysate parent item. Columns:
    - TODO
  - cohortSample.txt: Data from the Specimen parent item. Columns:
    - TODO
  - cohortCase.txt: Data from the Case parent item (except INCA data). Columns:
    - TODO
  - cohortPatient.txt: Data from the Patient parent item. Columns:
    - TODO
  - cohortStained.txt: Data from the Stained parent item. Columns:
    - TODO
  - cohortINCA.txt: Data from parent items (e.g. Case) that have been imported from the INCA registry. Columns:
    - TODO
  - cohortSummaryTable.txt: A single table collecting some of the most useful information from the other tables.
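Since all cohort files share the same shape (a header line, one row per raw bioassay, rba as the first column), a single writer could in principle serve every table. The sketch below is an assumption about how that might look: `write_cohort_table`, `format_cell` and the row layout are hypothetical names, and empty cells for missing values are a choice made here, not something the ticket specifies.

```python
import csv
import datetime
import io


def format_cell(value):
    """Format a value using the conventions named in the ticket."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"      # flags are written as TRUE/FALSE
    if isinstance(value, datetime.date):
        return value.strftime("%Y-%m-%d")        # dates in YYYY-MM-DD format
    if value is None:
        return ""                                # missing value (assumed: empty cell)
    return str(value)


def write_cohort_table(out, columns, rows):
    """Write one cohort table.

    `columns` lists the column names after the leading rba column.
    `rows` is an iterable of (raw_bioassay_name, dict of column -> value).
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["rba"] + list(columns))
    for rba_name, data in rows:
        writer.writerow([rba_name] + [format_cell(data.get(col)) for col in columns])
```

Keeping the formatting rules in one helper would make it easier to stay consistent once the TODO column lists are filled in.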
- Subtype data (in folder cohortTables/subtypeTables): Information generated by the R report scripts. We do not currently store this information in BASE, so it needs to be discussed how this should be done. The report plug-in could for example import the data from the R scripts as annotations.
- README files
  - TODO
Change History (3)
comment:1, 9 years ago: Description modified
comment:2, 9 years ago: Description modified
comment:3, 9 years ago: Description modified
For efficient calculations it is desirable to process the data gene symbol by gene symbol. Thus, the data must come sorted in gene symbol order.
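To illustrate why sorted input helps: when rows arrive ordered by gene symbol, the per-gene FPKM sums can be computed in a single streaming pass, holding only one gene's totals in memory at a time. The sketch below uses Python's `itertools.groupby`; the function name and the row layout are assumptions, not the actual implementation.

```python
import itertools


def sum_fpkm_by_gene(rows):
    """Yield (gene_symbol, summed FPKM per raw bioassay).

    `rows` is an iterable of (gene_symbol, [fpkm per raw bioassay]) that is
    already sorted by gene symbol. groupby only merges adjacent rows, so
    unsorted input would yield the same symbol more than once.
    """
    for symbol, group in itertools.groupby(rows, key=lambda row: row[0]):
        totals = None
        for _, fpkm in group:
            if totals is None:
                totals = list(fpkm)
            else:
                totals = [t + v for t, v in zip(totals, fpkm)]
        yield symbol, totals
```

Because the output is produced in input order, the resulting genematrix_data.txt rows also come out in gene symbol order without a separate sorting step.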