#593 closed task (fixed)
Start masking and alignment
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | Reggie v2.16 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description (last modified by )
Part of #533.
The wizard is started after successfully demultiplexing and merging data from a sequencing run. In the first step, MergedSequences items should be selected. The items need to be annotated with AnalysisResult=Successful and AutoProcessing!=Disable (the annotation may have been set to Disable by, e.g., manually deselecting some libraries in the demux ended wizard).
For each selected item, the wizard creates one MaskedSequences child item:
- Name of item: <lib-name>.g.k, <lib-name>.g.k2
- Software and protocol: (Type=Masking)
The wizard also creates one AlignedSequences grandchild item:
- Name of item: <lib-name>.g.k.a, <lib-name>.g.k.a2
- Software and protocol: (Type=Alignment)
Parameters for the filter step:
- Target genome that filters away unwanted sequences. This could be hardcoded into the script, a configuration setting, or user selectable in the wizard.
- Other parameters ????
Parameters for the alignment step:
- Target genome to align against. This could be hardcoded into the script, a configuration setting, or user selectable in the wizard.
- Location where the final result files should be stored. This should probably be a configuration setting.
- Other parameters ????
After the alignment is done, some information may be imported back to BASE. It would be nice to have the number of aligned sequences, and possibly some other information that is not yet decided.
Change History (27)
comment:1 by , 11 years ago
Status: | new → assigned |
---|
comment:2 by , 11 years ago
comment:3 by , 11 years ago
(In [2389]) References #593: Start filter and alignment
Added 'job priority' and 'debug' options to wizard. Started with the servlet for creating the FilteredSequences and AlignedSequences items.
A job script is generated and submitted to the cluster, but except for copying FASTQ files to the node this currently does nothing since not everything is in place on the cluster.
comment:4 by , 11 years ago
comment:5 by , 11 years ago
comment:6 by , 11 years ago
comment:7 by , 11 years ago
comment:8 by , 11 years ago
Summary: | Start filter and alignment → Start masking and alignment |
---|
comment:9 by , 11 years ago
Description: | modified (diff) |
---|
comment:10 by , 11 years ago
(In [2420]) References #533, #547, #548, #593, #595. Renamed FilteredSequences subtype to MaskedSequences and the related software and protocol type. Renamed annotations NumReads to READS and PassedFilterReads to PF_READS, and added new annotations for the number of reads on the masked (PM_READS) and aligned (ALIGNED_PAIRS) levels.
Lots of related changes in the code to make class and variable names match the new names.
comment:11 by , 10 years ago
comment:12 by , 10 years ago
(In [2535]) References #593: Start masking and alignment
Reset AutoProcessing annotation when starting an alignment so that the bioassay disappears from the "Start masking and alignment" count and list.
Also use DISTINCT when counting or loading the list since otherwise the same bioassay will appear multiple times after a re-alignment.
comment:13 by , 10 years ago
(In [2560]) References #547 and #593. Do not use more threads than the number of slots that have been assigned by the queue system.
The number of assigned slots is available in the NSLOTS environment variable, and this is compared to the number of cores on the node. The smaller number is selected.
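The selection rule described in this comment could look roughly like the sketch below. It is not the actual Reggie job script: the helper name pick_threads, the fallback when NSLOTS is unset, and the use of getconf to detect the core count are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the smaller of the slots granted by the
# queue system (NSLOTS) and the number of cores passed as argument.
pick_threads() {
  cores="$1"
  # Assumption: if NSLOTS is unset or empty, fall back to the core count.
  slots="${NSLOTS:-$cores}"
  if [ "$slots" -lt "$cores" ]; then
    echo "$slots"
  else
    echo "$cores"
  fi
}

# Assumption: getconf reports the number of online cores on the node.
CORES=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
THREADS=$(pick_threads "$CORES")
echo "Using $THREADS threads"
```

The resulting value would then be passed to whatever thread option the masking and alignment tools accept.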
comment:14 by , 10 years ago
comment:15 by , 10 years ago
comment:16 by , 10 years ago
comment:17 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:18 by , 10 years ago
comment:19 by , 10 years ago
comment:20 by , 10 years ago
comment:21 by , 10 years ago
(In [2633]) References #593 and #595. Added "delete items created by failed jobs" option to alignment confirmation wizard.
This will delete MaskedSequences and AlignedSequences items so that the database is not filled up with uninteresting items.
Re-starting the alignment will create new items with the same names, so the script sent to the cluster has been modified to make sure that the folders it is going to use are empty before it starts adding data to them. E.g.:
mkdir -p folder
rm -rf folder/*
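A slightly more defensive variant of the two commands above is sketched here. The helper name prepare_empty_dir is hypothetical and this is not the code from [2633]. Unlike the glob-based version, it also removes hidden files, and it refuses an empty folder name.

```shell
#!/bin/sh
# Hypothetical helper: make sure a job folder exists and is empty.
prepare_empty_dir() {
  dir="$1"
  # Refuse an empty name so the command can never expand to the root folder.
  [ -n "$dir" ] || return 1
  mkdir -p "$dir" || return 1
  # Delete everything below the folder, including dotfiles, but keep the
  # folder itself. -mindepth and -delete are GNU/BSD find extensions.
  find "$dir" -mindepth 1 -delete
}

# Usage (not executed here): prepare_empty_dir "folder"
```

Whether hidden files should be removed as well depends on what the job expects to find in the folder, so treat that difference as a design choice and not as a correction.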
comment:22 by , 10 years ago
(In [2635]) References #593 and #595. Changes in [2633] that ensure job folders are empty also deleted sample sheet files uploaded by the demux script. The job definition has been modified so that files that are needed by the job must be part of the definition and uploaded at the same time as the job is sent to the cluster.
comment:23 by , 10 years ago
comment:24 by , 10 years ago
comment:25 by , 10 years ago
(In [2706]) References #593 and #595. Parse 'accepted_hits_picardmetrics.csv' and read out three values:
- READ_PAIRS_EXAMINED
- READ_PAIR_DUPLICATES
- PERCENT_DUPLICATION (FRACTION_DUPLICATION)
The values are stored in annotations on the AlignedSequences item. Note that what Picard says is a percentage is actually a fraction.
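For checking the three values by hand on the cluster, something like the following sketch may help. It is not the Reggie parser. It assumes that 'accepted_hits_picardmetrics.csv' follows the usual Picard metrics layout (comment lines starting with '#', then a tab-separated header line followed by one value line); the helper names picard_metric and fraction_to_percent are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: picard_metric FILE COLUMN [DELIMITER]
# Prints the value of COLUMN from the first data row of a Picard-style
# metrics file. Assumption: tab-separated unless a delimiter is given.
picard_metric() {
  LC_ALL=C awk -F "${3:-\t}" -v col="$2" '
    # Skip comment lines and blank lines.
    /^#/ || /^[ \t]*$/ { next }
    # The first remaining line is the header: locate the column.
    !idx {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) exit
      next
    }
    # The next line holds the values: print the one we want and stop.
    { print $idx; found = 1; exit }
    # Exit status is non-zero when the column or the value row is missing.
    END { exit !found }
  ' "$1"
}

# PERCENT_DUPLICATION is a fraction (0..1), so multiply by 100 to get
# an actual percentage.
fraction_to_percent() {
  LC_ALL=C awk -v f="$1" 'BEGIN { printf "%.2f\n", f * 100 }'
}

# Usage (not executed here):
#   picard_metric accepted_hits_picardmetrics.csv READ_PAIRS_EXAMINED
#   fraction_to_percent "$(picard_metric accepted_hits_picardmetrics.csv PERCENT_DUPLICATION)"
```

If the file turns out to be comma-separated, pass ',' as the third argument to picard_metric.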
(In [2375]) References #593: Start filter and alignment
Started to implement this wizard. The index page shows the count and the wizard displays the merged sequences waiting for alignment. Manual selection is possible. Registration does nothing.