Opened 10 years ago

Last modified 10 years ago

#593 closed task

Start masking and alignment — at Version 9

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: major Milestone: Reggie v2.16
Component: net.sf.basedb.reggie Keywords:
Cc:

Description (last modified by Nicklas Nordborg)

Part of #533.

The wizard is started after data from a sequencing run has been successfully demultiplexed and merged. In the first step, MergedSequences items should be selected. The items must be annotated with AnalysisResult=Successful and AutoProcessing!=Disable (AutoProcessing=Disable may have been set, e.g., by manually deselecting some libraries in the demux ended wizard).
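The selection rule can be sketched in Python as below. This is a minimal illustration, not Reggie's code: the `MergedSequences` class and the plain dict of annotations are hypothetical stand-ins for BASE items and their annotation API. Note that an item with no AutoProcessing annotation at all passes the `!=Disable` check.

```python
from dataclasses import dataclass, field


@dataclass
class MergedSequences:
    """Hypothetical stand-in for a BASE MergedSequences item (not the real API)."""
    name: str
    annotations: dict[str, str] = field(default_factory=dict)


def is_ready_for_alignment(item: MergedSequences) -> bool:
    """True if the item passes both annotation checks described in the ticket."""
    result = item.annotations.get("AnalysisResult")
    auto = item.annotations.get("AutoProcessing")
    # A missing AutoProcessing value is accepted; only an explicit "Disable" excludes the item.
    return result == "Successful" and auto != "Disable"


def select_candidates(items: list[MergedSequences]) -> list[MergedSequences]:
    """Keep only the items that the wizard should offer for masking and alignment."""
    return [item for item in items if is_ready_for_alignment(item)]
```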

For each selected item, the wizard creates one MaskedSequences child item:

  • Name of item: <lib-name>.g.k, <lib-name>.g.k2
  • Software and protocol: (Type=Masking)

The wizard also creates one AlignedSequences grandchild item:

  • Name of item: <lib-name>.g.k.a, <lib-name>.g.k.a2
  • Software and protocol: (Type=Alignment)
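A minimal sketch of the naming scheme, assuming that the numbered variants (`.g.k2`, `.g.k.a2`, ...) are only used when the plain name is already taken, and that numbering continues with 3, 4 and so on. The helper names are hypothetical.

```python
def next_free_name(base: str, taken: set[str]) -> str:
    """Return base, or base + N (N = 2, 3, ...) for the first name not in taken."""
    if base not in taken:
        return base
    n = 2
    while f"{base}{n}" in taken:
        n += 1
    return f"{base}{n}"


def masked_name(lib_name: str, taken: set[str]) -> str:
    """Hypothetical helper: name for the MaskedSequences child item."""
    return next_free_name(f"{lib_name}.g.k", taken)


def aligned_name(lib_name: str, taken: set[str]) -> str:
    """Hypothetical helper: name for the AlignedSequences grandchild item."""
    return next_free_name(f"{lib_name}.g.k.a", taken)
```

For example, `masked_name("Lib1", {"Lib1.g.k"})` gives `Lib1.g.k2`, matching the second name pattern listed above.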

Parameters for the filter step:

  • Target genome that filters away unwanted sequences. This could be hardcoded into the script, a configuration setting, or user selectable in the wizard.
  • Other parameters: to be decided.

Parameters for the alignment step:

  • Target genome to align against. This could be hardcoded into the script, a configuration setting, or user selectable in the wizard.
  • Location where the final result files should be stored. This should probably be a configuration setting.
  • Other parameters: to be decided.
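The three ways of supplying the target genome and the result location (hardcoded, configuration setting, wizard selection) need not be mutually exclusive. One possible approach, sketched below with hypothetical parameter keys, is to let a value chosen in the wizard override a configuration setting, which in turn overrides a hardcoded default.

```python
from typing import Mapping, Optional


def resolve_parameter(
    name: str,
    wizard_values: Mapping[str, str],
    config: Mapping[str, str],
    defaults: Mapping[str, str],
) -> Optional[str]:
    """Look up a parameter: wizard choice first, then configuration, then hardcoded default."""
    for source in (wizard_values, config, defaults):
        value = source.get(name)
        if value:  # a missing key or an empty string counts as "not set"
            return value
    return None
```

With hypothetical keys such as `alignment.genome` and `alignment.resultDir`, the wizard only has to pass the values the user actually changed.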

After the alignment is done, some information may be imported back to BASE. It would be nice to have the number of aligned sequences, and possibly some other information that has not yet been decided.

Change History (9)

comment:1 by Nicklas Nordborg, 10 years ago

Status: new → assigned

comment:2 by Nicklas Nordborg, 10 years ago

(In [2375]) References #593: Start filter and alignment

Started to implement this wizard. The index page shows the count and the wizard displays the merged sequences waiting for alignment. Manual selection is possible. Registration does not do anything yet.

comment:3 by Nicklas Nordborg, 10 years ago

(In [2389]) References #593: Start filter and alignment

Added 'job priority' and 'debug' options to the wizard. Started on the servlet for creating the FilteredSequences and AlignedSequences items.

A job script is generated and submitted to the cluster, but apart from copying FASTQ files to the node it currently does nothing, since not everything is in place on the cluster yet.
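The script generation could look roughly like the following sketch. The function name, the directory layout and the mapping of the 'debug' option to `set -x` are assumptions for illustration. Job priority is omitted, on the assumption that it is passed to the cluster scheduler at submission time rather than written into the script.

```python
import shlex


def build_job_script(fastq_files: list[str], node_dir: str, debug: bool = False) -> str:
    """Hypothetical sketch: build a bash script that copies FASTQ files to a node-local directory."""
    lines = ["#!/bin/bash", "set -e"]  # abort at the first failing command
    if debug:
        lines.append("set -x")  # assumed meaning of 'debug': echo each command to the job log
    lines.append(f"mkdir -p {shlex.quote(node_dir)}")
    for path in fastq_files:
        # Quote paths so that spaces or shell metacharacters cannot break the script.
        lines.append(f"cp {shlex.quote(path)} {shlex.quote(node_dir)}/")
    # Placeholder: the filter/masking and alignment commands would be appended here.
    return "\n".join(lines) + "\n"
```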

comment:4 by Nicklas Nordborg, 10 years ago

(In [2394]) References #593: Start filter and alignment

Generate a script that is working for the filter step. The filtered_<lib-name>.out file is parsed for number of remaining reads which is stored as NumReads annotation on the FilteredSequences item.
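Parsing the read count could be sketched as follows. The ticket does not describe the layout of filtered_<lib-name>.out, so the `Remaining reads: N` line matched here is purely an assumption; the regular expression would have to be adapted to the real output.

```python
import re
from typing import Optional

# ASSUMPTION: the real layout of filtered_<lib-name>.out is not given in the ticket;
# this hypothetical pattern expects a line such as "Remaining reads: 1234567".
_REMAINING = re.compile(r"^\s*Remaining reads:\s*(\d+)\s*$", re.MULTILINE)


def parse_num_reads(out_text: str) -> Optional[int]:
    """Return the remaining read count (for the NumReads annotation), or None if not found."""
    match = _REMAINING.search(out_text)
    return int(match.group(1)) if match else None
```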

comment:5 by Nicklas Nordborg, 10 years ago

(In [2397]) References #593: Start filter and alignment

Now running tophat and syncing files back to project_archive. Using tophat_single.sh instead of tophat.sh so we don't have to mess with samplesheet.csv.

comment:6 by Nicklas Nordborg, 10 years ago

(In [2399]) References #593: Start filter and alignment

Running statistics_tophat.sh to get some information about aligned reads that we can import back to BASE (as NumReads annotation on AlignedSequences).

comment:7 by Nicklas Nordborg, 10 years ago

(In [2414]) References #593: Start filter and alignment

Removed 'filter_' prefix in the 'PE_filter' script.

comment:8 by Nicklas Nordborg, 10 years ago

Summary: Start filter and alignment → Start masking and alignment

comment:9 by Nicklas Nordborg, 10 years ago

Description: modified