Opened 5 months ago

Closed 7 weeks ago

#1607 closed task (fixed)

Micro RNA secondary analysis

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v5.2
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

See also ticket #1606. This ticket is for the secondary analysis pipeline.

The first step is to align the data. This was originally done with Novoalign, which requires a license. We need to check if and how we can do this with Singularity containers. We should also investigate whether other aligners could be used instead.

After alignment, we have some (old) Perl scripts that calculate expression values and some other things.

Change History (18)

comment:1 by Nicklas Nordborg, 4 months ago

Milestone: Reggie v5.x → Reggie v5.2

comment:2 by Nicklas Nordborg, 2 months ago

In 7822:

References #1607: Micro RNA secondary analysis

Started to implement the wizard for Small RNA alignment. It is possible to select merged sequences, but the wizard only submits empty jobs to the cluster.

comment:3 by Nicklas Nordborg, 2 months ago

In 7823:

References #1607: Micro RNA secondary analysis

Implemented alignment with srnaMapper (https://github.com/mzytnicki/srnaMapper). We may change to Novoalign later if that produces better results. Some statistics are also needed, but they have not been implemented yet.

comment:4 by Nicklas Nordborg, 2 months ago

In 7824:

References #1607: Micro RNA secondary analysis

Added samtools flagstat to the script. The "primary mapped" value is parsed out and stored in the new ALIGNED_READS annotation.
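
As an illustration of the parsing step, here is a minimal Python sketch, assuming the default text output of a recent samtools flagstat in which the line of interest looks like "950 + 0 primary mapped (95.00% : N/A)". The function name parse_primary_mapped and the sample numbers are made up for this example; the actual script may extract the value differently.

```python
import re

# Matches flagstat lines such as "950 + 0 primary mapped (95.00% : N/A)".
# The first number is the QC-passed count, the second the QC-failed count.
_PRIMARY_MAPPED = re.compile(r"^(\d+) \+ (\d+) primary mapped\b")


def parse_primary_mapped(flagstat_text):
    """Return the QC-passed 'primary mapped' count, or None if not found."""
    for line in flagstat_text.splitlines():
        match = _PRIMARY_MAPPED.match(line.strip())
        if match:
            return int(match.group(1))
    return None


# Illustrative flagstat output (numbers are invented for the example).
example = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
1000 + 0 primary
0 + 0 secondary
0 + 0 supplementary
950 + 0 mapped (95.00% : N/A)
950 + 0 primary mapped (95.00% : N/A)
"""

# The value that would be stored in the ALIGNED_READS annotation.
aligned_reads = parse_primary_mapped(example)
```

Anchoring the pattern at the start of the line keeps the plain "mapped" line from being picked up by mistake.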

comment:5 by Nicklas Nordborg, 2 months ago

In 7825:

References #1607: Micro RNA secondary analysis

Added a "sRNA" section to the case summary.

comment:6 by Nicklas Nordborg, 2 months ago

In 7826:

References #1607: Micro RNA secondary analysis

Implemented manual and automatic confirmation of alignment.

comment:7 by Nicklas Nordborg, 2 months ago

In 7828:

References #1607: Micro RNA secondary analysis

Changed from srnaMapper to Novoalign due to what seems to be a bug in srnaMapper (this has been reported to the developer: https://github.com/mzytnicki/srnaMapper/issues/2).

comment:8 by Nicklas Nordborg, 2 months ago

In 7830:

References #1607: Micro RNA secondary analysis

Prepared the alignment step to also include the gene expression analysis. A child item of type GeneExpression is created, and we expect the result files to be in a subdirectory of the alignment directory.

The actual analysis has not yet been implemented.

comment:9 by Nicklas Nordborg, 2 months ago

In 7832:

References #1607: Micro RNA secondary analysis

Added miRNA_expression.pl which calculates the expression (counts and cpm) for miRNA.
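
For reference, counts per million (cpm) is the raw count scaled by 1,000,000 and divided by a total read count. The Python sketch below shows the arithmetic only. The function name counts_and_cpm is hypothetical, and the choice of denominator (the sum of counted reads unless a total is passed in) is an assumption; the Perl script may normalize against a different total.

```python
from collections import Counter


def counts_and_cpm(mirna_names, total_reads=None):
    """Return {miRNA name: (count, cpm)} for an iterable of per-read miRNA names.

    total_reads is the normalization denominator. If omitted, the number of
    counted reads is used (an assumption made for this sketch).
    """
    counts = Counter(mirna_names)
    total = sum(counts.values()) if total_reads is None else total_reads
    if total == 0:
        return {}
    return {name: (n, n * 1_000_000 / total) for name, n in counts.items()}


# Four reads assigned to three hypothetical miRNA names.
result = counts_and_cpm(["mir-a", "mir-a", "mir-b", "mir-c"])
```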

comment:10 by Nicklas Nordborg, 8 weeks ago

In 7833:

References #1607: Micro RNA secondary analysis

Added small_RNA_size_distribution_and_expression.pl, which produces a table of size distributions for various types of small RNA. We extract the values for miRNA and save them to the annotations FRACTION_MIRNA, miRNASizeAvg and miRNASizeStdev. Note that the miRNA fraction in this case is relative to the aligned reads only, whereas the miRNA fraction on the merged parent item is relative to all reads and is calculated before alignment.
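
To make the three values concrete, here is a small Python sketch that derives a fraction, an average size and a standard deviation from a size distribution. The function name mirna_size_stats and the input layout (a mapping from read length to read count) are assumptions for illustration, as is the use of the population standard deviation; the Perl script may use a different table format or formula.

```python
import math


def mirna_size_stats(size_counts, aligned_reads):
    """Summarize a miRNA size distribution.

    size_counts maps read length -> number of miRNA reads of that length.
    aligned_reads is the total number of aligned reads (the denominator
    for the fraction). Returns None if either total is zero.
    """
    n = sum(size_counts.values())
    if n == 0 or aligned_reads == 0:
        return None
    mean = sum(size * count for size, count in size_counts.items()) / n
    # Population variance, weighted by the number of reads at each size.
    variance = sum(count * (size - mean) ** 2
                   for size, count in size_counts.items()) / n
    return {
        "fraction": n / aligned_reads,
        "size_avg": mean,
        "size_stdev": math.sqrt(variance),
    }


# Invented example: 4 miRNA reads out of 8 aligned reads.
stats = mirna_size_stats({21: 1, 22: 2, 23: 1}, aligned_reads=8)
```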

comment:11 by Nicklas Nordborg, 7 weeks ago

In 7834:

References #1607: Micro RNA secondary analysis

The bug in srnaMapper that was mentioned in [7828] has been fixed and the new version (1.0.10) is now included in the container. reggie-config.xml has been modified to include an option (via Software.ParameterSet annotation) to use srnaMapper instead of Novoalign.

comment:12 by Nicklas Nordborg, 7 weeks ago

In 7835:

References #1607: Micro RNA secondary analysis

miRNA_expression.pl has been fixed to also require NM=0 for an alignment to be counted as an exact match. This is needed because srnaMapper often reports a mismatch at the end of a read as an insertion instead of a substitution, and an insertion is not encoded in the MD tag. Consider, for example, a 22-base read that aligns with a single mismatch (A) at the last position:

  • Novoalign: CIGAR: 22M, MD: 21A0, NM: 1
  • srnaMapper: CIGAR: 21=1I, MD: 21, NM: 1
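
The difference can be sketched in Python as follows. The function names are hypothetical and the check is a simplification of whatever miRNA_expression.pl does: it treats an MD tag consisting only of digits as "no mismatch or deletion recorded", and then shows why NM must be checked as well.

```python
def md_reports_no_mismatch(md):
    """True if the MD tag is a plain number, i.e. records no mismatch or deletion."""
    return md.isdigit()


def is_exact_match(md, nm):
    """Count an alignment as exact only if MD is clean AND the edit distance is 0."""
    return md_reports_no_mismatch(md) and nm == 0


# The two alignments from the example above.
novoalign = {"cigar": "22M", "md": "21A0", "nm": 1}
srnamapper = {"cigar": "21=1I", "md": "21", "nm": 1}

# MD alone catches the Novoalign mismatch but misses the srnaMapper insertion.
md_only = (md_reports_no_mismatch(novoalign["md"]),
           md_reports_no_mismatch(srnamapper["md"]))

# With the NM check, both alignments are rejected as inexact.
with_nm = (is_exact_match(novoalign["md"], novoalign["nm"]),
           is_exact_match(srnamapper["md"], srnamapper["nm"]))
```

Here md_only evaluates to (False, True) and with_nm to (False, False), which is the behavior the fix is after.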

comment:13 by Nicklas Nordborg, 7 weeks ago

In 7836:

References #1607: Micro RNA secondary analysis

Updated the srna-align.sh script so that it can align with either srnaMapper or Novoalign. If ALIGNER=srnaMapper, srnaMapper is used; otherwise Novoalign is used.

There are some differences that need to be handled:

  • srnaMapper doesn't add a @PG header to the output so we do that with samtools reheader.
  • srnaMapper doesn't add an MD tag to alignments so we do that with samtools calmd.
  • srnaMapper doesn't fully sort the output (Novoalign seems to output in name order). To be sure that the Perl scripts get proper SAM files we use samtools collate.
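
The selection logic described above can be summarized in a short Python sketch. This is not the shell script itself: the function names are hypothetical, and both the order of the samtools steps and the assumption that they run only for srnaMapper output are read from the list above rather than from srna-align.sh.

```python
def choose_aligner(env):
    """Return 'srnaMapper' only when ALIGNER is exactly that; otherwise 'Novoalign'."""
    return "srnaMapper" if env.get("ALIGNER") == "srnaMapper" else "Novoalign"


def extra_samtools_steps(aligner):
    """List the samtools subcommands needed to normalize the aligner output.

    Assumption: Novoalign output needs no extra steps, while srnaMapper
    output needs a @PG header (reheader), MD tags (calmd) and grouping of
    reads by name (collate).
    """
    if aligner != "srnaMapper":
        return []
    return ["reheader", "calmd", "collate"]


# Example: resolve the aligner and its post-processing from a dict that
# stands in for the job environment.
aligner = choose_aligner({"ALIGNER": "srnaMapper"})
steps = extra_samtools_steps(aligner)
```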

comment:14 by Nicklas Nordborg, 7 weeks ago

In 7837:

References #1607: Micro RNA secondary analysis

Added "Novoalign" and "srnaMapper" to the AlignmentType annotation.

comment:15 by Nicklas Nordborg, 7 weeks ago

In 7841:

References #1607: Micro RNA secondary analysis

Added an array design to the wizard.

comment:16 by Nicklas Nordborg, 7 weeks ago

In 7842:

References #1607: Micro RNA secondary analysis

Added support for simple validation of the GFF3 file that we use in the miRNA analysis pipeline. The GFF3 file should be attached to the array design. We need to manually create a new Sequencing variant for miRNA and give it an external id=sequencing.mirna.
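
The ticket does not say which checks the validation performs. As a rough idea of what a simple GFF3 check could look like, the Python sketch below verifies the version header, the nine tab-separated columns, the start/end coordinates and the strand column. The function name validate_gff3 and the selection of checks are assumptions, not a description of the Reggie implementation.

```python
VALID_STRANDS = {"+", "-", ".", "?"}


def validate_gff3(text):
    """Return a list of error messages; an empty list means no problem was found."""
    errors = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("##gff-version 3"):
        errors.append("line 1: missing '##gff-version 3' header")
    for number, line in enumerate(lines, start=1):
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            errors.append(f"line {number}: expected 9 columns, found {len(cols)}")
            continue
        start, end, strand = cols[3], cols[4], cols[6]
        if not (start.isdigit() and end.isdigit()):
            errors.append(f"line {number}: start/end must be positive integers")
        elif int(start) > int(end):
            errors.append(f"line {number}: start is greater than end")
        if strand not in VALID_STRANDS:
            errors.append(f"line {number}: invalid strand '{strand}'")
    return errors


# Invented one-feature file used only to exercise the checks.
good = "##gff-version 3\nchr1\t.\tmiRNA\t100\t121\t.\t+\t.\tID=example\n"
good_errors = validate_gff3(good)
```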

comment:17 by Nicklas Nordborg, 7 weeks ago

In 7843:

References #1607: Micro RNA secondary analysis

Implemented auto-confirmation for the demux step and fixed an issue with the auto-confirmation after the alignment step.

comment:18 by Nicklas Nordborg, 7 weeks ago

Resolution: fixed
Status: new → closed