Opened 2 years ago

Closed 2 years ago

#1402 closed task (fixed)

Implement a new Hisat/Stringtie pipeline

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.40
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

The existing Hisat/Stringtie pipeline is getting old. There are newer versions of almost all the programs it uses, and there are also new versions of the genome references.

It should be possible to run the new pipeline in parallel with the existing pipeline. The existing pipeline may be modified to save less data, but this remains to be decided.

The existing pipeline also includes additional analysis steps (e.g. variant calling, SSP, etc.) that may not be immediately available in the new pipeline.

Change History (23)

comment:1 by Nicklas Nordborg, 2 years ago

In 6795:

References #1402: Implement a new Hisat/Stringtie pipeline

Added two new basic container definitions for Rocky Linux 9.0 that can be used as starting points for the new pipeline.

There were some problems:

  • The en_US.UTF-8 locale was not installed, resulting in lots of warning messages like "Failed to set locale, defaulting to C.UTF-8". Solved by installing glibc-langpack-en.
  • find was not installed. Solved by installing findutils.
  • Perl scripts would not run at all, but showed an error message: "error while loading shared libraries: libcrypt.so.1 not found". Solved by installing libxcrypt-compat.
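
The three problems above can be checked for from inside a container with a small script. This is only a sketch: the function name missing_packages is made up for illustration, and the checks assume that locale, command and ldconfig are on the PATH inside the container (ldconfig sometimes is not for non-root users, in which case the last check reports a false positive).

```shell
#!/bin/sh
# Print the Rocky Linux 9 package that provides each missing prerequisite.
# (missing_packages is a hypothetical helper, not part of the Reggie sources.)
missing_packages() {
  # en_US.UTF-8 is listed by "locale -a" as en_US.utf8 (or en_US.UTF-8)
  locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$' || echo glibc-langpack-en
  # find is provided by findutils
  command -v find >/dev/null 2>&1 || echo findutils
  # libcrypt.so.1 is provided by libxcrypt-compat
  ldconfig -p 2>/dev/null | grep -q 'libcrypt\.so\.1 ' || echo libxcrypt-compat
}

# Possible usage inside the container build (as root):
#   pkgs=$(missing_packages)
#   [ -n "$pkgs" ] && dnf install -y $pkgs
```

An empty output means that none of the three problems should occur in that container.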


An additional problem was that Rocky Linux refused to start with the error message: "fatal glibc error: cpu does not support x86-64-v2".
It turned out to be related to an issue with my VirtualBox installation, which was running in the slower "green turtle" mode because Microsoft Hyper-V had already claimed the resources needed for hardware virtualization (VT-x/AMD-V). More information and solution here: https://forums.virtualbox.org/viewtopic.php?f=1&t=62339
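
To see which CPU features a guest is missing, the flags that Linux reports can be compared with the x86-64-v2 feature set. The sketch below is not part of the commit. The helper name missing_v2_flags is made up, and the flag names are the ones Linux uses on the "flags" line of /proc/cpuinfo (pni is SSE3).

```shell
#!/bin/sh
# Print the x86-64-v2 feature flags that are missing from a flag list.
# $1: space-separated CPU flags, as found in /proc/cpuinfo
missing_v2_flags() {
  have=" $1 "
  missing=""
  for flag in cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3; do
    case "$have" in
      *" $flag "*) ;;                               # flag is present
      *) missing="${missing:+$missing }$flag" ;;    # flag is absent
    esac
  done
  printf '%s\n' "$missing"
}

# Possible usage on a Linux guest that does boot (e.g. an older distribution):
#   flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
#   missing_v2_flags "$flags"
```

An empty line as output means that all x86-64-v2 flags are visible to the guest.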

comment:2 by Nicklas Nordborg, 2 years ago

In 6806:

References #1402: Implement a new Hisat/Stringtie pipeline

Added a new container definition with new versions of the software used in the Hisat alignment. I call this the "Hisat2023" pipeline.

comment:3 by Nicklas Nordborg, 2 years ago

Milestone: Relax v1.x → Reggie v4.40
Status: new → accepted

comment:4 by Nicklas Nordborg, 2 years ago

In 6807:

References #1402: Implement a new Hisat/Stringtie pipeline

Created a new item list (Hisat 2023 Pipeline) for merged sequences to be processed with the new pipeline.

Added a new option to the Pipeline annotation (RNAseq/2023/Hisat/StringTie).

Added a new section on the Reggie index page for wizards in the new pipeline.

comment:5 by Nicklas Nordborg, 2 years ago

In 6809:

References #1402: Implement a new Hisat/Stringtie pipeline

Modified the manual and auto-confirmation wizards after demux and FASTQ import with support for adding items to the "Hisat/2023 pipeline".

comment:6 by Nicklas Nordborg, 2 years ago

In 6811:

References #1402: Implement a new Hisat/Stringtie pipeline

Renamed pipeline to RNAseq/Hisat/2023 and added a separate value for stringtie: RNAseq/StringTie/2023.

comment:7 by Nicklas Nordborg, 2 years ago

In 6813:

References #1402: Implement a new Hisat/Stringtie pipeline

Started to implement the "Start Hisat/2023 alignment wizard". Submitting jobs is not yet implemented.

comment:8 by Nicklas Nordborg, 2 years ago

In 6814:

References #1402: Implement a new Hisat/Stringtie pipeline

Implemented job submission for the Hisat/2023 pipeline.

Since the new pipeline will only run as Singularity containers in the new Slurm-based cluster, a new <host> entry was added to reggie-config.xml.

Most of the new implementation is the same as the existing one. There are some differences:

  • hisat2023.sh: The new version of GATK needs a different syntax for the parameters. Some parameters have changed names.
  • Genotype QC is disabled for the alignments in the new pipeline. There is no need to duplicate this functionality yet. The VCF files are created and linked, so it would be relatively easy to switch in the future.

comment:9 by Nicklas Nordborg, 2 years ago

In 6815:

References #1402: Implement a new Hisat/Stringtie pipeline

Implemented the manual confirmation for Hisat/2023.

comment:10 by Nicklas Nordborg, 2 years ago

In 6816:

References #1402: Implement a new Hisat/Stringtie pipeline

Partial implementation of the auto-confirmation for Hisat/2023. It is not possible to start the StringTie step yet.

comment:11 by Nicklas Nordborg, 2 years ago

In 6817:

References #1402: Implement a new Hisat/Stringtie pipeline

New entries on the index page for StringTie/2023.

comment:12 by Nicklas Nordborg, 2 years ago

In 6818:

References #1402: Implement a new Hisat/Stringtie pipeline

Started to implement the "Start StringTie/2023 wizard". Submitting jobs is not yet implemented.

comment:13 by Nicklas Nordborg, 2 years ago

In 6819:

References #1402: Implement a new Hisat/Stringtie pipeline

Array designs need a "Pipeline" annotation so that we can select the correct design for the new StringTie/2023 pipeline.

comment:14 by Nicklas Nordborg, 2 years ago

In 6820:

References #1402: Implement a new Hisat/Stringtie pipeline

Added a new container definition with new versions of the software used in the StringTie step. I call this the "StringTie2023" pipeline.

comment:15 by Nicklas Nordborg, 2 years ago

In 6821:

References #1402: Implement a new Hisat/Stringtie pipeline

Starting StringTie/2023 jobs should now work.

comment:16 by Nicklas Nordborg, 2 years ago

In 6822:

References #1402: Implement a new Hisat/Stringtie pipeline

Added wizard for manual confirmation of StringTie/2023. There are no additional steps.

comment:17 by Nicklas Nordborg, 2 years ago

In 6823:

References #1402: Implement a new Hisat/Stringtie pipeline

Implemented auto-confirmation for StringTie/2023.

comment:18 by Nicklas Nordborg, 2 years ago

In 6832:

References #1402: Implement a new Hisat/Stringtie pipeline

Fixed some incorrect paths to references.

comment:19 by Nicklas Nordborg, 2 years ago

In 6838:

References #1402: Implement a new Hisat/Stringtie pipeline

The GTF used by StringTie has a new name since there are now entries with transcript_type=protein_coding|IG_C_gene|TR_C_gene.
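
For illustration, a GTF restricted to those transcript types could be produced with a filter like the one below. This is a sketch only. It assumes GENCODE-style attributes of the form transcript_type "..."; in column 9, and it is not necessarily how the actual reference file was built. The function name is made up.

```shell
#!/bin/sh
# Read a GTF on stdin and keep comment lines plus all records whose
# transcript_type is protein_coding, IG_C_gene or TR_C_gene.
# (filter_gtf_by_transcript_type is a hypothetical helper.)
filter_gtf_by_transcript_type() {
  awk -F '\t' '
    /^#/ { print; next }
    $9 ~ /transcript_type "(protein_coding|IG_C_gene|TR_C_gene)";/ { print }
  '
}

# Possible usage (file names are placeholders):
#   filter_gtf_by_transcript_type < annotation.gtf > annotation.filtered.gtf
```

Note that records without a transcript_type attribute (for example gene records in GENCODE files) are dropped by this filter.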

comment:20 by Nicklas Nordborg, 2 years ago

In 6839:

References #1402: Implement a new Hisat/Stringtie pipeline

Added ${TMPDIR} to the list of directories that are bound into the Singularity container, since there is a risk that those directories will be inaccessible otherwise.
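
A minimal sketch of how such a bind list can be put together for the --bind option of singularity exec. The helper name build_bind_list, the /data directory and the image name are placeholders, not taken from the Reggie scripts.

```shell
#!/bin/sh
# Build a comma-separated bind list for "singularity exec --bind".
# Empty arguments, non-existent directories and duplicates are skipped.
build_bind_list() {
  list=""
  for dir in "$@"; do
    [ -n "$dir" ] && [ -d "$dir" ] || continue
    case ",$list," in
      *",$dir,"*) continue ;;   # already in the list
    esac
    list="${list:+$list,}$dir"
  done
  printf '%s\n' "$list"
}

# Possible usage (paths and image name are placeholders):
#   BIND=$(build_bind_list "/data" "${TMPDIR:-/tmp}")
#   singularity exec --bind "$BIND" hisat2023.sif ./hisat2023.sh
```

Skipping directories that do not exist avoids a failure at container start if ${TMPDIR} is unset or points to a missing location.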

comment:21 by Nicklas Nordborg, 2 years ago

In 6840:

References #1402: Implement a new Hisat/Stringtie pipeline

The new version of samtools that replaced the one bundled with Tophat didn't work properly (there were a lot of "Broken pipe" error messages related to samtools). The version that ships with Tophat seems to work in the container environment.

comment:22 by Nicklas Nordborg, 2 years ago

In 6841:

References #1402: Implement a new Hisat/Stringtie pipeline

The legacy script may need a higher value for ulimit -n (number of open files).
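
A sketch of how a job script could raise the limit before calling the legacy script. The function name and the value 4096 are assumptions for illustration; the ticket does not say which value is needed.

```shell
#!/bin/sh
# Raise the soft limit on open files to the wanted value, but never above
# the hard limit and never below the current soft limit.
# (raise_nofile_limit is a hypothetical helper.)
raise_nofile_limit() {
  wanted=$1
  soft=$(ulimit -Sn)
  hard=$(ulimit -Hn)
  # Nothing to do if the soft limit is already high enough.
  [ "$soft" = "unlimited" ] && return 0
  [ "$soft" -ge "$wanted" ] && return 0
  # The soft limit can not exceed the hard limit.
  if [ "$hard" != "unlimited" ] && [ "$wanted" -gt "$hard" ]; then
    wanted=$hard
  fi
  ulimit -Sn "$wanted"
}

# Possible usage before starting the legacy script (4096 is an assumed value):
#   raise_nofile_limit 4096
```

The new limit only applies to the current shell and the processes it starts, so the call has to be made in the same script that launches the legacy code.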

comment:23 by Nicklas Nordborg, 2 years ago

Resolution: fixed
Status: accepted → closed