Changes between Initial Version and Version 1 of net.sf.basedb.reggie/notes216

Jun 13, 2014, 12:03:40 PM
Nicklas Nordborg



= Updating to Reggie 2.16 =
Reggie 2.16 contains several new wizards in the "Secondary analysis" section. Those wizards are not plug-and-play; they require that a lot of other infrastructure has been set up and configured in a compatible manner. In this document I'll try to collect as much information as possible about the required infrastructure, but it may take some time until the information is 100% complete and up to date.
== Open Grid Scheduler ==
To run the analysis jobs, an Open Grid Scheduler cluster is required. Reggie will connect to the cluster via SSH and auto-generate scripts that interact with the cluster and add jobs to the queuing system. Information about how to set up a cluster is beyond the scope of this document and must be sought elsewhere.
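Conceptually, what Reggie does over SSH resembles the sketch below: generate a Grid Engine job script and submit it with `qsub`. All names (host, user, job name, resource requests) are hypothetical placeholders, and the final submit command is only printed, not executed.

```shell
#!/bin/sh
# Sketch of the kind of job script Reggie auto-generates and submits.
# Host, user, and resource values are hypothetical placeholders.

JOB_SCRIPT=align_job.sh

# Generate a minimal Grid Engine job script with embedded #$ directives.
cat > "$JOB_SCRIPT" <<'EOF'
#!/bin/sh
#$ -N reggie-align        # job name shown in qstat
#$ -cwd                   # run from the submission directory
#$ -pe smp 8              # request 8 slots on one node
#$ -l h_vmem=4G           # per-slot memory limit
echo "analysis pipeline would run here"
EOF

# Reggie connects over SSH and enqueues the script; shown here as a
# dry run that only prints the command it would execute.
echo "ssh analysis@ogs-master.example.org qsub $JOB_SCRIPT"
```

The `#$` lines are Grid Engine directives read by `qsub` at submission time; `qstat` on the cluster would then show the job under the given name.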
== Programs that must be installed on the cluster ==
 * Picard: Customized version currently found at:
   There is currently no binary release, but we hope to create one in the future.
 * Trimmomatic:
 * Bowtie2:
 * Tophat:
 * Samtools:
 * Custom scripts: Some bash scripts for our custom analysis pipeline, found at:
   At the moment, use the 'trunk' version. In the future, a more formalized release procedure is expected.
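To show how the listed tools fit together, here is a dry-run sketch of the pipeline stages the custom scripts drive. All file names, paths, and parameter values are hypothetical placeholders (the real scripts and parameters live in the repository); each command is recorded and printed rather than executed.

```shell
#!/bin/sh
# Dry-run sketch of the analysis pipeline stages. Nothing is executed;
# each command line is appended to pipeline.dryrun and echoed.
set -e
run() { echo "$@" | tee -a pipeline.dryrun; }

# 1. Adapter/quality trimming of the raw FASTQ files with Trimmomatic.
run java -jar trimmomatic.jar PE sample_1.fq.gz sample_2.fq.gz \
    trimmed_1.fq.gz unpaired_1.fq.gz trimmed_2.fq.gz unpaired_2.fq.gz \
    ILLUMINACLIP:adapters.fa:2:30:10 MINLEN:36

# 2. Splice-aware alignment with Tophat (which runs Bowtie2 internally).
run tophat -p 8 -o tophat_out genome_index trimmed_1.fq.gz trimmed_2.fq.gz

# 3. Sort and index the alignments with Samtools.
run samtools sort tophat_out/accepted_hits.bam -o sample.sorted.bam
run samtools index sample.sorted.bam
```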
== Configuration ==
Before any of this can work, everything must be configured. The Reggie distribution includes '''reggie-ogs-hosts.xml''', which should be placed in the `WEB-INF/classes` directory (the same directory as `base.config`). In this XML file, it is possible to configure connection information for the Open Grid cluster. You'll need the address, username, password, and SSH public key to be able to connect to the cluster.
In the configuration file, you'll also set up various paths on the cluster. Some paths must be globally accessible from all nodes on the cluster and some can be local to each node. You'll also need to configure paths where data (e.g. from the !HiSeq) can be found and where the analyzed data should be stored.
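The real schema of '''reggie-ogs-hosts.xml''' is defined by the Reggie distribution itself; the fragment below is purely illustrative (the element names are invented, not the actual schema) and only shows the kind of information the file must carry:

```xml
<!-- Illustrative only: these element names are NOT the real
     reggie-ogs-hosts.xml schema, just the data it needs to hold. -->
<hosts>
  <host name="ogs-master.example.org">
    <user>analysis</user>
    <password>secret</password>
    <ssh-public-key>ssh-rsa AAAA...</ssh-public-key>
    <!-- globally accessible from all cluster nodes -->
    <shared-path>/shared/analysis</shared-path>
    <!-- node-local scratch space -->
    <local-path>/tmp/analysis</local-path>
    <!-- where HiSeq data is found and where results are stored -->
    <data-path>/shared/hiseq</data-path>
    <result-path>/shared/results</result-path>
  </host>
</hosts>
```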
In the configuration you can also specify parameters for some of the above-mentioned programs (Picard and Trimmomatic).

For the other programs (Bowtie2 and Tophat), the parameters are coded directly into the pipeline scripts.