MageTab Exporter Plug-in
---------------------------

Exports the experimental metadata of a BASE2 experiment in MageTab format.

I Description
----------------

Executed on an experiment in BASE2, the plug-in generates SDRF and IDF files (see the MageTab specification at http://www.mged.org/mage-tab/) containing the experiment's metadata, plus an archive containing the raw data files.

II Requirements
----------------

The plug-in was developed and tested with BASE 2.16.1. BASE 2.17.x versions should also work. If you are able to use the plug-in with earlier versions of BASE, please let the developer know.

III Installation
----------------

(1) Drop the jar file in a folder on the BASE server (e.g. /plugins).
(2) Log in to BASE as root, browse to Administrate -> Plugins -> Definitions, and click New.
(3) In the dialog that opens, enter "no.uib.cbu.base.magetabexport.MageTabExporterPlugin" as the class, and the path to the plug-in jar file (e.g. /plugins/MageTabExporter-x.x.jar) as the path.
(4) In the Job agents tab, add the job agents that will be able to execute the plug-in (if any).
(5) Click Save.

The new plug-in appears under the name "MageTab exporter" in the plug-in list and is now ready for use.

IV Usage
----------------

The plug-in can be launched from the details page of an experiment. Please make sure that the experiment to be exported is properly created (documented); use the experiment overview/validation in BASE to check it.

The plug-in supports pooled samples, extracts and labeled extracts, but currently the maximum nesting level for items of one type is 2. This means that pools of samples (and of extracts/labeled extracts) are supported, but pools of pooled samples (or of pooled extracts/labeled extracts) are not.

The MageTab exporter plug-in has the following parameters:

- Experiment: the experiment to export; by default, the one that was open when the plug-in was launched
- Save as: a path and prefix for the files that are created, e.g.
  using /home/me/prefix results in two files being created during the plug-in execution: /home/me/prefix_idf.txt and /home/me/prefix_sdrf.txt
- Save raw data archive as: the name of the zip archive containing the raw data
- Overwrite: whether files with the same names as the ones specified in "Save as" and/or "Save raw data archive as" should be overwritten
- Release date: the date that is set in the IDF file as the Public Release Date
- Quote fields: whether the content of the tab-delimited IDF and SDRF files should be quoted with double quotes
- Handle missing content by: in certain experimental setups, missing/empty content is allowed for biosources and/or samples. The plug-in can ignore such missing content and fill the empty values with a replacement text (see the next parameter). If you don't expect missing items in your experiment and set this parameter to Error, the plug-in will fail with an error message if a missing/empty item is found
- Replace missing content with: the text used to replace missing items, e.g. "N/A", "-", ""
- Raw data file type: the type of raw data files to include in the raw data archive. The available options are platform/variant specific
- ArrayExpress accession AnnotationType: the annotation type holding ArrayExpress accession IDs (see tip (2) in section V)
- Material Type AnnotationType: the annotation type holding MGED Ontology Material Type terms (see tip (3) in section V)

The MageTab exporter plug-in supports configurations, so a site's default/preferred settings for many of the parameters can be stored and reused to ease execution.

V Useful tips
----------------

To get the most out of the plug-in, you could:

(1) make sure that the author and affiliation fields are in a format the plug-in can parse. This allows the plug-in to split the individual authors and automatically map them to the institutions they are affiliated with.
    Format the fields in the following way:
    - individual author entries should be separated by ", " (comma + space)
    - an author's names should be separated by spaces, in the order: first_name middle_name last_name
    - affiliation mark(s) should be numbers following the last name
    - no spaces or other characters are allowed between the last name and the affiliation mark(s)
    - multiple affiliation marks should be separated by a single comma
    Example: "John Jack Jones1,3, Joly Jane2, Jeremy Joe Jakes2,1,3"
    - individual institutions in the affiliation field should be on separate lines (separated by \n, \r or both)
    - the line number is the institution mark, e.g. the institution on line 3 will be mapped to the authors with affiliation mark 3
    - putting a number in front of each institution name will help you check whether the plug-in mapped the authors to their affiliations correctly
    Example:
      "1 Institute Without a Name, Somestreet 100, 5044 Sometown, Somewhere
       2 Institute With a Name, Otherstreet 10, 1042 Othertown, Somewhereelse"
    The affiliation marks and the numbers preceding the institutions are not removed and will be present in the exported IDF file. This lets you check whether the automatic mapping is correct. Afterwards, before the files are submitted to ArrayExpress, the numbers should be removed.

(2) create an annotation type representing an ArrayExpress accession identifier. The annotation type should be enabled for array designs and protocols, and used to set accession IDs on the items that have previously been exported to ArrayExpress. If you export an experiment using a design or protocol(s) with such an annotation, the plug-in will pick up the identifiers from the annotations and use them in the exported SDRF file (Array Ref and Protocol Ref columns). Just remember to set the correct annotation type when you configure the plug-in ("ArrayExpress accession AnnotationType" parameter).

(3) create an annotation type representing an MGED Ontology term: Material Type.
    The annotation type should be enabled for biosources, samples, extracts and labeled extracts. If you then annotate your biomaterials using this annotation type and select it in the plug-in configuration parameter ("Material Type AnnotationType"), the plug-in will use the annotations in the exported SDRF file (Material Type column).

(4) store some of the settings in a plug-in configuration. The annotation types are the best candidates. Raw data file types could also be predefined in platform-specific configurations.

License
----------------

All rights reserved. This program and the accompanying materials are made available under the terms of the GNU General Public License v3.0, which accompanies this distribution and is available at http://www.gnu.org/licenses/gpl-3.0.html
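Tip (1) in section V describes the author and affiliation format only in prose. The Python sketch below shows one way to parse fields written in that format, which can help when preparing or checking the fields before export. It is a hypothetical illustration, not the plug-in's own (Java) implementation: the function names parse_authors, parse_affiliations and map_authors_to_institutions are made up here, and the plug-in's handling of edge cases (blank lines, marks with no matching institution) may differ.

```python
import re

# Hypothetical helpers illustrating the author/affiliation format from tip (1);
# they are NOT part of the MageTab exporter plug-in.

# Lazy name part, then optional affiliation marks such as "1" or "2,1,3"
# written directly after the last name (no space in between).
ENTRY_RE = re.compile(r"(.*?)(\d+(?:,\d+)*)?")


def parse_authors(authors_field):
    """Return a list of (first, middle, last, marks) tuples."""
    result = []
    # Author entries are separated by ", "; marks use a bare "," so they
    # are not split apart here.
    for entry in authors_field.split(", "):
        entry = entry.strip()
        if not entry:
            continue
        match = ENTRY_RE.fullmatch(entry)
        names = match.group(1).split()
        marks = [int(m) for m in match.group(2).split(",")] if match.group(2) else []
        first = names[0] if len(names) > 1 else ""
        middle = " ".join(names[1:-1])
        last = names[-1]
        result.append((first, middle, last, marks))
    return result


def parse_affiliations(affiliation_field):
    """Map 1-based line numbers to institution lines (\\n, \\r or \\r\\n separated)."""
    # Assumption: every line, even a blank one, counts towards the numbering.
    lines = affiliation_field.splitlines()
    return {number: line.strip() for number, line in enumerate(lines, start=1)}


def map_authors_to_institutions(authors_field, affiliation_field):
    """Return {full author name: [institution lines]}; unknown marks are skipped."""
    institutions = parse_affiliations(affiliation_field)
    mapping = {}
    for first, middle, last, marks in parse_authors(authors_field):
        full_name = " ".join(part for part in (first, middle, last) if part)
        mapping[full_name] = [institutions[m] for m in marks if m in institutions]
    return mapping


if __name__ == "__main__":
    authors = "John Jack Jones1,3, Joly Jane2, Jeremy Joe Jakes2,1,3"
    affiliations = (
        "1 Institute Without a Name, Somestreet 100, 5044 Sometown, Somewhere\n"
        "2 Institute With a Name, Otherstreet 10, 1042 Othertown, Somewhereelse"
    )
    for name, insts in map_authors_to_institutions(authors, affiliations).items():
        print(name, "->", insts)
```

Running it on the examples from tip (1) maps each author to the numbered institution lines. The affiliation example only has two lines, so mark 3 has no matching institution and this sketch simply skips it.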