Opened 5 years ago

Closed 5 years ago

#849 closed defect (fixed)

Demux wizard doesn't check that items have names that are accepted by downstream analysis steps

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: major Milestone: Reggie v4.2
Component: net.sf.basedb.reggie Keywords:
Cc:

Description

Much of the automatic analysis pipeline builds on interacting with the computing cluster dynamically via ssh. We are generating commands and script on the fly and sometimes those commands and scripts contain text that is retrieved from item names and properties from the BASE database. For example, file names and folder structure is based on the name of the library. To prevent bad things from happening on the cluster in case someone "evil" messes with the BASE database we normally check everything that is retrieved from the database before we put it into a script. For example, we do not want it to be possible to wipe the filesystem if someone names a library rm -rf /.

In the current implementation the checks are quite restricted and only letters, numbers, dot and underscore is allowed [a-z0-9._].

However it seems like the demux step is not making the same checks when generating the demultiplex file. Just by looking at the code it seems like it is generating output file names based on the item names without any check at all. In this case it is just a configuration file going into Picard, but I think we should really go through the code and the script it generates to see if we can find more places that are not checked. Then, the checks should be implemented.

Change History (6)

comment:1 Changed 5 years ago by Nicklas Nordborg

Status: newassigned

comment:2 Changed 5 years ago by Nicklas Nordborg

After investigating this issue it was found that it is just not a problem with the demux wizard not checking all values. There are actually two different checks that are implemented:

  • File paths checks
  • Script parameter checks

They checks are similar but there are a dew differences:

  • File paths may contain '/' but not '..' or '-'

In some cases (in the demux wizard and also other wizards), script parameters will end up as file/path names which may be problematic if they have a value that pass the 'script parameter check' but no the 'file path' check.

comment:3 Changed 5 years ago by Nicklas Nordborg

(In [3753]) References #849: Demux wizard doesn't check that items have names that are accepted by downstream analysis steps

Implemented check in the barocode/multiplex file exporter that checks that barcode names and sequences are ok. Still no checks on other items names (demux and merge items) that are also used to generate path names.

comment:4 Changed 5 years ago by Nicklas Nordborg

(In [3754]) References #849: Demux wizard doesn't check that items have names that are accepted by downstream analysis steps

Moved check methods to SshUtil and added checkValidFilename() which should be used to validate, for example, script parameters that will end up as a part in a filename/path.

All validation is still calling the old methods. We need to investigate the code in each place for the best check method to use.

comment:5 Changed 5 years ago by Nicklas Nordborg

(In [3755]) References #849: Demux wizard doesn't check that items have names that are accepted by downstream analysis steps

Added some more checks and changed some checks to use the SshUtil.checkValidFilename() instead of SshUtil.checkValidScriptParameter().

comment:6 Changed 5 years ago by Nicklas Nordborg

Resolution: fixed
Status: assignedclosed

Seems to be working

Note: See TracTickets for help on using tickets.