Opened 5 years ago
Closed 5 years ago
#1160 closed task (fixed)
Investigate changes in Picard since our last modified version
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | critical | Milestone: | Reggie v4.23 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description (last modified by )
We are using a modified version of Picard for demuxing. It is base on Picard 1.128 and changes are listed on our BASE-hacks site: http://dev.thep.lu.se/basehacks/wiki/Picard
The MIPs pipeline is using UMI (unique molecular indices) which require that 'M' is used in the READ_STRUCTURE parameter to Picard. This is not supported in our modified version, but works in a later version (for example: 2.20.0).
However, we may not be able to just switch to the newer Picard due to our modifications. This ticket is about investigating potential issues.
To summarize our findings from below:
It should be possible to use the newer Picard version. All critical modification have also been fixed in Picard. The exception is Item 1 in Fork release 2, but this only affects 31 old sequencing runs and can be fixed by manually copying the TileMetrics.bin
file.
The remaining annoyance with all messages going to stderr should not prevent us from updating. If we can't get acceptable results by fixing code in Reggie, there is always the possibility to fork Picard again.
Change History (9)
comment:1 by , 5 years ago
comment:2 by , 5 years ago
Fork relase 3 - Item 1
We do not use this (in the RNA-seq pipeline), but Picard now has support for this via a parameter IGNORE_UNEXPECTED_BARCODES
. Which is good since it is useful when demuxing in the MIPs pipeline.
Fork release 3 - Item 2 and 3
There is a parameter in Picard for this also, but we prefer to use uncompressed files as long as possible and then compress with pigz
. See #809 and #663.
comment:3 by , 5 years ago
Fork release 4 - Item 1
For background see #742. Picard has made some changes related to file size checks but it is not certain that it fixes our issues. This need to be tested. A quick check on the production server finds 25 sequencing runs (all of them on the NextSeq) with this problem and they seem to happen every now and then. The last one was just a week ago. So, this item is a real problem that we need to verify if it has been fixed in Picard or if we need some other workaround.
I have a made test run with a demux and it seems like Picard is able to handle this without crashing. On the other hand our code crashed when trying import data from the demultiplex_metrics.txt
file. The reason was that there are more columns that causes our regular expressions to not find the data.
comment:4 by , 5 years ago
Fork release 2 - Item 1
See #631 for background information. A quick check on the file server finds 31 sequencing runs with a 12-byte TileMetricsOut.bin file. They are all in the late 2013 or first half of 2014 so my guess is that this has happened due to some "temporary" hiccup in file syncing. In any case, we can probably live without this fix and if we ever need to demux any of the 31 problematic sequencing runs again, it should be fixable by first renaming of copying TileMetrics.bin
to TileMetricsOut.bin
.
comment:6 by , 5 years ago
comment:7 by , 5 years ago
Fork release 1 - Item 1, 2 and 3
Fork release 2 - Item 2
The changes are all related to separating error messages from other messages. The benefit is that it makes it easier for error handling code to catch and display the actual error message for the user instead of mixing this up with other messages. We can live without this, but may have to change some code so that we do not loose all messages. Will take some time test a go through everything.
comment:8 by , 5 years ago
Description: | modified (diff) |
---|
comment:9 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fork release 5 - Item 1
We do not use AddOrReplaceReadGroups any longer. It was removed from the legacy pipeline since we no longer keep BAM files there and in the new pipeline, Hisat has built-in support for this. See #983 and #997.