#1508 closed enhancement (fixed)
Use Apache Commons Compress instead of builtin GZIPInputStream
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | Variant Search v1.9 |
Component: | net.sf.basedb.varsearch | Keywords: | |
Cc: |
Description
One of our servers is running Java 11 (OpenJDK Runtime Environment 11.0.17+8-LTS). On that server there is a problem parsing VCF files from the variant calling (variants-annotated.vcf.gz) that have been compressed with bgzip. The stack trace:
net.sf.basedb.core.InvalidDataException: Could not find line #41612: variants-annotated.vcf.gz
    at net.sf.basedb.varsearch.servlet.HitServlet.doGet(HitServlet.java:106)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:634)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
    ...
The same file can be downloaded and decompressed with other tools, and it does have more than the specified number of lines. We don't see the problem on our other server, which runs a newer Java version.
The problem seems to be related to the multi-block format that bgzip uses (this is required for indexing and quick access to random locations in the file).
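To illustrate what "multi-block" means here, the following is a minimal sketch using only the JDK. It assumes that plain concatenated gzip members are a fair stand-in for bgzip blocks (real bgzip blocks also carry an extra header field used for indexing). The class and method names are hypothetical and not part of the varsearch code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MultiMemberGzipDemo {

    // Compress one chunk of text as a complete, standalone gzip member.
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Concatenate two members back to back (assumption: this mimics
    // how a block-compressed file stores its blocks one after another).
    static byte[] twoMemberFile() throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(gzipMember("line1\nline2\n"));
        file.write(gzipMember("line3\n"));
        return file.toByteArray();
    }

    // Count the lines a reader gets back from the given compressed bytes.
    static int countLines(byte[] compressed) throws IOException {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("first member only: " + countLines(gzipMember("line1\nline2\n")));
        System.out.println("both members:      " + countLines(twoMemberFile()));
    }
}
```

A reader that stopped after the first member would see fewer lines than the file really contains, which would match the "Could not find line" symptom above.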
There is an Apache project, Commons Compress, with an alternative implementation that seems to work: https://commons.apache.org/proper/commons-compress/
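A minimal sketch of reading a given line through Commons Compress might look like the following. It assumes the commons-compress jar is on the classpath and uses the `GzipCompressorInputStream(InputStream, boolean)` constructor with `decompressConcatenated` set to `true`. The `VcfLineReader` class and its `readLine` helper are hypothetical and are not the actual change made for this ticket.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

public class VcfLineReader {

    // Hypothetical helper: return the 1-based line `lineNo`,
    // or null if the decompressed file has fewer lines.
    static String readLine(Path vcfGz, long lineNo) throws IOException {
        try (InputStream raw = new BufferedInputStream(Files.newInputStream(vcfGz));
             // 'true' asks the stream to continue across concatenated gzip members
             // instead of stopping at the end of the first one.
             InputStream gz = new GzipCompressorInputStream(raw, true);
             BufferedReader reader = new BufferedReader(
                 new InputStreamReader(gz, StandardCharsets.UTF_8))) {
            String line;
            long current = 0;
            while ((line = reader.readLine()) != null) {
                if (++current == lineNo) {
                    return line;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java VcfLineReader variants-annotated.vcf.gz 41612
        System.out.println(readLine(Paths.get(args[0]), Long.parseLong(args[1])));
    }
}
```

Note that without the second constructor argument, `GzipCompressorInputStream` reads only the first member, so the flag matters for bgzip-compressed input.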
In 7302: