#1592 closed defect (fixed)
Improve performance for indexing the reference VCF files
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | critical | Milestone: | Variant Search v1.12 |
Component: | net.sf.basedb.varsearch | Keywords: | |
Cc: |
Description
Indexing is very slow with the imputed genotypes data set. It takes almost 20 hours and then crashes because the database connection has been closed after being unused for a long time.
net.sf.basedb.core.ConnectionClosedException: The connection has been closed.
	at net.sf.basedb.core.DbControl.commit(DbControl.java:442)
	at net.sf.basedb.varsearch.index.SplitIndex.rebuildReferenceIndex(SplitIndex.java:352)
	at net.sf.basedb.varsearch.index.SplitIndex.doCustomAction(SplitIndex.java:243)
	at net.sf.basedb.varsearch.service.VarSearchService.autoUpdateIndexes(VarSearchService.java:366)
	at net.sf.basedb.varsearch.service.VarSearchService$IndexUpdateTimerTask.run(VarSearchService.java:476)
	at net.sf.basedb.util.timer.ThreadTimerTask$1.run(ThreadTimerTask.java:94)
	at java.base/java.lang.Thread.run(Thread.java:833)
Several things need to be fixed. First, we need to speed up the indexing itself. But it will take a long time no matter what, so we also need to fix the database connection timeout issue.
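One common way to avoid an idle-connection timeout is to hold no connection during the long-running work and open one only for the short commit step. The sketch below illustrates that ordering. `Session` and `LateCommit` are hypothetical stand-ins invented for this example; they are not the BASE `DbControl` API, and this is not necessarily how the ticket was resolved.

```java
import java.util.function.BiConsumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for a database session; NOT the BASE DbControl API. */
interface Session extends AutoCloseable {
    void commit();

    @Override
    void close();
}

final class LateCommit {
    private LateCommit() {}

    /**
     * Runs the long job first, then opens a session only for the short
     * save-and-commit step, so no connection sits idle while the job runs.
     */
    static <T> T runThenCommit(Supplier<T> longJob,
                               Supplier<Session> sessions,
                               BiConsumer<Session, T> save) {
        T result = longJob.get();              // may take hours; no connection held
        try (Session session = sessions.get()) {   // fresh connection, used at once
            save.accept(session, result);
            session.commit();
        }
        return result;
    }
}
```

The trade-off is that the job's result must be kept in memory (or on disk) until the commit step, which is usually acceptable when only summary information about the rebuilt index is stored in the database.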
I also noticed that it is possible to abort the indexing process by stopping the service, but doing so replaces the existing index (if there is one) with the new index that is only partly completed. The log contains a message that the build is about to be aborted, but the next message says that the index was built successfully.
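The abort problem described above can be avoided by building into a temporary location and replacing the live index only when the build finished without being aborted. The following is a minimal sketch under simplifying assumptions: the index is modelled as a single file (a real index is typically a directory), and `SafeIndexSwap`, `Builder`, and the `aborted` flag are hypothetical names that do not come from the Variant Search code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.atomic.AtomicBoolean;

final class SafeIndexSwap {
    private SafeIndexSwap() {}

    /** Hypothetical builder callback; it should stop early once 'aborted' is set. */
    interface Builder {
        void buildInto(Path target, AtomicBoolean aborted) throws IOException;
    }

    /**
     * Builds into a temporary file next to the live index and replaces the
     * live index only if the build was not aborted.
     *
     * @return true if the live index was replaced, false if the build was aborted
     */
    static boolean rebuild(Path liveIndex, Builder builder, AtomicBoolean aborted)
            throws IOException {
        Path dir = liveIndex.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "index", ".tmp");
        try {
            builder.buildInto(tmp, aborted);
            if (aborted.get()) {
                return false;   // keep the existing index untouched
            }
            Files.move(tmp, liveIndex, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } finally {
            Files.deleteIfExists(tmp);   // no-op after a successful move
        }
    }
}
```

Because `rebuild` returns whether the swap happened, the caller can log "built successfully" only when the result is `true`, which would also address the misleading log message.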
Change History (5)
comment:1 by , 8 months ago
comment:5 by , 8 months ago
Seems to work well now on the production server. With 8 indexing threads, total throughput went from ~1000 lines/second to ~20000 lines/second, roughly a 20-fold speedup.
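The comment does not say how the indexing threads are organised. As an illustration only, the sketch below splits the input lines into batches and processes them on a fixed pool of worker threads. `ParallelLineIndexer`, `indexLine`, and the batch size are assumptions made for this example, and the `indexLine` callback must be safe to call from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelLineIndexer {
    private ParallelLineIndexer() {}

    /**
     * Indexes all non-header lines using a fixed number of worker threads.
     *
     * @return the number of data lines passed to indexLine
     */
    static long indexAll(List<String> lines, int threads, int batchSize,
                         Consumer<String> indexLine)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += batchSize) {
                int end = Math.min(start + batchSize, lines.size());
                List<String> batch = lines.subList(start, end);
                pending.add(pool.submit(() -> {
                    int indexed = 0;
                    for (String line : batch) {
                        if (line.startsWith("#")) {
                            continue;   // VCF header and meta lines start with '#'
                        }
                        indexLine.accept(line);
                        indexed++;
                    }
                    return indexed;
                }));
            }
            long total = 0;
            for (Future<Integer> f : pending) {
                total += f.get();   // rethrows any failure from a worker
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as `ParallelLineIndexer.indexAll(lines, 8, 10_000, writer::add)` would correspond to the 8-thread setup mentioned above, where `writer::add` is a placeholder for whatever thread-safe indexing call is actually used.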
In 7699: