Memory usage notes for CDDB
The CDDB servers host a number of Berkeley DBs that are periodically updated by master/servant push services. Each is configured with a 500,000 KB (roughly 500 MB) cache from the heap. While this seems adequate, especially on machines with SSD storage, the advice from CDDB is to increase the cache to the size of the index. Unfortunately, most of our indexes are too large to make that feasible.
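For reference, a per-environment Berkeley DB cache is set in the environment's DB_CONFIG file. A minimal sketch of the current setting (the single-region split is an assumption):

```
# DB_CONFIG — Berkeley DB environment configuration
# set_cachesize <gbytes> <bytes> <ncache>
# 500,000 KB ≈ 512,000,000 bytes, in one cache region
set_cachesize 0 512000000 1
```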
The default configuration for the content system's Tomcat instances is an 8 GB heap with -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSFullGCsBeforeCompaction=1 as the garbage collection settings.
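These options would normally live in Tomcat's setenv.sh. A sketch of the default configuration described above (the file location and CATALINA_OPTS variable are standard Tomcat conventions, not confirmed from our deployment):

```shell
# $CATALINA_BASE/bin/setenv.sh — default GC settings for the content system
CATALINA_OPTS="$CATALINA_OPTS -Xms8g -Xmx8g"
CATALINA_OPTS="$CATALINA_OPTS -XX:+UseConcMarkSweepGC"
CATALINA_OPTS="$CATALINA_OPTS -XX:+CMSIncrementalMode"
CATALINA_OPTS="$CATALINA_OPTS -XX:CMSFullGCsBeforeCompaction=1"
export CATALINA_OPTS
```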
The problem we ran into is that the MixedRelationship data does not use Berkeley DB but instead builds an in-memory index using HashSets. As a result, the entire mixed relations table (about 130M records) is loaded into memory across 3 HashSet indexes, with each record creating 3 objects: a MixedRelationship holding a String, plus 2 MixedParticipant objects, each holding 2 Strings. When garbage collection kicks in, the service pauses for 40 to 70 seconds. Sometimes the collection runs during the creation of the sets and sometimes after. Simply increasing the available heap space shifted the GC to occur almost always after creation, but did not, by itself, fix the pauses. Having tried the other GC schemes, the most stable turned out to be G1GC, which, again, was insufficient on its own to prevent blocking. Combining a heap increase large enough to hold the indexes with G1GC keeps the pauses short enough that they do not normally cause an outage.
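To make the memory pressure concrete, the per-record object shape described above can be sketched as follows. Only the class names (MixedRelationship, MixedParticipant) come from the service; the field names and values are assumptions for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Assumed shape: two Strings per participant, per the notes above.
class MixedParticipant {
    final String id;
    final String role;
    MixedParticipant(String id, String role) { this.id = id; this.role = role; }
}

// Assumed shape: one String plus two participant references per record.
class MixedRelationship {
    final String relationType;
    final MixedParticipant left, right;
    MixedRelationship(String relationType, MixedParticipant left, MixedParticipant right) {
        this.relationType = relationType;
        this.left = left;
        this.right = right;
    }
}

public class IndexFootprint {
    public static void main(String[] args) {
        // Identity-based HashSet, as a stand-in for one of the 3 indexes.
        Set<MixedRelationship> index = new HashSet<>();
        index.add(new MixedRelationship("contains",
                new MixedParticipant("P1", "parent"),
                new MixedParticipant("P2", "child")));
        // ~130M records x 3 objects each = ~390M live objects on the heap,
        // before counting the Strings or the HashSet entry overhead itself.
        long records = 130_000_000L;
        System.out.println("approx objects: " + records * 3);
    }
}
```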
The outages showed up as downtime in HAProxy: first as 503 errors because the service did not respond quickly enough, after which HAProxy marks the servers down. Once all the servers are down, no backend is available, so 503s are generated with <NOSRV> and the termination flags in the HAProxy log messages switch from sC to SC (the codes are documented at http://cbonte.github.io/haproxy-dconv/1.6/configuration.html#8.5 ).
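The relevant HAProxy setup can be sketched roughly as below; the backend and server names, addresses, and timeout values are illustrative assumptions, not our actual configuration. When a server exceeds the server timeout, requests 503 with sC-style flags; once health checks have marked every server down, requests log <NOSRV>:

```
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

backend content_service
    option httpchk GET /health
    server tomcat1 10.0.0.11:8080 check inter 2s fall 3 rise 2
    server tomcat2 10.0.0.12:8080 check inter 2s fall 3 rise 2
```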