Compression
Cassandra offers operators the ability to configure compression on a
per-table basis. Compression reduces the size of data on disk by
compressing each SSTable in user-configurable chunks (see
`chunk_length_in_kb` below). As Cassandra SSTables are immutable, the CPU cost
of compressing is only necessary when the SSTable is written -
subsequent updates to data will land in different SSTables, so Cassandra
will not need to decompress, overwrite, and recompress data when UPDATE
commands are issued. On reads, Cassandra will locate the relevant
compressed chunks on disk, decompress the full chunk, and then proceed
with the remainder of the read path (merging data from disks and
memtables, read repair, and so on).
Compression algorithms typically trade off between the following three areas:

- Compression speed: How fast the compression algorithm compresses data. This is critical in the flush and compaction paths because data must be compressed before it is written to disk.
- Decompression speed: How fast the compression algorithm decompresses data. This is critical in the read and compaction paths as data must be read off disk in a full chunk and decompressed before it can be returned.
- Ratio: By what ratio the uncompressed data is reduced. Cassandra typically measures this as the size of data on disk relative to the uncompressed size. For example, a ratio of 0.5 means that the data on disk is 50% the size of the uncompressed data. Cassandra exposes this ratio per table as the `SSTable Compression Ratio` field of `nodetool tablestats`.
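The ratio calculation above can be demonstrated with a short sketch. zstd and LZ4 are not in the Python standard library, so zlib stands in here; the payload is a made-up repetitive row for illustration:

```python
import zlib

# A repetitive payload, loosely resembling similar rows stored per chunk.
uncompressed = b'{"user_id": 42, "status": "active"}' * 500
compressed = zlib.compress(uncompressed)

# Cassandra reports ratio as compressed size / uncompressed size,
# so lower is better: 0.5 means the on-disk data is half the raw size.
ratio = len(compressed) / len(uncompressed)
```

Highly repetitive data like this yields a very small ratio; random or already-compressed data yields a ratio near 1.0.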
Cassandra offers five compression algorithms by default that make different tradeoffs in these areas. While benchmarking compression algorithms depends on many factors (algorithm parameters such as compression level, the compressibility of the input data, underlying processor class, etc …), the following table should help you pick a starting point based on your application’s requirements with an extremely rough grading of the different choices by their performance in these areas (A is relatively good, F is relatively bad):
| Compression Algorithm | Cassandra Class | Compression | Decompression | Ratio | C* Version |
|---|---|---|---|---|---|
| LZ4 | LZ4Compressor | A+ | A+ | C+ | >=1.0 |
| LZ4HC | LZ4Compressor (with lz4_compressor_type: high) | C+ | A+ | B+ | >=3.6 |
| Zstd | ZstdCompressor | A- | A- | A+ | >=4.0 |
| ZstdDictionary | ZstdDictionaryCompressor | A- | A- | A++ | — |
| Snappy | SnappyCompressor | A- | A | C | >=1.0 |
| Deflate | DeflateCompressor | C | C | A | >=1.0 |
Generally speaking for a performance critical (latency or throughput)
application LZ4 is the right choice as it gets excellent ratio per CPU
cycle spent. This is why it is the default choice in Cassandra.
For storage critical applications (disk footprint), however, Zstd may
be a better choice, as it can achieve significantly better ratios than LZ4.
For workloads with highly repetitive or similar data patterns,
ZstdDictionaryCompressor can achieve even better compression ratios by
training a compression dictionary on representative data samples.
Snappy is kept for backwards compatibility and LZ4 will typically be
preferable.
Deflate is kept for backwards compatibility and Zstd will typically
be preferable.
ZSTD Dictionary Compression
The ZstdDictionaryCompressor extends standard ZSTD compression by using
trained compression dictionaries to achieve superior compression ratios,
particularly for workloads with repetitive or similar data patterns.
How Dictionary Compression Works
Dictionary compression improves upon standard compression by training a compression dictionary on representative samples of your data. This dictionary captures common patterns, repeated strings, and data structures, allowing the compressor to reference these patterns more efficiently than discovering them independently in each compression chunk.
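The idea can be sketched in a few lines. zstd (and its dictionary trainer) is not in the Python standard library, so zlib's preset-dictionary support stands in for a trained zstd dictionary, and the "training samples" below are hypothetical rows invented for illustration:

```python
import zlib

# Hypothetical training samples: rows that share structure but differ in values.
samples = [b'{"user_id": %d, "name": "user%d", "status": "active"}' % (i, i)
           for i in range(100)]
# A real trainer builds a compact dictionary; as a crude stand-in, use the
# concatenated samples directly (zlib reads at most the last 32 KiB).
dictionary = b"".join(samples)[-32768:]

row = b'{"user_id": 999, "name": "user999", "status": "active"}'

# Without a dictionary, each small chunk must rediscover the patterns itself.
plain = zlib.compress(row)

# With a preset dictionary, shared patterns are referenced, not re-stored.
comp = zlib.compressobj(zdict=dictionary)
with_dict = comp.compress(row) + comp.flush()

# Decompression needs the same dictionary, mirroring Cassandra's requirement
# to retain historical dictionaries for older SSTables.
decomp = zlib.decompressobj(zdict=dictionary)
round_trip = decomp.decompress(with_dict)
```

The dictionary-compressed form is substantially smaller because the row's structure already exists in the dictionary.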
When to Use Dictionary Compression
Dictionary compression is most effective for:
- Tables with similar row structures: JSON documents, XML data, or repeated data schemas benefit significantly from dictionary compression.
- Storage-critical workloads: When disk space savings justify the additional operational overhead of dictionary training and management.
- Large datasets with repetitive patterns: The more similar your data, the better the compression ratio improvement.
Dictionary compression may not be ideal for:
- Highly random or unique data: Already-compressed data or cryptographic data will see minimal benefit.
- Small tables: The overhead of dictionary management may outweigh the storage savings.
- Frequently changing schemas: Schema changes may require retraining dictionaries to maintain optimal compression ratios.
Dictionary Training
Before dictionary compression can provide optimal results, a compression dictionary must be trained on representative data samples. Cassandra supports both manual and automatic training approaches.
Manual Dictionary Training
Use the nodetool compressiondictionary train command to manually train
a compression dictionary:
nodetool compressiondictionary train <keyspace> <table>
The command trains a dictionary by sampling from existing SSTables. If no SSTables are available on disk (e.g., all data is in memtables), the command will automatically flush the memtable before sampling.
The training process completes synchronously and displays progress information including sample count, sample size, and elapsed time. Training typically completes within minutes for most workloads.
By default, training will only proceed if enough samples have been collected.
To force training even with insufficient samples, use the --force or -f option:
nodetool compressiondictionary train --force <keyspace> <table>
This can be useful for testing or when you want to train a dictionary from limited data during initial setup.
Automatic Dictionary Training
Enable automatic training in cassandra.yaml:
compression_dictionary_training_auto_train_enabled: true
compression_dictionary_training_sampling_rate: 100 # 1% of writes
When enabled, Cassandra automatically samples write operations and trains dictionaries in the background based on the configured sampling rate (range: 1-10000, where 100 = 1% of writes).
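The comment in the yaml above implies the sampled fraction is `sampling_rate / 10000` (so 100 maps to 1%). A hedged sketch of per-write Bernoulli sampling under that assumption (not Cassandra's actual implementation):

```python
import random

def should_sample(sampling_rate: int) -> bool:
    # sampling_rate is in the range 1-10000; 100 -> 1% of writes sampled.
    return random.random() < sampling_rate / 10000

random.seed(42)
writes = 100_000
sampled = sum(should_sample(100) for _ in range(writes))
# sampled is roughly 1% of writes, i.e. about 1000 of 100,000
```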
Dictionary Storage and Distribution
Compression dictionaries are stored cluster-wide in the
system_distributed.compression_dictionaries table. Each table can
maintain multiple dictionary versions: the current dictionary for
compressing new SSTables, plus historical dictionaries needed for
reading older SSTables.
Dictionaries are identified by dict_id, with higher IDs representing
newer dictionaries. Cassandra automatically refreshes dictionaries
across the cluster based on configured intervals, and caches them
locally to minimize lookup overhead.
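A minimal sketch of the caching scheme described above (not Cassandra's actual implementation): entries keyed by `dict_id` with a TTL and a size bound, mirroring `compression_dictionary_cache_size` and `compression_dictionary_cache_expire`:

```python
import time

class DictionaryCache:
    """Hypothetical per-table dictionary cache, for illustration only."""

    def __init__(self, max_size=10, expire_seconds=3600):
        self.max_size = max_size
        self.expire_seconds = expire_seconds
        self.entries = {}  # dict_id -> (dictionary_bytes, load_time)

    def get(self, dict_id, loader):
        now = time.time()
        entry = self.entries.get(dict_id)
        if entry is not None and now - entry[1] < self.expire_seconds:
            return entry[0]  # fresh cache hit, no table lookup
        # miss or expired: load (e.g. from the distributed table) and cache
        dictionary = loader(dict_id)
        if len(self.entries) >= self.max_size:
            oldest = min(self.entries, key=lambda k: self.entries[k][1])
            del self.entries[oldest]  # evict the oldest entry
        self.entries[dict_id] = (dictionary, now)
        return dictionary

loads = []
def loader(dict_id):
    loads.append(dict_id)
    return b"dictionary-bytes-%d" % dict_id

cache = DictionaryCache(max_size=2)
cache.get(1, loader)
cache.get(1, loader)  # second call is served from the cache; loader runs once
```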
Configuring Compression
Compression is configured on a per-table basis as an optional argument
to CREATE TABLE or ALTER TABLE. Two options are available for all
compressors:

- `class` (default: `LZ4Compressor`): specifies the compression class to use. The two "fast" compressors are `LZ4Compressor` and `SnappyCompressor`, and the two "good ratio" compressors are `ZstdCompressor` and `DeflateCompressor`.
- `chunk_length_in_kb` (default: `16KiB`): specifies the number of kilobytes of data per compression chunk. The main tradeoff here is that larger chunk sizes give compression algorithms more context and improve their ratio, but require reads to deserialize and read more off disk.
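The chunk-size tradeoff can be illustrated with zlib as a stand-in compressor (zstd and LZ4 are not in the Python standard library) on a made-up repetitive payload:

```python
import zlib

# Compress the same payload in different chunk sizes: larger chunks give the
# compressor more context and a better (lower) ratio, but every read must
# decompress a whole chunk.
data = b'{"sensor": "temp-01", "reading": 21.5, "unit": "C"}\n' * 4096

def chunked_ratio(payload: bytes, chunk_kib: int) -> float:
    chunk = chunk_kib * 1024
    compressed = sum(len(zlib.compress(payload[i:i + chunk]))
                     for i in range(0, len(payload), chunk))
    return compressed / len(payload)

small = chunked_ratio(data, 4)   # many small chunks, more per-chunk overhead
large = chunked_ratio(data, 64)  # fewer, larger chunks, better ratio
```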
The LZ4Compressor supports the following additional options:
- `lz4_compressor_type` (default `fast`): specifies whether to use the `high` (a.k.a. `LZ4HC`) ratio version or the `fast` (a.k.a. `LZ4`) version of LZ4. The `high` mode supports a configurable level, which allows operators to tune the performance ↔ ratio tradeoff via the `lz4_high_compressor_level` option. Note that in `4.0` and above it may be preferable to use the `Zstd` compressor.
- `lz4_high_compressor_level` (default `9`): A number between `1` and `17` inclusive that represents how much CPU time to spend trying to get more compression ratio. Generally lower levels are "faster" but get less ratio, and higher levels are slower but get more compression ratio.
The ZstdCompressor supports the following options in addition:
- `compression_level` (default `3`): A number between `-131072` and `22` inclusive that represents how much CPU time to spend trying to get more compression ratio. The lower the level, the faster the speed (at the cost of ratio). Values from 20 to 22 are called "ultra levels" and should be used with caution, as they require more memory. The default of `3` is a good choice for competing with `Deflate` ratios, and `1` is a good choice for competing with `LZ4`.
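zstd is not available in the Python standard library, so zlib's levels (1-9) are used below to illustrate the same speed-vs-ratio tradeoff that `compression_level` controls for Zstd, on an invented repetitive payload:

```python
import zlib

# Lower levels are faster; higher levels spend more CPU searching for matches
# in exchange for a better ratio.
data = b"".join(b'{"event": "login", "user": "u%d", "ok": true}\n' % (i % 1000)
                for i in range(20000))

fast = zlib.compress(data, level=1)  # fastest setting
best = zlib.compress(data, level=9)  # best-ratio setting
# best is no larger than fast, at the cost of extra CPU time
```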
The ZstdDictionaryCompressor supports the same options as
ZstdCompressor:
- `compression_level` (default `3`): Same range and behavior as `ZstdCompressor`. Dictionary compression provides improved ratios at any compression level compared to standard ZSTD.
ZstdDictionaryCompressor requires a trained compression
dictionary to achieve optimal results. See the ZSTD Dictionary
Compression section above for training instructions.
Users can set compression using the following syntax:
CREATE TABLE keyspace.table (id int PRIMARY KEY)
WITH compression = {'class': 'LZ4Compressor'};
Or
ALTER TABLE keyspace.table
WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64};
For dictionary compression:
CREATE TABLE keyspace.table (id int PRIMARY KEY)
WITH compression = {'class': 'ZstdDictionaryCompressor'};
Or with a specific compression level:
ALTER TABLE keyspace.table
WITH compression = {
'class': 'ZstdDictionaryCompressor',
'compression_level': '3'
};
Once enabled, compression can be disabled with ALTER TABLE setting
enabled to false:
ALTER TABLE keyspace.table
WITH compression = {'enabled':'false'};
Operators should be aware, however, that changing compression is not
immediate. The data is compressed when the SSTable is written, and as
SSTables are immutable, the compression will not be modified until the
table is compacted. Upon issuing a change to the compression options via
ALTER TABLE, the existing SSTables will not be modified until they are
compacted - if an operator needs compression changes to take effect
immediately, the operator can trigger an SSTable rewrite using
nodetool scrub or nodetool upgradesstables -a, both of which will
rebuild the SSTables on disk, re-compressing the data in the process.
Dictionary Compression Configuration
When using ZstdDictionaryCompressor, several additional configuration
options are available in cassandra.yaml to control dictionary
management, caching, and training behavior.
Dictionary Refresh Settings
- `compression_dictionary_refresh_interval` (default: `3600`): How often (in seconds) to check for and refresh compression dictionaries cluster-wide. Newly trained dictionaries will be picked up by all nodes within this interval.
- `compression_dictionary_refresh_initial_delay` (default: `10`): Initial delay (in seconds) before the first dictionary refresh check after node startup.
Dictionary Caching
- `compression_dictionary_cache_size` (default: `10`): Maximum number of compression dictionaries to cache per table. Higher values reduce lookup overhead but increase memory usage.
- `compression_dictionary_cache_expire` (default: `3600`): Dictionary cache entry TTL in seconds. Expired entries are evicted and reloaded on next access.
Training Configuration
- `compression_dictionary_training_max_dictionary_size` (default: `65536`): Maximum size of trained dictionaries in bytes. Larger dictionaries can capture more patterns but increase memory overhead.
- `compression_dictionary_training_max_total_sample_size` (default: `10485760`): Maximum total size of sample data to collect for training, approximately 10MB.
- `compression_dictionary_training_auto_train_enabled` (default: `false`): Enable automatic background dictionary training. When enabled, Cassandra samples writes and trains dictionaries automatically.
- `compression_dictionary_training_sampling_rate` (default: `100`): Sampling rate for automatic training, range 1-10000 where 100 = 1% of writes. Lower values reduce training overhead but may miss data patterns.
Example configuration:
# Dictionary refresh and caching
compression_dictionary_refresh_interval: 3600
compression_dictionary_refresh_initial_delay: 10
compression_dictionary_cache_size: 10
compression_dictionary_cache_expire: 3600
# Automatic training
compression_dictionary_training_auto_train_enabled: false
compression_dictionary_training_sampling_rate: 100
compression_dictionary_training_max_dictionary_size: 65536
compression_dictionary_training_max_total_sample_size: 10485760
Other options
- `crc_check_chance` (default: `1.0`): determines how likely Cassandra is to verify the checksum on each compression chunk during reads to protect against data corruption. Unless you have profiles indicating this is a performance problem, it is highly encouraged not to turn this off, as it is Cassandra's only protection against bitrot. In earlier versions of Cassandra a duplicate of this option existed in the compression configuration; the latter was deprecated in Cassandra 3.0 and removed in Cassandra 5.0.
Benefits and Uses
Compression’s primary benefit is that it reduces the amount of data written to disk. Not only does the reduced size save in storage requirements, it often increases read and write throughput, as the CPU overhead of compressing data is faster than the time it would take to read or write the larger volume of uncompressed data from disk.
Compression is most useful in tables comprised of many rows, where the rows are similar in nature. Tables containing similar text columns (such as repeated JSON blobs) often compress very well. Tables containing data that has already been compressed or random data (e.g. benchmark datasets) do not typically compress well.
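The contrast can be demonstrated directly, with zlib standing in for Cassandra's compressors and `os.urandom` standing in for already-compressed or encrypted data:

```python
import os
import zlib

# Repetitive, structured rows compress well; random bytes do not.
json_like = b'{"name": "alice", "city": "paris", "active": true}' * 1000
random_bytes = os.urandom(len(json_like))

ratio_json = len(zlib.compress(json_like)) / len(json_like)
ratio_random = len(zlib.compress(random_bytes)) / len(random_bytes)
# ratio_json is tiny; ratio_random is close to (or slightly above) 1.0
```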
Operational Impact
- Compression metadata is stored off-heap and scales with data on disk. This often requires 1-3GB of off-heap RAM per terabyte of data on disk, though the exact usage varies with `chunk_length_in_kb` and compression ratios.
- Streaming operations involve compressing and decompressing data on compressed tables - in some code paths (such as non-vnode bootstrap), the CPU overhead of compression can be a limiting factor.
- To prevent slow compressors (`Zstd`, `Deflate`, `LZ4HC`) from blocking flushes for too long, all three flush with the default fast `LZ4` compressor and then rely on normal compaction to re-compress the data into the desired compression strategy. See CASSANDRA-15379 for more details.
- The compression path checksums data to ensure correctness - while the traditional Cassandra read path does not have a way to ensure correctness of data on disk, compressed tables allow the user to set `crc_check_chance` (a float from 0.0 to 1.0) to allow Cassandra to probabilistically validate chunks on read to verify bits on disk are not corrupt.
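The probabilistic validation in the last point can be sketched as follows; this is an illustration of the concept, not Cassandra's actual read path, and uses zlib's CRC-32 as a stand-in checksum:

```python
import random
import zlib

def read_chunk(chunk: bytes, stored_crc: int, crc_check_chance: float) -> bytes:
    # Each read verifies the stored CRC with probability crc_check_chance
    # (1.0 = verify on every read, the default).
    if random.random() < crc_check_chance:
        if zlib.crc32(chunk) != stored_crc:
            raise IOError("chunk failed CRC check: possible bitrot")
    return chunk

chunk = b"compressed-chunk-bytes"
crc = zlib.crc32(chunk)

read_chunk(chunk, crc, 1.0)  # intact chunk always passes

try:
    read_chunk(b"X" + chunk[1:], crc, 1.0)  # a flipped byte is caught
    detected = False
except IOError:
    detected = True
```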
Dictionary Compression Operational Considerations
When using ZstdDictionaryCompressor, additional operational factors
apply:
- Dictionary Storage: Compression dictionaries are stored in the `system_distributed.compression_dictionaries` table and replicated cluster-wide. Each table maintains current and historical dictionary versions.
- Dictionary Cache Memory: Dictionaries are cached locally on each node according to `compression_dictionary_cache_size`. Memory overhead is typically minimal (default 64KB per dictionary × cache size).
- Dictionary Training Overhead: Manual training via `nodetool compressiondictionary train` samples SSTable chunk data and performs CPU-intensive dictionary training. Consider running training during off-peak hours.
- Automatic Training Impact: When `compression_dictionary_training_auto_train_enabled` is true, write operations are sampled based on `compression_dictionary_training_sampling_rate`. This adds minimal overhead but should be monitored in write-intensive workloads.
- Dictionary Refresh: The dictionary refresh process (`compression_dictionary_refresh_interval`) checks for new dictionaries cluster-wide. The default 1-hour interval balances freshness with overhead.
- SSTable Compatibility: Each SSTable is compressed with a specific dictionary version. Historical dictionaries must be retained to read older SSTables until they are compacted with new dictionaries.
- Schema Changes: Significant schema changes or data pattern shifts may require retraining dictionaries to maintain optimal compression ratios. Monitor the `SSTable Compression Ratio` via `nodetool tablestats` to detect degradation.
Available nodetool commands for compressiondictionary
Four commands are currently available for compression dictionaries:

- `train`: trains a dictionary, as described above.
- `list`: lists all dictionaries for a given keyspace and table.
- `export`: exports a compression dictionary for a keyspace and table, either the latest one or one selected by a specific id, to a file.
- `import`: imports a compression dictionary, exported by the above command, from a file into a cluster.
Importing a dictionary to a table from a file should happen against only one node at a time, as
the dictionary will eventually be stored in the `system_distributed.compression_dictionaries` table and reused
cluster-wide. When imports happen from multiple nodes, the highest-version dictionary will be used.
