Cassandra Documentation


Compression

Cassandra offers operators the ability to configure compression on a per-table basis. Compression reduces the size of data on disk by compressing each SSTable in user-configurable chunks (see chunk_length_in_kb below). As Cassandra SSTables are immutable, the CPU cost of compression is only paid when the SSTable is written - subsequent updates to data will land in different SSTables, so Cassandra does not need to decompress, overwrite, and recompress data when UPDATE commands are issued. On reads, Cassandra locates the relevant compressed chunks on disk, decompresses each full chunk, and then proceeds with the remainder of the read path (merging data from disks and memtables, read repair, and so on).

Compression algorithms typically trade off between the following three areas:

  • Compression speed: How fast the compression algorithm compresses data. This is critical in the flush and compaction paths because data must be compressed before it is written to disk.

  • Decompression speed: How fast the compression algorithm decompresses data. This is critical in the read and compaction paths as data must be read off disk in a full chunk and decompressed before it can be returned.

  • Ratio: By what ratio the uncompressed data is reduced. Cassandra typically measures this as the size of data on disk relative to the uncompressed size. For example, a ratio of 0.5 means that the data on disk is 50% the size of the uncompressed data. Cassandra exposes this ratio per table as the SSTable Compression Ratio field of nodetool tablestats.
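
As an illustration of how such a ratio is computed (a sketch only, using Python's zlib rather than any of Cassandra's compressors), dividing the on-disk size by the uncompressed size yields the figure reported by nodetool tablestats:

```python
# Sketch of how a compression ratio like Cassandra's "SSTable Compression
# Ratio" is computed: compressed size divided by uncompressed size.
# zlib (Deflate) is used purely for illustration.
import zlib

uncompressed = b'{"id": 1, "name": "alice"}\n' * 1000  # repetitive rows
compressed = zlib.compress(uncompressed)

ratio = len(compressed) / len(uncompressed)
print(f"ratio = {ratio:.3f}")  # 0.5 would mean data on disk is half the size
```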

Cassandra offers five compression algorithms by default that make different tradeoffs in these areas. While benchmarking compression algorithms depends on many factors (algorithm parameters such as compression level, the compressibility of the input data, underlying processor class, etc.), the following table should help you pick a starting point based on your application's requirements, with an extremely rough grading of the different choices by their performance in these areas (A is relatively good, F is relatively bad):

Compression Algorithm   Cassandra Class            Compression   Decompression   Ratio   C* Version
LZ4                     LZ4Compressor              A+            A+              C+      >= 1.2.2
LZ4HC                   LZ4Compressor              C+            A+              B+      >= 3.6
Zstd                    ZstdCompressor             A-            A-              A+      >= 4.0
Zstd with Dictionary    ZstdDictionaryCompressor   A-            A-              A++     >= 6.0
Snappy                  SnappyCompressor           A-            A               C       >= 1.0
Deflate (zlib)          DeflateCompressor          C             C               A       >= 1.0

Generally speaking, for a performance-critical (latency or throughput) application, LZ4 is the right choice, as it gets excellent ratio per CPU cycle spent. This is why it is the default choice in Cassandra.

For storage-critical applications (disk footprint), however, Zstd may be a better choice, as it can achieve a significantly better ratio than LZ4. For workloads with highly repetitive or similar data patterns, ZstdDictionaryCompressor can achieve even better compression ratios by training a compression dictionary on representative data samples.

Snappy is kept for backwards compatibility and LZ4 will typically be preferable.

Deflate is kept for backwards compatibility and Zstd will typically be preferable.

ZSTD Dictionary Compression

The ZstdDictionaryCompressor extends standard ZSTD compression by using trained compression dictionaries to achieve superior compression ratios, particularly for workloads with repetitive or similar data patterns.

How Dictionary Compression Works

Dictionary compression improves upon standard compression by training a compression dictionary on representative samples of your data. This dictionary captures common patterns, repeated strings, and data structures, allowing the compressor to reference these patterns more efficiently than discovering them independently in each compression chunk.
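
The principle can be demonstrated with Python's zlib, which supports preset dictionaries (Cassandra itself uses Zstd dictionaries, not zlib; this is only a conceptual sketch). A dictionary seeded with structure shared by many rows lets each compressed chunk reference those patterns instead of rediscovering them:

```python
# Conceptual sketch of dictionary compression using zlib preset
# dictionaries (illustrative only; Cassandra trains Zstd dictionaries).
import zlib

# "Trained" dictionary: common structure shared by the rows below.
zdict = b'{"user_id": 0000000000, "event": "click", "page": "/home"}'
record = b'{"user_id": 1234567890, "event": "click", "page": "/home"}'

# Compress with the preset dictionary.
c = zlib.compressobj(level=9, zdict=zdict)
with_dict = c.compress(record) + c.flush()

# Compress the same record without a dictionary.
without_dict = zlib.compress(record, 9)

# The dictionary-aware stream is smaller because the shared structure is
# referenced in the dictionary rather than stored again.
assert len(with_dict) < len(without_dict)

# Decompression needs the same dictionary, just as Cassandra needs the
# matching dictionary version when reading an SSTable.
d = zlib.decompressobj(zdict=zdict)
assert d.decompress(with_dict) == record
```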

When to Use Dictionary Compression

Dictionary compression is most effective for:

  • Tables with similar row structures: JSON documents, XML data, or repeated data schemas benefit significantly from dictionary compression.

  • Storage-critical workloads: When disk space savings justify the additional operational overhead of dictionary training and management.

  • Large datasets with repetitive patterns: The more similar your data, the better the compression ratio improvement.

Dictionary compression may not be ideal for:

  • Highly random or unique data: Already-compressed data or cryptographic data will see minimal benefit.

  • Small tables: The overhead of dictionary management may outweigh the storage savings.

  • Frequently changing schemas: Schema changes may require retraining dictionaries to maintain optimal compression ratios.

Dictionary Training

Before dictionary compression can provide optimal results, a compression dictionary must be trained on representative data samples. Cassandra supports both manual and automatic training approaches.

Manual Dictionary Training

Use the nodetool compressiondictionary train command to manually train a compression dictionary:

nodetool compressiondictionary train <keyspace> <table>

The command trains a dictionary by sampling from existing SSTables. If no SSTables are available on disk (e.g., all data is in memtables), the command will automatically flush the memtable before sampling.

The training process completes synchronously and displays progress information including sample count, sample size, and elapsed time. Training typically completes within minutes for most workloads.

By default, training will only proceed if enough samples have been collected. To force training even with insufficient samples, use the --force or -f option:

nodetool compressiondictionary train --force <keyspace> <table>

This can be useful for testing or when you want to train a dictionary from limited data during initial setup.

Automatic Dictionary Training

Enable automatic training in cassandra.yaml:

compression_dictionary_training_auto_train_enabled: true
compression_dictionary_training_sampling_rate: 100  # 1% of writes

When enabled, Cassandra automatically samples write operations and trains dictionaries in the background based on the configured sampling rate (range: 1-10000, where 100 = 1% of writes).
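
The rate maps to a fraction of writes as sampling_rate / 10000. A hypothetical sketch of such rate-based sampling (not Cassandra's implementation):

```python
# Sketch of rate-based write sampling: a sampling_rate of 100 out of
# 10000 means roughly 1% of writes are sampled. Hypothetical
# illustration only, not Cassandra code.
import random

def should_sample(sampling_rate: int) -> bool:
    """sampling_rate is in [1, 10000]; 100 => ~1% of writes."""
    return random.randrange(10000) < sampling_rate

random.seed(42)
sampled = sum(should_sample(100) for _ in range(100_000))
print(f"sampled {sampled} of 100000 writes (~{sampled / 1000:.2f}%)")
```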

Dictionary Storage and Distribution

Compression dictionaries are stored cluster-wide in the system_distributed.compression_dictionaries table. Each table can maintain multiple dictionary versions: the current dictionary for compressing new SSTables, plus historical dictionaries needed for reading older SSTables.

Dictionaries are identified by dict_id, with higher IDs representing newer dictionaries. Cassandra automatically refreshes dictionaries across the cluster based on configured intervals, and caches them locally to minimize lookup overhead.

Configuring Compression

Compression is configured on a per-table basis as an optional argument to CREATE TABLE or ALTER TABLE. The following options are available for all compressors:

  • class (default: LZ4Compressor): specifies the compression class to use. The two "fast" compressors are LZ4Compressor and SnappyCompressor and the two "good" ratio compressors are ZstdCompressor and DeflateCompressor.

  • chunk_length_in_kb (default: 16KiB): specifies the number of kilobytes of data per compression chunk. The main tradeoff here is that larger chunk sizes give compression algorithms more context and improve their ratio, but require reads to deserialize and read more off disk.
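
The chunk-size tradeoff can be sketched with Python's zlib standing in for the configured compressor (illustrative only): compressing each chunk independently forfeits redundancy that spans chunk boundaries and pays fixed per-chunk overhead, so larger chunks compress better in total.

```python
# Sketch of the chunk_length_in_kb tradeoff. Cassandra compresses each
# chunk independently; zlib stands in for the configured compressor here.
import zlib

data = b"INFO request handled in 3ms path=/api/v1/items status=200\n" * 2000

def chunked_compressed_size(data: bytes, chunk_size: int) -> int:
    """Total size when each fixed-size chunk is compressed independently."""
    return sum(
        len(zlib.compress(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    )

small = chunked_compressed_size(data, 4 * 1024)   # 4 KiB chunks
large = chunked_compressed_size(data, 64 * 1024)  # 64 KiB chunks

# Larger chunks give the compressor more context (and less per-chunk
# overhead), so the total on disk is smaller -- at the cost of having to
# decompress a bigger chunk to serve a small read.
assert large < small
```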

The LZ4Compressor supports the following additional options:

  • lz4_compressor_type (default fast): specifies whether we should use the high (a.k.a. LZ4HC) ratio version or the fast (a.k.a. LZ4) version of LZ4. The high mode supports a configurable level, which allows operators to tune the performance <-> ratio tradeoff via the lz4_high_compressor_level option. Note that in 4.0 and above it may be preferable to use the Zstd compressor.

  • lz4_high_compressor_level (default 9): A number between 1 and 17 inclusive that represents how much CPU time to spend trying to get more compression ratio. Generally lower levels are "faster" but they get less ratio and higher levels are slower but get more compression ratio.

The ZstdCompressor supports the following options in addition:

  • compression_level (default 3): A number between -131072 and 22 inclusive that represents how much CPU time to spend trying to get more compression ratio. The lower the level, the faster the speed (at the cost of ratio). Values from 20 to 22 are called "ultra levels" and should be used with caution, as they require more memory. The default of 3 is a good choice for competing with Deflate ratios and 1 is a good choice for competing with LZ4.

The ZstdDictionaryCompressor supports the same options as ZstdCompressor:

  • compression_level (default 3): Same range and behavior as ZstdCompressor. Dictionary compression provides improved ratios at any compression level compared to standard ZSTD.

ZstdDictionaryCompressor requires a trained compression dictionary to achieve optimal results. See the ZSTD Dictionary Compression section above for training instructions.

Users can set compression using the following syntax:

CREATE TABLE keyspace.table (id int PRIMARY KEY)
   WITH compression = {'class': 'LZ4Compressor'};

Or

ALTER TABLE keyspace.table
   WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64};

For dictionary compression:

CREATE TABLE keyspace.table (id int PRIMARY KEY)
   WITH compression = {'class': 'ZstdDictionaryCompressor'};

Or with a specific compression level:

ALTER TABLE keyspace.table
   WITH compression = {
       'class': 'ZstdDictionaryCompressor',
       'compression_level': '3'
   };

Once enabled, compression can be disabled with ALTER TABLE setting enabled to false:

ALTER TABLE keyspace.table
   WITH compression = {'enabled':'false'};

Operators should be aware, however, that changing compression is not immediate. Data is compressed when an SSTable is written, and as SSTables are immutable, existing SSTables keep their old compression settings until they are compacted. Upon issuing a change to the compression options via ALTER TABLE, the existing SSTables will not be modified until they are compacted - if an operator needs compression changes to take effect immediately, the operator can trigger an SSTable rewrite using nodetool scrub or nodetool upgradesstables -a, both of which rebuild the SSTables on disk, re-compressing the data in the process.

Dictionary Compression Configuration

When using ZstdDictionaryCompressor, several additional configuration options are available in cassandra.yaml to control dictionary management, caching, and training behavior.

Dictionary Refresh Settings

  • compression_dictionary_refresh_interval (default: 3600): How often (in seconds) to check for and refresh compression dictionaries cluster-wide. Newly trained dictionaries will be picked up by all nodes within this interval.

  • compression_dictionary_refresh_initial_delay (default: 10): Initial delay (in seconds) before the first dictionary refresh check after node startup.

Dictionary Caching

  • compression_dictionary_cache_size (default: 10): Maximum number of compression dictionaries to cache per table. Higher values reduce lookup overhead but increase memory usage.

  • compression_dictionary_cache_expire (default: 3600): Dictionary cache entry TTL in seconds. Expired entries are evicted and reloaded on next access.

Training Configuration

  • compression_dictionary_training_max_dictionary_size (default: 65536): Maximum size of trained dictionaries in bytes. Larger dictionaries can capture more patterns but increase memory overhead.

  • compression_dictionary_training_max_total_sample_size (default: 10485760): Maximum total size of sample data to collect for training, approximately 10MB.

  • compression_dictionary_training_auto_train_enabled (default: false): Enable automatic background dictionary training. When enabled, Cassandra samples writes and trains dictionaries automatically.

  • compression_dictionary_training_sampling_rate (default: 100): Sampling rate for automatic training, range 1-10000 where 100 = 1% of writes. Lower values reduce training overhead but may miss data patterns.

Example configuration:

# Dictionary refresh and caching
compression_dictionary_refresh_interval: 3600
compression_dictionary_refresh_initial_delay: 10
compression_dictionary_cache_size: 10
compression_dictionary_cache_expire: 3600

# Automatic training
compression_dictionary_training_auto_train_enabled: false
compression_dictionary_training_sampling_rate: 100
compression_dictionary_training_max_dictionary_size: 65536
compression_dictionary_training_max_total_sample_size: 10485760

Other options

  • crc_check_chance (default: 1.0): determines how likely Cassandra is to verify the checksum on each compression chunk during reads to protect against data corruption. Unless you have profiles indicating this is a performance problem it is highly encouraged not to turn this off as it is Cassandra’s only protection against bitrot. In earlier versions of Cassandra a duplicate of this option existed in the compression configuration. The latter was deprecated in Cassandra 3.0 and removed in Cassandra 5.0.

Benefits and Uses

Compression’s primary benefit is that it reduces the amount of data written to disk. Not only does the reduced size save in storage requirements, it often increases read and write throughput, as the CPU overhead of compressing data is faster than the time it would take to read or write the larger volume of uncompressed data from disk.

Compression is most useful in tables comprised of many rows, where the rows are similar in nature. Tables containing similar text columns (such as repeated JSON blobs) often compress very well. Tables containing data that has already been compressed or random data (e.g. benchmark datasets) do not typically compress well.

Operational Impact

  • Compression metadata is stored off-heap and scales with data on disk. This often requires 1-3GB of off-heap RAM per terabyte of data on disk, though the exact usage varies with chunk_length_in_kb and compression ratios.

  • Streaming operations involve compressing and decompressing data on compressed tables - in some code paths (such as non-vnode bootstrap), the CPU overhead of compression can be a limiting factor.

  • To prevent slow compressors (Zstd, Deflate, LZ4HC) from blocking flushes for too long, all three flush with the default fast LZ4 compressor and then rely on normal compaction to re-compress the data into the desired compression strategy. See CASSANDRA-15379 for more details.

  • The compression path checksums data to ensure correctness - while the traditional Cassandra read path does not have a way to ensure correctness of data on disk, compressed tables allow the user to set crc_check_chance (a float from 0.0 to 1.0) to allow Cassandra to probabilistically validate chunks on read to verify bits on disk are not corrupt.
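
A sketch of probabilistic chunk verification in the spirit of crc_check_chance (illustrative only; Cassandra's on-disk checksum format is not reproduced here):

```python
# Sketch of probabilistic checksum verification on the read path.
# Hypothetical helper; Cassandra's actual chunk format differs.
import random
import zlib

def read_chunk(compressed: bytes, stored_crc: int, crc_check_chance: float) -> bytes:
    # Verify the checksum on only a crc_check_chance fraction of reads.
    if random.random() < crc_check_chance:
        if zlib.crc32(compressed) != stored_crc:
            raise IOError("chunk checksum mismatch: possible bitrot")
    return zlib.decompress(compressed)

chunk = zlib.compress(b"some row data" * 100)
crc = zlib.crc32(chunk)

# With crc_check_chance = 1.0 every read is verified.
assert read_chunk(chunk, crc, 1.0) == b"some row data" * 100

# A corrupted chunk is caught when the check fires.
corrupted = bytes([chunk[0] ^ 0xFF]) + chunk[1:]
try:
    read_chunk(corrupted, crc, 1.0)
except IOError:
    print("corruption detected")
```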

Dictionary Compression Operational Considerations

When using ZstdDictionaryCompressor, additional operational factors apply:

  • Dictionary Storage: Compression dictionaries are stored in the system_distributed.compression_dictionaries table and replicated cluster-wide. Each table maintains current and historical dictionary versions.

  • Dictionary Cache Memory: Dictionaries are cached locally on each node according to compression_dictionary_cache_size. Memory overhead is typically minimal (default 64KB per dictionary × cache size).

  • Dictionary Training Overhead: Manual training via nodetool compressiondictionary train samples SSTable chunk data and performs CPU-intensive dictionary training. Consider running training during off-peak hours.

  • Automatic Training Impact: When compression_dictionary_training_auto_train_enabled is true, write operations are sampled based on compression_dictionary_training_sampling_rate. This adds minimal overhead but should be monitored in write-intensive workloads.

  • Dictionary Refresh: The dictionary refresh process (compression_dictionary_refresh_interval) checks for new dictionaries cluster-wide. The default 1-hour interval balances freshness with overhead.

  • SSTable Compatibility: Each SSTable is compressed with a specific dictionary version. Historical dictionaries must be retained to read older SSTables until they are compacted with new dictionaries.

  • Schema Changes: Significant schema changes or data pattern shifts may require retraining dictionaries to maintain optimal compression ratios. Monitor the SSTable Compression Ratio via nodetool tablestats to detect degradation.

Available nodetool commands for compressiondictionary

There are currently four nodetool compressiondictionary subcommands:

  • train - trains a dictionary, as described above.

  • list - lists all dictionaries for a given keyspace and table.

  • export - exports a compression dictionary for a keyspace and table to a file, either the latest one or one selected by a specific id.

  • import - imports a compression dictionary, previously exported by the above command, from a file into a cluster.

A dictionary should be imported from a file into only one node at a time, as the dictionary will eventually be stored in the system_distributed.compression_dictionaries table and reused cluster-wide. When imports happen from multiple nodes, the highest-version dictionary will be used.

Advanced Use

Advanced users can provide their own compression class by implementing the interface at org.apache.cassandra.io.compress.ICompressor.