Thread: ZStandard (with dictionaries) compression support for TOAST compression
ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi all,
The ZStandard compression algorithm [1][2], though not currently used for TOAST compression in PostgreSQL, offers significantly better compression ratios than lz4/pglz in both dictionary-based and non-dictionary modes. Attached for review is my patch to add ZStandard compression to Postgres. In my tests, this patch with a pre-trained dictionary achieved up to four times the compression ratio of LZ4, while ZStandard without a dictionary compressed about twice as well as LZ4/pglz.
Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher levels of compression, but dictionaries have to be generated and maintained, so I’ve had to break new ground in that regard. Using the dictionary support requires training and storing a dictionary for a given variable-length column. A SQL function is invoked on the column; it samples the column’s data and feeds the samples into the ZStandard training API, which returns a dictionary. In the example below, the column is of type JSONB. The SQL function takes the table name and the attribute number as inputs and returns true if training succeeds, false otherwise.
```
test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
build_zstd_dict_for_attribute
-------------------------------
t
(1 row)
```
The sampling logic and the data fed to the ZStandard training API can vary by data type. The patch provides a way to write type-specific training functions and includes a default for JSONB, TEXT, and BYTEA. A new 'CREATE TYPE' option, 'build_zstd_dict', takes a function name as input. This way, anyone can write their own type-specific training function by handling the sampling logic and returning the information needed by the ZStandard training API in the "ZstdTrainingData" format.
```
typedef struct ZstdTrainingData
{
char *sample_buffer; /* Pointer to the raw sample buffer */
size_t *sample_sizes; /* Array of sample sizes */
int nitems; /* Number of sample sizes */
} ZstdTrainingData;
```
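For illustration, here is a minimal sketch of how a ZstdTrainingData result could be handed to zstd's ZDICT training API. The helper name, the elog-based error handling, and the 4 KB capacity are placeholders, not code from the patch:

```
#include "postgres.h"
#include <zdict.h>

/* Sketch only: train a dictionary from the concatenated samples. */
static size_t
train_zstd_dictionary(const ZstdTrainingData *td,
                      void *dict_buffer, size_t dict_capacity)
{
    /* ZDICT_trainFromBuffer() expects all samples back-to-back in one
     * buffer, plus an array of their individual sizes. */
    size_t      dict_size = ZDICT_trainFromBuffer(dict_buffer, dict_capacity,
                                                  td->sample_buffer,
                                                  td->sample_sizes,
                                                  (unsigned) td->nitems);

    if (ZDICT_isError(dict_size))
        elog(ERROR, "zstd dictionary training failed: %s",
             ZDICT_getErrorName(dict_size));

    return dict_size;           /* number of bytes actually written */
}
```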
This information is fed into the ZStandard training API, which generates a dictionary that is then inserted into the dictionary catalog table. Additionally, we update the 'pg_attribute' attribute options to record the unique dictionary ID for that specific attribute. During compression, based on the available dictionary ID, we retrieve the dictionary and use it to compress the documents. I've created a standard training function (`zstd_dictionary_builder`) for JSONB, TEXT, and BYTEA.
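The compression side can be sketched roughly as below; `level` would come from the attribute options, and the caller is assumed to have already fetched the dictionary bytes for the column's current dictid. This is an outline under those assumptions, not the patch's actual routine:

```
#include "postgres.h"
#include <zstd.h>

/* Sketch only: compress one datum's payload with a pre-trained dictionary. */
static size_t
zstd_compress_with_dict(const char *src, size_t src_size,
                        char *dst, size_t dst_capacity,  /* >= ZSTD_compressBound(src_size) */
                        const void *dict, size_t dict_size, int level)
{
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();
    ZSTD_CDict *cdict = ZSTD_createCDict(dict, dict_size, level);
    size_t      n;

    /* The resulting frame header records the dictionary's ID, which is what
     * lets decompression find the right dictionary later. */
    n = ZSTD_compress_usingCDict(cctx, dst, dst_capacity, src, src_size, cdict);

    ZSTD_freeCDict(cdict);
    ZSTD_freeCCtx(cctx);

    if (ZSTD_isError(n))
        elog(ERROR, "zstd compression failed: %s", ZSTD_getErrorName(n));
    return n;
}
```

(In practice one would cache the ZSTD_CDict per dictid rather than rebuild it for every datum.)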
We store the dictionary and its dictid in the new catalog table 'pg_zstd_dictionaries':
```
test=# \d pg_zstd_dictionaries
Table "pg_catalog.pg_zstd_dictionaries"
Column | Type | Collation | Nullable | Default
--------+-------+-----------+----------+---------
dictid | oid | | not null |
dict | bytea | | not null |
Indexes:
"pg_zstd_dictionaries_dictid_index" PRIMARY KEY, btree (dictid)
```
This is the entire ZStandard dictionary infrastructure. A column can have multiple dictionaries; the latest one is identified via the pg_attribute attoptions. We never delete dictionaries once they are generated. If no dictionary is available and attcompression is set to zstd, we compress with ZStandard without a dictionary. For decompression, the zstd-compressed frame contains a dictionary identifier (dictid) that indicates the dictionary used for compression. We retrieve this dictid from the zstd frame, fetch the corresponding dictionary, and perform decompression.
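A rough sketch of that decompression path follows; the catalog lookup helper here is hypothetical, and a dictID of 0 in the frame header means no dictionary was used:

```
#include "postgres.h"
#include <zstd.h>

/* Sketch only: decompress using the dictID recorded in the zstd frame header. */
static size_t
zstd_decompress_datum(const char *src, size_t src_size,
                      char *dst, size_t dst_capacity)
{
    unsigned    dictid = ZSTD_getDictID_fromFrame(src, src_size);
    const void *dict = NULL;
    size_t      dict_size = 0;
    ZSTD_DCtx  *dctx = ZSTD_createDCtx();
    size_t      n;

    if (dictid != 0)
        dict = lookup_zstd_dictionary(dictid, &dict_size);  /* hypothetical fetch
                                                             * from pg_zstd_dictionaries */

    /* With dict == NULL / dict_size == 0 this is plain zstd decompression. */
    n = ZSTD_decompress_usingDict(dctx, dst, dst_capacity,
                                  src, src_size, dict, dict_size);
    ZSTD_freeDCtx(dctx);

    if (ZSTD_isError(n))
        elog(ERROR, "zstd decompression failed: %s", ZSTD_getErrorName(n));
    return n;
}
```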
#############################################################################
Now for the TOAST compression framework changes.
We identify a compressed datum's compression algorithm using the top two bits of va_tcinfo (varattrib_4b.va_compressed), which allows for at most four compression methods. Based on previous community discussion of TOAST compression changes [3], using the remaining bit pattern directly for a new compression algorithm was rejected; the suggestion instead was to use it to extend the format, which is what I've implemented in this patch. This change requires updating the 'varattrib_4b' and 'varatt_external' on-disk structures. I've made sure these changes are backward compatible.
```
typedef union
{
struct /* Normal varlena (4-byte length) */
{
uint32 va_header;
char va_data[FLEXIBLE_ARRAY_MEMBER];
} va_4byte;
struct /* Compressed-in-line format */
{
uint32 va_header;
uint32 va_tcinfo; /* Original data size (excludes header) and
* compression method; see va_extinfo */
char va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
} va_compressed;
struct
{
uint32 va_header;
uint32 va_tcinfo;
uint32 va_cmp_alg;
char va_data[FLEXIBLE_ARRAY_MEMBER];
} va_compressed_ext;
} varattrib_4b;
typedef struct varatt_external
{
int32 va_rawsize; /* Original data size (includes header) */
uint32 va_extinfo; /* External saved size (without header) and
* compression method */
Oid va_valueid; /* Unique ID of value within TOAST table */
Oid va_toastrelid; /* RelID of TOAST table containing it */
uint32 va_cmp_alg; /* The additional compression algorithms
* information. */
} varatt_external;
```
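To make the intent of va_cmp_alg concrete, here is one way the extended discriminator could be read. The escape value and helper below are assumptions for illustration, not the macros from the patch: the two existing bit patterns keep meaning pglz/lz4, and a reserved pattern signals that the real algorithm is stored in va_cmp_alg.

```
/* Sketch only: assumes postgres.h and the structs above are in scope,
 * and that 0x02 is the reserved "extended" bit pattern (an assumption). */
#define VARATT_4B_CMP_EXTENDED  0x02

static inline uint32
toast_get_compression_algorithm(const varattrib_4b *attr)
{
    /* The top two bits of va_tcinfo hold the legacy compression method. */
    uint32      method = attr->va_compressed.va_tcinfo >> 30;

    if (method == VARATT_4B_CMP_EXTENDED)
        return attr->va_compressed_ext.va_cmp_alg;  /* e.g. zstd */

    return method;              /* pglz or lz4, as today */
}
```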
Since these structs change, I've updated the existing macros accordingly and added the ZStandard compression and decompression routines as needed. These are the major design changes in the patch to incorporate ZStandard with dictionary compression.
Please let me know what you think about all this. Are there any concerns with my approach? In particular, I would appreciate your thoughts on the on-disk changes that result from this.
kind regards,
Nikhil Veldanda
Amazon Web Services: https://aws.amazon.com
[1] https://facebook.github.io/zstd/
[2] https://github.com/facebook/zstd
[3] https://www.postgresql.org/message-id/flat/YoMiNmkztrslDbNS%40paquier.xyz
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Kirill Reshke
Date:
On Thu, 6 Mar 2025 at 08:43, Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com> wrote:
> [...]

Hi!

I generally love this idea, however I am not convinced that in-core support is the right direction here. Maybe we can introduce some API infrastructure to allow delegating compression to extensions? This is merely my opinion; perhaps dealing with a redo is not worthwhile.

I did a brief lookup on patch v1. I feel like this is too much for a single patch. Take, for example, this change:

```
-#define NO_LZ4_SUPPORT() \
+#define NO_METHOD_SUPPORT(method) \
 ereport(ERROR, \
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
- errmsg("compression method lz4 not supported"), \
- errdetail("This functionality requires the server to be built with lz4 support.")))
+ errmsg("compression method %s not supported", method), \
+ errdetail("This functionality requires the server to be built with %s support.", method)))
```

This could be a separate preliminary refactoring patch in the series.

Perhaps we need to divide the patch into smaller pieces if we follow the suggested course of this thread (in-core support). I will try to give another in-depth look here soon.

--
Best regards,
Kirill Reshke
06.03.2025 08:32, Nikhil Kumar Veldanda wrote:
> [...]

Overall idea is great.

I just want to mention that LZ4 also has an API to use a dictionary. Its dictionary is as simple as "virtually prepended" text (in contrast to the complex ZStd dictionary format).

I mean, it would be great if "dictionary" were a common property across different algorithms.

On the other hand, zstd has a "super fast" mode which is actually a bit faster than LZ4 and compresses a bit better. So maybe support for different algos is not essential. (But then we need a way to change the compression level to that "super fast" mode.)

-------
regards
Yura Sokolov aka funny-falcon
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Aleksander Alekseev
Date:
Hi Nikhil,

Many thanks for working on this. I proposed a similar patch some time ago [1] but the overall feedback was somewhat mixed, so I chose to focus on something else. Thanks for picking this up.

> test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
> build_zstd_dict_for_attribute
> -------------------------------
> t
> (1 row)

Did you have a chance to familiarize yourself with the corresponding discussion [1] and probably the previous threads? Particularly, it was pointed out that dictionaries should be built automatically during VACUUM. We also discussed a special syntax for the feature, besides other things.

[1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

--
Best regards,
Aleksander Alekseev
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi,

> Overall idea is great.
>
> I just want to mention that LZ4 also has an API to use a dictionary. Its
> dictionary is as simple as "virtually prepended" text (in contrast to the
> complex ZStd dictionary format).
>
> I mean, it would be great if "dictionary" were a common property across
> different algorithms.
>
> On the other hand, zstd has a "super fast" mode which is actually a bit
> faster than LZ4 and compresses a bit better. So maybe support for
> different algos is not essential. (But then we need a way to change the
> compression level to that "super fast" mode.)

The zstd compression level and zstd dictionary size are configurable at the attribute level using ALTER TABLE. The default zstd level is 3 and the default dict size is 4KB. For super fast mode the level can be set to 1.

```
test=# alter table zstd alter column doc set compression zstd;
ALTER TABLE
test=# alter table zstd alter column doc set(zstd_cmp_level = 1);
ALTER TABLE
test=# select * from pg_attribute where attrelid = 'zstd'::regclass and attname = 'doc';
 attrelid | attname | atttypid | attlen | attnum | atttypmod | attndims | attbyval | attalign | attstorage | attcompression | attnotnull | atthasdef | atthasmissing | attidentity | attgenerated | attisdropped | attislocal | attinhcount | attcollation | attstattarget | attacl |            attoptions            | attfdwoptions | attmissingval
----------+---------+----------+--------+--------+-----------+----------+----------+----------+------------+----------------+------------+-----------+---------------+-------------+--------------+--------------+------------+-------------+--------------+---------------+--------+----------------------------------+---------------+---------------
    16389 | doc     |     3802 |     -1 |      1 |        -1 |        0 | f        | i        | x          | z              | f          | f         | f             |             |              | f            | t          |           0 |            0 |               |        | {zstd_dictid=1,zstd_cmp_level=1} |               |
(1 row)
```
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi,

On Thu, Mar 6, 2025 at 5:35 AM Aleksander Alekseev <aleksander@timescale.com> wrote:
> Did you have a chance to familiarize yourself with the corresponding
> discussion [1] and probably the previous threads? Particularly, it was
> pointed out that dictionaries should be built automatically during
> VACUUM. We also discussed a special syntax for the feature, besides
> other things.
>
> [1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Restricting dictionary generation to the vacuum process is not ideal because it limits user control and flexibility. Compression efficiency is highly dependent on data distribution, which can change dynamically. By allowing users to generate dictionaries on demand via an API, they can optimize compression when they detect inefficiencies rather than waiting for a vacuum process, which may not align with their needs. Additionally, since all dictionaries are stored in the catalog table anyway, users can generate and manage them independently without interfering with the system's automatic maintenance tasks. This approach ensures better adaptability to real-world scenarios where compression performance needs to be monitored and adjusted in real time.

---
Nikhil Veldanda
06.03.2025 19:29, Nikhil Kumar Veldanda wrote:
> The zstd compression level and zstd dictionary size are configurable at the
> attribute level using ALTER TABLE. The default zstd level is 3 and the
> default dict size is 4KB. For super fast mode the level can be set to 1.

No. Super-fast mode levels are negative. See the parsing of the "--fast" parameter in `programs/zstdcli.c` in zstd's repository and the definition of ZSTD_minCLevel().

So, to support "super-fast" mode you have to accept negative compression levels. I didn't check; probably you already support them?

-------
regards
Yura Sokolov aka funny-falcon
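As a side note, a tiny standalone illustration of that point (not part of the patch): ZSTD_minCLevel() reports the most negative level zstd accepts, and negative levels can be passed directly to the ordinary compression entry points.

```
#include <stdio.h>
#include <string.h>
#include <zstd.h>

int
main(void)
{
    const char  src[] = "negative zstd levels trade ratio for speed";
    char        dst[256];
    size_t      n;

    /* -5 is an arbitrary "fast" level; ZSTD_minCLevel() gives the floor. */
    n = ZSTD_compress(dst, sizeof(dst), src, strlen(src), -5);

    printf("min level: %d, compressed to %zu bytes\n", ZSTD_minCLevel(), n);
    return ZSTD_isError(n) ? 1 : 0;
}
```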
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi Yura,

> So, to support "super-fast" mode you have to accept negative compression
> levels. I didn't check; probably you already support them?

The key point I want to emphasize is that both the zstd compression level and the dictionary size should be configurable based on user preferences at the attribute level.

---
Nikhil Veldanda