Thread: ZStandard (with dictionaries) compression support for TOAST compression
ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi all,
The ZStandard compression algorithm [1][2], though not currently available for TOAST compression in PostgreSQL, offers significantly better compression ratios than lz4/pglz in both dictionary-based and non-dictionary modes. Attached for review is my patch to add ZStandard compression to Postgres. In my tests, this patch used with a pre-trained dictionary achieved up to four times the compression ratio of LZ4, while ZStandard without a dictionary compressed data roughly twice as well as LZ4/pglz.
Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher levels of compression, but dictionaries have to be generated and maintained, so I’ve had to break new ground in that regard. Using the dictionary support requires training and storing a dictionary for a given variable-length column. A SQL function is called on the column; it samples the column’s data and feeds it into the ZStandard training API, which returns a dictionary. In the example below, the column is of JSONB type. The SQL function takes the table name and the attribute number as inputs and returns true if training succeeds, false otherwise.
```
test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
build_zstd_dict_for_attribute
-------------------------------
t
(1 row)
```
The sampling logic and the data fed to the ZStandard training API can vary by data type. The patch includes a mechanism for writing type-specific training functions and provides a default for JSONB, TEXT, and BYTEA. There is a new ‘CREATE TYPE’ option called ‘build_zstd_dict’ that takes a function name as input. In this way anyone can write their own type-specific training function by handling the sampling logic and returning the necessary information for the ZStandard training API in “ZstdTrainingData” format.
```
typedef struct ZstdTrainingData
{
    char       *sample_buffer;  /* Pointer to the raw sample buffer */
    size_t     *sample_sizes;   /* Array of sample sizes */
    int         nitems;         /* Number of sample sizes */
} ZstdTrainingData;
```
This information is fed into the ZStandard training API, which generates a dictionary that is then inserted into the dictionary catalog table. Additionally, we update the ‘pg_attribute’ attribute options to record the unique dictionary ID for that specific attribute. During compression, based on the available dictionary ID, we retrieve the dictionary and use it to compress the documents. I’ve created a standard training function (`zstd_dictionary_builder`) for JSONB, TEXT, and BYTEA.
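For readers unfamiliar with the zstd training interface: the step described above essentially boils down to a call to `ZDICT_trainFromBuffer()` from zdict.h. The sketch below is only an illustration of that mapping, assuming a `ZstdTrainingData` (as defined above) has already been filled in by the sampling code; the memory management and the helper name are simplified stand-ins, not the patch's actual code.

```c
#include <stdlib.h>
#include <zdict.h>              /* ZDICT_trainFromBuffer, ZDICT_isError */

typedef struct ZstdTrainingData /* as defined in the patch above */
{
    char       *sample_buffer;  /* concatenated raw samples */
    size_t     *sample_sizes;   /* size of each sample */
    int         nitems;         /* number of samples */
} ZstdTrainingData;

/*
 * Train a dictionary of at most dict_capacity bytes from the samples.
 * Returns a malloc'd buffer and sets *dict_size, or NULL on failure.
 */
static void *
train_zstd_dictionary(const ZstdTrainingData *td,
                      size_t dict_capacity, size_t *dict_size)
{
    void       *dict = malloc(dict_capacity);
    size_t      res;

    if (dict == NULL)
        return NULL;

    res = ZDICT_trainFromBuffer(dict, dict_capacity,
                                td->sample_buffer,
                                td->sample_sizes,
                                (unsigned) td->nitems);
    if (ZDICT_isError(res))
    {
        free(dict);             /* e.g. too few or too small samples */
        return NULL;
    }

    *dict_size = res;           /* actual dictionary size, <= dict_capacity */
    return dict;
}
```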
We store the dictionary and its dictid in the new catalog table ‘pg_zstd_dictionaries’:
```
test=# \d pg_zstd_dictionaries
Table "pg_catalog.pg_zstd_dictionaries"
Column | Type | Collation | Nullable | Default
--------+-------+-----------+----------+---------
dictid | oid | | not null |
dict | bytea | | not null |
Indexes:
"pg_zstd_dictionaries_dictid_index" PRIMARY KEY, btree (dictid)
```
That is the entire ZStandard dictionary infrastructure. A column can have multiple dictionaries; the latest one is identified via the pg_attribute attoptions. We never delete dictionaries once they are generated. If no dictionary is available and attcompression is set to zstd, we compress with ZStandard without a dictionary. For decompression, the zstd-compressed frame carries the identifier (dictid) of the dictionary used for compression; we read this dictid from the frame, fetch the corresponding dictionary, and decompress with it.
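To make the decompression path concrete: libzstd can report the dictionary ID straight from the compressed frame, so the catalog lookup can be driven entirely by the stored bytes. The sketch below illustrates this with the stock zstd API; `lookup_zstd_dictionary()` is a hypothetical stand-in for the pg_zstd_dictionaries fetch, and error handling is trimmed for brevity.

```c
#include <zstd.h>

/* Hypothetical catalog lookup: return dictionary bytes for dictid. */
extern const void *lookup_zstd_dictionary(unsigned dictid, size_t *dict_size);

static size_t
zstd_decompress_datum(const void *src, size_t src_size,
                      void *dst, size_t dst_capacity)
{
    unsigned    dictid = ZSTD_getDictID_fromFrame(src, src_size);
    ZSTD_DCtx  *dctx = ZSTD_createDCtx();
    size_t      res;

    if (dictid == 0)
    {
        /* Frame was written without a dictionary: plain decompression. */
        res = ZSTD_decompressDCtx(dctx, dst, dst_capacity, src, src_size);
    }
    else
    {
        size_t      dict_size;
        const void *dict = lookup_zstd_dictionary(dictid, &dict_size);
        ZSTD_DDict *ddict = ZSTD_createDDict(dict, dict_size);

        res = ZSTD_decompress_usingDDict(dctx, dst, dst_capacity,
                                         src, src_size, ddict);
        ZSTD_freeDDict(ddict);
    }

    ZSTD_freeDCtx(dctx);
    return res;                 /* caller should check ZSTD_isError(res) */
}
```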
#############################################################################
Now, on to the TOAST compression framework changes.
We identify a compressed datum's compression algorithm using the top two bits of va_tcinfo (varattrib_4b.va_compressed), which allows for at most four compression methods. Based on previous community discussions about TOAST compression changes [3], spending the last remaining bit pattern on a single new algorithm was rejected; the suggestion was to use it to extend the format instead, which is what I’ve implemented in this patch. The change requires updating the ‘varattrib_4b’ and ‘varatt_external’ on-disk structures. I’ve made sure these changes are backward compatible.
```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER];     /* Compressed data */
    }           va_compressed;
    struct                      /* Extended compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;
        uint32      va_cmp_alg;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;

typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External saved size (without header) and
                                 * compression method */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
    uint32      va_cmp_alg;     /* Additional compression algorithm
                                 * information */
} varatt_external;
```
Because these structs needed to change, I’ve updated the existing macros accordingly and added the ZStandard compression and decompression routines. These are the major design changes in the patch to incorporate ZStandard with dictionary compression.
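As an aside for reviewers, the accessor pattern implied by the layout above could look roughly like the sketch below. The constant name TOAST_CMPID_EXTENDED and the helper function are my own illustrative inventions (the patch's actual macro names may differ); only VARLENA_EXTSIZE_BITS and the existing two-bit method encoding come from current PostgreSQL headers, and the structs are the ones shown above.

```c
/*
 * Illustrative only: the existing format keeps the compression method in
 * the top two bits of va_tcinfo.  The value 3 ("11") is assumed here to be
 * the sentinel meaning "look at va_cmp_alg for the real algorithm".
 */
#define VARLENA_EXTSIZE_BITS    30      /* as in existing PostgreSQL headers */
#define TOAST_CMPID_EXTENDED    3       /* hypothetical sentinel value */

static inline uint32
toast_get_compression_algorithm(const varattrib_4b *attr)
{
    uint32      method = attr->va_compressed.va_tcinfo >> VARLENA_EXTSIZE_BITS;

    if (method == TOAST_CMPID_EXTENDED)
        return attr->va_compressed_ext.va_cmp_alg;      /* e.g. zstd */

    return method;              /* existing pglz / lz4 encodings */
}
```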
Please let me know what you think about all this. Are there any concerns with my approach? In particular, I would appreciate your thoughts on the on-disk changes that result from this.
kind regards,
Nikhil Veldanda
Amazon Web Services: https://aws.amazon.com
[1] https://facebook.github.io/zstd/
[2] https://github.com/facebook/zstd
[3] https://www.postgresql.org/message-id/flat/YoMiNmkztrslDbNS%40paquier.xyz
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Kirill Reshke
Date:
On Thu, 6 Mar 2025 at 08:43, Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com> wrote:
> [...]
Hi!

I generally love this idea, however I am not convinced that in-core support is the right direction here. Maybe we can introduce some API infrastructure to allow delegating compression to extensions? This is merely my opinion; perhaps dealing with a redo is not worthwhile.

I did a brief lookup on patch v1. I feel like this is too much for a single patch. Take, for example, this change:

```
-#define NO_LZ4_SUPPORT() \
+#define NO_METHOD_SUPPORT(method) \
 ereport(ERROR, \
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
- errmsg("compression method lz4 not supported"), \
- errdetail("This functionality requires the server to be built with lz4 support.")))
+ errmsg("compression method %s not supported", method), \
+ errdetail("This functionality requires the server to be built with %s support.", method)))
```

This could be a separate preliminary refactoring patch in the series.

Perhaps we need to divide the patch into smaller pieces if we follow the suggested course of this thread (in-core support). I will try to give another in-depth look here soon.

--
Best regards,
Kirill Reshke
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Yura Sokolov
Date:
06.03.2025 08:32, Nikhil Kumar Veldanda wrote:
> [...]
Overall the idea is great.

I just want to mention that LZ4 also has an API to use a dictionary. Its dictionary is as simple as "virtually prepended" text (in contrast to the complex ZStd dictionary format). I mean, it would be great if "dictionary" were a common property across different algorithms.

On the other hand, zstd has a "super fast" mode which is actually a bit faster than LZ4 and compresses a bit better. So maybe support for different algorithms is not essential. (But then we need a way to change the compression level to that "super fast" mode.)

-------
regards
Yura Sokolov aka funny-falcon
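For context on the LZ4 comparison being made here: liblz4's dictionary support really is just a prefix loaded into the streaming state before compressing. The sketch below shows the stock liblz4 calls involved; it is unrelated to the patch and only illustrates the point about the two libraries' dictionary models.

```c
#include <lz4.h>

/* Compress src using a raw "prefix" dictionary (stock liblz4 API). */
static int
lz4_compress_with_dict(const char *dict, int dict_size,
                       const char *src, int src_size,
                       char *dst, int dst_capacity)
{
    LZ4_stream_t stream;

    LZ4_initStream(&stream, sizeof(stream));
    LZ4_loadDict(&stream, dict, dict_size);     /* dictionary = virtually prepended text */
    return LZ4_compress_fast_continue(&stream, src, dst,
                                      src_size, dst_capacity, 1);
}

/* Decompression needs the same dictionary bytes available. */
static int
lz4_decompress_with_dict(const char *dict, int dict_size,
                         const char *src, int src_size,
                         char *dst, int dst_capacity)
{
    return LZ4_decompress_safe_usingDict(src, dst, src_size, dst_capacity,
                                         dict, dict_size);
}
```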
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Aleksander Alekseev
Date:
Hi Nikhil,

Many thanks for working on this. I proposed a similar patch some time ago [1] but the overall feedback was somewhat mixed, so I chose to focus on something else. Thanks for picking this up.

> test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
> build_zstd_dict_for_attribute
> -------------------------------
> t
> (1 row)

Did you have a chance to familiarize yourself with the corresponding discussion [1] and probably the previous threads? Particularly it was pointed out that dictionaries should be built automatically during VACUUM. We also discussed a special syntax for the feature, besides other things.

[1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

--
Best regards,
Aleksander Alekseev
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi,

> Overall the idea is great.
>
> I just want to mention that LZ4 also has an API to use a dictionary. Its
> dictionary is as simple as "virtually prepended" text (in contrast to the
> complex ZStd dictionary format).
>
> I mean, it would be great if "dictionary" were a common property across
> different algorithms.
>
> On the other hand, zstd has a "super fast" mode which is actually a bit
> faster than LZ4 and compresses a bit better. So maybe support for
> different algorithms is not essential. (But then we need a way to change
> the compression level to that "super fast" mode.)

The zstd compression level and zstd dictionary size are configurable at the attribute level using ALTER TABLE. The default zstd level is 3 and the dictionary size is 4KB. For super-fast mode the level can be set to 1.

```
test=# alter table zstd alter column doc set compression zstd;
ALTER TABLE
test=# alter table zstd alter column doc set(zstd_cmp_level = 1);
ALTER TABLE
test=# select * from pg_attribute where attrelid = 'zstd'::regclass and attname = 'doc';
 attrelid | attname | atttypid | attlen | attnum | atttypmod | attndims | attbyval | attalign | attstorage | attcompression | attnotnull | atthasdef | atthasmissing | attidentity | attgenerated | attisdropped | attislocal | attinhcount | attcollation | attstattarget | attacl | attoptions | attfdwoptions | attmissingval
----------+---------+----------+--------+--------+-----------+----------+----------+----------+------------+----------------+------------+-----------+---------------+-------------+--------------+--------------+------------+-------------+--------------+---------------+--------+----------------------------------+---------------+---------------
 16389 | doc | 3802 | -1 | 1 | -1 | 0 | f | i | x | z | f | f | f | | | f | t | 0 | 0 | | | {zstd_dictid=1,zstd_cmp_level=1} | |
(1 row)
```
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi,

On Thu, Mar 6, 2025 at 5:35 AM Aleksander Alekseev <aleksander@timescale.com> wrote:
> Did you have a chance to familiarize yourself with the corresponding
> discussion [1] and probably the previous threads? Particularly it was
> pointed out that dictionaries should be built automatically during
> VACUUM. We also discussed a special syntax for the feature, besides
> other things.
>
> [1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Restricting dictionary generation to the vacuum process is not ideal because it limits user control and flexibility. Compression efficiency is highly dependent on data distribution, which can change dynamically. By allowing users to generate dictionaries on demand via an API, they can optimize compression when they detect inefficiencies rather than waiting for a vacuum process, which may not align with their needs. Additionally, since all dictionaries are stored in the catalog table anyway, users can generate and manage them independently without interfering with the system's automatic maintenance tasks. This approach ensures better adaptability to real-world scenarios where compression performance needs to be monitored and adjusted in real time.

---
Nikhil Veldanda
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Yura Sokolov
Date:
06.03.2025 19:29, Nikhil Kumar Veldanda wrote:
> The zstd compression level and zstd dictionary size are configurable at
> the attribute level using ALTER TABLE. The default zstd level is 3 and
> the dictionary size is 4KB. For super-fast mode the level can be set to 1.

No. Super-fast mode levels are negative. See the parsing of the "--fast" parameter in `programs/zstdcli.c` in zstd's repository and the definition of ZSTD_minCLevel(). So, to support "super-fast" mode you have to accept negative compression levels. I didn't check; perhaps you already support them?

-------
regards
Yura Sokolov aka funny-falcon
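To make the point about negative levels concrete, plain libzstd accepts them directly; the following is a minimal sketch of the stock zstd API, independent of the patch:

```c
#include <zstd.h>

/* Compress at zstd's fastest ("--fast"-style) setting, i.e. a negative level. */
static size_t
zstd_compress_super_fast(const void *src, size_t src_size,
                         void *dst, size_t dst_capacity)
{
    int         fastest = ZSTD_minCLevel();    /* most negative supported level */

    /* Any level in [ZSTD_minCLevel(), -1] trades compression ratio for speed. */
    return ZSTD_compress(dst, dst_capacity, src, src_size, fastest);
}
```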
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi Yura,

> So, to support "super-fast" mode you have to accept negative compression
> levels. I didn't check; perhaps you already support them?

The key point I want to emphasize is that both the zstd compression level and the dictionary size should be configurable based on user preferences at the attribute level.

---
Nikhil Veldanda
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Robert Haas
Date:
On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com> wrote:
> Notably, this is the first compression algorithm for Postgres that can make
> use of a dictionary to provide higher levels of compression, but
> dictionaries have to be generated and maintained,

I think that solving the problems around using a dictionary is going to be really hard. Can we see some evidence that the results will be worth it?

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
>> Notably, this is the first compression algorithm for Postgres that can make
>> use of a dictionary to provide higher levels of compression, but
>> dictionaries have to be generated and maintained,

> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?

BTW, this is hardly the first such attempt. See [1] for a prior attempt at something fairly similar, which ended up going nowhere. It'd be wise to understand why that failed before pressing forward.

Note that the thread title for [1] is pretty misleading, as the original discussion about JSONB-specific compression soon migrated to discussion of compressing TOAST data using dictionaries. At least from a ten-thousand-foot viewpoint, that seems like exactly what you're proposing here. I see that you dismissed [1] as irrelevant upthread, but I think you'd better look closer.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi Robert,

> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?

With the latest patch I've shared, using a Kaggle dataset of Nintendo-related tweets [1], we leveraged PostgreSQL's acquire_sample_rows function to quickly gather just 1,000 sample rows for a specific attribute out of 104,695 rows. These raw samples were passed into zstd's sampling buffer, generating a custom dictionary. This dictionary was then used directly to compress the documents, resulting in 62% space savings after compression:

```
test=# \dt+
                                   List of tables
 Schema |      Name      | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+----------------+-------+----------+-------------+---------------+--------+-------------
 public | lz4            | table | nikhilkv | permanent   | heap          | 297 MB |
 public | pglz           | table | nikhilkv | permanent   | heap          | 259 MB |
 public | zstd_with_dict | table | nikhilkv | permanent   | heap          | 114 MB |
 public | zstd_wo_dict   | table | nikhilkv | permanent   | heap          | 210 MB |
(4 rows)
```

We've observed similarly strong results on other datasets when using dictionaries.

[1] https://www.kaggle.com/code/dcalambas/nintendo-tweets-analysis/data

---
Nikhil Veldanda
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi Tom,

On Thu, Mar 6, 2025 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BTW, this is hardly the first such attempt. See [1] for a prior
> attempt at something fairly similar, which ended up going nowhere.
> It'd be wise to understand why that failed before pressing forward.
>
> [1] https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Thank you for highlighting the previous discussion; I reviewed [1] closely. While both methods involve dictionary-based compression, the approach I'm proposing differs significantly.

The previous method explicitly extracted string values from JSONB and assigned unique OIDs to each entry, resulting in distinct dictionary entries for every unique value. In contrast, this approach directly leverages Zstandard's dictionary training API. We provide raw data samples to Zstd, which generates a dictionary of a specified size. This dictionary is then stored in a catalog table and used to compress subsequent inserts for the specific attribute it was trained on.

Key differences include:

1. No new data types are required.
2. Attributes can optionally have multiple dictionaries; the latest dictionary is used during compression, and the exact dictionary used during compression is retrieved and applied for decompression.
3. Compression utilizes Zstandard's trained dictionaries when available.

Additionally, I have provided an option for users to define custom sampling and training logic, as directly passing raw buffers to the training API may not always yield optimal results, especially for certain custom variable-length data types. This flexibility motivates the necessary adjustments to `pg_type`.

I would greatly appreciate your feedback or any additional suggestions you might have.

[1] https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Best regards,
Nikhil Veldanda
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Aleksander Alekseev
Date:
Hi Nikhil,

> Thank you for highlighting the previous discussion; I reviewed [1]
> closely. While both methods involve dictionary-based compression, the
> approach I'm proposing differs significantly.
>
> The previous method explicitly extracted string values from JSONB and
> assigned unique OIDs to each entry, resulting in distinct dictionary
> entries for every unique value. In contrast, this approach directly
> leverages Zstandard's dictionary training API. We provide raw data
> samples to Zstd, which generates a dictionary of a specified size.
> This dictionary is then stored in a catalog table and used to compress
> subsequent inserts for the specific attribute it was trained on.
>
> [...]

You didn't read closely enough, I'm afraid. As Tom pointed out, the title of the thread is misleading. On top of that there are several separate threads. I did my best to cross-reference them, but apparently didn't do a good enough job.

Initially I proposed to add the ZSON extension [1][2] to the PostgreSQL core. However, the idea evolved into TOAST improvements that don't require a user to use special types. You may also find the related "Pluggable TOASTer" discussion [3] interesting. The idea there was rather different, but the discussion about extending TOAST pointers so that in the future we can use something other than ZSTD is relevant.

You will find the recent summary of the reached agreements somewhere around this message [4]; take a look at the thread a bit above and below it.

I believe this effort is important. You can't, however, simply discard everything that was discussed in this area for the past several years. If you want to succeed, of course. No one will look at your patch if it doesn't account for all the previous discussions. I'm sorry, I know it's disappointing. This being said, you should have done better research before submitting the code. You could just ask if anyone was working on something like this before and save a lot of time.

Personally I would suggest starting with one little step toward compression dictionaries, particularly focusing on the extendability of TOAST pointers. You are going to need to store dictionary ids there and allow using other compression algorithms in the future. This will require something like a varint/utf8-like bitmask. See the previous discussions.

[1]: https://github.com/afiskon/zson
[2]: https://postgr.es/m/CAJ7c6TP3fCC9TNKJBQAcEf4c%3DL7XQZ7QvuUayLgjhNQMD_5M_A%40mail.gmail.com
[3]: https://postgr.es/m/224711f9-83b7-a307-b17f-4457ab73aa0a%40sigaev.ru
[4]: https://postgr.es/m/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com

--
Best regards,
Aleksander Alekseev
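For readers unfamiliar with the suggestion above: a "varint/utf8-like" encoding usually means something like LEB128, where small values (the common compression-method ids) take a single byte while larger values such as dictionary ids grow as needed. The sketch below is a generic illustration of that idea only; it is not tied to any agreed-upon TOAST pointer layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode v as an unsigned LEB128-style varint; returns bytes written (at most 5). */
static size_t
varint_encode(uint32_t v, uint8_t *out)
{
    size_t      n = 0;

    do
    {
        uint8_t     byte = v & 0x7F;

        v >>= 7;
        if (v != 0)
            byte |= 0x80;       /* continuation bit: more bytes follow */
        out[n++] = byte;
    } while (v != 0);

    return n;
}

/* Decode a varint starting at in; stores the value in *v, returns bytes consumed. */
static size_t
varint_decode(const uint8_t *in, uint32_t *v)
{
    size_t      n = 0;
    uint32_t    result = 0;
    int         shift = 0;

    do
    {
        result |= (uint32_t) (in[n] & 0x7F) << shift;
        shift += 7;
    } while (in[n++] & 0x80);

    *v = result;
    return n;
}
```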
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Aleksander Alekseev
Date:
Hi Robert,

> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?

Compression dictionaries give a good compression ratio (~50%) and also increase TPS a bit (5-10%) due to better buffer cache utilization. At least according to synthetic and not trustworthy benchmarks I did some years ago [1]. The result may be very dependent on the actual data of course, not to mention particular implementation of the idea.

[1]: https://github.com/afiskon/zson/blob/master/docs/benchmark.md

--
Best regards,
Aleksander Alekseev
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi,

I reviewed the discussions, and while most agreements focused on changes to the TOAST pointer, the design I propose requires no modifications to it. I've carefully considered the design choices made previously, and I recognize Zstd's clear advantages in compression efficiency and performance over algorithms like PGLZ and LZ4; we can integrate it without altering the existing TOAST pointer (varatt_external) structure.

By simply using the top two bits of the va_extinfo field (setting them to '11') in `varatt_external`, we can signal an alternative compression algorithm, clearly distinguishing new methods from legacy ones. The specific algorithm used would then be recorded in the va_cmp_alg field.

This approach addresses the issues raised in the summarized thread [1] and lets us leverage dictionaries for data that can stay in-line. While my initial patch includes modifications to the TOAST pointer due to a single dependency (pg_column_compression), those changes aren't strictly necessary; resolving that dependency separately would make the overall design even less intrusive.

Here's an illustrative structure:

```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Current compressed format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original size and compression method */
        char        va_data[FLEXIBLE_ARRAY_MEMBER];     /* Compressed data */
    }           va_compressed;
    struct                      /* Extended compression format */
    {
        uint32      va_header;
        uint32      va_tcinfo;
        uint32      va_cmp_alg;
        uint32      va_cmp_dictid;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;

typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External saved size (without header) and
                                 * compression method; '11' indicates new
                                 * compression methods */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
} varatt_external;
```

The decompression flow remains straightforward: once a datum is identified as external, we detoast it, then identify the compression algorithm using the `TOAST_COMPRESS_METHOD` macro, which refers to a varattrib_4b structure, not a TOAST pointer. We retrieve the compression algorithm from either va_tcinfo or va_cmp_alg based on the adjusted macros, and decompress accordingly.

In summary, integrating Zstandard into the TOAST framework in this minimally invasive way should yield substantial benefits.

[1] https://www.postgresql.org/message-id/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com

Best regards,
Nikhil Veldanda

On Fri, Mar 7, 2025 at 3:42 AM Aleksander Alekseev <aleksander@timescale.com> wrote:
> [...]
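To restate the bit trick described in this message in code form (the sentinel constant and helper name here are illustrative assumptions, not the patch's actual identifiers; only the existing top-two-bit encoding of va_extinfo and the varatt_external struct shown above are taken as given):

```c
#include <stdbool.h>

#define VARLENA_EXTSIZE_BITS    30      /* as in existing PostgreSQL headers */
#define TOAST_CMPID_EXTENDED    3       /* hypothetical: "11" => see va_cmp_alg */

/* Does this external TOAST pointer use one of the new compression methods? */
static inline bool
toast_pointer_uses_extended_compression(const varatt_external *toast_pointer)
{
    return (toast_pointer->va_extinfo >> VARLENA_EXTSIZE_BITS) == TOAST_CMPID_EXTENDED;
}
```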
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi all,

Attached is an updated version of the patch. Specifically, I've removed the changes related to the TOAST pointer structure. This proposal differs from earlier discussions on this topic [1], where extending the TOAST pointer was considered essential for enabling dictionary-based compression.

Key improvements introduced in this proposal:

1. No changes to the TOAST pointer: the existing TOAST pointer structure remains untouched, simplifying integration and minimizing potential disruptions.

2. Extensible design: the solution is structured to seamlessly incorporate future compression algorithms beyond zstd [2], providing greater flexibility and future-proofing.

3. Inline data compression with dictionary support: crucially, this approach supports dictionary-based compression for inline data. Dictionaries are highly effective for compressing small documents, providing substantial storage savings. Please refer to the attached image from the zstd README [2] for supporting evidence. Omitting dictionary-based compression for inline data would significantly reduce these benefits. For example, under the previous design constraints [3], if a 16KB document compressed down to 256 bytes using a dictionary, storing it inline would not have been feasible. The current proposal addresses this limitation, thereby fully leveraging dictionary-based compression.

I believe this solution effectively addresses the limitations identified in our earlier discussions [1][3]. I welcome any feedback or suggestions you might have.

References:
[1] https://www.postgresql.org/message-id/flat/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com
[2] https://github.com/facebook/zstd
[3] https://www.postgresql.org/message-id/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com

```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER];     /* Compressed data */
    }           va_compressed;
    struct                      /* Extended compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;
        uint32      va_cmp_alg;
        uint32      va_cmp_dictid;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;
```

The additional algorithm information and dictid are stored in varattrib_4b.

Best regards,
Nikhil Veldanda

On Fri, Mar 7, 2025 at 5:35 PM Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com> wrote:
> [...]
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Robert Haas
Date:
On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com> wrote:
>     struct                      /* Extended compression format */
>     {
>         uint32      va_header;
>         uint32      va_tcinfo;
>         uint32      va_cmp_alg;
>         uint32      va_cmp_dictid;
>         char        va_data[FLEXIBLE_ARRAY_MEMBER];
>     }           va_compressed_ext;
> } varattrib_4b;

First, thanks for sending along the performance results. I agree that those are promising.

Second, thanks for sending these design details. The idea of keeping dictionaries in pg_zstd_dictionaries literally forever doesn't seem very appealing, but I'm not sure what the other options are. I think we've established in previous work in this area that compressed values can creep into unrelated tables and inside records or other container types like ranges. Therefore, we have no good way of knowing when a dictionary is unreferenced and can be dropped. So in that sense your decision to keep them forever is "right," but it's still unpleasant. It would even be necessary to make pg_upgrade carry them over to new versions.

If we could make sure that compressed datums never leaked out into other tables, then tables could depend on dictionaries and dictionaries could be dropped when there were no longer any tables depending on them. But like I say, previous work suggested that this would be very difficult to achieve. However, without that, I imagine users generating new dictionaries regularly as the data changes and eventually getting frustrated that they can't get rid of the old ones.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: ZStandard (with dictionaries) compression support for TOAST compression
From
Nikhil Kumar Veldanda
Date:
Hi Robert,

Thank you for your response, and apologies for the delay in getting back
to you. You raised some important concerns in your reply; I've worked
hard to understand and hopefully address these two:

* Dictionary Cleanup via Dependency Tracking
* Addressing the Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)

Dictionary Cleanup via Dependency Tracking:

To address your question on how we can safely clean up unused
dictionaries, I've implemented a mechanism based on PostgreSQL's standard
dependency system (pg_depend); permit me to explain. When a Zstandard
dictionary is created for a table, we record a DEPENDENCY_NORMAL
dependency from the table to the dictionary. This ensures that when the
table is dropped, the corresponding entry is removed from the pg_depend
catalog. Users can then call the cleanup_unused_dictionaries() function
to remove any dictionaries that are no longer referenced by any table.

```
// create dependency
{
    ObjectAddress dictObj;
    ObjectAddress relation;

    ObjectAddressSet(dictObj, ZstdDictionariesRelationId, dictid);
    ObjectAddressSet(relation, RelationRelationId, relid);

    /* NORMAL dependency: relid → Dictionary */
    recordDependencyOn(&relation, &dictObj, DEPENDENCY_NORMAL);
}
```

Example: Consider two tables, each using its own Zstandard dictionary:

```
test=# \dt+
                                   List of tables
 Schema | Name  | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+-------+-------+----------+-------------+---------------+-------+-------------
 public | temp  | table | nikhilkv | permanent   | heap          | 16 kB |
 public | temp1 | table | nikhilkv | permanent   | heap          | 16 kB |
(2 rows)

// Dictionary dependencies
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16389 |        0 |       9946 |        1 |           0 | n
    1259 | 16394 |        0 |       9946 |        2 |           0 | n
(2 rows)

// the corresponding dictionaries:
test=# select * from pg_zstd_dictionaries;
 dictid | dict
--------+------------------------------------------------------------------
      1 | \x37a430ec71451a10091010df303333b3770a33f1783c1e8fc7e3f1783ccff3bcf7d442414141414141414141414141414141414141414
        14141414141a15028140a8542a15028140a85a2288aa2284a297d74e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1f1783c1e8fc7e3f1789ee779ef01
        0100000004000000080000004c6f72656d20697073756d20646f6c6f722073697420616d65742c20636f6e73656374657475722061646970697363696
        e6720656c69742e204c6f72656d2069
      2 | \x37a430ec7d1a933a091010df303333b3770a33f1783c1e8fc7e3f1783ccff3bcf7d442414141414141414141414141414141414141414
        14141414141a15028140a8542a15028140a85a2288aa2284a297d74e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1f1783c1e8fc7e3f1789ee779ef01
        0100000004000000080000004e696b68696c206b756d616e722076656c64616e64612c206973206f6b61792063616e6469646174652c2068652069732
        0696e2073656174746c65204e696b68696c20
(2 rows)
```

If cleanup_unused_dictionaries() is called while the dependencies still
exist, nothing is removed:

```
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           0
(1 row)
```

After dropping temp1, the associated dictionary becomes eligible for
cleanup:

```
test=# drop table temp1;
DROP TABLE
```
```
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           1
(1 row)
```

________________________________

Addressing the Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)

As compressed datums can be copied to other unrelated tables via CTAS,
INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I've introduced a
method, inheritZstdDictionaryDependencies. This method is invoked at the
end of such statements and ensures that any dictionary dependencies from
the source tables are copied to the destination table. We determine the
set of source tables using the relationOids field in PlannedStmt. This
guarantees that if compressed datums reference a zstd dictionary, the
destination table is marked as dependent on the dictionaries that the
source tables depend on, preventing premature cleanup by
cleanup_unused_dictionaries().

Example: Consider two tables, each of which has its own dictionary:

```
                                   List of tables
 Schema | Name  | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+-------+-------+----------+-------------+---------------+-------+-------------
 public | temp  | table | nikhilkv | permanent   | heap          | 16 kB |
 public | temp1 | table | nikhilkv | permanent   | heap          | 16 kB |
(2 rows)
```

Using CTAS (CREATE TABLE AS), one table is copied to another. In this
case, the compressed datums in the temp table are copied to copy_tbl.
Since the dictionary is now shared between the two tables, a dependency
on that dictionary is also established for the destination table. Even if
the original temp table is deleted and cleanup is triggered, the
dictionary will not be dropped because there remains an active dependency.

```
test=# create table copy_tbl as select * from temp;
SELECT 20

// dictid 1 is shared between two tables.
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16389 |        0 |       9946 |        1 |           0 | n
    1259 | 16404 |        0 |       9946 |        1 |           0 | n
    1259 | 16399 |        0 |       9946 |        3 |           0 | n
(3 rows)

// After dropping the temp table where dictid 1 was used to compress datums
test=# drop table temp;
DROP TABLE

// The dependency for the temp table is dropped.
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16404 |        0 |       9946 |        1 |           0 | n
    1259 | 16399 |        0 |       9946 |        3 |           0 | n
(2 rows)

// No dictionaries are deleted.
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           0
(1 row)
```

Once the new copy_tbl is also deleted, the dictionary can be dropped
because no dependency on it exists:

```
test=# drop table copy_tbl;
DROP TABLE

// The dictionary is then deleted.
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           1
(1 row)
```

Another example uses composite types, including a more complex scenario
involving two source tables.
```
// Create a base composite type with two text fields
test=# create type my_composite as (f1 text, f2 text);
CREATE TYPE

// Create a nested composite type that uses my_composite twice
test=# create type my_composite1 as (f1 my_composite, f2 my_composite);
CREATE TYPE

test=# \d my_composite
        Composite type "public.my_composite"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
 f1     | text |           |          |
 f2     | text |           |          |

test=# \d my_composite1
           Composite type "public.my_composite1"
 Column |     Type     | Collation | Nullable | Default
--------+--------------+-----------+----------+---------
 f1     | my_composite |           |          |
 f2     | my_composite |           |          |

// Sample table with ZSTD dictionary compression on text columns
test=# \d+ orders
                                              Table "public.orders"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 order_id    | integer |           |          |         | plain    |             |              |
 customer_id | integer |           |          |         | plain    |             |              |
 random1     | text    |           |          |         | extended | zstd        |              |
 random2     | text    |           |          |         | extended | zstd        |              |
Access method: heap

// Sample table with ZSTD dictionary compression on one of the text columns
test=# \d+ customers
                                            Table "public.customers"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 customer_id | integer |           |          |         | plain    |             |              |
 random3     | text    |           |          |         | extended | zstd        |              |
 random4     | text    |           |          |         | extended |             |              |
Access method: heap

// Check existing dictionaries: dictid 1 for random1, dictid 2 for random2, dictid 3 for the random3 attribute
test=# select dictid from pg_zstd_dictionaries;
 dictid
--------
      1
      2
      3
(3 rows)

// List all objects dependent on ZSTD dictionaries
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
   objid   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-----------+---------+-------+----------+------------+----------+-------------+---------
 orders    |    1259 | 16391 |        0 |       9946 |        1 |           0 | n
 orders    |    1259 | 16391 |        0 |       9946 |        2 |           0 | n
 customers |    1259 | 16396 |        0 |       9946 |        3 |           0 | n
(3 rows)

// Create a new table using the nested composite type.
// This copies compressed datums into temp1.
test=# create table temp1 as
       select ROW(
           ROW(random3, random4)::my_composite,
           ROW(random1, random2)::my_composite
       )::my_composite1
       from customers full outer join orders using (customer_id);
SELECT 51

test=# select objid::regclass, * from pg_depend where refclassid = 9946;
   objid   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-----------+---------+-------+----------+------------+----------+-------------+---------
 orders    |    1259 | 16391 |        0 |       9946 |        1 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        1 |           0 | n
 orders    |    1259 | 16391 |        0 |       9946 |        2 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        2 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        3 |           0 | n
 customers |    1259 | 16396 |        0 |       9946 |        3 |           0 | n
(6 rows)

// Drop the original source tables.
test=# drop table orders;
DROP TABLE
test=# drop table customers;
DROP TABLE
```

Even after dropping the orders and customers tables, temp1 still holds
references to the dictionaries:
```
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
 objid | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-------+---------+-------+----------+------------+----------+-------------+---------
 temp1 |    1259 | 16423 |        0 |       9946 |        1 |           0 | n
 temp1 |    1259 | 16423 |        0 |       9946 |        2 |           0 | n
 temp1 |    1259 | 16423 |        0 |       9946 |        3 |           0 | n
(3 rows)

// Attempt cleanup. No cleanup occurs, because the temp1 table still depends on the dictionaries.
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           0
(1 row)

test=# select dictid from pg_zstd_dictionaries;
 dictid
--------
      1
      2
      3
(3 rows)

// Drop the destination table
test=# drop table temp1;
DROP TABLE

// Confirm no remaining dependencies
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
 objid | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-------+---------+-------+----------+------------+----------+-------------+---------
(0 rows)

// Cleanup now succeeds
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           3
(1 row)

test=# select dictid from pg_zstd_dictionaries;
 dictid
--------
(0 rows)
```

This design ensures that:

* Dictionaries are only deleted when no table depends on them.
* We avoid the costly decompression/recompression that would otherwise be
  needed to prevent compressed datum leakage.
* We don't retain dictionaries forever.

These changes are the core additions in this revision of the patch and
address the concerns around long-lived dictionaries and compressed datum
leakage. Additionally, this update incorporates feedback by enabling
automatic zstd dictionary generation and cleanup during the VACUUM
process, and includes changes to support copying ZSTD dictionaries during
pg_upgrade. (Illustrative sketches of the cleanup and
dependency-inheritance logic follow after this message.)

Patch summary:

* v11-0001-varattrib_4b-changes-and-macros-update-needed-to.patch
  Refactors the varattrib_4b structures and updates related macros to
  enable ZSTD dictionary support.
* v11-0002-Zstd-compression-and-decompression-routines-incl.patch
  Adds ZSTD compression and decompression routines, and introduces a new
  catalog to store dictionary metadata.
* v11-0003-Zstd-dictionary-training-process.patch
  Implements the dictionary training workflow. Includes built-in support
  for text and jsonb types, and allows users to define custom sampling
  functions per type by specifying a C function name in the
  pg_type.typzstdsampling field.
* v11-0004-Dependency-tracking-mechanism-to-track-compresse.patch
  Introduces a dependency tracking mechanism using pg_depend to record
  which ZSTD dictionaries a table depends on. When compressed datums that
  rely on a dictionary are copied to unrelated target tables, the
  corresponding dictionary dependencies from the source tables are also
  recorded for the target table, ensuring the dictionaries are not
  prematurely cleaned up.
* v11-0005-generate-and-cleanup-dictionaries-using-vacuum.patch
  Integrates with VACUUM to automatically generate and clean up ZSTD
  dictionaries.
* v11-0006-pg_dump-pg_upgrade-needed-changes-to-support-new.patch
  Extends pg_dump and pg_upgrade to support migrating ZSTD dictionaries
  and their dependencies during pg_upgrade.
* v11-0007-Some-tests-related-to-zstd-dictionary-based-comp.patch
  Provides test coverage for ZSTD dictionary-based compression features,
  including training, usage, and cleanup.

I hope these changes address your concerns; any thoughts or suggestions on
this approach are welcome.
Best regards,
Nikhil Veldanda

On Mon, Mar 17, 2025 at 1:03 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
> > struct /* Extended compression format */
> > {
> >     uint32 va_header;
> >     uint32 va_tcinfo;
> >     uint32 va_cmp_alg;
> >     uint32 va_cmp_dictid;
> >     char   va_data[FLEXIBLE_ARRAY_MEMBER];
> > } va_compressed_ext;
> > } varattrib_4b;
>
> First, thanks for sending along the performance results. I agree that
> those are promising. Second, thanks for sending these design details.
>
> The idea of keeping dictionaries in pg_zstd_dictionaries literally
> forever doesn't seem very appealing, but I'm not sure what the other
> options are. I think we've established in previous work in this area
> that compressed values can creep into unrelated tables and inside
> records or other container types like ranges. Therefore, we have no
> good way of knowing when a dictionary is unreferenced and can be
> dropped. So in that sense your decision to keep them forever is
> "right," but it's still unpleasant. It would even be necessary to make
> pg_upgrade carry them over to new versions.
>
> If we could make sure that compressed datums never leaked out into
> other tables, then tables could depend on dictionaries and
> dictionaries could be dropped when there were no longer any tables
> depending on them. But like I say, previous work suggested that this
> would be very difficult to achieve. However, without that, I imagine
> users generating new dictionaries regularly as the data changes and
> eventually getting frustrated that they can't get rid of the old ones.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
Attachment
- v11-0005-generate-and-cleanup-dictionaries-using-vacuum.patch
- v11-0003-Zstd-dictionary-training-process.patch
- v11-0007-Some-tests-related-to-zstd-dictionary-based-comp.patch
- v11-0006-pg_dump-pg_upgrade-needed-changes-to-support-new.patch
- v11-0004-Dependency-tracking-mechanism-to-track-compresse.patch
- v11-0001-varattrib_4b-changes-and-macros-update-needed-to.patch
- v11-0002-Zstd-compression-and-decompression-routines-incl.patch
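The cleanup logic described in the preceding message reduces to one rule: a dictionary row may be removed only when no pg_depend entry references it. The following standalone C sketch models that rule with in-memory arrays standing in for pg_zstd_dictionaries and pg_depend; the model_* names and structures are assumptions for illustration, not the patch's code.

```
/*
 * Illustrative sketch only: a standalone model of the
 * cleanup_unused_dictionaries() rule described above.
 */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Oid;

typedef struct { Oid dictid; bool deleted; } ModelDictionary;
typedef struct { Oid dependent_relid; Oid referenced_dictid; } ModelDepend;

/* Two dictionaries; only dictid 1 is still referenced by a relation. */
static ModelDictionary dicts[] = { {1, false}, {2, false} };
static ModelDepend depends[] = { {16389, 1} };

static int
model_cleanup_unused_dictionaries(void)
{
    int ndicts = (int) (sizeof(dicts) / sizeof(dicts[0]));
    int ndeps = (int) (sizeof(depends) / sizeof(depends[0]));
    int removed = 0;

    for (int i = 0; i < ndicts; i++)
    {
        bool referenced = false;

        for (int j = 0; j < ndeps; j++)
        {
            if (depends[j].referenced_dictid == dicts[i].dictid)
            {
                referenced = true;
                break;
            }
        }

        if (!referenced && !dicts[i].deleted)
        {
            /* The real code would delete the catalog row here. */
            dicts[i].deleted = true;
            removed++;
        }
    }
    return removed;
}

int
main(void)
{
    printf("dictionaries removed: %d\n", model_cleanup_unused_dictionaries());
    return 0;
}
```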
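Likewise, the inheritZstdDictionaryDependencies idea amounts to copying each source relation's dictionary dependencies onto the destination relation after a CTAS or INSERT ... SELECT, deduplicating along the way. Below is a minimal standalone sketch of that copying step, again with hypothetical names and an in-memory stand-in for pg_depend rather than the patch's actual implementation.

```
/*
 * Illustrative sketch only: a standalone model of copying dictionary
 * dependencies from source relations to a CTAS/INSERT ... SELECT target.
 */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Oid;

typedef struct { Oid dependent_relid; Oid referenced_dictid; } ModelDepend;

#define MAX_DEPENDS 32

static ModelDepend depends[MAX_DEPENDS] = {
    {16389, 1},                 /* source table "temp" depends on dictid 1 */
};
static int ndepends = 1;

static bool
model_has_depend(Oid relid, Oid dictid)
{
    for (int i = 0; i < ndepends; i++)
        if (depends[i].dependent_relid == relid &&
            depends[i].referenced_dictid == dictid)
            return true;
    return false;
}

/* Copy dictionary dependencies from each source relation to the target. */
static void
model_inherit_dict_dependencies(const Oid *source_relids, int nsources,
                                Oid target_relid)
{
    for (int s = 0; s < nsources; s++)
    {
        int existing = ndepends;        /* skip rows added below */

        for (int i = 0; i < existing; i++)
        {
            if (depends[i].dependent_relid != source_relids[s])
                continue;
            if (model_has_depend(target_relid, depends[i].referenced_dictid))
                continue;               /* deduplicate */
            if (ndepends < MAX_DEPENDS)
                depends[ndepends++] = (ModelDepend){target_relid,
                                                    depends[i].referenced_dictid};
        }
    }
}

int
main(void)
{
    /* e.g. the relationOids gathered from the planned statement */
    Oid sources[] = {16389};

    model_inherit_dict_dependencies(sources, 1, 16404);    /* copy_tbl */
    printf("dependency rows now: %d\n", ndepends);
    return 0;
}
```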
On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
> Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)
>
> As compressed datums can be copied to other unrelated tables via CTAS,
> INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
> method inheritZstdDictionaryDependencies. This method is invoked at
> the end of such statements and ensures that any dictionary
> dependencies from source tables are copied to the destination table.
> We determine the set of source tables using the relationOids field in
> PlannedStmt.

With the disclaimer that I haven't opened the patch or thought terribly
deeply about this issue, at least not yet, my fairly strong suspicion is
that this design is not going to work out, for multiple reasons. In no
particular order:

1. I don't think users will like it if dependencies on a zstd dictionary
spread like kudzu across all of their tables. I don't think they'd like
it even if it were 100% accurate, but presumably this is going to add
dependencies any time there MIGHT be a real dependency rather than only
when there actually is one.

2. Inserting into a table or updating it only takes RowExclusiveLock,
which is not even self-exclusive. I doubt that it's possible to change
system catalogs in a concurrency-safe way with such a weak lock. For
instance, if two sessions tried to do the same thing in concurrent
transactions, they could both try to add the same dependency at the same
time.

3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
TABLE...EXECUTE are the only ways that datums can creep from one table
into another. For example, what if I create a plpgsql function that gets
a value from one table and stores it in a variable, and then use that
variable to drive an INSERT into another table? I seem to recall there
are complex cases involving records and range types and arrays, too,
where the compressed object gets wrapped inside of another object; though
maybe that wouldn't matter to your implementation if INSERT INTO ...
SELECT uses a sufficiently aggressive strategy for adding dependencies.

When Dilip and I were working on lz4 TOAST compression, my first instinct
was to not let LZ4-compressed datums leak out of a table by forcing them
to be decompressed (and then possibly recompressed). We spent a long time
trying to make that work before giving up. I think this is approximately
where things started to unravel, and I'd suggest you read both this
message and some of the discussion before and after:

https://www.postgresql.org/message-id/20210316185455.5gp3c5zvvvq66iyj@alap3.anarazel.de

I think we could add plain-old zstd compression without really tackling
this issue, but if we are going to add dictionaries then I think we might
need to revisit the idea of preventing things from leaking out of tables.
What I can't quite remember at the moment is how much of the problem was
that it was going to be slow to force the recompression, and how much of
it was that we weren't sure we could even find all the places in the code
that might need such handling.

I'm now also curious to know whether Andres would agree that it's bad if
zstd dictionaries are un-droppable. After all, I thought it would be bad
if there was no way to eliminate a dependency on a compression method,
and he disagreed. So maybe he would also think undroppable dictionaries
are fine. But maybe not.
It seems even worse to me than undroppable compression methods, because
you'll probably not have that many compression methods ever, but you
could have a large number of dictionaries eventually.

--
Robert Haas
EDB: http://www.enterprisedb.com
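The alternative alluded to above, forcing dictionary-compressed datums to be decompressed (and possibly recompressed without a dictionary) before they are stored into an unrelated table, can be modeled in a few lines. This is only an illustration of the idea under discussion, not PostgreSQL code; the types and function names are invented for the sketch.

```
/*
 * Illustrative sketch only: a standalone model of flattening a
 * dictionary-compressed datum before it can "leak" into another table.
 */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Oid;

typedef struct
{
    Oid  compressed_with_dictid;    /* 0 means "no dictionary" */
    Oid  owning_relid;              /* relation whose dictionary was used */
    bool is_compressed;
} ModelDatum;

/* Stand-ins for the real decompress/compress paths. */
static void
model_decompress(ModelDatum *d)
{
    d->is_compressed = false;
    d->compressed_with_dictid = 0;
}

static void
model_compress_without_dict(ModelDatum *d, Oid target_relid)
{
    d->is_compressed = true;
    d->compressed_with_dictid = 0;
    d->owning_relid = target_relid;
}

/*
 * Called before storing a datum into target_relid: if the datum was
 * compressed with another relation's dictionary, flatten it so the
 * dictionary reference cannot escape the source table.
 */
static void
model_prepare_for_store(ModelDatum *d, Oid target_relid)
{
    if (d->is_compressed &&
        d->compressed_with_dictid != 0 &&
        d->owning_relid != target_relid)
    {
        model_decompress(d);
        model_compress_without_dict(d, target_relid);  /* optional recompression */
    }
}

int
main(void)
{
    ModelDatum d = { .compressed_with_dictid = 1, .owning_relid = 16389,
                     .is_compressed = true };

    model_prepare_for_store(&d, 16404);    /* inserting into an unrelated table */
    printf("dictid after store path: %u\n", d.compressed_with_dictid);
    return 0;
}
```

The cost concern raised above is exactly the extra decompress/recompress cycle this sketch performs on every cross-table copy; the coverage concern is whether every code path that moves a datum between relations can be made to call such a hook.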