Re: ZStandard (with dictionaries) compression support for TOAST compression - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: ZStandard (with dictionaries) compression support for TOAST compression |
Date | |
Msg-id | CA+Tgmob1Ux7R2_0mbZmTbsVmUda04X1AdHGjpu6FkZfdbztugQ@mail.gmail.com |
In response to | Re: ZStandard (with dictionaries) compression support for TOAST compression (Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>) |
List | pgsql-hackers |
On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com> wrote:
> Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)
>
> As compressed datums can be copied to other unrelated tables via CTAS,
> INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
> method inheritZstdDictionaryDependencies. This method is invoked at
> the end of such statements and ensures that any dictionary
> dependencies from source tables are copied to the destination table.
> We determine the set of source tables using the relationOids field in
> PlannedStmt.

With the disclaimer that I haven't opened the patch or thought terribly deeply about this issue, at least not yet, my fairly strong suspicion is that this design is not going to work out, for multiple reasons. In no particular order:

1. I don't think users will like it if dependencies on a zstd dictionary spread like kudzu across all of their tables. I don't think they'd like it even if it were 100% accurate, but presumably this is going to add dependencies any time there MIGHT be a real dependency rather than only when there actually is one.

2. Inserting into a table or updating it only takes RowExclusiveLock, which is not even self-exclusive. I doubt that it's possible to change system catalogs in a concurrency-safe way with such a weak lock. For instance, if two sessions tried to do the same thing in concurrent transactions, they could both try to add the same dependency at the same time.

3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE TABLE...EXECUTE are the only ways that datums can creep from one table into another. For example, what if I create a plpgsql function that gets a value from one table and stores it in a variable, and then use that variable to drive an INSERT into another table? I seem to recall there are complex cases involving records and range types and arrays, too, where the compressed object gets wrapped inside of another object; though maybe that wouldn't matter to your implementation if INSERT INTO ... SELECT uses a sufficiently aggressive strategy for adding dependencies.

When Dilip and I were working on lz4 TOAST compression, my first instinct was to not let LZ4-compressed datums leak out of a table by forcing them to be decompressed (and then possibly recompressed). We spent a long time trying to make that work before giving up. I think this is approximately where things started to unravel, and I'd suggest you read both this message and some of the discussion before and after:

https://www.postgresql.org/message-id/20210316185455.5gp3c5zvvvq66iyj@alap3.anarazel.de

I think we could add plain-old zstd compression without really tackling this issue, but if we are going to add dictionaries then I think we might need to revisit the idea of preventing things from leaking out of tables. What I can't quite remember at the moment is how much of the problem was that it was going to be slow to force the recompression, and how much of it was that we weren't sure we could even find all the places in the code that might need such handling.

I'm now also curious to know whether Andres would agree that it's bad if zstd dictionaries are un-droppable. After all, I thought it would be bad if there was no way to eliminate a dependency on a compression method, and he disagreed. So maybe he would also think undroppable dictionaries are fine. But maybe not. It seems even worse to me than undroppable compression methods, because you'll probably not have that many compression methods ever, but you could have a large number of dictionaries eventually.

--
Robert Haas
EDB: http://www.enterprisedb.com