Thread: Zstandard support for toast compression
Hi all,

Toast compression is supported for LZ4, and thanks to the refactoring work done with compression methods assigned to an attribute, adding support for more methods is straightforward, as long as we don't support more than 4 methods, as the compression ID is stored within the first 2 bits of the raw length. Do we have an argument against supporting zstd for this stuff? Zstandard compresses a bit more than LZ4 at the cost of some extra CPU, easily outclassing pglz, but those facts are known, and zstd has benefits over LZ4 when one is ready to pay more CPU for the extra compression.

It took me a couple of hours to get that done. I have not added any tests for pg_dump or cross-checks with the default compression methods, as this is basically what compression.sql already does, so this patch includes a minimum to cover compression, decompression and slice decompression.

Another thing is that the errors generated by SET default_toast_compression make the output build-dependent, and that becomes annoying once there is more than one compile-time compression option. The attached removes those cases for simplicity, and perhaps we'd better remove all the LZ4-only tests from compression.sql as well.

ZSTD_decompress() does not allow the use of a destination buffer smaller than the full decompressed size, but similarly to base backups, streams seem to handle the case of slices fine.

Thoughts?
--
Michael
Attachment
On Tue, May 17, 2022 at 12:19 AM Michael Paquier <michael@paquier.xyz> wrote:
> Toast compression is supported for LZ4, and thanks to the refactoring
> work done with compression methods assigned to an attribute, adding
> support for more methods is straight-forward, as long as we don't
> support more than 4 methods as the compression ID is stored within the
> first 2 bits of the raw length.

Yeah - I think we had better reserve the fourth bit pattern for something extensible, e.g. another byte or several to specify the actual method, so that we don't have a hard limit of 4 methods. But even with such a system, the first 3 methods will always and forever be privileged over all others, so we'd better not make the mistake of adding something silly as our third algorithm.

I don't particularly have anything against adding Zstandard compression here, but I wonder whether there's any rush. If we decide not to add this now, we can always change our minds and add it later, but if we decide to add it now, there's no backing it out. I'd probably be inclined to wait and see if our public demands it of us.

--
Robert Haas
EDB: http://www.enterprisedb.com
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Tue, May 17, 2022 at 12:19 AM Michael Paquier <michael@paquier.xyz> wrote:
> > Toast compression is supported for LZ4, and thanks to the refactoring
> > work done with compression methods assigned to an attribute, adding
> > support for more methods is straight-forward, as long as we don't
> > support more than 4 methods as the compression ID is stored within the
> > first 2 bits of the raw length.
>
> Yeah - I think we had better reserve the fourth bit pattern for
> something extensible e.g. another byte or several to specify the
> actual method, so that we don't have a hard limit of 4 methods. But
> even with such a system, the first 3 methods will always and forever
> be privileged over all others, so we'd better not make the mistake of
> adding something silly as our third algorithm.

In such a situation, would they really end up being properly distinct when it comes to what our users see..? I wouldn't really think so.

> I don't particularly have anything against adding Zstandard
> compression here, but I wonder whether there's any rush. If we decide
> not to add this now, we can always change our minds and add it later,
> but if we decide to add it now, there's no backing it out. I'd
> probably be inclined to wait and see if our public demands it of us.

If anything, this strikes me as a reason to question using a bit for LZ4 and not a mark against Zstd. Still though, there seems like a clear path to having more than 4 when we get demand for it, and here's a patch for what is pretty clearly one of the better compression methods out there today.

As another point, while pgbackrest supports gzip, lz4, zstd, and bzip2, where it's supported, zstd seems to be the most used. We had gzip first, as zstd wasn't really a proper thing at the time, and lz4 for speed. Bzip2 was added more because it was easy to do and of some interest on systems that didn't have zstd, but I wouldn't support adding it to PG, as I'd hope that nearly all systems where v16 is deployed will have Zstd support.

+1 for adding Zstd from me.

Thanks,

Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> Yeah - I think we had better reserve the fourth bit pattern for
>> something extensible e.g. another byte or several to specify the
>> actual method, so that we don't have a hard limit of 4 methods. But
>> even with such a system, the first 3 methods will always and forever
>> be privileged over all others, so we'd better not make the mistake of
>> adding something silly as our third algorithm.

> In such a situation, would they really end up being properly distinct
> when it comes to what our users see..? I wouldn't really think so.

It should be transparent to users, sure, but the point is that the first three methods will have a storage space advantage over others. Plus we'd have to do some actual work to create that extension mechanism.

I'm with Robert in that I do not see any urgency to add another method. The fact that Stephen is already questioning whether LZ4 should have been added first is not making me any more eager to jump here. Compression methods come, and they go, and we do not serve anyone's interest by being early adopters.

			regards, tom lane
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Robert Haas (robertmhaas@gmail.com) wrote:
> >> Yeah - I think we had better reserve the fourth bit pattern for
> >> something extensible e.g. another byte or several to specify the
> >> actual method, so that we don't have a hard limit of 4 methods. But
> >> even with such a system, the first 3 methods will always and forever
> >> be privileged over all others, so we'd better not make the mistake of
> >> adding something silly as our third algorithm.
>
> > In such a situation, would they really end up being properly distinct
> > when it comes to what our users see..? I wouldn't really think so.
>
> It should be transparent to users, sure, but the point is that the
> first three methods will have a storage space advantage over others.
> Plus we'd have to do some actual work to create that extension mechanism.
>
> I'm with Robert in that I do not see any urgency to add another method.
> The fact that Stephen is already questioning whether LZ4 should have
> been added first is not making me any more eager to jump here.
> Compression methods come, and they go, and we do not serve anyone's
> interest by being early adopters.

I'm getting a bit of deja-vu here from when I was first trying to add TRUNCATE as a GRANT'able option and being told we didn't want to burn those precious bits.

But, fine, then I'd suggest to Michael that he work on actively solving the problem we've now got where we have such a limited number of bits, and then come back and add Zstd after that's done.

I disagree that we should be pushing back so hard on adding Zstd in general, but if we are going to demand that we have a way to support more than these few compression options before ever adding any new ones (considering how long it's taken Zstd to get to the level it is now, we're talking about close to a *decade* from such a new algorithm showing up and getting to a similar level of adoption, and then apparently more because we don't feel it's 'ready' yet), then let's work towards that and not complain when it shows up that it's not needed yet (as I fear would happen ... and just leave us unable to make useful progress).

Thanks,

Stephen
Attachment
On Tue, May 17, 2022 at 04:12:14PM -0400, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I'm with Robert in that I do not see any urgency to add another method.

Okay.

>> The fact that Stephen is already questioning whether LZ4 should have
>> been added first is not making me any more eager to jump here.
>> Compression methods come, and they go, and we do not serve anyone's
>> interest by being early adopters.

FWIW, I don't really question the choice of LZ4 as an alternative to pglz. One very easily outclasses the other, guess which one. Perhaps we would have gone with zstd back in the day, but here we are, and this option is already very good in itself. Zstandard may not be old enough to vote, being only 7, but its use is already quite widespread, so I would not be surprised if it remains popular for many years. We'll see how it goes.

> But, fine, then I'd suggest to Michael that he work on actively solving
> the problem we've now got where we have such a limited number of bits,
> and then come back and add Zstd after that's done. I disagree that we
> should be pushing back so hard on adding Zstd in general, but if we are
> going to demand that we have a way to support more than these few
> compression options before ever adding any new ones, then let's work
> towards that and not complain when it shows up that it's not needed yet.

That said, I agree with the point that we should not set in stone the 4th bit used in the toast compression header, and that it would be better to use it for a more extensible design. Didn't the proposal to introduce the custom compression mechanisms actually touch this area?

The set of macros we have currently for the toast values in a varlena are already kind of hard to figure out. Making that harder to parse would definitely not be appealing.

--
Michael
Attachment
On Tue, May 17, 2022 at 4:12 PM Stephen Frost <sfrost@snowman.net> wrote:
> I'm getting a bit of deja-vu here from when I was first trying to add
> TRUNCATE as a GRANT'able option and being told we didn't want to burn
> those precious bits.

Right, it's the same issue ... although in that case there are a lot more bits available than we have here.

> But, fine, then I'd suggest to Michael that he work on actively solving
> the problem we've now got where we have such a limited number of bits,
> and then come back and add Zstd after that's done. I disagree that we
> should be pushing back so hard on adding Zstd in general, but if we are
> going to demand that we have a way to support more than these few
> compression options before ever adding any new ones, then let's work
> towards that and not complain when it shows up that it's not needed yet
> (as I fear would happen ... and just leave us unable to make useful
> progress).

It's kind of ridiculous to talk about "pushing back so hard on adding Zstd in general" when there's like 2 emails expressing only moderate skepticism. I clearly said I wasn't 100% against it.

But I want to point out here that you haven't really offered any kind of argument in favor of supporting Zstd. You basically seem to just be arguing that it's dumb to worry about running out of bit space, and I think that's just obviously false. PostgreSQL is full of things that are hard to improve because nearly all of the bit space was gobbled up early on, and there's not much left for future features. The heap tuple header format is an excellent example of this. Surely if we were designing that over again today we wouldn't have expended some of those bits on the things we did.

I do understand that Zstandard is a good compression algorithm, and if we had an extensibility mechanism here where one of the four possible bit patterns indicates that the next byte (or two or four) stores the real algorithm type, then what about adding Zstandard that way instead of consuming one of our four primary bit patterns? That way we'd have this option for people who want it, but we'd have more options for the future instead of fewer.

i.e. something like:

00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte (1 of 256 possible values reserved for Zstandard)

I wouldn't be worried about getting backed into a corner with that approach.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, May 17, 2022 at 02:54:28PM -0400, Robert Haas wrote:
> I don't particularly have anything against adding Zstandard
> compression here, but I wonder whether there's any rush. If we decide
> not to add this now, we can always change our minds and add it later,
> but if we decide to add it now, there's no backing it out. I'd
> probably be inclined to wait and see if our public demands it of us.

+1

One consideration is that zstd with negative compression levels is comparable to LZ4, and with positive levels gets better compression. It can serve both purposes (oltp vs DW, or storage-limited vs cpu-limited). If zstd is supported, then for sure at least its compression level should be configurable. default_toast_compression should support it.
https://commitfest.postgresql.org/35/3102/

Also, zstd is a few years newer than lz4. Which I hope means that the API is a bit better/further advanced - but (as we've seen) may still be evolving.

Zstd allows some of its options to be set by environment variable - in particular, the number of threads. We should consider explicitly setting that to zero in the toast context unless we're convinced it's no issue for every backend (not just basebackup).

--
Justin
On Wed, May 18, 2022 at 9:17 AM Robert Haas <robertmhaas@gmail.com> wrote:
> But I want to point out here that you haven't really offered any kind
> of argument in favor of supporting Zstd. You basically seem to just be
> arguing that it's dumb to worry about running out of bit space, and I
> think that's just obviously false.

+1

--
Peter Geoghegan
On Wed, May 18, 2022 at 12:17:16PM -0400, Robert Haas wrote:
> i.e. something like:
>
> 00 = PGLZ
> 01 = LZ4
> 10 = reserved for future emergencies
> 11 = extended header with additional type byte (1 of 256 possible
> values reserved for Zstandard)

Btw, shouldn't we have something a bit more, err, extensible for the design of an extensible varlena header? If we keep it down to some bitwise information, we'd be fine for a long time, but it would be annoying to have to review an extended design again if we need to extend it with more data.

--
Michael
Attachment
On Wed, May 18, 2022 at 9:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> I do understand that Zstandard is a good compression algorithm, and if
> we had an extensibility mechanism here where one of the four possible
> bit patterns then indicates that the next byte (or two or four) stores
> the real algorithm type, then what about adding Zstandard that way
> instead of consuming one of our four primary bit patterns? That way
> we'd have this option for people who want it, but we'd have more
> options for the future instead of fewer.
>
> i.e. something like:
>
> 00 = PGLZ
> 01 = LZ4
> 10 = reserved for future emergencies
> 11 = extended header with additional type byte (1 of 256 possible
> values reserved for Zstandard)

+1 for such an extensible mechanism if we decide to go with the Zstandard compression algorithm. To decide that, won't it make sense to see some numbers, as Michael already has a patch for the new algorithm?

--
With Regards,
Amit Kapila.
In a message of Tuesday, May 17, 2022 23:01:07 MSK, Tom Lane wrote:

Hi! I came to this branch looking for a patch to review, but I guess I would join the discussion instead of reading the code.

> >> Yeah - I think we had better reserve the fourth bit pattern for
> >> something extensible e.g. another byte or several to specify the
> >> actual method, so that we don't have a hard limit of 4 methods. But
> >> even with such a system, the first 3 methods will always and forever
> >> be privileged over all others, so we'd better not make the mistake of
> >> adding something silly as our third algorithm.
>
> > In such a situation, would they really end up being properly distinct
> > when it comes to what our users see..? I wouldn't really think so.
>
> It should be transparent to users, sure, but the point is that the
> first three methods will have a storage space advantage over others.
> Plus we'd have to do some actual work to create that extension mechanism.

Postgres is well known for extensibility. One can write one's own implementation of almost everything and make it an extension. One would hardly need more than one (or two) additional compression methods, but which method one will really need is hard to say.

So I guess it would be much better to create an API for creating and registering one's own compression methods, and create a built-in Zstd compression method that can be used (or optionally not used) via that API. Moreover, I guess this API (maybe with some modification) could be used for seamless data encryption, for example.

So I guess it would be better to make it extensible from the start and use this precious bit for a compression method chosen by the user, and maybe later extend it with another byte of compression method bits, if it is needed.

--
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su
Attachment
On Thu, May 19, 2022 at 4:20 AM Michael Paquier <michael@paquier.xyz> wrote:
> Btw, shouldn't we have something a bit more, err, extensible for the
> design of an extensible varlena header? If we keep it down to some
> bitwise information, we'd be fine for a long time but it would be
> annoying to review again an extended design if we need to extend it
> with more data.

What do you have in mind?

--
Robert Haas
EDB: http://www.enterprisedb.com
Greetings,

* Nikolay Shaplov (dhyan@nataraj.su) wrote:
> Hi! I came to this branch looking for a patch to review, but I guess I would
> join the discussion instead of reading the code.

Seems that's what would be helpful now; thanks for joining the discussion.

> Postgres is well known for extensibility. One can write one's own
> implementation of almost everything and make it an extension.
> One would hardly need more than one (or two) additional compression
> methods, but which method one will really need is hard to say.

A thought I've had before is that it'd be nice to specify a particular compression method on a data type basis. That wasn't the direction this was taken in, for various reasons, but I wonder about perhaps still having a data type compression method, and perhaps one of these bits might be "the data type's (default?) compression method". Even so though, having an extensible way to add new compression methods would be a good thing.

For compression methods that we already support in other parts of the system, it seems clear that we should allow those to be used for column compression too. We should certainly also support a way to specify, on a compression-type-specific level, what the compression level should be.

> So I guess it would be much better to create an API for creating and
> registering one's own compression methods, and create a built-in Zstd
> compression method that can be used (or optionally not used) via that API.

While I generally agree that we want to provide extensibility in this area, given that we already have zstd as a compile-time option and it exists in other parts of the system, I don't think it makes sense to require users to install an extension to use it.

> Moreover, I guess this API (maybe with some modification) could be used for
> seamless data encryption, for example.

Perhaps, but this kind of encryption wouldn't allow indexing, and certainly lots of other metadata would still be unencrypted (the entire system catalog being a good example).

Thanks,

Stephen
Attachment
On Thu, May 19, 2022 at 04:12:01PM -0400, Robert Haas wrote:
> On Thu, May 19, 2022 at 4:20 AM Michael Paquier <michael@paquier.xyz> wrote:
>> Btw, shouldn't we have something a bit more, err, extensible for the
>> design of an extensible varlena header? If we keep it down to some
>> bitwise information, we'd be fine for a long time but it would be
>> annoying to review again an extended design if we need to extend it
>> with more data.
>
> What do you have in mind?

A per-varlena checksum was one thing that came into my mind.

--
Michael
Attachment
On Mon, May 23, 2022 at 12:33 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, May 19, 2022 at 04:12:01PM -0400, Robert Haas wrote:
> > What do you have in mind?
>
> A per-varlena checksum was one thing that came into my mind.

It's a bit hard for me to believe that such a thing would be desirable. I think it makes more sense to checksum blocks than datums, because:

(1) There might be a lot of really small datums, and storing checksums for all of them could be costly, or
(2) The datums could on the other hand be really big, and then the checksum is pretty non-specific about where the problem has happened.

YMMV, of course.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, May 20, 2022 at 4:17 PM Stephen Frost <sfrost@snowman.net> wrote:
> A thought I've had before is that it'd be nice to specify a particular
> compression method on a data type basis. Wasn't the direction that this
> was taken, for reasons, but I wonder about perhaps still having a data
> type compression method and perhaps one of these bits might be "the data
> type's (default?) compression method". Even so though, having an
> extensible way to add new compression methods would be a good thing.

If we look at pglz vs. LZ4, there's no argument that it makes more sense to use LZ4 for some data types and PGLZ for others. Indeed, it's unclear why you would ever use PGLZ if you had LZ4 as an option. Even if we imagine a world in which we had a full spectrum of modern compressors - Zstandard, bzip2, gzip, and whatever else you want - it's basically a time/space tradeoff: you will either want a fast compressor or a good one.

The situation in which this sort of thing might make sense is if we had a compressor that is specifically designed to work well on a certain data type, and especially if the code for that data type could perform some operations directly on the compressed representation. From what I understand, the ideas that people have in this area around jsonb require that there be a dictionary available. For instance, you might scan a jsonb column, collect all the keys that occur frequently, put them in a dictionary, and then use them to compress the column. I can see that being effective, but the infrastructure to store that dictionary someplace is infrastructure we have not got. It may be better to try to handle these use cases by building the compression into the data type representation proper, perhaps disabling the general-purpose TOAST compression stuff, rather than by making it part of TOAST compression.

We found during the implementation of LZ4 TOAST compression that it's basically impossible to keep a compressed datum from "leaking out" into other parts of the system. We have to assume that any datum we create by TOAST compression may continue to exist somewhere in the system long after the table in which it was originally stored is gone. So, while a dictionary could be used for compression, it would have to be done in a way where that dictionary wasn't required to decompress, unless we're prepared to prohibit ever dropping a dictionary, which sounds like not a lot of fun. If the compression were part of the data type instead of part of TOAST compression, we would dodge this problem. I think that might be a better way to go.

--
Robert Haas
EDB: http://www.enterprisedb.com
In a message of Friday, May 20, 2022 23:17:42 MSK, Stephen Frost wrote:
> While I generally agree that we want to provide extensibility in this
> area, given that we already have zstd as a compile-time option and it
> exists in other parts of the system, I don't think it makes sense to
> require users to install an extension to use it.

I mean that there can be a Compression Method Provider, either built into the postgres core or implemented in an extension, and one will need to create a compression method using the desired Compression Method Provider. Like (it is just pure imagination):

CREATE COMPRESSION METHOD MY_COMPRESSION USING 'my_compression_provider';

This will assign a certain bit combination to that method, and later one can use that method for TOAST compression...

> > Moreover I guess this API (may be with some modification) can be used for
> > seamless data encryption, for example.
>
> Perhaps.. but this kind of encryption wouldn't allow indexing

Yes, this will require some more effort. But for encrypting pure storage that API can be quite useful.

> and certainly lots of other metadata would still be unencrypted (the entire
> system catalog being a good example..).

In many cases it is enough to encrypt only the sensitive information itself, not all the metadata. My point was not to discuss DB encryption, which is quite a complex issue; my point was that an API that allows custom compression methods may become useful for other solutions. Encryption was just the first example that came to my mind. Robert Haas has another example, of a compression method optimized for a certain data type. So it is good if you can have a method of your own.

--
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su
Attachment
Hi,

> Yeah - I think we had better reserve the fourth bit pattern for
> something extensible e.g. another byte or several to specify the
> actual method, so that we don't have a hard limit of 4 methods.

TWIMC, there is an ongoing discussion [1] of making TOAST pointers extendable, since this is a dependency for several patches that are currently in development.

TL;DR: the consensus so far seems to be using varattrib_1b_e.va_tag as a sign of alternative / extended TOAST pointer content. For the on-disk values, va_tag currently stores a constant 18 (VARTAG_ONDISK), where 18 is sizeof(varatt_external) + /* header size */ 2, which seems to be not extremely useful anyway.

If you are interested in the topic please consider joining the thread.

[1]: https://postgr.es/m/CAN-LCVMq2X%3Dfhx7KLxfeDyb3P%2BBXuCkHC0g%3D9GF%2BJD4izfVa0Q%40mail.gmail.com

--
Best regards,
Aleksander Alekseev
On Tue, May 23, 2023 at 05:56:13PM +0300, Aleksander Alekseev wrote:
> TWIMC there is an ongoing discussion [1] of making TOAST pointers
> extendable since this is a dependency for several patches that are
> currently in development.

Thanks for the ping. I have seen and read the other thread, and yes, that's an exciting proposal, not only for what's specific to the thread here.

--
Michael