Thread: adding 'zstd' as a compression algorithm
Hi, Over on the rather-long "refactoring basebackup.c" thread, there is a proposal, which I endorse, to add base-backup compression via zstd. To do that, we'd need to patch configure.ac to create a new --with-zstd flag and appropriate supporting infrastructure. My colleague Jeevan Ladhe included that in a patch posted over there; I've extracted just the part adding configure support for libzstd and attach it here. I thought it would be good to have a new thread specifically devoted to the topic of whether zstd is a thing that PostgreSQL ought to support in general. In general, deciding on new compression algorithms can feel a bit like debating the merits of vi vs. emacs, or one political party vs. another. Everyone has their own favorites, for reasons that can often seem idiosyncratic. One advantage of zstd is that it is already being used by other prominent open-source projects, including the Linux kernel.[1] This means that it is unlikely to just dry up and vanish, and it also reduces the risk of legal issues. On a technical level, zstd offers compression ratios similar to or better than gzip, but with much faster compression speed. Furthermore, the zstd library has built-in multi-threaded compression which we may be able to leverage for even better performance. In fact, parallel zstd might be able to compress faster than lz4, which is already extremely fast. What I imagine if this patch is accepted is that we (or our users) will end up using lz4 for places where compression needs to be very lightweight, and zstd for places where it's acceptable or even desirable to spend more CPU cycles in exchange for better compression. I think that gzip and pglz are really only of historical interest - and I don't say that to mean that we shouldn't continue to support them or that they won't get use. Lots of people are perfectly happy with TOAST compression using pglz, and I'm perfectly happy if they continue to do that forever, even though I'm glad LZ4 is now an option. Likewise, I still download the .tar.gz version of anything that gives me that option, basically because I'm familiar with the format and it's easy for me to just carry on using it -- and in a similar way I expect a lot of people will be happy to continue to compress backups with gzip for many years to come. But I think there is value in supporting newer and better technology, too. I realize that we don't want to support every new and shiny thing that shows up, but I don't think that's what I am proposing here. Anyway, those are my thoughts. What are yours? Thanks, [1] https://en.wikipedia.org/wiki/Zstd#Usage -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
I think the WAL patch (4035cd5d4) should support zstd if library support is added. A supplementary patch for that already exists. https://www.postgresql.org/message-id/YNqWd2GSMrnqWIfx@paquier.xyz There's also some in-progress patches: - Konstantin and Daniil have a patch to add libpq compression, which ultimately wants to use zstd. https://commitfest.postgresql.org/37/3499/ - I had a patch to add zstd to pg_dump, but I ended up storing backups to ZFS+zstd rather than waiting to progress the patch. https://commitfest.postgresql.org/32/2888/ Regarding the patch: I suppose it should support windows, and whatever patches use zstd should update the install instructions. See 9ca40dcd4, 02a93e7ef, 4035cd5d4.
On Tue, Feb 15, 2022 at 10:20 AM Robert Haas <robertmhaas@gmail.com> wrote: > In general, deciding on new compression algorithms can feel a bit like > debating the merits of vi vs. emacs, or one political party vs. > another. Really? I don't get that impression myself. (That's not important, though.) > What I imagine if this patch is accepted is that we (or our users) > will end up using lz4 for places where compression needs to be very > lightweight, and zstd for places where it's acceptable or even > desirable to spend more CPU cycles in exchange for better compression. Sounds reasonable to me. > Likewise, I still download the .tar.gz version of anything > that gives me that option, basically because I'm familiar with the > format and it's easy for me to just carry on using it -- and in a > similar way I expect a lot of people will be happy to continue to > compress backups with gzip for many years to come. Maybe, but it seems more likely that they'll usually do whatever the easiest reasonable-seeming thing is. Perhaps we need to distinguish between "container format" (e.g., a backup that is produced by a tool like pgBackrest, including backup manifest) and compression algorithm. Isn't it an incontrovertible fact that LZ4 is superior to pglz in every way? LZ4 is pretty much its successor. And so it seems totally fine to assume that users will always want to use the clearly better option, and that that option will be generally available going forward. TOAST compression is applied selectively already, based on various obscure implementation details, some of which are quite arbitrary. I know less about zstd, but I imagine that a similar dynamic exists there. Overall, everything that you've said makes sense to me. -- Peter Geoghegan
Hi, On 2022-02-15 11:40:20 -0800, Peter Geoghegan wrote: > On Tue, Feb 15, 2022 at 10:20 AM Robert Haas <robertmhaas@gmail.com> wrote: > > Likewise, I still download the .tar.gz version of anything > > that gives me that option, basically because I'm familiar with the > > format and it's easy for me to just carry on using it -- and in a > > similar way I expect a lot of people will be happy to continue to > > compress backups with gzip for many years to come. There's a difference between downloading a source tarball of 1.5x the size, and 3x the decompression cost (made up numbers), because the cost of either is not a relevant factor. I think basebackups are a different story. > Isn't it an incontrovertible fact that LZ4 is superior to pglz in > every way? LZ4 is pretty much its successor. And so it seems totally > fine to assume that users will always want to use the clearly better > option, and that that option will be generally available going > forward. TOAST compression is applied selectively already, based on > various obscure implementation details, some of which are quite > arbitrary. Yea, we should really default to lz4 in initdb when available. Expecting users to know about magic incantation to get often vastly superior performance isn't a good approach. And it even makes initdb itself faster, because it turns out we compress enough that the cpu cycles for it are a measurable factor. > I know less about zstd, but I imagine that a similar dynamic exists > there. Overall, everything that you've said makes sense to me. IMO zstd has pretty much won the "gzip" type usecase of compression. Even prior lz4 uses have even been converted to it because it's often cheap enough. Greetings, Andres Freund
On Tue, Feb 15, 2022 at 2:40 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Feb 15, 2022 at 10:20 AM Robert Haas <robertmhaas@gmail.com> wrote: > > In general, deciding on new compression algorithms can feel a bit like > > debating the merits of vi vs. emacs, or one political party vs. > > another. > > Really? I don't get that impression myself. (That's not important, though.) Well, that's a good thing. Maybe we're getting better at keeping things constructive... > > What I imagine if this patch is accepted is that we (or our users) > > will end up using lz4 for places where compression needs to be very > > lightweight, and zstd for places where it's acceptable or even > > desirable to spend more CPU cycles in exchange for better compression. > > Sounds reasonable to me. Cool. > > Likewise, I still download the .tar.gz version of anything > > that gives me that option, basically because I'm familiar with the > > format and it's easy for me to just carry on using it -- and in a > > similar way I expect a lot of people will be happy to continue to > > compress backups with gzip for many years to come. > > Maybe, but it seems more likely that they'll usually do whatever the > easiest reasonable-seeming thing is. Perhaps we need to distinguish > between "container format" (e.g., a backup that is produced by a tool > like pgBackrest, including backup manifest) and compression algorithm. I'm not sure I completely follow this. There are cases where we use compression algorithms for internal purposes, and we can change the defaults without users knowing or caring. For example, we could decide that LZ4-compressing FPIs in WAL records is a sensible default and just start doing it, and users need not care. But backups are different, because when you pg_basebackup -Ft, you get .tar or .tar.gz or .tar.lz4 files which we don't give you tools to extract. You have to be able to extract them with OS tools. In cases like that, people are going to be more likely to pick formats with which they are familiar even if they are not technically best, and I don't think we really dare change the defaults. That seems OK, though. > Isn't it an incontrovertible fact that LZ4 is superior to pglz in > every way? LZ4 is pretty much its successor. And so it seems totally > fine to assume that users will always want to use the clearly better > option, and that that option will be generally available going > forward. TOAST compression is applied selectively already, based on > various obscure implementation details, some of which are quite > arbitrary. I pretty much agree with all of that. Nevertheless, there's a lot of pglz-compressed data that's already on a disk someplace which people are not necessarily keen to rewrite, and that will probably remain true for the foreseable future. If we change the default to something that's not pglz, and then wait 10 years, we MIGHT be able to get rid of it without pushback. But I wouldn't be surprised to learn that even then there are a lot of people who have just pg_upgrade'd the same database over and over again. And honestly I think that's fine. Yeah, lz4 is better. But, on some data sets, there's very little real difference in the compression ratio you get. Until keeping pglz around starts costing us something tangible, there's no reason to spend any effort worrying about it. > I know less about zstd, but I imagine that a similar dynamic exists > there. Overall, everything that you've said makes sense to me. Great, thanks for chiming in. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Feb 15, 2022 at 2:54 PM Andres Freund <andres@anarazel.de> wrote: > There's a difference between downloading a source tarball of 1.5x the size, > and 3x the decompression cost (made up numbers), because the cost of either is > not a relevant factor. I think basebackups are a different story. To be clear, I'm not saying that people shouldn't choose to adopt the new stuff. I'm just saying that many of them won't, for various reasons, including inertia. There may come a point when we want to make a push to remove obsolete stuff, but I think what is far more important right now is making the new stuff available. > Yea, we should really default to lz4 in initdb when available. Expecting users > to know about magic incantation to get often vastly superior performance isn't > a good approach. And it even makes initdb itself faster, because it turns out > we compress enough that the cpu cycles for it are a measurable factor. I don't mind if you want to propose a patch for that, but it seems like a separate topic from this thread. > IMO zstd has pretty much won the "gzip" type usecase of compression. Even > prior lz4 uses have even been converted to it because it's often cheap enough. You know more about this than I do... -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Feb 15, 2022 at 1:55 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > Regarding the patch: > > I suppose it should support windows, and whatever patches use zstd should > update the install instructions. See 9ca40dcd4, 02a93e7ef, 4035cd5d4. Thanks, this is helpful. To me, it looks like we need something similar to 9ca40dcd4, plus something like the last hunk of 02a93e7ef. The rest of 02a93e7ef and all of 4035cd5d4 seem like they address the application of lz4 rather than just adding support for it, so I don't see the need for those elements in this patch. Am I missing something? -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Feb 15, 2022 at 03:23:30PM -0500, Robert Haas wrote: > On Tue, Feb 15, 2022 at 1:55 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > Regarding the patch: > > > > I suppose it should support windows, and whatever patches use zstd should > > update the install instructions. See 9ca40dcd4, 02a93e7ef, 4035cd5d4. > > Thanks, this is helpful. To me, it looks like we need something > similar to 9ca40dcd4, plus something like the last hunk of 02a93e7ef. If you mean a patch which only adds the configure options, I don't think 02a9 is applicable at all. What I mean is that any patches which *use* the zstd support should update those for completeness. doc/src/sgml/config.sgml | 14 +++++++++----- doc/src/sgml/install-windows.sgml | 2 +- doc/src/sgml/installation.sgml | 5 +++-- > The rest of 02a93e7ef and all of 4035cd5d4 seem like they address the > application of lz4 rather than just adding support for it, so I don't > see the need for those elements in this patch. Am I missing something?
Hi, On 2022-02-15 15:09:37 -0500, Robert Haas wrote: > On Tue, Feb 15, 2022 at 2:54 PM Andres Freund <andres@anarazel.de> wrote: > > There's a difference between downloading a source tarball of 1.5x the size, > > and 3x the decompression cost (made up numbers), because the cost of either is > > not a relevant factor. I think basebackups are a different story. > > To be clear, I'm not saying that people shouldn't choose to adopt the > new stuff. I'm just saying that many of them won't, for various > reasons, including inertia. There may come a point when we want to > make a push to remove obsolete stuff, but I think what is far more > important right now is making the new stuff available. I think well before removing stuff we should default to decent compression algorithms. E.g. -Z/--compress should probably not continue to use gzip if better things are available. Greetings, Andres Freund
On Tue, Feb 15, 2022 at 12:00 PM Robert Haas <robertmhaas@gmail.com> wrote: > I'm not sure I completely follow this. There are cases where we use > compression algorithms for internal purposes, and we can change the > defaults without users knowing or caring. For example, we could decide > that LZ4-compressing FPIs in WAL records is a sensible default and > just start doing it, and users need not care. But backups are > different, because when you pg_basebackup -Ft, you get .tar or .tar.gz > or .tar.lz4 files which we don't give you tools to extract. What I meant is that you're buying into an ecosystem by choosing to use a tool like pgBackrest. That might not be the right thing to do universally, but it comes pretty close. You as a user are largely deferring to the maintainers and their choice of defaults. Not entirely, of course, but to a significant degree, especially in matters like this. There aren't that many dimensions to the problem once the question of compatibility is settled (it's settled here because you've already bought into a tool that requires the library as standard, say). A lot of things are *not* like that, but ISTM that backup compression really is -- we have the luxury of a constrained problem space, for once. There aren't that many metrics to consider, because it must be lossless compression in any case, and because the requirements are relatively homogenous. The truly important thing (once compatibility is accounted for) is to use something basically reasonable, with no noticeable weaknesses relative to any of the alternatives. > I pretty much agree with all of that. Nevertheless, there's a lot of > pglz-compressed data that's already on a disk someplace which people > are not necessarily keen to rewrite, and that will probably remain > true for the foreseable future. If we change the default to something > that's not pglz, and then wait 10 years, we MIGHT be able to get rid > of it without pushback. But I wouldn't be surprised to learn that even > then there are a lot of people who have just pg_upgrade'd the same > database over and over again. And honestly I think that's fine. I think that it's fine too. It's unlikely that anybody is going to go to any trouble to get better compression, certainly, but we should give them every opportunity to use better alternatives. Ideally without anybody having to even think about it. -- Peter Geoghegan
On 2/15/22 16:10, Peter Geoghegan wrote: > On Tue, Feb 15, 2022 at 12:00 PM Robert Haas <robertmhaas@gmail.com> wrote: >> I'm not sure I completely follow this. There are cases where we use >> compression algorithms for internal purposes, and we can change the >> defaults without users knowing or caring. For example, we could decide >> that LZ4-compressing FPIs in WAL records is a sensible default and >> just start doing it, and users need not care. I'm not sure that this is entirely true. lz4 is available on most non-EOL distros but you don't need to look to far to find a distro that does not have it. So, choosing lz4 by default could impact binary portability. Of course, there are lots of other factors that impact binary portability, e.g. collations, but this would add one more. This is even more true for zstd since it is not as ubiquitous as lz4. In fact, it is not even included with base RHEL8. You need to install EPEL to get zstd. >> But backups are >> different, because when you pg_basebackup -Ft, you get .tar or .tar.gz >> or .tar.lz4 files which we don't give you tools to extract. > > What I meant is that you're buying into an ecosystem by choosing to > use a tool like pgBackrest. That might not be the right thing to do > universally, but it comes pretty close. You as a user are largely > deferring to the maintainers and their choice of defaults. Not > entirely, of course, but to a significant degree, especially in > matters like this. There aren't that many dimensions to the problem > once the question of compatibility is settled (it's settled here > because you've already bought into a tool that requires the library as > standard, say). As much as we would like to, we have not changed the default compression for pgBackRest. lz4 and zstd are both optional at build time since they are not always available (though lz4 is getting close). So we have left gzip as the default, though a lot of that has to do with maintaining compatibility with prior versions of pgBackRest. Having said all that, I'm absolutely in favor of adding zstd to Postgres as an optional compression format. Regards, David
On Tue, Feb 15, 2022 at 01:42:55PM -0800, Andres Freund wrote: > I think well before removing stuff we should default to decent compression > algorithms. E.g. -Z/--compress should probably not continue to use gzip if > better things are available. Which one of lz4 or zstd would it be though? LZ4 is light on CPU, and compresses less than zstd which is more CPU intensive with more compression, so the choice does not look that straight-forward to me. Both of them are better than zlib, still they are less available on the platforms Postgres needs to provide support for. It does not seem really intuitive to me to have different defaults depending on the build options provided, either :/ Saying that, I'd like to think that zstd would still be a better default choice than LZ4. -- Michael
Attachment
On Tue, Feb 15, 2022 at 7:09 PM David Steele <david@pgmasters.net> wrote: > I'm not sure that this is entirely true. lz4 is available on most > non-EOL distros but you don't need to look to far to find a distro that > does not have it. So, choosing lz4 by default could impact binary > portability. Of course, there are lots of other factors that impact > binary portability, e.g. collations, but this would add one more. > > This is even more true for zstd since it is not as ubiquitous as lz4. In > fact, it is not even included with base RHEL8. You need to install EPEL > to get zstd. Yeah, I agree. One thing I was thinking about but didn't include in the previous email is that if we did choose to make something like LZ4 the default, it would presumably only be the default on builds that include LZ4 support. Other builds would need to use something else, unless we just chose to make LZ4 a hard requirement, which would be bolder than we usually are. And that has the consequence that you mention. It's something we should consider as we think about changing defaults. > Having said all that, I'm absolutely in favor of adding zstd to Postgres > as an optional compression format. Cool. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Feb 15, 2022 at 7:09 PM David Steele <david@pgmasters.net> wrote: >> This is even more true for zstd since it is not as ubiquitous as lz4. In >> fact, it is not even included with base RHEL8. You need to install EPEL >> to get zstd. > Yeah, I agree. One thing I was thinking about but didn't include in > the previous email is that if we did choose to make something like LZ4 > the default, it would presumably only be the default on builds that > include LZ4 support. Other builds would need to use something else, > unless we just chose to make LZ4 a hard requirement, which would be > bolder than we usually are. And that has the consequence that you > mention. It's something we should consider as we think about changing > defaults. Yeah. I'm +1 on adding zstd as an option, but I think we need to move *very* slowly on changing any defaults for user-accessible data (like backup files). Maybe we have a bit more flexibility for TOAST, not sure. regards, tom lane
Hi, On 2022-02-16 09:58:44 +0900, Michael Paquier wrote: > On Tue, Feb 15, 2022 at 01:42:55PM -0800, Andres Freund wrote: > > I think well before removing stuff we should default to decent compression > > algorithms. E.g. -Z/--compress should probably not continue to use gzip if > > better things are available. > > Which one of lz4 or zstd would it be though? LZ4 is light on CPU, and > compresses less than zstd which is more CPU intensive with more > compression, so the choice does not look that straight-forward to me. For backups it's pretty obviously zstd imo. At the lower levels it achieves considerably higher compression ratios while still being vastly faster than gzip. Without even using the threaded compression support the library has. For something like wal_compression it'd be a harder question. Greetings, Andres Freund
On Tue, Feb 15, 2022 at 02:33:32PM -0600, Justin Pryzby wrote: > What I mean is that any patches which *use* the zstd support should update > those for completeness. > > doc/src/sgml/config.sgml | 14 +++++++++----- > doc/src/sgml/install-windows.sgml | 2 +- > doc/src/sgml/installation.sgml | 5 +++-- Yes, the patch misses the fact that each ./configure switch is documented. For consistency, I think that you should also add that in the MSVC scripts in the first version. It is really straight-forward to do so, and it should be just a matter of grepping for the code paths of lz4, then adjust things for zstd. -- Michael
Attachment
Hi,m On 2022-02-15 20:56:15 -0500, Tom Lane wrote: > Maybe we have a bit more flexibility for TOAST, not sure. It'd be nice to at least add it as an option for initdb. Afaics there's no way to change the default at that point. initdb itself is measurably faster. Although sadly it's a bigger difference for optimized builds. I think it matter a bit even after initdb, pg_rewrite is most of the compressed data and some of the views are regularly used... - Andres
On Tue, Feb 15, 2022 at 9:33 PM Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Feb 15, 2022 at 02:33:32PM -0600, Justin Pryzby wrote: > > What I mean is that any patches which *use* the zstd support should update > > those for completeness. > > > > doc/src/sgml/config.sgml | 14 +++++++++----- > > doc/src/sgml/install-windows.sgml | 2 +- > > doc/src/sgml/installation.sgml | 5 +++-- > > Yes, the patch misses the fact that each ./configure switch is > documented. For consistency, I think that you should also add that in > the MSVC scripts in the first version. It is really straight-forward > to do so, and it should be just a matter of grepping for the code > paths of lz4, then adjust things for zstd. Makes sense. Here's an attempt by me to do that. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
On Tue, Feb 15, 2022 at 06:24:13PM -0800, Andres Freund wrote: > For backups it's pretty obviously zstd imo. At the lower levels it achieves > considerably higher compression ratios while still being vastly faster than > gzip. Without even using the threaded compression support the library has. Noted. > For something like wal_compression it'd be a harder question. FWIW, I have done some measurements for wal_compression with zstd, as of: https://www.postgresql.org/message-id/YMmlvyVyAFlxZ+/H@paquier.xyz The result is not surprising, a bit more CPU for zstd with more compression compared to LZ4, both outclassing easily zlib. I am not sure which one would be more adapted as default as FPI patterns depend on the workload, for one, and this is just one corner case. -- Michael
Attachment
On Wed, Feb 16, 2022 at 10:40:01AM -0500, Robert Haas wrote: > On Tue, Feb 15, 2022 at 9:33 PM Michael Paquier <michael@paquier.xyz> wrote: >> Yes, the patch misses the fact that each ./configure switch is >> documented. For consistency, I think that you should also add that in >> the MSVC scripts in the first version. It is really straight-forward >> to do so, and it should be just a matter of grepping for the code >> paths of lz4, then adjust things for zstd. > > Makes sense. Here's an attempt by me to do that. Thanks. This looks pretty much right, except for two things that I have taken the freedom to fix as of the v3 attached. %define needs to include HAVE_LIBZSTD, HAVE_ZSTD_H and USE_ZSTD, so this version fails the sanity check between pg_config.h.in and the MSVC scripts checking that all flags exist. @@ -351,6 +351,7 @@ XGETTEXT = @XGETTEXT@ GZIP = gzip BZIP2 = bzip2 LZ4 = @LZ4@ +ZSTD = @ZSTD@ A similar refresh is needed in vcregress.pl. + $proj->AddLibrary($self->{options}->{zstd} . '\lib\libzstd.lib'); The upstream code is also using this library name, so that should be fine. -- Michael
Attachment
> 15 февр. 2022 г., в 23:20, Robert Haas <robertmhaas@gmail.com> написал(а): > > Anyway, those are my thoughts. What are yours? +1 for adding Zstd everywhere. Quite similar patches are already a part of "libpq compression" and "zstd for pg_dump" commitfestentries, so pushing this will simplify those entries IMV. Also I like the idea of defaulting FPI compression to lz4 and adding Zstd as wal_compression option. I don't think Zstd is obvious choice for default FPI compression: there are HW setups when disks are almost as fast as RAM.In this case lz4 can improve performance, while Zstd might make things slower. Thanks! Best regards, Andrey Borodin.
On Wed, Feb 16, 2022 at 11:34 PM Michael Paquier <michael@paquier.xyz> wrote: > Thanks. This looks pretty much right, except for two things that I > have taken the freedom to fix as of the v3 attached. Ah, OK, cool. It seems like we have consensus on this being a good direction, so I plan to commit this later today unless objections or additional review comments turn up. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Feb 17, 2022 at 09:40:13AM -0500, Robert Haas wrote: > Ah, OK, cool. It seems like we have consensus on this being a good > direction, so I plan to commit this later today unless objections or > additional review comments turn up. So, will there be a part of the system where we'll make use of that in 15? pg_basebackup for server-side and client-side compression, at least, as far as I can guess? -- Michael
Attachment
Hi, On 2022-02-17 13:34:08 +0900, Michael Paquier wrote: > %define needs to include HAVE_LIBZSTD, HAVE_ZSTD_H and USE_ZSTD, so > this version fails the sanity check between pg_config.h.in and the > MSVC scripts checking that all flags exist. Do we really need all three defines? How about using AC_CHECK_HEADER() instead of AC_CHECK_HEADERS()? That wouldn't define HAVE_ZSTD_H. Cases where we error out if a header isn't found make it a bit pointless to then still define HAVE_*_H. Plenty other cases in configure.ac just use AC_CHECK_HEADER. Greetings, Andres Freund
On Thu, Feb 17, 2022 at 8:18 PM Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Feb 17, 2022 at 09:40:13AM -0500, Robert Haas wrote: > > Ah, OK, cool. It seems like we have consensus on this being a good > > direction, so I plan to commit this later today unless objections or > > additional review comments turn up. > > So, will there be a part of the system where we'll make use of that in > 15? pg_basebackup for server-side and client-side compression, at > least, as far as I can guess? That's my plan, yes. I think the patches only need a small amount of work at this point, but I'd like to get this part committed first. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Feb 17, 2022 at 8:36 PM Andres Freund <andres@anarazel.de> wrote: > On 2022-02-17 13:34:08 +0900, Michael Paquier wrote: > > %define needs to include HAVE_LIBZSTD, HAVE_ZSTD_H and USE_ZSTD, so > > this version fails the sanity check between pg_config.h.in and the > > MSVC scripts checking that all flags exist. > > Do we really need all three defines? How about using AC_CHECK_HEADER() instead > of AC_CHECK_HEADERS()? That wouldn't define HAVE_ZSTD_H. Cases where we error > out if a header isn't found make it a bit pointless to then still define > HAVE_*_H. Plenty other cases in configure.ac just use AC_CHECK_HEADER. I have to admit to being somewhat confused by the apparent lack of consistency in the way we do configure checks. The ZSTD check we added here is just based on the LZ4 check just above it, which was the result of my commit of Dilip's patch to add LZ4 TOAST compression. So if we want to do something different we should change them both. But that just begs the question of why the LZ4 support looks the way it does, and to be honest I don't recall. The zlib and XSLT formulas just above are much simpler, but for some reason what we're doing here seems to be based on the more-complex formula we use for XML support instead of either of those. But having said that, the proposed patch adds no new call to AC_CHECK_HEADER(), and does add a new call to AC_CHECK_HEADERS(), so I don't understand the specifics of your complaint here. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Feb 18, 2022 at 8:48 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Feb 17, 2022 at 8:36 PM Andres Freund <andres@anarazel.de> wrote: > > On 2022-02-17 13:34:08 +0900, Michael Paquier wrote: > > > %define needs to include HAVE_LIBZSTD, HAVE_ZSTD_H and USE_ZSTD, so > > > this version fails the sanity check between pg_config.h.in and the > > > MSVC scripts checking that all flags exist. > > > > Do we really need all three defines? How about using AC_CHECK_HEADER() instead > > of AC_CHECK_HEADERS()? That wouldn't define HAVE_ZSTD_H. Cases where we error > > out if a header isn't found make it a bit pointless to then still define > > HAVE_*_H. Plenty other cases in configure.ac just use AC_CHECK_HEADER. > > I have to admit to being somewhat confused by the apparent lack of > consistency in the way we do configure checks. The ZSTD check we added > here is just based on the LZ4 check just above it, which was the > result of my commit of Dilip's patch to add LZ4 TOAST compression. So > if we want to do something different we should change them both. But > that just begs the question of why the LZ4 support looks the way it > does, and to be honest I don't recall. The zlib and XSLT formulas just > above are much simpler, but for some reason what we're doing here > seems to be based on the more-complex formula we use for XML support > instead of either of those. > > But having said that, the proposed patch adds no new call to > AC_CHECK_HEADER(), and does add a new call to AC_CHECK_HEADERS(), so I > don't understand the specifics of your complaint here. Oh wait ... you want it the other way. Yeah, that seems harmless to change. I wonder how many others there are that could be changed similarly... -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Feb 18, 2022 at 9:02 AM Robert Haas <robertmhaas@gmail.com> wrote: > Oh wait ... you want it the other way. Yeah, that seems harmless to > change. I wonder how many others there are that could be changed > similarly... I went through configure.ac looking for instances of AC_CHECK_HEADERS() where the corresponding symbol was not used. I found four: AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])]) AC_CHECK_HEADERS(gssapi/gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])]) AC_CHECK_HEADERS(ldap.h, [], [AC_MSG_ERROR([header file <ldap.h> is required for LDAP])] AC_CHECK_HEADER(winldap.h, [], ...stuff...) I guess we could clean all of those up similarly. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Feb 18, 2022 at 9:24 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Feb 18, 2022 at 9:02 AM Robert Haas <robertmhaas@gmail.com> wrote: > > Oh wait ... you want it the other way. Yeah, that seems harmless to > > change. I wonder how many others there are that could be changed > > similarly... > > I went through configure.ac looking for instances of > AC_CHECK_HEADERS() where the corresponding symbol was not used. I > found four: > > AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is > required for LZ4])]) > AC_CHECK_HEADERS(gssapi/gssapi.h, [], [AC_MSG_ERROR([gssapi.h header > file is required for GSSAPI])])]) > AC_CHECK_HEADERS(ldap.h, [], [AC_MSG_ERROR([header file <ldap.h> is > required for LDAP])] > AC_CHECK_HEADER(winldap.h, [], ...stuff...) > > I guess we could clean all of those up similarly. So here's a revised patch for zstd support that uses AC_CHECK_HEADER(), plus a second patch to change the occurrences above to use AC_CHECK_HEADER() and remove all traces of the corresponding preprocessor symbol. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
On Fri, Feb 18, 2022 at 9:52 AM Robert Haas <robertmhaas@gmail.com> wrote: > So here's a revised patch for zstd support that uses > AC_CHECK_HEADER(), plus a second patch to change the occurrences above > to use AC_CHECK_HEADER() and remove all traces of the corresponding > preprocessor symbol. And I've now committed the first of these. -- Robert Haas EDB: http://www.enterprisedb.com
On 2022-02-18 09:52:52 -0500, Robert Haas wrote: > plus a second patch to change the occurrences above to use AC_CHECK_HEADER() > and remove all traces of the corresponding preprocessor symbol. LGTM. I'm not entirely sure the gssapi one is a real improvement, because we kind of test for that branch, via the #else in HAVE_GSSAPI_H. But on balance, I'd probably go for it.