Thread: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
From
Michael Paquier
Date:
Hi all, (Added Robert and Georgios in CC:) Since babbbb5 and the introduction of LZ4, I have reworked the way compression is controlled for pg_receivewal, with two options: - --compress-method, settable to "gzip", "none" or "lz4". - --compress, to pass down a compression level, where the allowed range is 1-9. If passing down 0, we'd get an error rather than implying no compression, contrary to what we did in ~14. I initially thought that this was fine as-is, but then Robert and others have worked on client/server compression for pg_basebackup, introducing a much better design with centralized APIs where one can use METHOD:DETAIL for as compression value, where DETAIL is a comma-separated list of keyword=value (keyword = "level" or "workers"), with centralized checks and an extensible design. This is something I think we had better fix before beta1, because now we have binaries that use an inconsistent set of options. So, attached is a patch set aimed at rework this option set from the ground, taking advantage of the recent work done by Robert and others for pg_basebackup: - 0001 is a simple rename of backup_compression.{c,h} to compression.{c,h}, removing anything related to base backups from that. One extra reason behind this renaming is that I would like to use this infra for pg_dump, but that's material for 16~. - 0002 removes WalCompressionMethod, replacing it by pg_compress_algorithm as these are the same enums. Robert complained about the confusion that WalCompressionMethod could lead to as this could be used for the compression of data, and not only WAL. I have renamed some variables to be more consistent, while on it. - 0003 is the actual option rework for pg_receivewal. This takes advantage of 0001, leading to the result of removing --compress-method and replacing it with --compress, taking care of the backward compatibility problems for an integer value, aka 0 implies no compression and val > 0 implies gzip. One bonus reason to switch to that is that this would make the addition of zstd for pg_receivewal easier in the future. I am going to add an open item for this stuff. Comments or thoughts? Thanks, -- Michael
Attachment
Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
From
gkokolatos@pm.me
Date:
On Monday, April 11th, 2022 at 8:52 AM, Michael Paquier <michael@paquier.xyz> wrote: > This is something I think we had better fix before beta1, because now > we have binaries that use an inconsistent set of options. So, > attached is a patch set aimed at rework this option set from the > ground, taking advantage of the recent work done by Robert and others > for pg_basebackup: Agreed. It is rather inconsistent now. > - 0001 is a simple rename of backup_compression.{c,h} to > compression.{c,h}, removing anything related to base backups from > that. One extra reason behind this renaming is that I would like to > use this infra for pg_dump, but that's material for 16~. I agree with the design. If you permit me a couple of nitpicks regarding naming. +typedef enum pg_compress_algorithm +{ + PG_COMPRESSION_NONE, + PG_COMPRESSION_GZIP, + PG_COMPRESSION_LZ4, + PG_COMPRESSION_ZSTD +} pg_compress_algorithm; Elsewhere in the codebase, (e.g. create_table.sgml, alter_table.sgml, brin_tuple.c, detoast.c, toast_compression.c, tupdesc.c, gist.c to mention a few) variations of of the nomenclature "compression method" are used, like 'VARDATA_COMPRESSED_GET_COMPRESS_METHOD' or 'InvalidCompressionMethod' etc. I feel that it would be nicer if we followed one naming rule for this and I recommend to substitute algorithm for method throughout. On a similar note, it would help readability to be able to distinguish at a glance the type from the variable. Maybe uppercase or camelcase the type? Last, even though it is not needed now, it will be helpful to have a PG_COMPRESSION_INVALID in some scenarios. Though we can add it when we come to it. > - 0002 removes WalCompressionMethod, replacing it by > pg_compress_algorithm as these are the same enums. Robert complained > about the confusion that WalCompressionMethod could lead to as this > could be used for the compression of data, and not only WAL. I have > renamed some variables to be more consistent, while on it. It looks good. If you choose to discard the comment regarding the use of 'method' over 'algorithm' from above, can you please use the full word in the variable, e.g. 'wal_compress_algorithm' instead of 'wal_compress_algo'. I can not really explain it, the later reads a bit rude. Then again that may be just me. > - 0003 is the actual option rework for pg_receivewal. This takes > advantage of 0001, leading to the result of removing --compress-method > and replacing it with --compress, taking care of the backward > compatibility problems for an integer value, aka 0 implies no > compression and val > 0 implies gzip. One bonus reason to switch to > that is that this would make the addition of zstd for pg_receivewal > easier in the future. Looks good. > I am going to add an open item for this stuff. Comments or thoughts? I agree that it is better to not release pg_receivewal with the distinct set of options. Cheers, //Georgios > Thanks, > -- > Michael
Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
From
Robert Haas
Date:
On Mon, Apr 11, 2022 at 2:52 AM Michael Paquier <michael@paquier.xyz> wrote: > Since babbbb5 and the introduction of LZ4, I have reworked the way > compression is controlled for pg_receivewal, with two options: > - --compress-method, settable to "gzip", "none" or "lz4". > - --compress, to pass down a compression level, where the allowed > range is 1-9. If passing down 0, we'd get an error rather than > implying no compression, contrary to what we did in ~14. > > I initially thought that this was fine as-is, but then Robert and > others have worked on client/server compression for pg_basebackup, > introducing a much better design with centralized APIs where one can > use METHOD:DETAIL for as compression value, where DETAIL is a > comma-separated list of keyword=value (keyword = "level" or > "workers"), with centralized checks and an extensible design. > > This is something I think we had better fix before beta1, because now > we have binaries that use an inconsistent set of options. So, > attached is a patch set aimed at rework this option set from the > ground, taking advantage of the recent work done by Robert and others > for pg_basebackup: +1 for this in general, but I think that naming like "compression_algo" stinks. If you think "compression_algorithm" is too long, I think you should use "algorithm" or "compression" or "compression_method" or something. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
From
Michael Paquier
Date:
On Mon, Apr 11, 2022 at 11:15:46AM -0400, Robert Haas wrote: > +1 for this in general, but I think that naming like > "compression_algo" stinks. If you think "compression_algorithm" is too > long, I think you should use "algorithm" or "compression" or > "compression_method" or something. Yes, I found "compression_algorithm" to be too long initially. For walmethods.c and pg_receivewal.c, it may be better to just stick to "algorithm" then, at least that's consistent with pg_basebackup.c. -- Michael
Attachment
Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
From
Michael Paquier
Date:
On Mon, Apr 11, 2022 at 12:46:02PM +0000, gkokolatos@pm.me wrote: > On Monday, April 11th, 2022 at 8:52 AM, Michael Paquier <michael@paquier.xyz> > wrote: >> - 0001 is a simple rename of backup_compression.{c,h} to >> compression.{c,h}, removing anything related to base backups from >> that. One extra reason behind this renaming is that I would like to >> use this infra for pg_dump, but that's material for 16~. > > I agree with the design. If you permit me a couple of nitpicks regarding naming. > > +typedef enum pg_compress_algorithm > +{ > + PG_COMPRESSION_NONE, > + PG_COMPRESSION_GZIP, > + PG_COMPRESSION_LZ4, > + PG_COMPRESSION_ZSTD > +} pg_compress_algorithm; > > Elsewhere in the codebase, (e.g. create_table.sgml, alter_table.sgml, > brin_tuple.c, detoast.c, toast_compression.c, tupdesc.c, gist.c to mention a > few) variations of of the nomenclature "compression method" are used, like > 'VARDATA_COMPRESSED_GET_COMPRESS_METHOD' or 'InvalidCompressionMethod' etc. I > feel that it would be nicer if we followed one naming rule for this and I > recommend to substitute algorithm for method throughout. Technically and as far as I know, both are correct and hold more or less the same meaning. pg_basebackup's code exposes algorithm in a more extended way, so I have just stuck to it for the internal variables and such. Perhaps we could rename the whole, but I see no strong reason to do that. > Last, even though it is not needed now, it will be helpful to have a > PG_COMPRESSION_INVALID in some scenarios. Though we can add it when we come to > it. Perhaps. There is no need for it yet, though. pg_dump would not need that, as well. >> - 0002 removes WalCompressionMethod, replacing it by >> pg_compress_algorithm as these are the same enums. Robert complained >> about the confusion that WalCompressionMethod could lead to as this >> could be used for the compression of data, and not only WAL. I have >> renamed some variables to be more consistent, while on it. > > It looks good. If you choose to discard the comment regarding the use of > 'method' over 'algorithm' from above, can you please use the full word in the > variable, e.g. 'wal_compress_algorithm' instead of 'wal_compress_algo'. I can > not really explain it, the later reads a bit rude. Then again that may be just > me. Thanks. I have been able to do an extra pass on 0001 and 0002, fixing those naming inconsistencies with "algo" vs "algorithm" that you and Robert have reported, and applied them. For 0003, I'll look at it later. Attached is a rebase with improvements about the variable names. -- Michael
Attachment
Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
From
Michael Paquier
Date:
On Tue, Apr 12, 2022 at 06:22:48PM +0900, Michael Paquier wrote: > On Mon, Apr 11, 2022 at 12:46:02PM +0000, gkokolatos@pm.me wrote: >> It looks good. If you choose to discard the comment regarding the use of >> 'method' over 'algorithm' from above, can you please use the full word in the >> variable, e.g. 'wal_compress_algorithm' instead of 'wal_compress_algo'. I can >> not really explain it, the later reads a bit rude. Then again that may be just >> me. > > Thanks. I have been able to do an extra pass on 0001 and 0002, fixing > those naming inconsistencies with "algo" vs "algorithm" that you and > Robert have reported, and applied them. For 0003, I'll look at it > later. Attached is a rebase with improvements about the variable > names. This has been done with the proper renames. With that in place, I see no reason now to not be able to set the compression level as it is possible to pass it down with the options available. This requires only a couple of lines, as of the attached. LZ4 has a dummy structure called LZ4F_INIT_PREFERENCES to initialize LZ4F_preferences_t, that holds the compression level before passing it down to LZ4F_compressBegin(), but that's available only in v1.8.3. Using it risks lowering down the minimal version of LZ4 we are able to use now, but replacing that with a set of memset()s is also a way to set up things as per its documentation. Thoughts? -- Michael
Attachment
Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
From
gkokolatos@pm.me
Date:
------- Original Message ------- On Wednesday, April 13th, 2022 at 7:25 AM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Tue, Apr 12, 2022 at 06:22:48PM +0900, Michael Paquier wrote: > > > On Mon, Apr 11, 2022 at 12:46:02PM +0000, gkokolatos@pm.me wrote: > > > > > It looks good. If you choose to discard the comment regarding the use of > > > 'method' over 'algorithm' from above, can you please use the full word in the > > > variable, e.g. 'wal_compress_algorithm' instead of 'wal_compress_algo'. I can > > > not really explain it, the later reads a bit rude. Then again that may be just > > > me. > > > > Thanks. I have been able to do an extra pass on 0001 and 0002, fixing > > those naming inconsistencies with "algo" vs "algorithm" that you and > > Robert have reported, and applied them. For 0003, I'll look at it > > later. Attached is a rebase with improvements about the variable > > names. > > This has been done with the proper renames. With that in place, I see > no reason now to not be able to set the compression level as it is > possible to pass it down with the options available. This requires > only a couple of lines, as of the attached. LZ4 has a dummy structure > called LZ4F_INIT_PREFERENCES to initialize LZ4F_preferences_t, that > holds the compression level before passing it down to > LZ4F_compressBegin(), but that's available only in v1.8.3. Using it > risks lowering down the minimal version of LZ4 we are able to use now, > but replacing that with a set of memset()s is also a way to set up > things as per its documentation. > > Thoughts? It's really not hard to add compression level. However we had briefly discussed it in the original thread [1] and decided against. That is why I did not write that code. If the community thinks differently now, let me know if you would like me to offer a patch for it. Cheers, //Georgios [1] https://www.postgresql.org/message-id/flat/CABUevEwuq7XXyd4fA0W3jY9MsJu9B2WRbHumAA%2B3WzHrGAQjsg%40mail.gmail.com#b6456fa2adc1cdb049a57bf3587666b9
Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
From
Michael Paquier
Date:
On Wed, Apr 13, 2022 at 02:58:28PM +0000, gkokolatos@pm.me wrote: > It's really not hard to add compression level. However we had briefly > discussed it in the original thread [1] and decided against. That is why > I did not write that code. If the community thinks differently now, let > me know if you would like me to offer a patch for it. The issue back then was how to design the option set to handle all that (right? My memories may be short on that), and pg_basebackup takes care of that with its option design. This is roughly what has been done here, except that this was for the contentSize: https://www.postgresql.org/message-id/rYyZ3Fj9qayyY9-egNsV_kkLbL_BSWcOEdi3Mb6M9eQRTkcA2jrqFEHglLUEYnzWR_wttCqn7VI94MZ2p7mwNje51lHTvWYnJ1jHdOceen4=@pm.me Do you think that the extra test coverage is going to be too much of a burden? I was thinking about just adding a level to the main lz4 command, with an extra negative test in the SKIP block with a level out of range -- Michael
Attachment
Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
From
Michael Paquier
Date:
On Thu, Apr 14, 2022 at 06:18:29AM +0900, Michael Paquier wrote: > The issue back then was how to design the option set to handle all > that (right? My memories may be short on that), and pg_basebackup > takes care of that with its option design. I have looked at that again this morning, and the change is straight-forward so I saw no reason to not do it. And, applied. -- Michael