Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | 20210320000625.GP11765@telsasoft.com Whole thread Raw |
In response to | Re: [HACKERS] Custom compression methods (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Fri, Mar 19, 2021 at 04:38:03PM -0400, Robert Haas wrote: > On Fri, Mar 19, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Nope ... crake's displeased with your assumption that it's OK to > > clutter dumps with COMPRESSION clauses. As am I: that is going to > > be utterly fatal for cross-version transportation of dumps. > > Regarding your point, that does look like clutter. We don't annotate > the dump with a storage clause unless it's non-default, so probably we > should do the same thing here. I think I gave Dilip bad advice here... On Fri, Mar 19, 2021 at 05:49:37PM -0400, Robert Haas wrote: > Here's a patch for that. It's a little strange because you're going to > skip dumping the toast compression based on the default value on the > source system, but that might not be the default on the system where > the dump is being restored, so you could fail to recreate the state > you had. That is avoidable if you understand how things work, but some > people might not. I don't have a better idea, though, so let me know > what you think of this. On Fri, Mar 19, 2021 at 07:22:42PM -0300, Alvaro Herrera wrote: > Do you mean the column storage strategy, attstorage? I don't think > that's really related, because the difference there is not a GUC setting > but a compiled-in default for the type. In the case of compression, I'm > not sure it makes sense to do it like that, but I can see the clutter > argument: if we dump compression for all columns, it's going to be super > noisy. > > (At least, for binary upgrade surely you must make sure to apply the > correct setting regardless of defaults on either system). > > Maybe it makes sense to dump the compression clause if it is different > from pglz, regardless of the default on the source server. Then, if the > target server has chosen lz4 as default, *all* columns are going to end > up as lz4, and if it hasn't, then only the ones that were lz4 in the > source server are going to. That seems reasonable behavior. Also, if > some columns are lz4 in source, and target does not have lz4, then > everything is going to work out to not-lz4 with just a bunch of errors > in the output. I think what's missing is dumping the GUC value itself, and then also dump any columns that differ from the GUC's setting. An early version of the GUC patch actually had an "XXX" comment about pg_dump support, and I was waiting for a review before polishing it. This was modelled after default_tablespace and default_table_access_method - I've mentioned that before that there's no pg_restore --no-table-am, and I have an unpublished patch to add it. That may be how I missed this until now. Then, this will output COMPRESSION on "a" (x)or "b" depending on the current default: | CREATE TABLE a(a text compression lz4, b text compression pglz); When we restore it, we set the default before restoring columns. I think it may be a good idea to document that dumps of columns with non-default compression aren't portable to older server versions, or servers --without-lz4. This is a consequence of the CREATE command being a big text blob, so pg_restore can't reasonably elide the COMPRESSION clause. While looking at this, I realized that someone added the GUC to postgresql.conf.sample, but not to doc/ - this was a separate patch until yesterday. I think since we're not doing catalog access for "pluggable" compression, this should just be an enum GUC, with #ifdef LZ4. Then we don't need a hook to validate it. ALTER and CREATE are silently accepting bogus compression names. I can write patches for these later. -- Justin
pgsql-hackers by date: