Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id 20210320000625.GP11765@telsasoft.com
Whole thread Raw
In response to Re: [HACKERS] Custom compression methods  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Mar 19, 2021 at 04:38:03PM -0400, Robert Haas wrote:
> On Fri, Mar 19, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Nope ... crake's displeased with your assumption that it's OK to
> > clutter dumps with COMPRESSION clauses.  As am I: that is going to
> > be utterly fatal for cross-version transportation of dumps.
> 
> Regarding your point, that does look like clutter. We don't annotate
> the dump with a storage clause unless it's non-default, so probably we
> should do the same thing here. I think I gave Dilip bad advice here...

On Fri, Mar 19, 2021 at 05:49:37PM -0400, Robert Haas wrote:
> Here's a patch for that. It's a little strange because you're going to
> skip dumping the toast compression based on the default value on the
> source system, but that might not be the default on the system where
> the dump is being restored, so you could fail to recreate the state
> you had. That is avoidable if you understand how things work, but some
> people might not. I don't have a better idea, though, so let me know
> what you think of this.

On Fri, Mar 19, 2021 at 07:22:42PM -0300, Alvaro Herrera wrote:
> Do you mean the column storage strategy, attstorage?  I don't think
> that's really related, because the difference there is not a GUC setting
> but a compiled-in default for the type.  In the case of compression, I'm
> not sure it makes sense to do it like that, but I can see the clutter
> argument: if we dump compression for all columns, it's going to be super
> noisy.
> 
> (At least, for binary upgrade surely you must make sure to apply the
> correct setting regardless of defaults on either system).
> 
> Maybe it makes sense to dump the compression clause if it is different
> from pglz, regardless of the default on the source server.  Then, if the
> target server has chosen lz4 as default, *all* columns are going to end
> up as lz4, and if it hasn't, then only the ones that were lz4 in the
> source server are going to.  That seems reasonable behavior.  Also, if
> some columns are lz4 in source, and target does not have lz4, then
> everything is going to work out to not-lz4 with just a bunch of errors
> in the output.

I think what's missing is dumping the GUC value itself, and then also dump any
columns that differ from the GUC's setting.  An early version of the GUC patch
actually had an "XXX" comment about pg_dump support, and I was waiting for a
review before polishing it.  This was modelled after default_tablespace and
default_table_access_method - I've mentioned that before that there's no
pg_restore --no-table-am, and I have an unpublished patch to add it.  That may
be how I missed this until now.

Then, this will output COMPRESSION on "a" (x)or "b" depending on the current
default:
| CREATE TABLE a(a text compression lz4, b text compression pglz);
When we restore it, we set the default before restoring columns.

I think it may be a good idea to document that dumps of columns with
non-default compression aren't portable to older server versions, or servers
--without-lz4.  This is a consequence of the CREATE command being a big text
blob, so pg_restore can't reasonably elide the COMPRESSION clause.

While looking at this, I realized that someone added the GUC to
postgresql.conf.sample, but not to doc/ - this was a separate patch until
yesterday.

I think since we're not doing catalog access for "pluggable" compression, this
should just be an enum GUC, with #ifdef LZ4.  Then we don't need a hook to
validate it.

ALTER and CREATE are silently accepting bogus compression names.

I can write patches for these later.

-- 
Justin



pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Next
From: Bruce Momjian
Date:
Subject: Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?