Re: Small issues with CREATE TABLE COMPRESSION - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: Small issues with CREATE TABLE COMPRESSION
Date
Msg-id 20210429040132.GF27406@telsasoft.com
Whole thread Raw
In response to Small issues with CREATE TABLE COMPRESSION  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Small issues with CREATE TABLE COMPRESSION
List pgsql-hackers
On Tue, Apr 27, 2021 at 03:22:25PM +0900, Michael Paquier wrote:
> Hi all,
> 
> I have been looking at and testing the patch set for CREATE TABLE
> COMPRESSION, and spotted a couple of things in parallel of some work
> done by Jacob (added in CC).
> 
> The behavior around CREATE TABLE AS and matviews is a bit confusing,
> and not documented.  First, at the grammar level, it is not possible
> to specify which compression option is used per column when creating
> the relation.  So, all the relation columns would just set a column's
> compression to be default_toast_compression for all the toastable
> columns of the relation.  That's not enforceable at column level when
> the relation is created, except with a follow-up ALTER TABLE.  That's
> similar to STORAGE when it comes to matviews, but these are at least
> documented.
> 
> And so, ALTER MATERIALIZED VIEW supports SET COMPRESSION but this is
> not mentioned in its docs:
> https://www.postgresql.org/docs/devel/sql-altermaterializedview.html
>
> psql could have tab completion support for that.

Actually ALTER matview ALTER col has no tab completion at all, right ?

> Now, we don't really document any of that, and the per-column
> compression value would be set to default_toast_compression while the
> stored values may use a mix of the compression methods, depending on
> where the toasted values come from.  If this behavior is intended, this
> makes me wonder in what the possibility to set the compression for a
> materialized view column is useful for except for a logical
> dump/restore?  As of HEAD we'd just insert the toasted value from the
> origin as-is so the compression of the column has no effect at all.

That may be true if the mat view is trivial, but not true if it has
expressions.  The mat view column may be built on multiple table columns, or be
of a different type than the columns it's built on top of, so the relationship
may not be so direct.

> Another thing here is the inconsistency that this brings with pg_dump.
> For example, as the dumped values are decompressed, we could have
> values compressed with pglz at the origin, with a column using lz4
> within its definition that would make everything compressed with lz4
> once the values are restored.  This choice may be fine, but I think
> that it would be good to document all that.  That would be less
> surprising to the user.

Can you suggest what or where we'd say it?  I think this is just the behavior
that pg_dump shouldn't lose the user's compression setting.

The setting itself is for "future" data, and the only way to guarantee what
compression types are in use are by vacuum full/cluster or pg_dump restore.

> Similarly, we may want to document that COMPRESSION does not trigger a
> table rewrite, but that it is effective only for the new toast values
> inserted if a tuple is rebuilt and rewritten?

Good point.  I started with this.

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 39927be41e..8cceea41d0 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -391,7 +391,21 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </term>
     <listitem>
      <para>
-      This sets the compression method for a column.  The supported compression
+      This sets the compression method to be used for data inserted into a column.
+
+      This does not cause the table to be rewritten, so existing data may still
+      be compressed with other compression methods.  If the table is rewritten with
+      <command>VACUUM FULL</command> or <command>CLUSTER</command>, or restored
+      with <application>pg_restore</application>, then all tuples are rewritten
+      with the configured compression methods.
+
+      Also, note that when data is inserted from another relation (for example,
+      by <command>INSERT ... SELECT</command>), tuples from the source data are
+      not necessarily detoasted, and any previously compressed data is retained
+      with its existing compression method, rather than recompressing with the
+      compression methods of the target columns.
+
+      The supported compression
       methods are <literal>pglz</literal> and <literal>lz4</literal>.
       <literal>lz4</literal> is available only if <literal>--with-lz4</literal>
       was used when building <productname>PostgreSQL</productname>.



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Replication slot stats misgivings
Next
From: David Rowley
Date:
Subject: Re: Use simplehash.h instead of dynahash in SMgr