Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id CAFiTN-utJPbTLQ9i10wT_zmHX=un+RQMB1B1xbkTgrh971vqjw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Custom compression methods  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [HACKERS] Custom compression methods  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> The most recent patch doesn't compile --without-lz4:
>
> compress_lz4.c:191:17: error: ‘lz4_cmcheck’ undeclared here (not in a function)
>   .datum_check = lz4_cmcheck,
> ...
>

My bad, fixed this.

> And fails pg_upgrade check, apparently losing track of the compression (?)
>
>  CREATE TABLE public.cmdata2 (
> -    f1 text COMPRESSION lz4
> +    f1 text
>  );

I did not get this?  pg_upgrade check is passing for me.

> You added pg_dump --no-compression, but the --help isn't updated.

Fixed.

  I think
> there should also be an option for pg_restore, like --no-tablespaces.  And I
> think there should be a GUC for default_compression, like
> default_table_access_method, so one can restore into an alternate compression
> by setting PGOPTIONS=-cdefault_compression=lz4.
>
> I'd like to be able to make all compressible columns of a table use a
> non-default compression (except those which cannot), without having to use
> \gexec...  We have tables with up to 1600 columns.  So a GUC would allow that.
>
> Previously (on separate threads) I wondered whether pg_dump
> --no-table-access-method was needed - maybe that be sufficient for this case,
> too, but I think it should be possible to separately avoid restoring
> compression AM and AM "proper".  So maybe it'd be like --no-tableam=compress
> --no-tableam=storage or --no-tableam-all.

I will put more thought into this and respond separately.

> Some language fixes:
>
> Subject: [PATCH v16 1/6] Built-in compression method
>
> +++ b/doc/src/sgml/ddl.sgml
> @@ -3762,6 +3762,8 @@ CREATE TABLE measurement (
>         <productname>PostgreSQL</productname>
>         tables (or, possibly, foreign tables).  It is possible to specify a
>         tablespace and storage parameters for each partition separately.
> +       Partitions inherits the compression method of the parent for each column
> +       however we can set different compression method for each partition.
>
> Should say:
> +       By default, each column in a partition inherits the compression method from its parent table,
> +       however a different compression method can be set for each partition.

Done

> +++ b/doc/src/sgml/ref/create_table.sgml
>
> +       <varlistentry>
> +        <term><literal>INCLUDING COMPRESSION</literal></term>
> +        <listitem>
> +         <para>
> +          Compression method of the columns will be coppied.  The default
> +          behavior is to exclude compression method, resulting in the copied
> +          column will have the default compression method if the column type is
> +          compressible.
>
> Say:
> +          Compression method of the columns will be copied.  The default
> +          behavior is to exclude compression methods, resulting in the
> +          columns having the default compression method.

Done

> +   <varlistentry>
> +    <term><literal>COMPRESSION <replaceable class="parameter">compression_method</replaceable></literal></term>
> +    <listitem>
> +     <para>
> +      This clause adds the compression method to a column.  Compression method
> +      can be set from the available built-in compression methods.  The available
> +      options are <literal>pglz</literal> and <literal>lz4</literal>.  If the
> +      compression method is not sepcified for the compressible type then it will
> +      have the default compression method.  The default compression method is
> +      <literal>pglz</literal>.
>
> Say "The compression method can be set from available compression methods" (or
> remove this sentence).
> Say "The available BUILT-IN methods are ..."
> sepcified => specified

Done

> +
> +       /*
> +        * No point in wasting a palloc cycle if value size is out of the allowed
> +        * range for compression
>
> say "outside the allowed range"
>
> +               if (pset.sversion >= 120000 &&
> +               if (pset.sversion >= 120000 &&
>
> A couple places that need to say >= 14

Fixed

> Subject: [PATCH v16 2/6] alter table set compression
>
> +     <literal>SET COMPRESSION <replaceable class="parameter">compression_method</replaceable></literal>
> +      This clause adds compression to a column. Compression method can be set
> +      from available built-in compression methods. The available built-in
> +      methods are <literal>pglz</literal> and <literal>lz4</literal>.
>
> Should say "The compression method can be set to any available method.  The
> built in methods are >PGLZ< or >LZ<"
> That fixes grammar, and correction that it's possible to set to an available
> method other than what's "built-in".

Done

> +++ b/src/include/commands/event_trigger.h
> @@ -32,7 +32,7 @@ typedef struct EventTriggerData
>  #define AT_REWRITE_ALTER_PERSISTENCE   0x01
>  #define AT_REWRITE_DEFAULT_VAL                 0x02
>  #define AT_REWRITE_COLUMN_REWRITE              0x04
> -
> +#define AT_REWRITE_ALTER_COMPRESSION   0x08
>  /*
>
> This is losing a useful newline.

Fixed

> Subject: [PATCH v16 4/6] Create custom compression methods
>
> +      This clause adds compression to a column. Compression method
> +      could be created with <xref linkend="sql-create-access-method"/> or it can
> +      be set from the available built-in compression methods.  The available
> +      built-in methods are <literal>pglz</literal> and <literal>lz4</literal>.
> +      The PRESERVE list contains list of compression methods used on the column
> +      and determines which of them should be kept on the column.  Without
> +      PRESERVE or if all the previous compression methods are not preserved then
> +      the table will be rewritten.  If PRESERVE ALL is specified then all the
> +      previous methods will be preserved and the table will not be rewritten.
>       </para>
>      </listitem>
>     </varlistentry>
>
> diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
> index f404dd1088..ade3989d75 100644
> --- a/doc/src/sgml/ref/create_table.sgml
> +++ b/doc/src/sgml/ref/create_table.sgml
> @@ -999,11 +999,12 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
> +      could be created with <xref linkend="sql-create-access-method"/> or it can
> +      be set from the available built-in compression methods.  The available
>
> remove this first "built-in" ?

Done

> +      built-in methods are <literal>pglz</literal> and <literal>lz4</literal>.
>
>
> +GetCompressionAmRoutineByAmId(Oid amoid)
> ...
> +       /* Check if it's an index access method as opposed to some other AM */
> +       if (amform->amtype != AMTYPE_COMPRESSION)
> +               ereport(ERROR,
> +                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                                errmsg("access method \"%s\" is not of type %s",
> +                                               NameStr(amform->amname), "INDEX")));
> ...
> +                                errmsg("index access method \"%s\" does not have a handler",
>
> In 3 places, the comment and code should say "COMPRESSION" right ?

Fixed, along with some other refactoring around this code.

> Subject: [PATCH v16 6/6] Support compression methods options
>
> +      If compression method has options they could be specified with
> +      <literal>WITH</literal> parameter.
>
> If *the* compression method has options, they *can* be specified with *the* ...

Done

> @@ -1004,7 +1004,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
> +      method is <literal>pglz</literal>.  If the compression method has options
> +      they could be specified by <literal>WITH</literal>
> +      parameter.
>
> same

Done

> +static void *
> +lz4_cminitstate(List *options)
> +{
> +       int32   *acceleration = palloc(sizeof(int32));
> +
> +       /* initialize with the default acceleration */
> +       *acceleration = 1;
> +
> +       if (list_length(options) > 0)
> +       {
> +               ListCell        *lc;
> +
> +               foreach(lc, options)
> +               {
> +                       DefElem    *def = (DefElem *) lfirst(lc);
> +
> +                       if (strcmp(def->defname, "acceleration") == 0)
> +                               *acceleration = pg_atoi(defGetString(def), sizeof(int32), 0);
>
> Don't you need to say "else: error: unknown compression option" ?

Done

> +       /*
> +        * Compression option must be only valid if we are updating the compression
> +        * method.
> +        */
> +       Assert(DatumGetPointer(acoptions) == NULL || OidIsValid(newcompression));
> +
>
> should say "need be valid only if .."

Changed.

Apart from this, I have also done some refactoring and comment improvement.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: "Hou, Zhijie"
Date:
Subject: RE: Parallel Inserts in CREATE TABLE AS
Next
From: Amit Kapila
Date:
Subject: Re: Single transaction in the tablesync worker?