Re: Optimize binary serialization format of arrays with fixed size elements - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Optimize binary serialization format of arrays with fixed size elements
Date
Msg-id 20120123031657.GE15693@tornado.leadboat.com
Whole thread Raw
In response to Re: Optimize binary serialization format of arrays with fixed size elements  (Mikko Tiihonen <mikko.tiihonen@nitorcreations.com>)
List pgsql-hackers
On Sun, Jan 22, 2012 at 11:47:06PM +0200, Mikko Tiihonen wrote:
> On 01/20/2012 04:45 AM, Noah Misch wrote:
>> On Thu, Jan 19, 2012 at 02:00:20PM -0500, Robert Haas wrote:
>>> Having said that, if we're to follow the precedent set by
>>> bytea_format, maybe we ought to just add
>>> binary_array_format={huge,ittybitty} and be done with it, rather than
>>> inventing a minor protocol version GUC for something that isn't really
>>> a protocol version change at all.  We could introduce a
>>> differently-named general mechanism, but I guess I'm not seeing the
>>> point of that either.  Just because someone has a
>>> backward-compatibility issue with one change of this type doesn't mean
>>> they have a similar issue with all of them.  So I think adding a
>>> special-purpose GUC is more logical and more parallel to what we've
>>> done in the past, and it doesn't saddle us with having to be certain
>>> that we've designed the mechanism generally enough to handle all the
>>> cases that may come later.
>>
>> That makes sense.  An attraction of a single binary format version was avoiding
>> the "Is this worth a GUC?" conversation for each change.  However, adding a GUC
>> should be no more notable than bumping a binary format version.
>
> I see the main difference between the GUC per feature vs minor version being that
> in versioned changes old clients keep working because the have to explicitly
> request a specific version. Whereas in separate GUC variables each feature will be
> enabled by default and users have to either keep up with new client versions or
> figure out how to explicitly disable the changes.

No, we can decide that anew for each GUC.  If you'd propose that array_output
= full be the default, that works for me.

> Here is a second version of the patch. The binary array encoding changes
> stay the same but all code around was rewritten.
>
> Changes from previous versions based on received comments:
> * removed the minor protocol version concept
> * introduced a new GUC variable array_output copying the current
>   bytea_output type, with values "full" (old value) and
>   "smallfixed" (new default)

How about the name "array_binary_output"?

> * added documentation for the new GUC variable
> * used constants for the array flags variable values

> *** a/doc/src/sgml/config.sgml
> --- b/doc/src/sgml/config.sgml
> *************** COPY postgres_log FROM '/full/path/to/lo
> *** 4888,4893 ****
> --- 4888,4910 ----
>         </listitem>
>        </varlistentry>
>   
> +      <varlistentry id="guc-array-output" xreflabel="array_output">
> +       <term><varname>array_output</varname> (<type>enum</type>)</term>
> +       <indexterm>
> +        <primary><varname>array_output</> configuration parameter</primary>
> +       </indexterm>
> +       <listitem>
> +        <para>
> +         Sets the output format for binary encoding of values of
> +         type <type>array</type>. Valid values are
> +         <literal>smallfixed</literal> (the default)
> +         and <literal>full</literal> (the traditional PostgreSQL
> +         format).  The <type>array</type> type always

It's not "The array type" but "Array types", a class.

> +         accepts both formats on input, regardless of this setting.
> +        </para>
> +       </listitem>
> +      </varlistentry>

The section "Array Input and Output Syntax" should reference this GUC.

> *** a/src/backend/utils/misc/guc.c
> --- b/src/backend/utils/misc/guc.c
> ***************
> *** 64,69 ****
> --- 64,70 ----
>   #include "storage/predicate.h"
>   #include "tcop/tcopprot.h"
>   #include "tsearch/ts_cache.h"
> + #include "utils/array.h"
>   #include "utils/builtins.h"
>   #include "utils/bytea.h"
>   #include "utils/guc_tables.h"
> *************** static const struct config_enum_entry by
> *** 225,230 ****
> --- 226,243 ----
>   };
>   
>   /*
> +  * Options for enum values defined in this module.
> +  *
> +  * NOTE! Option values may not contain double quotes!
> +  */

Don't replicate this comment.

> + 
> + static const struct config_enum_entry array_output_options[] = {
> +     {"full", ARRAY_OUTPUT_FULL, false},
> +     {"smallfixed", ARRAY_OUTPUT_SMALLFIXED, false},
> +     {NULL, 0, false}
> + };
> + 
> + /*
>    * We have different sets for client and server message level options because
>    * they sort slightly different (see "log" level)
>    */
> *************** static struct config_enum ConfigureNames
> *** 3047,3052 ****
> --- 3060,3075 ----
>           NULL, NULL, NULL
>       },
>   
> +     {
> +         {"array_output", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,

You've put the GUC in COMPAT_OPTIONS_PREVIOUS, but you've put its
documentation in the section for CLIENT_CONN_STATEMENT.  I don't have a strong
opinion on which one to use, but be consistent.

> +             gettext_noop("Sets the binary output format for arrays."),
> +             NULL
> +         },
> +         &bytea_output,

&array_output

> +         ARRAY_OUTPUT_SMALLFIXED, array_output_options,
> +         NULL, NULL, NULL
> +     },
> + 
>       {
>           {"client_min_messages", PGC_USERSET, LOGGING_WHEN,
>               gettext_noop("Sets the message levels that are sent to the client."),

Thanks,
nm


pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: [PATCH] Support for foreign keys with arrays
Next
From: Robert Haas
Date:
Subject: Re: Inline Extension