Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation - Mailing list pgsql-hackers

From Noah Misch
Subject Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Date
Msg-id 20241120012832.7d.nmisch@google.com
Whole thread Raw
In response to Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Thu, Oct 03, 2024 at 06:55:47AM -0700, Masahiko Sawada wrote:
> I've attached PoC patches for the idea Noah proposed. Newly created
> clusters unconditionally have default_char_signedness=true, and the
> only source of signedness=false is pg_upgrade. To update the
> signedness in the controlfile, pg_resetwal now has a new option
> --char-signedness, which is used by pg_upgrade internally. Feedback is
> very welcome.

Upthread, we discussed testability.  Does pg_resetwal facilitate all
appropriate testing, or do testing difficulties remain?

I reviewed these patches, finding only one non-cosmetic review comment.  Given
the PoC status, some of the observations below are likely ones you already
know or would have found before exiting PoC.

> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -27858,6 +27858,11 @@ acl      | {postgres=arwdDxtm/postgres,foo=r/postgres}
>         <entry><type>integer</type></entry>
>        </row>
>  
> +      <row>
> +       <entry><structfield>signed_char</structfield></entry>
> +       <entry><type>bool</type></entry>

s/signed_char/default_char_signedness/ to align with the naming below.

> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -4256,6 +4256,33 @@ WriteControlFile(void)

> +     * Newly created database clusters unconditionally set the default char
> +     * signedness to true. pg_upgrade changes this flag for clusters that were
> +     * initialized on signedness=false platforms. As a result,
> +     * signedness=false setting will become rare over time. will get rarer
> +     * over time.

There's a redundant fragment of sentence.

> --- a/doc/src/sgml/ref/pg_resetwal.sgml
> +++ b/doc/src/sgml/ref/pg_resetwal.sgml
> @@ -171,6 +171,16 @@ PostgreSQL documentation
>    </para>
>  
>    <variablelist>
> +   <varlistentry>
> +    <term><option>--char-signedness=<replaceable class="parameter">option</replaceable></option></term>
> +    <listitem>
> +     <para>
> +      Manually set the default char signedness. Possible values are
> +      <literal>signed</literal> and <literal>unsigned</literal>.
> +     </para>
> +    </listitem>
> +   </varlistentry>

This needs more docs, like its <varlistentry> neighbors have.  First, see the
point below about the pg_upgrade docs.

> --- a/src/bin/pg_resetwal/pg_resetwal.c
> +++ b/src/bin/pg_resetwal/pg_resetwal.c
> @@ -1189,6 +1213,7 @@ usage(void)
>      printf(_("  -u, --oldest-transaction-id=XID  set oldest transaction ID\n"));
>      printf(_("  -x, --next-transaction-id=XID    set next transaction ID\n"));
>      printf(_("      --wal-segsize=SIZE           size of WAL segments, in megabytes\n"));
> +    printf(_("      --char-signedness=OPTION     set char signedness to \"signed\"  or \"unsigned\"\n"));

This help output alphabetizes by short option, with the one non-short option
at the end.  So I'd also alphabetize among the non-short options, specifically
putting the new line before --wal-segsize.

> source cluster was initialized on the same platform where pg_upgrae is

Typo "pg_upgrade"

> --- a/src/bin/pg_upgrade/controldata.c
> +++ b/src/bin/pg_upgrade/controldata.c
> @@ -62,6 +62,7 @@ get_control_data(ClusterInfo *cluster)
>      bool        got_date_is_int = false;
>      bool        got_data_checksum_version = false;
>      bool        got_cluster_state = false;
> +    bool        got_default_char_signedness = false;
>      char       *lc_collate = NULL;
>      char       *lc_ctype = NULL;
>      char       *lc_monetary = NULL;
> @@ -206,6 +207,13 @@ get_control_data(ClusterInfo *cluster)
>          got_data_checksum_version = true;
>      }
>  
> +    /* Only in <= 17 */
> +    if (GET_MAJOR_VERSION(cluster->major_version) <= 1700)
> +    {
> +        cluster->controldata.default_char_signedness = true;

If anything reads the "true" stored here, that would be a bug.  Is there a
reasonable way do one or two of the following?

1. not initialize it at all
2. store something like -1 instead
3. add a comment that it's never read

> +        got_default_char_signedness = true;
> +    }

I wouldn't say we "got" this.  I'd handle this more like got_oldestmulti,
where we know !got_oldestmulti is okay until MULTIXACT_FORMATCHANGE_CAT_VER.
Maybe also replace the "<= 1700" checks with catversion checks.

> +
>      /* we have the result of cmd in "output". so parse it line by line now */
>      while (fgets(bufin, sizeof(bufin), output))
>      {
> @@ -501,6 +509,17 @@ get_control_data(ClusterInfo *cluster)
>              cluster->controldata.data_checksum_version = str2uint(p);
>              got_data_checksum_version = true;
>          }
> +        else if ((p = strstr(bufin, "char data signedness:")) != NULL)
> +        {
> +            p = strchr(p, ':');
> +
> +            if (p == NULL || strlen(p) <= 1)
> +                pg_fatal("%d: controldata retrieval problem", __LINE__);
> +
> +            p++;                /* remove ':' char */
> +            cluster->controldata.default_char_signedness = strstr(p, "signed") != NULL;

Won't this match "unsigned", too?  (This is my only non-cosmetic review
comment.)  I'd strcmp for the two expected values.  If neither matches, report
the pg_fatal "retrieval problem".

> --- a/doc/src/sgml/ref/pgupgrade.sgml
> +++ b/doc/src/sgml/ref/pgupgrade.sgml
> @@ -276,6 +276,24 @@ PostgreSQL documentation
>        </listitem>
>       </varlistentry>
>  
> +     <varlistentry>
> +      <term><option>--set-char-signedness=</option><replaceable>option</replaceable></term>
> +      <listitem>
> +       <para>
> +        Manually set the default char signedness of new clusters. Possible values
> +        are <literal>signed</literal> and <literal>unsigned</literal>.
> +       </para>
> +       <para>
> +        The signedness of the 'char' type in C is implementation-dependent. For instance,
> +        'signed char' is used by default on x86 CPUs, while 'unsigned char' is used on aarch
> +        CPUs. <application>pg_upgrade</application> automatically inherits the char
> +        signedness from the old cluster. This option is useful for upgrading the cluster
> +        that user knew they copied it to a platform having different char signedness
> +        (e.g. from x86 to aarch).

That last sentence would need some grammar help.  Instead, let's rewrite this
paragraph to assume the reader is not a C programmer.  Focus on what such a
reader should do to choose what argument, if any, to pass.

Our docs haven't used the term "aarch".  This applies to 32-bit ARM in
addition to 64-bit ARM.  Hence, I'd write the term "ARM" instead.

> --- a/src/bin/pg_upgrade/option.c
> +++ b/src/bin/pg_upgrade/option.c
> @@ -307,6 +316,7 @@ usage(void)
>      printf(_("  --copy                        copy files to new cluster (default)\n"));
>      printf(_("  --copy-file-range             copy files to new cluster with copy_file_range\n"));
>      printf(_("  --sync-method=METHOD          set method for syncing files to disk\n"));
> +    printf(_("  --set-char-signedness=OPTION  set new cluster char signedness to \"signed\" or \"unsigned\""));

Move this up one line, to preserve alphabetization.

> --- a/src/backend/storage/lmgr/predicate.c
> +++ b/src/backend/storage/lmgr/predicate.c
> @@ -2588,6 +2588,11 @@ PredicateLockPage(Relation relation, BlockNumber blkno, Snapshot snapshot)
>      if (!SerializationNeededForRead(relation, snapshot))
>          return;
>  
> +    ereport(LOG, (errmsg("predicate %s blk %u",
> +                         RelationGetRelationName(relation),
> +                         blkno),
> +                  errbacktrace()));

Revert this file.

Thanks,
nm



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
Next
From: Greg Sabino Mullane
Date:
Subject: Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE