Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Date
Msg-id CAD21AoBO-iL=UgHKQrKE=7zBrMYR_fAhLb8AGo_kmNgmkPKuQA@mail.gmail.com
Whole thread Raw
In response to Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation  (Noah Misch <noah@leadboat.com>)
Responses Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
List pgsql-hackers
On Sun, May 19, 2024 at 6:46 AM Noah Misch <noah@leadboat.com> wrote:
>
> On Wed, May 15, 2024 at 02:56:54PM +0900, Masahiko Sawada wrote:
> >
> > How about extending amcheck to support GIN and GIst indexes so that it
> > can detect potential data incompatibility due to changing 'char' to
> > 'unsigned char'? I think these new tests would be useful also for
> > users to check if they really need to reindex indexes due to such
> > changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18.
> > Users who upgraded to PG18 can run the new amcheck tests on the
> > primary as well as the standby.
>
> If I were standardizing pg_trgm on one or the other notion of "char", I would
> choose signed char, since I think it's still the majority.  More broadly, I
> see these options to fix pg_trgm:
>
> 1. Change to signed char.  Every arm64 system needs to scan pg_trgm indexes.
> 2. Change to unsigned char.  Every x86 system needs to scan pg_trgm indexes.

Even though it's true that signed char systems are the majority, it
would not be acceptable to force the need to scan pg_trgm indexes on
unsigned char systems.

> 3. Offer both, as an upgrade path.  For example, pg_trgm could have separate
>    operator classes gin_trgm_ops and gin_trgm_ops_unsigned.  Running
>    pg_upgrade on an unsigned-char system would automatically map v17
>    gin_trgm_ops to v18 gin_trgm_ops_unsigned.  This avoids penalizing any
>    architecture with upgrade-time scans.

Very interesting idea. How can new v18 users use the correct operator
class? I don't want to require users to specify the correct signed or
unsigned operator classes when creating a GIN index. Maybe we need to
dynamically use the correct compare function for the same operator
class depending on the char signedness. But is it possible to do it on
the extension (e.g. pg_trgm) side?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Next
From: Tom Lane
Date:
Subject: Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.