Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Date
Msg-id CAD21AoC=7oF29N6iKaTssn-rRGnoTVSLzTQh4ia4ZUO1iKKc_w@mail.gmail.com
In response to Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation  (Noah Misch <noah@leadboat.com>)
Responses Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
List pgsql-hackers
On Fri, Aug 30, 2024 at 8:10 PM Noah Misch <noah@leadboat.com> wrote:
>
> On Thu, Aug 29, 2024 at 03:48:53PM -0500, Masahiko Sawada wrote:
> > On Sun, May 19, 2024 at 6:46 AM Noah Misch <noah@leadboat.com> wrote:
> > > If I were standardizing pg_trgm on one or the other notion of "char", I would
> > > choose signed char, since I think it's still the majority.  More broadly, I
> > > see these options to fix pg_trgm:
> > >
> > > 1. Change to signed char.  Every arm64 system needs to scan pg_trgm indexes.
> > > 2. Change to unsigned char.  Every x86 system needs to scan pg_trgm indexes.
> >
> > Even though it's true that signed char systems are the majority, it
> > would not be acceptable to force the need to scan pg_trgm indexes on
> > unsigned char systems.
> >
> > > 3. Offer both, as an upgrade path.  For example, pg_trgm could have separate
> > >    operator classes gin_trgm_ops and gin_trgm_ops_unsigned.  Running
> > >    pg_upgrade on an unsigned-char system would automatically map v17
> > >    gin_trgm_ops to v18 gin_trgm_ops_unsigned.  This avoids penalizing any
> > >    architecture with upgrade-time scans.
> >
> > Very interesting idea. How can new v18 users use the correct operator
> > class? I don't want to require users to specify the correct signed or
> > unsigned operator classes when creating a GIN index. Maybe we need to
>
> In brief, it wouldn't matter which operator class new v18 indexes use.  The
> documentation would focus on gin_trgm_ops and also say something like:
>
>   There's an additional operator class, gin_trgm_ops_unsigned.  It behaves
>   exactly like gin_trgm_ops, but it uses a deprecated on-disk representation.
>   Use gin_trgm_ops in new indexes, but there's no disadvantage from continuing
>   to use gin_trgm_ops_unsigned.  Before PostgreSQL 18, gin_trgm_ops used a
>   platform-dependent representation.  pg_upgrade automatically uses
>   gin_trgm_ops_unsigned when upgrading from source data that used the
>   deprecated representation.
>
> What concerns might users have, then?  (Neither operator class would use plain
> "char" in a context that affects on-disk state.  They'll use "signed char" and
> "unsigned char".)

I think I understand your idea now. Since gin_trgm_ops will use
"signed char", there is no impact for v18 users -- they can continue
using gin_trgm_ops.
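To illustrate why the choice between "signed char" and "unsigned char" matters, here is a minimal sketch. It is not pg_trgm's actual code, and the helper names are hypothetical. It compares two 3-byte trigrams byte by byte under each interpretation. Bytes below 0x80 order the same way either way. Any byte at or above 0x80, such as the UTF-8 bytes 0xE3 0x81 0x82, sorts after ASCII when read as unsigned and before ASCII when read as signed.

```c
/*
 * Hypothetical helpers, not pg_trgm's code: compare two 3-byte trigrams
 * lexicographically, interpreting each raw byte the way a signed-char or
 * an unsigned-char platform would.
 */

/* Byte read as signed char: 0x80..0xFF map to -128..-1 (portable arithmetic). */
static int byte_as_signed(unsigned char b)
{
    return b >= 128 ? (int) b - 256 : (int) b;
}

/* Byte read as unsigned char: 0x00..0xFF map to 0..255. */
static int byte_as_unsigned(unsigned char b)
{
    return (int) b;
}

/* Returns -1, 0, or 1, like a qsort()-style comparator. */
static int cmp_trigram(const unsigned char *a, const unsigned char *b,
                       int (*interp)(unsigned char))
{
    for (int i = 0; i < 3; i++)
    {
        int x = interp(a[i]);
        int y = interp(b[i]);

        if (x != y)
            return x < y ? -1 : 1;
    }
    return 0;
}
```

Comparing "abc" with {0xE3, 0x81, 0x82} yields 1 under byte_as_signed and -1 under byte_as_unsigned. Any on-disk state that depends on this ordering would therefore differ between the two kinds of platform. Naming the signedness explicitly, as both proposed operator classes would, removes that dependence.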

But how would pg_upgrade use gin_trgm_ops_unsigned? This opclass would
be created by executing the v18 pg_trgm extension script, but that
script isn't executed during pg_upgrade. Another way would be to do
this opclass replacement when executing pg_trgm's update script
(i.e., from 1.6 to 1.7).
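Whichever component ends up doing the replacement, it needs the same decision rule from the proposal. A v17 gin_trgm_ops index built where "char" was unsigned maps to gin_trgm_ops_unsigned, and one built where "char" was signed already matches v18 gin_trgm_ops. The sketch below expresses only that rule. The function names are hypothetical. It also assumes that the source cluster's char signedness can be taken from the platform running the check, which holds only if the old binaries were built with the same signedness.

```c
#include <limits.h>
#include <stdbool.h>

/* Opclass names as proposed in this thread (gin_trgm_ops_unsigned is hypothetical). */
static const char *const TRGM_OPS = "gin_trgm_ops";
static const char *const TRGM_OPS_UNSIGNED = "gin_trgm_ops_unsigned";

/* True when plain "char" is signed for the compiler that built this binary. */
static bool platform_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/*
 * Hypothetical mapping for a v17 gin_trgm_ops index: keep the deprecated
 * unsigned representation if the source used unsigned char; otherwise the
 * index already matches v18 gin_trgm_ops, which would use signed char.
 */
static const char *v18_opclass_for_v17_gin_trgm_ops(bool source_char_signed)
{
    return source_char_signed ? TRGM_OPS : TRGM_OPS_UNSIGNED;
}
```

A caller would pass platform_char_is_signed() as the argument. The sketch does not settle the open question above of whether pg_upgrade or the extension update script should apply the result.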

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


