Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation - Mailing list pgsql-hackers

From Noah Misch
Subject Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Date
Msg-id 20240518214546.e8@rfd.leadboat.com
Whole thread Raw
In response to Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
List pgsql-hackers
On Wed, May 15, 2024 at 02:56:54PM +0900, Masahiko Sawada wrote:
> On Sat, May 4, 2024 at 7:36 AM Joe Conway <mail@joeconway.com> wrote:
> > On 5/3/24 11:44, Peter Eisentraut wrote:
> > > On 03.05.24 16:13, Tom Lane wrote:
> > >> Peter Eisentraut <peter@eisentraut.org> writes:
> > >>> On 30.04.24 19:29, Tom Lane wrote:
> > >>>> Also, the bigger picture here is the seeming assumption that "if
> > >>>> we change pg_trgm then it will be safe to replicate from x86 to
> > >>>> arm".  I don't believe that that's a good idea and I'm unwilling
> > >>>> to promise that it will work, regardless of what we do about
> > >>>> char signedness.  That being the case, I don't want to invest a
> > >>>> lot of effort in the signedness issue.  Option (1) is clearly
> > >>>> a small change with little if any risk of future breakage.
> > >>
> > >>> But note that option 1 would prevent some replication that is currently
> > >>> working.
> > >>
> > >> The point of this thread though is that it's working only for small
> > >> values of "work".  People are rightfully unhappy if it seems to work
> > >> and then later they get bitten by compatibility problems.
> > >>
> > >> Treating char signedness as a machine property in pg_control would
> > >> signal that we don't intend to make it work, and would ensure that
> > >> even the most minimal testing would find out that it doesn't work.
> > >>
> > >> If we do not do that, it seems to me we have to buy into making
> > >> it work.  That would mean dealing with the consequences of an
> > >> incompatible change in pg_trgm indexes, and then going through
> > >> the same dance again the next time(s) similar problems are found.
> > >
> > > Yes, that is understood.  But anecdotally, replicating between x86-64 arm64 is
> > > occasionally used for upgrades or migrations.  In practice, this appears to have
> > > mostly worked.  If we now discover that it won't work with certain index
> > > extension modules, it's usable for most users. Even if we say, you have to
> > > reindex everything afterwards, it's probably still useful for these scenarios.

Like you, I would not introduce a ControlFileData block for sign-of-char,
given the signs of breakage in extension indexing only.  That would lose much
useful migration capability.  I'm sympathetic to Tom's point, which I'll
attempt to summarize as: a ControlFileData block is a promise we know how to
keep, whereas we should expect further bug discoveries without it.  At the
same time, I put more weight on the apparently-wide span of functionality that
works fine.  (I wonder whether any static analyzer does or practically could
detect char sign dependence with a decent false positive rate.)

> +1
> 
> How about extending amcheck to support GIN and GIst indexes so that it
> can detect potential data incompatibility due to changing 'char' to
> 'unsigned char'? I think these new tests would be useful also for
> users to check if they really need to reindex indexes due to such
> changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18.
> Users who upgraded to PG18 can run the new amcheck tests on the
> primary as well as the standby.

If I were standardizing pg_trgm on one or the other notion of "char", I would
choose signed char, since I think it's still the majority.  More broadly, I
see these options to fix pg_trgm:

1. Change to signed char.  Every arm64 system needs to scan pg_trgm indexes.
2. Change to unsigned char.  Every x86 system needs to scan pg_trgm indexes.
3. Offer both, as an upgrade path.  For example, pg_trgm could have separate
   operator classes gin_trgm_ops and gin_trgm_ops_unsigned.  Running
   pg_upgrade on an unsigned-char system would automatically map v17
   gin_trgm_ops to v18 gin_trgm_ops_unsigned.  This avoids penalizing any
   architecture with upgrade-time scans.

Independently, having amcheck for GIN and/or GiST would be great.



pgsql-hackers by date:

Previous
From: Yasir
Date:
Subject: Re: Ignore Visual Studio's Temp Files While Working with PG on Windows
Next
From: Bruce Momjian
Date:
Subject: Re: First draft of PG 17 release notes