Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation |
Date | |
Msg-id | CAD21AoDYBNG-EsgDy-sMiT6cinmrvsKe6N5BVOhizUb1k_ta2g@mail.gmail.com Whole thread Raw |
In response to | Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation (Noah Misch <noah@leadboat.com>) |
Responses |
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
|
List | pgsql-hackers |
On Mon, Sep 30, 2024 at 11:49 AM Noah Misch <noah@leadboat.com> wrote: > > On Wed, Sep 25, 2024 at 03:46:27PM -0700, Masahiko Sawada wrote: > > On Wed, Sep 18, 2024 at 6:46 PM Noah Misch <noah@leadboat.com> wrote: > > > On Tue, Sep 17, 2024 at 09:43:41PM -0700, Masahiko Sawada wrote: > > > > On Mon, Sep 16, 2024 at 9:24 AM Noah Misch <noah@leadboat.com> wrote: > > > > > On Thu, Sep 12, 2024 at 03:42:48PM -0700, Masahiko Sawada wrote: > > > > > > On Tue, Sep 10, 2024 at 3:05 PM Noah Misch <noah@leadboat.com> wrote: > > > > > > > On Tue, Sep 10, 2024 at 05:56:47PM -0400, Tom Lane wrote: > > > > > > > > Got it. So now I'm wondering if we need all the complexity of storing > > > > > > > > stuff in the GIN metapages. Could we simply read the (primary's) > > > > > > > > signedness out of pg_control and use that? > > > > > Thanks. I like the simplicity of this approach. If we agree with this > > > > approach, I'd like to proceed with it. > > > > > > Works for me. > > > > > > > Regardless of what approach we take, I wanted to provide some > > > > regression tests for these changes, but I could not come up with a > > > > reasonable idea. It would be great if we could do something like > > > > 027_stream_regress.pl on cross-architecture replication. But just > > > > executing 027_stream_regress.pl on cross-architecture replication > > > > could not be sufficient since we would like to confirm query results > > > > as well. If we could have a reasonable tool or way, it would also help > > > > find other similar bugs related architectural differences. > > > > > > Perhaps add a regress.c function that changes the control file flag and > > > flushes the change to disk? > > > > I've tried this idea but found out that extensions can flush the > > controlfile on the disk after flipping the char signedness value, they > > cannot update the controlfile data on the shared memory. And, when the > > server shuts down, the on-disk controlfile is updated with the > > on-memory controlfile data, so the char signedness is reverted. > > > > We can add a function to set the char signedness of on-memory > > controlfile data or add a new option to pg_resetwal to change the char > > signedness on-disk controlfile data but they seem to be overkill to me > > and I'm concerned about misusing these option and function. > > Before the project is over, pg_upgrade will have some way to set the control > file signedness to the source cluster signedness. I think pg_upgrade should > also take an option for overriding the source cluster signedness for source > clusters v17 and older. That way, a user knowing they copied the v17 source > cluster from x86 to ARM can pg_upgrade properly without the prerequisite of > acquiring an x86 VM. Good idea. > Once that all exists, perhaps it will be clearer how > best to change signedness for testing. Agreed. > > > While this change does not encourage the use of 'char' without > > explicitly specifying its signedness, it provides a hint to ensure > > consistent behavior for indexes (e.g., GIN and GiST) and tables that > > already store data sorted by the 'char' type on disk, especially in > > cross-platform replication scenarios. > > Let's put that sort of information in the GetDefaultCharSignedness() header > comment. New code should use explicit "signed char" or "unsigned char" when > it matters for data file compatibility. GetDefaultCharSignedness() exists for > code that deals with pre-v18 data files, where we accidentally let C > implementation signedness affect persistent data. Agreed, will fix. > > > @@ -4256,6 +4257,18 @@ WriteControlFile(void) > > > > ControlFile->float8ByVal = FLOAT8PASSBYVAL; > > > > + /* > > + * The signedness of the char type is implementation-defined. For example > > + * on x86 architecture CPUs, the char data type is typically treated as > > + * signed by default whereas on aarch architecture CPUs it is typically > > + * treated as unsigned by default. > > + */ > > +#if CHAR_MIN != 0 > > + ControlFile->default_char_signedness = true; > > +#else > > + ControlFile->default_char_signedness = false; > > +#endif > > This has initdb set the field to the current C implementation's signedness. I > wonder if, instead, initdb should set signedness=true unconditionally. Then > the only source of signedness=false will be pg_upgrade. > > Advantage: signedness=false will get rarer over time. If we had known about > this problem during the last development cycle that forced initdb (v8.3), we > would have made all clusters signed or all clusters unsigned. Making > pg_upgrade the only source of signedness=false will cause the population of > database clusters to converge toward that retrospective ideal. I think it's a good idea. Being able to treat one case as a rarer one rather than treating both cases equally may have various advantages in the future, for example when developing or investigating a problem. > Disadvantage: testing signedness=false will require more planning. If testing signedness=false always requires pg_upgrade, there might be some cumbersomeness. Inventing a testing-purpose-only tool (e.g., a CLI program) that changes the signedness might make tests easier. > What other merits should we consider as part of deciding? Considering that the population of database cluster signedness will converge to signedness=true in the future, we can consider using -fsigned-char to prevent similar problems for the future. We need to think about possible side-effects as well, though. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: