Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation |
Date | |
Msg-id | 20240930184941.da.nmisch@google.com Whole thread Raw |
In response to | Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
|
List | pgsql-hackers |
On Wed, Sep 25, 2024 at 03:46:27PM -0700, Masahiko Sawada wrote: > On Wed, Sep 18, 2024 at 6:46 PM Noah Misch <noah@leadboat.com> wrote: > > On Tue, Sep 17, 2024 at 09:43:41PM -0700, Masahiko Sawada wrote: > > > On Mon, Sep 16, 2024 at 9:24 AM Noah Misch <noah@leadboat.com> wrote: > > > > On Thu, Sep 12, 2024 at 03:42:48PM -0700, Masahiko Sawada wrote: > > > > > On Tue, Sep 10, 2024 at 3:05 PM Noah Misch <noah@leadboat.com> wrote: > > > > > > On Tue, Sep 10, 2024 at 05:56:47PM -0400, Tom Lane wrote: > > > > > > > Got it. So now I'm wondering if we need all the complexity of storing > > > > > > > stuff in the GIN metapages. Could we simply read the (primary's) > > > > > > > signedness out of pg_control and use that? > > > Thanks. I like the simplicity of this approach. If we agree with this > > > approach, I'd like to proceed with it. > > > > Works for me. > > > > > Regardless of what approach we take, I wanted to provide some > > > regression tests for these changes, but I could not come up with a > > > reasonable idea. It would be great if we could do something like > > > 027_stream_regress.pl on cross-architecture replication. But just > > > executing 027_stream_regress.pl on cross-architecture replication > > > could not be sufficient since we would like to confirm query results > > > as well. If we could have a reasonable tool or way, it would also help > > > find other similar bugs related architectural differences. > > > > Perhaps add a regress.c function that changes the control file flag and > > flushes the change to disk? > > I've tried this idea but found out that extensions can flush the > controlfile on the disk after flipping the char signedness value, they > cannot update the controlfile data on the shared memory. And, when the > server shuts down, the on-disk controlfile is updated with the > on-memory controlfile data, so the char signedness is reverted. > > We can add a function to set the char signedness of on-memory > controlfile data or add a new option to pg_resetwal to change the char > signedness on-disk controlfile data but they seem to be overkill to me > and I'm concerned about misusing these option and function. Before the project is over, pg_upgrade will have some way to set the control file signedness to the source cluster signedness. I think pg_upgrade should also take an option for overriding the source cluster signedness for source clusters v17 and older. That way, a user knowing they copied the v17 source cluster from x86 to ARM can pg_upgrade properly without the prerequisite of acquiring an x86 VM. Once that all exists, perhaps it will be clearer how best to change signedness for testing. > While this change does not encourage the use of 'char' without > explicitly specifying its signedness, it provides a hint to ensure > consistent behavior for indexes (e.g., GIN and GiST) and tables that > already store data sorted by the 'char' type on disk, especially in > cross-platform replication scenarios. Let's put that sort of information in the GetDefaultCharSignedness() header comment. New code should use explicit "signed char" or "unsigned char" when it matters for data file compatibility. GetDefaultCharSignedness() exists for code that deals with pre-v18 data files, where we accidentally let C implementation signedness affect persistent data. > @@ -4256,6 +4257,18 @@ WriteControlFile(void) > > ControlFile->float8ByVal = FLOAT8PASSBYVAL; > > + /* > + * The signedness of the char type is implementation-defined. For example > + * on x86 architecture CPUs, the char data type is typically treated as > + * signed by default whereas on aarch architecture CPUs it is typically > + * treated as unsigned by default. > + */ > +#if CHAR_MIN != 0 > + ControlFile->default_char_signedness = true; > +#else > + ControlFile->default_char_signedness = false; > +#endif This has initdb set the field to the current C implementation's signedness. I wonder if, instead, initdb should set signedness=true unconditionally. Then the only source of signedness=false will be pg_upgrade. Advantage: signedness=false will get rarer over time. If we had known about this problem during the last development cycle that forced initdb (v8.3), we would have made all clusters signed or all clusters unsigned. Making pg_upgrade the only source of signedness=false will cause the population of database clusters to converge toward that retrospective ideal. Disadvantage: testing signedness=false will require more planning. What other merits should we consider as part of deciding?
pgsql-hackers by date: