Thread: pg_trgm comparison bug on cross-architecture replication due to different char implementation
pg_trgm comparison bug on cross-architecture replication due to different char implementation
Hi all,
I would like to report an issue with the pg_trgm extension on cross-architecture replication scenarios. When an x86_64 standby server is replicating from an aarch64 primary server or vice versa, the gist_trgm_ops opclass returns different results on the primary and standby. Masahiko previously reported a similar issue affecting the pg_bigm extension [1].
To reproduce, execute the following on the x86_64 primary server:
CREATE EXTENSION pg_trgm;
CREATE TABLE tbl (c text);
CREATE INDEX ON tbl USING gist (c gist_trgm_ops);
INSERT INTO tbl VALUES ('Bóbr');
On the x86_64 primary server:
postgres=> select * from tbl where c like '%Bób%';
c
------
Bóbr
(1 row)
On the aarch64 replica server:
postgres=> select * from tbl where c like '%Bób%';
c
---
(0 rows)
The root cause looks the same as the pg_bigm issue that Masahiko reported. To compare trigrams, pg_trgm uses a numerical comparison of chars [2]. On x86_64 a char is signed by default, whereas on aarch64 it is unsigned by default. gist_trgm_ops expects the trigram list to be sorted, but due to the different signedness of chars, the sort order is broken when replicating the values across architectures.
The different sort behaviour can be demonstrated using show_trgm.
On the x86_64 primary server:
postgres=> SELECT show_trgm('Bóbr');
show_trgm
------------------------------------------
{0x89194c," b","br ",0x707c72,0x7f7849}
(1 row)
On the aarch64 replica server:
postgres=> SELECT show_trgm('Bóbr');
show_trgm
------------------------------------------
{" b","br ",0x707c72,0x7f7849,0x89194c}
(1 row)
One simple solution for this specific case is to declare the char signedness in the CMPPCHAR macro.
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -42,7 +42,7 @@
typedef char trgm[3];
#define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) )
-#define CMPPCHAR(a,b,i) CMPCHAR( *(((const char*)(a))+i), *(((const char*)(b))+i) )
+#define CMPPCHAR(a,b,i) CMPCHAR( *(((unsigned char*)(a))+i), *(((unsigned char*)(b))+i) )
#define CMPTRGM(a,b) ( CMPPCHAR(a,b,0) ? CMPPCHAR(a,b,0) : ( CMPPCHAR(a,b,1) ? CMPPCHAR(a,b,1) : CMPPCHAR(a,b,2) ) )
#define CPTRGM(a,b) do { \
Alternatively, Postgres can be compiled with -funsigned-char or -fsigned-char. I came across a code comment suggesting that this may not be a good idea in general [3].
Given that this has problem has come up before and seems likely to come up again, I'm curious what other broad solutions there might be to resolve it? Looking forward to any feedback, thanks!
Best,
Adam Guo
Amazon Web Services: https://aws.amazon.com
[1] https://osdn.net/projects/pgbigm/lists/archive/hackers/2024-February/000370.html
[2] https://github.com/postgres/postgres/blob/480bc6e3ed3a5719cdec076d4943b119890e8171/contrib/pg_trgm/trgm.h#L45
[3] https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/cash.c#L114-L123
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
"Guo, Adam" <adamguo@amazon.com> writes: > I would like to report an issue with the pg_trgm extension on > cross-architecture replication scenarios. When an x86_64 standby > server is replicating from an aarch64 primary server or vice versa, > the gist_trgm_ops opclass returns different results on the primary > and standby. I do not think that is a supported scenario. Hash functions and suchlike are not guaranteed to produce the same results on different CPU architectures. As a quick example, I get regression=# select hashfloat8(34); hashfloat8 ------------ 21570837 (1 row) on x86_64 but postgres=# select hashfloat8(34); hashfloat8 ------------ -602898821 (1 row) on ppc32 thanks to the endianness difference. > Given that this has problem has come up before and seems likely to > come up again, I'm curious what other broad solutions there might be > to resolve it? Reject as not a bug. Discourage people from thinking that physical replication will work across architectures. regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Apr 23, 2024 at 11:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "Guo, Adam" <adamguo@amazon.com> writes: > > I would like to report an issue with the pg_trgm extension on > > cross-architecture replication scenarios. When an x86_64 standby > > server is replicating from an aarch64 primary server or vice versa, > > the gist_trgm_ops opclass returns different results on the primary > > and standby. > > I do not think that is a supported scenario. Hash functions and > suchlike are not guaranteed to produce the same results on different > CPU architectures. As a quick example, I get > > regression=# select hashfloat8(34); > hashfloat8 > ------------ > 21570837 > (1 row) > > on x86_64 but > > postgres=# select hashfloat8(34); > hashfloat8 > ------------ > -602898821 > (1 row) > > on ppc32 thanks to the endianness difference. > > > Given that this has problem has come up before and seems likely to > > come up again, I'm curious what other broad solutions there might be > > to resolve it? > > Reject as not a bug. Discourage people from thinking that physical > replication will work across architectures. While cross-arch physical replication is not supported, I think having architecture dependent differences is not good and It's legitimate to fix it. FYI the 'char' data type comparison is done as though char is unsigned. I've attached a small patch to fix it. What do you think? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Masahiko Sawada <sawada.mshk@gmail.com> writes: > On Tue, Apr 23, 2024 at 11:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Reject as not a bug. Discourage people from thinking that physical >> replication will work across architectures. > While cross-arch physical replication is not supported, I think having > architecture dependent differences is not good and It's legitimate to > fix it. FYI the 'char' data type comparison is done as though char is > unsigned. I've attached a small patch to fix it. What do you think? I think this will break existing indexes that are working fine. Yeah, it would have been better to avoid the difference, but it's too late now. regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Apr 30, 2024 at 12:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > On Tue, Apr 23, 2024 at 11:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Reject as not a bug. Discourage people from thinking that physical > >> replication will work across architectures. > > > While cross-arch physical replication is not supported, I think having > > architecture dependent differences is not good and It's legitimate to > > fix it. FYI the 'char' data type comparison is done as though char is > > unsigned. I've attached a small patch to fix it. What do you think? > > I think this will break existing indexes that are working fine. > Yeah, it would have been better to avoid the difference, but > it's too late now. True. So it will be a PG18 item. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Masahiko Sawada <sawada.mshk@gmail.com> writes: > On Tue, Apr 30, 2024 at 12:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think this will break existing indexes that are working fine. >> Yeah, it would have been better to avoid the difference, but >> it's too late now. > True. So it will be a PG18 item. How will it be any better in v18? It's still an on-disk compatibility break for affected platforms. Now, people could recover by reindexing affected indexes, but I think we need to have a better justification than this for making them do so. regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Apr 23, 2024 at 5:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Guo, Adam" <adamguo@amazon.com> writes: > > I would like to report an issue with the pg_trgm extension on > > cross-architecture replication scenarios. When an x86_64 standby > > server is replicating from an aarch64 primary server or vice versa, > > the gist_trgm_ops opclass returns different results on the primary > > and standby. > > I do not think that is a supported scenario. Hash functions and > suchlike are not guaranteed to produce the same results on different > CPU architectures. As a quick example, I get > > regression=# select hashfloat8(34); > hashfloat8 > ------------ > 21570837 > (1 row) > > on x86_64 but > > postgres=# select hashfloat8(34); > hashfloat8 > ------------ > -602898821 > (1 row) > > on ppc32 thanks to the endianness difference. Given this, should we try to do better with binary compatibility checks using ControlFileData? AFAICS they are supposed to check if the database cluster is binary compatible with the running architecture. But it obviously allows incompatibilities. ------ Regards, Alexander Korotkov
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Alexander Korotkov <aekorotkov@gmail.com> writes: > Given this, should we try to do better with binary compatibility > checks using ControlFileData? AFAICS they are supposed to check if > the database cluster is binary compatible with the running > architecture. But it obviously allows incompatibilities. Perhaps. pg_control already covers endianness, which I think is the root of the hashing differences I showed. Adding a field for char signedness feels a little weird, since it's not directly a property of the bits-on-disk, but maybe we should. regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Apr 30, 2024 at 7:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <aekorotkov@gmail.com> writes: > > Given this, should we try to do better with binary compatibility > > checks using ControlFileData? AFAICS they are supposed to check if > > the database cluster is binary compatible with the running > > architecture. But it obviously allows incompatibilities. > > Perhaps. pg_control already covers endianness, which I think > is the root of the hashing differences I showed. Adding a field > for char signedness feels a little weird, since it's not directly > a property of the bits-on-disk, but maybe we should. I agree that storing char signedness might seem weird. But it appears that we already store indexes that depend on char signedness. So, it's effectively property of bits-on-disk even though it affects indirectly. Then I see two options to make the picture consistent. 1) Assume that char signedness is somehow a property of bits-on-disk even though it's weird. Then pg_trgm indexes are correct, but we need to store char signedness in pg_control. 2) Assume that char signedness is not a property of bits-on-disk. Then pg_trgm indexes are buggy and need to be fixed. What do you think? ------ Regards, Alexander Korotkov
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Alexander Korotkov <aekorotkov@gmail.com> writes: > I agree that storing char signedness might seem weird. But it appears > that we already store indexes that depend on char signedness. So, > it's effectively property of bits-on-disk even though it affects > indirectly. Then I see two options to make the picture consistent. > 1) Assume that char signedness is somehow a property of bits-on-disk > even though it's weird. Then pg_trgm indexes are correct, but we need > to store char signedness in pg_control. > 2) Assume that char signedness is not a property of bits-on-disk. > Then pg_trgm indexes are buggy and need to be fixed. > What do you think? The problem with option (2) is the assumption that pg_trgm's behavior is the only bug of this kind, either now or in the future. I think that's just about an impossible standard to meet, because there's no realistic way to test whether char signedness is affecting things. (Sure, you can compare results across platforms, but maybe you just didn't test the right case.) Also, the bigger picture here is the seeming assumption that "if we change pg_trgm then it will be safe to replicate from x86 to arm". I don't believe that that's a good idea and I'm unwilling to promise that it will work, regardless of what we do about char signedness. That being the case, I don't want to invest a lot of effort in the signedness issue. Option (1) is clearly a small change with little if any risk of future breakage. Option (2) ... not so much. regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Wed, May 1, 2024 at 2:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alexander Korotkov <aekorotkov@gmail.com> writes: > > I agree that storing char signedness might seem weird. But it appears > > that we already store indexes that depend on char signedness. So, > > it's effectively property of bits-on-disk even though it affects > > indirectly. Then I see two options to make the picture consistent. > > 1) Assume that char signedness is somehow a property of bits-on-disk > > even though it's weird. Then pg_trgm indexes are correct, but we need > > to store char signedness in pg_control. > > 2) Assume that char signedness is not a property of bits-on-disk. > > Then pg_trgm indexes are buggy and need to be fixed. > > What do you think? > > The problem with option (2) is the assumption that pg_trgm's behavior > is the only bug of this kind, either now or in the future. I think > that's just about an impossible standard to meet, because there's no > realistic way to test whether char signedness is affecting things. > (Sure, you can compare results across platforms, but maybe you > just didn't test the right case.) > > Also, the bigger picture here is the seeming assumption that "if > we change pg_trgm then it will be safe to replicate from x86 to > arm". I don't believe that that's a good idea and I'm unwilling > to promise that it will work, regardless of what we do about > char signedness. That being the case, I don't want to invest a > lot of effort in the signedness issue. I think that the char signedness issue is an issue also for developers (and extension authors) since it could lead to confusion and potential bugs in the future due to that. x86 developers would think of char as always being signed and write code that will misbehave on arm machines. For example, since logical replication should behave correctly even in cross-arch replication all developers need to be aware of that. I thought of using the -funsigned-char (or -fsigned-char) compiler flag to avoid that but it would have a broader impact not only on indexes. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On 30.04.24 19:29, Tom Lane wrote: >> 1) Assume that char signedness is somehow a property of bits-on-disk >> even though it's weird. Then pg_trgm indexes are correct, but we need >> to store char signedness in pg_control. >> 2) Assume that char signedness is not a property of bits-on-disk. >> Then pg_trgm indexes are buggy and need to be fixed. >> What do you think? > Also, the bigger picture here is the seeming assumption that "if > we change pg_trgm then it will be safe to replicate from x86 to > arm". I don't believe that that's a good idea and I'm unwilling > to promise that it will work, regardless of what we do about > char signedness. That being the case, I don't want to invest a > lot of effort in the signedness issue. Option (1) is clearly > a small change with little if any risk of future breakage. But note that option 1 would prevent some replication that is currently working.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Peter Eisentraut <peter@eisentraut.org> writes: > On 30.04.24 19:29, Tom Lane wrote: >> Also, the bigger picture here is the seeming assumption that "if >> we change pg_trgm then it will be safe to replicate from x86 to >> arm". I don't believe that that's a good idea and I'm unwilling >> to promise that it will work, regardless of what we do about >> char signedness. That being the case, I don't want to invest a >> lot of effort in the signedness issue. Option (1) is clearly >> a small change with little if any risk of future breakage. > But note that option 1 would prevent some replication that is currently > working. The point of this thread though is that it's working only for small values of "work". People are rightfully unhappy if it seems to work and then later they get bitten by compatibility problems. Treating char signedness as a machine property in pg_control would signal that we don't intend to make it work, and would ensure that even the most minimal testing would find out that it doesn't work. If we do not do that, it seems to me we have to buy into making it work. That would mean dealing with the consequences of an incompatible change in pg_trgm indexes, and then going through the same dance again the next time(s) similar problems are found. regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On 03.05.24 16:13, Tom Lane wrote: > Peter Eisentraut <peter@eisentraut.org> writes: >> On 30.04.24 19:29, Tom Lane wrote: >>> Also, the bigger picture here is the seeming assumption that "if >>> we change pg_trgm then it will be safe to replicate from x86 to >>> arm". I don't believe that that's a good idea and I'm unwilling >>> to promise that it will work, regardless of what we do about >>> char signedness. That being the case, I don't want to invest a >>> lot of effort in the signedness issue. Option (1) is clearly >>> a small change with little if any risk of future breakage. > >> But note that option 1 would prevent some replication that is currently >> working. > > The point of this thread though is that it's working only for small > values of "work". People are rightfully unhappy if it seems to work > and then later they get bitten by compatibility problems. > > Treating char signedness as a machine property in pg_control would > signal that we don't intend to make it work, and would ensure that > even the most minimal testing would find out that it doesn't work. > > If we do not do that, it seems to me we have to buy into making > it work. That would mean dealing with the consequences of an > incompatible change in pg_trgm indexes, and then going through > the same dance again the next time(s) similar problems are found. Yes, that is understood. But anecdotally, replicating between x86-64 arm64 is occasionally used for upgrades or migrations. In practice, this appears to have mostly worked. If we now discover that it won't work with certain index extension modules, it's usable for most users. Even if we say, you have to reindex everything afterwards, it's probably still useful for these scenarios. The way I understand the original report, the issue has to do specifically with how signed and unsigned chars compare differently. I don't imagine this is used anywhere in the table/heap code. So it's plausible that this issue is really contained to indexes. On the other hand, if we put in a check against this, then at least we can answer any user questions about this with more certainty: No, won't work, here is why.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On 5/3/24 11:44, Peter Eisentraut wrote: > On 03.05.24 16:13, Tom Lane wrote: >> Peter Eisentraut <peter@eisentraut.org> writes: >>> On 30.04.24 19:29, Tom Lane wrote: >>>> Also, the bigger picture here is the seeming assumption that "if >>>> we change pg_trgm then it will be safe to replicate from x86 to >>>> arm". I don't believe that that's a good idea and I'm unwilling >>>> to promise that it will work, regardless of what we do about >>>> char signedness. That being the case, I don't want to invest a >>>> lot of effort in the signedness issue. Option (1) is clearly >>>> a small change with little if any risk of future breakage. >> >>> But note that option 1 would prevent some replication that is currently >>> working. >> >> The point of this thread though is that it's working only for small >> values of "work". People are rightfully unhappy if it seems to work >> and then later they get bitten by compatibility problems. >> >> Treating char signedness as a machine property in pg_control would >> signal that we don't intend to make it work, and would ensure that >> even the most minimal testing would find out that it doesn't work. >> >> If we do not do that, it seems to me we have to buy into making >> it work. That would mean dealing with the consequences of an >> incompatible change in pg_trgm indexes, and then going through >> the same dance again the next time(s) similar problems are found. > > Yes, that is understood. But anecdotally, replicating between x86-64 arm64 is > occasionally used for upgrades or migrations. In practice, this appears to have > mostly worked. If we now discover that it won't work with certain index > extension modules, it's usable for most users. Even if we say, you have to > reindex everything afterwards, it's probably still useful for these scenarios. +1 I have heard similar anecdotes, and the reported experience goes even further -- many such upgrade/migration uses, with exceedingly rare reported failures. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Sat, May 4, 2024 at 7:36 AM Joe Conway <mail@joeconway.com> wrote: > > On 5/3/24 11:44, Peter Eisentraut wrote: > > On 03.05.24 16:13, Tom Lane wrote: > >> Peter Eisentraut <peter@eisentraut.org> writes: > >>> On 30.04.24 19:29, Tom Lane wrote: > >>>> Also, the bigger picture here is the seeming assumption that "if > >>>> we change pg_trgm then it will be safe to replicate from x86 to > >>>> arm". I don't believe that that's a good idea and I'm unwilling > >>>> to promise that it will work, regardless of what we do about > >>>> char signedness. That being the case, I don't want to invest a > >>>> lot of effort in the signedness issue. Option (1) is clearly > >>>> a small change with little if any risk of future breakage. > >> > >>> But note that option 1 would prevent some replication that is currently > >>> working. > >> > >> The point of this thread though is that it's working only for small > >> values of "work". People are rightfully unhappy if it seems to work > >> and then later they get bitten by compatibility problems. > >> > >> Treating char signedness as a machine property in pg_control would > >> signal that we don't intend to make it work, and would ensure that > >> even the most minimal testing would find out that it doesn't work. > >> > >> If we do not do that, it seems to me we have to buy into making > >> it work. That would mean dealing with the consequences of an > >> incompatible change in pg_trgm indexes, and then going through > >> the same dance again the next time(s) similar problems are found. > > > > Yes, that is understood. But anecdotally, replicating between x86-64 arm64 is > > occasionally used for upgrades or migrations. In practice, this appears to have > > mostly worked. If we now discover that it won't work with certain index > > extension modules, it's usable for most users. Even if we say, you have to > > reindex everything afterwards, it's probably still useful for these scenarios. > > +1 +1 How about extending amcheck to support GIN and GIst indexes so that it can detect potential data incompatibility due to changing 'char' to 'unsigned char'? I think these new tests would be useful also for users to check if they really need to reindex indexes due to such changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18. Users who upgraded to PG18 can run the new amcheck tests on the primary as well as the standby. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Wed, May 15, 2024 at 02:56:54PM +0900, Masahiko Sawada wrote: > On Sat, May 4, 2024 at 7:36 AM Joe Conway <mail@joeconway.com> wrote: > > On 5/3/24 11:44, Peter Eisentraut wrote: > > > On 03.05.24 16:13, Tom Lane wrote: > > >> Peter Eisentraut <peter@eisentraut.org> writes: > > >>> On 30.04.24 19:29, Tom Lane wrote: > > >>>> Also, the bigger picture here is the seeming assumption that "if > > >>>> we change pg_trgm then it will be safe to replicate from x86 to > > >>>> arm". I don't believe that that's a good idea and I'm unwilling > > >>>> to promise that it will work, regardless of what we do about > > >>>> char signedness. That being the case, I don't want to invest a > > >>>> lot of effort in the signedness issue. Option (1) is clearly > > >>>> a small change with little if any risk of future breakage. > > >> > > >>> But note that option 1 would prevent some replication that is currently > > >>> working. > > >> > > >> The point of this thread though is that it's working only for small > > >> values of "work". People are rightfully unhappy if it seems to work > > >> and then later they get bitten by compatibility problems. > > >> > > >> Treating char signedness as a machine property in pg_control would > > >> signal that we don't intend to make it work, and would ensure that > > >> even the most minimal testing would find out that it doesn't work. > > >> > > >> If we do not do that, it seems to me we have to buy into making > > >> it work. That would mean dealing with the consequences of an > > >> incompatible change in pg_trgm indexes, and then going through > > >> the same dance again the next time(s) similar problems are found. > > > > > > Yes, that is understood. But anecdotally, replicating between x86-64 arm64 is > > > occasionally used for upgrades or migrations. In practice, this appears to have > > > mostly worked. If we now discover that it won't work with certain index > > > extension modules, it's usable for most users. Even if we say, you have to > > > reindex everything afterwards, it's probably still useful for these scenarios. Like you, I would not introduce a ControlFileData block for sign-of-char, given the signs of breakage in extension indexing only. That would lose much useful migration capability. I'm sympathetic to Tom's point, which I'll attempt to summarize as: a ControlFileData block is a promise we know how to keep, whereas we should expect further bug discoveries without it. At the same time, I put more weight on the apparently-wide span of functionality that works fine. (I wonder whether any static analyzer does or practically could detect char sign dependence with a decent false positive rate.) > +1 > > How about extending amcheck to support GIN and GIst indexes so that it > can detect potential data incompatibility due to changing 'char' to > 'unsigned char'? I think these new tests would be useful also for > users to check if they really need to reindex indexes due to such > changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18. > Users who upgraded to PG18 can run the new amcheck tests on the > primary as well as the standby. If I were standardizing pg_trgm on one or the other notion of "char", I would choose signed char, since I think it's still the majority. More broadly, I see these options to fix pg_trgm: 1. Change to signed char. Every arm64 system needs to scan pg_trgm indexes. 2. Change to unsigned char. Every x86 system needs to scan pg_trgm indexes. 3. Offer both, as an upgrade path. For example, pg_trgm could have separate operator classes gin_trgm_ops and gin_trgm_ops_unsigned. Running pg_upgrade on an unsigned-char system would automatically map v17 gin_trgm_ops to v18 gin_trgm_ops_unsigned. This avoids penalizing any architecture with upgrade-time scans. Independently, having amcheck for GIN and/or GiST would be great.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Sun, May 19, 2024 at 6:46 AM Noah Misch <noah@leadboat.com> wrote: > > On Wed, May 15, 2024 at 02:56:54PM +0900, Masahiko Sawada wrote: > > > > How about extending amcheck to support GIN and GIst indexes so that it > > can detect potential data incompatibility due to changing 'char' to > > 'unsigned char'? I think these new tests would be useful also for > > users to check if they really need to reindex indexes due to such > > changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18. > > Users who upgraded to PG18 can run the new amcheck tests on the > > primary as well as the standby. > > If I were standardizing pg_trgm on one or the other notion of "char", I would > choose signed char, since I think it's still the majority. More broadly, I > see these options to fix pg_trgm: > > 1. Change to signed char. Every arm64 system needs to scan pg_trgm indexes. > 2. Change to unsigned char. Every x86 system needs to scan pg_trgm indexes. Even though it's true that signed char systems are the majority, it would not be acceptable to force the need to scan pg_trgm indexes on unsigned char systems. > 3. Offer both, as an upgrade path. For example, pg_trgm could have separate > operator classes gin_trgm_ops and gin_trgm_ops_unsigned. Running > pg_upgrade on an unsigned-char system would automatically map v17 > gin_trgm_ops to v18 gin_trgm_ops_unsigned. This avoids penalizing any > architecture with upgrade-time scans. Very interesting idea. How can new v18 users use the correct operator class? I don't want to require users to specify the correct signed or unsigned operator classes when creating a GIN index. Maybe we need to dynamically use the correct compare function for the same operator class depending on the char signedness. But is it possible to do it on the extension (e.g. pg_trgm) side? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Thu, Aug 29, 2024 at 03:48:53PM -0500, Masahiko Sawada wrote: > On Sun, May 19, 2024 at 6:46 AM Noah Misch <noah@leadboat.com> wrote: > > If I were standardizing pg_trgm on one or the other notion of "char", I would > > choose signed char, since I think it's still the majority. More broadly, I > > see these options to fix pg_trgm: > > > > 1. Change to signed char. Every arm64 system needs to scan pg_trgm indexes. > > 2. Change to unsigned char. Every x86 system needs to scan pg_trgm indexes. > > Even though it's true that signed char systems are the majority, it > would not be acceptable to force the need to scan pg_trgm indexes on > unsigned char systems. > > > 3. Offer both, as an upgrade path. For example, pg_trgm could have separate > > operator classes gin_trgm_ops and gin_trgm_ops_unsigned. Running > > pg_upgrade on an unsigned-char system would automatically map v17 > > gin_trgm_ops to v18 gin_trgm_ops_unsigned. This avoids penalizing any > > architecture with upgrade-time scans. > > Very interesting idea. How can new v18 users use the correct operator > class? I don't want to require users to specify the correct signed or > unsigned operator classes when creating a GIN index. Maybe we need to In brief, it wouldn't matter which operator class new v18 indexes use. The documentation would focus on gin_trgm_ops and also say something like: There's an additional operator class, gin_trgm_ops_unsigned. It behaves exactly like gin_trgm_ops, but it uses a deprecated on-disk representation. Use gin_trgm_ops in new indexes, but there's no disadvantage from continuing to use gin_trgm_ops_unsigned. Before PostgreSQL 18, gin_trgm_ops used a platform-dependent representation. pg_upgrade automatically uses gin_trgm_ops_unsigned when upgrading from source data that used the deprecated representation. What concerns might users have, then? (Neither operator class would use plain "char" in a context that affects on-disk state. They'll use "signed char" and "unsigned char".) > dynamically use the correct compare function for the same operator > class depending on the char signedness. But is it possible to do it on > the extension (e.g. pg_trgm) side? No, I don't think the extension can do that cleanly. It would need to store the signedness in the index somehow, and GIN doesn't call the opclass at a time facilitating that. That would need help from the core server.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Fri, Aug 30, 2024 at 8:10 PM Noah Misch <noah@leadboat.com> wrote: > > On Thu, Aug 29, 2024 at 03:48:53PM -0500, Masahiko Sawada wrote: > > On Sun, May 19, 2024 at 6:46 AM Noah Misch <noah@leadboat.com> wrote: > > > If I were standardizing pg_trgm on one or the other notion of "char", I would > > > choose signed char, since I think it's still the majority. More broadly, I > > > see these options to fix pg_trgm: > > > > > > 1. Change to signed char. Every arm64 system needs to scan pg_trgm indexes. > > > 2. Change to unsigned char. Every x86 system needs to scan pg_trgm indexes. > > > > Even though it's true that signed char systems are the majority, it > > would not be acceptable to force the need to scan pg_trgm indexes on > > unsigned char systems. > > > > > 3. Offer both, as an upgrade path. For example, pg_trgm could have separate > > > operator classes gin_trgm_ops and gin_trgm_ops_unsigned. Running > > > pg_upgrade on an unsigned-char system would automatically map v17 > > > gin_trgm_ops to v18 gin_trgm_ops_unsigned. This avoids penalizing any > > > architecture with upgrade-time scans. > > > > Very interesting idea. How can new v18 users use the correct operator > > class? I don't want to require users to specify the correct signed or > > unsigned operator classes when creating a GIN index. Maybe we need to > > In brief, it wouldn't matter which operator class new v18 indexes use. The > documentation would focus on gin_trgm_ops and also say something like: > > There's an additional operator class, gin_trgm_ops_unsigned. It behaves > exactly like gin_trgm_ops, but it uses a deprecated on-disk representation. > Use gin_trgm_ops in new indexes, but there's no disadvantage from continuing > to use gin_trgm_ops_unsigned. Before PostgreSQL 18, gin_trgm_ops used a > platform-dependent representation. pg_upgrade automatically uses > gin_trgm_ops_unsigned when upgrading from source data that used the > deprecated representation. > > What concerns might users have, then? (Neither operator class would use plain > "char" in a context that affects on-disk state. They'll use "signed char" and > "unsigned char".) I think I understand your idea now. Since gin_trgm_ops will use "signed char", there is no impact for v18 users -- they can continue using gin_trgm_ops. But how does pg_upgrade use gin_trgm_ops_unsigned? This opclass will be created by executing the pg_trgm script for v18, but it isn't executed during pg_upgrade. Another way would be to do these opclass replacement when executing the pg_trgm's update script (i.e., 1.6 to 1.7). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Fri, Sep 06, 2024 at 02:07:10PM -0700, Masahiko Sawada wrote: > On Fri, Aug 30, 2024 at 8:10 PM Noah Misch <noah@leadboat.com> wrote: > > On Thu, Aug 29, 2024 at 03:48:53PM -0500, Masahiko Sawada wrote: > > > On Sun, May 19, 2024 at 6:46 AM Noah Misch <noah@leadboat.com> wrote: > > > > If I were standardizing pg_trgm on one or the other notion of "char", I would > > > > choose signed char, since I think it's still the majority. More broadly, I > > > > see these options to fix pg_trgm: > > > > > > > > 1. Change to signed char. Every arm64 system needs to scan pg_trgm indexes. > > > > 2. Change to unsigned char. Every x86 system needs to scan pg_trgm indexes. > > > > > > Even though it's true that signed char systems are the majority, it > > > would not be acceptable to force the need to scan pg_trgm indexes on > > > unsigned char systems. > > > > > > > 3. Offer both, as an upgrade path. For example, pg_trgm could have separate > > > > operator classes gin_trgm_ops and gin_trgm_ops_unsigned. Running > > > > pg_upgrade on an unsigned-char system would automatically map v17 > > > > gin_trgm_ops to v18 gin_trgm_ops_unsigned. This avoids penalizing any > > > > architecture with upgrade-time scans. > > > > > > Very interesting idea. How can new v18 users use the correct operator > > > class? I don't want to require users to specify the correct signed or > > > unsigned operator classes when creating a GIN index. Maybe we need to > > > > In brief, it wouldn't matter which operator class new v18 indexes use. The > > documentation would focus on gin_trgm_ops and also say something like: > > > > There's an additional operator class, gin_trgm_ops_unsigned. It behaves > > exactly like gin_trgm_ops, but it uses a deprecated on-disk representation. > > Use gin_trgm_ops in new indexes, but there's no disadvantage from continuing > > to use gin_trgm_ops_unsigned. Before PostgreSQL 18, gin_trgm_ops used a > > platform-dependent representation. pg_upgrade automatically uses > > gin_trgm_ops_unsigned when upgrading from source data that used the > > deprecated representation. > > > > What concerns might users have, then? (Neither operator class would use plain > > "char" in a context that affects on-disk state. They'll use "signed char" and > > "unsigned char".) > > I think I understand your idea now. Since gin_trgm_ops will use > "signed char", there is no impact for v18 users -- they can continue > using gin_trgm_ops. Right. > But how does pg_upgrade use gin_trgm_ops_unsigned? This opclass will > be created by executing the pg_trgm script for v18, but it isn't > executed during pg_upgrade. Another way would be to do these opclass > replacement when executing the pg_trgm's update script (i.e., 1.6 to > 1.7). Yes, that's one way to make it work. If we do it that way, it would be critical to make the ALTER EXTENSION UPDATE run before anything uses the index. Otherwise, we'll run new v18 "signed char" code on a v17 "unsigned char" data file. Running ALTER EXTENSION UPDATE early enough should be feasible, so that's fine. Some other options: - If v18 "pg_dump -b" decides to emit CREATE INDEX ... gin_trgm_ops_unsigned, then make it also emit the statements to create the opclass. - Ship pg_trgm--1.6--1.7.sql in back branches. If the upgrade wants to use gin_trgm_ops_unsigned, require the user to ALTER EXTENSION UPDATE first. (In back branches, the C code behind gin_trgm_ops_unsigned could just raise an error if called.) What other options exist? I bet there are more.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Noah Misch <noah@leadboat.com> writes: > Yes, that's one way to make it work. If we do it that way, it would be > critical to make the ALTER EXTENSION UPDATE run before anything uses the > index. Otherwise, we'll run new v18 "signed char" code on a v17 "unsigned > char" data file. Running ALTER EXTENSION UPDATE early enough should be > feasible, so that's fine. Some other options: > - If v18 "pg_dump -b" decides to emit CREATE INDEX ... gin_trgm_ops_unsigned, > then make it also emit the statements to create the opclass. > - Ship pg_trgm--1.6--1.7.sql in back branches. If the upgrade wants to use > gin_trgm_ops_unsigned, require the user to ALTER EXTENSION UPDATE first. > (In back branches, the C code behind gin_trgm_ops_unsigned could just raise > an error if called.) I feel like all of these are leaning heavily on users to get it right, and therefore have a significant chance of breaking use-cases that were perfectly fine before. Remind me of why we can't do something like this: * Add APIs that allow opclasses to read/write some space in the GIN metapage. (We could get ambitious and add such APIs for other AMs too, but doing it just for GIN is probably a prudent start.) Ensure that this space is initially zeroed. * In gin_trgm_ops, read a byte of this space and interpret it as 0 = unset 1 = signed chars 2 = unsigned chars If the value is zero, set the byte on the basis of the native char-signedness of the current platform. (Obviously, a secondary server couldn't write the byte, and would have to wait for the primary to update the value. In the meantime, it's no more broken than today.) * Subsequently, use either signed or unsigned comparisons based on that value. This would automatically do the right thing in all cases that work today, and it'd provide the ability for cross-platform replication to work in future. You can envision cases where transferring a pre-existing index to a platform of the other stripe would misbehave, but they're the same cases that fail today, and the fix remains the same: reindex. regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Fri, Sep 06, 2024 at 06:36:53PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > Yes, that's one way to make it work. If we do it that way, it would be > > critical to make the ALTER EXTENSION UPDATE run before anything uses the > > index. Otherwise, we'll run new v18 "signed char" code on a v17 "unsigned > > char" data file. Running ALTER EXTENSION UPDATE early enough should be > > feasible, so that's fine. Some other options: > > > - If v18 "pg_dump -b" decides to emit CREATE INDEX ... gin_trgm_ops_unsigned, > > then make it also emit the statements to create the opclass. > > > - Ship pg_trgm--1.6--1.7.sql in back branches. If the upgrade wants to use > > gin_trgm_ops_unsigned, require the user to ALTER EXTENSION UPDATE first. > > (In back branches, the C code behind gin_trgm_ops_unsigned could just raise > > an error if called.) > > I feel like all of these are leaning heavily on users to get it right, What do you have in mind? I see things for the pg_upgrade implementation to get right, but I don't see things for pg_upgrade users to get right. The last option is disruptive for users of unsigned char platforms, since it requires a two-step upgrade if upgrading from v11 or earlier. The other two don't have that problem. > and therefore have a significant chance of breaking use-cases that > were perfectly fine before. > > Remind me of why we can't do something like this: > > * Add APIs that allow opclasses to read/write some space in the GIN > metapage. (We could get ambitious and add such APIs for other AMs > too, but doing it just for GIN is probably a prudent start.) Ensure > that this space is initially zeroed. > > * In gin_trgm_ops, read a byte of this space and interpret it as > 0 = unset > 1 = signed chars > 2 = unsigned chars > If the value is zero, set the byte on the basis of the native > char-signedness of the current platform. (Obviously, a > secondary server couldn't write the byte, and would have to > wait for the primary to update the value. In the meantime, > it's no more broken than today.) > > * Subsequently, use either signed or unsigned comparisons > based on that value. > > This would automatically do the right thing in all cases that > work today, and it'd provide the ability for cross-platform > replication to work in future. You can envision cases where > transferring a pre-existing index to a platform of the other > stripe would misbehave, but they're the same cases that fail > today, and the fix remains the same: reindex. No reason you couldn't do it that way. Works for me.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Noah Misch <noah@leadboat.com> writes: > On Fri, Sep 06, 2024 at 06:36:53PM -0400, Tom Lane wrote: >> I feel like all of these are leaning heavily on users to get it right, > What do you have in mind? I see things for the pg_upgrade implementation to > get right, but I don't see things for pg_upgrade users to get right. Well, yeah, if you are willing to put pg_upgrade in charge of executing ALTER EXTENSION UPDATE, then that would be a reasonably omission-proof path. But we have always intended the pg_upgrade process to *not* auto-update extensions, so I'm concerned about the potential side-effects of drilling a hole in that policy. As an example: even if we ensure that pg_trgm 1.6 to 1.7 is totally transparent except for this fix, what happens if the user's old database is still on some pre-1.6 version? Is it okay to force an update that includes feature upgrades? regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Fri, Sep 06, 2024 at 07:37:09PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Fri, Sep 06, 2024 at 06:36:53PM -0400, Tom Lane wrote: > >> I feel like all of these are leaning heavily on users to get it right, > > > What do you have in mind? I see things for the pg_upgrade implementation to > > get right, but I don't see things for pg_upgrade users to get right. > > Well, yeah, if you are willing to put pg_upgrade in charge of > executing ALTER EXTENSION UPDATE, then that would be a reasonably > omission-proof path. But we have always intended the pg_upgrade > process to *not* auto-update extensions, so I'm concerned about > the potential side-effects of drilling a hole in that policy. > As an example: even if we ensure that pg_trgm 1.6 to 1.7 is totally > transparent except for this fix, what happens if the user's old > database is still on some pre-1.6 version? Is it okay to force an > update that includes feature upgrades? Those are disadvantages of some of the options. I think it could be okay to inject the mandatory upgrade here or just tell the user to update to 1.7 before attempting the upgrade (if not at 1.7, pg_upgrade would fail with an error message to that effect). Your counterproposal avoids the issue, and I'm good with that design.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Fri, Sep 6, 2024 at 3:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Noah Misch <noah@leadboat.com> writes: > > Yes, that's one way to make it work. If we do it that way, it would be > > critical to make the ALTER EXTENSION UPDATE run before anything uses the > > index. Otherwise, we'll run new v18 "signed char" code on a v17 "unsigned > > char" data file. Running ALTER EXTENSION UPDATE early enough should be > > feasible, so that's fine. Some other options: > > > - If v18 "pg_dump -b" decides to emit CREATE INDEX ... gin_trgm_ops_unsigned, > > then make it also emit the statements to create the opclass. > > > - Ship pg_trgm--1.6--1.7.sql in back branches. If the upgrade wants to use > > gin_trgm_ops_unsigned, require the user to ALTER EXTENSION UPDATE first. > > (In back branches, the C code behind gin_trgm_ops_unsigned could just raise > > an error if called.) > > I feel like all of these are leaning heavily on users to get it right, > and therefore have a significant chance of breaking use-cases that > were perfectly fine before. > > Remind me of why we can't do something like this: > > * Add APIs that allow opclasses to read/write some space in the GIN > metapage. (We could get ambitious and add such APIs for other AMs > too, but doing it just for GIN is probably a prudent start.) Ensure > that this space is initially zeroed. > > * In gin_trgm_ops, read a byte of this space and interpret it as > 0 = unset > 1 = signed chars > 2 = unsigned chars > If the value is zero, set the byte on the basis of the native > char-signedness of the current platform. (Obviously, a > secondary server couldn't write the byte, and would have to > wait for the primary to update the value. In the meantime, > it's no more broken than today.) When do we set the byte on the primary server? If it's the first time to use the GIN index, secondary servers would have to wait for the primary to use the GIN index, which could be an unpredictable time or it may never come depending on index usages. Probably we can make pg_upgrade set the byte in the meta page of GIN indexes that use the gin_trgm_ops. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Masahiko Sawada <sawada.mshk@gmail.com> writes: > When do we set the byte on the primary server? If it's the first time > to use the GIN index, secondary servers would have to wait for the > primary to use the GIN index, which could be an unpredictable time or > it may never come depending on index usages. Probably we can make > pg_upgrade set the byte in the meta page of GIN indexes that use the > gin_trgm_ops. Hmm, perhaps. That plus set-it-during-index-create would remove the need for dynamic update like I suggested. So very roughly the amount of complexity would balance out. Do you have an idea for how we'd get this to happen during pg_upgrade, exactly? regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Mon, Sep 9, 2024 at 4:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > When do we set the byte on the primary server? If it's the first time > > to use the GIN index, secondary servers would have to wait for the > > primary to use the GIN index, which could be an unpredictable time or > > it may never come depending on index usages. Probably we can make > > pg_upgrade set the byte in the meta page of GIN indexes that use the > > gin_trgm_ops. > > Hmm, perhaps. That plus set-it-during-index-create would remove the > need for dynamic update like I suggested. So very roughly the amount > of complexity would balance out. Yes, I think your set-it-during-index-create would be necessary. > Do you have an idea for how we'd get > this to happen during pg_upgrade, exactly? What I was thinking is that we have "pg_dump --binary-upgrade" emit a function call, say "SELECT binary_upgrade_update_gin_meta_page()" for each GIN index, and the meta pages are updated when restoring the schemas. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Masahiko Sawada <sawada.mshk@gmail.com> writes: > On Mon, Sep 9, 2024 at 4:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Do you have an idea for how we'd get >> this to happen during pg_upgrade, exactly? > What I was thinking is that we have "pg_dump --binary-upgrade" emit a > function call, say "SELECT binary_upgrade_update_gin_meta_page()" for > each GIN index, and the meta pages are updated when restoring the > schemas. Hmm, but ... 1. IIRC we don't move the relation files into the new cluster until after we've run the schema dump/restore step. I think this'd have to be driven in some other way than from the pg_dump output. I guess we could have pg_upgrade start up the new postmaster and call a function in each DB, which would have to scan for GIN indexes by itself. 2. How will this interact with --link mode? I don't see how it doesn't involve scribbling on files that are shared with the old cluster, leading to possible problems if the pg_upgrade fails later and the user tries to go back to using the old cluster. It's not so much the metapage update that is worrisome --- we're assuming that that will modify storage that's unused in old versions. But the change would be unrecorded in the old cluster's WAL, which sounds risky. Maybe we could get away with forcing --copy mode for affected indexes, but that feels a bit messy. We'd not want to do it for unaffected indexes, so the copying code would need to know a great deal about this problem. regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Mon, Sep 9, 2024 at 11:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > On Mon, Sep 9, 2024 at 4:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Do you have an idea for how we'd get > >> this to happen during pg_upgrade, exactly? > > > What I was thinking is that we have "pg_dump --binary-upgrade" emit a > > function call, say "SELECT binary_upgrade_update_gin_meta_page()" for > > each GIN index, and the meta pages are updated when restoring the > > schemas. > > Hmm, but ... > > 1. IIRC we don't move the relation files into the new cluster until > after we've run the schema dump/restore step. I think this'd have to > be driven in some other way than from the pg_dump output. I guess we > could have pg_upgrade start up the new postmaster and call a function > in each DB, which would have to scan for GIN indexes by itself. You're right. > > 2. How will this interact with --link mode? I don't see how it > doesn't involve scribbling on files that are shared with the old > cluster, leading to possible problems if the pg_upgrade fails later > and the user tries to go back to using the old cluster. It's not so > much the metapage update that is worrisome --- we're assuming that > that will modify storage that's unused in old versions. But the > change would be unrecorded in the old cluster's WAL, which sounds > risky. > > Maybe we could get away with forcing --copy mode for affected > indexes, but that feels a bit messy. We'd not want to do it > for unaffected indexes, so the copying code would need to know > a great deal about this problem. Good point. I agree that it would not be a very good idea to force --copy mode. An alternative way would be that we store the char signedness in the control file, and gin_trgm_ops opclass reads it if the bytes in the meta page shows 'unset'. The char signedness in the control file doesn't mean to be used for the compatibility check for physical replication but used as a hint. But it also could be a bit messy, though. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Masahiko Sawada <sawada.mshk@gmail.com> writes: > An alternative way would be that we store the char signedness in the > control file, and gin_trgm_ops opclass reads it if the bytes in the > meta page shows 'unset'. The char signedness in the control file > doesn't mean to be used for the compatibility check for physical > replication but used as a hint. But it also could be a bit messy, > though. Yeah, that seems like it could work. But are we sure that replicas get a copy of the primary's control file rather than creating their own? regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Sep 10, 2024 at 11:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > An alternative way would be that we store the char signedness in the > > control file, and gin_trgm_ops opclass reads it if the bytes in the > > meta page shows 'unset'. The char signedness in the control file > > doesn't mean to be used for the compatibility check for physical > > replication but used as a hint. But it also could be a bit messy, > > though. > > Yeah, that seems like it could work. But are we sure that replicas > get a copy of the primary's control file rather than creating their > own? Yes, I think so. Since at least the system identifiers of primary and replicas must be identical for physical replication, if replicas use their own control files then they cannot start the replication. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Masahiko Sawada <sawada.mshk@gmail.com> writes: > On Tue, Sep 10, 2024 at 11:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, that seems like it could work. But are we sure that replicas >> get a copy of the primary's control file rather than creating their >> own? > Yes, I think so. Since at least the system identifiers of primary and > replicas must be identical for physical replication, if replicas use > their own control files then they cannot start the replication. Got it. So now I'm wondering if we need all the complexity of storing stuff in the GIN metapages. Could we simply read the (primary's) signedness out of pg_control and use that? We might need some caching mechanism to make that cheap enough, but accessing the current index's metapage is far from free either. regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Sep 10, 2024 at 05:56:47PM -0400, Tom Lane wrote: > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > On Tue, Sep 10, 2024 at 11:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Yeah, that seems like it could work. But are we sure that replicas > >> get a copy of the primary's control file rather than creating their > >> own? > > > Yes, I think so. Since at least the system identifiers of primary and > > replicas must be identical for physical replication, if replicas use > > their own control files then they cannot start the replication. > > Got it. So now I'm wondering if we need all the complexity of storing > stuff in the GIN metapages. Could we simply read the (primary's) > signedness out of pg_control and use that? Yes.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Thu, Sep 12, 2024 at 03:42:48PM -0700, Masahiko Sawada wrote: > On Tue, Sep 10, 2024 at 3:05 PM Noah Misch <noah@leadboat.com> wrote: > > On Tue, Sep 10, 2024 at 05:56:47PM -0400, Tom Lane wrote: > > > Got it. So now I'm wondering if we need all the complexity of storing > > > stuff in the GIN metapages. Could we simply read the (primary's) > > > signedness out of pg_control and use that? > I've attached a PoC patch for this idea. We write the default char > signedness to the control file at initdb time. Then when comparing two > trgms, pg_trgm opclasses use a comparison function based on the char > signedness of the cluster. I've confirmed that the patch fixes the > reported case at least. I agree that proves the concept.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Mon, Sep 16, 2024 at 9:24 AM Noah Misch <noah@leadboat.com> wrote: > > On Thu, Sep 12, 2024 at 03:42:48PM -0700, Masahiko Sawada wrote: > > On Tue, Sep 10, 2024 at 3:05 PM Noah Misch <noah@leadboat.com> wrote: > > > On Tue, Sep 10, 2024 at 05:56:47PM -0400, Tom Lane wrote: > > > > Got it. So now I'm wondering if we need all the complexity of storing > > > > stuff in the GIN metapages. Could we simply read the (primary's) > > > > signedness out of pg_control and use that? > > > I've attached a PoC patch for this idea. We write the default char > > signedness to the control file at initdb time. Then when comparing two > > trgms, pg_trgm opclasses use a comparison function based on the char > > signedness of the cluster. I've confirmed that the patch fixes the > > reported case at least. > > I agree that proves the concept. Thanks. I like the simplicity of this approach. If we agree with this approach, I'd like to proceed with it. Regardless of what approach we take, I wanted to provide some regression tests for these changes, but I could not come up with a reasonable idea. It would be great if we could do something like 027_stream_regress.pl on cross-architecture replication. But just executing 027_stream_regress.pl on cross-architecture replication could not be sufficient since we would like to confirm query results as well. If we could have a reasonable tool or way, it would also help find other similar bugs related architectural differences. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Sep 17, 2024 at 09:43:41PM -0700, Masahiko Sawada wrote: > On Mon, Sep 16, 2024 at 9:24 AM Noah Misch <noah@leadboat.com> wrote: > > On Thu, Sep 12, 2024 at 03:42:48PM -0700, Masahiko Sawada wrote: > > > On Tue, Sep 10, 2024 at 3:05 PM Noah Misch <noah@leadboat.com> wrote: > > > > On Tue, Sep 10, 2024 at 05:56:47PM -0400, Tom Lane wrote: > > > > > Got it. So now I'm wondering if we need all the complexity of storing > > > > > stuff in the GIN metapages. Could we simply read the (primary's) > > > > > signedness out of pg_control and use that? > > > > > I've attached a PoC patch for this idea. We write the default char > > > signedness to the control file at initdb time. Then when comparing two > > > trgms, pg_trgm opclasses use a comparison function based on the char > > > signedness of the cluster. I've confirmed that the patch fixes the > > > reported case at least. > > > > I agree that proves the concept. > > Thanks. I like the simplicity of this approach. If we agree with this > approach, I'd like to proceed with it. Works for me. > Regardless of what approach we take, I wanted to provide some > regression tests for these changes, but I could not come up with a > reasonable idea. It would be great if we could do something like > 027_stream_regress.pl on cross-architecture replication. But just > executing 027_stream_regress.pl on cross-architecture replication > could not be sufficient since we would like to confirm query results > as well. If we could have a reasonable tool or way, it would also help > find other similar bugs related architectural differences. Perhaps add a regress.c function that changes the control file flag and flushes the change to disk?
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Wed, Sep 25, 2024 at 03:46:27PM -0700, Masahiko Sawada wrote: > On Wed, Sep 18, 2024 at 6:46 PM Noah Misch <noah@leadboat.com> wrote: > > On Tue, Sep 17, 2024 at 09:43:41PM -0700, Masahiko Sawada wrote: > > > On Mon, Sep 16, 2024 at 9:24 AM Noah Misch <noah@leadboat.com> wrote: > > > > On Thu, Sep 12, 2024 at 03:42:48PM -0700, Masahiko Sawada wrote: > > > > > On Tue, Sep 10, 2024 at 3:05 PM Noah Misch <noah@leadboat.com> wrote: > > > > > > On Tue, Sep 10, 2024 at 05:56:47PM -0400, Tom Lane wrote: > > > > > > > Got it. So now I'm wondering if we need all the complexity of storing > > > > > > > stuff in the GIN metapages. Could we simply read the (primary's) > > > > > > > signedness out of pg_control and use that? > > > Thanks. I like the simplicity of this approach. If we agree with this > > > approach, I'd like to proceed with it. > > > > Works for me. > > > > > Regardless of what approach we take, I wanted to provide some > > > regression tests for these changes, but I could not come up with a > > > reasonable idea. It would be great if we could do something like > > > 027_stream_regress.pl on cross-architecture replication. But just > > > executing 027_stream_regress.pl on cross-architecture replication > > > could not be sufficient since we would like to confirm query results > > > as well. If we could have a reasonable tool or way, it would also help > > > find other similar bugs related architectural differences. > > > > Perhaps add a regress.c function that changes the control file flag and > > flushes the change to disk? > > I've tried this idea but found out that extensions can flush the > controlfile on the disk after flipping the char signedness value, they > cannot update the controlfile data on the shared memory. And, when the > server shuts down, the on-disk controlfile is updated with the > on-memory controlfile data, so the char signedness is reverted. > > We can add a function to set the char signedness of on-memory > controlfile data or add a new option to pg_resetwal to change the char > signedness on-disk controlfile data but they seem to be overkill to me > and I'm concerned about misusing these option and function. Before the project is over, pg_upgrade will have some way to set the control file signedness to the source cluster signedness. I think pg_upgrade should also take an option for overriding the source cluster signedness for source clusters v17 and older. That way, a user knowing they copied the v17 source cluster from x86 to ARM can pg_upgrade properly without the prerequisite of acquiring an x86 VM. Once that all exists, perhaps it will be clearer how best to change signedness for testing. > While this change does not encourage the use of 'char' without > explicitly specifying its signedness, it provides a hint to ensure > consistent behavior for indexes (e.g., GIN and GiST) and tables that > already store data sorted by the 'char' type on disk, especially in > cross-platform replication scenarios. Let's put that sort of information in the GetDefaultCharSignedness() header comment. New code should use explicit "signed char" or "unsigned char" when it matters for data file compatibility. GetDefaultCharSignedness() exists for code that deals with pre-v18 data files, where we accidentally let C implementation signedness affect persistent data. > @@ -4256,6 +4257,18 @@ WriteControlFile(void) > > ControlFile->float8ByVal = FLOAT8PASSBYVAL; > > + /* > + * The signedness of the char type is implementation-defined. For example > + * on x86 architecture CPUs, the char data type is typically treated as > + * signed by default whereas on aarch architecture CPUs it is typically > + * treated as unsigned by default. > + */ > +#if CHAR_MIN != 0 > + ControlFile->default_char_signedness = true; > +#else > + ControlFile->default_char_signedness = false; > +#endif This has initdb set the field to the current C implementation's signedness. I wonder if, instead, initdb should set signedness=true unconditionally. Then the only source of signedness=false will be pg_upgrade. Advantage: signedness=false will get rarer over time. If we had known about this problem during the last development cycle that forced initdb (v8.3), we would have made all clusters signed or all clusters unsigned. Making pg_upgrade the only source of signedness=false will cause the population of database clusters to converge toward that retrospective ideal. Disadvantage: testing signedness=false will require more planning. What other merits should we consider as part of deciding?
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Mon, Sep 30, 2024 at 11:49 AM Noah Misch <noah@leadboat.com> wrote: > > On Wed, Sep 25, 2024 at 03:46:27PM -0700, Masahiko Sawada wrote: > > On Wed, Sep 18, 2024 at 6:46 PM Noah Misch <noah@leadboat.com> wrote: > > > On Tue, Sep 17, 2024 at 09:43:41PM -0700, Masahiko Sawada wrote: > > > > On Mon, Sep 16, 2024 at 9:24 AM Noah Misch <noah@leadboat.com> wrote: > > > > > On Thu, Sep 12, 2024 at 03:42:48PM -0700, Masahiko Sawada wrote: > > > > > > On Tue, Sep 10, 2024 at 3:05 PM Noah Misch <noah@leadboat.com> wrote: > > > > > > > On Tue, Sep 10, 2024 at 05:56:47PM -0400, Tom Lane wrote: > > > > > > > > Got it. So now I'm wondering if we need all the complexity of storing > > > > > > > > stuff in the GIN metapages. Could we simply read the (primary's) > > > > > > > > signedness out of pg_control and use that? > > > > > Thanks. I like the simplicity of this approach. If we agree with this > > > > approach, I'd like to proceed with it. > > > > > > Works for me. > > > > > > > Regardless of what approach we take, I wanted to provide some > > > > regression tests for these changes, but I could not come up with a > > > > reasonable idea. It would be great if we could do something like > > > > 027_stream_regress.pl on cross-architecture replication. But just > > > > executing 027_stream_regress.pl on cross-architecture replication > > > > could not be sufficient since we would like to confirm query results > > > > as well. If we could have a reasonable tool or way, it would also help > > > > find other similar bugs related architectural differences. > > > > > > Perhaps add a regress.c function that changes the control file flag and > > > flushes the change to disk? > > > > I've tried this idea but found out that extensions can flush the > > controlfile on the disk after flipping the char signedness value, they > > cannot update the controlfile data on the shared memory. And, when the > > server shuts down, the on-disk controlfile is updated with the > > on-memory controlfile data, so the char signedness is reverted. > > > > We can add a function to set the char signedness of on-memory > > controlfile data or add a new option to pg_resetwal to change the char > > signedness on-disk controlfile data but they seem to be overkill to me > > and I'm concerned about misusing these option and function. > > Before the project is over, pg_upgrade will have some way to set the control > file signedness to the source cluster signedness. I think pg_upgrade should > also take an option for overriding the source cluster signedness for source > clusters v17 and older. That way, a user knowing they copied the v17 source > cluster from x86 to ARM can pg_upgrade properly without the prerequisite of > acquiring an x86 VM. Good idea. > Once that all exists, perhaps it will be clearer how > best to change signedness for testing. Agreed. > > > While this change does not encourage the use of 'char' without > > explicitly specifying its signedness, it provides a hint to ensure > > consistent behavior for indexes (e.g., GIN and GiST) and tables that > > already store data sorted by the 'char' type on disk, especially in > > cross-platform replication scenarios. > > Let's put that sort of information in the GetDefaultCharSignedness() header > comment. New code should use explicit "signed char" or "unsigned char" when > it matters for data file compatibility. GetDefaultCharSignedness() exists for > code that deals with pre-v18 data files, where we accidentally let C > implementation signedness affect persistent data. Agreed, will fix. > > > @@ -4256,6 +4257,18 @@ WriteControlFile(void) > > > > ControlFile->float8ByVal = FLOAT8PASSBYVAL; > > > > + /* > > + * The signedness of the char type is implementation-defined. For example > > + * on x86 architecture CPUs, the char data type is typically treated as > > + * signed by default whereas on aarch architecture CPUs it is typically > > + * treated as unsigned by default. > > + */ > > +#if CHAR_MIN != 0 > > + ControlFile->default_char_signedness = true; > > +#else > > + ControlFile->default_char_signedness = false; > > +#endif > > This has initdb set the field to the current C implementation's signedness. I > wonder if, instead, initdb should set signedness=true unconditionally. Then > the only source of signedness=false will be pg_upgrade. > > Advantage: signedness=false will get rarer over time. If we had known about > this problem during the last development cycle that forced initdb (v8.3), we > would have made all clusters signed or all clusters unsigned. Making > pg_upgrade the only source of signedness=false will cause the population of > database clusters to converge toward that retrospective ideal. I think it's a good idea. Being able to treat one case as a rarer one rather than treating both cases equally may have various advantages in the future, for example when developing or investigating a problem. > Disadvantage: testing signedness=false will require more planning. If testing signedness=false always requires pg_upgrade, there might be some cumbersomeness. Inventing a testing-purpose-only tool (e.g., a CLI program) that changes the signedness might make tests easier. > What other merits should we consider as part of deciding? Considering that the population of database cluster signedness will converge to signedness=true in the future, we can consider using -fsigned-char to prevent similar problems for the future. We need to think about possible side-effects as well, though. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Oct 01, 2024 at 11:55:48AM -0700, Masahiko Sawada wrote: > On Mon, Sep 30, 2024 at 11:49 AM Noah Misch <noah@leadboat.com> wrote: > > This has initdb set the field to the current C implementation's signedness. I > > wonder if, instead, initdb should set signedness=true unconditionally. Then > > the only source of signedness=false will be pg_upgrade. > > > > Advantage: signedness=false will get rarer over time. If we had known about > > this problem during the last development cycle that forced initdb (v8.3), we > > would have made all clusters signed or all clusters unsigned. Making > > pg_upgrade the only source of signedness=false will cause the population of > > database clusters to converge toward that retrospective ideal. > > I think it's a good idea. Being able to treat one case as a rarer one > rather than treating both cases equally may have various advantages in > the future, for example when developing or investigating a problem. > > > Disadvantage: testing signedness=false will require more planning. > > If testing signedness=false always requires pg_upgrade, there might be > some cumbersomeness. Inventing a testing-purpose-only tool (e.g., a > CLI program) that changes the signedness might make tests easier. > > > What other merits should we consider as part of deciding? > > Considering that the population of database cluster signedness will > converge to signedness=true in the future, we can consider using > -fsigned-char to prevent similar problems for the future. We need to > think about possible side-effects as well, though. It's good to think about -fsigned-char. While I find it tempting, several things would need to hold for us to benefit from it: - Every supported compiler has to offer it or an equivalent. - The non-compiler parts of every supported C implementation need to cooperate. For example, CHAR_MIN must change in response to the flag. See the first comment in cash_in(). - Libraries we depend on can't do anything incompatible with it. Given that, I would lean toward not using -fsigned-char. It's unlikely all three things will hold. Even if they do, the benefit is not large.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Noah Misch <noah@leadboat.com> writes: > On Tue, Oct 01, 2024 at 11:55:48AM -0700, Masahiko Sawada wrote: >> Considering that the population of database cluster signedness will >> converge to signedness=true in the future, we can consider using >> -fsigned-char to prevent similar problems for the future. We need to >> think about possible side-effects as well, though. > It's good to think about -fsigned-char. While I find it tempting, several > things would need to hold for us to benefit from it: > - Every supported compiler has to offer it or an equivalent. > - The non-compiler parts of every supported C implementation need to > cooperate. For example, CHAR_MIN must change in response to the flag. See > the first comment in cash_in(). > - Libraries we depend on can't do anything incompatible with it. > Given that, I would lean toward not using -fsigned-char. It's unlikely all > three things will hold. Even if they do, the benefit is not large. I am very, very strongly against deciding that Postgres will only support one setting of char signedness. It's a step on the way to hardware monoculture, and we know where that eventually leads. (In other words, I categorically reject Sawada-san's assertion that signed chars will become universal. I'd reject the opposite assertion as well.) regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Oct 1, 2024 at 8:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Noah Misch <noah@leadboat.com> writes: > > On Tue, Oct 01, 2024 at 11:55:48AM -0700, Masahiko Sawada wrote: > >> Considering that the population of database cluster signedness will > >> converge to signedness=true in the future, we can consider using > >> -fsigned-char to prevent similar problems for the future. We need to > >> think about possible side-effects as well, though. > > > It's good to think about -fsigned-char. While I find it tempting, several > > things would need to hold for us to benefit from it: > > > - Every supported compiler has to offer it or an equivalent. > > - The non-compiler parts of every supported C implementation need to > > cooperate. For example, CHAR_MIN must change in response to the flag. See > > the first comment in cash_in(). > > - Libraries we depend on can't do anything incompatible with it. > > > Given that, I would lean toward not using -fsigned-char. It's unlikely all > > three things will hold. Even if they do, the benefit is not large. > > I am very, very strongly against deciding that Postgres will only > support one setting of char signedness. It's a step on the way to > hardware monoculture, and we know where that eventually leads. > (In other words, I categorically reject Sawada-san's assertion > that signed chars will become universal. I'd reject the opposite > assertion as well.) Thank you for pointing this out. I agree with both of you. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Thu, Oct 03, 2024 at 06:55:47AM -0700, Masahiko Sawada wrote: > I've attached PoC patches for the idea Noah proposed. Newly created > clusters unconditionally have default_char_signedness=true, and the > only source of signedness=false is pg_upgrade. To update the > signedness in the controlfile, pg_resetwal now has a new option > --char-signedness, which is used by pg_upgrade internally. Feedback is > very welcome. Upthread, we discussed testability. Does pg_resetwal facilitate all appropriate testing, or do testing difficulties remain? I reviewed these patches, finding only one non-cosmetic review comment. Given the PoC status, some of the observations below are likely ones you already know or would have found before exiting PoC. > --- a/doc/src/sgml/func.sgml > +++ b/doc/src/sgml/func.sgml > @@ -27858,6 +27858,11 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres} > <entry><type>integer</type></entry> > </row> > > + <row> > + <entry><structfield>signed_char</structfield></entry> > + <entry><type>bool</type></entry> s/signed_char/default_char_signedness/ to align with the naming below. > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -4256,6 +4256,33 @@ WriteControlFile(void) > + * Newly created database clusters unconditionally set the default char > + * signedness to true. pg_upgrade changes this flag for clusters that were > + * initialized on signedness=false platforms. As a result, > + * signedness=false setting will become rare over time. will get rarer > + * over time. There's a redundant fragment of sentence. > --- a/doc/src/sgml/ref/pg_resetwal.sgml > +++ b/doc/src/sgml/ref/pg_resetwal.sgml > @@ -171,6 +171,16 @@ PostgreSQL documentation > </para> > > <variablelist> > + <varlistentry> > + <term><option>--char-signedness=<replaceable class="parameter">option</replaceable></option></term> > + <listitem> > + <para> > + Manually set the default char signedness. Possible values are > + <literal>signed</literal> and <literal>unsigned</literal>. > + </para> > + </listitem> > + </varlistentry> This needs more docs, like its <varlistentry> neighbors have. First, see the point below about the pg_upgrade docs. > --- a/src/bin/pg_resetwal/pg_resetwal.c > +++ b/src/bin/pg_resetwal/pg_resetwal.c > @@ -1189,6 +1213,7 @@ usage(void) > printf(_(" -u, --oldest-transaction-id=XID set oldest transaction ID\n")); > printf(_(" -x, --next-transaction-id=XID set next transaction ID\n")); > printf(_(" --wal-segsize=SIZE size of WAL segments, in megabytes\n")); > + printf(_(" --char-signedness=OPTION set char signedness to \"signed\" or \"unsigned\"\n")); This help output alphabetizes by short option, with the one non-short option at the end. So I'd also alphabetize among the non-short options, specifically putting the new line before --wal-segsize. > source cluster was initialized on the same platform where pg_upgrae is Typo "pg_upgrade" > --- a/src/bin/pg_upgrade/controldata.c > +++ b/src/bin/pg_upgrade/controldata.c > @@ -62,6 +62,7 @@ get_control_data(ClusterInfo *cluster) > bool got_date_is_int = false; > bool got_data_checksum_version = false; > bool got_cluster_state = false; > + bool got_default_char_signedness = false; > char *lc_collate = NULL; > char *lc_ctype = NULL; > char *lc_monetary = NULL; > @@ -206,6 +207,13 @@ get_control_data(ClusterInfo *cluster) > got_data_checksum_version = true; > } > > + /* Only in <= 17 */ > + if (GET_MAJOR_VERSION(cluster->major_version) <= 1700) > + { > + cluster->controldata.default_char_signedness = true; If anything reads the "true" stored here, that would be a bug. Is there a reasonable way do one or two of the following? 1. not initialize it at all 2. store something like -1 instead 3. add a comment that it's never read > + got_default_char_signedness = true; > + } I wouldn't say we "got" this. I'd handle this more like got_oldestmulti, where we know !got_oldestmulti is okay until MULTIXACT_FORMATCHANGE_CAT_VER. Maybe also replace the "<= 1700" checks with catversion checks. > + > /* we have the result of cmd in "output". so parse it line by line now */ > while (fgets(bufin, sizeof(bufin), output)) > { > @@ -501,6 +509,17 @@ get_control_data(ClusterInfo *cluster) > cluster->controldata.data_checksum_version = str2uint(p); > got_data_checksum_version = true; > } > + else if ((p = strstr(bufin, "char data signedness:")) != NULL) > + { > + p = strchr(p, ':'); > + > + if (p == NULL || strlen(p) <= 1) > + pg_fatal("%d: controldata retrieval problem", __LINE__); > + > + p++; /* remove ':' char */ > + cluster->controldata.default_char_signedness = strstr(p, "signed") != NULL; Won't this match "unsigned", too? (This is my only non-cosmetic review comment.) I'd strcmp for the two expected values. If neither matches, report the pg_fatal "retrieval problem". > --- a/doc/src/sgml/ref/pgupgrade.sgml > +++ b/doc/src/sgml/ref/pgupgrade.sgml > @@ -276,6 +276,24 @@ PostgreSQL documentation > </listitem> > </varlistentry> > > + <varlistentry> > + <term><option>--set-char-signedness=</option><replaceable>option</replaceable></term> > + <listitem> > + <para> > + Manually set the default char signedness of new clusters. Possible values > + are <literal>signed</literal> and <literal>unsigned</literal>. > + </para> > + <para> > + The signedness of the 'char' type in C is implementation-dependent. For instance, > + 'signed char' is used by default on x86 CPUs, while 'unsigned char' is used on aarch > + CPUs. <application>pg_upgrade</application> automatically inherits the char > + signedness from the old cluster. This option is useful for upgrading the cluster > + that user knew they copied it to a platform having different char signedness > + (e.g. from x86 to aarch). That last sentence would need some grammar help. Instead, let's rewrite this paragraph to assume the reader is not a C programmer. Focus on what such a reader should do to choose what argument, if any, to pass. Our docs haven't used the term "aarch". This applies to 32-bit ARM in addition to 64-bit ARM. Hence, I'd write the term "ARM" instead. > --- a/src/bin/pg_upgrade/option.c > +++ b/src/bin/pg_upgrade/option.c > @@ -307,6 +316,7 @@ usage(void) > printf(_(" --copy copy files to new cluster (default)\n")); > printf(_(" --copy-file-range copy files to new cluster with copy_file_range\n")); > printf(_(" --sync-method=METHOD set method for syncing files to disk\n")); > + printf(_(" --set-char-signedness=OPTION set new cluster char signedness to \"signed\" or \"unsigned\"")); Move this up one line, to preserve alphabetization. > --- a/src/backend/storage/lmgr/predicate.c > +++ b/src/backend/storage/lmgr/predicate.c > @@ -2588,6 +2588,11 @@ PredicateLockPage(Relation relation, BlockNumber blkno, Snapshot snapshot) > if (!SerializationNeededForRead(relation, snapshot)) > return; > > + ereport(LOG, (errmsg("predicate %s blk %u", > + RelationGetRelationName(relation), > + blkno), > + errbacktrace())); Revert this file. Thanks, nm
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Fri, Jan 17, 2025 at 05:11:41PM -0800, Masahiko Sawada wrote: > Thank you for reviewing the patches. I agree with all the comments you > made. I've addressed them and I've attached new version patches that > now have some regression tests. > Subject: [PATCH v3 4/5] pg_upgrade: Add --set-char-signedness to set the > default char signedness of new cluster. > > This change adds a new option --set-char-signedness to pg_upgrade. It > enables user to set arbitrary signedness during pg_upgrade. This helps > cases where user who knew they copied the v17 source cluster from > x86 (signedness=true) to ARM (signedness=falese) can pg_upgrade s/falese/false/ > --- a/doc/src/sgml/ref/pgupgrade.sgml > +++ b/doc/src/sgml/ref/pgupgrade.sgml > @@ -276,6 +276,54 @@ PostgreSQL documentation > </listitem> > </varlistentry> > > + <varlistentry> > + <term><option>--set-char-signedness=</option><replaceable>option</replaceable></term> If upgrading from v18+, valid use cases for this option are rare. I think using the option should raise an error for v18+ source. A user who knows what they're doing can use pg_resetwal manually, before the upgrade. This page shouldn't refer to the pg_resetwal flag specifically, because that need is so rare. What do you think? > + <listitem> > + <para> > + Manually set the default char signedness of new clusters. Possible values > + are <literal>signed</literal> and <literal>unsigned</literal>. > + </para> > + <para> > + In the C language, the default signedness of the <type>char</type> type > + (when not explicitly specified) varies across platforms. For example, > + <type>char</type> defaults to <type>signed char</type> on x86 CPUs but > + to <type>unsigned char</type> on ARM CPUs. When data stored using the > + <type>char</type> type is persisted to disk, such as in GIN indexes, > + this platform-dependent behavior results in incorrect data comparisons > + in two scenarios: > + </para> > + <itemizedlist> > + <listitem> > + <para> > + When data is moved between platforms with different char signedness. > + </para> > + </listitem> > + <listitem> > + <para> > + When data is replicated using streaming replication across different architectures. > + </para> > + </listitem> > + </itemizedlist> After your patch series, core PostgreSQL persistent data doesn't depend on "char" signedness. Extensions should also start calling GetDefaultCharSignedness() to become independent of "char" signedness. Hence, let's remove the part starting at "When data stored using the char type" and ending here. Future users don't need to understand why pre-v18 had a problem. They mainly need to know how to set this flag. If we wanted to say something about the breakage before v18, it might be, "If a pre-v18 cluster has trgm indexes and ran on different platforms at different times in the history of those indexes, REINDEX those indexes before or after running pg_upgrade." I'd put that in the release notes, not here in the pg_upgrade doc page. > + <para> > + Starting from <productname>PostgreSQL</productname> 18, database clusters > + maintain their own default char signedness setting, which can be used as > + a hint to ensure consistent behavior across platforms with different Setting this wrong makes your trgm indexes return wrong answers, so it's not a "hint". (I think "hint" implies minor consequences for getting it wrong.) > + default char signedness. By default, <application>pg_upgrade</application> > + preserves the char signedness setting when upgrading from an existing cluster. > + However, when upgrading from <productname>PostgreSQL</productname> 17 or > + earlier, <application>pg_upgrade</application> adopts the char signedness > + of the platform on which it was built. > + </para> > + <para> > + This option allows you to explicitly set the default char signedness for > + the new cluster, overriding any inherited values. This is particularly useful > + when you plan to migrate the upgraded cluster to a platform with different > + char signedness (for example, when moving from an x86-based system to an > + ARM-based system). Let's refine the terminology around "plan to migrate". Here's an informal description (not documentation-quality): 1. If you "plan to migrate" but have not yet done so, don't use this flag. The default behavior is right. Upgrade on the old platform without using this flag, and then migrate. This is the most-foolproof approach. 2. If you already migrated the cluster data files to a different platform just before running pg_upgrade, use this flag. It's essential not to modify any trgm indexes between migrating the data files and running pg_upgrade. Best to make pg_upgrade the first action that starts the cluster on the new platform, not starting the cluster manually. This is trickier than (1). > --- a/src/bin/pg_upgrade/pg_upgrade.c > +++ b/src/bin/pg_upgrade/pg_upgrade.c > @@ -399,8 +399,14 @@ set_new_cluster_char_signedness(void) > { > bool new_char_signedness; > > - /* Inherit the source database's signedness */ > - new_char_signedness = old_cluster.controldata.default_char_signedness; > + /* > + * Use the specified char signedness if specifies. Otherwise we inherit s/if specifies/if specified/ > --- a/contrib/pg_trgm/trgm.h > +++ b/contrib/pg_trgm/trgm.h > @@ -41,14 +41,12 @@ > typedef char trgm[3]; > > #define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) ) > -#define CMPPCHAR(a,b,i) CMPCHAR( *(((const char*)(a))+i), *(((const char*)(b))+i) ) > -#define CMPTRGM(a,b) ( CMPPCHAR(a,b,0) ? CMPPCHAR(a,b,0) : ( CMPPCHAR(a,b,1) ? CMPPCHAR(a,b,1) : CMPPCHAR(a,b,2) ) ) I recommend moving CMPCHAR into the C file along with the two related macros that you're moving. > --- a/contrib/pg_trgm/trgm_op.c > +++ b/contrib/pg_trgm/trgm_op.c > @@ -42,6 +42,9 @@ PG_FUNCTION_INFO_V1(strict_word_similarity_commutator_op); > PG_FUNCTION_INFO_V1(strict_word_similarity_dist_op); > PG_FUNCTION_INFO_V1(strict_word_similarity_dist_commutator_op); > > +static inline int CMPTRGM_CHOOSE(const void *a, const void *b); All instances of the "inline" keyword in this patch series are on functions called only via function pointers. Hence, those functions are never inlined. I'd remove the keyword. > --- a/src/bin/pg_upgrade/pg_upgrade.h > +++ b/src/bin/pg_upgrade/pg_upgrade.h > @@ -125,6 +125,12 @@ extern char *output_files[]; > */ > #define JSONB_FORMAT_CHANGE_CAT_VER 201409291 > > +/* > + * The control file was changed to have the default char signedness, > + * commit XXXXX. > + */ > +#define DEFAULT_CHAR_SIGNEDNESS_VAT_VER 202501161 s/VAT/CAT/ > +command_like( > + [ 'pg_controldata', $old->data_dir ], > + qr/Default char data signedness:\s+signed/, > + 'default char signedness of old cluster is signed in control file'); > +command_like( > + [ 'pg_controldata', $new->data_dir ], > + qr/Default char data signedness:\s+signed/, > + 'default char signedness is new cluster signed in control file'); s/is new cluster signed/of new cluster is signed/ for consistency with the previous test name. > + > +# Set the old cluster's default char signedness to unsigned for test. > +command_ok( > + [ > + 'pg_resetwal', '--char-signedness', 'unsigned', > + '-f', $old->data_dir > + ], > + "set old cluster's default char signeness to unsigned"); s/signeness/signedness/ > Subject: [PATCH v3 2/5] pg_resetwal: Add --char-signedness option to change > the default char signedness. > > With the newly added option --char-signedness, pg_resetwal updates the > default char signeness flag in the controlfile. s/signeness/signedness/ > --- a/doc/src/sgml/ref/pg_resetwal.sgml > +++ b/doc/src/sgml/ref/pg_resetwal.sgml > @@ -171,6 +171,20 @@ PostgreSQL documentation > </para> > > <variablelist> > + <varlistentry> > + <term><option>--char-signedness=<replaceable class="parameter">option</replaceable></option></term> > + <listitem> > + <para> > + Manually set the default char signedness. Possible values are > + <literal>signed</literal> and <literal>unsigned</literal>. > + </para> > + <para> > + This option can also be used to change the default char signedness > + of an existing database cluster. > + </para> > + </listitem> > + </varlistentry> Each other list entry discusses how to choose a safe value, so this should say something like that. For users in doubt, nothing other than pg_upgrade should use this flag. Theoretically, if you just ran pg_upgrade with the wrong value and had not yet modified any trgm indexes, you could run this to fix the problem. That's too rare of a situation and too dangerous to document, so the documentation for this flag should discourage using the flag. > --- a/src/bin/pg_resetwal/pg_resetwal.c > +++ b/src/bin/pg_resetwal/pg_resetwal.c > @@ -105,6 +106,7 @@ main(int argc, char *argv[]) > {"multixact-offset", required_argument, NULL, 'O'}, > {"oldest-transaction-id", required_argument, NULL, 'u'}, > {"next-transaction-id", required_argument, NULL, 'x'}, > + {"char-signedness", required_argument, NULL, 2}, > {"wal-segsize", required_argument, NULL, 1}, Put the ", 2}" entry after the ", 1}" entry. (We alphabetize in usage(), but we append to the end in these C arrays.) > @@ -1189,6 +1213,7 @@ usage(void) > printf(_(" -u, --oldest-transaction-id=XID set oldest transaction ID\n")); > printf(_(" -x, --next-transaction-id=XID set next transaction ID\n")); > printf(_(" --wal-segsize=SIZE size of WAL segments, in megabytes\n")); > + printf(_(" --char-signedness=OPTION set char signedness to \"signed\" or \"unsigned\"\n")); This --help output alphabetizes by short option, with the one non-short option at the end. So I'd also alphabetize among the non-short options, specifically putting the new line before --wal-segsize.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Apart from two doc issues, this is ready: On Tue, Feb 18, 2025 at 01:23:20PM -0800, Masahiko Sawada wrote: > On Mon, Feb 17, 2025 at 2:57 PM Noah Misch <noah@leadboat.com> wrote: > > On Fri, Jan 17, 2025 at 05:11:41PM -0800, Masahiko Sawada wrote: > + However, when upgrading from <productname>PostgreSQL</productname> 17 or, > + earlier <application>pg_upgrade</application> adopts the char signedness s/or, earlier/or earlier,/ > --- a/doc/src/sgml/ref/pg_resetwal.sgml > +++ b/doc/src/sgml/ref/pg_resetwal.sgml > @@ -171,6 +171,22 @@ PostgreSQL documentation > </para> > > <variablelist> > + <varlistentry> > + <term><option>--char-signedness=<replaceable class="parameter">option</replaceable></option></term> > + <listitem> > + <para> > + Manually set the default char signedness. Possible values are > + <literal>signed</literal> and <literal>unsigned</literal>. > + </para> > + <para> > + A safe value for this option is, if known, the default char signedness > + of the platform where the database cluster was initialized. However, Only if initialized on v17 or earlier. I recommend this edit: diff --git a/doc/src/sgml/ref/pg_resetwal.sgml b/doc/src/sgml/ref/pg_resetwal.sgml index a72678d..dd011d2 100644 --- a/doc/src/sgml/ref/pg_resetwal.sgml +++ b/doc/src/sgml/ref/pg_resetwal.sgml @@ -179,8 +179,11 @@ PostgreSQL documentation <literal>signed</literal> and <literal>unsigned</literal>. </para> <para> - A safe value for this option is, if known, the default char signedness - of the platform where the database cluster was initialized. However, + For a database cluster that <command>pg_upgrade</command> upgraded from + a <productname>PostgreSQL</productname> version before 18, the safe + value would be the default <type>char</type> signedness of the platform + that ran the cluster before that upgrade. For all other + clusters, <literal>signed</literal> would be the safe value. However, this option is exclusively for use with <command>pg_upgrade</command> and should not normally be used manually. </para>
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Feb 18, 2025 at 3:23 PM Noah Misch <noah@leadboat.com> wrote: > > Apart from two doc issues, this is ready: > > On Tue, Feb 18, 2025 at 01:23:20PM -0800, Masahiko Sawada wrote: > > On Mon, Feb 17, 2025 at 2:57 PM Noah Misch <noah@leadboat.com> wrote: > > > On Fri, Jan 17, 2025 at 05:11:41PM -0800, Masahiko Sawada wrote: > > > + However, when upgrading from <productname>PostgreSQL</productname> 17 or, > > + earlier <application>pg_upgrade</application> adopts the char signedness > > s/or, earlier/or earlier,/ > > > --- a/doc/src/sgml/ref/pg_resetwal.sgml > > +++ b/doc/src/sgml/ref/pg_resetwal.sgml > > @@ -171,6 +171,22 @@ PostgreSQL documentation > > </para> > > > > <variablelist> > > + <varlistentry> > > + <term><option>--char-signedness=<replaceable class="parameter">option</replaceable></option></term> > > + <listitem> > > + <para> > > + Manually set the default char signedness. Possible values are > > + <literal>signed</literal> and <literal>unsigned</literal>. > > + </para> > > + <para> > > + A safe value for this option is, if known, the default char signedness > > + of the platform where the database cluster was initialized. However, > > Only if initialized on v17 or earlier. I recommend this edit: > > diff --git a/doc/src/sgml/ref/pg_resetwal.sgml b/doc/src/sgml/ref/pg_resetwal.sgml > index a72678d..dd011d2 100644 > --- a/doc/src/sgml/ref/pg_resetwal.sgml > +++ b/doc/src/sgml/ref/pg_resetwal.sgml > @@ -179,8 +179,11 @@ PostgreSQL documentation > <literal>signed</literal> and <literal>unsigned</literal>. > </para> > <para> > - A safe value for this option is, if known, the default char signedness > - of the platform where the database cluster was initialized. However, > + For a database cluster that <command>pg_upgrade</command> upgraded from > + a <productname>PostgreSQL</productname> version before 18, the safe > + value would be the default <type>char</type> signedness of the platform > + that ran the cluster before that upgrade. For all other > + clusters, <literal>signed</literal> would be the safe value. However, > this option is exclusively for use with <command>pg_upgrade</command> > and should not normally be used manually. > </para> Thank you for reviewing the patches. I've fixed these issues and attached the updated patches. I have one question about the 0001 patch; since we add 'default_char_signedness' field to ControlFileData do we need to bump PG_CONTROL_VERSION? We have comments about bumping PG_CONTROL_VERSION when changing CheckPoint struct or DBState enum so it seems likely but I'd like to confirm just in case that we need to bump PG_CONTROL_VERSION also when changing ControlFileData. If we need, can we bump it to 1800? or 1701? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
- v5-0002-pg_resetwal-Add-char-signedness-option-to-change-.patch
- v5-0005-Fix-an-issue-with-index-scan-using-pg_trgm-due-to.patch
- v5-0003-pg_upgrade-Preserve-default-char-signedness-value.patch
- v5-0001-Add-default_char_signedness-field-to-ControlFileD.patch
- v5-0004-pg_upgrade-Add-set-char-signedness-to-set-the-def.patch
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Wed, Feb 19, 2025 at 10:48:29AM -0800, Masahiko Sawada wrote: > Thank you for reviewing the patches. I've fixed these issues and > attached the updated patches. Looks good. > I have one question about the 0001 patch; since we add > 'default_char_signedness' field to ControlFileData do we need to bump > PG_CONTROL_VERSION? We have comments about bumping PG_CONTROL_VERSION > when changing CheckPoint struct or DBState enum so it seems likely but > I'd like to confirm just in case that we need to bump > PG_CONTROL_VERSION also when changing ControlFileData. Yes. (I'm not aware of value we get from having distinct control file version and catalog version, but we do have both.) > If we need, can > we bump it to 1800? or 1701? I'd do 1800. The pattern seems to be to bump to 1800 for the first pg_control change of the v18 cycle, then 1801, then 1802 for the third change of the cycle. That's based on this history: git log -U0 -p src/include/catalog/pg_control.h | grep -E '^(Date|\+#define PG_CONTROL_VERSION)'
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Thu, Feb 20, 2025 at 2:07 PM Noah Misch <noah@leadboat.com> wrote: > > On Wed, Feb 19, 2025 at 10:48:29AM -0800, Masahiko Sawada wrote: > > Thank you for reviewing the patches. I've fixed these issues and > > attached the updated patches. > > Looks good. > > > I have one question about the 0001 patch; since we add > > 'default_char_signedness' field to ControlFileData do we need to bump > > PG_CONTROL_VERSION? We have comments about bumping PG_CONTROL_VERSION > > when changing CheckPoint struct or DBState enum so it seems likely but > > I'd like to confirm just in case that we need to bump > > PG_CONTROL_VERSION also when changing ControlFileData. > > Yes. (I'm not aware of value we get from having distinct control file version > and catalog version, but we do have both.) > > > If we need, can > > we bump it to 1800? or 1701? > > I'd do 1800. The pattern seems to be to bump to 1800 for the first pg_control > change of the v18 cycle, then 1801, then 1802 for the third change of the > cycle. That's based on this history: > > git log -U0 -p src/include/catalog/pg_control.h | grep -E '^(Date|\+#define PG_CONTROL_VERSION)' Thank you for the confirmation. That makes sense to me. I'll push these patches with version bumps, barring any objections or further comments. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Thu, Feb 20, 2025 at 3:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Feb 20, 2025 at 2:07 PM Noah Misch <noah@leadboat.com> wrote: > > > > On Wed, Feb 19, 2025 at 10:48:29AM -0800, Masahiko Sawada wrote: > > > Thank you for reviewing the patches. I've fixed these issues and > > > attached the updated patches. > > > > Looks good. > > > > > I have one question about the 0001 patch; since we add > > > 'default_char_signedness' field to ControlFileData do we need to bump > > > PG_CONTROL_VERSION? We have comments about bumping PG_CONTROL_VERSION > > > when changing CheckPoint struct or DBState enum so it seems likely but > > > I'd like to confirm just in case that we need to bump > > > PG_CONTROL_VERSION also when changing ControlFileData. > > > > Yes. (I'm not aware of value we get from having distinct control file version > > and catalog version, but we do have both.) > > > > > If we need, can > > > we bump it to 1800? or 1701? > > > > I'd do 1800. The pattern seems to be to bump to 1800 for the first pg_control > > change of the v18 cycle, then 1801, then 1802 for the third change of the > > cycle. That's based on this history: > > > > git log -U0 -p src/include/catalog/pg_control.h | grep -E '^(Date|\+#define PG_CONTROL_VERSION)' > > Thank you for the confirmation. That makes sense to me. > > I'll push these patches with version bumps, barring any objections or > further comments. Pushed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Hi, While working on another round of the long option and fat comma style cleanup, I noticed that the test for pg_upgrade --set-char-signedess doesn't test what it's supposed to: Masahiko Sawada <sawada.mshk@gmail.com> writes: > diff --git a/src/bin/pg_upgrade/t/005_char_signedness.pl b/src/bin/pg_upgrade/t/005_char_signedness.pl > index 05c3014a27d..c024106863e 100644 > --- a/src/bin/pg_upgrade/t/005_char_signedness.pl > +++ b/src/bin/pg_upgrade/t/005_char_signedness.pl > @@ -40,6 +40,23 @@ command_like( > qr/Default char data signedness:\s+unsigned/, > 'updated default char signedness is unsigned in control file'); > > +# Cannot use --set-char-signedness option for upgrading from v18+ > +command_fails( > + [ > + 'pg_upgrade', '--no-sync', > + '-d', $old->data_dir, > + '-D', $new->data_dir, > + '-b', $old->config_data('--bindir'), > + '-B', $new->config_data('--bindir'), > + '-s', $new->host, > + '-p', $old->port, > + '-P', $new->port, > + '-set-char-signedness', 'signed', This is missing a dash, which causes the command to fail, but for the wrong reason. pg_uprade seems to print all its errors on stdout, which I guess is why the test use plain command_fails() instead of command_fails_like(). However, we have another function to deal with this: command_checks_all(). Attached are patches that fix the above test, and also convert the other command_fails() calls in the pg_upgrade tests to test for specific messages. - ilmari From b80c653c6635096345ec453bfe9445a82b7ab049 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org> Date: Tue, 25 Feb 2025 22:27:36 +0000 Subject: [PATCH 1/2] pg_upgrade: fix --set-char-signedness failure test The test had the option misspelled with only one dash, which fails, but not with the expected message. Since pg_upgrade gives its error messages on stdout(?!), use command_checks_all() instead of command_fails_like(). --- src/bin/pg_upgrade/t/005_char_signedness.pl | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/bin/pg_upgrade/t/005_char_signedness.pl b/src/bin/pg_upgrade/t/005_char_signedness.pl index c024106863e..d186822ac77 100644 --- a/src/bin/pg_upgrade/t/005_char_signedness.pl +++ b/src/bin/pg_upgrade/t/005_char_signedness.pl @@ -41,7 +41,7 @@ command_like( 'updated default char signedness is unsigned in control file'); # Cannot use --set-char-signedness option for upgrading from v18+ -command_fails( +command_checks_all( [ 'pg_upgrade', '--no-sync', '-d', $old->data_dir, @@ -51,9 +51,12 @@ command_fails( '-s', $new->host, '-p', $old->port, '-P', $new->port, - '-set-char-signedness', 'signed', + '--set-char-signedness', 'signed', $mode ], + 1, + [qr/--set-char-signedness option cannot be used/], + [], '--set-char-signedness option cannot be used for upgrading from v18 or later' ); -- 2.43.0 From 03d25df987029da7511ff5a6f4ce69d1aef16ea9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org> Date: Tue, 25 Feb 2025 22:56:12 +0000 Subject: [PATCH 2/2] pg_upgrade: use command_checks_all() to check for errors pg_upgrade prints its error messages on stdout, so we can't use command_fails_like() to check that it fails for the right reason. Instead, we have to use command_checks_all(), which this patch does. --- src/bin/pg_upgrade/t/002_pg_upgrade.pl | 5 ++++- src/bin/pg_upgrade/t/004_subscription.pl | 8 +++++++- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl index 45ea94c84bb..cccba9dc3db 100644 --- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl +++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl @@ -396,7 +396,7 @@ $oldnode->stop; # Cause a failure at the start of pg_upgrade, this should create the logging # directory pg_upgrade_output.d but leave it around. Keep --check for an # early exit. -command_fails( +command_checks_all( [ 'pg_upgrade', '--no-sync', '-d', $oldnode->data_dir, @@ -408,6 +408,9 @@ command_fails( '-P', $newnode->port, $mode, '--check', ], + 1, + [qr{check for ".*?does.not.exist" failed:}], + [], 'run of pg_upgrade --check for new instance with incorrect binary path'); ok(-d $newnode->data_dir . "/pg_upgrade_output.d", "pg_upgrade_output.d/ not removed after pg_upgrade failure"); diff --git a/src/bin/pg_upgrade/t/004_subscription.pl b/src/bin/pg_upgrade/t/004_subscription.pl index 13773316e1d..3f9734f7b3f 100644 --- a/src/bin/pg_upgrade/t/004_subscription.pl +++ b/src/bin/pg_upgrade/t/004_subscription.pl @@ -130,7 +130,7 @@ $old_sub->safe_psql('postgres', $old_sub->stop; -command_fails( +command_checks_all( [ 'pg_upgrade', '--no-sync', @@ -144,6 +144,12 @@ command_fails( $mode, '--check', ], + 1, + [ + qr{\QYour installation contains subscriptions without origin}, + qr{\Qrelations not in i (initialize) or r (ready) state}, + ], + [], 'run of pg_upgrade --check for old instance with relation in \'d\' datasync(invalid) state and missing replication origin' ); -- 2.43.0
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Feb 25, 2025 at 3:03 PM Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> wrote: > > Hi, > > While working on another round of the long option and fat comma style > cleanup, I noticed that the test for pg_upgrade --set-char-signedess > doesn't test what it's supposed to: > > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > > diff --git a/src/bin/pg_upgrade/t/005_char_signedness.pl b/src/bin/pg_upgrade/t/005_char_signedness.pl > > index 05c3014a27d..c024106863e 100644 > > --- a/src/bin/pg_upgrade/t/005_char_signedness.pl > > +++ b/src/bin/pg_upgrade/t/005_char_signedness.pl > > @@ -40,6 +40,23 @@ command_like( > > qr/Default char data signedness:\s+unsigned/, > > 'updated default char signedness is unsigned in control file'); > > > > +# Cannot use --set-char-signedness option for upgrading from v18+ > > +command_fails( > > + [ > > + 'pg_upgrade', '--no-sync', > > + '-d', $old->data_dir, > > + '-D', $new->data_dir, > > + '-b', $old->config_data('--bindir'), > > + '-B', $new->config_data('--bindir'), > > + '-s', $new->host, > > + '-p', $old->port, > > + '-P', $new->port, > > + '-set-char-signedness', 'signed', > > This is missing a dash, which causes the command to fail, but for the > wrong reason. pg_uprade seems to print all its errors on stdout, which > I guess is why the test use plain command_fails() instead of > command_fails_like(). However, we have another function to deal with > this: command_checks_all(). Attached are patches that fix the above > test, and also convert the other command_fails() calls in the pg_upgrade > tests to test for specific messages. Thank you for the report. I agree with both points. I believe that replacing command_fails_like() with command_checks_all() is an improvement whereas adding the missing dash to --set-char-signendess option is a bug fix. How about reorganizing the patches as follows? - 0001 patch just adds the dash to the --set-char-signedness. - 0002 patch uses command_checks_all() in 002_pg_upgrade.pl, 004_subscription.pl, and 005_char_signedness.pl for checking the output better. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Masahiko Sawada <sawada.mshk@gmail.com> writes: > On Tue, Feb 25, 2025 at 3:03 PM Dagfinn Ilmari Mannsåker > <ilmari@ilmari.org> wrote: >> >> Hi, >> >> While working on another round of the long option and fat comma style >> cleanup, I noticed that the test for pg_upgrade --set-char-signedess >> doesn't test what it's supposed to: >> >> Masahiko Sawada <sawada.mshk@gmail.com> writes: >> >> > diff --git a/src/bin/pg_upgrade/t/005_char_signedness.pl b/src/bin/pg_upgrade/t/005_char_signedness.pl >> > index 05c3014a27d..c024106863e 100644 >> > --- a/src/bin/pg_upgrade/t/005_char_signedness.pl >> > +++ b/src/bin/pg_upgrade/t/005_char_signedness.pl >> > @@ -40,6 +40,23 @@ command_like( >> > qr/Default char data signedness:\s+unsigned/, >> > 'updated default char signedness is unsigned in control file'); >> > >> > +# Cannot use --set-char-signedness option for upgrading from v18+ >> > +command_fails( >> > + [ >> > + 'pg_upgrade', '--no-sync', >> > + '-d', $old->data_dir, >> > + '-D', $new->data_dir, >> > + '-b', $old->config_data('--bindir'), >> > + '-B', $new->config_data('--bindir'), >> > + '-s', $new->host, >> > + '-p', $old->port, >> > + '-P', $new->port, >> > + '-set-char-signedness', 'signed', >> >> This is missing a dash, which causes the command to fail, but for the >> wrong reason. pg_uprade seems to print all its errors on stdout, which >> I guess is why the test use plain command_fails() instead of >> command_fails_like(). However, we have another function to deal with >> this: command_checks_all(). Attached are patches that fix the above >> test, and also convert the other command_fails() calls in the pg_upgrade >> tests to test for specific messages. > > Thank you for the report. I agree with both points. > > I believe that replacing command_fails_like() with > command_checks_all() is an improvement whereas adding the missing dash > to --set-char-signendess option is a bug fix. How about reorganizing > the patches as follows? > > - 0001 patch just adds the dash to the --set-char-signedness. > - 0002 patch uses command_checks_all() in 002_pg_upgrade.pl, > 004_subscription.pl, and 005_char_signedness.pl for checking the > output better. Yeah, that's a better way to organise it, updated patches attached. - ilmari From 0671eb2a6f59926bf94b041a1e20b864ec370a25 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org> Date: Wed, 26 Feb 2025 10:51:28 +0000 Subject: [PATCH 1/2] pg_upgrade: fix spelling of --set-char-signedness in test The single dash causes it to fail with a different error, but because the test isn't checking the error message it was passing anyway. The next commit will make the test check for the actual error. --- src/bin/pg_upgrade/t/005_char_signedness.pl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/bin/pg_upgrade/t/005_char_signedness.pl b/src/bin/pg_upgrade/t/005_char_signedness.pl index c024106863e..b3092f03bb7 100644 --- a/src/bin/pg_upgrade/t/005_char_signedness.pl +++ b/src/bin/pg_upgrade/t/005_char_signedness.pl @@ -51,7 +51,7 @@ '-s', $new->host, '-p', $old->port, '-P', $new->port, - '-set-char-signedness', 'signed', + '--set-char-signedness', 'signed', $mode ], '--set-char-signedness option cannot be used for upgrading from v18 or later' -- 2.48.1 From 371f4fafc93e8520b4cb49a657ca56131da3b3e4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org> Date: Wed, 26 Feb 2025 10:53:40 +0000 Subject: [PATCH 2/2] pg_upgrade: check for the expected error message in tests pg_upgrade prints its error messages on stdout, so we can't use command_fails_like() to check that it fails for the right reason. Instead use command_checks_all() to check the exit status and stdout. --- src/bin/pg_upgrade/t/002_pg_upgrade.pl | 5 ++++- src/bin/pg_upgrade/t/004_subscription.pl | 8 +++++++- src/bin/pg_upgrade/t/005_char_signedness.pl | 5 ++++- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl index 45ea94c84bb..cccba9dc3db 100644 --- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl +++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl @@ -396,7 +396,7 @@ sub filter_dump # Cause a failure at the start of pg_upgrade, this should create the logging # directory pg_upgrade_output.d but leave it around. Keep --check for an # early exit. -command_fails( +command_checks_all( [ 'pg_upgrade', '--no-sync', '-d', $oldnode->data_dir, @@ -408,6 +408,9 @@ sub filter_dump '-P', $newnode->port, $mode, '--check', ], + 1, + [qr{check for ".*?does.not.exist" failed:}], + [], 'run of pg_upgrade --check for new instance with incorrect binary path'); ok(-d $newnode->data_dir . "/pg_upgrade_output.d", "pg_upgrade_output.d/ not removed after pg_upgrade failure"); diff --git a/src/bin/pg_upgrade/t/004_subscription.pl b/src/bin/pg_upgrade/t/004_subscription.pl index 13773316e1d..3f9734f7b3f 100644 --- a/src/bin/pg_upgrade/t/004_subscription.pl +++ b/src/bin/pg_upgrade/t/004_subscription.pl @@ -130,7 +130,7 @@ $old_sub->stop; -command_fails( +command_checks_all( [ 'pg_upgrade', '--no-sync', @@ -144,6 +144,12 @@ $mode, '--check', ], + 1, + [ + qr{\QYour installation contains subscriptions without origin}, + qr{\Qrelations not in i (initialize) or r (ready) state}, + ], + [], 'run of pg_upgrade --check for old instance with relation in \'d\' datasync(invalid) state and missing replication origin' ); diff --git a/src/bin/pg_upgrade/t/005_char_signedness.pl b/src/bin/pg_upgrade/t/005_char_signedness.pl index b3092f03bb7..d186822ac77 100644 --- a/src/bin/pg_upgrade/t/005_char_signedness.pl +++ b/src/bin/pg_upgrade/t/005_char_signedness.pl @@ -41,7 +41,7 @@ 'updated default char signedness is unsigned in control file'); # Cannot use --set-char-signedness option for upgrading from v18+ -command_fails( +command_checks_all( [ 'pg_upgrade', '--no-sync', '-d', $old->data_dir, @@ -54,6 +54,9 @@ '--set-char-signedness', 'signed', $mode ], + 1, + [qr/--set-char-signedness option cannot be used/], + [], '--set-char-signedness option cannot be used for upgrading from v18 or later' ); -- 2.48.1
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Wed, Feb 26, 2025 at 3:12 AM Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> wrote: > > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > > On Tue, Feb 25, 2025 at 3:03 PM Dagfinn Ilmari Mannsåker > > <ilmari@ilmari.org> wrote: > >> > >> Hi, > >> > >> While working on another round of the long option and fat comma style > >> cleanup, I noticed that the test for pg_upgrade --set-char-signedess > >> doesn't test what it's supposed to: > >> > >> Masahiko Sawada <sawada.mshk@gmail.com> writes: > >> > >> > diff --git a/src/bin/pg_upgrade/t/005_char_signedness.pl b/src/bin/pg_upgrade/t/005_char_signedness.pl > >> > index 05c3014a27d..c024106863e 100644 > >> > --- a/src/bin/pg_upgrade/t/005_char_signedness.pl > >> > +++ b/src/bin/pg_upgrade/t/005_char_signedness.pl > >> > @@ -40,6 +40,23 @@ command_like( > >> > qr/Default char data signedness:\s+unsigned/, > >> > 'updated default char signedness is unsigned in control file'); > >> > > >> > +# Cannot use --set-char-signedness option for upgrading from v18+ > >> > +command_fails( > >> > + [ > >> > + 'pg_upgrade', '--no-sync', > >> > + '-d', $old->data_dir, > >> > + '-D', $new->data_dir, > >> > + '-b', $old->config_data('--bindir'), > >> > + '-B', $new->config_data('--bindir'), > >> > + '-s', $new->host, > >> > + '-p', $old->port, > >> > + '-P', $new->port, > >> > + '-set-char-signedness', 'signed', > >> > >> This is missing a dash, which causes the command to fail, but for the > >> wrong reason. pg_uprade seems to print all its errors on stdout, which > >> I guess is why the test use plain command_fails() instead of > >> command_fails_like(). However, we have another function to deal with > >> this: command_checks_all(). Attached are patches that fix the above > >> test, and also convert the other command_fails() calls in the pg_upgrade > >> tests to test for specific messages. > > > > Thank you for the report. I agree with both points. > > > > I believe that replacing command_fails_like() with > > command_checks_all() is an improvement whereas adding the missing dash > > to --set-char-signendess option is a bug fix. How about reorganizing > > the patches as follows? > > > > - 0001 patch just adds the dash to the --set-char-signedness. > > - 0002 patch uses command_checks_all() in 002_pg_upgrade.pl, > > 004_subscription.pl, and 005_char_signedness.pl for checking the > > output better. > > Yeah, that's a better way to organise it, updated patches attached. Thank you for the patches. The 0001 patch already got committed (945a9e38). I've simplified the tests and updated the commit message for the 0002 patch. I'm going to push it, barring further comments. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On 21.02.25 20:39, Masahiko Sawada wrote: >>>> I have one question about the 0001 patch; since we add >>>> 'default_char_signedness' field to ControlFileData do we need to bump >>>> PG_CONTROL_VERSION? We have comments about bumping PG_CONTROL_VERSION >>>> when changing CheckPoint struct or DBState enum so it seems likely but >>>> I'd like to confirm just in case that we need to bump >>>> PG_CONTROL_VERSION also when changing ControlFileData. >>> >>> Yes. (I'm not aware of value we get from having distinct control file version >>> and catalog version, but we do have both.) >>> >>>> If we need, can >>>> we bump it to 1800? or 1701? >>> >>> I'd do 1800. The pattern seems to be to bump to 1800 for the first pg_control >>> change of the v18 cycle, then 1801, then 1802 for the third change of the >>> cycle. That's based on this history: >>> >>> git log -U0 -p src/include/catalog/pg_control.h | grep -E '^(Date|\+#define PG_CONTROL_VERSION)' >> >> Thank you for the confirmation. That makes sense to me. >> >> I'll push these patches with version bumps, barring any objections or >> further comments. > > Pushed. Is there a reason why the pg_controldata and pg_resetwal output are "Default char *data* signedness", while the rest of the patch and description just says "char signedness"? The word "data" doesn't mean anything here, does it?
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Wed, Mar 19, 2025 at 2:58 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 21.02.25 20:39, Masahiko Sawada wrote: > >>>> I have one question about the 0001 patch; since we add > >>>> 'default_char_signedness' field to ControlFileData do we need to bump > >>>> PG_CONTROL_VERSION? We have comments about bumping PG_CONTROL_VERSION > >>>> when changing CheckPoint struct or DBState enum so it seems likely but > >>>> I'd like to confirm just in case that we need to bump > >>>> PG_CONTROL_VERSION also when changing ControlFileData. > >>> > >>> Yes. (I'm not aware of value we get from having distinct control file version > >>> and catalog version, but we do have both.) > >>> > >>>> If we need, can > >>>> we bump it to 1800? or 1701? > >>> > >>> I'd do 1800. The pattern seems to be to bump to 1800 for the first pg_control > >>> change of the v18 cycle, then 1801, then 1802 for the third change of the > >>> cycle. That's based on this history: > >>> > >>> git log -U0 -p src/include/catalog/pg_control.h | grep -E '^(Date|\+#define PG_CONTROL_VERSION)' > >> > >> Thank you for the confirmation. That makes sense to me. > >> > >> I'll push these patches with version bumps, barring any objections or > >> further comments. > > > > Pushed. > > Is there a reason why the pg_controldata and pg_resetwal output are > "Default char *data* signedness", while the rest of the patch and > description just says "char signedness"? The word "data" doesn't mean > anything here, does it? I wanted to refer to the word "data" here to the relation data stored as "char" type (without explicit signedness) into the database cluster. But I admit some inconsistency. While pg_controldata outputs "Default char data signedness", pg_control_init() SQL function outputs it as default_char_signedness. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com