Thread: hashing bpchar for nondeterministic collations is broken
CREATE COLLATION ctest_nondet( provider = icu, locale = '', deterministic = false); =# select hashbpcharextended('a' collate ctest_nondet, 3); hashbpcharextended --------------------- 2792701603979490504 (1 row) =# select hashbpcharextended('a ' collate ctest_nondet, 3); hashbpcharextended ---------------------- -4885217598372536483 (1 row) I don't see any major consequences, because in both hash indexes and partitioning it appears that the trailing spaces on the key are already gone before hashbpcharextended is called. If someone does see bigger consequences here, please let me know. Seems to be an oversight. Patch attached. Also, does someone have an opinion on backporting this? I'm inclined to. -- Jeff Davis PostgreSQL Contributor Team - AWS
Attachment
On Fri, Dec 2, 2022 at 4:11 AM Jeff Davis <pgsql@j-davis.com> wrote:
CREATE COLLATION ctest_nondet(
provider = icu, locale = '', deterministic = false);
=# select hashbpcharextended('a' collate ctest_nondet, 3);
hashbpcharextended
---------------------
2792701603979490504
(1 row)
=# select hashbpcharextended('a ' collate ctest_nondet, 3);
hashbpcharextended
----------------------
-4885217598372536483
(1 row)
Good catch. This really shouldn't happen for bpchar as the trailing
blanks are supposed to be ignored. I can see hashbpchar is doing it
with the correct "true" length, so maybe this is a copy-and-pasteo from
hashtextextended?
blanks are supposed to be ignored. I can see hashbpchar is doing it
with the correct "true" length, so maybe this is a copy-and-pasteo from
hashtextextended?
Also, does someone have an opinion on backporting this? I'm inclined
to.
I'm not familiar with the backporting policy, but I also think we need
to as this fixes an obvious oversight.
Thanks
Richard
to as this fixes an obvious oversight.
Thanks
Richard
On Thu, 2022-12-01 at 12:11 -0800, Jeff Davis wrote: > I don't see any major consequences Yikes, it looks like this is a problem for BPCHAR (without typmod specified): create table p_bpchar(t bpchar collate ctest_nondet, i int) partition by hash(t); create table p0_bpchar partition of p_bpchar for values with (modulus 4, remainder 0); create table p1_bpchar partition of p_bpchar for values with (modulus 4, remainder 1); create table p2_bpchar partition of p_bpchar for values with (modulus 4, remainder 2); create table p3_bpchar partition of p_bpchar for values with (modulus 4, remainder 3); insert into p_bpchar values ('a', 0), ('a ', 1), ('a ', 2), ('a ', 3), ('a ', 4), ('a ', 5), ('a ', 6), ('a ', 7); select count(*) from p0_bpchar; -- 2 select count(*) from p1_bpchar; -- 2 select count(*) from p2_bpchar; -- 3 select count(*) from p3_bpchar; -- 1 It seems like CHAR is not a problem, even though BPCHAR is documented as an alias, because the planner treats BPCHAR->CHAR as a length coercion, which trims trailing spaces. And we just documented BPCHAR in v16 (0937f6d172), so the problem is about to be worse. I suppose as of v15 we could argue that BPCHAR is just an internal detail and that people shouldn't be creating columns of that type? -- Jeff Davis PostgreSQL Contributor Team - AWS
On Fri, 2022-12-02 at 10:30 +0800, Richard Guo wrote: > > I'm not familiar with the backporting policy, but I also think we > need > to as this fixes an obvious oversight. The standard I use is that backpatching something should be safe enough that users feel safer updating to a minor release (containing ~100 such fixes) than *not* updating to a minor release. The only demonstrable problem that I'm aware of[1] relies on an undocumented type (undocumented before 16, that is) combined with a non-deterministic collation and hash partitioning. The intersection of those users is likely to be the empty set. But assuming someone does use that combination of features in 15.1, and we release 15.2 with my fix in it, then the consequences could be severe: a pg_dump with default options on their bpchar- nondeterministic-collation-hash-partitioned table in 15.1 would be unrestorable in 15.2. They'd need to take a new pg_dump from 15.1 with "--load-via-partition-root" and then restore it in 15.2. Obviously that would be bad in any case, but it's more tolerable going between servers of different major versions. So, given that this fix is unlikely to help many people in 15 and earlier, and that there's a chance that the fix is really painful for someone; I'm electing not to backpatch it. [1] https://www.postgresql.org/message-id/7692740d4736e79032a5dac689cf2e304c03fa78.camel@j-davis.com
Jeff Davis <pgsql@j-davis.com> writes: > But assuming someone does use that combination of features in 15.1, and > we release 15.2 with my fix in it, then the consequences could be > severe: a pg_dump with default options on their bpchar- > nondeterministic-collation-hash-partitioned table in 15.1 would be > unrestorable in 15.2. They'd need to take a new pg_dump from 15.1 with > "--load-via-partition-root" and then restore it in 15.2. Obviously that > would be bad in any case, but it's more tolerable going between servers > of different major versions. Yeah. Also, do we have this issue with hash indexes? If so reindexing might be necessary, which again would be more palatable in a major release. > So, given that this fix is unlikely to help many people in 15 and > earlier, and that there's a chance that the fix is really painful for > someone; I'm electing not to backpatch it. +1 regards, tom lane
On Fri, 2022-12-02 at 16:49 -0500, Tom Lane wrote: > Yeah. Also, do we have this issue with hash indexes? No, it appears that hash indexes only call hashbpchar, which doesn't have the bug. Extended hash functions seem to only be used in hash partitioning, and other extended hash functions. Regards, Jeff Davis