BUG #17746: Partitioning by hash of a text depends on icu version when text collation is not deterministic - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17746: Partitioning by hash of a text depends on icu version when text collation is not deterministic
Date
Msg-id 17746-83bc7dd7e202dc70@postgresql.org
Whole thread Raw
Responses Re: BUG #17746: Partitioning by hash of a text depends on icu version when text collation is not deterministic
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17746
Logged by:          Andrew Bille
Email address:      andrewbille@gmail.com
PostgreSQL version: 15.1
Operating system:   Ubuntu 20.04
Description:

Hello
I build two postgres (REL_15_STABLE) with different ICU versions (60.3 and
66.1)

In "old-icu-server" I do (exerpt from regression test collate.icu.utf8):

CREATE COLLATION coll (provider = icu, deterministic = false, locale =
'@colStrength=secondary');
CREATE TABLE test (
    a integer,
    b text COLLATE coll
)
PARTITION BY HASH (b);
CREATE TABLE test_0 (
    a integer,
    b text COLLATE coll
);
CREATE TABLE test_1 (
    a integer,
    b text COLLATE coll
);
ALTER TABLE ONLY test ATTACH PARTITION test_0 FOR VALUES WITH (modulus 2,
remainder 0);
ALTER TABLE ONLY test ATTACH PARTITION test_1 FOR VALUES WITH (modulus 2,
remainder 1);
INSERT INTO test VALUES (1, 'def');
SELECT tableoid::regclass::text as part, hashtext(b) as hash, * FROM test;
EXPLAIN SELECT * FROM test WHERE b='def';
SELECT * FROM test WHERE b='def';

and recieve:

CREATE COLLATION
CREATE TABLE
CREATE TABLE
CREATE TABLE
ALTER TABLE
ALTER TABLE
INSERT 0 1
  part  |    hash    | a |  b  
--------+------------+---+-----
 test_0 | -493883174 | 1 | def
(1 row)

                         QUERY PLAN                          
-------------------------------------------------------------
 Seq Scan on test_0 test  (cost=0.00..25.88 rows=6 width=36)
   Filter: (b = 'def'::text)
(2 rows)
 a |  b  
---+-----
 1 | def
(1 row)

Then I stop the server and start this cluster with "new-icu-binaries"
(simulating OS upgrade) and do:
SELECT tableoid::regclass::text as part, hashtext(b) as hash, * FROM test;
EXPLAIN SELECT * FROM test WHERE b='def';
SELECT * FROM test WHERE b='def';

I got the following error and incorrect SELECT result:

WARNING:  collation "coll" has version mismatch
DETAIL:  The collation in the database was created using version 153.80, but
the operating system provides version 153.14.
HINT:  Rebuild all objects affected by this collation and run ALTER
COLLATION public.coll REFRESH VERSION, or build PostgreSQL with the right
library version.
  part  |    hash    | a |  b  
--------+------------+---+-----
 test_0 | 2087909916 | 1 | def
(1 row)

                        QUERY PLAN                          
-------------------------------------------------------------
 Seq Scan on test_1 test  (cost=0.00..25.88 rows=6 width=36)
   Filter: (b = 'def'::text)
(2 rows)

 a | b 
---+---
(0 rows)

---
If I make a dump in the "old-icu-server" and try to upload to the
"new-icu-server", then I get another failure:

psql:dump.sql:89: ERROR:  new row for relation "test_0" violates partition
constraint
DETAIL:  Failing row contains (1, def).
CONTEXT:  COPY test_0, line 1: "1       def"
COPY 0

This case discovered when trying to restore on OracleLinux 8 a dump produced
on Ubuntu 20.04 by the modified sql/collate.icu.utf8.sql

What bothers me is that partitioning depends on the hash that can be
computed differently with the OS upgrade/migration. Also I was surprised not
to find a description of the hashtext()/hashtextextended() in the docs.

Thanks!


pgsql-bugs by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: BUG #17745: RETURNING after INSERT does not return the same structure of response
Next
From: Alex Richman
Date:
Subject: Re: Logical Replica ReorderBuffer Size Accounting Issues