Thread: BUG #18125: Ordered set aggregate results (MD5 hashing) vary between postgres versions
BUG #18125: Ordered set aggregate results (MD5 hashing) vary between postgres versions
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18125 Logged by: Richard Wollhofen Email address: wollhofen@unisoftwareplus.com PostgreSQL version: 16.0 Operating system: Debian 16.0-1, Red Hat 8.5.0-10, Red Hat 8.5.0-18 Description: I tried to implement a validation script, to compare table data before and after an upgrade from Postgres 13.5 to Postgres 15.4. Following the idea described by Hans-Jürgen Schönig here https://www.cybertec-postgresql.com/en/postgresql-creating-checksums-for-tables/ I tried to generate an MD5 hash for every device_id in my raw data table. However, the hashes after the upgrade showed differences (compared to the hashes created before the upgrade) for around half the devices, while the raw data was identical. Please find below a minimum example to show the behavior. This script has to be executed with different Postgres versions, the grouped hashes will show different results for different major versions. Interestingly, the hashes are identical for versions 11,12,13,14. Tested major versions on dbfiddle: 10 - 16. However, this can only be observed, when the row number of the table is >= 79022, below this number the hashes are equal across versions. It is also independent of the number of groups in the last query. Link to test script https://dbfiddle.uk/AxZPWLur -- different hashes for Postgres 11-14 vs 15 vs 16 select version(); -- md5 aggregate sfunc CREATE FUNCTION md5_agg_sfunc(text, anyelement) RETURNS text LANGUAGE sql AS $$ SELECT md5($1 || $2::text) $$; -- md5 aggregate function CREATE AGGREGATE md5_agg (ORDER BY anyelement) ( STYPE = text, SFUNC = md5_agg_sfunc, INITCOND = '' ); -- test data. hashes change for a row number >= 79022 CREATE TABLE t_test (a int, b int); INSERT INTO t_test SELECT x, x + 10 FROM generate_series(1, 79022) AS x; -- whole table hash H1 SELECT md5_agg() WITHIN GROUP (ORDER BY t_test ASC) FROM t_test; -- ordered set grouped hashes differ across postgres versions -- grouped hash H2, all in one group: H2 identical with H1 for PG version >= 15 SELECT a % 1 AS grp, md5_agg() WITHIN GROUP (ORDER BY t_test) FROM t_test GROUP BY a % 1; -- hashes change for different postgres versions SELECT a % 3 AS grp, md5_agg() WITHIN GROUP (ORDER BY t_test) FROM t_test GROUP BY a % 3;