jff: checksum algorithm is not as intended - Mailing list pgsql-hackers

From Yura Sokolov
Subject jff: checksum algorithm is not as intended
Date
Msg-id 7d018a5e73186a08f891e46fa25bdf18@postgrespro.ru
Whole thread Raw
Responses Re: jff: checksum algorithm is not as intended  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Good day.

Current checksum is not calculated in intended way and
has the flaw.

Single round function is written as:

#define CHECKSUM_COMP(checksum, value) do {\
     uint32 __tmp = (checksum) ^ (value);\
     (checksum) = __tmp * FNV_PRIME ^ (__tmp >> 17);\
} while (0)

And looks like it was intended to be
     (checksum) = (__tmp * FNV_PRIME) ^ (__tmp >> 17);

At least original Florian Pflug suggestion were correctly written
in this way (but with shift 1):
https://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396%40phlo.org

But due to C operation precedence it is actually calculated as:
     (checksum) = __tmp * (FNV_PRIME ^ (__tmp >> 17));

It has more longer collision chains and worse: it has 256 pathological
result slots of shape 0xXX000000 each has 517 collisions in average.
Totally 132352 __tmp values are collided into this 256 slots.

That is happens due to if top 16 bits are happens to be
0x0326 or 0x0327, then `FNV_PRIME ^ (__tmp >> 17) == 0x1000000`,
and then `__tmp * 0x1000000` keeps only lower 8 bits. That means,
9 bits masked by 0x0001ff00 are completely lost.

mix(0x03260001) == mix(0x03260101) = mix(0x0327aa01) == mix(0x0327ff01)
(where mix is a `__tmp` to `checksum` transformation)

regards,
Yura Sokolov
y.sokolov@postgrespro.ru
funny.falcon@gmail.com

PS. Test program in Crystal language is attached and output for current
CHECKSUM_COMP implementation and "correct" (intended).
Excuse me for Crystal, it is prettier to write for small compiled 
scripts.
Attachment

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Schema variables - new implementation for Postgres 15
Next
From: Justin Pryzby
Date:
Subject: Re: Write visibility map during CLUSTER/VACUUM FULL