Re: Hash aggregate collisions cause excessive spilling - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Hash aggregate collisions cause excessive spilling
Date
Msg-id nozry5gmmyddh3qq3o6x7ofqkbcrzfijd5s266polwaa7gddhz@kybgvdlkglzl
Whole thread
In response to Re: Hash aggregate collisions cause excessive spilling  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On 2026-02-19 14:01:15 -0500, Andres Freund wrote:
> The hash functions of some types are ... peculiar (partially due to
> cross-type hashjoin support).

In case you want a reference for said peculiarity, look at this:

=# SELECT lower, upper, lower | upper as V, hashint8(lower | upper) FROM (SELECT (i) as lower, ((i)::int8 << 32) as
upperFROM generate_series(0, 5) g(i));
 
┌───────┬─────────────┬─────────────┬────────────┐
│ lower │    upper    │      v      │  hashint8  │
├───────┼─────────────┼─────────────┼────────────┤
│     0 │           0 │           0 │ -272711505 │
│     1 │  4294967296 │  4294967297 │ -272711505 │
│     2 │  8589934592 │  8589934594 │ -272711505 │
│     3 │ 12884901888 │ 12884901891 │ -272711505 │
│     4 │ 17179869184 │ 17179869188 │ -272711505 │
│     5 │ 21474836480 │ 21474836485 │ -272711505 │
└───────┴─────────────┴─────────────┴────────────┘
(6 rows)

Which obviously could have some ... fun consequences.


Whoever chose to implement the cross type compatibility by
a) using int4 as the hash "domain", instead of always hashing 8 bytes
b) implementing int8 by doubling up the high 32bits into the lower 32 bits,
   instead of hashint4 iff the value value fits into 32bits and the full 8
   bytes otherwise

did us really no service.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: assume availability of "inline" keyword
Next
From: Robert Haas
Date:
Subject: Re: Consider low startup cost in add_partial_path