Indeed, good catch. I was generating the test data from an older prod data copy
and not a more recent one. In any case, performance was fine with that same
copy on 14/15 and degraded on 16.
I just re-ran it with a larger database (and also replaced the gzipped SQL file
from my initial message with the latest one).
PG14: https://explain.depesz.com/s/ysdJ
PG16, 4M: massive CPU + disk usage, so I aborted it after a few seconds
PG16, 32M: https://explain.depesz.com/s/mYiY
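In case anyone wants to reproduce the runs above: the work_mem values were set
per session, roughly like this (a sketch only; the actual query is the one in
the attached SQL file, so it's elided here):

```sql
-- Session-level settings used for the comparison runs (sketch).
SET work_mem = '32MB';        -- 4MB is the default; 32MB/128MB for the other runs
SET enable_memoize = off;     -- only for the enable_memoize=0 run
EXPLAIN (ANALYZE, BUFFERS) ...;  -- query elided, see the gzipped SQL file
```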
Cheers,
Adrian
> Hi,
> On 2026-04-02 16:06:27 +0200, Adrian Mönnich wrote:
>> thanks a lot, I just tried with work_mem set to 128MB on PG16 and it worked fine:
>> https://explain.depesz.com/s/7Zan
>>
>> Likewise on PG18:
>> https://explain.depesz.com/s/H15B
>>
>> And with enable_memoize=0 (PG18, 128MB):
>> https://explain.depesz.com/s/SaVI
> That's good.
>> So increasing work_mem seems like a good workaround for when we upgrade
>> our production DB. But I guess there's still a bug somewhere that results in the
>> wrong estimate?
> I don't even know if it's a misestimate that didn't happen in the earlier
> versions - the join order is different in 14 than in the later ones. I
> don't know why that is at this point.
> This means that we don't know if 14 would have had the same misestimation if
> the same join order had been chosen.
> There also seem to be some data differences:
> 14: https://explain.depesz.com/s/17Fp#source
> -> Parallel Seq Scan on contributions contributions_1
> (cost=0.00..164891.13 rows=2687413 width=5) (actual time=0.013..454.721 rows=2143186 loops=3)
> 16: https://explain.depesz.com/s/7Zan
> -> Parallel Seq Scan on contributions contributions_1
> (cost=0.00..37776.28 rows=1643228 width=5) (actual time=0.081..78.499 rows=1314582.00 loops=3)
> That's a pretty substantial difference in the number of rows.
> Greetings,
> Andres Freund