Scott Carey <scott.carey@algonomy.com> writes: > I did not mention a few values that differ between the servers that > reproduce this, like autovacuum tuning parameters and > maintenance_work_men. adding settings to the explain gives a couple more, > unlikely to be related to the problem:
Of course your test case is controlling for work_mem, but I wonder whether temp_buffers could affect this. I think that those are only used for user-defined temp tables, not the temp files a batched hashjoin creates, but maybe I'm misremembering.
modifying temp_buffers does not affect it. The 5 systems reproducing the problem have various values for the settings above, some unset.
> While writing this, I decided to test out a few more vector extension test > cases, and discovered something new and mind boggling: : > On systems that reproduces the problem, if I create a new test database, > then test the query in that database, the problem does not occur.
That is a very strong clue. Check for property differences (e.g. with psql's "\l+" and "\drds") between the new test database and the database where you see the problem.
It is a strong clue that I don't know how to leverage. \l+ has no differences other than an access privilege.
\drds is new to me, but it reports "did not find any settings" for both the database with the problem and tne new test one without.
At this point, I wonder if there is some residual strangeness since the reproducing examples are all 'old' databases that have gone through many pg_upgrades over the years. They also 'leapt' from version 12 to 17. I suppose I could try a brand new test case on a v12 system (assuming I can find a yum repo that still has that or a system that still has the packages), then upgrade to v17 and see if that reproduces the problem.
I will probably attempt adding debug symbols and linux 'perf' first.