Fix relid-set clobber during join removal.

Commit cfcd57111 et al fell over under Valgrind testing.
(It seems to be enough to #define USE_VALGRIND; you don't actually
need to run it under Valgrind to see failures.)  The cause is that
remove_rel_from_eclass updates each EquivalenceMember's em_relids,
and those can be aliases of the left_relids or right_relids of some
RestrictInfo in ec_sources.  If the update made em_relids empty then
bms_del_member will have pfree'd the relid set, so that the subsequent
attempt to clean up ec_sources accesses already-freed memory.

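For illustration only, here is a minimal sketch of the aliasing hazard
described above.  It does not use the real Bitmapset code: ToySet,
toy_make, toy_del_member and toy_alias_hazard are hypothetical stand-ins,
and the only property borrowed from the description is that deleting the
last member frees the set and returns NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy stand-in for a Bitmapset: one heap-allocated word. */
typedef struct ToySet
{
	unsigned long bits;
} ToySet;

static int	toy_live_sets = 0;	/* allocations not yet freed */

/* Build a set; as assumed here, the empty set is represented by NULL. */
static ToySet *
toy_make(unsigned long bits)
{
	ToySet	   *s;

	if (bits == 0)
		return NULL;
	s = malloc(sizeof(ToySet));
	if (s == NULL)
		abort();
	s->bits = bits;
	toy_live_sets++;
	return s;
}

/* Delete a member in place; free the set and return NULL if it empties. */
static ToySet *
toy_del_member(ToySet *s, int member)
{
	if (s == NULL)
		return NULL;
	s->bits &= ~(1UL << member);
	if (s->bits == 0)
	{
		free(s);
		toy_live_sets--;
		return NULL;
	}
	return s;
}

/* Returns true if an alias is left pointing at freed storage. */
static bool
toy_alias_hazard(void)
{
	ToySet	   *left_relids = toy_make(1UL << 3);	/* plays left_relids */
	ToySet	   *em_relids = left_relids;	/* alias, plays em_relids */
	bool		had_alias = (left_relids != NULL);

	/* Removing the only member frees the storage both pointers share. */
	em_relids = toy_del_member(em_relids, 3);

	/*
	 * em_relids was reset to NULL, but left_relids was not, so it still
	 * holds the old address.  We deliberately never dereference it.
	 */
	return had_alias && em_relids == NULL && toy_live_sets == 0;
}
```

Calling toy_alias_hazard() reports true: the owner's pointer was reset,
the storage is gone, and the alias was never told.  Any later read
through the alias would be a use of freed memory.
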
We missed seeing ill effects before cfcd57111 because (a) if the
pfree happens then we will remove the EquivalenceMember altogether,
making the source RestrictInfo no longer of use, and (b) the
cleanup of ec_sources didn't touch left/right_relids before that.

I'm unclear though on how cfcd57111 managed to pass non-USE_VALGRIND
testing.  Apparently we managed to store another Bitmapset into the
freed space before trying to access it, but you'd not think that would
happen 100% of the time.  I think what USE_VALGRIND changes is that it
makes list.c much more memory-hungry, so that the freed space gets
claimed by some List node before a Bitmapset can be put there.

This failure can be seen in v16, v17, and master, but oddly enough not
v18.  That's because the SJE patch replaced the simple bms_del_member
calls used here with adjust_relid_set, which is careful not to
scribble on its input.  But commit 20efbdffe just recently put back
the old coding and thus resurrected the problem.

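As a rough sketch of what "not scribbling on its input" buys, the toy
code below copies the set before deleting from it, so an alias of the
input stays valid.  This is not adjust_relid_set's implementation and
not the change made by this commit; ToySet and every toy_* function are
hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy stand-in for a Bitmapset: one heap-allocated word. */
typedef struct ToySet
{
	unsigned long bits;
} ToySet;

/* Build a set; as assumed here, the empty set is represented by NULL. */
static ToySet *
toy_make(unsigned long bits)
{
	ToySet	   *s;

	if (bits == 0)
		return NULL;
	s = malloc(sizeof(ToySet));
	if (s == NULL)
		abort();
	s->bits = bits;
	return s;
}

/* Make an independent copy of a set. */
static ToySet *
toy_copy(const ToySet *s)
{
	return (s != NULL) ? toy_make(s->bits) : NULL;
}

/* Delete a member in place; free the set and return NULL if it empties. */
static ToySet *
toy_del_member(ToySet *s, int member)
{
	if (s == NULL)
		return NULL;
	s->bits &= ~(1UL << member);
	if (s->bits == 0)
	{
		free(s);
		return NULL;
	}
	return s;
}

/* Non-destructive removal: the input is left intact, a new set returned. */
static ToySet *
toy_remove_preserving_input(const ToySet *input, int member)
{
	return toy_del_member(toy_copy(input), member);
}

/* Returns true if the alias is still readable after the removal. */
static bool
toy_copy_keeps_alias_valid(void)
{
	ToySet	   *left_relids = toy_make(1UL << 3);	/* plays left_relids */
	ToySet	   *em_relids = left_relids;	/* alias, plays em_relids */
	bool		ok;

	/* Only the private copy is modified and freed. */
	em_relids = toy_remove_preserving_input(em_relids, 3);

	ok = (em_relids == NULL &&
		  left_relids != NULL &&
		  left_relids->bits == (1UL << 3));
	free(left_relids);
	return ok;
}
```

The cost is one extra allocation per update; in exchange, whoever else
holds a pointer to the original set can keep using it.
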
Discussion: https://postgr.es/m/458729.1776724816@sss.pgh.pa.us
Backpatch-through: 16, 17, master

Branch
------
REL_16_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/798dabe8388764a8a9979f5c91237f807cd09188

Modified Files
--------------
src/backend/optimizer/plan/analyzejoins.c | 2 ++
1 file changed, 2 insertions(+)