Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)' - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)' |
Date | |
Msg-id | CAPpHfduuEve8ezjqNEmCFuH3+fGZ2z+gzFV_xGD1ENSgqKTAZA@mail.gmail.com Whole thread Raw |
In response to | Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)' (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
|
List | pgsql-hackers |
On Wed, Nov 15, 2023 at 8:02 AM Andres Freund <andres@anarazel.de> wrote: > On 2023-11-14 19:14:57 +0800, Richard Guo wrote: > > While working on BUG #18187 [1], I noticed that we also have issues with > > how SJE replaces join clauses involving the removed rel. As an example, > > consider the query below, which would trigger an Assert. > > > > create table t (a int primary key, b int); > > > > explain (costs off) > > select * from t t1 > > inner join t t2 on t1.a = t2.a > > left join t t3 on t1.b > 1 and t1.b < 2; > > server closed the connection unexpectedly > > > > The Assert failure happens in remove_self_join_rel() when we're trying > > to remove t1. The two join clauses of t1, 't1.b > 1' and 't1.b < 2', > > share the same pointer of 'required_relids', which is {t1, t3} at first. > > After we've performed replace_varno for the first clause, the > > required_relids becomes {t2, t3}, which is no problem. However, the > > second clause's required_relids also becomes {t2, t3}, because they are > > actually the same pointer. So when we proceed with replace_varno on the > > second clause, we'd trigger the Assert. > > Good catch. > > > > Off the top of my head I'm thinking that we can fix this kind of issue > > by bms_copying the bitmapset first before we make a substitution in > > replace_relid(), like attached. > > > > Alternatively, we can revise distribute_qual_to_rels() as below so that > > different RestrictInfos don't share the same pointer of required_relids. > > > --- a/src/backend/optimizer/plan/initsplan.c > > +++ b/src/backend/optimizer/plan/initsplan.c > > @@ -2385,7 +2385,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node > > *clause, > > * nonnullable-side rows failing the qual. > > */ > > Assert(ojscope); > > - relids = ojscope; > > + relids = bms_copy(ojscope); > > Assert(!pseudoconstant); > > } > > else > > > > With this way, I'm worrying that there are other places where we should > > avoid sharing the same pointer to Bitmapset structure. > > Indeed. > > > > I'm not sure how to discover all these places. Any thoughts? > > At the very least I think we should add a mode to bitmapset.c mode where > every modification of a bitmapset reallocates, rather than just when the size > actually changes. Because we only reallocte and free in relatively uncommon > cases, particularly on 64bit systems, it's very easy to not find spots that > continue to use the input pointer to one of the modifying bms functions. > > A very hacky implementation of that indeed catches this bug with the existing > regression tests. > > The tests do *not* pass with just the attached applied, as the "Delete relid > without substitution" path has the same issue. With that also copying and all > the "reusing" bms* functions always reallocating, the tests pass - kinda. > > > The kinda because there are callers to bms_(add|del)_members() that pass the > same bms as a and b, which only works if the reallocation happens "late". +1, Neat idea. I'm willing to work on this. Will propose the patch soon. ------ Regards, Alexander Korotkov
pgsql-hackers by date: