Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)' - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
Date
Msg-id CAPpHfduuEve8ezjqNEmCFuH3+fGZ2z+gzFV_xGD1ENSgqKTAZA@mail.gmail.com
Whole thread Raw
In response to Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'  (Andres Freund <andres@anarazel.de>)
Responses Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
List pgsql-hackers
On Wed, Nov 15, 2023 at 8:02 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-11-14 19:14:57 +0800, Richard Guo wrote:
> > While working on BUG #18187 [1], I noticed that we also have issues with
> > how SJE replaces join clauses involving the removed rel.  As an example,
> > consider the query below, which would trigger an Assert.
> >
> > create table t (a int primary key, b int);
> >
> > explain (costs off)
> > select * from t t1
> >    inner join t t2 on t1.a = t2.a
> >     left join t t3 on t1.b > 1 and t1.b < 2;
> > server closed the connection unexpectedly
> >
> > The Assert failure happens in remove_self_join_rel() when we're trying
> > to remove t1.  The two join clauses of t1, 't1.b > 1' and 't1.b < 2',
> > share the same pointer of 'required_relids', which is {t1, t3} at first.
> > After we've performed replace_varno for the first clause, the
> > required_relids becomes {t2, t3}, which is no problem.  However, the
> > second clause's required_relids also becomes {t2, t3}, because they are
> > actually the same pointer.  So when we proceed with replace_varno on the
> > second clause, we'd trigger the Assert.
>
> Good catch.
>
>
> > Off the top of my head I'm thinking that we can fix this kind of issue
> > by bms_copying the bitmapset first before we make a substitution in
> > replace_relid(), like attached.
> >
> > Alternatively, we can revise distribute_qual_to_rels() as below so that
> > different RestrictInfos don't share the same pointer of required_relids.
>
> > --- a/src/backend/optimizer/plan/initsplan.c
> > +++ b/src/backend/optimizer/plan/initsplan.c
> > @@ -2385,7 +2385,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node
> > *clause,
> >          * nonnullable-side rows failing the qual.
> >          */
> >         Assert(ojscope);
> > -       relids = ojscope;
> > +       relids = bms_copy(ojscope);
> >         Assert(!pseudoconstant);
> >     }
> >     else
> >
> > With this way, I'm worrying that there are other places where we should
> > avoid sharing the same pointer to Bitmapset structure.
>
> Indeed.
>
>
> > I'm not sure how to discover all these places.  Any thoughts?
>
> At the very least I think we should add a mode to bitmapset.c mode where
> every modification of a bitmapset reallocates, rather than just when the size
> actually changes. Because we only reallocte and free in relatively uncommon
> cases, particularly on 64bit systems, it's very easy to not find spots that
> continue to use the input pointer to one of the modifying bms functions.
>
> A very hacky implementation of that indeed catches this bug with the existing
> regression tests.
>
> The tests do *not* pass with just the attached applied, as the "Delete relid
> without substitution" path has the same issue. With that also copying and all
> the "reusing" bms* functions always reallocating, the tests pass - kinda.
>
>
> The kinda because there are callers to bms_(add|del)_members() that pass the
> same bms as a and b, which only works if the reallocation happens "late".

+1,
Neat idea. I'm willing to work on this. Will propose the patch soon.

------
Regards,
Alexander Korotkov



pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
Next
From: Tom Lane
Date:
Subject: Re: Some performance degradation in REL_16 vs REL_15