Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id CAPpHfdsmmg6S8V63O3Z2j2bU9xyWd5mMaz68MfybQUY5e3iZVg@mail.gmail.com
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: POC, WIP: OR-clause support for indexes
List pgsql-hackers
On Fri, Oct 4, 2024 at 4:34 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Sep 23, 2024 at 7:11 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > Makes sense.  Please, check the attached patch freeing the consts list
> > while returning NULL from match_orclause_to_indexcol().
>
> Some review comments:
>
> I agree with the comments already given to the effect that the patch
> looks much better now. I was initially surprised to see this happening
> in match_clause_to_indexcol() but after studying it I think it looks
> like the right place. I think it makes sense to think about moving
> forward with this, although it would be nice to get Tom's take if we
> can.
>
> I see that the patch makes no update to the header comment for
> match_clause_to_indexcol() nor to the comment just above the cascade
> of if-statements. I think both need to be updated.
>
> More generally, many of the comments in this patch seem to just
> explain what the code does, and I'd like to reiterate my usual
> complaint: as far as possible, comments should explain WHY the code
> does what it does. Certainly, in some cases there's nothing to be said
> about that e.g. /* Lookup for operator to fetch necessary information
> for the SAOP node */ isn't really saying anything non-obvious but it's
> reasonable to have the comment here anyway. However, when there is
> something more interesting to be said, then we should do that rather
> than just reiterate what the reader who knows C can anyway see. For
> instance, the lengthy comment beginning with "Iterate over OR
> entries." could either be shorter and recapitulate less of the code
> that follows, or it could say something more interesting about why
> we're doing it like that.
>
> + /* We allow constant to be Const or Param */
> + if (!IsA(constExpr, Const) && !IsA(constExpr, Param))
> + break;
>
> This restriction is a lot tighter than the one mentioned in the header
> comment of match_clause_to_indexcol ("Our definition of const is
> exceedingly liberal"). If there's a reason for that, the comments
> should talk about it. If there isn't, it's better to be consistent.
>
>  + /*
> + * Check operator is present in the opfamily, expression collation
> + * matches index collation.  Also, there must be an array type in
> + * order to construct an array later.
> + */
> + if (!IndexCollMatchesExprColl(index->indexcollations[indexcol],
> inputcollid) ||
> + !op_in_opfamily(matchOpno, index->opfamily[indexcol]) ||
> + !OidIsValid(arraytype))
> + break;
>
> I spent some time wondering whether this was safe. The
> IndexCollMatchesExprColl() guarantees that either the input collation
> is equal to the index collation, or the index collation is 0. If the
> index collation is 0 then that I *think* that guarantees that the
> indexed type is non-collatable, but this could be a cross-type
> comparison, and it's possible that the other type is collatable. In
> that case, I don't think anything would prevent us from merging a
> bunch of OR clauses with different collations into a single SAOP. I
> don't really see how that could be a problem, because if the index is
> of a non-collatable type, then presumably the operator doesn't care
> about what the collation is, so it should all be fine, I guess? But
> I'm not very confident about that conclusion.
>
> I'm unclear what the current thinking is about the performance of this
> patch, both as to planning and as to execution. Do we believe that
> this transformation is a categorical win at execution-time? In theory,
> OR format alllows for short-circuit execution, but because of the
> Const-or-Param restriction above, I don't think that's mostly a
> non-issue. But maybe not completely, because I can see from the
> regression test changes that it's possible for us to apply this
> transformation when the Param is set by an InitPlan or SubPlan. If we
> have something like WHERE tenthous = 1 OR tenthous =
> (very_expensive_computation() + 1), maybe the patch could lose,
> because we'll have to do the very expensive calculation to evaluate
> the SAOP, and the OR could stop as soon as we establish that tenthous
> != 1. If we only did the transformation when the Param is an external
> parameter, then we wouldn't have this issue. Maybe this isn't worth
> worrying about; I'm not sure. Are there any other cases where the
> transformation can produce something that executes more slowly?
>
> As far as planning time is concerned, I don't think this is going to
> be too bad, because most of the work only needs to be done if there
> are OR-clauses, and my intuition is that the optimization will often
> apply in such cases, so it seems alright. But I wonder how much
> testing has been done of adversarial cases, e.g. lots of non-indexable
> clause in the query; or lots of OR clauses in the query but all of
> them turn out on inspection to be non-indexable. My expectation would
> be that there's no real problem here, but it would be good to verify
> that experimentally.

Thank you so much for the review.  I'm planning to work on all these
items next week.

------
Regards,
Alexander Korotkov
Supabase



pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: Should rolpassword be toastable?
Next
From: Nathan Bossart
Date:
Subject: Re: Should rolpassword be toastable?