Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | CAPpHfdsmmg6S8V63O3Z2j2bU9xyWd5mMaz68MfybQUY5e3iZVg@mail.gmail.com Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: POC, WIP: OR-clause support for indexes
|
List | pgsql-hackers |
On Fri, Oct 4, 2024 at 4:34 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Sep 23, 2024 at 7:11 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > Makes sense. Please, check the attached patch freeing the consts list > > while returning NULL from match_orclause_to_indexcol(). > > Some review comments: > > I agree with the comments already given to the effect that the patch > looks much better now. I was initially surprised to see this happening > in match_clause_to_indexcol() but after studying it I think it looks > like the right place. I think it makes sense to think about moving > forward with this, although it would be nice to get Tom's take if we > can. > > I see that the patch makes no update to the header comment for > match_clause_to_indexcol() nor to the comment just above the cascade > of if-statements. I think both need to be updated. > > More generally, many of the comments in this patch seem to just > explain what the code does, and I'd like to reiterate my usual > complaint: as far as possible, comments should explain WHY the code > does what it does. Certainly, in some cases there's nothing to be said > about that e.g. /* Lookup for operator to fetch necessary information > for the SAOP node */ isn't really saying anything non-obvious but it's > reasonable to have the comment here anyway. However, when there is > something more interesting to be said, then we should do that rather > than just reiterate what the reader who knows C can anyway see. For > instance, the lengthy comment beginning with "Iterate over OR > entries." could either be shorter and recapitulate less of the code > that follows, or it could say something more interesting about why > we're doing it like that. > > + /* We allow constant to be Const or Param */ > + if (!IsA(constExpr, Const) && !IsA(constExpr, Param)) > + break; > > This restriction is a lot tighter than the one mentioned in the header > comment of match_clause_to_indexcol ("Our definition of const is > exceedingly liberal"). If there's a reason for that, the comments > should talk about it. If there isn't, it's better to be consistent. > > + /* > + * Check operator is present in the opfamily, expression collation > + * matches index collation. Also, there must be an array type in > + * order to construct an array later. > + */ > + if (!IndexCollMatchesExprColl(index->indexcollations[indexcol], > inputcollid) || > + !op_in_opfamily(matchOpno, index->opfamily[indexcol]) || > + !OidIsValid(arraytype)) > + break; > > I spent some time wondering whether this was safe. The > IndexCollMatchesExprColl() guarantees that either the input collation > is equal to the index collation, or the index collation is 0. If the > index collation is 0 then that I *think* that guarantees that the > indexed type is non-collatable, but this could be a cross-type > comparison, and it's possible that the other type is collatable. In > that case, I don't think anything would prevent us from merging a > bunch of OR clauses with different collations into a single SAOP. I > don't really see how that could be a problem, because if the index is > of a non-collatable type, then presumably the operator doesn't care > about what the collation is, so it should all be fine, I guess? But > I'm not very confident about that conclusion. > > I'm unclear what the current thinking is about the performance of this > patch, both as to planning and as to execution. Do we believe that > this transformation is a categorical win at execution-time? In theory, > OR format alllows for short-circuit execution, but because of the > Const-or-Param restriction above, I don't think that's mostly a > non-issue. But maybe not completely, because I can see from the > regression test changes that it's possible for us to apply this > transformation when the Param is set by an InitPlan or SubPlan. If we > have something like WHERE tenthous = 1 OR tenthous = > (very_expensive_computation() + 1), maybe the patch could lose, > because we'll have to do the very expensive calculation to evaluate > the SAOP, and the OR could stop as soon as we establish that tenthous > != 1. If we only did the transformation when the Param is an external > parameter, then we wouldn't have this issue. Maybe this isn't worth > worrying about; I'm not sure. Are there any other cases where the > transformation can produce something that executes more slowly? > > As far as planning time is concerned, I don't think this is going to > be too bad, because most of the work only needs to be done if there > are OR-clauses, and my intuition is that the optimization will often > apply in such cases, so it seems alright. But I wonder how much > testing has been done of adversarial cases, e.g. lots of non-indexable > clause in the query; or lots of OR clauses in the query but all of > them turn out on inspection to be non-indexable. My expectation would > be that there's no real problem here, but it would be good to verify > that experimentally. Thank you so much for the review. I'm planning to work on all these items next week. ------ Regards, Alexander Korotkov Supabase
pgsql-hackers by date: