Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | CA+TgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw@mail.gmail.com Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Andrei Lepikhov <a.lepikhov@postgrespro.ru>) |
Responses |
Re: POC, WIP: OR-clause support for indexes
Re: POC, WIP: OR-clause support for indexes |
List | pgsql-hackers |
On Mon, Nov 27, 2023 at 3:02 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 25/11/2023 08:23, Alexander Korotkov wrote: > > I think patch certainly gets better in this aspect. One thing I can't > > understand is why do we use home-grown code for resolving > > hash-collisions. You can just define custom hash and match functions > > in HASHCTL. Even if we need to avoid repeated JumbleExpr calls, we > > still can save pre-calculated hash value into hash entry and use > > custom hash and match. This doesn't imply us to write our own > > collision-resolving code. > > Thanks, it was an insightful suggestion. > I implemented it, and the code has become shorter (see attachment). Neither the code comments nor the commit message really explain the design idea here. That's unfortunate, principally because it makes review difficult. I'm very skeptical about the idea of using JumbleExpr for any part of this. It seems fairly expensive, and it might produce false matches. If expensive is OK, then why not just use equal()? If it's not, then this probably isn't really OK either. But in any case there should be comments explaining why this strategy was chosen. The use of op_mergejoinable() seems pretty random to me. Why should we care about that? If somebody writes a<1 or a<2 or a<3 or a<4, you can transform that to a<any(array[1,2,3,4]) if you want. It might not be a good idea, but I think it's a legal transformation. The reader shouldn't be left to guess whether a rule like this was made for reasons of correctness or for reasons of efficiency or something else. Looking further, I see that the reason for this is likely that the operator for the transformation result is constructing using list_make1(makeString((char *) "=")), but trying to choose an operator based on the operator name is, I think, pretty clearly unacceptable. Tom has fired more than one hacker out of an airlock for such transgressions, and this violation seems less principled than some. The = operator that got chosen could be entirely unrelated to any operator in the original, untransformed query. It could be part of no operator class that was involved in the original query, in a different schema than the operator in the original query, and owned by a different user than the one who owned any operator referenced by the original query. I suspect that's not only incorrect but an exploitable security vulnerability. I am extremely dubious about the use of select_common_type() here. Why not do this only when the types already match exactly? Maybe the concern is unknown literals, but perhaps that should be handled in some other way. If you do this kind of thing, you need to justify why it can't fail or produce wrong answers. Honestly, it seems very hard to avoid the conclusion that this transformation is being done at too early a stage. Parse analysis is not the time to try to do query optimization. I can't really believe that there's a way to produce a committable patch along these lines. Ideally, a transformation like this should be done after we know what plan shape we're using (or considering using), so that we can make cost-based decisions about whether to transform or not. But at the very least it should happen somewhere in the planner. There's really no justification for parse analysis rewriting the SQL that the user entered. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: