Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Alena Rybakina |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | fa5f57e1-4051-93d8-5f54-f1e2f4587cc2@yandex.ru Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: POC, WIP: OR-clause support for indexes
|
List | pgsql-hackers |
Hi! Thank you for your research, I'm sure it will help me to fix the problem of calculating selectivity faster) I'm sorry I didn't answer right away, to be honest, I had a full diary of urgent matters at work. For this reason, I didn't have enough time to work on this patch properly. > The optimizer will itself do a limited form of "normalizing to CNF". > Are you familiar with extract_restriction_or_clauses(), from > orclauses.c? Comments above the function have an example of how this > can work: > > * Although a join clause must reference multiple relations overall, > * an OR of ANDs clause might contain sub-clauses that reference just one > * relation and can be used to build a restriction clause for that rel. > * For example consider > * WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45)); > * We can transform this into > * WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45)) > * AND (a.x = 42 OR a.x = 44) > * AND (b.y = 43 OR b.z = 45); > * which allows the latter clauses to be applied during the scans of a and b, > * perhaps as index qualifications, and in any case reducing the number of > * rows arriving at the join. In essence this is a partial transformation to > * CNF (AND of ORs format). It is not complete, however, because we do not > * unravel the original OR --- doing so would usually bloat the qualification > * expression to little gain. This is an interesting feature. I didn't notice this function before, I studied many times consider_new_or_cause, which were called there. As far as I know, there is a selectivity calculation going on there, but as far as I remember, I called it earlier after my conversion, and unfortunately it didn't solve my problem with calculating selectivity. I'll reconsider it again, maybe I can find something I missed. > Of course this immediately makes me wonder: shouldn't your patch be > able to perform an additional transformation here? You know, by > transforming "a.x = 42 OR a.x = 44" into "a IN (42, 44)"? Although I > haven't checked for myself, I assume that this doesn't happen right > now, since your patch currently performs all of its transformations > during parsing. > > I also noticed that the same comment block goes on to say something > about "clauselist_selectivity's inability to recognize redundant > conditions". Perhaps that is relevant to the problems you were having > with selectivity estimation, back when the code was in > preprocess_qual_conditions() instead? I have no reason to believe that > there should be any redundancy left behind by your transformation, so > this is just one possibility to consider. > Separately, the commit message of commit 25a9e54d2d says something > about how the planner builds RestrictInfos, which seems > possibly-relevant. That commit enhanced extended statistics for OR > clauses, so the relevant paragraph describes a limitation of extended > statistics with OR clauses specifically. I'm just guessing, but it > still seems like it might be relevant to the problem you ran into with > selectivity estimation. Another possibility to consider. I understood what is said about AND clauses in this comment. It seems to me that AND clauses saved like (BoolExpr *) expr->args->(RestrictInfo *) clauseA->(RestrictInfo *)clauseB lists and OR clauses saved like (BoolExpr *) expr -> orclause->(RestrictInfo *)clause A->(RestrictInfo *)clause B. As I understand it, selectivity is calculated for each expression. But I'll exploring it deeper, because I think this place may contain the answer to the question, what's wrong with selectivity calculation in my patch. > BTW, I sometimes use RR to help improve my understanding of the planner: > > https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Recording_Postgres_using_rr_Record_and_Replay_Framework > The planner has particularly complicated control flow, which has > unique challenges -- just knowing where to begin can be difficult > (unlike most other areas). I find that setting watchpoints to see when > and where the planner modifies state using RR is far more useful than > it would be with regular GDB. Once I record a query, I find that I can > "map out" what happens in the planner relatively easily. Thank you for sharing this source! I didn't know about this before, and it will definitely make my life easier to understand the optimizer. I understand what you mean, and I researched the optimizer in a similar way through gdb and looked at the comments and code in postgresql. This is a complicated way and I didn't always understand correctly what this variable was doing in this place, and this created some difficulties for me. So, thank you for the link! > Many interesting cases won't get SAOP transformation from the patch, > simply because of the or_transform_limit GUC's default of 500. I don't > think that that design makes too much sense. It made more sense back > when the focus was on expression evaluation overhead. But that's only > one of the benefits that we now expect from the patch, right? So it > seems like something that should be revisited soon. > > I'm not suggesting that there is no need for some kind of limit. But > it seems like a set of heuristics might be a better approach. Although > I would like to get a better sense of the costs of the transformation > to be able to say too much more. Yes, this may be revised in the future after some transformations. Initially, I was solving the problem described here [0]. So, after testing [1], I come to the conclusion that 500 is the ideal value for or_transform_limit. [0] https://www.postgresql.org/message-id/919bfbcb-f812-758d-d687-71f89f0d9a68%40postgrespro.ru [1] https://www.postgresql.org/message-id/6b97b517-f36a-f0c6-3b3a-0cf8cfba220c%40yandex.ru -- Regards, Alena Rybakina Postgres Professional
pgsql-hackers by date: