Re: Thinking about IN/EXISTS optimization - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Thinking about IN/EXISTS optimization |
Date | |
Msg-id | 200210230238.g9N2cVe07179@candle.pha.pa.us Whole thread Raw |
In response to | Thinking about IN/EXISTS optimization (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
This sounds like one of those classic optimizer problems we have had to deal with in the past. I suggest you go through the optimizer pass and set a boolean in Query whenever you do something that may require another loop through, then at the end, you check the boolean and loop if required. I think the rules system has to do something similar. I don't see any way around that, but because you are setting the boolean you only loop when you need to. --------------------------------------------------------------------------- Tom Lane wrote: > I've been thinking about how to convert "x IN (subselect)" and EXISTS > constructs into join-like processing, and I've run into a small problem > in getting the planner to do it nicely. The issue is that I need to > take the subselect and push it into the jointree --- essentially, make > it look like a subselect-in-FROM --- so that the join planner can deal > with it. Basically, I need to rearrange > > SELECT ... FROM ... WHERE ... AND x IN (SELECT y FROM ...) > > into > > SELECT ... FROM ..., (SELECT y FROM ...) ss > WHERE ... AND x =* ss.y > > where =* represents some specially-marked RestrictInfo node. (NOT IN is the > same except that the RestrictInfo node will be marked differently.) > > The difficulty is that there's no good place to do this in > subquery_planner(). We should push the subselect into FROM before we > run the pull_up_subqueries() and preprocess_jointree() operations; > if we don't pull up the subselect into the main query then we won't have > accomplished very much. But the WHERE clause isn't simplified into a > form that makes it easy to spot top-level IN() expressions until after > that. We can't simply switch the order of the subselect and > WHERE-clause processing, because pulling up subqueries typically adds > conditions to the WHERE clause. > > I haven't been able to think of a solution to this that doesn't involve > wasting a lot of cycles by repeating some of these processing steps, > or missing some optimization possibilities. (For example, if we pull up > a subquery that came from a view, it might contain an IN where-clause, > which ideally we'd want to be able to optimize. It almost seems like > we need to be able to loop around the whole operation; but most of the > time this will just waste cycles.) > > Anyone see a nice way to do this? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
pgsql-hackers by date: