Re: planner missing a trick for foreign tables w/OR conditions - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: planner missing a trick for foreign tables w/OR conditions |
Date | |
Msg-id | CA+Tgmoa_bhRSVxAnu-rv1FK_0BtDB-GwDXwU6epzArFVpHABaA@mail.gmail.com Whole thread Raw |
In response to | Re: planner missing a trick for foreign tables w/OR conditions (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: planner missing a trick for foreign tables w/OR conditions
Re: planner missing a trick for foreign tables w/OR conditions |
List | pgsql-hackers |
On Tue, Dec 17, 2013 at 12:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I had been thinking it would fall down if there are several OR conditions > affecting different collections of rels, but after going through the math > again, I'm now thinking I was wrong and it does in fact work out. As you > say, we do depend on all paths generating the same rows, but since the > extracted single-rel quals are inserted as plain baserestrictinfo quals, > that'll be true. OK. > A bigger potential objection is that we're opening ourselves to larger > problems with estimation failures due to correlated qual conditions, but > again I'm finding that the math doesn't bear that out. It's reasonable > to assume that our estimate for the extracted qual will be better than > our estimate for the OR as a whole, so our adjusted size estimates for > the filtered base relations are probably OK. And the adjustment to the > OR clause selectivity means that the size estimate for the join comes > out exactly the same. We'll actually be better off than with what is > likely to happen now, which is that people manually extract the simplified > condition and insert it into the query explicitly. PG doesn't realize > that that's redundant and so will underestimate the join size. I had not thought of that, but it seems like a valid point. > So at this point I'm pretty much talked into it. We could eliminate the > dependence on indexes entirely, and replace this code with a step that > simply tries to pull single-base-relation quals out of ORs wherever it can > find one. You could argue that the produced quals would sometimes not be > worth testing for, but we could apply a heuristic that says to forget it > unless the estimated selectivity of the extracted qual is less than, > I dunno, 0.5 maybe. This is an area where I think things are very different from local and remote tables. For a local table, the cost of transmitting a row from one plan node to another is basically zero. For a remote table, it's potentially far higher, although unfortunately it's hard to know exactly. But if the qual is cheap to evaluate, and we're getting back a lot of rows, I suspect even eliminating 5-10% of them could work out to a win. With a local table, 50% sounds like a reasonable number. Another point to ponder is that there could be places where this actually changes the plan significantly for the better. Pre-filtering one side of a join might, for example, reduce the amount of data on one side to the point where a hash join is chosen over some other strategy. I don't know that this will actually help all that many people but the best case is pretty dramatic for those it does help: the partial qual might be almost as selective as the join condition itself. > (I wonder if it'd be worth inserting a check that > there's not already a manually-generated equivalent clause, too ...) Sounds a little too clever IMHO. > A very nice thing about this is we could do this step ahead of relation > size estimate setting and thus remove the redundant work that currently > happens in set_plain_rel_size when the optimization fires. Which is > another aspect of the current code that's a hack, so getting rid of it > would be a net reduction in hackiness. I'm not sure that would save anything measurable performance-wise, but the hackiness reduction would be nice to have. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: