Thread: WIP Join Removal
As discussed on 26 June, "Join Removal/Vertical Partitioning", here's a patch to remove joins in certain circumstances. Tested and blind reviewed, but this is complex and subtle enough I welcome and expect your comments on corner cases and missed complications. (Lord knows, I've been down a few blind alleys writing it to date...) Patch works, but there's a bit I haven't finished yet - checking unique indexes. So patch is marked WIP for now. Depending upon how long commitfest lasts I may have a fully working version. (Yes, I know its only 1 hours work, but haven't worked out how yet...) -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Attachment
Simon Riggs wrote: > Patch works, but there's a bit I haven't finished yet - checking unique > indexes. Did plan invalidation make it safe to rely on the presence of a unique index for planning decisions? Couldn't we also do join removal for inner joins, when there's a foreign key reference that enforces that there's one and only one matching tuple in the removed table: SELECT child.data FROM child, parent WHERE child.fkey = parent.pkey ? > + /* > + * We can now remove join by pulling up child plan from the keeprel. > + * This needs to be done considering costs, since its possible for > + * a nested inner indexscan plan to be cheaper. So it isn't > + * always desirable to remove the join. Can you elaborate that a bit? I can't imagine a case where we wouldn't want to remove a join, when we know we can. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, 2008-09-01 at 22:23 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > Patch works, but there's a bit I haven't finished yet - checking unique > > indexes. > > Did plan invalidation make it safe to rely on the presence of a unique > index for planning decisions? My understanding was "Yes" and this case was the specific reason I originally wanted to pursue plan invalidation back in 2006. > Couldn't we also do join removal for inner joins, when there's a foreign > key reference that enforces that there's one and only one matching tuple > in the removed table: > > SELECT child.data FROM child, parent WHERE child.fkey = parent.pkey Hmm, I had thought this was the same case, but the inner join possibility wasn't something I'd seen. Guess that flaw shows this is all original thought - I'll go back and read that optimizer blog again... I agree it will work. We would need to replace the join condition with an alteration of the original quals on child so that we add "AND child.fkey is not null". Which would mean we would need to re-plan the access to that base relation so we picked up the new qual and potentially used an index for it as well. That would be possible only if the join condition exactly matches the FK constraint. Hmm, will think about that, its certainly not an easy addition, for me. I'll concentrate on getting this patch finished and committed first. > > + /* > > + * We can now remove join by pulling up child plan from the keeprel. > > + * This needs to be done considering costs, since its possible for > > + * a nested inner indexscan plan to be cheaper. So it isn't > > + * always desirable to remove the join. > > Can you elaborate that a bit? I can't imagine a case where we wouldn't > want to remove a join, when we know we can. Neither could I when I first looked at this. It turns out that a join like this select a.col2 from a left outer join b on a.col1 = b.col1 where b.col2 = 1; can be cheaper if we don't remove the join, when there is an index on a.col1 and b.col2, because the presence of b allows the values returned from b to be used for an index scan on a. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > It turns out that a join like this > > select a.col2 > from a left outer join b on a.col1 = b.col1 > where b.col2 = 1; > > can be cheaper if we don't remove the join, when there is an index on > a.col1 and b.col2, because the presence of b allows the values returned > from b to be used for an index scan on a. Umm, you *can't* remove that join. Because of the condition "b.col2 = 1", which implies that "b.col1 IS NOT NULL", that's actually equal to: select a.col2 from a inner join b on a.col1 = b.col1 where b.col2 = 1; -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2008-09-02 at 13:20 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > It turns out that a join like this > > > > select a.col2 > > from a left outer join b on a.col1 = b.col1 > > where b.col2 = 1; > > > > can be cheaper if we don't remove the join, when there is an index on > > a.col1 and b.col2, because the presence of b allows the values returned > > from b to be used for an index scan on a. > > Umm, you *can't* remove that join. Yes, you can. The presence or absence of rows in b is not important to the result of the query because of the "left outer join". I spent nearly a whole day going down that deadend also. > Because of the condition "b.col2 = > 1", which implies that "b.col1 IS NOT NULL", No it doesn't, but as above, it is irrelevant anyway. > that's actually equal to: > select a.col2 > from a inner join b on a.col1 = b.col1 > where b.col2 = 1; -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Tue, 2008-09-02 at 13:20 +0300, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> It turns out that a join like this >>> >>> select a.col2 >>> from a left outer join b on a.col1 = b.col1 >>> where b.col2 = 1; >>> >>> can be cheaper if we don't remove the join, when there is an index on >>> a.col1 and b.col2, because the presence of b allows the values returned >>> from b to be used for an index scan on a. >> Umm, you *can't* remove that join. > > Yes, you can. The presence or absence of rows in b is not important to > the result of the query because of the "left outer join". > > I spent nearly a whole day going down that deadend also. Oh. How does the query look like after removing the join, then? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2008-09-02 at 13:41 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Tue, 2008-09-02 at 13:20 +0300, Heikki Linnakangas wrote: > >> Simon Riggs wrote: > >>> It turns out that a join like this > >>> > >>> select a.col2 > >>> from a left outer join b on a.col1 = b.col1 > >>> where b.col2 = 1; > >>> > >>> can be cheaper if we don't remove the join, when there is an index on > >>> a.col1 and b.col2, because the presence of b allows the values returned > >>> from b to be used for an index scan on a. > >> Umm, you *can't* remove that join. > > > > Yes, you can. The presence or absence of rows in b is not important to > > the result of the query because of the "left outer join". > > > > I spent nearly a whole day going down that deadend also. > > Oh. How does the query look like after removing the join, then? Same answer, just slower. Removing the join makes the access to a into a SeqScan, whereas it was a two-table index plan when both tables present. The two table plan is added by the immediately preceding call add_... - i.e. that plan is only added during join time not during planning of base relations. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Tue, 2008-09-02 at 13:41 +0300, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> On Tue, 2008-09-02 at 13:20 +0300, Heikki Linnakangas wrote: >>>> Simon Riggs wrote: >>>>> It turns out that a join like this >>>>> >>>>> select a.col2 >>>>> from a left outer join b on a.col1 = b.col1 >>>>> where b.col2 = 1; >>>>> >>>>> can be cheaper if we don't remove the join, when there is an index on >>>>> a.col1 and b.col2, because the presence of b allows the values returned >>>>> from b to be used for an index scan on a. >>>> Umm, you *can't* remove that join. >>> Yes, you can. The presence or absence of rows in b is not important to >>> the result of the query because of the "left outer join". >>> >>> I spent nearly a whole day going down that deadend also. >> Oh. How does the query look like after removing the join, then? > > Same answer, just slower. Removing the join makes the access to a into a > SeqScan, whereas it was a two-table index plan when both tables present. > The two table plan is added by the immediately preceding call add_... - > i.e. that plan is only added during join time not during planning of > base relations. I mean, can you how me an SQL query of what's left after removing the join? Certainly just removing the join and the WHERE clause doesn't give the same answer. Or is it something that can't be expressed with SQL? What's the filter in the SeqScan? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2008-09-02 at 14:03 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Tue, 2008-09-02 at 13:41 +0300, Heikki Linnakangas wrote: > >> Simon Riggs wrote: > >>> On Tue, 2008-09-02 at 13:20 +0300, Heikki Linnakangas wrote: > >>>> Simon Riggs wrote: > >>>>> It turns out that a join like this > >>>>> > >>>>> select a.col2 > >>>>> from a left outer join b on a.col1 = b.col1 > >>>>> where b.col2 = 1; > >>>>> > >>>>> can be cheaper if we don't remove the join, when there is an index on > >>>>> a.col1 and b.col2, because the presence of b allows the values returned > >>>>> from b to be used for an index scan on a. > >>>> Umm, you *can't* remove that join. > >>> Yes, you can. The presence or absence of rows in b is not important to > >>> the result of the query because of the "left outer join". > >>> > >>> I spent nearly a whole day going down that deadend also. > >> Oh. How does the query look like after removing the join, then? > > > > Same answer, just slower. Removing the join makes the access to a into a > > SeqScan, whereas it was a two-table index plan when both tables present. > > The two table plan is added by the immediately preceding call add_... - > > i.e. that plan is only added during join time not during planning of > > base relations. > > I mean, can you how me an SQL query of what's left after removing the > join? Certainly just removing the join and the WHERE clause doesn't give > the same answer. Yes, it does select a.col2 from a left outer join b on a.col1 = b.col1 where b.col2 = 1; is logically equivalent to select a.col2 from a; and hence removing the join produces a SeqScan plan, whereas the equivalent join can in some circumstances be faster. I discovered this, I didn't think of it in advance. > Or is it something that can't be expressed with SQL? > What's the filter in the SeqScan? There is no filter in the SeqScan. Try some queries and you'll see what I mean. I've said its a dead end and that I spent hours thinking that, so please think about this... -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes: > Same answer, just slower. Removing the join makes the access to a into a > SeqScan, whereas it was a two-table index plan when both tables present. > The two table plan is added by the immediately preceding call add_... - > i.e. that plan is only added during join time not during planning of > base relations. Perhaps it would clearer to discuss a non-outer join here: select invoices.* from customer join invoices using (company_id,customer_id) where customer_id = ? where there's a foreign key relation guaranteeing that every invoice has a matching <company_id, customer_id>. If there's an index on customer(customer_id) but not on invoices(customer_id) then conceivably it would be faster to use that than scan all of the invoices. I wonder if it would be more worthwhile to remove them and have a subsequent phase where we look for possible joins to *add*. So even if the user writes "select * from invoices where customer_id=?" the planner might be able to discover that it can find those records quicker by scanning customer, finding the matching <company_id,customer_id> and then using an index to look them up in invoices. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Simon Riggs wrote: > select a.col2 > from a left outer join b on a.col1 = b.col1 > where b.col2 = 1; > > is logically equivalent to > > select a.col2 > from a; No, it's not: postgres=# CREATE TABLE a (col1 int4, col2 int4); CREATE TABLE postgres=# CREATE TABLE b (col1 int4, col2 int4); CREATE TABLE postgres=# INSERT INTO a VALUES (1,1); INSERT 0 1 postgres=# select a.col2 from a; col2 ------ 1 (1 row) postgres=# select a.col2 from a left outer join b on a.col1 = b.col1 where b.col2 = 1; col2 ------ (0 rows) But anyway, Greg's example looks valid, and proves the point that removing a join isn't always a win. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2008-09-02 at 10:41 +0100, Simon Riggs wrote: > On Mon, 2008-09-01 at 22:23 +0300, Heikki Linnakangas wrote: > > Couldn't we also do join removal for inner joins, when there's a foreign > > key reference that enforces that there's one and only one matching tuple > > in the removed table: > > > > SELECT child.data FROM child, parent WHERE child.fkey = parent.pkey > > Hmm, I had thought this was the same case, but the inner join > possibility wasn't something I'd seen. Guess that flaw shows this is all > original thought - I'll go back and read that optimizer blog again... > > I agree it will work. > > We would need to replace the join condition with an alteration of the > original quals on child so that we add "AND child.fkey is not null". > Which would mean we would need to re-plan the access to that base > relation so we picked up the new qual and potentially used an index for > it as well. That would be possible only if the join condition exactly > matches the FK constraint. Also, note that when we do an inner join we must not have any qualification of the rows on the checkrel. Any qualification that removes rows will alter the answer. This is a direct contrast to the left outer join case where the presence or absence of a single table qualification on the checkrel has *no effect* on the results of the query (as long as the qual is immutable). -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Tue, 2008-09-02 at 12:05 +0100, Gregory Stark wrote: > I wonder if it would be more worthwhile to remove them and have a > subsequent phase where we look for possible joins to *add*. So even if > the user writes > "select * from invoices where customer_id=?" the planner might be able > to discover that it can find those records quicker by scanning > customer, finding the matching <company_id,customer_id> and then using > an index to look them up in invoices. That's a good idea. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Tue, 2008-09-02 at 14:20 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > select a.col2 > > from a left outer join b on a.col1 = b.col1 > > where b.col2 = 1; > > > > is logically equivalent to > > > > select a.col2 > > from a; > > No, it's not: > > postgres=# CREATE TABLE a (col1 int4, col2 int4); > CREATE TABLE > postgres=# CREATE TABLE b (col1 int4, col2 int4); > CREATE TABLE > postgres=# INSERT INTO a VALUES (1,1); > INSERT 0 1 > postgres=# select a.col2 from a; > col2 > ------ > 1 > (1 row) > > postgres=# select a.col2 from a left outer join b on a.col1 = b.col1 > where b.col2 = 1; > col2 > ------ > (0 rows) You raise an interesting and important point that shows an error of mine. Notice that select a.col2 from a left outer join b on a.col1 = b.col1 *and* b.col2 = 1; can be re-written as select a.col2 from a; whereas select a.col2 from a left outer join b on a.col1 = b.col1 where b.col2 = 1; cannot, as you show. It seems I wrote my original tests using "and" instead of "where" and hadn't noticed the distinction. Thanks for helping me catch that error. I will put back the code that looks for an empty filter condition on the checkrel. That day was not wasted after all. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Tue, 2008-09-02 at 12:05 +0100, Gregory Stark wrote: > I wonder if it would be more worthwhile to remove them and have a subsequent > phase where we look for possible joins to *add*. So even if the user writes > "select * from invoices where customer_id=?" the planner might be able to > discover that it can find those records quicker by scanning customer, finding > the matching <company_id,customer_id> and then using an index to look them up > in invoices. This seems a less useful idea now just simply because it is such a special case. We would need to have a case where we have a table A that does not have an index on a specific column, yet table B does have an index on the specific column. But also when A references B as a foreign key and where the column is a subset of the columns of the primary key of B. That means only queries like select ... from a where a.col2 = x; can be transformed into select ... from a join b on (foreign key cols) where a.col2 = x; and then because a.col2 is a subset of foreign key columns we can infer that b.col2 = x. So the pre-conditions for this to be useful are: * constraint on subset of a FK * subset of FK is indexed on B * subset of FK is not indexed on A Which doesn't seem that likely to occur. Thanks both to Heikki and Greg for good, fast input on this patch. Nothing more needed now while I rework patch. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Gregory Stark wrote: > I wonder if it would be more worthwhile to remove them and have a subsequent > phase where we look for possible joins to *add*. So even if the user writes > "select * from invoices where customer_id=?" the planner might be able to > discover that it can find those records quicker by scanning customer, finding > the matching <company_id,customer_id> and then using an index to look them up > in invoices. Yeah, that would be cool. The question is whether it's worth the additional overhead in planner, compared to the gain in the rare case that it's applicable. That's always the thing with planner tricks like this. I think we'll eventually need some sort of tuning knob to control how hard the planner tries to apply different optimizations like that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote: > It seems I wrote my original tests using "and" instead of "where" and > hadn't noticed the distinction. Thanks for helping me catch that error. Ah, yeah, that's a big difference. Proving correctness is hard, but to refute something you need just one test case that fails ;-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, 2008-09-01 at 22:23 +0300, Heikki Linnakangas wrote: >> Did plan invalidation make it safe to rely on the presence of a unique >> index for planning decisions? > My understanding was "Yes" and this case was the specific reason I > originally wanted to pursue plan invalidation back in 2006. Yeah, it should work. The theory is that any schema change that could affect planning should result in broadcasting a relcache inval message for the table (not just the index, note). I'm pretty confident that that works for index addition and removal (cf index_update_stats and index_drop). There might be some situations where we need to force a relcache inval but don't currently do so --- constraint addition/removal for instance I'm not too sure about. But that would represent an easily fixable bug. regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> writes: >> Oh. How does the query look like after removing the join, then? > Same answer, just slower. Removing the join makes the access to a into a > SeqScan, whereas it was a two-table index plan when both tables present. I don't really believe this: please show an actual case where the join would be faster. AFAICS, in the outer-join examples, it is not possible for a join to enable some kind of indexscan on the outer table, because by definition an outer join excludes none of the left-hand rows. So a seqscan on the outer is optimal. I also find all the worry about generating other plans for the inner relation to be off the mark. You're not going to *use* any plan for the inner rel, so who cares what plans it has? regards, tom lane
On Tue, 2008-09-02 at 12:02 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > >> Oh. How does the query look like after removing the join, then? > > > Same answer, just slower. Removing the join makes the access to a into a > > SeqScan, whereas it was a two-table index plan when both tables present. > > I don't really believe this... Yes, this will be changed. No need to review further at this stage. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Tue, 2008-09-02 at 17:03 +0100, Gregory Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > > Simon Riggs <simon@2ndQuadrant.com> writes: > >> On Mon, 2008-09-01 at 22:23 +0300, Heikki Linnakangas wrote: > >>> Did plan invalidation make it safe to rely on the presence of a unique > >>> index for planning decisions? > > > >> My understanding was "Yes" and this case was the specific reason I > >> originally wanted to pursue plan invalidation back in 2006. > > It may be worth considering what other cases... On other patches, yes. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Tom Lane <tgl@sss.pgh.pa.us> writes: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Mon, 2008-09-01 at 22:23 +0300, Heikki Linnakangas wrote: >>> Did plan invalidation make it safe to rely on the presence of a unique >>> index for planning decisions? > >> My understanding was "Yes" and this case was the specific reason I >> originally wanted to pursue plan invalidation back in 2006. It may be worth considering what other cases might need this info and taking them into account to be sure the solution is usable for them too. I suspect we'll probably need a generic function for determining whether a PathKey list can be proved unique. Other cases off the top of three other cases where this could be useful -- but generally anywhere the planner introduces a Unique node could benefit from looking at this. a) Turn a UNION into UNION ALL if there are unique indexes for any column in each side and at least one column is a constant in each side and none of the constants are equal. b) Remove the aggregate on IN subqueries when there's a unique constraint so that: SELECT * from a where a.fk IN (select pk FROM b) Can do a semijoin without taking care to avoid duplicating records in "a" if there should be duplicate values of "pk" in "b". c) Turn bad mysqlish queries which are really semijoins (used to work around their historic lack of subqueries) such as: SELECT DISTINCT a.pk FROM a JOIN b USING (x) into SELECT a.pk FROM a WHERE x IN (SELECT x FROM b) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Simon Riggs <simon@2ndQuadrant.com> writes: > As discussed on 26 June, "Join Removal/Vertical Partitioning", here's a > patch to remove joins in certain circumstances. Some points not made in the thread so far: > + if (checkrel->rtekind != RTE_RELATION) > + return; This isn't right, or at least isn't sufficient, because rtekind is undefined for joinrels. I think it should be something like /* * Since we can't prove anything unless keeprel has a unique * index, give up immediately if it's a join or not a table. */ if (checkrel->reloptkind == RELOPT_JOINREL || checkrel->rtekind != RTE_RELATION) return; Maybe you should also check for checkrel->indexlist == NIL here, since there's no point in doing the attr_needed bookkeeping if no indexes. > + /* > + * Check that the query targetlist does not contain any non-junk > + * target entries for the relation. If it does, we cannot remove join. > + */ > + if (checkrel->min_attr > 0) > + minattr = checkrel->min_attr; > + > + for (attrno = minattr; attrno < checkrel->max_attr; attrno++) > + { > + if (!bms_is_empty(checkrel->attr_needed[attrno])) > + return; > + } This part seems pretty wrong. In the first place, you mustn't discriminate against system columns like that. (Consider for instance that the only attr being used from the inner rel is its OID column.) You're failing to check max_attr too --- I believe the loop ought to run from min_attr to max_attr inclusive. In the second place, I don't see how the routine ever succeeds at all if it insists on attr_needed being empty, because whatever attrs are used in the join condition will surely have nonempty attr_needed. What you want is to see whether there are any attrs that are needed *above the join*, which would be something like if (!bms_is_subset(checkrel->attr_needed[attrno], joinrel->relids)) fail; The reference to non-junk in the comment is off base too. Attrs are used or not, there's no concept of a junk attr. > + * XXX Seems not worth searching partial indexes because those > + * are only usable with a appropriate restriction, so the > + * only way they could ever be usable is if there was a > + * restriction that exactly matched the index predicate, > + * which is possible but generally unlikely. I haven't thought this through entirely, but wouldn't a partial index be okay if it's marked predOK? You might be right that the case is unlikely, but if it's only one extra line to support it ... > + if (removable && > + joinrel->cheapest_total_path < keeprel->cheapest_total_path) This test on cheapest_total_path seems pretty wacko --- even granting that you meant to test the costs and not compare pointer addresses, isn't it backward? Also I doubt that the joinrel's cheapest_total_path field is set yet where you're calling this. In any case I'm unconvinced that join removal could ever not be a win, so that test seems unnecessary anyhow. > + { > + elog(LOG, "join removed"); > + joinrel->pathlist = keeprel->pathlist; > + joinrel->joininfo = keeprel->baserestrictinfo; > + } Huh? What in the world are you doing to joininfo here? This function should only be manipulating the path list. regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> writes: > + if (removable && > + joinrel->cheapest_total_path < keeprel->cheapest_total_path) > + { > + elog(LOG, "join removed"); > + joinrel->pathlist = keeprel->pathlist; > + joinrel->joininfo = keeprel->baserestrictinfo; > + } > + } On third thought: if you think that the join paths could possibly win, then the correct coding here is something like foreach(path, keeprel->pathlist) add_path(joinrel, ...) The reason is that it's not an all-or-nothing choice: some of the paths might offer cheaper startup cost, or present a useful sort order. So just present them as available alternatives and let add_path sort it out. regards, tom lane
On Tue, 2008-09-02 at 12:28 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > As discussed on 26 June, "Join Removal/Vertical Partitioning", here's a > > patch to remove joins in certain circumstances. > > Some points not made in the thread so far: Various comments accepted and agreed. > > + * XXX Seems not worth searching partial indexes because those > > + * are only usable with a appropriate restriction, so the > > + * only way they could ever be usable is if there was a > > + * restriction that exactly matched the index predicate, > > + * which is possible but generally unlikely. > > I haven't thought this through entirely, but wouldn't a partial index be > okay if it's marked predOK? You might be right that the case is > unlikely, but if it's only one extra line to support it ... As of now, its it is predOK then that implies there was a qual on the checkrel so can't remove the join. So no need to check. > > + if (removable && > > + joinrel->cheapest_total_path < keeprel->cheapest_total_path) > > This test on cheapest_total_path seems pretty wacko and stuff below it definitely is, from upthread discussion. Hang fire on more comments - its a short patch but needs a few thwacks from the cluestick yet before next review. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes: > On Tue, 2008-09-02 at 12:28 -0400, Tom Lane wrote: >> I haven't thought this through entirely, but wouldn't a partial index be >> okay if it's marked predOK? You might be right that the case is >> unlikely, but if it's only one extra line to support it ... > As of now, its it is predOK then that implies there was a qual on the > checkrel so can't remove the join. That conclusion seems utterly wrong to me. Per the example of a left join b on (a.x = b.y and b.z = 1) b.z = 1 will bubble down to be a qual on b. It doesn't prevent the join optimization, and it does possibly allow the use of a partial index. In particular a unique index on b.y with a predicate involving b.z would be relevant to the optimization decision here. In slightly more realistic cases, b might be a view on c that imposes a WHERE condition on c.z, so that's another avenue for a qual to exist below the join. regards, tom lane