Re: New design for FK-based join selectivity estimation - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: New design for FK-based join selectivity estimation |
Date | |
Msg-id | 20877.1466175216@sss.pgh.pa.us Whole thread Raw |
In response to | Re: New design for FK-based join selectivity estimation (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: New design for FK-based join selectivity estimation
|
List | pgsql-hackers |
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > Thanks for getting the patch into a much better shape. I've quickly > reviewed the patch this morning before leaving to the airport - I do > plan to do additional review/testing once in the air or perhaps over the > weekend. So at the moment I only have a few minor comments: > 1) Shouldn't we define a new macro in copyfuncs.c, to do the memcpy for > us? Seems a bit strange we have macros for everything else. Perhaps, but it seemed not that compelling since we need bespoke code for those fields in outfuncs.c etc. Maybe it would be worth thinking about macro infrastructure for array-type fields in all of those modules. > 2) I'm wondering whether removing the restrict infos when > if (fkinfo->eclass[i] == rinfo->parent_ec) > is actually correct. Can't the EC include conditions that do not match > the FK? Doesn't matter. The point is that it *does* include a condition that does match the FK, whether it chose to generate exactly that condition for this join or some related one. > I mean something like this: > CREATE TABLE a (id1 INT PRIMARY KEY, id2 INT); > CREATE TABLE b (id1 INT REFERENCES a (id1), id2 INT); > and then something like > SELECT * FROM a JOIN b ON (a.id1 = b.id1 AND a.id1 = b.id2) Right. In this case we'll have an EC containing {a.id1, b.id1, b.id2} which means that equivclass.c will generate a restriction condition b.id1 = b.id2 to be applied at the scan of b. At the join level, it has a choice whether to generate a.id1 = b.id1 or a.id1 = b.id2. It could generate both, but that would be pointlessly inefficient (and would likely confuse the selectivity estimators, too). But even if it chooses to generate a.id1 = b.id2, we should recognize that the FK is matched. What we're effectively doing by dropping that clause in favor of treating the FK as matched is overridding equivclass.c's arbitrary choice of which join clause to generate with an equally valid choice that is easier to estimate for. > 3) I think this comment in get_foreign_key_join_selectivity is wrong and > should instead say 'to FK': > /* Otherwise, see if rinfo was previously matched to EC */ Duh, yeah. I rewrote the comments in this section to clarify a bit: /* Drop this clause if it matches any column of the FK */ for (i = 0; i < fkinfo->nkeys; i++) { if (rinfo->parent_ec) { /* * EC-derived clauses canonly match by EC. It is okay to * consider any clause derived from the same EC as * matching the FK: even if equivclass.c chose to generate * a clause equating some other pair of Vars,it could * have generated one equating the FK's Vars. So for * purposes of estimation,we can act as though it did so. * * Note: checking parent_ec is a bit ofa cheat because * there are EC-derived clauses that don't have parent_ec * set; butsuch clauses must compare expressions that * aren't just Vars, so they cannot match the FK anyway. */ if (fkinfo->eclass[i] == rinfo->parent_ec) { remove_it = true; break; } } else { /* * Otherwise, see if rinfo was previously matched to FK as * a "loose" clause. */ if (list_member_ptr(fkinfo->rinfos[i], rinfo)) { remove_it = true; break; } } } regards, tom lane
pgsql-hackers by date: