Re: New design for FK-based join selectivity estimation - Mailing list pgsql-hackers

From Tom Lane
Subject Re: New design for FK-based join selectivity estimation
Date
Msg-id 15396.1466268742@sss.pgh.pa.us
Whole thread Raw
In response to Re: New design for FK-based join selectivity estimation  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: New design for FK-based join selectivity estimation  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: New design for FK-based join selectivity estimation  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> A few more comments, about re-reading the patch more thoroughly. I 
> wouldn't say any of those qualify as bugs, but rather as discussion 
> about some of the design choices:

> 1) NULL handling

> I'd argue that we should do something about this, although I agree it's 
> non-trivial to estimate - at least until we get some sort of correlation 
> stats (e.g. my patch already provides most of the pieces, I believe). 

I concur, actually, but I feel that it's out of scope for this particular
patch, which is only trying to replace the functionality that was
committed previously.  If you want to come up with a patch on top of this
that adds some accounting for NULLs, I'd be willing to consider it as
a post-beta2 improvement.

> But I'd argue that in the case of multi-column foreign keys we can do 
> better even without it - my experience is that in such cases either all 
> values are NULL or none of them, and a single NULL value breaks the FK 
> of course. So I think max(null_frac) would work.

Yeah, I was thinking along the same lines: max of the per-column null
fractions is probably an OK estimate.

> The comment right before checking (removedlist == NIL) says:
>   * For a multi-column FK, it's possible that we found matches to some
>   * columns but not all, implying that one of the above effects applied
>   * to just some of the columns.  For the moment, we go ahead and
>   * remove those clauses and apply the FK's selectivity anyway.  It
>   * might be better to put back the removed clauses and ignore the FK;
>   * but that amounts to betting on independence of the clauses, which
>   * doesn't seem like a good bet in such messy cases.

> Is this a good idea? I'd probably vote to do what's proposed (and 
> rejected) in the second half of the comment, i.e. put back the clauses 
> and skip the FK as that pretty much says "improve estimate or keep the 
> current one, but don't risk making it worse."

I would buy that approach when it comes to "loose" quals, but I think
it's not right for EC-derived matches, because of the behavior we
discussed previously that an EC should be considered to have implicitly
generated all the matching quals even though it actually made only one
qual that might not be any of them exactly.

Now on the other side of the coin, if several FKs match the same EC,
we might be overshooting the mark to assume that we can just multiply
their selectivity estimates together.  But I think that's not really
a matter for the matching logic, but rather a question of how we want
to combine the estimates for multiple FKs.

Possibly what we could do here is assume that EC matches succeed,
whether there's a matching qual physically in the list or not, but
require a qual match for each column with non-EC matches.  That
would complicate the logic slightly but not terribly (he says without
having actually tried to code it).

If you have a proposal about adjusting the net selectivity estimate when
multiple FKs match the join, let's hear it.  But again that seems like
something we could address as a post-beta2 refinement.

> 3) ForeignKeyOptInfo->rinfos as a List
> Can we actually get a list of matching RestrictInfos for a single 
> foreign key? I've been unable to construct such query.

I think you'd actually have to write redundant outer join quals,
along the lines of     select ... a left join b on (a.x = b.y and a.x = b.y)
I don't believe we take the trouble to eliminate such duplicates
unless they get absorbed by an EC, which outer-join quals would
not be.  (Haven't tried this, though, as I don't have the patch
installed right now.)


The beta2 deadline is just about upon us; I feel that if we're going
to get this into this release at all, we need to push it today so
that we get a full buildfarm cycle on it before the wrap.

I plan to spend an hour or two adjusting the qual match logic as
discussed above, and re-reading the whole patch another time for
sanity.  If I've not heard objections by the time I'm done,
I will push it.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Whether to back-patch fix for aggregate transtype width estimates
Next
From: Tom Lane
Date:
Subject: Re: New design for FK-based join selectivity estimation