Hi Dean,
Here is an updated patch (hopefully) fixing the bugs you've reported so
far. In particular, it fixes this:
1) mostly harmless memset bug in UpdateStatisticsForTypeChange
2) passing the right list (stat_clauses) to mcv_clauselist_selectivity
3) corrections to a couple of outdated comments
4) handling of NOT clauses in MCV lists (and in histograms)
The query you posted does not fail anymore, but there's a room for
improvement. We should be able to handle queries like this:
select * from foo where a=1 and not b=1;
But we don't, because we only recognize F_EQSEL, F_SCALARLTSEL and
F_SCALARGTSEL, but F_NEQSEL (which is what "not b=1" uses). Should be
simple to fix, I believe.
5) handling of mcv_lowsel in statext_clauselist_selectivity
I do believe the new behavior is correct - as I suspected, I broke this
during the last rebase, where I also moved some stuff from the histogram
part to the MCV part. I've also added the (sum of MCV frequencies), as
you suggested.
I think we could improve the estimate further by computing ndistinct
estimate, and then using that to compute average frequency of non-MCV
items. Essentially what var_eq_const does:
if (otherdistinct > 1)
selec /= otherdistinct;
Not sure how to do that when there are not just equality clauses.
BTW I think there's a bug in handling the fullmatch flag - it should not
be passed to AND/OR subclauses the way it is, because then
WHERE a=1 OR (a=2 AND b=2)
will probably set it to 'true' because of (a=2 AND b=2). Which will
short-circuit the statext_clauselist_selectivity, forcing it to ignore
the non-MCV part.
But that's something I need to look at more closely tomorrow. Another
thing I probably need to do is to add more regression tests, protecting
against bugs similar to those you found.
Thanks for the feedback so far!
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services