Hi,
Here's a slightly improved / cleaned up version of the PoC patch,
removing a bunch of XXX and FIXMEs, adding comments, etc.
The approach is sound in principle, I think, although there's still a
bunch of things to address:
1) statext_compare_mcvs only really deals with equijoins / inner joins
at the moment, as it's based on eqjoinsel_inner. It's probably desirable
to add support for additional join types (inequality and outer joins).
2) Some of the steps are performed multiple times - e.g. matching base
restrictions to statistics. The results can probably be cached somehow,
to reduce the overhead.
3) The logic for picking the statistics to apply is somewhat simplistic,
and could probably be improved. OTOH the number of candidate
statistics is likely low, so this is not a big issue.
4) statext_compare_mcvs is based on eqjoinsel_inner and makes a bunch of
assumptions similar to the original, but some of those assumptions may
be wrong in the multi-column case, particularly when working with a subset
of columns. For example, (ndistinct - size(MCV)) may not be the number of
distinct combinations outside the MCV, when ignoring some columns. Same
for nullfract, and so on. I'm not sure we can do much more than pick
some reasonable approximation.
5) There are no regression tests at the moment. Clearly a gap.
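
As a toy illustration of (4), here is a minimal Python sketch. It is not
code from the patch: build_mcv and project_mcv are hypothetical helpers,
and the data is made up. It builds an MCV on two columns (a, b), projects
it onto column a only, and compares (ndistinct - size(MCV)) taken from the
two-column statistics with the number of distinct values of a that are
really outside the projected MCV.

```python
from collections import Counter
from fractions import Fraction

def build_mcv(rows, k):
    """Hypothetical helper: the k most common tuples, with exact frequencies."""
    total = len(rows)
    return {item: Fraction(cnt, total)
            for item, cnt in Counter(rows).most_common(k)}

def project_mcv(mcv, cols):
    """Hypothetical helper: collapse MCV items onto a subset of columns,
    summing the frequencies of items that become identical."""
    out = {}
    for item, freq in mcv.items():
        key = tuple(item[c] for c in cols)
        out[key] = out.get(key, Fraction(0)) + freq
    return out

# Made-up data: 12 rows over columns (a, b), 6 distinct combinations.
rows = ([(1, 1)] * 4 + [(1, 2)] * 3 + [(2, 1)] * 2 +
        [(1, 3)] + [(2, 2)] + [(3, 1)])

mcv_ab = build_mcv(rows, 3)        # MCV on (a, b) with 3 items
mcv_a = project_mcv(mcv_ab, [0])   # keep only column a

ndistinct_ab = len(set(rows))
ndistinct_a = len({a for a, _ in rows})

# Estimate carried over unchanged from the two-column statistics.
naive_other = ndistinct_ab - len(mcv_ab)
# Distinct values of a that do not appear in the projected MCV.
actual_other = ndistinct_a - len(mcv_a)

# The projected frequency also undercounts a = 1, because the
# combination (1, 3) did not make it into the MCV.
mcv_freq_a1 = mcv_a[(1,)]
true_freq_a1 = Fraction(sum(1 for a, _ in rows if a == 1), len(rows))
```

With this data naive_other is 3, while only one value of a lies outside
the projected MCV. The projected frequency of a = 1 is 7/12, although the
true frequency is 8/12. The same kind of mismatch would apply to nullfrac.
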
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company