I wrote:
> I spent a fair chunk of the weekend hacking on this patch to make
> it more understandable and fix up a lot of what seemed to me pretty
> clear arithmetic errors in the "upper layers" of the patch. However,
> I couldn't quite convince myself to commit it, because the business
> around estimation for partial histogram-bucket matches still doesn't
> make any sense to me.
Actually, there's a second large problem with this patch: blindly
iterating through all combinations of MCV and histogram entries makes the
runtime O(N^2) in the statistics target. I made up some test data (by
scanning my mail logs) and observed the following planning times, as
reported by EXPLAIN ANALYZE:
explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip;
explain analyze select * from relays r1, relays r2 where r1.ip && r2.ip;
stats target eqjoinsel networkjoinsel
100 0.27 ms 1.85 ms
1000 2.56 ms 167.2 ms
10000 56.6 ms 13987.1 ms
I don't think it's necessary for network selectivity to be quite as
fast as eqjoinsel, but I doubt we can tolerate 14 seconds planning
time for a query that might need just milliseconds to execute :-(
It seemed to me that it might be possible to reduce the runtime by
exploiting knowledge about the ordering of the histograms, but
I don't have time to pursue that right now.
regards, tom lane