Re: PATCH: adaptive ndistinct estimator v4 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: PATCH: adaptive ndistinct estimator v4 |
Date | |
Msg-id | CA+TgmoZ6FgvwVTyzM7hLiHDifYinXKsTiRWxh062AETOt8Dw7Q@mail.gmail.com |
In response to | Re: PATCH: adaptive ndistinct estimator v4 (Jeff Janes <jeff.janes@gmail.com>) |
List | pgsql-hackers |
On Wed, May 13, 2015 at 5:07 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> With the warning it is very hard to correlate the discrepancy you do see
> with which column is causing it, as the warnings don't include table or
> column names (Assuming of course that you run it on a substantial
> database--if you just run it on a few toy cases then the warning works
> well).

Presumably the warning is going to go away before we actually commit this thing.

> If we want to have an explicitly experimental patch which we want people
> with interesting real-world databases to report back on, what kind of patch
> would it have to be to encourage that to happen? Or are we never going to
> get such feedback no matter how friendly we make it? Another problem is
> that you really need to have the gold standard to compare them to, and
> getting that is expensive (which is why we resort to sampling in the first
> place). I don't think there is much to be done on that front other than
> bite the bullet and just do it--perhaps only for the tables which have
> discrepancies.

If we stick with the idea of a GUC to control the behavior, then somebody can run ANALYZE, save the ndistinct estimates, run ANALYZE again, and compare. They can also run SQL queries against the tables themselves to check the real value. We could even provide a script for all of that. I think that would be quite handy.

> It can't hurt, but how effective will it be? Will developers know or care
> whether ndistinct happened to get better or worse while they are working on
> other things? I would think that problems will be found by focused testing,
> or during beta, and probably not by accidental discovery during the
> development cycle.

It can't hurt, but I don't know how much it will help. Once we enter beta (or even feature freeze), it's too late to whack around the algorithm heavily. We're pretty much committed to releasing and supporting whatever we have got at that point. I guess we could revert it if it doesn't work out, but that's about the only option at that point. We have more flexibility during the main part of the development cycle. But your point is certainly valid and I don't mean to dispute it.

> I agree with the "experimental GUC". That way if hackers do happen to see
> something suspicious, they can just turn it off and see what difference it
> makes. If they have to reverse out a patch from 6 months ago in an area of
> the code they aren't particularly interested in and then recompile their
> code and then juggle two different sets of binaries, they will likely just
> shrug it off without investigation.

Yep. Users, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
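
For illustration only, the comparison step of the workflow described above (ANALYZE, save the estimates, ANALYZE again, compare against the real value) might look roughly like the sketch below. It is not part of the patch or of any script posted in this thread. The function names, the dictionary layout, and the ratio-error metric are all assumptions made for the example. The inputs are assumed to have been collected by hand: the estimates from the `n_distinct` column of `pg_stats` after each ANALYZE run, and the exact values from `SELECT count(DISTINCT col)` on each table.

```python
from math import inf


def absolute_ndistinct(n_distinct, row_count):
    """Convert a pg_stats.n_distinct value to an absolute count.

    pg_stats stores a negative n_distinct as the negated fraction of
    rows that are distinct, so it is scaled by the table's row count.
    """
    if n_distinct < 0:
        return -n_distinct * row_count
    return n_distinct


def ratio_error(estimate, actual):
    """Symmetric ratio error (hypothetical metric): 1.0 is a perfect
    estimate, larger values are worse in either direction."""
    if estimate <= 0 or actual <= 0:
        return inf
    return max(estimate / actual, actual / estimate)


def compare_estimates(before, after, actual, row_counts):
    """Compare two saved sets of estimates against exact counts.

    before, after: {(table, column): n_distinct from pg_stats}
    actual:        {(table, column): exact count(DISTINCT column)}
    row_counts:    {table: number of rows}
    Returns one tuple per column:
    (key, old_estimate, new_estimate, exact, old_error, new_error)
    """
    report = []
    for key in sorted(actual):
        table, _column = key
        rows = row_counts[table]
        old = absolute_ndistinct(before[key], rows)
        new = absolute_ndistinct(after[key], rows)
        exact = actual[key]
        report.append(
            (key, old, new, exact,
             ratio_error(old, exact), ratio_error(new, exact))
        )
    return report


if __name__ == "__main__":
    # Made-up numbers, purely to show the shape of the inputs.
    before = {("t", "a"): 100.0, ("t", "b"): -0.5}
    after = {("t", "a"): 400.0, ("t", "b"): -1.0}
    actual = {("t", "a"): 500, ("t", "b"): 1000}
    for row in compare_estimates(before, after, actual, {"t": 1000}):
        print(row)
```

Keying the report by table and column name also addresses the complaint quoted earlier, that a bare warning cannot be traced back to the column that caused it.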