Re: estimating # of distinct values - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: estimating # of distinct values
Date
Msg-id 4D38A8C4.3050706@fuzzy.cz
In response to Re: estimating # of distinct values  (Nathan Boley <npboley@gmail.com>)
List pgsql-hackers
On 20.1.2011 03:06, Nathan Boley wrote:
>> And actually it does not depend on ndistinct for the columns only, it
>> depends on ndistinct estimates for the combination of columns. So
>> improving the ndistinct estimates for columns is just a necessary first
>> step (and only if it works reasonably well, we can do the next step).
> 
> I think that any approach which depends on precise estimates of
> ndistinct is not practical.

I'm not aware of any other approach to the 'discrete fail case' (where
the multi-dimensional histograms are not applicable). If someone finds a
better solution, I'll be the first one to throw away this stuff.

> I am very happy that you've spent so much time on this, and I'm sorry
> if my previous email came off as combative. My point was only that
> simple heuristics have served us well in the past and, before we go to
> the effort of new, complicated schemes, we should see how well similar
> heuristics work in the multiple column case. I am worried that if the
> initial plan is too complicated then nothing will happen and, even if
> something does happen, it will be tough to get it committed ( check
> the archives for cross column stat threads - there are a lot ).

If I've learned one thing over the years in IT, it's not to take critique
personally. All the problems mentioned in this thread are valid
concerns, pointing out weak points of the approach. And I'm quite happy
to receive this feedback - that's why I started it.

On the other hand - Jara Cimrman (a famous Czech fictional character,
depicted as the best scientist/poet/teacher/traveller/... - see [1])
once said that you can't be really sure you don't get gold by blowing
cigarette smoke into a basin drain, until you actually try it. So I'm
blowing cigarette smoke into the drain ...

It may very well happen that this will be a dead end, but I'll do my best to
fix all the issues or to prove that the pros outweigh the cons. And
even if it is eventually rejected, I hope to get a -1 from TL to be
eligible for that t-shirt ...

[1] http://en.wikipedia.org/wiki/Jara_Cimrman

regards
Tomas

