Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Date
Msg-id 18634.1212966193@sss.pgh.pa.us
In response to Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics  ("Nathan Boley" <npboley@gmail.com>)
Responses Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics  (Jeff Davis <pgsql@j-davis.com>)
Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics  ("Nathan Boley" <npboley@gmail.com>)
List pgsql-hackers
"Nathan Boley" <npboley@gmail.com> writes:
> ... There are two potential problems that I see with this approach:

> 1) It assumes the = is equivalent to <= and >= . This is certainly
> true for real numbers, but is it true for every equality relation that
> eqsel predicts for?

The cases that compute_scalar_stats is used in have that property, since
the < and = operators are taken from the same btree opclass.
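Editorial illustration, not part of the original message: a minimal sketch of why a consistent total order lets equality be expressed through the ordering operators. The helper name `eq_fraction` is hypothetical, and plain Python comparisons stand in for a btree opclass; this is not PostgreSQL's code.

```python
import bisect

def eq_fraction(sorted_vals, v):
    """Fraction of values equal to v, computed only from ordering comparisons.

    Assumes sorted_vals is sorted under a total order in which
    x == v holds exactly when x <= v and x >= v.
    """
    le = bisect.bisect_right(sorted_vals, v)  # number of values <= v
    lt = bisect.bisect_left(sorted_vals, v)   # number of values <  v
    return (le - lt) / len(sorted_vals)
```

If `=` came from a different operator family than `<`, the difference between the two counts would no longer be guaranteed to match the rows that satisfy `=`.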

> Do people think that the improved estimates would be worth the
> additional overhead?

Your argument seems to consider only columns having a normal
distribution.  How badly does it fall apart for non-normal
distributions?  (For instance, Zipfian distributions seem to be pretty
common in database work, from what I've seen.)
        regards, tom lane
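Editorial illustration, not part of the original message: one way to probe the question about non-normal distributions is to compare a per-bucket estimate against a flat 1/ndistinct estimate on synthetic data. The sketch below rests on several assumptions: the function names are hypothetical, the per-bucket formula (bucket fraction divided by distinct values in the bucket) is only a guess at the general shape of the proposal, and most-common-value handling is ignored entirely.

```python
import bisect
import random
from collections import Counter

def make_buckets(sample, num_buckets):
    """Split the sorted sample into equal-count slices.

    Returns (lower bound of each slice, distinct-value count of each slice).
    Assumes len(sample) >= num_buckets so no slice is empty.
    """
    vals = sorted(sample)
    n = len(vals)
    lows, ndistinct = [], []
    for i in range(num_buckets):
        chunk = vals[i * n // num_buckets:(i + 1) * n // num_buckets]
        lows.append(chunk[0])
        ndistinct.append(len(set(chunk)))
    return lows, ndistinct

def bucket_estimate(v, lows, ndistinct):
    """Assumed estimator: bucket fraction / distinct values in v's bucket."""
    i = max(0, bisect.bisect_right(lows, v) - 1)  # last slice starting at or below v
    return (1.0 / len(lows)) / ndistinct[i]

def mean_abs_errors(sample, num_buckets=100):
    """Mean absolute selectivity error over distinct values: (per-bucket, flat)."""
    lows, nd = make_buckets(sample, num_buckets)
    counts = Counter(sample)
    n = len(sample)
    flat = 1.0 / len(counts)  # uniform estimate over all distinct values
    err_bucket = err_flat = 0.0
    for v, c in counts.items():
        true_sel = c / n
        err_bucket += abs(bucket_estimate(v, lows, nd) - true_sel)
        err_flat += abs(flat - true_sel)
    return err_bucket / len(counts), err_flat / len(counts)

if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    normal = [round(rng.gauss(0, 100)) for _ in range(n)]
    # Zipf-like data: value k drawn with weight 1/k
    keys = list(range(1, 1001))
    zipf = rng.choices(keys, weights=[1.0 / k for k in keys], k=n)
    for name, data in (("normal", normal), ("zipf", zipf)):
        print(name, mean_abs_errors(data))
```

In the Zipf-like case a single heavy value can span several slices, so the bucket lookup becomes ambiguous; that is the kind of breakdown the question above is asking about.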

