Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?
Date
Msg-id 149287.1604106275@sss.pgh.pa.us
In response to Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> * It's not apparent why, if ANALYZE's sample is all nulls, we wouldn't
> conclude stadistinct = 0 and thus arrive at the desired answer that
> way.  (Since we have a complaint, I'm guessing that ANALYZE might
> disbelieve its own result and stick in some larger stadistinct.  But
> then maybe that's where to fix this, not here.)

Oh, on second thought (and with some testing): ANALYZE *does* report
stadistinct = 0.  The real issue is that get_variable_numdistinct is
assuming it can use that value as meaning "stadistinct is unknown".
So maybe we should just fix that, probably by adding an explicit
bool flag for that condition.
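
To make the idea concrete, here is a minimal standalone sketch in C of
separating "no statistics" from "stadistinct = 0".  It is not the actual
selfuncs.c code: the ColumnStats struct, its have_stats flag, and
sketch_numdistinct are hypothetical names for illustration, and
returning 1.0 for an all-null column is an assumption about what the
desired answer would be.

```c
#include <assert.h>
#include <stdbool.h>

/* Fallback guess used when nothing is known (value assumed for this sketch). */
#define DEFAULT_NUM_DISTINCT 200.0

/* Hypothetical, simplified stand-in for a column's statistics. */
typedef struct ColumnStats
{
    bool   have_stats;   /* explicit flag: are statistics available at all? */
    double stadistinct;  /* > 0: absolute count; < 0: minus a fraction of the
                          * row count; 0: no non-null values were seen */
} ColumnStats;

/*
 * Estimate the number of distinct values.  "Unknown" is signalled only by
 * have_stats == false, so stadistinct == 0 keeps its own meaning.
 */
static double
sketch_numdistinct(const ColumnStats *stats, double ntuples, bool *isdefault)
{
    assert(ntuples >= 0.0);
    *isdefault = false;

    if (!stats->have_stats)
    {
        /* Truly unknown: fall back to the default and say so. */
        *isdefault = true;
        return DEFAULT_NUM_DISTINCT;
    }
    if (stats->stadistinct > 0.0)
        return stats->stadistinct;              /* absolute count */
    if (stats->stadistinct < 0.0)
        return -stats->stadistinct * ntuples;   /* fraction of rows */

    /* Statistics exist and report zero: the sample was all nulls.
     * Assumption: clamp to 1.0 rather than returning the default. */
    return 1.0;
}
```

With this shape, a column whose sample was entirely NULL no longer
falls through to the default estimate.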

BTW ... I've not looked at the callers, but now I'm wondering whether
get_variable_numdistinct ought to count NULL as one of the "distinct"
values.  In applications such as estimating the number of GROUP BY
groups, it seems like that would be correct.  There might be some
callers that don't want it though.
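
As a rough illustration of that GROUP BY case (all NULLs land in a
single group), counting NULL as one more "distinct" value could look
like the sketch below.  The helper name sketch_group_count and its
parameters are hypothetical, and whether any given caller would want
this adjustment is exactly the open question.

```c
#include <assert.h>

/*
 * Hypothetical helper: estimate the number of GROUP BY groups from the
 * count of distinct non-null values and the fraction of rows that are NULL.
 */
static double
sketch_group_count(double ndistinct_nonnull, double nullfrac)
{
    assert(ndistinct_nonnull >= 0.0);
    assert(nullfrac >= 0.0 && nullfrac <= 1.0);

    /* NULLs contribute exactly one extra group when any are present. */
    return ndistinct_nonnull + (nullfrac > 0.0 ? 1.0 : 0.0);
}
```

For an all-null column (zero non-null distinct values, null fraction
1.0) this gives one group.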

            regards, tom lane


