Thread: stawidth inconsistency with all NULL columns

stawidth inconsistency with all NULL columns

From
Joe Conway
Date:
Consider:

CREATE TABLE testwid
(
  txtnotnull text,
  txtnull text,
  int8notnull int8,
  int8null int8
);
INSERT INTO testwid
SELECT 'a' || g.i,
       NULL,
       g.i,
       NULL
FROM generate_series(1,10000) AS g(i);
ANALYZE testwid;
SELECT attname, avg_width FROM pg_stats WHERE tablename = 'testwid';
   attname   | avg_width
-------------+-----------
 txtnotnull  |         5
 txtnull     |         0
 int8notnull |         8
 int8null    |         8
(4 rows)


I see in analyze.c
8<-----------------
/* We can only compute average width if we found some non-null values.*/
if (nonnull_cnt > 0)

  [snip]

else if (null_cnt > 0)
{
    /* We found only nulls; assume the column is entirely null */
    stats->stats_valid = true;
    stats->stanullfrac = 1.0;
    if (is_varwidth)
        stats->stawidth = 0;    /* "unknown" */
    else
        stats->stawidth = stats->attrtype->typlen;
    stats->stadistinct = 0.0;       /* "unknown" */
}
8<-----------------

So apparently intentional, but seems gratuitously inconsistent. Could
this cause any actual inconsistent behaviors? In any case that first
comment does not reflect the code.

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: stawidth inconsistency with all NULL columns

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> else if (null_cnt > 0)
> {
>     /* We found only nulls; assume the column is entirely null */
>     stats->stats_valid = true;
>     stats->stanullfrac = 1.0;
>     if (is_varwidth)
>         stats->stawidth = 0;    /* "unknown" */
>     else
>         stats->stawidth = stats->attrtype->typlen;
>     stats->stadistinct = 0.0;       /* "unknown" */
> }
> 8<-----------------

> So apparently intentional, but seems gratuitously inconsistent. Could
> this cause any actual inconsistent behaviors? In any case that first
> comment does not reflect the code.

Are you suggesting that we should set stawidth to zero even for a
fixed-width datatype?  That seems pretty silly.  We know exactly what
the value should be, and would be if we'd chanced to find even one
non-null entry.

            regards, tom lane



Re: stawidth inconsistency with all NULL columns

From
Joe Conway
Date:
On 5/21/19 3:55 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> else if (null_cnt > 0)
>> {
>>     /* We found only nulls; assume the column is entirely null */
>>     stats->stats_valid = true;
>>     stats->stanullfrac = 1.0;
>>     if (is_varwidth)
>>         stats->stawidth = 0;    /* "unknown" */
>>     else
>>         stats->stawidth = stats->attrtype->typlen;
>>     stats->stadistinct = 0.0;       /* "unknown" */
>> }
>> 8<-----------------
>
>> So apparently intentional, but seems gratuitously inconsistent. Could
>> this cause any actual inconsistent behaviors? In any case that first
>> comment does not reflect the code.
>
> Are you suggesting that we should set stawidth to zero even for a
> fixed-width datatype?  That seems pretty silly.  We know exactly what
> the value should be, and would be if we'd chanced to find even one
> non-null entry.

Well you could argue in similar fashion for variable width values -- if
we find even one of those, it will be at least 4 bytes. So why set those
to zero?

Not a big deal, but it struck me as odd when I was looking at the
current state of affairs.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: stawidth inconsistency with all NULL columns

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> On 5/21/19 3:55 PM, Tom Lane wrote:
>> Are you suggesting that we should set stawidth to zero even for a
>> fixed-width datatype?  That seems pretty silly.  We know exactly what
>> the value should be, and would be if we'd chanced to find even one
>> non-null entry.

> Well you could argue in similar fashion for variable width values -- if
> we find even one of those, it will be at least 4 bytes. So why set those
> to zero?

Um, really the minimum width is 1 byte, given short headers.  But as
the code notes, zero means we don't know what a sane estimate would
be, which is certainly not the case for fixed-width types.

            regards, tom lane