Re: BUG #7853: Incorrect statistics in table with many dead rows. - Mailing list pgsql-bugs

From Jeff Janes
Subject Re: BUG #7853: Incorrect statistics in table with many dead rows.
Date
Msg-id CAMkU=1yE4Dwr0a7GPKMZjxMUiT6qSY=2Y1oF79myhdkxwWTu_A@mail.gmail.com
Whole thread Raw
In response to Re: BUG #7853: Incorrect statistics in table with many dead rows.  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: BUG #7853: Incorrect statistics in table with many dead rows.
Re: BUG #7853: Incorrect statistics in table with many dead rows.
List pgsql-bugs
On Tue, Feb 5, 2013 at 2:00 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> "jimbob@seagate.com" <jimbob@seagate.com> wrote:
>
>> INFO:  analyzing "public.stream_file"
>> INFO:  "stream_file": scanned 30000 of 2123642 pages, containing
>> 184517 live rows and 2115512 dead rows; 30000 rows in sample,
>> 158702435 estimated total rows
>
> 184517 live rows in 30000 randomly sampled pages out of 2123642
> total pages, means that the statistics predict that a select
> count(*) will find about  13 million live rows to count.
>
>> After "analyze verbose", the table shows 158 million rows. A
>> select count(1) yields 13.8 million rows.
>
> OK, the estimate was 13 million and there were actually 13.8
> million, but it is a random sample used to generate estimates.
> That seems worse than average, but close enough to be useful.
> The 158.7 million total rows includes dead rows, which must be
> visited to determine visibility, but will not be counted because
> they are not visible to the counting transaction.

To clarify here, the 158.7 million estimate does not *intentionally*
include dead rows.  As you say, the ANALYZE did get a very good
instantaneous estimate of the number of live rows.  However, ANALYZE
doesn't over-write the old estimate, it averages its estimate into the
old one.  After the table shape changes dramatically, the ANALYZE
needs to be run repeatedly before the estimate will converge to the
new reality.  (Of course a cluster or vacuum full will blow away the
old statistics, so the next analyze after that will solely determine
the new statistics.)

I agree, not a bug.

Cheers,

Jeff

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #7866: even after T's child-tables are deleted it cannot have select-rule
Next
From: matti.aarnio@methics.fi
Date:
Subject: BUG #7865: Unexpected error code on insert of duplicate to composite primary key