Re: Does auto-analyze work on dirty writes? - Mailing list pgsql-performance

From: Robert Haas
Subject: Re: Does auto-analyze work on dirty writes?
Date:
Msg-id: AANLkTimeTk_reda4x3WwsgOfXN6NO6Be63YyjJ5UnZLz@mail.gmail.com
In response to: Re: Does auto-analyze work on dirty writes?  (Mark Mielke <mark@mark.mielke.cc>)
List: pgsql-performance
On Fri, Feb 4, 2011 at 8:50 PM, Mark Mielke <mark@mark.mielke.cc> wrote:
> On 02/04/2011 10:41 AM, Tom Lane wrote:
>>
>> 1. Autovacuum fires when the stats collector's insert/update/delete
>> counts have reached appropriate thresholds.  Those counts are
>> accumulated from messages sent by backends at transaction commit or
>> rollback, so they take no account of what's been done by transactions
>> still in progress.
>>
>> 2. Only live rows are included in the stats computed by ANALYZE.
>> (IIRC it uses SnapshotNow to decide whether rows are live.)
>>
>> Although the stats collector does track an estimate of the number of
>> dead rows for the benefit of autovacuum, this isn't used by planning.
>> Table bloat is accounted for only in terms of growth of the physical
>> size of the table in blocks.
>
> Thanks, Tom.
>
> Does this un-analyzed "bloat" not impact queries? I guess the worst case
here is if autovacuum is disabled for some reason and 99% of the table is
> dead rows. If I understand the above correctly, I think analyze might
> generate a bad plan under this scenario, thinking that a value is unique,
> using the index - but every tuple in the index has the same value and each
> has to be looked up in the table to see if it is visible?
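
For reference, the thresholds Tom mentions in point 1 are the
autovacuum_analyze_* settings, and the counts come from the stats
collector views.  A rough sketch of where to look ("some_table" is a
placeholder; the values in the comments are the defaults):

SHOW autovacuum_analyze_threshold;     -- 50 by default
SHOW autovacuum_analyze_scale_factor;  -- 0.1 by default

-- Auto-analyze fires once the collector's count of changed rows for a
-- table exceeds roughly: threshold + scale_factor * reltuples.  Only
-- committed work shows up in these counters.
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'some_table';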

It sounds like you're describing something like a one-row table with a
unique index on one of its columns, getting updates that can't be made
HOT, and not getting vacuumed.  That scenario does suck - I had a test
case I was using a while back that generated something similar - but
I'm not sure how much it's worth worrying about the plan, because
either an index scan or a sequential scan is going to be awful.
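
One way to reproduce that shape, just as an untested sketch (all names
made up): index a second column so that no update can be HOT, and turn
autovacuum off for the table so nothing cleans up behind you.

CREATE TABLE one_row (id int, counter int);
CREATE UNIQUE INDEX one_row_id_idx ON one_row (id);
CREATE INDEX one_row_counter_idx ON one_row (counter);
ALTER TABLE one_row SET (autovacuum_enabled = off);
INSERT INTO one_row VALUES (1, 0);

-- Every update changes an indexed column, so none of them can be HOT;
-- the unique index ends up with ~100,000 entries for id = 1, all but
-- one of them pointing at dead heap tuples.
DO $$
BEGIN
  FOR i IN 1..100000 LOOP
    UPDATE one_row SET counter = i WHERE id = 1;
  END LOOP;
END $$;

-- Either plan has to wade through the dead tuples: a seq scan reads a
-- heap that is almost entirely dead rows, and an index scan has to
-- recheck each dead index entry against the heap for visibility.
EXPLAIN ANALYZE SELECT * FROM one_row WHERE id = 1;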

To put that another way, I've found that the optimizer copes pretty
well with adjusting plans as tables get bloated - mostly by using
index scans rather than sequential scans.  It's possible there is some
improvement still to be had there, but I would be a lot more
interested in fixing the bloat, at least based on my own experience.
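
If it helps, a quick way to see how bad things have gotten and to fix
the bloat directly (again a sketch, using the made-up table from above):

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'one_row';

-- Reclaim the dead tuples and refresh the statistics, then make
-- autovacuum more aggressive for this table so it keeps up in future.
VACUUM ANALYZE one_row;
ALTER TABLE one_row SET (autovacuum_enabled = on,
                         autovacuum_vacuum_scale_factor = 0.01);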

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
