Re: Potential autovacuum optimization: new tables - Mailing list pgsql-hackers

From David Johnston
Subject Re: Potential autovacuum optimization: new tables
Date
Msg-id 52A425E9-D155-48CF-B529-02182ED71737@yahoo.com
Whole thread Raw
In response to Re: Potential autovacuum optimization: new tables  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Potential autovacuum optimization: new tables  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Oct 12, 2012, at 22:13, Stephen Frost <sfrost@snowman.net> wrote:

> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> [ shrug... ]  You're attacking a straw man, or more precisely putting
>> words into my mouth about what the percentage-based thresholds might be.
>> Notice the examples I gave involved update percentages quite far north
>> of 100%.  It's possible and maybe likely that we need a sliding scale.
>
> I was just discussing such a sliding scale approach w/ Josh on IRC, my
> thinking was that we could use a logarithmic approach based on table
> size.
>
>> Also, I don't necessarily accept the conclusion you seem to be drawing,
>> that it's okay to have complete turnover of a small table and not redo
>> its stats.  If you don't like the current behavior when there's no
>> stats, why would you like the behavior when there are some stats but
>> they no longer have the remotest relationship to reality?
>
> Josh's concern is about autovacuum causing lots of stats churn, which is
> understandable, we don't want it constantly rescanning a table, but
> perhaps we could use some kind of threshold for preventing autovac from
> rescanning a table it just scanned?  Note that I did *not* say 'GUC',
> but I don't know what the 'right' answer is for how frequently is
> good-but-not-too-frequent.  I'd also like to try and avoid adding GUCs.
>
>

Instead of global could you attach an interface function to the table and have the auto-analyzer call that function to
basicallyask the table whether it needs to be analyzed?  Still need to deal with defaults and provide a decent supply
ofbuilt-in algorithms but at least the system can be made tunable.  The default algorithm could maybe just handoff to a
tablesize specific handler.  The create table and alter table commands could be used to change the assigned algorithm
ifdesired and new ones could be supplied via extensions. 

The 1000 row default seems unusual at first glance and contributes to the problem described.

It is likely that the first I sent following the create table is going to be a bulk load if the table is going to have
manyrows.  In the case where rows are inserted individually it is likely that the expected row count will be closer to
1than 1000. 

One useful algorithm to provide the user is analyze on insert and, though maybe less so, analyze on update.  So that
anyinsert/update causes the table to be re-analyzed.  Not a good default but, combined with "delayed analyze" logic to
establisha minimum frequency, is a possible option for some use cases. 

Temporary table creation should have special attention given if changes are going to be made here.

Another idea is to have system after [command] trigger(s) than can be used to call analyze without waiting for the
auto-vacuumprocess.  Provide some way for CREATE/ALTER TABLE and maybe auto-vacuum to enable and disable the trigger. 

David J.




pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Potential autovacuum optimization: new tables
Next
From: Michael Paquier
Date:
Subject: Re: Support for REINDEX CONCURRENTLY