
From Josh Berkus
Subject Re: Potential autovacuum optimization: new tables
Date
Msg-id 5078C8BC.5060303@agliodbs.com
In response to Re: Potential autovacuum optimization: new tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Potential autovacuum optimization: new tables  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> No, it's not that easy.  The question you have to ask is "when has that
> initial write burst stopped?".  As an example, if autovacuum happened to
> see that table in the instant after CREATE, it might autovacuum it while
> it's still empty, and then this rule fails to trigger any further effort.

Well, frankly, it would be useful to know it's empty too.  If you really
wanna see that 1000-row default estimate bite you on the tuchas, try a
JOIN against an empty new table.
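
For anyone who wants to see that gotcha for themselves, a minimal sketch
(table names here are made up, and I'm deliberately not pasting plan output
since the exact default estimate depends on the planner's page-based guess):

    -- brand-new table, never vacuumed or analyzed, still empty
    CREATE TABLE brand_new (id int);

    -- join it against anything of real size
    EXPLAIN SELECT *
      FROM some_existing_table t
      JOIN brand_new n USING (id);
    -- the planner's row estimate for brand_new is its generic default,
    -- not zero, so the join plan can be wildly off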

> Personally I've always thought that autovacuum's rules should be based
> on a percentage of rows changed, not an absolute threshold (or maybe in
> addition to an absolute threshold).  This way, if you create a table and
> insert 10 rows, that would make it subject to analyze on-sight, even if
> autovac had managed to pass by while it was still empty, because the
> percentage-changed is infinite.  Then, if you insert the other 35 rows
> you meant to insert, it's *again* subject to autoanalyze on the next
> pass, because the percentage-changed is still 350%.

> I remember having got voted down on the percentage approach back when
> we first put AV into core, but I remain convinced that decision was a
> bad one.

Yeah, I was one of the ones voting against you.  The reason not to go
percentage-only is small tables.  Imagine a table with 18 rows, where
analyze_threshold is 0 and analyze_scale_factor is 0.1.

Add two rows --> ANALYZE
UPDATE two rows --> ANALYZE
UPDATE three more rows --> ANALYZE
DELETE three rows --> ANALYZE

Without a threshold, any table under 100 rows that gets UPDATEs would
sit continually in the autoanalyze queue, which would not be
beneficial; the churn in pg_statistic alone would be detrimental.
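
To spell out the arithmetic behind that example (using the documented
trigger rule: changed tuples > threshold + scale_factor * reltuples):

    -- analyze trigger for the 18-row table above
    SELECT 0 + 0.1 * 18 AS analyze_trigger;   -- 1.8
    -- so any statement touching two or more rows puts the table
    -- right back on the autoanalyze list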

I guess the real issue is that we tried to solve the problem of
ANALYZEing tables of radically different sizes with a single simple
heuristic, threshold + scale_factor, and that's proving too simple for
actual production sites.  The other place autoanalyze often falls down
is the high end: tables with a million rows or more.
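
For what it's worth, the main lever we have today at the high end is
per-table storage parameters, which is exactly the sort of manual
knob-twiddling I'd like to get away from.  Something like (numbers made up):

    -- typical workaround on a very large table: drop the scale factor
    -- and lean on a fixed threshold instead
    ALTER TABLE big_table
      SET (autovacuum_analyze_scale_factor = 0.0,
           autovacuum_analyze_threshold = 100000);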

Can anyone think of a new heuristic that doesn't involve adding 2-6
new GUC knobs?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


