Re: autovacuum next steps, take 2 - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: autovacuum next steps, take 2
Date
Msg-id 20070227162509.GD29041@nasby.net
Whole thread Raw
In response to Re: autovacuum next steps, take 2  ("Matthew T. O'Connor" <matthew@zeut.net>)
List pgsql-hackers
On Tue, Feb 27, 2007 at 12:54:28AM -0500, Matthew T. O'Connor wrote:
> Jim C. Nasby wrote:
> >On Mon, Feb 26, 2007 at 10:18:36PM -0500, Matthew T. O'Connor wrote:
> >>Jim C. Nasby wrote:
> >>Here is a worst case example: A DB with 6 tables all of which are highly 
> >>active and will need to be vacuumed constantly.  While this is totally 
> >>hypothetical, it is how I envision things working (without the threshold).
> >
> >I fail to see how a simple 6 table case is 'worst case'. It's common to
> >see hundreds of tables, and I've run across more than one database with
> >thousands of tables (think partitioning). In cases like those it's
> >certainly possible, perhaps even likely that you would get many daemons
> >running in the database at one time just from different tables suddenly
> >needing vacuuming and appearing at a higher point in the list than other
> >tables. With 100 ~1G tables getting updates it certainly wouldn't be
> >hard to end up with 10 of those being vacuumed all at the same time.
> 
> Yes 6 tables is small, the worst-case part of the example was that all 
> the tables would need to be vacuumed constantly.  Most databases only 
> have a few hot tables.  Most tables only need to vacuumed every once in 
> a while.
It's not the hot tables that are the issue; it's how many large tables
(hot or not) that can come up for vacuuming in order. For example, if
A-Z are all large tables (ie: a few GB), with A being the largest and Z
the smallest, think about what happens here:

Round 1: A needs vacuuming. Daemon gets to it and starts working.
Round 2: B now needs vacuuming. It's slightly smaller than A, so daemon
2 gets to it.
Round 3: C now needs vacuuming. Daemon 3.
...
Round 26: Z now needs vacuuming. Daemon 26 picks it up.

You now have 26 daemons running in the database.

Now, we can argue about how likely that scenario is, but I don't think
it's relevant. What matters is that it *is* possible, and as long as
that's the case you'd have to have some kind of limit. (While this
simple 26 table example is definitely worst-case, if you've got hundreds
of tables that are all multiple GB in size I think it wouldn't be hard
at all for you to end up with a dozen or more daemons all hammering
away).

> >I do like the idea since it should be easier to tune, but I think we
> >still need some limit on it. Perhaps as a first-pass we could just have
> >a hard limit and log a message and/or set a flag any time we hit it.
> >That would hopefully allow us to get information about how big a problem
> >it really is. We could go one step further and say that the last daemon
> >that can start in a database will only vacuum tables that can be done
> >quickly; that's essentially what we've been talking about, except the
> >limit we've been discussing would be hard-coded at 2.
> 
> I'm confused, what limit would be set at 2?  The number of concurrent 
> workers?  I've never said that.

The point I was making is that the proposal about limiting the 2nd
daemon to only processing tables it can do in a short period of time is
akin to setting a limit of only 2 daemons in a database at a time.
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


pgsql-hackers by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: autovacuum next steps, take 2
Next
From: "Jim C. Nasby"
Date:
Subject: Re: autovacuum next steps, take 2