Re: Very busy 24x7 databases and VACUUM - Mailing list pgsql-admin

From Christopher Browne
Subject Re: Very busy 24x7 databases and VACUUM
Date
Msg-id m3k6rwm53s.fsf@knuth.knuth.cbbrowne.com
Whole thread Raw
In response to Very busy 24x7 databases and VACUUM  ("David F. Skoll" <dfs@roaringpenguin.com>)
Responses Re: Very busy 24x7 databases and VACUUM  (Andrew Sullivan <ajs@crankycanuck.ca>)
List pgsql-admin
In the last exciting episode, dfs@roaringpenguin.com ("David F. Skoll") wrote:
> Does anyone run a very busy PostgreSQL datatabase, with lots of read
> and write operations that run 24x7?  (We're talking on the
> neighbourhood of 40 to 60 queries/second, with probably 5% to 10% of
> them being INSERT or UPDATE.)

Yup...  [Hand goes up...]

> Some of our clients run such a DB, and the nightly VACUUM slows
> things down tremendously while it is running.  I see that in 8.0,
> you can reduce the VACUUM's I/O impact, but from reading the code,
> it also looks like that means the VACUUM will hold locks for longer,
> which is probably bad news.

Yes, there's a trade-off there.  The "lazier" vacuum will indeed hold
its locks longer.

> Doing VACUUM more often than nightly is not a good idea; the tables
> tend to be pretty large and it looks like VACUUM has to scan all the
> tuples each time.  (The nightly VACUUM is already taking several
> hours in some cases.)

Yes, vacuum does need to scan all the tuples.  There's no shortcut at
this point.

> How do we handle this situation?  Are there any plans for some kind
> of "incremental" vacuum that recovers a few pages here and there in
> the background?  Is such a thing even possible?

There has been some talk of a "VACUUM CACHE" idea, where the idea
would be to walk through the shared buffer cache and vacuum just those
pages.  Recently updated pages ought to be in the cache, so we might
expect this to be reasonably fruitful, as well as being rather quick.

If that were to work out well, I would think it potentially fruitful
to have a perhaps longer list of "pages of interest" whereby
UPDATE/DELETE operations might throw pages that they touch into a
queue for later re-examination.  In a table that contains both "active
regions" and large, seldom-updated "inactive regions," it would be
nice to have a way to focus on the "active" bits.

> If we defer some write operations until after the VACUUM has
> finished, will that speed up the VACUUM?  There are some things we
> can save up until after VACUUM is finished.

One thing that would be somewhat helpful would be to be sure that each
of the tables that you are vacuuming is handled in a separate
transaction.

Supposing it's six tables that each take 1/2h to vacuum, if you do
each in a separate transaction, some locks may get established over
and over, but the "lease" will be dropped and renewed each half hour,
which ought to be helpful.  Any MVCC-relevant logic will just have
1/2h periods for which tuples are held onto instead of there being a
big 3h "lock" put on them.  That's not an explicit lock, but rather an
inability to purge entries out...
--
(format nil "~S@~S" "cbbrowne" "gmail.com")
http://www.ntlug.org/~cbbrowne/rdbms.html
If two people love each other, there can be no happy end to it.
-- Hemingway

pgsql-admin by date:

Previous
From: "David F. Skoll"
Date:
Subject: Very busy 24x7 databases and VACUUM
Next
From: Gourish Singbal
Date:
Subject: Need Help in Taking Backups