Re: Autovacuum in the backend - Mailing list pgsql-hackers

From Thomas F. O'Connell
Subject Re: Autovacuum in the backend
Date
Msg-id CD96413A-E1FF-465A-B532-0C935DCE8D47@sitening.com
Whole thread Raw
In response to Re: Autovacuum in the backend  (Gavin Sherry <swm@linuxworld.com.au>)
List pgsql-hackers
Gavin,

For the record, I don't consider myself a PostgreSQL newbie, nor do I
manage any 2 TB databases (much less tables), but I do have an
unusual production use case: thousands (> 10,000) of tables, many of
them inherited, and many of them with hundreds of thousands (a few
with millions) of rows.

Honestly, creating crontab vacuum management for this scenario would
be a nightmare, and pg_autovacuum has been a godsend. Considering the
recent revelations of O(n^2) iterations over table lists in the
current versions and the stated and apparent ease with which this
problem could be solved by integrating the basic functionality of
pg_autovacuum into the backend, I can personally attest to there
being real-world use cases that would benefit tremendously from
integrated autovacuum.

A few months ago, I attempted to solve the wrong problem by
converting a hardcoded threshold into another command-line option. If
I had spotted the O(n^2) problem, I might've spent the time working
on it then instead of the new command-line option. I suppose it's
possible that I'll head down this road anyway if it looks like
integrated pg_autovacuum is going to be put on hold indefinitely
after this discussion.

Anyway, just wanted to throw out some food for thought for the
practicality of a tool like pg_autovacuum.

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jun 16, 2005, at 5:22 PM, Gavin Sherry wrote:

> On Thu, 16 Jun 2005, Alvaro Herrera wrote:
>
>
>> On Thu, Jun 16, 2005 at 04:20:34PM +1000, Gavin Sherry wrote:
>>
>>
>>> 2) By no fault of its own, autovacuum's level of granularity is
>>> the table
>>> level. For people dealing with non-trivial amounts of data (and
>>> we're not
>>> talking gigabytes or terabytes here), this is a serious drawback.
>>> Vacuum
>>> at peak times can cause very intense IO bursts -- even with the
>>> enhancements in 8.0. I don't think the solution to the problem is
>>> to give
>>> users the impression that it is solved and then vacuum their
>>> tables during
>>> peak periods. I cannot stress this enough.
>>>
>>
>> People running systems with petabyte-sized tables can disable
>> autovacuum
>> for those tables, and leave it running for the rest.  Then they can
>> schedule whatever maintenance they see fit on their gigantic tables.
>> Trying to run a database with more than a dozen gigabytes of data
>> without expert advice (or at least reading the manual) would be
>> extremely stupid anyway.
>>
>
> As I've said a few times, I'm not concerned about such users. I'm
> concerned about users with some busy tables of a few hundred
> megabytes. I
> still don't think VACUUM at arbitary times on such tables is suitable.
>
> Thanks,
>
> Gavin


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [PATCHES] Escape handling in strings
Next
From: Michael Glaesemann
Date:
Subject: Re: [PATCHES] Escape handling in strings