Re: Autovacuum in the backend - Mailing list pgsql-hackers
From: Thomas F. O'Connell
Subject: Re: Autovacuum in the backend
Msg-id: CD96413A-E1FF-465A-B532-0C935DCE8D47@sitening.com
In response to: Re: Autovacuum in the backend (Gavin Sherry <swm@linuxworld.com.au>)
List: pgsql-hackers
Gavin,

For the record, I don't consider myself a PostgreSQL newbie, nor do I manage any 2 TB databases (much less tables), but I do have an unusual production use case: thousands (> 10,000) of tables, many of them inherited, and many of them with hundreds of thousands (a few with millions) of rows. Honestly, creating crontab vacuum management for this scenario would be a nightmare, and pg_autovacuum has been a godsend.

Considering the recent revelations of O(n^2) iterations over table lists in the current versions, and the stated and apparent ease with which this problem could be solved by integrating the basic functionality of pg_autovacuum into the backend, I can personally attest that there are real-world use cases that would benefit tremendously from integrated autovacuum.

A few months ago, I attempted to solve the wrong problem by converting a hardcoded threshold into another command-line option. If I had spotted the O(n^2) problem, I might've spent the time working on it then instead of the new command-line option. I suppose it's possible that I'll head down this road anyway if it looks like integrated pg_autovacuum is going to be put on hold indefinitely after this discussion.

Anyway, just wanted to throw out some food for thought on the practicality of a tool like pg_autovacuum.

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i™
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jun 16, 2005, at 5:22 PM, Gavin Sherry wrote:

> On Thu, 16 Jun 2005, Alvaro Herrera wrote:
>
>> On Thu, Jun 16, 2005 at 04:20:34PM +1000, Gavin Sherry wrote:
>>
>>> 2) By no fault of its own, autovacuum's level of granularity is the
>>> table level. For people dealing with non-trivial amounts of data (and
>>> we're not talking gigabytes or terabytes here), this is a serious
>>> drawback. Vacuum at peak times can cause very intense IO bursts --
>>> even with the enhancements in 8.0. I don't think the solution to the
>>> problem is to give users the impression that it is solved and then
>>> vacuum their tables during peak periods. I cannot stress this enough.
>>
>> People running systems with petabyte-sized tables can disable
>> autovacuum for those tables, and leave it running for the rest. Then
>> they can schedule whatever maintenance they see fit on their gigantic
>> tables. Trying to run a database with more than a dozen gigabytes of
>> data without expert advice (or at least reading the manual) would be
>> extremely stupid anyway.
>
> As I've said a few times, I'm not concerned about such users. I'm
> concerned about users with some busy tables of a few hundred
> megabytes. I still don't think VACUUM at arbitrary times on such
> tables is suitable.
>
> Thanks,
>
> Gavin
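[Editor's note: the per-table opt-out Alvaro describes was eventually expressed as a storage parameter in PostgreSQL releases well after this thread (8.3 and later). A minimal sketch of that approach, using a hypothetical table name `huge_table` and database name `mydb`:]

```sql
-- Disable autovacuum for one very large table, leaving it
-- enabled for the rest of the database (PostgreSQL 8.3+ syntax;
-- huge_table is a hypothetical name for illustration).
ALTER TABLE huge_table SET (autovacuum_enabled = false);

-- Then schedule manual maintenance off-peak instead, e.g. a
-- weekly crontab entry running:
--   psql -d mydb -c 'VACUUM ANALYZE huge_table;'
```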