Thread: Is there a reason _not_ to vacuum continuously?
I'm running a load of stress scripts against my staging environment to simulate user interactions, and watching the various boxen as time goes by. I noticed that the CPU utilisation on the DB server (PG 7.2.3, RH7.3, Dual PII 550MHz, 1GB RAM, 1GB database on disk, Single 10k SCSI drive) was increasing over time, and manually launched a vacuum analyze verbose. A typical output from the VAV is: NOTICE: --Relation mobilepm-- NOTICE: Index mobilepm_ownerid_idx: Pages 1103; Tuples 32052: Deleted 46012. CPU 0.15s/0.66u sec elapsed 14.82 sec. NOTICE: Index mobilepm_id_idx: Pages 1113; Tuples 32143: Deleted 46012. CPU 0.33s/1.08u sec elapsed 45.89 sec. NOTICE: Index mobilepm_ownerid_status_idx: Pages 1423; Tuples 32319: Deleted 46 012. CPU 0.52s/1.05u sec elapsed 54.59 sec. NOTICE: Index mobilepm_number_idx: Pages 1141; Tuples 32413: Deleted 46012. CPU 0.26s/0.61u sec elapsed 16.13 sec. NOTICE: Removed 46012 tuples in 2548 pages. CPU 0.88s/0.79u sec elapsed 75.57 sec. NOTICE: Pages 3188: Changed 10, Empty 0; Tup 32007: Vac 46012, Keep 11, UnUsed 0. Total CPU 2.56s/4.25u sec elapsed 216.50 sec. NOTICE: --Relation pg_toast_112846940-- NOTICE: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0. Total CPU 0.00s/0.00u sec elapsed 0.00 sec. NOTICE: Analyzing mobilepm So you can see that some tables are seeing a hell of a lot of updates. That's life, and yes, I do need all those indexes :-) Now I see no drop in performance while the VAV is running, the CPU utilisation gradually drops from 80% to 30% on the DB server, and life in general improves. On the live server (PG 7.2.3, RH7.3, Quad Xeon 700Mhz 1MB cache, 4Gb RAM, 256MB write-back RAID10 over 4 10K disks) I vacuum analyze daily, and vacuum analyze a couple of key tables every 15 minutes, but my question is... *** THE QUESTION(S) *** Is there any reason for me not to run continuous sequential vacuum analyzes? At least for the 6 tables that see a lot of updates? I hear 10% of tuples updated as a good time to vac-an, but does my typical count of 3 indexes per table affect that? Cheers Matt Postscript: I may have answered my own question while writing this mail. Under the current stress test load about 10% of the key tables' tuples are updated between sequential vacuum-analyzes, so the received wisdom on intervals suggests '0' in my case anyway...
On Wed, 17 Sep 2003, Matt Clark wrote: > *** THE QUESTION(S) *** > Is there any reason for me not to run continuous sequential vacuum analyzes? > At least for the 6 tables that see a lot of updates? > I hear 10% of tuples updated as a good time to vac-an, but does my typical > count of 3 indexes per table affect that? Generally, the only time continuous vacuuming is a bad thing is when you are I/O bound. If you are CPU bound, then continuous vacuuming is usually acceptable.
Matt, > Is there any reason for me not to run continuous sequential vacuum analyzes? > At least for the 6 tables that see a lot of updates? No. You've already proven that the performance gain on queries offsets the loss from the vacuuming. There is no other "gotcha". However: 1) You may be able to decrease the required frequency of vacuums by adjusting your FSM_relations parameter. Have you played with this at all? The default is very low. 2) Are you sure that ANALYZE is needed? Vacuum is required whenever lots of rows are updated, but analyze is needed only when the *distribution* of values changes significantly. 3) using PG 7.3 or less, you will also need to REINDEX these tables+indexes often (daily?). This issue will go away in 7.4, which should make you an early adopter of 7.4. > I hear 10% of tuples updated as a good time to vac-an, but does my typical > count of 3 indexes per table affect that? Not until 7.4. -- -Josh Berkus Aglio Database Solutions San Francisco
Yes, that makes sense. My worry is really the analyzes. I gather/imagine that: 1) Indexes on fields that are essentially random gain little from being analyzed. 2) Fields that increase monotonically with insertion order have a problem with index growth in 7.2. There may be a performance issue connected with this, although indexes on these fields also gain little from analysis. So if I can't vacuum full I'm SOL anyway and should upgrade to 7.4.1 when available? Further data: When I run a vacuum analyze my app servers do see an increase in response time from PG, even though the DB server is under no more apparent load. I can only assume some kind of locking issue. Is that fair? M > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of > scott.marlowe > Sent: 17 September 2003 20:55 > To: Matt Clark > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Is there a reason _not_ to vacuum continuously? > > > On Wed, 17 Sep 2003, Matt Clark wrote: > > > *** THE QUESTION(S) *** > > Is there any reason for me not to run continuous sequential > vacuum analyzes? > > At least for the 6 tables that see a lot of updates? > > I hear 10% of tuples updated as a good time to vac-an, but does > my typical > > count of 3 indexes per table affect that? > > Generally, the only time continuous vacuuming is a bad thing is when you > are I/O bound. If you are CPU bound, then continuous vacuuming > is usually > acceptable. > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match >
> 2) Are you sure that ANALYZE is needed? Vacuum is required > whenever lots of > rows are updated, but analyze is needed only when the *distribution* of > values changes significantly. You are right. I have a related qn in this thread about random vs. monotonic values in indexed fields. > 3) using PG 7.3 or less, you will also need to REINDEX these > tables+indexes > often (daily?). This issue will go away in 7.4, which should > make you an > early adopter of 7.4. I understand this needs an exclusive lock on the whole table, which is simply not possible more than once a month, if that... Workarounds/hack suggestions are more than welcome :-) Ta M
Matt, > I understand this needs an exclusive lock on the whole table, which is > simply not possible more than once a month, if that... Workarounds/hack > suggestions are more than welcome :-) Would it be reasonable to use partial indexes on the table? -- -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Oops! josh@agliodbs.com (Josh Berkus) was seen spray-painting on a wall: >> I understand this needs an exclusive lock on the whole table, which is >> simply not possible more than once a month, if that... Workarounds/hack >> suggestions are more than welcome :-) > > Would it be reasonable to use partial indexes on the table? Dumb question... ... If you create a partial index, does this lock the whole table while it is being built, or only those records that are affected by the index definition? I expect that the answer to that is "Yes, it locks the whole table," which means that a partial index won't really help very much, except insofar as you might, by having it be restrictive in range, lock the table for a somewhat shorter period of time. An alternative that may or may not be viable would be to have a series of tables: create table t1 (); create table t2 (); create table t3 (); create table t4 (); Then create a view: create view t as select * from t1 union all select * from t2 union all select * from t13 union all select * from t4; Then you set this view to be updatable, by having a function that rotates between the 4 tables based on a sequence. You do SELECT NEXTVAL('t_controller') and the entries start flooding into t2 rather than t1, or into t3, or into t4, and after t4, they go back into t1. When you need to reindex t1, you switch over to load entries into t2, do maintenance on t1, and then maybe roll back to t1 so you can do the same maintenance on t2. -- If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me http://www3.sympatico.ca/cbbrowne/lisp.html Linux is like a Vorlon. It is incredibly powerful, gives terse, cryptic answers and has a lot of things going on in the background.
3) using PG 7.3 or less, you will also need to REINDEX these tables+indexes often (daily?). This issue will go away in 7.4, which should make you an early adopter of 7.4. Is this true? Haven't heard of this before. If so, how can this be managed in a cronjob? For the hourly VACUUM there's vacuumdb, but is there somehting similar like reindexdb ? regards, Oliver Scheit
> 3) using PG 7.3 or less, you will also need to REINDEX these > tables+indexes often (daily?). This issue will go away > in 7.4, which should make you an early adopter of 7.4. Try monthly maybe. > Is this true? Haven't heard of this before. > If so, how can this be managed in a cronjob? > For the hourly VACUUM there's vacuumdb, but is > there somehting similar like reindexdb ? Yes, there is reindexdb :) Chris
> Yes, there is reindexdb :) Not on my machine. (RH 7.3) #rpm -qa|grep postgres postgresql-server-7.2.3-5.73 postgresql-libs-7.2.3-5.73 postgresql-devel-7.2.3-5.73 postgresql-7.2.3-5.73 What package am I missing? regards, Oliver Scheit
> #rpm -qa|grep postgres > postgresql-server-7.2.3-5.73 > postgresql-libs-7.2.3-5.73 > postgresql-devel-7.2.3-5.73 > postgresql-7.2.3-5.73 > > What package am I missing? It's part of postgresql 7.3. Just get it from the 7.3 contrib dir - it works fine with 7.2 Note that this index growth problem has been basically solved as of postgresql 7.4 - so that is your other option. Chris
>> > It's part of postgresql 7.3. Just get it from the 7.3 >> > contrib dir - it works fine with 7.2 >> That's nice to hear. Thanx for that info. > That's alright - cron job it for once a month - that's what > I do. Basically the problem is that in certain cases > (monotonically increasing serial indexes) for instance, > PosgreSQL < 7.4 is unable to fully reclaim all the > space after a page split. This means that your indexes > just gradually grow really large. Uhm, I'm unable to find reindexdb. I have postgres 7.3.4 on another server, but there's no reindexdb. Can you point me to the right direction? Here's what's installed on that machine: # rpm -qa|grep postgres postgresql-perl-7.2.3-5.73 postgresql-libs-7.3.4-2PGDG postgresql-pl-7.3.4-2PGDG postgresql-7.3.4-2PGDG postgresql-contrib-7.3.4-2PGDG postgresql-server-7.3.4-2PGDG > Yeah - 7.4 beta3 will be out very shortly, you'll probably > have to wait a month or so for a final 7.4 release. Old version is rockstable and quite fast, so no problem with that. > Even then, ugprading postgresql is always a pain in the neck. Upgrading to 7.3.4 was quite easy here. dumped the dbs, uninstalled 7.2, installed 7.3 and let it read the dump. done. regards, Oli
(I've sent him reindexdb off-list) Chris On Thu, 18 Sep 2003, Oliver Scheit wrote: > >> > It's part of postgresql 7.3. Just get it from the 7.3 > >> > contrib dir - it works fine with 7.2 > >> That's nice to hear. Thanx for that info. > > > That's alright - cron job it for once a month - that's what > > I do. Basically the problem is that in certain cases > > (monotonically increasing serial indexes) for instance, > > PosgreSQL < 7.4 is unable to fully reclaim all the > > space after a page split. This means that your indexes > > just gradually grow really large. > > Uhm, I'm unable to find reindexdb. I have postgres 7.3.4 > on another server, but there's no reindexdb. Can you point > me to the right direction? > > Here's what's installed on that machine: > # rpm -qa|grep postgres > postgresql-perl-7.2.3-5.73 > postgresql-libs-7.3.4-2PGDG > postgresql-pl-7.3.4-2PGDG > postgresql-7.3.4-2PGDG > postgresql-contrib-7.3.4-2PGDG > postgresql-server-7.3.4-2PGDG > > > Yeah - 7.4 beta3 will be out very shortly, you'll probably > > have to wait a month or so for a final 7.4 release. > > Old version is rockstable and quite fast, so no problem with > that. > > > Even then, ugprading postgresql is always a pain in the neck. > > Upgrading to 7.3.4 was quite easy here. dumped the dbs, > uninstalled 7.2, installed 7.3 and let it read the dump. done. > > regards, > Oli > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >
Guys, I also wrote a perl script that reindexes all tables, if anyone can't get reindexdb working or find it for 7.2. -- Josh Berkus Aglio Database Solutions San Francisco