Thread: Is there a reason _not_ to vacuum continuously?

Is there a reason _not_ to vacuum continuously?

From
"Matt Clark"
Date:
I'm running a load of stress scripts against my staging environment to
simulate user interactions, and watching the various boxen as time goes by.

I noticed that the CPU utilisation on the DB server (PG 7.2.3, RH7.3, Dual
PII 550MHz, 1GB RAM, 1GB database on disk, Single 10k SCSI drive) was
increasing over time, and manually launched a vacuum analyze verbose.

A typical output from the VAV is:

NOTICE:  --Relation mobilepm--
NOTICE:  Index mobilepm_ownerid_idx: Pages 1103; Tuples 32052: Deleted
46012.
        CPU 0.15s/0.66u sec elapsed 14.82 sec.
NOTICE:  Index mobilepm_id_idx: Pages 1113; Tuples 32143: Deleted 46012.
        CPU 0.33s/1.08u sec elapsed 45.89 sec.
NOTICE:  Index mobilepm_ownerid_status_idx: Pages 1423; Tuples 32319:
Deleted 46
012.
        CPU 0.52s/1.05u sec elapsed 54.59 sec.
NOTICE:  Index mobilepm_number_idx: Pages 1141; Tuples 32413: Deleted 46012.
        CPU 0.26s/0.61u sec elapsed 16.13 sec.
NOTICE:  Removed 46012 tuples in 2548 pages.
        CPU 0.88s/0.79u sec elapsed 75.57 sec.
NOTICE:  Pages 3188: Changed 10, Empty 0; Tup 32007: Vac 46012, Keep 11,
UnUsed
0.
        Total CPU 2.56s/4.25u sec elapsed 216.50 sec.
NOTICE:  --Relation pg_toast_112846940--
NOTICE:  Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
        Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE:  Analyzing mobilepm

So you can see that some tables are seeing a hell of a lot of updates.
That's life, and yes, I do need all those indexes :-)

Now I see no drop in performance while the VAV is running, the CPU
utilisation gradually drops from 80% to 30% on the DB server, and life in
general improves.

On the live server (PG 7.2.3, RH7.3, Quad Xeon 700Mhz 1MB cache, 4Gb RAM,
256MB write-back RAID10 over 4 10K disks) I vacuum analyze daily, and vacuum
analyze a couple of key tables every 15 minutes, but my question is...

*** THE QUESTION(S) ***
Is there any reason for me not to run continuous sequential vacuum analyzes?
At least for the 6 tables that see a lot of updates?
I hear 10% of tuples updated as a good time to vac-an, but does my typical
count of 3 indexes per table affect that?

Cheers

Matt


Postscript:  I may have answered my own question while writing this mail.
Under the current stress test load about 10% of the key tables' tuples are
updated between sequential vacuum-analyzes, so the received wisdom on
intervals suggests '0' in my case anyway...




Re: Is there a reason _not_ to vacuum continuously?

From
"scott.marlowe"
Date:
On Wed, 17 Sep 2003, Matt Clark wrote:

> *** THE QUESTION(S) ***
> Is there any reason for me not to run continuous sequential vacuum analyzes?
> At least for the 6 tables that see a lot of updates?
> I hear 10% of tuples updated as a good time to vac-an, but does my typical
> count of 3 indexes per table affect that?

Generally, the only time continuous vacuuming is a bad thing is when you
are I/O bound.  If you are CPU bound, then continuous vacuuming is usually
acceptable.


Re: Is there a reason _not_ to vacuum continuously?

From
Josh Berkus
Date:
Matt,

> Is there any reason for me not to run continuous sequential vacuum analyzes?
> At least for the 6 tables that see a lot of updates?

No.  You've already proven that the performance gain on queries offsets the
loss from the vacuuming.   There is no other "gotcha".

However:
1) You may be able to decrease the required frequency of vacuums by adjusting
your FSM_relations parameter.  Have you played with this at all?  The default
is very low.
2) Are you sure that ANALYZE is needed?   Vacuum is required whenever lots of
rows are updated, but analyze is needed only when the *distribution* of
values changes significantly.
3) using PG 7.3 or less, you will also need to REINDEX these tables+indexes
often (daily?).   This issue will go away in 7.4, which should make you an
early adopter of 7.4.

> I hear 10% of tuples updated as a good time to vac-an, but does my typical
> count of 3 indexes per table affect that?

Not until 7.4.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Is there a reason _not_ to vacuum continuously?

From
"Matt Clark"
Date:
Yes, that makes sense.  My worry is really the analyzes.  I gather/imagine
that:

1)    Indexes on fields that are essentially random gain little from being
analyzed.
2)    Fields that increase monotonically with insertion order have a problem
with index growth in 7.2.  There may be a performance issue connected with
this, although indexes on these fields also gain little from analysis.  So
if I can't vacuum full I'm SOL anyway and should upgrade to 7.4.1 when
available?

Further data:  When I run a vacuum analyze my app servers do see an increase
in response time from PG, even though the DB server is under no more
apparent load.  I can only assume some kind of locking issue.  Is that fair?

M





> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of
> scott.marlowe
> Sent: 17 September 2003 20:55
> To: Matt Clark
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Is there a reason _not_ to vacuum continuously?
>
>
> On Wed, 17 Sep 2003, Matt Clark wrote:
>
> > *** THE QUESTION(S) ***
> > Is there any reason for me not to run continuous sequential
> vacuum analyzes?
> > At least for the 6 tables that see a lot of updates?
> > I hear 10% of tuples updated as a good time to vac-an, but does
> my typical
> > count of 3 indexes per table affect that?
>
> Generally, the only time continuous vacuuming is a bad thing is when you
> are I/O bound.  If you are CPU bound, then continuous vacuuming
> is usually
> acceptable.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>


Re: Is there a reason _not_ to vacuum continuously?

From
"Matt Clark"
Date:
> 2) Are you sure that ANALYZE is needed?   Vacuum is required
> whenever lots of
> rows are updated, but analyze is needed only when the *distribution* of
> values changes significantly.

You are right. I have a related qn in this thread about random vs. monotonic
values in indexed fields.

> 3) using PG 7.3 or less, you will also need to REINDEX these
> tables+indexes
> often (daily?).   This issue will go away in 7.4, which should
> make you an
> early adopter of 7.4.

I understand this needs an exclusive lock on the whole table, which is
simply not possible more than once a month, if that...  Workarounds/hack
suggestions are more than welcome :-)

Ta

M


Re: Is there a reason _not_ to vacuum continuously?

From
Josh Berkus
Date:
Matt,

> I understand this needs an exclusive lock on the whole table, which is
> simply not possible more than once a month, if that...  Workarounds/hack
> suggestions are more than welcome :-)

Would it be reasonable to use partial indexes on the table?

--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                        Josh Berkus
   Complete information technology     josh@agliodbs.com
    and data management solutions     (415) 565-7293
   for law firms, small businesses      fax 621-2533
    and non-profit organizations.     San Francisco


Re: Is there a reason _not_ to vacuum continuously?

From
Christopher Browne
Date:
Oops! josh@agliodbs.com (Josh Berkus) was seen spray-painting on a wall:
>> I understand this needs an exclusive lock on the whole table, which is
>> simply not possible more than once a month, if that...  Workarounds/hack
>> suggestions are more than welcome :-)
>
> Would it be reasonable to use partial indexes on the table?

Dumb question...

... If you create a partial index, does this lock the whole table
while it is being built, or only those records that are affected by
the index definition?

I expect that the answer to that is "Yes, it locks the whole table,"
which means that a partial index won't really help very much, except
insofar as you might, by having it be restrictive in range, lock the
table for a somewhat shorter period of time.

An alternative that may or may not be viable would be to have a series
of tables:

 create table t1 ();
 create table t2 ();
 create table t3 ();
 create table t4 ();

Then create a view:

  create view t as select * from t1 union all select * from t2 union
    all select * from t13 union all select * from t4;

Then you set this view to be updatable, by having a function that
rotates between the 4 tables based on a sequence.

You do SELECT NEXTVAL('t_controller') and the entries start flooding
into t2 rather than t1, or into t3, or into t4, and after t4, they go
back into t1.

When you need to reindex t1, you switch over to load entries into t2,
do maintenance on t1, and then maybe roll back to t1 so you can do the
same maintenance on t2.
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://www3.sympatico.ca/cbbrowne/lisp.html
Linux is like a Vorlon. It is incredibly powerful, gives terse,
cryptic answers and has a lot of things going on in the background.

Re: Is there a reason _not_ to vacuum continuously?

From
"Oliver Scheit"
Date:
3) using PG 7.3 or less, you will also need to REINDEX these
   tables+indexes often (daily?).   This issue will go away
   in 7.4, which should make you an early adopter of 7.4.

Is this true? Haven't heard of this before.
If so, how can this be managed in a cronjob?
For the hourly VACUUM there's vacuumdb, but is
there somehting similar like reindexdb ?

regards,
Oliver Scheit

Re: Is there a reason _not_ to vacuum continuously?

From
"Christopher Kings-Lynne"
Date:
> 3) using PG 7.3 or less, you will also need to REINDEX these
>    tables+indexes often (daily?).   This issue will go away
>    in 7.4, which should make you an early adopter of 7.4.

Try monthly maybe.

> Is this true? Haven't heard of this before.
> If so, how can this be managed in a cronjob?
> For the hourly VACUUM there's vacuumdb, but is
> there somehting similar like reindexdb ?

Yes, there is reindexdb :)

Chris


Re: Is there a reason _not_ to vacuum continuously?

From
"Oliver Scheit"
Date:
> Yes, there is reindexdb :)

Not on my machine. (RH 7.3)

#rpm -qa|grep postgres
postgresql-server-7.2.3-5.73
postgresql-libs-7.2.3-5.73
postgresql-devel-7.2.3-5.73
postgresql-7.2.3-5.73

What package am I missing?

regards,
Oliver Scheit

Re: Is there a reason _not_ to vacuum continuously?

From
"Christopher Kings-Lynne"
Date:
> #rpm -qa|grep postgres
> postgresql-server-7.2.3-5.73
> postgresql-libs-7.2.3-5.73
> postgresql-devel-7.2.3-5.73
> postgresql-7.2.3-5.73
>
> What package am I missing?

It's part of postgresql 7.3.  Just get it from the 7.3 contrib dir - it
works fine with 7.2

Note that this index growth problem has been basically solved as of
postgresql 7.4 - so that is your other option.

Chris




Re: Is there a reason _not_ to vacuum continuously?

From
"Oliver Scheit"
Date:
>> > It's part of postgresql 7.3.  Just get it from the 7.3
>> > contrib dir - it works fine with 7.2
>> That's nice to hear. Thanx for that info.

> That's alright - cron job it for once a month - that's what
> I do.  Basically the problem is that in certain cases
> (monotonically increasing serial indexes) for instance,
> PosgreSQL < 7.4 is unable to fully reclaim all the
> space after a page split.  This means that your indexes
> just gradually grow really large.

Uhm, I'm unable to find reindexdb. I have postgres 7.3.4
on another server, but there's no reindexdb. Can you point
me to the right direction?

Here's what's installed on that machine:
# rpm -qa|grep postgres
postgresql-perl-7.2.3-5.73
postgresql-libs-7.3.4-2PGDG
postgresql-pl-7.3.4-2PGDG
postgresql-7.3.4-2PGDG
postgresql-contrib-7.3.4-2PGDG
postgresql-server-7.3.4-2PGDG

> Yeah - 7.4 beta3 will be out very shortly, you'll probably
> have to wait a month or so for a final 7.4 release.

Old version is rockstable and quite fast, so no problem with
that.

> Even then, ugprading postgresql is always a pain in the neck.

Upgrading to 7.3.4 was quite easy here. dumped the dbs,
uninstalled 7.2, installed 7.3 and let it read the dump. done.

regards,
Oli

Re: Is there a reason _not_ to vacuum continuously?

From
Christopher Kings-Lynne
Date:
(I've sent him reindexdb off-list)

Chris

On Thu, 18 Sep 2003, Oliver Scheit wrote:

> >> > It's part of postgresql 7.3.  Just get it from the 7.3
> >> > contrib dir - it works fine with 7.2
> >> That's nice to hear. Thanx for that info.
>
> > That's alright - cron job it for once a month - that's what
> > I do.  Basically the problem is that in certain cases
> > (monotonically increasing serial indexes) for instance,
> > PosgreSQL < 7.4 is unable to fully reclaim all the
> > space after a page split.  This means that your indexes
> > just gradually grow really large.
>
> Uhm, I'm unable to find reindexdb. I have postgres 7.3.4
> on another server, but there's no reindexdb. Can you point
> me to the right direction?
>
> Here's what's installed on that machine:
> # rpm -qa|grep postgres
> postgresql-perl-7.2.3-5.73
> postgresql-libs-7.3.4-2PGDG
> postgresql-pl-7.3.4-2PGDG
> postgresql-7.3.4-2PGDG
> postgresql-contrib-7.3.4-2PGDG
> postgresql-server-7.3.4-2PGDG
>
> > Yeah - 7.4 beta3 will be out very shortly, you'll probably
> > have to wait a month or so for a final 7.4 release.
>
> Old version is rockstable and quite fast, so no problem with
> that.
>
> > Even then, ugprading postgresql is always a pain in the neck.
>
> Upgrading to 7.3.4 was quite easy here. dumped the dbs,
> uninstalled 7.2, installed 7.3 and let it read the dump. done.
>
> regards,
> Oli
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>


Re: Is there a reason _not_ to vacuum continuously?

From
Josh Berkus
Date:
Guys,

I also wrote a perl script that reindexes all tables, if anyone can't get
reindexdb working or find it for 7.2.

--
Josh Berkus
Aglio Database Solutions
San Francisco