When to do a vacuum for highly active table

From: Markus Benne
We have a highly active table that has virtually all
entries updated every 5 minutes.  Typical size of the
table is 50,000 entries, and entries have grown fat.

We are currently vacuuming hourly, and towards the end
of the hour we are seeing degradation compared to
the top of the hour.

Vacuum is slowly killing our system: it is starting
to take up to 10 minutes, and load at the time of
vacuum is 6+ on a Linux box.  During the vacuum, the
overall system goes unresponsive, then comes back
once the vacuum completes.

If we run vacuum less frequently, degradation
continues to the point that we can't keep up with the
throughput, plus vacuum takes longer anyway.

It's becoming quite a pickle. :-)

We are thinking of splitting the table in two: the
part that updates often, and the part that updates
infrequently, as we suspect that record size impacts
vacuum time.

Any ideas?


Thanks,
Mark

-----------------

Re: When to do a vacuum for highly active table

From: Tom Lane
Markus Benne <thing@m-bass.com> writes:
> We have a highly active table that has virtually all
> entries updated every 5 minutes.  Typical size of the
> table is 50,000 entries, and entries have grown fat.

> We are currently vacuuming hourly, and towards the end
> of the hour we are seeing degradation compared to
> the top of the hour.

On something like this, you really need to be vacuuming more often,
not less; I'd think about how to do it every five or ten minutes
rather than backing off.  With only 50K rows it should really not take
more than a couple of seconds to do the vacuum.  When you wait till
there are 600K dead rows, it's going to take a while, plus you are
suffering across-the-board performance degradation from all the dead
rows.

If you are using PG 8.0, there are some "vacuum cost" knobs you can
fiddle with to slow down vacuum so it doesn't impose as much I/O load.
Ideally you could get it to where you could run vacuum as often as
you need to without noticing much impact on foreground processing.
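
A minimal sketch of those knobs, assuming an 8.0 server (the values
and the table name are only illustrative, not recommendations):

    -- Throttle a manual vacuum for the current session only.
    -- A nonzero vacuum_cost_delay enables cost-based vacuum delay.
    SET vacuum_cost_delay = 10;   -- sleep 10 ms whenever the cost limit is hit
    SET vacuum_cost_limit = 200;  -- accumulated page cost that triggers a sleep
    VACUUM ANALYZE hot_table;     -- "hot_table" is a placeholder name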

If you're not using 8.0 ... maybe it's time to update.

Another thing you might want to do is look at "vacuum verbose" output,
which will give you some idea of the time spent in each step.  It may
be that there are specific aspects that could be improved.
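
For example (table name again a placeholder):

    VACUUM VERBOSE hot_table;  -- prints, for the heap and each index,
                               -- how many dead row versions were removed
                               -- and the time spent on each step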

> We are thinking of splitting the table in two: the
> part that updates often, and the part that updates
> infrequently, as we suspect that record size impacts
> vacuum time.

You just said that virtually all rows update constantly --- where's
the "infrequent" part?

            regards, tom lane

Re: When to do a vacuum for highly active table

From: mark@mark.mielke.cc
On Tue, Aug 30, 2005 at 05:29:17PM -0400, Tom Lane wrote:
> Markus Benne <thing@m-bass.com> writes:
> > We have a highly active table that has virtually all
> > entries updated every 5 minutes.  Typical size of the
> > table is 50,000 entries, and entries have grown fat.
> ...
> > We are thinking of splitting the table in two: the
> > part that updates often, and the part that updates
> > infrequently, as we suspect that record size impacts
> > vacuum time.
> You just said that virtually all rows update constantly --- where's
> the "infrequent" part?

I think he means splitting it vertically instead of horizontally, and
it sounds like an excellent idea, if a large enough portion of each
record is in fact mostly fixed.  Otherwise, PostgreSQL copies the wide,
unchanging data on every update, only to have it expire as part of a
dead row.
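
A minimal sketch of that vertical split, assuming the wide columns
really are mostly fixed (all names here are hypothetical):

    -- Wide, rarely-changing columns go in one table...
    CREATE TABLE item_static (
        id      integer PRIMARY KEY,
        payload text        -- the "fat", mostly-fixed part of the record
    );

    -- ...and the frequently-updated columns in a narrow companion.
    CREATE TABLE item_hot (
        id      integer PRIMARY KEY REFERENCES item_static(id),
        status  integer,
        updated timestamptz
    );

    -- Updates now touch only the narrow table, so each dead row is small:
    UPDATE item_hot SET status = 2, updated = now() WHERE id = 42;

    -- Reads that need both halves pay for a join:
    SELECT s.payload, h.status
      FROM item_static s JOIN item_hot h USING (id);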

I've already started to notice such issues with PostgreSQL - but more
because I'm using low-end hardware - and I'm projecting the effect for
when our database becomes much larger, with much higher demand on it.

This is the sort of scenario where a database without transactional
integrity would significantly outperform one designed around it.  If
records were fixed-size and updated in place, these problems would
occur far less often.  Is it heresy to suggest MySQL in here? :-)

I switched from MySQL to PostgreSQL several months ago, and haven't
looked back - but they do work differently, and for certain uses, one
can destroy the other. Using a MyISAM table would be the way I would
go with this sort of problem.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


Re: When to do a vacuum for highly active table

From: Chris Browne
thing@m-bass.com (Markus Benne) writes:
> We have a highly active table that has virtually all
> entries updated every 5 minutes.  Typical size of the
> table is 50,000 entries, and entries have grown fat.
>
> We are currently vacuuming hourly, and towards the end
> of the hour we are seeing degradation compared to
> the top of the hour.

You're not vacuuming the table nearly often enough.

You should vacuum this table every five minutes, and possibly more
often than that.

[We have some tables like that, albeit smaller than 50K entries, which
we vacuum once per minute in production...]
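
As a sketch, that kind of schedule is just a cron entry wrapping a
plain VACUUM ("mydb" and "hot_table" are placeholder names):

    # crontab: plain VACUUM of the hot table once per minute
    * * * * *  psql -d mydb -c 'VACUUM hot_table' >/dev/null 2>&1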

> We are thinking of splitting the table in two: the part that updates
> often, and the part that updates infrequently, as we suspect that
> record size impacts vacuum time.

There's *some* merit to that.

You might discover that there's a "hot spot" that needs to be vacuumed
once per minute.

But it may be simpler to just hit the table with a vacuum once every
few minutes even though some tuples are seldom updated.
--
output = reverse("gro.gultn" "@" "enworbbc")
http://cbbrowne.com/info/spreadsheets.html
Signs  of  a  Klingon  Programmer #3:  "By   filing this  TPR you have
challenged the honor of my family. Prepare to die!"

Re: When to do a vacuum for highly active table

From: Tom Lane
mark@mark.mielke.cc writes:
> I think he means splitting it vertically instead of horizontally, and
> it sounds like an excellent idea, if a large enough portion of each
> record is in fact mostly fixed.  Otherwise, PostgreSQL copies the wide,
> unchanging data on every update, only to have it expire as part of a
> dead row.

Only up to a point.  Fields that are wide enough to get toasted
out-of-line (multiple Kb) do not get physically copied if there's
a row update that doesn't affect them.  We don't really have enough
information about his table to guess whether there's any point in
manually partitioning the columns, but my leaning would be "probably
not" --- the overhead in joining the resulting two tables would be
high enough that you'd need a heck of a big improvement to justify it.

            regards, tom lane

Re: When to do a vacuum for highly active table

From: Rigmor Ukuhe
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-
> owner@postgresql.org] On Behalf Of Markus Benne
> Sent: Wednesday, August 31, 2005 12:14 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] When to do a vacuum for highly active table
>
> We have a highly active table that has virtually all
> entries updated every 5 minutes.  Typical size of the
> table is 50,000 entries, and entries have grown fat.
>
> We are currently vacuuming hourly, and towards the end
> of the hour we are seeing degradation compared to
> the top of the hour.
>
> Vacuum is slowly killing our system: it is starting
> to take up to 10 minutes, and load at the time of
> vacuum is 6+ on a Linux box.  During the vacuum, the
> overall system goes unresponsive, then comes back
> once the vacuum completes.

Play with the vacuum_cost_delay option.  In our case it made a BIG
difference (going from very heavy hitting to almost unnoticed
vacuuming).
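
For reference, the same knob can also be set cluster-wide in
postgresql.conf (the values here are illustrative only):

    # postgresql.conf -- cost-based vacuum delay (PostgreSQL 8.0)
    vacuum_cost_delay = 10    # ms to sleep when the cost limit is hit; 0 disables
    vacuum_cost_limit = 200   # cost accumulated before each sleep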

Hope it helps.

Rigmor Ukuhe

>
> If we run vacuum less frequently, degradation
> continues to the point that we can't keep up with the
> throughput, plus vacuum takes longer anyway.
>
> It's becoming quite a pickle. :-)
>
> We are thinking of splitting the table in two: the
> part that updates often, and the part that updates
> infrequently, as we suspect that record size impacts
> vacuum time.
>
> Any ideas?
>
>
> Thanks,
> Mark
>
> -----------------


Re: When to do a vacuum for highly active table

From: Chris Browne
rigmor.ukuhe@finestmedia.ee ("Rigmor Ukuhe") writes:

>> -----Original Message-----
>> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-
>> owner@postgresql.org] On Behalf Of Markus Benne
>> Sent: Wednesday, August 31, 2005 12:14 AM
>> To: pgsql-performance@postgresql.org
>> Subject: [PERFORM] When to do a vacuum for highly active table
>>
>> We have a highly active table that has virtually all
>> entries updated every 5 minutes.  Typical size of the
>> table is 50,000 entries, and entries have grown fat.
>>
>> We are currently vacuuming hourly, and towards the end
>> of the hour we are seeing degradation compared to
>> the top of the hour.
>>
>> Vacuum is slowly killing our system: it is starting
>> to take up to 10 minutes, and load at the time of
>> vacuum is 6+ on a Linux box.  During the vacuum, the
>> overall system goes unresponsive, then comes back
>> once the vacuum completes.
>
> Play with the vacuum_cost_delay option.  In our case it made a BIG
> difference (going from very heavy hitting to almost unnoticed
> vacuuming).

That helps only if the ONLY problem you're having is the direct
I/O load of the vacuum.

If part of the problem is that the table is so large that VACUUM takes
4h to complete, leaving a transaction open for 4h and thereby causing
other degradation, then vacuum_cost_delay will have a NEGATIVE impact,
as it will make the vacuum on that table take even /more/ than 4h. :-(

For the above scenario, it is almost certain that the solution comes
in two pieces (sketched after the list):

1.  VACUUM FULL / CLUSTER to bring the size down.

    The table has grown "fat," and no number of repetitions of "plain
    vacuum" will fix this.

2.  Do "plain vacuum" on the table VASTLY more frequently, probably
    every 5 minutes, possibly more often than that.

    By doing this, you prevent things from getting so bad again.
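
A sketch of both steps (names are placeholders; CLUSTER is shown in
its 8.0-era "index ON table" form):

    -- 1. One-time recovery of the bloat.  Both forms take an exclusive
    --    lock and rewrite the table, so run them in a quiet window.
    VACUUM FULL hot_table;
    -- ...or rewrite the table in index order instead:
    CLUSTER hot_table_pkey ON hot_table;

    -- 2. Then keep it from getting fat again: plain (non-FULL) vacuum,
    --    scheduled every few minutes from cron or similar.
    VACUUM ANALYZE hot_table;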

By the way, in this sort of situation, _ANY_ transaction that runs
more than about 5 minutes represents a serious enemy to performance,
as it will tend to cause the "hot" table to "get fatter."
--
(reverse (concatenate 'string "gro.gultn" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/linux.html
TECO Madness: a moment of regret, a lifetime of convenience.
-- Kent Pitman