Thread: slow vacuum performance

slow vacuum performance

From
pginfo
Date:
Hi,

I am running pg 7.4.1 on a Linux box.
I have a middle-sized DB with many updates, and after them I try to run
vacuum full analyze.
It takes about 2 h.
If I try to dump and reload the DB, it takes 20 min.

How can I improve the vacuum full analyze time?

My configuration:

shared_buffers = 15000          # min 16, at least max_connections*2, 8KB each
sort_mem = 10000                # min 64, size in KB
vacuum_mem = 32000              # min 1024, size in KB
effective_cache_size = 40000    # typically 8KB each
#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each


regards,
ivan.


Re: slow vacuum performance

From
Bill Moran
Date:
pginfo wrote:
> Hi,
>
> I am running pg 7.4.1 on a Linux box.
> I have a middle-sized DB with many updates, and after them I try to run
> vacuum full analyze.
> It takes about 2 h.
> If I try to dump and reload the DB, it takes 20 min.
>
> How can I improve the vacuum full analyze time?

How often are you vacuuming?  If you've gone a LONG time since the last vacuum,
it can take quite a while, to the point where a dump/restore is faster.

A recent realization that I've had some misconceptions about vacuuming led me
to re-read section 8.2 of the admin guide (on vacuuming) ... I highly suggest
reviewing those 3 pages of the admin manual, as they contain an excellent
description of why databases need to be vacuumed, which one can use to
determine how often vacuuming is necessary.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: slow vacuum performance

From
pginfo
Date:
Hi Bill,
I am vacuuming every 24 h.
I have a cron script for it.
But if I make a massive update (for example, one affecting 1 M rows) and then
start vacuum, it takes the full 2 h.
I will also note that this massive update runs in one transaction (I cannot
update 100K rows at a time and vacuum after each batch).
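
(A minimal sketch of such a script, with the database name, schedule, and
file name invented for illustration:)

  -- maintenance.sql (hypothetical), run nightly from cron with e.g.:
  --   0 3 * * *  psql -d mydb -f maintenance.sql
  -- A plain VACUUM ANALYZE marks dead rows reusable and refreshes
  -- planner statistics without the exclusive lock VACUUM FULL takes.
  VACUUM ANALYZE;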

regards,
ivan.

Bill Moran wrote:

> pginfo wrote:
> > Hi,
> >
> > I am running pg 7.4.1 on a Linux box.
> > I have a middle-sized DB with many updates, and after them I try to run
> > vacuum full analyze.
> > It takes about 2 h.
> > If I try to dump and reload the DB, it takes 20 min.
> >
> > How can I improve the vacuum full analyze time?
>
> How often are you vacuuming?  If you've gone a LONG time since the last vacuum,
> it can take quite a while, to the point where a dump/restore is faster.
>
> A recent realization that I've had some misconceptions about vacuuming led me
> to re-read section 8.2 of the admin guide (on vacuuming) ... I highly suggest
> reviewing those 3 pages of the admin manual, as they contain an excellent
> description of why databases need to be vacuumed, which one can use to
> determine how often vacuuming is necessary.
>
> --
> Bill Moran
> Potential Technologies
> http://www.potentialtech.com




Re: slow vacuum performance

From
"scott.marlowe"
Date:
On Wed, 24 Mar 2004, pginfo wrote:

> Hi,
>
> I am running pg 7.4.1 on a Linux box.
> I have a middle-sized DB with many updates, and after them I try to run
> vacuum full analyze.

Is there a reason to not use just regular vacuum / analyze (i.e. NOT
full)?

> It takes about 2 h.

Full vacuums, by their nature, tend to be a bit slow.  It's better to let
the database achieve a kind of "steady state" with regards to number of
dead tuples, and use regular vacuums to reclaim said space rather than a
full vacuum.
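
Concretely, the difference looks like this (mytable is just a placeholder):

  -- Routine: marks dead rows reusable and updates planner statistics;
  -- runs alongside normal reads and writes.
  VACUUM ANALYZE mytable;

  -- Exceptional: compacts the table and gives disk space back to the
  -- OS, but takes an exclusive lock while it moves rows around, which
  -- is where the 2 h goes.
  VACUUM FULL ANALYZE mytable;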

> How can I improve the vacuum full analyze time?
>
> My configuration:
>
> shared_buffers = 15000          # min 16, at least max_connections*2, 8KB each
> sort_mem = 10000                # min 64, size in KB

You might want to look at dropping sort_mem.  It would appear you've been
going through the postgresql.conf file and bumping up numbers to see what
works and what doesn't.  While most of the settings aren't too dangerous
to crank up a little high, sort_mem is quite dangerous to crank up high,
should you have a lot of people connected who are all sorting.  Note that
sort_mem is a limit PER SORT, not per backend, or per database, or per
user, or even per table, but per sort.  If a query needs to run three or
four sorts, it can use 3 or 4x sort_mem.  If a hundred users do this at
once, they can then use 300 or 400x sort_mem.  You can see where I'm
heading.

Note that for individual sorts in batch files, like import processes, you
can bump up sort_mem with the set command, so you don't have to have a
large setting in postgresql.conf to use a lot of sort mem when you need
to, you can just grab it during that one session.
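
To put numbers on it: at sort_mem = 10000 (roughly 10 MB per sort), a
hundred backends each running a four-sort query could, in the worst case,
ask for around 4 GB.  A sketch of the per-session approach (the values are
only illustrative):

  -- postgresql.conf keeps a conservative default, say sort_mem = 4096.
  -- A batch import session then grabs more just for itself:
  SET sort_mem = 131072;   -- 128 MB (the value is in KB), this session only
  -- ... run the big COPY / index build / ORDER BY here ...
  RESET sort_mem;          -- back to the server default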

> vacuum_mem = 32000              # min 1024, size in KB

If you've got lots of memory, crank up vacuum_mem to the 200 to 500 meg
range and see what happens.
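
In postgresql.conf terms that would be something like (assuming the RAM is
there to spare):

  vacuum_mem = 262144             # 256 MB; the value is in KB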

For a good tuning guide, go here:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html


Re: slow vacuum performance

From
pginfo
Date:
Hi,

scott.marlowe wrote:

> On Wed, 24 Mar 2004, pginfo wrote:
>
> > Hi,
> >
> > I am running pg 7.4.1 on linux box.
> > I have a midle size DB with many updates and after it I try to run
> > vacuum full analyze.
>
> Is there a reason to not use just regular vacuum / analyze (i.e. NOT
> full)?
>

Yes: when I make massive updates (only in my case, of course), for example
2 M rows, I do not expect to have 2 M new rows in the next 180 days. That is
the reason for running vacuum full.
My idea was to free the unneeded space and so have a faster system.
It is possible that I am wrong.
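
(One way to check whether the full vacuum is actually needed is to look at
the dead-row counts first; a sketch, with the table name as a placeholder:)

  -- VACUUM VERBOSE reports, per table, how many dead row versions it
  -- found and reclaimed:
  VACUUM VERBOSE mytable;
  -- It prints lines of the form (numbers invented):
  --   INFO:  "mytable": found 200000 removable, 950000 nonremovable
  --          row versions in 12000 pages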


> > It takes about 2 h.
>
> Full vacuums, by their nature, tend to be a bit slow.  It's better to let
> the database achieve a kind of "steady state" with regards to number of
> dead tuples, and use regular vacuums to reclaim said space rather than a
> full vacuum.
>
> > How can I improve the vacuum full analyze time?
> >
> > My configuration:
> >
> > shared_buffers = 15000          # min 16, at least max_connections*2, 8KB each
> > sort_mem = 10000                # min 64, size in KB
>
> You might want to look at dropping sort_mem.  It would appear you've been
> going through the postgresql.conf file and bumping up numbers to see what
> works and what doesn't.  While most of the settings aren't too dangerous
> to crank up a little high, sort_mem is quite dangerous to crank up high,
> should you have a lot of people connected who are all sorting.  Note that
> sort_mem is a limit PER SORT, not per backend, or per database, or per
> user, or even per table, but per sort.  If a query needs to run three or
> four sorts, it can use 3 or 4x sort_mem.  If a hundred users do this at
> once, they can then use 300 or 400x sort_mem.  You can see where I'm
> heading.
>
> Note that for individual sorts in batch files, like import processes, you
> can bump up sort_mem with the set command, so you don't have to have a
> large setting in postgresql.conf to use a lot of sort mem when you need
> to, you can just grab it during that one session.
>

I know. In my case we are using many IDs declared as varchar/name (I know it
is a bad idea, but we are migrating this system from Oracle), and pg has very
poor performance with varchar/name indexes.
The only solution I found was to increase sort_mem.
But I will try decreasing it and see the result.

> > vacuum_mem = 32000              # min 1024, size in KB
>
> If you've got lots of memory, crank up vacuum_mem to the 200 to 500 meg
> range and see what happens.
>

I will try it today. It is a good idea and I hope it will help.

> For a good tuning guide, go here:
>
> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

  I know it. It is the best guide I have found, and the site is good as well.

Thanks for the help.
ivan.


Re: slow vacuum performance

From
pginfo
Date:

scott.marlowe wrote:

> On Wed, 24 Mar 2004, pginfo wrote:
>
> > Hi,
> >
> > scott.marlowe wrote:
> >
> > > On Wed, 24 Mar 2004, pginfo wrote:
> > >
> > > > Hi,
> > > >
> > > > I am running pg 7.4.1 on a Linux box.
> > > > I have a middle-sized DB with many updates, and after them I try to run
> > > > vacuum full analyze.
> > >
> > > Is there a reason to not use just regular vacuum / analyze (i.e. NOT
> > > full)?
> > >
> >
> > Yes: when I make massive updates (only in my case, of course), for example
> > 2 M rows, I do not expect to have 2 M new rows in the next 180 days. That is
> > the reason for running vacuum full.
> > My idea was to free the unneeded space and so have a faster system.
> > It is possible that I am wrong.
>
> It's all about percentages.  If you've got an average of 5% dead tuples
> with regular vacuuming, then full vacuums won't gain you much, if
> anything.  If you've got 20 dead tuples for each live one, then a full
> vacuum is pretty much a necessity.  The generally accepted best
> performance comes with 5 to 50% or so dead tuples.  Keep in mind, having a
> few dead tuples is actually a good thing, as your database won't grow and
> then shrink its files all the time, but will stay at a steady size.

Thanks for the good analysis,
ivan.



Re: slow vacuum performance

From
"scott.marlowe"
Date:
On Wed, 24 Mar 2004, pginfo wrote:

> Hi,
>
> scott.marlowe wrote:
>
> > On Wed, 24 Mar 2004, pginfo wrote:
> >
> > > Hi,
> > >
> > > I am running pg 7.4.1 on a Linux box.
> > > I have a middle-sized DB with many updates, and after them I try to run
> > > vacuum full analyze.
> >
> > Is there a reason to not use just regular vacuum / analyze (i.e. NOT
> > full)?
> >
>
> Yes: when I make massive updates (only in my case, of course), for example
> 2 M rows, I do not expect to have 2 M new rows in the next 180 days. That is
> the reason for running vacuum full.
> My idea was to free the unneeded space and so have a faster system.
> It is possible that I am wrong.

It's all about percentages.  If you've got an average of 5% dead tuples
with regular vacuuming, then full vacuums won't gain you much, if
anything.  If you've got 20 dead tuples for each live one, then a full
vacuum is pretty much a necessity.  The generally accepted best
performance comes with 5 to 50% or so dead tuples.  Keep in mind, having a
few dead tuples is actually a good thing, as your database won't grow and
then shrink its files all the time, but will stay at a steady size.
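
For measuring that percentage, the two counts VACUUM VERBOSE prints
("found R removable, N nonremovable row versions") are enough; a quick
illustration with invented numbers:

  -- dead percentage = removable / (removable + nonremovable)
  SELECT 100.0 *  100000 / ( 100000 + 1900000) AS dead_pct;  -- ~5%: plain vacuum is fine
  SELECT 100.0 * 2000000 / (2000000 +  100000) AS dead_pct;  -- ~95%: time for a full vacuum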