Re: vacuum locking - Mailing list pgsql-performance

From Rob Nagler
Subject Re: vacuum locking
Date
Msg-id 16279.4988.496000.539525@gargle.gargle.HOWL
Whole thread Raw
In response to Re: vacuum locking  (Vivek Khera <khera@kcilink.com>)
Responses Re: vacuum locking
Re: vacuum locking
List pgsql-performance
Vivek Khera writes:
> AMI or Adaptec based?

Adaptec, I think.  AIC-7899 LVD SCSI is what dmidecode says, and
Red Hat/Adaptec aacraid driver, Aug 18 2003 is what comes up when it
boots.  I haven't be able to use the aac utilities with this driver,
however, so it's hard to interrogate the device.

> If AMI, make sure it has write-back cache enabled (and you have
> battery backup!), and disable the 'readahead' feature if you can.

I can't do this so easily.  It's at a colo, and it's production.
I doubt this has anything to do with this problem, anyway.  We're
talking about hundreds of megabytes of data.

> What's the disk utilization proir to running vacuum?  If it is
> hovering around 95% or more of capacity, of course you're gonna
> overwhelm it.

Here's the vmstat 5 at a random time:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0 272372  38416  78220 375048   0   3     2     0    0     0   2   2   0
 0  0  0 272372  30000  78320 375660   0   0    34   274  382   284   5   1  94
 0  1  0 272372  23012  78372 375924   0   0    25   558  445   488   8   2  90
 1  0  0 272368  22744  78472 376192   0   6   125   594  364   664   9   3  88

And here's it during vacuum:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  2  1 277292   9620  72028 409664  46  32  4934  4812 1697   966   8   4  88
 0  3  0 277272   9588  72096 412964  61   0  7303  2478 1391   976   3   3  94
 2  2  0 277336   9644  72136 393264 1326  32  2827  2954 1693  1519   8   3  89
The pages are growing proportionately with the number of tuples, btw.
Here's a vacuum snippet from a few days ago after a clean import,
running every 15 minutes:

INFO:  Removed 2192 tuples in 275 pages.
        CPU 0.06s/0.01u sec elapsed 0.91 sec.
INFO:  Pages 24458: Changed 260, Empty 0; Tup 1029223: Vac 2192, Keep 3876, UnUsed 26.
        Total CPU 2.91s/2.22u sec elapsed 65.74 sec.

And here's the latest today, running every 2 hours:

INFO:  Removed 28740 tuples in 1548 pages.
        CPU 0.08s/0.06u sec elapsed 3.73 sec.
INFO:  Pages 27277: Changed 367, Empty 0; Tup 1114178: Vac 28740, Keep 1502, UnUsed 10631.
        Total CPU 4.78s/4.09u sec elapsed 258.10 sec.

The big tables/indexes are taking longer, but it's a big CPU/elapsed
time savings to vacuum every two hours vs every 15 minutes.

There's still the problem that when vacuum is running interactive
performance drops dramatically.  A query that takes a couple of
seconds to run when the db isn't being vacuumed will take minutes when
vacuum is running.  It's tough for me to correlate exactly, but I
suspect that while postgres is vacuuming an index or table, nothing else
runs.  In between relations, other stuff gets to run, and then vacuum
hogs all the resources again.  This could be for disk reasons or
simply because postgres locks the index or table while it is being
vacuumed.  Either way, the behavior is unacceptable.  Users shouldn't
have to wait minutes while the database picks up after itself.

The concept of vacuuming seems to be problematic.  I'm not sure why
the database simply can't garbage collect incrementally.  AGC is very
tricky, especially AGC that involves gigabytes of data on disk.
Incremental garbage collection seems to be what other databases do,
and it's been my experience that other databases don't have the type
of unpredictable behavior I'm seeing with Postgres.  I'd rather the
database be a little bit slower on average than have to figure out the
best time to inconvenience my users.

Since my customer already has Oracle, we'll be running tests in the
coming month(s :-) with Oracle to see how it performs under the same
load and hardware.  I'll keep this group posted.

Rob



pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: slow select
Next
From: Tom Lane
Date:
Subject: Re: vacuum locking