Thread: postgresql meltdown on PlanetMath.org

postgresql meltdown on PlanetMath.org

From
Aaron Krowne
Date:
Hi all.

As the topic suggests, I am having fairly critical troubles with
postgresql on PlanetMath.org (a site which I run).  You can go there and
try to pull up some entries and you will see the problem: everything is
incredibly slow.

It is hard to pinpoint when this began happening, but I've tried a
variety of optimizations to fix it, all of which have failed.

First: the machine.  The machine is not too spectacular, but it is not so
bad that it should account for the performance we are currently seeing.  It
is a dual PIII-650 with 512MB of RAM and a 20gb IDE drive (yes, DMA is
on).  There is plenty of free space on the drive.

Now, the optimisations I have tried:

- Using hash indices everywhere.  A few months ago, I did this, and
  there was a dramatic and instant speed up.  However, this began
  degenerating.  I also noticed in the logs that there was deadlock
  happening all over the place.  The server response time was
  intolerable so I figured the deadlock might have something to do with
  this, and eliminated all hash indices (replaced with normal BTree
  indices).

- Going back to BTrees yielded a temporary respite, but soon enough the
  server was back to half a minute to pull up an already-cached entry,
  which is of course crazy.

- I then tried increasing the machine's shared memory max to 75% of the
  physical memory, and scaled postgresql's buffers accordingly.  This
  also sped things up for a while, but again resulted in eventual
  degeneration.  Even worse, there were occasional crashes due to
  running out of memory that (according to my calculations) shouldn't
  have been happening.

- Lastly, I tried reducing the shared memory max and limiting postgresql
  to more conservative values, although still not to the out-of-box
  values.  Right now shared memory max on the system is 128mb,
  postgres's shared buffers are at 64mb, sort_mem is at 16mb, and
  effective cache size is at 10mb.

For perspective, the size of the PlanetMath database dump is 24mb.  It
should be able to fit in memory easily, so I'm not sure what I'm doing
wrong regarding the caching.

For the most trivial request, Postgresql takes up basically all the CPU
for the duration of the request.  The load average of the machine is
over-unity at all times, sometimes as bad as the 30s.  None of
this happens without postgres running, so it is definitely the culprit.

The site averages about one hit every twenty seconds.  This should not
be an overwhelming load, especially for what is just pulling up cached
information 99% of the time.

Given this scenario, can anyone advise?  I am particularly puzzled as to
why everything I tried initially helped, but always degenerated rather
rapidly to a near standstill.  It seems to me that everything should be
able to be cached in memory with no problem; perhaps I need to force
this more explicitly.

My next step, if I cannot fix this, is to try mysql =(

Anyway, whoever helps would be doing a great service to many who use
PlanetMath =)  It'd be much appreciated.

Aaron Krowne



Re: postgresql meltdown on PlanetMath.org

From
Sean Chittenden
Date:
> As the topic suggests, I am having fairly critical troubles with
> postgresql on PlanetMath.org (a site which I run).  You can go there and
> try to pull up some entries and you will see the problem: everything is
> incredibly slow.

Have you read the following?

http://developer.postgresql.org/docs/postgres/performance-tips.html

> First: the machine.  The machine is not too spectacular, but it is not so
> bad that it should account for the performance we are currently seeing.  It
> is a dual PIII-650 with 512MB of RAM and a 20gb IDE drive (yes, DMA is
> on).  There is plenty of free space on the drive.

This shouldn't be an issue for the load you describe.  A P-100 should
be okay, but it depends on the queries you're performing.

> Now, the optimisations I have tried:

*) Stick with btree's.

> - I then tried increasing the machines shared memory max to 75% of the
>   physical memory, and scaled postgresql's buffers accordingly.  This
>   also sped things up for a while, but again resulted in eventual
>   degeneration.  Even worse, there were occasional crashes due to
>   running out of memory that (according to my calculations) shouldn't
>   have been happening.

*) Don't do this, go back to near default levels.  I bet this is
 hurting your setup.

> - Lastly, I tried reducing the shared memory max and limiting postgresql
>   to more conservative values, although still not to the out-of-box
>   values.  Right now shared memory max on the system is 128mb,
>   postgres's shared buffers are at 64mb, sort_mem is at 16mb, and
>   effective cache size is at 10mb.

*) You shouldn't have to do this either.

> For perspective, the size of the PlanetMath database dump is 24mb.
> It should be able to fit in memory easily, so I'm not sure what I'm
> doing wrong regarding the caching.

I hate to say this, but this sounds like a config error.  :-/

> For the most trivial request, Postgresql takes up basically all the
> CPU for the duration of the request.  The load average of the
> machine is over-unity at all times, sometimes as bad as being the
> 30's.  None of this happens without postgres running, so it is
> definitely the culprit.

*) Send an EXPLAIN statement as specified here:

http://developer.postgresql.org/docs/postgres/performance-tips.html#USING-EXPLAIN
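
For example, something along these lines for one of the queries behind an
entry page (the table and column names here are placeholders for whatever
your slow queries actually touch):

EXPLAIN ANALYZE
SELECT * FROM objects WHERE objectid = 12345;

Post the output along with the table and index definitions (\d objects).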

> The site averages about one hit every twenty seconds.  This should not
> be an overwhelming load, especially for what is just pulling up cached
> information 99% of the time.

*) Have you done a vacuum analyze?

http://developer.postgresql.org/docs/postgres/populate.html#POPULATE-ANALYZE
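
If not, it's just (run as the table owner or a superuser):

VACUUM ANALYZE;            -- whole database
VACUUM ANALYZE objects;    -- or a single table ("objects" is a made-up name)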

> Given this scenario, can anyone advise?  I am particularly puzzled
> as to why everything I tried initially helped, but always
> degenerated rather rapidly to a near standstill.  It seems to me
> that everything should be able to be cached in memory with no
> problem, perhaps I need to force this more explicitly.

*) Send the EXPLAIN output and we can work from there.

> My next step, if I cannot fix this, is to try mysql =(

Bah, don't throw down the gauntlet, it's pretty clear this is a local
issue and not a problem with the DB.  :)

-sc

--
Sean Chittenden

Re: postgresql meltdown on PlanetMath.org

From
Tom Lane
Date:
Aaron Krowne <akrowne@vt.edu> writes:
> As the topic suggests, I am having fairly critical troubles with
> postgresql on PlanetMath.org (a site which I run).

Um ... not meaning to insult your intelligence, but how often do you
vacuum?  Also, exactly what Postgres version are you running?  Can
you show us EXPLAIN ANALYZE results for some of the slow queries?

            regards, tom lane

Re: postgresql meltdown on PlanetMath.org

From
Joe Conway
Date:
Aaron Krowne wrote:
> Given this scenario, can anyone advise?  I am particularly puzzled as to
> why everything I tried initially helped, but always degenerated rather
> rapidly to a near standstill.  It seems to me that everything should be
> able to be cached in memory with no problem, perhaps I need to force
> this more explicitly.

Basic guidance:
- Keep shared memory use reasonable; your final settings of 64M shared
   buffers and 16M sort_mem sound OK. In any case, be sure you're not
   disk-swapping.
- If you don't already, run VACUUM ANALYZE on some regular schedule
   (how often depends on your data turnover rate)
- Possibly consider running REINDEX periodically (see the example after
   this list)
- Post the SQL and EXPLAIN ANALYZE output for the queries causing the
   worst of your woes to the list
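
For the REINDEX item, the commands are along these lines ("objects" and
"objects_pkey" are made-up names; note REINDEX takes an exclusive lock on
the table, so schedule it for a quiet period):

REINDEX TABLE objects;         -- rebuild all indexes on one table
REINDEX INDEX objects_pkey;    -- or rebuild just one index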

Explanations of these can be found by searching the list archives and
reading the related sections of the manual.

A few questions:
- What version of Postgres?
- Have you run VACUUM FULL ANALYZE lately (or at least VACUUM ANALYZE)?
- Does the database see mostly SELECTs and INSERTs, or are there many
   UPDATEs and/or DELETEs too?
- Are all queries slow, or particular ones?

HTH,
Joe



Re: postgresql meltdown on PlanetMath.org

From
Aaron Krowne
Date:
> - Keep shared memory use reasonable; your final settings of 64M shared
>   buffers and 16M sort_mem sound OK. In any case, be sure you're not
>   disk-swapping.

Yeah, those seem like reasonable values to me.  But I am not sure I'm
not disk-swapping; in fact, it is almost certainly going on here big-time.

> - If you don't already, run VACUUM ANALYZE on some regular schedule
>   (how often depends on your data turnover rate)

I've done it here and there, especially when things seem slow.  Never
seems to help much; the data turnover isn't high.

> - Possibly consider running REINDEX periodically

Ok thats a new one, I'll try that out.

> - Post the SQL and EXPLAIN ANALYZE output for the queries causing the
>   worst of your woes to the list
> - Are all queries slow, or particular ones?

I'm grouping two separate things together to reply to,  because the
second point answers the first: there's really no single culprit.  Every
SELECT has a lag on the scale of a second; resolving all of the foreign
keys in various tables to construct a typical data-rich page piles up
many of these.   I'm assuming the badness of this depends on how much
swapping is going on.

> Explanations of these can be found by searching the list archives and
> reading the related sections of the manual.

Will check that out, thanks.

> A few questions:
> - What version of Postgres?

7.2.1

> - Have you run VACUUM FULL ANALYZE lately (or at least VACUUM ANALYZE)?

Yes, after a particularly bad slowdown... it didn't seem to fix things.

> - Does the database see mostly SELECTs and INSERTs, or are there many
>   UPDATEs and/or DELETEs too?

Almost exclusively SELECTs.

OK, I have just run a VACUUM FULL ANALYZE and things seem much better...
which would be the first time it's really made a difference =)  I tried
comparing an EXPLAIN ANALYZE of a single row select on the main objects
table before and after the vacuum, and the plan didn't change
(sequential scan still), but the response time went from ~1 second to
~5msec!  I'm not really sure what could have happened here
behind-the-scenes since it didn't start using the index, and there
probably weren't more than 10% updated/added rows since the last VACUUM.

I actually thought I had a task scheduled which was running a VACUUM
periodically, but maybe it broke for some reason or another.  Still, I
have not been getting consistent results from running VACUUMs, so I'm
not entirely confident that the book is closed on the problem.

Thanks for your help.

apk

Re: postgresql meltdown on PlanetMath.org

From
Aaron Krowne
Date:
> Have you read the following?
> http://developer.postgresql.org/docs/postgres/performance-tips.html

Yup.  I would never go and bother real people without first checking the
manual, but I bet you get a lot of that =)

> This shouldn't be an issue for the load you describe.  A p-100 should
> be okay, but it depends on your queries that you're performing.

Mostly just gather-retrieval based on unique identifier keys in a bunch
of tables.  Really mundane stuff.

> *) Stick with btree's.

Yeah, that saddens me, though =)  When I initially switched to hashes,
things were blazing.  This application makes heavy use of keys and equal
comparisons on indices, so hashes are really the optimal index
structure.  I'd like to be able to go back to using them some day... if
not for the concurrency issue, which seems like it should be fixable
(even having mutually exclusive locking on the entire index would
probably be fine for this application and would prevent deadlock).
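
(For the record, the sort of thing I had was simply the following, with
made-up names here for illustration:)

-- what I ran while experimenting with hash indices:
CREATE INDEX objects_uid_hash ON objects USING hash (uid);
-- and the btree equivalent I have gone back to:
CREATE INDEX objects_uid_btree ON objects USING btree (uid);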

> > - I then tried increasing the machines shared memory max to 75% of the
> >   physical memory, and scaled postgresql's buffers accordingly.  This
> *) Don't do this, go back to near default levels.  I bet this is
>  hurting your setup.
> > - Lastly, I tried reducing the shared memory max and limiting postgresql
> >   to more conservative values, although still not to the out-of-box
> >   values.  Right now shared memory max on the system is 128mb,
> >   postgres's shared buffers are at 64mb, sort_mem is at 16mb, and
> >   effective cache size is at 10mb.
> *) You shouldn't have to do this either.

Well, I've now been advised, at one point or another, that each of the
three approaches I have tried (aggressive buffering, moderate buffering,
and default conservative buffering) is the best way.

Perhaps you could explain to me why the system shouldn't be ok with the
moderate set of buffer sizes on a 512mb machine?  I don't really know
enough about the internals of postgres to be doing anything but voodoo
when I change the values.

> I hate to say this, but this sounds like a config error.  :-/

That's better than a hardware error!  This is what I wanted to hear =)

> *) Have you done a vacuum analyze?

See previous message to list (summary: it worked this time, but usually
it does not help.)

Thanks,

Aaron Krowne

Re: postgresql meltdown on PlanetMath.org

From
Sean Chittenden
Date:
> > > - Lastly, I tried reducing the shared memory max and limiting postgresql
> > >   to more conservative values, although still not to the out-of-box
> > >   values.  Right now shared memory max on the system is 128mb,
> > >   postgres's shared buffers are at 64mb, sort_mem is at 16mb, and
> > >   effective cache size is at 10mb.
> > *) You shouldn't have to do this either.
>
> Well, I've now been advised that the best way is all 3 that I have
> tried (among aggressive buffering, moderate buffering, and default
> conservative buffering).
>
> Perhaps you could explain to me why the system shouldn't be ok with
> the moderate set of buffer sizes on a 512mb machine?  I don't really
> know enough about the internals of postgres to be doing anything but
> voodoo when I change the values.

Honestly?  The defaults are small, but they're not small enough to
give you the lousy performance you were describing.  If your buffers
are too high, or enough other things are using up KVM/system memory,
the contention can cause thrashing/swapping... and it wasn't clear
that that wasn't happening to you.  The defaults shouldn't, under any
non-embedded circumstance, cause problems on machines faster than
233MHz; they're just too conservative to do any harm.  :)

> > *) Have you done a vacuum analyze?
>
> See previous message to list (summary: it worked this time, but
> usually it does not help.)

Hrmm... ENOTFREEBSD, eh?


http://www.freebsd.org/cgi/cvsweb.cgi/ports/databases/postgresql7/files/502.pgsql?rev=1.5&content-type=text/x-cvsweb-markup

You may want to set up a nightly vacuum/backup procedure.  Palle
Girgensohn <girgen@pingpong.net> has written a really nice and simple
script that's been in use for ages on FreeBSD PostgreSQL installations
for making sure that you don't have this problem.

Actually, it'd be really cool to lobby to get this script added to the
base PostgreSQL installation; that way you wouldn't have this
problem... it'd also dramatically increase the number of nightly
backups performed for folks if a default script did this along with
vacuuming.  -sc


--
Sean Chittenden

Re: postgresql meltdown on PlanetMath.org

From
Aaron Krowne
Date:
> You may want to setup a nightly vacuum/backup procedure.  Palle
> Girgensohn <girgen@pingpong.net> has written a really nice and simple
> script that's been in use for ages on FreeBSD PostgreSQL installations
> for making sure that you don't have this problem.
>
> Actually, it'd be really cool to lobby to get this script added to the
> base PostgreSQL installation that way you wouldn't have this
> problem... it'd also dramatically increase the number of nightly
> backups performed for folks if a default script does this along with
> vacuuming.  -sc

*Actually*, I just double checked, and I was not hallucinating: I *do*
have a nightly vacuum script... because Debian postgres comes with it =)

So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
VACUUM ANALYZE made all the difference.  Is this possible (the latter,
we know the former is possible...)?

apk

Re: postgresql meltdown on PlanetMath.org

From
Sean Chittenden
Date:
> > You may want to setup a nightly vacuum/backup procedure.  Palle
> > Girgensohn <girgen@pingpong.net> has written a really nice and simple
> > script that's been in use for ages on FreeBSD PostgreSQL installations
> > for making sure that you don't have this problem.
> >
> > Actually, it'd be really cool to lobby to get this script added to the
> > base PostgreSQL installation that way you wouldn't have this
> > problem... it'd also dramatically increase the number of nightly
> > backups performed for folks if a default script does this along with
> > vacuuming.  -sc
>
> *Actually*, I just double checked, and I was not hallucinating: I *do*
> have a nightly vacuum script... because Debian postgres comes with it =)

Cool, glad to hear other installations are picking up doing this.

> So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
> VACUUM ANALYZE made all the difference.  Is this possible (the latter,
> we know the former is possible...)?

You shouldn't have to do a VACUUM FULL.  Upgrade your PostgreSQL
installation if you can (to the most recent release if possible); there
have been many performance updates and VACUUM fixes worth noting.  Check the
release notes starting with your version and read through them up to
the current release... you'll be amazed at all the work that's been
done, some of which it looks like may affect your installation.

http://developer.postgresql.org/docs/postgres/release.html

-sc

--
Sean Chittenden

Re: postgresql meltdown on PlanetMath.org

From
Tom Lane
Date:
Aaron Krowne <akrowne@vt.edu> writes:
> So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
> VACUUM ANALYZE made all the difference.  Is this possible (the latter,
> we know the former is possible...)?

If your FSM parameters in postgresql.conf are too small, then plain
vacuums might have failed to keep up with the available free space,
leading to a situation where vacuum full is essential.  Did you happen
to notice whether the vacuum full shrunk the database's disk footprint
noticeably?

            regards, tom lane

Re: postgresql meltdown on PlanetMath.org

From
Joe Conway
Date:
Aaron Krowne wrote:
>>- What version of Postgres?
> 7.2.1

You should definitely look at upgrading, at least to 7.2.4 (which you
can do without requiring a dump/reload cycle), but better yet to 7.3.2
(which will require a dump/reload cycle). I don't know that it will fix your
specific issue, but there were some critical bug fixes between 7.2.1 and
7.2.4.

>>- Does the database see mostly SELECTs and INSERTs, or are there many
>>  UPDATEs and/or DELETEs too?
>
> Almost exclusively SELECTs.
>
> OK, I have just run a VACUUM FULL ANALYZE and things seem much better...

Hmmm, do you periodically do large updates or otherwise turn over rows
in batches?

> which would be the first time its really made a difference =)  I tried
> comparing an EXPLAIN ANALYZE of a single row select on the main objects
> table before and after the vacuum, and the plan didn't change
> (sequential scan still), but the response time went from ~1 second to
> ~5msec!  I'm not really sure what could have happened here
> behind-the-scenes since it didn't start using the index, and there
> probably weren't more than 10% updated/added rows since the last VACUUM.

If your app is mostly doing equi-lookups by primary key, and indexes
aren't being used (I think I saw you mention that on another post), then
something else is still wrong. Please pick one or two typical queries
that are doing seq scans and post the related table definitions,
indexes, SQL, and EXPLAIN ANALYZE. I'd bet you are getting bitten by a
datatype mismatch or something.
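
A classic example of the kind of mismatch I mean (hypothetical names; this
is the planner behavior as of 7.2/7.3): an int8 column compared against a
plain integer literal won't use its index unless you cast or quote the
constant:

-- seq scan: the literal 42 is int4, fid is int8
EXPLAIN ANALYZE SELECT * FROM objects WHERE fid = 42;

-- index scan: the cast makes the types match
EXPLAIN ANALYZE SELECT * FROM objects WHERE fid = 42::int8;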

Joe


Re: postgresql meltdown on PlanetMath.org

From
"Christopher Kings-Lynne"
Date:
> - Lastly, I tried reducing the shared memory max and limiting postgresql
>   to more conservative values, although still not to the out-of-box
>   values.  Right now shared memory max on the system is 128mb,
>   postgres's shared buffers are at 64mb, sort_mem is at 16mb, and
>   effective cache size is at 10mb.

I found that 5000 shared buffers gave the best performance on my system.
However, your problems are probably due to not running vacuum,
analyze, reindex, etc.  Your queries may not be effectively indexed -
EXPLAIN ANALYZE them all.

Chris


Re: postgresql meltdown on PlanetMath.org

From
Logan Bowers
Date:
I don't know what your definition of "high" is, but I do find that
turnover can degrade performance over time.  Perhaps one of the devs can
enlighten me, but I have a database that turns over ~100,000 rows/day that
does appear to slowly get worse.  The updates are done in batches and I
"VACUUM" and "VACUUM ANALYZE" after each batch (three/day) but I found
that over time simple queries would start to hit the disk more and more.

A "select count(*) FROM tblwordidx" initially took about 1 second to
return a count of 2 million but after a few months it took several minutes
of really hard HDD grinding.  Also, the database only had a couple hundred
megs of data in it, but the db was taking up 8-9 GB of disk space.  I'm
thinking data fragmentation is ruining cache performance?  When I did a
dump/restore and updated from 7.2.1 to 7.3.1, queries were zippy again.
But now it is starting to slow...  I have yet to measure the effects of a
VACUUM FULL, however.  I'll try it and report back...


Logan Bowers

On Sun, 16 Mar 2003, Aaron Krowne wrote:

<snip>
> I've done it here and there, especially when things seem slow.  Never
> seems to help much; the data turnover isn't high.
>
<snip>

Re: postgresql meltdown on PlanetMath.org

From
Sean Chittenden
Date:
> I don't know what your definition of "high" is, but I do find that
> turnover can degrade performance over time.  Perhaps one of the devs
> can enlighten me, but I have a database that turns over ~100,000
> rows/day that does appear to slowly get worse.  The updates are done
> in batches and I "VACUUM" and "VACUUM ANALYZE" after each batch
> (three/day) but I found that over time simple queries would start to
> hit the disk more and more.

Creeping index syndrome.  Tom recently fixed this in HEAD.  Try the
latest copy from the repo and see if this solves your problems.

> A "select count(*) FROM tblwordidx" initially took about 1 second to
> return a count of 2 million but after a few months it took several
> minutes of really hard HDD grinding.

That's because there are dead entries in the index that weren't being
reused or cleaned up.  As I said, this has been fixed.

-sc


PS  It's good to see you around again.  :)

--
Sean Chittenden

Re: postgresql meltdown on PlanetMath.org

From
"Neil Conway"
Date:
Sean Chittenden said:
>> A "select count(*) FROM tblwordidx" initially took about 1 second to
>> return a count of 2 million but after a few months it took several
>> minutes of really hard HDD grinding.
>
> That's because there are dead entries in the index that weren't being
> reused or cleaned up.  As I said, this has been fixed.

That's doubtful: "select count(*) FROM foo" won't use an index. There are
a bunch of other factors (e.g. dead heap tuples, changes in the pages
cached in the buffer, disk fragmentation, etc.) that could affect
performance in that situation, however.

Cheers,

Neil



Re: postgresql meltdown on PlanetMath.org

From
Sean Chittenden
Date:
> >> A "select count(*) FROM tblwordidx" initially took about 1 second to
> >> return a count of 2 million but after a few months it took several
> >> minutes of really hard HDD grinding.
> >
> > That's because there are dead entries in the index that weren't being
> > reused or cleaned up.  As I said, this has been fixed.
>
> That's doubtful: "select count(*) FROM foo" won't use an
> index. There are a bunch of other factors (e.g. dead heap tuples,
> changes in the pages cached in the buffer, disk fragmentation, etc.)
> that could affect performance in that situation, however.

*blush* Yeah, jumped the gun on that when I read that queries were
getting slower (churn of an index == slow creeping death for
performance).  A SELECT COUNT(*), however, wouldn't be affected by the
index growth problem.  Is the COUNT() on a view that uses an index?  I
haven't had any real problems with this kind of degradation outside of
indexes.  :-/  -sc

--
Sean Chittenden

Re: postgresql meltdown on PlanetMath.org

From
Tom Lane
Date:
"Neil Conway" <neilc@samurai.com> writes:
> Sean Chittenden said:
> A "select count(*) FROM tblwordidx" initially took about 1 second to
> return a count of 2 million but after a few months it took several
> minutes of really hard HDD grinding.
>>
>> That's because there are dead entries in the index that weren't being
>> reused or cleaned up.  As I said, this has been fixed.

> That's doubtful: "select count(*) FROM foo" won't use an index.

To know what's going on, as opposed to guessing about it, we'd need to
know something about the physical sizes of the table and its indexes.
"vacuum verbose" output would be instructive...

But my best theorizing-in-advance-of-the-data guess is that Logan's
FSM settings are too small, causing free space to be leaked over time.
If a vacuum full restores the original performance then that's probably
the right answer.

            regards, tom lane

Re: postgresql meltdown on PlanetMath.org

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Aaron Krowne <akrowne@vt.edu> writes:
> > So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
> > VACUUM ANALYZE made all the difference.  Is this possible (the latter,
> > we know the former is possible...)?
>
> If your FSM parameters in postgresql.conf are too small, then plain
> vacuums might have failed to keep up with the available free space,
> leading to a situation where vacuum full is essential.  Did you happen
> to notice whether the vacuum full shrunk the database's disk footprint
> noticeably?

This seems to be a frequent problem.

Is there any easy way to check an existing table for lost free space?

Is there any way vacuum could do this check and print a warning suggesting
a vacuum full and/or increasing the FSM parameters if it finds such a situation?

--
greg

Re: postgresql meltdown on PlanetMath.org

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Is there any easy way to check an existing table for lost free space?

contrib/pgstattuple gives a pretty good set of statistics.  (I thought
VACUUM VERBOSE printed something about total free space in a table,
but apparently only VACUUM FULL VERBOSE does.  Maybe should change
that.)
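
Once the contrib module is installed, usage is roughly (exact call and
output format vary a bit between releases; see contrib/pgstattuple/README):

SELECT pgstattuple('tblwordidx');

which reports tuple counts, dead-tuple space, and free space for the table.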

> Is there any way vacuum could do this check and print a warning suggesting
> a vacuum full and/or increasing the FSM parameters if it finds such a situation?

In CVS tip, a whole-database VACUUM VERBOSE gives info about the free
space map occupancy, eg

INFO:  Free space map: 224 relations, 450 pages stored; 3776 total pages needed.
        Allocated FSM size: 1000 relations + 20000 pages = 178 KB shared mem.

If the "pages needed" number is drastically larger than the allocated
FSM size, you've got a problem.  (I don't think you need to panic if
it's just a little larger, though.  10X bigger would be time to do
something, 2X bigger maybe not.)

            regards, tom lane

Re: postgresql meltdown on PlanetMath.org

From
Chris Sutton
Date:
On Mon, 17 Mar 2003, Tom Lane wrote:

> In CVS tip, a whole-database VACUUM VERBOSE gives info about the free
> space map occupancy, eg
>
> INFO:  Free space map: 224 relations, 450 pages stored; 3776 total pages needed.
>         Allocated FSM size: 1000 relations + 20000 pages = 178 KB shared mem.
>

How do you get this information?

I just ran VACUUM VERBOSE and it spit out a bunch of information per
relation, but nothing about total relations and FSM space.  We are running
7.3.2.

Chris


Re: postgresql meltdown on PlanetMath.org

From
Max Baker
Date:
On Sun, Mar 16, 2003 at 03:37:32AM -0500, Tom Lane wrote:
> Aaron Krowne <akrowne@vt.edu> writes:
> > So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
> > VACUUM ANALYZE made all the difference.  Is this possible (the latter,
> > we know the former is possible...)?
>
> If your FSM parameters in postgresql.conf are too small, then plain
> vacuums might have failed to keep up with the available free space,
> leading to a situation where vacuum full is essential.  Did you happen
> to notice whether the vacuum full shrunk the database's disk footprint
> noticeably?

I was having a similar problem a couple threads ago, and a VACUUM FULL
reduced my database from 3.9 gigs to 2.1 gigs!

So my question is how to (smartly) choose an FSM size?

thanks,
max

Re: postgresql meltdown on PlanetMath.org

From
"Neil Conway"
Date:
Chris Sutton said:
> On Mon, 17 Mar 2003, Tom Lane wrote:
>> In CVS tip, a whole-database VACUUM VERBOSE gives info about the free
>> space map occupancy, eg

> How do you get this information?
>
> I just ran VACUUM VERBOSE and it spit out a bunch of information per
> relation, but nothing about total relations and FSM space.  We are
> running  7.3.2.

As Tom mentioned, that information is printed by a database-wide VACUUM
VERBOSE "in CVS tip" -- i.e. in the development code that will eventually
become PostgreSQL 7.4.

Cheers,

Neil



Re: postgresql meltdown on PlanetMath.org

From
Tom Lane
Date:
Chris Sutton <chris@smoothcorp.com> writes:
> On Mon, 17 Mar 2003, Tom Lane wrote:
>> In CVS tip, a whole-database VACUUM VERBOSE gives info about the free
>> space map occupancy, eg
>> INFO:  Free space map: 224 relations, 450 pages stored; 3776 total pages needed.
>> Allocated FSM size: 1000 relations + 20000 pages = 178 KB shared mem.

> How do you get this information?

Before CVS tip, you don't.

[ thinks...]  Perhaps we could back-port the FSM changes into 7.3 ...
it would be a larger change than I'd usually consider reasonable for a
stable branch, though.  Particularly considering that it would be hard
to call it a bug fix.  By any sane definition this is a new feature,
and we have a policy against putting new features in stable branches.

            regards, tom lane

Re: postgresql meltdown on PlanetMath.org

From
Andrew Sullivan
Date:
On Mon, Mar 17, 2003 at 02:26:00PM -0500, Tom Lane wrote:
> [ thinks...]  Perhaps we could back-port the FSM changes into 7.3 ...

For what it's worth, I think that'd be a terrible precedent.  Perhaps
making a patch file akin to what the Postgres-R folks do, for people
who really want it.  But there is just no way it's a bug fix, and one
of the things I _really really_ like about Postgres is the way
"stable" means stable.  Introducing such a new feature to 7.3.x now
smacks to me of the direction the Linux kernel has gone, where major
new functionality gets "merged"[1] in dot-releases of the so-called
stable version.

[1] This is the meaning of "merge" also used in Toronto on the 401 at
rush hour.  8 lanes of traffic jam and growing.

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: postgresql meltdown on PlanetMath.org

From
Tom Lane
Date:
Andrew Sullivan <andrew@libertyrms.info> writes:
> On Mon, Mar 17, 2003 at 02:26:00PM -0500, Tom Lane wrote:
>> [ thinks...]  Perhaps we could back-port the FSM changes into 7.3 ...

> For what it's worth, I think that'd be a terrible precedent.

Oh, I quite agree.  I was just throwing up the option to see if anyone
thought the issue was important enough to take risks for.  I do not...

            regards, tom lane

Re: postgresql meltdown on PlanetMath.org

From
"Kendrick C. Wilson"
Date:
What is the structure of your table?
Are the data types in the table the same as in the SQL....

Did you create the index after loading the table?
Cluster the table around the most used index....

Is your web site on the same box your database is on?

telnet www.planetmath.org 5432
oh, $hit...

never mind........

If you have another box, please put the database on it.  The web server may be
killing the database, but this depends on the amount of traffic.
And block the port.........


How fast is your hard drive?  5400rpm :S,

k=n^r/ck, SCJP



Re: postgresql meltdown on PlanetMath.org

From
"Lucas Adamski"
Date:
Or at least restrict TCP/IP connections to localhost only, and use SSH
tunnels if you must have direct external access (for pgAdmin, etc.) to the
DB.
  Lucas.



Re: postgresql meltdown on PlanetMath.org

From
"Christopher Kings-Lynne"
Date:

> What is the structure of your table?
> Are the data types in the table the same as in the SQL....
>
> Did you create the index after loading the table?
> Cluster the table around the most used index....

There is no point clustering a table around the most used index, unless
access to the index is non-random.  eg. you are picking up more than one
consecutive entry from the index at a time.  eg. Indexes on foreign keys are
excellent for clustering.

Chris


Re: postgresql meltdown on PlanetMath.org

From
Aaron Krowne
Date:
I should have paid more attention to the disk space before... but it
looks like somewhere between half a gig and a gig was freed!  The disk
footprint is about a gig now.

Aaron Krowne

On Sun, Mar 16, 2003 at 03:37:32AM -0500, Tom Lane wrote:
> Aaron Krowne <akrowne@vt.edu> writes:
> > So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
> > VACUUM ANALYZE made all the difference.  Is this possible (the latter,
> > we know the former is possible...)?
>
> If your FSM parameters in postgresql.conf are too small, then plain
> vacuums might have failed to keep up with the available free space,
> leading to a situation where vacuum full is essential.  Did you happen
> to notice whether the vacuum full shrunk the database's disk footprint
> noticeably?
>
>             regards, tom lane

Re: postgresql meltdown on PlanetMath.org

From
Logan Bowers
Date:
All right, I performed a VACUUM FULL last night and after about 3 hours I
tried running a select count(*) FROM tblwordidx and that did help things
considerably (it runs in ~20 seconds instead of 1-2 minutes).  Not as good
as originally, but close.

But, here's the breakdown of the db:

I'm using the database as a keyword-based file search engine (not the most
efficient method, I know, but it works well for my purposes).  The biggest
and most relevant tables are a table of files and a table of words.  The
basic operation is that each file has a set of keywords associated with it;
I do a whole-word search on tblwordidx and join with tblfiles (I know, the
naming scheme sucks, sorry!).

Three times a day I scan the network and update the database.  I insert
about 180,000 rows into a temporary table (tbltmp) and then use it to
update the main tables.  With the aid of a few other tables, I clean up
tblfiles so that existing rows have an updated timestamp in tblseen and
files with a timestamp older than 1 day are removed.  Then I take the new
rows in tblfiles and use a perl script to add more words to tblwordidx.
After each update I do a VACUUM and VACUUM ANALYZE, which usually grinds
for 10 to 15 minutes.

I'm running this db on a celeron 450Mhz with 256MB RAM and a 60GB HDD
(7200 rpm).  For the most part I have the db running "well enough."  Over
time, however, I find that performance degrades; the count(*) above is an
example of a command that does worse over time.  It gets run once an hour
for stats collection.  When I first migrated the db to v7.3.1 it would
take about 5-10 seconds (which it is close to now after a VACUUM FULL) but
after a few weeks it would take over a minute of really intense HDD
activity.  Also of note is that when I first loaded the data it would
cache very well with the query taking maybe 15 seconds if I had
just started the db after reboot, but when it was in its "slow" state
repeating the query didn't noticeably use the disk less (nor did it take
less time).

I've attached a VACUUM VERBOSE and my conf file (which is pretty vanilla;
I haven't tweaked it since updating).  If you have any suggestions on how
I can correct this situation through config changes, that would be ideal,
and thanks for your help.  If it is just a case of doing lots of VACUUM
FULLs, I can definitely see it as a performance bottleneck for postgres.
Fortunately I can afford the huge performance penalty of a VACUUM FULL,
but I can certainly think of apps that can't.


Logan Bowers

\d tblfiles: (219,248 rows)
  Column  |            Type             |                 Modifiers
----------+-----------------------------+-------------------------------------------
 fid      | integer                     | not null default
nextval('fileids'::text)
 hid      | integer                     | not null
 pid      | integer                     | not null
 name     | character varying(256)      | not null
 size     | bigint                      | not null
Indexes: temp_fid_key unique btree (fid),
         filediridx btree (hid, pid, name, size, fid),
         fileidx btree (name, hid, pid, fid),
         fileidxfid btree (fid, name, pid)

\d tblwordidx: (1,739,481 rows)
          Table "public.tblwordidx"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 fid    | integer                | not null
 word   | character varying(128) | not null
 todel  | boolean                |
Indexes: wordidxfid btree (fid, word),
         wordidxfidonly btree (fid),
         wordidxw btree (word, fid)



On Mon, 17 Mar 2003, Tom Lane wrote:

> "Neil Conway" <neilc@samurai.com> writes:
> > Sean Chittenden said:
> > A "select count(*) FROM tblwordidx" initially took about 1 second to
> > return a count of 2 million but after a few months it took several
> > minutes of really hard HDD grinding.
> >>
> >> That's because there are dead entries in the index that weren't being
> >> reused or cleaned up.  As I said, this has been fixed.
>
> > That's doubtful: "select count(*) FROM foo" won't use an index.
>
> To know what's going on, as opposed to guessing about it, we'd need to
> know something about the physical sizes of the table and its indexes.
> "vacuum verbose" output would be instructive...
>
> But my best theorizing-in-advance-of-the-data guess is that Logan's
> FSM settings are too small, causing free space to be leaked over time.
> If a vacuum full restores the original performance then that's probably
> the right answer.
>
>             regards, tom lane
>


Re: postgresql meltdown on PlanetMath.org

From
"Christopher Kings-Lynne"
Date:
> I'm running this db on a celeron 450Mhz with 256MB RAM and a 60GB HDD
> (7200 rpm).  For the most part I have the db running "well enough."  Over
> time however, I find that performance degrades, the count(*) above is an
> example of a command that does worse over time.  It gets run once an hour
> for stats collection.  When I first migrated the db to v7.3.1 it would
> take about 5-10 seconds (which it is close to now after a VACUUM FULL) but
> after a few weeks it would take over a minute of really intense HDD
> activity.  Also of note is that when I first loaded the data it would
> cache very well with the query taking maybe 15 seconds if I had
> just started the db after reboot, but when it was in its "slow" state
> repeating the query didn't noticeably use the disk less (nor did it take
> less time).

To speed up your COUNT(*), how about doing this:

Create a separate table to hold a single integer.

Add a trigger after insert on your table to increment the counter in the
other table.
Add a trigger after delete on your table to decrement the counter in the
other table.

That way you always have an O(1) count...
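
A rough sketch of what I mean (untested; names invented; assumes plpgsql is
installed in the database, e.g. via createlang):

CREATE TABLE tblwordidx_count (n integer NOT NULL);
INSERT INTO tblwordidx_count SELECT count(*) FROM tblwordidx;

CREATE FUNCTION wordidx_count_trig() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE tblwordidx_count SET n = n + 1;
    ELSE
        UPDATE tblwordidx_count SET n = n - 1;
    END IF;
    RETURN NULL;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER wordidx_count
    AFTER INSERT OR DELETE ON tblwordidx
    FOR EACH ROW EXECUTE PROCEDURE wordidx_count_trig();

The catch is that every insert/delete then updates the single counter row,
so it only pays off when reads vastly outnumber writes -- and the counter
table itself needs vacuuming after your batch loads.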

Chris


Re: postgresql meltdown on PlanetMath.org

From
Tom Lane
Date:
Logan Bowers <logan@datacurrent.com> writes:
> I've attached a VACUUM VERBOSE and my conf file (which is pretty vanilla,
> I haven't tweaked it since updating).

You definitely need to increase the fsm shared memory parameters.  The
default max_fsm_relations is just plain too small (try 1000) and the
default max_fsm_pages is really only enough for perhaps a 100Mb
database.  I'd try bumping it to 100,000.  Note you need a postmaster
restart to make these changes take effect.

            regards, tom lane

Re: postgresql meltdown on PlanetMath.org

From
Chris Sutton
Date:
On Mon, 17 Mar 2003, Logan Bowers wrote:

> Logan Bowers
>
> \d tblfiles: (219,248 rows)
>   Column  |            Type             |                 Modifiers
> ----------+-----------------------------+-------------------------------------------
>  fid      | integer                     | not null default
> nextval('fileids'::text)
>  hid      | integer                     | not null
>  pid      | integer                     | not null
>  name     | character varying(256)      | not null
>  size     | bigint                      | not null
> Indexes: temp_fid_key unique btree (fid),
>          filediridx btree (hid, pid, name, size, fid),
>          fileidx btree (name, hid, pid, fid),
>          fileidxfid btree (fid, name, pid)

I'm no expert on indexes, but I seem to remember reading that creating
multicolumn indexes on more than 2 or 3 columns gets sort of pointless:

http://www.us.postgresql.org/users-lounge/docs/7.3/postgres/indexes-multicolumn.html

There is probably a ton of disk space and CPU used to keep all these multi
column indexes.  Might be part of the problem.

> \d tblwordidx: (1,739,481 rows)
>           Table "public.tblwordidx"
>  Column |          Type          | Modifiers
> --------+------------------------+-----------
>  fid    | integer                | not null
>  word   | character varying(128) | not null
>  todel  | boolean                |
> Indexes: wordidxfid btree (fid, word),
>          wordidxfidonly btree (fid),
>          wordidxw btree (word, fid)
>

Another index question for the pros.  When creating a multi-column index
do you need to do it both ways:

wordidxfid btree (fid, word)
wordidxw btree (word, fid)

We have a very similar "dictionary" table here for searching.  It's about
1.7 million rows, takes about 80mb of disk space.  There is one multi
column index on the table which uses about 50mb of disk space.

To find out how much disk space you are using, the hard way is:

select relfilenode from pg_class where relname='tblwordidx';
select relfilenode from pg_class where relname='wordidxw';

relfilenode is the name of the file in your data directory.
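
A rougher but quicker estimate comes from the statistics left behind by the
last VACUUM/ANALYZE (relpages is in 8k blocks, so this is approximate):

SELECT relname, relpages, relpages * 8 / 1024 AS approx_mb
FROM pg_class
WHERE relname IN ('tblwordidx', 'wordidxfid', 'wordidxfidonly', 'wordidxw');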

I'm pretty sure there is an easier way to do this with a function I saw in
contrib.

Just some thoughts.

Chris


Re: postgresql meltdown on PlanetMath.org

From
"Kendrick C. Wilson"
Date:
Clustering is good for queries that return multiple values.

select this, that
from tableA
where this = 'whatever';

If there are multiple values, the location of the first record is found in
the indexFile.

Then dataFile is scanned until this != 'whatever';

This will decrease disk activity, which is the bottleneck in database
performance.
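
In 7.3 the command is, I believe, along the lines of (using Logan's table
and index as the example):

CLUSTER wordidxw ON tblwordidx;

The table is exclusively locked and rewritten while this runs.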

k=n^r/ck, SCJP


>From: "Christopher Kings-Lynne" <chriskl@familyhealth.com.au>
>To: "Kendrick C. Wilson"
><kendrick_wilson@hotmail.com>,<pgsql-performance@postgresql.org>
>Subject: Re: [PERFORM] postgresql meltdown on PlanetMath.org Date: Tue, 18
>Mar 2003 09:34:36 +0800
>
> > What is the structure of your table?
> > Are the data types in the table the same as in the SQL....
> >
> > Did you create the index after loading the table?
> > Cluster the table around the most used index....
>
>There is no point clustering a table around the most used index, unless
>access to the index is non-random.  eg. you are picking up more than one
>consecutive entry from the index at a time.  eg. Indexes on foreign keys
>are
>excellent for clustering.
>
>Chris
>
>






Re: postgresql meltdown on PlanetMath.org

From
Josh Berkus
Date:
Logan,

> I'm running this db on a celeron 450Mhz with 256MB RAM and a 60GB HDD
> (7200 rpm).  For the most part I have the db running "well enough."  Over

Hmmm ... actually, considering your hardware, I'd say the database performance
you're getting is excellent.   You're facing 3 bottlenecks:

1) The Celeron II's lack of on-chip cache will slow down even moderately
complex queries as much as 50% over a comparably-clocked pentium or AMD chip,
in my experience.

2) 256mb RAM is small enough that if you are running Apache on the same
machine, Apache & Postgres could be contesting for RAM during busy periods.

3) (most noticeable) You have pretty much the bare minimum of disk.  For a
one-gb database, a Linux RAID array or mirror would be a lot better ...

Of course, that's all relative.   What I'm saying is, if you want your
database to "scream" you're going to have to put some money into hardware.
If you're just looking for adequate performance, then that can be had with a
little tweaking and maintenance.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: postgresql meltdown on PlanetMath.org

From
Manfred Koizar
Date:
On Tue, 18 Mar 2003 09:19:44 -0600, "Kendrick C. Wilson"
<kendrick_wilson@hotmail.com> wrote:
>If there are multiple values, the location of the first record is found in
>the indexFile.
>
>Then dataFile is scanned until this != 'whatever';

Nice, but unfortunately not true for Postgres.  When you do the first
UPDATE after CLUSTER, the new version of the changed row(s) is written
to the end of the dataFile (heap relation in Postgres-speak).  So the
*index* has to be scanned until this != 'whatever'.

>Clustering is good for queries that return multiple [rows with the same search] values.

Yes.  With clustering you can expect that most of the tuples you want
are near to each other and you find several of them in the same page.

Servus
 Manfred

Re: postgresql meltdown on PlanetMath.org

From
Florian Weimer
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> You definitely need to increase the fsm shared memory parameters.  The
> default max_fsm_relations is just plain too small (try 1000) and the
> default_max_fsm_pages is really only enough for perhaps a 100Mb
> database.  I'd try bumping it to 100,000.  Note you need a postmaster
> restart to make these changes take effect.

Hmm, are there any guidelines for choosing these values?

We have a database with a table into which we insert about 4,000,000
rows each day, and delete another 4,000,000 rows.  The total row count
is around 40 million, I guess, and the rows are about 150 bytes long.
(VACUUM FULL is running at the moment, so I can't check.)

The database is used as a research tool, and we run moderately complex
ad-hoc queries on it.  As a consequence, I don't see much room for
optimization.

One of the columns is time-based and indexed, so we suffer from the
creeping index syndrome.  A nightly index rebuild followed by a VACUUM
ANALYZE isn't a problem (it takes less than six hours), but this
doesn't seem to be enough (we seem to lose disk space nevertheless).

I can't afford a regular VACUUM FULL because it takes down the
database for over ten hours, and this starts to cut into the working
hours no matter when it starts.

Can you suggest some tweaks to the FSM values so that we can avoid the
full VACUUM?  The database runs 7.3.2 and resides on a 4-way Xeon box
with 4 GB of RAM and a severely underpowered disk subsystem (Linux
software RAID1 on two 10k 36 GB SCSI drives -- don't ask, this
database application is nothing but an accident which happened after
purchase of the box).

--
Florian Weimer                       Weimer@CERT.Uni-Stuttgart.DE
University of Stuttgart           http://CERT.Uni-Stuttgart.DE/people/fw/
RUS-CERT                          fax +49-711-685-5898

Re: postgresql meltdown on PlanetMath.org

From
"Shridhar Daithankar"
Date:
On Friday 21 Mar 2003 4:31 am, Florian Weimer wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > You definitely need to increase the fsm shared memory parameters.  The
> > default max_fsm_relations is just plain too small (try 1000) and the
> > default_max_fsm_pages is really only enough for perhaps a 100Mb
> > database.  I'd try bumping it to 100,000.  Note you need a postmaster
> > restart to make these changes take effect.
>
> Hmm, are there any guidelines for choosing these values?
>
> We have a database with a table into which we insert about 4,000,000
> rows each day, and delete another 4,000,000 rows.  The total row count
> is around 40 million, I guess, and the rows are about 150 bytes long.
> (VACUUM FULL is running at the moment, so I can't check.)

I suggest you split your table into a set of identically-structured child
tables using inheritance.  Your queries won't be affected, as you can run
them against the parent table and get the same result.

But as far as vacuuming goes, you can probably dump a child table entirely and
recreate it as a fast alternative to vacuum.

The only catch is, I don't know if inherited tables would use their respective
indexes; otherwise your queries might be slow as anything.
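
Roughly what I mean (an untested sketch with invented names; note each child
table needs its own indexes):

CREATE TABLE records_parent (ts timestamp NOT NULL, payload text);

CREATE TABLE records_2003_03_20 () INHERITS (records_parent);
CREATE INDEX records_2003_03_20_ts ON records_2003_03_20 (ts);

-- queries on the parent automatically include all children:
SELECT count(*) FROM records_parent WHERE ts >= '2003-03-20';

-- retiring a day's data is a DROP, with no VACUUM needed:
DROP TABLE records_2003_03_20;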

> One of the columns is time-based and indexed, so we suffer from the
> creeping index syndrome.  A nightly index rebuild followed by a VACUUM
> ANALYZE isn't a problem (it takes less than six ours), but this
> doesn't seem to be enough (we seem to lose disk space nevertheless).

I am sure a "select * from table into another table; drop table; rename temp
table" kind of hack would be faster than vacuuming in this case..

This is just a suggestion. Good if this works for you..

 Shridhar

Re: postgresql meltdown on PlanetMath.org

From
Tom Lane
Date:
Florian Weimer <Weimer@CERT.Uni-Stuttgart.DE> writes:
> Hmm, are there any guidelines for choosing these values?

> We have a database with a table into which we insert about 4,000,000
> rows each day, and delete another 4,000,000 rows.  The total row count
> is around 40 million, I guess, and the rows are about 150 bytes long.

If you are replacing 10% of the rows in the table every day, then it's
a pretty good bet that every single page of the table contains free
space.  Accordingly, you'd better set max_fsm_pages large enough to
have a FSM slot for every page of the table.  (1 page = 8Kb normally)
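
Back-of-the-envelope with your numbers (40M rows at ~150 bytes, 8K pages,
and ignoring per-tuple and per-page overhead, so the real figure is somewhat
higher):

SELECT ceil(40e6 * 150 / 8192) AS heap_pages_lower_bound;
-- about 730,000 pages, so a max_fsm_pages approaching a million is the
-- right ballpark for this table alone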

You could possibly get away with a smaller FSM if you do (non-FULL)
vacuums more often than once a day.  Some people find they can run
background vacuums without hurting performance too much, some don't
--- I suspect it depends on how much spare disk bandwidth you have.

            regards, tom lane