Re: More Praise for 7.4RC2 - Mailing list pgsql-general

From scott.marlowe
Subject Re: More Praise for 7.4RC2
Date
Msg-id Pine.LNX.4.33.0311131356430.1026-100000@css120.ihs.com
Whole thread Raw
In response to Re: More Praise for 7.4RC2  (Reece Hart <reece@in-machina.com>)
Responses Re: More Praise for 7.4RC2
Re: More Praise for 7.4RC2
List pgsql-general
On Thu, 13 Nov 2003, Reece Hart wrote:

> On Thu, 2003-11-13 at 10:09, scott.marlowe wrote:
>
> > Do you vacuum full every so often?  If not, and if you've been overflowing
> > your fsm, then your tables will just grow without shrinking.
> > Also, index growth could be a problem.
>
>
> Hmm. I didn't realize that I needed to vacuum full as well -- I thought
> vacuum was sufficient for performance gains, and that full reclaimed
> space but didn't result in significant performance gains. I have
> reindexed infrequently, but since that locks the table I didn't do that
> (or vacuum full) often. I guess I should try out pg_autovacuum, but I
> think that full vacuums only to prevent XID wraparound (if age>1.5B
> transactions), but not for compaction (is this correct?).

Assuming your free space map has enough room, and you vacuum (plain
vacuum) often enough, you're dandy.

But, let's say you run 10,000 transactions on a 1,000 row table, then run
a plain vacuum.  Even if your fsm can hold all the space that is free in
that table's space, you've got a sparsely populated table that will take a
while to seq scan through.

I.e. if you don't vacuum (regular) often enough, then your tables may be
quite large.

I've tested out the pg_autovacuum daemon, and it seems to work quite well
for me.

What you're really shooting for is a "steady state" table size.  Let's say
you've got a 10,000 row table, and you average 500 changes an hour on it.
If you vacuum it every day it will probably be fine, as by the end of 24
hours, the table will take up the space of about 22,000 rows.  So, it will
have the same basic performance as if it had 22,000 rows in it.

If you start vacuuming it every hour after that first few days, then it
will always be about 22,000 rows in size, (assuming the rows aren't
noticable bigger or smaller from one version to the next.)

While plain vacuums can reclaim the space at the very end of a table, and
could theoretically make this table shrink over time, it's highly unlikely
to ever drop back down to the approximate 10,000 rows in size started
with.  Vacuum full will drop it back down to somewhere around there.

So, if your table is HIGHLY updated, you may need to run a plain vacuum
very often, and that's where the autovacuum daemon comes in handy.  Just
set it to run every 30 minutes or so, and let it go.  It should only
vacuum the tables that have had lots of change, and leave the others
alone.

With the vacuum delay patch that's in testing for 7.5, it may well be that
running the autovacuum daemon will become acceptable in places where,
right now, vacuum, even the regular kind, produce too much system load /
slow down in the middle of the day.

> > The real test is to dump the database and reload it to give 7.3.4 a fair
> > shake.
>
> It turns out that I have two copies of this database around at the
> moment running on 7.3.4. One was a fresh restore, and that's what I used
> to generate the explain. However, the query was run on the older
> database which was vacuumed and analyzed (but not vacuum full or
> reindexed), and on that instance the query took a long time. On the
> fresh install, it takes 72s. In summary:
>
> 7.3.4, long-running db: eons
> 7.3.4, freshly restored: 72s
> 7.4RC2, freshly restored: 0.3s

You can use the oid2name program in the contrib directory to kinda
research which files are big under those trees and see if it's a table or
index growth problem.

something like:

#su - postgres
$ oid2name
All databases:
---------------------------------
16975  = template0
16976  = postgres

$ cd $PGDATA/base/16976
du -s *|sort -n

1004    16640
12232   109169550
65644   109169548

will list the largest files in the postgres database directory.

oid2name -d postgres |grep 10169548
109169548 = accounts

tells me that it's the accounts table that's taking up all my room.

and so on.


pgsql-general by date:

Previous
From: jini us
Date:
Subject: Re: embedded postgresql
Next
From: "scott.marlowe"
Date:
Subject: Re: embedded postgresql