Thread: Vacuuming on heavily changed databases

Vacuuming on heavily changed databases

From

Bohdan Linda

Date:

19 May 2008, 11:50:38

Hello,

I would like to ask an opinion on vacuuming general. Imagine situation
that you have single table with 5 fields (one varchar). This table has
during the day

- cca 620 000 inserts
- 0 updates
- cca 620 000 deletes

The table is vacuumed daily, but somehow after several months I got to
size of ~50GB

Result of VACUUM FULL VERBOSE ANALYZE is:

Nonremovable row versions range from 102 to 315 bytes long.
There were 218253801 unused item pointers.
Total free space (including removable row versions) is 40627058888 bytes.
4850610 pages are or will become empty, including 0 at the end of the
table.
5121624 pages containing 40625563500 free bytes are potential move
destinations.
CPU 161.85s/35.51u sec elapsed 1191.17 sec.

This means 80% wasted space that could be reused. Right now, I am doing
vacuum full but this requires exclusive lock. During that time the
database is locked so I am missing "inserts and deletes" ;-)

I would like to avoid this in future, so I would like to prepare strategy
how to do it next time or avoid. Basically I have the follwing
limitations:

1) sometimes deletes vs vacuum analyze does not help, extra space is not
relcaimed. Do not know why this is happening, but maybe vacuum cannot get
lock

2) manualy evoked vacuum full requires bringing database long-time offline

3) There were suggestions (in archives) doing dump and then restore on
dropped database, but still requires downtime.

What would be your strategy for the database maintenance like this? What
tweaking of vacuuming can I make, so I do not get those "forgotten"
records?

Thank you,
Bohdan

Re: Vacuuming on heavily changed databases

From

"Harald Armin Massa"

Date:

19 May 2008, 11:59:55

Hello,

> I would like to ask an opinion on vacuuming general. Imagine situation
> that you have single table with 5 fields (one varchar). This table has
> during the day
>
> - cca 620 000 inserts
> - 0 updates
> - cca 620 000 deletes
>
> The table is vacuumed daily, but somehow after several months I got to
> size of ~50GB

do not vacuum DAILY. set up autovacuum to run AT LEAST every minute.
autovacuum will flag the "deleted" rows as to be reusable by next
insert. Make sure to use 8.3.<latest>, it's much more easy to setup
autovacuum then before.

best wishes,

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pidgeon
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!

Re: Vacuuming on heavily changed databases

From

Martijn van Oosterhout

Date:

19 May 2008, 12:19:49

Apart from reinterating what someone else: you're not vacuuming
anywhere near often often. Normal vacuum takes no locks.

On Mon, May 19, 2008 at 04:50:18PM +0200, Bohdan Linda wrote:
> 3) There were suggestions (in archives) doing dump and then restore on
> dropped database, but still requires downtime.

CLUSTER will rebuild the table with only the actually used records in
it. It also requires an exclusive lock so try to keep it small :)

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

signature.asc

Re: Vacuuming on heavily changed databases

From

Bohdan Linda

Date:

19 May 2008, 12:24:18

On Mon, May 19, 2008 at 04:59:42PM +0200, Harald Armin Massa wrote:
> do not vacuum DAILY. set up autovacuum to run AT LEAST every minute.
> autovacuum will flag the "deleted" rows as to be reusable by next
> insert. Make sure to use 8.3.<latest>, it's much more easy to setup
> autovacuum then before.

Hello Harald,

Thank you, will look at that. My problem is I have to use 8.0.x, but it
should be supported.

Regards,
Bohdan

Re: Vacuuming on heavily changed databases

From

Sam Mason

Date:

19 May 2008, 14:21:27

On Mon, May 19, 2008 at 04:59:42PM +0200, Harald Armin Massa wrote:
> On Mon, May 19, 2008 at 04:50:18PM +0200, Bohdan Linda wrote:
> > I would like to ask an opinion on vacuuming general. Imagine situation
> > that you have single table with 5 fields (one varchar). This table has
> > during the day
> >
> > - cca 620 000 inserts
> > - 0 updates
> > - cca 620 000 deletes
> >
> > The table is vacuumed daily, but somehow after several months I got to
> > size of ~50GB
>
> do not vacuum DAILY. set up autovacuum to run AT LEAST every minute.

This will help if the changes to the database are evenly distributed
throughout the day, if they're very spiky then you may want to run
a vacuum after one of these bulk changes.  I believe that on larger
databases for those tables that are incurring heavy modification the
admin would configure an associated session whose sole responsibility
would be to issue a never ending stream of VACUUMs.  This tends to imply
larger RAID arrays that can tolerate multiple concurrent read/write
requests.  I.e. something like the following, but with some error
checking:

  for t in foo bar baz
    do ( while echo "VACUUM $t;" && false ; do true ; done | psql ) &
  done

As an ongoing thing, PG records partially filled pages in its so-called
"Free Space Map".  The FSM gets updated by VACUUM and is used by the
data modification statements to find places where new data can be
written.  If the FSM is too small then PG will grow the table even
though there is free space in the table (because it doesn't know it
exists).  VACUUM will normally give error messages about the FSM map
being too small, that and the fact that you didn't mentioned this makes
me think you have already found an optimum value here.

  Sam

Re: Vacuuming on heavily changed databases

From

Sam Mason

Date:

19 May 2008, 14:26:42

On Mon, May 19, 2008 at 06:21:18PM +0100, Sam Mason wrote:
>   for t in foo bar baz
>     do ( while echo "VACUUM $t;" && false ; do true ; done | psql ) &
>   done

oops, that "&& false" shouldn't be there! like like this:

  for t in foo bar baz
    do ( while echo "VACUUM $t;" ; do true ; done | psql ) &
  done


  Sam

Re: Vacuuming on heavily changed databases

From

"Scott Marlowe"

Date:

19 May 2008, 15:38:19

On Mon, May 19, 2008 at 8:50 AM, Bohdan Linda <bohdan.linda@seznam.cz> wrote:
> Hello,
>
> I would like to ask an opinion on vacuuming general. Imagine situation
> that you have single table with 5 fields (one varchar). This table has
> during the day
>
> - cca 620 000 inserts
> - 0 updates
> - cca 620 000 deletes
>
> The table is vacuumed daily, but somehow after several months I got to
> size of ~50GB

OK.  Assuming that the 50G is mostly dead space, there are a few
possibilities that could be biting you here, but the most likely one
is that your Free Space Map settings aren't high enough to include all
the rows that have been deleted since the last vacuum was run.  If you
can't take down the server to change those settings, then running
vacuum more often will help.

The autovacuum daemon is your friend.  Even with the default non
aggresive settings it comes with, it would have caught this long
before now.

Note that I no longer have 8.0 installed anywhere, only 8.1.  But I
know that 8.1 supported vacuum costing features to let you slow down
vacuum so it doesn't stomp on your I/O to the detriment of all the
other pgsql processes.  Look for those settings.  You can adjust those
and just reload.

Re: Vacuuming on heavily changed databases

From

Dragan Zubac

Date:

19 May 2008, 21:38:00

Hello

I have some similar situation like Yours,we're using at the moment PG
8.2.0. At the moment we do manually vacuum (one or more times to
minimize 'dead' data/tuples),and if necessary we do 'full' vacuum. On
heavy-updated PG,one surely must think of this procedures because they
are considered to be 'daily maintenance routine'. Still haven't migrated
to PG 8.3,which is planned,but one way to look at this problem is to
have 'timeout aware applications',meaning when You fire up vacuum or
some other command that will lock some (or all data), You application
does not stop operating but put itself into a little 'sleep' until data
became available again. Therefore Your frontend
(apps,clients,whatsoever) will observe only a small glitch and not a
corruption in database connectivity,operations,etc.

Also bear in mind that more TPS,more 'dead' data/tuples You will
have,meaning the following:

1. Ordinary inserts

insert into foo (column1,column2) values (val1,val2);
insert into foo (column1,column2) values (val3,val4);
insert into foo (column1,column2) values (val5,val6);

3 separated transaction,guess it means 3 'dead' tuples ?

2. Multi-insert command

insert into foo (column1,column2) values
(val1,val2),(val3,val4),(val5,val6);

1 transaction,guess it means 1 'dead' tuples ?

I'm not sure about this,but guess somebody will correct me if I'm wrong :)

The questions are:

Is number of transactions related to the number of 'dead' rows in PG ?
Meaning less transactions,means less frequently vacuum needed for same
amount of data ?

Sincerely

Dragan

Bohdan Linda wrote:
> Hello,
>
> I would like to ask an opinion on vacuuming general. Imagine situation
> that you have single table with 5 fields (one varchar). This table has
> during the day
>
> - cca 620 000 inserts
> - 0 updates
> - cca 620 000 deletes
>

Re: Vacuuming on heavily changed databases

From

"Gregory Williamson"

Date:

19 May 2008, 22:21:08

Sorry for top-posting -- challenged reader, but less challenged than running 8.2.0 -- upgrade to latest release -- 8.2.7! There were many fixes after the initial release and you're risking some bad mojo.

Greg Williamson
Senior DBA
DigitalGlobe

Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)

-----Original Message-----
From: pgsql-general-owner@postgresql.org on behalf of Dragan Zubac
Sent: Mon 5/19/2008 4:44 PM
To: Bohdan Linda
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Vacuuming on heavily changed databases

> Hello
>
> I have some similar situation like Yours,we're using at the moment PG
> 8.2.0.
<...>

Re: Vacuuming on heavily changed databases

From

"Scott Marlowe"

Date:

19 May 2008, 23:09:20

On Mon, May 19, 2008 at 5:44 PM, Dragan Zubac <zubac@vlayko.tv> wrote:
> Hello
>
> I have some similar situation like Yours,we're using at the moment PG 8.2.0.

As Gregory above mentioned, update NOW to 8.2.7.  It only takes minutes to do.

> At the moment we do manually vacuum (one or more times to minimize 'dead'
> data/tuples),and if necessary we do 'full' vacuum.

Think of vacuum as regular maintenance, and full vacuum as emergency
fixing because regular vacuum wasn't run often enough.

Also, look at running the autovacuum daemon.  It does a great job for
most people.

> Still haven't migrated to PG 8.3,which is
> planned,but one way to look at this problem is to have 'timeout aware
> applications',meaning when You fire up vacuum or some other command that
> will lock some (or all data),

Regular vacuum does not lock the table (well for more than a split
second anyway).

However, vacuum without a cost delay may chew up so much of your I/O
that performance suffers .  Which is why you've got cost delay
settings in postgresql.conf to adjust it.

> Also bear in mind that more TPS,more 'dead' data/tuples You will
> have,meaning the following:
>
> 1. Ordinary inserts
>
> insert into foo (column1,column2) values (val1,val2);
> insert into foo (column1,column2) values (val3,val4);
> insert into foo (column1,column2) values (val5,val6);
>
> 3 separated transaction,guess it means 3 'dead' tuples ?

No, inserts (at least the ones that don't fail) don't create dead
tuples.  Updates and deletes do.

> 2. Multi-insert command
>
> insert into foo (column1,column2) values
> (val1,val2),(val3,val4),(val5,val6);
>
> 1 transaction,guess it means 1 'dead' tuples ?

No, for the reasons above.

However, if they were updates, like this:

begin;
update...
update...
update...
commit;
and each one updated one row, you'd have three dead tuples.

> Is number of transactions related to the number of 'dead' rows in PG ?
> Meaning less transactions,means less frequently vacuum needed for same
> amount of data ?

each tuple replaced by an update, or deleted by a delete, or created
by a failed insert will be a single dead tuple.  The number of
transactions means nothing.

But  the point about vacuuming stands.  More frequent updates /
deletes / failed inserts require more frequent vacuums.  The
autovacuum daemon can handle this for ya for the most part, but some
busy systems will outrun it sometimes.

Also, make sure you have enough free space map entries to cover all
your dead tuples or they can't be reclaimed.

Re: Vacuuming on heavily changed databases

From

Bohdan Linda

Date:

20 May 2008, 05:06:34

On Mon, May 19, 2008 at 08:38:09PM +0200, Scott Marlowe wrote:
> OK.  Assuming that the 50G is mostly dead space, there are a few
> possibilities that could be biting you here, but the most likely one
> is that your Free Space Map settings aren't high enough to include all
> the rows that have been deleted since the last vacuum was run.  If you
> can't take down the server to change those settings, then running
> vacuum more often will help.
>
> The autovacuum daemon is your friend.  Even with the default non
> aggresive settings it comes with, it would have caught this long
> before now.

I can bring down the DB for short time, but I am stuct with 8.0. Found
that autovacuum is part of contrib, thus will try

Thank you all for the opinion

Regards,
Bohdan