Re: Vacuuming strategy - Mailing list pgsql-general

From Jeff Janes
Subject Re: Vacuuming strategy
Date
Msg-id CAMkU=1wOnB1=8bJcLGBfqThSxFrrdQdx16OwEi8w4jXbYgpfwQ@mail.gmail.com
Whole thread Raw
In response to Vacuuming strategy  (Elanchezhiyan Elango <elanelango@gmail.com>)
Responses Re: Vacuuming strategy  (Elanchezhiyan Elango <elanelango@gmail.com>)
List pgsql-general
On Tue, Apr 29, 2014 at 4:59 PM, Elanchezhiyan Elango <elanelango@gmail.com> wrote:
Hi,

I need help on deciding my vacuuming strategy. I need to know if I ever need to do 'vacuum full' for my tables.

Tables1: Following is the query patterns on 4 high traffic table in my database:
1. Every 5 minutes about 50000 rows in the table are updated. And for a given clock hour the same 50000 rows are updated again and again.
2. When a new hour begins, another 50000 rows get inserted and they get updated every 5 minutes.
3. Every night a days worth of old stats are deleted. So this would be 24 * 50000 = 1.2M records that get deleted every night.

What indexes exist?  Are the updates to indexed columns?  Which day's tuples are deleted every night?  Is it the day a week ago, or a month ago, or something else?

 
Tables3: Another 4 tables have the following pattern:
1. Every 1 hour 50000 rows get updated. For a given day the same 50000 rows are updated again and again.
2. When a new day begins, another 50000 rows get inserted and they get updated every hour.
3. Every night 1.2M records get deleted.

If only 50,000 get inserted daily and 1.2M get deleted, this table will soon be empty!  I think you said daily when you meant hourly somewhere in there.
 

Tables4: Another 39 tables have the following pattern:
1. Every 1 hour 2000 rows gets updated. For a given day the same 2000 rows are updated again and again.
2. When a new day begins, another 2000 rows get inserted and they get updated every hour.
3. Every night 48000 rows get deleted.

With the above query pattern with intensive updates and deletes, I need to do some aggressive vacuuming.

Current strategy:I am running with default autovacuum settings (postgres 9.1.9) and I tried doing a 'vacuum full' for the 8 high traffic tables (Tables1 and Tables3) every night. But after a point, the 'vacuum full's started timing out (with 4min timeout) every night.

Why do you have a 4 minute timeout?  That seems counter-productive.

 
I think this is because the table is growing bigger (~5GB) and doing a vacuum full every night is probably not feasible.

It is probably not necessary, but it certainly seems feasible.  4 min * 8 tables = 32 minutes.  Call it one hour, since the 4 minute timeout has started not being enough.  Is the 5GB for the table plus indexes, or just the table itself?
 

Going with the default autovacuum settings and not doing 'vacuum full' at all is also not enough for my usecase. Whenever vacuum full succeeded every night, it did seem to reclaim a considerable amount of space.

You delete a bunch of tuples every night, so of course a vacuum full after that is going to return a lot of space.  But that space is probably just going to be needed again the next day.  If you don't do the vacuum full, does the *peak* space keep increasing, or does it stabilize?

Cheers,

Jeff

pgsql-general by date:

Previous
From: Michael Bostock
Date:
Subject: Re: Ingres to Postgres migration
Next
From: Raymond O'Donnell
Date:
Subject: Re: Escape double-quotes in text[]?