Re: Vacuuming strategy - Mailing list pgsql-general

From Francisco Olarte
Subject Re: Vacuuming strategy
Date
Msg-id CA+bJJbzcZUU26h+d2jYOpT-d=FkZ8FxVJc_NpfodpHX4XUfP+g@mail.gmail.com
Whole thread Raw
In response to Vacuuming strategy  (Elanchezhiyan Elango <elanelango@gmail.com>)
List pgsql-general
Hi:

On Wed, Apr 30, 2014 at 1:59 AM, Elanchezhiyan Elango
<elanelango@gmail.com> wrote:
> I need help on deciding my vacuuming strategy. I need to know if I ever need
> to do 'vacuum full' for my tables.
>
> Tables1: Following is the query patterns on 4 high traffic table in my
> database:
> 1. Every 5 minutes about 50000 rows in the table are updated. And for a
> given clock hour the same 50000 rows are updated again and again.
> 2. When a new hour begins, another 50000 rows get inserted and they get
> updated every 5 minutes.
> 3. Every night a days worth of old stats are deleted. So this would be 24 *
> 50000 = 1.2M records that get deleted every night.
.......

Not a vacuuming strategy, but have you considered partitioning?

I mean. Your pattern seems to be insert rows, update during the hour,
delete daily. If you have a clear timestamp key, your problem seems to
be well suited for it, specially given you seem to have an hourly
process ( insert in step 2 ), and a daily process ( deletion in step 3
). I can see two potential uses for partitions here.

One is a classic, lets say a partition per day. In step 3 you drop the
old partition and create the one for the next day, this is fast, and
autovacuum will maintain the current table tidy enough. You could even
complement it with full vacuuming the newly inserted partition.
Something like, if you want to keep K days shortly after the  start of
day D ( not at exact midnight to avoid races ) you drop the partition
for day D-K, create the one for D+1 ( the one for D was created
beforehand, on D-1 to avoid races ) set it for adequate autovacuuming
and, if you want, do a vacuum full and reindex for day D-1 ( which is
now constant ). If you need to keep hundreds of days you could go for
weekly or monthly partition, if you keep few you could consider
hourly.

Another one could be to make, say, daily partitions, but no rules /
triggers to redirect inserts, just the constraint and inheritance
rules to get proper selects, let the hourly insert ( and subsequent
updates ) go to the main table.
Then, on the hourly job, you can move all the now constant past-hour
rows to the appropiate partition ( which just an "insert into part
delete from main where adequate_range returning *" ). The partitions
will not need reindex or vacuuming as they will become append-only,
and the main table will have between 1 and two hour of records, and
would probably be adequately managed with autovacuum, which given the
table will be small should be fast.
Old partitions could be dropped when unneeded as above.

    Francisco Olarte.


pgsql-general by date:

Previous
From: Hello World
Date:
Subject: Re: Security Issues: Allowing Clients to Execute SQL in the Backend.
Next
From: David Noel
Date:
Subject: Re: SQL query runs fine on one platform (FreeBSD), but hangs on another (Windows)