Re: Vacuuming strategy - Mailing list pgsql-general

From Francisco Olarte
Subject Re: Vacuuming strategy
Date
Msg-id CA+bJJbx1At24s0DiV2FrvqtpoSQwNiNqc2hTzMZ2zdzargNLyQ@mail.gmail.com
In response to Re: Vacuuming strategy  (Elanchezhiyan Elango <elanelango@gmail.com>)
List pgsql-general
Hi:

On Wed, Apr 30, 2014 at 7:40 PM, Elanchezhiyan Elango
<elanelango@gmail.com> wrote:
> Francisco,
> Thanks for the partitioning idea. I used to have the tables partitioned. But
> now that I have moved to a schema where data is split across about ~90
> tables I have moved away from partitioning. But it's something I have to
> reconsider at least for the high traffic tables.

I've discovered partitioning lends itself to some nifty tricks on high-traffic
tables, especially when data is partitioned by a timestamp key
correlated with insertion times ( in our case these are CDRs ).
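On the 9.x versions discussed here, this kind of partitioning is done with
inheritance plus CHECK constraints. A minimal sketch (table and column names
are hypothetical, not from the thread):

```sql
-- Parent table; the child partitions hold the actual rows.
CREATE TABLE cdrs (
    call_start timestamptz NOT NULL,
    duration   interval,
    caller     text
);

-- One child per month. The CHECK constraint lets constraint_exclusion
-- prune partitions that a timestamp-qualified query cannot touch.
CREATE TABLE cdrs_2014_04 (
    CHECK ( call_start >= DATE '2014-04-01'
        AND call_start <  DATE '2014-05-01' )
) INHERITS (cdrs);

CREATE INDEX ON cdrs_2014_04 (call_start);

-- Retiring an old month is a cheap catalog operation,
-- with no big DELETE and no vacuum afterwards:
DROP TABLE cdrs_2014_01;
```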

Aside from the ease of dropping old partitions and not having to
vacuum them, I've been testing a couple of tricks. One is dropping
indexes on old partitions ( which are no longer queried frequently,
and whose queries normally end up as full table scans anyway ).
Another is moving no-longer-updated partitions to a separate archive
schema. They still get queried ( inheritance takes care of that ), but
this unclutters my table listings and lets me do a backup
optimization: I instruct pg_dump to dump the non-updating schema only
whenever I change it, and everything but that schema more frequently.
( This is for the pg_dump backups, of which I keep a rotating series
to protect against pilot / software errors and to restore on
auxiliary servers for testing / data mining; for disk failure we have
a separate replication / log-archiving setup. ) This division also
helped when I had to do a 9.1-9.3 upgrade, as we only needed to
dump/restore the updating schemas in the downtime window and then
redid the archived one afterwards.
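Both tricks are ordinary DDL. A sketch of what closing out a partition might
look like ( index and partition names are hypothetical ):

```sql
-- Old partitions rarely see index scans any more, so shed the index:
DROP INDEX cdrs_2013_12_call_start_idx;

-- Move the closed partition out of the working schema; inheritance
-- still routes queries on the parent table to it:
CREATE SCHEMA IF NOT EXISTS archive;
ALTER TABLE cdrs_2013_12 SET SCHEMA archive;
```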

Of course I need a full set of scripts to maintain this, and if you
decide to build something like it you'll need an even bigger one.
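The backup half of those scripts boils down to pg_dump's schema
include/exclude switches. A sketch, assuming the archive schema is named
"archive" and the database "mydb":

```shell
# Dump the static archive schema only when it actually changes:
pg_dump -n archive -Fc -f archive_$(date +%Y%m%d).dump mydb

# Dump everything else on the frequent rotation:
pg_dump -N archive -Fc -f live_$(date +%Y%m%d).dump mydb
```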

Also, what puzzles me is your insert/update/delete pattern of access.
When I've seen this I've normally had just insert / delete. It seems
like you are doing cumulative hourly stats but updating them in place.
I've found myself in that situation ( daily stats updated in place, to
have the current day's view growing in 'real' time ) and switched to a
schema where I inserted into an aux table, which was queried using
sum()/avg(), and once the day was done I added up the totals and moved
them to another table. It was easier on vacuums, as the daily table
just needed one vacuum after the big delete, and not even a full one,
since the space was going to be reused ( this was before partitioning,
and I used a view to query the union of the totals table plus an
aggregated query over the incremental one; it worked quite well ).
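That insert-only scheme can be sketched in a few statements ( all table,
column, and view names here are hypothetical, not from the thread ):

```sql
-- Incremental rows land here during the current day:
CREATE TABLE stats_incr (
    metric text,
    ts     timestamptz,
    value  bigint
);

-- Closed days live here, one row per metric per day:
CREATE TABLE stats_daily (
    metric text,
    day    date,
    total  bigint
);

-- Readers see both through one view: finished days plus an
-- aggregated query over the still-growing incremental table.
CREATE VIEW stats_all AS
    SELECT metric, day, total FROM stats_daily
    UNION ALL
    SELECT metric, ts::date AS day, sum(value) AS total
    FROM stats_incr
    GROUP BY metric, ts::date;

-- At day close: fold the increments into the totals, then delete.
-- A plain (non-FULL) VACUUM afterwards suffices, as the freed
-- space is reused by the next day's inserts.
INSERT INTO stats_daily (metric, day, total)
    SELECT metric, ts::date, sum(value)
    FROM stats_incr
    GROUP BY metric, ts::date;
DELETE FROM stats_incr;
```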

Francisco Olarte.

