Re: High inserts, bulk deletes - autovacuum vs scheduled vacuum

From: Jim C. Nasby
Subject: Re: High inserts, bulk deletes - autovacuum vs scheduled vacuum
Date: 2007-01-10 20:35:56
Msg-id: 20070110203556.GX12217@nasby.net
In response to: High inserts, bulk deletes - autovacuum vs scheduled vacuum  ("Jeremy Haile" <jhaile@fastmail.fm>)
List: pgsql-performance
On Tue, Jan 09, 2007 at 12:26:41PM -0500, Jeremy Haile wrote:
> I am developing an application that has very predictable database
> operations:
>   -inserts several thousand rows into 3 tables every 5 minutes. (these
>   tables contain around 10 million rows each)
>   -truncates and rebuilds aggregate tables of this data every 5 minutes.
>   (several thousand rows each)
>   -regular reads of aggregate table and sometimes large tables by user
>   interaction
>   -every night, hundreds of thousands of rows are deleted from these 3
>   tables (old data)
>   -20-30 other tables get inserted/updated slowly throughout the day
>
> In order to optimize performance of the inserts, I disabled
> autovacuum/row-level stats and instead run "vacuum analyze" on the whole
> DB every hour.  However, this operation takes around 20 minutes of each
> hour, meaning the database spends 33% of its time vacuuming/analyzing
> tables.
>
> I'd like any performance advice, but my main concern is the amount of
> time vacuum/analyze runs and its possible impact on the overall DB
> performance.  Thanks!

If much of the data in the database isn't changing that often, then why
continually re-vacuum the whole thing?
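
For instance, if you stay with manual vacuums, you could hit just the
tables that actually churn (table names here are made up):

    -- vacuum/analyze only the insert-heavy tables, not the whole DB
    VACUUM ANALYZE raw_data_a;
    VACUUM ANALYZE raw_data_b;
    VACUUM ANALYZE raw_data_c;

rather than paying for a full-database pass every hour.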

I'd suggest trying autovacuum and seeing how it does (though you might
want to tune it to be more or less aggressive, and you'll probably want
to enable the vacuum cost delay).
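
As a starting point, something like this in postgresql.conf on 8.2 (the
scale factors and cost numbers are just guesses to tune from, not
recommendations):

    stats_start_collector = on           # autovacuum needs the stats collector
    stats_row_level = on                 # ...and row-level stats
    autovacuum = on
    autovacuum_naptime = 60              # seconds between checks of each database
    autovacuum_vacuum_scale_factor = 0.2 # vacuum after ~20% of a table changes
    autovacuum_analyze_scale_factor = 0.1
    autovacuum_vacuum_cost_delay = 20    # ms; throttles autovacuum's I/O
    autovacuum_vacuum_cost_limit = 200

The cost delay is what keeps autovacuum from stomping on your foreground
queries; the tradeoff is that each vacuum run takes longer.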

The only cases where manual vacuum makes sense to me are if you've got a
defined slow period and vacuuming during that slow period is still
frequent enough to keep up with demand, or if you've got tables with a
very high churn rate that need to be kept small. In the latter case, I'll
usually set up a cron job to vacuum those tables once a minute with no
cost delay (see the sketch below). There may well be other cases where
not using autovac makes sense, but generally I'd much rather let autovac
worry about this so I don't have to.
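
A minimal sketch of that cron setup (table and database names are
hypothetical, and it assumes vacuumdb is on cron's PATH; with
vacuum_cost_delay left at its default of 0, this runs unthrottled):

    # crontab: vacuum a high-churn queue table every minute
    * * * * * vacuumdb --quiet --table=job_queue mydb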

> I am running 8.2 (will be 8.2.1 soon).  The box is Windows with 2GB RAM
> connected to a SAN over fiber.  The data and pg_xlog are on separate
> partitions.
>
> Modified configuration:
> effective_cache_size = 1000MB
> random_page_cost = 3
> default_statistics_target = 50
> maintenance_work_mem = 256MB
> shared_buffers = 400MB
> temp_buffers = 10MB
> work_mem = 10MB
> max_fsm_pages = 1500000

One other useful manual vacuum to consider is running vacuumdb -av
periodically (say, once a month) and looking at the last few lines of
output. That will give you a good idea of how large to set
max_fsm_pages. Running the output of vacuumdb -av through pgFouine will
give you other useful data as well.
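
The tail end of that output looks something like this (numbers made up):

    INFO:  free space map contains 112340 pages in 214 relations
    DETAIL:  A total of 134000 page slots are in use (including overhead).
    134000 page slots are required to track all free space.
    Current limits are:  1500000 page slots, 1000 relations, using 8838 kB.

If the "required" number gets anywhere near your max_fsm_pages setting,
raise the setting; each page slot only costs about six bytes of shared
memory.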

> checkpoint_segments = 30
> stats_row_level = off
> stats_start_collector = off

Unless you're really trying to squeeze out the last bit of performance,
it's probably not worth turning those stats settings off. Note that
autovacuum depends on them: with stats_start_collector or stats_row_level
off, autovacuum has no way to see which tables have changed and won't do
anything.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
