Re: pg_dump performance - Mailing list pgsql-performance

From:    Jared Mauch
Subject: Re: pg_dump performance
Date:    2007-12-27 14:58:43
Msg-id:  20071227145843.GA96276@puck.nether.net
In response to: Re: pg_dump performance (Gregory Stark <stark@enterprisedb.com>)
List:    pgsql-performance
On Thu, Dec 27, 2007 at 01:14:25PM +0000, Gregory Stark wrote:
> "Jared Mauch" <jared@puck.nether.net> writes:
>
> >     pg_dump is utilizing about 13% of the cpu and the
> > corresponding postgres backend is at 100% cpu time.
> > (multi-core, multi-cpu, lotsa ram, super-fast disk).
> >...
> >     pg8.3(beta) with the following variances from default
> >
> > checkpoint_segments  = 300      # in logfile segments, min 1, 16MB each
> > effective_cache_size = 512MB    # typically 8KB each
> > wal_buffers          = 128MB    # min 4, 8KB each
> > shared_buffers       = 128MB    # min 16, at least max_connections*2, 8KB each
> > work_mem             = 512MB    # min 64, size in KB
>
> FWIW, those are pretty unusual numbers. Normally work_mem is much smaller
> than shared_buffers, since shared buffers is a single allocation while
> work_mem can be used by every query (and by every sort within those
> queries). If you have ten queries running two sorts each, this setting of
> work_mem could consume 10GB.

    I'd still have lots of RAM left :)

    I'm dealing with normal query results that end up matching 5-10 million
rows via the index on starttime, not counting the filter applied afterwards.
Each backend rarely makes it over 256MB.
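
    One way to keep a large sort budget without the global risk Gregory
describes is a per-session override.  A minimal sketch, assuming a
hypothetical flow_samples table (only the starttime column comes from this
thread):

    -- keep the global work_mem modest; raise it only in the session
    -- that runs the big sort, then put it back
    SET work_mem = '512MB';
    SELECT *
      FROM flow_samples
     WHERE starttime >= '2007-12-26 00:00'
       AND starttime <  '2007-12-27 00:00'  -- range scan on the starttime index
     ORDER BY starttime;
    RESET work_mem;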

> Raising shared buffers could improve your pg_dump speed. If all the data is
> in cache it would reduce the time spent moving data between the filesystem
> cache and postgres shared buffers.

    I doubt it's all in cache, but I can look at this.  I did not do a
lot of fine-tuning of the numbers, just enough to get past the defaults and
reach an acceptable level of performance.
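
    One quick way to check is the standard statistics views.  A sketch
(valid on 8.3; note heap_blks_read counts blocks fetched from the kernel,
which may still have come from the filesystem cache, so this only shows the
shared_buffers side):

    -- approximate buffer-cache hit ratio per table; values near 1.0 mean
    -- reads are being satisfied mostly from shared_buffers
    SELECT relname,
           heap_blks_hit::float
             / NULLIF(heap_blks_hit + heap_blks_read, 0) AS hit_ratio
      FROM pg_statio_user_tables
     ORDER BY heap_blks_read DESC
     LIMIT 10;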

> What made you raise wal_buffers so high? I don't think it hurts but that's a
> few orders of magnitude higher than what I would expect to help.

    I'm adding chunks of ~1.2m rows every other minute.  Once I increase
my data collection pool, this will go up to around 2-3m rows or so [1].  I
found that raising the wal and checkpoint settings helped, but I didn't spend
a lot of time tweaking these options.  Is there some way you know of to
determine high-watermark numbers for what is actually being used?
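
    One thing that may help answer that (an assumption on my part that it
fits here): 8.3 adds log_checkpoints, which reports per checkpoint how many
buffers were written and how many xlog segments were added, removed, and
recycled.  A sketch of the relevant postgresql.conf knobs, with illustrative
values:

    log_checkpoints    = on     # one log line per checkpoint with buffer
                                # and xlog segment counts, plus timings
    checkpoint_warning = 60s    # warn if segment-triggered checkpoints
                                # occur closer together than this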

    - Jared

[1] - I am concerned that with my insert speed being around 100k rows/sec
      and raw pg_dump speed being around 182k rows/sec, I will start
      receiving data faster than it can be stored and post-processed.
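
      If the load path is row-at-a-time INSERTs, COPY usually raises that
      ceiling considerably.  A sketch, with a hypothetical table and file
      path (the column names are made up):

          -- load one collected chunk in a single command; COPY avoids
          -- per-row parse and network overhead (server-side path, so
          -- it must be readable by the postgres superuser)
          COPY flow_samples (starttime, src_addr, dst_addr, bytes)
            FROM '/var/data/chunk.csv' WITH CSV;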

--
Jared Mauch  | pgp key available via finger from jared@puck.nether.net
clue++;      | http://puck.nether.net/~jared/  My statements are only mine.
