Re: pg_dump performance - Mailing list pgsql-performance

From Jared Mauch
Subject Re: pg_dump performance
Date
Msg-id 20071226214958.GA92245@puck.nether.net
In response to Re: pg_dump performance  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-performance
On Wed, Dec 26, 2007 at 11:35:59PM +0200, Heikki Linnakangas wrote:
> I run a quick oprofile run on my laptop, with a table like that, filled
> with dummy data. It looks like indeed ~30% of the CPU time is spent in
> sprintf, to convert the integers and inets to string format. I think you
> could speed that up by replacing the sprintf calls in int2, int4 and inet
> output functions with faster, customized functions. We don't need all the
> bells and whistles of sprintf, which gives the opportunity to optimize.

    Hmm.  Given the above and below, perhaps there's something that can
be tackled in the source here.  I'll look at poking around in there, though
our sysadmin folks don't like the idea of running patched stuff (aside from
conf changes), as they're concerned about losing patches between upgrades.
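
    To make the "faster, customized functions" idea concrete, here is a
minimal sketch in C of an integer-to-string routine that skips sprintf's
format parsing.  The name fast_int32_to_str is hypothetical, and this is
not PostgreSQL's actual int4 output code; it only illustrates the kind of
specialised conversion being suggested.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the PostgreSQL source.
 * Writes the decimal form of v into buf and returns its length
 * (excluding the terminating NUL).  buf must hold at least 12 bytes:
 * sign + 10 digits + NUL. */
static size_t fast_int32_to_str(int32_t v, char *buf)
{
    char tmp[10];   /* digits, least significant first; int32 has at most 10 */
    size_t n = 0;
    size_t len = 0;
    /* Work in unsigned space so INT32_MIN can be negated without overflow. */
    uint32_t u = (v < 0) ? 0u - (uint32_t) v : (uint32_t) v;

    do {
        tmp[n++] = (char) ('0' + u % 10);
        u /= 10;
    } while (u != 0);

    if (v < 0)
        buf[len++] = '-';
    while (n > 0)               /* copy the digits out in the right order */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}
```

    Whether this actually beats a given platform's sprintf would have to
be measured; the only point is that no format string is interpreted per
value.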

    I'm waiting on one of my hosts in Japan to come back online,
so perhaps I can hack the source and attempt some optimization
after that point.  It's not the beefy host that I have this running on,
though, and it's not even multi-{core,cpu}, so my luck may be poor.

> A binary mode dump should go a lot faster, because it doesn't need to do
> those conversions, but binary dumps are not guaranteed to work across
> versions.

    I'll look at this.  Since this data is going into something else,
perhaps I can make it slightly faster by not converting from binary ->
string -> binary (in memory) again.  A number of the columns are unused in
my processing, some are used only when certain criteria are met, and some
are always used.
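
    As a rough sketch of what skipping the string step could look like on
the consuming side, the C below decodes a single int4 field straight from
a binary buffer.  It assumes the per-field layout described in the COPY
documentation for binary format (a 4-byte big-endian length word, -1 for
NULL, followed by the value in network byte order).  The function names
are hypothetical, and, per the caveat quoted above, the layout is not
guaranteed to be stable across versions.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: read a big-endian (network order) 32-bit integer. */
static int32_t read_be_int32(const unsigned char *p)
{
    uint32_t u = ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
                 ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
    return (int32_t) u;
}

/* Hypothetical helper: decode one int4 field laid out as
 * <4-byte length><data>, where a length of -1 means NULL and no data
 * bytes follow.  Returns the number of bytes consumed, or 0 if the
 * buffer is too short or the field is not 4 bytes wide. */
static size_t decode_int4_field(const unsigned char *buf, size_t avail,
                                int32_t *out, int *is_null)
{
    int32_t len;

    if (avail < 4)
        return 0;
    len = read_be_int32(buf);
    if (len == -1) {            /* NULL field: only the length word is present */
        *is_null = 1;
        return 4;
    }
    if (len != 4 || avail < 8)
        return 0;
    *is_null = 0;
    *out = read_be_int32(buf + 4);
    return 8;
}
```

    Columns that are never used could be skipped by reading only the
length word and advancing past the data, with no conversion at all.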

> BTW, the profiling I did earlier led me to think this should be optimized
> in the compiler. I started a thread about that on the gcc mailing list but
> got busy with other stuff and didn't follow through that idea:
> http://gcc.gnu.org/ml/gcc/2007-10/msg00073.html

    I'd have to say, after a quick review of this, that it does look
like they're right and this belongs, at least in part, in the C library.
My host is running Solaris 10.  There may be some optimizations that the
compiler could do when linking the C library, but I currently think they're
on sound footing.

    - Jared


--
Jared Mauch  | pgp key available via finger from jared@puck.nether.net
clue++;      | http://puck.nether.net/~jared/  My statements are only mine.
