Re: bytea vs. pg_dump - Mailing list pgsql-hackers

From Tom Lane
Subject Re: bytea vs. pg_dump
Date
Msg-id 116.1241651061@sss.pgh.pa.us
In response to Re: bytea vs. pg_dump  (Bernd Helmle <mailings@oopsware.de>)
Responses Re: bytea vs. pg_dump  (Bernd Helmle <mailings@oopsware.de>)
Re: bytea vs. pg_dump  (Bernd Helmle <mailings@oopsware.de>)
List pgsql-hackers
Bernd Helmle <mailings@oopsware.de> writes:
> --On Tuesday, May 05, 2009 10:00:37 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Seems like the right response might be some micro-optimization effort on
>> byteaout.

> Hmm looking into profiler statistics seems to second your suspicion:

> Normal COPY shows:

>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  31.29     81.38    81.38   134487     0.00     0.00  CopyOneRowTo
>  22.88    140.89    59.51   134487     0.00     0.00  byteaout
>  13.44    175.84    34.95 3052797224     0.00     0.00  appendBinaryStringInfo
>  12.10    207.32    31.48 3052990837     0.00     0.00  CopySendChar
>   8.45    229.31    21.99 3052797226     0.00     0.00  enlargeStringInfo
>   3.90    239.45    10.14    55500     0.00     0.00  pglz_decompress

I hadn't looked closely at these numbers before, but now that I do,
what I think they are telling us is that the high proportion of
backslashes in standard bytea output is a real killer for COPY
performance.  With no backslashes, CopySendChar wouldn't be in the
picture at all here, and appendBinaryStringInfo/enlargeStringInfo
would be called many fewer times (roughly 134487 not 3052797224)
with proportionately more characters processed per call.  The inner
loop of CopyOneRowTo (I assume CopyAttributeOutText has been inlined
into that function) is relatively cheap for ordinary characters and
much less so for backslashes, so I bet that number would go down too.
And as already noted, byteaout itself works pretty hard to produce
the current representation.
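
To make the call-count argument concrete, here is a rough model in Python. It is only a sketch, not the backend code: bytea_escape and copy_out_text are hypothetical stand-ins for byteaout and the COPY text-output loop, and the hex encoding used for comparison is just one assumed example of a backslash-free representation. The model counts a bulk append each time a run of ordinary characters is flushed and a single-character send each time a backslash has to be escaped.

```python
def bytea_escape(data: bytes) -> str:
    """Traditional escaped text form: backslash doubled,
    non-printable bytes written as a backslash plus three octal digits."""
    out = []
    for b in data:
        if b == 0x5C:                      # backslash
            out.append("\\\\")
        elif b < 0x20 or b > 0x7E:         # non-printable
            out.append("\\%03o" % b)
        else:
            out.append(chr(b))
    return "".join(out)


def copy_out_text(value: str):
    """Simplified model of a COPY text-output loop.

    Runs of ordinary characters are flushed with one bulk append;
    each backslash forces a flush plus a one-character send of the
    escaping backslash. Returns (output, bulk_appends, char_sends).
    """
    out = []
    bulk_appends = 0
    char_sends = 0
    start = 0
    for i, ch in enumerate(value):
        if ch == "\\":
            if i > start:                  # flush the pending run
                out.append(value[start:i])
                bulk_appends += 1
            out.append("\\")               # emit the escaping backslash
            char_sends += 1
            start = i                      # the backslash itself starts the next run
    if len(value) > start:                 # flush the final run
        out.append(value[start:])
        bulk_appends += 1
    return "".join(out), bulk_appends, char_sends


if __name__ == "__main__":
    payload = bytes([0, 1, 2, 255])
    print(copy_out_text(bytea_escape(payload))[1:])   # calls for escaped form
    print(copy_out_text(payload.hex())[1:])           # calls for backslash-free form
```

In this model, binary data that is mostly non-printable costs about one bulk append and one character send per input byte, while a backslash-free string costs a single bulk append per value. That matches the shape of the profile above, where 3052797224 appendBinaryStringInfo calls over 134487 rows works out to roughly 22,700 calls per row.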

So I'm now persuaded that a better textual representation for bytea
should indeed make things noticeably better here.  It would be
useful though to cross-check this thought by profiling a case that
dumps a comparable volume of text data that contains no backslashes...
        regards, tom lane

