On Fri, 31 Jul 2009 19:04:52 +0200, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> I did some tracing and verified that pg_dump passes data to deflate()
>>> one table row at a time. I'm not sure about the performance
>>> implications of that, but it does seem like it might be something to
>>> look into.
>
>> I suspect if this was a problem the zlib people would have added
>> internal buffering ages ago. I find it hard to believe we're not the
>> first application to use it this way.
>
> I dug into this a bit more. zlib *does* have internal buffering --- it
> has to, because it needs a minimum lookahead of several hundred bytes
> to ensure that compression works properly. The per-call overhead of
> deflate() looks a bit higher than one could wish when submitting short
> chunks, but oprofile shows that "pg_dump -Fc" breaks down about like
> this:
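To illustrate the buffering point above: a quick sketch (using Python's zlib, which wraps the same deflate()) shows that feeding the compressor one short "row" at a time produces a byte-identical stream to one big call, because deflate buffers input internally until it has enough lookahead. The row contents here are made up; only the chunking pattern matters.

```python
# Feeding deflate one short chunk per call vs. one big buffer:
# the compressed output is identical; only per-call overhead differs.
import zlib

# Fake "table rows", analogous to what pg_dump hands to deflate().
rows = [("row %06d|some,column,data\n" % i).encode() for i in range(10000)]

# One compress() call per row, as pg_dump -Fc effectively does.
per_row = zlib.compressobj(1)
chunked = b"".join(per_row.compress(r) for r in rows) + per_row.flush()

# One big compress() call over the whole concatenated buffer.
single = zlib.compressobj(1)
bulk = single.compress(b"".join(rows)) + single.flush()

assert chunked == bulk  # same stream; zlib buffered the short chunks
```

So the row-at-a-time calling pattern costs only function-call overhead, not compression ratio.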
During dump (size of dump is 2.6 GB):

No compression:
- postgres at 70-100% CPU and pg_dump at something like 10-20%
- dual core is useful (a bit...)
- dump size: 2.6 GB
- dump time: 2m25.288s

Compression level 1:
- postgres at 70-100% CPU and pg_dump at 20-100%
- dual core is definitely useful
- dump size: 544 MB
- dump time: 2m33.337s
Since this box is mostly idle right now, spending CPU on compression is not a problem...
Adding an option to use LZO instead of gzip could be useful...
Compressing the uncompressed 2.6 GB dump:
- gzip -1:
  - compressed size: 565 MB
  - compression throughput: 28.5 MB/s
  - decompression throughput: 74 MB/s
- LZO -1:
  - compressed size: 696 MB
  - compression throughput: 86 MB/s
  - decompression throughput: 247 MB/s
Conclusion: LZO could help with fast disks (RAID), where gzip's compression throughput becomes the bottleneck, or with slow disks on a CPU-starved server...