Tom Lane wrote:
> Joachim Wieland <joe@mcknight.de> writes:
>> If we still cannot do this, then what I am asking is: What does the
>> project need to be able to at least link against such a compression
>> algorithm?
>
> Well, what we *really* need is a convincing argument that it's worth
> taking some risk for. I find that not obvious. You can pipe the output
> of pg_dump into your-choice-of-compressor, for example, and that gets
> you the ability to spread the work across multiple CPUs in addition to
> eliminating legal risk to the PG project. And in any case the general
> impression seems to be that the main dump-speed bottleneck is on the
> backend side not in pg_dump's compression.
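For concreteness, that piping suggestion would look something like the
following (pigz is just one example of a multi-core compressor, and the
thread count here is arbitrary):

    pg_dump mydb | pigz -p 8 > mydb.sql.gz

pigz produces gzip-compatible output, so the result can still be
restored with gunzip -c mydb.sql.gz | psql mydb.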
Legal risks aside (I'm not a lawyer, so I cannot comment on that), the
current situation imho is:
* for a plain pg_dump the backend is the bottleneck
* for a pg_dump -Fc with compression, compression is a huge bottleneck
* for pg_dump | gzip, the bottleneck is usually compression (or bytea
and some other datatypes in <9.0)
* for a parallel dump you can dump uncompressed and compress
afterwards, which increases disk space requirements (and if you need a
parallel dump you usually have a large database) and complexity
(because you have to think about how to manually parallelize the
compression; see the sketch below)
* for a parallel dump that compresses inline you are limited by the
compression algorithm on a per-core basis, and given that the current
inline compression overhead is huge, you lose a lot of the benefits of
parallel dump
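To illustrate the "compress afterwards" case: assuming the dump ends up
as one plain file per table in some directory (a hypothetical layout,
just for the sketch), the manual parallelization boils down to extra
scripting along the lines of

    # compress up to 8 dump files at a time
    ls dumpdir/*.sql | xargs -P 8 -n 1 gzip

and the whole uncompressed dump has to sit on disk before that can even
start.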
Stefan