Re: Compression of full-page-writes - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Compression of full-page-writes
Msg-id CAA4eK1+4_b1OayphqAzoEr1+b2K9vaBtPvUbeCBHuLMHixQ=zw@mail.gmail.com
In response to Re: Compression of full-page-writes  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
Responses Re: Compression of full-page-writes  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
List pgsql-hackers
On Tue, Oct 15, 2013 at 11:41 AM, KONDO Mitsumasa
<kondo.mitsumasa@lab.ntt.co.jp> wrote:
> (2013/10/15 13:33), Amit Kapila wrote:
>>
>> Snappy is good mainly for incompressible data; see the link below:
>>
>> http://www.postgresql.org/message-id/CAAZKuFZCOCHsswQM60ioDO_hk12tA7OG3YcJA8v=4YebMOA-wA@mail.gmail.com
>
> This result was obtained on the ARM architecture, not on a general-purpose CPU.
> Please see the detailed document.
> http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9

I think that, in general, snappy is also mostly preferred for its low CPU
usage rather than for its compression ratio, but overall my vote is also for snappy.

> I found a compression algorithm test in HBase. I haven't read the details, but it
> indicates that the snappy algorithm gets the best performance.
> http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of

The dataset used for that performance test is quite different from the
data we are talking about here (WAL).
"These are the scores for a data which consist of 700kB rows, each
containing a binary image data. They probably won’t apply to things
like numeric or text data."
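To make the point about data dependence concrete, here is a minimal sketch (not from the thread) that compares how well an 8 kB buffer compresses when it holds random bytes (a stand-in for already-compressed image data) versus a synthetic page of repetitive text-like rows plus zeroed free space. It uses Python's standard `zlib` purely as a stand-in, since snappy and pglz are not in the standard library; the page layout built by `synthetic_page` is hypothetical and is not a real PostgreSQL heap page.

```python
import os
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size (BLCKSZ)


def compression_ratio(data: bytes, level: int = 1) -> float:
    """Return compressed size / original size using zlib at the given level."""
    return len(zlib.compress(data, level)) / len(data)


def synthetic_page() -> bytes:
    """Build a hypothetical 8 kB 'page': repetitive text-like rows followed by
    zeroed free space. This only loosely imitates WAL page images."""
    rows = b"".join(b"id=%06d,name=user%06d;" % (i, i) for i in range(150))
    return rows + bytes(PAGE_SIZE - len(rows))


random_page = os.urandom(PAGE_SIZE)  # behaves like already-compressed binary data
text_page = synthetic_page()

for label, page in (("random bytes", random_page), ("synthetic page", text_page)):
    print(f"{label}: compressed/original = {compression_ratio(page):.2f}")
```

The random buffer does not shrink at all (zlib adds a little framing overhead), while the synthetic page shrinks substantially, which is why benchmark results on binary image rows say little about WAL full-page images.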

> In fact, most modern NoSQL storage systems use snappy, because it has good
> performance and a good license (BSD).
>
>
>> I think it is a bit difficult to prove that any one algorithm is best
>> for all kinds of loads.
>
> I think the community needs to make its best effort here, rather than
> relying on me alone to pick the best choice with strict tests.

Sure, it is good to make an effort to select the best algorithm, but if
you combine this patch with the inclusion of a new compression
algorithm in PG, it will only make the patch take much longer.

In general, my thinking is that we should prefer compression that reduces
IO (WAL volume), because reducing WAL volume has other benefits as
well, such as sending less data to subscriber nodes. I think it will help
cases where, due to low network bandwidth and high traffic on the master,
the disk allocated for WAL becomes full, and users then need some
alternative method to handle such situations.

I think many users would like to use a method that can reduce WAL
volume, and users who don't find it useful enough in their
environments, due to a decrease in TPS or an insignificant reduction in
WAL, have the option to disable it.
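A minimal sketch of the "users can disable it" idea, assuming a hypothetical boolean switch named `compress_fpw` (not an actual PostgreSQL setting from this patch) and again using `zlib` only as a stand-in for snappy or pglz. It compresses a full-page image only when the switch is on and compression actually saves space, and otherwise stores the page as-is; `FullPageImage`, `encode_full_page_image` and `decode_full_page_image` are all hypothetical names for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass
class FullPageImage:
    """Hypothetical WAL payload for a full-page image."""
    compressed: bool
    payload: bytes


def encode_full_page_image(page: bytes, compress_fpw: bool) -> FullPageImage:
    """Compress the page only if the hypothetical compress_fpw switch is on
    and the compressed form is actually smaller; otherwise store it raw."""
    if compress_fpw:
        packed = zlib.compress(page, 1)  # fastest level: favour low CPU over ratio
        if len(packed) < len(page):
            return FullPageImage(compressed=True, payload=packed)
    return FullPageImage(compressed=False, payload=page)


def decode_full_page_image(fpi: FullPageImage) -> bytes:
    """Recover the original page bytes, as replay would need to."""
    return zlib.decompress(fpi.payload) if fpi.compressed else fpi.payload


# Usage: a mostly-empty page compresses well when the switch is on.
page = bytes(8192)
on = encode_full_page_image(page, compress_fpw=True)
off = encode_full_page_image(page, compress_fpw=False)
print(on.compressed, len(on.payload), off.compressed, len(off.payload))
```

Falling back to the raw page when compression does not help keeps the worst case at the uncompressed WAL volume, and the switch lets a user who sees a TPS drop simply turn the feature off.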

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


