Re: Compression of full-page-writes - Mailing list pgsql-hackers

From KONDO Mitsumasa
Subject Re: Compression of full-page-writes
Msg-id 5260D52A.3000101@lab.ntt.co.jp
In response to Re: Compression of full-page-writes  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
Hi,

Sorry for my late reply.

(2013/10/11 2:32), Fujii Masao wrote:
> Could you let me know how much WAL records were generated
> during each benchmark?
There was hardly any difference in WAL volume in the DBT-2 benchmark. I investigated, and it is because the largest tuples are filled with random characters, which are difficult to compress.

So I tested two data patterns. The first is the original data, which is hard to compress. The second is slightly modified data, which is easy to compress: specifically, I substituted zero-padded tuples for the random-character tuples.
The record sizes are the same as in the original test data; I changed only the characters in each record.
Sample changed records are shown here.

* Original record (item table)
> 1       9830    W+ùMî/aGhÞVJ;t+Pöþm5v2î.        82.62   Tî%N#ROò|?ö;[_îë~!YäHPÜï[S!JV58Ü#;+$cPì=dãNò;=Þô5
> 2       1492    VIKëyC..UCçWSèQð2?&s÷Jf 95.78   >ýoCj'nîHR`i]cøuDH&-wì4èè}{39ámLß2mC712Tao÷
> 3       4485    oJ)kLvP^_:91BOïé        32.00   ð<èüJ÷RÝ_Jze+?é4Ü7ä-r=DÝK\\$;Fsà8ál5

* Changed sample record (item table)
> 1       9830    000000000000000000000000        95.77   00000000000000000000000000000000000000000
> 2       764     00000000000000  47.92   00000000000000000000000000000000000000000000000000
> 3       4893    000000000000000000000   15.90   00000000000000000000000000000000000
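The effect of this substitution can be sketched with a small experiment. This is only an illustration, not the patch's actual code: zlib stands in for pglz, and the two buffers mimic the random-character and zero-padded filler columns shown above.

```python
import random
import string
import zlib

# Sketch of why the original random-character filler barely compresses
# while the zero-padded variant compresses very well.
# zlib stands in for pglz (a hypothetical substitution, for illustration).
random.seed(0)
original = "".join(random.choice(string.printable) for _ in range(8192)).encode()
changed = b"0" * 8192

for name, data in [("random filler", original), ("zero filler", changed)]:
    ratio = len(zlib.compress(data)) / len(data)
    print(f"{name}: compressed to {ratio:.0%} of original size")
```

The random filler stays close to its original size, while the zero filler shrinks to a tiny fraction of it, which matches the WAL-volume behavior described above.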



* DBT-2 Result

Warehouses = 340

                         | NOTPM     | 90th %tile  | Average | Std. Dev.
-------------------------+-----------+-------------+---------+-----------
no-patched               | 3319.02   | 13.606648   | 7.589   | 8.428
patched                  | 3341.25   | 20.132364   | 7.471   | 10.458
patched-testdata_changed | 3738.07   | 20.493533   | 3.795   | 10.003

The compression patch gets higher performance than no-patch with the easy-to-compress test
data. This is because the patch makes the archived WAL smaller, so less of the file
cache is wasted and the cache is used more effectively.

However, the test with the hard-to-compress data shows slightly lower performance
than no-patch. I think this is the compression overhead of pglz.
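The overhead claim can be illustrated in rough form: compressing an incompressible 8 kB page costs CPU time while saving almost no space. Again zlib stands in for pglz (an assumption for illustration), and BLCKSZ matches PostgreSQL's default block size.

```python
import os
import time
import zlib

# Compressing a random (incompressible) page: CPU is spent, space is not saved.
# zlib is a stand-in for pglz here; the timing is only indicative.
BLCKSZ = 8192
page = os.urandom(BLCKSZ)  # hard to compress, like the original test data

t0 = time.perf_counter()
compressed = zlib.compress(page)
elapsed_us = (time.perf_counter() - t0) * 1e6
print(f"{BLCKSZ} bytes -> {len(compressed)} bytes in {elapsed_us:.1f} us")
```

The output stays essentially the same size as the input, so for such pages the compression step is pure overhead on the WAL write path.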


> I think that this benchmark result clearly means that the patch
> has only limited effects in the reduction of WAL volume and
> the performance improvement unless the database contains
> highly-compressible data like pgbench_accounts.
Your expectation is right. I think an algorithm with lower CPU cost and a higher
compression ratio would make your patch perform better still.

> filler. But if
> we can use other compression algorithm, maybe we can reduce
> WAL volume very much.
Yes, please!

> I'm not sure what algorithm is good  for WAL compression, though.
Some community members think Snappy or lz4 would be better. It might be best to select
one, or to test both algorithms.

> It might be better to introduce the hook for compression of FPW
> so that users can freely use their compression module, rather
> than just using pglz_compress(). Thought?
As I remember, Andres Freund developed a patch like this. Was it committed, or is it
still in development? I think this idea is very good.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
