Re: [REVIEW] Re: Compression of full-page-writes

From Claudio Freire
Subject Re: [REVIEW] Re: Compression of full-page-writes
Date
Msg-id CAGTBQpaPVMNW_Ew0yspdj2-FQKxtDeVQanPxxGvRBoWOQ_uqkg@mail.gmail.com
In response to Re: [REVIEW] Re: Compression of full-page-writes  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Fri, Dec 12, 2014 at 7:25 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sat, Dec 13, 2014 at 1:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Dec 12, 2014 at 10:04 AM, Andres Freund <andres@anarazel.de> wrote:
>>>> Note that autovacuum and fsync are off.
>>>> =# select phase, user_diff, system_diff,
>>>> pg_size_pretty(pre_update - pre_insert),
>>>> pg_size_pretty(post_update - pre_update) from results;
>>>>        phase        | user_diff | system_diff | pg_size_pretty | pg_size_pretty
>>>> --------------------+-----------+-------------+----------------+----------------
>>>>  Compression FPW    | 42.990799 |    0.868179 | 429 MB         | 567 MB
>>>>  No compression     | 25.688731 |    1.236551 | 429 MB         | 727 MB
>>>>  Compression record | 56.376750 |    0.769603 | 429 MB         | 566 MB
>>>> (3 rows)
>>>> If we do record-level compression, we'll need to be very careful in
>>>> defining a lower-bound to not eat unnecessary CPU resources, perhaps
>>>> something that should be controlled with a GUC. I presume that this stands
>>>> true as well for the upper bound.
>>>
>>> Record level compression pretty obviously would need a lower boundary
>>> for when to use compression. It won't be useful for small heapam/btree
>>> records, but it'll be rather useful for large multi_insert, clean or
>>> similar records...
>>
>> Unless I'm missing something, this test is showing that FPW
>> compression saves 298MB of WAL for 17.3 seconds of CPU time, as
>> against master.  And compressing the whole record saves a further 1MB
>> of WAL for a further 13.39 seconds of CPU time.  That makes
>> compressing the whole record sound like a pretty terrible idea - even
>> if you get more benefit by reducing the lower boundary, you're still
>> burning a ton of extra CPU time for almost no gain on the larger
>> records.  Ouch!
>>
>> (Of course, I'm assuming that Michael's patch is reasonably efficient,
>> which might not be true.)
> Note that I was curious about the worst-case ever, aka how much CPU
> pg_lzcompress would use if everything is compressed, even the smallest
> records. So we'll surely need a lower-bound. I think that doing some
> tests with a lower bound set as a multiple of SizeOfXLogRecord would
> be fine, but in this case what we'll see is a result similar to what
> FPW compression does.
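
The lower bound being discussed amounts to a cheap length check in front of
the compressor, so small records never pay the CPU cost at all. Here is a
minimal standalone sketch of that idea, under assumed names:
compress_min_multiple and the SIZE_OF_XLOG_RECORD stand-in are hypothetical,
not the actual patch or the xlog.c API.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for SizeOfXLogRecord; the real value lives in access/xlogrecord.h. */
#define SIZE_OF_XLOG_RECORD 24

/* Hypothetical GUC-style knob: compress only records at least this many
 * times larger than a bare record header. */
static int compress_min_multiple = 4;

/*
 * Skip the compressor entirely for records below the lower bound, so the
 * "compress everything, even the smallest records" worst case never happens.
 */
static bool
worth_compressing(size_t rec_len)
{
    return rec_len >= (size_t) compress_min_multiple * SIZE_OF_XLOG_RECORD;
}

int
main(void)
{
    size_t lens[] = {32, 96, 256, 8192};

    for (int i = 0; i < 4; i++)
        printf("record of %4zu bytes: %s\n", lens[i],
               worth_compressing(lens[i]) ? "compress" : "leave as-is");
    return 0;
}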


In general, lz4 (and pg_lz is similar to lz4 in this respect) compresses
anything below about 128 bytes very poorly. Of course there are outliers
with some very compressible stuff, but with regular text or JSON data it's
quite unlikely to compress at all on such small inputs. Compression stays
modest up to about 1 KB, which is where it starts to really pay off.

That's at least my experience with lots of JSON-ish, text-ish and CSV
data sets: compressible, but not so much in small bits.
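
If you want to reproduce the shape of that curve, here is a small standalone
sketch against liblz4 (pg_lz itself isn't linkable outside the backend, and
lz4 behaves similarly here). The JSON-ish payload and the size steps are made
up for illustration, and the synthetic data is more repetitive than real
data, so it only shows the trend, not real-world ratios:

/* Build with: cc lz4_ratio.c -llz4 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lz4.h>

/* Fill buf with repetitive JSON-ish text; real data compresses worse
 * than this at small sizes. */
static void
fill_jsonish(char *buf, int len)
{
    const char *chunk = "{\"id\":12345,\"name\":\"someone\",\"active\":true},";
    int clen = (int) strlen(chunk);

    for (int i = 0; i < len; i++)
        buf[i] = chunk[i % clen];
}

int
main(void)
{
    int sizes[] = {64, 128, 512, 1024, 8192};

    for (int i = 0; i < 5; i++)
    {
        int   len = sizes[i];
        int   bound = LZ4_compressBound(len);
        char *src = malloc(len);
        char *dst = malloc(bound);
        int   clen;

        fill_jsonish(src, len);
        clen = LZ4_compress_default(src, dst, len, bound);

        printf("input %5d bytes -> compressed %5d bytes (%.0f%% of original)\n",
               len, clen, 100.0 * clen / len);
        free(src);
        free(dst);
    }
    return 0;
}

The numbers you get depend entirely on the data; the point is only that the
ratio improves sharply somewhere between a couple of hundred bytes and
about 1 KB of input.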


