Re: Compression of full-page-writes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Compression of full-page-writes
Msg-id CA+TgmoYhw0pkAD=nPPdpoeT0itF5S3sHO-wEWEx7k9bYZS8VqA@mail.gmail.com
In response to Re: Compression of full-page-writes  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Mon, Dec 8, 2014 at 2:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-12-08 14:09:19 -0500, Robert Haas wrote:
>> > records, just fpis. There is no evidence that we even want to compress
>> > other record types, nor that our compression mechanism is effective at
>> > doing so. Simple => keep name as compress_full_page_writes
>>
>> Quite right.
>
> I don't really agree with this. There's lots of records which can be
> quite big where compression could help a fair bit. Most prominently
> HEAP2_MULTI_INSERT + INIT_PAGE. During initial COPY that's the biggest
> chunk of WAL. And these are big and repetitive enough that compression
> is very likely to be beneficial.
>
> I still think that just compressing the whole record if it's above a
> certain size is going to be better than compressing individual
> parts. Michael argued that that'd be complicated because of the varying
> size of the required 'scratch space'. I don't buy that argument
> though. It's easy enough to simply compress all the data in some fixed
> chunk size. I.e. always compress 64kb in one go. If there's more,
> compress that independently.

I agree that idea is worth considering.  But I think we should decide
which way is better and then do just one or the other.  I can't see
the point in adding wal_compress=full_pages now and then offering an
alternative wal_compress=big_records in 9.5.
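
To make the fixed-chunk idea quoted above concrete, here is a minimal
sketch in Python.  It is not taken from any patch: compress_record and
decompress_record are hypothetical names, and zlib is only a stand-in
for whatever compressor the server would actually use.  The point is
just that each 64kB piece is compressed on its own, so the scratch
space needed per call is bounded no matter how big the record is.

```python
import zlib

# Fixed chunk size from the quoted proposal ("always compress 64kb in one go").
CHUNK_SIZE = 64 * 1024


def compress_record(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Compress a record as a list of independently compressed chunks.

    Hypothetical helper: every chunk is at most chunk_size bytes before
    compression, so a fixed-size scratch buffer would always suffice.
    """
    return [
        zlib.compress(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def decompress_record(chunks: list[bytes]) -> bytes:
    """Reverse compress_record by decompressing each chunk and concatenating."""
    return b"".join(zlib.decompress(chunk) for chunk in chunks)


if __name__ == "__main__":
    # A large, repetitive payload, loosely standing in for a multi-insert record.
    record = b"tuple-data " * 20000
    chunks = compress_record(record)
    print(len(record), len(chunks), sum(len(c) for c in chunks))
```

The obvious cost of this approach is that chunks cannot share
compression context, so a record just over a chunk boundary compresses
slightly worse than it would in one pass.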

I think it's also quite likely that there may be cases where
context-aware compression strategies can be employed.  For example,
the prefix/suffix compression of updates that Amit did last cycle
exploits the likely commonality between the old and new tuple.  We
might have cases like that where there are meaningful trade-offs to be
made between CPU and I/O, or other reasons to have user-exposed knobs.
I think we'll be much happier if those are completely separate GUCs,
so we can say things like compress_gin_wal=true and
compress_brin_effort=3.14 rather than trying to have a single
wal_compress GUC and assuming that we can shoehorn all future needs
into it.
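
As a rough illustration of what I mean by context-aware, here is a
sketch of prefix/suffix encoding in Python.  This is not Amit's code,
and encode_delta and decode_delta are hypothetical names; it only shows
the general shape of the technique, which is to store the lengths of
the prefix and suffix shared with the old tuple plus the bytes in
between, assuming the old tuple is available when decoding.

```python
def encode_delta(old: bytes, new: bytes) -> tuple[int, int, bytes]:
    """Encode new relative to old as (prefix_len, suffix_len, middle).

    Hypothetical helper: prefix_len and suffix_len count the bytes that
    new shares with the start and end of old; middle is what remains.
    """
    limit = min(len(old), len(new))

    # Length of the common prefix.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1

    # Length of the common suffix, never overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1

    return prefix, suffix, new[prefix:len(new) - suffix]


def decode_delta(old: bytes, delta: tuple[int, int, bytes]) -> bytes:
    """Rebuild the new tuple from the old tuple and an encoded delta."""
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


if __name__ == "__main__":
    old_tuple = b"id=42;name=alice;city=paris"
    new_tuple = b"id=42;name=alicia;city=paris"
    delta = encode_delta(old_tuple, new_tuple)
    print(delta, decode_delta(old_tuple, delta) == new_tuple)
```

A scheme like this is cheap in CPU but only pays off when the old and
new versions really do overlap, which is exactly the kind of trade-off
that might deserve its own knob.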

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
