Re: Compress ReorderBuffer spill files using LZ4 - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Compress ReorderBuffer spill files using LZ4
Date
Msg-id 87060757-4936-4e3b-8d49-418f71d07eb3@enterprisedb.com
In response to Re: Compress ReorderBuffer spill files using LZ4  (Julien Tachoires <julmon@gmail.com>)
List pgsql-hackers
On 7/15/24 20:50, Julien Tachoires wrote:
> Hi,
> 
> On Fri, Jun 7, 2024 at 06:18, Julien Tachoires <julmon@gmail.com> wrote:
>>
>> On Fri, Jun 7, 2024 at 05:59, Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>>
>>> On 6/6/24 12:58, Julien Tachoires wrote:
>>>> ...
>>>>
>>>> When compiled with LZ4 support (--with-lz4), this patch enables data
>>>> compression/decompression of these temporary files. Each transaction
>>>> change that must be written on disk (ReorderBufferDiskChange) is now
>>>> compressed and encapsulated in a new structure.
>>>>
>>>
>>> I'm a bit confused, but why tie this to having lz4? Why shouldn't this
>>> be supported even for pglz, or whatever algorithms we add in the future?
>>
>> That's right, I will rework the patch along those lines.
> 
> Please find a new version of this patch adding support for LZ4, pglz
> and ZSTD. It introduces a new GUC, logical_decoding_spill_compression,
> which sets the compression method. To stay aligned with the other
> server-side GUCs related to compression methods (wal_compression,
> default_toast_compression), the compression level is not exposed to
> users.
> 

Sounds reasonable. I wonder if it might be useful to allow specifying
the compression level in those places, but that's clearly not something
this patch needs to do.

> The last patch of this set is still a work in progress; it adds the
> machinery required for setting the compression method as a
> subscription option:
> CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
> I think there is a major problem with this approach: the logical
> decoding context is tied to one replication slot, but multiple
> subscriptions can use the same replication slot. How should this work
> if 2 subscriptions want to use the same replication slot but different
> compression methods?
> 

Do we really support multiple subscriptions sharing the same slot? I
don't think we do, but maybe I'm missing something.

> At this point, compression is only available for the changes spilled
> on disk. It is still not clear to me if the compression of data
> transiting through the streaming protocol should be addressed by this
> patch set or by another one. Thoughts?
> 

I'd stick to only compressing the data spilled to disk. It might be
useful to compress the streamed data too, but why shouldn't we compress
the regular (non-streamed) transactions too? Yeah, it's more efficient
to compress larger chunks, but we can fit quite large transactions into
logical_decoding_work_mem without spilling.

FWIW I'd expect that to be handled at the libpq level - there's already
a patch for that, but I haven't checked if it would handle this. But
maybe more importantly, I think compressing streamed data might need to
handle some sort of negotiation of the compression algorithm, which
seems fairly complex.

To conclude, I'd leave this out of scope for this patch.
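For readers following along, the framing idea described upthread (each
spilled change compressed and wrapped in a small envelope before being
written to disk) can be sketched roughly as below. This is not the patch
code: it uses Python's zlib as a stand-in for LZ4/pglz/ZSTD, and the
header layout and function names are invented for illustration only.

```python
import io
import struct
import zlib  # stand-in for LZ4/pglz/ZSTD; not the actual patch code


def spill_change(f, change: bytes) -> None:
    # Compress one change and prefix it with (compressed_len, raw_len),
    # mimicking the idea of wrapping each ReorderBufferDiskChange in a
    # compressed envelope. The header layout here is hypothetical.
    payload = zlib.compress(change)
    f.write(struct.pack("<II", len(payload), len(change)))
    f.write(payload)


def restore_change(f) -> bytes:
    # Read the envelope header, then decompress exactly one change.
    comp_len, raw_len = struct.unpack("<II", f.read(8))
    data = zlib.decompress(f.read(comp_len))
    assert len(data) == raw_len  # sanity-check the recorded raw size
    return data


# Round-trip demo on an in-memory "spill file".
buf = io.BytesIO()
spill_change(buf, b"change payload " * 64)
buf.seek(0)
restored = restore_change(buf)
```

Since each change is framed independently, changes can be restored one
at a time without decompressing the whole spill file, which matches how
ReorderBuffer reads spilled changes back sequentially.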


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


