Re: Zedstore - compressed in-core columnar storage - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Zedstore - compressed in-core columnar storage
Date
Msg-id 20190415130138.n4q4smvz5aqc5k4k@development
In response to Re: Zedstore - compressed in-core columnar storage  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sun, Apr 14, 2019 at 06:39:47PM +0200, Tomas Vondra wrote:
>On Thu, Apr 11, 2019 at 06:20:47PM +0300, Heikki Linnakangas wrote:
>>On 11/04/2019 17:54, Tom Lane wrote:
>>>Ashwin Agrawal <aagrawal@pivotal.io> writes:
>>>>Thank you for trying it out. Yes, noticed for certain patterns
>>>>pg_lzcompress() actually requires much larger output buffers. Like
>>>>for one 86 len source it required 2296 len output buffer. Current
>>>>zedstore code doesn’t handle this case and errors out. LZ4 for same
>>>>patterns works fine, would highly recommend using LZ4 only, as
>>>>anyways speed is very fast as well with it.
>>>
>>>You realize of course that *every* compression method has some inputs
>>>that it makes bigger.  If your code assumes that compression always
>>>produces a smaller string, that's a bug in your code, not the
>>>compression algorithm.
>>
>>Of course. The code is not making that assumption, although clearly
>>there is a bug there somewhere because it throws that error. It's
>>early days.
>>
>>In practice it's easy to weasel out of that, by storing the data
>>uncompressed, if compression would make it longer. Then you need an
>>extra flag somewhere to indicate whether it's compressed or not. It
>>doesn't break the theoretical limit because the actual stored length
>>is then original length + 1 bit, but it's usually not hard to find a
>>place for one extra bit.
>>
>
>Don't we already have that flag, though? I see ZSCompressedBtreeItem
>has t_flags, and there's ZSBT_COMPRESSED, but maybe it's more
>complicated.
>
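
To make the "store it uncompressed if compression doesn't help" idea
concrete, here is a minimal sketch in C. It is not zedstore code. The Item
struct, the ITEM_COMPRESSED flag and the toy run-length encoder (standing in
for pglz or LZ4) are all hypothetical. The compressor gets an output budget
one byte smaller than the input, and the item falls back to a raw copy with
the flag cleared whenever the compressed form would not fit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit; not zedstore's actual ZSBT_COMPRESSED layout. */
#define ITEM_COMPRESSED 0x01

/* Hypothetical item with a small fixed payload buffer, for illustration only. */
typedef struct
{
    uint8_t flags;
    size_t  len;                /* stored (possibly compressed) length */
    uint8_t data[256];
} Item;

/*
 * Toy run-length encoder standing in for pglz/LZ4: emits (count, byte)
 * pairs. Like any compressor, it expands some inputs (no repeats means
 * 2x the size). Returns the encoded length, or -1 if it exceeds dstcap.
 */
static long
rle_compress(const uint8_t *src, size_t len, uint8_t *dst, size_t dstcap)
{
    size_t out = 0, i = 0;

    while (i < len)
    {
        size_t run = 1;

        while (i + run < len && src[i + run] == src[i] && run < 255)
            run++;
        if (out + 2 > dstcap)
            return -1;          /* would not fit in the budget: give up */
        dst[out++] = (uint8_t) run;
        dst[out++] = src[i];
        i += run;
    }
    return (long) out;
}

/* Decode (count, byte) pairs back into dst; returns the decoded length. */
static size_t
rle_decompress(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t out = 0;

    assert(len % 2 == 0);       /* stream is always whole pairs */
    for (size_t i = 0; i < len; i += 2)
    {
        memset(dst + out, src[i + 1], src[i]);
        out += src[i];
    }
    return out;
}

/* Store src compressed only if that is strictly smaller; else store it raw. */
static void
store_item(Item *item, const uint8_t *src, size_t len)
{
    long clen = -1;

    assert(len <= sizeof(item->data));
    if (len > 1)                /* budget of len - 1 rejects any expansion */
        clen = rle_compress(src, len, item->data, len - 1);

    if (clen >= 0)
    {
        item->flags = ITEM_COMPRESSED;
        item->len = (size_t) clen;
    }
    else
    {
        item->flags = 0;        /* incompressible data costs only the flag */
        memcpy(item->data, src, len);
        item->len = len;
    }
}

/* Read an item back, checking the flag to decide whether to decompress. */
static size_t
load_item(const Item *item, uint8_t *dst)
{
    if (item->flags & ITEM_COMPRESSED)
        return rle_decompress(item->data, item->len, dst);
    memcpy(dst, item->data, item->len);
    return item->len;
}
```

In this sketch the encoder checks the budget before every write, so the item
buffer never has to be sized for the encoder's worst-case expansion.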

After thinking about this a bit more, I think a simple flag may not be
enough. It might be better to store some sort of compression algorithm
ID in each item, which would allow switching algorithms for new data
(useful e.g. after we add new compression methods to core, or when the
initial choice turns out not to be the best one).
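
A rough sketch of what a per-item algorithm ID might look like, again
hypothetical and not based on the actual zedstore item layout. The single
flag becomes a small method number, and readers dispatch on it through a
table, so items written with different algorithms can coexist and an unknown
ID is rejected rather than misread. CompressionMethod, decompressors[] and
the toy RLE decoder are illustrative names only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-item compression method IDs (not an existing zedstore enum). */
typedef enum
{
    CMETHOD_NONE = 0,           /* stored raw; the old one-bit flag case */
    CMETHOD_RLE = 1,            /* toy run-length encoding */
    /* new IDs (e.g. for pglz or LZ4) would be appended here */
} CompressionMethod;

typedef size_t (*decompress_fn) (const uint8_t *src, size_t len, uint8_t *dst);

/* "Decompressor" for raw items: a plain copy. */
static size_t
copy_decompress(const uint8_t *src, size_t len, uint8_t *dst)
{
    memcpy(dst, src, len);
    return len;
}

/* Decode (count, byte) pairs; returns the decoded length. */
static size_t
rle_decompress(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t out = 0;

    assert(len % 2 == 0);       /* stream is always whole pairs */
    for (size_t i = 0; i < len; i += 2)
    {
        memset(dst + out, src[i + 1], src[i]);
        out += src[i];
    }
    return out;
}

/* Dispatch table indexed by the method ID stored in each item. */
static const decompress_fn decompressors[] = {
    [CMETHOD_NONE] = copy_decompress,
    [CMETHOD_RLE] = rle_decompress,
};

/* Hypothetical item: one method byte instead of a single compressed flag. */
typedef struct
{
    uint8_t method;
    size_t len;
    const uint8_t *data;
} Item;

/* Decode an item, or return (size_t) -1 if its method ID is unknown. */
static size_t
load_item(const Item *item, uint8_t *dst)
{
    size_t nmethods = sizeof(decompressors) / sizeof(decompressors[0]);

    if (item->method >= nmethods || decompressors[item->method] == NULL)
        return (size_t) -1;
    return decompressors[item->method](item->data, item->len, dst);
}
```

Reserving 0 for uncompressed data keeps the one-bit flag discussed upthread
as a special case, and a full byte of method ID leaves room for future
algorithms without rewriting existing items.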

Of course, those are just wild thoughts at this point, it's not
something the current PoC has to solve right away.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


