Re: Zedstore - compressed in-core columnar storage - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Zedstore - compressed in-core columnar storage
Date
Msg-id f090d405-c5ac-835e-fdb3-0c8d9c850012@enterprisedb.com
In response to Re: Zedstore - compressed in-core columnar storage  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Zedstore - compressed in-core columnar storage  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On 11/16/20 1:59 PM, Merlin Moncure wrote:
> On Thu, Nov 12, 2020 at 4:40 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>            master    zedstore/pglz    zedstore/lz4
>>   -------------------------------------------------
>>    copy      1855            68092            2131
>>    dump       751              905             811
>>
>> And the size of the lineitem table (as shown by \d+) is:
>>
>>   master: 64GB
>>   zedstore/pglz: 51GB
>>   zedstore/lz4: 20GB
>>
>> It's mostly expected lz4 beats pglz in performance and compression
>> ratio, but this seems a bit too extreme I guess. Per past benchmarks
>> (e.g. [1] and [2]) the difference in compression/decompression time
>> should be maybe 1-2x or something like that, not 35x like here.
> 
> I can't speak to the ratio, but in basic backup/restore scenarios pglz
> is absolutely killing me; Performance is just awful; we are cpubound
> in backups throughout the department.  Installations defaulting to
> plgz will make this feature show very poorly.
> 

Maybe. I'm not disputing that pglz is considerably slower than lz4, but
judging by previous benchmarks I'd expect compression to be slower by a
factor of maybe ~2x. So the ~30x difference in the copy numbers is
suspicious. Similarly for the compression ratio - lz4 is great, but it
seems strange that the table is less than 1/2 the size it is with pglz
(20GB vs. 51GB). Which is why I'm speculating that something else is
going on.
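
To make the arithmetic behind those two ratios explicit, here is a small
sketch that recomputes them from the numbers in the quoted table. The
variable and function names are made up for illustration, and the copy
figures are taken as-is, in whatever unit the benchmark reported.

```python
# Figures copied from the quoted benchmark table; names are illustrative only.
copy_cost = {"master": 1855, "zedstore/pglz": 68092, "zedstore/lz4": 2131}
table_size_gb = {"master": 64, "zedstore/pglz": 51, "zedstore/lz4": 20}


def ratio(values, numerator, denominator):
    """Return values[numerator] / values[denominator]."""
    return values[numerator] / values[denominator]


# How much slower the copy is with pglz than with lz4.
copy_slowdown = ratio(copy_cost, "zedstore/pglz", "zedstore/lz4")

# How large the lz4 table is relative to the pglz table.
size_fraction = ratio(table_size_gb, "zedstore/lz4", "zedstore/pglz")

print(f"pglz vs lz4 copy slowdown: {copy_slowdown:.1f}x")
print(f"lz4 size as a fraction of pglz size: {size_fraction:.2f}")
```

The slowdown comes out a bit above 30x and the size fraction a bit below
0.4, both far from the ~2x that earlier benchmarks would suggest.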

As for "plgz will make this feature show very poorly", I think that
depends. We may end up with pglz doing pretty well (compared to heap),
with lz4 probably outperforming it. OTOH for various use cases it may be
more efficient to use some other method with a worse compression ratio
that allows execution on compressed data, etc.
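
As a rough illustration of what "execution on compressed data" can mean,
here is a minimal sketch using run-length encoding. RLE is just a simple
stand-in picked for this example, not something zedstore is known to
implement, and the function names are hypothetical. The point is that a
filter-and-count can be answered from the runs without expanding them.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def count_equal(runs, target):
    """Count rows equal to target directly on the runs, without decompressing."""
    return sum(length for value, length in runs if value == target)


# Hypothetical column contents, chosen only to show the idea.
column = ["A", "A", "A", "B", "B", "A"]
runs = rle_encode(column)
matches = count_equal(runs, "A")
```

Here the count touches three runs instead of six rows; a general-purpose
byte compressor like pglz or lz4 would need to decompress the data first.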

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
