Re: Zedstore - compressed in-core columnar storage - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Zedstore - compressed in-core columnar storage
Date
Msg-id 20190415201709.iuekkfen4df54pbg@development
In response to Re: Zedstore - compressed in-core columnar storage  (Ashwin Agrawal <aagrawal@pivotal.io>)
List pgsql-hackers
On Mon, Apr 15, 2019 at 11:57:49AM -0700, Ashwin Agrawal wrote:
>   On Mon, Apr 15, 2019 at 11:18 AM Tomas Vondra
>   <tomas.vondra@2ndquadrant.com> wrote:
>
>     Maybe. I'm not going to pretend I fully understand the internals. Does
>     that mean the container contains ZSUncompressedBtreeItem as elements? Or
>     just the plain Datum values?
>
>   First, your reading of code and all the comments/questions so far have
>   been highly encouraging. Thanks a lot for the same.

;-)

>   Container contains ZSUncompressedBtreeItem as elements. As for Item will
>   have to store meta-data like size, undo and such info. We don't wish to
>   restrict compressing only items from same insertion sessions only. Hence,
>   yes doesn't just store Datum values. Wish to consider it more tuple level
>   operations and have meta-data for it and able to work with tuple level
>   granularity than block level.

OK, thanks for the clarification, that somewhat explains my confusion.
So if I understand it correctly, ZSCompressedBtreeItem is essentially a
sequence of ZSUncompressedBtreeItem(s) stored one after another, along
with some additional top-level metadata.
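
To check that I'm reading this right, here is a rough sketch of that
layout as I understand it: items stored back to back, each carrying its
own size and metadata, so a reader has to hop from header to header.
DemoItemHeader, demo_container_append and demo_container_count are names
I made up for illustration; they are not the actual zedstore structs or
functions, and the real item header certainly differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-item header, NOT the real ZSUncompressedBtreeItem. */
typedef struct DemoItemHeader
{
    uint16_t    size;       /* total item size in bytes, header included */
    uint64_t    tid;        /* stand-in for the item's TID */
    uint64_t    undo_ptr;   /* stand-in for the item's undo pointer */
} DemoItemHeader;

/*
 * Append one item (header followed by payload) at offset *off.
 * Returns 0 on success, -1 if the item does not fit.
 */
static int
demo_container_append(char *buf, size_t cap, size_t *off,
                      uint64_t tid, uint64_t undo_ptr,
                      const void *payload, uint16_t payload_len)
{
    DemoItemHeader hdr;
    size_t      total = sizeof(DemoItemHeader) + payload_len;

    assert(buf != NULL && off != NULL);
    if (total > UINT16_MAX || *off > cap || total > cap - *off)
        return -1;

    memset(&hdr, 0, sizeof(hdr));   /* zero the padding bytes too */
    hdr.size = (uint16_t) total;
    hdr.tid = tid;
    hdr.undo_ptr = undo_ptr;

    memcpy(buf + *off, &hdr, sizeof(hdr));
    if (payload_len > 0)
        memcpy(buf + *off + sizeof(hdr), payload, payload_len);
    *off += total;
    return 0;
}

/*
 * Walk the items stored one after another in buf[0..len).
 * Returns the number of items, or -1 if the buffer is malformed.
 */
static int
demo_container_count(const char *buf, size_t len)
{
    size_t      off = 0;
    int         nitems = 0;

    assert(buf != NULL);
    while (off < len)
    {
        DemoItemHeader hdr;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + off, sizeof(hdr));   /* avoid unaligned reads */
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;
        off += hdr.size;
        nitems++;
    }
    return nitems;
}
```

The point of the sketch is only that metadata and values alternate in
the byte stream, and that finding the N-th value means reading every
header before it.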

>   Definitely many more tricks can be and need to be applied to optimize
>   storage format, like for fixed width columns no need to store the size in
>   every item. Keep it simple is theme have been trying to maintain.
>   Compression ideally should compress duplicate data pretty easily and
>   efficiently as well, but we will try to optimize as much we can without
>   the same.

I think there's plenty of room for improvement. The main problem I see
is that the container mixes different types of data, which is bad for
both compression and vectorized execution. I think we'll end up with a
very different representation of the container, essentially decomposing
the items into arrays of values of the same type: an array of TIDs, an
array of undo pointers, a buffer of serialized values, etc.
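
Something along these lines is what I have in mind. This is only a
sketch under my own assumptions: DemoItem, DemoColumnarContainer,
demo_decompose and demo_delta_encode are hypothetical names, the fixed
array sizes are arbitrary, and delta encoding is just one example of a
trick that becomes easy once same-typed values sit next to each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ITEMS  64      /* arbitrary limits for the sketch */
#define DEMO_MAX_DATA   1024

/* Hypothetical row-wise item: metadata and value travel together. */
typedef struct DemoItem
{
    uint64_t    tid;
    uint64_t    undo_ptr;
    const void *value;
    uint32_t    value_len;
} DemoItem;

/* Hypothetical decomposed container: one array per kind of data. */
typedef struct DemoColumnarContainer
{
    uint32_t    nitems;
    uint64_t    tids[DEMO_MAX_ITEMS];
    uint64_t    undo_ptrs[DEMO_MAX_ITEMS];
    /* value i occupies data[value_offsets[i] .. value_offsets[i + 1]) */
    uint32_t    value_offsets[DEMO_MAX_ITEMS + 1];
    char        data[DEMO_MAX_DATA];
} DemoColumnarContainer;

/* Split the items into per-type arrays. Returns 0, or -1 if they don't fit. */
static int
demo_decompose(const DemoItem *items, uint32_t nitems,
               DemoColumnarContainer *out)
{
    uint32_t    used = 0;

    assert(items != NULL && out != NULL);
    if (nitems > DEMO_MAX_ITEMS)
        return -1;

    for (uint32_t i = 0; i < nitems; i++)
    {
        if (items[i].value_len > DEMO_MAX_DATA - used)
            return -1;
        out->tids[i] = items[i].tid;
        out->undo_ptrs[i] = items[i].undo_ptr;
        out->value_offsets[i] = used;
        if (items[i].value_len > 0)
            memcpy(out->data + used, items[i].value, items[i].value_len);
        used += items[i].value_len;
    }
    out->value_offsets[nitems] = used;
    out->nitems = nitems;
    return 0;
}

/*
 * Delta-encode an array in place: the first element is kept, every other
 * element becomes the difference from its predecessor. Walking backwards
 * ensures each predecessor is still the original value when it is used.
 */
static void
demo_delta_encode(uint64_t *vals, uint32_t n)
{
    for (uint32_t i = n; i > 1; i--)
        vals[i - 1] -= vals[i - 2];
}
```

With consecutive TIDs the encoded array is mostly small numbers, which a
general-purpose compressor handles much better than the same TIDs
interleaved with undo pointers and values. The separate arrays are also
what a vectorized executor would want to loop over.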


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
