Re: Zedstore - compressed in-core columnar storage - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Zedstore - compressed in-core columnar storage
Msg-id 20190414162645.nrybpxeshtqf3r5s@development
In response to Re: Zedstore - compressed in-core columnar storage  (Ashwin Agrawal <aagrawal@pivotal.io>)
List pgsql-hackers
On Tue, Apr 09, 2019 at 02:03:09PM -0700, Ashwin Agrawal wrote:
>   On Tue, Apr 9, 2019 at 9:13 AM Konstantin Knizhnik
>   <k.knizhnik@postgrespro.ru> wrote:
>
>     On 09.04.2019 18:51, Alvaro Herrera wrote:
>     > On 2019-Apr-09, Konstantin Knizhnik wrote:
>     >
>     >> On 09.04.2019 3:27, Ashwin Agrawal wrote:
>     >>> Heikki and I have been hacking for the last few weeks to implement
>     >>> in-core columnar storage for PostgreSQL. Here's the design and initial
>     >>> implementation of Zedstore, compressed in-core columnar storage (table
>     >>> access method). Attaching the patch and link to GitHub branch [1] to
>     >>> follow along.
>     >> Thank you for publishing this patch. IMHO Postgres is really missing
>     >> proper support for a columnar store
>     > Yep.
>     >
>     >> and the table access method API is the best way of integrating it.
>     > This is not surprising, considering that columnar store is precisely the
>     > reason for starting the work on table AMs.
>     >
>     > We should certainly look into integrating some sort of columnar storage
>     > in mainline.  Not sure which of zedstore or VOPS is the best candidate,
>     > or maybe we'll have some other proposal.  My feeling is that having more
>     > than one is not useful; if there are optimizations to one that can be
>     > borrowed from the other, let's do that instead of duplicating effort.
>     >
>     There are two different aspects:
>     1. Storage format.
>     2. Vectorized execution.
>
>     1. VOPS uses a mixed format, similar to Apache Parquet.
>     Tuples are stored vertically, but only within one page.
>     It tries to minimize the trade-offs between true horizontal and true
>     vertical storage: the first is optimal for selecting all columns, while
>     the second is optimal for selecting a small subset of columns.
>     To make this approach more efficient, it is better to use a large page
>     size; the default Postgres 8k page is not big enough.
>
>     From my point of view, such a format is better than pure vertical storage,
>     which will be very inefficient if a query accesses a large number of columns.
>     This problem can be partly addressed by creating projections, i.e. grouping
>     several columns together, but that requires more space to store
>     multiple projections.
>
>   Right, storing all the columns in a single page doesn't give any savings on
>   IO.
>

Yeah, although you could save some I/O thanks to compression even in
that case.
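
To put rough numbers on both points, here is a minimal Python sketch that counts the 8 kB pages a full scan reads. It assumes fixed-width columns and ignores page headers and per-tuple overhead; the table, column widths and compression ratios are made up for illustration, and "PAX-style" here just means the per-page mixed layout described above for VOPS.

```python
import math

PAGE_SIZE = 8192  # default PostgreSQL block size (8 kB), in bytes


def pages_row_or_pax(nrows, widths, compression=1.0):
    """Pages read by a scan of a row (heap) or PAX-style table.

    Both layouts keep all columns of a row in the same page, so the scan
    reads every page no matter how many columns the query needs.
    `compression` is an assumed whole-table compression ratio.
    """
    table_bytes = nrows * sum(widths.values())
    return math.ceil(table_bytes / compression / PAGE_SIZE)


def pages_columnar(nrows, widths, needed, compression=None):
    """Pages read by a scan of a pure column store.

    Only the columns in `needed` are read; `compression` optionally maps a
    column name to an assumed per-column compression ratio.
    """
    compression = compression or {}
    total = 0
    for col in needed:
        col_bytes = nrows * widths[col] / compression.get(col, 1.0)
        total += math.ceil(col_bytes / PAGE_SIZE)
    return total


# Hypothetical table: 64-byte rows, so 128 rows per 8 kB page.
widths = {"id": 8, "price": 8, "qty": 4, "note": 44}
nrows = 1_280_000

print(pages_row_or_pax(nrows, widths))                           # 10000
print(pages_row_or_pax(nrows, widths, compression=2.0))          # 5000
print(pages_columnar(nrows, widths, ["price"]))                  # 1250
print(pages_columnar(nrows, widths, ["price"], {"price": 4.0}))  # 313
```

With these numbers a row or PAX-style layout reads all 10000 pages even when the query needs only `price`, and only compression brings that down, while the column store reads just the 1250 pages holding `price`.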

>     2. No matter which format we choose, to take full advantage of the
>     vertical representation we need to use vector operations,
>     and the Postgres executor doesn't support them now. This is why VOPS
>     uses some hacks, which are definitely not good and don't work in all
>     cases.
>     zedstore doesn't use such hacks and ... this is why it can never reach
>     VOPS performance.
>
>   Vectorized execution is orthogonal to storage format. It can even be
>   applied to a row store, with performance gains. Similarly, a column store
>   without vectorized execution still gives performance gains, better
>   compression ratios and similar benefits. I agree that a column store
>   combined with vectorized execution is a lot more performant. Zedstore
>   is currently focused on getting the AM piece in place, one which fits the
>   Postgres ecosystem and supports all the features heap does.

Not sure it's quite orthogonal. Sure, you can apply it to row stores too,
but I'd say column stores are naturally better suited for it.
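
A minimal Python sketch of the two execution styles, to make the distinction concrete. The table, the function names and BATCH_SIZE are hypothetical, and it only shows the data flow (per-tuple evaluation versus per-batch loops over column arrays), not PostgreSQL's executor API or real-world performance.

```python
from typing import Iterable, List, Sequence, Tuple

BATCH_SIZE = 1024  # hypothetical number of values per batch

Row = Tuple[int, int]  # (qty, price_cents)


def sum_tuple_at_a_time(rows: Iterable[Row], min_qty: int) -> int:
    """Tuple-at-a-time: filter and aggregate are evaluated once per row."""
    total = 0
    for qty, price in rows:
        if qty >= min_qty:
            total += price
    return total


def sum_vectorized(qty: Sequence[int], price: Sequence[int], min_qty: int) -> int:
    """Vectorized: each step runs over a whole batch of one column."""
    total = 0
    for start in range(0, len(qty), BATCH_SIZE):
        q = qty[start:start + BATCH_SIZE]
        p = price[start:start + BATCH_SIZE]
        # Filter step: scan the qty batch and build a selection vector.
        sel = [i for i, v in enumerate(q) if v >= min_qty]
        # Aggregate step: sum only the selected entries of the price batch.
        total += sum(p[i] for i in sel)
    return total


def deform_rows(rows: Sequence[Row]) -> Tuple[List[int], List[int]]:
    """A row store has to split tuples into column arrays before the
    vectorized code can run; a column store already keeps them apart."""
    return [r[0] for r in rows], [r[1] for r in rows]


rows = [(i % 10, i) for i in range(5000)]
qty, price = deform_rows(rows)
print(sum_tuple_at_a_time(rows, 5))     # 6255000
print(sum_vectorized(qty, price, 5))    # 6255000
```

Both paths compute the same result, so vectorization can be applied to a row store, but only after `deform_rows` copies each tuple's values into column arrays; a column store already keeps each column's values together, which is why it fits this model more naturally.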

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
