Re: On columnar storage - Mailing list pgsql-hackers

From Qingqing Zhou
Subject Re: On columnar storage
Date
Msg-id CAJjS0u2Lh9ix9Ff7_gigXJEfC1+yPkoOdAbyzMFs+P3PQNiY+Q@mail.gmail.com
In response to On columnar storage  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Thu, Jun 11, 2015 at 4:03 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> I've been trying to figure out a plan to enable native column stores
> (CS or "colstore") for Postgres.  Motivations:
>
> * avoid the 32 TB limit for tables
> * avoid the 1600 column limit for tables
> * increased performance
>
And a better compression ratio.

> We're not interested in perpetuating the idea that a CS needs to go
> through the FDW mechanism.
>
Agreed. It is cleaner to add a ColumnScan node that scans a columnar
table, and possibly a ColumnIndexScan node for seeks on an indexed
columnar table.
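
To make the idea concrete, here is a toy sketch in Python of what a
dedicated column scan buys us: it reads only the columns a query
references. This is not PostgreSQL code. ColumnTable, column_scan and
the sample column names are made up for illustration, and a real node
would of course live in the executor in C.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Sequence, Tuple


class ColumnTable:
    """Toy columnar table: one Python list per column, all the same length."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")
        self.columns = columns
        self.nrows = lengths.pop() if lengths else 0


def column_scan(
    table: ColumnTable,
    targets: Sequence[str],
    qual_col: Optional[str] = None,
    qual: Optional[Callable[[Any], bool]] = None,
) -> Iterator[Tuple[Any, ...]]:
    """Yield the target columns of every row that passes the optional qual.

    Only the target columns and the qual column are ever touched; all
    other columns of the table are never read.
    """
    needed = [table.columns[name] for name in targets]
    filter_col = table.columns[qual_col] if qual_col is not None else None
    for pos in range(table.nrows):
        if filter_col is not None and qual is not None and not qual(filter_col[pos]):
            continue
        yield tuple(col[pos] for col in needed)


# Hypothetical sample data: "note" is never read by the scan below.
orders = ColumnTable({
    "id": [1, 2, 3],
    "price": [10, 25, 40],
    "note": ["a", "b", "c"],
})
expensive_ids = list(column_scan(orders, ["id"], "price", lambda p: p > 20))
```

A ColumnIndexScan would replace the loop over every position with a
lookup that returns only the matching positions.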

> Since we want to have pluggable implementations, we need to have a
> registry of store implementations.
>
If we do a real native implementation, where the columnar store sits
on par with the heap, it gives us arbitrary flexibility to control
performance and transactions, without worrying about compatibility
with the interface (the one you define below).

> One critical detail is what will be used to identify a heap row when
> talking to a CS implementation.  There are two main possibilities:
>
> 1. use CTIDs
> 2. use some logical tuple identifier
>
I like the concept of a half-row, half-columnar table: the row part is
good for select * and updates, and the columnar part serves other
purposes. Popular columnar-only tables use position alignment, which is
virtual (no storage), to associate the values of each column. CTIDs are
still needed, but not for this purpose. An alternative is:
1.  Allow column groups, where several columns are physically stored
together;
2.  Handle updates through a separate row store table associated with
each columnar table.
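
A minimal sketch of that alternative, in Python rather than backend C,
just to show the moving parts. Everything here (HybridTable, the group
layout, the delta dictionary) is a made-up illustration under my own
assumptions, not a proposed on-disk format. Values in different column
groups are tied together purely by their position, and updated rows go
to a separate row store that readers check first.

```python
from typing import Any, Dict, List, Sequence, Tuple


class HybridTable:
    """Toy model: column groups aligned by position, plus a row-store delta."""

    def __init__(self, groups: Sequence[Sequence[str]]) -> None:
        # Each group keeps its columns physically together as small tuples.
        self.groups: List[Tuple[str, ...]] = [tuple(g) for g in groups]
        self.group_data: List[List[Tuple[Any, ...]]] = [[] for _ in self.groups]
        # Separate row store: position -> full, current version of the row.
        self.delta: Dict[int, Dict[str, Any]] = {}
        self.nrows = 0

    def append(self, row: Dict[str, Any]) -> int:
        """Split a row across the column groups; return its position."""
        for group, data in zip(self.groups, self.group_data):
            data.append(tuple(row[col] for col in group))
        self.nrows += 1
        return self.nrows - 1

    def _check(self, pos: int) -> None:
        if not 0 <= pos < self.nrows:
            raise IndexError(f"no row at position {pos}")

    def _base_row(self, pos: int) -> Dict[str, Any]:
        """Reassemble a row from all groups using position alignment."""
        row: Dict[str, Any] = {}
        for group, data in zip(self.groups, self.group_data):
            row.update(zip(group, data[pos]))
        return row

    def update(self, pos: int, changes: Dict[str, Any]) -> None:
        """Record an update in the row store; columnar data is untouched."""
        self._check(pos)
        row = self.delta.get(pos) or self._base_row(pos)
        row.update(changes)
        self.delta[pos] = row

    def get(self, pos: int, column: str) -> Any:
        """Read one value, preferring the row store over the columnar data."""
        self._check(pos)
        if pos in self.delta:
            return self.delta[pos][column]
        for group, data in zip(self.groups, self.group_data):
            if column in group:
                return data[pos][group.index(column)]
        raise KeyError(column)


# Hypothetical layout: (id, price) stored together, note stored on its own.
t = HybridTable([["id", "price"], ["note"]])
first = t.append({"id": 1, "price": 10, "note": "a"})
second = t.append({"id": 2, "price": 25, "note": "b"})
t.update(second, {"price": 30})
```

Note that no per-value row identifier is stored in the groups. A real
implementation would also need to merge the row store back into the
columnar part from time to time, which this sketch leaves out.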

> Query Processing
> ----------------
>
If we treat columnar storage as a first-class citizen like the heap,
we can model it after the heap, which enables much more natural
changes in the parser, rewriter, planner and executor.

Regards,
Qingqing


