Thread: On columnar storage (2)

On columnar storage (2)

From
Alvaro Herrera
Date:
As discussed in
https://www.postgresql.org/message-id/20150611230316.GM133018@postgresql.org
we've been working on implementing columnar storage for Postgres.
Here's some initial code to show our general idea, and to gather
comments related to what we're building.  This is not a complete patch,
and we don't claim that it works!  This is in very early stages, and we
have a lot of work to do to get this in working shape.

This was proposed during the Developer's Unconference in Ottawa earlier
this year.  While some questions were raised about some elements of our
design, we don't think they were outright objections, so we have pressed
forward on the expectation that any limitations can be fixed before this
is final if they are critical, or in subsequent commits if not.

The commit messages for each patch should explain what we've done in
enough technical detail, and hopefully provide a high-level overview of
what we're developing.

The first few pieces are "ready for comment" -- feel free to speak up
about the catalog additions, the new COLUMN STORE bits we added to the
grammar, the way we handle column stores in the relcache, or the
mechanics to create column store catalog entries.

The latter half of the patch series is much less well cooked yet; for
example, the colstore_dummy module is just a simple experiment to let us
verify that the API is working.  The planner and executor code is
mostly stubs, and we are not yet sure which executor nodes we would
like to have: while we have discussed this topic internally a lot, we
haven't yet formed final opinions, and of course the stub
implementations are not doing the proper things; in many cases they are
not doing anything at all.

Still, we believe this shows the general spirit of things, which is that
we would like these new objects to be first-class citizens in the
Postgres architecture:

a) so that the optimizer will be able to extract as much benefit as
possible from columnar storage: it won't be at arm's length through an
opaque interface, but rather directly wired into plans, eventually with
its own Path representation.

b) so that it is possible to implement things such as tables that live
completely in columnar storage, as mentioned by Tom regarding
Salesforce's extant columnar storage.


Please don't think that the commits attached below represent development
history.  We played with the early pieces for quite a while before
settling on what you see here.  The presented split is intended to ease
reading.  We continue to play with the planner and executor code,
getting ourselves familiar with it enough that we can write something
that actually works.

This patch is a joint effort of Tomáš Vondra and myself, with
contributions from Simon Riggs.  There's a lot of code attributed to me
in the commit messages that was actually authored by Tomáš.  (Git
decided to lay the blame on me because I split the commits.)



  The research leading to these results has received funding from the
  European Union’s Seventh Framework Programme (FP7/2007-2013) under grant
  agreement n° 318633.

--
Álvaro Herrera                          Developer, http://www.PostgreSQL.org/

Attachment

Re: On columnar storage (2)

From
Haribabu Kommi
Date:
On Tue, Sep 1, 2015 at 8:53 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> As discussed in
> https://www.postgresql.org/message-id/20150611230316.GM133018@postgresql.org
> we've been working on implementing columnar storage for Postgres.
> Here's some initial code to show our general idea, and to gather
> comments related to what we're building.  This is not a complete patch,
> and we don't claim that it works!  This is in very early stages, and we
> have a lot of work to do to get this in working shape.
>
> This was proposed during the Developer's Unconference in Ottawa earlier
> this year.  While some questions were raised about some elements of our
> design, we don't think they were outright objections, so we have pressed
> forward on the expectation that any limitations can be fixed before this
> is final if they are critical, or in subsequent commits if not.
>
> The commit messages for each patch should explain what we've done in
> enough technical detail, and hopefully provide a high-level overview of
> what we're developing.
>
> The first few pieces are "ready for comment" -- feel free to speak up
> about the catalog additions, the new COLUMN STORE bits we added to the
> grammar, the way we handle column stores in the relcache, or the
> mechanics to create column store catalog entries.
>
> The later half of the patch series is much less well cooked yet; for
> example, the colstore_dummy module is just a simple experiment to let us
> verify that the API is working.  The planner and executor code are
> mostly stubs, and we are not yet sure of what are the executor nodes
> that we would like to have: while we have discussed this topic
> internally a lot, we haven't yet formed final opinions, and of course
> the stub implementations are not doing the proper things, and in many
> cases they are even not doing anything at all.


Fujitsu is also interested in implementing a columnar storage extension.
At first we thought of implementing this extension using index access
methods.  The following is the basic design idea of the columnar
extension; it may need to be redesigned to fit columnar access methods.

Create a vertical columnar index (VCI) on a table, listing the columns
that should be stored in columnar format.  To provide performance
benefits for both read and write operations, the data is stored in two
formats: 1) write-optimized storage (WOS) and 2) read-optimized storage
(ROS).  This is useful for users whose newly added data has a high
chance of being modified.

Because of the two storage formats, we need to maintain two kinds of
entries in the pg_class table: one for the WOS, and others for the
columns in columnar storage.

Insert:

The write-optimized storage holds the data of all columns that are part
of the VCI in row-wise format.  All newly added data is stored in the
WOS relation, along with xmin/xmax information.  If the user wants to
update or delete the newly added data, it doesn't affect performance
much, compared to deleting the data from columnar storage.

Tuples which no longer have multiple copies, or whose data is frozen,
are moved from WOS to ROS periodically by a background worker process or
the autovacuum process.  Each column's data is stored separately in its
own relation file.  No transaction information is present in the ROS;
the data in the ROS is addressed by tuple ID.

In this approach, the column data is present in both the heap and the
columnar storage, whereas with columnar access methods the column data
would not be present in the heap.

Select:

Because of the two storage formats, during a select operation the data
in the WOS is converted into a local ROS for the statement being
executed.  The conversion cost depends on the number of tuples present
in the WOS file, so this may add some performance overhead to select
statements.

Delete:

During a delete operation, whenever data is deleted from the heap, the
corresponding data in the WOS file is marked as deleted, similar to the
heap.  But if the data has already migrated from WOS to ROS, then we
maintain a delete vector storing the tuple ID, transaction information,
and so on.  When data is read from the ROS file, it is checked against
the delete vector to confirm whether the record is visible.  All the
delete vector data is applied to the ROS periodically.
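The insert/migrate/delete/scan lifecycle described above can be sketched
in miniature. All names here are hypothetical illustrations of the design,
not the actual implementation (which would be C code inside the server):

```python
# Toy sketch of the WOS/ROS design: new rows land in a row-wise WOS,
# a background step transposes them into per-column ROS arrays, and
# deletes of migrated rows go into a delete vector checked at scan time.

class ColumnarIndex:
    def __init__(self, columns):
        self.columns = columns
        self.wos = []                         # write-optimized: (tid, values) rows
        self.ros = {c: [] for c in columns}   # read-optimized: one array per column
        self.ros_tids = []                    # tuple IDs, parallel to ROS arrays
        self.delete_vector = set()            # tids deleted after migration to ROS

    def insert(self, tid, values):
        # New data always lands in the WOS, in row format.
        self.wos.append((tid, values))

    def migrate(self):
        # Background worker: transpose settled WOS rows into ROS columns.
        for tid, values in self.wos:
            self.ros_tids.append(tid)
            for col, val in zip(self.columns, values):
                self.ros[col].append(val)
        self.wos.clear()

    def delete(self, tid):
        # Still in WOS: remove directly; already in ROS: record in delete vector.
        before = len(self.wos)
        self.wos = [(t, v) for t, v in self.wos if t != tid]
        if len(self.wos) == before:
            self.delete_vector.add(tid)

    def scan(self, col):
        # A select sees ROS values not in the delete vector, plus a
        # "local ROS" built on the fly from whatever is still in the WOS.
        i = self.columns.index(col)
        for tid, val in zip(self.ros_tids, self.ros[col]):
            if tid not in self.delete_vector:
                yield val
        for tid, values in self.wos:
            yield values[i]
```

The point of the sketch is the asymmetry: writes are cheap appends to the
row-wise WOS, while reads pay a small conversion cost proportional to the
WOS size, exactly the trade-off described above.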

The concept of this columnar extension comes from Fujitsu Labs, Japan.
Any comments on how to further evaluate this approach with respect to
columnar access methods?

Regards,
Hari Babu
Fujitsu Australia



Re: On columnar storage (2)

From
Jeff Janes
Date:
Could we get this rebased past the merge of the parallel execution commits?

Thanks,

Jeff



Re: On columnar storage (2)

From
Michael Paquier
Date:
On Wed, Dec 9, 2015 at 3:10 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Could we get this rebased past the merge of the parallel execution commits?

+1. Alvaro, Tomas, Simon, what are the next plans with those patches?
-- 
Michael



Re: On columnar storage (2)

From
Alvaro Herrera
Date:
Michael Paquier wrote:
> On Wed, Dec 9, 2015 at 3:10 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > Could we get this rebased past the merge of the parallel execution commits?
> 
> +1. Alvaro, Tomas, Simon, what are the next plans with those patches?

Yeah, I've been working intermittently on getting the whole tree rebased
and squashed, because after the last submission we made a lot of
progress.  I'll repost later.  I think it should be marked "returned
with feedback" for now.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: On columnar storage (2)

From
Michael Paquier
Date:
On Tue, Dec 22, 2015 at 11:43 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Michael Paquier wrote:
>> On Wed, Dec 9, 2015 at 3:10 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> > Could we get this rebased past the merge of the parallel execution commits?
>>
>> +1. Alvaro, Tomas, Simon, what are the next plans with those patches?
>
> Yeah, I've been working intermittently on getting the whole tree rebased
> and squashed, because after the last submission we made a lot of
> progress.  I'll repost later.  I think it should be marked "returned
> with feedback" for now.

Ok. Noted. Thanks.
-- 
Michael



Re: On columnar storage (2)

From
Konstantin Knizhnik
Date:
Hi Alvaro,

You may know that I have implemented IMCS (in-memory columnar store)
as a PostgreSQL extension.
It was not so successful, mostly because people prefer to use standard
SQL rather than special functions for accessing the columnar storage
(CS).  Now I am thinking about a second reincarnation of IMCS, based on
FDW and CSP (custom nodes).  This is why I am very interested in your
patch.  I have investigated the previous version of the patch and have
some questions.
I would be pleased if you could clarify them for me:

1. CS API.
I agree with you that the FDW API seems not to be enough to efficiently
support work with a CS.
At least we need batch insert.
But maybe it is better to extend the FDW API rather than create a
special API for the CS?

2. Horizontal <-> vertical data mapping.  As far as I understand this
patch, the CS model assumes that some table columns are stored in
horizontal format (in the heap), and some in vertical format (in the
CS), with a one-to-one mapping between the horizontal and vertical parts
of a row using the CTID.  But having been involved in several projects
requiring OLAP, I found that in most cases a one-to-many mapping is more
convenient.  Assume a trading system dealing with stock quotes.
The data looks something like this:

Symbol  Day         Open  Close  High  Low   Volume
AAA     12/22/2015  10.0  12.0   13.0   8.0     100
AAB     12/22/2015   9.0   8.0   10.0   9.0     200
...
AAA     12/23/2015  12.0  11.5   12.5  11.0      50
AAB     12/23/2015   8.0   8.8    8.5   8.0     300

So it can be represented using the following table:

    create table Quote (Symbol char(10), Day date, Open real, High real,
                        Low real, Close real, Volume integer);

Most likely we need to calculate some statistics for a particular symbol
or set of symbols.
For example, a portfolio is a set of symbols, and we need to somehow
analyze the instruments in this portfolio.

There are several thousand symbols, tens of instruments in a portfolio,
and tens of thousands of quotes per symbol (in other cases the
timeseries are much larger: millions of elements).
How can we efficiently execute a query like:

    select Symbol, sum(Close*Volume)/sum(Volume) as VWAP
    from Quote
    where Day between '01/01/2001' and '01/01/2010'
      and Symbol in ('AAA', 'AAB', 'ABB', ...)
    group by Symbol;

If we have an index on Symbol, it will contain a lot of duplicates.
And it is not clear how to efficiently combine an index scan by symbol
name with a time slice.
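In columnar form, the VWAP query above only needs to touch the Symbol,
Close, and Volume columns. A rough Python sketch (illustrative only;
the date predicate is omitted for brevity, and the column data is
invented from the sample table above):

```python
# VWAP over column arrays: scan three column vectors in lockstep
# instead of reading whole rows.
from collections import defaultdict

# Hypothetical column vectors, as a columnar store would hold them.
symbol = ['AAA', 'AAB', 'AAA', 'AAB']
close  = [12.0,   8.0,  11.5,   8.8]
volume = [100,    200,    50,   300]
wanted = {'AAA', 'AAB'}            # the Symbol in (...) predicate

pv = defaultdict(float)            # per-symbol sum(Close * Volume)
v  = defaultdict(float)            # per-symbol sum(Volume)
for s, c, vol in zip(symbol, close, volume):
    if s in wanted:
        pv[s] += c * vol
        v[s]  += vol

vwap = {s: pv[s] / v[s] for s in pv}
```

Note that the whole aggregation never materializes a row; this is the
kind of access pattern the one-to-one CTID mapping makes awkward when
the predicate also involves a time slice.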

One possible solution is to embed timeseries into tuples.
In this case we would have something like this:

    create table Quote (Symbol char(10), Day timeseries(date),
                        Open timeseries(real), High timeseries(real),
                        Low timeseries(real), Close timeseries(real),
                        Volume timeseries(integer));

We are using here a nonexistent type, timeseries.  It is similar to an
array, but its content is stored in the columnar storage rather than in
the record's TOAST.
In this case we can efficiently locate records by symbol (there are only
a few thousand entries in the table) and then perform CS operations on
the located timeseries.

So here we also split the tuple into horizontal and vertical parts.  In
the horizontal part we store just an identifier of the timeseries.
The query plan should combine standard nodes with custom CS nodes.
Mixing horizontal and vertical operations significantly complicates the
optimizer and restricts flexibility: with the proposed representation it
is difficult to efficiently calculate some characteristic for all
symbols in a specified time range.  This is why I am not sure that this
is the only possible or most efficient approach.  But in any case there
should be an efficient plan for queries like the one above.

3. Transposing data and the role of the CS.
Let's look once again at the Quote example above.  Data is received in
time-ascending order, but most queries require grouping it by symbol.
So at some stage we have to "transpose" the data.  To efficiently append
data to a timeseries we need to buffer it somewhere and then append a
range of values.  In the Fujitsu approach, two different representations
of the data are used: reader-optimized and writer-optimized.  In the
IMCS approach, the CS is just a temporary projection of normal
PostgreSQL tables, so we do not need to worry about durability; it is
enforced by PostgreSQL.
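The buffering step described above (entirely hypothetical names, just to
make the transpose concrete) might look like this:

```python
# Sketch of the "transpose" step: rows arrive in time order, are
# buffered per symbol, and each flush appends a contiguous range of
# values to that symbol's per-column timeseries.
from collections import defaultdict

class TimeseriesStore:
    def __init__(self, flush_threshold=2):
        self.buffer = defaultdict(list)     # symbol -> pending row dicts
        self.series = defaultdict(lambda: defaultdict(list))  # symbol -> col -> values
        self.flush_threshold = flush_threshold

    def append(self, symbol, row):
        # Buffer writes so the columnar side sees batched range appends,
        # not one tiny append per incoming row.
        self.buffer[symbol].append(row)
        if len(self.buffer[symbol]) >= self.flush_threshold:
            self.flush(symbol)

    def flush(self, symbol):
        # Transpose the buffered rows into per-column arrays.
        for row in self.buffer[symbol]:
            for col, val in row.items():
                self.series[symbol][col].append(val)
        self.buffer[symbol].clear()
```

The flush threshold stands in for whatever policy a real writer-optimized
store would use to decide when data has "settled" enough to move.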

So the question is whether the CS should be the only storage for the
data, or just a (possibly transient) copy of a normal table?

Best regards,
Konstantin

On 22.12.2015 17:43, Alvaro Herrera wrote:
> Michael Paquier wrote:
>> On Wed, Dec 9, 2015 at 3:10 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> Could we get this rebased past the merge of the parallel execution commits?
>> +1. Alvaro, Tomas, Simon, what are the next plans with those patches?
> Yeah, I've been working intermittently on getting the whole tree rebased
> and squashed, because after the last submission we made a lot of
> progress.  I'll repost later.  I think it should be marked "returned
> with feedback" for now.
>




Re: On columnar storage (2)

From
Alvaro Herrera
Date:
Konstantin Knizhnik wrote:

Hi,

> May be you know, that I have implemented IMCS (in-memory-columnar-store) as
> PostgreSQL extension.
> It was not so successful, mostly because people prefer to use standard SQL
> rather than using some special functions for accessing columnar storage
> (CS). Now I am thinking about second reincarnation of IMCS, based on FDW and
> CSP (custom nodes). This is why I am very interested in your patch.

Great to hear.

> I have investigated previous version of the patch and have some
> questions.  I will be pleased if you can clarify them to me:
> 
> 1. CS API.
> I agree with you that FDW API seems to be not enough to efficiently support
> work with CS.
> At least we need batch insert.
> But may be it is better to extend FDW API rather than creating special API
> for CS?

The patch we have proposed thus far does not mess with the executor
structure too much, so probably it would be possible to add some things
here and there to the FDW API and it might work.  But in the long term I
think the columnar storage project is more ambitious; for instance, I'm
sure we will want to be able to vectorise certain operations, and the
FDW API would become a bottleneck, so to speak.  I'm thinking of
vectorisation in two different ways: one is that some operations, such
as computing aggregates over large data sets, can work a lot faster if
you feed the values of one column for multiple tuples at a time in
columnar format; that way you can execute the operation directly on the
CPU (this requires specific support from the aggregate functions).
For this to work, the executor needs to be rejigged so that multiple
values (tuples) can be passed at once.

The other aspect of vectorisation is that one input tuple might have
been split across several data origins, so that one half of the tuple is
in columnar format and the other half is in row format; that lets you do
very fast updates on the row-formatted part, while allowing fast reads
from the columnar part, for instance.  (It's well known that
column-oriented storage does not go well with updates; some
implementations even disallow updates and deletes altogether.)
Currently, within the executor, a tuple is a TupleTableSlot which
contains one Datum array holding all the values coming out of the
HeapTuple; but for split-storage tuples, we will need a TupleTableSlot
that has multiple "Datum arrays" (in a way; because, actually, once we
get to vectorise as in the preceding paragraph, we no longer have a
Datum array, but some more complex representation).

I think that if we try to make the FDW API address all these concerns,
while at the same time *also* serving the needs of external data
sources, insanity will ensue.
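The first kind of vectorisation can be illustrated with a toy aggregate
that receives a whole column batch per call instead of one value per
tuple. This is a hypothetical interface for illustration, not the actual
executor API:

```python
# Contrast between tuple-at-a-time aggregation (the transition function
# is invoked once per row) and batch/vectorised aggregation (it is
# invoked once per batch of one column's values).

def sum_tuple_at_a_time(rows, col_index):
    # Row-store style: one function call's worth of overhead per tuple,
    # plus a field extraction from each row.
    total = 0.0
    for row in rows:
        total += row[col_index]
    return total

def sum_vectorised(column_batches):
    # Columnar style: each call consumes a contiguous array of values
    # from a single column, so the inner loop is tight and cache-friendly
    # (and, in C, amenable to SIMD).
    total = 0.0
    for batch in column_batches:
        total += sum(batch)
    return total
```

Both produce the same answer; the difference is purely in how many times
the executor crosses the function-call boundary per value, which is where
a per-tuple FDW-style interface becomes the bottleneck.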

> 2. Horizontal<->Vertical data mapping. As far as I understand this patch,
> the model of CS assumes that some table columns are stored in horizontal
> format (in heap), some - in vertical format (in CS).  And there is
> one-to-one mapping between horizontal and vertical parts of row using CTID.

Yes, that part needs to go away.  We will deal with this eventually; the
patch I posted was just some very basic infrastructure.  In the future
we would like to have real support for not having to translate between
column-oriented and row-oriented formats, at least for some operations.
(I expect that we will leave most code as it currently is and require
translation, while other parts that have been optimized are able to skip
the translation step.  As things mature, we make more things understand
the new format without translation.)  This is also dependent on being
able to vectorise the executor.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: On columnar storage (2)

From
Alvaro Herrera
Date:
Konstantin Knizhnik wrote:

> 3. Transpose of data and role of CS.
> Let's look once again on Quote example above. Data is received in time
> ascending order. But most queries require grouping it by symbol.  So at some
> stage we have to "transpose"  data. To efficiently append data to timeseries
> we need to buffer it somewhere and then use append range of values. In
> Fujitsu approach two different representations of data are used: reader and
> writer optimized. In IMCS approach, CS is just temporary projection of
> normal PostgreSQL tables. So we do not need to worry about durability - it
> is enforced by PostgreSQL.
> 
> So the question is whether CS should be only storage for the data or just
> copy (may be transient) of normal table?

Our original plan was that a CS was the primary storage of data, not a
duplicate.  However, after some discussion it became apparent that there
are several use cases that are better served by allowing redundant
storage, i.e. having CSs that are just a reader-optimized copy of data
that exists elsewhere.  While I'm not a fan of that approach, I think it
would be good to leave the door open for a future implementation of it.
However, I think it'll bring interesting challenges on the optimizer
side, so I'm not promising to work on it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: On columnar storage (2)

From
Jim Nasby
Date:
On 12/28/15 1:15 PM, Alvaro Herrera wrote:
> Currently within the executor
> a tuple is a TupleTableSlot which contains one Datum array, which has
> all the values coming out of the HeapTuple; but for split storage
> tuples, we will need to have a TupleTableSlot that has multiple "Datum
> arrays" (in a way --- because, actually, once we get to vectorise as in
> the preceding paragraph, we no longer have a Datum array, but some more
> complex representation).
>
> I think that trying to make the FDW API address all these concerns,
> while at the same time*also*  serving the needs of external data
> sources, insanity will ensue.

Are you familiar with DataFrames in Pandas[1]? They're a collection of 
Series[2], which are essentially vectors. (Technically, they're more 
complex than that because you can assign arbitrary indexes). So instead 
of the normal collection of rows, a DataFrame is a collection of 
columns. Series are also sparse (like our tuples), but the sparse value 
can be anything, not just NULL (or NaN in pandas-speak). There are also 
DataFrames in R; not sure how equivalent they are.

I mention this because there's a lot being done with dataframes and they 
might be a good basis for a columnstore API, killing 2 birds with one stone.

BTW, the underlying Python type for Series is the ndarray[3], which is 
specifically designed to interface with things like C arrays. So a column 
store could potentially be accessed directly.

Aside from potential API inspiration, it might be useful to prototype a 
columnstore using Series (or maybe ndarrays).
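The core idea here — a table as a collection of column vectors rather than
a collection of rows — can be sketched even without pandas. This is a
hypothetical, pandas-inspired structure just to show the shape of such an
API, not a proposal for the actual implementation:

```python
# Minimal DataFrame-like column store: the table is a dict of equal-length
# column vectors; whole-column access is free, row access is a gather.

class ColumnFrame:
    def __init__(self, **columns):
        lengths = {len(vals) for vals in columns.values()}
        assert len(lengths) == 1, "all columns must have the same length"
        self.columns = {name: list(vals) for name, vals in columns.items()}

    def __getitem__(self, name):
        # Whole-column access, analogous to selecting a Series.
        return self.columns[name]

    def row(self, i):
        # Reconstructing one row means touching every column vector --
        # the columnar analogue of the translation step a row executor needs.
        return {name: vals[i] for name, vals in self.columns.items()}
```

The asymmetry between `__getitem__` and `row` is exactly why column stores
favor scans and aggregates over point lookups and updates.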

[1] 
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
[2] http://pandas.pydata.org/pandas-docs/stable/api.html#series
[3] http://docs.scipy.org/doc/numpy-1.10.0/reference/internals.html
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: On columnar storage (2)

From
Alvaro Herrera
Date:
Jeff Janes wrote:
> Could we get this rebased past the merge of the parallel execution commits?

Here you go.

Actually, this is not just a rebase, but rather a heavily revamped
version of the previous patch.  This is now functional to some degree (I
bet you could break it with complex queries or perhaps even with simple
table inheritance -- but all TPC-H queries work, and many of them are
faster than with the original code), using the design that was proposed
previously: column stores are considered separate relations and added to
the plan tree, with a suitable join condition to their main table.
There's a new executor node called ColumnStoreScan which has special
glue code to call a specific column store implementation, previously
created with the provided CREATE COLUMN STORE ACCESS METHOD command.
We provide a sample access method, called "vertical" (for vertical
partitioning), which is the simplest we could make, just to have
something to test.  It's not actually column-oriented.

There's a lot of optimizer trickery to make this thing work (most of it
by David Rowley).  We have a first step that mutates the join tree to
add the nodes we need; at that point we also mutate the Var nodes that
point to columns in the store, so that they point to the column store
instead of to the relation.  David also added code to prune colstore
relations that are "unused"; this is trickier than it sounds, because
the join code somewhere adds all Vars for the relations in the range
table.

Back on the executor side, there's some code in ModifyTable and COPY so
that they put data into the column store, using the access method
routines.

Another thing we needed was to implement "physical attributes", which is
a cut-down version of the logical column mapping patch that Tomas and I
spent so long trying to get to work.  This version was implemented from
scratch by David; it's more limited in scope compared to the previous
version but it's enough to get colstores working.

I have a version of this patch that's split in smaller commits, easier
to read.  I can share that if anyone's interested.


Now, I don't actually intend any of this for application.  It's more to
start some discussion on where we want to go next.  Simon, David, Tomas
and I have discussed this at length and we have various ideas on where
to go from here.  I (and/or somebody else) will post about this later.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: On columnar storage (2)

From
Robert Haas
Date:
On Mon, Dec 28, 2015 at 2:15 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> 1. CS API.
>> I agree with you that FDW API seems to be not enough to efficiently support
>> work with CS.
>> At least we need batch insert.
>> But may be it is better to extend FDW API rather than creating special API
>> for CS?
>
> The patch we have proposed thus far does not mess with executor
> structure too much, so probably it would be possible to add some things
> here and there to the FDW API and it might work.  But in the long term I
> think the columnar storage project is more ambitious; for instance, I'm
> sure we will want to be able to vectorise certain operations, and the
> FDW API will become a bottleneck, so to speak.  I'm thinking in
> vectorisation in two different ways: one is that some operations such as
> computing aggregates over large data sets can work a lot faster if you
> feed the value of one column for multiple tuples at a time in columnar
> format; that way you can execute the operation directly in the CPU
> (this requires specific support from the aggregate functions.)
> For this to work, the executor needs to be rejigged so that multiple
> values (tuples) can be passed at once.
>
> The other aspect of vectorisation is that one input tuple might have
> been split in several data origins, so that one half of the tuple is in
> columnar format and another format is in row format; that lets you do
> very fast updates on the row-formatted part, while allowing fast reads
> for the columnar format, for instance.  (It's well known that columnar
> oriented storage does not go well with updates; some implementation even
> disallow updates and deletes altogether.)  Currently within the executor
> a tuple is a TupleTableSlot which contains one Datum array, which has
> all the values coming out of the HeapTuple; but for split storage
> tuples, we will need to have a TupleTableSlot that has multiple "Datum
> arrays" (in a way --- because, actually, once we get to vectorise as in
> the preceding paragraph, we no longer have a Datum array, but some more
> complex representation).
>
> I think that trying to make the FDW API address all these concerns,
> while at the same time *also* serving the needs of external data
> sources, insanity will ensue.

I think the opposite.  Suppose we add vectorization support (or
whatever other feature, could be asynchronous execution or
faster-than-light travel or whatever) to the executor.  Well, are we
going to say that FDWs can't get access to that feature?  I think that
would be an extremely surprising decision.  Presumably, if we add cool
capabilities to the executor, we want FDWs to be able to get access to
those new capabilities just as built-in tables can.  So, we'll
probably think about what new FDW methods - optional methods, probably
- would be needed to expose the new capabilities and add them.

Now, there may still be some reason why it doesn't make sense to have
the columnar store stuff go through the FDW API.  It's sorta doing
something different.  If you tilt your head right, a table with a
columnar store smells a lot like two tables that will frequently need
to be joined; and if we were to implement it that way, then one of
those tables would just be a table, and the other one would be a
"foreign table" that actually has backing storage.

If we don't do it that way, then I'm curious what my mental model for
this feature should be.  We don't have any concept currently of an
"incomplete tuple" that includes only a subset of the columns.  Some
of the columns can be TOAST pointers that have to be expanded before
use, but they can't be left out altogether...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: On columnar storage (2)

From
Alvaro Herrera
Date:
So we discussed some of this stuff during the developer meeting in
Brussels and the main conclusion is that we're going to split this up in
multiple independently useful pieces, and write up the general roadmap
in the wiki so that we can discuss in detail on-list.

I'm marking this as Returned with Feedback now.

Thanks everybody,

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: On columnar storage (2)

From
Haribabu Kommi
Date:
On Mon, Feb 1, 2016 at 12:11 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> So we discussed some of this stuff during the developer meeting in
> Brussels and the main conclusion is that we're going to split this up in
> multiple independently useful pieces, and write up the general roadmap
> in the wiki so that we can discuss in detail on-list.
>
> I'm marking this as Returned with Feedback now.
>
> Thanks everybody,

Here I have attached the DBT-3 performance report measured on the
prototype patch written for columnar storage, as I mentioned in my
earlier mail, with the WOS and ROS design.

Currently, to measure the benefits of this design, we made the following
changes:
1. Created the columnar storage index similar to other index methods
2. Used a custom plan to generate plans that can use the columnar storage
3. Optimized parallelism to use the columnar storage

The code is not fully ready yet; I posted the performance results to get
a view from the community on whether this approach is really beneficial.

I will provide the full details of the design and WIP patches later.

Regards,
Hari Babu
Fujitsu Australia

Attachment

Re: On columnar storage (2)

From
Bert
Date:
Hello Haribabu,

Thank you for the performance test. But please note that the patch has been 'thrown away' and will be totally rewritten. I have no idea of the status of the second / third attempt, however.
What is interesting, though, is that for some queries this patch is already on par with VCI. Which db is that exactly?

Alvaro,
You wrote that a wiki page would be opened regarding this, but I still cannot find such a page (except for an old page which hasn't changed in the last year). Is there already something we can look at?

Bert

On Thu, Mar 3, 2016 at 6:07 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Mon, Feb 1, 2016 at 12:11 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> So we discussed some of this stuff during the developer meeting in
> Brussels and the main conclusion is that we're going to split this up in
> multiple independently useful pieces, and write up the general roadmap
> in the wiki so that we can discuss in detail on-list.
>
> I'm marking this as Returned with Feedback now.
>
> Thanks everybody,

Here I attached the DBT-3 performance report that is measured on the
prototype patch
that is written for columnar storage as I mentioned in my earlier mail
with WOS and ROS
design.

Currently to measure the benefits of this design, we did the following changes,
1. Created the columnar storage index similar like other index methods
2. Used custom plan to generate the plan that can use the columnar storage
3. Optimized parallelism to use the columnar storage

The code is not fully ready yet, I posted the performance results to
get a view from
community, whether this approach is really beneficial?

I will provide the full details of the design and WIP patches later.

Regards,
Hari Babu
Fujitsu Australia


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




--
Bert Desmet
0477/305361

Re: On columnar storage (2)

From
Haribabu Kommi
Date:
On Thu, Mar 3, 2016 at 7:46 PM, Bert <biertie@gmail.com> wrote:
>
> Thank you for the performance test. But please not that the patch is 'thrown
> away', and will be totally rewritten. I have no idea of the status of the
> second / third attempt however.
> However, what is interesting is that for some queries this patch is already
> on par with VCI. Which db is that exactly?

The performance report was taken on the WIP columnar storage patch for
the PostgreSQL database.  Only the storage part of the code is finished.
To test the performance, we used a custom plan to generate plans that
can use the columnar storage.  This is how we ran the performance test.

I want to integrate this patch with the syntax proposed by Alvaro for
columnar storage and share it with the community.  Before that, I want
to share the current storage design with the community for review, by
preparing a README file.  I will try to send this soon.


Regards,
Hari Babu
Fujitsu Australia



Re: On columnar storage (2)

From
Alvaro Herrera
Date:
Bert wrote:

> Alvaro,
> You wrote that a wiki page would be opened regarding this. But I still
> cannot find such a page (expect for an old page which hasn't changed in the
> last year). Is there already something we can look at?

Yeah, I haven't done that yet.  I will post here as soon as I get that
done.  Happy to share another beer to discuss, next time I'm over there.
I'm also going to have code to share for you to test by then!

What's the other page you mention?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: On columnar storage (2)

From
Alvaro Herrera
Date:
Haribabu Kommi wrote:

> The performance report is taken on the patch that is WIP columnar storage
> on PostgreSQL database. Only the storage part of the code is finished.
> To test the performance, we used custom plan to generate the plans
> where it can use the columnar storage. This way we ran the performance
> test.

Quickly eyeballing your results, I think they are similar to ours: there
are some performance gains, but nothing spectacular.  That's why I want
to take another, more invasive approach that buys us more.  The wiki
page I'm going to write will describe our rough plan for that.  Your
input on that will be appreciated.

> I want to integrate this patch with syntax proposed by Alvaro for columnar
> storage and share it with community, before that i want to share the current
> storage design with the community for review by preparing some readme
> file. I will try to send this soon.

Please do, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: On columnar storage (2)

From
Bert
Date:
and yes, I'll be at the next fosdem / pgconf.eu for sure. :-)

Bert

On Thu, Mar 3, 2016 at 3:40 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Bert wrote:

> Alvaro,
> You wrote that a wiki page would be opened regarding this. But I still
> cannot find such a page (expect for an old page which hasn't changed in the
> last year). Is there already something we can look at?

Yeah, I haven't done that yet.  I will post here as soon as I get that
done.  Happy to share another beer to discuss, next time I'm over there.
I'm also going to have code to share for you to test by then!

What's the other page you mention?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



--
Bert Desmet
0477/305361