Home > mailing lists

Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension) - Mailing list pgsql-hackers

From	Haribabu Kommi
Subject	Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)
Date	January 17, 2017 10:09:57
Msg-id	CAJrrPGefW1VomwWUS9BaLoXyHEAu9dLcW5s3JNNp5FxA0R5cFw@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] [WIP]Vertical Clustered Index (columnar storeextension) (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses	Re: [HACKERS] [WIP]Vertical Clustered Index (columnar storeextension) (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List	pgsql-hackers

Tree view

On Sun, Jan 8, 2017 at 2:01 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 12/29/16 9:55 PM, Haribabu Kommi wrote:
The tuples which don't have multiple copies or frozen data will be moved
from WOS to ROS periodically by the background worker process or autovauum
process. Every column data is stored separately in it's relation file. There
is no transaction information is present in ROS. The data in ROS can be
referred with tuple ID.

Would updates be handled via the delete mechanism you described then?

Updates are handled similar like delete operations, but there are some extra

index insert operations occurs in this index even when the update is of HOT

type, because of TID-CRID mapping.

In this approach, the column data is present in both heap and columnar
storage.

ISTM one of the biggest reasons to prefer a column store over heap is to ditch the 24 byte overhead, so I'm not sure how much of a win this is.

Yes, that' correct. Currently with this approach, it is not possible to ditch the

heap completely. This approach is useful for the cases, where the user wants

to store only some columns as part of clustered index.

Another complication is that one of the big advantages of a CSTORE is allowing analysis to be done efficiently on a column-by-column (as opposed to row-by-row) basis. Does your patch by chance provide that?

Not the base patch that I shared. But the further patches provides the data access

column-by-column basis using the custom plan methods.

Generally speaking, I do think the idea of adding support for this as an "index" is a really good starting point, since that part of the system is pluggable. It might be better to target getting only what needs to be in core into core to begin with, allowing the other code to remain an extension for now. I think there's a lot of things that will be discovered as we start moving into column stores, and it'd be very unfortunate to accidentally paint the core code into a corner somewhere.

Yes, it is possible to add only the code that is required in the core and keep the other part

as extension. Without providing the complete clustered index approach, I doubt whether

the necessary hooks and it's code gets accepted to the core.

As a side note, it's possible to get a lot of the benefits of a column store by using arrays. I've done some experiments with that and got an 80-90% space reduction, and most queries saw improved performance as well (there were a few cases that weren't better). The biggest advantage to this approach is people could start using it today, on any recent version of Postgres.

Interesting experiment.

That would be a great way to gain knowledge on what users would want to see in a column store, something else I suspect we need. It would also be far less code than what you or Alvaro are proposing. When it comes to large changes that don't have crystal-clear requirements, I think that's really important.

The main use case of this patch is to support mixed load environments,

where both OLTP and OLAP queries are possible. The advantage of

proposed patch design is, providing good performance to OLAP queries

without affecting OLTP.

Regards,

Hari Babu

Fujitsu Australia

pgsql-hackers by date:

From: Fujii Masao
Date: 17 January 2017, 09:48:19
Subject: Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

From: Amit Kapila
Date: 17 January 2017, 10:37:56
Subject: Re: [HACKERS] Cache Hash Index meta page.

Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension) - Mailing list pgsql-hackers

Previous

Next