Re: On columnar storage - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: On columnar storage
Date	June 15, 2015 10:03:00
Msg-id	CAA4eK1J2tAObqh6aq=JbG2WBKhVmDna-S5rAHxvZ=YhWvZG=YQ@mail.gmail.com
In response to	Re: On columnar storage (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses	Re: On columnar storage (CK Tan <cktan@vitessedata.com>)
List	pgsql-hackers

On Fri, Jun 12, 2015 at 10:58 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Amit Kapila wrote:
> > On Fri, Jun 12, 2015 at 4:33 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
> > wrote:
> > > One critical detail is what will be used to identify a heap row when
> > > talking to a CS implementation. There are two main possibilities:
> > >
> > > 1. use CTIDs
> > > 2. use some logical tuple identifier
> > >
> > > Using CTIDs is simpler. One disadvantage is that every UPDATE of a row
> > > needs to let the CS know about the new location of the tuple, so that
> > > the value is known associated with the new tuple location as well as the
> > > old. This needs to happen even if the value of the column itself is not
> > > changed.
> >
> > Isn't this somewhat similar to index segment?
>
> Not sure what you mean with "index segment".

The part that is similar to an index segment is the reference to the heap
for visibility information and the tuple id (TID). Have I misunderstood
something?
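
To illustrate the similarity I have in mind, here is a minimal sketch of an index-like lookup in which the column-store only holds values keyed by TID and the heap is consulted first for visibility. All names here (TID, HeapTuple, visible, fetch_column_value) are hypothetical, and the visibility rule is a deliberately simplified toy that ignores commit status; it is not how PostgreSQL decides visibility.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TID:
    """Hypothetical tuple identifier: (block number, offset in block)."""
    block: int
    offset: int


@dataclass
class HeapTuple:
    """Hypothetical heap entry that carries only transaction information."""
    xmin: int                    # id of the inserting transaction
    xmax: Optional[int] = None   # id of the deleting transaction, if any


def visible(tup: HeapTuple, snapshot_xid: int) -> bool:
    """Toy rule: inserted before the snapshot and not deleted before it."""
    if tup.xmin >= snapshot_xid:
        return False
    return tup.xmax is None or tup.xmax >= snapshot_xid


def fetch_column_value(heap, column_store, tid, snapshot_xid):
    """Return the column value for tid only if its heap row is visible."""
    tup = heap.get(tid)
    if tup is None or not visible(tup, snapshot_xid):
        return None
    # Like an index entry, the store holds no visibility info of its own.
    return column_store.get(tid)
```

As with an index, the store on its own cannot say whether a value is live; that answer always comes from the heap.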

> > Will the column store obey snapshot model similar to current heap tuples,
> > if so will it derive the transaction information from heap tuple?
>
> Yes, visibility will be tied to the heap tuple -- a value is accessed
> only when its corresponding heap row has already been determined to be
> visible.

Isn't it possible that all columns of a table belong to the column-store?
I think in such a case the heap will just be used to store transaction
information (visibility info) for a column-store tuple, and depending on
how the column-store is organized, a reference to this information needs
to be stored in the column-store (the same row reference might need to be
stored for each column value). Also, any write operation could lead to
much more I/O because of updates at two different locations (one in the
column-store and the other in the heap).
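
A rough sketch of the extra writes I am worried about, assuming the CTID-based scheme and a table whose columns all live in the column-store. ToyTable and its write counters are purely hypothetical and only count logical writes; they say nothing about real page-level I/O.

```python
class ToyTable:
    """Hypothetical table: the heap keeps only visibility info,
    and every column value lives in a per-column store keyed by TID."""

    def __init__(self, columns):
        self.heap = {}                            # tid -> {"xmin", "xmax"}
        self.stores = {c: {} for c in columns}    # column -> {tid: value}
        self.next_tid = 0
        self.writes = {"heap": 0, "column_store": 0}

    def insert(self, xid, row):
        """Insert a full row: one heap write plus one write per column."""
        tid = self.next_tid
        self.next_tid += 1
        self.heap[tid] = {"xmin": xid, "xmax": None}
        self.writes["heap"] += 1
        for col, value in row.items():
            self.stores[col][tid] = value
            self.writes["column_store"] += 1
        return tid

    def update(self, xid, old_tid, changes):
        """Update creates a new tuple version with a new TID, so every
        column store gets a new entry even for unchanged columns."""
        self.heap[old_tid]["xmax"] = xid
        self.writes["heap"] += 1
        new_row = {
            col: changes.get(col, store[old_tid])
            for col, store in self.stores.items()
        }
        return self.insert(xid, new_row)
```

In this toy model, updating one column of a three-column row costs two heap writes and three column-store writes, which is the kind of overhead I mean.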

> One interesting point that raises from this is about vacuum:
> when are we able to remove a value from the store?

Yes, that could also be quite tricky to handle. Maybe one naive way could
be to make a list of all TIDs from the heap that need to be expired and
then search for references to all those TIDs in the column-store.
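
The naive approach could be sketched as follows. The function name, the dict-based heap and stores, and the "deleted before the oldest active transaction" test are all assumptions made for illustration; real vacuum has to consider much more than this.

```python
def naive_vacuum(heap, column_stores, oldest_active_xid):
    """Collect dead TIDs from the heap, then purge them from every store.

    heap:          {tid: {"xmin": int, "xmax": int or None}}
    column_stores: {column_name: {tid: value}}
    Returns the number of column-store entries removed.
    """
    # Pass 1: list every TID whose deleter is older than any active xact.
    dead = {
        tid
        for tid, info in heap.items()
        if info["xmax"] is not None and info["xmax"] < oldest_active_xid
    }

    # Pass 2: search each column store for references to those TIDs.
    removed = 0
    for store in column_stores.values():
        for tid in [t for t in dead if t in store]:
            del store[tid]
            removed += 1

    # Only after the stores are clean can the heap entries go away.
    for tid in dead:
        del heap[tid]
    return removed
```

Note that the second pass touches every column-store of the table, so the cost grows with the number of columns stored that way.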

I understand your point about re-using the existing transaction
infrastructure for the column-store by keeping that information in the
heap as is done now, but I think that won't be free either.

Another point to consider here is whether the column-store needs
transactional consistency. Are other commercial/open-source column-store
implementations transactionally consistent? If yes, then can't we think
of doing it in a way where data could be present both in the heap and in
the column-store? (I understand that it could lead to duplicate data;
OTOH, such an implementation anyway eliminates the need for indexes, so
it may be worth considering.)

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
