Re: On columnar storage (2) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: On columnar storage (2)
Msg-id CA+Tgmoa4dbLf6eKA0KTHCoTn5iBbK0TizjUsMABW11USrnrc0w@mail.gmail.com
In response to Re: On columnar storage (2)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Mon, Dec 28, 2015 at 2:15 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> 1. CS API.
>> I agree with you that the FDW API does not seem sufficient to efficiently
>> support work with CS.
>> At least we need batch insert.
>> But maybe it is better to extend the FDW API rather than creating a special
>> API for CS?
>
> The patch we have proposed thus far does not mess with executor
> structure too much, so probably it would be possible to add some things
> here and there to the FDW API and it might work.  But in the long term I
> think the columnar storage project is more ambitious; for instance, I'm
> sure we will want to be able to vectorise certain operations, and the
> FDW API will become a bottleneck, so to speak.  I'm thinking of
> vectorisation in two different ways: one is that some operations such as
> computing aggregates over large data sets can work a lot faster if you
> feed the value of one column for multiple tuples at a time in columnar
> format; that way you can execute the operation directly in the CPU
> (this requires specific support from the aggregate functions.)
> For this to work, the executor needs to be rejigged so that multiple
> values (tuples) can be passed at once.
>
> The other aspect of vectorisation is that one input tuple might have
> been split across several data origins, so that one half of the tuple is
> in columnar format and the other half is in row format; that lets you do
> very fast updates on the row-formatted part, while allowing fast reads
> for the columnar format, for instance.  (It's well known that
> column-oriented storage does not go well with updates; some implementations
> even disallow updates and deletes altogether.)  Currently within the executor
> a tuple is a TupleTableSlot which contains one Datum array, which has
> all the values coming out of the HeapTuple; but for split storage
> tuples, we will need to have a TupleTableSlot that has multiple "Datum
> arrays" (in a way --- because, actually, once we get to vectorise as in
> the preceding paragraph, we no longer have a Datum array, but some more
> complex representation).
>
> I think that if we try to make the FDW API address all these concerns,
> while at the same time *also* serving the needs of external data
> sources, insanity will ensue.

I think the opposite.  Suppose we add vectorization support (or
whatever other feature, could be asynchronous execution or
faster-than-light travel or whatever) to the executor.  Well, are we
going to say that FDWs can't get access to that feature?  I think that
would be an extremely surprising decision.  Presumably, if we add cool
capabilities to the executor, we want FDWs to be able to get access to
those new capabilities just as built-in tables can.  So, we'll
probably think about what new FDW methods - optional methods, probably
- would be needed to expose the new capabilities and add them.
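
To make the "optional methods" idea concrete, here is a minimal sketch in
plain C.  It is not PostgreSQL code: ScanRoutine, ScanState, next_value,
next_batch and sum_scan are all hypothetical names invented for this
illustration.  The only thing it borrows from the FDW API is the general
pattern of a struct of function pointers in which an optional callback may
be left NULL, so the caller can fall back to the one-at-a-time path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scan state: one column of data plus a cursor.
 * None of these names exist in PostgreSQL. */
typedef struct ScanState {
    const int64_t *values;  /* the underlying column */
    size_t         nrows;   /* number of values available */
    size_t         pos;     /* index of the next value to return */
} ScanState;

/* Hypothetical callback table with one required and one optional method. */
typedef struct ScanRoutine {
    /* Required: fetch one value; returns 1 on success, 0 at end of scan. */
    int    (*next_value)(ScanState *st, int64_t *out);
    /* Optional (may be NULL): fetch up to max values; returns the count,
     * 0 at end of scan. */
    size_t (*next_batch)(ScanState *st, int64_t *out, size_t max);
} ScanRoutine;

static int row_next_value(ScanState *st, int64_t *out)
{
    if (st->pos >= st->nrows)
        return 0;
    *out = st->values[st->pos++];
    return 1;
}

static size_t col_next_batch(ScanState *st, int64_t *out, size_t max)
{
    size_t n = st->nrows - st->pos;

    if (n > max)
        n = max;
    for (size_t i = 0; i < n; i++)
        out[i] = st->values[st->pos + i];
    st->pos += n;
    return n;
}

/* A provider that only implements the required method. */
const ScanRoutine row_only_routine = { row_next_value, NULL };
/* A provider that also implements the optional batch method. */
const ScanRoutine batch_routine = { row_next_value, col_next_batch };

/* "Executor" side: use the batch method when the provider supplies it,
 * otherwise fall back to fetching one value per call. */
int64_t sum_scan(const ScanRoutine *r, ScanState *st)
{
    int64_t total = 0;

    if (r->next_batch != NULL) {
        int64_t buf[64];
        size_t  n;

        while ((n = r->next_batch(st, buf, 64)) > 0)
            for (size_t i = 0; i < n; i++)
                total += buf[i];
    } else {
        int64_t v;

        while (r->next_value(st, &v))
            total += v;
    }
    return total;
}
```

Both routines produce the same sum over the same data; a provider that
leaves next_batch as NULL keeps working, it just doesn't get the batched
path.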

Now, there may still be some reason why it doesn't make sense to have
the columnar store stuff go through the FDW API.  It's sorta doing
something different.  If you tilt your head right, a table with a
columnar store smells a lot like two tables that will frequently need
to be joined; and if we were to implement it that way, then one of
those tables would just be a table, and the other one would be a
"foreign table" that actually has backing storage.

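As a rough illustration of the "two tables that will frequently need to be
joined" model, here is a small sketch in plain C.  Everything in it
(RowPart, ColumnPart, FullTuple, join_parts, the id/status/amount columns)
is hypothetical and chosen only for the example; it assumes the two halves
share a row identifier and uses a naive nested-loop join to rebuild the
full tuple.

```c
#include <stddef.h>
#include <stdint.h>

/* Row-format half: the part that is "just a table". */
typedef struct RowPart {
    int32_t id;       /* shared row identifier (an assumption) */
    int32_t status;   /* a frequently updated column */
} RowPart;

/* Columnar half: one array per column, all of length n. */
typedef struct ColumnPart {
    const int32_t *id;      /* shared row identifier */
    const double  *amount;  /* a read-mostly column */
    size_t         n;
} ColumnPart;

/* The complete tuple as the rest of the system would expect to see it. */
typedef struct FullTuple {
    int32_t id;
    int32_t status;
    double  amount;
} FullTuple;

/* Naive nested-loop join on id.  Writes one FullTuple per row that has a
 * match in the columnar half and returns the number written.  The caller
 * must provide room for nrows entries in out. */
size_t join_parts(const RowPart *rows, size_t nrows,
                  const ColumnPart *cols, FullTuple *out)
{
    size_t nout = 0;

    for (size_t r = 0; r < nrows; r++) {
        for (size_t c = 0; c < cols->n; c++) {
            if (cols->id[c] == rows[r].id) {
                out[nout].id = rows[r].id;
                out[nout].status = rows[r].status;
                out[nout].amount = cols->amount[c];
                nout++;
                break;
            }
        }
    }
    return nout;
}
```

In this model an update to status touches only the RowPart array, while a
scan that needs only amount can read the columnar array directly; a query
that wants every column pays for the join.
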
If we don't do it that way, then I'm curious what my mental model for
this feature should be.  We don't have any concept currently of an
"incomplete tuple" that includes only a subset of the columns.  Some
of the columns can be TOAST pointers that have to be expanded before
use, but they can't be left out altogether...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company