Re: Pluggable storage - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Pluggable storage
Msg-id CA+TgmoY3LXVUPQVdZW70XKp5PsXffO82pXXt=beegcV+=RsQgg@mail.gmail.com
In response to Pluggable storage  (Alvaro Herrera <alvherre@2ndQuadrant.com>)
On Fri, Aug 12, 2016 at 7:15 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Many have expressed their interest in this topic, but I haven't seen any
> design of how it should work.  Here's my attempt; I've been playing with
> this for some time now and I think what I propose here is a good initial
> plan.  This will allow us to write permanent table storage that works
> differently than heapam.c.  At this stage, I haven't thought through
> whether this is going to allow extensions to define new storage modules;
> I am focusing on AMs that can coexist with heapam in core.

Thanks for taking a stab at this.  I'd like to throw out a few concerns.

One, I'm worried that adding an additional layer of pointer-jumping is
going to slow things down and make Andres' work to speed up the
executor more difficult.  I don't know that there is a problem there,
and if there is a problem I don't know what to do about it, but I
think it's something we need to consider.  I am somewhat inclined to
believe that we need to restructure the executor in a bigger way so
that it passes around datums instead of tuples; the current
tuple-centric model is probably not optimal even for the existing
storage format.  It seems even less likely to be
right for a data format in which fetching columns is more expensive
than currently, such as a columnar store.

Two, I think that we really need to think very hard about how the
query planner will interact with new heap storage formats.  For
example, suppose cstore_fdw were rewritten as a new heap storage
format.   Because ORC contains internal indexing structures with
characteristics somewhat similar to BRIN, many scans can be executed
much more efficiently than for our current heap storage format.  If it
can be seen that an entire chunk will fail to match the quals, we can
skip the whole chunk.  Some operations may permit additional
optimizations: for example, given SELECT count(*) FROM thing WHERE
quals, we may be able to push the COUNT(*) down into the heap access
layer.  If it can be verified that EVERY tuple in a chunk will match
the quals, we can just increment the count by that number without
visiting each tuple individually.  This could be really fast.  These
kinds of query planner issues are generally why I have favored trying
to do something like this through the FDW interface, which already has
the right APIs for this kind of thing, even if we're not using them
all yet.  I don't say that's the only way to crack this problem, but I
think we're going to find that a heap storage API that doesn't include
adequate query planner integration is not a very exciting thing.

Three, with respect to this limitation:

> iii) All tuples need to be identifiable by ItemPointers.  Storages that
> have different requirements will need careful additional thought across
> the board.

I think it's a good idea for a first patch in this area to ignore (or
mostly ignore) this problem - e.g. maybe allow such storage formats
but refuse to create indexes on them.  But eventually I think we're
going to want/need to do something about it.  There are an awful lot
of interesting ideas that we can't pursue without addressing this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


