Re: Pluggable storage - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Pluggable storage
Date:
Msg-id: CA+TgmoY3LXVUPQVdZW70XKp5PsXffO82pXXt=beegcV+=RsQgg@mail.gmail.com
In response to: Pluggable storage (Alvaro Herrera <alvherre@2ndQuadrant.com>)
Responses: Re: Pluggable storage
List: pgsql-hackers
On Fri, Aug 12, 2016 at 7:15 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Many have expressed their interest in this topic, but I haven't seen any
> design of how it should work.  Here's my attempt; I've been playing with
> this for some time now and I think what I propose here is a good initial
> plan.  This will allow us to write permanent table storage that works
> differently than heapam.c.  At this stage, I haven't thought through
> whether this is going to allow extensions to define new storage modules;
> I am focusing on AMs that can coexist with heapam in core.

Thanks for taking a stab at this.  I'd like to throw out a few concerns.

One, I'm worried that adding an additional layer of pointer-jumping is going to slow things down and make Andres' work to speed up the executor more difficult.  I don't know that there is a problem there, and if there is a problem I don't know what to do about it, but I think it's something we need to consider.  I am somewhat inclined to believe that we need to restructure the executor in a bigger way so that it passes around datums instead of tuples; the current tuple-centric model is probably not optimal even for the existing storage format.  It seems even less likely to be right for a data format in which fetching columns is more expensive than currently, such as a columnar store.

Two, I think that we really need to think very hard about how the query planner will interact with new heap storage formats.  For example, suppose cstore_fdw were rewritten as a new heap storage format.  Because ORC contains internal indexing structures with characteristics somewhat similar to BRIN, many scans can be executed much more efficiently than for our current heap storage format.  If it can be seen that an entire chunk will fail to match the quals, we can skip the whole chunk.
Some operations may permit additional optimizations: for example, given SELECT count(*) FROM thing WHERE quals, we may be able to push the COUNT(*) down into the heap access layer.  If it can be verified that EVERY tuple in a chunk will match the quals, we can just increment the count by that number without visiting each tuple individually.  This could be really fast.

These kinds of query planner issues are generally why I have favored trying to do something like this through the FDW interface, which already has the right APIs for this kind of thing, even if we're not using them all yet.  I don't say that's the only way to crack this problem, but I think we're going to find that a heap storage API that doesn't include adequate query planner integration is not a very exciting thing.

Three, with respect to this limitation:

> iii) All tuples need to be identifiable by ItemPointers.  Storages that
> have different requirements will need careful additional thought across
> the board.

I think it's a good idea for a first patch in this area to ignore (or mostly ignore) this problem - e.g. maybe allow such storage formats but refuse to create indexes on them.  But eventually I think we're going to want/need to do something about it.  There are an awful lot of interesting ideas that we can't pursue without addressing this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company