Re: Pluggable storage - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: Pluggable storage |
Date | |
Msg-id | 20160817170344.GA901677@alvherre.pgsql |
In response to | Re: Pluggable storage (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>) |
Responses | Re: Pluggable storage |
List | pgsql-hackers |
Anastasia Lubennikova wrote:
> 13.08.2016 02:15, Alvaro Herrera:
> >To support this, we introduce StorageTuple and StorageScanDesc.
> >StorageTuples represent a physical tuple coming from some storage AM.
> >It is necessary to have a pointer to a StorageAmRoutine in order to
> >manipulate the tuple. For heapam.c, a StorageTuple is just a HeapTuple.
>
> The StorageTuple concept looks really cool. I've got some questions on
> implementation details.
>
> Do StorageTuples have fields common to all implementations?
> Or is StorageTuple a totally abstract structure that has nothing to do
> with the data, except pointing to it?
>
> I mean, we already have the HeapTupleData structure, which is a pretty
> good candidate to replace with StorageTuple.

I was planning to replace all uses of HeapTuple in the executor with StorageTuple, actually. But the main reason I would like to avoid HeapTupleData itself is that it assumes there is a single palloc chunk that contains the tuple (t_len and t_data). This might not be true in representations that split the tuple, for example in columnar storage, where one column of a tuple lives in page A and another column of the same tuple lives in page B. I suppose there might be some point in keeping t_tableOid and t_self, though.

> And maybe add a "t_handler" field that points to the handler functions.
> I'm not sure whether it should be the name of the StorageAm, its OID, or
> maybe the main function itself. Although, if I'm not mistaken, we always
> have RelationData when we want to operate on the tuple, so having
> t_handler in the StorageTuple is excessive.

Yeah, I think the RelationData (or more precisely the StorageAmRoutine) is always going to be available, so I don't think we need a pointer in the tuple itself.

> This approach minimizes code changes and ensures that we
> won't miss any function that handles tuples.
>
> Do you see any weak points in the suggestion?
> What design do you use in your prototype?
It's currently a "void *" pointer in my prototype.

> >RelationData gains ->rd_stamroutine which is a pointer to the
> >StorageAmRoutine for the relation in question. Similarly,
> >TupleTableSlot is augmented with a link to the StorageAmRoutine to
> >handle the StorageTuple it contains (probably in most cases it's set at
> >the same time as the tupdesc). This implies that routines such as
> >ExecAssignScanType need to pass down the StorageAmRoutine from the
> >relation to the slot.
>
> If we already have this pointer in t_handler as described below,
> we don't need to pass it between functions and slots.

I think it's better to have it in slots, so you can install multiple tuples in the slot without having to change the routine pointers each time.

> >The executor is modified so that instead of calling heap_insert etc
> >directly, it uses rel->rd_stamroutine to call these methods. The
> >executor is still in charge of dealing with indexes, constraints, and
> >any other thing that's not the tuple storage itself (this is one major
> >point in which this differs from FDWs). This all looks simple enough,
> >with one exception and a few notes:
>
> That is exactly what I tried to describe in my proposal, in the
> chapter "Relation management". I'm sure you've already noticed
> that it will require a huge source code cleanup. I've carefully read
> the sources and found "violators" of abstraction in src/backend/commands.
> The list is attached to the wiki page
> https://wiki.postgresql.org/wiki/HeapamRefactoring.
>
> Besides these, there are some pretty strange and unrelated functions in
> src/backend/catalog.
> I'm willing to fix them, but I'd like to synchronize our efforts.

I very much would like to stay away from touching src/backend/catalog, which contains the functions that deal with system catalogs. We can simply say that system catalogs are hardcoded to use heapam.c storage for now.
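Below is a minimal C sketch of the dispatch discussed above. It is not the prototype's code, which the thread does not show. Only the names StorageTuple, StorageAmRoutine and rd_stamroutine come from the thread. ToyRelation, ToySlot, ToyStore, the tuple_insert callback, the tts_stamroutine field and toy_heap_insert (a stand-in for heap_insert) are hypothetical simplifications. They are not PostgreSQL's real RelationData or TupleTableSlot. The sketch treats StorageTuple as an opaque void *. Executor inserts go through rel->rd_stamroutine, while catalog code keeps calling the heap implementation directly.

```c
#include <assert.h>

/* Opaque tuple handle, as in the prototype: only the owning storage AM
 * knows what the pointer refers to. */
typedef void *StorageTuple;

/* Hypothetical, heavily reduced method table. A real StorageAmRoutine
 * would also need scan, update, delete and slot callbacks. */
typedef struct StorageAmRoutine
{
    const char *amname;
    int         (*tuple_insert) (void *storage, StorageTuple tuple);
} StorageAmRoutine;

/* Toy stand-in for RelationData: the AM pointer plus AM-private storage. */
typedef struct ToyRelation
{
    const StorageAmRoutine *rd_stamroutine;
    void       *storage;
} ToyRelation;

/* Toy stand-in for TupleTableSlot: the slot carries its own AM pointer, so
 * successive tuples can be stored in it without re-resolving the routine. */
typedef struct ToySlot
{
    const StorageAmRoutine *tts_stamroutine;   /* hypothetical field */
    StorageTuple tts_tuple;
} ToySlot;

/* Toy "heap" storage: it only counts the tuples it has accepted. */
typedef struct ToyStore
{
    int         ntuples;
} ToyStore;

/* Heap AM insert callback; returns the new tuple count. A real AM would
 * copy the tuple's contents into its pages. */
static int
toy_heap_insert(void *storage, StorageTuple tuple)
{
    ToyStore   *store = storage;

    (void) tuple;
    return ++store->ntuples;
}

static const StorageAmRoutine toy_heapam = {"heap", toy_heap_insert};

/* Executor path: no heap-specific call; everything goes through the
 * relation's method table. */
static int
exec_insert(ToyRelation *rel, ToySlot *slot)
{
    /* The slot's tuple must belong to the same AM as the target relation. */
    assert(slot->tts_stamroutine == rel->rd_stamroutine);
    return rel->rd_stamroutine->tuple_insert(rel->storage, slot->tts_tuple);
}

/* Catalog path: hardcoded to the heap implementation, as proposed for
 * src/backend/catalog in the first pass. */
static int
catalog_insert(ToyRelation *catrel, StorageTuple tuple)
{
    return toy_heap_insert(catrel->storage, tuple);
}
```

To move a user table to a different AM in this model, you would only point rd_stamroutine and the slot's pointer at another method table, and exec_insert stays unchanged. catalog_insert is the piece that would have to be rewritten if a particular catalog ever needed a non-heap storage.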
If we later see a need to enable some particular catalog using a different storage implementation, we can change the code for that specific catalog in src/backend/catalog and everywhere else, to use the abstract API instead of hardcoding heap_insert etc. But that can be left for a second pass. (This is my point "iv" further below, to which you said "+1".)

> Nothing to do, just substitute t_data with proper HeapTupleHeader
> representation. I think it's a job for StorageAm. Let's say each StorageAm
> must have a stam_to_heaptuple() function and the opposite function,
> stam_from_heaptuple().

Hmm, yeah, that also works. We'd have to check again whether it's more convenient to start from a slot rather than a StorageTuple. AFAICS the trigger.c code all starts from a slot, so it makes sense to have the conversion use the slot code -- that way, there's no need for each storage AM to re-implement conversion to HeapTuple.

> >note f) More widespread, MinimalTuples currently use a tweaked HeapTuple
> >format. In the long run, it may be possible to replace them with a
> >separate storage module that's specifically designed to handle tuples
> >meant for tuplestores etc. That may simplify TupleTableSlot and
> >execTuples. For the moment we keep the tts_mintuple as it is. Whenever
> >a tuple is not already in heap format, we heapify it in order to put it
> >in the store.
> I wonder, do we really need MinimalTuples to support all formats?

Sure. I wouldn't want to say "you can create a table in columnar storage format, but if you do, these tables cannot use hash join".

> >ii) execTuples has additional accessors for tuples-in-slot, such as
> >ExecFetchSlotTuple and friends. I expect some of them to return
> >abstract StorageTuples, others HeapTuple or MinimalTuples (possibly
> >wrapped in Datum), depending on callers. We might be able to cut down
> >on these later; my first cut will try to avoid API changes to keep
> >fallout to a minimum.
>
> I'd suggest replacing all occurrences of HeapTuple with StorageTuple.
> Do you see any problems with it?

The HeapTuple-in-datum representation, as I recall, is used in the SQL function manager; maybe in other places too. Maybe there's a way to fix that layer so that it uses StorageTuple instead, but I prefer not to touch it in the first phase. We can fix it later. This is already a big enough patch ...

> >iii) All tuples need to be identifiable by ItemPointers. Storages that
> >have different requirements will need careful additional thought across
> >the board.
>
> For a start, we can simply deny secondary indexes for these storages,
> or require a function that converts the tuple identifier used inside the
> storage to an ItemPointer suitable for an index.

Umm. I don't think rejecting secondary indexes would work very well. I think we can lift this limitation later; we just need to change the IndexTuple abstraction so that it doesn't rely on ItemPointer as it currently does.

> >v) Currently, one Buffer may be associated with one HeapTuple living in a
> >slot; when the slot is cleared, the buffer pin is released. My current
> >patch moves the buffer pin to inside the heapam-based storage AM and the
> >buffer is released by the ->slot_clear_tuple method. The rationale for
> >doing this is that some storage AMs might want to keep several buffers
> >pinned at once, for example, and must not release those pins
> >individually but in batches as the scan moves forwards (say a batch of
> >tuples in a columnar storage AM has column values spread across many
> >buffers; they must all be kept pinned until the scan has moved past the
> >whole set of tuples). But I'm not really sure that this is a great
> >design.
>
> Frankly, I doubt that it's realistic to implement columnar storage just as
> a variant of pluggable storage. It requires a lot of changes in the
> executor, the optimizer and so on, which are hardly compatible with the
> existing tuple-oriented model.
> However, I'm not very familiar with this area, so if you
> feel that it's possible, go ahead.

Well, not *just* as a variant of pluggable storage. This thread is just one sub-project inside the greater project to enable column-oriented storage; that includes further changes to the executor, too, but I haven't discussed those in this proposal. I mentioned all this at the Brussels developer meeting earlier this year. (There I mostly talked about vertical partitioning, which is a different subproject that I've put aside for the moment, but really it's all part of the same thing.)

https://wiki.postgresql.org/wiki/Future_of_storage

Thanks for reading!

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
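Point v above leaves the buffer-pin design open. The toy C model below sketches the two behaviours it contrasts. It uses a fake pin-count array in place of PostgreSQL's buffer manager. Every type and function in it (ToyHeapSlot, ColumnBatch, toy_pin, toy_unpin, column_batch_start, column_batch_clear and so on) is hypothetical. A heap-style slot drops its single pin when its tuple is cleared. A columnar-style batch keeps all of its buffers pinned until the scan has consumed the last tuple of the batch.

```c
#include <assert.h>

#define TOY_NBUFFERS   16
#define TOY_MAX_PINS    4

/* Fake shared-buffer pin counts; stands in for the real buffer manager. */
static int toy_pin_count[TOY_NBUFFERS];

static void
toy_pin(int buf)
{
    toy_pin_count[buf]++;
}

static void
toy_unpin(int buf)
{
    assert(toy_pin_count[buf] > 0);
    toy_pin_count[buf]--;
}

/* Heap style: one tuple, one buffer; clearing the slot drops the pin. */
typedef struct ToyHeapSlot
{
    int         cur_buf;        /* -1 when the slot is empty */
} ToyHeapSlot;

static void
toy_heap_store(ToyHeapSlot *slot, int buf)
{
    toy_pin(buf);
    slot->cur_buf = buf;
}

static void
toy_heap_clear(ToyHeapSlot *slot)
{
    if (slot->cur_buf >= 0)
    {
        toy_unpin(slot->cur_buf);
        slot->cur_buf = -1;
    }
}

/* Columnar style: a batch of tuples whose column values are spread over
 * several buffers, all of which stay pinned until the batch is done. */
typedef struct ColumnBatch
{
    int         pinned[TOY_MAX_PINS];
    int         npinned;
    int         ntuples;        /* tuples in this batch */
    int         nconsumed;      /* tuples already cleared from the slot */
} ColumnBatch;

static void
column_batch_start(ColumnBatch *batch, const int *bufs, int nbufs, int ntuples)
{
    assert(nbufs <= TOY_MAX_PINS);
    for (int i = 0; i < nbufs; i++)
    {
        toy_pin(bufs[i]);
        batch->pinned[i] = bufs[i];
    }
    batch->npinned = nbufs;
    batch->ntuples = ntuples;
    batch->nconsumed = 0;
}

/* Clearing one tuple releases nothing by itself; the whole set of pins is
 * released once the scan has moved past the last tuple of the batch. */
static void
column_batch_clear(ColumnBatch *batch)
{
    batch->nconsumed++;
    if (batch->nconsumed == batch->ntuples)
    {
        for (int i = 0; i < batch->npinned; i++)
            toy_unpin(batch->pinned[i]);
        batch->npinned = 0;
    }
}
```

Here the batched release is triggered from the per-tuple clear routine. That mirrors where the thread says the heap AM releases its pin (->slot_clear_tuple). Whether that is the right place for batched release is exactly the open question in point v.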