Re: Pluggable storage - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Pluggable storage
Msg-id 20160817170344.GA901677@alvherre.pgsql
In response to Re: Pluggable storage  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: Pluggable storage  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Anastasia Lubennikova wrote:
> 13.08.2016 02:15, Alvaro Herrera:

> >To support this, we introduce StorageTuple and StorageScanDesc.
> >StorageTuples represent a physical tuple coming from some storage AM.
> >It is necessary to have a pointer to a StorageAmRoutine in order to
> >manipulate the tuple.  For heapam.c, a StorageTuple is just a HeapTuple.
> 
> StorageTuples concept looks really cool. I've got some questions on
> details of implementation.
> 
> Do StorageTuples have fields common to all implementations?
> Or StorageTuple is totally abstract structure that has nothing to do
> with data, except pointing to it?
> 
> I mean, now we already have HeapTupleData structure, which is a pretty
> good candidate to replace with StorageTuple.

I was planning to replace all uses of HeapTuple in the executor with
StorageTuple, actually.  But the main reason I would like to avoid
HeapTupleData itself is that it contains an assumption that there is a
single palloc chunk that contains the tuple (t_len and t_data).  This
might not be true in representations that split the tuple, for example
in columnar storage where you have one column in page A and another
column in page B, for the same tuple.  I suppose there might be some
point to keeping t_tableOid and t_self, though.
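
To make the single-chunk concern concrete, here is a minimal sketch.  All
Demo* names are made up for illustration and are not PostgreSQL's real
definitions; the heap-style struct only mirrors the general shape of
HeapTupleData (t_len, t_self, t_tableOid, t_data), and the columnar
layout is purely an assumption about what a split representation could
look like.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for an ItemPointer: block number plus offset. */
typedef struct DemoItemPointer
{
    uint32_t block;
    uint16_t offset;
} DemoItemPointer;

/* Heap-style tuple: t_len and t_data describe ONE contiguous chunk. */
typedef struct DemoHeapTuple
{
    uint32_t        t_len;
    DemoItemPointer t_self;
    uint32_t        t_tableOid;
    const char     *t_data;
} DemoHeapTuple;

/* Hypothetical columnar piece: one column value living on its own page. */
typedef struct DemoColumnRef
{
    uint32_t    page;
    const char *value;
    uint32_t    len;
} DemoColumnRef;

#define DEMO_MAX_COLS 8

/*
 * Hypothetical columnar tuple: t_self and t_tableOid still make sense,
 * but there is no single (t_len, t_data) pair to point at.
 */
typedef struct DemoColumnarTuple
{
    DemoItemPointer t_self;
    uint32_t        t_tableOid;
    int             ncols;
    DemoColumnRef   cols[DEMO_MAX_COLS];
} DemoColumnarTuple;

/* The total payload size has to be summed over the scattered columns. */
static uint32_t
demo_columnar_len(const DemoColumnarTuple *tup)
{
    uint32_t total = 0;

    assert(tup->ncols >= 0 && tup->ncols <= DEMO_MAX_COLS);
    for (int i = 0; i < tup->ncols; i++)
        total += tup->cols[i].len;
    return total;
}
```

Code that reads t_data directly would have nothing to read in the second
struct, which is why I'd rather not bake HeapTupleData into the abstract
type.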

> And maybe add a "t_handler" field that points to the handler functions.
> I'm not sure if it will be the name of the StorageAm, or its OID, or maybe the
> main function itself. Although, if I'm not mistaken, we always have
> RelationData when we want to operate on the tuple, so having t_handler
> in the StorageTuple is excessive.

Yeah, I think the RelationData (or more precisely the StorageAmRoutine)
is always going to be available, so I don't think we need a pointer in
the tuple itself.

> This approach allows us to minimize code changes and ensure that we
> won't miss any function that handles tuples.
> 
> Do you see any weak points of the suggestion?
> What design do you use in your prototype?

It's currently a "void *" pointer in my prototype.
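
Roughly, and only as a sketch under assumptions: the tuple is opaque and
the routine is reached through the relation.  The tuple_natts callback,
DemoRelation and the toy heap AM below are hypothetical names invented
for this example; only StorageTuple, StorageAmRoutine and rd_stamroutine
come from the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque tuple: only the owning storage AM knows the layout. */
typedef void *StorageTuple;

/* Hypothetical routine table with a single illustrative callback. */
typedef struct StorageAmRoutine
{
    const char *name;
    int       (*tuple_natts) (StorageTuple tuple);
} StorageAmRoutine;

/* Simplified stand-in for RelationData. */
typedef struct DemoRelation
{
    const StorageAmRoutine *rd_stamroutine;
} DemoRelation;

/* Toy "heap" AM whose tuples just record an attribute count. */
typedef struct DemoHeapTuple
{
    int natts;
} DemoHeapTuple;

static int
demo_heap_natts(StorageTuple tuple)
{
    return ((DemoHeapTuple *) tuple)->natts;
}

static const StorageAmRoutine demo_heapam = {"heap", demo_heap_natts};

/* Callers never look inside the tuple; they go through the relation. */
static int
demo_tuple_natts(const DemoRelation *rel, StorageTuple tuple)
{
    assert(rel->rd_stamroutine != NULL);
    return rel->rd_stamroutine->tuple_natts(tuple);
}
```

Since the relation already says which AM owns the tuple, the tuple does
not need to carry a handler pointer of its own.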

> >RelationData gains ->rd_stamroutine which is a pointer to the
> >StorageAmRoutine for the relation in question.  Similarly,
> >TupleTableSlot is augmented with a link to the StorageAmRoutine to
> >handle the StorageTuple it contains (probably in most cases it's set at
> >the same time as the tupdesc).  This implies that routines such as
> >ExecAssignScanType need to pass down the StorageAmRoutine from the
> >relation to the slot.
> 
> If we already have this pointer in t_handler as described below,
> we don't need to pass it between functions and slots.

I think it's better to have it in slots, so you can install multiple
tuples in the slot without having to change the routine pointers each
time.
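
A small sketch of what I mean, with made-up names throughout (DemoSlot,
tts_stamroutine, tuple_getattr and the toy AM are assumptions for
illustration, not the prototype's API): the routine is set once when the
slot is set up, and storing a new tuple only swaps the tuple pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef void *StorageTuple;

/* Hypothetical routine table with one accessor callback. */
typedef struct StorageAmRoutine
{
    int (*tuple_getattr) (StorageTuple tuple, int attnum);
} StorageAmRoutine;

/* Simplified stand-in for TupleTableSlot. */
typedef struct DemoSlot
{
    const StorageAmRoutine *tts_stamroutine;    /* set once */
    StorageTuple            tts_tuple;          /* replaced per tuple */
} DemoSlot;

/* Done once, e.g. at the same time the tuple descriptor is assigned. */
static void
demo_slot_init(DemoSlot *slot, const StorageAmRoutine *routine)
{
    slot->tts_stamroutine = routine;
    slot->tts_tuple = NULL;
}

/* Installing another tuple does not touch the routine pointer. */
static void
demo_slot_store(DemoSlot *slot, StorageTuple tuple)
{
    slot->tts_tuple = tuple;
}

static int
demo_slot_getattr(const DemoSlot *slot, int attnum)
{
    assert(slot->tts_stamroutine != NULL && slot->tts_tuple != NULL);
    return slot->tts_stamroutine->tuple_getattr(slot->tts_tuple, attnum);
}

/* Toy AM: a tuple is simply an array of ints. */
static int
demo_int_getattr(StorageTuple tuple, int attnum)
{
    return ((int *) tuple)[attnum];
}

static const StorageAmRoutine demo_intam = {demo_int_getattr};
```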

> >The executor is modified so that instead of calling heap_insert etc
> >directly, it uses rel->rd_stamroutine to call these methods.  The
> >executor is still in charge of dealing with indexes, constraints, and
> >any other thing that's not the tuple storage itself (this is one major
> >point in which this differs from FDWs).  This all looks simple enough,
> >with one exception and a few notes:
> 
> That is exactly what I tried to describe in my proposal.
> Chapter "Relation management". I'm sure, you've already noticed
> that it will require huge source code cleaning. I've carefully read
> the sources and found "violators" of abstraction in src/backend/commands.
> The list is attached to the wiki page
> https://wiki.postgresql.org/wiki/HeapamRefactoring.
> 
> Except these, there are some pretty strange and unrelated functions in
> src/backend/catalog.
> I'm willing to fix them, but I'd like to synchronize our efforts.

I would very much like to stay away from touching src/backend/catalog,
which holds the functions that deal with system catalogs.  We can simply
say that system catalogs are hardcoded to use heapam.c storage for now.
If we later see a need to enable some particular catalog using a
different storage implementation, we can change the code for that
specific catalog in src/backend/catalog and everywhere else, to use the
abstract API instead of hardcoding heap_insert etc.  But that can be
left for a second pass.  (This is my point "iv" further below, to which
you said "+1").
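
As a rough illustration of the two call paths (a sketch only; the Demo*
names, the tuple_insert callback and the counters are invented for this
example and are not the real heap_insert signature): the executor goes
through rel->rd_stamroutine, while catalog code keeps calling the heap
routine directly.

```c
#include <assert.h>
#include <stddef.h>

typedef void *StorageTuple;
typedef struct DemoRelation DemoRelation;

/* Hypothetical routine table with a single insert callback. */
typedef struct StorageAmRoutine
{
    void (*tuple_insert) (DemoRelation *rel, StorageTuple tuple);
} StorageAmRoutine;

/* Simplified relation; the counters only record which path ran. */
struct DemoRelation
{
    const StorageAmRoutine *rd_stamroutine;
    int heap_inserts;
    int other_inserts;
};

static void
demo_heap_insert(DemoRelation *rel, StorageTuple tuple)
{
    (void) tuple;
    rel->heap_inserts++;
}

static void
demo_columnar_insert(DemoRelation *rel, StorageTuple tuple)
{
    (void) tuple;
    rel->other_inserts++;
}

static const StorageAmRoutine demo_heapam = {demo_heap_insert};
static const StorageAmRoutine demo_columnaram = {demo_columnar_insert};

/* Executor path: dispatch through whatever AM the relation uses. */
static void
demo_exec_insert(DemoRelation *rel, StorageTuple tuple)
{
    assert(rel->rd_stamroutine != NULL);
    rel->rd_stamroutine->tuple_insert(rel, tuple);
}

/* Catalog path: stays hardcoded to the heap routine for now. */
static void
demo_catalog_insert(DemoRelation *catalog_rel, StorageTuple tuple)
{
    demo_heap_insert(catalog_rel, tuple);
}
```

Converting one catalog later would mean switching its callers from the
second style to the first, one catalog at a time.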


> Nothing to do, just substitute t_data with proper HeapTupleHeader
> representation. I think it's a job for StorageAm. Let's say each StorageAm
> must have stam_to_heaptuple() function and opposite function
> stam_from_heaptuple().

Hmm, yeah, that also works.  We'd have to check again whether it's more
convenient to start from a slot rather than a StorageTuple.  AFAICS the
trigger.c code is all starting from a slot, so it makes sense to have
the conversion use the slot code -- that way, there's no need for each
storage AM to re-implement conversion to HeapTuple.
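
A sketch of that division of labour, under assumptions: each AM only
knows how to deform its own tuple into the slot's value array, and one
shared function builds the heap-format tuple from the slot.  Every name
here (DemoSlot, slot_deform, demo_slot_to_heaptuple, the int-only
attributes) is hypothetical and much simpler than the real slot code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_ATTS 4

typedef void *StorageTuple;

/* Hypothetical callback: extract attribute values into an array. */
typedef struct StorageAmRoutine
{
    void (*slot_deform) (StorageTuple tuple, int natts, int *values);
} StorageAmRoutine;

/* Simplified slot: AM routine, current tuple, deformed values. */
typedef struct DemoSlot
{
    const StorageAmRoutine *routine;
    StorageTuple            tuple;
    int                     natts;
    int                     values[DEMO_MAX_ATTS];
} DemoSlot;

/* Toy heap-format tuple: all attributes in one contiguous struct. */
typedef struct DemoHeapTuple
{
    int natts;
    int attrs[DEMO_MAX_ATTS];
} DemoHeapTuple;

/* Shared code, written once: works for any AM that can deform. */
static DemoHeapTuple
demo_slot_to_heaptuple(DemoSlot *slot)
{
    DemoHeapTuple result = {0};

    assert(slot->natts >= 0 && slot->natts <= DEMO_MAX_ATTS);
    slot->routine->slot_deform(slot->tuple, slot->natts, slot->values);
    result.natts = slot->natts;
    for (int i = 0; i < slot->natts; i++)
        result.attrs[i] = slot->values[i];
    return result;
}

/* Toy columnar AM: a tuple is an array of pointers, one per column. */
static void
demo_columnar_deform(StorageTuple tuple, int natts, int *values)
{
    int **cols = (int **) tuple;

    for (int i = 0; i < natts; i++)
        values[i] = *cols[i];
}

static const StorageAmRoutine demo_columnaram = {demo_columnar_deform};
```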

> >note f) More widespread, MinimalTuples currently use a tweaked HeapTuple
> >format.  In the long run, it may be possible to replace them with a
> >separate storage module that's specifically designed to handle tuples
> >meant for tuplestores etc.  That may simplify TupleTableSlot and
> >execTuples.  For the moment we keep the tts_mintuple as it is.  Whenever
> >a tuple is not already in heap format, we heapify it in order to put in
> >the store.
> I wonder, do we really need MinimalTuples to support all formats?

Sure.  I wouldn't want to say "you can create tables in columnar storage
format, but if you do, these tables cannot use hash join".

> >ii) execTuples has additional accessors for tuples-in-slot, such as
> >ExecFetchSlotTuple and friends.  I expect to have some of them to return
> >abstract StorageTuples, others HeapTuple or MinimalTuples (possibly
> >wrapped in Datum), depending on callers.  We might be able to cut down
> >on these later; my first cut will try to avoid API changes to keep
> >fallout to a minimum.
>
> I'd suggest replacing all occurrences of HeapTuple with StorageTuple.
> Do you see any problems with it?

The HeapTuple-in-datum representation, as I recall, is used in the SQL
function manager; maybe other places too.  Maybe there's a way to fix
that layer so that it uses StorageTuple instead, but I prefer not to
touch it in the first phase.  We can fix it later.  This is already a
big enough patch ...

> >iii) All tuples need to be identifiable by ItemPointers.  Storages that
> >have different requirements will need careful additional thought across
> >the board.
> 
> For a start, we can simply deny secondary indexes for these storages
> or require a function that converts tuple identifier inside the storage to
> ItemPointer suitable for an index.

Umm.  I don't think rejecting secondary indexes would work very well.  I
think we can lift this limitation later; we just need to change the
IndexTuple abstraction so that it doesn't rely on ItemPointer as it
currently does.

> >v) Currently, one Buffer may be associated with one HeapTuple living in a
> >slot; when the slot is cleared, the buffer pin is released.  My current
> >patch moves the buffer pin to inside the heapam-based storage AM and the
> >buffer is released by the ->slot_clear_tuple method.  The rationale for
> >doing this is that some storage AMs might want to keep several buffers
> >pinned at once, for example, and must not release those pins
> >individually but in batches as the scan moves forwards (say a batch of
> >tuples in a columnar storage AM has column values spread across many
> >buffers; they must all be kept pinned until the scan has moved past the
> >whole set of tuples).  But I'm not really sure that this is a great
> >design.
> 
> Frankly, I doubt that it's realistic to implement columnar storage just as
> a variant of pluggable storage. It requires a lot of changes in the executor
> and optimizer and so on, which are hardly compatible with the existing
> tuple-oriented model. However, I'm not very familiar with this area, so if
> you feel that it's possible, go ahead.

Well, not *just* as a variant of pluggable storage.  This thread is just
one sub-project inside the greater project to enable column-oriented
storage; that includes further changes to executor, too, but I haven't
discussed those in this proposal.  I mentioned all this in Brussels'
developer meeting earlier this year.  (There I mostly talked about
vertical partitioning, which is a different subproject that I've put
aside for the moment, but really it's all part of the same thing.)
https://wiki.postgresql.org/wiki/Future_of_storage
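
Going back to the buffer pins in point v), here is a toy sketch of the
idea that the AM, not the slot, owns the pins and drops them in its
->slot_clear_tuple method.  The pin counter array stands in for the
buffer manager, and all Demo* names and sizes are assumptions made up
for this example.

```c
#include <assert.h>

#define DEMO_NBUFFERS 16
#define DEMO_MAX_PINS 8

typedef struct DemoSlot DemoSlot;

typedef struct StorageAmRoutine
{
    void (*slot_clear_tuple) (DemoSlot *slot);
} StorageAmRoutine;

/* Simplified slot: the AM tracks which buffers it has pinned. */
struct DemoSlot
{
    const StorageAmRoutine *routine;
    int                     npinned;
    int                     pinned[DEMO_MAX_PINS];
};

/* Toy buffer manager: one pin count per buffer id. */
static int demo_pin_count[DEMO_NBUFFERS];

static void
demo_pin(DemoSlot *slot, int buf)
{
    assert(buf >= 0 && buf < DEMO_NBUFFERS);
    assert(slot->npinned < DEMO_MAX_PINS);
    demo_pin_count[buf]++;
    slot->pinned[slot->npinned++] = buf;
}

static void
demo_release_all(DemoSlot *slot)
{
    while (slot->npinned > 0)
        demo_pin_count[slot->pinned[--slot->npinned]]--;
}

/* Heap-style AM: at most one buffer is pinned per tuple. */
static void
demo_heap_clear(DemoSlot *slot)
{
    assert(slot->npinned <= 1);
    demo_release_all(slot);
}

/* Columnar-style AM: the whole batch of pins is dropped together. */
static void
demo_columnar_clear(DemoSlot *slot)
{
    demo_release_all(slot);
}

static const StorageAmRoutine demo_heapam = {demo_heap_clear};
static const StorageAmRoutine demo_columnaram = {demo_columnar_clear};

/* The executor just asks the AM to clear; it never unpins directly. */
static void
demo_exec_clear_slot(DemoSlot *slot)
{
    slot->routine->slot_clear_tuple(slot);
}
```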

Thanks for reading!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


