Re: [HACKERS] Pluggable storage - Mailing list pgsql-hackers
From | Haribabu Kommi |
---|---|
Subject | Re: [HACKERS] Pluggable storage |
Date | |
Msg-id | CAJrrPGfCLTaD8DOzYqm44TbPYcOicp9eyGA4nDW=R6EN=mrNig@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Pluggable storage (Andres Freund <andres@anarazel.de>) |
Responses |
Re: [HACKERS] Pluggable storage
(Alexander Korotkov <a.korotkov@postgrespro.ru>)
|
List | pgsql-hackers |
On Tue, Aug 15, 2017 at 4:53 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-06-13 11:50:27 +1000, Haribabu Kommi wrote:
> Here I attached WIP patches to support pluggable storage. The patch series
> are may not work individually. Still so many things are under development.
> These patches are just to share the approach of the current development.
Making a pass through the patchset to get a feel where this at, and
where this is headed. I previously skimmed the thread to get a rough
sense on what's discused, but not in a very detailed manner.
Thanks for the review.
General:
- I think one important discussion we need to have is what kind of
performance impact we're going to accept introducing this. It seems
very likely that this'll cause some slowdown. We can kind of
alleviate that by doing some optimizations at the same time, but
nevertheless, this abstraction is going to cost.
OK. May be to take some decision, we may need some performance
figures, I will measure the performance once the API's stabilized.
- I don't think we should introduce this without a user besides
heapam. The likelihood that API will be usable by anything else
without a testcase seems fairly remote. I think some src/test/modules
type implementation of a per-session, in-memory storage - relatively
easy to implement - or such is necessary.
Sure, I will add a test module once the API's are stabilized.
- I think, and detailed some of that, we're going to need some cleanups
that go in before this, to decrease the size / increase the quality of
the new APIs. It's going to get more painful to change APIs
subsequently.
- We'll have to document clearly that these APIs are going to change for
a while, even after the release introducing them.
Yes, that's correct, because this is the first time we are developing the
storage API's to support pluggable storage, so it may needs some refinements
based on the usage to support different storage methods.
StorageAm - Scan stuff:
- I think API isn't quite right. There's a lot of granular callback
functionality like scan_begin_function / scan_begin_catalog /
scan_begin_bm - these largely are convenience wrappers around the same
function, and I don't think that would, or rather should, change in
any abstracted storage layer. So I think there needs to be some
unification first (pretty close w/ beginscan_internal already, but
perhaps we should get rid of a few of these wrappers).
OK. I will change the API to add a function to beginscan_internal and replace
the rest of the functions usage with beginscan_internal. And also there are
many bool flags that are passed to the beginscan_internal, I will try to optimize
them also.
- Some of the exposed functionality, e.g. scan_getpage,
scan_update_snapshot, scan_rescan_set_params looks like it should just
be excised, i.e. there's no reason for it to exist.
Currently these API's are used only in Bitmap and Sample scan's.
These scan methods are fully depends on the heap format. I will
check how to remove these API's.
- Minor: don't think the _function suffix for Storis necessary, just
makes things long, and every member has it. Besides that, it's also
easy to misunderstand - for a second I understood
scan_getnext_function to be about getting the next function...
OK. How about adding _hook?
- Scans are still represented as HeapScanDesc - I don't think that's
going to fly. Either this needs to be an opaque type (i.e. a struct
that's not defined, just forward declared), or it needs to be a base
struct that individual AMs embed in their own structs. Individual AMs
definitely are going to need different pieces of data.
Currently the internal members of the HeapScanDesc are directly used
in many places especially in Bitmap and Sample scan's. I am yet to write
the code in a best way to handle these scan methods and then removing
its usage will be easy.
Storage AM - tuple stuff:
- tuple_get_{xmin, updated_xid, cmin, itempointer, ctid, heaponly} are
each individual functions, that seems pretty painful to maintain, and
v/ likely to just grow and grow. Not sure what the solution is, but
this seems like a hard sell.
OK. How about adding a one API and takes some flags to represent
what type of data that is needed from the tuple and returned the corresponding
data as void *. The caller must typecast the data to their corresponding
type before use it.
- The three *speculative* functions don't quite seem right to me, nor do
I understand:
+ *
+ * Setting a tuple's speculative token is a slot-only operation, so no need
+ * for a storage AM method, but after inserting a tuple containing a
+ * speculative token, the insertion must be completed by these routines:
+ */
I don't see anything related to slots, here?
The tuple_set_speculative_token API is not required. Just update the slot
member directly with speculative token is fine and this value is used in
the tuple_insert API to form the tuple with speculative token. Later with
the other two API's the tuple is either finished or aborted.
Storage AM - slot stuff:
- I think there's a number of wrapper functions (slot_getattr,
slot_getallattrs, getsomeattrs, attisnull) around the same
functionality - that bloats the API and causes slowdowns. Instead we
need something like slot_virtualize_tuple(int16 upto), and the rest
should just be wrappers.
OK. I will change accordingly.
- I think it's wrong to have the slot functionality defined on the
StorageAm level. That'll cause even more indirect function calls (=>
slowness), and besides that the TupleTableSlot struct will need
members for each and every Am.
I think the solution is to instead have an Am level "create slot"
function, and the returned slot is allocated by the Am, with a base
member of TupleTableSlot with basically just tts_nvalid, tts_values,
tts_isnull as members. Those are the only members that can be
accessed without functions.
Access to the individual functions (say store_tuple) would then be
direct members of the TupleTableSlot interface. While that costs a bit
of memory, it removes one indirection from an already performance
critical path.
OK. This will change the structure to hold the minimal members that are
accessed irrespective of storage AM, and rest of data will be a void pointer
that can be accessed by only the storage AM.
- MinimalTuples should be one type of slot for the above, except it's
not created by an StorageAm but by a function returning a
TupleTableSlot.
This should remove the need for the slot_copy_min_tuple,
slot_is_physical_tuple functions.
OK.
- Right now TupleTableSlots are an executor datastructure, but
these patches (rightly!) make it much more widely used. So I think it
needs to be moved outside of executor/, and probably renamed to
something like TupleHolder or something.
OK.
- The oid integration seems wrong - without an accessor oids won't be
queryable with this unless you break through the API. But from a
higher level view I do wonder if it's not time to remove "special" oid
columns and replace them with a normal column. We should be hesitant
enshrining crusty old concepts in new APIs.
OK.
Executor integration:
- I'm quite fearful that this'll cause slowdowns in a few tight paths.
The most likely cases here seem to be a) bitmap indexscans b)
indexscans c) highly selective sequential scans. I do wonder if
that can be partially addressed by switching out the individual
executor routines in the relevant scan nodes by something using or
similar to the infrastructure in cc9f08b6b8
Sorry, I didn't understand this point clearly. Can you provide some more
details.
Regards,
Hari Babu
Fujitsu Australia
pgsql-hackers by date: