Thread: TupleTableSlot abstraction
Hi,

I've recently been discussing with Robert how to abstract TupleTableSlots in light of the upcoming developments around abstracting storage away. Besides that aspect I think we also need to make them more abstract to later deal with vectorized execution, but I'm fuzzier on the details there.

I'd previously, in a response to an early version of the pluggable storage patch, commented [1] on how I think the abstraction should roughly look. I'd privately started to prototype that. After my discussion with Robert I got that prototype into a state that can run many queries, but it doesn't pass the regression tests yet. The pluggable storage patch has also been updated [2] to abstract slots more (see patch 0005).

My position is that there are a number of weaknesses in the current TupleTableSlot API in light of multiple independent, possibly out-of-core, storage implementations:

- TupleTableSlots have to contain all the necessary information for each type of slot. Some implementations might require a buffer pin to be held (our heap), others pointers to multiple underlying points of data (column stores), some need additional metadata (our MinimalTuple). Unifying the information into one struct makes it much harder to provide differing implementations without modifying core postgres and/or increasing overhead for every user of slots.

  I think the consequence of that is that we need to allow each type of slot to have its own struct embedding TupleTableSlot. Haribabu's patch solved this by adding a tts_storage parameter that contains additional data, but imo that's unconvincing due to the added indirection in very performance-critical codepaths.

- The way slots are currently implemented, each new variant of data stored in slots adds new branches to hot code paths like ExecClearTuple(), and they're not extensible.

  Therefore I think we need to move to a callback-based implementation. In my tests, by moving the callback invocations into very thin inline functions, the branch prediction accuracy actually goes slightly *up*.

- Currently we frequently convert from one way of representing a row into another, in the same slot. We do so fairly implicitly in ExecMaterializeSlot(), ExecFetchSlotMinimalTuple(), ExecCopySlot()... The more we abstract specific storage representations away, the worse that is. I think we should make such conversions explicit by requiring transfers between two slots if a specific type is required.

- We have a few codepaths that set fields on tuples that can't be represented in virtual slots. E.g. postgres_fdw sets ctid and oid, and ExecProcessReturning() will set tableoid.

In my POC I turned TupleTableSlot into basically an abstract base class:

struct TupleTableSlot
{
    NodeTag     type;
    uint16      flags;
    uint16      nvalid;         /* # of valid values in tts_values */

    const TupleTableSlotOps *const cb;

    TupleDesc   tupleDescriptor;    /* slot's tuple descriptor */

    Datum      *values;         /* current per-attribute values */
    bool       *nulls;          /* current per-attribute null flags */

    /* can we optimize away? */
    MemoryContext mcxt;         /* slot itself is in this context */
};

which is then inherited from for specific implementations of tuple slots:

typedef struct HeapTupleTableSlot
{
    TupleTableSlot base;
    HeapTuple   tuple;          /* physical tuple */
    uint32      off;            /* saved state for slot_deform_tuple */
} HeapTupleTableSlot;

...
typedef struct MinimalTupleTableSlot
{
    TupleTableSlot base;
    HeapTuple   tuple;          /* tuple wrapper */
    MinimalTuple mintuple;      /* minimal tuple, or NULL if none */
    HeapTupleData minhdr;       /* workspace for minimal-tuple-only case */
    uint32      off;            /* saved state for slot_deform_tuple */
} MinimalTupleTableSlot;

typedef struct TupleTableSlotOps
{
    void (*init)(TupleTableSlot *slot);
    void (*release)(TupleTableSlot *slot);
    void (*clear)(TupleTableSlot *slot);
    void (*getsomeattrs)(TupleTableSlot *slot, int natts);
    void (*materialize)(TupleTableSlot *slot);
    HeapTuple (*get_heap_tuple)(TupleTableSlot *slot);
    MinimalTuple (*get_minimal_tuple)(TupleTableSlot *slot);
    HeapTuple (*copy_heap_tuple)(TupleTableSlot *slot);
    MinimalTuple (*copy_minimal_tuple)(TupleTableSlot *slot);
} TupleTableSlotOps;

When creating a slot one has to specify the type of slot one wants to use:

typedef enum TupleTableSlotType
{
    TTS_TYPE_VIRTUAL,
    TTS_TYPE_HEAPTUPLE,
    TTS_TYPE_MINIMALTUPLE,
    TTS_TYPE_BUFFER
} TupleTableSlotType;

extern TupleTableSlot *MakeTupleTableSlot(TupleDesc desc, TupleTableSlotType st);

You might notice that I've renamed a few fields (tts_values -> values, tts_isnull -> nulls, tts_nvalid -> nvalid) that haven't actually changed in the above structs / the attached patch. I now think that's probably not a good idea, as it unnecessarily breaks more code than needed.

I generally like this scheme because it allows the TupleTableSlot struct to be relatively lean, without knowing much about specific implementations of slots.

After this change functions like slot_getattr are thin inline wrappers:

+static inline Datum
+slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
+{
+    Assert(attnum > 0);
+
+    if (attnum > slot->nvalid)
+        slot->cb->getsomeattrs(slot, attnum);
+
+    Assert(attnum <= slot->nvalid);
+
+    *isnull = slot->nulls[attnum - 1];
+    return slot->values[attnum - 1];
+}

Note that I've moved system attribute handling out of the slot_getattr path - there were a few paths accessing them via slot_getattr, and it's a lot of unnecessary branching.

The fact that slots of different types have different storage means that we can't just change the type of a slot when needed. In nearly all places that's not a problem. E.g. ExecHashTableInsert() converts the "incoming" slot to one containing a minimal tuple with ExecFetchSlotMinimalTuple(), but that's essentially transient as it's just used to copy the tuple into the hashtable. We could even gain quite some efficiency by directly copying into the allocated space (saving one serialization/copy step for the common case where the input is either a virtual or a heap tuple).

The slightly bigger issue is that several parts of the code currently require heap tuples they can scribble upon. E.g. nodeModifyTable.c materializes the tuple - those cases can be solved by using a local slot to store the tuple, rather than using the incoming one. In nearly all cases we'd still be left with the same number of tuple copy steps, as materializing the tuple would copy into the local storage anyway. A few other cases, e.g. intorel_receive(), can just use ExecCopySlotTuple() or such instead of materializing.

The biggest issue I don't have a good answer for, and where I don't like Haribabu's solution, is how to deal with system columns like oid, ctid, tableoid. Haribabu's patch adds storage of those to the slot, but that seems like quite the kludge to me. It appears to me that the best thing would be to require projections for all accesses to system columns, at the original level of access.
Then we don't need any sort of special codepaths dealing with them. But that's not exactly a trivial change and has some added complications too (junkfilter type things).

I made a reference to vectorized execution above. That's because I found, while prototyping vectorized execution, that it's a very common need to interact with parts of the system that aren't vectorized, and doing so has to be extremely cheap. With the above we can have a TupleTableSlot type that's essentially just a cheap cursor into a batch of tuples. Besides making it efficiently possible to hand off slots to parts of the system that can't deal with tuple batches, it allows us to do fun things like having the first slot_deform_tuple() call for one tuple in a batch deform a full page's worth of tuples, yielding *much* better cache access patterns.

I haven't done that, but I think we should split ExecStoreTuple() into multiple versions that only work on specific types of tuple slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other code dealing with tuples would call ExecStoreHeapTuple(). The relevant callers already need to know what kind of tuple they're dealing with, therefore that's not a huge burden.

Please note that the attached patch is *NOT* meant to be a full proposal or even a patch to be reviewed; it's just an illustration of the concepts I'm talking about.

Comments? Better ideas?

[1] http://archives.postgresql.org/message-id/20170815065348.afkj45dgmg22ecfh%40alap3.anarazel.de
[2] http://archives.postgresql.org/message-id/CAJrrPGdY_hgvu06R_vCibUktKRLp%3Db8nBfeZkh%3DKRc95tq4euA%40mail.gmail.com

Greetings,

Andres Freund
Attachment
Hi!
Thank you for working on this subject. See some of my comments below.
On Wed, Feb 21, 2018 at 1:43 AM, Andres Freund <andres@anarazel.de> wrote:
- TupleTableSlots have to contain all the necessary information for each
type of slot. Some implementations might require a buffer pin to be
held (our heap), others pointers to multiple underlying points of data
(column stores), some need additional metadata (our MinimalTuple).
Unifying the information into one struct makes it much harder to
provide differing implementations without modifying core postgres
and/or increasing overhead for every user of slots.
I think the consequence of that is that we need to allow each type of
slot to have their own TupleTableSlot embedding struct.
+1, I think it's reasonable to keep minimal common slot information in the
TupleTableSlot struct and to let particular slot implementations extend it.
Haribabu's patch solved this by adding a tts_storage parameter that
contains additional data, but imo that's unconvincing due to the added
indirection in very performance critical codepaths.
+1
- The way slots are currently implemented, each new variant of data
stored in slots adds new branches to hot code paths like
ExecClearTuple(), and they're not extensible.
Therefore I think we need to move to callback based implementation. In
my tests, by moving the callback invocations into very thin inline
functions, the branch prediction accuracy actually goes slightly
*up*.
Sounds interesting. Besides branch prediction accuracy, what can you
say about overall performance?
- Currently we frequently convert from one way of representing a row
into another, in the same slot. We do so fairly implicitly in
ExecMaterializeSlot(), ExecFetchSlotMinimalTuple(), ExecCopySlot()...
The more we abstract specific storage representation away, the worse I
think that is. I think we should make such conversions explicit by
requiring transfers between two slots if a specific type is required.
- We have a few codepaths that set fields on tuples that can't be
represented in virtual slots. E.g. postgres_fdw sets ctid, oid,
ExecProcessReturning() will set tableoid.
In my POC I turned TupleTableSlot into basically an abstract base class:
struct TupleTableSlot
{
NodeTag type;
uint16 flags;
uint16 nvalid; /* # of valid values in tts_values */
const TupleTableSlotOps *const cb;
TupleDesc tupleDescriptor; /* slot's tuple descriptor */
Datum *values; /* current per-attribute values */
bool *nulls; /* current per-attribute null flags */
/* can we optimize away? */
MemoryContext mcxt; /* slot itself is in this context */
};
which then is inherited from for specific implementations of tuple
slots:
typedef struct HeapTupleTableSlot
{
TupleTableSlot base;
HeapTuple tuple; /* physical tuple */
uint32 off; /* saved state for slot_deform_tuple */
} HeapTupleTableSlot;
...
typedef struct MinimalTupleTableSlot
{
TupleTableSlot base;
HeapTuple tuple; /* tuple wrapper */
MinimalTuple mintuple; /* minimal tuple, or NULL if none */
HeapTupleData minhdr; /* workspace for minimal-tuple-only case */
uint32 off; /* saved state for slot_deform_tuple */
} MinimalTupleTableSlot;
typedef struct TupleTableSlotOps
{
void (*init)(TupleTableSlot *slot);
void (*release)(TupleTableSlot *slot);
void (*clear)(TupleTableSlot *slot);
void (*getsomeattrs)(TupleTableSlot *slot, int natts);
void (*materialize)(TupleTableSlot *slot);
HeapTuple (*get_heap_tuple)(TupleTableSlot *slot);
MinimalTuple (*get_minimal_tuple)(TupleTableSlot *slot);
HeapTuple (*copy_heap_tuple)(TupleTableSlot *slot);
MinimalTuple (*copy_minimal_tuple)(TupleTableSlot *slot);
} TupleTableSlotOps;
when creating a slot one has to specify the type of slot one wants to
use:
typedef enum TupleTableSlotType
{
TTS_TYPE_VIRTUAL,
TTS_TYPE_HEAPTUPLE,
TTS_TYPE_MINIMALTUPLE,
TTS_TYPE_BUFFER
} TupleTableSlotType;
extern TupleTableSlot *MakeTupleTableSlot(TupleDesc desc, TupleTableSlotType st);
Can a user define their own slot type? If so, I assume the hard-coded
enum is a limitation of the current prototype, and that eventually we would
have an extendable set of slot types. Is that correct?
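(To make that question concrete: under the proposed callback scheme, an out-of-core slot type could presumably look roughly like the sketch below, assuming slot creation is eventually driven by a TupleTableSlotOps pointer rather than the enum. Every name here is invented for illustration; nothing like this exists in the posted patch.)

typedef struct ColumnarTupleTableSlot
{
    TupleTableSlot base;        /* must be first, so a pointer cast works */
    void       *batch;          /* extension-private pointer to a column batch */
    int         row;            /* which row of the batch this slot exposes */
} ColumnarTupleTableSlot;

static void
columnar_clear(TupleTableSlot *slot)
{
    ColumnarTupleTableSlot *cslot = (ColumnarTupleTableSlot *) slot;

    /* drop the reference into the batch; common state lives in the base */
    cslot->batch = NULL;
    cslot->row = -1;
    slot->nvalid = 0;
}

static void
columnar_getsomeattrs(TupleTableSlot *slot, int natts)
{
    /*
     * Decode columns [slot->nvalid, natts) of the current row into
     * slot->values / slot->nulls, then advance slot->nvalid.
     */
}

static const TupleTableSlotOps ColumnarSlotOps = {
    .clear = columnar_clear,
    .getsomeattrs = columnar_getsomeattrs,
    /* init, release, materialize and the get/copy callbacks elided */
};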
You might notice that I've renamed a few fields (tts_values -> values,
tts_isnull -> nulls, tts_nvalid -> nvalid) that haven't actually changed
in the above structs / the attached patch. I now think that's probably
not a good idea, unnecessarily breaking more code than necessary.
To me the "tts_" prefix looks a bit weird, but I agree that we should probably
refrain from renaming these in order to break as few things as possible.
I generally like this scheme because it allows the TupleTableSlot struct to
be relatively lean, without knowing much about specific implementations
of slots.
After this change functions like slot_getattr are thin inline wrappers:
+static inline Datum
+slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
+{
+ Assert(attnum > 0);
+
+ if (attnum > slot->nvalid)
+ slot->cb->getsomeattrs(slot, attnum);
+
+ Assert(attnum <= slot->nvalid);
+
+ *isnull = slot->nulls[attnum - 1];
+ return slot->values[attnum - 1];
+}
Note that I've moved system attribute handling out of the slot_getattr
path - there were a few paths accessing them via slot_getattr, and it's
a lot of unnecessary branching.
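(For comparison, a function like ExecClearTuple() can presumably become an equally thin wrapper that dispatches to the clear callback instead of branching on every physical-tuple variant. A minimal sketch, using the field names from the POC struct above; the exact flag handling is up to the patch:)

static inline TupleTableSlot *
ExecClearTuple(TupleTableSlot *slot)
{
    /* let the slot type release whatever it holds (buffer pin, tuple, ...) */
    slot->cb->clear(slot);

    /* representation-independent state lives in the base struct */
    slot->nvalid = 0;

    return slot;
}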
The fact that slots of different types have different storage means that
we can't just change the type of a slot when needed. In nearly all
places that's not a problem. E.g. ExecHashTableInsert() converts the
"incoming" slot to one containing a minimal tuple with
ExecFetchSlotMinimalTuple(), but that's essentially transient as it's
just used to copy the tuple into the hashtable. We could even gain
quite some efficiency by directly copying into the allocated space
(saving one serialization/copy step for the common case where input is
either a virtual or a heap tuple).
The slightly bigger issue is that several parts of the code currently
require heaptuples they can scribble upon. E.g. nodeModifyTable.c
materializes the tuple - those can be solved by using a local slot to
store the tuple, rather than using the incoming one. In nearly all cases
we'd still be left with the same number of tuple copy steps, as
materializing the tuple would copy into the local storage anyway. A few
other cases, e.g. intorel_receive(), can just use ExecCopySlotTuple() or
such instead of materialize.
The biggest issue I don't have a good answer for, and where I don't like
Haribabu's solution, is how to deal with system columns like oid, ctid,
tableoid. Haribabu's patch adds storage of those to the slot, but that
seems like quite the kludge to me. It appears to me that the best thing
would be to require projections for all accesses to system columns, at
the original level of access. Then we don't need any sort of special
codepaths dealing with them. But that's not exactly a trivial change and
has some added complications too (junkfilter type things).
I also think that requiring projections for access to system columns is
probably the best solution for them.
I made a reference to vectorized execution above. That's because I
found, while prototyping vectorized execution, that it's a very common
need to interact with parts of the system that aren't vectorized, and
doing so has to be extremely cheap. With the above we can have a
TupleTableSlot type that's essentially just a cheap cursor into a batch
of tuples. Besides making it efficiently possible to hand off slots to
parts of the system that can't deal with tuple batches, it allows us to do
fun things like having the first slot_deform_tuple() call for one tuple
in a batch deform a full page's worth of tuples, yielding *much*
better cache access patterns.
Are you planning to use custom slots only for interaction between the
vectorized and non-vectorized parts of the executor, or also for interaction
inside the vectorized part? It seems that the proposed interface
isn't suitable for interaction between two vectorized nodes. Are you
planning to introduce a special kind of vectorized slot, or something else?
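(For concreteness, the "cheap cursor into a batch" idea described above might look something like the following; this is entirely hypothetical and not part of any posted patch, including the TupleBatch type.)

typedef struct BatchCursorTupleTableSlot
{
    TupleTableSlot base;
    struct TupleBatch *batch;   /* the vectorized node's batch of tuples */
    int         cur;            /* row within the batch this slot currently exposes */
} BatchCursorTupleTableSlot;

Its getsomeattrs callback could deform the whole batch (or a full page's worth of tuples) on the first call and afterwards only repoint base.values/base.nulls at the already-deformed row, which is what would make handing single rows to non-vectorized nodes cheap.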
I haven't done that, but I think we should split ExecStoreTuple() into
multiple versions, that only work on specific types of tuple
slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other
code dealing with tuples would call ExecStoreHeapTuple(). The relevant
callers already need to know what kind of tuple they're dealing with,
therefore that's not a huge burden.
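(To illustrate the kind of split being proposed; the exact signatures below are guesses, not taken from the patch:)

/* only accepts heap-tuple slots; the tuple is not tied to a buffer */
extern TupleTableSlot *ExecStoreHeapTuple(HeapTuple tuple,
                                          TupleTableSlot *slot,
                                          bool shouldFree);

/* only accepts buffer-heap-tuple slots; keeps a pin on the buffer */
extern TupleTableSlot *ExecStoreBufferHeapTuple(HeapTuple tuple,
                                                TupleTableSlot *slot,
                                                Buffer buffer);

Each variant can then simply Assert that the slot it was handed uses the matching TupleTableSlotOps, rather than branching on the tuple's provenance at runtime.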
Please note that the attached patch is *NOT* meant to be a full
proposal or even a patch to be reviewed; it's just an illustration of
the concepts I'm talking about.
Comments? Better ideas?
As I understand it, your primary interest in this topic is vectorized execution. When
passing data from vectorized to non-vectorized parts of the executor, tuples are
materialized by moving them from a vectorized slot to a regular slot.
But what do you think about alternative tuple formats in custom row-oriented
table access methods? We should probably try to minimize tuple format
conversion, so some executor nodes could switch their tuple format to
match the table access method's tuple format. For instance, suppose we're running
an aggregate over a table with a custom table access method. Then the hash
aggregate node could accumulate tuples in the same format as they are
received from the table access method. Does that make sense?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Andres Freund wrote:
> Hi,
>
> I've recently been discussing with Robert how to abstract
> TupleTableSlots in light of the upcoming developments around abstracting
> storage away. Besides that aspect I think we also need to make them
> more abstract to later deal with vectorized execution, but I'm more fuzzy
> on the details there.

Hi, is this patch proposed for pg11?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-03-13 15:18:34 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi,
> >
> > I've recently been discussing with Robert how to abstract
> > TupleTableSlots in light of the upcoming developments around abstracting
> > storage away. Besides that aspect I think we also need to make them
> > more abstract to later deal with vectorized execution, but I'm more fuzzy
> > on the details there.
>
> Hi, is this patch proposed for pg11?

I wish, but I don't see it happening unless I get a time compression device from somewhere :(

Greetings,

Andres Freund
On Wed, Feb 21, 2018 at 4:13 AM, Andres Freund <andres@anarazel.de> wrote:
>
> In my POC I turned TupleTableSlot into basically an abstract base class:

Here's a set of patches which move this POC forward towards completion, though it's not yet complete. Here's what has changed.

1. Fixed all the regression failures while running make check from regress and postgres_fdw.

2. Added/modified a bunch of comments about the new TupleTableSlot structure. Modified comments in existing functions working with TupleTableSlots as per the new implementation. Added comments explaining each callback in TupleTableSlotOps, and their implementations.

3. Improved error messages in some TupleTableSlot-type-specific implementations.

4. Fixed some coding errors in the POC patch (which didn't show up as failures in regression).

> You might notice that I've renamed a few fields (tts_values -> values,
> tts_isnull -> nulls, tts_nvalid -> nvalid) that haven't actually changed
> in the above structs / the attached patch. I now think that's probably
> not a good idea, unnecessarily breaking more code than necessary.

Done that. All members of TupleTableSlot have their names starting with tts_.

> After this change functions like slot_getattr are thin inline wrappers:
> +static inline Datum
> +slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
> +{
> +    Assert(attnum > 0);
> +
> +    if (attnum > slot->nvalid)
> +        slot->cb->getsomeattrs(slot, attnum);
> +
> +    Assert(attnum <= slot->nvalid);
> +
> +    *isnull = slot->nulls[attnum - 1];
> +    return slot->values[attnum - 1];
> +}
>
> Note that I've moved system attribute handling out of the slot_getattr
> path - there were a few paths accessing them via slot_getattr, and it's
> a lot of unnecessary branching.

The system attributes are accessed via heap_getsysattr() by passing that function a tuple from a HeapTupleTableSlot or BufferHeapTupleTableSlot. Those places Assert that the slot being used is either a HeapTupleTableSlot or a BufferHeapTupleTableSlot. slot_getsysattr(), which accessed system attributes through a slot, is now removed and replaced with heap_getsysattr() as described.

> The slightly bigger issue is that several parts of the code currently require heap tuples they can scribble upon. E.g. nodeModifyTable.c materializes the tuple - those can be solved by using a local slot to store the tuple, rather than using the incoming one. In nearly all cases we'd still be left with the same number of tuple copy steps, as materializing the tuple would copy into the local storage anyway. A few other cases, e.g. intorel_receive(), can just use ExecCopySlotTuple() or such instead of materializing.

There is a patch changing callers of ExecMaterializeSlot() to use ExecGetHeapTupleFromSlot() or ExecSaveSlot(), new functions added by me. But I think that needs more work. I don't think the new functions are necessary. We could use ExecCopySlotTuple() and ExecFetchSlotTuple() in place of ExecGetHeapTupleFromSlot() appropriately. The callers who just want the tuple, and not necessarily a writable one, can use ExecFetchSlotTuple(). Those who want a writable copy can use ExecCopySlotTuple(). When the slot is *not* a heap or buffer heap tuple table slot, we materialize a heap tuple every time we want to get the heap tuple out of the slot. If we are doing this multiple times without changing the contents of the slot, we will leak memory. I have TODOs to check whether that's a problem in practice. Maybe we should use ExecCopySlotTuple() explicitly while fetching tuples from those kinds of slots. The same applies when fetching a minimal heap tuple from non-minimal heap tuple slots.

> The biggest issue I don't have a good answer for, and where I don't like Haribabu's solution, is how to deal with system columns like oid, ctid, tableoid. Haribabu's patch adds storage of those to the slot, but that seems like quite the kludge to me. It appears to me that the best thing would be to require projections for all accesses to system columns, at the original level of access. Then we don't need any sort of special codepaths dealing with them. But that's not exactly a trivial change and has some added complications too (junkfilter type things).

I studied the code extensively to check whether there are any cases where we fetch system columns using Var nodes with varattno < 0 from (headers of) tuples produced by non-scan nodes (e.g. join, aggregation). That's not the case. We do have instances of INNER/OUTER Vars with varattno < 0, but those are the cases when a trigger accesses the OLD and NEW tuples. I have a patch to add tests for the same.

> I haven't done that, but I think we should split ExecStoreTuple() into multiple versions that only work on specific types of tuple slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other code dealing with tuples would call ExecStoreHeapTuple(). The relevant callers already need to know what kind of tuple they're dealing with, therefore that's not a huge burden.

I thought so too, but haven't done that change right now. Will work on that; it's in my TODO list.

Here's a description of the patches.

0001, 0002 are the same patches as submitted in [1].

0003 changes tts_nvalid from int to AttrNumber. int takes at least 4 bytes, whereas the maximum value that tts_nvalid can take is restricted by AttrNumber, which is 2 bytes long.

0004 removes the TupleDescGetSlot() function, which is obsolete but had been retained for extensions. With the TupleTableSlot abstraction, I think extensions need to change anyway, so the function is removed altogether.

0005 packs various boolean members of the TupleTableSlot structure into a uint16 tts_flags, with one bit per bool member. I think we can remove SHOULDFREEMIN since SHOULDFREE on MinimalHeapTupleTableSlot can be used in its place. I will do that in the next version. This reduces the size of the TupleTableSlot structure and can be committed as an independent patch. There's also the possibility that only the BufferHeapTupleTableSlot will have ~SHOULDFREE while all other slots have SHOULDFREE set, so we could get rid of that flag altogether. But I need to verify that.

0006 is an improved version of your POC patch. I have changed the tuple table slot type in many places which showed up as regression failures. Added/modified a bunch of comments, improved error messages, corrected code in some places, addressed FIXMEs in some places. Also added macros to check the TupleTableSlot's type based on the TupleTableSlotOps set.

0007 adds a tts_type member indicating the TupleTableSlot type, readily available for debugging. We may or may not commit this patch.

0008 changes the place where we reset the expression context used for processing the RETURNING and ON CONFLICT clauses. If a query has both clauses, we reset the same expression context twice while processing the same input/output tuple. This means that some memory allocated while processing ON CONFLICT and required by RETURNING will get freed before the latter gets at it. This looks like a bug in the existing code, but I have not been able to create a reproduction for it. It showed up as a regression failure with the TupleTableSlot abstraction changes.

0009 replaces ExecMaterializeSlot() as described above. I will work on this again and examine all the instances of ExecMaterializeSlot() as described above.

0010 Index scan uses a reorder queue in some cases. Tuples are stored in the reorder queue as heap tuples without any associated buffer. While returning a tuple from the reorder queue, IndexNextWithReorder() should use a TupleTableSlot of type HeapTupleTableSlot. A tuple returned directly from an index scan has a buffer associated with it, so it should use a TupleTableSlot of type BufferHeapTupleTableSlot. So, use different kinds of slots for an index scan.

Next steps:

1. Address the TODOs in the code. I have listed some of those above.

2. Right now we are using TupleTableSlotType, an enum, to create slots of the required type. But extensions which want to add their own slot types won't be able to add a type to that enum. So, they will need to write their own MakeTupleTableSlot. That function uses the TupleTableSlotType to set TupleTableSlotOps and calculate the minimum size of the slot. We could change the function to accept TupleTableSlotOps and the minimum size and it just works for all the extensions. Or it could just accept TupleTableSlotOps and there's a callback to calculate the minimum memory required for the slot (say based on the tuple descriptor available).

3. Compile with LLVM and fix any compilation and regression errors.

4. We need to do something with the name ExecStoreVirtualSlot(), which is being (and will be) used for all kinds of TupleTableSlot types. Right now I don't have any bright ideas.

5. ExecFetch* functions are now one-liners, so we could make those inline and keep them in a header file like, say, slot_getattr.

6. ExecCopySlot can be a thin wrapper if we add a callback copyslot() invoked on the destination slot type.

7. slot_attisnull() deforms a heap/minimal tuple if that status for a given attribute is not available in tts_isnull. Should we instead add a callback attisnull() which can use something like heap_isnull()?

There are TODOs in the patches for 4, 5, 6, 7. I will continue to work on these steps.

[1] https://www.postgresql.org/message-id/CAFjFpRerUFX=T0nSnCoroXAJMoo-xah9J+pi7+xDUx86PtQmew@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
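(As a small illustration of the kind of flag packing described for 0005 above; the actual bit and macro names in the patch may differ, and the setter below is purely hypothetical:)

/* hypothetical flag bits replacing the individual bool members */
#define TTS_FLAG_EMPTY          (1 << 0)
#define TTS_FLAG_SHOULDFREE     (1 << 1)
#define TTS_FLAG_SLOW           (1 << 2)

#define TTS_EMPTY(slot)         (((slot)->tts_flags & TTS_FLAG_EMPTY) != 0)
#define TTS_SHOULDFREE(slot)    (((slot)->tts_flags & TTS_FLAG_SHOULDFREE) != 0)

/* setting or clearing a bit replaces assigning a bool member */
static inline void
tts_set_shouldfree(TupleTableSlot *slot, bool shouldfree)
{
    if (shouldfree)
        slot->tts_flags |= TTS_FLAG_SHOULDFREE;
    else
        slot->tts_flags &= ~TTS_FLAG_SHOULDFREE;
}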
Attachment
On Mon, Jun 11, 2018 at 6:13 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>> The slightly bigger issue is that several parts of the code currently require heap tuples they can scribble upon. E.g. nodeModifyTable.c materializes the tuple - those can be solved by using a local slot to store the tuple, rather than using the incoming one. In nearly all cases we'd still be left with the same number of tuple copy steps, as materializing the tuple would copy into the local storage anyway. A few other cases, e.g. intorel_receive(), can just use ExecCopySlotTuple() or such instead of materializing.
>
> There is a patch changing callers of ExecMaterializeSlot() to use ExecGetHeapTupleFromSlot() or ExecSaveSlot(), new functions added by me. But I think that needs more work. I don't think the new functions are necessary. We could use ExecCopySlotTuple() and ExecFetchSlotTuple() in place of ExecGetHeapTupleFromSlot() appropriately. The callers who just want the tuple, and not necessarily a writable one, can use ExecFetchSlotTuple(). Those who want a writable copy can use ExecCopySlotTuple(). When the slot is *not* a heap or buffer heap tuple table slot, we materialize a heap tuple every time we want to get the heap tuple out of the slot. If we are doing this multiple times without changing the contents of the slot, we will leak memory. I have TODOs to check whether that's a problem in practice. Maybe we should use ExecCopySlotTuple() explicitly while fetching tuples from those kinds of slots. The same applies when fetching a minimal heap tuple from non-minimal heap tuple slots.

In the attached set of patches, ExecMaterializeSlot(), ExecFetchSlotTuple() and ExecFetchSlotMinimalTuple() have been changed. Here's an excerpt of the commit message included in 0006 in the attached patch set.

---

Before this work, the contents of the slot could be in virtual form (in tts_datums and tts_isnull) and/or be stored as a heap tuple and/or a minimal tuple. When the contents were in heap or minimal tuple form, those tuples need not be "owned" by the slot. When a tuple is owned by a slot, it's the slot's responsibility to free the memory consumed by the tuple when the tuple is not needed. That typically happened when the slot got a new tuple to store in it or when the slot itself was discarded, mostly along with the other slots in the tuple table.

ExecMaterializeSlot() then served two purposes:

1. If the contents of the slot were not "owned" by the slot, or if there was no heap tuple in the slot, it created a heap tuple which it "owned" from those contents.

2. It returned the heap tuple, which the slot "owned". This was typically used when the caller wanted, from the slot, a heap tuple which it could scribble upon (esp. system attributes like tableOid) and expected the changes to be visible from within the slot.

Since a single slot contained a tuple in various forms, it could "own" multiple forms at a time and return whichever was requested when ExecFetchSlotTuple() or ExecFetchSlotMinimalTuple() was called. Those functions, however, differed from ExecMaterializeSlot() in the sense that they returned a tuple even when the slot didn't own it.

With this work, every slot contains a tuple in "virtual" form, but only one of the other two forms. Thus ExecMaterializeSlot() can not create a heap tuple, and let the slot "own" it, in a slot that can not contain a heap tuple. Neither can ExecFetchSlotTuple() return a tuple from a slot which can not contain a heap tuple without consuming memory (slots with a minimal tuple can do some pointer swizzling and save memory, but that lump of memory is not exactly the same as a heap tuple memory lump). Since such a heap tuple can not be "owned" by the slot, the memory consumed by the tuple needs to be freed by the caller. Similarly for ExecFetchSlotMinimalTuple(). Hence, in the new scheme the meanings of ExecMaterializeSlot(), ExecFetchSlotTuple() and ExecFetchSlotMinimalTuple() need to change.

ExecMaterializeSlot()
---------------------

ExecMaterializeSlot() should save the contents of the slot into its native tuple format, whichever it is, so that it "owns" the tuple.

ExecFetchSlotTuple()
--------------------

ExecFetchSlotTuple() should just return the heap tuple when the underlying slot can "own" a heap tuple, and error out when the slot can not contain the requested format, e.g. when ExecFetchSlotTuple() is called on a "virtual" or "minimal" tuple slot. This is in line with what we are doing with the ExecStore* functions. The earlier callers of ExecMaterializeSlot(), which expected a heap tuple to be returned, would need to call ExecMaterializeSlot() (no return value) followed by ExecFetchSlotTuple(). Instead of that, ExecFetchSlotTuple() is changed to accept a second argument "materialize". When it's true, the returned heap tuple is materialized and "owned" by the slot. When it's false, the returned heap tuple may not be "owned" by the slot.

All the callers of ExecFetchSlotTuple() can be managed to pass a HeapTupleTableSlot or a BufferHeapTupleTableSlot. Instead of relying on the TupleTableSlotType (which may be expanded later for user-defined TupleTableSlots) we rely on the presence of the get_heap_tuple() callback in the given slot. That callback is expected to return a heap tuple as-is from the slot.

In a few cases, ExecFetchSlotTuple() is called with a slot which is filled by tuplestore_gettupleslot(). We use a separate tuple store to store foreign table tuples for processing AFTER triggers and for storing tuples returned by a function. They are stored in the tuple store as minimal tuples. In both cases, the minimal tuples are converted to heap tuples (using the earlier version of ExecFetchSlotTuple()) before they are processed further after obtaining them from a tuple store using tuplestore_gettupleslot(). With the tuple table slot abstraction a minimal tuple can be stored in a MinimalTupleTableSlot. Thus we have two options:

1. Pass a MinimalTupleTableSlot to tuplestore_gettupleslot() and get a minimal tuple in this slot. Convert the minimal tuple to a heap tuple by allocating memory for the converted tuple. We have to explicitly release the memory allocated for the heap tuple after it's been processed. When exactly we do that in AfterTriggerExecute() is not clear. ExecFetchSlotTupleDatum() could do that right away.

2. Pass a HeapTupleTableSlot to tuplestore_gettupleslot() and let it store the heap tuple crafted from the minimal tuple, which can be freed within the same function, if necessary. The heap tuple gets freed when the slot is used to store the next tuple to be processed.

The second option looks better since it continues to use the slot's mechanism to "own" and free tuples that it "owns". Hence implemented the same.

ExecFetchSlotMinimalTuple()
---------------------------

Before this work, a TupleTableSlot could "own" a minimal tuple as well. Thus ExecFetchSlotMinimalTuple() returned a minimal tuple after constructing and "owning" it if it was not readily available. When the slot "owned" the minimal tuple, the memory consumed by the tuple was freed when a new tuple was stored in it or the slot was cleared.

With this work, not all TupleTableSlot types can "own" a minimal tuple. When fetching a minimal tuple from a slot that can not "own" one, memory is allocated to construct the minimal tuple, which needs to be freed explicitly. Hence ExecFetchSlotMinimalTuple() is modified to flag the caller whether it should free the memory consumed by the returned minimal tuple.

Right now only a MinimalTupleTableSlot can own a minimal tuple. But we may change that in future, or a user-defined TupleTableSlot type (added through an extension) may be able to "own" a minimal tuple in it. Hence instead of relying upon the TTS_IS_MINIMAL() macro to tell us whether the TupleTableSlot can "own" a minimal tuple or not, we rely on the set of callbacks. A TupleTableSlot which can hold a minimal tuple implements the get_minimal_tuple callback. Other TupleTableSlot types leave the callback NULL. The minimal tuple returned by this function is usually copied into a hash table or a file. But, unlike with ExecFetchSlotTuple(), it's never written to. Hence the difference between the signatures of the two functions.

---

>> I haven't done that, but I think we should split ExecStoreTuple() into multiple versions that only work on specific types of tuple slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other code dealing with tuples would call ExecStoreHeapTuple(). The relevant callers already need to know what kind of tuple they're dealing with, therefore that's not a huge burden.
>
> I thought so too, but haven't done that change right now. Will work on that. That's in my TODO list.

This is still a TODO.

0001-0005 are the same as in the previous patch set.

> 0006 This is an improved version of your POC patch. I have changed the tuple table slot type in many places which showed up as regression failures. Added/modified a bunch of comments, improved error messages, corrected code in some places, addressed FIXMEs in some places. Also added macros to check the TupleTableSlot's type based on the TupleTableSlotOps set.

The changes to ExecMaterializeSlot(), ExecFetchSlotTuple() and ExecFetchSlotMinimalTuple() discussed above are included in this patch.

0007, 0008 are the same as in the previous patch set.

> 0009 Replaces ExecMaterializeSlot() as described above. I will work on this again and examine all the instances of ExecMaterializeSlot() as described above.

This is merged with 0006.

> 0010 Index scan uses a reorder queue in some cases. Tuples are stored in the reorder queue as heap tuples without any associated buffer. While returning a tuple from the reorder queue, IndexNextWithReorder() should use a TupleTableSlot of type HeapTupleTableSlot. A tuple returned directly from an index scan has a buffer associated with it, so it should use a TupleTableSlot of type BufferHeapTupleTableSlot. So, use different kinds of slots for an index scan.

This is 0009.

> Next steps
> 1. Address the TODOs in the code. I have listed some of those above.

There are still a handful of TODOs in the patches. I will work on those next.

> 2. Right now we are using TupleTableSlotType, an enum, to create slots of the required type. But extensions which want to add their own slot types won't be able to add a type to that enum. So, they will need to write their own MakeTupleTableSlot. That function uses the TupleTableSlotType to set TupleTableSlotOps and calculate the minimum size of the slot. We could change the function to accept TupleTableSlotOps and the minimum size and it just works for all the extensions. Or it could just accept TupleTableSlotOps and there's a callback to calculate the minimum memory required for the slot (say based on the tuple descriptor available).

This is still a TODO.

> 3. Compile with LLVM and fix any compilation and regression errors.

When I compiled the server with just 0003 applied with LLVM, the compilation went well, but there was a server crash. That patch changes the type of tts_nvalid from int32 to AttrNumber. I tried debugging the crash with a debug LLVM build, but couldn't complete the work. The attached patch attrnumber_llvm_type.patch is my incomplete attempt to fix that crash. I think we should make it easy to change the data types of the members in structures shared by JIT and non-JIT code, maybe automatically create both copies of the code somehow. I will get back to this after addressing the other TODOs.

> 4. We need to do something with the name ExecStoreVirtualSlot(), which is being (and will be) used for all kinds of TupleTableSlot types. Right now I don't have any bright ideas.

Still a TODO.

> 5. ExecFetch* functions are now one-liners, so we could make those inline and keep them in a header file like, say, slot_getattr.

This is still a TODO.

> 6. ExecCopySlot can be a thin wrapper if we add a callback copyslot() invoked on the destination slot type.

This is still a TODO.

> 7. slot_attisnull() deforms a heap/minimal tuple if that status for a given attribute is not available in tts_isnull. Should we instead add a callback attisnull() which can use something like heap_isnull()?

This is still a TODO.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
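(In other words, the fetching functions described above end up with roughly these shapes; this is a sketch, and the exact spellings may differ from the patches:)

/* returns the slot's heap tuple; with materialize = true the slot "owns" it */
extern HeapTuple ExecFetchSlotTuple(TupleTableSlot *slot, bool materialize);

/* *shouldFree tells the caller whether it must free the returned tuple itself */
extern MinimalTuple ExecFetchSlotMinimalTuple(TupleTableSlot *slot,
                                              bool *shouldFree);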
Attachment
On Thu, Jul 5, 2018 at 4:07 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>>> I haven't done that, but I think we should split ExecStoreTuple() into multiple versions that only work on specific types of tuple slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other code dealing with tuples would call ExecStoreHeapTuple(). The relevant callers already need to know what kind of tuple they're dealing with, therefore that's not a huge burden.
>>
>> I thought so too, but haven't done that change right now. Will work on that. That's in my TODO list.

Done. Added that as a separate patch 0001, since that change adds value by itself. 0001 in the earlier patch set got committed, and 0002 in that patch set is not required anymore. 0002-0004 in this patch set are the same as 0003-0005 in the previous patch set. 0005 in this patch set is 0006 in the previous one with a bunch of TODOs addressed. An important change is that virtual tuple slot contents never need to be freed when the slot is cleared. 0006-0009 are the same as 0007-0010 in the previous patch set.

>> Next steps
>> 1. Address the TODOs in the code. I have listed some of those above.
>
> There are still a handful of TODOs in the patches. I will work on those next.

The number of TODOs has reduced, but there are still some that I am working on.

>> 2. Right now we are using TupleTableSlotType, an enum, to create slots of the required type. But extensions which want to add their own slot types won't be able to add a type to that enum. So, they will need to write their own MakeTupleTableSlot. That function uses the TupleTableSlotType to set TupleTableSlotOps and calculate the minimum size of the slot. We could change the function to accept TupleTableSlotOps and the minimum size and it just works for all the extensions. Or it could just accept TupleTableSlotOps and there's a callback to calculate the minimum memory required for the slot (say based on the tuple descriptor available).

This is still a TODO.

>> 3. Compile with LLVM and fix any compilation and regression errors.
>
> When I compiled the server with just 0003 applied with LLVM, the compilation went well, but there was a server crash. That patch changes the type of tts_nvalid from int32 to AttrNumber. I tried debugging the crash with a debug LLVM build, but couldn't complete the work. The attached patch attrnumber_llvm_type.patch is my incomplete attempt to fix that crash. I think we should make it easy to change the data types of the members in structures shared by JIT and non-JIT code, maybe automatically create both copies of the code somehow. I will get back to this after addressing the other TODOs.

This is still a TODO.

>> 4. We need to do something with the name ExecStoreVirtualSlot(), which is being (and will be) used for all kinds of TupleTableSlot types. Right now I don't have any bright ideas.

Done and added as a separate patch 0010. A separate patch so that we can discard this change if we don't agree on it.

>> 5. ExecFetch* functions are now one-liners, so we could make those inline and keep them in a header file like, say, slot_getattr.

Done.

>> 6. ExecCopySlot can be a thin wrapper if we add a callback copyslot() invoked on the destination slot type.

This is still a TODO.

>> 7. slot_attisnull() deforms a heap/minimal tuple if that status for a given attribute is not available in tts_isnull. Should we instead add a callback attisnull() which can use something like heap_isnull()?

This is still a TODO.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment
On Fri, Jul 13, 2018 at 3:45 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>>> Next steps
>>> 1. Address the TODOs in the code. I have listed some of those above.
>>
>> There are still a handful of TODOs in the patches. I will work on those next.
>
> The number of TODOs has reduced, but there are still some that I am working on.

The attached patch set has all the TODOs fixed.

>>> 2. Right now we are using TupleTableSlotType, an enum, to create slots of the required type. But extensions which want to add their own slot types won't be able to add a type to that enum. So, they will need to write their own MakeTupleTableSlot. That function uses the TupleTableSlotType to set TupleTableSlotOps and calculate the minimum size of the slot. We could change the function to accept TupleTableSlotOps and the minimum size and it just works for all the extensions. Or it could just accept TupleTableSlotOps and there's a callback to calculate the minimum memory required for the slot (say based on the tuple descriptor available).
>
> This is still a TODO.

Done.

>>> 3. Compile with LLVM and fix any compilation and regression errors.
>>
>> When I compiled the server with just 0003 applied with LLVM, the compilation went well, but there was a server crash. That patch changes the type of tts_nvalid from int32 to AttrNumber. I tried debugging the crash with a debug LLVM build, but couldn't complete the work. The attached patch attrnumber_llvm_type.patch is my incomplete attempt to fix that crash. I think we should make it easy to change the data types of the members in structures shared by JIT and non-JIT code, maybe automatically create both copies of the code somehow. I will get back to this after addressing the other TODOs.
>
> This is still a TODO.

Still a TODO.

>>> 6. ExecCopySlot can be a thin wrapper if we add a callback copyslot() invoked on the destination slot type.
>
> This is still a TODO.

Still a TODO.

>>> 7. slot_attisnull() deforms a heap/minimal tuple if that status for a given attribute is not available in tts_isnull. Should we instead add a callback attisnull() which can use something like heap_isnull()?
>
> This is still a TODO.

Done. I also noticed that slot_getattr() optimizes the cases when the requested attribute is NULL or is missing from a tuple. Given that a custom TupleTableSlot type can have its own optimizations for the same, I added a new callback getattr() to obtain the value of a given attribute from the slot. The callback is called from slot_getattr().

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment
Hi,

Working on rebasing the pluggable storage patch on the current version of this.

On 2018-07-26 17:09:08 +0530, Ashutosh Bapat wrote:
> Done. I also noticed that slot_getattr() optimizes the cases when the requested attribute is NULL or is missing from a tuple. Given that a custom TupleTableSlot type can have its own optimizations for the same, I added a new callback getattr() to obtain the value of a given attribute from the slot. The callback is called from slot_getattr().

I'm quite against this. This is just proliferation of callbacks without much use. Why do you think this is helpful? I think it's much better to go back to a single callback to deform here.

Greetings,

Andres Freund
On Sun, Aug 5, 2018 at 3:49 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> Working on rebasing the pluggable storage patch on the current version of this.

Thanks. Please let me know if you see any issues.

> On 2018-07-26 17:09:08 +0530, Ashutosh Bapat wrote:
>> Done. I also noticed that slot_getattr() optimizes the cases when the requested attribute is NULL or is missing from a tuple. Given that a custom TupleTableSlot type can have its own optimizations for the same, I added a new callback getattr() to obtain the value of a given attribute from the slot. The callback is called from slot_getattr().
>
> I'm quite against this. This is just proliferation of callbacks without much use. Why do you think this is helpful? I think it's much better to go back to a single callback to deform here.

I think I explained why getattr() needs to be a separate callback. There's a reason why slot_getattr() is more code than just calling slot_getsomeattrs() and returning the required attribute: the cases when the requested attribute is NULL or is missing from a tuple. Depending on the tuple layout, access to a single attribute can be optimized without spending cycles on extracting the attributes prior to that one. A columnar store (I haven't seen Haribabu's patch), for example, stores columns from multiple rows together and thus doesn't have a compact version of a tuple. In such cases extracting an individual attribute directly is far cheaper than extracting all the previous attributes. Why should we force the storage API to extract all the attributes in such a case?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Hi,

On 2018-08-06 10:08:24 +0530, Ashutosh Bapat wrote:
> I think I explained why getattr() needs to be a separate callback. There's a reason why slot_getattr() is more code than just calling slot_getsomeattrs() and returning the required one - the cases when the requested attribute is NULL or is missing from a tuple. Depending on the tuple layout, access to a single attribute can be optimized without spending cycles to extract the attributes prior to that one. A columnar store (I haven't seen Haribabu's patch), for example, stores columns from multiple rows together and thus doesn't have a compact version of a tuple. In such cases extracting an individual attribute directly is far cheaper than extracting all the previous attributes.

OTOH, it means we continually access the null bitmap instead of just tts_isnull[i].

Your logic about not deforming columns in this case would hold for *any* deforming of previous columns as well. That's an optimization that we probably want to implement at some point (e.g. by building a bitmap of needed columns in the planner), but I don't think we should do it together with this already large patchset.

> Why should we force the storage API to extract all the attributes in such a case?

Because there's no need yet, and it complicates the API without a corresponding benefit.

Greetings,

Andres Freund
On Mon, Aug 6, 2018 at 10:15 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-08-06 10:08:24 +0530, Ashutosh Bapat wrote:
>> I think I explained why getattr() needs to be a separate callback. There's a reason why slot_getattr() is more code than just calling slot_getsomeattrs() and returning the required one - the cases when the requested attribute is NULL or is missing from a tuple. Depending on the tuple layout, access to a single attribute can be optimized without spending cycles to extract the attributes prior to that one. A columnar store (I haven't seen Haribabu's patch), for example, stores columns from multiple rows together and thus doesn't have a compact version of a tuple. In such cases extracting an individual attribute directly is far cheaper than extracting all the previous attributes.
>
> OTOH, it means we continually access the null bitmap instead of just tts_isnull[i].

Nope. The slot_getattr() implementation in these patches, as well as in the current master, returns from tts_isnull if the array has the information.

> Your logic about not deforming columns in this case would hold for *any* deforming of previous columns as well. That's an optimization that we probably want to implement at some point (e.g. by building a bitmap of needed columns in the planner), but I don't think we should do it together with this already large patchset.

Agreed about optimizing using a separate bitmap indicating the validity of a particular value.

>> Why should we force the storage API to extract all the attributes in such a case?
>
> Because there's no need yet, and it complicates the API without a corresponding benefit.

The earlier version of slot_getattr() was optimized to access a given attribute. That must have been done for some reason. Leaving that optimization aside for this work will cause regressions in the cases where that optimization matters.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
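(To make the disagreement concrete, the two shapes of slot_getattr() being debated look roughly like this. This is a sketch only; the callback and field names follow the tts_-prefixed naming of the later patches and are not taken verbatim from them.)

static inline Datum
slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
{
    Assert(attnum > 0);

    /* attributes that are already deformed are served from the arrays either way */
    if (attnum <= slot->tts_nvalid)
    {
        *isnull = slot->tts_isnull[attnum - 1];
        return slot->tts_values[attnum - 1];
    }

#ifdef GETATTR_CALLBACK_VARIANT
    /*
     * Per-type getattr() callback: the slot implementation decides how to
     * fetch a single attribute, e.g. a columnar slot could fetch it without
     * deforming the preceding columns.
     */
    return slot->tts_cb->getattr(slot, attnum, isnull);
#else
    /* single-callback variant: deform up to attnum, then read the arrays */
    slot->tts_cb->getsomeattrs(slot, attnum);
    *isnull = slot->tts_isnull[attnum - 1];
    return slot->tts_values[attnum - 1];
#endif
}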
On Thu, Jul 26, 2018 at 5:09 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>>>> 3. Compile with LLVM and fix any compilation and regression errors.
>>>
>>> When I compiled the server with just 0003 applied with LLVM, the compilation went well, but there was a server crash. That patch changes the type of tts_nvalid from int32 to AttrNumber. I tried debugging the crash with a debug LLVM build, but couldn't complete the work. The attached patch attrnumber_llvm_type.patch is my incomplete attempt to fix that crash. I think we should make it easy to change the data types of the members in structures shared by JIT and non-JIT code, maybe automatically create both copies of the code somehow. I will get back to this after addressing the other TODOs.

Still a TODO.

>>>> 6. ExecCopySlot can be a thin wrapper if we add a callback copyslot() invoked on the destination slot type.

Done. With this set of patches, make check-world passes clean.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment
On Wed, Aug 8, 2018 at 5:07 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

Amit Khandekar told me offlist that the previous patch set doesn't apply cleanly on the latest head. PFA patches rebased on 31380bc7c204e7cfa9c9e1c62947909e2b38577c.

>>>>> 3. Compile with LLVM and fix any compilation and regression errors.
>>>>
>>>> When I compiled the server with just 0003 applied with LLVM, the compilation went well, but there was a server crash. That patch changes the type of tts_nvalid from int32 to AttrNumber. I tried debugging the crash with a debug LLVM build, but couldn't complete the work. The attached patch attrnumber_llvm_type.patch is my incomplete attempt to fix that crash. I think we should make it easy to change the data types of the members in structures shared by JIT and non-JIT code, maybe automatically create both copies of the code somehow. I will get back to this after addressing the other TODOs.
>
> Still a TODO.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment
PFA patches rebased on ee80124811908ef1d4679296c46e36bd8a32b9de.

On Thu, Aug 9, 2018 at 5:12 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>>>>>> 3. Compile with LLVM and fix any compilation and regression errors.
>>>>>
>>>>> When I compiled the server with just 0003 applied with LLVM, the compilation went well, but there was a server crash. That patch changes the type of tts_nvalid from int32 to AttrNumber.

Done. Fixed the crash in the attached patch set. Thanks Andres for the offlist help.

I also tried compiling the whole patch set with LLVM. I have fixed the compilation errors in llvm_compile_expr(). The compilation errors in slot_compile_deform() require more work. That function accesses the tuple, slow and offset properties of a TupleTableSlot. The tuple and offset properties are now moved to HeapTupleTableSlot, so they can't be accessed from a TupleTableSlot unless it's a HeapTupleTableSlot or BufferHeapTupleTableSlot, and even then the TupleTableSlot needs to be cast to a HeapTupleTableSlot to access those properties. I do not know yet how casting works in LLVM code. The slow property of a TupleTableSlot is now available through tts_flags. We need to add LLVM code to fetch tts_flags and perform bit operations on it to get or set the slow property. I haven't found any precedent for LLVM bit operations in postgresql's JIT code.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment
Hi,

On 2018-08-17 12:10:20 +0530, Ashutosh Bapat wrote:
> We need to add LLVM code to fetch tts_flags and perform bit operations on it to get or set the slow property. I haven't found any precedent for LLVM bit operations in postgresql's JIT code.

There are several; look for the infomask accesses in slot_compile_deform().

I'll try to do the adaptation later today.

Greetings,

Andres Freund
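(For reference, the general shape of such a bit test in the LLVM C API is quite small. A sketch along the lines of the infomask handling; the helper below is hypothetical, only the LLVM calls are real API, and the surrounding code would first load tts_flags from the slot struct the same way slot_compile_deform() loads the infomask.)

#include <llvm-c/Core.h>

/* emit "(tts_flags & flag) != 0" given an already-loaded i16 value of tts_flags */
static LLVMValueRef
emit_flag_test(LLVMBuilderRef b, LLVMValueRef v_flags, unsigned flag)
{
    LLVMValueRef v_masked;

    /* AND with the flag's bit, then compare the result against zero */
    v_masked = LLVMBuildAnd(b, v_flags,
                            LLVMConstInt(LLVMInt16Type(), flag, 0), "");
    return LLVMBuildICmp(b, LLVMIntNE, v_masked,
                         LLVMConstInt(LLVMInt16Type(), 0, 0), "");
}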
On 2018-08-17 01:07:06 -0700, Andres Freund wrote: > Hi, > > On 2018-08-17 12:10:20 +0530, Ashutosh Bapat wrote: > > We need to add LLVM code to fetch tts_flags and > > perform bit operations on it to get or set the slow property. I haven't > > found any precedent for LLVM bit operations in PostgreSQL's JIT code. > > There are several; look for the infomask accesses in > slot_compile_deform(). > > I'll try to do the adaptation later today. Attached is a series of patches doing so. The previous implementation of sysvar accesses wasn't actually working - the slot argument was uninitialized. I also noticed an independent issue in your changes to ExecInitScanTupleSlot(): you can't assume that the plan belonging to a ScanState is a Scan node. Look e.g. at Material, Sort etc. So currently your scanrelid access is often just uninitialized data. Greetings, Andres Freund
Attachment
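To illustrate the ExecInitScanTupleSlot() issue above: Material and Sort embed a ScanState whose ps.plan is a Material or Sort node, not a Scan, so a blind cast reads past the real node. A sketch of the hazard, with scanstate standing for the function's ScanState argument:

    Scan    *splan = (Scan *) scanstate->ps.plan;   /* not necessarily a Scan! */
    Index    scanrelid = splan->scanrelid;          /* garbage for Material, Sort, ... */

Whatever shape the fix ultimately takes, the slot type or descriptor has to come from information the caller already has rather than from casting ps.plan.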
On Sat, Aug 18, 2018 at 8:53 PM, Andres Freund <andres@anarazel.de> wrote: > On 2018-08-17 01:07:06 -0700, Andres Freund wrote: >> Hi, >> >> On 2018-08-17 12:10:20 +0530, Ashutosh Bapat wrote: >> > We need to add LLVM code to fetch tts_flags and >> > perform bit operations on it to get or set the slow property. I haven't >> > found any precedent for LLVM bit operations in PostgreSQL's JIT code. >> >> There are several; look for the infomask accesses in >> slot_compile_deform(). >> >> I'll try to do the adaptation later today. > > Attached is a series of patches doing so. The previous implementation > of sysvar accesses wasn't actually working - the slot argument was > uninitialized. Sorry for that. I couldn't test it since the code wasn't compiling. The slot_compile_deform() changes you provided in the patch fix the compilation errors. Thanks for those changes. > > I also noticed an independent issue in your changes to > ExecInitScanTupleSlot(): you can't assume that the plan belonging to a > ScanState is a Scan node. Look e.g. at Material, Sort > etc. So currently your scanrelid access is often just uninitialized > data. I have incorporated the changes from your patches into the relevant patches in the updated patch-set. With this patch-set make check-world passes for me. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
On 2018-08-20 17:51:38 +0530, Ashutosh Bapat wrote: > > I also noticed an independent issue in your changes to > > ExecInitScanTupleSlot(): you can't assume that the plan belonging to a > > ScanState is a Scan node. Look e.g. at Material, Sort > > etc. So currently your scanrelid access is often just uninitialized > > data. > > I have incorporated the changes from your patches into the relevant patches > in the updated patch-set. With this patch-set make check-world passes > for me. Have you addressed the issue I commented on above? Greetings, Andres Freund
On Mon, Aug 20, 2018 at 5:58 PM, Andres Freund <andres@anarazel.de> wrote: > On 2018-08-20 17:51:38 +0530, Ashutosh Bapat wrote: >> > I also noticed an independent issue in your changes to >> > ExecInitScanTupleSlot(): you can't assume that the plan belonging to a >> > ScanState is a Scan node. Look e.g. at Material, Sort >> > etc. So currently your scanrelid access is often just uninitialized >> > data. >> >> I have incorporated the changes from your patches into the relevant patches >> in the updated patch-set. With this patch-set make check-world passes >> for me. > > Have you addressed the issue I commented on above? Sorry, forgot about that. Here's the patch set with that addressed. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Hi, On 2018-08-20 19:51:33 +0530, Ashutosh Bapat wrote: > Sorry, forgot about that. Here's the patch set with that addressed. Btw, you attach files as tar.zip, but they're actually gzip compressed... > From 838a463646a048b3dccff95079a514fdc86effb3 Mon Sep 17 00:00:00 2001 > From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> > Date: Mon, 13 Aug 2018 11:27:57 +0530 > Subject: [PATCH 01/11] Split ExecStoreTuple into ExecStoreHeapTuple and > ExecStoreBufferHeapTuple > > ExecStoreTuple() accepts a heap tuple from a buffer or constructed > on-the-fly. In the first case the caller passed a valid buffer and in > the later case it passes InvalidBuffer. In the first case, > ExecStoreTuple() pins the given buffer and in the later case it > records shouldFree flag. The function has some extra checks to > differentiate between the two cases. The usecases never overlap thus > spending extra cycles in checks is useless. Hence separate these > usecases into separate functions ExecStoreHeapTuple() to store > on-the-fly tuple and ExecStoreBufferHeapTuple() to store an on-disk > tuple from a buffer. This allows to shave some extra cycles while > storing a tuple in the slot. It doesn't *yet* allow shaving extra cycles, no? > * SLOT ACCESSORS > * ExecSetSlotDescriptor - set a slot's tuple descriptor > - * ExecStoreTuple - store a physical tuple in the slot > + * ExecStoreHeapTuple - store an on-the-fly heap tuple in the slot > + * ExecStoreBufferHeapTuple - store an on-disk heap tuple in the slot > * ExecStoreMinimalTuple - store a minimal physical tuple in the slot > * ExecClearTuple - clear contents of a slot > * ExecStoreVirtualTuple - mark slot as containing a virtual > * tuple I'd advocate for a separate patch ripping these out, they're almost always out of date. > /* -------------------------------- > - * ExecStoreTuple > + * ExecStoreHeapTuple > * > - * This function is used to store a physical tuple into a specified > + * This function is used to store an on-the-fly physical tuple into a specified > * slot in the tuple table. > * > * tuple: tuple to store > * slot: slot to store it in > - * buffer: disk buffer if tuple is in a disk page, else InvalidBuffer > * shouldFree: true if ExecClearTuple should pfree() the tuple > * when done with it > * > - * If 'buffer' is not InvalidBuffer, the tuple table code acquires a pin > - * on the buffer which is held until the slot is cleared, so that the tuple > - * won't go away on us. > + * shouldFree is normally set 'true' for tuples constructed on-the-fly. But it > + * can be 'false' when the referenced tuple is held in a tuple table slot > + * belonging to a lower-level executor Proc node. In this case the lower-level > + * slot retains ownership and responsibility for eventually releasing the > + * tuple. When this method is used, we must be certain that the upper-level > + * Proc node will lose interest in the tuple sooner than the lower-level one > + * does! If you're not certain, copy the lower-level tuple with heap_copytuple > + * and let the upper-level table slot assume ownership of the copy! > + * > + * Return value is just the passed-in slot pointer. > + * -------------------------------- > + */ > +TupleTableSlot * > +ExecStoreHeapTuple(HeapTuple tuple, > + TupleTableSlot *slot, > + bool shouldFree) > +{ > + /* > + * sanity checks > + */ > + Assert(tuple != NULL); > + Assert(slot != NULL); > + Assert(slot->tts_tupleDescriptor != NULL); > + > + /* > + * Free any old physical tuple belonging to the slot. 
> + */ > + if (slot->tts_shouldFree) > + heap_freetuple(slot->tts_tuple); > + if (slot->tts_shouldFreeMin) > + heap_free_minimal_tuple(slot->tts_mintuple); > + > + /* > + * Store the new tuple into the specified slot. > + */ > + slot->tts_isempty = false; > + slot->tts_shouldFree = shouldFree; > + slot->tts_shouldFreeMin = false; > + slot->tts_tuple = tuple; > + slot->tts_mintuple = NULL; > + > + /* Mark extracted state invalid */ > + slot->tts_nvalid = 0; > + > + return slot; > +} Uh, there could very well be a buffer previously stored in the slot, no? This can't currently be applied independently afaict. > From c4c55bc0d501400ffca2e7393039b2d38660cd2d Mon Sep 17 00:00:00 2001 > From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> > Date: Mon, 13 Aug 2018 11:27:57 +0530 > Subject: [PATCH 02/11] tts_nvalid in TupleTableSlot is AttrNumber > @@ -125,7 +125,7 @@ typedef struct TupleTableSlot > MemoryContext tts_mcxt; /* slot itself is in this context */ > Buffer tts_buffer; /* tuple's buffer, or InvalidBuffer */ > #define FIELDNO_TUPLETABLESLOT_NVALID 9 > - int tts_nvalid; /* # of valid values in tts_values */ > + AttrNumber tts_nvalid; /* # of valid values in tts_values */ > #define FIELDNO_TUPLETABLESLOT_VALUES 10 > Datum *tts_values; /* current per-attribute values */ > #define FIELDNO_TUPLETABLESLOT_ISNULL 11 Shouldn't this be adapting at least a *few* more things, like slot_getattr's argument? > /* > - * Fill in missing values for a TupleTableSlot. > - * > - * This is only exposed because it's needed for JIT compiled tuple > - * deforming. That exception aside, there should be no callers outside of this > - * file. > - */ > -void > -slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum) > -{ > - AttrMissing *attrmiss = NULL; > - int missattnum; > - > - if (slot->tts_tupleDescriptor->constr) > - attrmiss = slot->tts_tupleDescriptor->constr->missing; > - > - if (!attrmiss) > - { > - /* no missing values array at all, so just fill everything in as NULL */ > - memset(slot->tts_values + startAttNum, 0, > - (lastAttNum - startAttNum) * sizeof(Datum)); > - memset(slot->tts_isnull + startAttNum, 1, > - (lastAttNum - startAttNum) * sizeof(bool)); > - } > - else > - { > - /* if there is a missing values array we must process them one by one */ > - for (missattnum = startAttNum; > - missattnum < lastAttNum; > - missattnum++) > - { > - slot->tts_values[missattnum] = attrmiss[missattnum].am_value; > - slot->tts_isnull[missattnum] = !attrmiss[missattnum].am_present; > - } > - } > -} I would split out these moves into a separate commit, they are trivially committable separately. The commit's pretty big already, and that'd make it easier to see the actual differences. > - > -/* > * heap_compute_data_size > * Determine size of the data area of a tuple to be constructed > */ > @@ -1407,10 +1370,9 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc, > * re-computing information about previously extracted attributes. > * slot->tts_nvalid is the number of attributes already extracted. > */ > -static void > -slot_deform_tuple(TupleTableSlot *slot, int natts) > +void > +slot_deform_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp, int natts) > { This should be renamed to include "heap" in the name, as it's not going to be usable for, say, zheap. > -/* > - * slot_getsysattr > - * This function fetches a system attribute of the slot's current tuple. 
> - * Unlike slot_getattr, if the slot does not contain system attributes, > - * this will return false (with a NULL attribute value) instead of > - * throwing an error. > - */ > -bool > -slot_getsysattr(TupleTableSlot *slot, int attnum, > - Datum *value, bool *isnull) > -{ > - HeapTuple tuple = slot->tts_tuple; > - > - Assert(attnum < 0); /* else caller error */ > - if (tuple == NULL || > - tuple == &(slot->tts_minhdr)) > - { > - /* No physical tuple, or minimal tuple, so fail */ > - *value = (Datum) 0; > - *isnull = true; > - return false; > - } > - *value = heap_getsysattr(tuple, attnum, slot->tts_tupleDescriptor, isnull); > - return true; > -} I think I was wrong at saying that we should remove this. I think you were right that it should become a callback... > +/* > + * This is a function used by all getattr() callbacks which deal with a heap > + * tuple or some tuple format which can be represented as a heap tuple e.g. a > + * minimal tuple. > + * > + * heap_getattr considers any attnum beyond the attributes available in the > + * tuple as NULL. This function however returns the values of missing > + * attributes from the tuple descriptor in that case. Also this function does > + * not support extracting system attributes. > + * > + * If the attribute needs to be fetched from the tuple, the function fills in > + * tts_values and tts_isnull arrays upto the required attnum. > + */ > +Datum > +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp, > + int attnum, bool *isnull) > +{ > + HeapTupleHeader tup = tuple->t_data; > + Assert(slot->tts_nvalid < attnum); > + > + Assert(attnum > 0); > + > + if (attnum > HeapTupleHeaderGetNatts(tup)) > + return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull); > + > + /* > + * check if target attribute is null: no point in groveling through tuple > + */ > + if (HeapTupleHasNulls(tuple) && att_isnull(attnum - 1, tup->t_bits)) > + { > + *isnull = true; > + return (Datum) 0; > + } I still think this is an optimization with a negative benefit, especially as it requires an extra callback. We should just rely on slot_deform_tuple and then access that. That'll also just access the null bitmap for the relevant column, and it'll make successive accesses cheaper. > @@ -2883,7 +2885,7 @@ CopyFrom(CopyState cstate) > if (slot == NULL) /* "do nothing" */ > skip_tuple = true; > else /* trigger might have changed tuple */ > - tuple = ExecMaterializeSlot(slot); > + tuple = ExecFetchSlotTuple(slot, true); > } Could we do the Materialize vs Fetch vs Copy change separately? > diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c > index e1eb7c3..9957c70 100644 > --- a/src/backend/commands/matview.c > +++ b/src/backend/commands/matview.c > @@ -484,7 +484,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self) > * get the heap tuple out of the tuple table slot, making sure we have a > * writable copy > */ > - tuple = ExecMaterializeSlot(slot); > + tuple = ExecCopySlotTuple(slot); > > heap_insert(myState->transientrel, > tuple, > @@ -494,6 +494,9 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self) > > /* We know this is a newly created relation, so there are no indexes */ > > + /* Free the copied tuple. */ > + heap_freetuple(tuple); > + > return true; > } This'll potentially increase a fair amount of extra allocation overhead, no? 
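A sketch of the callback-based slot_getsysattr() suggested here, assuming a getsysattr member is added to TupleTableSlotOps (the member name and signature are assumptions):

static inline Datum
slot_getsysattr(TupleTableSlot *slot, int attnum, bool *isnull)
{
    Assert(attnum < 0);         /* system attributes only */

    return slot->tts_cb->getsysattr(slot, attnum, isnull);
}

A virtual slot's implementation could then simply error out (or return a null Datum), matching the old behaviour for slots that have no physical tuple to fetch system columns from.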
> diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c > index 9d6e25a..1b4e726 100644 > --- a/src/backend/executor/execExprInterp.c > +++ b/src/backend/executor/execExprInterp.c > @@ -490,54 +490,21 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) > > EEO_CASE(EEOP_INNER_SYSVAR) > { > - int attnum = op->d.var.attnum; > - Datum d; > - > - /* these asserts must match defenses in slot_getattr */ > - Assert(innerslot->tts_tuple != NULL); > - Assert(innerslot->tts_tuple != &(innerslot->tts_minhdr)); > - > - /* heap_getsysattr has sufficient defenses against bad attnums */ > - d = heap_getsysattr(innerslot->tts_tuple, attnum, > - innerslot->tts_tupleDescriptor, > - op->resnull); > - *op->resvalue = d; > + ExecEvalSysVar(state, op, econtext, innerslot); Please split this out into a separate patch. > +/* > + * TupleTableSlotOps implementation for BufferHeapTupleTableSlot. > + */ > + > +static void > +tts_buffer_init(TupleTableSlot *slot) > +{ > +} Should rename these to buffer_heap or such. > +/* > + * Store the given tuple into the given BufferHeapTupleTableSlot and pin the > + * given buffer. If the tuple already contained in the slot can be freed free > + * it. > + */ > +static void > +tts_buffer_store_tuple(TupleTableSlot *slot, HeapTuple tuple, Buffer buffer) > +{ > + BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot; > + > + if (IS_TTS_SHOULDFREE(slot)) > + { > + /* > + * A heap tuple stored in a BufferHeapTupleTableSlot should have a > + * buffer associated with it, unless it's materialized. > + */ > + Assert(!BufferIsValid(bslot->buffer)); > + > + heap_freetuple(bslot->base.tuple); > + RESET_TTS_SHOULDFREE(slot); > + } > + > + RESET_TTS_EMPTY(slot); > + slot->tts_nvalid = 0; > + bslot->base.tuple = tuple; > + bslot->base.off = 0; > + > + /* > + * If tuple is on a disk page, keep the page pinned as long as we hold a > + * pointer into it. We assume the caller already has such a pin. > + * > + * This is coded to optimize the case where the slot previously held a > + * tuple on the same disk page: in that case releasing and re-acquiring > + * the pin is a waste of cycles. This is a common situation during > + * seqscans, so it's worth troubling over. > + */ > + if (bslot->buffer != buffer) > + { > + if (BufferIsValid(bslot->buffer)) > + ReleaseBuffer(bslot->buffer); > + bslot->buffer = buffer; > + IncrBufferRefCount(buffer); > + } > +} This needs to also support storing a non-buffer tuple, I think. > +/* > + * TupleTableSlotOps for each of TupleTableSlotTypes. These are used to > + * identify the type of slot. > + */ > +const TupleTableSlotOps TTSOpsVirtual = { > + sizeof(TupleTableSlot), > + tts_virtual_init, > + tts_virtual_release, > + tts_virtual_clear, > + tts_virtual_getsomeattrs, > + tts_virtual_attisnull, > + tts_virtual_getattr, > + tts_virtual_materialize, > + tts_virtual_copyslot, > + > + /* > + * A virtual tuple table slot can not "own" a heap tuple or a minimal > + * tuple. > + */ > + NULL, > + NULL, > + tts_virtual_copy_heap_tuple, > + tts_virtual_copy_minimal_tuple, > +}; As we're now going to require C99, could you convert these into designated initializer style (i.e. .init = tts_heap_init etc)? > @@ -353,25 +1285,9 @@ ExecStoreHeapTuple(HeapTuple tuple, > Assert(slot != NULL); > Assert(slot->tts_tupleDescriptor != NULL); > > - /* > - * Free any old physical tuple belonging to the slot. 
> - */ > - if (IS_TTS_SHOULDFREE(slot)) > - heap_freetuple(slot->tts_tuple); > - if (IS_TTS_SHOULDFREEMIN(slot)) > - heap_free_minimal_tuple(slot->tts_mintuple); > - > - /* > - * Store the new tuple into the specified slot. > - */ > - RESET_TTS_EMPTY(slot); > - shouldFree ? SET_TTS_SHOULDFREE(slot) : RESET_TTS_SHOULDFREE(slot); > - RESET_TTS_SHOULDFREEMIN(slot); > - slot->tts_tuple = tuple; > - slot->tts_mintuple = NULL; > - > - /* Mark extracted state invalid */ > - slot->tts_nvalid = 0; > + if (!TTS_IS_HEAPTUPLE(slot)) > + elog(ERROR, "trying to store a heap tuple into wrong type of slot"); > + tts_heap_store_tuple(slot, tuple, shouldFree); > > return slot; > } This should allow for buffer tuples too. > @@ -983,9 +1595,10 @@ ExecInitExtraTupleSlot(EState *estate, TupleDesc tupledesc) > * ---------------- > */ > TupleTableSlot * > -ExecInitNullTupleSlot(EState *estate, TupleDesc tupType) > +ExecInitNullTupleSlot(EState *estate, TupleDesc tupType, > + const TupleTableSlotOps *tts_cb) > { > - TupleTableSlot *slot = ExecInitExtraTupleSlot(estate, tupType); > + TupleTableSlot *slot = ExecInitExtraTupleSlot(estate, tupType, tts_cb); > > return ExecStoreAllNullTuple(slot); > } It's a bit weird that the type name is *Ops but the param is tts_cb, no? > @@ -1590,7 +1590,8 @@ ExecHashTableInsert(HashJoinTable hashtable, > TupleTableSlot *slot, > uint32 hashvalue) > { > - MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot); > + bool shouldFree; > + MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree); > int bucketno; > int batchno; > > @@ -1664,6 +1665,9 @@ ExecHashTableInsert(HashJoinTable hashtable, > hashvalue, > &hashtable->innerBatchFile[batchno]); > } > + > + if (shouldFree) > + heap_free_minimal_tuple(tuple); > } Hm, how about splitting these out? > @@ -277,10 +277,32 @@ ExecInsert(ModifyTableState *mtstate, > OnConflictAction onconflict = node->onConflictAction; > > /* > - * get the heap tuple out of the tuple table slot, making sure we have a > - * writable copy > + * Get the heap tuple out of the tuple table slot, making sure we have a > + * writable copy. > + * > + * If the slot can contain a heap tuple, materialize the tuple within the > + * slot itself so that the slot "owns" it and any changes to the tuple > + * reflect in the slot as well. > + * > + * Otherwise, store the copy of the heap tuple in es_dml_input_tuple_slot, which > + * is assumed to be able to hold a heap tuple, so that repeated requests > + * for heap tuple in the following code will not create multiple copies and > + * leak memory. Also keeping the tuple in the slot makes sure that it will > + * be freed when it's no more needed either because a trigger modified it > + * or when we are done processing it. > */ > - tuple = ExecMaterializeSlot(slot); > + if (!(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))) > + { > + TupleTableSlot *es_slot = estate->es_dml_input_tuple_slot; > + > + Assert(es_slot && TTS_IS_HEAPTUPLE(es_slot)); > + if (es_slot->tts_tupleDescriptor != slot->tts_tupleDescriptor) > + ExecSetSlotDescriptor(es_slot, slot->tts_tupleDescriptor); > + ExecCopySlot(es_slot, slot); > + slot = es_slot; > + } > + > + tuple = ExecFetchSlotTuple(slot, true); I strongly dislike this. Could you look at my pluggable storage tree and see whether you could do something similar here? 
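On the designated-initializer request a little further up, a sketch of what TTSOpsVirtual could look like in that style; the callback function names are taken from the quoted patch, while the member names past getattr are assumptions based on the TupleTableSlotOps definition shown later in this review:

const TupleTableSlotOps TTSOpsVirtual = {
    .base_slot_size = sizeof(TupleTableSlot),
    .init = tts_virtual_init,
    .release = tts_virtual_release,
    .clear = tts_virtual_clear,
    .getsomeattrs = tts_virtual_getsomeattrs,
    .attisnull = tts_virtual_attisnull,
    .getattr = tts_virtual_getattr,
    .materialize = tts_virtual_materialize,
    .copyslot = tts_virtual_copyslot,

    /* a virtual slot never owns a heap or minimal tuple */
    .get_heap_tuple = NULL,
    .get_minimal_tuple = NULL,
    .copy_heap_tuple = tts_virtual_copy_heap_tuple,
    .copy_minimal_tuple = tts_virtual_copy_minimal_tuple,
};

Besides being self-documenting, this keeps the initializers correct if members are later reordered or new ones are added.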
> @@ -2402,8 +2454,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) > */ > mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists); > > - /* Set up a slot for the output of the RETURNING projection(s) */ > - ExecInitResultTupleSlotTL(estate, &mtstate->ps); > + /* > + * Set up a slot for the output of the RETURNING projection(s). > + * ExecDelete() requies the contents of the slot to be > + * saved/materialized, so use heap tuple table slot for a DELETE. > + * Otherwise a virtual tuple table slot suffices. > + */ > + ExecInitResultTupleSlotTL(estate, &mtstate->ps, > + operation == CMD_DELETE ? > + &TTSOpsHeapTuple : &TTSOpsVirtual); > slot = mtstate->ps.ps_ResultTupleSlot; I'm not clear on why this this the case? > /* Need a > diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c > index ecdbe7f..ea2858b 100644 > --- a/src/backend/executor/tqueue.c > +++ b/src/backend/executor/tqueue.c > @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self) > TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self; > HeapTuple tuple; > shm_mq_result result; > + bool tuple_copied = false; > + > + /* Get the tuple out of slot, if necessary converting the slot's contents > + * into a heap tuple by copying. In the later case we need to free the copy. > + */ > + if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)) > + { > + tuple = ExecFetchSlotTuple(slot, true); > + tuple_copied = false; > + } > + else > + { > + tuple = ExecCopySlotTuple(slot); > + tuple_copied = true; > + } To me needing this if() here is a bad sign, I think we want a ExecFetchSlotTuple* like api with a bool *shouldFree arg, like you did for minimal tuples instead. > diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c > index 66cc5c3..b44438b 100644 > --- a/src/backend/tcop/pquery.c > +++ b/src/backend/tcop/pquery.c > @@ -1071,7 +1071,7 @@ RunFromStore(Portal portal, ScanDirection direction, uint64 count, > uint64 current_tuple_count = 0; > TupleTableSlot *slot; > > - slot = MakeSingleTupleTableSlot(portal->tupDesc); > + slot = MakeSingleTupleTableSlot(portal->tupDesc, &TTSOpsMinimalTuple); > > dest->rStartup(dest, CMD_SELECT, portal->tupDesc); These fairly rote changes make it really hard to see the actual meat of the changes in the patch. I think one way to address that would be to introduce stub &TTSOps* variables, add them to MakeSingleTupleTableSlot etc, but still have them return the "old style" slots. Then that can be reviewed separately. > @@ -1375,9 +1376,9 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS) > * previous row available for comparisons. This is accomplished by > * swapping the slot pointer variables after each row. > */ > - extraslot = MakeSingleTupleTableSlot(osastate->qstate->tupdesc); > + extraslot = MakeSingleTupleTableSlot(osastate->qstate->tupdesc, > + &TTSOpsMinimalTuple); > slot2 = extraslot; > - > /* iterate till we find the hypothetical row */ > while (tuplesort_gettupleslot(osastate->sortstate, true, true, slot, > &abbrevVal)) superflous change. > diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c > index 5560a3e..ba98908 100644 > --- a/src/backend/utils/sort/tuplestore.c > +++ b/src/backend/utils/sort/tuplestore.c > @@ -1073,6 +1073,13 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward, > * pointer to a tuple held within the tuplestore. The latter is more > * efficient but the slot contents may be corrupted if additional writes to > * the tuplestore occur. 
(If using tuplestore_trim, see comments therein.) > + * > + * If the given slot can not contain a minimal tuple, the given minimal tuple > + * is converted into the form that the given slot can contain. Irrespective of > + * the value of copy, that conversion might need to create a copy. If > + * should_free is set to true by tuplestore_gettuple(), the minimal tuple is > + * freed after the conversion, if necessary. Right now, the function only > + * supports slots of type HeapTupleTableSlot, other than MinimalTupleTableSlot. > */ > bool > tuplestore_gettupleslot(Tuplestorestate *state, bool forward, > @@ -1085,12 +1092,25 @@ tuplestore_gettupleslot(Tuplestorestate *state, bool forward, > > if (tuple) > { > - if (copy && !should_free) > + if (TTS_IS_MINIMALTUPLE(slot)) > { > - tuple = heap_copy_minimal_tuple(tuple); > + if (copy && !should_free) > + { > + tuple = heap_copy_minimal_tuple(tuple); > + should_free = true; > + } > + ExecStoreMinimalTuple(tuple, slot, should_free); > + } > + else if (TTS_IS_HEAPTUPLE(slot)) > + { > + HeapTuple htup = heap_tuple_from_minimal_tuple(tuple); > + > + if (should_free) > + heap_free_minimal_tuple(tuple); > should_free = true; > + > + ExecStoreHeapTuple(htup, slot, should_free); > } > - ExecStoreMinimalTuple(tuple, slot, should_free); > return true; > } > else Why is this a good idea? Shouldn't the caller always have a minimal slot here? This is problematic, because it means this'd need adapting for every new slot type, which'd be kinda against the idea of the whole thing... > +#define TTS_IS_VIRTUAL(slot) ((slot)->tts_cb == &TTSOpsVirtual) > +#define TTS_IS_HEAPTUPLE(slot) ((slot)->tts_cb == &TTSOpsHeapTuple) > +#define TTS_IS_MINIMALTUPLE(slot) ((slot)->tts_cb == &TTSOpsMinimalTuple) > +#define TTS_IS_BUFFERTUPLE(slot) ((slot)->tts_cb == &TTSOpsBufferTuple) > + > +extern Datum ExecFetchSlotTupleDatum(TupleTableSlot *slot); this is a weird place for the function protoype? > +/* TupleTableSlotType specific routines */ > +typedef struct TupleTableSlotOps > +{ > + /* Minimum size of the slot */ > + size_t base_slot_size; > + > + /* Initialization. */ > + void (*init)(TupleTableSlot *slot); > + > + /* Destruction. */ > + void (*release)(TupleTableSlot *slot); > + > + /* > + * Clear the contents of the slot. Only the contents are expected to be > + * cleared and not the tuple descriptor. Typically an implementation of > + * this callback should free the memory allocated for the tuple contained > + * in the slot. > + */ > + void (*clear)(TupleTableSlot *slot); > + > + /* > + * Fill up first natts entries of tts_values and tts_isnull arrays with > + * values from the tuple contained in the slot. The function may be called > + * with natts more than the number of attributes available in the tuple, in > + * which case it should fill up as many entries as the number of available > + * attributes. The callback should update tts_nvalid with number of entries > + * filled up. > + */ > + void (*getsomeattrs)(TupleTableSlot *slot, int natts); > + > + /* > + * Returns true if the attribute given by attnum is NULL, return false > + * otherwise. Some slot types may have more efficient methods to return > + * NULL-ness of a given attribute compared to checking NULL-ness after > + * calling getsomeattrs(). So this is a separate callback. We expect this > + * callback to be invoked by slot_attisnull() only. That function returns > + * if the information is available readily e.g. in tts_isnull array. The > + * callback need not repeat the same. 
> + */ > + bool (*attisnull)(TupleTableSlot *slot, int attnum); > + > + /* > + * Returns value of the given attribute as a datum and sets isnull to > + * false, if it's not NULL. If the attribute is NULL, it sets isnull to > + * true. Some slot types may have more efficient methods to return value of > + * a given attribute rather than returning the attribute value from > + * tts_values and tts_isnull after calling getsomeattrs(). So this is a > + * separate callback. We expect this callback to be invoked by > + * slot_getattr() only. That function returns if the information is > + * available readily e.g. in tts_values and tts_isnull array or can be > + * inferred from tuple descriptor. The callback need not repeat the same. > + */ > + Datum (*getattr)(TupleTableSlot *slot, int attnum, bool *isnull); These two really show go. > +/* virtual or base type */ > +struct TupleTableSlot > { > NodeTag type; > #define FIELDNO_TUPLETABLESLOT_FLAGS 1 > - uint16 tts_flags; /* Boolean states */ > -#define FIELDNO_TUPLETABLESLOT_TUPLE 2 > - HeapTuple tts_tuple; /* physical tuple, or NULL if virtual */ > -#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 3 > - TupleDesc tts_tupleDescriptor; /* slot's tuple descriptor */ > - MemoryContext tts_mcxt; /* slot itself is in this context */ > - Buffer tts_buffer; /* tuple's buffer, or InvalidBuffer */ > -#define FIELDNO_TUPLETABLESLOT_NVALID 6 > + uint16 tts_flags; > +#define FIELDNO_TUPLETABLESLOT_NVALID 2 > AttrNumber tts_nvalid; /* # of valid values in tts_values */ > -#define FIELDNO_TUPLETABLESLOT_VALUES 7 > + > + const TupleTableSlotOps *const tts_cb; > +#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 4 > + TupleDesc tts_tupleDescriptor; /* slot's tuple descriptor */ > +#define FIELDNO_TUPLETABLESLOT_VALUES 5 > Datum *tts_values; /* current per-attribute values */ > -#define FIELDNO_TUPLETABLESLOT_ISNULL 8 > +#define FIELDNO_TUPLETABLESLOT_ISNULL 6 > bool *tts_isnull; /* current per-attribute isnull flags */ > - MinimalTuple tts_mintuple; /* minimal tuple, or NULL if none */ > - HeapTupleData tts_minhdr; /* workspace for minimal-tuple-only case */ > -#define FIELDNO_TUPLETABLESLOT_OFF 11 > - uint32 tts_off; /* saved state for slot_deform_tuple */ > -} TupleTableSlot; > > -#define TTS_HAS_PHYSICAL_TUPLE(slot) \ > - ((slot)->tts_tuple != NULL && (slot)->tts_tuple != &((slot)->tts_minhdr)) > + /* can we optimize away? */ > + MemoryContext tts_mcxt; /* slot itself is in this context */ > +}; the "can we" comment above is pretty clearly wrong, I think (Yes, I made it...). > From 84046126d5b8c9d780cb7134048cc717bda462a8 Mon Sep 17 00:00:00 2001 > From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> > Date: Mon, 13 Aug 2018 11:27:57 +0530 > Subject: [PATCH 07/11] Reset expression context just after resetting per tuple > context in ExecModifyTable(). > > Expression context saved in ps_ExprContext for ModifyTable node is > used to process returning clause and on conflict clause. For every > processed tuple it's reset in ExecProcessReturning() and > ExecOnConflictUpdate(). When a query has both RETURNING and ON > CONFLICT clauses, the reset happens twice and the first one of those > might reset memory used by the other. For some reason this doesn't > show up on HEAD, but is apparent when virtual tuple table slots, which > do not copy the datums in its own memory, are used for tuples returned > by RETURNING clause. 
> > This is fix for a query failing in sql/insert_conflict.sql > > insert into insertconflicttest as i values (23, 'Avocado') on conflict (key) do update set fruit = excluded.*::text > returning *; > > Ashutosh Bapat, per suggestion by Andres Freund Doesn't this have to be earlier in the series? Phew, sorry if some things are daft, I'm kinda jetlagged... - Andres
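For the tqueue.c comment above, a sketch of the API shape being asked for: a heap-tuple fetch that reports whether the caller must free the result, so the call site needs no per-slot-type branching. The function name and parameters are assumptions, mirroring the minimal-tuple variant used elsewhere in the patch:

extern HeapTuple ExecFetchSlotHeapTuple(TupleTableSlot *slot,
                                        bool materialize, bool *shouldFree);

    /* tqueueReceiveSlot() could then be reduced to roughly: */
    bool        shouldFree;
    HeapTuple   tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);

    result = shm_mq_send(tqueue->queue, tuple->t_len, tuple->t_data, false);

    if (shouldFree)
        heap_freetuple(tuple);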
On Fri, Aug 24, 2018 at 6:46 AM, Andres Freund <andres@anarazel.de> wrote: > Hi, > > On 2018-08-20 19:51:33 +0530, Ashutosh Bapat wrote: >> Sorry, forgot about that. Here's the patch set with that addressed. > > Btw, you attach files as tar.zip, but they're actually gzip > compressed... I am using "tar -zcvf" to create the files where -z indicates gzip. But if I use .gz as extension for some reason it's not able to recognize the format. So, I just keep .zip. Do you see any problem with that extension? The attached file has extension .tar.gz. > > >> From 838a463646a048b3dccff95079a514fdc86effb3 Mon Sep 17 00:00:00 2001 >> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> >> Date: Mon, 13 Aug 2018 11:27:57 +0530 >> Subject: [PATCH 01/11] Split ExecStoreTuple into ExecStoreHeapTuple and >> ExecStoreBufferHeapTuple >> >> ExecStoreTuple() accepts a heap tuple from a buffer or constructed >> on-the-fly. In the first case the caller passed a valid buffer and in >> the later case it passes InvalidBuffer. In the first case, >> ExecStoreTuple() pins the given buffer and in the later case it >> records shouldFree flag. The function has some extra checks to >> differentiate between the two cases. The usecases never overlap thus >> spending extra cycles in checks is useless. Hence separate these >> usecases into separate functions ExecStoreHeapTuple() to store >> on-the-fly tuple and ExecStoreBufferHeapTuple() to store an on-disk >> tuple from a buffer. This allows to shave some extra cycles while >> storing a tuple in the slot. > > It doesn't *yet* allow shaving extra cycles, no? By segregating those two functions we are already saving some cycles by not checking for a valid buffer for an on the fly tuple. That's not huge but some. More saving will come with the complete abstraction. > >> * SLOT ACCESSORS >> * ExecSetSlotDescriptor - set a slot's tuple descriptor >> - * ExecStoreTuple - store a physical tuple in the slot >> + * ExecStoreHeapTuple - store an on-the-fly heap tuple in the slot >> + * ExecStoreBufferHeapTuple - store an on-disk heap tuple in the slot >> * ExecStoreMinimalTuple - store a minimal physical tuple in the slot >> * ExecClearTuple - clear contents of a slot >> * ExecStoreVirtualTuple - mark slot as containing a virtual >> * tuple > > I'd advocate for a separate patch ripping these out, they're almost > always out of date. Done. Added as 0001 patch. > >> + * Return value is just the passed-in slot pointer. >> + * -------------------------------- >> + */ >> +TupleTableSlot * >> +ExecStoreHeapTuple(HeapTuple tuple, >> + TupleTableSlot *slot, >> + bool shouldFree) >> +{ >> + >> + return slot; >> +} > > Uh, there could very well be a buffer previously stored in the slot, no? > This can't currently be applied independently afaict. Sorry, I missed that when splitting the function. Added code to unpin any buffer that was pinned for an earlier tuple. 
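Presumably the added unpinning amounts to something like the following at the top of ExecStoreHeapTuple(), using the pre-abstraction slot fields; this is a sketch of the idea rather than the actual hunk:

    /* drop the pin on any buffer left over from a previously stored buffer tuple */
    if (BufferIsValid(slot->tts_buffer))
    {
        ReleaseBuffer(slot->tts_buffer);
        slot->tts_buffer = InvalidBuffer;
    }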
> >> From c4c55bc0d501400ffca2e7393039b2d38660cd2d Mon Sep 17 00:00:00 2001 >> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> >> Date: Mon, 13 Aug 2018 11:27:57 +0530 >> Subject: [PATCH 02/11] tts_nvalid in TupleTableSlot is AttrNumber > >> @@ -125,7 +125,7 @@ typedef struct TupleTableSlot >> MemoryContext tts_mcxt; /* slot itself is in this context */ >> Buffer tts_buffer; /* tuple's buffer, or InvalidBuffer */ >> #define FIELDNO_TUPLETABLESLOT_NVALID 9 >> - int tts_nvalid; /* # of valid values in tts_values */ >> + AttrNumber tts_nvalid; /* # of valid values in tts_values */ >> #define FIELDNO_TUPLETABLESLOT_VALUES 1 >> Datum *tts_values; /* current per-attribute values */ >> #define FIELDNO_TUPLETABLESLOT_ISNULL 11 > > Shouldn't this be adapting at least a *few* more things, like > slot_getattr's argument? That's not something I had in mind for this patch. There are many other places where we declare attribute number variables as "int" instead of "AttrNumber". I don't think that's what we should do in this patch set. The idea of this patch is to reduce the size of TupleTableSlot by 2 bytes. Changing signatures of the functions, though has some readability improvement, doesn't achieve any such benefit. Even then I went ahead and change signatures of those functions, but server built with LLVM crashed. So, I have reverted back all those changes. I think we should change the JIT code automatically when function signatures/structure definitions change OR at least compilation should fail indicating the difference between JIT code and normal code. Investigating a crash takes a hell lot of time and is not easy without a custom LLVM build. I doubt if every PostgreSQL developer would want to do that. > >> /* >> - * Fill in missing values for a TupleTableSlot. >> - * >> - * This is only exposed because it's needed for JIT compiled tuple >> - * deforming. That exception aside, there should be no callers outside of this >> - * file. >> - */ >> -void >> -slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum) >> -{ >> - AttrMissing *attrmiss = NULL; >> - int missattnum; >> - >> - if (slot->tts_tupleDescriptor->constr) >> - attrmiss = slot->tts_tupleDescriptor->constr->missing; >> - >> - if (!attrmiss) >> - { >> - /* no missing values array at all, so just fill everything in as NULL */ >> - memset(slot->tts_values + startAttNum, 0, >> - (lastAttNum - startAttNum) * sizeof(Datum)); >> - memset(slot->tts_isnull + startAttNum, 1, >> - (lastAttNum - startAttNum) * sizeof(bool)); >> - } >> - else >> - { >> - /* if there is a missing values array we must process them one by one */ >> - for (missattnum = startAttNum; >> - missattnum < lastAttNum; >> - missattnum++) >> - { >> - slot->tts_values[missattnum] = attrmiss[missattnum].am_value; >> - slot->tts_isnull[missattnum] = !attrmiss[missattnum].am_present; >> - } >> - } >> -} > > I would split out these moves into a separate commit, they are trivially > committable separately. The commit's pretty big already, and that'd make > it easier to see the actual differences. I have included some of those movements in the same patch which changes the type of tts_nvalid. Functions like slot_attisnull() are changed to be inline static functions in tuptable.h. Those functions are now inline since they are just wrapper around slot specific callbacks. Thus without the TupleTableSlot abstraction patch it's not possible to mark them "inline and thus move them to a header file e.g. tuptable.h. 
For now I have left these kinds of movements in TupleTableSlot abstraction patch. > > >> - >> -/* >> * heap_compute_data_size >> * Determine size of the data area of a tuple to be constructed >> */ >> @@ -1407,10 +1370,9 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc, >> * re-computing information about previously extracted attributes. >> * slot->tts_nvalid is the number of attributes already extracted. >> */ >> -static void >> -slot_deform_tuple(TupleTableSlot *slot, int natts) >> +void >> +slot_deform_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp, int natts) >> { > > This should be renamed to include "heap" in the name, as it's not going > to be usable for, say, zheap. LGTM. Done. > > >> -/* >> - * slot_getsysattr >> - * This function fetches a system attribute of the slot's current tuple. >> - * Unlike slot_getattr, if the slot does not contain system attributes, >> - * this will return false (with a NULL attribute value) instead of >> - * throwing an error. >> - */ >> -bool >> -slot_getsysattr(TupleTableSlot *slot, int attnum, >> - Datum *value, bool *isnull) >> -{ >> - HeapTuple tuple = slot->tts_tuple; >> - >> - Assert(attnum < 0); /* else caller error */ >> - if (tuple == NULL || >> - tuple == &(slot->tts_minhdr)) >> - { >> - /* No physical tuple, or minimal tuple, so fail */ >> - *value = (Datum) 0; >> - *isnull = true; >> - return false; >> - } >> - *value = heap_getsysattr(tuple, attnum, slot->tts_tupleDescriptor, isnull); >> - return true; >> -} > > I think I was wrong at saying that we should remove this. I think you > were right that it should become a callback... We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you want to reinstantiate those as well? If so, slot_getsysattr() becomes a wrapper around getsysattr() callback. > > >> +/* >> + * This is a function used by all getattr() callbacks which deal with a heap >> + * tuple or some tuple format which can be represented as a heap tuple e.g. a >> + * minimal tuple. >> + * >> + * heap_getattr considers any attnum beyond the attributes available in the >> + * tuple as NULL. This function however returns the values of missing >> + * attributes from the tuple descriptor in that case. Also this function does >> + * not support extracting system attributes. >> + * >> + * If the attribute needs to be fetched from the tuple, the function fills in >> + * tts_values and tts_isnull arrays upto the required attnum. >> + */ >> +Datum >> +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp, >> + int attnum, bool *isnull) >> +{ >> + HeapTupleHeader tup = tuple->t_data; >> + Assert(slot->tts_nvalid < attnum); >> + >> + Assert(attnum > 0); >> + >> + if (attnum > HeapTupleHeaderGetNatts(tup)) >> + return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull); >> + >> + /* >> + * check if target attribute is null: no point in groveling through tuple >> + */ >> + if (HeapTupleHasNulls(tuple) && att_isnull(attnum - 1, tup->t_bits)) >> + { >> + *isnull = true; >> + return (Datum) 0; >> + } > > I still think this is an optimization with a negative benefit, > especially as it requires an extra callback. We should just rely on > slot_deform_tuple and then access that. That'll also just access the > null bitmap for the relevant column, and it'll make successive accesses > cheaper. I don't understand why we have differing implementations for slot_attisnull(), slot_getsomeattrs(), slot_getattr() in HEAD. 
If what you are saying is true, we should have implemented all the first and last as a call to slot_getsomeattrs() followed by returing values from tts_values and tts_isnull. Since this is refactoring work, I am trying not to change the existing functionality of those functions. > >> @@ -2883,7 +2885,7 @@ CopyFrom(CopyState cstate) >> if (slot == NULL) /* "do nothing" */ >> skip_tuple = true; >> else /* trigger might have changed tuple */ >> - tuple = ExecMaterializeSlot(slot); >> + tuple = ExecFetchSlotTuple(slot, true); >> } > > Could we do the Materialize vs Fetch vs Copy change separately? > Ok. I will do that. > >> diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c >> index e1eb7c3..9957c70 100644 >> --- a/src/backend/commands/matview.c >> +++ b/src/backend/commands/matview.c >> @@ -484,7 +484,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self) >> * get the heap tuple out of the tuple table slot, making sure we have a >> * writable copy >> */ >> - tuple = ExecMaterializeSlot(slot); >> + tuple = ExecCopySlotTuple(slot); >> >> heap_insert(myState->transientrel, >> tuple, >> @@ -494,6 +494,9 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self) >> >> /* We know this is a newly created relation, so there are no indexes */ >> >> + /* Free the copied tuple. */ >> + heap_freetuple(tuple); >> + >> return true; >> } > > This'll potentially increase a fair amount of extra allocation overhead, > no? Yes. There are two ways to avoid that 1. Using a slot type specific for the transient rel which is receiving the tuple OR 2. Allocate memory for the converted tuple into per tuple memory context. The first case too will have the same problem since we will free one tuple at at time in ExecClearSlot(). I am inclined to use second option. Comments? > > >> diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c >> index 9d6e25a..1b4e726 100644 >> --- a/src/backend/executor/execExprInterp.c >> +++ b/src/backend/executor/execExprInterp.c >> @@ -490,54 +490,21 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) >> >> EEO_CASE(EEOP_INNER_SYSVAR) >> { >> - int attnum = op->d.var.attnum; >> - Datum d; >> - >> - /* these asserts must match defenses in slot_getattr */ >> - Assert(innerslot->tts_tuple != NULL); >> - Assert(innerslot->tts_tuple != &(innerslot->tts_minhdr)); >> - >> - /* heap_getsysattr has sufficient defenses against bad attnums */ >> - d = heap_getsysattr(innerslot->tts_tuple, attnum, >> - innerslot->tts_tupleDescriptor, >> - op->resnull); >> - *op->resvalue = d; >> + ExecEvalSysVar(state, op, econtext, innerslot); > > Please split this out into a separate patch. > Ok. Will do (not in the attached patch-set) > >> +/* >> + * TupleTableSlotOps implementation for BufferHeapTupleTableSlot. >> + */ >> + >> +static void >> +tts_buffer_init(TupleTableSlot *slot) >> +{ >> +} > > Should rename these to buffer_heap or such. Ok. Done. > >> +/* >> + * Store the given tuple into the given BufferHeapTupleTableSlot and pin the >> + * given buffer. If the tuple already contained in the slot can be freed free >> + * it. >> + */ >> +static void >> +tts_buffer_store_tuple(TupleTableSlot *slot, HeapTuple tuple, Buffer buffer) >> +{ >> + BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot; >> + >> + if (IS_TTS_SHOULDFREE(slot)) >> + { >> + /* >> + * A heap tuple stored in a BufferHeapTupleTableSlot should have a >> + * buffer associated with it, unless it's materialized. 
>> + */ >> + Assert(!BufferIsValid(bslot->buffer)); >> + >> + heap_freetuple(bslot->base.tuple); >> + RESET_TTS_SHOULDFREE(slot); >> + } >> + >> + RESET_TTS_EMPTY(slot); >> + slot->tts_nvalid = 0; >> + bslot->base.tuple = tuple; >> + bslot->base.off = 0; >> + >> + /* >> + * If tuple is on a disk page, keep the page pinned as long as we hold a >> + * pointer into it. We assume the caller already has such a pin. >> + * >> + * This is coded to optimize the case where the slot previously held a >> + * tuple on the same disk page: in that case releasing and re-acquiring >> + * the pin is a waste of cycles. This is a common situation during >> + * seqscans, so it's worth troubling over. >> + */ >> + if (bslot->buffer != buffer) >> + { >> + if (BufferIsValid(bslot->buffer)) >> + ReleaseBuffer(bslot->buffer); >> + bslot->buffer = buffer; >> + IncrBufferRefCount(buffer); >> + } >> +} > > This needs to also support storing a non-buffer tuple, I think. I don't see a reason why that's needed. A non-buffer tuple should be stored in a HeapTupleTableSlot. It might happen that ExecMaterializeSlot() creates a copy of the buffer tuple in the memory context of the slot. But there should be a buffer associated with the heap tuple being stored in BufferHeapTupleTableSlot. Otherwise, why are we separating those two types. In fact, as discussed earlier offline, I am of the opinion that buffer should be an attribute of every tuple table slot which may carry tuple from a buffer, instead of creating two slot types with and without buffer. OTOH, I observe that tts_buffer_store_tuple() is called only from ExecStoreBufferHeapTuple(). We may want to merge the first into the later. > >> +/* >> + * TupleTableSlotOps for each of TupleTableSlotTypes. These are used to >> + * identify the type of slot. >> + */ >> +const TupleTableSlotOps TTSOpsVirtual = { >> + sizeof(TupleTableSlot), >> + tts_virtual_init, >> + tts_virtual_release, >> + tts_virtual_clear, >> + tts_virtual_getsomeattrs, >> + tts_virtual_attisnull, >> + tts_virtual_getattr, >> + tts_virtual_materialize, >> + tts_virtual_copyslot, >> + >> + /* >> + * A virtual tuple table slot can not "own" a heap tuple or a minimal >> + * tuple. >> + */ >> + NULL, >> + NULL, >> + tts_virtual_copy_heap_tuple, >> + tts_virtual_copy_minimal_tuple, >> +}; > > As we're now going to require C99, could you convert these into > designated initializer style (i.e. .init = tts_heap_init etc)? That's a good idea. Done. > >> @@ -353,25 +1285,9 @@ ExecStoreHeapTuple(HeapTuple tuple, >> Assert(slot != NULL); >> Assert(slot->tts_tupleDescriptor != NULL); >> >> - /* >> - * Free any old physical tuple belonging to the slot. >> - */ >> - if (IS_TTS_SHOULDFREE(slot)) >> - heap_freetuple(slot->tts_tuple); >> - if (IS_TTS_SHOULDFREEMIN(slot)) >> - heap_free_minimal_tuple(slot->tts_mintuple); >> - >> - /* >> - * Store the new tuple into the specified slot. >> - */ >> - RESET_TTS_EMPTY(slot); >> - shouldFree ? SET_TTS_SHOULDFREE(slot) : RESET_TTS_SHOULDFREE(slot); >> - RESET_TTS_SHOULDFREEMIN(slot); >> - slot->tts_tuple = tuple; >> - slot->tts_mintuple = NULL; >> - >> - /* Mark extracted state invalid */ >> - slot->tts_nvalid = 0; >> + if (!TTS_IS_HEAPTUPLE(slot)) >> + elog(ERROR, "trying to store a heap tuple into wrong type of slot"); >> + tts_heap_store_tuple(slot, tuple, shouldFree); >> >> return slot; >> } > > This should allow for buffer tuples too. Same answer as the answer to your comment on tts_buffer_store_tuple(). 
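For contrast, a sketch of what the review is asking the store routine to support: treating InvalidBuffer as "this tuple is not backed by a page", so a local or materialized tuple can live in the same slot type. The flag macros follow the quoted patch; the control flow itself is an assumption:

    if (BufferIsValid(buffer))
    {
        /* on-disk tuple: swap pins only if the page changed */
        if (bslot->buffer != buffer)
        {
            if (BufferIsValid(bslot->buffer))
                ReleaseBuffer(bslot->buffer);
            bslot->buffer = buffer;
            IncrBufferRefCount(buffer);
        }
    }
    else
    {
        /* local tuple: nothing to pin, the slot takes ownership */
        if (BufferIsValid(bslot->buffer))
            ReleaseBuffer(bslot->buffer);
        bslot->buffer = InvalidBuffer;
        SET_TTS_SHOULDFREE(slot);
    }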
> >> @@ -983,9 +1595,10 @@ ExecInitExtraTupleSlot(EState *estate, TupleDesc tupledesc) >> * ---------------- >> */ >> TupleTableSlot * >> -ExecInitNullTupleSlot(EState *estate, TupleDesc tupType) >> +ExecInitNullTupleSlot(EState *estate, TupleDesc tupType, >> + const TupleTableSlotOps *tts_cb) >> { >> - TupleTableSlot *slot = ExecInitExtraTupleSlot(estate, tupType); >> + TupleTableSlot *slot = ExecInitExtraTupleSlot(estate, tupType, tts_cb); >> >> return ExecStoreAllNullTuple(slot); >> } > > It's a bit weird that the type name is *Ops but the param is tts_cb, no? Your original patch used the same names and I haven't changed that. I am fine with tts_ops as well. Is that good enough? > > >> @@ -1590,7 +1590,8 @@ ExecHashTableInsert(HashJoinTable hashtable, >> TupleTableSlot *slot, >> uint32 hashvalue) >> { >> - MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot); >> + bool shouldFree; >> + MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree); >> int bucketno; >> int batchno; >> >> @@ -1664,6 +1665,9 @@ ExecHashTableInsert(HashJoinTable hashtable, >> hashvalue, >> &hashtable->innerBatchFile[batchno]); >> } >> + >> + if (shouldFree) >> + heap_free_minimal_tuple(tuple); >> } > > Hm, how about splitting these out? Split into a separate patch? It doesn't make sense to add this patch before 0006 since the slots in those patches can "own" a minimal tuple. Let's add a patch after 0006 i.e. tuple table abstraction patch. Will do. > > >> @@ -277,10 +277,32 @@ ExecInsert(ModifyTableState *mtstate, >> OnConflictAction onconflict = node->onConflictAction; >> >> /* >> - * get the heap tuple out of the tuple table slot, making sure we have a >> - * writable copy >> + * Get the heap tuple out of the tuple table slot, making sure we have a >> + * writable copy. >> + * >> + * If the slot can contain a heap tuple, materialize the tuple within the >> + * slot itself so that the slot "owns" it and any changes to the tuple >> + * reflect in the slot as well. >> + * >> + * Otherwise, store the copy of the heap tuple in es_dml_input_tuple_slot, which >> + * is assumed to be able to hold a heap tuple, so that repeated requests >> + * for heap tuple in the following code will not create multiple copies and >> + * leak memory. Also keeping the tuple in the slot makes sure that it will >> + * be freed when it's no more needed either because a trigger modified it >> + * or when we are done processing it. >> */ >> - tuple = ExecMaterializeSlot(slot); >> + if (!(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))) >> + { >> + TupleTableSlot *es_slot = estate->es_dml_input_tuple_slot; >> + >> + Assert(es_slot && TTS_IS_HEAPTUPLE(es_slot)); >> + if (es_slot->tts_tupleDescriptor != slot->tts_tupleDescriptor) >> + ExecSetSlotDescriptor(es_slot, slot->tts_tupleDescriptor); >> + ExecCopySlot(es_slot, slot); >> + slot = es_slot; >> + } >> + >> + tuple = ExecFetchSlotTuple(slot, true); > > I strongly dislike this. Could you look at my pluggable storage tree > and see whether you could do something similar here? Can you please provide the URL to your tree? > > >> @@ -2402,8 +2454,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) >> */ >> mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists); >> >> - /* Set up a slot for the output of the RETURNING projection(s) */ >> - ExecInitResultTupleSlotTL(estate, &mtstate->ps); >> + /* >> + * Set up a slot for the output of the RETURNING projection(s). 
>> + * ExecDelete() requies the contents of the slot to be >> + * saved/materialized, so use heap tuple table slot for a DELETE. >> + * Otherwise a virtual tuple table slot suffices. >> + */ >> + ExecInitResultTupleSlotTL(estate, &mtstate->ps, >> + operation == CMD_DELETE ? >> + &TTSOpsHeapTuple : &TTSOpsVirtual); >> slot = mtstate->ps.ps_ResultTupleSlot; > > I'm not clear on why this this the case? Do you mean why does ExecDelete() materializes the tuple? I don't know. But my guess is materialization ensures valid tuple being returned in case the deleted tuple's contents are destroyed. > >> /* Need a >> diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c >> index ecdbe7f..ea2858b 100644 >> --- a/src/backend/executor/tqueue.c >> +++ b/src/backend/executor/tqueue.c >> @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self) >> TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self; >> HeapTuple tuple; >> shm_mq_result result; >> + bool tuple_copied = false; >> + >> + /* Get the tuple out of slot, if necessary converting the slot's contents >> + * into a heap tuple by copying. In the later case we need to free the copy. >> + */ >> + if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)) >> + { >> + tuple = ExecFetchSlotTuple(slot, true); >> + tuple_copied = false; >> + } >> + else >> + { >> + tuple = ExecCopySlotTuple(slot); >> + tuple_copied = true; >> + } > > To me needing this if() here is a bad sign, I think we want a > ExecFetchSlotTuple* like api with a bool *shouldFree arg, like you did > for minimal tuples instead. If we change the existing API, we will need to handle shouldFree flag everywhere ExecFetchSlotTuple() is being called. Most of the callers of ExecFetchSlotTuple pass tuple table slots which can "own" a heap tuple. Your suggestion to use a separate ExecFetchSlotTuple() like API looks good.. But I am not sure how useful it will be if there's going to be a single caller of that API. > >> diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c >> index 66cc5c3..b44438b 100644 >> --- a/src/backend/tcop/pquery.c >> +++ b/src/backend/tcop/pquery.c >> @@ -1071,7 +1071,7 @@ RunFromStore(Portal portal, ScanDirection direction, uint64 count, >> uint64 current_tuple_count = 0; >> TupleTableSlot *slot; >> >> - slot = MakeSingleTupleTableSlot(portal->tupDesc); >> + slot = MakeSingleTupleTableSlot(portal->tupDesc, &TTSOpsMinimalTuple); >> >> dest->rStartup(dest, CMD_SELECT, portal->tupDesc); > > These fairly rote changes make it really hard to see the actual meat of > the changes in the patch. I think one way to address that would be to > introduce stub &TTSOps* variables, add them to MakeSingleTupleTableSlot > etc, but still have them return the "old style" slots. Then that can be > reviewed separately. This means that we have to define TTSOps variables before the actual abstraction patch and then redefine those in the abstraction patch (0006). Instead we could separate changes to the slot initialization functions in a separate patch which comes after the abstraction proper patch. The abstraction proper patch always creates a virtual tuple table slot. We will need to compile, test and commit both the patches together but reviewing becomes easier. Also the tuple table abstraction patch doesn't pass regession on its own, it requires some patches that follow it. So it should be fine. Comments? > >> @@ -1375,9 +1376,9 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS) >> * previous row available for comparisons. 
This is accomplished by >> * swapping the slot pointer variables after each row. >> */ >> - extraslot = MakeSingleTupleTableSlot(osastate->qstate->tupdesc); >> + extraslot = MakeSingleTupleTableSlot(osastate->qstate->tupdesc, >> + &TTSOpsMinimalTuple); >> slot2 = extraslot; >> - >> /* iterate till we find the hypothetical row */ >> while (tuplesort_gettupleslot(osastate->sortstate, true, true, slot, >> &abbrevVal)) > > superflous change. Done. > > > > >> diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c >> index 5560a3e..ba98908 100644 >> --- a/src/backend/utils/sort/tuplestore.c >> +++ b/src/backend/utils/sort/tuplestore.c >> @@ -1073,6 +1073,13 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward, >> * pointer to a tuple held within the tuplestore. The latter is more >> * efficient but the slot contents may be corrupted if additional writes to >> * the tuplestore occur. (If using tuplestore_trim, see comments therein.) >> + * >> + * If the given slot can not contain a minimal tuple, the given minimal tuple >> + * is converted into the form that the given slot can contain. Irrespective of >> + * the value of copy, that conversion might need to create a copy. If >> + * should_free is set to true by tuplestore_gettuple(), the minimal tuple is >> + * freed after the conversion, if necessary. Right now, the function only >> + * supports slots of type HeapTupleTableSlot, other than MinimalTupleTableSlot. >> */ >> bool >> tuplestore_gettupleslot(Tuplestorestate *state, bool forward, >> @@ -1085,12 +1092,25 @@ tuplestore_gettupleslot(Tuplestorestate *state, bool forward, >> >> if (tuple) >> { >> - if (copy && !should_free) >> + if (TTS_IS_MINIMALTUPLE(slot)) >> { >> - tuple = heap_copy_minimal_tuple(tuple); >> + if (copy && !should_free) >> + { >> + tuple = heap_copy_minimal_tuple(tuple); >> + should_free = true; >> + } >> + ExecStoreMinimalTuple(tuple, slot, should_free); >> + } >> + else if (TTS_IS_HEAPTUPLE(slot)) >> + { >> + HeapTuple htup = heap_tuple_from_minimal_tuple(tuple); >> + >> + if (should_free) >> + heap_free_minimal_tuple(tuple); >> should_free = true; >> + >> + ExecStoreHeapTuple(htup, slot, should_free); >> } >> - ExecStoreMinimalTuple(tuple, slot, should_free); >> return true; >> } >> else > > Why is this a good idea? Shouldn't the caller always have a minimal slot > here? As I have explained in the commit message, not every caller has a minimal slot right now. For example, AfterTriggerExecute fetches foreign tuples from a tuple store as minimal tuples and requires those to be in the form of heap tuple for trigger processing. We can use two slots, first a minimal for fetching tuple from tuple store and second a heap for trigger processing and convert from minimal tuple to heap tuple using ExecCopySlotTuple(). That increases the number of slots. > This is problematic, because it means this'd need adapting for > every new slot type, which'd be kinda against the idea of the whole > thing... I don't think we will need to support more slot types in than what's there right now. But in case we have to we have two more options as below. Amit Khandekar suggested offlist to invoke a (new) slot type specific callback which will convert given minimal tuple into the tuple type of the slot. Third option is to write tuplestore_gettupleslot* functions one for each kind of TupleTableSlot type. I don't have much preference for one over the other. 
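A sketch of the callback option mentioned above: a per-slot-type routine that stores a minimal tuple, converting as needed, so tuplestore_gettupleslot() stays ignorant of slot types. The callback name is hypothetical:

    /* hypothetical addition to TupleTableSlotOps */
    void        (*store_minimal_tuple) (TupleTableSlot *slot,
                                        MinimalTuple mtup, bool shouldFree);

    /* tuplestore_gettupleslot() would then reduce to: */
    if (copy && !should_free)
    {
        tuple = heap_copy_minimal_tuple(tuple);
        should_free = true;
    }
    slot->tts_cb->store_minimal_tuple(slot, tuple, should_free);

The cost of any conversion then lives with the slot type that needs it, and a new slot type only has to supply its own callback.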
> >> +#define TTS_IS_VIRTUAL(slot) ((slot)->tts_cb == &TTSOpsVirtual) >> +#define TTS_IS_HEAPTUPLE(slot) ((slot)->tts_cb == &TTSOpsHeapTuple) >> +#define TTS_IS_MINIMALTUPLE(slot) ((slot)->tts_cb == &TTSOpsMinimalTuple) >> +#define TTS_IS_BUFFERTUPLE(slot) ((slot)->tts_cb == &TTSOpsBufferTuple) >> + >> +extern Datum ExecFetchSlotTupleDatum(TupleTableSlot *slot); > > this is a weird place for the function protoype? Fixed. > > >> +/* TupleTableSlotType specific routines */ >> +typedef struct TupleTableSlotOps >> +{ >> + /* Minimum size of the slot */ >> + size_t base_slot_size; >> + >> + /* Initialization. */ >> + void (*init)(TupleTableSlot *slot); >> + >> + /* Destruction. */ >> + void (*release)(TupleTableSlot *slot); >> + >> + /* >> + * Clear the contents of the slot. Only the contents are expected to be >> + * cleared and not the tuple descriptor. Typically an implementation of >> + * this callback should free the memory allocated for the tuple contained >> + * in the slot. >> + */ >> + void (*clear)(TupleTableSlot *slot); >> + >> + /* >> + * Fill up first natts entries of tts_values and tts_isnull arrays with >> + * values from the tuple contained in the slot. The function may be called >> + * with natts more than the number of attributes available in the tuple, in >> + * which case it should fill up as many entries as the number of available >> + * attributes. The callback should update tts_nvalid with number of entries >> + * filled up. >> + */ >> + void (*getsomeattrs)(TupleTableSlot *slot, int natts); >> + >> + /* >> + * Returns true if the attribute given by attnum is NULL, return false >> + * otherwise. Some slot types may have more efficient methods to return >> + * NULL-ness of a given attribute compared to checking NULL-ness after >> + * calling getsomeattrs(). So this is a separate callback. We expect this >> + * callback to be invoked by slot_attisnull() only. That function returns >> + * if the information is available readily e.g. in tts_isnull array. The >> + * callback need not repeat the same. >> + */ >> + bool (*attisnull)(TupleTableSlot *slot, int attnum); >> + >> + /* >> + * Returns value of the given attribute as a datum and sets isnull to >> + * false, if it's not NULL. If the attribute is NULL, it sets isnull to >> + * true. Some slot types may have more efficient methods to return value of >> + * a given attribute rather than returning the attribute value from >> + * tts_values and tts_isnull after calling getsomeattrs(). So this is a >> + * separate callback. We expect this callback to be invoked by >> + * slot_getattr() only. That function returns if the information is >> + * available readily e.g. in tts_values and tts_isnull array or can be >> + * inferred from tuple descriptor. The callback need not repeat the same. >> + */ >> + Datum (*getattr)(TupleTableSlot *slot, int attnum, bool *isnull); > > These two really show go. I have explained my position for this while replying to an earlier comment. 
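As an example of how thin these callbacks can be, here is a sketch (not the patch's actual code) of the virtual slot's getsomeattrs under the contract described above; a virtual slot has no underlying tuple to deform, every valid column is already in tts_values/tts_isnull, so being asked for more is a caller bug:

	static void
	tts_virtual_getsomeattrs(TupleTableSlot *slot, int natts)
	{
		/* a virtual slot can never have columns that still need deforming */
		elog(ERROR, "getsomeattrs is not required to be called on a virtual tuple table slot");
	}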
> > >> +/* virtual or base type */ >> +struct TupleTableSlot >> { >> NodeTag type; >> #define FIELDNO_TUPLETABLESLOT_FLAGS 1 >> - uint16 tts_flags; /* Boolean states */ >> -#define FIELDNO_TUPLETABLESLOT_TUPLE 2 >> - HeapTuple tts_tuple; /* physical tuple, or NULL if virtual */ >> -#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 3 >> - TupleDesc tts_tupleDescriptor; /* slot's tuple descriptor */ >> - MemoryContext tts_mcxt; /* slot itself is in this context */ >> - Buffer tts_buffer; /* tuple's buffer, or InvalidBuffer */ >> -#define FIELDNO_TUPLETABLESLOT_NVALID 6 >> + uint16 tts_flags; >> +#define FIELDNO_TUPLETABLESLOT_NVALID 2 >> AttrNumber tts_nvalid; /* # of valid values in tts_values */ >> -#define FIELDNO_TUPLETABLESLOT_VALUES 7 >> + >> + const TupleTableSlotOps *const tts_cb; >> +#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 4 >> + TupleDesc tts_tupleDescriptor; /* slot's tuple descriptor */ >> +#define FIELDNO_TUPLETABLESLOT_VALUES 5 >> Datum *tts_values; /* current per-attribute values */ >> -#define FIELDNO_TUPLETABLESLOT_ISNULL 8 >> +#define FIELDNO_TUPLETABLESLOT_ISNULL 6 >> bool *tts_isnull; /* current per-attribute isnull flags */ >> - MinimalTuple tts_mintuple; /* minimal tuple, or NULL if none */ >> - HeapTupleData tts_minhdr; /* workspace for minimal-tuple-only case */ >> -#define FIELDNO_TUPLETABLESLOT_OFF 11 >> - uint32 tts_off; /* saved state for slot_deform_tuple */ >> -} TupleTableSlot; >> >> -#define TTS_HAS_PHYSICAL_TUPLE(slot) \ >> - ((slot)->tts_tuple != NULL && (slot)->tts_tuple != &((slot)->tts_minhdr)) >> + /* can we optimize away? */ >> + MemoryContext tts_mcxt; /* slot itself is in this context */ >> +}; > > the "can we" comment above is pretty clearly wrong, I think (Yes, I made > it...). > Removed. > > >> From 84046126d5b8c9d780cb7134048cc717bda462a8 Mon Sep 17 00:00:00 2001 >> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> >> Date: Mon, 13 Aug 2018 11:27:57 +0530 >> Subject: [PATCH 07/11] Reset expression context just after resetting per tuple >> context in ExecModifyTable(). >> >> Expression context saved in ps_ExprContext for ModifyTable node is >> used to process returning clause and on conflict clause. For every >> processed tuple it's reset in ExecProcessReturning() and >> ExecOnConflictUpdate(). When a query has both RETURNING and ON >> CONFLICT clauses, the reset happens twice and the first one of those >> might reset memory used by the other. For some reason this doesn't >> show up on HEAD, but is apparent when virtual tuple table slots, which >> do not copy the datums in its own memory, are used for tuples returned >> by RETURNING clause. >> >> This is fix for a query failing in sql/insert_conflict.sql >> >> insert into insertconflicttest as i values (23, 'Avocado') on conflict (key) do update set fruit = excluded.*::text >> returning *; >> >> Ashutosh Bapat, per suggestion by Andres Freund > > Doesn't this have to be earlier in the series? I have placed it later since the tuple table slot abstraction work exposes the bug, which otherwise is not visible. But yes, we can commit that patch before the tuple table slot abstraction proper work. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Man, how I dislike patches in tarballs. 0002 says: + * shouldFree is set 'true' since a tuple stored on a disk page should not be + * pfree'd. Surely you mean 'false' :-) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 28 August 2018 at 22:43, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > On Fri, Aug 24, 2018 at 6:46 AM, Andres Freund <andres@anarazel.de> wrote: >> >>> -/* >>> - * slot_getsysattr >>> - * This function fetches a system attribute of the slot's current tuple. >>> - * Unlike slot_getattr, if the slot does not contain system attributes, >>> - * this will return false (with a NULL attribute value) instead of >>> - * throwing an error. >>> - */ >>> -bool >>> -slot_getsysattr(TupleTableSlot *slot, int attnum, >>> - Datum *value, bool *isnull) >>> -{ >>> - HeapTuple tuple = slot->tts_tuple; >>> - >>> - Assert(attnum < 0); /* else caller error */ >>> - if (tuple == NULL || >>> - tuple == &(slot->tts_minhdr)) >>> - { >>> - /* No physical tuple, or minimal tuple, so fail */ >>> - *value = (Datum) 0; >>> - *isnull = true; >>> - return false; >>> - } >>> - *value = heap_getsysattr(tuple, attnum, slot->tts_tupleDescriptor, isnull); >>> - return true; >>> -} >> >> I think I was wrong at saying that we should remove this. I think you >> were right that it should become a callback... > > We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you > want to reinstantiate those as well? If so, slot_getsysattr() becomes > a wrapper around getsysattr() callback. One option is that the getsysattr() callback function returns false if system attributes are not supported for that slot type. Other option is that in the not-supported case, the function errors out, meaning that the caller should be aware that the slot type is the one that supports system attributes. I had prepared changes for the first option, but Ashutosh Bapat offlist made me realize that it's worth considering the 2nd option( i.e. erroring out). The only use case where slot_getsysattr() is called right now is execCurrentOf(), and it currently checks the bool return value of slot_getsysattr() and prints a user-friendly message if false is returned. Looking at the code (commit 8f5ac440430ab), it seems that we want to continue to have a user-friendly message for some unhandled cases like custom scans. So perhaps for now it's ok to go with the first option where getsysattr callback returns false for slot types that don't have system attributes. > >> >> >>> +/* >>> + * This is a function used by all getattr() callbacks which deal with a heap >>> + * tuple or some tuple format which can be represented as a heap tuple e.g. a >>> + * minimal tuple. >>> + * >>> + * heap_getattr considers any attnum beyond the attributes available in the >>> + * tuple as NULL. This function however returns the values of missing >>> + * attributes from the tuple descriptor in that case. Also this function does >>> + * not support extracting system attributes. >>> + * >>> + * If the attribute needs to be fetched from the tuple, the function fills in >>> + * tts_values and tts_isnull arrays upto the required attnum. 
>>> + */ >>> +Datum >>> +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp, >>> + int attnum, bool *isnull) >>> +{ >>> + HeapTupleHeader tup = tuple->t_data; >>> + Assert(slot->tts_nvalid < attnum); >>> + >>> + Assert(attnum > 0); >>> + >>> + if (attnum > HeapTupleHeaderGetNatts(tup)) >>> + return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull); >>> + >>> + /* >>> + * check if target attribute is null: no point in groveling through tuple >>> + */ >>> + if (HeapTupleHasNulls(tuple) && att_isnull(attnum - 1, tup->t_bits)) >>> + { >>> + *isnull = true; >>> + return (Datum) 0; >>> + } >> >> I still think this is an optimization with a negative benefit, >> especially as it requires an extra callback. We should just rely on >> slot_deform_tuple and then access that. That'll also just access the >> null bitmap for the relevant column, and it'll make successive accesses >> cheaper. > > I don't understand why we have differing implementations for > slot_attisnull(), slot_getsomeattrs(), slot_getattr() in HEAD. If what > you are saying is true, we should have implemented all the first and > last as a call to slot_getsomeattrs() followed by returing values from > tts_values and tts_isnull. Since this is refactoring work, I am trying > not to change the existing functionality of those functions. I agree that we should not change the way in which slot_getattr() finds the attr; i.e. first call att_isnull(), and only then try to deform the tuple. I mean there must be some good reason that is done on HEAD. Maybe we can change this separately after investigation, but not as part of the tuple abstraction patch. ---------- BTW, on HEAD, for dropped attribute slot_getattr() returns null datum, which hasn't been done in the patch series. That needs to be done.
Hi, On 2018-08-31 10:05:05 +0530, Amit Khandekar wrote: > On 28 August 2018 at 22:43, Ashutosh Bapat > >> I think I was wrong at saying that we should remove this. I think you > >> were right that it should become a callback... > > > We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you > > want to reinstantiate those as well? If so, slot_getsysattr() becomes > > a wrapper around getsysattr() callback. Right. > One option is that the getsysattr() callback function returns false if > system attributes are not supported for that slot type. Other option > is that in the not-supported case, the function errors out, meaning > that the caller should be aware that the slot type is the one that > supports system attributes. > I had prepared changes for the first option, but Ashutosh Bapat > offlist made me realize that it's worth considering the 2nd option( > i.e. erroring out). I think we should error out. > >> I still think this is an optimization with a negative benefit, > >> especially as it requires an extra callback. We should just rely on > >> slot_deform_tuple and then access that. That'll also just access the > >> null bitmap for the relevant column, and it'll make successive accesses > >> cheaper. > > > > I don't understand why we have differing implementations for > > slot_attisnull(), slot_getsomeattrs(), slot_getattr() in HEAD. If what > > you are saying is true, we should have implemented all the first and > > last as a call to slot_getsomeattrs() followed by returing values from > > tts_values and tts_isnull. Since this is refactoring work, I am trying > > not to change the existing functionality of those functions. > > I agree that we should not change the way in which slot_getattr() > finds the attr; i.e. first call att_isnull(), and only then try to > deform the tuple. I mean there must be some good reason that is done > on HEAD. Maybe we can change this separately after investigation, but > not as part of the tuple abstraction patch. There's really no good reason for the behaviour as it exists on HEAD. It already can cause worse performance there. The price to pay for continuing to have an optimization which isn't actually optimizing anything is way too high if it requires us to have multiple functionally unnecessary callbacks. Greetings, Andres Freund
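In other words, with the extra callbacks dropped, slot_attisnull() could become a thin wrapper along these lines (a sketch, assuming the tts_cb/getsomeattrs naming used elsewhere in the series):

	static inline bool
	slot_attisnull(TupleTableSlot *slot, int attnum)
	{
		Assert(attnum > 0);

		/* deform up to the column of interest, then just read the array */
		if (attnum > slot->tts_nvalid)
			slot->tts_cb->getsomeattrs(slot, attnum);

		return slot->tts_isnull[attnum - 1];
	}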
On 28 August 2018 at 22:43, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > On Fri, Aug 24, 2018 at 6:46 AM, Andres Freund <andres@anarazel.de> wrote: > >> >>> @@ -2883,7 +2885,7 @@ CopyFrom(CopyState cstate) >>> if (slot == NULL) /* "do nothing" */ >>> skip_tuple = true; >>> else /* trigger might have changed tuple */ >>> - tuple = ExecMaterializeSlot(slot); >>> + tuple = ExecFetchSlotTuple(slot, true); >>> } >> >> Could we do the Materialize vs Fetch vs Copy change separately? >> > > Ok. I will do that. Ashutosh offlist has shared with me 0006-Rethink-ExecMaterializeSlot-ExecFetchSlotTuple-in-th.patch which contains the separated changes. The attached tar includes this additional patch. >> >>> @@ -1590,7 +1590,8 @@ ExecHashTableInsert(HashJoinTable hashtable, >>> TupleTableSlot *slot, >>> uint32 hashvalue) >>> { >>> - MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot); >>> + bool shouldFree; >>> + MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree); >>> int bucketno; >>> int batchno; >>> >>> @@ -1664,6 +1665,9 @@ ExecHashTableInsert(HashJoinTable hashtable, >>> hashvalue, >>> &hashtable->innerBatchFile[batchno]); >>> } >>> + >>> + if (shouldFree) >>> + heap_free_minimal_tuple(tuple); >>> } >> >> Hm, how about splitting these out? > > Split into a separate patch? It doesn't make sense to add this patch > before 0006 since the slots in those patches can "own" a minimal > tuple. Let's add a patch after 0006 i.e. tuple table abstraction > patch. Will do. Ashutosh offlist has shared with me 0009-Rethink-ExecFetchSlotMinimalTuple.patch which contains the above separated changes. The attached tar includes this additional patch. On 31 August 2018 at 20:50, Andres Freund <andres@anarazel.de> wrote: >> On 31 August 2018 at 10:05, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > Hi, > > On 2018-08-31 10:05:05 +0530, Amit Khandekar wrote: >> On 28 August 2018 at 22:43, Ashutosh Bapat >> >> I think I was wrong at saying that we should remove this. I think you >> >> were right that it should become a callback... >> >> > We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you >> > want to reinstantiate those as well? If so, slot_getsysattr() becomes >> > a wrapper around getsysattr() callback. > > Right. > >> One option is that the getsysattr() callback function returns false if >> system attributes are not supported for that slot type. Other option >> is that in the not-supported case, the function errors out, meaning >> that the caller should be aware that the slot type is the one that >> supports system attributes. > >> I had prepared changes for the first option, but Ashutosh Bapat >> offlist made me realize that it's worth considering the 2nd option( >> i.e. erroring out). > > I think we should error out. I did these changes in the existing 0007-Restructure-TupleTableSlot... patch. This patch now contains changes for the new getsysattr() callback. I used the 2nd option : erroring out. On 31 August 2018 at 10:05, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > The only use case where slot_getsysattr() is called right now is > execCurrentOf(), and it currently checks the bool return value of > slot_getsysattr() and prints a user-friendly message if false is > returned. Looking at the code (commit 8f5ac440430ab), it seems that we > want to continue to have a user-friendly message for some unhandled > cases like custom scans. 
So perhaps for now it's ok to go with the > first option where getsysattr callback returns false for slot types > that don't have system attributes. I noticed that the issue for which commit 8f5ac440430ab was done was that the tid was being fetched from a tuple that doesn't support system attributes (a virtual tuple), although the report complained about a non-user-friendly message being given. Similarly, for a custom scan, since it is not handled specially, for now we can continue to give an "unfriendly" message. The getsysattr() callbacks will return something like "virtual tuple table slot does not have system attributes" if the slot does not support system attributes. The attached v11 tar has the above set of changes. Thanks -Amit Khandekar
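For the slot types that cannot support system attributes, the callback then reduces to little more than the error itself; roughly (a sketch, not the attached patch's exact code):

	static Datum
	tts_virtual_getsysattr(TupleTableSlot *slot, int attnum, bool *isnull)
	{
		elog(ERROR, "virtual tuple table slot does not have system attributes");

		return (Datum) 0;			/* keep the compiler quiet */
	}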
On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote: > The attached v11 tar has the above set of changes. - I've pushed 0003 - although that commit seems to have included a lot of things unrelated to the commit message. I guess two patches have accidentally been merged? Could you split out the part of the changes that was mis-squashed? - I've pushed an extended version of 0001. - I've pushed 0002, after some minor polishing - I've pushed 0004 Greetings, Andres Freund
Hi, On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote: > Subject: [PATCH 05/14] Use tts_flags instead of previous bool members > > Pack the boolean members in TupleTableSlot into a 16 bit tts_flags. > This reduces the size of TupleTableSlot since each bool member takes > at least a byte on the platforms where bool is defined as a char. > > Ashutosh Bapat and Andres Freund > + > +/* true = slot is empty */ > +#define TTS_ISEMPTY (1 << 1) > +#define IS_TTS_EMPTY(slot) ((slot)->tts_flags & TTS_ISEMPTY) > +#define SET_TTS_EMPTY(slot) ((slot)->tts_flags |= TTS_ISEMPTY) > +#define RESET_TTS_EMPTY(slot) ((slot)->tts_flags &= ~TTS_ISEMPTY) The flag stuff is the right way, but I'm a bit dubious about the accessor functions. I can see open-coding the accesses, using the macros, or potentially going towards using bitfields. Any comments? - Andres
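For comparison, the three styles being weighed here, using the empty flag as the example (the bitfield variant is only a sketch of what it could look like, not code from any of the patches):

	/* 1. macro accessors, as in the patch */
	if (IS_TTS_EMPTY(slot))
		return;
	SET_TTS_EMPTY(slot);

	/* 2. open-coded flag tests */
	if (slot->tts_flags & TTS_ISEMPTY)
		return;
	slot->tts_flags |= TTS_ISEMPTY;

	/* 3. hypothetical bitfield members replacing tts_flags */
	struct TupleTableSlot
	{
		NodeTag		type;
		unsigned int tts_isempty:1;
		unsigned int tts_shouldfree:1;
		/* ... */
	};

	if (slot->tts_isempty)
		return;
	slot->tts_isempty = true;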
On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote: > The attached v11 tar has the above set of changes. > Subject: [PATCH 07/14] Restructure TupleTableSlot to allow tuples other than > HeapTuple > + > +/* > + * This is a function used by all getattr() callbacks which deal with a heap > + * tuple or some tuple format which can be represented as a heap tuple e.g. a > + * minimal tuple. > + * > + * heap_getattr considers any attnum beyond the attributes available in the > + * tuple as NULL. This function however returns the values of missing > + * attributes from the tuple descriptor in that case. Also this function does > + * not support extracting system attributes. > + * > + * If the attribute needs to be fetched from the tuple, the function fills in > + * tts_values and tts_isnull arrays upto the required attnum. > + */ > +Datum > +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp, > + int attnum, bool > *isnull) I'm still *vehemently* opposed to the introduction of this. > @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo, > Datum iDatum; > bool isNull; > > - if (keycol != 0) > + if (keycol < 0) > + { > + HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot; > + > + /* Only heap tuples have system attributes. */ > + Assert(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)); > + > + iDatum = heap_getsysattr(hslot->tuple, keycol, > + slot->tts_tupleDescriptor, > + &isNull); > + } > + else if (keycol != 0) > { > /* > * Plain index column; get the value we need directly from the This now should access the system column via the slot, right? There's other places like this IIRC. > diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c > index 9d6e25a..1b4e726 100644 > --- a/src/backend/executor/execExprInterp.c > +++ b/src/backend/executor/execExprInterp.c > @@ -490,54 +490,21 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) > > EEO_CASE(EEOP_INNER_SYSVAR) > { > - int attnum = op->d.var.attnum; > - Datum d; > - > - /* these asserts must match defenses in slot_getattr */ > - Assert(innerslot->tts_tuple != NULL); > - Assert(innerslot->tts_tuple != &(innerslot->tts_minhdr)); > - > - /* heap_getsysattr has sufficient defenses against bad attnums */ > - d = heap_getsysattr(innerslot->tts_tuple, attnum, > - innerslot->tts_tupleDescriptor, > - op->resnull); > - *op->resvalue = d; > + ExecEvalSysVar(state, op, econtext, innerslot); These changes should be in a separate commit. > +const TupleTableSlotOps TTSOpsHeapTuple = { > + sizeof(HeapTupleTableSlot), > + .init = tts_heap_init, The first field should also use a named field (same in following cases). > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable, /* tuple table */ > { > TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc); > > + slot->tts_cb->release(slot); > /* Always release resources and reset the slot to empty */ > ExecClearTuple(slot); > if (slot->tts_tupleDescriptor) > @@ -240,6 +1076,7 @@ void > ExecDropSingleTupleTableSlot(TupleTableSlot *slot) > { > /* This should match ExecResetTupleTable's processing of one slot */ > + slot->tts_cb->release(slot); > Assert(IsA(slot, TupleTableSlot)); > ExecClearTuple(slot); > if (slot->tts_tupleDescriptor) ISTM that release should be called *after* clearing the slot. 
> @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self) > TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self; > HeapTuple tuple; > shm_mq_result result; > + bool tuple_copied = false; > + > + /* Get the tuple out of slot, if necessary converting the slot's contents > + * into a heap tuple by copying. In the later case we need to free the copy. > + */ > + if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)) > + { > + tuple = ExecFetchSlotTuple(slot, true); > + tuple_copied = false; > + } > + else > + { > + tuple = ExecCopySlotTuple(slot); > + tuple_copied = true; > + } This seems like a bad idea to me. We shouldn't hardcode slots like this. I've previously argued that we should instead allow ExecFetchSlotTuple() for all types of tuples, but add a new bool *shouldFree argument, which will then allow the caller to free the tuple. We gain zilch by having this kind of logic in multiple callers. > From 280eac5c6758061d0a46b7ef9dd324e9a23226b5 Mon Sep 17 00:00:00 2001 > From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> > Date: Fri, 31 Aug 2018 10:53:42 +0530 > Subject: [PATCH 08/14] Change tuple table slot creation routines to suite > tuple table slot abstraction > > This change requires ExecInitResultTupleSlotTL, ExecInitScanTupleSlot, > ExecCreateScanSlotFromOuterPlan, ExecInitNullTupleSlot, > ExecInitExtraTupleSlot, MakeTupleTableSlot, ExecAllocTableSlot to > accept TupleTableSlotType as a new parameter. Change all their calls. > > Ashutosh Bapat and Andres Freund > > This by itself won't compile. Neither the tuple table slot abstraction > patch would compile and work without this change. Both of those need > to be committed together. I don't like this kind of split - all commits should individually compile. I think we should instead introduce dummy / empty structs for &TTSOpsHeapTuple etc, and add the parameters necessary to pass them through. And then move this patch to *before* the "core" abstract slot patch. That way every commit, but the super verbose stuff is still split out. > From 06d31f91831a1da78d56e4217f08d3866c7be6f8 Mon Sep 17 00:00:00 2001 > From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> > Date: Fri, 31 Aug 2018 11:00:30 +0530 > Subject: [PATCH 09/14] Rethink ExecFetchSlotMinimalTuple() > > Before this work, a TupleTableSlot could "own" a minimal tuple as > well. Thus ExecFetchSlotMinimalTuple() returned a minimal tuple after > constructing and "owning" it if it was not readily available. When the > slot "owned" the minimal tuple, the memory consumed by the tuple was > freed when a new tuple was stored in it or the slot was cleared. > > With this work, not all TupleTableSlot types can "own" a minimal > tuple. When fetching a minimal tuple from a slot that can not "own" a > tuple, memory is allocated to construct the minimal tuple, which needs > to be freed explicitly. Hence ExecFetchSlotMinimalTuple() is modified > to flag the caller whether it should free the memory consumed by the > returned minimal tuple. > > Right now only a MinimalTupleTableSlot can own a minimal tuple. But we > may change that in future or a user defined TupleTableSlot type (added > through an extension) may be able to "own" a minimal tuple in it. > Hence instead of relying upon TTS_IS_MINIMAL() macro to tell us > whether the TupleTableSlot can "own" a minimal tuple or not, we rely > on the set of callbacks. A TupleTableSlot which can hold a minimal > tuple implements get_minimal_tuple callback. Other TupleTableSlot > types leave the callback NULL. 
> > The minimal tuple returned by this function is usually copied into a > hash table or a file. But, unlike ExecFetchSlotTuple() it's never > written to. Hence the difference between signatures of the two > functions. I'm somewhat inclined to think this should be done in an earlier patch, before the main "abstract slot" work. Ok, gotta switch to a smaller patch for a bit ;) Greetings, Andres Freund
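Based on that commit message, the shape of the reworked function is roughly the following (a sketch; the real patch may differ in details such as the empty-slot assertions):

	MinimalTuple
	ExecFetchSlotMinimalTuple(TupleTableSlot *slot, bool *shouldFree)
	{
		Assert(slot != NULL);
		Assert(!IS_TTS_EMPTY(slot));

		if (slot->tts_cb->get_minimal_tuple)
		{
			/* the slot owns a minimal tuple; the caller must not free it */
			*shouldFree = false;
			return slot->tts_cb->get_minimal_tuple(slot);
		}

		/* otherwise build a copy that the caller is responsible for freeing */
		*shouldFree = true;
		return slot->tts_cb->copy_minimal_tuple(slot);
	}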
At Tue, 25 Sep 2018 16:45:09 -0700, Andres Freund <andres@anarazel.de> wrote in <20180925234509.3hrrf6tmvy5tfith@alap3.anarazel.de> > Hi, > > On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote: > > Subject: [PATCH 05/14] Use tts_flags instead of previous bool members > > > > Pack the boolean members in TupleTableSlot into a 16 bit tts_flags. > > This reduces the size of TupleTableSlot since each bool member takes > > at least a byte on the platforms where bool is defined as a char. > > > > Ashutosh Bapat and Andres Freund > > > + > > +/* true = slot is empty */ > > +#define TTS_ISEMPTY (1 << 1) > > +#define IS_TTS_EMPTY(slot) ((slot)->tts_flags & TTS_ISEMPTY) > > +#define SET_TTS_EMPTY(slot) ((slot)->tts_flags |= TTS_ISEMPTY) > > +#define RESET_TTS_EMPTY(slot) ((slot)->tts_flags &= ~TTS_ISEMPTY) > > The flag stuff is the right way, but I'm a bit dubious about the > accessor functions. I can see open-coding the accesses, using the > macros, or potentially going towards using bitfields. > > Any comments? Currently we have few setter/resetter functions (macros) for such simple operations. FWIW, of the following two, the open-coded access looks easier to me. > if (IS_TTS_EMPTY(slot)) > if (slot->tts_flags & TTS_ISEMPTY) About bitfields, one advantage is debugger awareness: we don't need to look up the definitions of the bitwise macros while using a debugger. And the current code's appearance is preserved by using them. > if (slot->tts_isempty) > slot->tts_isempty = true; In short, +1 from me for using bitfields. Couldn't we use bitfields here, and possibly in other places, then? ===== Not related to tuple slots, but in other places, like infomask, we handle a set of bitmap values all at once. > infomask = tuple->t_data->t_infomask; Bare bitfields are a bit inconvenient for that use. It gets simpler with a C11 anonymous member, but is not too bothersome even in C99. Anyway, I don't think we should jump into that immediately. infomask.all = tuple->t_data->t_infomask.all; - if (!HeapTupleHeaderXminCommitted(tuple)) C99> if (tuple->t_infomask.b.xmin_committed) C11> if (tuple->t_infomask.xmin_committed) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
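To make the infomask aside concrete, the kind of union being described is roughly this (field and type names are invented for illustration; bit layout portability would of course need separate thought):

	typedef union HeapInfomask
	{
		uint16		all;			/* whole mask, for bulk assignment */
		struct
		{
			unsigned int has_null:1;
			unsigned int has_varwidth:1;
			unsigned int xmin_committed:1;
			/* ... remaining bits ... */
		}			b;				/* named member, needed before C11 */
	} HeapInfomask;

	/* C99: if (tuple->t_infomask.b.xmin_committed) ... */
	/* C11, anonymous struct: if (tuple->t_infomask.xmin_committed) ... */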
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > At Tue, 25 Sep 2018 16:45:09 -0700, Andres Freund <andres@anarazel.de> wrote in <20180925234509.3hrrf6tmvy5tfith@alap3.anarazel.de> >> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote: >>> Pack the boolean members in TupleTableSlot into a 16 bit tts_flags. >>> This reduces the size of TupleTableSlot since each bool member takes >>> at least a byte on the platforms where bool is defined as a char. > About bitfields, an advantage of it is debugger awareness. We > don't need to look aside to the definitions of bitwise macros > while using debugger. And the current code is preserved in > appearance by using it. FWIW, I would expect a change like this to be a net loss performance-wise on most platforms. Testing the contents of a byte-wide variable is pretty cheap on any architecture invented later than ~ 1970. Testing a bit, though, requires a masking operation that is not free. I am not seeing how making TupleTableSlot a little smaller buys that back ... we don't normally have that many active slots in a plan. regards, tom lane
So far I have done only the two changes below. After doing that and rebasing on latest master, I got crashes in the regression tests, and I suspect the reason is that I have used a virtual tuple slot for the destination slot of execute_attr_map_slot(). I am analyzing it. I am anyway attaching the patches (v12) to give you an idea of how I have handled the two items below. On Wed, 26 Sep 2018 at 05:09, Andres Freund <andres@anarazel.de> wrote: > > On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote: > > The attached v11 tar has the above set of changes. > > - I've pushed 0003 - although that commit seems to have included a lot > of things unrelated to the commit message. I guess two patches have > accidentally been merged? Could you split out the part of the changes > that was mis-squashed? Yeah, it indeed looks like it had unrelated things, mostly the changes that moved the slot attribute functions into execTuples.c. I have included this change as the very first patch in the patch series. >> From 280eac5c6758061d0a46b7ef9dd324e9a23226b5 Mon Sep 17 00:00:00 2001 >> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> >> Date: Fri, 31 Aug 2018 10:53:42 +0530 >> Subject: [PATCH 08/14] Change tuple table slot creation routines to suite >> tuple table slot abstraction >> >> This change requires ExecInitResultTupleSlotTL, ExecInitScanTupleSlot, >> ExecCreateScanSlotFromOuterPlan, ExecInitNullTupleSlot, >> ExecInitExtraTupleSlot, MakeTupleTableSlot, ExecAllocTableSlot to >> accept TupleTableSlotType as a new parameter. Change all their calls. >> >> Ashutosh Bapat and Andres Freund >> >> This by itself won't compile. Neither the tuple table slot abstraction >> patch would compile and work without this change. Both of those need >> to be committed together. > > I don't like this kind of split - all commits should individually > compile. I think we should instead introduce dummy / empty structs for > &TTSOpsHeapTuple etc, and add the parameters necessary to pass them > through. And then move this patch to *before* the "core" abstract slot > patch. That way every commit, but the super verbose stuff is still > split out. > Done. Moved this patch before the core one. In this patch, I have just declared the global variables of type TupleTableSlotOps, without initializing them. I have tried to make sure individual patches compile successfully by shuffling the changes around to the right patches, but there is one particular patch that still gives an error. Will fix that later. I will handle the other review comments in the next patch series.
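Concretely, in the reordered series the creation-routine patch only needs placeholders along these lines (a sketch of the idea, not the exact contents of the v12 patch):

	/* dummy type and variables; the real callbacks arrive with the core patch */
	typedef struct TupleTableSlotOps
	{
		int			dummy;
	} TupleTableSlotOps;

	extern const TupleTableSlotOps TTSOpsVirtual;
	extern const TupleTableSlotOps TTSOpsHeapTuple;
	extern const TupleTableSlotOps TTSOpsMinimalTuple;
	extern const TupleTableSlotOps TTSOpsBufferTuple;

	extern TupleTableSlot *MakeTupleTableSlot(TupleDesc tupleDesc,
											  const TupleTableSlotOps *tts_ops);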
On Thu, 4 Oct 2018 at 22:59, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > I have only done the below two changes yet. After doing that and > rebasing with latest master, in the regression I got crashes, and I > suspect the reason being that I have used Virtual tuple slot for the > destination slot of execute_attr_map_slot(). I am analyzing it. I am > anyway attaching the patches (v12) to give you an idea of how I have > handled the below two items. It seems to be some corruption here : @@ -956,17 +978,39 @@ ExecUpdate(ModifyTableState *mtstate, - tuple = ExecMaterializeSlot(slot); + if (!(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))) + { + TupleTableSlot *es_slot = estate->es_dml_input_tuple_slot; + + Assert(es_slot && TTS_IS_HEAPTUPLE(es_slot)); + if (es_slot->tts_tupleDescriptor != slot->tts_tupleDescriptor) + ExecSetSlotDescriptor(es_slot, slot->tts_tupleDescriptor); + ExecCopySlot(es_slot, slot); + slot = es_slot; + } + + tuple = ExecFetchSlotTuple(slot, true); After the slotification of partition tuple conversion, the input slot is a virtual tuple, and the above code seems to result in some corruption which I have not finished analyzing. It only happens for INSERT ON CONFLICT case with partitions. On Wed, 26 Sep 2018 at 05:35, Andres Freund <andres@anarazel.de> wrote: > > > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable, /* tuple table */ > > { > > TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc); > > > > + slot->tts_cb->release(slot); > > /* Always release resources and reset the slot to empty */ > > ExecClearTuple(slot); > > if (slot->tts_tupleDescriptor) > > @@ -240,6 +1076,7 @@ void > > ExecDropSingleTupleTableSlot(TupleTableSlot *slot) > > { > > /* This should match ExecResetTupleTable's processing of one slot */ > > + slot->tts_cb->release(slot); > > Assert(IsA(slot, TupleTableSlot)); > > ExecClearTuple(slot); > > if (slot->tts_tupleDescriptor) > > ISTM that release should be called *after* clearing the slot. I am not sure what was release() designed to do. Currently all of the implementers of this function are empty. Was it meant for doing ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or ReleaseBuffer(bslot->buffer) ? I think the purpose of keeping this *before* clearing the tuple might be because the clear() might have already cleared some handles that release() might need. > > > > @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self) > > TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self; > > HeapTuple tuple; > > shm_mq_result result; > > + bool tuple_copied = false; > > + > > + /* Get the tuple out of slot, if necessary converting the slot's contents > > + * into a heap tuple by copying. In the later case we need to free the copy. > > + */ > > + if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)) > > + { > > + tuple = ExecFetchSlotTuple(slot, true); > > + tuple_copied = false; > > + } > > + else > > + { > > + tuple = ExecCopySlotTuple(slot); > > + tuple_copied = true; > > + } > > This seems like a bad idea to me. We shouldn't hardcode slots like > this. I've previously argued that we should instead allow > ExecFetchSlotTuple() for all types of tuples, but add a new bool > *shouldFree argument, which will then allow the caller to free the > tuple. We gain zilch by having this kind of logic in multiple callers. How about having a separate ExecFetchSlotHeapTuple() for many of the callers where it is known that the tuple is a heap/buffer tuple ? 
And in rare places such as above where slot type is not known, we can have ExecFetchSlotTuple() which would have an extra shouldFree parameter.
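Under that naming, the split would look roughly like this (the signatures are assumptions for the sketch; heap_freetuple() is the existing function for freeing a copied heap tuple):

	/* fast path: caller knows the slot carries a heap or buffer tuple */
	extern HeapTuple ExecFetchSlotHeapTuple(TupleTableSlot *slot, bool materialize);

	/* generic path: any slot type; may have to build a copy */
	extern HeapTuple ExecFetchSlotTuple(TupleTableSlot *slot, bool *shouldFree);

	/* tqueueReceiveSlot() then loses its slot-type if(): */
	bool		shouldFree;
	HeapTuple	tuple = ExecFetchSlotTuple(slot, &shouldFree);

	/* ... send the tuple to the shared memory queue ... */

	if (shouldFree)
		heap_freetuple(tuple);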
On Wed, 26 Sep 2018 at 05:35, Andres Freund <andres@anarazel.de> wrote: > > + > > +/* > > + * This is a function used by all getattr() callbacks which deal with a heap > > + * tuple or some tuple format which can be represented as a heap tuple e.g. a > > + * minimal tuple. > > + * > > + * heap_getattr considers any attnum beyond the attributes available in the > > + * tuple as NULL. This function however returns the values of missing > > + * attributes from the tuple descriptor in that case. Also this function does > > + * not support extracting system attributes. > > + * > > + * If the attribute needs to be fetched from the tuple, the function fills in > > + * tts_values and tts_isnull arrays upto the required attnum. > > + */ > > +Datum > > +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp, > > + int attnum, bool > > *isnull) > > I'm still *vehemently* opposed to the introduction of this. You mean, you want to remove the att_isnull() optimization, right ? Removed that code now. Directly deforming the tuple regardless of the null attribute. > > > > > @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo, > > Datum iDatum; > > bool isNull; > > > > - if (keycol != 0) > > + if (keycol < 0) > > + { > > + HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot; > > + > > + /* Only heap tuples have system attributes. */ > > + Assert(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)); > > + > > + iDatum = heap_getsysattr(hslot->tuple, keycol, > > + slot->tts_tupleDescriptor, > > + &isNull); > > + } > > + else if (keycol != 0) > > { > > /* > > * Plain index column; get the value we need directly from the > > This now should access the system column via the slot, right? There's > other places like this IIRC. Done. In FormIndexDatum() and ExecInterpExpr(), directly calling slot_getsysattr() now. In ExecInterpExpr (), I am no longer using ExecEvalSysVar() now. I am planning to remove this definition since it would be a single line function just calling slot_getsysattr(). In build_ExecEvalSysVar(), ExecEvalSysVar() is still used, so I haven't removed the definition yet. I am planning to create a new LLVMValueRef FuncSlotGetsysattr, and use that instead, in build_ExecEvalSysVar(), or for that matter, I am thinking to revert back build_ExecEvalSysVar() and instead have that code inline as in HEAD. > > > > > diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c > > index 9d6e25a..1b4e726 100644 > > --- a/src/backend/executor/execExprInterp.c > > +++ b/src/backend/executor/execExprInterp.c > > @@ -490,54 +490,21 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) > > > > EEO_CASE(EEOP_INNER_SYSVAR) > > { > > - int attnum = op->d.var.attnum; > > - Datum d; > > - > > - /* these asserts must match defenses in slot_getattr */ > > - Assert(innerslot->tts_tuple != NULL); > > - Assert(innerslot->tts_tuple != &(innerslot->tts_minhdr)); > > - > > - /* heap_getsysattr has sufficient defenses against bad attnums */ > > - d = heap_getsysattr(innerslot->tts_tuple, attnum, > > - innerslot->tts_tupleDescriptor, > > - op->resnull); > > - *op->resvalue = d; > > + ExecEvalSysVar(state, op, econtext, innerslot); > > These changes should be in a separate commit. As mentioned above, now I am not using ExecEvalSysVar(), instead, I am calling slot_getsysattr(). So no need of separate commit for that. 
> > > > +const TupleTableSlotOps TTSOpsHeapTuple = { > > + sizeof(HeapTupleTableSlot), > > + .init = tts_heap_init, > > The first field should also use a named field (same in following cases). Done. > > > > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable, /* tuple table */ > > { > > TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc); > > > > + slot->tts_cb->release(slot); > > /* Always release resources and reset the slot to empty */ > > ExecClearTuple(slot); > > if (slot->tts_tupleDescriptor) > > @@ -240,6 +1076,7 @@ void > > ExecDropSingleTupleTableSlot(TupleTableSlot *slot) > > { > > /* This should match ExecResetTupleTable's processing of one slot */ > > + slot->tts_cb->release(slot); > > Assert(IsA(slot, TupleTableSlot)); > > ExecClearTuple(slot); > > if (slot->tts_tupleDescriptor) > > ISTM that release should be called *after* clearing the slot. I am copying here what I discussed about this in the earlier reply: I am not sure what was release() designed to do. Currently all of the implementers of this function are empty. Was it meant for doing ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or ReleaseBuffer(bslot->buffer) ? I think the purpose of keeping this *before* clearing the tuple might be because the clear() might have already cleared some handles that release() might need. > > > > @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self) > > TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self; > > HeapTuple tuple; > > shm_mq_result result; > > + bool tuple_copied = false; > > + > > + /* Get the tuple out of slot, if necessary converting the slot's contents > > + * into a heap tuple by copying. In the later case we need to free the copy. > > + */ > > + if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)) > > + { > > + tuple = ExecFetchSlotTuple(slot, true); > > + tuple_copied = false; > > + } > > + else > > + { > > + tuple = ExecCopySlotTuple(slot); > > + tuple_copied = true; > > + } > > This seems like a bad idea to me. We shouldn't hardcode slots like > this. I've previously argued that we should instead allow > ExecFetchSlotTuple() for all types of tuples, but add a new bool > *shouldFree argument, which will then allow the caller to free the > tuple. We gain zilch by having this kind of logic in multiple callers. Done. See more comments below. > > > @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self) > > > TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self; > > > HeapTuple tuple; > > > shm_mq_result result; > > > + bool tuple_copied = false; > > > + > > > + /* Get the tuple out of slot, if necessary converting the slot's contents > > > + * into a heap tuple by copying. In the later case we need to free the copy. > > > + */ > > > + if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)) > > > + { > > > + tuple = ExecFetchSlotTuple(slot, true); > > > + tuple_copied = false; > > > + } > > > + else > > > + { > > > + tuple = ExecCopySlotTuple(slot); > > > + tuple_copied = true; > > > + } > > > > This seems like a bad idea to me. We shouldn't hardcode slots like > > this. I've previously argued that we should instead allow > > ExecFetchSlotTuple() for all types of tuples, but add a new bool > > *shouldFree argument, which will then allow the caller to free the > > tuple. We gain zilch by having this kind of logic in multiple callers. > > How about having a separate ExecFetchSlotHeapTuple() for many of the > callers where it is known that the tuple is a heap/buffer tuple ? 
And > in rare places such as above where slot type is not known, we can have > ExecFetchSlotTuple() which would have an extra shouldFree parameter. I went ahead and did these changes, but for now, I haven't replaced ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I retained ExecFetchSlotTuple() to be called for heap tuples, and added a new ExecFetchGenericSlotTuple() to be used with shouldFree. I don't like these names, but until we have concluded, I don't want to go ahead and replace all the numerous ExecFetchSlotTuple() calls with ExecFetchSlotHeapTuple(). Have used ExecFetchGenericSlotTuple() for ExecBRInsertTriggers() and ExecIRInsertTriggers() as well, because in CopyFrom(), now with partition mps, the input slot to these functions can be a virtual slot. > > rebasing with latest master, in the regression I got crashes, and I > > suspect the reason being that I have used Virtual tuple slot for the > > destination slot of execute_attr_map_slot(). I am analyzing it. I am > > anyway attaching the patches (v12) to give you an idea of how I have > > handled the below two items. > > It seems to be some corruption here : > > @@ -956,17 +978,39 @@ ExecUpdate(ModifyTableState *mtstate, > > - tuple = ExecMaterializeSlot(slot); > > + if (!(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))) > + { > + TupleTableSlot *es_slot = estate->es_dml_input_tuple_slot; > + > + Assert(es_slot && TTS_IS_HEAPTUPLE(es_slot)); > + if (es_slot->tts_tupleDescriptor != slot->tts_tupleDescriptor) > + ExecSetSlotDescriptor(es_slot, slot->tts_tupleDescriptor); > + ExecCopySlot(es_slot, slot); > + slot = es_slot; > + } > + > + tuple = ExecFetchSlotTuple(slot, true); > > After the slotification of partition tuple conversion, the input slot > is a virtual tuple, and the above code seems to result in some > corruption which I have not finished analyzing. It only happens for > INSERT ON CONFLICT case with partitions. In ExecInsert() and ExecUpdate(), when it's a partition with different column order, the input slot is a slot returned by execute_attr_map_slot(), i.e. a virtual slot. So, in ExecUpdate() the input slot->tts_values[] datums may even point to existing contents of estate->es_dml_input_tuple_slot. This happens when the set clause of UPDATE uses excluded.* columns : (From insert_conflict.sql) insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b; And ExecSetSlotDescriptor() is called *before* extracting the tuple out of the virtual slot, so this function clears the tuple of es_dml_input_tuple_slot, and so the input slot datums pointing to this slot tuple become dangling pointers. Basically, we should extract the tuple out of the input slot as early as possible. I now have a new function ExecCopySlotWithTupDesc() which extracts the tuple using dstslot's memory context, and only then changes the tuple descriptor. Effectively, the dstlot has a materialized tuple. Used this function for both ExecUpdate() and ExecInsert(). On Fri, 5 Oct 2018 at 20:28, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > From 06d31f91831a1da78d56e4217f08d3866c7be6f8 Mon Sep 17 00:00:00 2001 > > From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> > > Date: Fri, 31 Aug 2018 11:00:30 +0530 > > Subject: [PATCH 09/14] Rethink ExecFetchSlotMinimalTuple() > > > > Before this work, a TupleTableSlot could "own" a minimal tuple as > > well. Thus ExecFetchSlotMinimalTuple() returned a minimal tuple after > > constructing and "owning" it if it was not readily available. 
When the > > slot "owned" the minimal tuple, the memory consumed by the tuple was > > freed when a new tuple was stored in it or the slot was cleared. > > > > With this work, not all TupleTableSlot types can "own" a minimal > > tuple. When fetching a minimal tuple from a slot that can not "own" a > > tuple, memory is allocated to construct the minimal tuple, which needs > > to be freed explicitly. Hence ExecFetchSlotMinimalTuple() is modified > > to flag the caller whether it should free the memory consumed by the > > returned minimal tuple. > > > > Right now only a MinimalTupleTableSlot can own a minimal tuple. But we > > may change that in future or a user defined TupleTableSlot type (added > > through an extension) may be able to "own" a minimal tuple in it. > > Hence instead of relying upon TTS_IS_MINIMAL() macro to tell us > > whether the TupleTableSlot can "own" a minimal tuple or not, we rely > > on the set of callbacks. A TupleTableSlot which can hold a minimal > > tuple implements get_minimal_tuple callback. Other TupleTableSlot > > types leave the callback NULL. > > > > The minimal tuple returned by this function is usually copied into a > > hash table or a file. But, unlike ExecFetchSlotTuple() it's never > > written to. Hence the difference between signatures of the two > > functions. > > I'm somewhat inclined to think this should be done in an earlier patch, > before the main "abstract slot" work. Yet to do this. There is still one more regression test failure in polygon.sql which I am yet to analyze. Also, there are some more pending points of yours that I haven't addressed yet. Attached is v13 patch series. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Wed, 26 Sep 2018 at 05:15, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote: > > Subject: [PATCH 05/14] Use tts_flags instead of previous bool members > > > > Pack the boolean members in TupleTableSlot into a 16 bit tts_flags. > > This reduces the size of TupleTableSlot since each bool member takes > > at least a byte on the platforms where bool is defined as a char. > > > > Ashutosh Bapat and Andres Freund > > > + > > +/* true = slot is empty */ > > +#define TTS_ISEMPTY (1 << 1) > > +#define IS_TTS_EMPTY(slot) ((slot)->tts_flags & TTS_ISEMPTY) > > +#define SET_TTS_EMPTY(slot) ((slot)->tts_flags |= TTS_ISEMPTY) > > +#define RESET_TTS_EMPTY(slot) ((slot)->tts_flags &= ~TTS_ISEMPTY) > > The flag stuff is the right way, but I'm a bit dubious about the > accessor functions. I can see open-coding the accesses, using the > macros, or potentially going towards using bitfields. > > Any comments? I like this abstraction using macros, since this will allow us to conveniently change the way this information is stored inside the slot. I think for this same reason we have defined macros (HeapTupleIsHotUpdated, etc) for each bit of heaptuple infomask. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Hi, On 2018-10-01 22:21:58 -0400, Tom Lane wrote: > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > > At Tue, 25 Sep 2018 16:45:09 -0700, Andres Freund <andres@anarazel.de> wrote in <20180925234509.3hrrf6tmvy5tfith@alap3.anarazel.de> > >> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote: > >>> Pack the boolean members in TupleTableSlot into a 16 bit tts_flags. > >>> This reduces the size of TupleTableSlot since each bool member takes > >>> at least a byte on the platforms where bool is defined as a char. > > > About bitfields, an advantage of it is debugger awareness. We > > don't need to look aside to the definitions of bitwise macros > > while using debugger. And the current code is preserved in > > appearance by using it. > > FWIW, I would expect a change like this to be a net loss performance-wise > on most platforms. Testing the contents of a byte-wide variable is pretty > cheap on any architecture invented later than ~ 1970. Testing a bit, > though, requires a masking operation that is not free. I am not seeing > how making TupleTableSlot a little smaller buys that back ... we don't > normally have that many active slots in a plan. I measured it as a speedup on x86-64, mainly because we require fewer instructions to reset a slot into an empty state, but also because there are fewer loads. Masking a register is just about free, loading from memory isn't, even if the cacheline is in L1. The other benefit is that this allows TupleTableSlots to fit into one cacheline more often, and that's noticeable in cache miss rates. Greetings, Andres Freund
On 2018-10-09 20:46:04 +0530, Amit Khandekar wrote: > On Wed, 26 Sep 2018 at 05:35, Andres Freund <andres@anarazel.de> wrote: > > > + > > > +/* > > > + * This is a function used by all getattr() callbacks which deal with a heap > > > + * tuple or some tuple format which can be represented as a heap tuple e.g. a > > > + * minimal tuple. > > > + * > > > + * heap_getattr considers any attnum beyond the attributes available in the > > > + * tuple as NULL. This function however returns the values of missing > > > + * attributes from the tuple descriptor in that case. Also this function does > > > + * not support extracting system attributes. > > > + * > > > + * If the attribute needs to be fetched from the tuple, the function fills in > > > + * tts_values and tts_isnull arrays upto the required attnum. > > > + */ > > > +Datum > > > +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp, > > > + int attnum, bool > > > *isnull) > > > > I'm still *vehemently* opposed to the introduction of this. > > You mean, you want to remove the att_isnull() optimization, right ? Yes. > Removed that code now. Directly deforming the tuple regardless of the > null attribute. Good, thanks. > > > @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo, > > > Datum iDatum; > > > bool isNull; > > > > > > - if (keycol != 0) > > > + if (keycol < 0) > > > + { > > > + HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot; > > > + > > > + /* Only heap tuples have system attributes. */ > > > + Assert(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)); > > > + > > > + iDatum = heap_getsysattr(hslot->tuple, keycol, > > > + slot->tts_tupleDescriptor, > > > + &isNull); > > > + } > > > + else if (keycol != 0) > > > { > > > /* > > > * Plain index column; get the value we need directly from the > > > > This now should access the system column via the slot, right? There's > > other places like this IIRC. > > Done. In FormIndexDatum() and ExecInterpExpr(), directly calling > slot_getsysattr() now. > > In ExecInterpExpr (), I am no longer using ExecEvalSysVar() now. I am > planning to remove this definition since it would be a single line > function just calling slot_getsysattr(). > > In build_ExecEvalSysVar(), ExecEvalSysVar() is still used, so I > haven't removed the definition yet. I am planning to create a new > LLVMValueRef FuncSlotGetsysattr, and use that instead, in > build_ExecEvalSysVar(), or for that matter, I am thinking to revert > back build_ExecEvalSysVar() and instead have that code inline as in > HEAD. I'd just have ExecInterpExpr() continue calling ExecEvalSysVar. There's no reason for it to be inline. And it's simpler for JIT than the alternative. > > > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable, /* tuple table */ > > > { > > > TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc); > > > > > > + slot->tts_cb->release(slot); > > > /* Always release resources and reset the slot to empty */ > > > ExecClearTuple(slot); > > > if (slot->tts_tupleDescriptor) > > > @@ -240,6 +1076,7 @@ void > > > ExecDropSingleTupleTableSlot(TupleTableSlot *slot) > > > { > > > /* This should match ExecResetTupleTable's processing of one slot */ > > > + slot->tts_cb->release(slot); > > > Assert(IsA(slot, TupleTableSlot)); > > > ExecClearTuple(slot); > > > if (slot->tts_tupleDescriptor) > > > > ISTM that release should be called *after* clearing the slot. > > I am copying here what I discussed about this in the earlier reply: > > I am not sure what was release() designed to do. 
Currently all of the > implementers of this function are empty. So additional deallocations can happen in a slot. We might need this e.g. at some point for zheap which needs larger, longer-lived, buffers in slots. > Was it meant for doing > ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or > ReleaseBuffer(bslot->buffer) ? No. The former lives in generic code, the latter is in ClearTuple. > I think the purpose of keeping this *before* clearing the tuple might > be because the clear() might have already cleared some handles that > release() might need. It's just plainly wrong to call it this way round. > I went ahead and did these changes, but for now, I haven't replaced > ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I > retained ExecFetchSlotTuple() to be called for heap tuples, and added > a new ExecFetchGenericSlotTuple() to be used with shouldFree. I don't > like these names, but until we have concluded, I don't want to go > ahead and replace all the numerous ExecFetchSlotTuple() calls with > ExecFetchSlotHeapTuple(). Why not? Greetings, Andres Freund
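To illustrate what release() is being reserved for, a made-up slot type that keeps a long-lived private buffer, one that must outlive individual ExecClearTuple() calls, might implement it like this (entirely hypothetical, not from any posted patch):

	typedef struct FancyTupleTableSlot
	{
		TupleTableSlot base;
		char	   *private_buf;	/* survives ExecClearTuple() */
	} FancyTupleTableSlot;

	static void
	tts_fancy_release(TupleTableSlot *slot)
	{
		FancyTupleTableSlot *fslot = (FancyTupleTableSlot *) slot;

		/*
		 * Called once, when the slot itself is torn down (and, per the review
		 * comment, after the slot has been cleared).
		 */
		if (fslot->private_buf)
			pfree(fslot->private_buf);
	}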
On Sat, 13 Oct 2018 at 04:02, Andres Freund <andres@anarazel.de> wrote:
> > > > @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo,
> > > >             Datum       iDatum;
> > > >             bool        isNull;
> > > >
> > > > -           if (keycol != 0)
> > > > +           if (keycol < 0)
> > > > +           {
> > > > +               HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot;
> > > > +
> > > > +               /* Only heap tuples have system attributes. */
> > > > +               Assert(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot));
> > > > +
> > > > +               iDatum = heap_getsysattr(hslot->tuple, keycol,
> > > > +                                        slot->tts_tupleDescriptor,
> > > > +                                        &isNull);
> > > > +           }
> > > > +           else if (keycol != 0)
> > > >             {
> > > >                 /*
> > > >                  * Plain index column; get the value we need directly from the
> > >
> > > This now should access the system column via the slot, right? There's
> > > other places like this IIRC.
> >
> > Done. In FormIndexDatum() and ExecInterpExpr(), directly calling
> > slot_getsysattr() now.
> >
> > In ExecInterpExpr(), I am no longer using ExecEvalSysVar() now. I am
> > planning to remove this definition since it would be a single line
> > function just calling slot_getsysattr().
> >
> > In build_ExecEvalSysVar(), ExecEvalSysVar() is still used, so I
> > haven't removed the definition yet. I am planning to create a new
> > LLVMValueRef FuncSlotGetsysattr, and use that instead, in
> > build_ExecEvalSysVar(), or for that matter, I am thinking to revert
> > back build_ExecEvalSysVar() and instead have that code inline as in
> > HEAD.
>
> I'd just have ExecInterpExpr() continue calling ExecEvalSysVar. There's
> no reason for it to be inline.

Can you explain more why you think there should be an ExecEvalSysVar()
definition? As I mentioned earlier it would just call slot_getsysattr()
and do nothing else.

> And it's simpler for JIT than the alternative.

You mean it would be simpler for JIT to call ExecEvalSysVar() than
slot_getsysattr()? I didn't get why it is simpler. Or are you talking
about build_ExecEvalSysVar()? I am ok with retaining
build_ExecEvalSysVar(), but I was saying even inside this function, we
could do:

LLVMBuildCall(.... , llvm_get_decl(mod, FuncSlotGetsysattr) , .....)

rather than:

LLVMFunctionType(,...)
LLVMAddFunction("ExecEvalSysVar", ....)
LLVMBuildCall(...)

> > > > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable,   /* tuple table */
> > > >     {
> > > >         TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
> > > >
> > > > +       slot->tts_cb->release(slot);
> > > >         /* Always release resources and reset the slot to empty */
> > > >         ExecClearTuple(slot);
> > > >         if (slot->tts_tupleDescriptor)
> > > > @@ -240,6 +1076,7 @@ void
> > > >  ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
> > > >  {
> > > >     /* This should match ExecResetTupleTable's processing of one slot */
> > > > +   slot->tts_cb->release(slot);
> > > >     Assert(IsA(slot, TupleTableSlot));
> > > >     ExecClearTuple(slot);
> > > >     if (slot->tts_tupleDescriptor)
> > >
> > > ISTM that release should be called *after* clearing the slot.
> >
> > I am copying here what I discussed about this in the earlier reply:
> >
> > I am not sure what was release() designed to do. Currently all of the
> > implementers of this function are empty.
>
> So additional deallocations can happen in a slot. We might need this
> e.g. at some point for zheap which needs larger, longer-lived, buffers
> in slots.
>
> > Was it meant for doing
> > ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or
> > ReleaseBuffer(bslot->buffer) ?
>
> No. The former lives in generic code, the latter is in ClearTuple.

> > I think the purpose of keeping this *before* clearing the tuple might
> > be because the clear() might have already cleared some handles that
> > release() might need.
>
> It's just plainly wrong to call it this way round.

Ok.

> > I went ahead and did these changes, but for now, I haven't replaced
> > ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I
> > retained ExecFetchSlotTuple() to be called for heap tuples, and added
> > a new ExecFetchGenericSlotTuple() to be used with shouldFree. I don't
> > like these names, but until we have concluded, I don't want to go
> > ahead and replace all the numerous ExecFetchSlotTuple() calls with
> > ExecFetchSlotHeapTuple().
>
> Why not?

I haven't gone ahead because I wanted to know if you are ok with the
names.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
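For readers following along, a short sketch of the FormIndexDatum() shape being discussed: system columns are fetched through the slot API rather than by casting to HeapTupleTableSlot and calling heap_getsysattr(). This is an illustration only; the slot_getsysattr() signature and the helper name are assumptions based on the discussion, not taken from the patch.

/*
 * Sketch of the per-column logic discussed above.  The helper name and the
 * slot_getsysattr() signature are assumptions; keycol == 0 (expression
 * columns) is left out.
 */
static Datum
index_key_datum(TupleTableSlot *slot, AttrNumber keycol, bool *isNull)
{
    if (keycol < 0)
    {
        /* System attribute: resolved through the slot, not heap_getsysattr() */
        return slot_getsysattr(slot, keycol, isNull);
    }

    /* Plain index column: fetched directly from the slot */
    return slot_getattr(slot, keycol, isNull);
}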
On Tue, 9 Oct 2018 at 20:46, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> There is still one more regression test failure in polygon.sql which I
> am yet to analyze.

Below is a narrowed down testcase which reproduces the failure in
polygon.sql :

create table tab (n int, dist int, id integer);
insert into tab values (1, 2, 3);

-- Force a hash join
set enable_nestloop TO f;
set enable_mergejoin TO f;

-- Expected to return a row
SELECT * FROM tab t1 , tab t2 where t1.id = t2.id;
 n | dist | id | n | dist | id
---+------+----+---+------+----
(0 rows)

In MultiExecPrivateHash(), to generate the hash table, the tuples are
retrieved one by one from the scan of the outer plan state. For each
tuple, ExecHashGetHashValue() is called to get the join attribute value
of the tuple. Here, when the attribute is retrieved by a jit-compiled
slot-deforming function built by slot_compile_deform(), the attribute
value is a junk value. So the hash join condition fails and the join
returns no rows.

Root cause :

In llvm_compile_expr(), for the switch case EEOP_INNER_FETCHSOME,
innerPlanState(parent)->ps_ResultTupleSlot->tts_cb is passed to
slot_compile_deform(). And in slot_compile_deform(), this tts_cb is
used to determine whether the inner slot is a minimal slot or a
buffer/heap tuple slot, and accordingly v_tupleheaderp is calculated
using StructMinimalTupleTableSlot or StructHeapTupleTableSlot.

In the above hash join scenario, the ps_ResultTupleSlot is a minimal
tuple slot. But at runtime, when
MultiExecPrivateHash()=>ExecHashGetHashValue() is called, the slot
returned by the outer node (Seqscan) is a buffer heap tuple slot; this
is because the seq scan does not return using its ps_ResultTupleSlot,
instead it directly returns its scan slot since there is no projection
info needed. Hence the tuple is retrieved using a wrong offset inside
the tuple table slot, because the jit function was compiled assuming
it's going to be a minimal tuple slot.

So, although we can safely use
innerPlanState(parent)->ps_ResultTupleSlot to get the tuple descriptor
for slot_compile_deform(), we should not use the same tuple slot to
know what kind of a tuple slot it will be. That can be known only at
runtime.

Possible Fix :

I am thinking, in slot_compile_deform(), we need to include the logic
instructions to determine the slot type. Have a new
FIELDNO_TUPLETABLESLOT_OPS to retrieve TupleTableSlot.tts_cb, and then
accordingly calculate the tuple offset. I am not sure if this will turn
out to have a performance impact on jit execution, or whether it is
feasible to do such conditional thing in llvm; trying to understand.

Comments ?

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
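To illustrate why the slot type seen at runtime can differ from ps_ResultTupleSlot's type, here is a simplified sketch of the scan-node behaviour described above. It paraphrases ExecScan()-style logic rather than quoting it, and glosses over the expression-context setup a real projection needs.

/*
 * Simplified illustration only, not actual executor source: when no
 * projection is needed, the scan node's buffer/heap scan slot is returned
 * directly, bypassing ps_ResultTupleSlot entirely.
 */
static TupleTableSlot *
scan_return_slot(ScanState *node, TupleTableSlot *scanslot)
{
    if (node->ps.ps_ProjInfo == NULL)
    {
        /* No projection: hand back the scan slot as-is. */
        return scanslot;
    }

    /* With a projection, the result lands in the projection's result slot. */
    return ExecProject(node->ps.ps_ProjInfo);
}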
Hi,

On 2018-10-15 12:12:03 +0530, Amit Khandekar wrote:
> On Sat, 13 Oct 2018 at 04:02, Andres Freund <andres@anarazel.de> wrote:
> > I'd just have ExecInterpExpr() continue calling ExecEvalSysVar. There's
> > no reason for it to be inline.
>
> Can you explain more why you think there should be an ExecEvalSysVar()
> definition? As I mentioned earlier it would just call
> slot_getsysattr() and do nothing else.

I think it's superfluous to have three opcodes for this, and we should
instead just have one sysvar instruction that chooses the slot based on
a parameter of the opcode. Inline code in the interpreter isn't free.

> > And it's simpler for JIT than the alternative.
>
> You mean it would be simpler for JIT to call ExecEvalSysVar() than
> slot_getsysattr()? I didn't get why it is simpler.

Because there'd be no special code at all. We can just use the code
that existing ExecEval* routines use.

> Or are you talking about build_ExecEvalSysVar()? I am ok with
> retaining build_ExecEvalSysVar(), but I was saying even inside this
> function, we could do:
> LLVMBuildCall(.... , llvm_get_decl(mod, FuncSlotGetsysattr) , .....)
> rather than:
> LLVMFunctionType(,...)
> LLVMAddFunction("ExecEvalSysVar", ....)
> LLVMBuildCall(...)

That'd probably generate more work than it'd save, because
llvm_get_decl requires that the function is present in llvmjit_types.c.

> > > I went ahead and did these changes, but for now, I haven't replaced
> > > ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I
> > > retained ExecFetchSlotTuple() to be called for heap tuples, and added
> > > a new ExecFetchGenericSlotTuple() to be used with shouldFree. I don't
> > > like these names, but until we have concluded, I don't want to go
> > > ahead and replace all the numerous ExecFetchSlotTuple() calls with
> > > ExecFetchSlotHeapTuple().
> >
> > Why not?
>
> I haven't gone ahead because I wanted to know if you are ok with the names.

Just renaming a function is cheap, it's just a oneliner script ;)

FWIW, I dislike ExecFetchGenericSlotTuple() as well. It's still a
HeapTuple, there's absolutely nothing Generic about it. I don't see why
we'd not just use ExecFetchSlotHeapTuple() with a new shouldFree
parameter?

Greetings,

Andres Freund
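A minimal sketch of the calling convention being suggested here, assuming a single fetch routine whose caller is told whether it must free the result. The exact parameter list (in particular the materialize flag) is an assumption at this point in the thread, not the final API.

/*
 * Sketch of the suggested interface; parameters are assumptions, not the
 * settled signature.
 */
static void
use_heap_tuple_from_slot(TupleTableSlot *slot)
{
    bool        shouldFree;
    HeapTuple   tuple;

    /* Fetch a HeapTuple view of the slot; may or may not be a fresh copy. */
    tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);

    /* ... use 'tuple' here ... */

    /* Free only if the slot handed us a copy allocated for this call. */
    if (shouldFree)
        heap_freetuple(tuple);
}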
Hi,

Attached is a heavily revised version of the patchset developed in this
thread. It's based on the last version posted by Amit Khandekar at [1].

The largest issue this resolves is that, for JITing, we need to know
what kind of slot we're going to deal with. E.g. in contrast to right
now, we do not want to generate a bespoke deform function for virtual
slots, and we need to be able to discern between slots we know how to
deform and slots where we don't.

Up to now we generated a deform function whenever we could figure out
what the tupledesc for an EEOP_*_FETCHSOME operation is, using the
fairly ugly hack of looking at PlanState->scanslot for the scan
descriptor, and at the right/left trees for INNER/OUTER. That kind of
works as far as descriptors go, but doesn't for the slot types - it's
far from uncommon that the slot type inside a node differs from the
result type of the left/right tree.

I've tried various approaches for this, and what I settled on is that
every PlanState signals in new fields what the slot types of its scan
and result slots are, if known; normally that's automatic due to using
ExecInitScanTupleSlot(), ExecInitResultSlot(). By default, the right /
left (aka inner/outer) slot types are inferred using
ExecGetResultSlotOps() on the subsidiary node, but can be overridden
too. This gets rid of similar logic previously present in
llvmjit_expr.c, by moving the relevant knowledge into
ExprEvalOp.d.fetch.

While this primarily benefits JITing, it also allows for future
optimizations like eliding slot_getsomeattrs() / EEOP_*_FETCHSOME calls
for Vars on slots known to be virtual (which is a large percentage, due
to ExecProject() etc).

To avoid bugs around this, I've added assertions that check that the
slot type is as expected. This, unsurprisingly, caught a number of
bugs.

While this approach isn't perfect (in particular, it adds a few new
fields to PlanState), I've tried hard to find other solutions, and
didn't come up with anything that's meaningfully better. If anybody has
better ideas, please call now.

The second bigger change is that I've removed
ExecFetchGenericSlotTuple() - I really disliked having both that and
ExecFetchSlotTuple(). Now ExecFetchSlotTuple() has a new *shouldFree
parameter that indicates whether the returned value is allocated in the
current context, or not. I've also renamed it to
ExecFetchSlotHeapTuple(). It's currently still used for places where we
modify the returned tuple in-place (to set oid etc), but subsequent
patches in the pluggable storage patchset get rid of that too.

I've done a bit of benchmarking (TPCH and pgbench), and after some
performance fixes the patchset either has no performance effect, or a
very small performance benefit (when using JITing, it reduces the
overhead).

Further changes:
- changed the split of patches, so they now all pass the tests, and
  individually make (some) sense.
- virtual tuple table slots can now be materialized (by serializing
  !byval columns into a single allocation, that's then pointed to by
  tts_values).
- The responsibility for filling in missing columns is now again in
  slot_getsomeattrs(), rather than the per-slot callbacks (which can
  still do so if desired, but normally I don't see much point).
- lots of bugfixes
- comment cleanups
- performance improvements
- Reduction in the number of callbacks. getattr, attisnull are gone.

What I'm now planning to do is to go through the big comment in
tuptable.h and update that to the new world. While I'm not convinced
that that's the best place for it, it needs to be accurate.
Furthermore:
- More comment polishing
- I'll probably split the commits up a bit further (particularly JIT
  ignoring virtual tuple slots, inlining the hot path of
  slot_getsomeattrs())
- serious commit message polishing

After this, I hope Amit Khandekar will rebase a patch he's sent me
internally that converts triggers to use slots. I'll work on rebasing
the pluggable storage patch on top of this.

[1] http://archives.postgresql.org/message-id/CAJ3gD9eq38XhLsLTie%2B3NHsCRkLO0xHLA4MQX_3sr6or7xws4Q%40mail.gmail.com

Greetings,

Andres Freund
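To make the "nodes signal their slot type" idea above concrete, a brief sketch of how an executor node might declare its scan slot type and how a parent might query its child's result slot type. The helper and ops-struct names (TTSOpsBufferHeapTuple, ExecGetResultSlotOps) follow the wording above, but their exact signatures are assumptions here, not the patchset's final form.

/* Sketch only; signatures are assumptions based on the description above. */

/* In a seqscan-like node's ExecInit function: declare the scan slot type. */
ExecInitScanTupleSlot(estate, &scanstate->ss,
                      RelationGetDescr(scanstate->ss.ss_currentRelation),
                      &TTSOpsBufferHeapTuple);

/*
 * In a parent node: ask which slot type the outer child will return, so a
 * JIT-compiled deform function can be generated for that type (or skipped
 * entirely if the child returns virtual slots).
 */
const TupleTableSlotOps *outerops =
    ExecGetResultSlotOps(outerPlanState(parent), NULL);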
Attachment
- v14-0001-Rejigger-materializing-and-fetching-a-HeapTuple-.patch
- v14-0002-Change-tuple-table-slot-creation-routines-to-sui.patch
- v14-0003-Use-separate-tuple-table-slot-to-return-tuples-f.patch
- v14-0004-Compute-information-about-EEOP_-_FETCHSOME-at-ex.patch
- v14-0005-Verify-that-expected-slot-types-match-returned-s.patch
- v14-0006-Restructure-TupleTableSlot-to-allow-tuples-other.patch
- v14-0007-Rationalize-expression-context-reset-in-ExecModi.patch
Hi,

On 2018-11-13 15:30:21 -0800, Andres Freund wrote:
> What I'm now planning to do is to go through the big comment in
> tuptable.h and update that to the new world. While I'm not convinced
> that that's the best place for it, it needs to be accurate.
>
> Furthermore:
> - More comment polishing
> - I'll probably split the commits up a bit further (particularly JIT
>   ignoring virtual tuple slots, inlining the hot path of
>   slot_getsomeattrs())
> - serious commit message polishing

I've done all of that now, and pushed it. Thanks Ashutosh, Amit
Khandekar and everyone else.

On to pluggable storage...

Regards,

Andres
On Wed, 14 Nov 2018 at 05:00, Andres Freund <andres@anarazel.de> wrote:
> After this, I hope Amit Khandekar will rebase a patch he's sent me
> internally that converts triggers to use slots. I'll work on rebasing
> the pluggable storage patch on top of this.

Shared this patch in a separate mail thread:
https://www.postgresql.org/message-id/CAJ3gD9fjpoPHSHB-Ufj7ciT8nV0JSA2gdJUdtxo-bMyPrpjk%3DQ%40mail.gmail.com

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Fri, Nov 16, 2018 at 7:46 PM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-11-13 15:30:21 -0800, Andres Freund wrote:
> > What I'm now planning to do is to go through the big comment in
> > tuptable.h and update that to the new world. While I'm not convinced
> > that that's the best place for it, it needs to be accurate.
> >
> > Furthermore:
> > - More comment polishing
> > - I'll probably split the commits up a bit further (particularly JIT
> >   ignoring virtual tuple slots, inlining the hot path of
> >   slot_getsomeattrs())
> > - serious commit message polishing
>
> I've done all of that now, and pushed it. Thanks Ashutosh, Amit
> Khandekar and everyone else.
Hi Andres and all,
This commit 763f2edd92095b1ca2 "Rejigger materializing and fetching a HeapTuple from a slot" introduces a memory leak into the ExecutorState context, which can be triggered by this statement:
CREATE TABLE tbloom AS
SELECT
(random() * 1000000)::int as i1,
(random() * 1000000)::int as i2,
(random() * 1000000)::int as i3,
(random() * 1000000)::int as i4,
(random() * 1000000)::int as i5,
(random() * 1000000)::int as i6
FROM
generate_series(1,50000000);
By blind analogy to the changes made to matview.c, I think that createas.c is missing a heap_freetuple, as attached.
It fixes the leak, and still passes "make check".
Cheers,
Jeff
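For context, a minimal sketch of the kind of one-line fix being proposed for createas.c's intorel_receive(), mirroring the earlier matview.c change. The fetch call and the DR_intorel field names here are assumptions for illustration; the point is only the heap_freetuple() after the insert. See the attached patch for the actual change.

/*
 * Sketch of intorel_receive() after the proposed fix; myState field names
 * and the exact slot-fetch call are assumptions for illustration.
 */
static bool
intorel_receive(TupleTableSlot *slot, DestReceiver *self)
{
    DR_intorel *myState = (DR_intorel *) self;
    HeapTuple   tuple;

    /* The slot now hands back a copy allocated in ExecutorState ... */
    tuple = ExecCopySlotHeapTuple(slot);

    heap_insert(myState->rel, tuple, myState->output_cid,
                myState->hi_options, myState->bistate);

    /* ... so free it once inserted, instead of leaking it once per row. */
    heap_freetuple(tuple);

    return true;
}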
On Sat, Feb 16, 2019 at 05:07:44PM -0500, Jeff Janes wrote:
> By blind analogy to the changes made to matview.c, I think that createas.c
> is missing a heap_freetuple, as attached.

I think that you are right. CTAS and materialized views share a lot
when it comes to relation creation and initial table loading. I have
reproduced the leak and could notice that your fix is correct. So
committed.

--
Michael
Hi,

On 2019-02-27 14:21:52 +0900, Michael Paquier wrote:
> On Sat, Feb 16, 2019 at 05:07:44PM -0500, Jeff Janes wrote:
> > By blind analogy to the changes made to matview.c, I think that createas.c
> > is missing a heap_freetuple, as attached.

First, sorry to have been slow answering. I was whacking around code
further modifying this, and thought I'd just solve the immediate issue
here by committing the followup work that removes getting the tuple out
of the slot entirely. That took longer than planned, so it makes sense
to commit an interim fix.

> I think that you are right. CTAS and materialized views share a lot
> when it comes to relation creation and initial table loading. I have
> reproduced the leak and could notice that your fix is correct. So
> committed.

I'm not so sure that's the architecturally correct fix, however. Is it
actually guaranteed, given expanded tuples, toasting, etc, that there's
no other memory leak here? I wonder if we shouldn't work towards using
a short-lived memory context here. Note how e.g. printtup() uses a
short-lived context for its work.

Greetings,

Andres Freund
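A short sketch of the printtup()-style alternative raised here: give the receiver a per-row memory context and reset it after each tuple, so detoasting or expanded-datum allocations are released as well, not just the explicit tuple copy. The rowcontext field and surrounding names are assumptions; the context itself would be created in the receiver's startup callback (e.g. with AllocSetContextCreate()).

/*
 * Sketch only; 'rowcontext' is an assumed DR_intorel field created at
 * receiver startup.
 */
static bool
intorel_receive_with_rowcontext(TupleTableSlot *slot, DestReceiver *self)
{
    DR_intorel *myState = (DR_intorel *) self;
    MemoryContext oldcontext;
    HeapTuple   tuple;

    /* Do all per-row work in a context we can reset wholesale. */
    oldcontext = MemoryContextSwitchTo(myState->rowcontext);

    tuple = ExecCopySlotHeapTuple(slot);
    heap_insert(myState->rel, tuple, myState->output_cid,
                myState->hi_options, myState->bistate);

    MemoryContextSwitchTo(oldcontext);

    /* Release the tuple copy and anything else allocated for this row. */
    MemoryContextReset(myState->rowcontext);

    return true;
}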
On Tue, Feb 26, 2019 at 09:42:38PM -0800, Andres Freund wrote:
> I'm not so sure that's the architecturally correct fix, however. Is it
> actually guaranteed, given expanded tuples, toasting, etc, that there's
> no other memory leak here? I wonder if we shouldn't work towards using
> a short-lived memory context here. Note how e.g. printtup() uses a
> short-lived context for its work.

Perhaps. I got to wonder if this change would not impact code using
their own DestReceiver, resulting in similar leaks when they insert
tuples on-the-fly. Such issues can be surprising for fork and plugin
developers.

I was playing a bit with some refactoring of relation creation for CTAS
in the scope of temporary matviews, and noticed this issue on the CF
list, so that was a bit annoying, and issues like that tend to be
easily forgotten...

--
Michael
Hi,

On 2019-02-27 15:34:07 +0900, Michael Paquier wrote:
> On Tue, Feb 26, 2019 at 09:42:38PM -0800, Andres Freund wrote:
> > I'm not so sure that's the architecturally correct fix, however. Is it
> > actually guaranteed, given expanded tuples, toasting, etc, that there's
> > no other memory leak here? I wonder if we shouldn't work towards using
> > a short-lived memory context here. Note how e.g. printtup() uses a
> > short-lived context for its work.
>
> Perhaps. I got to wonder if this change would not impact code using
> their own DestReceiver, resulting in similar leaks when they insert
> tuples on-the-fly.

I'm not sure I understand. How can adding a memory context + reset to
the ctas and matview receivers negatively impact other dest receivers?

> I was playing a bit with some refactoring of relation creation for
> CTAS in the scope of temporary matviews, and noticed this issue on the
> CF list, so that was a bit annoying, and issues like that tend to be
> easily forgotten...

It's been 10 days since the report, nobody pinged, and obviously I'm
working on pluggable storage, so ...

Greetings,

Andres Freund
On Tue, Feb 26, 2019 at 10:38:45PM -0800, Andres Freund wrote:
> I'm not sure I understand. How can adding a memory context + reset to
> the ctas and matview receivers negatively impact other dest receivers?

I don't think you got my point here: imagine that a plugin uses the
current receiveSlot logic from createas.c or matview.c, and forgets to
free the copied tuple. On v11, that works fine. On current HEAD, they
silently gain a new leak.

--
Michael
On 2019-02-27 15:42:50 +0900, Michael Paquier wrote:
> On Tue, Feb 26, 2019 at 10:38:45PM -0800, Andres Freund wrote:
> > I'm not sure I understand. How can adding a memory context + reset to
> > the ctas and matview receivers negatively impact other dest receivers?
>
> I don't think you got my point here: imagine that a plugin uses the
> current receiveSlot logic from createas.c or matview.c, and forgets to
> free the copied tuple. On v11, that works fine. On current HEAD, they
> silently gain a new leak.

The copy was made in intorel_receive()?