Thread: TupleTableSlot abstraction

TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

I've recently been discussing with Robert how to abstract
TupleTableSlots in the light of upcoming developments around abstracting
storage away.  Besides that aspect I think we also need to make them
more abstract to later deal with vectorized excution, but I'm more fuzzy
on the details there.

I'd previously, in a response to an early version of the pluggable
storage patch, commented [1] how I think the abstraction should roughly
look like. I'd privately started to prototype that. After my discussion
with Robert I got that prototype in a state that can run many queries,
but doesn't pass the regression tests.

The pluggable storage patch has also been updated [2] to abstract slots
more (see patch 0005).

My position is that there's a number of weaknesses with the current
TupleTableSlot API in light of multiple independent, possibly out of
core, storage implementations:


- TupleTableSlots have to contain all the necessary information for each
  type of slot. Some implementations might require a buffer pin to be
  hold (our heap), others pointers to multiple underlying points of data
  (columns store), some need additional metadata (our MinimalTuple).

  Unifying the information into one struct makes it much harder to
  provide differing implementations without modifying core postgres
  and/or increasing overhead for every user of slots.

  I think the consequence of that is that we need to allow each type of
  slot to have their own TupleTableSlot embedding struct.

  Haribabu's patch solved this by adding a tts_storage parameter that
  contains additional data, but imo that's unconvincing due to the added
  indirection in very performance critical codepaths.


- The way slots currently are implemented each new variant data is
  stored in slots adds new branches to hot code paths like
  ExecClearTuple(), and they're not extensible.

  Therefore I think we need to move to callback based implementation. In
  my tests, by moving the callback invocations into very thin inline
  functions, the branch prediction accuracy actually goes sligthly
  *up*.


- Currently we frequently convert from one way of representing a row
  into another, in the same slot. We do so fairly implicilty in
  ExecMaterializeSlot(), ExecFetchSlotMinimalTuple(), ExecCopySlot()...

  The more we abstract specific storage representation away, the worse I
  think that is. I think we should make such conversions explicit by
  requiring transfers between two slots if a specific type is required.


- We have a few codepaths that set fields on tuples that can't be
  represented in virtual slots. E.g. postgres_fdw sets ctid, oid,
  ExecProcessReturning() will set tableoid.


In my POC I turned TupleTableSlot into basically an abstract base class:
struct TupleTableSlot
{
    NodeTag        type;

    uint16        flags;
    uint16        nvalid;        /* # of valid values in tts_values */

    const TupleTableSlotOps *const cb;

    TupleDesc    tupleDescriptor;    /* slot's tuple descriptor */

    Datum       *values;        /* current per-attribute values */
    bool       *nulls;        /* current per-attribute null flags */

    /* can we optimize away? */
    MemoryContext mcxt;        /* slot itself is in this context */
};

which then is inherited from for specific implementations of tuple
slots:

typedef struct HeapTupleTableSlot
{
    TupleTableSlot base;
    HeapTuple    tuple;        /* physical tuple */
    uint32        off;        /* saved state for slot_deform_tuple */
} HeapTupleTableSlot;

...
typedef struct MinimalTupleTableSlot
{
    TupleTableSlot base;
    HeapTuple    tuple;        /* tuple wrapper */
    MinimalTuple mintuple;    /* minimal tuple, or NULL if none */
    HeapTupleData minhdr;    /* workspace for minimal-tuple-only case */
    uint32        off;        /* saved state for slot_deform_tuple */
} MinimalTupleTableSlot;


typedef struct TupleTableSlotOps
{
    void (*init)(TupleTableSlot *slot);
    void (*release)(TupleTableSlot *slot);

    void (*clear)(TupleTableSlot *slot);

    void (*getsomeattrs)(TupleTableSlot *slot, int natts);

    void (*materialize)(TupleTableSlot *slot);

    HeapTuple (*get_heap_tuple)(TupleTableSlot *slot);
    MinimalTuple (*get_minimal_tuple)(TupleTableSlot *slot);

    HeapTuple (*copy_heap_tuple)(TupleTableSlot *slot);
    MinimalTuple (*copy_minimal_tuple)(TupleTableSlot *slot);

} TupleTableSlotOps;

when creating a slot one has to specify the type of slot one wants to
use:

typedef enum TupleTableSlotType
{
    TTS_TYPE_VIRTUAL,
    TTS_TYPE_HEAPTUPLE,
    TTS_TYPE_MINIMALTUPLE,
    TTS_TYPE_BUFFER
} TupleTableSlotType;

extern TupleTableSlot *MakeTupleTableSlot(TupleDesc desc, TupleTableSlotType st);

You might notice that I've renamed a few fields (tts_values -> values,
tts_isnull -> nulls, tts_nvalid -> nvalid) that haven't actually changed
in the above structs / the attached patch. I now think that's probably
not a good idea, unnecessarily breaking more code than necessary.


I generally like this scheme because it allows the TupleTableStruct to
be relatively lean, without knowing much about specific implementations
of slots.

After this change functions like slot_getattr are thin inline wrappers:
+static inline Datum
+slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
+{
+   Assert(attnum > 0);
+
+   if (attnum > slot->nvalid)
+       slot->cb->getsomeattrs(slot, attnum);
+
+   Assert(attnum <= slot->nvalid);
+
+   *isnull = slot->nulls[attnum - 1];
+   return slot->values[attnum - 1];
+}

Note that I've moved system attribute handling out of the slot_getattr
path - there were a few paths accessing them via slot_getattr, and it's
a lot of unnecessary branching.


The fact that slots of different types have different storage means that
we can't just change the type of a slot when needed. In nearly all
places that's not a problem. E.g. ExecHashTableInsert() converts the
"incoming" slot to one containing a minimal tuple with
ExecFetchSlotMinimalTuple(), but that's essentially transient as it's
just used to copy the tuple into the hashtable.  We could even gain
quite some efficiency by directly copying into the allocated space
(saving one serialization/copy step for the common case where input is
either a virtual or a heap tuple).

The slightly bigger issue is that several parts of the code currently
require heaptuples they can scribble upon. E.g. nodeModifyTable.c
materializes the tuple - those can be solved by using a local slot to
store the tuple, rather than using the incoming one. In nearly all cases
we'd still be left with the same number of tuple copy steps, as
materializing the tuple with copy into the local storage anyway.  A few
other cases, e.g. intorel_receive(), just can use ExecCopySlotTuple() or
such instead of materialize.


The biggest issue I don't have a good answer for, and where I don't like
Haribabu's solution, is how to deal with system columns like oid, ctid,
tableoid. Haribabu's patch adds storage of those to the slot, but that
seems like quite the cludge to me.  It appears to me that the best thing
would be to require projections for all accesses to system columns, at
the original level of access. Then we don't need any sort of special
codepaths dealing with them. But that's not exactly a trivial change and
has some added complications too (junkfilter type things).


I made a reference to vectorized execution above. That's because that I
found, while prototyping vectorized execution, that it's a very common
need to interact with parts of the system that aren't vectorized, and
doing so has to be extremely cheap.  With the above we can have a
TupleTableSlot type that's essentially just a cheap cursor into a batch
of tuples.  Besides making it efficiently possible to hand of slots to
part of the system that can't deal with tuple batches, it allows to do
fun things like having the first slot_deform_tuple() call for one tuple
in a batch to deform a full page's worth of tuples, yielding *much*
better cache access patterns.


I haven't done that, but I think we should split ExecStoreTuple() into
multiple versions, that only work on specific types of tuple
slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other
code dealing with tuples would call ExecStoreHeapTuple().  The relevant
callers already need to know what kind of tuple they're dealing with,
therefore that's not a huge burden.


Please please note that the attached patch is *NOT* meant to be a full
proposal or even a to be reviewed patch, it's just an illustration of
the concepts I'm talking about.

Comments? Better ideas?

[1] http://archives.postgresql.org/message-id/20170815065348.afkj45dgmg22ecfh%40alap3.anarazel.de
[2] http://archives.postgresql.org/message-id/CAJrrPGdY_hgvu06R_vCibUktKRLp%3Db8nBfeZkh%3DKRc95tq4euA%40mail.gmail.com

Greetings,

Andres Freund

Attachment

Re: TupleTableSlot abstraction

From
Alexander Korotkov
Date:
Hi!

Thank you for working on this subject.  See some my comments below.

On Wed, Feb 21, 2018 at 1:43 AM, Andres Freund <andres@anarazel.de> wrote:
- TupleTableSlots have to contain all the necessary information for each
  type of slot. Some implementations might require a buffer pin to be
  hold (our heap), others pointers to multiple underlying points of data
  (columns store), some need additional metadata (our MinimalTuple).

  Unifying the information into one struct makes it much harder to
  provide differing implementations without modifying core postgres
  and/or increasing overhead for every user of slots.

  I think the consequence of that is that we need to allow each type of
  slot to have their own TupleTableSlot embedding struct.
 
+1, I think that it's reasonable to keep minimal common slot information in
TupleTableSlot struct and to let particular slot implementations to extend it.

  Haribabu's patch solved this by adding a tts_storage parameter that
  contains additional data, but imo that's unconvincing due to the added
  indirection in very performance critical codepaths.
 
+1

- The way slots currently are implemented each new variant data is
  stored in slots adds new branches to hot code paths like
  ExecClearTuple(), and they're not extensible.

  Therefore I think we need to move to callback based implementation. In
  my tests, by moving the callback invocations into very thin inline
  functions, the branch prediction accuracy actually goes sligthly
  *up*.

Sounds interesting.  Besides branch prediction accuracy, what can you
say about overall performance?

- Currently we frequently convert from one way of representing a row
  into another, in the same slot. We do so fairly implicilty in
  ExecMaterializeSlot(), ExecFetchSlotMinimalTuple(), ExecCopySlot()...

  The more we abstract specific storage representation away, the worse I
  think that is. I think we should make such conversions explicit by
  requiring transfers between two slots if a specific type is required.


- We have a few codepaths that set fields on tuples that can't be
  represented in virtual slots. E.g. postgres_fdw sets ctid, oid,
  ExecProcessReturning() will set tableoid.


In my POC I turned TupleTableSlot into basically an abstract base class:
struct TupleTableSlot
{
        NodeTag         type;

        uint16          flags;
        uint16          nvalid;         /* # of valid values in tts_values */

        const TupleTableSlotOps *const cb;

        TupleDesc       tupleDescriptor;        /* slot's tuple descriptor */

        Datum      *values;             /* current per-attribute values */
        bool       *nulls;              /* current per-attribute null flags */

        /* can we optimize away? */
        MemoryContext mcxt;             /* slot itself is in this context */
};

which then is inherited from for specific implementations of tuple
slots:

typedef struct HeapTupleTableSlot
{
        TupleTableSlot base;
        HeapTuple       tuple;          /* physical tuple */
        uint32          off;            /* saved state for slot_deform_tuple */
} HeapTupleTableSlot;

...
typedef struct MinimalTupleTableSlot
{
        TupleTableSlot base;
        HeapTuple       tuple;          /* tuple wrapper */
        MinimalTuple mintuple;  /* minimal tuple, or NULL if none */
        HeapTupleData minhdr;   /* workspace for minimal-tuple-only case */
        uint32          off;            /* saved state for slot_deform_tuple */
} MinimalTupleTableSlot;


typedef struct TupleTableSlotOps
{
        void (*init)(TupleTableSlot *slot);
        void (*release)(TupleTableSlot *slot);

        void (*clear)(TupleTableSlot *slot);

        void (*getsomeattrs)(TupleTableSlot *slot, int natts);

        void (*materialize)(TupleTableSlot *slot);

        HeapTuple (*get_heap_tuple)(TupleTableSlot *slot);
        MinimalTuple (*get_minimal_tuple)(TupleTableSlot *slot);

        HeapTuple (*copy_heap_tuple)(TupleTableSlot *slot);
        MinimalTuple (*copy_minimal_tuple)(TupleTableSlot *slot);

} TupleTableSlotOps;

when creating a slot one has to specify the type of slot one wants to
use:

typedef enum TupleTableSlotType
{
        TTS_TYPE_VIRTUAL,
        TTS_TYPE_HEAPTUPLE,
        TTS_TYPE_MINIMALTUPLE,
        TTS_TYPE_BUFFER
} TupleTableSlotType;

extern TupleTableSlot *MakeTupleTableSlot(TupleDesc desc, TupleTableSlotType st);
 
Can user define his own slot type?  If so, I assume that hard-coded
enum is a limitation of current prototype, and eventually we would have
an extendable array of slot types.  Is it correct?

You might notice that I've renamed a few fields (tts_values -> values,
tts_isnull -> nulls, tts_nvalid -> nvalid) that haven't actually changed
in the above structs / the attached patch. I now think that's probably
not a good idea, unnecessarily breaking more code than necessary.

For me "tss_" prefix looks a bit weird, but I agree that we probably should
refrain from renaming this in order to break as small number of things as possible.
 
I generally like this scheme because it allows the TupleTableStruct to
be relatively lean, without knowing much about specific implementations
of slots.

After this change functions like slot_getattr are thin inline wrappers:
+static inline Datum
+slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
+{
+   Assert(attnum > 0);
+
+   if (attnum > slot->nvalid)
+       slot->cb->getsomeattrs(slot, attnum);
+
+   Assert(attnum <= slot->nvalid);
+
+   *isnull = slot->nulls[attnum - 1];
+   return slot->values[attnum - 1];
+}

Note that I've moved system attribute handling out of the slot_getattr
path - there were a few paths accessing them via slot_getattr, and it's
a lot of unnecessary branching.


The fact that slots of different types have different storage means that
we can't just change the type of a slot when needed. In nearly all
places that's not a problem. E.g. ExecHashTableInsert() converts the
"incoming" slot to one containing a minimal tuple with
ExecFetchSlotMinimalTuple(), but that's essentially transient as it's
just used to copy the tuple into the hashtable.  We could even gain
quite some efficiency by directly copying into the allocated space
(saving one serialization/copy step for the common case where input is
either a virtual or a heap tuple).

The slightly bigger issue is that several parts of the code currently
require heaptuples they can scribble upon. E.g. nodeModifyTable.c
materializes the tuple - those can be solved by using a local slot to
store the tuple, rather than using the incoming one. In nearly all cases
we'd still be left with the same number of tuple copy steps, as
materializing the tuple with copy into the local storage anyway.  A few
other cases, e.g. intorel_receive(), just can use ExecCopySlotTuple() or
such instead of materialize.


The biggest issue I don't have a good answer for, and where I don't like
Haribabu's solution, is how to deal with system columns like oid, ctid,
tableoid. Haribabu's patch adds storage of those to the slot, but that
seems like quite the cludge to me.  It appears to me that the best thing
would be to require projections for all accesses to system columns, at
the original level of access. Then we don't need any sort of special
codepaths dealing with them. But that's not exactly a trivial change and
has some added complications too (junkfilter type things).

I also think that projections for accessing system columns is probably
the best solution for them.
 
I made a reference to vectorized execution above. That's because that I
found, while prototyping vectorized execution, that it's a very common
need to interact with parts of the system that aren't vectorized, and
doing so has to be extremely cheap.  With the above we can have a
TupleTableSlot type that's essentially just a cheap cursor into a batch
of tuples.  Besides making it efficiently possible to hand of slots to
part of the system that can't deal with tuple batches, it allows to do
fun things like having the first slot_deform_tuple() call for one tuple
in a batch to deform a full page's worth of tuples, yielding *much*
better cache access patterns.
 
Are you planning to use custom slots only for interaction between
vectorized and non-vectorized parts of executor, or also for interaction
inside vectorized part of executor.  It seems that proposed interface
isn't suitable for interaction between two vectorized nodes.  Are you
planning to introduce special kind of vectorized slots or whatever?

I haven't done that, but I think we should split ExecStoreTuple() into
multiple versions, that only work on specific types of tuple
slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other
code dealing with tuples would call ExecStoreHeapTuple().  The relevant
callers already need to know what kind of tuple they're dealing with,
therefore that's not a huge burden.


Please please note that the attached patch is *NOT* meant to be a full
proposal or even a to be reviewed patch, it's just an illustration of
the concepts I'm talking about.

Comments? Better ideas?

As I get, your primary interest in this topic is vectorized execution.  When
passing data from vectorized to non-vectorized parts of executor tuples are
materialized by moving them from vectorized slot to regular slot.
But what do you thing about alternative tuple formats in custom row-oriented
table access methods?  It's likely we should try to minimize tuple format
conversion.  Thus, some executor nodes can switch their tuple format to
match table access method tuple format.  For instance, we're running
aggregate over table with custom table access method.  Then hash
aggregate node may accumulate tuples in the same format as they are
received from table access method.  Does it make sense?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: TupleTableSlot abstraction

From
Alvaro Herrera
Date:
Andres Freund wrote:
> Hi,
> 
> I've recently been discussing with Robert how to abstract
> TupleTableSlots in the light of upcoming developments around abstracting
> storage away.  Besides that aspect I think we also need to make them
> more abstract to later deal with vectorized excution, but I'm more fuzzy
> on the details there.

Hi, is this patch proposed for pg11?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: TupleTableSlot abstraction

From
Andres Freund
Date:
On 2018-03-13 15:18:34 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi,
> > 
> > I've recently been discussing with Robert how to abstract
> > TupleTableSlots in the light of upcoming developments around abstracting
> > storage away.  Besides that aspect I think we also need to make them
> > more abstract to later deal with vectorized excution, but I'm more fuzzy
> > on the details there.

> Hi, is this patch proposed for pg11?

I wish, but I don't see it happening unless I get a time compression
device from somewhere :(

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
On Wed, Feb 21, 2018 at 4:13 AM, Andres Freund <andres@anarazel.de> wrote:

>
> In my POC I turned TupleTableSlot into basically an abstract base class:

Here's set of patches which move this POC forward towards completion,
not yet complete though.

Here's what has changed.
1. Fixed all the regression failures while running make check from
regress and postgres_fdw.
2. Added/modified a bunch of comments about the new TupleTableSlot
structure. Modified comments in existing functions working with
TupleTableSlots as per the new implementation. Added comments
explaining each callback in TupleTableSlotOps, and their
implementations.
3. Improved error messages in some TupleTableTypeSlot specific implementations.
4. Fixed some coding errors in the POC patch (which didn't show up as
a failure in regression)

>
> You might notice that I've renamed a few fields (tts_values -> values,
> tts_isnull -> nulls, tts_nvalid -> nvalid) that haven't actually changed
> in the above structs / the attached patch. I now think that's probably
> not a good idea, unnecessarily breaking more code than necessary.

Done that. All members of TupleTableSlot have their names starting with tts_

>
> After this change functions like slot_getattr are thin inline wrappers:
> +static inline Datum
> +slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
> +{
> +   Assert(attnum > 0);
> +
> +   if (attnum > slot->nvalid)
> +       slot->cb->getsomeattrs(slot, attnum);
> +
> +   Assert(attnum <= slot->nvalid);
> +
> +   *isnull = slot->nulls[attnum - 1];
> +   return slot->values[attnum - 1];
> +}
>
> Note that I've moved system attribute handling out of the slot_getattr
> path - there were a few paths accessing them via slot_getattr, and it's
> a lot of unnecessary branching.

The system attributes are accessed via heap_getsysattr() by passing
that function a tuple from HeapTupleTableSlot or
BufferHeapTupleTableSlot. Those places Assert that the slot being used
is either HeapTupleTableSlot or BufferHeapTupleTableSlot.
slot_getsysattr() which accessed system attributes through a lot is
now removed and replaced with heap_getsysattr as described.

>
> The slightly bigger issue is that several parts of the code currently
> require heaptuples they can scribble upon. E.g. nodeModifyTable.c
> materializes the tuple - those can be solved by using a local slot to
> store the tuple, rather than using the incoming one. In nearly all cases
> we'd still be left with the same number of tuple copy steps, as
> materializing the tuple with copy into the local storage anyway.  A few
> other cases, e.g. intorel_receive(), just can use ExecCopySlotTuple() or
> such instead of materialize.

There is a patch changing callers of ExecMaterializeSlot() to use
ExecGetHeapTupleFromSlot() or ExecSaveSlot(), new functions added by
me. But I think that needs more work. I don't think the new functions
are necessary. We could use ExecCopySlotTuple(), ExecFetchSlotTuple()
in place of ExecGetHeapTupleFromSlot() appropriately. The callers who
just want the tuple and not necessarily a writable one, can use
ExecFetchSlotTuple(). Those who want a writable copy can use
ExecCopySlotTuple(). When the slot is *not* heap or buffer heap tuple
table slot, we materialize a heap tuple every time we want to get the
heap tuple out of the slot. If we are doing this multiple times
without changing the contents of the slot, we will leak memory. I have
TODOs to check whether that's a problem in practice. May be we should
use ExecCopySlotTuple() explicitly while fetching tuples from those
kinds of slot. Same is case while fetching a minimal heap tuple from
non minimal heap tuple slots.

>
>
> The biggest issue I don't have a good answer for, and where I don't like
> Haribabu's solution, is how to deal with system columns like oid, ctid,
> tableoid. Haribabu's patch adds storage of those to the slot, but that
> seems like quite the cludge to me.  It appears to me that the best thing
> would be to require projections for all accesses to system columns, at
> the original level of access. Then we don't need any sort of special
> codepaths dealing with them. But that's not exactly a trivial change and
> has some added complications too (junkfilter type things).
>

I studied the code extensively to check whether there are any cases
where we fetch system columns using (Var nodes with varattno < 0) from
(headers of ) tuples produced by non-scan nodes (e.g. join,
aggregation). That's not the case. We do have instances of INNER,
OUTER Var with varattno < 0, but those are the cases when trigger
accesses OLD and NEW tuples. I have a patch to add tests for the same.

>
> I haven't done that, but I think we should split ExecStoreTuple() into
> multiple versions, that only work on specific types of tuple
> slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other
> code dealing with tuples would call ExecStoreHeapTuple().  The relevant
> callers already need to know what kind of tuple they're dealing with,
> therefore that's not a huge burden.

I thought so too, but haven't done that change right now. Will work on
that. That's in my TODO list.

Here's description of patches.
0001, 0002 are the same patches as submitted in [1].

0003 changes tts_nvalid from int to AttrNumber. int would take at
least 4 bytes whereas the maximum value that tts_nvalid can take is
restricted by AttrNumber which 2 byte long.

0004 removed TupleDescGetSlot() function, which is obsolete, but
retained for extensions. With TupleTableSlot abstraction, I think the
extensions need to change anyway, so removing that function
altogether.

0005 packs various boolean members from TupleTableSlot structure into
a uint16 tts_flags, with each bit for each one bool member. I think we
can remove SHOULDFREEMIN since SHOULDFREE on MinimalHeapTupleTableSlot
can be used in place of SHOULDFREEMIN. I will do that in the next
version. This reduces the size of TupleTableSlot structure and can be
committed as an independent patch. There's also possibility that only
the BufferHeapTupleTableSlot will have ~SHOULDFREE but all other slots
have SHOULDFREE set, so we can get rid of that flag altogether. But I
need to verify that.

0006 This is improved version of your POC patch. I have changed the
tuple table slot type in many places which showed up as regression
failures. Added/modified a bunch of comments, improved error messages,
corrected code at some places, addressed FIXME's at some places. Also
added macros to check the TupleTableSlot's type based on
TupleTableSlotOps set.

0007 adds tts_type member indicating TupleTableSlot type readily
avaialble for debugging. We may or may not commit this patch.

0008 changes the place where we reset expression context used for
processing RETURNING and ON CONFLICT clause. If a query has both the
clauses, we reset the same expression twice while processing same
input/output tuple. This means that some memory allocated while
processing ON CONFLICT and required by RETURNING will get freed before
the later gets at it. This looks like a bug in the existing code, but
I have not been able to create a reproduction for the same. It showed
up as a regression failure with TupleTableAbstraction changes.

0009 Replaces ExecMaterializeSlot as described above. I will work on
this again and examine all the instances of ExecMaterializeSlot as
described above.

0010 Index scan uses reorder queue in some cases. Tuples are stored
reorder queue as heap tuples without any associated buffer. While
returning a tuple from reorder queue, IndexNextWithReorder() should
use TupleTableSlot of type HeapTupleTableSlot. A tuple returned
directly from an index scan has a buffer associated with it, so should
use
TupleTableSlot of type BufferHeapTupleTableSlot. So, use different
kinds of slots for an index scan.


Next steps
1. Address TODO in the code. I have listed some of those above.

2. Right now we are using TupleTableSlotType, an enum, to create slots
of required type. But extensions which want to add their own slot
types won't be able to add a type in that enum. So, they will need to
write their own MakeTupleTableSlot. That function uses the
TupleTableSlotType to set TupleTableSlotOps and calculate the minimum
size of slot. We could change the function to accept TupleTableSlotOps
and the minimum size and it just works for all the extensions. Or it
could just accept TupleTableSlotOps and there's a callback to
calculate minimum memory required for the slot (say based on the tuple
descriptor available).

3. compile with LLVM and fix any compilation and regression errors.

4. We need to do something with the name ExecStoreVirtualSlot(), which
is being (and will be) used for all kinds of TupleTableSlot type.
Right now I don't have any bright ideas.

5. ExecFetch* functions are now one liners, so we could make those
inline and keep those in header file like, say slot_getattr.

6. ExecCopySlot can be a thin wrapper if we add a callback copyslot()
and invoked on the destination slot type.

7. slot_attisnull() deforms a heap/minimal tuple if that status for
given attribute is not available tts_isnull. Should we instead add a
callback attisnull() which can use something like heap_isnull()?

There are TODOs in the patches for 4, 5, 6, 7.

I will continue to work on these steps.

[1] https://www.postgresql.org/message-id/CAFjFpRerUFX=T0nSnCoroXAJMoo-xah9J+pi7+xDUx86PtQmew@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
On Mon, Jun 11, 2018 at 6:13 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
>>
>> The slightly bigger issue is that several parts of the code currently
>> require heaptuples they can scribble upon. E.g. nodeModifyTable.c
>> materializes the tuple - those can be solved by using a local slot to
>> store the tuple, rather than using the incoming one. In nearly all cases
>> we'd still be left with the same number of tuple copy steps, as
>> materializing the tuple with copy into the local storage anyway.  A few
>> other cases, e.g. intorel_receive(), just can use ExecCopySlotTuple() or
>> such instead of materialize.
>
> There is a patch changing callers of ExecMaterializeSlot() to use
> ExecGetHeapTupleFromSlot() or ExecSaveSlot(), new functions added by
> me. But I think that needs more work. I don't think the new functions
> are necessary. We could use ExecCopySlotTuple(), ExecFetchSlotTuple()
> in place of ExecGetHeapTupleFromSlot() appropriately. The callers who
> just want the tuple and not necessarily a writable one, can use
> ExecFetchSlotTuple(). Those who want a writable copy can use
> ExecCopySlotTuple(). When the slot is *not* heap or buffer heap tuple
> table slot, we materialize a heap tuple every time we want to get the
> heap tuple out of the slot. If we are doing this multiple times
> without changing the contents of the slot, we will leak memory. I have
> TODOs to check whether that's a problem in practice. May be we should
> use ExecCopySlotTuple() explicitly while fetching tuples from those
> kinds of slot. Same is case while fetching a minimal heap tuple from
> non minimal heap tuple slots.

In the attached set of patches, the ExecMaterializeSlot(),
ExecFetchSlotTuple() and ExecFetchSlotMinimalTuple() have been
changed.

Here's excerpt of the commit messages included in 0006 in the attached
patch-set.
---
Before this work, the contents of the slot could be in virtual form
(in tts_datums and tts_isnull) and/or be stored as a heap tuple and/or
a minimal tuple.  When the contents were in heap or minimal tuple
form, those tuples need not be "owned" by the slot. When a tuple is
owned by a slot, it's slot's responsibility to free the memory
consumed by the tuple when the tuple is not needed. That typically
happened when the slot got a new tuple to store in it or when the slot
itself was discarded, mostly along-with the other slots in tuple
table.

Then ExecMaterializeSlot() served two purposes: 1. If the contents of
the slot were not "owned" by the slot, or if there was no heap tuple
in the slot, it created a heap tuple which it "owned" from those
contents. 2. It returned the heap tuple, which the slot "owned". This
was typically used when the caller wanted, from the slot, a heap tuple
which it can upon (esp. system attributes like tableOid) and expected
it to be visible from within the slot.

Since a single slot contained a tuple in various forms, it could "own"
multiple forms at a time and return whichever was requested when
ExecFetchSlotTuple() or ExecFetchSlotMinimalTuple() was called.  Those
functions, however, differed from ExecMaterializeSlot() in the sense
that they returned a tuple even when the slot didn't own it.

With this work, every slot contains a tuple in "virtual" form, but
only one of the other two forms. Thus ExecMaterializeSlot() can not
create a heap tuple, and let the slot "own" it, in a slot that can not
contain a heap tuple.  Neither ExecFetchSlotTuple() can return a tuple
from a slot which can not contain a heap tuple without consuming
memory (slots with minimal tuple can do some pointer swizzling and
save memory but that lump of memory is not exactly same as a heap
tuple memory lump.). Since such a heap tuple can not be "owned" by the
slot, the memory consumed by the tuple needs to be freed by the
caller. Similarly for ExecFetchSlotMinimalTuple().

Hence, in the new scheme the meaning of ExecMaterializeSlot(),
ExecFetchSlotTuple() and ExecFetchSlotMinimalTuple() needs to change.

ExecMaterializeSlot()
---------------------

ExecMaterializeSlot() should save the contents of the slot into its
native tuple format, whichever it is, so that it "owns" the tuple.

ExecFetchSlotTuple()
--------------------

ExecFetchSlotTuple() should just return the heap tuple when the
underlying slot can "own" a heap tuple and error out when slot can not
contain the requested format e.g. when ExecFetchSlotTuple() is called
on a "virtual" or "minimal" tuple slot.  This is in-line with what we
are doing with ExecStore* functions. The earlier callers of
ExecMaterializeSlot(), which expected a heap tuple to be returned,
require to call ExecMaterializeSlot() (no return value) followed by
ExecFetchSlotTuple(). Instead of that ExecFetchSlotTuple() is changed
to accept a second argument "materialize". When it's true, the
returned heap tuple is materialized and "owned" by the slot. When it's
false, the returned heap tuple may not be "owned" by the slot. All the
callers of ExecFetchSlotTuple() can be managed to pass a
HeapTupleTableSlot or a BufferHeapTupleTableSlot. Instead of relying
on the TupleTableSlotType (which may be expanded later for user
defined TupleTableSlots) we rely on the presence get_heap_tuple()
callback in the given slot. That callback is expected to return a heap
tuple as is from the slot.

In a few cases, ExecFetchSlotTuple() is called with a slot which is
filled by tuplestore_gettupleslot(). We use a separate tuple store to
store foreign table tuples for processing AFTER triggers and for
storing tuples returned by a function. They are stored in the tuple
store as minimal tuples. In both the cases, the minimal tuples are
converted to heap tuple (using earlier version of
ExecFetchSlotTuple()) before they are processed further after
obtaining those from a tuple store using tuplestore_gettupleslot().
With the tuple table store abstraction a minimal tuple can be stored
in a MinimalTupleTableSlot. Thus we have two options

1. Pass a MinimalTupleTableSlot to tuplestore_gettupleslot() and get a
minimal tuple in this slot. Convert the minimal tuple to a heap tuple
by allocating memory for the converted tuple. We have to explicitly
release the memory allocated for the heap tuple after it's been
processed. When do we do that exactly in AfterTriggerExecute() is not
clear. ExecFetchSlotTupleDatum() could do that right away.

2. Pass a HeapTupleTableSlot to tuplestore_gettupleslot() and let it
store the heap tuple crafted from minimal tuple, which can be freed
within the same function, if necessary. The heap tuple gets freed when
the slot is used to store the next tuple to be processed.

The second option looks better since it continues to use slot's
mechanism to "own" and free tuples that it "owns". Hence implemented
the same.

ExecFetchSlotMinimalTuple()
---------------------------

Before this work, a TupleTableSlot could "own" a minimal tuple as
well. Thus ExecFetchSlotMinimalTuple() returned a minimal tuple after
constructing and "owning" it if it was not readily available. When the
slot "owned" the minimal tuple, the memory consumed by the tuple was
freed when a new tuple was stored in it or the slot was cleared.

With this work, not all TupleTableSlot types can "own" a minimal
tuple.  When fetching a minimal tuple from a slot that can not "own" a
tuple, memory is allocated to construct the minimal tuple, which needs
to be freed explicitly. Hence ExecFetchSlotMinimalTuple() is modified
to flag the caller whether it should free the memory consumed by the
returned minimal tuple.

Right now only a MinimalTupleTableSlot can own a minimal tuple. But we
may change that in future or a user defined TupleTableSlot type (added
through an extension) may be able to "own" a minimal tuple in it.
Hence instead of relying upon TTS_IS_MINIMAL() macro to tell us
whether the TupleTableSlot can "own" a minimal tuple or not, we rely
on the set of callbacks.  A TupleTableSlot which can hold a minimal
tuple implements get_minimal_tuple callback. Other TupleTableSlot
types leave the callback NULL.

The minimal tuple returned by this function is usually copied into a
hash table or a file. But, unlike ExecFetchSlotTuple() it's never
written to. Hence the difference between signatures of the two
functions.

---


>
>>
>> I haven't done that, but I think we should split ExecStoreTuple() into
>> multiple versions, that only work on specific types of tuple
>> slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other
>> code dealing with tuples would call ExecStoreHeapTuple().  The relevant
>> callers already need to know what kind of tuple they're dealing with,
>> therefore that's not a huge burden.
>
> I thought so too, but haven't done that change right now. Will work on
> that. That's in my TODO list.

This is still a TODO.

0001-0005 are same as the previous patch set.

>
> 0006 This is improved version of your POC patch. I have changed the
> tuple table slot type in many places which showed up as regression
> failures. Added/modified a bunch of comments, improved error messages,
> corrected code at some places, addressed FIXME's at some places. Also
> added macros to check the TupleTableSlot's type based on
> TupleTableSlotOps set.

The changes to ExecMaterializeSlot(), ExecFetchSlotTuple() and
ExecFetchSlotMinimalTuple() discussed above are included in this
patch.

0007, 0008 are same as previous patch set.

>
> 0009 Replaces ExecMaterializeSlot as described above. I will work on
> this again and examine all the instances of ExecMaterializeSlot as
> described above.

This is merged with 0006.

>
> 0010 Index scan uses reorder queue in some cases. Tuples are stored
> reorder queue as heap tuples without any associated buffer. While
> returning a tuple from reorder queue, IndexNextWithReorder() should
> use TupleTableSlot of type HeapTupleTableSlot. A tuple returned
> directly from an index scan has a buffer associated with it, so should
> use
> TupleTableSlot of type BufferHeapTupleTableSlot. So, use different
> kinds of slots for an index scan.

This is 0009

>
>
> Next steps
> 1. Address TODO in the code. I have listed some of those above.

There are still a handful of TODOs in the patches. I will work on those next.

>
> 2. Right now we are using TupleTableSlotType, an enum, to create slots
> of required type. But extensions which want to add their own slot
> types won't be able to add a type in that enum. So, they will need to
> write their own MakeTupleTableSlot. That function uses the
> TupleTableSlotType to set TupleTableSlotOps and calculate the minimum
> size of slot. We could change the function to accept TupleTableSlotOps
> and the minimum size and it just works for all the extensions. Or it
> could just accept TupleTableSlotOps and there's a callback to
> calculate minimum memory required for the slot (say based on the tuple
> descriptor available).

This is still a TODO.

>
> 3. compile with LLVM and fix any compilation and regression errors.

When I compiled server with just 0003 applied with LLVM, the
compilation went well, but there was a server crash. That patch
changes type of tts_nvalid from int32 to AttrNumber. I tried debugging
the crash with a debug LLVM build, but couldn't complete the work.
Attached patch attrnumber_llvm_type.patch is my incomplete attempt to
fix that crash. I think, we should make it easy to change the data
types of the members in structures shared by JIT and non-JIT code, may
be automatically create both copies of the code somehow. I will get
back to this after addressing other TODOs.

>
> 4. We need to do something with the name ExecStoreVirtualSlot(), which
> is being (and will be) used for all kinds of TupleTableSlot type.
> Right now I don't have any bright ideas.

Still a TODO.

>
> 5. ExecFetch* functions are now one liners, so we could make those
> inline and keep those in header file like, say slot_getattr.

This is still a TODO.

>
> 6. ExecCopySlot can be a thin wrapper if we add a callback copyslot()
> and invoked on the destination slot type.

This is still a TODO.

>
> 7. slot_attisnull() deforms a heap/minimal tuple if that status for
> given attribute is not available tts_isnull. Should we instead add a
> callback attisnull() which can use something like heap_isnull()?

This is still a TODO.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
On Thu, Jul 5, 2018 at 4:07 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

>>> I haven't done that, but I think we should split ExecStoreTuple() into
>>> multiple versions, that only work on specific types of tuple
>>> slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other
>>> code dealing with tuples would call ExecStoreHeapTuple().  The relevant
>>> callers already need to know what kind of tuple they're dealing with,
>>> therefore that's not a huge burden.
>>
>> I thought so too, but haven't done that change right now. Will work on
>> that. That's in my TODO list.

Done. Added that as a separate patch 0001 since that change adds value
by itself.

0001 in the earlier patch set got committed, 0002 in that patch set is
not required anymore.

0002 - 0004 in this patch set are same as 0003-0005 in the previous patch set.

0005 in this patch set is 0006 in the previous one with a bunch of
TODO's addressed. An important change is virtual tuple slot contents
are never required to be freed when slot is cleared.

0006-0009 are same as 0007 - 0010 in the previous patch set.


>>
>> Next steps
>> 1. Address TODO in the code. I have listed some of those above.
>
> There are still a handful of TODOs in the patches. I will work on those next.

The number of TODOs has reduced, but there are still some that I am working on.

>
>>
>> 2. Right now we are using TupleTableSlotType, an enum, to create slots
>> of required type. But extensions which want to add their own slot
>> types won't be able to add a type in that enum. So, they will need to
>> write their own MakeTupleTableSlot. That function uses the
>> TupleTableSlotType to set TupleTableSlotOps and calculate the minimum
>> size of slot. We could change the function to accept TupleTableSlotOps
>> and the minimum size and it just works for all the extensions. Or it
>> could just accept TupleTableSlotOps and there's a callback to
>> calculate minimum memory required for the slot (say based on the tuple
>> descriptor available).

This is still TODO.

>
>>
>> 3. compile with LLVM and fix any compilation and regression errors.
>
> When I compiled server with just 0003 applied with LLVM, the
> compilation went well, but there was a server crash. That patch
> changes type of tts_nvalid from int32 to AttrNumber. I tried debugging
> the crash with a debug LLVM build, but couldn't complete the work.
> Attached patch attrnumber_llvm_type.patch is my incomplete attempt to
> fix that crash. I think, we should make it easy to change the data
> types of the members in structures shared by JIT and non-JIT code, may
> be automatically create both copies of the code somehow. I will get
> back to this after addressing other TODOs.
>

This is still a TODO

>>
>> 4. We need to do something with the name ExecStoreVirtualSlot(), which
>> is being (and will be) used for all kinds of TupleTableSlot type.
>> Right now I don't have any bright ideas.
>

Done and added as a separate patch 0010. Separate patch so that we can
discard this change, if we don't agree on it.

>>
>> 5. ExecFetch* functions are now one liners, so we could make those
>> inline and keep those in header file like, say slot_getattr.
>

Done.

>
>>
>> 6. ExecCopySlot can be a thin wrapper if we add a callback copyslot()
>> and invoked on the destination slot type.

This is still a TODO

>
>>
>> 7. slot_attisnull() deforms a heap/minimal tuple if that status for
>> given attribute is not available tts_isnull. Should we instead add a
>> callback attisnull() which can use something like heap_isnull()?
>

This is still a TODO.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
On Fri, Jul 13, 2018 at 3:45 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
>
>>>
>>> Next steps
>>> 1. Address TODO in the code. I have listed some of those above.
>>
>> There are still a handful of TODOs in the patches. I will work on those next.
>
> The number of TODOs has reduced, but there are still some that I am working on.

The attached patch set has all the TODOs fixed.

>
>>
>>>
>>> 2. Right now we are using TupleTableSlotType, an enum, to create slots
>>> of required type. But extensions which want to add their own slot
>>> types won't be able to add a type in that enum. So, they will need to
>>> write their own MakeTupleTableSlot. That function uses the
>>> TupleTableSlotType to set TupleTableSlotOps and calculate the minimum
>>> size of slot. We could change the function to accept TupleTableSlotOps
>>> and the minimum size and it just works for all the extensions. Or it
>>> could just accept TupleTableSlotOps and there's a callback to
>>> calculate minimum memory required for the slot (say based on the tuple
>>> descriptor available).
>
> This is still TODO.

Done.

>
>>
>>>
>>> 3. compile with LLVM and fix any compilation and regression errors.
>>
>> When I compiled server with just 0003 applied with LLVM, the
>> compilation went well, but there was a server crash. That patch
>> changes type of tts_nvalid from int32 to AttrNumber. I tried debugging
>> the crash with a debug LLVM build, but couldn't complete the work.
>> Attached patch attrnumber_llvm_type.patch is my incomplete attempt to
>> fix that crash. I think, we should make it easy to change the data
>> types of the members in structures shared by JIT and non-JIT code, may
>> be automatically create both copies of the code somehow. I will get
>> back to this after addressing other TODOs.
>>
>
> This is still a TODO

Still a TODO.

>
>>
>>>
>>> 6. ExecCopySlot can be a thin wrapper if we add a callback copyslot()
>>> and invoked on the destination slot type.
>
> This is still a TODO

Still a TODO.

>
>>
>>>
>>> 7. slot_attisnull() deforms a heap/minimal tuple if that status for
>>> given attribute is not available tts_isnull. Should we instead add a
>>> callback attisnull() which can use something like heap_isnull()?
>>
>
> This is still a TODO.

Done. I also noticed that slot_getattr() optimizes the cases when the
requested attributes is NULL or is missing from a tuple. Given that a
custom TupleTableSlot type can have its own optimizations for the
same, added a new call back getattr() to obtain value of a given
attribute from slot. The callback is called from slot_getattr().

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

Working on rebasing the pluggable storage patch on the current version
of this.

On 2018-07-26 17:09:08 +0530, Ashutosh Bapat wrote:
> Done. I also noticed that slot_getattr() optimizes the cases when the
> requested attributes is NULL or is missing from a tuple. Given that a
> custom TupleTableSlot type can have its own optimizations for the
> same, added a new call back getattr() to obtain value of a given
> attribute from slot. The callback is called from slot_getattr().

I'm quite against this. This is just proliferation of callbacks without
much use.  Why do you think this is helpful?  I think it's much better
to go back to a single callback to deform here.

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
On Sun, Aug 5, 2018 at 3:49 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> Working on rebasing the pluggable storage patch on the current version
> of this.

Thanks. Please let me know if you see any issues.

>
> On 2018-07-26 17:09:08 +0530, Ashutosh Bapat wrote:
>> Done. I also noticed that slot_getattr() optimizes the cases when the
>> requested attributes is NULL or is missing from a tuple. Given that a
>> custom TupleTableSlot type can have its own optimizations for the
>> same, added a new call back getattr() to obtain value of a given
>> attribute from slot. The callback is called from slot_getattr().
>
> I'm quite against this. This is just proliferation of callbacks without
> much use.  Why do you think this is helpful?  I think it's much better
> to go back to a single callback to deform here.
>

I think, I explained why getattr() needs to be a separate callback.
There's a reason why slot_getattr() more code than just calling
slot_getsomeattrs() and return the required one - the cases when the
requested attribute is NULL or is missing from a tuple.  Depending
upon the tuple layout access to a simple attribute can be optimized
without spending cycles to extract attributes prior to that one.
Columnar store (I haven't seen Haribabu's patch), for example, stores
columns from multiple rows together and thus doesn't have a compact
version of tuple. In such cases extracting an individual attribute
directly is far cheaper than extracting all the previous attributes.
Why should we force the storage API to extract all the attributes in
such a case?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

On 2018-08-06 10:08:24 +0530, Ashutosh Bapat wrote:
> I think, I explained why getattr() needs to be a separate callback.
> There's a reason why slot_getattr() more code than just calling
> slot_getsomeattrs() and return the required one - the cases when the
> requested attribute is NULL or is missing from a tuple.  Depending
> upon the tuple layout access to a simple attribute can be optimized
> without spending cycles to extract attributes prior to that one.
> Columnar store (I haven't seen Haribabu's patch), for example, stores
> columns from multiple rows together and thus doesn't have a compact
> version of tuple. In such cases extracting an individual attribute
> directly is far cheaper than extracting all the previous attributes.

OTOH, it means we continually access the null bitmap instead of just
tts_isnull[i].

Your logic about not deforming columns in this case would hold for *any*
deforming of previous columns as well. That's an optimization that we
probably want to implement at some point (e.g. by building a bitmap of
needed columns in the planner), but I don't think we should do it
together with this already large patchset.


> Why should we force the storage API to extract all the attributes in
> such a case?

Because there's no need yet, and it complicates the API without
corresponding benefit.

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
On Mon, Aug 6, 2018 at 10:15 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-08-06 10:08:24 +0530, Ashutosh Bapat wrote:
>> I think, I explained why getattr() needs to be a separate callback.
>> There's a reason why slot_getattr() more code than just calling
>> slot_getsomeattrs() and return the required one - the cases when the
>> requested attribute is NULL or is missing from a tuple.  Depending
>> upon the tuple layout access to a simple attribute can be optimized
>> without spending cycles to extract attributes prior to that one.
>> Columnar store (I haven't seen Haribabu's patch), for example, stores
>> columns from multiple rows together and thus doesn't have a compact
>> version of tuple. In such cases extracting an individual attribute
>> directly is far cheaper than extracting all the previous attributes.
>
> OTOH, it means we continually access the null bitmap instead of just
> tts_isnull[i].

Nope. The slot_getattr implementation in these patches as well as in
the current master return from tts_isnull if the array has the
information.

>
> Your logic about not deforming columns in this case would hold for *any*
> deforming of previous columns as well. That's an optimization that we
> probably want to implement at some point (e.g. by building a bitmap of
> needed columns in the planner), but I don't think we should do it
> together with this already large patchset.

Agree about optimizing using a separate bitmap indicating validity of
a particular value.

>
>
>> Why should we force the storage API to extract all the attributes in
>> such a case?
>
> Because there's no need yet, and it complicates the API without
> corresponding benefit.

The earlier version of slot_getattr() was optimized to access a given
attribute. That must have been done for some reason. Leaving that
optimization aside for this work will cause regression in the cases
where that optimization matters.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
On Thu, Jul 26, 2018 at 5:09 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>>
>>>
>>>>
>>>> 3. compile with LLVM and fix any compilation and regression errors.
>>>
>>> When I compiled server with just 0003 applied with LLVM, the
>>> compilation went well, but there was a server crash. That patch
>>> changes type of tts_nvalid from int32 to AttrNumber. I tried debugging
>>> the crash with a debug LLVM build, but couldn't complete the work.
>>> Attached patch attrnumber_llvm_type.patch is my incomplete attempt to
>>> fix that crash. I think, we should make it easy to change the data
>>> types of the members in structures shared by JIT and non-JIT code, may
>>> be automatically create both copies of the code somehow. I will get
>>> back to this after addressing other TODOs.
>>>

Still a TODO.

>
>>
>>>
>>>>
>>>> 6. ExecCopySlot can be a thin wrapper if we add a callback copyslot()
>>>> and invoked on the destination slot type.
>>

Done.

With this set of patches, make check-world passes clean.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
On Wed, Aug 8, 2018 at 5:07 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Amit Khandekar offlist told me that the previous patch-set doesn't
apply cleanly on the latest head.

PFA patches rebased on 31380bc7c204e7cfa9c9e1c62947909e2b38577c

>>>>>
>>>>> 3. compile with LLVM and fix any compilation and regression errors.
>>>>
>>>> When I compiled server with just 0003 applied with LLVM, the
>>>> compilation went well, but there was a server crash. That patch
>>>> changes type of tts_nvalid from int32 to AttrNumber. I tried debugging
>>>> the crash with a debug LLVM build, but couldn't complete the work.
>>>> Attached patch attrnumber_llvm_type.patch is my incomplete attempt to
>>>> fix that crash. I think, we should make it easy to change the data
>>>> types of the members in structures shared by JIT and non-JIT code, may
>>>> be automatically create both copies of the code somehow. I will get
>>>> back to this after addressing other TODOs.
>>>>
>

still a TODO

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
PFA patches rebased on ee80124811908ef1d4679296c46e36bd8a32b9de.

On Thu, Aug 9, 2018 at 5:12 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>>>>>>
>>>>>> 3. compile with LLVM and fix any compilation and regression errors.
>>>>>
>>>>> When I compiled server with just 0003 applied with LLVM, the
>>>>> compilation went well, but there was a server crash. That patch
>>>>> changes type of tts_nvalid from int32 to AttrNumber.

Done. Fixed the crash in the attached patch set. Thanks Andres for offlist help.

I also tried compiling the whole patch-set with LLVM. I have fixed
compilation errors in llvm_compile_expr().

The compilation errors in slot_compile_deform() requite more work.
That function accesses tuple, slow and offset properties from a
TupleTableSlot. The tuple and offset properties are now moved to
HeapTupleTableSlot so they can't be accessed from a TupleTableSlot
unless it's HeapTupleTableSlot or BufferHeapTupleTableSlot and even
then TupleTableSlot needs to be casted into a HeapTupleTableSlot for
accessing those properties. I do not know yet as to how casting works
in LLVM code. slow property of a TupleTableSlot is now available
through tts_flags. We need to add LLVM code to fetch tts_flags and
perform bit operation on it to get or set slow property. I haven't
found any precedence for LLVM bit operations in postgresql's JIT code.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

On 2018-08-17 12:10:20 +0530, Ashutosh Bapat wrote:
> We need to add LLVM code to fetch tts_flags and
> perform bit operation on it to get or set slow property. I haven't
> found any precedence for LLVM bit operations in postgresql's JIT code.

There are several, look for the infomask accesses in
slot_compiler_deform.

I'll try to do the adaption later today.

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Andres Freund
Date:
On 2018-08-17 01:07:06 -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-08-17 12:10:20 +0530, Ashutosh Bapat wrote:
> > We need to add LLVM code to fetch tts_flags and
> > perform bit operation on it to get or set slow property. I haven't
> > found any precedence for LLVM bit operations in postgresql's JIT code.
> 
> There are several, look for the infomask accesses in
> slot_compiler_deform.
> 
> I'll try to do the adaption later today.

Attached is a series of patches doing so.  The previous implementation
of sysvar accesses wasn't actually working - the slot argument was
uninitialized.

I also noticed an independent issue in your changes to
ExecInitScanTupleSlot(): You can't assume that the plan belonging to the
ScanState have a Scan node in their plan. Look e.g. at Material, Sort
etc. So currently your scanrelid access is often just uninitialized
data.

Greetings,

Andres Freund

Attachment

Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
On Sat, Aug 18, 2018 at 8:53 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-08-17 01:07:06 -0700, Andres Freund wrote:
>> Hi,
>>
>> On 2018-08-17 12:10:20 +0530, Ashutosh Bapat wrote:
>> > We need to add LLVM code to fetch tts_flags and
>> > perform bit operation on it to get or set slow property. I haven't
>> > found any precedence for LLVM bit operations in postgresql's JIT code.
>>
>> There are several, look for the infomask accesses in
>> slot_compiler_deform.
>>
>> I'll try to do the adaption later today.
>
> Attached is a series of patches doing so.  The previous implementation
> of sysvar accesses wasn't actually working - the slot argument was
> uninitialized.

Sorry for that. I couldn't test it since the code wasn't compiling.
The changes to slot_compile_deform changes you provided in the patch
fix the compilation errors. Thanks for those changes.

>
> I also noticed an independent issue in your changes to
> ExecInitScanTupleSlot(): You can't assume that the plan belonging to the
> ScanState have a Scan node in their plan. Look e.g. at Material, Sort
> etc. So currently your scanrelid access is often just uninitialized
> data.

I have incorporated changes in your patches into the relevant patches
in the updated patch-set. With this patch-set make check-world passes
for me.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: TupleTableSlot abstraction

From
Andres Freund
Date:
On 2018-08-20 17:51:38 +0530, Ashutosh Bapat wrote:
> > I also noticed an independent issue in your changes to
> > ExecInitScanTupleSlot(): You can't assume that the plan belonging to the
> > ScanState have a Scan node in their plan. Look e.g. at Material, Sort
> > etc. So currently your scanrelid access is often just uninitialized
> > data.
> 
> I have incorporated changes in your patches into the relevant patches
> in the updated patch-set. With this patch-set make check-world passes
> for me.

Have you addressed the issue I commented on above?

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
On Mon, Aug 20, 2018 at 5:58 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-08-20 17:51:38 +0530, Ashutosh Bapat wrote:
>> > I also noticed an independent issue in your changes to
>> > ExecInitScanTupleSlot(): You can't assume that the plan belonging to the
>> > ScanState have a Scan node in their plan. Look e.g. at Material, Sort
>> > etc. So currently your scanrelid access is often just uninitialized
>> > data.
>>
>> I have incorporated changes in your patches into the relevant patches
>> in the updated patch-set. With this patch-set make check-world passes
>> for me.
>
> Have you addressed the issue I commented on above?

Sorry, forgot about that. Here's the patch set with that addressed.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

On 2018-08-20 19:51:33 +0530, Ashutosh Bapat wrote:
> Sorry, forgot about that. Here's the patch set with that addressed.

Btw, you attach files as tar.zip, but they're actually gzip
compressed...


> From 838a463646a048b3dccff95079a514fdc86effb3 Mon Sep 17 00:00:00 2001
> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
> Date: Mon, 13 Aug 2018 11:27:57 +0530
> Subject: [PATCH 01/11] Split ExecStoreTuple into ExecStoreHeapTuple and
>  ExecStoreBufferHeapTuple
> 
> ExecStoreTuple() accepts a heap tuple from a buffer or constructed
> on-the-fly. In the first case the caller passed a valid buffer and in
> the later case it passes InvalidBuffer. In the first case,
> ExecStoreTuple() pins the given buffer and in the later case it
> records shouldFree flag. The function has some extra checks to
> differentiate between the two cases.  The usecases never overlap thus
> spending extra cycles in checks is useless. Hence separate these
> usecases into separate functions ExecStoreHeapTuple() to store
> on-the-fly tuple and ExecStoreBufferHeapTuple() to store an on-disk
> tuple from a buffer. This allows to shave some extra cycles while
> storing a tuple in the slot.

It doesn't *yet* allow shaving extra cycles, no?

> *     SLOT ACCESSORS
>   *        ExecSetSlotDescriptor    - set a slot's tuple descriptor
> - *        ExecStoreTuple            - store a physical tuple in the slot
> + *        ExecStoreHeapTuple        - store an on-the-fly heap tuple in the slot
> + *        ExecStoreBufferHeapTuple - store an on-disk heap tuple in the slot
>   *        ExecStoreMinimalTuple    - store a minimal physical tuple in the slot
>   *        ExecClearTuple            - clear contents of a slot
>   *        ExecStoreVirtualTuple    - mark slot as containing a virtual
>   *        tuple

I'd advocate for a separate patch ripping these out, they're almost
always out of date.


>  /* --------------------------------
> - *        ExecStoreTuple
> + *        ExecStoreHeapTuple
>   *
> - *        This function is used to store a physical tuple into a specified
> + *        This function is used to store an on-the-fly physical tuple into a specified
>   *        slot in the tuple table.
>   *
>   *        tuple:    tuple to store
>   *        slot:    slot to store it in
> - *        buffer: disk buffer if tuple is in a disk page, else InvalidBuffer
>   *        shouldFree: true if ExecClearTuple should pfree() the tuple
>   *                    when done with it
>   *
> - * If 'buffer' is not InvalidBuffer, the tuple table code acquires a pin
> - * on the buffer which is held until the slot is cleared, so that the tuple
> - * won't go away on us.
> + * shouldFree is normally set 'true' for tuples constructed on-the-fly.  But it
> + * can be 'false' when the referenced tuple is held in a tuple table slot
> + * belonging to a lower-level executor Proc node.  In this case the lower-level
> + * slot retains ownership and responsibility for eventually releasing the
> + * tuple.  When this method is used, we must be certain that the upper-level
> + * Proc node will lose interest in the tuple sooner than the lower-level one
> + * does!  If you're not certain, copy the lower-level tuple with heap_copytuple
> + * and let the upper-level table slot assume ownership of the copy!
> + *
> + * Return value is just the passed-in slot pointer.
> + * --------------------------------
> + */
> +TupleTableSlot *
> +ExecStoreHeapTuple(HeapTuple tuple,
> +                   TupleTableSlot *slot,
> +                   bool shouldFree)
> +{
> +    /*
> +     * sanity checks
> +     */
> +    Assert(tuple != NULL);
> +    Assert(slot != NULL);
> +    Assert(slot->tts_tupleDescriptor != NULL);
> +
> +    /*
> +     * Free any old physical tuple belonging to the slot.
> +     */
> +    if (slot->tts_shouldFree)
> +        heap_freetuple(slot->tts_tuple);
> +    if (slot->tts_shouldFreeMin)
> +        heap_free_minimal_tuple(slot->tts_mintuple);
> +
> +    /*
> +     * Store the new tuple into the specified slot.
> +     */
> +    slot->tts_isempty = false;
> +    slot->tts_shouldFree = shouldFree;
> +    slot->tts_shouldFreeMin = false;
> +    slot->tts_tuple = tuple;
> +    slot->tts_mintuple = NULL;
> +
> +    /* Mark extracted state invalid */
> +    slot->tts_nvalid = 0;
> +
> +    return slot;
> +}

Uh, there could very well be a buffer previously stored in the slot, no?
This can't currently be applied independently afaict.


> From c4c55bc0d501400ffca2e7393039b2d38660cd2d Mon Sep 17 00:00:00 2001
> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
> Date: Mon, 13 Aug 2018 11:27:57 +0530
> Subject: [PATCH 02/11] tts_nvalid in TupleTableSlot is AttrNumber

> @@ -125,7 +125,7 @@ typedef struct TupleTableSlot
>      MemoryContext tts_mcxt;        /* slot itself is in this context */
>      Buffer        tts_buffer;        /* tuple's buffer, or InvalidBuffer */
>  #define FIELDNO_TUPLETABLESLOT_NVALID 9
> -    int            tts_nvalid;        /* # of valid values in tts_values */
> +    AttrNumber    tts_nvalid;        /* # of valid values in tts_values */
>  #define FIELDNO_TUPLETABLESLOT_VALUES 10
>      Datum       *tts_values;        /* current per-attribute values */
>  #define FIELDNO_TUPLETABLESLOT_ISNULL 11

Shouldn't this be adapting at least a *few* more things, like
slot_getattr's argument?



>   /*
> - * Fill in missing values for a TupleTableSlot.
> - *
> - * This is only exposed because it's needed for JIT compiled tuple
> - * deforming. That exception aside, there should be no callers outside of this
> - * file.
> - */
> -void
> -slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum)
> -{
> -    AttrMissing *attrmiss = NULL;
> -    int            missattnum;
> -
> -    if (slot->tts_tupleDescriptor->constr)
> -        attrmiss = slot->tts_tupleDescriptor->constr->missing;
> -
> -    if (!attrmiss)
> -    {
> -        /* no missing values array at all, so just fill everything in as NULL */
> -        memset(slot->tts_values + startAttNum, 0,
> -               (lastAttNum - startAttNum) * sizeof(Datum));
> -        memset(slot->tts_isnull + startAttNum, 1,
> -               (lastAttNum - startAttNum) * sizeof(bool));
> -    }
> -    else
> -    {
> -        /* if there is a missing values array we must process them one by one */
> -        for (missattnum = startAttNum;
> -             missattnum < lastAttNum;
> -             missattnum++)
> -        {
> -            slot->tts_values[missattnum] = attrmiss[missattnum].am_value;
> -            slot->tts_isnull[missattnum] = !attrmiss[missattnum].am_present;
> -        }
> -    }
> -}

I would split out these moves into a separate commit, they are trivially
committable separately. The commit's pretty big already, and that'd make
it easier to see the actual differences.


> -
> -/*
>   * heap_compute_data_size
>   *        Determine size of the data area of a tuple to be constructed
>   */
> @@ -1407,10 +1370,9 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
>   *        re-computing information about previously extracted attributes.
>   *        slot->tts_nvalid is the number of attributes already extracted.
>   */
> -static void
> -slot_deform_tuple(TupleTableSlot *slot, int natts)
> +void
> +slot_deform_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp, int natts)
>  {

This should be renamed to include "heap" in the name, as it's not going
to be usable for, say, zheap.


> -/*
> - * slot_getsysattr
> - *        This function fetches a system attribute of the slot's current tuple.
> - *        Unlike slot_getattr, if the slot does not contain system attributes,
> - *        this will return false (with a NULL attribute value) instead of
> - *        throwing an error.
> - */
> -bool
> -slot_getsysattr(TupleTableSlot *slot, int attnum,
> -                Datum *value, bool *isnull)
> -{
> -    HeapTuple    tuple = slot->tts_tuple;
> -
> -    Assert(attnum < 0);            /* else caller error */
> -    if (tuple == NULL ||
> -        tuple == &(slot->tts_minhdr))
> -    {
> -        /* No physical tuple, or minimal tuple, so fail */
> -        *value = (Datum) 0;
> -        *isnull = true;
> -        return false;
> -    }
> -    *value = heap_getsysattr(tuple, attnum, slot->tts_tupleDescriptor, isnull);
> -    return true;
> -}

I think I was wrong at saying that we should remove this. I think you
were right that it should become a callback...


> +/*
> + * This is a function used by all getattr() callbacks which deal with a heap
> + * tuple or some tuple format which can be represented as a heap tuple e.g. a
> + * minimal tuple.
> + *
> + * heap_getattr considers any attnum beyond the attributes available in the
> + * tuple as NULL. This function however returns the values of missing
> + * attributes from the tuple descriptor in that case. Also this function does
> + * not support extracting system attributes.
> + *
> + * If the attribute needs to be fetched from the tuple, the function fills in
> + * tts_values and tts_isnull arrays upto the required attnum.
> + */
> +Datum
> +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
> +                        int attnum, bool *isnull)
> +{
> +    HeapTupleHeader tup = tuple->t_data;
> +    Assert(slot->tts_nvalid < attnum);
> +
> +    Assert(attnum > 0);
> +
> +    if (attnum > HeapTupleHeaderGetNatts(tup))
> +        return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull);
> +
> +    /*
> +     * check if target attribute is null: no point in groveling through tuple
> +     */
> +    if (HeapTupleHasNulls(tuple) && att_isnull(attnum - 1, tup->t_bits))
> +    {
> +        *isnull = true;
> +        return (Datum) 0;
> +    }

I still think this is an optimization with a negative benefit,
especially as it requires an extra callback. We should just rely on
slot_deform_tuple and then access that. That'll also just access the
null bitmap for the relevant column, and it'll make successive accesses
cheaper.

> @@ -2883,7 +2885,7 @@ CopyFrom(CopyState cstate)
>              if (slot == NULL)    /* "do nothing" */
>                  skip_tuple = true;
>              else                /* trigger might have changed tuple */
> -                tuple = ExecMaterializeSlot(slot);
> +                tuple = ExecFetchSlotTuple(slot, true);
>          }

Could we do the Materialize vs Fetch vs Copy change separately?


> diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
> index e1eb7c3..9957c70 100644
> --- a/src/backend/commands/matview.c
> +++ b/src/backend/commands/matview.c
> @@ -484,7 +484,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
>       * get the heap tuple out of the tuple table slot, making sure we have a
>       * writable copy
>       */
> -    tuple = ExecMaterializeSlot(slot);
> +    tuple = ExecCopySlotTuple(slot);
>  
>      heap_insert(myState->transientrel,
>                  tuple,
> @@ -494,6 +494,9 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
>  
>      /* We know this is a newly created relation, so there are no indexes */
>  
> +    /* Free the copied tuple. */
> +    heap_freetuple(tuple);
> +
>      return true;
>  }

This'll potentially increase a fair amount of extra allocation overhead,
no?


> diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
> index 9d6e25a..1b4e726 100644
> --- a/src/backend/executor/execExprInterp.c
> +++ b/src/backend/executor/execExprInterp.c
> @@ -490,54 +490,21 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
>  
>          EEO_CASE(EEOP_INNER_SYSVAR)
>          {
> -            int            attnum = op->d.var.attnum;
> -            Datum        d;
> -
> -            /* these asserts must match defenses in slot_getattr */
> -            Assert(innerslot->tts_tuple != NULL);
> -            Assert(innerslot->tts_tuple != &(innerslot->tts_minhdr));
> -
> -            /* heap_getsysattr has sufficient defenses against bad attnums */
> -            d = heap_getsysattr(innerslot->tts_tuple, attnum,
> -                                innerslot->tts_tupleDescriptor,
> -                                op->resnull);
> -            *op->resvalue = d;
> +            ExecEvalSysVar(state, op, econtext, innerslot);

Please split this out into a separate patch.


> +/*
> + * TupleTableSlotOps implementation for BufferHeapTupleTableSlot.
> + */
> +
> +static void
> +tts_buffer_init(TupleTableSlot *slot)
> +{
> +}

Should rename these to buffer_heap or such.

> +/*
> + * Store the given tuple into the given BufferHeapTupleTableSlot and pin the
> + * given buffer. If the tuple already contained in the slot can be freed free
> + * it.
> + */
> +static void
> +tts_buffer_store_tuple(TupleTableSlot *slot, HeapTuple tuple, Buffer buffer)
> +{
> +    BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
> +
> +    if (IS_TTS_SHOULDFREE(slot))
> +    {
> +        /*
> +         * A heap tuple stored in a BufferHeapTupleTableSlot should have a
> +         * buffer associated with it, unless it's materialized.
> +         */
> +        Assert(!BufferIsValid(bslot->buffer));
> +
> +        heap_freetuple(bslot->base.tuple);
> +        RESET_TTS_SHOULDFREE(slot);
> +    }
> +
> +    RESET_TTS_EMPTY(slot);
> +    slot->tts_nvalid = 0;
> +    bslot->base.tuple = tuple;
> +    bslot->base.off = 0;
> +
> +    /*
> +     * If tuple is on a disk page, keep the page pinned as long as we hold a
> +     * pointer into it.  We assume the caller already has such a pin.
> +     *
> +     * This is coded to optimize the case where the slot previously held a
> +     * tuple on the same disk page: in that case releasing and re-acquiring
> +     * the pin is a waste of cycles.  This is a common situation during
> +     * seqscans, so it's worth troubling over.
> +     */
> +    if (bslot->buffer != buffer)
> +    {
> +        if (BufferIsValid(bslot->buffer))
> +            ReleaseBuffer(bslot->buffer);
> +        bslot->buffer = buffer;
> +        IncrBufferRefCount(buffer);
> +    }
> +}

This needs to also support storing a non-buffer tuple, I think.


> +/*
> + * TupleTableSlotOps for each of TupleTableSlotTypes. These are used to
> + * identify the type of slot.
> + */
> +const TupleTableSlotOps TTSOpsVirtual = {
> +    sizeof(TupleTableSlot),
> +    tts_virtual_init,
> +    tts_virtual_release,
> +    tts_virtual_clear,
> +    tts_virtual_getsomeattrs,
> +    tts_virtual_attisnull,
> +    tts_virtual_getattr,
> +    tts_virtual_materialize,
> +    tts_virtual_copyslot,
> +
> +    /*
> +     * A virtual tuple table slot can not "own" a heap tuple or a minimal
> +     * tuple.
> +     */
> +    NULL,
> +    NULL,
> +    tts_virtual_copy_heap_tuple,
> +    tts_virtual_copy_minimal_tuple,
> +};

As we're now going to require C99, could you convert these into
designated initializer style (i.e. .init = tts_heap_init etc)?


> @@ -353,25 +1285,9 @@ ExecStoreHeapTuple(HeapTuple tuple,
>      Assert(slot != NULL);
>      Assert(slot->tts_tupleDescriptor != NULL);
>  
> -    /*
> -     * Free any old physical tuple belonging to the slot.
> -     */
> -    if (IS_TTS_SHOULDFREE(slot))
> -        heap_freetuple(slot->tts_tuple);
> -    if (IS_TTS_SHOULDFREEMIN(slot))
> -        heap_free_minimal_tuple(slot->tts_mintuple);
> -
> -    /*
> -     * Store the new tuple into the specified slot.
> -     */
> -    RESET_TTS_EMPTY(slot);
> -    shouldFree ? SET_TTS_SHOULDFREE(slot) : RESET_TTS_SHOULDFREE(slot);
> -    RESET_TTS_SHOULDFREEMIN(slot);
> -    slot->tts_tuple = tuple;
> -    slot->tts_mintuple = NULL;
> -
> -    /* Mark extracted state invalid */
> -    slot->tts_nvalid = 0;
> +    if (!TTS_IS_HEAPTUPLE(slot))
> +        elog(ERROR, "trying to store a heap tuple into wrong type of slot");
> +    tts_heap_store_tuple(slot, tuple, shouldFree);
>  
>      return slot;
>  }

This should allow for buffer tuples too.


> @@ -983,9 +1595,10 @@ ExecInitExtraTupleSlot(EState *estate, TupleDesc tupledesc)
>   * ----------------
>   */
>  TupleTableSlot *
> -ExecInitNullTupleSlot(EState *estate, TupleDesc tupType)
> +ExecInitNullTupleSlot(EState *estate, TupleDesc tupType,
> +                      const TupleTableSlotOps *tts_cb)
>  {
> -    TupleTableSlot *slot = ExecInitExtraTupleSlot(estate, tupType);
> +    TupleTableSlot *slot = ExecInitExtraTupleSlot(estate, tupType, tts_cb);
>  
>      return ExecStoreAllNullTuple(slot);
>  }

It's a bit weird that the type name is *Ops but the param is tts_cb, no?


> @@ -1590,7 +1590,8 @@ ExecHashTableInsert(HashJoinTable hashtable,
>                      TupleTableSlot *slot,
>                      uint32 hashvalue)
>  {
> -    MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
> +    bool        shouldFree;
> +    MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree);
>      int            bucketno;
>      int            batchno;
>  
> @@ -1664,6 +1665,9 @@ ExecHashTableInsert(HashJoinTable hashtable,
>                                hashvalue,
>                                &hashtable->innerBatchFile[batchno]);
>      }
> +
> +    if (shouldFree)
> +        heap_free_minimal_tuple(tuple);
>  }

Hm, how about splitting these out?


> @@ -277,10 +277,32 @@ ExecInsert(ModifyTableState *mtstate,
>      OnConflictAction onconflict = node->onConflictAction;
>  
>      /*
> -     * get the heap tuple out of the tuple table slot, making sure we have a
> -     * writable copy
> +     * Get the heap tuple out of the tuple table slot, making sure we have a
> +     * writable copy.
> +     *
> +     * If the slot can contain a heap tuple, materialize the tuple within the
> +     * slot itself so that the slot "owns" it and any changes to the tuple
> +     * reflect in the slot as well.
> +     *
> +     * Otherwise, store the copy of the heap tuple in es_dml_input_tuple_slot, which
> +     * is assumed to be able to hold a heap tuple, so that repeated requests
> +     * for heap tuple in the following code will not create multiple copies and
> +     * leak memory. Also keeping the tuple in the slot makes sure that it will
> +     * be freed when it's no more needed either because a trigger modified it
> +     * or when we are done processing it.
>       */
> -    tuple = ExecMaterializeSlot(slot);
> +    if (!(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)))
> +    {
> +        TupleTableSlot *es_slot = estate->es_dml_input_tuple_slot;
> +
> +        Assert(es_slot && TTS_IS_HEAPTUPLE(es_slot));
> +        if (es_slot->tts_tupleDescriptor != slot->tts_tupleDescriptor)
> +            ExecSetSlotDescriptor(es_slot, slot->tts_tupleDescriptor);
> +        ExecCopySlot(es_slot, slot);
> +        slot = es_slot;
> +    }
> +
> +    tuple = ExecFetchSlotTuple(slot, true);

I strongly dislike this.  Could you look at my pluggable storage tree
and see whether you could do something similar here?


> @@ -2402,8 +2454,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
>           */
>          mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
>  
> -        /* Set up a slot for the output of the RETURNING projection(s) */
> -        ExecInitResultTupleSlotTL(estate, &mtstate->ps);
> +        /*
> +         * Set up a slot for the output of the RETURNING projection(s).
> +         * ExecDelete() requies the contents of the slot to be
> +         * saved/materialized, so use heap tuple table slot for a DELETE.
> +         * Otherwise a virtual tuple table slot suffices.
> +         */
> +        ExecInitResultTupleSlotTL(estate, &mtstate->ps,
> +                                  operation == CMD_DELETE ?
> +                                  &TTSOpsHeapTuple : &TTSOpsVirtual);
>          slot = mtstate->ps.ps_ResultTupleSlot;

I'm not clear on why this this the case?

>          /* Need a
>         diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
> index ecdbe7f..ea2858b 100644
> --- a/src/backend/executor/tqueue.c
> +++ b/src/backend/executor/tqueue.c
> @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
>      TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
>      HeapTuple    tuple;
>      shm_mq_result result;
> +    bool        tuple_copied = false;
> +
> +    /* Get the tuple out of slot, if necessary converting the slot's contents
> +     * into a heap tuple by copying. In the later case we need to free the copy.
> +     */
> +    if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))
> +    {
> +        tuple = ExecFetchSlotTuple(slot, true);
> +        tuple_copied = false;
> +    }
> +    else
> +    {
> +        tuple = ExecCopySlotTuple(slot);
> +        tuple_copied = true;
> +    }

To me needing this if() here is a bad sign, I think we want a
ExecFetchSlotTuple* like api with a bool *shouldFree arg, like you did
for minimal tuples instead.

> diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
> index 66cc5c3..b44438b 100644
> --- a/src/backend/tcop/pquery.c
> +++ b/src/backend/tcop/pquery.c
> @@ -1071,7 +1071,7 @@ RunFromStore(Portal portal, ScanDirection direction, uint64 count,
>      uint64        current_tuple_count = 0;
>      TupleTableSlot *slot;
>  
> -    slot = MakeSingleTupleTableSlot(portal->tupDesc);
> +    slot = MakeSingleTupleTableSlot(portal->tupDesc, &TTSOpsMinimalTuple);
>  
>      dest->rStartup(dest, CMD_SELECT, portal->tupDesc);

These fairly rote changes make it really hard to see the actual meat of
the changes in the patch. I think one way to address that would be to
introduce stub &TTSOps* variables, add them to MakeSingleTupleTableSlot
etc, but still have them return the "old style" slots.  Then that can be
reviewed separately.

> @@ -1375,9 +1376,9 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
>       * previous row available for comparisons.  This is accomplished by
>       * swapping the slot pointer variables after each row.
>       */
> -    extraslot = MakeSingleTupleTableSlot(osastate->qstate->tupdesc);
> +    extraslot = MakeSingleTupleTableSlot(osastate->qstate->tupdesc,
> +                                         &TTSOpsMinimalTuple);
>      slot2 = extraslot;
> -
>      /* iterate till we find the hypothetical row */
>      while (tuplesort_gettupleslot(osastate->sortstate, true, true, slot,
>                                    &abbrevVal))

superflous change.




> diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
> index 5560a3e..ba98908 100644
> --- a/src/backend/utils/sort/tuplestore.c
> +++ b/src/backend/utils/sort/tuplestore.c
> @@ -1073,6 +1073,13 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
>   * pointer to a tuple held within the tuplestore.  The latter is more
>   * efficient but the slot contents may be corrupted if additional writes to
>   * the tuplestore occur.  (If using tuplestore_trim, see comments therein.)
> + *
> + * If the given slot can not contain a minimal tuple, the given minimal tuple
> + * is converted into the form that the given slot can contain. Irrespective of
> + * the value of copy, that conversion might need to create a copy. If
> + * should_free is set to true by tuplestore_gettuple(), the minimal tuple is
> + * freed after the conversion, if necessary. Right now, the function only
> + * supports slots of type HeapTupleTableSlot, other than MinimalTupleTableSlot.
>   */
>  bool
>  tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
> @@ -1085,12 +1092,25 @@ tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
>  
>      if (tuple)
>      {
> -        if (copy && !should_free)
> +        if (TTS_IS_MINIMALTUPLE(slot))
>          {
> -            tuple = heap_copy_minimal_tuple(tuple);
> +            if (copy && !should_free)
> +            {
> +                tuple = heap_copy_minimal_tuple(tuple);
> +                should_free = true;
> +            }
> +            ExecStoreMinimalTuple(tuple, slot, should_free);
> +        }
> +        else if (TTS_IS_HEAPTUPLE(slot))
> +        {
> +            HeapTuple htup = heap_tuple_from_minimal_tuple(tuple);
> +
> +            if (should_free)
> +                heap_free_minimal_tuple(tuple);
>              should_free = true;
> +
> +            ExecStoreHeapTuple(htup, slot, should_free);
>          }
> -        ExecStoreMinimalTuple(tuple, slot, should_free);
>          return true;
>      }
>      else

Why is this a good idea? Shouldn't the caller always have a minimal slot
here? This is problematic, because it means this'd need adapting for
every new slot type, which'd be kinda against the idea of the whole
thing...


> +#define TTS_IS_VIRTUAL(slot) ((slot)->tts_cb == &TTSOpsVirtual)
> +#define TTS_IS_HEAPTUPLE(slot) ((slot)->tts_cb == &TTSOpsHeapTuple)
> +#define TTS_IS_MINIMALTUPLE(slot) ((slot)->tts_cb == &TTSOpsMinimalTuple)
> +#define TTS_IS_BUFFERTUPLE(slot) ((slot)->tts_cb == &TTSOpsBufferTuple)
> +
> +extern Datum ExecFetchSlotTupleDatum(TupleTableSlot *slot);

this is a weird place for the function protoype?


> +/* TupleTableSlotType specific routines */
> +typedef struct TupleTableSlotOps
> +{
> +    /* Minimum size of the slot */
> +    size_t            base_slot_size;
> +
> +    /* Initialization. */
> +    void (*init)(TupleTableSlot *slot);
> +
> +    /* Destruction. */
> +    void (*release)(TupleTableSlot *slot);
> +
> +    /*
> +     * Clear the contents of the slot. Only the contents are expected to be
> +     * cleared and not the tuple descriptor. Typically an implementation of
> +     * this callback should free the memory allocated for the tuple contained
> +     * in the slot.
> +     */
> +    void (*clear)(TupleTableSlot *slot);
> +
> +    /*
> +     * Fill up first natts entries of tts_values and tts_isnull arrays with
> +     * values from the tuple contained in the slot. The function may be called
> +     * with natts more than the number of attributes available in the tuple, in
> +     * which case it should fill up as many entries as the number of available
> +     * attributes. The callback should update tts_nvalid with number of entries
> +     * filled up.
> +     */
> +    void (*getsomeattrs)(TupleTableSlot *slot, int natts);
> +
> +    /*
> +     * Returns true if the attribute given by attnum is NULL, return false
> +     * otherwise. Some slot types may have more efficient methods to return
> +     * NULL-ness of a given attribute compared to checking NULL-ness after
> +     * calling getsomeattrs(). So this is a separate callback. We expect this
> +     * callback to be invoked by slot_attisnull() only. That function returns
> +     * if the information is available readily e.g. in tts_isnull array.  The
> +     * callback need not repeat the same.
> +     */
> +    bool (*attisnull)(TupleTableSlot *slot, int attnum);
> +
> +    /*
> +     * Returns value of the given attribute as a datum and sets isnull to
> +     * false, if it's not NULL. If the attribute is NULL, it sets isnull to
> +     * true. Some slot types may have more efficient methods to return value of
> +     * a given attribute rather than returning the attribute value from
> +     * tts_values and tts_isnull after calling getsomeattrs(). So this is a
> +     * separate callback. We expect this callback to be invoked by
> +     * slot_getattr() only. That function returns if the information is
> +     * available readily e.g. in tts_values and tts_isnull array or can be
> +     * inferred from tuple descriptor.  The callback need not repeat the same.
> +     */
> +    Datum (*getattr)(TupleTableSlot *slot, int attnum, bool *isnull);

These two really show go.


> +/* virtual or base type */
> +struct TupleTableSlot
>  {
>      NodeTag        type;
>  #define FIELDNO_TUPLETABLESLOT_FLAGS 1
> -    uint16        tts_flags;        /* Boolean states */
> -#define FIELDNO_TUPLETABLESLOT_TUPLE 2
> -    HeapTuple    tts_tuple;        /* physical tuple, or NULL if virtual */
> -#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 3
> -    TupleDesc    tts_tupleDescriptor;    /* slot's tuple descriptor */
> -    MemoryContext tts_mcxt;        /* slot itself is in this context */
> -    Buffer        tts_buffer;        /* tuple's buffer, or InvalidBuffer */
> -#define FIELDNO_TUPLETABLESLOT_NVALID 6
> +    uint16        tts_flags;
> +#define FIELDNO_TUPLETABLESLOT_NVALID 2
>      AttrNumber    tts_nvalid;        /* # of valid values in tts_values */
> -#define FIELDNO_TUPLETABLESLOT_VALUES 7
> +
> +    const TupleTableSlotOps *const tts_cb;
> +#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 4
> +    TupleDesc    tts_tupleDescriptor;    /* slot's tuple descriptor */
> +#define FIELDNO_TUPLETABLESLOT_VALUES 5
>      Datum       *tts_values;        /* current per-attribute values */
> -#define FIELDNO_TUPLETABLESLOT_ISNULL 8
> +#define FIELDNO_TUPLETABLESLOT_ISNULL 6
>      bool       *tts_isnull;        /* current per-attribute isnull flags */
> -    MinimalTuple tts_mintuple;    /* minimal tuple, or NULL if none */
> -    HeapTupleData tts_minhdr;    /* workspace for minimal-tuple-only case */
> -#define FIELDNO_TUPLETABLESLOT_OFF 11
> -    uint32        tts_off;        /* saved state for slot_deform_tuple */
> -} TupleTableSlot;
>  
> -#define TTS_HAS_PHYSICAL_TUPLE(slot)  \
> -    ((slot)->tts_tuple != NULL && (slot)->tts_tuple != &((slot)->tts_minhdr))
> +    /* can we optimize away? */
> +    MemoryContext tts_mcxt;        /* slot itself is in this context */
> +};

the "can we" comment above is pretty clearly wrong, I think (Yes, I made
it...).



> From 84046126d5b8c9d780cb7134048cc717bda462a8 Mon Sep 17 00:00:00 2001
> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
> Date: Mon, 13 Aug 2018 11:27:57 +0530
> Subject: [PATCH 07/11] Reset expression context just after resetting per tuple
>  context in ExecModifyTable().
> 
> Expression context saved in ps_ExprContext for ModifyTable node is
> used to process returning clause and on conflict clause. For every
> processed tuple it's reset in ExecProcessReturning() and
> ExecOnConflictUpdate(). When a query has both RETURNING and ON
> CONFLICT clauses, the reset happens twice and the first one of those
> might reset memory used by the other. For some reason this doesn't
> show up on HEAD, but is apparent when virtual tuple table slots, which
> do not copy the datums in its own memory, are used for tuples returned
> by RETURNING clause.
> 
> This is fix for a query failing in sql/insert_conflict.sql
> 
> insert into insertconflicttest as i values (23, 'Avocado') on conflict (key) do update set fruit = excluded.*::text
>   returning *;
> 
> Ashutosh Bapat, per suggestion by Andres Freund

Doesn't this have to be earlier in the series?


Phew, sorry if some things are daft, I'm kinda jetlagged...

- Andres


Re: TupleTableSlot abstraction

From
Ashutosh Bapat
Date:
On Fri, Aug 24, 2018 at 6:46 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-08-20 19:51:33 +0530, Ashutosh Bapat wrote:
>> Sorry, forgot about that. Here's the patch set with that addressed.
>
> Btw, you attach files as tar.zip, but they're actually gzip
> compressed...

I am using "tar -zcvf" to create the files where -z indicates gzip.
But if I use .gz as extension for some reason it's not able to
recognize the format. So, I just keep .zip. Do you see any problem
with that extension? The attached file has extension .tar.gz.

>
>
>> From 838a463646a048b3dccff95079a514fdc86effb3 Mon Sep 17 00:00:00 2001
>> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
>> Date: Mon, 13 Aug 2018 11:27:57 +0530
>> Subject: [PATCH 01/11] Split ExecStoreTuple into ExecStoreHeapTuple and
>>  ExecStoreBufferHeapTuple
>>
>> ExecStoreTuple() accepts a heap tuple from a buffer or constructed
>> on-the-fly. In the first case the caller passed a valid buffer and in
>> the later case it passes InvalidBuffer. In the first case,
>> ExecStoreTuple() pins the given buffer and in the later case it
>> records shouldFree flag. The function has some extra checks to
>> differentiate between the two cases.  The usecases never overlap thus
>> spending extra cycles in checks is useless. Hence separate these
>> usecases into separate functions ExecStoreHeapTuple() to store
>> on-the-fly tuple and ExecStoreBufferHeapTuple() to store an on-disk
>> tuple from a buffer. This allows to shave some extra cycles while
>> storing a tuple in the slot.
>
> It doesn't *yet* allow shaving extra cycles, no?

By segregating those two functions we are already saving some cycles
by not checking for a valid buffer for an on the fly tuple. That's not
huge but some. More saving will come with the complete abstraction.

>
>> *      SLOT ACCESSORS
>>   *           ExecSetSlotDescriptor   - set a slot's tuple descriptor
>> - *           ExecStoreTuple                  - store a physical tuple in the slot
>> + *           ExecStoreHeapTuple              - store an on-the-fly heap tuple in the slot
>> + *           ExecStoreBufferHeapTuple - store an on-disk heap tuple in the slot
>>   *           ExecStoreMinimalTuple   - store a minimal physical tuple in the slot
>>   *           ExecClearTuple                  - clear contents of a slot
>>   *           ExecStoreVirtualTuple   - mark slot as containing a virtual
>>   *           tuple
>
> I'd advocate for a separate patch ripping these out, they're almost
> always out of date.

Done. Added as 0001 patch.

>
>> + * Return value is just the passed-in slot pointer.
>> + * --------------------------------
>> + */
>> +TupleTableSlot *
>> +ExecStoreHeapTuple(HeapTuple tuple,
>> +                                TupleTableSlot *slot,
>> +                                bool shouldFree)
>> +{
>> +
>> +     return slot;
>> +}
>
> Uh, there could very well be a buffer previously stored in the slot, no?
> This can't currently be applied independently afaict.

Sorry, I missed that when splitting the function. Added code to unpin
any buffer that was pinned for an earlier tuple.

>
>> From c4c55bc0d501400ffca2e7393039b2d38660cd2d Mon Sep 17 00:00:00 2001
>> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
>> Date: Mon, 13 Aug 2018 11:27:57 +0530
>> Subject: [PATCH 02/11] tts_nvalid in TupleTableSlot is AttrNumber
>
>> @@ -125,7 +125,7 @@ typedef struct TupleTableSlot
>>       MemoryContext tts_mcxt;         /* slot itself is in this context */
>>       Buffer          tts_buffer;             /* tuple's buffer, or InvalidBuffer */
>>  #define FIELDNO_TUPLETABLESLOT_NVALID 9
>> -     int                     tts_nvalid;             /* # of valid values in tts_values */
>> +     AttrNumber      tts_nvalid;             /* # of valid values in tts_values */
>>  #define FIELDNO_TUPLETABLESLOT_VALUES 1
>>       Datum      *tts_values;         /* current per-attribute values */
>>  #define FIELDNO_TUPLETABLESLOT_ISNULL 11
>
> Shouldn't this be adapting at least a *few* more things, like
> slot_getattr's argument?

That's not something I had in mind for this patch. There are many
other places where we declare attribute number variables as "int"
instead of "AttrNumber". I don't think that's what we should do in
this patch set. The idea of this patch is to reduce the size of
TupleTableSlot by 2 bytes. Changing signatures of the functions,
though has some readability improvement, doesn't achieve any such
benefit.

Even then I went ahead and change signatures of those functions, but
server built with LLVM crashed. So, I have reverted back all those
changes. I think we should change the JIT code automatically when
function signatures/structure definitions change OR at least
compilation should fail indicating the difference between JIT code and
normal code. Investigating a crash takes a hell lot of time and is not
easy without a custom LLVM build. I doubt if every PostgreSQL
developer would want to do that.

>
>>   /*
>> - * Fill in missing values for a TupleTableSlot.
>> - *
>> - * This is only exposed because it's needed for JIT compiled tuple
>> - * deforming. That exception aside, there should be no callers outside of this
>> - * file.
>> - */
>> -void
>> -slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum)
>> -{
>> -     AttrMissing *attrmiss = NULL;
>> -     int                     missattnum;
>> -
>> -     if (slot->tts_tupleDescriptor->constr)
>> -             attrmiss = slot->tts_tupleDescriptor->constr->missing;
>> -
>> -     if (!attrmiss)
>> -     {
>> -             /* no missing values array at all, so just fill everything in as NULL */
>> -             memset(slot->tts_values + startAttNum, 0,
>> -                        (lastAttNum - startAttNum) * sizeof(Datum));
>> -             memset(slot->tts_isnull + startAttNum, 1,
>> -                        (lastAttNum - startAttNum) * sizeof(bool));
>> -     }
>> -     else
>> -     {
>> -             /* if there is a missing values array we must process them one by one */
>> -             for (missattnum = startAttNum;
>> -                      missattnum < lastAttNum;
>> -                      missattnum++)
>> -             {
>> -                     slot->tts_values[missattnum] = attrmiss[missattnum].am_value;
>> -                     slot->tts_isnull[missattnum] = !attrmiss[missattnum].am_present;
>> -             }
>> -     }
>> -}
>
> I would split out these moves into a separate commit, they are trivially
> committable separately. The commit's pretty big already, and that'd make
> it easier to see the actual differences.

I have included some of those movements in the same patch which
changes the type of tts_nvalid. Functions like slot_attisnull() are
changed to be inline static functions in tuptable.h. Those functions
are now inline since they are just wrapper around slot specific
callbacks. Thus without the TupleTableSlot abstraction patch it's not
possible to mark them "inline and thus move them to a header file e.g.
tuptable.h. For now I have left these kinds of movements in
TupleTableSlot abstraction patch.

>
>
>> -
>> -/*
>>   * heap_compute_data_size
>>   *           Determine size of the data area of a tuple to be constructed
>>   */
>> @@ -1407,10 +1370,9 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
>>   *           re-computing information about previously extracted attributes.
>>   *           slot->tts_nvalid is the number of attributes already extracted.
>>   */
>> -static void
>> -slot_deform_tuple(TupleTableSlot *slot, int natts)
>> +void
>> +slot_deform_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp, int natts)
>>  {
>
> This should be renamed to include "heap" in the name, as it's not going
> to be usable for, say, zheap.

LGTM. Done.

>
>
>> -/*
>> - * slot_getsysattr
>> - *           This function fetches a system attribute of the slot's current tuple.
>> - *           Unlike slot_getattr, if the slot does not contain system attributes,
>> - *           this will return false (with a NULL attribute value) instead of
>> - *           throwing an error.
>> - */
>> -bool
>> -slot_getsysattr(TupleTableSlot *slot, int attnum,
>> -                             Datum *value, bool *isnull)
>> -{
>> -     HeapTuple       tuple = slot->tts_tuple;
>> -
>> -     Assert(attnum < 0);                     /* else caller error */
>> -     if (tuple == NULL ||
>> -             tuple == &(slot->tts_minhdr))
>> -     {
>> -             /* No physical tuple, or minimal tuple, so fail */
>> -             *value = (Datum) 0;
>> -             *isnull = true;
>> -             return false;
>> -     }
>> -     *value = heap_getsysattr(tuple, attnum, slot->tts_tupleDescriptor, isnull);
>> -     return true;
>> -}
>
> I think I was wrong at saying that we should remove this. I think you
> were right that it should become a callback...

We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you
want to reinstantiate those as well? If so, slot_getsysattr() becomes
a wrapper around getsysattr() callback.

>
>
>> +/*
>> + * This is a function used by all getattr() callbacks which deal with a heap
>> + * tuple or some tuple format which can be represented as a heap tuple e.g. a
>> + * minimal tuple.
>> + *
>> + * heap_getattr considers any attnum beyond the attributes available in the
>> + * tuple as NULL. This function however returns the values of missing
>> + * attributes from the tuple descriptor in that case. Also this function does
>> + * not support extracting system attributes.
>> + *
>> + * If the attribute needs to be fetched from the tuple, the function fills in
>> + * tts_values and tts_isnull arrays upto the required attnum.
>> + */
>> +Datum
>> +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
>> +                                             int attnum, bool *isnull)
>> +{
>> +     HeapTupleHeader tup = tuple->t_data;
>> +     Assert(slot->tts_nvalid < attnum);
>> +
>> +     Assert(attnum > 0);
>> +
>> +     if (attnum > HeapTupleHeaderGetNatts(tup))
>> +             return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull);
>> +
>> +     /*
>> +      * check if target attribute is null: no point in groveling through tuple
>> +      */
>> +     if (HeapTupleHasNulls(tuple) && att_isnull(attnum - 1, tup->t_bits))
>> +     {
>> +             *isnull = true;
>> +             return (Datum) 0;
>> +     }
>
> I still think this is an optimization with a negative benefit,
> especially as it requires an extra callback. We should just rely on
> slot_deform_tuple and then access that. That'll also just access the
> null bitmap for the relevant column, and it'll make successive accesses
> cheaper.

I don't understand why we have differing implementations for
slot_attisnull(), slot_getsomeattrs(), slot_getattr() in HEAD. If what
you are saying is true, we should have implemented all the first and
last as a call to slot_getsomeattrs() followed by returing values from
tts_values and tts_isnull. Since this is refactoring work, I am trying
not to change the existing functionality of those functions.

>
>> @@ -2883,7 +2885,7 @@ CopyFrom(CopyState cstate)
>>                       if (slot == NULL)       /* "do nothing" */
>>                               skip_tuple = true;
>>                       else                            /* trigger might have changed tuple */
>> -                             tuple = ExecMaterializeSlot(slot);
>> +                             tuple = ExecFetchSlotTuple(slot, true);
>>               }
>
> Could we do the Materialize vs Fetch vs Copy change separately?
>

Ok. I will do that.

>
>> diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
>> index e1eb7c3..9957c70 100644
>> --- a/src/backend/commands/matview.c
>> +++ b/src/backend/commands/matview.c
>> @@ -484,7 +484,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
>>        * get the heap tuple out of the tuple table slot, making sure we have a
>>        * writable copy
>>        */
>> -     tuple = ExecMaterializeSlot(slot);
>> +     tuple = ExecCopySlotTuple(slot);
>>
>>       heap_insert(myState->transientrel,
>>                               tuple,
>> @@ -494,6 +494,9 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
>>
>>       /* We know this is a newly created relation, so there are no indexes */
>>
>> +     /* Free the copied tuple. */
>> +     heap_freetuple(tuple);
>> +
>>       return true;
>>  }
>
> This'll potentially increase a fair amount of extra allocation overhead,
> no?

Yes. There are two ways to avoid that 1. Using a slot type specific
for the transient rel which is receiving the tuple OR 2. Allocate
memory for the converted tuple into per tuple memory context. The
first case too will have the same problem since we will free one tuple
at at time in ExecClearSlot(). I am inclined to use second option.
Comments?

>
>
>> diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
>> index 9d6e25a..1b4e726 100644
>> --- a/src/backend/executor/execExprInterp.c
>> +++ b/src/backend/executor/execExprInterp.c
>> @@ -490,54 +490,21 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
>>
>>               EEO_CASE(EEOP_INNER_SYSVAR)
>>               {
>> -                     int                     attnum = op->d.var.attnum;
>> -                     Datum           d;
>> -
>> -                     /* these asserts must match defenses in slot_getattr */
>> -                     Assert(innerslot->tts_tuple != NULL);
>> -                     Assert(innerslot->tts_tuple != &(innerslot->tts_minhdr));
>> -
>> -                     /* heap_getsysattr has sufficient defenses against bad attnums */
>> -                     d = heap_getsysattr(innerslot->tts_tuple, attnum,
>> -                                                             innerslot->tts_tupleDescriptor,
>> -                                                             op->resnull);
>> -                     *op->resvalue = d;
>> +                     ExecEvalSysVar(state, op, econtext, innerslot);
>
> Please split this out into a separate patch.
>

Ok. Will do (not in the attached patch-set)

>
>> +/*
>> + * TupleTableSlotOps implementation for BufferHeapTupleTableSlot.
>> + */
>> +
>> +static void
>> +tts_buffer_init(TupleTableSlot *slot)
>> +{
>> +}
>
> Should rename these to buffer_heap or such.

Ok. Done.

>
>> +/*
>> + * Store the given tuple into the given BufferHeapTupleTableSlot and pin the
>> + * given buffer. If the tuple already contained in the slot can be freed free
>> + * it.
>> + */
>> +static void
>> +tts_buffer_store_tuple(TupleTableSlot *slot, HeapTuple tuple, Buffer buffer)
>> +{
>> +     BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
>> +
>> +     if (IS_TTS_SHOULDFREE(slot))
>> +     {
>> +             /*
>> +              * A heap tuple stored in a BufferHeapTupleTableSlot should have a
>> +              * buffer associated with it, unless it's materialized.
>> +              */
>> +             Assert(!BufferIsValid(bslot->buffer));
>> +
>> +             heap_freetuple(bslot->base.tuple);
>> +             RESET_TTS_SHOULDFREE(slot);
>> +     }
>> +
>> +     RESET_TTS_EMPTY(slot);
>> +     slot->tts_nvalid = 0;
>> +     bslot->base.tuple = tuple;
>> +     bslot->base.off = 0;
>> +
>> +     /*
>> +      * If tuple is on a disk page, keep the page pinned as long as we hold a
>> +      * pointer into it.  We assume the caller already has such a pin.
>> +      *
>> +      * This is coded to optimize the case where the slot previously held a
>> +      * tuple on the same disk page: in that case releasing and re-acquiring
>> +      * the pin is a waste of cycles.  This is a common situation during
>> +      * seqscans, so it's worth troubling over.
>> +      */
>> +     if (bslot->buffer != buffer)
>> +     {
>> +             if (BufferIsValid(bslot->buffer))
>> +                     ReleaseBuffer(bslot->buffer);
>> +             bslot->buffer = buffer;
>> +             IncrBufferRefCount(buffer);
>> +     }
>> +}
>
> This needs to also support storing a non-buffer tuple, I think.

I don't see a reason why that's needed. A non-buffer tuple should be
stored in a HeapTupleTableSlot. It might happen that
ExecMaterializeSlot() creates a copy of the buffer tuple in the memory
context of the slot. But there should be a buffer associated with the
heap tuple being stored in BufferHeapTupleTableSlot. Otherwise, why
are we separating those two types.

In fact, as discussed earlier offline, I am of the opinion that buffer
should be an attribute of every tuple table slot which may carry tuple
from a buffer, instead of creating two slot types with and without
buffer.

OTOH, I observe that tts_buffer_store_tuple() is called only from
ExecStoreBufferHeapTuple(). We may want to merge the first into the
later.

>
>> +/*
>> + * TupleTableSlotOps for each of TupleTableSlotTypes. These are used to
>> + * identify the type of slot.
>> + */
>> +const TupleTableSlotOps TTSOpsVirtual = {
>> +     sizeof(TupleTableSlot),
>> +     tts_virtual_init,
>> +     tts_virtual_release,
>> +     tts_virtual_clear,
>> +     tts_virtual_getsomeattrs,
>> +     tts_virtual_attisnull,
>> +     tts_virtual_getattr,
>> +     tts_virtual_materialize,
>> +     tts_virtual_copyslot,
>> +
>> +     /*
>> +      * A virtual tuple table slot can not "own" a heap tuple or a minimal
>> +      * tuple.
>> +      */
>> +     NULL,
>> +     NULL,
>> +     tts_virtual_copy_heap_tuple,
>> +     tts_virtual_copy_minimal_tuple,
>> +};
>
> As we're now going to require C99, could you convert these into
> designated initializer style (i.e. .init = tts_heap_init etc)?

That's a good idea. Done.

>
>> @@ -353,25 +1285,9 @@ ExecStoreHeapTuple(HeapTuple tuple,
>>       Assert(slot != NULL);
>>       Assert(slot->tts_tupleDescriptor != NULL);
>>
>> -     /*
>> -      * Free any old physical tuple belonging to the slot.
>> -      */
>> -     if (IS_TTS_SHOULDFREE(slot))
>> -             heap_freetuple(slot->tts_tuple);
>> -     if (IS_TTS_SHOULDFREEMIN(slot))
>> -             heap_free_minimal_tuple(slot->tts_mintuple);
>> -
>> -     /*
>> -      * Store the new tuple into the specified slot.
>> -      */
>> -     RESET_TTS_EMPTY(slot);
>> -     shouldFree ? SET_TTS_SHOULDFREE(slot) : RESET_TTS_SHOULDFREE(slot);
>> -     RESET_TTS_SHOULDFREEMIN(slot);
>> -     slot->tts_tuple = tuple;
>> -     slot->tts_mintuple = NULL;
>> -
>> -     /* Mark extracted state invalid */
>> -     slot->tts_nvalid = 0;
>> +     if (!TTS_IS_HEAPTUPLE(slot))
>> +             elog(ERROR, "trying to store a heap tuple into wrong type of slot");
>> +     tts_heap_store_tuple(slot, tuple, shouldFree);
>>
>>       return slot;
>>  }
>
> This should allow for buffer tuples too.

Same answer as the answer to your comment on tts_buffer_store_tuple().

>
>> @@ -983,9 +1595,10 @@ ExecInitExtraTupleSlot(EState *estate, TupleDesc tupledesc)
>>   * ----------------
>>   */
>>  TupleTableSlot *
>> -ExecInitNullTupleSlot(EState *estate, TupleDesc tupType)
>> +ExecInitNullTupleSlot(EState *estate, TupleDesc tupType,
>> +                                       const TupleTableSlotOps *tts_cb)
>>  {
>> -     TupleTableSlot *slot = ExecInitExtraTupleSlot(estate, tupType);
>> +     TupleTableSlot *slot = ExecInitExtraTupleSlot(estate, tupType, tts_cb);
>>
>>       return ExecStoreAllNullTuple(slot);
>>  }
>
> It's a bit weird that the type name is *Ops but the param is tts_cb, no?

Your original patch used the same names and I haven't changed that. I
am fine with tts_ops as well. Is that good enough?

>
>
>> @@ -1590,7 +1590,8 @@ ExecHashTableInsert(HashJoinTable hashtable,
>>                                       TupleTableSlot *slot,
>>                                       uint32 hashvalue)
>>  {
>> -     MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
>> +     bool            shouldFree;
>> +     MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree);
>>       int                     bucketno;
>>       int                     batchno;
>>
>> @@ -1664,6 +1665,9 @@ ExecHashTableInsert(HashJoinTable hashtable,
>>                                                         hashvalue,
>>                                                         &hashtable->innerBatchFile[batchno]);
>>       }
>> +
>> +     if (shouldFree)
>> +             heap_free_minimal_tuple(tuple);
>>  }
>
> Hm, how about splitting these out?

Split into a separate patch? It doesn't make sense to add this patch
before 0006 since the slots in those patches can "own" a minimal
tuple. Let's add a patch after 0006 i.e. tuple table abstraction
patch. Will do.

>
>
>> @@ -277,10 +277,32 @@ ExecInsert(ModifyTableState *mtstate,
>>       OnConflictAction onconflict = node->onConflictAction;
>>
>>       /*
>> -      * get the heap tuple out of the tuple table slot, making sure we have a
>> -      * writable copy
>> +      * Get the heap tuple out of the tuple table slot, making sure we have a
>> +      * writable copy.
>> +      *
>> +      * If the slot can contain a heap tuple, materialize the tuple within the
>> +      * slot itself so that the slot "owns" it and any changes to the tuple
>> +      * reflect in the slot as well.
>> +      *
>> +      * Otherwise, store the copy of the heap tuple in es_dml_input_tuple_slot, which
>> +      * is assumed to be able to hold a heap tuple, so that repeated requests
>> +      * for heap tuple in the following code will not create multiple copies and
>> +      * leak memory. Also keeping the tuple in the slot makes sure that it will
>> +      * be freed when it's no more needed either because a trigger modified it
>> +      * or when we are done processing it.
>>        */
>> -     tuple = ExecMaterializeSlot(slot);
>> +     if (!(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)))
>> +     {
>> +             TupleTableSlot *es_slot = estate->es_dml_input_tuple_slot;
>> +
>> +             Assert(es_slot && TTS_IS_HEAPTUPLE(es_slot));
>> +             if (es_slot->tts_tupleDescriptor != slot->tts_tupleDescriptor)
>> +                     ExecSetSlotDescriptor(es_slot, slot->tts_tupleDescriptor);
>> +             ExecCopySlot(es_slot, slot);
>> +             slot = es_slot;
>> +     }
>> +
>> +     tuple = ExecFetchSlotTuple(slot, true);
>
> I strongly dislike this.  Could you look at my pluggable storage tree
> and see whether you could do something similar here?

Can you please provide the URL to your tree?

>
>
>> @@ -2402,8 +2454,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
>>                */
>>               mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
>>
>> -             /* Set up a slot for the output of the RETURNING projection(s) */
>> -             ExecInitResultTupleSlotTL(estate, &mtstate->ps);
>> +             /*
>> +              * Set up a slot for the output of the RETURNING projection(s).
>> +              * ExecDelete() requies the contents of the slot to be
>> +              * saved/materialized, so use heap tuple table slot for a DELETE.
>> +              * Otherwise a virtual tuple table slot suffices.
>> +              */
>> +             ExecInitResultTupleSlotTL(estate, &mtstate->ps,
>> +                                                               operation == CMD_DELETE ?
>> +                                                               &TTSOpsHeapTuple : &TTSOpsVirtual);
>>               slot = mtstate->ps.ps_ResultTupleSlot;
>
> I'm not clear on why this this the case?

Do you mean why does ExecDelete() materializes the tuple? I don't
know. But my guess is materialization ensures valid tuple being
returned in case the deleted tuple's contents are destroyed.

>
>>               /* Need a
>>         diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
>> index ecdbe7f..ea2858b 100644
>> --- a/src/backend/executor/tqueue.c
>> +++ b/src/backend/executor/tqueue.c
>> @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
>>       TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
>>       HeapTuple       tuple;
>>       shm_mq_result result;
>> +     bool            tuple_copied = false;
>> +
>> +     /* Get the tuple out of slot, if necessary converting the slot's contents
>> +      * into a heap tuple by copying. In the later case we need to free the copy.
>> +      */
>> +     if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))
>> +     {
>> +             tuple = ExecFetchSlotTuple(slot, true);
>> +             tuple_copied = false;
>> +     }
>> +     else
>> +     {
>> +             tuple = ExecCopySlotTuple(slot);
>> +             tuple_copied = true;
>> +     }
>
> To me needing this if() here is a bad sign, I think we want a
> ExecFetchSlotTuple* like api with a bool *shouldFree arg, like you did
> for minimal tuples instead.

If we change the existing API, we will need to handle shouldFree flag
everywhere ExecFetchSlotTuple() is being called. Most of the callers
of ExecFetchSlotTuple pass tuple table slots which can "own" a heap
tuple. Your suggestion to use a separate ExecFetchSlotTuple() like API
looks good.. But I am not sure how useful it will be if there's going
to be a single caller of that API.

>
>> diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
>> index 66cc5c3..b44438b 100644
>> --- a/src/backend/tcop/pquery.c
>> +++ b/src/backend/tcop/pquery.c
>> @@ -1071,7 +1071,7 @@ RunFromStore(Portal portal, ScanDirection direction, uint64 count,
>>       uint64          current_tuple_count = 0;
>>       TupleTableSlot *slot;
>>
>> -     slot = MakeSingleTupleTableSlot(portal->tupDesc);
>> +     slot = MakeSingleTupleTableSlot(portal->tupDesc, &TTSOpsMinimalTuple);
>>
>>       dest->rStartup(dest, CMD_SELECT, portal->tupDesc);
>
> These fairly rote changes make it really hard to see the actual meat of
> the changes in the patch. I think one way to address that would be to
> introduce stub &TTSOps* variables, add them to MakeSingleTupleTableSlot
> etc, but still have them return the "old style" slots.  Then that can be
> reviewed separately.

This means that we have to define TTSOps variables before the actual
abstraction patch and then redefine those in the abstraction patch
(0006). Instead we could separate changes to the slot initialization
functions in a separate patch which comes after the abstraction proper
patch. The abstraction proper patch always creates a virtual tuple
table slot. We will need to compile, test and commit both the patches
together but reviewing becomes easier. Also the tuple table
abstraction patch doesn't pass regession on its own, it requires some
patches that follow it. So it should be fine. Comments?

>
>> @@ -1375,9 +1376,9 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
>>        * previous row available for comparisons.  This is accomplished by
>>        * swapping the slot pointer variables after each row.
>>        */
>> -     extraslot = MakeSingleTupleTableSlot(osastate->qstate->tupdesc);
>> +     extraslot = MakeSingleTupleTableSlot(osastate->qstate->tupdesc,
>> +                                                                              &TTSOpsMinimalTuple);
>>       slot2 = extraslot;
>> -
>>       /* iterate till we find the hypothetical row */
>>       while (tuplesort_gettupleslot(osastate->sortstate, true, true, slot,
>>                                                                 &abbrevVal))
>
> superflous change.

Done.

>
>
>
>
>> diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
>> index 5560a3e..ba98908 100644
>> --- a/src/backend/utils/sort/tuplestore.c
>> +++ b/src/backend/utils/sort/tuplestore.c
>> @@ -1073,6 +1073,13 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
>>   * pointer to a tuple held within the tuplestore.  The latter is more
>>   * efficient but the slot contents may be corrupted if additional writes to
>>   * the tuplestore occur.  (If using tuplestore_trim, see comments therein.)
>> + *
>> + * If the given slot can not contain a minimal tuple, the given minimal tuple
>> + * is converted into the form that the given slot can contain. Irrespective of
>> + * the value of copy, that conversion might need to create a copy. If
>> + * should_free is set to true by tuplestore_gettuple(), the minimal tuple is
>> + * freed after the conversion, if necessary. Right now, the function only
>> + * supports slots of type HeapTupleTableSlot, other than MinimalTupleTableSlot.
>>   */
>>  bool
>>  tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
>> @@ -1085,12 +1092,25 @@ tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
>>
>>       if (tuple)
>>       {
>> -             if (copy && !should_free)
>> +             if (TTS_IS_MINIMALTUPLE(slot))
>>               {
>> -                     tuple = heap_copy_minimal_tuple(tuple);
>> +                     if (copy && !should_free)
>> +                     {
>> +                             tuple = heap_copy_minimal_tuple(tuple);
>> +                             should_free = true;
>> +                     }
>> +                     ExecStoreMinimalTuple(tuple, slot, should_free);
>> +             }
>> +             else if (TTS_IS_HEAPTUPLE(slot))
>> +             {
>> +                     HeapTuple htup = heap_tuple_from_minimal_tuple(tuple);
>> +
>> +                     if (should_free)
>> +                             heap_free_minimal_tuple(tuple);
>>                       should_free = true;
>> +
>> +                     ExecStoreHeapTuple(htup, slot, should_free);
>>               }
>> -             ExecStoreMinimalTuple(tuple, slot, should_free);
>>               return true;
>>       }
>>       else
>
> Why is this a good idea? Shouldn't the caller always have a minimal slot
> here?

As I have explained in the commit message, not every caller has a
minimal slot right now. For example, AfterTriggerExecute fetches
foreign tuples from a tuple store as minimal tuples and requires those
to be in the form of heap tuple for trigger processing. We can use two
slots, first a minimal for fetching tuple from tuple store and second
a heap for trigger processing and convert from minimal tuple to heap
tuple using ExecCopySlotTuple(). That increases the number of slots.

> This is problematic, because it means this'd need adapting for
> every new slot type, which'd be kinda against the idea of the whole
> thing...

I don't think we will need to support more slot types in than what's
there right now. But in case we have to we have two more options as
below.

Amit Khandekar suggested offlist to invoke a (new) slot type specific
callback which will convert given minimal tuple into the tuple type of
the slot.

Third option is to write tuplestore_gettupleslot* functions one for
each kind of TupleTableSlot type.

I don't have much preference for one over the other.

>
>> +#define TTS_IS_VIRTUAL(slot) ((slot)->tts_cb == &TTSOpsVirtual)
>> +#define TTS_IS_HEAPTUPLE(slot) ((slot)->tts_cb == &TTSOpsHeapTuple)
>> +#define TTS_IS_MINIMALTUPLE(slot) ((slot)->tts_cb == &TTSOpsMinimalTuple)
>> +#define TTS_IS_BUFFERTUPLE(slot) ((slot)->tts_cb == &TTSOpsBufferTuple)
>> +
>> +extern Datum ExecFetchSlotTupleDatum(TupleTableSlot *slot);
>
> this is a weird place for the function protoype?

Fixed.

>
>
>> +/* TupleTableSlotType specific routines */
>> +typedef struct TupleTableSlotOps
>> +{
>> +     /* Minimum size of the slot */
>> +     size_t                  base_slot_size;
>> +
>> +     /* Initialization. */
>> +     void (*init)(TupleTableSlot *slot);
>> +
>> +     /* Destruction. */
>> +     void (*release)(TupleTableSlot *slot);
>> +
>> +     /*
>> +      * Clear the contents of the slot. Only the contents are expected to be
>> +      * cleared and not the tuple descriptor. Typically an implementation of
>> +      * this callback should free the memory allocated for the tuple contained
>> +      * in the slot.
>> +      */
>> +     void (*clear)(TupleTableSlot *slot);
>> +
>> +     /*
>> +      * Fill up first natts entries of tts_values and tts_isnull arrays with
>> +      * values from the tuple contained in the slot. The function may be called
>> +      * with natts more than the number of attributes available in the tuple, in
>> +      * which case it should fill up as many entries as the number of available
>> +      * attributes. The callback should update tts_nvalid with number of entries
>> +      * filled up.
>> +      */
>> +     void (*getsomeattrs)(TupleTableSlot *slot, int natts);
>> +
>> +     /*
>> +      * Returns true if the attribute given by attnum is NULL, return false
>> +      * otherwise. Some slot types may have more efficient methods to return
>> +      * NULL-ness of a given attribute compared to checking NULL-ness after
>> +      * calling getsomeattrs(). So this is a separate callback. We expect this
>> +      * callback to be invoked by slot_attisnull() only. That function returns
>> +      * if the information is available readily e.g. in tts_isnull array.  The
>> +      * callback need not repeat the same.
>> +      */
>> +     bool (*attisnull)(TupleTableSlot *slot, int attnum);
>> +
>> +     /*
>> +      * Returns value of the given attribute as a datum and sets isnull to
>> +      * false, if it's not NULL. If the attribute is NULL, it sets isnull to
>> +      * true. Some slot types may have more efficient methods to return value of
>> +      * a given attribute rather than returning the attribute value from
>> +      * tts_values and tts_isnull after calling getsomeattrs(). So this is a
>> +      * separate callback. We expect this callback to be invoked by
>> +      * slot_getattr() only. That function returns if the information is
>> +      * available readily e.g. in tts_values and tts_isnull array or can be
>> +      * inferred from tuple descriptor.  The callback need not repeat the same.
>> +      */
>> +     Datum (*getattr)(TupleTableSlot *slot, int attnum, bool *isnull);
>
> These two really show go.

I have explained my position for this while replying to an earlier comment.

>
>
>> +/* virtual or base type */
>> +struct TupleTableSlot
>>  {
>>       NodeTag         type;
>>  #define FIELDNO_TUPLETABLESLOT_FLAGS 1
>> -     uint16          tts_flags;              /* Boolean states */
>> -#define FIELDNO_TUPLETABLESLOT_TUPLE 2
>> -     HeapTuple       tts_tuple;              /* physical tuple, or NULL if virtual */
>> -#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 3
>> -     TupleDesc       tts_tupleDescriptor;    /* slot's tuple descriptor */
>> -     MemoryContext tts_mcxt;         /* slot itself is in this context */
>> -     Buffer          tts_buffer;             /* tuple's buffer, or InvalidBuffer */
>> -#define FIELDNO_TUPLETABLESLOT_NVALID 6
>> +     uint16          tts_flags;
>> +#define FIELDNO_TUPLETABLESLOT_NVALID 2
>>       AttrNumber      tts_nvalid;             /* # of valid values in tts_values */
>> -#define FIELDNO_TUPLETABLESLOT_VALUES 7
>> +
>> +     const TupleTableSlotOps *const tts_cb;
>> +#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 4
>> +     TupleDesc       tts_tupleDescriptor;    /* slot's tuple descriptor */
>> +#define FIELDNO_TUPLETABLESLOT_VALUES 5
>>       Datum      *tts_values;         /* current per-attribute values */
>> -#define FIELDNO_TUPLETABLESLOT_ISNULL 8
>> +#define FIELDNO_TUPLETABLESLOT_ISNULL 6
>>       bool       *tts_isnull;         /* current per-attribute isnull flags */
>> -     MinimalTuple tts_mintuple;      /* minimal tuple, or NULL if none */
>> -     HeapTupleData tts_minhdr;       /* workspace for minimal-tuple-only case */
>> -#define FIELDNO_TUPLETABLESLOT_OFF 11
>> -     uint32          tts_off;                /* saved state for slot_deform_tuple */
>> -} TupleTableSlot;
>>
>> -#define TTS_HAS_PHYSICAL_TUPLE(slot)  \
>> -     ((slot)->tts_tuple != NULL && (slot)->tts_tuple != &((slot)->tts_minhdr))
>> +     /* can we optimize away? */
>> +     MemoryContext tts_mcxt;         /* slot itself is in this context */
>> +};
>
> the "can we" comment above is pretty clearly wrong, I think (Yes, I made
> it...).
>

Removed.

>
>
>> From 84046126d5b8c9d780cb7134048cc717bda462a8 Mon Sep 17 00:00:00 2001
>> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
>> Date: Mon, 13 Aug 2018 11:27:57 +0530
>> Subject: [PATCH 07/11] Reset expression context just after resetting per tuple
>>  context in ExecModifyTable().
>>
>> Expression context saved in ps_ExprContext for ModifyTable node is
>> used to process returning clause and on conflict clause. For every
>> processed tuple it's reset in ExecProcessReturning() and
>> ExecOnConflictUpdate(). When a query has both RETURNING and ON
>> CONFLICT clauses, the reset happens twice and the first one of those
>> might reset memory used by the other. For some reason this doesn't
>> show up on HEAD, but is apparent when virtual tuple table slots, which
>> do not copy the datums in its own memory, are used for tuples returned
>> by RETURNING clause.
>>
>> This is fix for a query failing in sql/insert_conflict.sql
>>
>> insert into insertconflicttest as i values (23, 'Avocado') on conflict (key) do update set fruit = excluded.*::text
>>   returning *;
>>
>> Ashutosh Bapat, per suggestion by Andres Freund
>
> Doesn't this have to be earlier in the series?

I have placed it later since the tuple table slot abstraction work
exposes the bug, which otherwise is not visible. But yes, we can
commit that patch before the tuple table slot abstraction proper work.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: TupleTableSlot abstraction

From
Alvaro Herrera
Date:
Man, how I dislike patches in tarballs.

0002 says:

+ * shouldFree is set 'true' since a tuple stored on a disk page should not be
+ * pfree'd.

Surely you mean 'false' :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: TupleTableSlot abstraction

From
Amit Khandekar
Date:
On 28 August 2018 at 22:43, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Fri, Aug 24, 2018 at 6:46 AM, Andres Freund <andres@anarazel.de> wrote:
>>
>>> -/*
>>> - * slot_getsysattr
>>> - *           This function fetches a system attribute of the slot's current tuple.
>>> - *           Unlike slot_getattr, if the slot does not contain system attributes,
>>> - *           this will return false (with a NULL attribute value) instead of
>>> - *           throwing an error.
>>> - */
>>> -bool
>>> -slot_getsysattr(TupleTableSlot *slot, int attnum,
>>> -                             Datum *value, bool *isnull)
>>> -{
>>> -     HeapTuple       tuple = slot->tts_tuple;
>>> -
>>> -     Assert(attnum < 0);                     /* else caller error */
>>> -     if (tuple == NULL ||
>>> -             tuple == &(slot->tts_minhdr))
>>> -     {
>>> -             /* No physical tuple, or minimal tuple, so fail */
>>> -             *value = (Datum) 0;
>>> -             *isnull = true;
>>> -             return false;
>>> -     }
>>> -     *value = heap_getsysattr(tuple, attnum, slot->tts_tupleDescriptor, isnull);
>>> -     return true;
>>> -}
>>
>> I think I was wrong at saying that we should remove this. I think you
>> were right that it should become a callback...
>
> We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you
> want to reinstantiate those as well? If so, slot_getsysattr() becomes
> a wrapper around getsysattr() callback.

One option is that the getsysattr() callback function returns false if
system attributes are not supported for that slot type. Other option
is that in the not-supported case, the function errors out, meaning
that the caller should be aware that the slot type is the one that
supports system attributes.

I had prepared changes for the first option, but Ashutosh Bapat
offlist made me realize that it's worth considering the 2nd option(
i.e. erroring out).

The only use case where slot_getsysattr() is called right now is
execCurrentOf(), and it currently checks the bool return value of
slot_getsysattr() and prints a user-friendly message if false is
returned. Looking at the code (commit 8f5ac440430ab), it seems that we
want to continue to have a user-friendly message for some unhandled
cases like custom scans. So perhaps for now it's ok to go with the
first option where getsysattr callback returns false for slot types
that don't have system attributes.

>
>>
>>
>>> +/*
>>> + * This is a function used by all getattr() callbacks which deal with a heap
>>> + * tuple or some tuple format which can be represented as a heap tuple e.g. a
>>> + * minimal tuple.
>>> + *
>>> + * heap_getattr considers any attnum beyond the attributes available in the
>>> + * tuple as NULL. This function however returns the values of missing
>>> + * attributes from the tuple descriptor in that case. Also this function does
>>> + * not support extracting system attributes.
>>> + *
>>> + * If the attribute needs to be fetched from the tuple, the function fills in
>>> + * tts_values and tts_isnull arrays upto the required attnum.
>>> + */
>>> +Datum
>>> +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
>>> +                                             int attnum, bool *isnull)
>>> +{
>>> +     HeapTupleHeader tup = tuple->t_data;
>>> +     Assert(slot->tts_nvalid < attnum);
>>> +
>>> +     Assert(attnum > 0);
>>> +
>>> +     if (attnum > HeapTupleHeaderGetNatts(tup))
>>> +             return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull);
>>> +
>>> +     /*
>>> +      * check if target attribute is null: no point in groveling through tuple
>>> +      */
>>> +     if (HeapTupleHasNulls(tuple) && att_isnull(attnum - 1, tup->t_bits))
>>> +     {
>>> +             *isnull = true;
>>> +             return (Datum) 0;
>>> +     }
>>
>> I still think this is an optimization with a negative benefit,
>> especially as it requires an extra callback. We should just rely on
>> slot_deform_tuple and then access that. That'll also just access the
>> null bitmap for the relevant column, and it'll make successive accesses
>> cheaper.
>
> I don't understand why we have differing implementations for
> slot_attisnull(), slot_getsomeattrs(), slot_getattr() in HEAD. If what
> you are saying is true, we should have implemented all the first and
> last as a call to slot_getsomeattrs() followed by returing values from
> tts_values and tts_isnull. Since this is refactoring work, I am trying
> not to change the existing functionality of those functions.

I agree that we should not change the way in which slot_getattr()
finds the attr; i.e. first call att_isnull(), and only then try to
deform the tuple. I mean there must be some good reason that is done
on HEAD. Maybe we can change this separately after investigation, but
not as part of the tuple abstraction patch.

----------

BTW, on HEAD, for dropped attribute slot_getattr() returns null datum,
which hasn't been done in the patch series. That needs to be done.


Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

On 2018-08-31 10:05:05 +0530, Amit Khandekar wrote:
> On 28 August 2018 at 22:43, Ashutosh Bapat
> >> I think I was wrong at saying that we should remove this. I think you
> >> were right that it should become a callback...
> 
> > We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you
> > want to reinstantiate those as well? If so, slot_getsysattr() becomes
> > a wrapper around getsysattr() callback.

Right.

> One option is that the getsysattr() callback function returns false if
> system attributes are not supported for that slot type. Other option
> is that in the not-supported case, the function errors out, meaning
> that the caller should be aware that the slot type is the one that
> supports system attributes.

> I had prepared changes for the first option, but Ashutosh Bapat
> offlist made me realize that it's worth considering the 2nd option(
> i.e. erroring out).

I think we should error out.


> >> I still think this is an optimization with a negative benefit,
> >> especially as it requires an extra callback. We should just rely on
> >> slot_deform_tuple and then access that. That'll also just access the
> >> null bitmap for the relevant column, and it'll make successive accesses
> >> cheaper.
> >
> > I don't understand why we have differing implementations for
> > slot_attisnull(), slot_getsomeattrs(), slot_getattr() in HEAD. If what
> > you are saying is true, we should have implemented all the first and
> > last as a call to slot_getsomeattrs() followed by returing values from
> > tts_values and tts_isnull. Since this is refactoring work, I am trying
> > not to change the existing functionality of those functions.
> 
> I agree that we should not change the way in which slot_getattr()
> finds the attr; i.e. first call att_isnull(), and only then try to
> deform the tuple. I mean there must be some good reason that is done
> on HEAD. Maybe we can change this separately after investigation, but
> not as part of the tuple abstraction patch.

There's really no good reason for the behaviour as it exists on HEAD. It
already can cause worse performance there. The price to pay for
continuing to have an optimization which isn't actually optimizing
anything is way too high if it requires us to have multiple functionally
unnecessary callbacks.

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Amit Khandekar
Date:
On 28 August 2018 at 22:43, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Fri, Aug 24, 2018 at 6:46 AM, Andres Freund <andres@anarazel.de> wrote:
>
>>
>>> @@ -2883,7 +2885,7 @@ CopyFrom(CopyState cstate)
>>>                       if (slot == NULL)       /* "do nothing" */
>>>                               skip_tuple = true;
>>>                       else                            /* trigger might have changed tuple */
>>> -                             tuple = ExecMaterializeSlot(slot);
>>> +                             tuple = ExecFetchSlotTuple(slot, true);
>>>               }
>>
>> Could we do the Materialize vs Fetch vs Copy change separately?
>>
>
> Ok. I will do that.

Ashutosh offlist has shared with me
0006-Rethink-ExecMaterializeSlot-ExecFetchSlotTuple-in-th.patch which
contains the separated changes. The attached tar includes this
additional patch.


>>
>>> @@ -1590,7 +1590,8 @@ ExecHashTableInsert(HashJoinTable hashtable,
>>>                                       TupleTableSlot *slot,
>>>                                       uint32 hashvalue)
>>>  {
>>> -     MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
>>> +     bool            shouldFree;
>>> +     MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree);
>>>       int                     bucketno;
>>>       int                     batchno;
>>>
>>> @@ -1664,6 +1665,9 @@ ExecHashTableInsert(HashJoinTable hashtable,
>>>                                                         hashvalue,
>>>                                                         &hashtable->innerBatchFile[batchno]);
>>>       }
>>> +
>>> +     if (shouldFree)
>>> +             heap_free_minimal_tuple(tuple);
>>>  }
>>
>> Hm, how about splitting these out?
>
> Split into a separate patch? It doesn't make sense to add this patch
> before 0006 since the slots in those patches can "own" a minimal
> tuple. Let's add a patch after 0006 i.e. tuple table abstraction
> patch. Will do.

Ashutosh offlist has shared with me
0009-Rethink-ExecFetchSlotMinimalTuple.patch which contains the above
separated changes. The attached tar includes this additional patch.


On 31 August 2018 at 20:50, Andres Freund <andres@anarazel.de> wrote:
>> On 31 August 2018 at 10:05, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> Hi,
>
> On 2018-08-31 10:05:05 +0530, Amit Khandekar wrote:
>> On 28 August 2018 at 22:43, Ashutosh Bapat
>> >> I think I was wrong at saying that we should remove this. I think you
>> >> were right that it should become a callback...
>>
>> > We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you
>> > want to reinstantiate those as well? If so, slot_getsysattr() becomes
>> > a wrapper around getsysattr() callback.
>
> Right.
>
>> One option is that the getsysattr() callback function returns false if
>> system attributes are not supported for that slot type. Other option
>> is that in the not-supported case, the function errors out, meaning
>> that the caller should be aware that the slot type is the one that
>> supports system attributes.
>
>> I had prepared changes for the first option, but Ashutosh Bapat
>> offlist made me realize that it's worth considering the 2nd option(
>> i.e. erroring out).
>
> I think we should error out.

I did these changes in the existing 0007-Restructure-TupleTableSlot...
patch. This patch now contains changes for the new getsysattr()
callback. I used the 2nd option : erroring out.

On 31 August 2018 at 10:05, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> The only use case where slot_getsysattr() is called right now is
> execCurrentOf(), and it currently checks the bool return value of
> slot_getsysattr() and prints a user-friendly message if false is
> returned. Looking at the code (commit 8f5ac440430ab), it seems that we
> want to continue to have a user-friendly message for some unhandled
> cases like custom scans. So perhaps for now it's ok to go with the
> first option where getsysattr callback returns false for slot types
> that don't have system attributes.

I noticed that the issue for which commit 8f5ac440430ab was done was
that the tid was attempted to be fetched from a tuple that doesn't
support system attribute (virtual tuple), although the report
complained about a non-user-friendly message being given. So similarly
for custom scan, since it is not handled similarly, for now we can
continue to give an "unfriendly" message. The getsysattr() callbacks
will return something like "virtual tuple table slot does not have
system atttributes" if the slot does not support system attributes.


The attached v11 tar has the above set of changes.

Thanks
-Amit Khandekar

Attachment

Re: TupleTableSlot abstraction

From
Andres Freund
Date:
On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> The attached v11 tar has the above set of changes.

- I've pushed 0003 - although that commit seems to have included a lot
  of things unrelated to the commit message. I guess two patches have
  accidentally been merged?  Could you split out the part of the changes
  that was mis-squashed?
- I've pushed an extended version of 0001.
- I've pushed 0002, after some minor polishing
- I've pushed 0004

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> Subject: [PATCH 05/14] Use tts_flags instead of previous bool members
> 
> Pack the boolean members in TupleTableSlot into a 16 bit tts_flags.
> This reduces the size of TupleTableSlot since each bool member takes
> at least a byte on the platforms where bool is defined as a char.
> 
> Ashutosh Bapat and Andres Freund

> +
> +/* true = slot is empty */
> +#define            TTS_ISEMPTY            (1 << 1)
> +#define IS_TTS_EMPTY(slot)    ((slot)->tts_flags & TTS_ISEMPTY)
> +#define SET_TTS_EMPTY(slot) ((slot)->tts_flags |= TTS_ISEMPTY)
> +#define RESET_TTS_EMPTY(slot) ((slot)->tts_flags &= ~TTS_ISEMPTY)

The flag stuff is the right way, but I'm a bit dubious about the
accessor functions.  I can see open-coding the accesses, using the
macros, or potentially going towards using bitfields.

Any comments?

- Andres


Re: TupleTableSlot abstraction

From
Andres Freund
Date:
On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> The attached v11 tar has the above set of changes.


> Subject: [PATCH 07/14] Restructure TupleTableSlot to allow tuples other than
>  HeapTuple

> +
> +/*
> + * This is a function used by all getattr() callbacks which deal with a heap
> + * tuple or some tuple format which can be represented as a heap tuple e.g. a
> + * minimal tuple.
> + *
> + * heap_getattr considers any attnum beyond the attributes available in the
> + * tuple as NULL. This function however returns the values of missing
> + * attributes from the tuple descriptor in that case. Also this function does
> + * not support extracting system attributes.
> + *
> + * If the attribute needs to be fetched from the tuple, the function fills in
> + * tts_values and tts_isnull arrays upto the required attnum.
> + */
> +Datum
> +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
> +                        int attnum, bool
> *isnull)

I'm still *vehemently* opposed to the introduction of this.



> @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo,
>          Datum        iDatum;
>          bool        isNull;
>  
> -        if (keycol != 0)
> +        if (keycol < 0)
> +        {
> +            HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot;
> +
> +            /* Only heap tuples have system attributes. */
> +            Assert(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot));
> +
> +            iDatum = heap_getsysattr(hslot->tuple, keycol,
> +                                     slot->tts_tupleDescriptor,
> +                                     &isNull);
> +        }
> +        else if (keycol != 0)
>          {
>              /*
>               * Plain index column; get the value we need directly from the

This now should access the system column via the slot, right?  There's
other places like this IIRC.



> diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
> index 9d6e25a..1b4e726 100644
> --- a/src/backend/executor/execExprInterp.c
> +++ b/src/backend/executor/execExprInterp.c
> @@ -490,54 +490,21 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
>  
>          EEO_CASE(EEOP_INNER_SYSVAR)
>          {
> -            int            attnum = op->d.var.attnum;
> -            Datum        d;
> -
> -            /* these asserts must match defenses in slot_getattr */
> -            Assert(innerslot->tts_tuple != NULL);
> -            Assert(innerslot->tts_tuple != &(innerslot->tts_minhdr));
> -
> -            /* heap_getsysattr has sufficient defenses against bad attnums */
> -            d = heap_getsysattr(innerslot->tts_tuple, attnum,
> -                                innerslot->tts_tupleDescriptor,
> -                                op->resnull);
> -            *op->resvalue = d;
> +            ExecEvalSysVar(state, op, econtext, innerslot);

These changes should be in a separate commit.


> +const TupleTableSlotOps TTSOpsHeapTuple = {
> +    sizeof(HeapTupleTableSlot),
> +    .init = tts_heap_init,

The first field should also use a named field (same in following cases).


> @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable,    /* tuple table */
>      {
>          TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
>  
> +        slot->tts_cb->release(slot);
>          /* Always release resources and reset the slot to empty */
>          ExecClearTuple(slot);
>          if (slot->tts_tupleDescriptor)
> @@ -240,6 +1076,7 @@ void
>  ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
>  {
>      /* This should match ExecResetTupleTable's processing of one slot */
> +    slot->tts_cb->release(slot);
>      Assert(IsA(slot, TupleTableSlot));
>      ExecClearTuple(slot);
>      if (slot->tts_tupleDescriptor)

ISTM that release should be called *after* clearing the slot.


> @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
>      TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
>      HeapTuple    tuple;
>      shm_mq_result result;
> +    bool        tuple_copied = false;
> +
> +    /* Get the tuple out of slot, if necessary converting the slot's contents
> +     * into a heap tuple by copying. In the later case we need to free the copy.
> +     */
> +    if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))
> +    {
> +        tuple = ExecFetchSlotTuple(slot, true);
> +        tuple_copied = false;
> +    }
> +    else
> +    {
> +        tuple = ExecCopySlotTuple(slot);
> +        tuple_copied = true;
> +    }

This seems like a bad idea to me.  We shouldn't hardcode slots like
this.  I've previously argued that we should instead allow
ExecFetchSlotTuple() for all types of tuples, but add a new bool
*shouldFree argument, which will then allow the caller to free the
tuple.  We gain zilch by having this kind of logic in multiple callers.


> From 280eac5c6758061d0a46b7ef9dd324e9a23226b5 Mon Sep 17 00:00:00 2001
> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
> Date: Fri, 31 Aug 2018 10:53:42 +0530
> Subject: [PATCH 08/14] Change tuple table slot creation routines to suite
>  tuple table slot abstraction
> 
> This change requires ExecInitResultTupleSlotTL, ExecInitScanTupleSlot,
> ExecCreateScanSlotFromOuterPlan, ExecInitNullTupleSlot,
> ExecInitExtraTupleSlot, MakeTupleTableSlot, ExecAllocTableSlot to
> accept TupleTableSlotType as a new parameter. Change all their calls.
> 
> Ashutosh Bapat and Andres Freund
> 
> This by itself won't compile. Neither the tuple table slot abstraction
> patch would compile and work without this change. Both of those need
> to be committed together.

I don't like this kind of split - all commits should individually
compile. I think we should instead introduce dummy / empty structs for
&TTSOpsHeapTuple etc, and add the parameters necessary to pass them
through. And then move this patch to *before* the "core" abstract slot
patch.  That way every commit, but the super verbose stuff is still
split out.


> From 06d31f91831a1da78d56e4217f08d3866c7be6f8 Mon Sep 17 00:00:00 2001
> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
> Date: Fri, 31 Aug 2018 11:00:30 +0530
> Subject: [PATCH 09/14] Rethink ExecFetchSlotMinimalTuple()
> 
> Before this work, a TupleTableSlot could "own" a minimal tuple as
> well. Thus ExecFetchSlotMinimalTuple() returned a minimal tuple after
> constructing and "owning" it if it was not readily available. When the
> slot "owned" the minimal tuple, the memory consumed by the tuple was
> freed when a new tuple was stored in it or the slot was cleared.
> 
> With this work, not all TupleTableSlot types can "own" a minimal
> tuple.  When fetching a minimal tuple from a slot that can not "own" a
> tuple, memory is allocated to construct the minimal tuple, which needs
> to be freed explicitly. Hence ExecFetchSlotMinimalTuple() is modified
> to flag the caller whether it should free the memory consumed by the
> returned minimal tuple.
> 
> Right now only a MinimalTupleTableSlot can own a minimal tuple. But we
> may change that in future or a user defined TupleTableSlot type (added
> through an extension) may be able to "own" a minimal tuple in it.
> Hence instead of relying upon TTS_IS_MINIMAL() macro to tell us
> whether the TupleTableSlot can "own" a minimal tuple or not, we rely
> on the set of callbacks.  A TupleTableSlot which can hold a minimal
> tuple implements get_minimal_tuple callback. Other TupleTableSlot
> types leave the callback NULL.
> 
> The minimal tuple returned by this function is usually copied into a
> hash table or a file. But, unlike ExecFetchSlotTuple() it's never
> written to. Hence the difference between signatures of the two
> functions.

I'm somewhat inclined to think this should be done in an earlier patch,
before the main "abstract slot" work.

Ok, gotta switch to a smaller patch for a bit ;)

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Kyotaro HORIGUCHI
Date:
At Tue, 25 Sep 2018 16:45:09 -0700, Andres Freund <andres@anarazel.de> wrote in
<20180925234509.3hrrf6tmvy5tfith@alap3.anarazel.de>
> Hi,
> 
> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> > Subject: [PATCH 05/14] Use tts_flags instead of previous bool members
> > 
> > Pack the boolean members in TupleTableSlot into a 16 bit tts_flags.
> > This reduces the size of TupleTableSlot since each bool member takes
> > at least a byte on the platforms where bool is defined as a char.
> > 
> > Ashutosh Bapat and Andres Freund
> 
> > +
> > +/* true = slot is empty */
> > +#define            TTS_ISEMPTY            (1 << 1)
> > +#define IS_TTS_EMPTY(slot)    ((slot)->tts_flags & TTS_ISEMPTY)
> > +#define SET_TTS_EMPTY(slot) ((slot)->tts_flags |= TTS_ISEMPTY)
> > +#define RESET_TTS_EMPTY(slot) ((slot)->tts_flags &= ~TTS_ISEMPTY)
> 
> The flag stuff is the right way, but I'm a bit dubious about the
> accessor functions.  I can see open-coding the accesses, using the
> macros, or potentially going towards using bitfields.
> 
> Any comments?

Currently we have few setter/resetter function(macro)s for such
simple operations. FWIW open-coding in the following two looks
rather easier to me.

> if (IS_TTS_EMPTY(slot))
> if (slot->tts_flags & TTS_ISEMPTY)


About bitfields, an advantage of it is debugger awareness. We
don't need to look aside to the definitions of bitwise macros
while using debugger. And the current code is preserved in
appearance by using it.

> if (slot->tts_isempty)
> slot->tts_isempty = true;

In short, +1 from me to use bitfields. Coulnd't we use bitfield
here, possiblly in other places then?


=====
Not related to tuple slots, in other places, like infomask, we
handle a set of bitmap values altogether.


> infomask = tuple->t_data->t_infomask;

Bare bitfields are a bit inconvenient for the use. It gets
simpler using C11 anonymous member but not so bothersome even in
C99.  Anyway I don't think we jump into that immediately.

infomask.all = tuple->t_data->t_infomask.all;

- if (!HeapTupleHeaderXminCommitted(tuple))
C99> if (tuple->t_infomask.b.xmin_committed)
C11> if (tuple->t_infomask.xmin_committed)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: TupleTableSlot abstraction

From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> At Tue, 25 Sep 2018 16:45:09 -0700, Andres Freund <andres@anarazel.de> wrote in
<20180925234509.3hrrf6tmvy5tfith@alap3.anarazel.de>
>> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
>>> Pack the boolean members in TupleTableSlot into a 16 bit tts_flags.
>>> This reduces the size of TupleTableSlot since each bool member takes
>>> at least a byte on the platforms where bool is defined as a char.

> About bitfields, an advantage of it is debugger awareness. We
> don't need to look aside to the definitions of bitwise macros
> while using debugger. And the current code is preserved in
> appearance by using it.

FWIW, I would expect a change like this to be a net loss performance-wise
on most platforms.  Testing the contents of a byte-wide variable is pretty
cheap on any architecture invented later than ~ 1970.  Testing a bit,
though, requires a masking operation that is not free.  I am not seeing
how making TupleTableSlot a little smaller buys that back ... we don't
normally have that many active slots in a plan.

            regards, tom lane


Re: TupleTableSlot abstraction

From
Amit Khandekar
Date:
I have only done the below two changes yet. After doing that and
rebasing with latest master, in the regression I got crashes, and I
suspect the reason being that I have used Virtual tuple slot for the
destination slot of execute_attr_map_slot(). I am analyzing it. I am
anyway attaching the patches (v12) to give you an idea of how I have
handled the below two items.

On Wed, 26 Sep 2018 at 05:09, Andres Freund <andres@anarazel.de> wrote:
>
> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> > The attached v11 tar has the above set of changes.
>
> - I've pushed 0003 - although that commit seems to have included a lot
>   of things unrelated to the commit message. I guess two patches have
>   accidentally been merged?  Could you split out the part of the changes
>   that was mis-squashed?

Yeah, it indeed looks like it had unrelated things, mostly the changes
that moved the slot attribute functions into execTuples.c . I have
included this change as the very first patch in the patch series.


>> From 280eac5c6758061d0a46b7ef9dd324e9a23226b5 Mon Sep 17 00:00:00 2001
>> From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
>> Date: Fri, 31 Aug 2018 10:53:42 +0530
>> Subject: [PATCH 08/14] Change tuple table slot creation routines to suite
>>  tuple table slot abstraction
>>
>> This change requires ExecInitResultTupleSlotTL, ExecInitScanTupleSlot,
>> ExecCreateScanSlotFromOuterPlan, ExecInitNullTupleSlot,
>> ExecInitExtraTupleSlot, MakeTupleTableSlot, ExecAllocTableSlot to
>> accept TupleTableSlotType as a new parameter. Change all their calls.
>>
>> Ashutosh Bapat and Andres Freund
>>
>> This by itself won't compile. Neither the tuple table slot abstraction
>> patch would compile and work without this change. Both of those need
>> to be committed together.
>
> I don't like this kind of split - all commits should individually
> compile. I think we should instead introduce dummy / empty structs for
> &TTSOpsHeapTuple etc, and add the parameters necessary to pass them
> through. And then move this patch to *before* the "core" abstract slot
> patch.  That way every commit, but the super verbose stuff is still
> split out.
>

Done. Moved this patch before the core one. In this patch, just
declared the global variables of type TupleTableSlotOps, without
initializing them.

I have tried to make sure individual patches compile successfully by
shuffling around the changes to the right patch, but there is one
particular patch that still gives error. Will fix that later.

I will handle the other review comments in the next patch series.

Attachment

Re: TupleTableSlot abstraction

From
Amit Khandekar
Date:
On Thu, 4 Oct 2018 at 22:59, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> I have only done the below two changes yet. After doing that and
> rebasing with latest master, in the regression I got crashes, and I
> suspect the reason being that I have used Virtual tuple slot for the
> destination slot of execute_attr_map_slot(). I am analyzing it. I am
> anyway attaching the patches (v12) to give you an idea of how I have
> handled the below two items.

It seems to be some corruption here :

@@ -956,17 +978,39 @@ ExecUpdate(ModifyTableState *mtstate,

- tuple = ExecMaterializeSlot(slot);

+ if (!(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)))
+ {
+ TupleTableSlot *es_slot = estate->es_dml_input_tuple_slot;
+
+ Assert(es_slot && TTS_IS_HEAPTUPLE(es_slot));
+ if (es_slot->tts_tupleDescriptor != slot->tts_tupleDescriptor)
+ ExecSetSlotDescriptor(es_slot, slot->tts_tupleDescriptor);
+ ExecCopySlot(es_slot, slot);
+ slot = es_slot;
+ }
+
+ tuple = ExecFetchSlotTuple(slot, true);

After the slotification of partition tuple conversion, the input slot
is a virtual tuple, and the above code seems to result in some
corruption which I have not finished analyzing. It only happens for
INSERT ON CONFLICT case with partitions.

On Wed, 26 Sep 2018 at 05:35, Andres Freund <andres@anarazel.de> wrote:
>
> > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable,    /* tuple table */
> >       {
> >               TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
> >
> > +             slot->tts_cb->release(slot);
> >               /* Always release resources and reset the slot to empty */
> >               ExecClearTuple(slot);
> >               if (slot->tts_tupleDescriptor)
> > @@ -240,6 +1076,7 @@ void
> >  ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
> >  {
> >       /* This should match ExecResetTupleTable's processing of one slot */
> > +     slot->tts_cb->release(slot);
> >       Assert(IsA(slot, TupleTableSlot));
> >       ExecClearTuple(slot);
> >       if (slot->tts_tupleDescriptor)
>
> ISTM that release should be called *after* clearing the slot.

I am not sure what was release() designed to do. Currently all of the
implementers of this function are empty. Was it meant for doing
ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or
ReleaseBuffer(bslot->buffer) ? I think the purpose of keeping this
*before* clearing the tuple might be because the clear() might have
already cleared some handles that release() might need.

>
>
> > @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
> >       TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
> >       HeapTuple       tuple;
> >       shm_mq_result result;
> > +     bool            tuple_copied = false;
> > +
> > +     /* Get the tuple out of slot, if necessary converting the slot's contents
> > +      * into a heap tuple by copying. In the later case we need to free the copy.
> > +      */
> > +     if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))
> > +     {
> > +             tuple = ExecFetchSlotTuple(slot, true);
> > +             tuple_copied = false;
> > +     }
> > +     else
> > +     {
> > +             tuple = ExecCopySlotTuple(slot);
> > +             tuple_copied = true;
> > +     }
>
> This seems like a bad idea to me.  We shouldn't hardcode slots like
> this.  I've previously argued that we should instead allow
> ExecFetchSlotTuple() for all types of tuples, but add a new bool
> *shouldFree argument, which will then allow the caller to free the
> tuple.  We gain zilch by having this kind of logic in multiple callers.

How about having a separate ExecFetchSlotHeapTuple() for many of the
callers where it is known that the tuple is a heap/buffer tuple ? And
in rare places such as above where slot type is not known, we can have
ExecFetchSlotTuple() which would have an extra shouldFree parameter.


Re: TupleTableSlot abstraction

From
Amit Khandekar
Date:
On Wed, 26 Sep 2018 at 05:35, Andres Freund <andres@anarazel.de> wrote:
> > +
> > +/*
> > + * This is a function used by all getattr() callbacks which deal with a heap
> > + * tuple or some tuple format which can be represented as a heap tuple e.g. a
> > + * minimal tuple.
> > + *
> > + * heap_getattr considers any attnum beyond the attributes available in the
> > + * tuple as NULL. This function however returns the values of missing
> > + * attributes from the tuple descriptor in that case. Also this function does
> > + * not support extracting system attributes.
> > + *
> > + * If the attribute needs to be fetched from the tuple, the function fills in
> > + * tts_values and tts_isnull arrays upto the required attnum.
> > + */
> > +Datum
> > +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
> > +                                             int attnum, bool
> > *isnull)
>
> I'm still *vehemently* opposed to the introduction of this.

You mean, you want to remove the att_isnull() optimization, right ?
Removed that code now. Directly deforming the tuple regardless of the
null attribute.

>
>
>
> > @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo,
> >               Datum           iDatum;
> >               bool            isNull;
> >
> > -             if (keycol != 0)
> > +             if (keycol < 0)
> > +             {
> > +                     HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot;
> > +
> > +                     /* Only heap tuples have system attributes. */
> > +                     Assert(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot));
> > +
> > +                     iDatum = heap_getsysattr(hslot->tuple, keycol,
> > +                                                                      slot->tts_tupleDescriptor,
> > +                                                                      &isNull);
> > +             }
> > +             else if (keycol != 0)
> >               {
> >                       /*
> >                        * Plain index column; get the value we need directly from the
>
> This now should access the system column via the slot, right?  There's
> other places like this IIRC.

Done. In FormIndexDatum() and ExecInterpExpr(), directly calling
slot_getsysattr() now.

In ExecInterpExpr (), I am no longer using ExecEvalSysVar() now. I am
planning to remove this definition since it would be a single line
function just calling slot_getsysattr().

In build_ExecEvalSysVar(), ExecEvalSysVar() is still used, so I
haven't removed the definition yet. I am planning to create a new
LLVMValueRef FuncSlotGetsysattr, and use that instead, in
build_ExecEvalSysVar(), or for that matter, I am thinking to revert
back build_ExecEvalSysVar() and instead have that code inline as in
HEAD.

>
>
>
> > diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
> > index 9d6e25a..1b4e726 100644
> > --- a/src/backend/executor/execExprInterp.c
> > +++ b/src/backend/executor/execExprInterp.c
> > @@ -490,54 +490,21 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
> >
> >               EEO_CASE(EEOP_INNER_SYSVAR)
> >               {
> > -                     int                     attnum = op->d.var.attnum;
> > -                     Datum           d;
> > -
> > -                     /* these asserts must match defenses in slot_getattr */
> > -                     Assert(innerslot->tts_tuple != NULL);
> > -                     Assert(innerslot->tts_tuple != &(innerslot->tts_minhdr));
> > -
> > -                     /* heap_getsysattr has sufficient defenses against bad attnums */
> > -                     d = heap_getsysattr(innerslot->tts_tuple, attnum,
> > -                                                             innerslot->tts_tupleDescriptor,
> > -                                                             op->resnull);
> > -                     *op->resvalue = d;
> > +                     ExecEvalSysVar(state, op, econtext, innerslot);
>
> These changes should be in a separate commit.

As mentioned above, now I am not using ExecEvalSysVar(), instead, I am
calling slot_getsysattr(). So no need of separate commit for that.

>
>
> > +const TupleTableSlotOps TTSOpsHeapTuple = {
> > +     sizeof(HeapTupleTableSlot),
> > +     .init = tts_heap_init,
>
> The first field should also use a named field (same in following cases).

Done.

>
>
> > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable,    /* tuple table */
> >       {
> >               TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
> >
> > +             slot->tts_cb->release(slot);
> >               /* Always release resources and reset the slot to empty */
> >               ExecClearTuple(slot);
> >               if (slot->tts_tupleDescriptor)
> > @@ -240,6 +1076,7 @@ void
> >  ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
> >  {
> >       /* This should match ExecResetTupleTable's processing of one slot */
> > +     slot->tts_cb->release(slot);
> >       Assert(IsA(slot, TupleTableSlot));
> >       ExecClearTuple(slot);
> >       if (slot->tts_tupleDescriptor)
>
> ISTM that release should be called *after* clearing the slot.

I am copying here what I discussed about this in the earlier reply:

I am not sure what was release() designed to do. Currently all of the
implementers of this function are empty. Was it meant for doing
ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or
ReleaseBuffer(bslot->buffer) ? I think the purpose of keeping this
*before* clearing the tuple might be because the clear() might have
already cleared some handles that release() might need.

>
>
> > @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
> >       TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
> >       HeapTuple       tuple;
> >       shm_mq_result result;
> > +     bool            tuple_copied = false;
> > +
> > +     /* Get the tuple out of slot, if necessary converting the slot's contents
> > +      * into a heap tuple by copying. In the later case we need to free the copy.
> > +      */
> > +     if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))
> > +     {
> > +             tuple = ExecFetchSlotTuple(slot, true);
> > +             tuple_copied = false;
> > +     }
> > +     else
> > +     {
> > +             tuple = ExecCopySlotTuple(slot);
> > +             tuple_copied = true;
> > +     }
>
> This seems like a bad idea to me.  We shouldn't hardcode slots like
> this.  I've previously argued that we should instead allow
> ExecFetchSlotTuple() for all types of tuples, but add a new bool
> *shouldFree argument, which will then allow the caller to free the
> tuple.  We gain zilch by having this kind of logic in multiple callers.

Done. See more comments below.

> > > @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
> > >       TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
> > >       HeapTuple       tuple;
> > >       shm_mq_result result;
> > > +     bool            tuple_copied = false;
> > > +
> > > +     /* Get the tuple out of slot, if necessary converting the slot's contents
> > > +      * into a heap tuple by copying. In the later case we need to free the copy.
> > > +      */
> > > +     if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))
> > > +     {
> > > +             tuple = ExecFetchSlotTuple(slot, true);
> > > +             tuple_copied = false;
> > > +     }
> > > +     else
> > > +     {
> > > +             tuple = ExecCopySlotTuple(slot);
> > > +             tuple_copied = true;
> > > +     }
> >
> > This seems like a bad idea to me.  We shouldn't hardcode slots like
> > this.  I've previously argued that we should instead allow
> > ExecFetchSlotTuple() for all types of tuples, but add a new bool
> > *shouldFree argument, which will then allow the caller to free the
> > tuple.  We gain zilch by having this kind of logic in multiple callers.
>
> How about having a separate ExecFetchSlotHeapTuple() for many of the
> callers where it is known that the tuple is a heap/buffer tuple ? And
> in rare places such as above where slot type is not known, we can have
> ExecFetchSlotTuple() which would have an extra shouldFree parameter.

I went ahead and did these changes, but for now, I haven't replaced
ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I
retained ExecFetchSlotTuple() to be called for heap tuples, and added
a new ExecFetchGenericSlotTuple() to be used with shouldFree. I don't
like these names, but until we have concluded, I don't want to go
ahead and replace all the numerous ExecFetchSlotTuple() calls with
ExecFetchSlotHeapTuple().

Have used ExecFetchGenericSlotTuple() for ExecBRInsertTriggers() and
ExecIRInsertTriggers() as well, because in CopyFrom(), now with
partition mps, the input slot to these functions can be a virtual
slot.

> > rebasing with latest master, in the regression I got crashes, and I
> > suspect the reason being that I have used Virtual tuple slot for the
> > destination slot of execute_attr_map_slot(). I am analyzing it. I am
> > anyway attaching the patches (v12) to give you an idea of how I have
> > handled the below two items.
>
> It seems to be some corruption here :
>
> @@ -956,17 +978,39 @@ ExecUpdate(ModifyTableState *mtstate,
>
> - tuple = ExecMaterializeSlot(slot);
>
> + if (!(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)))
> + {
> + TupleTableSlot *es_slot = estate->es_dml_input_tuple_slot;
> +
> + Assert(es_slot && TTS_IS_HEAPTUPLE(es_slot));
> + if (es_slot->tts_tupleDescriptor != slot->tts_tupleDescriptor)
> + ExecSetSlotDescriptor(es_slot, slot->tts_tupleDescriptor);
> + ExecCopySlot(es_slot, slot);
> + slot = es_slot;
> + }
> +
> + tuple = ExecFetchSlotTuple(slot, true);
>
> After the slotification of partition tuple conversion, the input slot
> is a virtual tuple, and the above code seems to result in some
> corruption which I have not finished analyzing. It only happens for
> INSERT ON CONFLICT case with partitions.

In ExecInsert() and ExecUpdate(), when it's a partition with different
column order, the input slot is a slot returned by
execute_attr_map_slot(), i.e. a virtual slot.  So, in ExecUpdate() the
input slot->tts_values[] datums may even point to existing contents of
estate->es_dml_input_tuple_slot. This happens when the set clause of
UPDATE uses excluded.* columns :

(From insert_conflict.sql)
insert into parted_conflict_test values (3, 'b')
on conflict (a) do update set b = excluded.b;

And ExecSetSlotDescriptor() is called *before* extracting the tuple
out of the virtual slot, so this function clears the tuple of
es_dml_input_tuple_slot, and so the input slot datums pointing to this
slot tuple become dangling pointers.

Basically, we should extract the tuple out of the input slot as early
as possible.

I now have a new function ExecCopySlotWithTupDesc() which extracts the
tuple using dstslot's memory context, and only then changes the tuple
descriptor. Effectively, the dstlot has a materialized tuple.

Used this function for both ExecUpdate() and ExecInsert().


On Fri, 5 Oct 2018 at 20:28, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> > From 06d31f91831a1da78d56e4217f08d3866c7be6f8 Mon Sep 17 00:00:00 2001
> > From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
> > Date: Fri, 31 Aug 2018 11:00:30 +0530
> > Subject: [PATCH 09/14] Rethink ExecFetchSlotMinimalTuple()
> >
> > Before this work, a TupleTableSlot could "own" a minimal tuple as
> > well. Thus ExecFetchSlotMinimalTuple() returned a minimal tuple after
> > constructing and "owning" it if it was not readily available. When the
> > slot "owned" the minimal tuple, the memory consumed by the tuple was
> > freed when a new tuple was stored in it or the slot was cleared.
> >
> > With this work, not all TupleTableSlot types can "own" a minimal
> > tuple.  When fetching a minimal tuple from a slot that can not "own" a
> > tuple, memory is allocated to construct the minimal tuple, which needs
> > to be freed explicitly. Hence ExecFetchSlotMinimalTuple() is modified
> > to flag the caller whether it should free the memory consumed by the
> > returned minimal tuple.
> >
> > Right now only a MinimalTupleTableSlot can own a minimal tuple. But we
> > may change that in future or a user defined TupleTableSlot type (added
> > through an extension) may be able to "own" a minimal tuple in it.
> > Hence instead of relying upon TTS_IS_MINIMAL() macro to tell us
> > whether the TupleTableSlot can "own" a minimal tuple or not, we rely
> > on the set of callbacks.  A TupleTableSlot which can hold a minimal
> > tuple implements get_minimal_tuple callback. Other TupleTableSlot
> > types leave the callback NULL.
> >
> > The minimal tuple returned by this function is usually copied into a
> > hash table or a file. But, unlike ExecFetchSlotTuple() it's never
> > written to. Hence the difference between signatures of the two
> > functions.
>
> I'm somewhat inclined to think this should be done in an earlier patch,
> before the main "abstract slot" work.

Yet to do this.

There is still one more regression test failure in polygon.sql which I
am yet to analyze.

Also, there are some more pending points of yours that I haven't addressed yet.

Attached is v13 patch series.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment

Re: TupleTableSlot abstraction

From
Amit Khandekar
Date:
On Wed, 26 Sep 2018 at 05:15, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> > Subject: [PATCH 05/14] Use tts_flags instead of previous bool members
> >
> > Pack the boolean members in TupleTableSlot into a 16 bit tts_flags.
> > This reduces the size of TupleTableSlot since each bool member takes
> > at least a byte on the platforms where bool is defined as a char.
> >
> > Ashutosh Bapat and Andres Freund
>
> > +
> > +/* true = slot is empty */
> > +#define                      TTS_ISEMPTY                     (1 << 1)
> > +#define IS_TTS_EMPTY(slot)   ((slot)->tts_flags & TTS_ISEMPTY)
> > +#define SET_TTS_EMPTY(slot) ((slot)->tts_flags |= TTS_ISEMPTY)
> > +#define RESET_TTS_EMPTY(slot) ((slot)->tts_flags &= ~TTS_ISEMPTY)
>
> The flag stuff is the right way, but I'm a bit dubious about the
> accessor functions.  I can see open-coding the accesses, using the
> macros, or potentially going towards using bitfields.
>
> Any comments?

I like this abstraction using macros, since this will allow us to
conveniently change the way this information is stored inside the
slot. I think for this same reason we have defined macros
(HeapTupleIsHotUpdated, etc) for each bit of heaptuple infomask.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

On 2018-10-01 22:21:58 -0400, Tom Lane wrote:
> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> > At Tue, 25 Sep 2018 16:45:09 -0700, Andres Freund <andres@anarazel.de> wrote in
<20180925234509.3hrrf6tmvy5tfith@alap3.anarazel.de>
> >> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> >>> Pack the boolean members in TupleTableSlot into a 16 bit tts_flags.
> >>> This reduces the size of TupleTableSlot since each bool member takes
> >>> at least a byte on the platforms where bool is defined as a char.
> 
> > About bitfields, an advantage of it is debugger awareness. We
> > don't need to look aside to the definitions of bitwise macros
> > while using debugger. And the current code is preserved in
> > appearance by using it.
> 
> FWIW, I would expect a change like this to be a net loss performance-wise
> on most platforms.  Testing the contents of a byte-wide variable is pretty
> cheap on any architecture invented later than ~ 1970.  Testing a bit,
> though, requires a masking operation that is not free.  I am not seeing
> how making TupleTableSlot a little smaller buys that back ... we don't
> normally have that many active slots in a plan.

I measured it as a speedup on x86-64, mainly because we require fewer
instructions to reset a slot into an empty state, but also because there
are fewer loads. Masking a register is just about free, loading from
memory isn't, even if the cacheline is in L1.  The other benefit is that
this allows TupleTableSlots to fit into one cacheline more often, and
that's noticable in cache miss rates.

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Andres Freund
Date:
On 2018-10-09 20:46:04 +0530, Amit Khandekar wrote:
> On Wed, 26 Sep 2018 at 05:35, Andres Freund <andres@anarazel.de> wrote:
> > > +
> > > +/*
> > > + * This is a function used by all getattr() callbacks which deal with a heap
> > > + * tuple or some tuple format which can be represented as a heap tuple e.g. a
> > > + * minimal tuple.
> > > + *
> > > + * heap_getattr considers any attnum beyond the attributes available in the
> > > + * tuple as NULL. This function however returns the values of missing
> > > + * attributes from the tuple descriptor in that case. Also this function does
> > > + * not support extracting system attributes.
> > > + *
> > > + * If the attribute needs to be fetched from the tuple, the function fills in
> > > + * tts_values and tts_isnull arrays upto the required attnum.
> > > + */
> > > +Datum
> > > +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
> > > +                                             int attnum, bool
> > > *isnull)
> >
> > I'm still *vehemently* opposed to the introduction of this.
> 
> You mean, you want to remove the att_isnull() optimization, right ?

Yes.


> Removed that code now. Directly deforming the tuple regardless of the
> null attribute.

Good, thanks.


> > > @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo,
> > >               Datum           iDatum;
> > >               bool            isNull;
> > >
> > > -             if (keycol != 0)
> > > +             if (keycol < 0)
> > > +             {
> > > +                     HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot;
> > > +
> > > +                     /* Only heap tuples have system attributes. */
> > > +                     Assert(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot));
> > > +
> > > +                     iDatum = heap_getsysattr(hslot->tuple, keycol,
> > > +                                                                      slot->tts_tupleDescriptor,
> > > +                                                                      &isNull);
> > > +             }
> > > +             else if (keycol != 0)
> > >               {
> > >                       /*
> > >                        * Plain index column; get the value we need directly from the
> >
> > This now should access the system column via the slot, right?  There's
> > other places like this IIRC.
> 
> Done. In FormIndexDatum() and ExecInterpExpr(), directly calling
> slot_getsysattr() now.
> 
> In ExecInterpExpr (), I am no longer using ExecEvalSysVar() now. I am
> planning to remove this definition since it would be a single line
> function just calling slot_getsysattr().
> 
> In build_ExecEvalSysVar(), ExecEvalSysVar() is still used, so I
> haven't removed the definition yet. I am planning to create a new
> LLVMValueRef FuncSlotGetsysattr, and use that instead, in
> build_ExecEvalSysVar(), or for that matter, I am thinking to revert
> back build_ExecEvalSysVar() and instead have that code inline as in
> HEAD.

I'd just have ExecInterpExpr() continue calling ExecEvalSysVar. There's
no reason for it to be inline. And it's simpler for JIT than the
alternative.

> > > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable,    /* tuple table */
> > >       {
> > >               TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
> > >
> > > +             slot->tts_cb->release(slot);
> > >               /* Always release resources and reset the slot to empty */
> > >               ExecClearTuple(slot);
> > >               if (slot->tts_tupleDescriptor)
> > > @@ -240,6 +1076,7 @@ void
> > >  ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
> > >  {
> > >       /* This should match ExecResetTupleTable's processing of one slot */
> > > +     slot->tts_cb->release(slot);
> > >       Assert(IsA(slot, TupleTableSlot));
> > >       ExecClearTuple(slot);
> > >       if (slot->tts_tupleDescriptor)
> >
> > ISTM that release should be called *after* clearing the slot.
> 
> I am copying here what I discussed about this in the earlier reply:
> 
> I am not sure what was release() designed to do. Currently all of the
> implementers of this function are empty.

So additional deallocations can happen in a slot. We might need this
e.g. at some point for zheap which needs larger, longer-lived, buffers
in slots.

> Was it meant for doing
> ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or
> ReleaseBuffer(bslot->buffer) ?

No. The former lives in generic code, the latter is in ClearTuple.

> I think the purpose of keeping this *before* clearing the tuple might
> be because the clear() might have already cleared some handles that
> release() might need.

It's just plainly wrong to call it this way round.


> I went ahead and did these changes, but for now, I haven't replaced
> ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I
> retained ExecFetchSlotTuple() to be called for heap tuples, and added
> a new ExecFetchGenericSlotTuple() to be used with shouldFree. I don't
> like these names, but until we have concluded, I don't want to go
> ahead and replace all the numerous ExecFetchSlotTuple() calls with
> ExecFetchSlotHeapTuple().

Why not?


Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Amit Khandekar
Date:
On Sat, 13 Oct 2018 at 04:02, Andres Freund <andres@anarazel.de> wrote:
> > > > @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo,
> > > >               Datum           iDatum;
> > > >               bool            isNull;
> > > >
> > > > -             if (keycol != 0)
> > > > +             if (keycol < 0)
> > > > +             {
> > > > +                     HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot;
> > > > +
> > > > +                     /* Only heap tuples have system attributes. */
> > > > +                     Assert(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot));
> > > > +
> > > > +                     iDatum = heap_getsysattr(hslot->tuple, keycol,
> > > > +                                                                      slot->tts_tupleDescriptor,
> > > > +                                                                      &isNull);
> > > > +             }
> > > > +             else if (keycol != 0)
> > > >               {
> > > >                       /*
> > > >                        * Plain index column; get the value we need directly from the
> > >
> > > This now should access the system column via the slot, right?  There's
> > > other places like this IIRC.
> >
> > Done. In FormIndexDatum() and ExecInterpExpr(), directly calling
> > slot_getsysattr() now.
> >
> > In ExecInterpExpr (), I am no longer using ExecEvalSysVar() now. I am
> > planning to remove this definition since it would be a single line
> > function just calling slot_getsysattr().
> >
> > In build_ExecEvalSysVar(), ExecEvalSysVar() is still used, so I
> > haven't removed the definition yet. I am planning to create a new
> > LLVMValueRef FuncSlotGetsysattr, and use that instead, in
> > build_ExecEvalSysVar(), or for that matter, I am thinking to revert
> > back build_ExecEvalSysVar() and instead have that code inline as in
> > HEAD.
>
> I'd just have ExecInterpExpr() continue calling ExecEvalSysVar. There's
> no reason for it to be inline.
Can you explain more why you think there should be a ExecEvalSysVar()
definition ? As I mentioned earlier it would just call
slot_getsysattr() and do nothing else.

> And it's simpler for JIT than the alternative.

You mean it would be simpler for JIT to call ExecEvalSysVar() than
slot_getsysattr() ? I didn't get why it is simpler.

Or are you talking considering build_ExecEvalSysVar() ? I am ok with
retaining build_ExecEvalSysVar() , but I was saying even inside this
function, we could do :
LLVMBuildCall(.... , llvm_get_decl(mod, FuncSlotGetsysattr) , .....)
rather than:
LLVMFunctionType(,...)
LLVMAddFunction("ExecEvalSysVar", ....)
LLVMBuildCall(...)


>
> > > > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable,    /* tuple table */
> > > >       {
> > > >               TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
> > > >
> > > > +             slot->tts_cb->release(slot);
> > > >               /* Always release resources and reset the slot to empty */
> > > >               ExecClearTuple(slot);
> > > >               if (slot->tts_tupleDescriptor)
> > > > @@ -240,6 +1076,7 @@ void
> > > >  ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
> > > >  {
> > > >       /* This should match ExecResetTupleTable's processing of one slot */
> > > > +     slot->tts_cb->release(slot);
> > > >       Assert(IsA(slot, TupleTableSlot));
> > > >       ExecClearTuple(slot);
> > > >       if (slot->tts_tupleDescriptor)
> > >
> > > ISTM that release should be called *after* clearing the slot.
> >
> > I am copying here what I discussed about this in the earlier reply:
> >
> > I am not sure what was release() designed to do. Currently all of the
> > implementers of this function are empty.
>
> So additional deallocations can happen in a slot. We might need this
> e.g. at some point for zheap which needs larger, longer-lived, buffers
> in slots.
>
> > Was it meant for doing
> > ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or
> > ReleaseBuffer(bslot->buffer) ?
>
> No. The former lives in generic code, the latter is in ClearTuple.
>
> > I think the purpose of keeping this *before* clearing the tuple might
> > be because the clear() might have already cleared some handles that
> > release() might need.
>
> It's just plainly wrong to call it this way round.

Ok.

>
>
> > I went ahead and did these changes, but for now, I haven't replaced
> > ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I
> > retained ExecFetchSlotTuple() to be called for heap tuples, and added
> > a new ExecFetchGenericSlotTuple() to be used with shouldFree. I don't
> > like these names, but until we have concluded, I don't want to go
> > ahead and replace all the numerous ExecFetchSlotTuple() calls with
> > ExecFetchSlotHeapTuple().
>
> Why not?

I haven't gone ahead because I wanted to know if you are ok with the names.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: TupleTableSlot abstraction

From
Amit Khandekar
Date:
On Tue, 9 Oct 2018 at 20:46, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> There is still one more regression test failure in polygon.sql which I
> am yet to analyze.

Below is a narrowed down testcase which reproduces the failure in polygon.sql :

create table tab (n int, dist int, id integer);
insert into tab values (1, 2, 3);

-- Force a hash join
set enable_nestloop TO f;
set enable_mergejoin TO f;

-- Expected to return a row
SELECT * FROM tab t1 , tab t2 where t1.id = t2.id;
 n | dist | id | n | dist | id
---+------+----+---+------+----
(0 rows)

In MultiExecPrivateHash(), to generate the hash table, the tuples are
retrieved one by one from the scan of outer plan state. For each
tuple, ExecHashGetHashValue() is called to get the join attribute
value of the tuple. Here, when the attribute is retrieved by a
jit-compiled slot-deforming function built by slot_compile_deform(),
the attribute value is a junk value. So the hash join condition fails
and the join returns no rows.

Root cause :

In llvm_compile_expr(), for the switch case : EEOP_INNER_FETCHSOME,
innerPlanState(parent)->ps_ResultTupleSlot->tts_cb is passed to
slot_compile_deform(). And in slot_compile_deform(), this tts_cb is
used to determine whether the inner slot is a minimal slot or a
buffer/heap tuple slot, and accordingly v_tupleheaderp is calculated
using StructMinimalTupleTableSlot or StructHeapTupleTableSlot.

In the above hash join scenario, the ps_ResultTupleSlot is a minimal tuple slot.
But at runtime, when MultiExecPrivateHash()=>ExecHashGetHashValue() is
called, the slot returned by outer node (Seqscan) is a buffer heap
tuple slot; this is because the seq scan does not return using its
ps_ResultTupleSlot, instead it directly returns its scan slot since
there is no projection info needed. Hence the tuple is retrieved using
a wrong offset inside the Tuple table slot, because the jit function
was compiled assuming it's going to be a minimal tuple slot.

So, although we can safely use
innerPlanState(parent)->ps_ResultTupleSlot to get the tuple descriptor
for slot_compile_deform(), we should not use the same tuple slot to
know what kind of a tuple slot it will be. That can be known only at
runtime.

Possible Fix :

I am thinking, in slot_compile_deform(), we need to include the logic
instructions to determine the slot type. Have a new
FIELDNO_TUPLETABLESLOT_OPS to retrieve TupleTableSlot.tts_cb, and then
accordingly calculate the tuple offset. I am not sure if this will
turn out to have a performance impact on jit execution, or whether it
is feasible to do such conditional thing in llvm; trying to
understand.

Comments ?


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

On 2018-10-15 12:12:03 +0530, Amit Khandekar wrote:
> On Sat, 13 Oct 2018 at 04:02, Andres Freund <andres@anarazel.de> wrote:
> > I'd just have ExecInterpExpr() continue calling ExecEvalSysVar. There's
> > no reason for it to be inline.

> Can you explain more why you think there should be a ExecEvalSysVar()
> definition ? As I mentioned earlier it would just call
> slot_getsysattr() and do nothing else.

I think it's superflous to have three opcodes for this, and we should
instead just have one sysvar instruction that chooses the slot based on
a parameter of the opcode.  Inline code in the interpreter isn't free.


> > And it's simpler for JIT than the alternative.
> 
> You mean it would be simpler for JIT to call ExecEvalSysVar() than
> slot_getsysattr() ? I didn't get why it is simpler.

Because there'd be no special code at all. We can just use the code that
existing ExecEval* routines use.


> Or are you talking considering build_ExecEvalSysVar() ? I am ok with
> retaining build_ExecEvalSysVar() , but I was saying even inside this
> function, we could do :
> LLVMBuildCall(.... , llvm_get_decl(mod, FuncSlotGetsysattr) , .....)
> rather than:
> LLVMFunctionType(,...)
> LLVMAddFunction("ExecEvalSysVar", ....)
> LLVMBuildCall(...)

That'd probably generate more work than it'd save, because llvm_get_decl
requires that the function is present in llvmjit_types.c.


> > > I went ahead and did these changes, but for now, I haven't replaced
> > > ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I
> > > retained ExecFetchSlotTuple() to be called for heap tuples, and added
> > > a new ExecFetchGenericSlotTuple() to be used with shouldFree. I don't
> > > like these names, but until we have concluded, I don't want to go
> > > ahead and replace all the numerous ExecFetchSlotTuple() calls with
> > > ExecFetchSlotHeapTuple().
> >
> > Why not?
> 
> I haven't gone ahead because I wanted to know if you are ok with the names.

Just renaming a function is cheap, it's just a oneliner script ;)

FWIW, I dislike ExecFetchGenericSlotTuple() as well. It's still a
HeapTuple, there's absolutely nothing Generic about it.  I don't see why
we'd not just use ExecFetchSlotHeapTuple() with a new shouldFree
parameter?

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

Attached is a heavily revised version of the patchset developed in this
thread. It's based on the last version posted by Amit Khandekar at [1].

The largest issue this resolves is that, for JITing, we need to know
what kind of slot we're going to deal with.  E.g. in contrast to right
now, we do not want to generate bespoke deform function for virtual
slots, and we need to be able to discern between slots we know to deform
and slots where we don't.  Up to now generated a deform function
whenever we can figure out what the tupledesc for an EEOP_*_FETCHSOME
operation is, using the fairly ugly hack for looking at
PlanState->scanslot the scan desc, and the right/left tree's for
INNER/OUTER.  That kind of works as far as descriptors go, but doesn't
for the slot types - it's far from uncommon that the slot type inside a
node differs from the result type from the left/right tree.

I've tried various approaches for this, and what I settled on is that
every PlanState signals in new fields what its scan result type and slot
type is, if known; normally that's automatic due to using
ExecInitScanTupleSlot(), ExecInitResultSlot().  By default, the right /
left (aka inner/outer) are inferred using ExecGetResultSlotOps on the
subsidiary node, but can be overriden too.  This gets rid of similar
logic previously present in llvmjit_expr.c, by moving the relevant
knowledge into ExprEvalOp.d.fetch.

While this primarily benefits JITing, it also allows for future
optimizations like eliding slot_getsomattr() / EEOP_*_FETCHSOME calls
for Vars on slots known to be virtual (which is a large percentage, due
to ExecProject() etc).

To avoid bugs around this, I've added assertions that check that hte
slot type is as expected. This, unsurprisingly, caught a number of bugs.

While this approach isn't perfect (in particular, it adds a few new
fields to PlanState), I've tried hard ot find other solutions, and
didn't come up with anything that's meaningfully better.  If anybody has
better ideas, please call now.



The second bigger change is that I've removed ExecFetchGenericSlotTuple
- I really disliked having both that and ExecFetchSlotTuple. Now
ExecFetchSlotTuple has a new *shouldFree parameter that indicates
whether the returned value is allocated in the current context, or
not. I've also renamed it to ExecFetchSlotHeapTuple.  It's currently
still used for places where we modify the returned tuple in-place (to
set oid etc), but subsequent patches in the pluggable storage patchset,
get rid of that too.


I've done a bit of benchmarking (TPCH and pgench), and after some
performance fixes the patchset either has no performance effect, or a
very small performance benefit (when using JITing, it reduces the
overhead).


Further changes:
- changed the split of patches, so they now all pass the
  tests, and individually make (some) sense.
- virtual tuple table slots can now be materialized (by serializing
  !byval columns into a single allocation, that's then pointed to by
  tts_values).
- The responsibility for filling missing columns is now again in
  slot_getsomeattrs, rather than the per-slot callbacks (which can still
  do so if desired, but normally I don't see much point).
- lots of bugfixes
- comment cleanups
- performance improvements
- Reduction in the number of callbacks. getattr, attisnull are gone.


What I'm now planning to do is to go through the big comment in
tuptable.h and update that to the new world.  While I'm not convinced
that that that's the best place for it, it needs to be accurate.

Furthermore:
- More comment polishing
- I'll probably split the commits up a bit further (particulary JIT
  ignoring virtual tuple slots, inlining the hot path of
  slot_getsomeattrs())
- serious commit message polishing

After this, I hope Amit Khandekar will rebase a patch he's sent me
internally that converts triggers to use slots. I'll work on rebasing
the pluggable storage patch ontop of this.

[1] http://archives.postgresql.org/message-id/CAJ3gD9eq38XhLsLTie%2B3NHsCRkLO0xHLA4MQX_3sr6or7xws4Q%40mail.gmail.com

Greetings,

Andres Freund

Attachment

Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

On 2018-11-13 15:30:21 -0800, Andres Freund wrote:
> What I'm now planning to do is to go through the big comment in
> tuptable.h and update that to the new world.  While I'm not convinced
> that that that's the best place for it, it needs to be accurate.
> 
> Furthermore:
> - More comment polishing
> - I'll probably split the commits up a bit further (particulary JIT
>   ignoring virtual tuple slots, inlining the hot path of
>   slot_getsomeattrs())
> - serious commit message polishing

I've done all of that now, and pushed it.  Thanks Ashutosh, Amit
Khandekar and everyone else.

On to pluggable storage...

Regards,

Andres


Re: TupleTableSlot abstraction

From
Amit Khandekar
Date:
On Wed, 14 Nov 2018 at 05:00, Andres Freund <andres@anarazel.de> wrote:
> After this, I hope Amit Khandekar will rebase a patch he's sent me
> internally that converts triggers to use slots. I'll work on rebasing
> the pluggable storage patch ontop of this.

Shared this patch in a separate mail thread :
https://www.postgresql.org/message-id/CAJ3gD9fjpoPHSHB-Ufj7ciT8nV0JSA2gdJUdtxo-bMyPrpjk%3DQ%40mail.gmail.com
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: TupleTableSlot abstraction

From
Jeff Janes
Date:
On Fri, Nov 16, 2018 at 7:46 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-11-13 15:30:21 -0800, Andres Freund wrote:
> What I'm now planning to do is to go through the big comment in
> tuptable.h and update that to the new world.  While I'm not convinced
> that that that's the best place for it, it needs to be accurate.
>
> Furthermore:
> - More comment polishing
> - I'll probably split the commits up a bit further (particulary JIT
>   ignoring virtual tuple slots, inlining the hot path of
>   slot_getsomeattrs())
> - serious commit message polishing

I've done all of that now, and pushed it.  Thanks Ashutosh, Amit
Khandekar and everyone else.

Hi Andres and all, 

This commit 763f2edd92095b1ca2 "Rejigger materializing and fetching a HeapTuple from a slot" introduces a memory leak into the ExecutorState context which is invoked by this statement:

CREATE TABLE tbloom AS
   SELECT
     (random() * 1000000)::int as i1,
     (random() * 1000000)::int as i2,
     (random() * 1000000)::int as i3,
     (random() * 1000000)::int as i4,
     (random() * 1000000)::int as i5,
     (random() * 1000000)::int as i6
   FROM
  generate_series(1,50000000);
 
By blind analogy to the changes made to matview.c, I think that createas.c is missing a heap_freetuple, as attached.

It fixes the leak, and still passes "make check".
Cheers,

Jeff


Attachment

Re: TupleTableSlot abstraction

From
Michael Paquier
Date:
On Sat, Feb 16, 2019 at 05:07:44PM -0500, Jeff Janes wrote:
> By blind analogy to the changes made to matview.c, I think that createas.c
> is missing a heap_freetuple, as attached.

I think that you are right.  CTAS and materialized views share a lot
when it comes to relation creation and initial table loading.  I have
reproduced the leak and could notice that your fix is correct.  So
committed.
--
Michael

Attachment

Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

On 2019-02-27 14:21:52 +0900, Michael Paquier wrote:
> On Sat, Feb 16, 2019 at 05:07:44PM -0500, Jeff Janes wrote:
> > By blind analogy to the changes made to matview.c, I think that createas.c
> > is missing a heap_freetuple, as attached.

First, sorry to have been slow answering. I was whacking around code
further modifying this, and thought I'd just solve the immediate issue
here by committing the followup work that removes getting the tuple out
of the slot entirely. That took longer than planned, so it makes sense
to commit an interim fix.


> I think that you are right.  CTAS and materialized views share a lot
> when it comes to relation creation and initial table loading.  I have
> reproduced the leak and could notice that your fix is correct.  So
> committed.

I'm not so sure that's the architecturally correct fix however. Is it
actually guaranteed, given expanded tuples, toasting, etc, that there's
no other memory leak here? I wonder if we shouldn't work twoards using a
short lived memory context here. Note how e.g. printtup() uses a short
lived context for its work.

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Michael Paquier
Date:
On Tue, Feb 26, 2019 at 09:42:38PM -0800, Andres Freund wrote:
> I'm not so sure that's the architecturally correct fix however. Is it
> actually guaranteed, given expanded tuples, toasting, etc, that there's
> no other memory leak here? I wonder if we shouldn't work twoards using a
> short lived memory context here. Note how e.g. printtup() uses a short
> lived context for its work.

Perhaps.  I got to wonder if this change would not impact code using
their own DestReceiver, resulting in similar leaks when they insert
tuples on-the-fly.  Such issues can be surprising for fork an plugin
developers.  I was playing a bit with some refactoring of relation
creation for CTAS in the scope of temporary matviews, and noticed this
issue on the CF list, so that was a bit annoying, and issues like that
tend to be easily forgotten..
--
Michael

Attachment

Re: TupleTableSlot abstraction

From
Andres Freund
Date:
Hi,

On 2019-02-27 15:34:07 +0900, Michael Paquier wrote:
> On Tue, Feb 26, 2019 at 09:42:38PM -0800, Andres Freund wrote:
> > I'm not so sure that's the architecturally correct fix however. Is it
> > actually guaranteed, given expanded tuples, toasting, etc, that there's
> > no other memory leak here? I wonder if we shouldn't work twoards using a
> > short lived memory context here. Note how e.g. printtup() uses a short
> > lived context for its work.
> 
> Perhaps.  I got to wonder if this change would not impact code using
> their own DestReceiver, resulting in similar leaks when they insert
> tuples on-the-fly.

Im not sure I understand. How can adding a memory context + reset to
ctas and matview receivers negatively impact other dest receivers?


> I was playing a bit with some refactoring of relation creation for
> CTAS in the scope of temporary matviews, and noticed this issue on the
> CF list, so that was a bit annoying, and issues like that tend to be
> easily forgotten..

It's been 10 days since the report, nobody pinged, and obviously I'm
working on pluggable storage, so ...

Greetings,

Andres Freund


Re: TupleTableSlot abstraction

From
Michael Paquier
Date:
On Tue, Feb 26, 2019 at 10:38:45PM -0800, Andres Freund wrote:
> Im not sure I understand. How can adding a memory context + reset to
> ctas and matview receivers negatively impact other dest receivers?

I don't think you got my point here: imagine that a plugin use the
current receiveSlot logic from createas.c or matview.c, and forgets to
free the tuple copied.  On v11, that works fine.  On current HEAD,
they win silently a new leak.
--
Michael

Attachment

Re: TupleTableSlot abstraction

From
Andres Freund
Date:
On 2019-02-27 15:42:50 +0900, Michael Paquier wrote:
> On Tue, Feb 26, 2019 at 10:38:45PM -0800, Andres Freund wrote:
> > Im not sure I understand. How can adding a memory context + reset to
> > ctas and matview receivers negatively impact other dest receivers?
> 
> I don't think you got my point here: imagine that a plugin use the
> current receiveSlot logic from createas.c or matview.c, and forgets to
> free the tuple copied.  On v11, that works fine.  On current HEAD,
> they win silently a new leak.

The copy was made in intorel_receive()?