Thread: Re: posgres 12 bug (partitioned table)

Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Pavel Biryukov <79166341370@yandex.ru> writes:
> Hello. I'm catch error "virtual tuple table slot does not have system
> attributes" when inserting row into partitioned table with RETURNING
> xmin

Reproduced here.  The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression.  I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
David Rowley
Date:
On Fri, 5 Jun 2020 at 04:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Reproduced here.  The example works in v11, and in v10 if you remove
> the unnecessary-to-the-example primary key, so it seems like a clear
> regression.  I didn't dig for the cause but I suppose it's related
> to Andres' slot-related changes.

Looks like c2fe139c20 is the breaking commit.

David



Re: posgres 12 bug (partitioned table)

From
Soumyadeep Chakraborty
Date:
Hello,

ccing pgsql-hackers@postgresql.org

Upon investigation, it seems that the problem is caused by the
following:

The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot. If there are system
columns other than ctid and tableOid referenced in the RETURNING clause
(not only xmin as in the bug report), it will lead to the ERROR as
mentioned in this thread as virtual tuple table slots don't really store
such columns. (ctid and tableOid are present directly in the
TupleTableSlot struct and can be satisfied from there: refer:
slot_getsysattr()))

I have attached two alternate patches to solve the problem. Both patches
use and share a mechanism to detect if there are any such system
columns. This is done inside ExecBuildProjectionInfo() and we store this
info inside the ProjectionInfo struct. Then based on this info, system
columns are populated in a suitable slot, which is then passed on to
ExecProcessReturning(). (If it is deemed that this operation only be
done for RETURNING, we can just as easily do it in the callsite for
ExecBuildProjectionInfo() in ExecInitModifyTable() for RETURNING instead
of doing it inside ExecBuildProjectionInfo())

The first patch [1] explicitly creates a heap tuple table slot, fills in the
system column values as we would do during heap_prepare_insert() and
then passes that slot to ExecProcessReturning(). (We use a heap tuple table
slot as it is guaranteed to support these attributes).

The second patch [2] instead of relying on a heap tuple table slot,
relies on ExecGetReturningSlot() for the right slot and
table_tuple_fetch_row_version() to supply the system column values.  It
does make the assumption that the AM would supply a slot that will have
these system columns.

[1] v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch
[2] v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch

Regards,
Soumyadeep (VMware)

Attachment

Re: posgres 12 bug (partitioned table)

From
Soumyadeep Chakraborty
Date:
Hello,

ccing pgsql-hackers@postgresql.org

Upon investigation, it seems that the problem is caused by the
following:

The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot. If there are system
columns other than ctid and tableOid referenced in the RETURNING clause
(not only xmin as in the bug report), it will lead to the ERROR as
mentioned in this thread as virtual tuple table slots don't really store
such columns. (ctid and tableOid are present directly in the
TupleTableSlot struct and can be satisfied from there: refer:
slot_getsysattr()))

I have attached two alternate patches to solve the problem. Both patches
use and share a mechanism to detect if there are any such system
columns. This is done inside ExecBuildProjectionInfo() and we store this
info inside the ProjectionInfo struct. Then based on this info, system
columns are populated in a suitable slot, which is then passed on to
ExecProcessReturning(). (If it is deemed that this operation only be
done for RETURNING, we can just as easily do it in the callsite for
ExecBuildProjectionInfo() in ExecInitModifyTable() for RETURNING instead
of doing it inside ExecBuildProjectionInfo())

The first patch [1] explicitly creates a heap tuple table slot, fills in the
system column values as we would do during heap_prepare_insert() and
then passes that slot to ExecProcessReturning(). (We use a heap tuple table
slot as it is guaranteed to support these attributes).

The second patch [2] instead of relying on a heap tuple table slot,
relies on ExecGetReturningSlot() for the right slot and
table_tuple_fetch_row_version() to supply the system column values.  It
does make the assumption that the AM would supply a slot that will have
these system columns.

[1] v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch
[2] v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch

Regards,
Soumyadeep (VMware)

Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
Hi Soumyadeep,

Thanks for picking this up.

On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
> Upon investigation, it seems that the problem is caused by the
> following:
>
> The slot passed to the call to ExecProcessReturning() inside
> ExecInsert() is often a virtual tuple table slot.

Actually, not that often in practice.  The slot is not virtual, for
example, when inserting into a regular non-partitioned table.  Whether
or not it is virtual depends on the following piece of code in
ExecInitModifyTable():

        mtstate->mt_scans[i] =
            ExecInitExtraTupleSlot(mtstate->ps.state,
ExecGetResultType(mtstate->mt_plans[i]),

table_slot_callbacks(resultRelInfo->ri_RelationDesc));

Specifically, the call to table_slot_callbacks() above determines what
kind of slot is assigned for a given target relation.  For partitioned
tables, it happens to return a virtual slot currently, per this
implementation:

    if (relation->rd_tableam)
        tts_cb = relation->rd_tableam->slot_callbacks(relation);
    else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
    {
        /*
         * Historically FDWs expect to store heap tuples in slots. Continue
         * handing them one, to make it less painful to adapt FDWs to new
         * versions. The cost of a heap slot over a virtual slot is pretty
         * small.
         */
        tts_cb = &TTSOpsHeapTuple;
    }
    else
    {
        /*
         * These need to be supported, as some parts of the code (like COPY)
         * need to create slots for such relations too. It seems better to
         * centralize the knowledge that a heap slot is the right thing in
         * that case here.
         */
        Assert(relation->rd_rel->relkind == RELKIND_VIEW ||
               relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
        tts_cb = &TTSOpsVirtual;
    }

If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached).  In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.

> I have attached two alternate patches to solve the problem.

IMHO, they are solving the problem at the wrong place.  We should
really fix things so that the slot that gets passed down to
ExecProcessReturning() is of the correct type to begin with.  We could
do what I suggest above or maybe find some other way.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
Hi Soumyadeep,

Thanks for picking this up.

On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
> Upon investigation, it seems that the problem is caused by the
> following:
>
> The slot passed to the call to ExecProcessReturning() inside
> ExecInsert() is often a virtual tuple table slot.

Actually, not that often in practice.  The slot is not virtual, for
example, when inserting into a regular non-partitioned table.  Whether
or not it is virtual depends on the following piece of code in
ExecInitModifyTable():

        mtstate->mt_scans[i] =
            ExecInitExtraTupleSlot(mtstate->ps.state,
ExecGetResultType(mtstate->mt_plans[i]),

table_slot_callbacks(resultRelInfo->ri_RelationDesc));

Specifically, the call to table_slot_callbacks() above determines what
kind of slot is assigned for a given target relation.  For partitioned
tables, it happens to return a virtual slot currently, per this
implementation:

    if (relation->rd_tableam)
        tts_cb = relation->rd_tableam->slot_callbacks(relation);
    else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
    {
        /*
         * Historically FDWs expect to store heap tuples in slots. Continue
         * handing them one, to make it less painful to adapt FDWs to new
         * versions. The cost of a heap slot over a virtual slot is pretty
         * small.
         */
        tts_cb = &TTSOpsHeapTuple;
    }
    else
    {
        /*
         * These need to be supported, as some parts of the code (like COPY)
         * need to create slots for such relations too. It seems better to
         * centralize the knowledge that a heap slot is the right thing in
         * that case here.
         */
        Assert(relation->rd_rel->relkind == RELKIND_VIEW ||
               relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
        tts_cb = &TTSOpsVirtual;
    }

If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached).  In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.

> I have attached two alternate patches to solve the problem.

IMHO, they are solving the problem at the wrong place.  We should
really fix things so that the slot that gets passed down to
ExecProcessReturning() is of the correct type to begin with.  We could
do what I suggest above or maybe find some other way.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Re: posgres 12 bug (partitioned table)

From
Soumyadeep Chakraborty
Date:
Hi Amit,

Thanks for your reply!

On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Hi Soumyadeep,
>
> Thanks for picking this up.
>
> On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
> <soumyadeep2007@gmail.com> wrote:
> > Upon investigation, it seems that the problem is caused by the
> > following:
> >
> > The slot passed to the call to ExecProcessReturning() inside
> > ExecInsert() is often a virtual tuple table slot.
>
> Actually, not that often in practice.  The slot is not virtual, for
> example, when inserting into a regular non-partitioned table.

Indeed! I meant partitioned tables are a common use case. Sorry, I
should have elaborated.

> If I change this to return a "heap" slot for partitioned tables, just
> like for foreign tables, the problem goes away (see the attached).  In
> fact, even make check-world passes, so I don't know why it isn't that
> way to begin with.
>

This is what I had thought of initially but I had taken a step back for 2
reasons:

1. It is not mandatory for an AM to supply a heap tuple in the slot
passed to any AM routine. With your patch, the slot passed to
table_tuple_insert() inside ExecInsert() for instance is now expected to
supply a heap tuple for the subsequent call to ExecProcessReturning().
This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
applying your patch on Zedstore [1], a columnar AM, and consequently, I
got incorrect values for xmin, xmax etc with the query reported in this
issue.

2. This is a secondary aspect but I will mention it here for
completeness. Not knowing enough about this code: will demanding heap
tuples for partitioned tables all throughout the code have a performance
impact? At a first glance it didn't seem to be the case.  However, I did
find lots of callsites for partitioning or otherwise where we kind of
expect a virtual tuple table slot (as evidenced with the calls to
ExecStoreVirtualTuple()). With your patch, we seem to be calling
ExecStoreVirtualTuple() on a heap tuple table slot, in various places:
such as inside execute_attr_map_slot(). It seems to be harmless to do so
however, in accordance with my limited investigation.

All in all, I think we have to explicitly supply those system columns. I
heard from Daniel that one of the motivations for having table AMs
was to ensure that transaction meta-data storage is not demanded off any
AM.

[1] https://github.com/greenplum-db/postgres/tree/zedstore

Regards,
Soumyadeep



Re: posgres 12 bug (partitioned table)

From
Soumyadeep Chakraborty
Date:
Hi Amit,

Thanks for your reply!

On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Hi Soumyadeep,
>
> Thanks for picking this up.
>
> On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
> <soumyadeep2007@gmail.com> wrote:
> > Upon investigation, it seems that the problem is caused by the
> > following:
> >
> > The slot passed to the call to ExecProcessReturning() inside
> > ExecInsert() is often a virtual tuple table slot.
>
> Actually, not that often in practice.  The slot is not virtual, for
> example, when inserting into a regular non-partitioned table.

Indeed! I meant partitioned tables are a common use case. Sorry, I
should have elaborated.

> If I change this to return a "heap" slot for partitioned tables, just
> like for foreign tables, the problem goes away (see the attached).  In
> fact, even make check-world passes, so I don't know why it isn't that
> way to begin with.
>

This is what I had thought of initially but I had taken a step back for 2
reasons:

1. It is not mandatory for an AM to supply a heap tuple in the slot
passed to any AM routine. With your patch, the slot passed to
table_tuple_insert() inside ExecInsert() for instance is now expected to
supply a heap tuple for the subsequent call to ExecProcessReturning().
This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
applying your patch on Zedstore [1], a columnar AM, and consequently, I
got incorrect values for xmin, xmax etc with the query reported in this
issue.

2. This is a secondary aspect but I will mention it here for
completeness. Not knowing enough about this code: will demanding heap
tuples for partitioned tables all throughout the code have a performance
impact? At a first glance it didn't seem to be the case.  However, I did
find lots of callsites for partitioning or otherwise where we kind of
expect a virtual tuple table slot (as evidenced with the calls to
ExecStoreVirtualTuple()). With your patch, we seem to be calling
ExecStoreVirtualTuple() on a heap tuple table slot, in various places:
such as inside execute_attr_map_slot(). It seems to be harmless to do so
however, in accordance with my limited investigation.

All in all, I think we have to explicitly supply those system columns. I
heard from Daniel that one of the motivations for having table AMs
was to ensure that transaction meta-data storage is not demanded off any
AM.

[1] https://github.com/greenplum-db/postgres/tree/zedstore

Regards,
Soumyadeep



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
Hi Soumyadeep,

On Wed, Jul 8, 2020 at 9:37 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
> On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > If I change this to return a "heap" slot for partitioned tables, just
> > like for foreign tables, the problem goes away (see the attached).  In
> > fact, even make check-world passes, so I don't know why it isn't that
> > way to begin with.
>
> This is what I had thought of initially but I had taken a step back for 2
> reasons:
>
> 1. It is not mandatory for an AM to supply a heap tuple in the slot
> passed to any AM routine. With your patch, the slot passed to
> table_tuple_insert() inside ExecInsert() for instance is now expected to
> supply a heap tuple for the subsequent call to ExecProcessReturning().
> This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
> applying your patch on Zedstore [1], a columnar AM, and consequently, I
> got incorrect values for xmin, xmax etc with the query reported in this
> issue.

Ah, I see.  You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert().  So,
table_tuple_insert() always refers to a leaf partition's AM.   Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table.  How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops?  That would be the case, for
example, if the leaf partition is of Zedstore AM.  In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
Hi Soumyadeep,

On Wed, Jul 8, 2020 at 9:37 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
> On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > If I change this to return a "heap" slot for partitioned tables, just
> > like for foreign tables, the problem goes away (see the attached).  In
> > fact, even make check-world passes, so I don't know why it isn't that
> > way to begin with.
>
> This is what I had thought of initially but I had taken a step back for 2
> reasons:
>
> 1. It is not mandatory for an AM to supply a heap tuple in the slot
> passed to any AM routine. With your patch, the slot passed to
> table_tuple_insert() inside ExecInsert() for instance is now expected to
> supply a heap tuple for the subsequent call to ExecProcessReturning().
> This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
> applying your patch on Zedstore [1], a columnar AM, and consequently, I
> got incorrect values for xmin, xmax etc with the query reported in this
> issue.

Ah, I see.  You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert().  So,
table_tuple_insert() always refers to a leaf partition's AM.   Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table.  How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops?  That would be the case, for
example, if the leaf partition is of Zedstore AM.  In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: posgres 12 bug (partitioned table)

From
Soumyadeep Chakraborty
Date:
Hey Amit,

On Tue, Jul 7, 2020 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Ah, I see.  You might've noticed that ExecInsert() only ever sees leaf
> partitions, because tuple routing would've switched the result
> relation to a leaf partition by the time we are in ExecInsert().  So,
> table_tuple_insert() always refers to a leaf partition's AM.   Not
> sure if you've also noticed but each leaf partition gets to own a slot
> (PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
> used if the leaf partition attribute numbers are not the same as the
> root partitioned table.  How about we also use it if the leaf
> partition AM's table_slot_callbacks() differs from the root
> partitioned table's slot's tts_ops?  That would be the case, for
> example, if the leaf partition is of Zedstore AM.  In the more common
> cases where all leaf partitions are of heap AM, this would mean the
> original slot would be used as is, that is, if we accept hard-coding
> table_slot_callbacks() to return a "heap" slot for partitioned tables
> as I suggest.

Even then, we will still need to fill in the system columns explicitly as
pi_PartitionTupleSlot will not be filled with system columns after
it comes back out of table_tuple_insert() if we have a non-heap AM.


Regards,
Soumyadeep (VMware)



Re: posgres 12 bug (partitioned table)

From
Soumyadeep Chakraborty
Date:
Hey Amit,

On Tue, Jul 7, 2020 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Ah, I see.  You might've noticed that ExecInsert() only ever sees leaf
> partitions, because tuple routing would've switched the result
> relation to a leaf partition by the time we are in ExecInsert().  So,
> table_tuple_insert() always refers to a leaf partition's AM.   Not
> sure if you've also noticed but each leaf partition gets to own a slot
> (PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
> used if the leaf partition attribute numbers are not the same as the
> root partitioned table.  How about we also use it if the leaf
> partition AM's table_slot_callbacks() differs from the root
> partitioned table's slot's tts_ops?  That would be the case, for
> example, if the leaf partition is of Zedstore AM.  In the more common
> cases where all leaf partitions are of heap AM, this would mean the
> original slot would be used as is, that is, if we accept hard-coding
> table_slot_callbacks() to return a "heap" slot for partitioned tables
> as I suggest.

Even then, we will still need to fill in the system columns explicitly as
pi_PartitionTupleSlot will not be filled with system columns after
it comes back out of table_tuple_insert() if we have a non-heap AM.


Regards,
Soumyadeep (VMware)



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
On Thu, Jul 9, 2020 at 1:53 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
> On Tue, Jul 7, 2020 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Ah, I see.  You might've noticed that ExecInsert() only ever sees leaf
> > partitions, because tuple routing would've switched the result
> > relation to a leaf partition by the time we are in ExecInsert().  So,
> > table_tuple_insert() always refers to a leaf partition's AM.   Not
> > sure if you've also noticed but each leaf partition gets to own a slot
> > (PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
> > used if the leaf partition attribute numbers are not the same as the
> > root partitioned table.  How about we also use it if the leaf
> > partition AM's table_slot_callbacks() differs from the root
> > partitioned table's slot's tts_ops?  That would be the case, for
> > example, if the leaf partition is of Zedstore AM.  In the more common
> > cases where all leaf partitions are of heap AM, this would mean the
> > original slot would be used as is, that is, if we accept hard-coding
> > table_slot_callbacks() to return a "heap" slot for partitioned tables
> > as I suggest.
>
> Even then, we will still need to fill in the system columns explicitly as
> pi_PartitionTupleSlot will not be filled with system columns after
> it comes back out of table_tuple_insert() if we have a non-heap AM.

Well, I was hoping that table_tuple_insert() would fill that info, but
you did say upthread that table AMs are not exactly expected to do so,
so maybe you have a point.

By the way, what happens today if you do INSERT INTO a_zedstore_table
... RETURNING xmin?  Do you get an error "xmin is unrecognized" or
some such in slot_getsysattr() when trying to project the RETURNING
list?

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
On Thu, Jul 9, 2020 at 1:53 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
> On Tue, Jul 7, 2020 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Ah, I see.  You might've noticed that ExecInsert() only ever sees leaf
> > partitions, because tuple routing would've switched the result
> > relation to a leaf partition by the time we are in ExecInsert().  So,
> > table_tuple_insert() always refers to a leaf partition's AM.   Not
> > sure if you've also noticed but each leaf partition gets to own a slot
> > (PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
> > used if the leaf partition attribute numbers are not the same as the
> > root partitioned table.  How about we also use it if the leaf
> > partition AM's table_slot_callbacks() differs from the root
> > partitioned table's slot's tts_ops?  That would be the case, for
> > example, if the leaf partition is of Zedstore AM.  In the more common
> > cases where all leaf partitions are of heap AM, this would mean the
> > original slot would be used as is, that is, if we accept hard-coding
> > table_slot_callbacks() to return a "heap" slot for partitioned tables
> > as I suggest.
>
> Even then, we will still need to fill in the system columns explicitly as
> pi_PartitionTupleSlot will not be filled with system columns after
> it comes back out of table_tuple_insert() if we have a non-heap AM.

Well, I was hoping that table_tuple_insert() would fill that info, but
you did say upthread that table AMs are not exactly expected to do so,
so maybe you have a point.

By the way, what happens today if you do INSERT INTO a_zedstore_table
... RETURNING xmin?  Do you get an error "xmin is unrecognized" or
some such in slot_getsysattr() when trying to project the RETURNING
list?

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: posgres 12 bug (partitioned table)

From
Soumyadeep Chakraborty
Date:
Hey Amit,

On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:

> By the way, what happens today if you do INSERT INTO a_zedstore_table
> ... RETURNING xmin?  Do you get an error "xmin is unrecognized" or
> some such in slot_getsysattr() when trying to project the RETURNING
> list?
>

We get garbage values for xmin and cmin. If we request cmax/xmax, we get
an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
"zedstore tuple table slot does not have system attributes (except xmin
and cmin)"

A ZedstoreTupleTableSlot only stores xmin and xmax. Also,
zedstoream_insert(), which is the tuple_insert() implementation, does
not supply the xmin/cmin, thus making those values garbage.

For context, Zedstore has its own UNDO log implementation to act as
storage for transaction information. (which is intended to be replaced
with the upstream UNDO log in the future).

The above behavior is not just restricted to INSERT..RETURNING, right
now. If we do a select <tx_column> from foo in Zedstore, the behavior is
the same.  The transaction information is never returned from Zedstore
in tableam calls that don't demand transactional information be
used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
it will use the transactional information correctly. It will also
populate TM_FailureData, which contains xmax and cmax, in the APIs where
it is demanded.

I really wonder what other AMs are doing about these issues.

I think we should either:

1. Demand transactional information off of AMs for all APIs that involve
a projection of transactional information.

2. Have some other component of Postgres supply the transactional
information. This is what I think the upstream UNDO log can probably
provide.

3. (Least elegant) Transform tuple table slots into heap tuple table
slots (since it is the only kind of tuple storage that can supply
transactional info) and explicitly fill in the transactional values
depending on the context, whenever transactional information is
projected.

For this bug report, I am not sure what is right. Perhaps, to stop the
bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
that the AM needs to provide the transactional info in the respective
insert AM API calls, as well as demand a heap slot for partition roots
and interior nodes. And then later on. we would need a larger effort
making all of these APIs not really demand transactional information.
Perhaps the UNDO framework will come to the rescue.

Regards,
Soumyadeep (VMware)



Re: posgres 12 bug (partitioned table)

From
Soumyadeep Chakraborty
Date:
Hey Amit,

On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:

> By the way, what happens today if you do INSERT INTO a_zedstore_table
> ... RETURNING xmin?  Do you get an error "xmin is unrecognized" or
> some such in slot_getsysattr() when trying to project the RETURNING
> list?
>

We get garbage values for xmin and cmin. If we request cmax/xmax, we get
an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
"zedstore tuple table slot does not have system attributes (except xmin
and cmin)"

A ZedstoreTupleTableSlot only stores xmin and xmax. Also,
zedstoream_insert(), which is the tuple_insert() implementation, does
not supply the xmin/cmin, thus making those values garbage.

For context, Zedstore has its own UNDO log implementation to act as
storage for transaction information. (which is intended to be replaced
with the upstream UNDO log in the future).

The above behavior is not just restricted to INSERT..RETURNING, right
now. If we do a select <tx_column> from foo in Zedstore, the behavior is
the same.  The transaction information is never returned from Zedstore
in tableam calls that don't demand transactional information be
used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
it will use the transactional information correctly. It will also
populate TM_FailureData, which contains xmax and cmax, in the APIs where
it is demanded.

I really wonder what other AMs are doing about these issues.

I think we should either:

1. Demand transactional information off of AMs for all APIs that involve
a projection of transactional information.

2. Have some other component of Postgres supply the transactional
information. This is what I think the upstream UNDO log can probably
provide.

3. (Least elegant) Transform tuple table slots into heap tuple table
slots (since it is the only kind of tuple storage that can supply
transactional info) and explicitly fill in the transactional values
depending on the context, whenever transactional information is
projected.

For this bug report, I am not sure what is right. Perhaps, to stop the
bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
that the AM needs to provide the transactional info in the respective
insert AM API calls, as well as demand a heap slot for partition roots
and interior nodes. And then later on. we would need a larger effort
making all of these APIs not really demand transactional information.
Perhaps the UNDO framework will come to the rescue.

Regards,
Soumyadeep (VMware)



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
Hi Soumyadeep,

On Fri, Jul 10, 2020 at 2:56 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
>
> Hey Amit,
>
> On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > By the way, what happens today if you do INSERT INTO a_zedstore_table
> > ... RETURNING xmin?  Do you get an error "xmin is unrecognized" or
> > some such in slot_getsysattr() when trying to project the RETURNING
> > list?
> >
> We get garbage values for xmin and cmin. If we request cmax/xmax, we get
> an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
> "zedstore tuple table slot does not have system attributes (except xmin
> and cmin)"
>
> A ZedstoreTupleTableSlot only stores xmin and xmax. Also,
> zedstoream_insert(), which is the tuple_insert() implementation, does
> not supply the xmin/cmin, thus making those values garbage.
>
> For context, Zedstore has its own UNDO log implementation to act as
> storage for transaction information. (which is intended to be replaced
> with the upstream UNDO log in the future).
>
> The above behavior is not just restricted to INSERT..RETURNING, right
> now. If we do a select <tx_column> from foo in Zedstore, the behavior is
> the same.  The transaction information is never returned from Zedstore
> in tableam calls that don't demand transactional information be
> used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
> it will use the transactional information correctly. It will also
> populate TM_FailureData, which contains xmax and cmax, in the APIs where
> it is demanded.
>
> I really wonder what other AMs are doing about these issues.
>
> I think we should either:
>
> 1. Demand transactional information off of AMs for all APIs that involve
> a projection of transactional information.
>
> 2. Have some other component of Postgres supply the transactional
> information. This is what I think the upstream UNDO log can probably
> provide.

So even if an AM's table_tuple_insert() itself doesn't populate the
transaction info into the slot handed to it, maybe as an optimization,
it does not sound entirely unreasonable to expect that the AM's
slot_getsysattr() callback returns it correctly when projecting a
target list containing system columns.  We shouldn't really need any
new core code to get the transaction-related system columns while
there exists a perfectly reasonable channel for it to arrive through
-- TupleTableSlots.  I suppose there's a reason why we allow AMs to
provide their own slot callbacks.

Whether an AM uses UNDO log or something else to manage the
transaction info is up to the AM, so I don't see why the AMs
themselves shouldn't be in charge of returning that info, because only
they know where it is.

> 3. (Least elegant) Transform tuple table slots into heap tuple table
> slots (since it is the only kind of tuple storage that can supply
> transactional info) and explicitly fill in the transactional values
> depending on the context, whenever transactional information is
> projected.
>
> For this bug report, I am not sure what is right. Perhaps, to stop the
> bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
> that the AM needs to provide the transactional info in the respective
> insert AM API calls,

As long as the AM's slot_getsysattr() callback returns the correct
value, this works.

> as well as demand a heap slot for partition roots
> and interior nodes.

It would be a compromise on the core's part to use "heap" slots for
partitioned tables, because they don't have a valid table AM.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
Hi Soumyadeep,

On Fri, Jul 10, 2020 at 2:56 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
>
> Hey Amit,
>
> On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > By the way, what happens today if you do INSERT INTO a_zedstore_table
> > ... RETURNING xmin?  Do you get an error "xmin is unrecognized" or
> > some such in slot_getsysattr() when trying to project the RETURNING
> > list?
> >
> We get garbage values for xmin and cmin. If we request cmax/xmax, we get
> an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
> "zedstore tuple table slot does not have system attributes (except xmin
> and cmin)"
>
> A ZedstoreTupleTableSlot only stores xmin and xmax. Also,
> zedstoream_insert(), which is the tuple_insert() implementation, does
> not supply the xmin/cmin, thus making those values garbage.
>
> For context, Zedstore has its own UNDO log implementation to act as
> storage for transaction information. (which is intended to be replaced
> with the upstream UNDO log in the future).
>
> The above behavior is not just restricted to INSERT..RETURNING, right
> now. If we do a select <tx_column> from foo in Zedstore, the behavior is
> the same.  The transaction information is never returned from Zedstore
> in tableam calls that don't demand transactional information be
> used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
> it will use the transactional information correctly. It will also
> populate TM_FailureData, which contains xmax and cmax, in the APIs where
> it is demanded.
>
> I really wonder what other AMs are doing about these issues.
>
> I think we should either:
>
> 1. Demand transactional information off of AMs for all APIs that involve
> a projection of transactional information.
>
> 2. Have some other component of Postgres supply the transactional
> information. This is what I think the upstream UNDO log can probably
> provide.

So even if an AM's table_tuple_insert() itself doesn't populate the
transaction info into the slot handed to it, maybe as an optimization,
it does not sound entirely unreasonable to expect that the AM's
slot_getsysattr() callback returns it correctly when projecting a
target list containing system columns.  We shouldn't really need any
new core code to get the transaction-related system columns while
there exists a perfectly reasonable channel for it to arrive through
-- TupleTableSlots.  I suppose there's a reason why we allow AMs to
provide their own slot callbacks.

Whether an AM uses UNDO log or something else to manage the
transaction info is up to the AM, so I don't see why the AMs
themselves shouldn't be in charge of returning that info, because only
they know where it is.

> 3. (Least elegant) Transform tuple table slots into heap tuple table
> slots (since it is the only kind of tuple storage that can supply
> transactional info) and explicitly fill in the transactional values
> depending on the context, whenever transactional information is
> projected.
>
> For this bug report, I am not sure what is right. Perhaps, to stop the
> bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
> that the AM needs to provide the transactional info in the respective
> insert AM API calls,

As long as the AM's slot_getsysattr() callback returns the correct
value, this works.

> as well as demand a heap slot for partition roots
> and interior nodes.

It would be a compromise on the core's part to use "heap" slots for
partitioned tables, because they don't have a valid table AM.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
Reading my own words, I think I must fix an ambiguity:

On Fri, Jul 10, 2020 at 3:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
> So even if an AM's table_tuple_insert() itself doesn't populate the
> transaction info into the slot handed to it, maybe as an optimization,
> it does not sound entirely unreasonable to expect that the AM's
> slot_getsysattr() callback returns it correctly when projecting a
> target list containing system columns.

The "maybe as an optimization" refers to the part of the sentence that
comes before it.  That is, I mean table_tuple_insert() may choose to
not populate the transaction info in the slot as an optimization.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
Reading my own words, I think I must fix an ambiguity:

On Fri, Jul 10, 2020 at 3:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
> So even if an AM's table_tuple_insert() itself doesn't populate the
> transaction info into the slot handed to it, maybe as an optimization,
> it does not sound entirely unreasonable to expect that the AM's
> slot_getsysattr() callback returns it correctly when projecting a
> target list containing system columns.

The "maybe as an optimization" refers to the part of the sentence that
comes before it.  That is, I mean table_tuple_insert() may choose to
not populate the transaction info in the slot as an optimization.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: posgres 12 bug (partitioned table)

From
Robert Haas
Date:
On Thu, Jun 4, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pavel Biryukov <79166341370@yandex.ru> writes:
> > Hello. I'm catch error "virtual tuple table slot does not have system
> > attributes" when inserting row into partitioned table with RETURNING
> > xmin
>
> Reproduced here.  The example works in v11, and in v10 if you remove
> the unnecessary-to-the-example primary key, so it seems like a clear
> regression.  I didn't dig for the cause but I suppose it's related
> to Andres' slot-related changes.

I wonder whether it's really correct to classify this as a bug. The
subsequent discussion essentially boils down to this: the partitioned
table's children could use any AM, and they might not all use the same
AM. The system columns that are relevant for the heap may therefore be
relevant to all, some, or none of the children. In fact, any fixed
kind of tuple table slot we might choose to use for the parent has
this problem. If all of the children are of the same type -- and today
that would have to be heap -- then using that type of tuple table slot
for the parent as well would make sense. But there's no real reason
why that's the correct answer in general. If the children are all of
some other type, using a heap slot for the parent is wrong; and if
they're all of different types, it's unclear that anything other than
a virtual slot makes any sense.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 4, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Pavel Biryukov <79166341370@yandex.ru> writes:
>>> Hello. I'm catch error "virtual tuple table slot does not have system
>>> attributes" when inserting row into partitioned table with RETURNING
>>> xmin

> I wonder whether it's really correct to classify this as a bug. The
> subsequent discussion essentially boils down to this: the partitioned
> table's children could use any AM, and they might not all use the same
> AM. The system columns that are relevant for the heap may therefore be
> relevant to all, some, or none of the children. In fact, any fixed
> kind of tuple table slot we might choose to use for the parent has
> this problem. If all of the children are of the same type -- and today
> that would have to be heap -- then using that type of tuple table slot
> for the parent as well would make sense. But there's no real reason
> why that's the correct answer in general. If the children are all of
> some other type, using a heap slot for the parent is wrong; and if
> they're all of different types, it's unclear that anything other than
> a virtual slot makes any sense.

Well, if we want to allow such scenarios then we need to forbid queries
from accessing "system columns" of a partitioned table, much as we do for
views.  Failing way down inside the executor seems quite unacceptable.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
Robert Haas
Date:
On Tue, Aug 11, 2020 at 12:02 PM Pavel Biryukov <79166341370@yandex.ru> wrote:
> I just want to point that Npgsql provider for .Net Core builds queries like that (RETURNING xmin) to keep track for
concurrency.
> This bug stops us from moving to partitioned tables in Postgres 12 with Npgsql.

That's certainly a good reason to try to make it work. And we can make
it work, if we're willing to assume that everything's a heap table.
But at some point, that hopefully won't be true any more, and then
this whole idea becomes pretty dubious. I think we shouldn't wait
until it happens to start thinking about that problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2020-08-11 19:02:57 +0300, Pavel Biryukov wrote:
> I just want to point that Npgsql provider for .Net Core builds queries like
> that (RETURNING xmin) to keep track for concurrency.

Could you provide a bit more details about what that's actually used
for?

Regards,

Andres



Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Aug 11, 2020 at 12:02 PM Pavel Biryukov <79166341370@yandex.ru> wrote:
>> I just want to point that Npgsql provider for .Net Core builds queries like that (RETURNING xmin) to keep track for
concurrency.
>> This bug stops us from moving to partitioned tables in Postgres 12 with Npgsql.

> That's certainly a good reason to try to make it work. And we can make
> it work, if we're willing to assume that everything's a heap table.
> But at some point, that hopefully won't be true any more, and then
> this whole idea becomes pretty dubious. I think we shouldn't wait
> until it happens to start thinking about that problem.

For xmin in particular, you don't have to assume "everything's a heap".
What you have to assume is "everything uses MVCC", which seems a more
defensible position.  It'll still fall down for foreign tables that are
partitions, though.

I echo Andres' nearby question about exactly why npgsql has such a
hard dependency on xmin.  Maybe what we need is to try to abstract
that a little, and see if we could require all partition members
to support some unified concept of it.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2020-06-04 12:57:30 -0400, Tom Lane wrote:
> Pavel Biryukov <79166341370@yandex.ru> writes:
> > Hello. I'm catch error "virtual tuple table slot does not have system
> > attributes" when inserting row into partitioned table with RETURNING
> > xmin
>
> Reproduced here.  The example works in v11, and in v10 if you remove
> the unnecessary-to-the-example primary key, so it seems like a clear
> regression.  I didn't dig for the cause but I suppose it's related
> to Andres' slot-related changes.

The reason we're getting the failure is that nodeModifyTable.c only is
dealing with virtual tuple slots, which don't carry visibility
information. Despite actually having infrastructure for creating a
partition specific slot.  If I force check_attrmap_match() to return
false, the example starts working.

I don't really know how to best fix this in the partitioning
infrastructure. Currently the determination whether there needs to be
any conversion between subplan slot and the slot used for insertion is
solely based on comparing tuple descriptors. But for better or worse, we
don't represent system column accesses in tuple descriptors.

It's not that hard to force the slot creation & use whenever there's
returning, but somehow that feels hackish (but so does plenty other
things in execPartition.c).  See attached.


But I'm worried that that's not enough: What if somebody in a trigger
wants to access system columns besides tableoid and tid (which are
handled in a generic manner)? Currently - only for partitioned table DML
going through the root table - we'll not have valid values for the
trigger. It's pretty dubious imo to use xmin/xmax in triggers, but ...

I suspect we should just unconditionally use
partrouteinfo->pi_PartitionTupleSlot. Going through
partrouteinfo->pi_RootToPartitionMap if present, and ExecCopySlot()
otherwise.


Medium term I think we should just plain out forbid references to system
columns in partioned tables Or at least insist that all partitions have
that column. There's no meaningful way for some AMs to have xmin / xmax
in a compatible way with heap.


Greetings,

Andres Freund



Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
On 2020-08-11 11:02:31 -0700, Andres Freund wrote:
> It's not that hard to force the slot creation & use whenever there's
> returning, but somehow that feels hackish (but so does plenty other
> things in execPartition.c).  See attached.

Actually attached this time.

Attachment

Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2020-08-11 13:52:00 -0400, Tom Lane wrote:
> For xmin in particular, you don't have to assume "everything's a heap".
> What you have to assume is "everything uses MVCC", which seems a more
> defensible position.  It'll still fall down for foreign tables that are
> partitions, though.

Don't think that necessarily implies having a compatible xmin / xmax
around. Certainly not a 32bit one. I guess an AM could always return
InvalidOid, but that doesn't seem particularly helpful either.

I think it'd be better if we actually tried to provide a way to do
whatever xmin is being used properly. I've seen many uses of xmin/xmax
and many of them didn't work at all, and most of remaining ones only
worked in common cases.

Greetings,

Andres Freund



Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2020-08-11 21:31:52 +0300, Pavel Biryukov wrote:
> Entity Framework is an ORM for .Net (and .Net Core). It has different providers
> for different databases (NpgSql for Postgres). It uses Optimistic concurrency.
> The common use case is to use xmin as "concurrency token".
>  
> In code we make "var e = new Entity();", "dbContext.Add(e)" and
> "dbContext.SaveChanges()" (smth like that), and EF Core constructs sql for us,
> classical ORM;
>  
> When new row is inserted, EF makes an insert with "RETURNING xmin" to keep it
> as concurrency token for further updates (update is made like "where id = [id]
> AND xmin=[xmin]" to be sure the row hasn't been updated by other clients).

That's not really a safe use of xmin, e.g. it could have wrapped around
leading you to not notice a concurrent modification. Nor does it
properly deal with multiple statements within a transaction.  Perhaps
those are low enough risk for you, but I don't xmin is a decent building
block for this kind of thing.

Greetings,

Andres Freund



Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2020-08-11 21:55:32 +0300, Pavel Biryukov wrote:
> I don't see a problem with "wrapping around" - the row's xmin does not change
> with freeze (AFAIK). It changes when the row is modified.
> So event if you hold some entity (with current xmin) for a long time (enough
> for "wrap around") and then try to update it, it will update ok.

The problem isn't that it won't update ok, it is that it might update
despite there being another update since the RETURNING xmin.
s1) BEGIN;INSERT ... RETURN xmin;COMMIT;
s2) BEGIN;UPDATE .. WHERE xmin ...; COMMIT;
s*) WRAPAROUND;
s1) BEGIN;UPDATE .. WHERE xmin ...; COMMIT;

this could lead to s1 not noticing that s2 was updated.

- Andres



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
Hi,

On Wed, Aug 12, 2020 at 3:02 AM Andres Freund <andres@anarazel.de> wrote:
> On 2020-06-04 12:57:30 -0400, Tom Lane wrote:
> > Pavel Biryukov <79166341370@yandex.ru> writes:
> > > Hello. I'm catch error "virtual tuple table slot does not have system
> > > attributes" when inserting row into partitioned table with RETURNING
> > > xmin
> >
> > Reproduced here.  The example works in v11, and in v10 if you remove
> > the unnecessary-to-the-example primary key, so it seems like a clear
> > regression.  I didn't dig for the cause but I suppose it's related
> > to Andres' slot-related changes.
>
> The reason we're getting the failure is that nodeModifyTable.c only is
> dealing with virtual tuple slots, which don't carry visibility
> information. Despite actually having infrastructure for creating a
> partition specific slot.  If I force check_attrmap_match() to return
> false, the example starts working.
>
> I don't really know how to best fix this in the partitioning
> infrastructure. Currently the determination whether there needs to be
> any conversion between subplan slot and the slot used for insertion is
> solely based on comparing tuple descriptors. But for better or worse, we
> don't represent system column accesses in tuple descriptors.
>
> It's not that hard to force the slot creation & use whenever there's
> returning, but somehow that feels hackish (but so does plenty other
> things in execPartition.c).  See attached.
>
> But I'm worried that that's not enough: What if somebody in a trigger
> wants to access system columns besides tableoid and tid (which are
> handled in a generic manner)? Currently - only for partitioned table DML
> going through the root table - we'll not have valid values for the
> trigger. It's pretty dubious imo to use xmin/xmax in triggers, but ...
>
> I suspect we should just unconditionally use
> partrouteinfo->pi_PartitionTupleSlot. Going through
> partrouteinfo->pi_RootToPartitionMap if present, and ExecCopySlot()
> otherwise.

I see that to be the only way forward even though there will be a
slight hit in performance in typical cases where a virtual tuple slot
suffices.

> Medium term I think we should just plain out forbid references to system
> columns in partioned tables Or at least insist that all partitions have
> that column.

Performance-wise I would prefer the former, because the latter would
involve checking *all* partitions statically in the INSERT case,
something that we've avoided doing so far.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Wed, Aug 12, 2020 at 3:02 AM Andres Freund <andres@anarazel.de> wrote:
>> Medium term I think we should just plain out forbid references to system
>> columns in partioned tables Or at least insist that all partitions have
>> that column.

> Performance-wise I would prefer the former, because the latter would
> involve checking *all* partitions statically in the INSERT case,
> something that we've avoided doing so far.

It's not like we don't have a technology for doing that.  The way this
ideally would work, IMV, is that the parent partitioned table either
has or doesn't have a given system column.  If it does, then every
child must too, just like the way things work for user columns.

This'd require (a) some sort of consensus about which kinds of system
columns can make sense --- as Andres noted, 32-bit xmin might not be
the best choice here --- and (b) some notation for users to declare
which of these columns they want in a partitioned table.  Once upon
a time we had WITH OIDS, maybe that idea could be extended.

I'm not entirely sure that this is worth all the trouble, but that's
how I'd sketch doing it.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
On Wed, Aug 12, 2020 at 1:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Wed, Aug 12, 2020 at 3:02 AM Andres Freund <andres@anarazel.de> wrote:
> >> Medium term I think we should just plain out forbid references to system
> >> columns in partioned tables Or at least insist that all partitions have
> >> that column.
>
> > Performance-wise I would prefer the former, because the latter would
> > involve checking *all* partitions statically in the INSERT case,
> > something that we've avoided doing so far.
>
> It's not like we don't have a technology for doing that.  The way this
> ideally would work, IMV, is that the parent partitioned table either
> has or doesn't have a given system column.  If it does, then every
> child must too, just like the way things work for user columns.

Ah, I may have misread "insisting that all partitions have a given
system column" as doing that on every query, but maybe Andres meant
what you are describing here.

> This'd require (a) some sort of consensus about which kinds of system
> columns can make sense --- as Andres noted, 32-bit xmin might not be
> the best choice here --- and (b) some notation for users to declare
> which of these columns they want in a partitioned table.  Once upon
> a time we had WITH OIDS, maybe that idea could be extended.

For (a), isn't there already a consensus that all table AMs support at
least the set of system columns described in 5.5 System Columns [1]
even if the individual members of that set are no longer the best
choice at this point?  I do agree that we'd need (b) in some form to
require AMs to fill those columns which it seems is not the case
currently.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/docs/current/ddl-system-columns.html



AW: posgres 12 bug (partitioned table)

From
Wilm Hoyer
Date:

Hi,

while the request for returning xmin of partitioned tables is still valid, i’d like to add some information and a possible workaround.


On 2020-08-11 21:31:52 +0300, Pavel Biryukov wrote:

 Entity Framework is an ORM for .Net (and .Net Core). It has different providers
 for different databases (NpgSql for Postgres). It uses Optimistic concurrency.
 The common use case is to use xmin as "concurrency token".
  
 In code we make "var e = new Entity();", "dbContext.Add(e)" and
 "dbContext.SaveChanges()" (smth like that), and EF Core constructs sql for us,
 classical ORM;
  
 When new row is inserted, EF makes an insert with "RETURNING xmin" to keep it
 as concurrency token for further updates (update is made like "where id = [id]
 AND xmin=[xmin]" to be sure the row hasn't been updated by other clients).

 

Neither the Entity Framework, nor npgsql rely on the column xmin. Both don’t know about this column in their codebase.

In the case oft he EF i’m sure that this holds true for all versions, since it is designed as DBMS independant, and as such will never know anything about a PostgreSQL specific column.

Also you can use any ADO.Net provider to connect to a concrete DBMS – i for example use dotConnect for PostgreSQL because it provided more features and less bugs as Npgsql at the time of  decission.

As for Npgsql i have only checked that the current HEAD has no reference to xmin in its source code.

 

With that in mind, i assume the OP included the column xmin in his Entity Model by himself and set the ConcurrencyMode to fixed for that column.

As xmin is a system column that the EF should never try to update (PostgreSQL will reject this attempt, i think), i’d suggest using a self defined column (row_version for example) and either use triggers on update and insert to increment its value (works even with updates outside of EF) or let the EF do the increment.

 

Regards

Wilm Hoyer.

 

 

 

AW: AW: posgres 12 bug (partitioned table)

From
Wilm Hoyer
Date:

 

> Wilm,

 

> Have a look, there is even an extension method for xmin (stable branch):

 

 

> EF Core is mostly independent from DB, but there usually are some "tweaks" for each database that you should know (like for SQL server the native concurrency token should be like byte[]:

  1. [TimeStamp]
  2. public byte[] RowVersion { get; set; }
  3. )

 

> Additional "self defined column" for billion rows tables leads to additional space needed. We use partitioning when the tables are LARGE :)

 

> This works fine in 10, 11, it's strange it is broken in 12 (from db users point of view, I haven't examined the sources of PG for internals...)

 

Sorry, i had not looked into the extension since it is not part of core npgsql. As it states: Npgsql.EntityFrameworkCore.PostgreSQL is an Entity Framework Core provider built on top of Npgsql. (and we left npgsql before this extension was started).

I guess it would be easy for them to implement a solution, because that basically means adopting the relevant part from the EF SqlServer implementation. But i fear you’re out of luck with them, as support for table partitioning is still an open issue in their project.

 

Best regards

Wilm.

Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2020-08-12 14:19:12 +0900, Amit Langote wrote:
> On Wed, Aug 12, 2020 at 1:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Amit Langote <amitlangote09@gmail.com> writes:
> > > On Wed, Aug 12, 2020 at 3:02 AM Andres Freund <andres@anarazel.de> wrote:
> > >> Medium term I think we should just plain out forbid references to system
> > >> columns in partioned tables Or at least insist that all partitions have
> > >> that column.
> >
> > > Performance-wise I would prefer the former, because the latter would
> > > involve checking *all* partitions statically in the INSERT case,
> > > something that we've avoided doing so far.
> >
> > It's not like we don't have a technology for doing that.  The way this
> > ideally would work, IMV, is that the parent partitioned table either
> > has or doesn't have a given system column.  If it does, then every
> > child must too, just like the way things work for user columns.
> 
> Ah, I may have misread "insisting that all partitions have a given
> system column" as doing that on every query, but maybe Andres meant
> what you are describing here.

I think Tom's formulation makes sense.


> > This'd require (a) some sort of consensus about which kinds of system
> > columns can make sense --- as Andres noted, 32-bit xmin might not be
> > the best choice here --- and (b) some notation for users to declare
> > which of these columns they want in a partitioned table.  Once upon
> > a time we had WITH OIDS, maybe that idea could be extended.
> 
> For (a), isn't there already a consensus that all table AMs support at
> least the set of system columns described in 5.5 System Columns [1]
> even if the individual members of that set are no longer the best
> choice at this point?

I don't think there is. I don't think xmin/xmax/cmin/cmax should be
among those. tableoid and ctid are handled by generic code, so I think
they would be among the required columns.

Where do you see that concensus?


> I do agree that we'd need (b) in some form to require AMs to fill
> those columns which it seems is not the case currently.

Hm? What are you referencing here?

Greetings,

Andres Freund



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
Hi,

On Thu, Aug 13, 2020 at 1:27 AM Andres Freund <andres@anarazel.de> wrote:
> On 2020-08-12 14:19:12 +0900, Amit Langote wrote:
> > On Wed, Aug 12, 2020 at 1:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > It's not like we don't have a technology for doing that.  The way this
> > > ideally would work, IMV, is that the parent partitioned table either
> > > has or doesn't have a given system column.  If it does, then every
> > > child must too, just like the way things work for user columns.
> >
> > Ah, I may have misread "insisting that all partitions have a given
> > system column" as doing that on every query, but maybe Andres meant
> > what you are describing here.
>
> I think Tom's formulation makes sense.

Yes, I agree.

> > > This'd require (a) some sort of consensus about which kinds of system
> > > columns can make sense --- as Andres noted, 32-bit xmin might not be
> > > the best choice here --- and (b) some notation for users to declare
> > > which of these columns they want in a partitioned table.  Once upon
> > > a time we had WITH OIDS, maybe that idea could be extended.
> >
> > For (a), isn't there already a consensus that all table AMs support at
> > least the set of system columns described in 5.5 System Columns [1]
> > even if the individual members of that set are no longer the best
> > choice at this point?
>
> I don't think there is. I don't think xmin/xmax/cmin/cmax should be
> among those. tableoid and ctid are handled by generic code, so I think
> they would be among the required columns.
>
> Where do you see that concensus?

Perhaps I was wrong to use the word consensus.  I was trying to say
that table AM extensibility work didn't change the description in 5.5
System Columns, which still says *all* tables, irrespective of their
AM, implicitly have those columns, so I assumed we continue to ask AM
authors to have space for those columns in their tuples.  Maybe, that
list is a legacy of heapam and updating it in am AM-agnostic manner
would require consensus.

> > I do agree that we'd need (b) in some form to require AMs to fill
> > those columns which it seems is not the case currently.
>
> Hm? What are you referencing here?

I meant that WITH <a-system-column> specified on a table presumably
forces an AM to ensure that the column is present in its tuples, like
WITH OIDS specification on a table would force heapam to initialize
the oid system column in all tuples being inserted into that table.
Lack of the same notation for other system columns means that AMs
don't feel forced to ensure those columns are present in their tuples.
Also, having the WITH notation makes it easy to enforce that all
partitions in a given hierarchy have AMs that respect the WITH
specification.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Thu, Aug 13, 2020 at 1:27 AM Andres Freund <andres@anarazel.de> wrote:
>> Hm? What are you referencing here?

> I meant that WITH <a-system-column> specified on a table presumably
> forces an AM to ensure that the column is present in its tuples, like
> WITH OIDS specification on a table would force heapam to initialize
> the oid system column in all tuples being inserted into that table.
> Lack of the same notation for other system columns means that AMs
> don't feel forced to ensure those columns are present in their tuples.

I might be missing some subtlety here, but it seems to me that a
given table AM will probably have a fixed set of system columns
that it provides.  For example heapam would provide exactly the
current set of columns, no matter what.  I'm imagining that
(1) if the parent partitioned table has column X, and the proposed
child-table AM doesn't provide that, then we just refuse to
create/attach the partition.
(2) if a child-table AM provides some system column that the
parent does not, then you can access that column when querying
the child table directly, but not when querying the parent.
This works just like extra child columns in traditional PG
inheritance.

Given these rules, an AM isn't expected to do anything conditional at
runtime: it just provides what it provides.  Instead we have an issue
to solve in or near the TupleTableSlot APIs, namely how to deal with
tuples that don't all have the same system columns.  But we'd have
that problem anyway.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
On Tue, Oct 27, 2020 at 10:55 PM Pavel Biryukov <79166341370@yandex.ru> wrote:
>
> Hi!
>
> What's the state with this issue?

I think that the patch that Andres Freund posted earlier on this
thread [1] would be fine as a workaround at least for stable releases
v12 and v13.  I have attached with this email a rebased version of
that patch, although I also made a few changes.  The idea of the patch
is to allocate and use a partition-specific *non-virtual* slot, one
that is capable of providing system columns when the RETURNING
projection needs them.  Andres' patch would allocate such a slot even
if RETURNING contained no system columns, whereas I changed the slot
creation code stanza to also check that RETURNING indeed contains
system columns.  I've attached 2 patch files: one for HEAD and another
for v12 and v13 branches.

That said, the discussion on what to do going forward to *cleanly*
support accessing system columns through partitioned tables is
pending, but maybe the "workaround" fix will be enough in the meantime
(at least v12 and v13 can only get a workaround fix).

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/20200811180629.zx57llliqcmcgfyr%40alap3.anarazel.de

Attachment

AW: posgres 12 bug (partitioned table)

From
Date:

Hi Community

 

Related to ef core I found another issue. I try to convert a database column from bool to int with ef core migrations.

So if I generate the migration file it will look like as following:  ALTER TABLE "Customer" ALTER COLUMN "State" TYPE INT 

But in Postgres this statement need the “using” extension so it should look like this: ALTER TABLE " Customer" ALTER COLUMN " State " TYPE INT USING " State"::integer;

 

In my opinion ef core should be generate the right statement (with using statements) out of the box, so is there an issue with the NPG Provider?

 

Thank you in advance.

 

Kind regards

Kenan

 

Von: Wilm Hoyer <W.Hoyer@dental-vision.de>
Gesendet: Mittwoch, 12. August 2020 11:35
An: Pavel Biryukov <79166341370@yandex.ru>; Andres Freund <andres@anarazel.de>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Robert Haas <robertmhaas@gmail.com>; PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Betreff: AW: posgres 12 bug (partitioned table)

 

Hi,

while the request for returning xmin of partitioned tables is still valid, i’d like to add some information and a possible workaround.


On 2020-08-11 21:31:52 +0300, Pavel Biryukov wrote:

 Entity Framework is an ORM for .Net (and .Net Core). It has different providers
 for different databases (NpgSql for Postgres). It uses Optimistic concurrency.
 The common use case is to use xmin as "concurrency token".
  
 In code we make "var e = new Entity();", "dbContext.Add(e)" and
 "dbContext.SaveChanges()" (smth like that), and EF Core constructs sql for us,
 classical ORM;
  
 When new row is inserted, EF makes an insert with "RETURNING xmin" to keep it
 as concurrency token for further updates (update is made like "where id = [id]
 AND xmin=[xmin]" to be sure the row hasn't been updated by other clients).

 

Neither the Entity Framework, nor npgsql rely on the column xmin. Both don’t know about this column in their codebase.

In the case oft he EF i’m sure that this holds true for all versions, since it is designed as DBMS independant, and as such will never know anything about a PostgreSQL specific column.

Also you can use any ADO.Net provider to connect to a concrete DBMS – i for example use dotConnect for PostgreSQL because it provided more features and less bugs as Npgsql at the time of  decission.

As for Npgsql i have only checked that the current HEAD has no reference to xmin in its source code.

 

With that in mind, i assume the OP included the column xmin in his Entity Model by himself and set the ConcurrencyMode to fixed for that column.

As xmin is a system column that the EF should never try to update (PostgreSQL will reject this attempt, i think), i’d suggest using a self defined column (row_version for example) and either use triggers on update and insert to increment its value (works even with updates outside of EF) or let the EF do the increment.

 

Regards

Wilm Hoyer.

 

 

 

Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> Forgot to ask in the last reply: for HEAD, should we consider simply
> preventing accessing system columns when querying through a parent
> partitioned table, as was also mentioned upthread?

Indeed, it seems like this thread is putting a fair amount of work
into a goal that we shouldn't even be trying to do.  We gave up the
assumption that a partitioned table's children would have consistent
system columns the moment we decided to allow foreign tables as
partition leaf tables; and it's only going to get worse as alternate
table AMs become more of a thing.  So now I'm thinking the right thing
to be working on is banning access to system columns via partition
parent tables (other than tableoid, which ought to work).

I'm not quite sure whether the right way to do that is just not
make pg_attribute entries for system columns of partitioned tables,
as we do for views; or make them but have a code-driven error if
you try to use one in a query.  The second way is uglier, but
it should allow a more on-point error message.  OTOH, if we start
to have different sets of system columns for different table AMs,
we can't have partitioned tables covering all of those sets.

In the meantime, if the back branches fail with something like
"virtual tuple table slot does not have system attributes" when
trying to do this, that's not great but I'm not sure we should
be putting effort into improving it.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2021-04-21 15:10:09 -0400, Tom Lane wrote:
> Indeed, it seems like this thread is putting a fair amount of work
> into a goal that we shouldn't even be trying to do.  We gave up the
> assumption that a partitioned table's children would have consistent
> system columns the moment we decided to allow foreign tables as
> partition leaf tables; and it's only going to get worse as alternate
> table AMs become more of a thing.

Agreed.


> So now I'm thinking the right thing to be working on is banning access
> to system columns via partition parent tables (other than tableoid,
> which ought to work).

And ctid, I assume? While I have some hope for generalizing the
representation of tids at some point, I don't think it's realistic that
we'd actually get rid of them anytime soon.


> I'm not quite sure whether the right way to do that is just not
> make pg_attribute entries for system columns of partitioned tables,
> as we do for views; or make them but have a code-driven error if
> you try to use one in a query.  The second way is uglier, but
> it should allow a more on-point error message.

I'm leaning towards the approach of not having the pg_attribute entries
- it seems not great to have pg_attribute entries for columns that
actually cannot be queried. I don't think we can expect tooling querying
the catalogs to understand that.


> OTOH, if we start to have different sets of system columns for
> different table AMs, we can't have partitioned tables covering all of
> those sets.

One could even imagine partition root specific system columns...

If we wanted AMs to actually be able to do introduce their own set of
system columns we'd need to change their representation to some
degree.

ISTM that the minimal changes needed would be to reorder sysattr.h to
have TableOidAttributeNumber be -2 (and keep
SelfItemPointerAttributeNumber as -1). And then slowly change all code
to not reference FirstLowInvalidHeapAttributeNumber - which seems like a
*substantial* amount of effort, due to all the shifting of AttrNumber by
FirstLowInvalidHeapAttributeNumber to be able to represent system
columns in bitmaps.

But perhaps that's the wrong direction, and we instead should work
towards *removing* system columns besides tableoid and ctid? At least
the way they work in heapam doesn't really make xmin,xmax,cmin,cmax
particularly useful.  Wild speculation: Perhaps we ought to have some
parse-analysis-time handling for AM specific functions that return
additional information about a row, and that are evaluated directly
above (or even in) tableams?


> In the meantime, if the back branches fail with something like
> "virtual tuple table slot does not have system attributes" when
> trying to do this, that's not great but I'm not sure we should
> be putting effort into improving it.

Seems ok enough.

Greetings,

Andres Freund



Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2021-04-21 15:10:09 -0400, Tom Lane wrote:
>> So now I'm thinking the right thing to be working on is banning access
>> to system columns via partition parent tables (other than tableoid,
>> which ought to work).

> And ctid, I assume? While I have some hope for generalizing the
> representation of tids at some point, I don't think it's realistic that
> we'd actually get rid of them anytime soon.

Hmm, I'd have thought that ctid would be very high on the list of
things we don't want to assume are the same for all AMs.  Admittedly,
refactoring that is going to be a large pain in the rear, but ...

I see that it actually works right now because slot_getsysattr
special-cases both tableoid and ctid, but I have a hard time
believing that that's a long-term answer.

> One could even imagine partition root specific system columns...

Yeah.  As I think about this, I recall a previous discussion where
we suggested that maybe partitioned tables could have a subset
of system columns, whereupon all their children would be required
to support (at least) those system columns.  But that would have
to be user-controllable, so a lot of infrastructure would need to
be built for it.  For the moment I'd be satisfied with a fixed
decision that only tableoid is treated that way.

> ISTM that the minimal changes needed would be to reorder sysattr.h to
> have TableOidAttributeNumber be -2 (and keep
> SelfItemPointerAttributeNumber as -1). And then slowly change all code
> to not reference FirstLowInvalidHeapAttributeNumber - which seems like a
> *substantial* amount of effort, due to all the shifting of AttrNumber by
> FirstLowInvalidHeapAttributeNumber to be able to represent system
> columns in bitmaps.

Getting rid of FirstLowInvalidHeapAttributeNumber seems like nearly
a nonstarter.  I'd be more inclined to reduce it by one or two so
as to leave some daylight for AMs that want system columns different
from the usual set.  I don't feel any urgency about renumbering the
existing columns --- the value of that vs. the risk of breaking
things doesn't seem like a great tradeoff.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2021-04-21 16:51:42 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2021-04-21 15:10:09 -0400, Tom Lane wrote:
> >> So now I'm thinking the right thing to be working on is banning access
> >> to system columns via partition parent tables (other than tableoid,
> >> which ought to work).
> 
> > And ctid, I assume? While I have some hope for generalizing the
> > representation of tids at some point, I don't think it's realistic that
> > we'd actually get rid of them anytime soon.
> 
> Hmm, I'd have thought that ctid would be very high on the list of
> things we don't want to assume are the same for all AMs.  Admittedly,
> refactoring that is going to be a large pain in the rear, but ...

I don't really see us getting rid of something like ctid as a generic
concept across AMs - there's just too many places that need a way to
reference a specific tuple. However, I think we ought to change how much
code outside of AMs know about what tids mean. And, although that's a
significant lift on its own, we ought to make at least the generic
representation variable width.


> I see that it actually works right now because slot_getsysattr
> special-cases both tableoid and ctid, but I have a hard time
> believing that that's a long-term answer.

Not fully responsive to this: I've previously wondered if we ought to
handle tableoid at parse-analysis or expression simplification time
instead of runtime. It's not really something that needs runtime
evaluation. But it's also not clear it's worth changing anything...


> > One could even imagine partition root specific system columns...
> 
> Yeah.  As I think about this, I recall a previous discussion where
> we suggested that maybe partitioned tables could have a subset
> of system columns, whereupon all their children would be required
> to support (at least) those system columns.  But that would have
> to be user-controllable, so a lot of infrastructure would need to
> be built for it.  For the moment I'd be satisfied with a fixed
> decision that only tableoid is treated that way.

I don't really see a convincing usecase to add this kind of complication
with the current set of columns. Outside of tableoid and ctid every use
of system attributes was either wrong, or purely for debugging /
introspection, where restricting it to actual tables doesn't seem like a
problem.

Greetings,

Andres Freund



Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2021-04-21 16:51:42 -0400, Tom Lane wrote:
>> Hmm, I'd have thought that ctid would be very high on the list of
>> things we don't want to assume are the same for all AMs.  Admittedly,
>> refactoring that is going to be a large pain in the rear, but ...

> I don't really see us getting rid of something like ctid as a generic
> concept across AMs - there's just too many places that need a way to
> reference a specific tuple. However, I think we ought to change how much
> code outside of AMs know about what tids mean. And, although that's a
> significant lift on its own, we ought to make at least the generic
> representation variable width.

It seems like it might not be that hard to convert ctid generically
into a uint64, where heaps and heap-related indexes only use 6 bytes
of it.  Variable-width I agree would be a very big complication added
on top, and I'm not quite convinced that we need it.

> Not fully responsive to this: I've previously wondered if we ought to
> handle tableoid at parse-analysis or expression simplification time
> instead of runtime. It's not really something that needs runtime
> evaluation. But it's also not clear it's worth changing anything...

If you do "RETURNING tableoid" in DML on a partitioned table, or on a
traditional-inheritance hierarchy, you get the OID of the particular
partition that was touched (indeed, that's the whole point of the
feature, IIRC).  Don't really see how that could be implemented
earlier than runtime.  Also don't see where the win would be, TBH.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2021-04-21 17:38:53 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I don't really see us getting rid of something like ctid as a generic
> > concept across AMs - there's just too many places that need a way to
> > reference a specific tuple. However, I think we ought to change how much
> > code outside of AMs know about what tids mean. And, although that's a
> > significant lift on its own, we ought to make at least the generic
> > representation variable width.
>
> It seems like it might not be that hard to convert ctid generically
> into a uint64, where heaps and heap-related indexes only use 6 bytes
> of it.

Yep.


> Variable-width I agree would be a very big complication added on top,
> and I'm not quite convinced that we need it.

I can see three (related) major cases where variable width tids would be
quite useful:
1) Creating an index-oriented-table AM would harder/more
   limited with just an 8 byte tid
2) Supporting "indirect" indexes (i.e. indexes pointing to a primary
   key, thereby being much cheaper to maintain when there are updates),
   would require the primary key to map to an 8 byte integer.
3) Global indexes would be a lot easier if we had variable width tids
   (but other ways of addressing the issue are possible).

Greetings,

Andres Freund



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
On Thu, Apr 22, 2021 at 4:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > Forgot to ask in the last reply: for HEAD, should we consider simply
> > preventing accessing system columns when querying through a parent
> > partitioned table, as was also mentioned upthread?
>
> Indeed, it seems like this thread is putting a fair amount of work
> into a goal that we shouldn't even be trying to do.  We gave up the
> assumption that a partitioned table's children would have consistent
> system columns the moment we decided to allow foreign tables as
> partition leaf tables; and it's only going to get worse as alternate
> table AMs become more of a thing.  So now I'm thinking the right thing
> to be working on is banning access to system columns via partition
> parent tables (other than tableoid, which ought to work).

Accessing both tableoid and ctid works currently.  Based on the
discussion, it seems we're not so sure whether that will remain the
case for the 2nd going forward.

> I'm not quite sure whether the right way to do that is just not
> make pg_attribute entries for system columns of partitioned tables,
> as we do for views; or make them but have a code-driven error if
> you try to use one in a query.  The second way is uglier, but
> it should allow a more on-point error message.  OTOH, if we start
> to have different sets of system columns for different table AMs,
> we can't have partitioned tables covering all of those sets.

I tend to agree with Andres that not having any pg_attribute entries
may be better.

> In the meantime, if the back branches fail with something like
> "virtual tuple table slot does not have system attributes" when
> trying to do this, that's not great but I'm not sure we should
> be putting effort into improving it.

Got it.  That sounds like an acceptable compromise.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Thu, Apr 22, 2021 at 4:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In the meantime, if the back branches fail with something like
>> "virtual tuple table slot does not have system attributes" when
>> trying to do this, that's not great but I'm not sure we should
>> be putting effort into improving it.

> Got it.  That sounds like an acceptable compromise.

Hm, we're not done with this.  Whatever you think of the merits of
throwing an implementation-level error, what's actually happening
in v12 and v13 in the wake of a71cfc56b is that an attempt to do
"RETURNING xmin" works in a non-cross-partition UPDATE, but in
a cross-partition UPDATE it dumps core.  What I'm seeing is that
the result of the execute_attr_map_slot() call looks like

(gdb) p *(BufferHeapTupleTableSlot*) scanslot
$3 = {base = {base = {type = T_TupleTableSlot, tts_flags = 8, tts_nvalid = 3, 
      tts_ops = 0xa305c0 <TTSOpsBufferHeapTuple>, 
      tts_tupleDescriptor = 0x7f1c7798f920, tts_values = 0x2077100, 
      tts_isnull = 0x2075fb0, tts_mcxt = 0x2015ea0, tts_tid = {ip_blkid = {
          bi_hi = 0, bi_lo = 0}, ip_posid = 3}, tts_tableOid = 37812}, 
    tuple = 0x0, off = 0, tupdata = {t_len = 0, t_self = {ip_blkid = {
          bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0, 
      t_data = 0x0}}, buffer = 0}

and since base.tuple is NULL, heap_getsysattr dies on either
"Assert(tup)" or a null-pointer dereference.

So

(1) It seems like this is exposing a shortcoming in the
multiple-slot-types logic.  It's not hard to understand why the slot would
look like this after execute_attr_map_slot does ExecStoreVirtualTuple,
but if this is not a legal state for a BufferHeapTuple slot, why didn't
ExecStoreVirtualTuple raise a complaint?

(2) It also seems like we can't use the srcSlot if we want to have
the fail-because-its-a-virtual-tuple behavior.  I experimented with
doing ExecMaterializeSlot on the result of execute_attr_map_slot,
and that stops the crash, but then we're returning garbage values
of xmin etc, which does not seem good.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2021-04-22 14:21:05 -0400, Tom Lane wrote:
> Hm, we're not done with this.  Whatever you think of the merits of
> throwing an implementation-level error, what's actually happening
> in v12 and v13 in the wake of a71cfc56b is that an attempt to do
> "RETURNING xmin" works in a non-cross-partition UPDATE, but in
> a cross-partition UPDATE it dumps core.  What I'm seeing is that
> the result of the execute_attr_map_slot() call looks like
> 
> (gdb) p *(BufferHeapTupleTableSlot*) scanslot
> $3 = {base = {base = {type = T_TupleTableSlot, tts_flags = 8, tts_nvalid = 3, 
>       tts_ops = 0xa305c0 <TTSOpsBufferHeapTuple>, 
>       tts_tupleDescriptor = 0x7f1c7798f920, tts_values = 0x2077100, 
>       tts_isnull = 0x2075fb0, tts_mcxt = 0x2015ea0, tts_tid = {ip_blkid = {
>           bi_hi = 0, bi_lo = 0}, ip_posid = 3}, tts_tableOid = 37812}, 
>     tuple = 0x0, off = 0, tupdata = {t_len = 0, t_self = {ip_blkid = {
>           bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0, 
>       t_data = 0x0}}, buffer = 0}
> 
> and since base.tuple is NULL, heap_getsysattr dies on either
> "Assert(tup)" or a null-pointer dereference.
> 
> So
> 
> (1) It seems like this is exposing a shortcoming in the
> multiple-slot-types logic.  It's not hard to understand why the slot would
> look like this after execute_attr_map_slot does ExecStoreVirtualTuple,
> but if this is not a legal state for a BufferHeapTuple slot, why didn't
> ExecStoreVirtualTuple raise a complaint?

I think it's too useful to support ExecStoreVirtualTuple() in a
heap[buffer] slot to disallow that. Seems like we should make
tts_buffer_heap_getsysattr() error out if there's no tuple?


> (2) It also seems like we can't use the srcSlot if we want to have
> the fail-because-its-a-virtual-tuple behavior.  I experimented with
> doing ExecMaterializeSlot on the result of execute_attr_map_slot,
> and that stops the crash, but then we're returning garbage values
> of xmin etc, which does not seem good.

Garbage values as in 0's, or random data? Seems like it should be the
former?

Greetings,

Andres Freund



Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2021-04-22 14:21:05 -0400, Tom Lane wrote:
>> (1) It seems like this is exposing a shortcoming in the
>> multiple-slot-types logic.  It's not hard to understand why the slot would
>> look like this after execute_attr_map_slot does ExecStoreVirtualTuple,
>> but if this is not a legal state for a BufferHeapTuple slot, why didn't
>> ExecStoreVirtualTuple raise a complaint?

> I think it's too useful to support ExecStoreVirtualTuple() in a
> heap[buffer] slot to disallow that. Seems like we should make
> tts_buffer_heap_getsysattr() error out if there's no tuple?

OK, I could work with that.  Shall we spell the error message the
same as if it really were a virtual slot, or does it need to be
different to avoid confusion?

>> (2) It also seems like we can't use the srcSlot if we want to have
>> the fail-because-its-a-virtual-tuple behavior.  I experimented with
>> doing ExecMaterializeSlot on the result of execute_attr_map_slot,
>> and that stops the crash, but then we're returning garbage values
>> of xmin etc, which does not seem good.

> Garbage values as in 0's, or random data? Seems like it should be the
> former?

I was seeing something like xmin = 128.  I think this might be from
our filling in the header as though for a composite Datum.  In any
case, I think we need to either deliver the correct answer or throw
an error; silently returning zeroes wouldn't be good.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2021-04-22 14:37:26 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2021-04-22 14:21:05 -0400, Tom Lane wrote:
> >> (1) It seems like this is exposing a shortcoming in the
> >> multiple-slot-types logic.  It's not hard to understand why the slot would
> >> look like this after execute_attr_map_slot does ExecStoreVirtualTuple,
> >> but if this is not a legal state for a BufferHeapTuple slot, why didn't
> >> ExecStoreVirtualTuple raise a complaint?
> 
> > I think it's too useful to support ExecStoreVirtualTuple() in a
> > heap[buffer] slot to disallow that. Seems like we should make
> > tts_buffer_heap_getsysattr() error out if there's no tuple?
> 
> OK, I could work with that.  Shall we spell the error message the
> same as if it really were a virtual slot, or does it need to be
> different to avoid confusion?

Hm. Seems like it'd be better to have a distinct error message? Feels
like it'll often indicate separate issues whether a non-materialized
heap slot or a virtual slot has its system columns accessed.


> >> (2) It also seems like we can't use the srcSlot if we want to have
> >> the fail-because-its-a-virtual-tuple behavior.  I experimented with
> >> doing ExecMaterializeSlot on the result of execute_attr_map_slot,
> >> and that stops the crash, but then we're returning garbage values
> >> of xmin etc, which does not seem good.
> 
> > Garbage values as in 0's, or random data? Seems like it should be the
> > former?
> 
> I was seeing something like xmin = 128.  I think this might be from
> our filling in the header as though for a composite Datum.

I was wondering if we're not initializing memory somewhere were we
should...


> In any case, I think we need to either deliver the correct answer or
> throw an error; silently returning zeroes wouldn't be good.

IIRC there's some historical precedent in returning 0s in some
cases. But I don't think we should do that if we can avoid it.

Not entirely clear to me how we'd track whether we have valid system
column data or not once materialized - which I think is why we
historically had the cases where we returned bogus values.

Greetings,

Andres Freund



Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2021-04-22 14:37:26 -0400, Tom Lane wrote:
>> OK, I could work with that.  Shall we spell the error message the
>> same as if it really were a virtual slot, or does it need to be
>> different to avoid confusion?

> Hm. Seems like it'd be better to have a distinct error message? Feels
> like it'll often indicate separate issues whether a non-materialized
> heap slot or a virtual slot has its system columns accessed.

After thinking about it for a bit, I'm inclined to promote this to
a user-facing error, and have all the slot types report

    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("cannot retrieve a system column in this context")));

which is at least somewhat intelligible to end users.  A developer
trying to figure out why it happened would resort to \errverbose or
more likely gdb in any case, so the lack of specificity doesn't
seem like a problem.

> Not entirely clear to me how we'd track whether we have valid system
> column data or not once materialized - which I think is why we
> historically had the cases where we returned bogus values.

Right, but with this fix we won't need to materialize.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
Andres Freund
Date:
Hi,

On 2021-04-22 15:09:59 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2021-04-22 14:37:26 -0400, Tom Lane wrote:
> >> OK, I could work with that.  Shall we spell the error message the
> >> same as if it really were a virtual slot, or does it need to be
> >> different to avoid confusion?
> 
> > Hm. Seems like it'd be better to have a distinct error message? Feels
> > like it'll often indicate separate issues whether a non-materialized
> > heap slot or a virtual slot has its system columns accessed.
> 
> After thinking about it for a bit, I'm inclined to promote this to
> a user-facing error, and have all the slot types report
> 
>     ereport(ERROR,
>             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>              errmsg("cannot retrieve a system column in this context")));

WFM.


> which is at least somewhat intelligible to end users.  A developer
> trying to figure out why it happened would resort to \errverbose or
> more likely gdb in any case, so the lack of specificity doesn't
> seem like a problem.

I'm wondering if it'd be a good idea to add TupleTableSlotOps.name,
which we then could include in error messages without needing to end up
with per-slot-type code... But that's probably a separate project from
adding the error above.


> > Not entirely clear to me how we'd track whether we have valid system
> > column data or not once materialized - which I think is why we
> > historically had the cases where we returned bogus values.
> 
> Right, but with this fix we won't need to materialize.

In this path - but it does seem mildly bothersome that we might do the
wrong thing in other paths. If we at least returned something halfway
sensible (e.g. NULL) instead of 128... I guess we could track whether
the tuple originated externally, or whether it's from a materialized
virtual tuple...

Greetings,

Andres Freund



Re: posgres 12 bug (partitioned table)

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2021-04-22 15:09:59 -0400, Tom Lane wrote:
>> After thinking about it for a bit, I'm inclined to promote this to
>> a user-facing error, and have all the slot types report
>>     ereport(ERROR,
>>             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>>              errmsg("cannot retrieve a system column in this context")));

> WFM.

OK, I'll go make that happen.

            regards, tom lane



Re: posgres 12 bug (partitioned table)

From
Amit Langote
Date:
On Fri, Apr 23, 2021 at 3:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Thu, Apr 22, 2021 at 4:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> In the meantime, if the back branches fail with something like
> >> "virtual tuple table slot does not have system attributes" when
> >> trying to do this, that's not great but I'm not sure we should
> >> be putting effort into improving it.
>
> > Got it.  That sounds like an acceptable compromise.
>
> Hm, we're not done with this.  Whatever you think of the merits of
> throwing an implementation-level error, what's actually happening
> in v12 and v13 in the wake of a71cfc56b is that an attempt to do
> "RETURNING xmin" works in a non-cross-partition UPDATE, but in
> a cross-partition UPDATE it dumps core.

Thanks for fixing this.

-- 
Amit Langote
EDB: http://www.enterprisedb.com