Re: posgres 12 bug (partitioned table) - Mailing list pgsql-bugs

From Amit Langote
Subject Re: posgres 12 bug (partitioned table)
Msg-id CA+HiwqHU3rRrqEA_+yCzpqb_+ssFj7=df7DthpcmoUmjwDXLWg@mail.gmail.com
In response to Re: posgres 12 bug (partitioned table)  (Soumyadeep Chakraborty <soumyadeep2007@gmail.com>)
List pgsql-bugs
Hi Soumyadeep,

On Fri, Jul 10, 2020 at 2:56 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
>
> Hey Amit,
>
> On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > By the way, what happens today if you do INSERT INTO a_zedstore_table
> > ... RETURNING xmin?  Do you get an error "xmin is unrecognized" or
> > some such in slot_getsysattr() when trying to project the RETURNING
> > list?
> >
> We get garbage values for xmin and cmin. If we request cmax/xmax, we get
> an ERROR from slot_getsysattr()->tts_zedstore_getsysattr():
> "zedstore tuple table slot does not have system attributes (except xmin
> and cmin)"
>
> A ZedstoreTupleTableSlot only stores xmin and cmin. Also,
> zedstoream_insert(), which is the tuple_insert() implementation, does
> not supply the xmin/cmin, thus making those values garbage.
>
> For context, Zedstore has its own UNDO log implementation to act as
> storage for transaction information (which is intended to be replaced
> with the upstream UNDO log in the future).
>
> The above behavior is not restricted to INSERT ... RETURNING right
> now. If we do a SELECT <tx_column> FROM foo in Zedstore, the behavior is
> the same.  The transaction information is never returned from Zedstore
> in tableam calls that don't demand transactional information be
> used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
> it will use the transactional information correctly. It will also
> populate TM_FailureData, which contains xmax and cmax, in the APIs where
> it is demanded.
>
> I really wonder what other AMs are doing about these issues.
>
> I think we should either:
>
> 1. Demand transactional information off of AMs for all APIs that involve
> a projection of transactional information.
>
> 2. Have some other component of Postgres supply the transactional
> information. This is what I think the upstream UNDO log can probably
> provide.

So even if an AM's table_tuple_insert() itself doesn't populate the
transaction info into the slot handed to it, maybe as an optimization,
it does not sound entirely unreasonable to expect that the AM's
slot_getsysattr() callback returns it correctly when projecting a
target list containing system columns.  We shouldn't really need any
new core code to get the transaction-related system columns while
there exists a perfectly reasonable channel for it to arrive through
-- TupleTableSlots.  I suppose there's a reason why we allow AMs to
provide their own slot callbacks.

Whether an AM uses UNDO log or something else to manage the
transaction info is up to the AM, so I don't see why the AMs
themselves shouldn't be in charge of returning that info, because only
they know where it is.
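
To make that division of labor concrete, here is a minimal sketch in plain C.
It is a toy model, not PostgreSQL code: every name in it (ToySlot, ToySlotOps,
toy_slot_getsysattr, the "undo record" struct, and so on) is hypothetical and
only loosely mirrors the idea of per-AM slot callbacks. The point it shows is
that the core-side function only dispatches through the slot's callback, and
the AM decides where the transaction info actually comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins only; these are NOT the PostgreSQL definitions. */
enum ToySysAttr { TOY_ATTR_XMIN, TOY_ATTR_CMIN };

struct ToySlot;

/* Per-AM callback table (hypothetical, modelled on the idea of slot callbacks). */
typedef struct ToySlotOps {
    uint32_t (*getsysattr)(struct ToySlot *slot, enum ToySysAttr attr);
} ToySlotOps;

typedef struct ToySlot {
    const ToySlotOps *ops;  /* supplied by the AM that owns the slot */
    void *am_private;       /* AM-specific state, opaque to the core */
} ToySlot;

/*
 * A hypothetical AM that keeps transaction info out of line, for example
 * in its own undo record, instead of inside the tuple.
 */
typedef struct ToyUndoRecord {
    uint32_t xmin;
    uint32_t cmin;
} ToyUndoRecord;

typedef struct ToyUndoSlotState {
    const ToyUndoRecord *undo;  /* where this AM keeps the info */
    int lookups;                /* how many times the info was fetched */
} ToyUndoSlotState;

/* The AM's callback: only the AM knows that the values live in 'undo'. */
static uint32_t toy_undo_getsysattr(ToySlot *slot, enum ToySysAttr attr)
{
    ToyUndoSlotState *state = slot->am_private;

    state->lookups++;
    return attr == TOY_ATTR_XMIN ? state->undo->xmin : state->undo->cmin;
}

static const ToySlotOps toy_undo_ops = { toy_undo_getsysattr };

/*
 * Core-side entry point: it has no idea where the values are stored and
 * simply defers to whatever callback the slot carries.
 */
uint32_t toy_slot_getsysattr(ToySlot *slot, enum ToySysAttr attr)
{
    assert(slot != NULL && slot->ops != NULL);
    return slot->ops->getsysattr(slot, attr);
}
```

In this model the insert path never has to copy xmin/cmin into the slot; the
values are fetched only when something such as a RETURNING projection asks for
them, which is the kind of optimization mentioned above.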

> 3. (Least elegant) Transform tuple table slots into heap tuple table
> slots (since it is the only kind of tuple storage that can supply
> transactional info) and explicitly fill in the transactional values
> depending on the context, whenever transactional information is
> projected.
>
> For this bug report, I am not sure what is right. Perhaps, to stop the
> bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
> that the AM needs to provide the transactional info in the respective
> insert AM API calls,

As long as the AM's slot_getsysattr() callback returns the correct
value, this works.

> as well as demand a heap slot for partition roots
> and interior nodes.

It would be a compromise on the core's part to use "heap" slots for
partitioned tables, because they don't have a valid table AM.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com


