Hey Amit,
On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:
> By the way, what happens today if you do INSERT INTO a_zedstore_table
> ... RETURNING xmin? Do you get an error "xmin is unrecognized" or
> some such in slot_getsysattr() when trying to project the RETURNING
> list?
>
We get garbage values for xmin and cmin. If we request cmax/xmax, we get
an ERROR from slot_getsysattr()->tts_zedstore_getsysattr():
"zedstore tuple table slot does not have system attributes (except xmin
and cmin)"
A ZedstoreTupleTableSlot only stores xmin and cmin. Also,
zedstoream_insert(), which is the tuple_insert() implementation, does
not supply xmin/cmin, so those values end up as garbage.
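To make the two failure modes concrete, here is a toy C model. It is
not Zedstore or PostgreSQL code, and every type and function name in it
is hypothetical. The slot has room for xmin and cmin only, and the
insert path never writes them, so a reused slot hands back whatever it
held before, while asking for xmax or cmax fails outright.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical system-attribute ids (PostgreSQL's real attnums differ). */
typedef enum { TOY_XMIN, TOY_CMIN, TOY_XMAX, TOY_CMAX } ToySysAttr;

/* Hypothetical stand-in for a ZedstoreTupleTableSlot:
   room for xmin and cmin only, no xmax/cmax. */
typedef struct {
    int32_t  values[2];   /* user columns, fixed width for simplicity */
    uint32_t xmin;
    uint32_t cmin;
} ToySlot;

/* Models an insert path that stores user data but never sets xmin/cmin,
   so whatever the slot held before is left in place ("garbage"). */
static void toy_insert(ToySlot *slot, int32_t a, int32_t b)
{
    assert(slot != NULL);
    slot->values[0] = a;
    slot->values[1] = b;
    /* xmin/cmin intentionally not written */
}

/* Returns 0 and stores the value in *out, or -1 if the slot cannot
   supply the requested system attribute. */
static int toy_getsysattr(const ToySlot *slot, ToySysAttr attr, uint32_t *out)
{
    assert(slot != NULL && out != NULL);
    switch (attr) {
    case TOY_XMIN: *out = slot->xmin; return 0;
    case TOY_CMIN: *out = slot->cmin; return 0;
    default:
        fprintf(stderr, "toy slot does not have system attributes "
                        "(except xmin and cmin)\n");
        return -1;
    }
}
```

In the model, the "garbage" is simply stale state left over from an
earlier tuple; in a real slot it could equally be uninitialized memory.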
For context, Zedstore has its own UNDO log implementation to act as
storage for transaction information, which is intended to be replaced
with the upstream UNDO log in the future.
Right now, this behavior is not restricted to INSERT..RETURNING. If we
do a select <tx_column> from foo in Zedstore, the behavior is the same.
Zedstore never returns transaction information from tableam calls that
don't require it to be used or returned. If you ask it to do a
tuple_satisfies_snapshot(), OTOH, it will use the transactional
information correctly. It will also populate TM_FailureData, which
contains xmax and cmax, in the APIs where that is required.
I really wonder what other AMs are doing about these issues.
I think we should either:
1. Demand transactional information from AMs for all APIs that involve
a projection of transactional information.
2. Have some other component of Postgres supply the transactional
information. This is what I think the upstream UNDO log can probably
provide.
3. (Least elegant) Transform tuple table slots into heap tuple table
slots (since it is the only kind of tuple storage that can supply
transactional info) and explicitly fill in the transactional values
depending on the context, whenever transactional information is
projected.
For this bug report, I am not sure what is right. Perhaps, to stop the
bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
that the AM needs to provide the transactional info in the respective
insert AM API calls, as well as demand a heap slot for partition roots
and interior nodes. Later on, we would need a larger effort to make all
of these APIs stop depending on transactional information. Perhaps the
UNDO framework will come to the rescue.
Regards,
Soumyadeep (VMware)