Hello Tom,
12.01.2024 00:43, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jan 10, 2024 at 10:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>>> I was discouraged by that vast distance and implicit buffer usage too, but
>>> I found no other feasible way to fix it. On the other hand, 75e03eabe and
>>> Andres's words upthread made me believe that it's an acceptable solution.
>> I agree that it's potentially acceptable. I just wonder if Tom or
>> someone else is going to want to propose a bigger change to avoid some
>> of this messiness. I don't know what that would look like, though.
> I'm still feeling itchy about it. Maybe the problem could be fixed
> better if we didn't re-use slots with such abandon in this code,
> but I don't have a concrete proposal. Also, that'd likely be a
> nontrivial rewrite bringing its own possibilities of adding bugs.
I'm not too excited by this approach either, but maybe it's the only
solution which can be implemented for now.
I have rechecked other ExecGetUpdateNewTuple() calls and confirmed that all
of these materialize newslot returned.
Namely,
1) ExecCrossPartitionUpdate() called by ExecUpdateAct(), where in case of a
retry we have:
/* ensure slot is independent, consider e.g. EPQ */
ExecMaterializeSlot(slot);
2) ExecModifyTable() passes the slot returned to ExecUpdate(), which begins
with a call to ExecUpdatePrologue(), which materializes the slot.
3) ExecUpdate() itself calls ExecGetUpdateNewTuple() in a loop (redo_act),
where ExecUpdateAct() called, which also materializes the slot.
So this place in ExecBRUpdateTriggers() is the only one where we have no
materialization after ExecGetUpdateNewTuple().
Maybe another option is to always materialize the slot inside
ExecGetUpdateNewTuple(), but I'm afraid such a change is not suitable for
back-patching...
> Some concrete thoughts:
>
> * Maybe better to Assert(newslot == epqslot_clean) instead of
> what you did here? Not sure.
Me too. Other ExecGetUpdateNewTuple() calls don't have such Asserts nearby,
but I haven't studied yet whether they could have ones...
> * Why is the ExecMaterializeSlot step needed if we don't have
> an epqslot_candidate?
I had decided to move that step out of "if (epqslot_candidate != NULL)"
in v3 to play safe when removing ExecMaterializeSlot(newslot) brought by
75e03eabe (more details upthread [1]). I thought that if that commit stated
that the slot needs materialization (it was time before 86dc90056, which
raised the current issue), then the materialization must be preserved.
(The same commit can be found in zheap repository [2], but I found no other
explanation of it's purpose...)
Though looking at the current core code, I'm inclined to perform
ExecMaterializeSlot() only when we have epqslot_candidate, if we can afford
not to bother about that zheap-specific or similar scenario...
[1] https://www.postgresql.org/message-id/e989408c-1838-61cd-c968-1fcf47c6fbba%40gmail.com
[2] https://github.com/EnterpriseDB/zheap/commit/75e03eabe
Best regards,
Alexander