The cfbot complained that the patch series no longer applies, so I've rebased
it and also tried to make sure that its other checks turn green.
One particular problem was that pg_upgrade complained that "live undo data"
remains in the old cluster. I found out that the temporary undo log causes the
problem, so I've adjusted the query in check_for_undo_data() accordingly until
the problem gets fixed properly.
The problem with the temporary undo log is that it's loaded into local
buffers, and a backend can exit without flushing its local buffers to disk.
Thus we are not guaranteed to find enough information when trying to discard
the undo log the backend wrote. I'm thinking about the following solutions:
1. Let the backend manage its temporary undo log on its own (even the slot
metadata would stay outside the shared memory, and in particular the
insertion pointer could start from 1 for each session) and remove the
segment files at the same moment the temporary relations are removed.
However, by moving the temporary undo slots away from the shared memory,
computation of oldestFullXidHavingUndo (see the PROC_HDR structure) would
be affected. It might seem that a transaction which only writes undo log
for temporary relations does not need to affect oldestFullXidHavingUndo,
but it needs to be analyzed thoroughly. Since oldestFullXidHavingUndo
prevents transactions from being truncated from the CLOG too early, I wonder
if the following is possible (this scenario is only applicable to the zheap
storage engine [1], which is not included in this patch, but it should
already be considered):
A transaction creates a temporary table, makes some (many) changes and then
gets rolled back. The undo records are being applied and that takes some
time. Since the XID of the transaction did not affect
oldestFullXidHavingUndo, the XID can disappear from the CLOG due to
truncation. However, zundo.c in [1] indicates that the transaction status
*is* checked during undo execution, so we might have a problem.
Or am I missing something? UndoDiscard() in zheap seems to ignore temporary
undo:
    /* We can't process temporary undo logs. */
    if (log->meta.persistence == UNDO_TEMP)
        continue;
2. Do not load the temporary undo into local buffers. If it's always in the
shared buffers, we should never see incomplete data when trying to discard
undo. In this case, persistence levels UNDOPERSISTENCE_UNLOGGED and
UNDOPERSISTENCE_TEMP could be merged into a single level.
3. Implement the discarding in another way, but I don't have a new idea right
now.
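To make the concern in item 1 more concrete, here is a toy model in C. It is
not PostgreSQL or zheap code: every type and function name below (ToyUndoSlot,
toy_oldest_xid_having_undo, toy_clog_status_available) is hypothetical, and it
assumes for simplicity that the undo horizon is the only thing limiting CLOG
truncation, which is not true in the real system. It only shows how ignoring
temporary undo slots moves the horizon past an XID whose undo is still being
applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in PostgreSQL or zheap. */
typedef uint64_t ToyFullXid;
#define TOY_INVALID_XID ((ToyFullXid) 0)

typedef struct ToyUndoSlot
{
    ToyFullXid  xid;        /* transaction that still has undo in this slot */
    bool        in_use;     /* slot currently holds undo */
    bool        is_temp;    /* undo was written for temporary relations */
} ToyUndoSlot;

/*
 * Return the oldest XID that still has undo. If count_temp is false,
 * temporary slots are skipped, as if they lived outside shared memory.
 * Returns TOY_INVALID_XID if no slot qualifies.
 */
static ToyFullXid
toy_oldest_xid_having_undo(const ToyUndoSlot *slots, size_t nslots,
                           bool count_temp)
{
    ToyFullXid  oldest = TOY_INVALID_XID;

    for (size_t i = 0; i < nslots; i++)
    {
        if (!slots[i].in_use)
            continue;
        if (slots[i].is_temp && !count_temp)
            continue;
        if (oldest == TOY_INVALID_XID || slots[i].xid < oldest)
            oldest = slots[i].xid;
    }
    return oldest;
}

/*
 * Assumption of this model: CLOG entries older than the undo horizon may be
 * truncated; with no undo at all, the horizon is next_xid. Returns true if
 * the status of xid is still guaranteed to be available.
 */
static bool
toy_clog_status_available(ToyFullXid xid, ToyFullXid oldest_having_undo,
                          ToyFullXid next_xid)
{
    ToyFullXid  horizon = (oldest_having_undo != TOY_INVALID_XID)
        ? oldest_having_undo : next_xid;

    return xid >= horizon;
}
```

With one temporary slot owned by XID 100 and one permanent slot owned by XID
150, counting the temporary slot keeps the horizon at 100. Skipping it moves
the horizon to 150, so in this model the status of XID 100 is no longer
guaranteed to be in the CLOG while its undo is still being applied.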
Suggestions are welcome.
[1] https://github.com/EnterpriseDB/zheap/tree/master
--
Antonin Houska
Web: https://www.cybertec-postgresql.com