Hi hackers,
I am proposing to add the reason for the oldest xmin to VACUUM logs.
This feature would be useful for identifying why dead tuples cannot be
removed, thereby helping to diagnose and prevent table bloat.
The current logs only indicate that dead tuples could not be reclaimed
due to the oldest xmin, but they do not reveal the underlying reason.
To identify the cause, it is necessary to query multiple views:
pg_stat_activity (for active transactions), pg_prepared_xacts (for
prepared statements), pg_replication_slots (for replication slots),
and pg_stat_replication (for hot standby feedback). However, because
the data in these views is volatile, it is difficult to retroactively
determine what was holding the oldest xmin at the specific time the
log message was generated.
This PoC patch addresses this problem. The implementation now outputs
the reason for the oldest xmin and, where applicable, the backend PID.
This information was originally discarded when calculating the oldest
xmin horizon, and the computation required to retrieve these reasons
is considered reasonable.
The patch is attached. What do you think?
--
Best regards,
Shinya Kato
NTT OSS Center