On 29/10/2025 14:30, Joel Jacobson wrote:
> On Wed, Oct 29, 2025, at 12:40, Heikki Linnakangas wrote:
>> On 25/10/2025 21:01, Joel Jacobson wrote:
>>> I also want to share a new idea that Heikki Linnakangas came up with
>>> during PgConf25 in Riga.
>>>
>>> The idea is basically to scan the notification queue from tail to head,
>>> and update xids that precedes newFrozenXid, to either
>>> FrozenTransactionId if committed, or InvalidTransactionId if not.
>>>
>>> I've implemented this by adding a new extern function
>>> AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
>>>
>>> This way, we don't need to hold back the advancement of newFrozenXid in
>>> vac_update_datfrozenxid.
>>>
>>> Attempt of implementing Heikki's idea:
>>> - async_notify_freeze_xids.txt
>>
>> Thanks!
> 
> Thanks for the idea!
For the record, Yura suggested the same approach earlier in this thread 
[1]. That line of discussion led to more complicated patches but I think 
the complications came from trying to set the flag as part of 
commit/abort. The changes are more straightforward and backpatchable if 
we only do it at vacuum.
>>> This idea has the benefit of never holding back newFrozenXid,
>>> however, I see some problems with it:
>>>
>>> - If there is a bug in the code, we could accidentally overwrite
>>>     xid values we didn't intend to overwrite, and maybe we would never
>>>     find out that we did.
>>>
>>> - We wouldn't know for sure that we've understood the cause of
>>>     this bug.
>>>
>>> With v12-vacuum_notify_queue_cleanup, we instead have the downside of
>>> possibly holding back newFrozenXid, but with the advantage of giving us
>>> higher confidence in that we've correctly identified the cause of the
>>> bug.
>>
>> Meh. Robustness is good and all, and in heap tuples we don't overwrite
>> the xmin/xmax but just set a FROZEN flag, precisely so that we preserve
>> evidence if something goes wrong. But I can't get too worked up about
>> that for the async notification queue.
> 
> Having slept on this for some days, I'm less worried about this
> approach. I like the simplicity of it, and that we don't bolt on complexity
> to another subsystem, just for the sake of improved debugging
> capabilities of a different subsystem.
> 
> I think a different way of looking at this, is that we wouldn't conceal
> a bug in async, but rather we would let vacuum do part of its job, of
> checking the commit status of the xids, when needed, and be okay with
> that split responsibility.
Ok, at quick glance I think this patch is in pretty good shape. I'll 
review it more thoroughly, and see if I can incorporate the test from 
Matheus Alcantara's or Arseniy Mukhin's latest patches, and commit and 
backpatch this.
- Heikki