Re: [PATCH] Write Notifications Through WAL - Mailing list pgsql-hackers

From Rishu Bagga
Subject Re: [PATCH] Write Notifications Through WAL
Date
Msg-id CAK80=jiLjJ1CDQBieG+wjwZMd7uG_T=pPhsLiW5jzA9KmQJQGQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Write Notifications Through WAL  (Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>)
List pgsql-hackers
Thank you Arseniy for the thoughtful and detailed feedback, I have
addressed the concerns in the new patch.

On Wed, Oct 8, 2025 at 3:07 AM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

> Yes, we hold the lock while we are writing the commit record to wal,
> but other transactions don't see our transaction as committed and
> can't see changes that our transaction done besides notifications
> (inserts/updates/deletes) right after we wrote the commit record.
> There are a few more things that need to be done before it. IIUC the
> moment when a fresh snapshot will show our transaction as 'not in
> progress' is right after updating procarray (ProcArrayEndTransaction()
> call in CommitTransaction()). So I think we still need
> XidInMVCCSnapshot() check in reading path. Without it we can end up in
> a situation where the listener got the notification but still can't
> see changes notify transaction done because notification was processed
> too early.

Good catch, I've fixed this in the new patch.

> I think reserving slots is a great idea. But I have a question about
> checking if the queue is full. Why does PreCommit_Notify need three
> checks? Isn't it enough to simply check that we don't exceed
> max_total_entries?

After taking a closer look, you are right in that we don't need all three
checks; however, it isn't enough to check that we don't exceed
max_total_entries.
The reason for this is that we can exceed max_total_pages without necessarily
exceeding max_total_entries, in the case that the oldest active entry is not the
first entry on the tail page. So actually the only test we need is to
confirm that
when we zero a new page, that the total number of pages is still <=
max_total_pages.
I have updated this in the new patch.


> xl_notify.notify_lsn part looks good to me, but, IMHO, we can move
> locking and adding to the queue logic one level up into
> RecordTransactionCommit() and write it around the
> XactLogCommitRecord() call.

Fixed in the new patch.

> >

> I don't have enough knowledge to say if it's the right approach to
> prevent wal truncating with unread notifications. But there are two
> things about current implementation I noticed. Not sure if it's worth
> doing something with it before you know it's the right approach in
> general. But anyway, there are two points about it:
>
> We hold NotifyQueueLock while adding the notifications data record to
> wal. IIUC one of the points of using wal here was that it lets us to
> write notification data in parallel as wal is good for parallel
> writes. But with lock it seems like we don't take advantage of it.
> IIUC we hold the lock here to prevent truncation of the new record
> after we wrote it and before we save its lsn. Is this correct? Maybe
> we can solve it somehow differently? For example before writing the
> notification data record we can save the current lsn (something like
> XactLastRecEnd) in UncommittedNotifies so we can be sure anything
> after it will not be truncated, including our record that we are gonna
> write. This way we don't need to hold the lock.
>
> We iterate over all entries in UncommittedNotifies every time we want
> to add/remove data. What do you think about using MyProcNumber as an
> array index instead of iteration as we do with listener positions for
> instance? Sounds possible as every backend can't have more than one
> uncommitted lsn. Also maybe backends can remove entries from
> UncommittedNotifies in AtAbort_Notify() the same way committed
> transactions do it in asyncQueueAddCompactEntry()? This way
> AsyncNotifyOldestRequiredLSN() will not need to be bothered with
> cleaning UncommittedNotifies and checking transaction statuses.

After taking another look, I realized there is a cleaner approach that
does not need the use of UncommittedNotifiesArray in the first place,
and also allows us to keep the parallel WAL writes for notifications.
We can simply have the recycler iterate through the procarray instead, and find
the lowest value of a proc's notifyDataLsn. (renamed from
notifyCommitLsn in prev patch)
There is a small window where the notify data WAL record has been written,
and its lsn hasn't been assigned to MyProc->notifyDataLsn, so to make
sure we don't
lose any records, before we write the data record, we set
MyProc->notifyDataLsn to
the current lsn pointer using GetXLogInsertRecPtr(). If a Proc has a
non-invalid value for
notifyDataLsn, then we know that it has uncommitted notification data
written to the WAL,
at or above that value.

The performance remains similar with these changes. I have also
rebased the patch on the latest HEAD.


Thanks,

Rishu

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Next
From: Tom Lane
Date:
Subject: Re: Clarification on Role Access Rights to Table Indexes