Re: Improving connection scalability: GetSnapshotData() - Mailing list pgsql-hackers

From Ian Barwick
Subject Re: Improving connection scalability: GetSnapshotData()
Date
Msg-id 61291ffe-d611-f889-68b5-c298da9fb18f@2ndquadrant.com
Whole thread Raw
In response to Re: Improving connection scalability: GetSnapshotData()  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Improving connection scalability: GetSnapshotData()  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 2020/09/03 17:18, Michael Paquier wrote:
> On Sun, Aug 16, 2020 at 02:26:57PM -0700, Andres Freund wrote:
>> So we get some builfarm results while thinking about this.
> 
> Andres, there is an entry in the CF for this thread:
> https://commitfest.postgresql.org/29/2500/
> 
> A lot of work has been committed with 623a9ba, 73487a6, 5788e25, etc.

I haven't seen it mentioned here, so apologies if I've overlooked
something, but as of 623a9ba queries on standbys seem somewhat
broken.

Specifically, I maintain some code which does something like this:

- connects to a standby
- checks a particular row does not exist on the standby
- connects to the primary
- writes a row in the primary
- polls the standby (using the same connection as above)
   to verify the row arrives on the standby

As of recent HEAD it never sees the row arrive on the standby, even
though it is verifiably there.

I've traced this back to 623a9ba, which relies on "xactCompletionCount"
being incremented to determine whether the snapshot can be reused,
but that never happens on a standby, meaning this test in
GetSnapshotDataReuse():

     if (curXactCompletionCount != snapshot->snapXactCompletionCount)
         return false;

will never return false, and the snapshot's xmin/xmax never get advanced.
Which means the session on the standby is not able to see rows on the
standby added after the session was started.

It's simple enough to prevent that being an issue by just never calling
GetSnapshotDataReuse() if the snapshot was taken during recovery
(though obviously that means any performance benefits won't be available
on standbys).

I wonder if it's possible to increment "xactCompletionCount"
during replay along these lines:

     *** a/src/backend/access/transam/xact.c
     --- b/src/backend/access/transam/xact.c
     *************** xact_redo_commit(xl_xact_parsed_commit *
     *** 5915,5920 ****
     --- 5915,5924 ----
              */
             if (XactCompletionApplyFeedback(parsed->xinfo))
                     XLogRequestWalReceiverReply();
     +
     +       LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
     +       ShmemVariableCache->xactCompletionCount++;
     +       LWLockRelease(ProcArrayLock);
       }

which seems to work (though quite possibly I've overlooked something I don't
know that I don't know about and it will all break horribly somewhere,
etc. etc.).


Regards

Ian Barwick


-- 
Ian Barwick                   https://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Remove page-read callback from XLogReaderState.