Re: [HACKERS] make async slave to wait for lsn to be replayed - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [HACKERS] make async slave to wait for lsn to be replayed
Date
Msg-id b155606b-e744-4218-bda5-29379779da1a@iki.fi
Whole thread Raw
In response to Re: [HACKERS] make async slave to wait for lsn to be replayed  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: [HACKERS] make async slave to wait for lsn to be replayed
List pgsql-hackers
On 07/04/2024 00:52, Alexander Korotkov wrote:
> On Fri, Apr 5, 2024 at 9:15 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> I'm still concerned that WaitLSNCleanup is only called in ProcKill.
>> Does this mean that if a process throws an error while waiting, it'll
>> not get cleaned up until it exits?  Maybe this is not a big deal, but it
>> seems odd.
> 
> I've added WaitLSNCleanup() to the AbortTransaction().  Just pushed
> that together with the improvements upthread.

Race condition:

1. backend: pg_wal_replay_wait('0/1234') is called. It calls WaitForLSN
2. backend: WaitForLSN calls addLSNWaiter('0/1234'). It adds the backend 
process to the LSN heap and returns
3. replay:  rm_redo record '0/1234'
4. backend: WaitForLSN enters for-loop, calls GetXLogReplayRecPtr()
5. backend: current replay LSN location is '0/1234', so we exit the loop
6. replay: calls WaitLSNSetLatches()

In a nutshell, it's possible for the loop in WaitForLSN to exit without 
cleaning up the process from the heap. I was able to hit that by adding 
a delay after the addLSNWaiter() call:

> TRAP: failed Assert("!procInfo->inHeap"), File: "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152
> postgres: heikki postgres [local] CALL(ExceptionalCondition+0xab)[0x55da1f68787b]
> postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8]
> postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc]
> postgres: heikki postgres [local] CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5]
> postgres: heikki postgres [local] CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a]
> postgres: heikki postgres [local] CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9]

I think there's a similar race condition if the timeout is reached at 
the same time that the startup process wakes up the process.

>      * At first, we check that pg_wal_replay_wait() is called in a non-atomic
>      * context.  That is, a procedure call isn't wrapped into a transaction,
>      * another procedure call, or a function call.
>      *

It's pretty unfortunate to have all these restrictions. It would be nice 
to do:

select pg_wal_replay_wait('0/1234'); select * from foo;

in a single multi-query call, to avoid the round-trip to the client. You 
can avoid it with libpq or protocol level pipelining, too, but it's more 
complicated.

>      * Secondly, according to PlannedStmtRequiresSnapshot(), even in an atomic
>      * context, CallStmt is processed with a snapshot.  Thankfully, we can pop
>      * this snapshot, because PortalRunUtility() can tolerate this.

This assumption that PortalRunUtility() can tolerate us popping the 
snapshot sounds very fishy. I haven't looked at what's going on there, 
but doesn't sound like a great assumption.

If recovery ends while a process is waiting for an LSN to arrive, does 
it ever get woken up?

The docs could use some-copy-editing, but just to point out one issue:

> There are also procedures to control the progress of recovery.

That's copy-pasted from an earlier sentence at the table that lists 
functions like pg_promote(), pg_wal_replay_pause(), and 
pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control the 
progress of recovery like those functions do, it only causes the calling 
backend to wait.

Overall, this feature doesn't feel quite ready for v17, and IMHO should 
be reverted. It's a nice feature, so I'd love to have it fixed and 
reviewed early in the v18 cycle.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
Next
From: Heikki Linnakangas
Date:
Subject: Re: Improve eviction algorithm in ReorderBuffer