Hi,
On 2023-05-19 12:07:56 +0900, Kyotaro Horiguchi wrote:
> At Thu, 18 May 2023 20:11:11 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > > > + ConditionVariableInit(&WalSndCtl->physicalWALSndCV);
> > > > + ConditionVariableInit(&WalSndCtl->logicalWALSndCV);
> > >
> > > It's not obvious to me that it's worth having two CVs, because it's more
> > > expensive to find no waiters in two CVs than to find no waiters in one CV.
> >
> > I disagree. In the tight per-WAL record recovery loop, WalSndWakeup
> > wakes up logical walsenders for every WAL record, but it wakes up
> > physical walsenders only if the applied WAL record causes a TLI
> > switch. Therefore, the extra cost of spinlock acquire-release for per
> > WAL record applies only for logical walsenders. On the other hand, if
> > we were to use a single CV, we would be unnecessarily waking up (if at
> > all they are sleeping) physical walsenders for every WAL record -
> > which is costly IMO.
>
> As I was reading this, I start thinking that one reason for the
> regression could be to exccessive frequency of wakeups during logical
> replication. In physical replication, we make sure to avoid exccessive
> wakeups when the stream is tightly packed. I'm just wondering why
> logical replication doesn't (or can't) do the same thing, IMHO.
It's possible we could try to reduce the frequency by issuing wakeups only at
specific points. The most obvious thing to do would be to wake only when
waiting for more WAL or when crossing a page boundary, or such. Unfortunately
that could easily lead to deadlocks, because the startup process might be
blocked waiting for a lock, held by a backend doing logical decoding - which
can't progress until the startup process wakes the backend up.
So I don't think this is promising avenue in the near term.
Greetings,
Andres Freund