Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Msg-id CAHg+QDe=4pE3=7db5SUzsPFFpz97O74Mk0riJgk=aGJ44WdLtw@mail.gmail.com
In response to Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
List pgsql-hackers
Stephen, thank you!

On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* SATYANARAYANA NARLAPURAM (satyanarlapuram@gmail.com) wrote:
> On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
> > satyanarlapuram@gmail.com> wrote:
> >>> Actually all the WAL insertions are done under a critical section
> >>> (except few exceptions), that means if you see all the references of
> >>> XLogInsert(), it is always called under the critical section and that is my
> >>> main worry about hooking at XLogInsert level.
> >>>
> >>
> >> Got it, understood the concern. But can we document the limitations of
> >> the hook and let the hook take care of it? I don't expect an error to be
> >> thrown here since we are not planning to allocate memory or make file
> >> system calls but instead look at the shared memory state and add delays
> >> when required.
> >>
> >>
> > Yet another problem is that if we are in XLogInsert() that means we are
> > holding the buffer locks on all the pages we have modified, so if we add a
> > hook at that level which can make it wait then we would also block any of
> > the read operations needed to read from those buffers.  I haven't thought
> > what could be better way to do this but this is certainly not good.
> >
>
> Yes, this is a problem. The other approach is adding a hook at
> XLogWrite/XLogFlush? All the other backends will be waiting behind the
> WALWriteLock. The process that is performing the write enters into a busy
> loop with small delays until the criteria are met. Inability to process the
> interrupts inside the critical section is a challenge in both approaches.
> Any other thoughts?

Why not have this work the exact same way sync replicas do, except that
it's based off of some byte/time lag for some set of async replicas?
That is, in RecordTransactionCommit(), perhaps right after the
SyncRepWaitForLSN() call, or maybe even add this to that function?  Sure
seems like there's a lot of similarity.

My goal was both log governance (throttling WAL generation in MB/sec) and RPO guarantees. In the model you describe, it is hard to throttle the WAL generated by a long-running transaction (for example, COPY or SELECT INTO), because the wait happens only at commit. It does meet my RPO needs, however. Would you prefer adding a hook there, or making the actual change in core? IMHO, a hook allows more creative options. I can go ahead and prepare a patch accordingly.
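
To make the commit-time variant concrete, here is a rough standalone sketch of the check I have in mind. This is not PostgreSQL code: Lsn, replica_lag_bytes() and commit_must_wait() are names made up for illustration, and the policy (wait while any tracked async replica is more than the configured number of bytes behind the commit LSN) is only one assumption about how the set of replicas could be evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a WAL position; hypothetical, not PostgreSQL's XLogRecPtr. */
typedef uint64_t Lsn;

/* Bytes by which a replica trails the commit LSN (0 if it is caught up). */
static uint64_t
replica_lag_bytes(Lsn commit_lsn, Lsn replica_flush_lsn)
{
    return commit_lsn > replica_flush_lsn ? commit_lsn - replica_flush_lsn : 0;
}

/*
 * Assumed policy: the committing backend must wait while ANY tracked
 * async replica is more than max_lag_bytes behind commit_lsn.
 * With no tracked replicas there is nothing to wait for.
 */
static bool
commit_must_wait(Lsn commit_lsn, const Lsn *replica_flush,
                 size_t nreplicas, uint64_t max_lag_bytes)
{
    assert(replica_flush != NULL || nreplicas == 0);

    for (size_t i = 0; i < nreplicas; i++)
    {
        if (replica_lag_bytes(commit_lsn, replica_flush[i]) > max_lag_bytes)
            return true;
    }
    return false;
}
```

In a real patch the backend would presumably sleep and recheck after the commit record is flushed, in the same spirit as the wait in SyncRepWaitForLSN(), rather than poll in a loop.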


 
Thanks,

Stephen
