On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* SATYANARAYANA NARLAPURAM (satyanarlapuram@gmail.com) wrote:
> On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
> > satyanarlapuram@gmail.com> wrote:
> >>> Actually all the WAL insertions are done under a critical section
> >>> (with a few exceptions), which means if you look at all the references to
> >>> XLogInsert(), it is always called under the critical section, and that is my
> >>> main worry about hooking at the XLogInsert level.
> >>>
> >>
> >> Got it, understood the concern. But can we document the limitations of
> >> the hook and let the hook take care of it? I don't expect an error to be
> >> thrown here since we are not planning to allocate memory or make file
> >> system calls, but instead look at the shared memory state and add delays
> >> when required.
> >>
> >>
> > Yet another problem is that if we are in XLogInsert(), that means we are
> > holding the buffer locks on all the pages we have modified, so if we add a
> > hook at that level which can make it wait, then we would also block any of
> > the read operations needed to read from those buffers. I haven't thought
> > about what a better way to do this would be, but this is certainly not good.
> >
>
> Yes, this is a problem. The other approach is adding a hook at
> XLogWrite/XLogFlush? All the other backends will be waiting behind the
> WALWriteLock. The process that is performing the write enters a busy
> loop with small delays until the criteria are met. The inability to process
> interrupts inside the critical section is a challenge in both approaches.
> Any other thoughts?
Why not have this work the exact same way sync replicas do, except that it's based on some byte/time lag for some set of async replicas? That is, in RecordTransactionCommit(), perhaps right after the SyncRepWaitForLSN() call, or maybe even add this to that function? It sure seems like there's a lot of similarity.
I was thinking of achieving log governance (throttling WAL MB/sec) and also providing RPO guarantees. In this model, it is hard to throttle the WAL generation of a long-running transaction (for example, COPY or SELECT INTO).
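For the "throttling WAL MB/sec" part, the arithmetic is simple regardless of where the wait ends up living. A minimal sketch, assuming a fixed measurement window: given the WAL bytes generated since the window started and the elapsed time, compute how long to sleep so the average rate stays at or below the limit. The function name `wal_throttle_delay_us()` and its parameters are hypothetical, not existing PostgreSQL code, and where such a sleep could safely happen (outside critical sections and buffer locks) is exactly the open question above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: microseconds to sleep so that bytes_written over
 * (elapsed_us + delay) does not exceed max_bytes_per_sec.
 * Assumes bytes_written * 1000000 fits in 64 bits (windows under ~18 TB).
 */
static uint64_t
wal_throttle_delay_us(uint64_t bytes_written, uint64_t elapsed_us,
                      uint64_t max_bytes_per_sec)
{
    assert(max_bytes_per_sec > 0);

    /* Earliest time since window start at which this much WAL is allowed. */
    uint64_t allowed_at_us = bytes_written * 1000000 / max_bytes_per_sec;

    return allowed_at_us > elapsed_us ? allowed_at_us - elapsed_us : 0;
}
```

With a 16 MB/s limit, a backend that produced 32 MB in one second would be told to sleep another second, while one that produced 8 MB in the same second would not sleep at all.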
Long-running transactions have a lot of downsides and are best discouraged. I don’t know that we should be designing this for that case specifically, particularly given the complications it would introduce, as already discussed on this thread.
However, this meets my RPO needs. Do you support adding a hook, or making the actual change in core? IMHO, the hook allows more creative options. I can go ahead and make a patch accordingly.
I would think this would make more sense as part of core rather than a hook, as that then requires an extension and additional setup to get going, which raises the bar quite a bit when it comes to actually being used.
Sounds good, I will work on making the changes accordingly.