Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Date
Msg-id CALj2ACVU9QQe1MJoSe9o1Q3wh+=s_12bii-HCL_rhGdd0-W8EA@mail.gmail.com
In response to Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Fri, Dec 24, 2021 at 4:43 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
>>
>> XLogInsert in my opinion is the best place to call it and the hook can be
>> something like this "void xlog_insert_hook(NULL)" as all the throttling
>> logic required is the current flush position which can be obtained from
>> GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.
>
> IMHO, it is not a good idea to call an external hook function inside a
> critical section.  Generally, we ensure that we do not call any code path
> within a critical section which can throw an error and if we start calling
> the external hook then we lose that control.  It should be blocked at the
> operation level itself e.g. ALTER TABLE READ ONLY, or by some other hook at
> a little higher level.

Yeah, good point. It's not advisable to hand control to an external
module inside a critical section. For instance, memory allocation
isn't allowed there (see [1]), and any ereport(ERROR, ...) is
escalated to PANIC inside a critical section (see [2], [3]).
Moreover, a critical section is meant to be short, i.e. it should
execute as little code as possible. There's no guarantee that an
external module would follow these rules.

I suggest we do it at transaction start, i.e. when a txnid is
allocated in AssignTransactionId(). With this approach, once the
throttling limit is exceeded, the current txn (even if it is a
long-running one) continues to insert WAL, and only the txns that
start afterwards get blocked. This is okay and can be conveyed to
users via the documentation if need be. We do block txnid assignments
for parallel workers in this function, so this is a good choice IMO.
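
For illustration only, here is a minimal standalone sketch of the kind
of lag check that could run at txnid assignment time. It is not
PostgreSQL code: WalPosition stands in for XLogRecPtr, and
replica_lag_bytes(), should_block_new_xid() and the max_lag_bytes
parameter are hypothetical names made up for this sketch. How the two
flush positions would be obtained in the server is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL byte position). */
typedef uint64_t WalPosition;

/*
 * Hypothetical helper: bytes of WAL that the primary has flushed but the
 * standby has not.  Returns 0 if the standby has caught up (or, defensively,
 * if its reported position is ahead of the primary's).
 */
uint64_t
replica_lag_bytes(WalPosition primary_flush, WalPosition standby_flush)
{
    if (primary_flush > standby_flush)
        return primary_flush - standby_flush;
    return 0;
}

/*
 * Hypothetical policy check, modelling a call made just before a new
 * transaction id is assigned, i.e. outside any critical section.
 * A max_lag_bytes of 0 is taken to mean "throttling disabled".
 */
bool
should_block_new_xid(WalPosition primary_flush,
                     WalPosition standby_flush,
                     uint64_t max_lag_bytes)
{
    if (max_lag_bytes == 0)
        return false;
    return replica_lag_bytes(primary_flush, standby_flush) > max_lag_bytes;
}
```

Because a check like this would run before the txnid is handed out, a
txn that already has one is never affected; only later txns would see
should_block_new_xid() return true and wait or error out, depending on
the policy chosen.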

Thoughts?

[1]
/*
 * You should not do memory allocations within a critical section, because
 * an out-of-memory error will be escalated to a PANIC. To enforce that
 * rule, the allocation functions Assert that.
 */
#define AssertNotInCriticalSection(context) \
    Assert(CritSectionCount == 0 || (context)->allowInCritSection)

[2]
        /*
         * If we are inside a critical section, all errors become PANIC
         * errors.  See miscadmin.h.
         */
        if (CritSectionCount > 0)
            elevel = PANIC;

[3]
 * A related, but conceptually distinct, mechanism is the "critical section"
 * mechanism.  A critical section not only holds off cancel/die interrupts,
 * but causes any ereport(ERROR) or ereport(FATAL) to become ereport(PANIC)
 * --- that is, a system-wide reset is forced.  Needless to say, only really
 * *critical* code should be marked as a critical section!  Currently, this
 * mechanism is only used for XLOG-related code.

Regards,
Bharath Rupireddy.


