Re: logical replication worker accesses catalogs in error context callback - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: logical replication worker accesses catalogs in error context callback
Date
Msg-id CALj2ACViLnbQa_KuH8F_58-Fzs4ceWOuDB6we0m8UO7anQ5WSQ@mail.gmail.com
Whole thread Raw
In response to Re: logical replication worker accesses catalogs in error context callback  (Andres Freund <andres@anarazel.de>)
Responses Re: logical replication worker accesses catalogs in error context callback
List pgsql-hackers
On Thu, Feb 4, 2021 at 5:16 AM Andres Freund <andres@anarazel.de> wrote:
> > > > About 0001, have we tried to reproduce the actual bug here which means
> > > > when the error_callback is called we should face some problem? I feel
> > > > with the correct testcase we should hit the Assert
> > > > (Assert(IsTransactionState());) in SearchCatCacheInternal because
> > > > there we expect the transaction to be in a valid state. I understand
> > > > that the transaction is in a broken state at that time but having a
> > > > testcase to hit the actual bug makes it easy to test the fix.
> > >
> > > I think it's easy to hit - add a trivial DEBUG message to XLogWrite(),
> > > and run anything involving logical rep using a low enough log
> > > level. There might even be enough messages with a low enough level
> > > already somewhere in the relevant paths.
> >
> > I'm not sure how adding a DEBUG message to XLogWrite() would ensure
> > the logical replication worker on the subscriber system enters
> > slot_store_error_callback and hits the Assert(IsTransactionState());?
> > Could you please elaborate? Or I may be missing something here to
> > understand.
>
> XLogWrite() is in a critical section, the DEBUG messages triggers error
> context callbacks to be called, the callbacks allocate memory, which
> triggers an assertion.

I see that XLogWrite() can be called from logical replication

worker(apply_handle_commit->apply_handle_commit_internal->CommitTransactionCommand->CommitTransaction->RecordTransactionCommit->XLogFlush->XLogWrite
--> by now the txn state is not TRANS_INPROGRESS, so
IsTransactionState() can return false. Adding a DEBUG message there
can call the error context callback.

But the error context callback slot_store_error_callback, gets used in
slot_store_data and slot_modify_data. In both places we just register
the callback before an attribute processing for loop and deregister
after it. So, when we are in those for loops, we expect to be in
TRANS_INPROGRESS, see ensure_transaction before slot_store_data and
the same is true for slot_modify_data. So, to really hit the assert
failure in the catalogue access from slot_store_error_callback, first,
we shouldn't be in TRANS_INPROGRESS in those for loops and have a
DEBUG message. I'm not exactly sure how we achieve this.

Am I missing something?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Single transaction in the tablesync worker?
Next
From: Ronan Dunklau
Date:
Subject: Preserve attstattarget on REINDEX CONCURRENTLY