Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508 - Mailing list pgsql-hackers

From Erik Rijkers
Subject Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
Msg-id e237018a-155a-1dda-804b-2519f48d0903@xs4all.nl
In response to Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On 20-04-2022 at 06:54, Kyotaro Horiguchi wrote:
> At Tue, 19 Apr 2022 10:55:26 -0700, Andres Freund <andres@anarazel.de> wrote in
>> Hi,
>>
>> On 2022-04-19 10:36:24 -0700, Andres Freund wrote:
>>> On 2022-04-19 13:50:25 +0200, Erik Rijkers wrote:
>>>> The 12th run of statbug.sh crashed and gave a corefile.
>>>
>>> I ran through quite a few iterations by now, without reproducing :(
>>>
>>> I guess there's some timing issue that you're hitting on your system
>>> due to the slower disks.
>>
>> Ah. I found the issue. The new pgstat_report_stat(true) call in
>> LogicalRepApplyLoop()'s "timeout" section doesn't check if we're in a
>> transaction. And the transactional stats code doesn't handle that (never
>> has).
>>
>> I think all that's needed is an if (IsTransactionState()) around that
>> pgstat_report_stat().
> 
> if (!IsTransactionState()) ?
> 
>> It might be possible to put an assertion into pgstat_report_stat(), but
>> I need to look at the process exit code to see if it is.
> 
> Inserting a sleep in pgoutput_commit_txn reproduced this. It crashes with
> the same stack trace and similar variable state.
> 
> diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
> index b197bfd565..def4d751d3 100644
> --- a/src/backend/replication/pgoutput/pgoutput.c
> +++ b/src/backend/replication/pgoutput/pgoutput.c
> @@ -568,6 +568,7 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>           return;
>       }
>   
> +    sleep(2);
>       OutputPluginPrepareWrite(ctx, true);
>       logicalrep_write_commit(ctx->out, txn, commit_lsn);
>       OutputPluginWrite(ctx, true);
> 
> The following actually fixes it.
> 
> diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
> index 4171371296..f4e5359513 100644
> --- a/src/backend/replication/logical/worker.c
> +++ b/src/backend/replication/logical/worker.c
> @@ -2882,10 +2882,11 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
>               send_feedback(last_received, requestReply, requestReply);
>   
>               /*
> -             * Force reporting to ensure long idle periods don't lead to
> -             * arbitrarily delayed stats.
> +             * Force reporting to ensure long out-of-transaction idle periods
> +             * don't lead to arbitrarily delayed stats.
>                */
> -            pgstat_report_stat(true);
> +            if (!IsTransactionState())
> +                pgstat_report_stat(true);
>           }
>       }
>   

Yes, that seems to fix it: I applied the latter patch and ran my
program 250 times without errors. Then I removed it again and it gave the
error within 15 runs.

thanks!

Erik


> regards.
> 


