Re: TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals - Mailing list pgsql-hackers

From Erik Rijkers
Subject Re: TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals
Date
Msg-id f4bc19a726ac9fce7e47c69ddf018cbc@xs4all.nl
In response to Re: TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 2017-12-20 06:27, Michael Paquier wrote:
> On Wed, Dec 20, 2017 at 7:46 AM, Erik Rijkers <er@xs4all.nl> wrote:

TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(safeXid, snap->xmin))", File: "snapbuild.c", Line: 580)

>> Sorry, that was probably too terse, I should explain that a little.
>> 
>> After initing 50 instances, I set up and run a pgbench session in the
>> master session; the pgbench lines are:
>> 
>>   init: pgbench --port=6515 --quiet --initialize --scale=1 postgres
>>   run:  pgbench -M prepared -c 16 -j 8 -T 1 -P 1 -n postgres -- scale 1
>> 
>> The other instances then catch up.  The whole thing takes 5 minutes or so.
>>
>> I vary scale, duration, and number of instances.  I haven't had it fail in
>> this way yet, but I mostly tried with a lower number of instances (up to 25
>> or so).
> 
> Hm. Are you saying that it takes at least 50 cascading instances to
> see the problem you are seeing? And that you are not seeing any
> problems with a lower number of cascading instances? Are you enabling
> hot_standby_feedback?

That sounds more definitive than I meant it, but yes, only now that I
tried a higher number of instances did I see this.  But it also often
succeeds at up to 100 instances (100 is the highest I have tried).

These 50 instances were a logical replication chain, and 
hot_standby_feedback is off.

Overnight I ran the test that failed yesterday 80 times: all 80 runs
succeeded.  I am not sure what makes a run fail rather than succeed.

(logical replication does the initial syncing of the instances one by 
one (sequentially) so it isn't as busy as expected; it just takes a long 
time)

I wrote a simple perl program to test logical replication (attached, 
FWIW), running:

./cascade.pl --instances=50 --scale=1 --clients=16 --threads=8 
--duration=1 --repeats=3 --waiting=10

This cascade.pl program uses knowledge of my setup, so it probably won't
run elsewhere as is, but it shows how the failing test was done.
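
For readers without the attachment, here is a rough Python sketch of the
kind of commands such a cascade test drives.  It is not the attached
cascade.pl.  The consecutive port numbering, the pub<N>/sub<N> names, the
connection string, and the explicit --port on the run line are all
assumptions made for illustration, and creating the pgbench tables on
each subscriber is left out.

```python
# Sketch only: NOT the attached cascade.pl.  Port numbering and the
# pub<N>/sub<N> names are hypothetical, chosen for illustration.
BASE_PORT = 6515  # port used in the pgbench init line quoted above


def pgbench_commands(port, scale, clients, threads, duration):
    """Return the pgbench init and run command lines for the first instance."""
    init = ["pgbench", f"--port={port}", "--quiet", "--initialize",
            f"--scale={scale}", "postgres"]
    # The quoted run line shows no port; passing one here is an assumption.
    run = ["pgbench", f"--port={port}", "-M", "prepared",
           "-c", str(clients), "-j", str(threads),
           "-T", str(duration), "-P", "1", "-n", "postgres"]
    return init, run


def cascade_sql(instances, base_port=BASE_PORT):
    """Return (port, sql) pairs that chain instance i to instance i - 1."""
    statements = []
    for i in range(instances):
        port = base_port + i
        if i > 0:
            # Every instance except the first subscribes to its predecessor.
            conninfo = f"port={port - 1} dbname=postgres"
            statements.append(
                (port, f"CREATE SUBSCRIPTION sub{i} "
                       f"CONNECTION '{conninfo}' PUBLICATION pub{i - 1}"))
        if i < instances - 1:
            # Every instance except the last publishes to its successor.
            statements.append(
                (port, f"CREATE PUBLICATION pub{i} FOR ALL TABLES"))
    return statements


if __name__ == "__main__":
    init, run = pgbench_commands(BASE_PORT, scale=1, clients=16,
                                 threads=8, duration=1)
    print(" ".join(init))
    print(" ".join(run))
    for port, sql in cascade_sql(50):
        print(port, sql)
```

Each statement would be run on the instance listening on the given port
(for example with psql); the order returned ensures a publication exists
before the next instance subscribes to it.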


Erik
Attachment
