Re: [HACKERS] logical replication - still unstable after all these months - Mailing list pgsql-hackers

From Mark Kirkwood
Subject Re: [HACKERS] logical replication - still unstable after all these months
Msg-id ba1ffd89-7046-cfdf-5b08-f6f0834a0825@catalyst.net.nz
In response to Re: [HACKERS] logical replication - still unstable after all these months (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
List pgsql-hackers

On 05/06/17 13:08, Mark Kirkwood wrote:
> On 05/06/17 00:04, Erik Rijkers wrote:
>
>> On 2017-05-31 16:20, Erik Rijkers wrote:
>>> On 2017-05-31 11:16, Petr Jelinek wrote:
>>> [...]
>>>> Thanks to Mark's offer, I was able to study the issue as it happened
>>>> and found the cause of this.
>>>>
>>>> [0001-Improve-handover-logic-between-sync-and-apply-worker.patch]
>>>
>>> This looks good:
>>>
>>> -- out_20170531_1141.txt
>>>     100 -- pgbench -c 90 -j 8 -T 60 -P 12 -n   --  scale 25
>>>     100 -- All is well.
>>>
>>> So this is 100x a 1-minute test with 100x success. (This is on the
>>> most fastidious machine (slow disks, meagre specs), which used to give
>>> 15% failures.)
>>
>> [Improve-handover-logic-between-sync-and-apply-worker-v2.patch]
>>
>> No errors after (several days of) running variants of this. (2500x
>> 1-minute runs; 12x 1-hour runs)
>
> Same here: no errors with the v2 patch applied (approx. 2 days, all
> 1-minute runs).
>

Further, reapplying the v1 patch (with a bit of editing, as I wanted to
apply it to my current master) quite quickly produces a failure with
missing rows in the history table. I'll put back the v2 patch and resume
runs with that, but I'm cautiously optimistic that the v2 patch solves the
issue.
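As a rough back-of-the-envelope check on that optimism, here is a small Python sketch. It assumes each 1-minute run fails independently and takes the roughly 15% failure rate Erik reported on his slow machine as the pre-patch baseline. Neither assumption was measured in this thread, and the helper names are hypothetical. The sketch computes how likely 100 clean runs would be if nothing had changed, plus a 95% upper bound on the remaining failure rate after N clean runs:

```python
def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Chance that `runs` independent runs all succeed when each fails with `failure_rate`."""
    return (1.0 - failure_rate) ** runs


def exact_upper_bound(runs: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the failure rate after `runs` runs with zero failures.

    Solves (1 - p) ** runs = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / runs)


def rule_of_three_upper_bound(runs: int) -> float:
    """Common approximation of the 95% bound above: 3 / runs."""
    return 3.0 / runs


if __name__ == "__main__":
    # If the old ~15% failure rate still applied, 100 clean runs would be very unlikely.
    print(f"P(100 clean runs | 15% failure rate) = {prob_all_pass(0.15, 100):.1e}")
    # Upper bounds for the clean-run counts mentioned in the thread (assumed independent).
    for n in (100, 2500):
        print(f"{n} clean runs: failure rate below {exact_upper_bound(n):.2%} "
              f"(rule of three: {rule_of_three_upper_bound(n):.2%})")
```

Under these assumptions, a clean streak can only bound the remaining failure rate. It cannot show that the rate is zero. That is one reason the v1 reproduction (fails quickly) compared against v2 (no failures) is the more convincing evidence.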

regards

Mark

