Re: [HACKERS] logical replication - still unstable after all these months - Mailing list pgsql-hackers

From Erik Rijkers
Subject Re: [HACKERS] logical replication - still unstable after all these months
Msg-id 752c2572afefa737c49d707f5109cbf1@xs4all.nl
In response to Re: [HACKERS] logical replication - still unstable after all these months (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
List pgsql-hackers
On 2017-05-26 10:29, Mark Kirkwood wrote:
> On 26/05/17 20:09, Erik Rijkers wrote:
> 
>> On 2017-05-26 09:40, Simon Riggs wrote:
>>> 
>>> If we can find out what the bug is with a repeatable test case we can 
>>> fix it.
>>> 
>>> Could you provide more details? Thanks
>> 
>> I will, just need some time to clean things up a bit.
>> 
>> 
>> But what I would like is for someone else to repeat my 100x1-minute 
>> tests, taking as core that snippet I posted in my previous email.  I 
>> built bash-stuff around that core (to take md5's, shut-down/start-up 
>> the two instances between runs, write info to log-files, etc).  But it 
>> would be good if someone else made that separately because if that 
>> then does not fail, it would prove that my test-harness is at fault 
>> (and not logical replication).
>> 
> 
> Will do - what I had been doing was running pgbench, waiting until the

Great!

You'll have to think about whether to go with instances of either 
master, or master+those 4 patches.  I guess either choice makes sense.

> row counts on the replica pgbench_history were the same as the
> primary, then summing the %balance and delta fields from the primary
> and replica dbs and comparing. So far - all match up ok. However I'd

I summed numbers for a while as well (it's a lot faster than taking
md5s over the full content).
But the problem with summing is that (I think) in the end you cannot be
really sure that the result is correct: differences can cancel out and
give a false match, although I don't know the odds.
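
To illustrate the worry about summing, here is a minimal sketch in
Python. It does not touch a database: the rows are made-up (id, delta)
pairs, and column_sum and content_md5 are hypothetical helper names, not
part of my test harness. It only shows that two row sets can differ
while a column sum still matches, whereas a digest over the full content
does not.

```python
import hashlib


def column_sum(rows, col):
    """Sum one numeric column over all rows."""
    return sum(row[col] for row in rows)


def content_md5(rows):
    """md5 over the full, sorted content of the rows."""
    digest = hashlib.md5()
    for row in sorted(rows):
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Made-up data: (id, delta). Rows 1 and 2 differ on the replica,
# but the two errors cancel each other in the sum.
primary = [(1, 100), (2, -40), (3, 15)]
replica = [(1, 90), (2, -30), (3, 15)]

sums_match = column_sum(primary, 1) == column_sum(replica, 1)
md5_match = content_md5(primary) == content_md5(replica)

print("sums match:", sums_match)
print("md5 match: ", md5_match)
```

How likely such a cancellation is with real pgbench data I cannot say;
the sketch only shows that it is possible.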

> been running a longer time frames (5 minutes), so not the same number
> of repetitions as yet.

I've run 3600-, 30- and 15-minute runs too, but in this case (these 100x
tests) I especially wanted to test the area around startup/initialisation
of logical replication.  Also the increasing quality of logical replication
(once it runs with the correct

thanks,

Erik Rijkers


