Re: Why we lost Uber as a user - Mailing list pgsql-hackers

From Alfred Perlstein
Subject Re: Why we lost Uber as a user
Date
Msg-id 39886b9a-6ff2-e48e-975a-4c7a7a2418c7@freebsd.org
In response to Re: Why we lost Uber as a user  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Why we lost Uber as a user  (Bruce Momjian <bruce@momjian.us>)
Re: Why we lost Uber as a user  (Greg Stark <stark@mit.edu>)
List pgsql-hackers

On 8/2/16 2:14 PM, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> With physical replication, there is the concern that a bug in *just* the
>> physical (WAL) side of things could cause corruption.
> Right.  But with logical replication, there's the same risk that the
> master's state could be fine but a replication bug creates corruption on
> the slave.
>
> Assuming that the logical replication works by issuing valid SQL commands
> to the slave, one could hope that this sort of "corruption" only extends
> to having valid data on the slave that fails to match the master.
> But that's still not a good state to be in.  And to the extent that
> performance concerns lead the implementation to bypass some levels of the
> SQL engine, you can easily lose that guarantee too.
>
> In short, I think Uber's position that logical replication is somehow more
> reliable than physical is just wishful thinking.  If anything, my money
> would be on the other way around: there's a lot less mechanism that can go
> wrong in physical replication.  Which is not to say there aren't good
> reasons to use logical replication; I just do not believe that one.
>
>             regards, tom lane
>
>
The reason it can be less catastrophic is that with logical replication
you may futz up your data, but you are safe from corrupting your entire
db.  Meaning if an update is missed or doubled, that may be addressed by
a fixup SQL stmt; however, if the replication causes a write to the
entirely wrong place in the db file, then you need to "fsck" your db and
hope that nothing super critical was blown away.
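
To make the "missed or doubled update" case concrete, here is a rough
sketch in Python. It assumes both copies of a table can be read into
dicts keyed by primary key; the name diff_rows and the sample rows are
made up for illustration, and this is not how any particular
replication tool does it.

```python
def diff_rows(master, slave):
    """Compare two {primary_key: value} snapshots of the same table.

    Returns (missing, extra, changed):
      missing - keys on the master that never reached the slave
      extra   - keys on the slave that the master does not have
      changed - keys present on both sides with different values
    """
    missing = sorted(k for k in master if k not in slave)
    extra = sorted(k for k in slave if k not in master)
    changed = sorted(k for k in master if k in slave and master[k] != slave[k])
    return missing, extra, changed


# Hypothetical snapshots: the update to row 2 was missed on the slave,
# and an increment on row 3 was applied twice there.
master = {1: 10, 2: 25, 3: 31}
slave = {1: 10, 2: 20, 3: 32}

missing, extra, changed = diff_rows(master, slave)
print(missing, extra, changed)
```

Each key that comes back is a row you could repair with an ordinary
fixup statement, which is the point: the damage is limited to rows you
can name.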

The impact across a cluster is potentially magnified by physical 
replication.

So for instance, let's say there is a bug in the master's write to
disk.  Logical replication acts as a barrier that keeps that bad write
from reaching the slaves.  With physical replication the bad writes go
to the slaves too, so any corruption experienced on the master will
quickly reach the slaves and they too will be corrupted.

With logical replication a bug may be stopped at the replication layer.  
At that point you can resync the slave from the master.
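
A toy model of that argument, in Python. Everything here is an
assumption for the sake of illustration: the "disk" is a four-slot
list, buggy_write is a hypothetical off-by-one bug in the master's
write path, and the slave's write path is assumed not to share it.

```python
PAGE_SIZE = 4  # toy "disk" with four slots


def buggy_write(disk, slot, value):
    # Hypothetical bug: every write lands one slot too far.
    disk[(slot + 1) % PAGE_SIZE] = value


def correct_write(disk, slot, value):
    disk[slot] = value


# Logical change stream: (slot, value) pairs the application intended.
changes = [(0, "a"), (1, "b")]

# The master applies the changes through its buggy write path.
master_disk = [None] * PAGE_SIZE
for slot, value in changes:
    buggy_write(master_disk, slot, value)

# Physical replication: ship the master's disk image as-is.
physical_slave = list(master_disk)

# Logical replication: replay the changes through the slave's own
# write path, assumed here not to hit the bug.
logical_slave = [None] * PAGE_SIZE
for slot, value in changes:
    correct_write(logical_slave, slot, value)

print(master_disk, physical_slave, logical_slave)
```

The physical slave ends up with the same misplaced writes as the
master, while the logical slave has the values where they were meant to
go. Swap correct_write for buggy_write on the slave and both copies are
wrong, so the barrier only holds when the slave does not hit the same
bug.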

Now in the case of physical replication all your base are belong to zuul 
and you are in a very bad state.

That said, with logical replication, who's to say that when the
statement is replicated to a slave, the slave won't experience the same
bug and also corrupt itself?

We may be saying the same thing, but still there is something to be said
for logical replication... also, didn't they show that logical
replication was faster for some use cases at Uber?

-Alfred







