Re: Replication on the backend - Mailing list pgsql-hackers

From J. Andrew Rogers
Subject Re: Replication on the backend
Date
Msg-id DC5354B1-808C-4E1A-9EDA-C7084C4914B1@neopolitan.com
In response to Re: Replication on the backend  (Gregory Maxwell <gmaxwell@gmail.com>)
List pgsql-hackers
On Dec 6, 2005, at 9:09 PM, Gregory Maxwell wrote:
> Eh, why would light limited delay be any slower than a disk on FC the
> same distance away? :)
>
> In any case, performance of PG on iscsi is just fine. You can't blame
> the network... Doing multimaster replication is hard because the
> locking primitives that are fine on a simple multiprocessor system
> (with a VERY high bandwidth very low latency interconnect between
> processors) just don't work across a network, so you're left finding
> other methods and making them work...


Speed of light latency shows up pretty damn often in real networks,
even relatively local ones.  The number of people who wonder why a
transcontinental SLA of 10ms is not possible is astonishing.  Silicon
switch fabrics add latency measured in nanoseconds, which is
effectively zero for many networks that leave the system board.  That
leaves most well-designed networks limited by how fast one can push
photons through a fiber, which is significantly slower than photons
through a vacuum.
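
To put rough numbers on the fiber argument, here is a minimal
back-of-the-envelope sketch.  The refractive index of 1.47 and the
4000 km path length are assumptions picked for illustration; neither
figure comes from this thread, and real routes are longer than the
straight-line distance.

```python
# Propagation delay through optical fiber, ignoring switching and queueing.
C_VACUUM_KM_S = 299_792.458     # speed of light in a vacuum, km/s
FIBER_REFRACTIVE_INDEX = 1.47   # ASSUMPTION: typical value for silica fiber


def fiber_delay_ms(distance_km: float, round_trip: bool = True) -> float:
    """Return the propagation delay in milliseconds over distance_km of fiber."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    speed_km_s = C_VACUUM_KM_S / FIBER_REFRACTIVE_INDEX
    one_way_ms = distance_km / speed_km_s * 1000.0
    return 2.0 * one_way_ms if round_trip else one_way_ms


if __name__ == "__main__":
    # ASSUMPTION: 4000 km stands in for a transcontinental path.
    for km in (1, 100, 4000):
        one_way = fiber_delay_ms(km, round_trip=False)
        rtt = fiber_delay_ms(km)
        print(f"{km:>5} km: one-way {one_way:.3f} ms, round trip {rtt:.3f} ms")
```

Under these assumptions the one-way delay over 4000 km is already
close to 20 ms before any equipment is counted, so a 10ms figure is
out of reach no matter how fast the switches are.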

Compared to a simple single-system SMP, a local cluster built on a
first-rate fabric will have about an order of magnitude higher
latency but very similar bandwidth.  On the other hand, at those
latencies you can increase the number of addressable processors with
that kind of bandwidth by an order of magnitude, so it is a bit of a
trade.  Latency still matters a lot, though: one would have to be a
lot smarter about partitioning synchronization across that fabric,
even though one would lose nothing in the bandwidth department.


> But again, multimaster isn't hard because of some inherently
> slow property of networks.


Eh?  As far as I know, the difficulty of multi-master is almost
entirely a product of the latency of real networks: they are too slow
for scalable distributed locks.  SMP is little more than a
distributed lock manager implemented in silicon.  Multi-master is
therefore hard in practice because we cannot drive networks fast
enough.  That said, current state-of-the-art network fabrics are
within an order of magnitude of SMP fabrics, so they could be real
contenders, particularly once you get north of 8-16 processors.
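
The link between latency and lock scalability can be sketched with a
deliberately crude model: if one contended lock has to be handed from
holder to holder, and each handoff costs one round trip on the
interconnect, the round-trip latency caps the handoff rate.  The
latency figures below are illustrative assumptions, not measurements
from this thread, and the function name is hypothetical.

```python
# Crude upper bound on serialized handoffs of a single contended lock.

def max_lock_handoffs_per_sec(round_trip_latency_s: float) -> float:
    """Return the ceiling on handoffs per second for one contended lock,
    assuming each handoff costs exactly one interconnect round trip."""
    if round_trip_latency_s <= 0:
        raise ValueError("round_trip_latency_s must be positive")
    return 1.0 / round_trip_latency_s


# ASSUMPTION: order-of-magnitude placeholders chosen for illustration only.
ILLUSTRATIVE_LATENCIES_S = {
    "SMP interconnect": 100e-9,      # ~100 ns
    "local cluster fabric": 1e-6,    # ~10x the SMP figure
    "transcontinental WAN": 40e-3,   # ~40 ms round trip
}

if __name__ == "__main__":
    for name, latency_s in ILLUSTRATIVE_LATENCIES_S.items():
        rate = max_lock_handoffs_per_sec(latency_s)
        print(f"{name:<22} {rate:>14,.0f} handoffs/sec at most")
```

The model ignores batching, lock partitioning, and optimistic schemes,
which are exactly the "other methods" one is left with when the
interconnect is too slow for plain distributed locks.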

The really sweet potential is in Opteron system boards with
InfiniBand directly attached to HyperTransport.  At that level of
bandwidth and latency, both per node and per switch fabric, the
architectural possibilities start to become intriguing.


J. Andrew Rogers



