Re: Horizontal scalability/sharding - Mailing list pgsql-hackers
From: Josh Berkus
Subject: Re: Horizontal scalability/sharding
Msg-id: 55E7388A.6030300@agliodbs.com
In response to: Horizontal scalability/sharding (Bruce Momjian <bruce@momjian.us>)
Responses: Re: Horizontal scalability/sharding
List: pgsql-hackers
On 09/01/2015 04:14 PM, Petr Jelinek wrote:
> On 2015-09-02 00:09, Josh Berkus wrote:
>> On 09/01/2015 02:29 PM, Tomas Vondra wrote:
>>> So while you may be right in single-DC deployments, with multi-DC
>>> deployments the situation is quite different - not only that the
>>> network bandwidth is not unlimited, but because latencies within DC
>>> may be a fraction of latencies between the locations (to the extent
>>> that the increase due to syncrep may be just noise). So the local
>>> replication may be actually way faster.
>>
>> I'm not seeing how the above is better using syncrep than using shard
>> copying?
>
> Shard copying usually assumes that the origin node does the copy - the
> data has to go twice through the slow connection. With replication you
> can replicate locally over fast connection.

Ah, I was thinking of the case of having a single set of copies in the
remote DC, but of course that isn't going to be the case with a highly
redundant setup.

Basically this seems to be saying that, in an ideal setup, we'd have
some kind of synchronous per-shard replication. We don't have that at
present (sync rep is whole-node, and BDR is asynchronous).

There's also the question of how to deal with failures and taking bad
nodes out of circulation in such a setup, especially considering that
the writes could be coming from multiple other nodes.

>> Not really, the mechanism is different and the behavior is different.
>> One critical deficiency in using binary syncrep is that you can't do
>> round-robin redundancy at all; every redundant node has to be an
>> exact mirror of another node. In a good HA distributed system, you
>> want multiple shards per node, and you want each shard to be
>> replicated to a different node, so that in the event of node failure
>> you're not dumping the full load on one other server.
>
> This assumes that we use binary replication, but we can reasonably use
> logical replication which can quite easily do filtering of what's
> replicated where.

Is there a way to do logical synchronous replication? I didn't think
there was.

>>> IMHO the design has to address the multi-DC setups somehow. I think
>>> that many of the customers who are so concerned about scaling to
>>> many shards are also concerned about availability in case of DC
>>> outages, no?
>>
>> Certainly. But users located in a single DC shouldn't pay the same
>> overhead as users who are geographically spread.
>
> Agreed, so we should support both ways, but I don't think it's
> necessary to support both ways in version 0.1. It's just important to
> not paint ourselves into a corner with design decisions that would
> make one of the ways impossible.

Exactly! Let me explain why I'm so vocal on this point. PostgresXC
didn't deal with redundancy/node replacement at all until after version
1.0. Then, when they tried to address it, they discovered that the code
was chock full of assumptions that "1 node == 1 shard", and breaking
that assumption would require a total refactor of the code (which never
happened). I don't want to see a repeat of that mistake.

Even if it's only on paper, any new sharding design needs to address
these questions:

1. How do we ensure no/minimal data is lost if we lose a node?
2. How do we replace a lost node (without taking the cluster down)?
2a. How do we allow an out-of-sync node to "catch up"?
3. How do we maintain metadata about good/bad nodes (and shard
   locations)?
4. How do we add nodes to expand the cluster?
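[Editorial aside, not from the thread: the questions above imply a small
amount of cluster metadata — node health states plus a map from each
shard to the nodes holding its copies. A minimal, purely hypothetical
sketch in Python (3.9+); all names are illustrative, not a proposal:]

```python
from dataclasses import dataclass, field
from enum import Enum

class NodeState(Enum):
    GOOD = "good"                 # in circulation
    BAD = "bad"                   # taken out of circulation (question 3)
    CATCHING_UP = "catching_up"   # out-of-sync node rejoining (question 2a)

@dataclass
class ClusterMetadata:
    # node name -> health state (question 3)
    nodes: dict[str, NodeState] = field(default_factory=dict)
    # shard id -> nodes holding a copy of that shard (question 3)
    shard_map: dict[int, list[str]] = field(default_factory=dict)

    def mark_bad(self, node: str) -> list[int]:
        """Take a failed node out of circulation; return the shards that
        now need a new copy elsewhere (questions 1 and 2)."""
        self.nodes[node] = NodeState.BAD
        return [s for s, locs in self.shard_map.items() if node in locs]

    def replace_copy(self, shard: int, bad_node: str, new_node: str) -> None:
        """Re-home one shard copy without taking the cluster down
        (question 2); the new copy would sit in CATCHING_UP until it is
        back in sync (question 2a)."""
        locs = self.shard_map[shard]
        locs[locs.index(bad_node)] = new_node

    def add_node(self, node: str) -> None:
        """Register a new node to expand the cluster (question 4);
        rebalancing would then move some shard copies onto it."""
        self.nodes[node] = NodeState.GOOD
```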
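[The "round-robin redundancy" point quoted earlier — multiple shards per
node, with each shard's copies on different nodes so a failure's load is
spread rather than dumped on one mirror — can be made concrete with a
hypothetical placement sketch:]

```python
# Hypothetical round-robin shard placement: no two copies of a shard
# share a node, so when a node fails, its shards fail over to several
# different surviving nodes instead of a single exact mirror.

def place_shards(num_shards: int, nodes: list[str], copies: int = 2) -> dict[int, list[str]]:
    """Assign each shard a primary and (copies - 1) replicas, round-robin."""
    if copies > len(nodes):
        raise ValueError("cannot place more copies than there are nodes")
    placement = {}
    for shard in range(num_shards):
        # Primary rotates across nodes; replicas go to the next nodes
        # over, so each copy of a shard lands on a distinct node.
        placement[shard] = [nodes[(shard + i) % len(nodes)] for i in range(copies)]
    return placement

if __name__ == "__main__":
    layout = place_shards(num_shards=8, nodes=["node1", "node2", "node3", "node4"])
    for shard, locations in layout.items():
        print(f"shard {shard}: primary={locations[0]} replicas={locations[1:]}")
```

[With this layout, losing node1 shifts its primaries' traffic to the
nodes holding their replicas and re-homes its replica copies onto yet
other nodes — the load spreading a whole-node binary mirror cannot give
you.]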
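[Finally, the "synchronous per-shard replication" the message says does
not exist yet could, in principle, have semantics like the following:
acknowledge a write only once that shard's copies (or a quorum of them)
confirm it, independent of whole-node sync rep. This is a semantic
sketch only; `send_and_wait_ack` is an assumed transport hook, not a
real API:]

```python
# Semantic sketch of per-shard synchronous replication: a write to one
# shard waits for acks from that shard's copies only, not from a whole
# mirror node. send_and_wait_ack(node, shard, payload) -> bool is an
# assumed hook standing in for the actual replication transport.

def write_shard(shard, payload, placement, send_and_wait_ack, quorum=None):
    """Return True once enough copies of this shard acknowledge the write.

    placement maps shard id -> list of nodes holding a copy (as produced
    by place_shards above); quorum defaults to all copies (fully sync).
    """
    copies = placement[shard]
    needed = len(copies) if quorum is None else quorum
    acks = sum(1 for node in copies if send_and_wait_ack(node, shard, payload))
    return acks >= needed
```

[A missing ack is where "taking bad nodes out of circulation" bites: it
would feed back into something like mark_bad() in the metadata sketch
above, with writes possibly arriving from multiple other nodes at once.]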
There doesn't need to be code for all of the above from version 0.1, but
there needs to be a plan to tackle those problems. Otherwise, we'll just
end up with another dead-end, not-useful-in-production technology.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com