Re: proposal: multiple read-write masters in a cluster with wal-streaming synchronization - Mailing list pgsql-hackers
From | Merlin Moncure |
---|---|
Subject | Re: proposal: multiple read-write masters in a cluster with wal-streaming synchronization |
Date | |
Msg-id | CAHyXU0zcSykVeB5e7qhRwyfpXNvTDyjRpyCJ9ts9a3JKTRSfOg@mail.gmail.com |
In response to | Re: proposal: multiple read-write masters in a cluster with wal-streaming synchronization (Mark Dilger <markdilger@yahoo.com>) |
List | pgsql-hackers |
On Tue, Dec 31, 2013 at 3:51 PM, Mark Dilger <markdilger@yahoo.com> wrote:
> The BDR documentation
> http://wiki.postgresql.org/images/7/75/BDR_Presentation_PGCon2012.pdf
> says,
>
> "Physical replication forces us to use just one
> node: multi-master required for write scalability"
>
> "Physical replication provides best read scalability"
>
> I am inclined to agree with the second statement, but
> I think my proposal invalidates the first statement, at
> least for a particular rigorous partitioning over which
> server owns which data.
>
> In my own workflow, I load lots of data from different
> sources. The partition the data loads into depends on
> which source it came from, and it is never mixed or
> cross referenced in any operation that writes the data.
> It is only "mixed" in the sense that applications query
> data from multiple sources.
>
> So for me, multi-master with physical replication seems
> possible, and would presumably provide the best
> read scalability. I doubt that I am the only database
> user who has this kind of workflow.
>
> The alternatives are ugly. I can load data from separate
> sources into separate database servers *without* replication
> between them, but then the application layer has to
> emulate queries across the data. (Yuck.) Or I can use
> logical replication such as BDR, but then the servers
> are spending more effort than with physical replication,
> so I get less bang for the buck when I purchase more
> servers to add to the cluster. Or I can use FDW to access
> data from other servers, but that means the same data
> may be pulled across the wire arbitrarily many times, with
> corresponding impact on the bandwidth.
>
> Am I missing something here? Does BDR really provide
> an equivalent solution?

I think BDR is better: while it only supports schema-equivalent replication, that is the typical case for distributed write systems like this.
Also, far fewer assumptions about the network architecture are baked into the actual data itself (for example, what happens when you want to change owner, merge, or split data?). IMNSHO, it's better that each node manages WAL for itself, not the other way around, except in the very special case where you want an exact replica of the database on each node at all times, as with the current HS/SR.

A **huge** amount of work has been, and is being, put into WAL-based logical replication support (LLSR in the BDR docs) that should mostly combine the flexibility of trigger-based logical replication with the robustness of the WAL-based replication that we have in core now. LLSR is a low-level data transmission framework that can be wrapped by higher-level user-facing stuff like BDR. LLSR, by the way, does not come attached with the assumption that all databases have the same schema.

If I were you, I'd be studying up on LLSR and seeing how it could be molded into the use cases you're talking about. From a development point of view, the replication train hasn't just left the station; it's a space shuttle that just broke out of Earth's orbit. By my reckoning, a new 'from the ground up' implementation of replication requiring in-core changes has exactly zero percent chance of being adopted.

merlin