Thread: Merge a sharded master into a single read-only slave
I have a master database sharded by user_id, with globally unique IDs for everything, except shared configuration data stored in global tables (resource strings, system parameters, etc.).
What would be the best way (i.e. both fast and reliable, and simple to maintain as a bonus) to merge all shards into a single read-only slave that will then be replicated and used for read queries? I took a look at Londiste and repmgr and can see some ways to accomplish that, but I would appreciate the advice of people here.
Thank you,
Sébastien
On Thu, May 29, 2014 at 12:58 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
> I have a master database sharded by user_id, with globally unique IDs for
> everything, except shared configuration data stored in global tables
> (resource strings, system parameters, etc.).
>
> What would be the best way (i.e. both fast and reliable, and simple to
> maintain as a bonus) to merge all shards into a single read-only slave
> that will then be replicated and used for read queries? I took a look at
> Londiste and repmgr and can see some ways to accomplish that, but I would
> appreciate the advice of people here.
>
> Thank you,
> Sébastien
Answering myself, please correct me if my findings are wrong.
I cannot find a way to accomplish the above without using statement-level replication. That kind of defeats the point: if my DB is sharded, it is precisely to avoid having to scale vertically to sustain the write load, yet with statement-level replication I would now have to scale the slave vertically, bringing me back to square one.
So my conclusion is that for now, the best way to scale read-only queries for a sharded master is to implement map-reduce at the application level. Fortunately, the scope of read queries can most of the time be limited to a single shard. Nonetheless, it would have been nice to avoid the additional complexity if it had been possible to merge the sharded tables at the binary level (which should be much faster than statement level), given that their records never overlap (i.e. the same record is never present in more than one shard).
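For the record, the application-level "map-reduce" I have in mind is nothing more than running the same query against every shard and combining the result sets. Just to illustrate the idea, here is a rough sketch of the equivalent done in SQL with dblink (the shard connection strings, the orders table and its columns are all made up for the example):

    -- Illustration only: the "map" step is the same query sent to each shard
    -- over dblink, the "reduce" step is the local aggregation.
    -- Requires the dblink extension (CREATE EXTENSION dblink).
    SELECT user_id, sum(amount) AS total_amount
    FROM (
        SELECT * FROM dblink('host=shard1 dbname=app',
                             'SELECT user_id, amount FROM orders
                               WHERE created_at >= now() - interval ''1 day''')
             AS s1(user_id bigint, amount numeric)
        UNION ALL
        SELECT * FROM dblink('host=shard2 dbname=app',
                             'SELECT user_id, amount FROM orders
                               WHERE created_at >= now() - interval ''1 day''')
             AS s2(user_id bigint, amount numeric)
        -- ...one branch per shard...
    ) AS all_shards
    GROUP BY user_id;

In the application it is simply N queries issued in parallel and a merge of the result sets, which is the route I will take.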
Sébastien
On 6/1/2014 12:59 PM, Sébastien Lorion wrote:
> it would have been nice to avoid the additional complexity if it had
> been possible to merge the sharded tables at the binary level (which
> should be much faster than statement level), given that their records
> never overlap (i.e. the same record is never present in more than one
> shard).

Not even remotely possible, since binary replication is at the block level, NOT the tuple level... Also, the index on this merged table would be considerably different from the index on any one of the sharded 'masters' (and in binary replication, indexes are also block replicated).

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
> So my conclusion is that for now, the best way to scale read-only queries for a sharded master is to
> implement map-reduce at the application level.
That's the conclusion I would expect. It's the price you pay for sharding; it's part of the deal.

But it's also the benefit you get from sharding. Once your read traffic grows to the point that it's too much for a single host, you're going to have to re-shard it all again *anyway*. The whole point of sharding is that it allows you to grow beyond the capacity of a single host.
On Mon, Jun 2, 2014 at 12:52 PM, Kevin Goess <kgoess@bepress.com> wrote:
>> So my conclusion is that for now, the best way to scale read-only queries
>> for a sharded master is to implement map-reduce at the application level.
>
> That's the conclusion I would expect. It's the price you pay for sharding;
> it's part of the deal.
>
> But it's also the benefit you get from sharding. Once your read traffic
> grows to the point that it's too much for a single host, you're going to
> have to re-shard it all again *anyway*. The whole point of sharding is
> that it allows you to grow beyond the capacity of a single host.
I am not sure I follow you completely. I can replicate the read-only slaves almost as much as I want (with chained replication), so why would I be limited to a single host? You would have a point concerning database size, but in my case, the main reason I need to shard is the write volume.
On Mon, Jun 2, 2014 at 2:47 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
> [...]
>
> I am not sure I follow you completely. I can replicate the read-only
> slaves almost as much as I want (with chained replication), so why would I
> be limited to a single host? You would have a point concerning database
> size, but in my case, the main reason I need to shard is the write volume.
Not sure if this will work for you, but I'm sharing a similar scenario in case it helps.
An extension I wrote provides logical replication similar to what you've probably seen in other tools: https://github.com/omniti-labs/mimeo. One difference is that it offers several replication methods, one of them incremental, based on either a timestamp or a serial ID. Since incremental replication requires only read-only access on the source databases, it causes none of the extra write overhead that most logical replication solutions do (triggers writing to queue tables).
A client of ours had a table sharded by UUID across 512 clusters but needed that data pulled to a single cluster for reporting purposes. The tables also had a timestamp column that was set on every insert/update, so the incremental replication method could be used to pull the data from all the clusters to a single one. The reporting cluster then simply used inheritance: an empty parent table with all the child tables that the data was pulled into underneath it.
Yes, it was a lot of setup, since each of the 512 tables had to be configured individually. But once it was in place it worked surprisingly well, and it's honestly a use case I had never foreseen for the extension. Not sure if this would work in your case, but maybe it can at least give you an idea of what can be done.
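In case it helps picture it, the reporting side itself was nothing exotic, just plain table inheritance. Stripped down, the idea looks something like this (the table and column names here are invented for illustration, not the client's actual schema; the pull jobs are whatever mechanism fills each child):

    -- Empty parent table that all reporting queries run against
    CREATE TABLE events (
        id         uuid        NOT NULL,
        payload    text,
        updated_at timestamptz NOT NULL
    );

    -- One child table per source cluster; the replication job for each shard
    -- pulls that shard's rows into its own child (512 of them in this case)
    CREATE TABLE events_shard_001 () INHERITS (events);
    CREATE TABLE events_shard_002 () INHERITS (events);
    -- ...and so on, one child per source cluster...

    -- A query against the empty parent automatically scans every child
    SELECT count(*) FROM events WHERE updated_at > now() - interval '1 day';

Keep in mind that with inheritance, indexes and constraints are per-table, so each child needs its own, which is part of why the setup takes a while.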
Here is a PostgreSQL-based sharding solution which provides both read and write horizontal scalability:

http://sourceforge.net/apps/mediawiki/postgres-xc/index.php?title=Main_Page
http://sourceforge.net/projects/postgres-xc/

Hope this helps.
---
Koichi Suzuki

2014-06-03 3:47 GMT+09:00 Sébastien Lorion <sl@thestrangefactory.com>:
> [...]
> I am not sure I follow you completely. I can replicate the read-only
> slaves almost as much as I want (with chained replication), so why would I
> be limited to a single host? You would have a point concerning database
> size, but in my case, the main reason I need to shard is the write volume.
> An extension I wrote provides logical replication similar to what you've
> probably seen in other tools: https://github.com/omniti-labs/mimeo
> [...]
> Not sure if this would work in your case, but maybe it can at least give
> you an idea of what can be done.
Thank you Keith for taking the time to let me know about your solution. It looks great indeed, especially the part about not putting any load on the shards themselves. Correct me if I am wrong, but will it not also suffer the same limitation as any statement-based replication, namely that the "merged" slave will have to sustain the same write load as all shards combined?
Sébastien
Hi Sébastien:

On Thu, Jun 5, 2014 at 5:41 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
> .... Correct me if I am wrong, but will it not also suffer the same
> limitation as any statement-based replication, namely that the "merged"
> slave will have to sustain the same write load as all shards combined?

I cannot tell you the exact mimeo behaviour, but if you do incremental replication using an id/timestamp by *pulling* changes from the masters, you will normally batch them and insert all the changes into the slave in a single transaction, which leads to less load, since your limit is often the transaction rate rather than the record rate (i.e., every 5 minutes you query for all the tuples that changed and insert/update them all in one go). Also, if a tuple is updated many times between sweeps, the slave will only get the last version.

Francisco Olarte.
> I cannot tell you the exact mimeo behaviour, but if you do incremental
> replication using an id/timestamp by *pulling* changes from the masters,
> you will normally batch them and insert all the changes into the slave in
> a single transaction, which leads to less load, since your limit is often
> the transaction rate rather than the record rate (i.e., every 5 minutes
> you query for all the tuples that changed and insert/update them all in
> one go). Also, if a tuple is updated many times between sweeps, the slave
> will only get the last version.
You are right, requesting changes at fixed time intervals would certainly help reduce the load. I will have to test and see if a good balance can be achieved between not having stale data for too long and keeping up with writes.
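To make sure I understand the batching, I picture each sweep doing roughly the following for one shard/table pair (reusing the invented names from Keith's sketch above; the last_pull bookkeeping table is a placeholder of mine, not mimeo's actual internals):

    -- Bookkeeping of my own: one row per (shard, table) pair
    CREATE TABLE IF NOT EXISTS last_pull (
        source         text PRIMARY KEY,
        pulled_through timestamptz NOT NULL
    );

    BEGIN;

    -- 1. Pull every row changed since the previous sweep from the shard.
    --    The lower bound is hard-coded here; in practice the job would read
    --    it from last_pull and build the remote query string accordingly.
    CREATE TEMP TABLE batch ON COMMIT DROP AS
        SELECT * FROM dblink('host=shard1 dbname=app',
                             'SELECT id, payload, updated_at FROM events
                               WHERE updated_at > ''2014-06-05 14:00:00+00''')
             AS t(id uuid, payload text, updated_at timestamptz);

    -- 2. Apply the whole batch in one transaction: delete the rows we already
    --    have, then insert the fresh versions (delete-then-insert in lieu of
    --    a native upsert)
    DELETE FROM events_shard_001 d USING batch b WHERE d.id = b.id;
    INSERT INTO events_shard_001 SELECT * FROM batch;

    -- 3. Advance the high-water mark, but only if the batch was not empty
    UPDATE last_pull
       SET pulled_through = (SELECT max(updated_at) FROM batch)
     WHERE source = 'shard1.events'
       AND EXISTS (SELECT 1 FROM batch);

    COMMIT;

If something like that holds up under our write volume, the staleness window simply becomes the sweep interval, which is exactly what I need to measure.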
Sébastien
On Thu, Jun 5, 2014 at 2:09 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
> [...]
>
> You are right, requesting changes at fixed time intervals would certainly
> help reduce the load. I will have to test and see if a good balance can be
> achieved between not having stale data for too long and keeping up with
> writes.
>
> Sébastien
If you have any questions while evaluating it, feel free to ask here or to open an issue on GitHub.