Re: Merge a sharded master into a single read-only slave - Mailing list pgsql-general

From Keith Fiske
Subject Re: Merge a sharded master into a single read-only slave
Date
Msg-id CAG1_KcBDtNj1yh8sMTzpffm1k-_qjrXHXVOFaQeQsY6bW5tusg@mail.gmail.com
Whole thread Raw
In response to Re: Merge a sharded master into a single read-only slave  (Sébastien Lorion <sl@thestrangefactory.com>)
Responses Re: Merge a sharded master into a single read-only slave
List pgsql-general


On Mon, Jun 2, 2014 at 2:47 PM, Sébastien Lorion <sl@thestrangefactory.com> wrote:
On Mon, Jun 2, 2014 at 12:52 PM, Kevin Goess <kgoess@bepress.com> wrote:
> So my conclusion is that for now, the best way to scale read-only queries for a sharded master is to 
> implement map-reduce at the application level. 

That's the conclusion I would expect. It's the price you pay for sharding, it's part of the deal.  

But it's also the benefit you get from sharding.  Once your read traffic grows to the point that it's too much for a single host, you're going to have to re-shard it all again *anyway*.  The whole point of sharding is that it allows you to grow outside the capacities of a single host.

​I am not sure I am following you completely. I can replicate the read-only slaves almost as much as I want (with chained replication), so why would I be limited to a single host ? You would have a point concerning database size, but in my case, the main reason I need to shard is because of the amount of writes.


Not sure if this will work for you, but sharing a similar scenario in case it may work for you.

An extension I wrote provides similar logical replication as you've probably seen in other tools.
https://github.com/omniti-labs/mimeo
One difference is it has several methods for this replication, one being incremental based on either time or serial ID. Since incremental replication requires just read-only access on the source databases, it causes no extra write overhead as most logical replication solutions do (triggers writing to queue tables).

A client of ours had a table sharded by UUID to 512 clusters but needed that data pulled to a single cluster for reporting purposes. The tables also had a timestamp column that was set on each insert/update, so the incremental replication method was able to be used here to pull data from all clusters to a single cluster. The single reporting cluster then just had an inheritance table set up with an empty parent table pointing to all the child tables that pulled data into them.

Yes, it was a lot of setup since each of the 512 tables has to be set up individually. But once it was set up it worked surprisingly well. And it's honestly a use case I had never foreseen for the extension.

Not sure if this would work in your case, but maybe it can at least give you an idea of what can be done.

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com

pgsql-general by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Re: PostgreSQL 9.3 XML parser seems not to recognize the DOCTYPE element in XML files
Next
From: David G Johnston
Date:
Subject: Re: PostgreSQL 9.3 XML parser seems not to recognize the DOCTYPE element in XML files