Re: Horizontal scalability/sharding - Mailing list pgsql-hackers
From: Bruce Momjian
Subject: Re: Horizontal scalability/sharding
Msg-id: 20150901221145.GB27332@momjian.us
In response to: Re: Horizontal scalability/sharding (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Horizontal scalability/sharding
List: pgsql-hackers
On Tue, Sep 1, 2015 at 12:40:40PM -0400, Robert Haas wrote:
> On Tue, Sep 1, 2015 at 6:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > I assumed these queries were going to be solved by sending data to
> > the coordinator in as digested a form as possible, and having the
> > coordinator complete any remaining processing.  I think we are
> > going to need to decide if such "sending data back to shards" is
> > something we are ever going to implement.  I can see FDWs _not_
> > working well for that use-case.
>
> I do think we are going to want to support that.  All the people I've
> talked to about parallel and distributed query processing agree that
> you need to do that sort of thing to get really good and scalable
> performance.  I think that we could make a lot of headway as compared
> with the status quo just by implementing more pushdown optimizations
> than we have today.  Right now, SELECT COUNT(*) FROM table will suck
> back the whole remote table and count the rows locally, and that's
> stupid.  We can fix that case with better pushdown logic.  We can
> also fix the case of N-way join nests where the joins are either on
> the partitioning key or to replicated tables.  But suppose you have a
> join between two tables which are sharded across the cluster but not
> on the partitioning key.  There's no way to push the join down, so
> all the work comes back to the coordinator, which is possibly OK if
> such queries are rare, but not so hot if they are frequent.

Let me be clearer about what the Citus Data paper shows.  I said
originally that the data was sent to the coordinator, sorted, then
resent to the shards, but the document:

	https://www.citusdata.com/blog/114-how-to-build-your-distributed-database

has the shards create the groups, and the groups are sent to the other
shards.  For example, to do COUNT(DISTINCT) if you have three shards,
each shard breaks its data into three buckets (1B in size), then the
first bucket from each of the three shards goes to the first shard,
the second bucket goes to the second shard, etc.  (There is a rough
sketch of this scheme at the end of this message.)

Basically, they are doing map-reduce, and the shards are creating
additional batches that get shipped to other shards.  I can see FDWs
not working well in that case, as you are really creating a new data
layout just for the query.

This explains why the XC/XL people are saying they would use FDWs if
they had existed at the time they started development, while the Citus
Data people are saying they couldn't use FDWs as they currently exist.
They probably both needed FDW improvements, but I think the Citus Data
features would need a lot more.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +
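P.S.  Here is a minimal Python sketch of the bucket-shipping scheme as
I understand it from the Citus document.  The shard count, the hash
function, and every name below are illustrative only, not what Citus
Data actually implements:

    from collections import defaultdict

    NUM_SHARDS = 3

    def repartition(local_rows):
        # Phase 1 (on each shard): split the local values into
        # NUM_SHARDS buckets by hash, so a given distinct value always
        # lands in the same bucket number no matter which shard it
        # started on.
        buckets = defaultdict(set)
        for value in local_rows:
            buckets[hash(value) % NUM_SHARDS].add(value)
        return buckets

    def count_distinct(shards):
        # Phase 2: bucket i from every shard is shipped to shard i.
        # After the shuffle, no value can exist on two different
        # shards, so each shard de-duplicates and counts independently.
        received = defaultdict(set)
        for local_rows in shards:
            for dest, values in repartition(local_rows).items():
                received[dest] |= values
        # Phase 3: the coordinator only sums the per-shard counts.
        return sum(len(values) for values in received.values())

    # Duplicate values deliberately cross shard boundaries here.
    shards = [[1, 2, 3, 4], [3, 4, 5], [5, 6, 1]]
    print(count_distinct(shards))   # -> 6

The point about FDWs falls out of the sketch: the shuffled buckets are
a query-specific data layout, which is exactly the kind of thing the
FDW interface as it exists today has no way to express.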