The plan for FDW-based sharding - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | The plan for FDW-based sharding |
Date | |
Msg-id | 20160223164335.GA11285@momjian.us |
List | pgsql-hackers |
There was discussion at the FOSDEM/PGDay Developer Meeting (https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting) about sharding, so I wanted to outline where I think we are going with sharding and FDWs.

First, let me point out that, unlike pg_upgrade and the Windows port, which either worked or didn't work, sharding is going to be implemented and useful in stages. It will take several years to complete, similar to parallelism, streaming replication, and logical replication.

Second, as part of this staged implementation, there are several use cases that will be shardable at first, and then only later, more complex ones. For example, here are some use cases and the technology they require:

1. Cross-node read-only queries on read-only shards using aggregate queries, e.g. data warehouse:

   This is the simplest to implement as it doesn't require a global transaction manager or a global snapshot manager, and the number of rows returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate queries:

   This will stress the coordinator to collect and process many returned rows, and will show how well the FDW transfer mechanism scales.

3. Cross-node read-only queries on read/write shards:

   This will require a global snapshot manager to make sure the shards return consistent data.

4. Cross-node read-write queries:

   This will require a global snapshot manager and a global transaction manager.

In 9.6, we will have FDW join and sort pushdown (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html). Unfortunately I don't think we will have aggregate pushdown, so we can't test #1, but we might be able to test #2, even in 9.5. Also, we might have better partitioning syntax in 9.6.

We need things like parallel partition access and replicated lookup tables for more join pushdown.
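To make the difference between use cases 1 and 2 above concrete, here is a toy sketch of what the coordinator has to receive in each case. It is only a model in plain Python, not FDW code: the shard contents and the function names are made up for illustration, and "pushdown" here just means each shard computes a partial (sum, count) before anything is sent.

```python
# Toy model of a coordinator and three shards. The data and the helper
# names are hypothetical; no real FDW or Postgres API is used here.
from typing import List, Tuple

Row = Tuple[str, int]  # (region, amount)

SHARDS: List[List[Row]] = [
    [("east", 10), ("west", 20), ("east", 30)],
    [("west", 5), ("east", 15)],
    [("east", 40), ("west", 25), ("west", 35)],
]


def average_with_pushdown(shards: List[List[Row]]) -> Tuple[float, int]:
    """Use case 1: each shard sends back one partial aggregate (sum, count)."""
    partials = [(sum(amount for _, amount in shard), len(shard)) for shard in shards]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # Second value: how many items crossed the network to the coordinator.
    return total / count, len(partials)


def average_without_pushdown(shards: List[List[Row]]) -> Tuple[float, int]:
    """Use case 2 style: every row is shipped to the coordinator first."""
    rows = [row for shard in shards for row in shard]
    return sum(amount for _, amount in rows) / len(rows), len(rows)


avg_pushed, transferred_pushed = average_with_pushdown(SHARDS)
avg_plain, transferred_plain = average_without_pushdown(SHARDS)
print("pushdown:   ", avg_pushed, "items transferred:", transferred_pushed)
print("no pushdown:", avg_plain, "items transferred:", transferred_plain)
```

Both paths give the same average, but with the aggregate computed on the shards the coordinator receives one item per shard instead of one per row. On real tables that gap is what separates the data warehouse case from the one that stresses the coordinator.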
In a way, because these enhancements are useful independent of sharding, we have not tested to see how well an FDW sharding setup will work and for which workloads.

We know Postgres XC/XL works, and scales, but we also know they require too many code changes to be merged into Postgres (at least based on previous discussions). The FDW sharding approach is to enhance the existing features of Postgres to allow as much sharding as possible.

Once that is done, we can see what workloads it covers and decide if we are willing to copy the volume of code necessary to implement all supported Postgres XC or XL workloads. (The Postgres XL license now matches the Postgres license, http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/. Postgres XC has always used the Postgres license.)

If we are not willing to add code for the missing Postgres XC/XL features, Postgres XC/XL will probably remain a separate fork of Postgres. I don't think anyone knows the answer to this question, and I don't know how to find the answer except to keep going with our current FDW sharding approach.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +