Re: The plan for FDW-based sharding - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: The plan for FDW-based sharding
Date
Msg-id 56CD764C.6050408@postgrespro.ru
In response to The plan for FDW-based sharding  (Bruce Momjian <bruce@momjian.us>)
Responses Re: The plan for FDW-based sharding
List pgsql-hackers
Sorry, but from this plan one could conclude that there are only two
possible cluster solutions for Postgres: XC/XL and FDW-based. From my
point of view there are many more possible alternatives.
Our main idea with XTM (eXtensible Transaction Manager API) was to make
it possible to develop cluster solutions for Postgres as extensions,
without patching the Postgres core code. FDW is one of the mechanisms
that makes it possible to reach this goal.

IMHO it will be hard to implement efficient execution of complex OLAP
queries (including cross-node joins and aggregation) within the FDW
paradigm. It will be necessary to build a distributed query execution
plan and coordinate its execution across the cluster nodes. And we
definitely need a specialized optimizer for distributed queries. Right
now solutions to this problem are provided by XL and Greenplum, but both
are forks of Postgres with a lot of changes in the Postgres core. The
challenge is to provide similar functionality, but at the extension
level (using custom nodes, a pluggable transaction manager, ...).
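
To make the cross-node aggregation point concrete, here is a minimal
sketch in Python of two-phase aggregation: each shard returns a partial
(sum, count) per group and the coordinator merges them, instead of
pulling every row to one node. This is only a toy model under my own
assumptions. The function names and the in-memory "shards" are
hypothetical and do not correspond to any FDW, XL or Greenplum API.

```python
from collections import defaultdict

def shard_partial_avg(rows):
    """Runs on one shard: reduce (key, amount) rows to {key: (sum, count)}."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, amount in rows:
        partial[key][0] += amount
        partial[key][1] += 1
    return {k: (s, c) for k, (s, c) in partial.items()}

def coordinator_merge_avg(partials):
    """Runs on the coordinator: combine partial states into final averages."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return {k: s / c for k, (s, c) in merged.items()}

# Hypothetical data: three shards holding (region, amount) rows.
shards = [
    [("eu", 10.0), ("us", 20.0)],
    [("eu", 30.0)],
    [("us", 40.0), ("us", 60.0)],
]
result = coordinator_merge_avg(shard_partial_avg(rows) for rows in shards)
```

Only one small (sum, count) pair per group crosses the network here. A
distributed planner has to decide where such partial steps run, which
is the part that is hard to express in the FDW paradigm.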

But, as you noted, complex OLAP is just one of the scenarios, and it is
not the only possible way of using clusters. In some cases FDW-based
sharding can be quite efficient. So can the pg_shard approach, which
also adds sharding at the extension level and in some respects is more
flexible than the FDW-based solution. Not all scenarios require a global
transaction manager. But if one needs global consistency, then the XTM
API makes it possible to provide ACID for both approaches (and not only
for them).
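
As an illustration of what "global consistency" means here, the toy
Python sketch below moves money between two shards and compares a
reader that takes the latest local state of each shard with a reader
that uses one snapshot timestamp on all shards. It is not the algorithm
of our DTM patch. The Shard class, the integer timestamps and the
method names are all hypothetical, and the sketch ignores in-doubt
transactions, which a real transaction manager must handle.

```python
class Shard:
    """Toy shard keeping a version history of one account balance."""

    def __init__(self, balance):
        # Each version is (commit_timestamp, balance), oldest first.
        self.versions = [(0, balance)]

    def write(self, commit_ts, balance):
        self.versions.append((commit_ts, balance))

    def read_latest(self):
        # Local view: whatever this shard committed last.
        return self.versions[-1][1]

    def read_as_of(self, snapshot_ts):
        # Snapshot view: newest version committed at or before snapshot_ts.
        visible = [b for ts, b in self.versions if ts <= snapshot_ts]
        return visible[-1]

a, b = Shard(100), Shard(100)

# Transfer 30 from a to b; the distributed transaction commits at ts=10.
a.write(10, 70)
# A reader arriving between the two local commits sees a total of 170.
uncoordinated_total = a.read_latest() + b.read_latest()
b.write(10, 130)

# Readers sharing one snapshot timestamp see a consistent total.
snapshot_total_before = a.read_as_of(5) + b.read_as_of(5)
snapshot_total_after = a.read_as_of(10) + b.read_as_of(10)
```

The uncoordinated reader observes money that has left one shard and not
yet arrived on the other. Both snapshot readers see a total of 200.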

We have just added our XTM patch to the commitfest, together with a
postgres_fdw patch that integrates a timestamp-based DTM implementation
into postgres_fdw. It illustrates how global consistency can be reached
for FDW-based sharding.
If this XTM patch is committed, then in 9.6 we will have wide
flexibility to experiment with different distributed transaction
managers. And it can be used for many cluster solutions.

IMHO it would be very useful to extend your classification of cluster
use cases, formulate the requirements of each case more precisely, and
investigate how they can be covered by existing cluster solutions for
Postgres and which niches are still vacant. We are currently continuing
work on "multimaster", a more convenient alternative to hot-standby
replication. It looks like PostgreSQL is missing a product providing
functionality similar to Oracle RAC or MySQL Galera. It is yet another
direction of cluster development for PostgreSQL. Let's be more open and
flexible.


On 23.02.2016 19:43, Bruce Momjian wrote:
> There was discussion at the FOSDEM/PGDay Developer Meeting
> (https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
> about sharding so I wanted to outline where I think we are going with
> sharding and FDWs.
>
> First, let me point out that, unlike pg_upgrade and the Windows port,
> which either worked or didn't work, sharding is going to be implemented
> and useful in stages.  It will take several years to complete, similar
> to parallelism, streaming replication, and logical replication.
>
> Second, as part of this staged implementation, there are several use
> cases that will be shardable at first, and then only later, more complex
> ones.  For example, here are some use cases and the technology they
> require:
>
> 1. Cross-node read-only queries on read-only shards using aggregate
> queries, e.g. data warehouse:
>
> This is the simplest to implement as it doesn't require a global
> transaction manager, global snapshot manager, and the number of rows
> returned from the shards is minimal because of the aggregates.
>
> 2. Cross-node read-only queries on read-only shards using non-aggregate
> queries:
>
> This will stress the coordinator to collect and process many returned
> rows, and will show how well the FDW transfer mechanism scales.
>
> 3. Cross-node read-only queries on read/write shards:
>
> This will require a global snapshot manager to make sure the shards
> return consistent data.
>
> 4. Cross-node read-write queries:
>
> This will require a global snapshot manager and global transaction manager.
>
> In 9.6, we will have FDW join and sort pushdown
> (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
> Unfortunately I don't think we will have aggregate
> pushdown, so we can't test #1, but we might be able to test #2, even in
> 9.5.  Also, we might have better partitioning syntax in 9.6.
>
> We need things like parallel partition access and replicated lookup
> tables for more join pushdown.
>
> In a way, because these enhancements are useful independent of sharding,
> we have not tested to see how well an FDW sharding setup will work and
> for which workloads.
>
> We know Postgres XC/XL works, and scales, but we also know they require
> too many code changes to be merged into Postgres (at least based on
> previous discussions).  The FDW sharding approach is to enhance the
> existing features of Postgres to allow as much sharding as possible.
>
> Once that is done, we can see what workloads it covers and
> decide if we are willing to copy the volume of code necessary
> to implement all supported Postgres XC or XL workloads.
> (The Postgres XL license now matches the Postgres license,
> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
> Postgres XC has always used the Postgres license.)
>
> If we are not willing to add code for the missing Postgres XC/XL
> features, Postgres XC/XL will probably remain a separate fork of
> Postgres.  I don't think anyone knows the answer to this question, and I
> don't know how to find the answer except to keep going with our current
> FDW sharding approach.
>

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
