Re: I'd like to discuss scaleout at PGCon - Mailing list pgsql-hackers
From: MauMau
Subject: Re: I'd like to discuss scaleout at PGCon
Date:
Msg-id: BDE951DEFC144489AFD1D7131C547B8B@tunaPC
In response to: Re: I'd like to discuss scaleout at PGCon (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers
From: Michael Paquier
> Greenplum's orca planner (and Citus?) have such facilities if I recall
> correctly, just mentioning that pushing down directly to remote nodes
> compiled plans ready for execution exists here and there (that's not the
> case of XC/XL). For queries whose planning time is way shorter than its
> actual execution, like analytical work, that would not matter much. But
> not for OLTP and short transaction workloads.

It seems that Greenplum does:

https://greenplum.org/docs/580/admin_guide/query/topics/parallel-proc.html#topic1

"The master receives, parses, and optimizes the query. The resulting query
plan is either parallel or targeted. The master dispatches parallel query
plans to all segments, ..."

while Citus doesn't:

https://docs.citusdata.com/en/v7.4/develop/reference_processing.html#citus-query-processing

"Next, the planner breaks the query into two parts - the coordinator query
which runs on the coordinator and the worker query fragments which run on
individual shards on the workers. The planner then assigns these query
fragments to the workers such that all their resources are used
efficiently. After this step, the distributed query plan is passed on to
the distributed executor for execution. ... Once the distributed executor
sends the query fragments to the workers, they are processed like regular
PostgreSQL queries. The PostgreSQL planner on that worker chooses the most
optimal plan for executing that query locally on the corresponding shard
table. The PostgreSQL executor then runs that query and returns the query
results back to the distributed executor."

BTW, the above page states that worker nodes directly exchange data during
query execution. Greenplum also does so among segment nodes to join tables
which are distributed by different key columns. XL seems to do so, too.
If this type of interaction is necessary, how would the FDW approach
handle it? The remote servers need to interact with each other.
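To illustrate the worker-to-worker exchange discussed above: when two tables are distributed on different key columns, each node re-hashes its local rows on the join key and ships them to the node that owns that hash bucket, so matching rows meet on one node. This is only a minimal sketch of that repartitioning idea (all names are invented; this is not Greenplum, Citus, or XL code):

```python
# Minimal sketch of hash-repartitioning: shuffle rows among workers so that
# rows sharing a join-key value end up on the same worker, allowing a local
# join there. NUM_WORKERS, owner(), and repartition() are illustrative names.

from collections import defaultdict

NUM_WORKERS = 3

def owner(join_key):
    """Map a join-key value to the worker that owns its hash bucket."""
    return hash(join_key) % NUM_WORKERS

def repartition(local_rows_per_worker, key_index):
    """Perform the all-to-all exchange on the join key.

    local_rows_per_worker: {worker_id: [row_tuple, ...]} before the shuffle.
    key_index: position of the join key inside each row tuple.
    Returns the same mapping after the shuffle.
    """
    shuffled = defaultdict(list)
    for rows in local_rows_per_worker.values():
        for row in rows:
            # Each row moves to the worker responsible for its key's bucket.
            shuffled[owner(row[key_index])].append(row)
    return dict(shuffled)
```

After both join inputs are repartitioned this way, every worker holds all rows for its key buckets and can join them locally; this is the kind of direct node-to-node traffic that a plain coordinator-mediated FDW setup does not provide.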
"The task tracker executor is designed to efficiently handle complex
queries which require repartitioning and shuffling intermediate data among
workers."

> Greenplum uses also a single-coordinator, multi-datanode instance. That
> looks similar, right?

Greenplum uses a single master and multiple workers. That's similar to
Citus. But Greenplum is not similar to VoltDB or Vertica, since those
allow applications to connect to any node.

>> Our proprietary RDBMS named Symfoware, which is not based on
>> PostgreSQL, also doesn't have an extra hop, and can handle distributed
>> transactions and deadlock detection/resolution without any special
>> node like GTM.
>
> Interesting to know that. This is an area with difficult problems. The
> closer you stay to Postgres head, the more fun (?) you get trying to
> support new SQL features, and sometimes you finish with hard ERRORs or
> extra GUC switches to prevent any kind of inconsistent operations.

Yes, I hope our deadlock detection/resolution can be ported to
PostgreSQL. But I'm also concerned, like you, because Symfoware is
locking-based, not MVCC-based.

Regards
MauMau
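For readers unfamiliar with locking-based deadlock handling as mentioned above: the classic technique is a wait-for graph, where an edge A -> B means transaction A is blocked on a lock held by B, and a cycle means deadlock (resolved by aborting one participant). A hedged sketch of the cycle check follows (this is generic textbook logic, not Symfoware's or PostgreSQL's actual implementation; all names are invented):

```python
# Detect a deadlock in a wait-for graph via depth-first search.
# wait_for maps each transaction id to the set of transactions it waits on.

def find_deadlock(wait_for):
    """Return a list of transactions forming a wait cycle, or None."""
    visited = set()   # transactions fully explored
    on_stack = []     # current DFS path

    def dfs(txn):
        if txn in on_stack:
            # Reached a transaction already on the path: that's a cycle.
            return on_stack[on_stack.index(txn):]
        if txn in visited:
            return None
        visited.add(txn)
        on_stack.append(txn)
        for blocker in wait_for.get(txn, ()):
            cycle = dfs(blocker)
            if cycle:
                return cycle
        on_stack.pop()
        return None

    for txn in wait_for:
        cycle = dfs(txn)
        if cycle:
            return cycle
    return None
```

In a distributed setting the hard part is assembling this graph across nodes without a central coordinator such as GTM, which is presumably where a ported mechanism would differ from the single-node case.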