Re: I'd like to discuss scaleout at PGCon - Mailing list pgsql-hackers

From MauMau
Subject Re: I'd like to discuss scaleout at PGCon
Date
Msg-id BDE951DEFC144489AFD1D7131C547B8B@tunaPC
In response to Re: I'd like to discuss scaleout at PGCon  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
From: Michael Paquier
> Greenplum's orca planner (and Citus?) have such facilities if I recall
> correctly, just mentioning that pushing down directly to remote nodes
> compiled plans ready for execution exists here and there (that's not
> the case of XC/XL).  For queries whose planning time is way shorter
> than its actual execution, like analytical work that would not matter
> much.  But not for OLTP and short transaction workloads.

It seems that Greenplum does:

https://greenplum.org/docs/580/admin_guide/query/topics/parallel-proc.html#topic1

"The master receives, parses, and optimizes the query. The resulting
query plan is either parallel or targeted. The master dispatches
parallel query plans to all segments,..."

while Citus doesn't:

https://docs.citusdata.com/en/v7.4/develop/reference_processing.html#citus-query-processing

"Next, the planner breaks the query into two parts - the coordinator
query which runs on the coordinator and the worker query fragments
which run on individual shards on the workers. The planner then
assigns these query fragments to the workers such that all their
resources are used efficiently. After this step, the distributed query
plan is passed on to the distributed executor for execution.
...
Once the distributed executor sends the query fragments to the
workers, they are processed like regular PostgreSQL queries. The
PostgreSQL planner on that worker chooses the most optimal plan for
executing that query locally on the corresponding shard table. The
PostgreSQL executor then runs that query and returns the query results
back to the distributed executor."
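
To illustrate the two-part plan the Citus docs describe (this is only a hypothetical sketch, not Citus code; all names here are made up), a coordinator might rewrite an aggregate into per-shard fragment queries, run each fragment on a worker as an ordinary local query, and then combine the partial results:

```python
# Hypothetical sketch of two-phase distributed aggregation: the
# coordinator plans one fragment query per shard, each worker executes
# its fragment as a regular local query, and the coordinator combines
# the partial results.  Illustrative only; not Citus's actual code.

def plan_fragments(table, num_shards):
    # One fragment query per shard table on the workers.
    return [f"SELECT count(*) FROM {table}_{s}" for s in range(num_shards)]

def run_on_worker(fragment, shard_data):
    # Stand-in for the worker's local PostgreSQL planner/executor.
    return len(shard_data)

def coordinator_count(table, shards):
    fragments = plan_fragments(table, len(shards))
    partials = [run_on_worker(f, data) for f, data in zip(fragments, shards)]
    # Coordinator part of the plan: merge partial counts into the result.
    return sum(partials)

shards = [[1, 2, 3], [4, 5], [6]]
print(coordinator_count("orders", shards))  # prints 6
```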



BTW, the above page states that worker nodes directly exchange data
during query execution.  Greenplum also does so among segment nodes to
join tables that are distributed on different key columns.  XL seems
to do so, too.  If this type of interaction is necessary, how would
the FDW approach handle it?  The remote servers would need to interact
with each other.

"The task tracker executor is designed to efficiently handle complex
queries which require repartitioning and shuffling intermediate data
among workers."
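
The repartitioning step mentioned above can be sketched as follows (an illustration of the general technique, not any engine's real code): when the join key differs from the distribution key, each worker re-hashes its rows on the join key and sends each row to the worker that owns that hash bucket, so that the join can then run locally.

```python
# Illustrative hash-repartitioning ("shuffle") sketch.  Not Greenplum,
# Citus, or XL code; just the generic idea of moving rows so that all
# rows with the same join-key value land on the same worker.

def repartition(rows, key_index, num_workers):
    # Assign each row to a destination worker by hashing the join key.
    buckets = [[] for _ in range(num_workers)]
    for row in rows:
        buckets[hash(row[key_index]) % num_workers].append(row)
    return buckets

# Two workers hold rows of a table distributed on column 0, but the
# join key is column 1, so a shuffle is needed before joining.
worker_rows = [
    [(1, "a"), (2, "b")],
    [(3, "a"), (4, "c")],
]
num_workers = 2
shuffled = [[] for _ in range(num_workers)]
for rows in worker_rows:
    for w, bucket in enumerate(repartition(rows, 1, num_workers)):
        shuffled[w].extend(bucket)
# After the shuffle, every join-key value lives on exactly one worker.
```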



> Greenplum uses also a single-coordinator, multi-datanode instance.
> That looks similar, right?

Greenplum uses a single master and multiple workers.  That's similar
to Citus.  But Greenplum is not similar to VoltDB or Vertica, since
those allow applications to connect to any node.


>> Our proprietary RDBMS named Symfoware, which is not based on
>> PostgreSQL, also doesn't have an extra hop, and can handle
>> distributed transactions and deadlock detection/resolution without
>> any special node like GTM.
>
> Interesting to know that.  This is an area with difficult problems.
> At the closer to merge with Postgres head, the more fun (?) you get
> into trying to support new SQL features, and sometimes you finish
> with hard ERRORs or extra GUC switches to prevent any kind of
> inconsistent operations.

Yes, I hope our deadlock detection/resolution can be ported to
PostgreSQL.  But I share your concern, because Symfoware is
locking-based, not MVCC-based.
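
For context, deadlock detection in a lock-based system typically amounts to finding a cycle in the waits-for graph.  A minimal sketch of that idea (not Symfoware's or PostgreSQL's actual algorithm):

```python
# Minimal waits-for-graph cycle check, the common basis for deadlock
# detection in lock-based systems.  Illustration only.

def has_deadlock(waits_for):
    # waits_for maps a transaction to the transaction it is blocked on.
    # A cycle in this graph means a deadlock.
    for start in waits_for:
        seen = set()
        cur = start
        while cur in waits_for:
            if cur in seen:
                return True
            seen.add(cur)
            cur = waits_for[cur]
    return False

print(has_deadlock({"T1": "T2", "T2": "T1"}))  # prints True
print(has_deadlock({"T1": "T2", "T2": "T3"}))  # prints False
```

In a distributed setting the hard part is assembling this graph across nodes without a central coordinator, which is what makes doing it without a GTM-like node interesting.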

Regards
MauMau


