Re: I'd like to discuss scaleout at PGCon - Mailing list pgsql-hackers

From: MauMau
Subject: Re: I'd like to discuss scaleout at PGCon
Msg-id: CALO4oLMq9UYGa6PXs54j507F5oLj1q=8-Rdk=6Gd_+YMFYNOtg@mail.gmail.com
In response to: Re: I'd like to discuss scaleout at PGCon (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
2018-05-31 22:44 GMT+09:00, Robert Haas <robertmhaas@gmail.com>:
> On Thu, May 31, 2018 at 8:12 AM, MauMau <maumau307@gmail.com> wrote:
>> Oh, I didn't know you support FDW approach mainly for analytics.  I
>> guessed the first target was OLTP read-write scalability.
>
> That seems like a harder target to me, because you will have an extra
> hop involved -- SQL from the client to the first server, then via SQL
> to a second server.  The work of parsing and planning also has to be
> done twice, once for the foreign table and again for the table.  For
> longer-running queries this overhead doesn't matter as much, but for
> short-running queries it is significant.

Yes, that extra hop and the double parsing/planning were the killers
for our performance goal when we tried to meet our customer's
scaleout needs with XL.  The application executes 82 DML statements
in one transaction.  Those DMLs consist of INSERTs, UPDATEs and
SELECTs, each of which accesses only one row via its primary key.
Only a few tables are involved, so the application PREPAREs a few
statements and EXECUTEs them repeatedly.  We placed XL's coordinator
node on the same host as the application, and the data nodes and GTM
on other individual hosts.
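
To make the access pattern concrete, it looked roughly like this (a
minimal sketch; the table and statement names are invented, not the
customer's actual schema):

    PREPARE fetch_account (bigint) AS
        SELECT balance FROM accounts WHERE account_id = $1;
    PREPARE update_account (numeric, bigint) AS
        UPDATE accounts SET balance = balance + $1 WHERE account_id = $2;

    BEGIN;
    EXECUTE fetch_account(1234);
    EXECUTE update_account(100.0, 1234);
    -- ... and so on for each of the 82 single-row DMLs ...
    COMMIT;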

XL's response time was 2.4 times that of PostgreSQL, and its
throughput (tps) was 43% of PostgreSQL's.  Interestingly, perf showed
that base_yyparse() was the top CPU consumer on both the coordinator
and the data node, while on plain PostgreSQL base_yyparse() appeared
near the bottom of the ranking.  The SQL processor should be one
layer, not two.

In the above benchmark, each transaction only accessed data on one
data node.  That's what sharding principles recommend.  The FDW
approach would be no problem as long as the application follows the
sharding recommendation.
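
For such a workload, the FDW approach could look roughly like the
following (a sketch only; the server and table names are invented,
and it assumes declarative partitioning over postgres_fdw, where
routing INSERTs to foreign partitions is new work in v11):

    CREATE EXTENSION postgres_fdw;

    -- One foreign server per data node (user mappings omitted).
    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'node1', dbname 'app');
    CREATE SERVER shard2 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'node2', dbname 'app');

    -- Hash-partitioned table whose partitions live on remote nodes,
    -- so a single-row DML on the partition key touches one node.
    CREATE TABLE accounts (account_id bigint, balance numeric)
        PARTITION BY HASH (account_id);
    CREATE FOREIGN TABLE accounts_p0 PARTITION OF accounts
        FOR VALUES WITH (MODULUS 2, REMAINDER 0)
        SERVER shard1 OPTIONS (table_name 'accounts_p0');
    CREATE FOREIGN TABLE accounts_p1 PARTITION OF accounts
        FOR VALUES WITH (MODULUS 2, REMAINDER 1)
        SERVER shard2 OPTIONS (table_name 'accounts_p1');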

But not all applications will or can follow the sharding
recommendation.  The above application, which was migrated from a
mainframe, uses INSERTs to load data, inserting rows into various
nodes.  Considering your concern about double parsing/planning for a
local foreign table and a remote real table, wouldn't the FDW
approach hit the wall?


> I don't know what "node management" and "failure detection/failover"
> mean specifically.  I'd like to hear proposals, though.

That's nothing special or new.  Things like:

* Define a set of nodes that can join the cluster.
* Initialize or configure a node according to its role in the cluster.
* Decommission a node from the cluster.
* Define a node group in which all member nodes have the same data set
for redundancy.
* One command to start and shut down the entire cluster.
* System tables to display the member nodes and node groups.
* Each node's in-memory view of the current cluster state.
* Which other nodes each node should monitor, and how.
* Elect a new primary node within a node group when the current
primary node fails.
* Whether each node group should be configured with a master/slave
replication topology or a multi-master topology like MySQL Group
Replication.

Some of the above may end up looking like XL's facilities: the
pgxc_node/pgxc_group system tables, the pgxc_ctl command, the
CREATE/DROP NODE / NODE GROUP commands, etc.
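
For example, node registration in XL is done with SQL-level commands
along these lines (quoted from memory; see the XL documentation for
the exact syntax):

    -- Register the cluster members (run on each coordinator).
    CREATE NODE dn1 WITH (TYPE = 'datanode', HOST = 'node1', PORT = 15432);
    CREATE NODE dn2 WITH (TYPE = 'datanode', HOST = 'node2', PORT = 15432);
    CREATE NODE GROUP rgroup1 WITH (dn1, dn2);

    -- Membership is visible through the system catalogs.
    SELECT node_name, node_type, node_host, node_port FROM pgxc_node;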


Regards
MauMau

