Re: I'd like to discuss scaleout at PGCon - Mailing list pgsql-hackers

From MauMau
Subject Re: I'd like to discuss scaleout at PGCon
Date
Msg-id DEB6D337EA354805B6E371E7ACDAC97F@tunaPC
Whole thread Raw
In response to Re: I'd like to discuss scaleout at PGCon  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: I'd like to discuss scaleout at PGCon
Re: I'd like to discuss scaleout at PGCon
List pgsql-hackers
From: Robert Haas
On Thu, May 31, 2018 at 8:12 AM, MauMau <maumau307@gmail.com> wrote:
>> Oh, I didn't know you support FDW approach mainly for analytics.  I
>> guessed the first target was OLTP read-write scalability.
>
> That seems like a harder target to me, because you will have an
extra
> hop involved -- SQL from the client to the first server, then via
SQL
> to a second server.  The work of parsing and planning also has to be
> done twice, once for the foreign table and again for the table.  For
> longer-running queries this overhead doesn't matter as much, but for
> short-running queries it is significant.


From: Simon Riggs
On 1 June 2018 at 04:00, MauMau <maumau307@gmail.com> wrote:
>> The SQL processor should be one layer, not two layers.

> For OLTP, that would be best. But it would be restricted to
> single-node requests, leaving you the problem of how you know ahead
of
> time whether an SQL statement was single node or not.
>
> Using a central coordinator node allows us to hide the decision of
> single-node/multi-node from the user which seems essential for
general
> SQL. If you are able to restrict the types of requests users make
then
> we can do direct access to partitions - so there is scope for a
> single-node API, as Mongo provides.

I don't think an immediate server like the coordinators in XL is
necessary.  That extra hop can be eliminated by putting both the
coordinator and the data node roles in the same server process.  That
is, the node to which an application connects communicates with other
nodes only when it does not necessary data.

Furthermore, an extra hop and double parsing/planning could matter for
analytic queries, too.  For example, SAP HANA boasts of scanning 1
billion rows in one second.  In HANA's scaleout architecture, an
application can connect to any worker node and the node communicates
with other nodes only when necessary (there's one special node called
"master", but it manages the catalog and transactions; it's not an
extra hop like the coordinator in XL).  Vertica is an MPP analytics
database, but it doesn't have a node like the coordinator, either.  To
achieve maximum performance for real-time queries, the scaleout
architecture should avoid an extra hop when possible.


> Using a central coordinator also allows multi-node transaction
> control, global deadlock detection etc..

VoltDB does not have an always-pass hop like the coordinator in XL.
Our proprietary RDBMS named Symfoware, which is not based on
PostgreSQL, also doesn't have an extra hop, and can handle distributed
transactions and deadlock detection/resolution without any special
node like GTM.


Regards
MauMau



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: buildfarm vs code
Next
From: Andrew Dunstan
Date:
Subject: Re: buildfarm vs code