Re: I'd like to discuss scaleout at PGCon - Mailing list pgsql-hackers
| From | MauMau | 
|---|---|
| Subject | Re: I'd like to discuss scaleout at PGCon | 
| Date | |
| Msg-id | DEB6D337EA354805B6E371E7ACDAC97F@tunaPC | 
| In response to | Re: I'd like to discuss scaleout at PGCon (Simon Riggs <simon@2ndquadrant.com>) | 
| Responses | Re: I'd like to discuss scaleout at PGCon Re: I'd like to discuss scaleout at PGCon | 
| List | pgsql-hackers | 
From: Robert Haas

On Thu, May 31, 2018 at 8:12 AM, MauMau <maumau307@gmail.com> wrote:

>> Oh, I didn't know you support FDW approach mainly for analytics. I
>> guessed the first target was OLTP read-write scalability.
>
> That seems like a harder target to me, because you will have an extra
> hop involved -- SQL from the client to the first server, then via SQL
> to a second server. The work of parsing and planning also has to be
> done twice, once for the foreign table and again for the table. For
> longer-running queries this overhead doesn't matter as much, but for
> short-running queries it is significant.

From: Simon Riggs

On 1 June 2018 at 04:00, MauMau <maumau307@gmail.com> wrote:

>> The SQL processor should be one layer, not two layers.
> For OLTP, that would be best. But it would be restricted to
> single-node requests, leaving you the problem of how you know ahead of
> time whether an SQL statement was single node or not.
>
> Using a central coordinator node allows us to hide the decision of
> single-node/multi-node from the user which seems essential for general
> SQL. If you are able to restrict the types of requests users make then
> we can do direct access to partitions - so there is scope for a
> single-node API, as Mongo provides.

I don't think an intermediate server like the coordinators in XL is necessary. That extra hop can be eliminated by putting both the coordinator and the data node roles in the same server process. That is, the node to which an application connects communicates with other nodes only when it does not have the necessary data.

Furthermore, an extra hop and double parsing/planning could matter for analytic queries, too. For example, SAP HANA boasts of scanning 1 billion rows in one second. In HANA's scaleout architecture, an application can connect to any worker node, and that node communicates with other nodes only when necessary (there's one special node called "master", but it manages the catalog and transactions; it's not an extra hop like the coordinator in XL). Vertica is an MPP analytics database, but it doesn't have a node like the coordinator, either. To achieve maximum performance for real-time queries, the scaleout architecture should avoid an extra hop when possible.

> Using a central coordinator also allows multi-node transaction
> control, global deadlock detection etc..

VoltDB does not have a hop that every request must pass through, like the coordinator in XL. Our proprietary RDBMS named Symfoware, which is not based on PostgreSQL, also doesn't have an extra hop, and can handle distributed transactions and deadlock detection/resolution without any special node like GTM.

Regards
MauMau
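
To make the "coordinator and data node in the same process" idea concrete, here is a toy Python sketch. It is only an illustration under stated assumptions, not how XL, HANA, Vertica, VoltDB, or Symfoware work: the `PeerNode` class, the modulo placement rule, and the `remote_calls` counter are all hypothetical names invented for this example. The point it shows is that a node answers from its own data when it can and makes one hop to a peer only when the row lives elsewhere.

```python
class PeerNode:
    """Hypothetical node that plays both the coordinator and data node roles."""

    def __init__(self, index, num_nodes):
        self.index = index          # this node's shard number
        self.num_nodes = num_nodes  # total number of nodes in the cluster
        self.rows = {}              # rows stored locally: key -> value
        self.peers = []             # all nodes, indexed by shard number
        self.remote_calls = 0       # how many hops this node made to peers

    def owner_of(self, key):
        # Assumed placement rule: modulo sharding on an integer key.
        return key % self.num_nodes

    def get(self, key):
        owner = self.owner_of(key)
        if owner == self.index:
            # The connected node owns the row: no extra hop.
            return self.rows.get(key)
        # The row lives on another node: exactly one hop to its owner.
        self.remote_calls += 1
        return self.peers[owner].rows.get(key)


def make_cluster(num_nodes):
    """Build a cluster in which every node knows every other node."""
    nodes = [PeerNode(i, num_nodes) for i in range(num_nodes)]
    for node in nodes:
        node.peers = nodes
    return nodes


def load(nodes, data):
    """Place each row on the node chosen by the assumed placement rule."""
    for key, value in data.items():
        nodes[key % len(nodes)].rows[key] = value
```

An application connected to `nodes[1]` in a three-node cluster would read keys 1 and 4 without any hop, while reading key 0 would cost one hop to `nodes[0]`. A real system would also need distributed transaction control and deadlock detection, which this sketch leaves out entirely.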