Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Horizontal scalability/sharding
Date
Msg-id CAB7nPqTNeTB0wNHha7awfvQb8Rj78pkd03ti0nBOrktZBH_vUA@mail.gmail.com
Whole thread Raw
In response to Re: Horizontal scalability/sharding  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Sep 3, 2015 at 3:41 AM, Robert Haas wrote:
> 3. IIUC, Postgres-XC handles this problem by reducing at least
> volatile functions, maybe all functions, to constants.  Then it
> generates an SQL statement to be sent to the data node to make the
> appropriate change.  If there's more than one copy of the data, we
> send a separate copy of the SQL statement to every node.  I'm not sure
> exactly what happens if some of those nodes are not available, but I
> don't think it's anything good.  Fundamentally, this model doesn't
> allow for many good options in that case.

I don't recall that. Immutable functions are switched to constants in
the query sent to datanodes. Volatile and stable functions are
evaluated locally after fetching the results from the remote node. Not
that efficient for warehouse loads. My 2c.

> 4. Therefore, I think that we should instead use logical replication,
> which might be either synchronous or asynchronous.  When you modify
> one copy of the data, that change will then be replicated to all other
> nodes.  If you are OK with eventual consistency, this replication can
> be asynchronous, and nodes that are off-line will catch up when they
> are on-line.  If you are not OK with that, then you must replicate
> synchronously to every node before transaction commit; or at least you
> must replicate synchronously to every node that is currently on-line.
> This presents some challenges: logical decoding currently can't
> replicate transactions that are still in process - replication starts
> when the transaction commits.  Also, we don't have any way for
> synchronous replication to wait for multiple nodes.

That's something that the quorum synchronous patch would address.
Still, having the possibility to be synchronous across multiple nodes
does not seem like to be something at the top of the list.

> Also, the GTM needs to be aware that this stuff is happening, or it will DTWT.  That too seems like a problem that
canbe solved.
 

If I understood correctly, yes it is with its centralized transaction
facility each node is aware of the transaction status via the global
snapshot.
-- 
Michael



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Allow replication roles to use file access functions
Next
From: Michael Paquier
Date:
Subject: Re: Allow replication roles to use file access functions