Re: Proposal: "Causal reads" mode for load balancing reads without stale data - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Msg-id CANP8+j+BgzJ0b3-M8RmYBxHzwCK3C0UR4Zy27uBQNFo7KKa_qA@mail.gmail.com
In response to Re: Proposal: "Causal reads" mode for load balancing reads without stale data  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Proposal: "Causal reads" mode for load balancing reads without stale data  (Craig Ringer <craig@2ndquadrant.com>)
Re: Proposal: "Causal reads" mode for load balancing reads without stale data  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 15 November 2015 at 14:50, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Nov 15, 2015 at 5:41 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Hmm, if that's where we're at, I'll summarize my thoughts.
> >
> > All of this discussion presupposes we are distributing/load balancing
> > queries so that reads and writes might occur on different nodes.
>
> Agreed.  I think that's a pretty common pattern, though certainly not
> the only one.

It looks to me as though this functionality is only of use in a pooler. Please explain how else this would be used.

> > Your option (2) is wider but also worse in some ways. It can be implemented
> > in a pooler.
> >
> > Your option (3) doesn't excite me much. You've got a load of stuff that
> > really should happen in a pooler. And at its core we have synchronous_commit
> > = apply but with a timeout rather than a wait.
>
> I don't see how either option (2) or option (3) could be implemented
> in a pooler.  How would that work?

My starting thought was that (1) was the only way forwards. Through discussion, I now see that it's not the best solution for the general case.

The pooler knows which statements are reads and which are writes, and it also knows about transaction boundaries, so it is possible for it to perform the waits for either (2) or (3). The pooler *needs* to know which nodes it can route queries to, so it looks to me that the pooler is the best place to put waits and track the status of nodes, no matter when we wait. I don't see any benefit in having other nodes keep track of node status, since that will just duplicate work that *must* be performed in the pooler.
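
To make the bookkeeping concrete, here is a minimal sketch of how a pooler might track node status and route reads. It is an illustration under stated assumptions, not a description of any existing pooler: the class `Pooler`, its methods, and the node and session names are all hypothetical, and it assumes the pooler can learn each standby's replayed LSN and the commit LSN of each session's last write. For brevity it falls back to the primary when no standby has caught up, where a real pooler could instead wait, with or without a timeout.

```python
from dataclasses import dataclass, field


def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '0/16B3748' to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class Pooler:
    """Hypothetical pooler state; every name here is illustrative."""

    # Last replayed LSN reported by each standby.
    standby_replay_lsn: dict = field(default_factory=dict)
    # Commit LSN of the last write seen for each client session.
    session_commit_lsn: dict = field(default_factory=dict)

    def report_replay(self, standby: str, lsn: str) -> None:
        self.standby_replay_lsn[standby] = parse_lsn(lsn)

    def record_commit(self, session: str, lsn: str) -> None:
        self.session_commit_lsn[session] = parse_lsn(lsn)

    def route(self, session: str, is_read: bool) -> str:
        # Writes always go to the primary.
        if not is_read:
            return "primary"
        # A read may use any standby that has replayed the session's last write.
        needed = self.session_commit_lsn.get(session, 0)
        for name in sorted(self.standby_replay_lsn):
            if self.standby_replay_lsn[name] >= needed:
                return name
        # Simplification: fall back to the primary instead of waiting.
        return "primary"
```

With this state, a session that has just committed at LSN 0/3000060 is routed to the primary for reads until some standby reports a replay position at or beyond 0/3000060, after which its reads can go to that standby.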

I would like to see a load balancing pooler in Postgres.
 
--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
