Re: Proposal: "Causal reads" mode for load balancing reads without stale data - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Date
Msg-id CAEepm=3OgFBXONe-2=P+nkROovQ7cOsD3Q7TbTE0SQg-Oe=fSA@mail.gmail.com
In response to Re: Proposal: "Causal reads" mode for load balancing reads without stale data  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Sun, Nov 15, 2015 at 11:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 12 November 2015 at 18:25, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
 
 I don't want to get bogged down in details while we're talking about the 30,000-foot view.

Hmm, if that's where we're at, I'll summarize my thoughts.

All of this discussion presupposes we are distributing/load balancing queries so that reads and writes might occur on different nodes.

We need a good balancer. Any discussion of this that ignores the balancer component is only talking about half the solution. What we need to do is decide whether functionality should live in the balancer or the core. 

Your option (1) is viable, but only in certain cases. We could add support for some token/wait mechanism, but as you say, this would require application changes, not pooler changes.
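
For concreteness, here is a minimal sketch of what the token/wait idea looks like at the application level, using psycopg2 and the 9.x WAL location functions (renamed to pg_current_wal_lsn()/pg_last_wal_replay_lsn() in PostgreSQL 10); the table, DSNs, function names and polling interval are purely illustrative:

# Illustrative only: read-your-writes with an application-managed
# causality token (option 1).  The table, DSNs and polling interval
# here are hypothetical.
import time
import psycopg2

def write_and_get_token(primary_dsn):
    """Commit a write on the primary and capture a WAL position as the token."""
    conn = psycopg2.connect(primary_dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO t (v) VALUES (42)")
        conn.commit()  # the commit record is now in the WAL
        with conn.cursor() as cur:
            cur.execute("SELECT pg_current_xlog_location()")
            return cur.fetchone()[0]
    finally:
        conn.close()

def wait_for_token(standby_dsn, token, timeout=5.0):
    """Poll a standby until it has replayed up to the token, or give up."""
    conn = psycopg2.connect(standby_dsn)
    conn.autocommit = True
    try:
        deadline = time.time() + timeout
        with conn.cursor() as cur:
            while time.time() < deadline:
                # >= 0 once the standby's replay position has reached the token.
                cur.execute(
                    "SELECT pg_xlog_location_diff("
                    "pg_last_xlog_replay_location(), %s) >= 0",
                    (token,))
                if cur.fetchone()[0]:
                    return True  # safe to run the read on this standby
                time.sleep(0.01)
        return False
    finally:
        conn.close()

The point is that something (the application, or a very smart pooler) has to carry the token from the write connection to the read connection, which is exactly the application-change burden being discussed here.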

Your option (2) is wider but also worse in some ways. It can be implemented in a pooler. 

Your option (3) doesn't excite me much. You've got a load of stuff that really should happen in a pooler. And at its core we have synchronous_commit = apply but with a timeout rather than a wait. So anyway, consider me nudged to finish my patch to provide capability for that by 1 Jan.
 
Just to be clear, this patch doesn't use a "timeout rather than a wait".  It always waits for the current set of available causal reads standbys to apply the commit.  It's just that nodes get kicked out of that set pretty soon if they don't keep up, a bit like a RAID controller dropping a failing disk.  And it does so using a protocol that ensures that the dropped standby starts raising the error, even if contact has been lost with it, so the causal reads guarantee is maintained at all times for all clients.
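
To illustrate the shape of that protocol as described above (this is a rough sketch of the idea, not the patch's actual code; the lease duration, timeout and function names are invented):

import time

LEASE_DURATION = 4.0   # invented number; assume this is derived from a GUC
APPLY_TIMEOUT = 1.0    # how long the primary will wait for one standby

class Standby:
    def __init__(self, name):
        self.name = name
        self.lease_expiry = 0.0  # after this time the standby refuses causal reads

    def grant_lease(self, now):
        self.lease_expiry = now + LEASE_DURATION

def finish_commit(available, wait_for_apply):
    """Wait for every standby currently in the available set to apply the commit."""
    for standby in list(available):
        if wait_for_apply(standby, timeout=APPLY_TIMEOUT):
            standby.grant_lease(time.time())  # it kept up, keep it in the set
        else:
            # Don't just stop waiting: before this commit is reported to the
            # client, make sure the lagging standby is already raising errors
            # for causal reads, by revoking its lease or, if contact has been
            # lost, by waiting until the lease it holds must have expired.
            time.sleep(max(0.0, standby.lease_expiry - time.time()))
            available.remove(standby)
    # Only now is the commit acknowledged: every standby still advertised as
    # available has applied it, and every dropped standby is raising errors.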

On a related note, any further things like the causal_reads_standby_names GUC should be implemented by the Node Registry as a named group of nodes. We can have as many arbitrary groups of nodes as we want. If that sounds strange, look back at exactly why GUCs are called GUCs.

Agreed, the application_name whitelist stuff is clunky.  I left it out of the first version I posted, not wanting the focus of this proposal to be side-tracked.  But as Ants Aasma pointed out, some users might need something like that, so I posted a 2nd version that follows the established example, again not wanting to distract with anything new in that area.  Of course that would eventually be replaced/improved as part of a future node topology management project.

--
