Re: 9.6 -> 10.0 - Mailing list pgsql-advocacy

From Magnus Hagander
Subject Re: 9.6 -> 10.0
Date
Msg-id CABUevEyEeeOwGaq-xPLL5TnLn1RR7dpKXiw6dr-6cJB0TYxTVA@mail.gmail.com
In response to Re: 9.6 -> 10.0  (Petr Jelinek <petr@2ndquadrant.com>)
List pgsql-advocacy
On Thu, May 12, 2016 at 6:41 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 12/05/16 18:09, Bruce Momjian wrote:
On Thu, May 12, 2016 at 05:43:28PM +0200, Magnus Hagander wrote:

On May 12, 2016 17:36, "Bruce Momjian" <bruce@momjian.us> wrote:
In a master/slave setup with pg_logical, a major upgrade is only _near_
zero-downtime, because you still have to switch over all write transactions at
a single point in time when you promote the slave to master.  So you
have to either prevent new write transactions from going to the slave
while you wait for the master transactions to finish, or (more likely)
terminate the write transactions on the master, then promote the slave
to master and allow everything to reconnect.

(In practice, you can't change a read/write server to read-only without
a restart, so effectively all old-master transactions have to be drained
at some point.)
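
For illustration only (not part of the original mails), a minimal sketch of
that drain step in Python with psycopg2; the host "old-master", database
"appdb" and role "app_user" are hypothetical, and the promotion itself would
still happen out of band (e.g. pg_ctl promote on the standby):

    import psycopg2

    # Connect to the old master and terminate the remaining application
    # sessions so their write transactions are drained before promotion.
    conn = psycopg2.connect("host=old-master dbname=appdb user=postgres")
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute("""
        SELECT pg_terminate_backend(pid)
          FROM pg_stat_activity
         WHERE usename = %s
           AND pid <> pg_backend_pid()
    """, ("app_user",))
    print("terminated %d application sessions" % cur.rowcount)
    cur.close()
    conn.close()
    # After this, promote the standby and let the application reconnect.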

You can make the downtime closer to zero, or even completely zero, if you
combine it with pgbouncer in transaction pooling mode. There will be a
performance hiccup, but it should work.
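
As an illustrative sketch outside the original thread: the pgbouncer side of
this is really just configuration (pool_mode = transaction plus a [databases]
entry pointing at the current master), but it can be sanity-checked from the
admin console before attempting a switchover. Host, port and admin user below
are hypothetical:

    import psycopg2

    # pgbouncer's admin console is a pseudo-database called "pgbouncer";
    # it only accepts simple queries, so autocommit must be on.
    admin = psycopg2.connect(
        "host=pgbouncer-host port=6432 dbname=pgbouncer user=pgbouncer")
    admin.autocommit = True
    cur = admin.cursor()

    cur.execute("SHOW CONFIG")
    config = {row[0]: row[1] for row in cur.fetchall()}
    assert config.get("pool_mode") == "transaction", \
        "transaction pooling is needed for this trick"

    cur.execute("SHOW DATABASES")  # which backend host each database points at
    for row in cur.fetchall():
        print(row)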

That is an interesting approach.  How many applications are prepared to
re-send a transaction block based on the error returned by pgbouncer in
this case?

I am thinking our docs need a new section about reducing downtime during
switch-over, and using logical replication for major version upgrades.


There is no error: in pgbouncer you can pause connections while waiting for the running transactions to finish, change the config for the databases to point to the new server, and then resume, and pgbouncer will send the new transactions to the new server. From the application's point of view this looks like a momentary latency increase, not an error. I have done a live demo of this at several conferences, running pgbench continuously during the upgrade/switchover.
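
A minimal sketch of that sequence, again not from the original thread, using
psycopg2 against pgbouncer's admin console; it assumes the [databases] section
of pgbouncer.ini has already been edited to point at the newly promoted
server, and the hostnames/users are hypothetical:

    import psycopg2

    admin = psycopg2.connect(
        "host=pgbouncer-host port=6432 dbname=pgbouncer user=pgbouncer")
    admin.autocommit = True
    cur = admin.cursor()

    # 1. Stop handing out server connections; in-flight transactions are
    #    allowed to finish, new ones simply queue up in the clients.
    cur.execute("PAUSE")

    # 2. Re-read pgbouncer.ini, whose [databases] section now points at
    #    the newly promoted server.
    cur.execute("RELOAD")

    # 3. Serve the queued transactions again -- they now go to the new
    #    server; the application only sees a latency spike.
    cur.execute("RESUME")

Running pgbench through pgbouncer while doing this is a convenient way to see
the latency blip (rather than errors), which matches the demo described above.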

Yeah, that's the method I was referring to. If the application can run cleanly in that mode, the switchover happens with just a small latency hiccup. In a lot of the cases I've seen at customers, the heavy part of the application can run through that while some things still need a direct connection (e.g. you can't run LISTEN/NOTIFY in transaction pooling mode), but you can get quite close to zero downtime even in those cases.

And yes, now that you mention it, I do remember seeing you doing such a demo :)

--
