Thread: [ADMIN] Shutdown Order with Primary/Standby?
Afternoon,
Just wondering if there's a best practice in regards to shutdown of primary and standbys, as I approach my first planned maintenance task as a newbie PostgreSQL DBA. I found one older post that suggested doing a fast-mode shutdown on primary first so that all transaction info can be cleanly pushed to standby, then shutdown the standby. Our big app group runs on 9.2.18 and is being upgraded to 9.2.22.
Extending that scenario, if you had a cascading standby, would you go primary -> upstream standby -> downstream standby?
Curious what folks think.
Exactly what I would do, yes.
From: pgsql-admin-owner@postgresql.org <pgsql-admin-owner@postgresql.org> on behalf of Don Seiler <don@seiler.us>
Sent: Thursday, August 10, 2017 12:59 PM
To: pgsql-admin@postgresql.org
Subject: [ADMIN] Shutdown Order with Primary/Standby?
Sent: Thursday, August 10, 2017 12:59 PM
To: pgsql-admin@postgresql.org
Subject: [ADMIN] Shutdown Order with Primary/Standby?
Extending that scenario, if you had a cascading standby, would you go primary -> upstream standby -> downstream standby?
www.seiler.us We use VictorOps for our paging/notification system, and we're pretty happy with it so far. On the DBA team, we've just been using a simple email gateway to send ... |
Journyx, Inc.
7600 Burnet Road #300
Austin, TX 78757
www.journyx.com
Austin, TX 78757
www.journyx.com
p 512.834.8888
f 512-834-8858
Do you receive our promotional emails? Click here or visit http://journyx.com/communication-preferences to unsubscribe.
On Thu, Aug 10, 2017 at 11:59 AM, Don Seiler <don@seiler.us> wrote: > Afternoon, > > Just wondering if there's a best practice in regards to shutdown of primary > and standbys, as I approach my first planned maintenance task as a newbie > PostgreSQL DBA. I found one older post that suggested doing a fast-mode > shutdown on primary first so that all transaction info can be cleanly pushed > to standby, then shutdown the standby. Our big app group runs on 9.2.18 and > is being upgraded to 9.2.22. > > Extending that scenario, if you had a cascading standby, would you go > primary -> upstream standby -> downstream standby? Set keep wal segments to something largish (1000 or so) well before the upgrade etc. Make sure the volume holding pg_xlog can hold 1000*16MB of data. This ensures the streaming replicant can catch up if some stuff happens before it's back up. Best practices are to first eliminate client access to the db cluster so there's no updates going on up to the last second and all that. A great tool for this is pgbouncer, which can pause its connections while you do the upgrade. After that, it really doesn't matter the order, but yes, I generally shut down the source, then the destination machines.
Yes we'll be shutting down the app servers in advance.
On Aug 10, 2017 1:13 PM, "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
On Thu, Aug 10, 2017 at 11:59 AM, Don Seiler <don@seiler.us> wrote:
> Afternoon,
>
> Just wondering if there's a best practice in regards to shutdown of primary
> and standbys, as I approach my first planned maintenance task as a newbie
> PostgreSQL DBA. I found one older post that suggested doing a fast-mode
> shutdown on primary first so that all transaction info can be cleanly pushed
> to standby, then shutdown the standby. Our big app group runs on 9.2.18 and
> is being upgraded to 9.2.22.
>
> Extending that scenario, if you had a cascading standby, would you go
> primary -> upstream standby -> downstream standby?
Set keep wal segments to something largish (1000 or so) well before
the upgrade etc. Make sure the volume holding pg_xlog can hold
1000*16MB of data. This ensures the streaming replicant can catch up
if some stuff happens before it's back up.
Best practices are to first eliminate client access to the db cluster
so there's no updates going on up to the last second and all that. A
great tool for this is pgbouncer, which can pause its connections
while you do the upgrade.
After that, it really doesn't matter the order, but yes, I generally
shut down the source, then the destination machines.
On Thu, Aug 10, 2017 at 1:13 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
Set keep wal segments to something largish (1000 or so) well before
the upgrade etc. Make sure the volume holding pg_xlog can hold
1000*16MB of data. This ensures the streaming replicant can catch up
if some stuff happens before it's back up.
If we have both primary and standby down at the same time, would this really still be necessary? FWIW right now ours is set to keep 128.
Also, going back to my original question. Once both are down, is it best practice to perform patching/upgrades on the standby first (starting furthest downstream if cascading)? e.g. patch/upgrade the standby (via standard CentOS7 yum from the repo), then start the standby DB and verify nothing has broken, then do the same to the upstream or primary?
Don.
Don Seiler
www.seiler.us
www.seiler.us
On Fri, Aug 11, 2017 at 8:10 AM, Don Seiler <don@seiler.us> wrote: > On Thu, Aug 10, 2017 at 1:13 PM, Scott Marlowe <scott.marlowe@gmail.com> > wrote: >> >> Set keep wal segments to something largish (1000 or so) well before >> the upgrade etc. Make sure the volume holding pg_xlog can hold >> 1000*16MB of data. This ensures the streaming replicant can catch up >> if some stuff happens before it's back up. > > > If we have both primary and standby down at the same time, would this really > still be necessary? FWIW right now ours is set to keep 128. It's nice in case the replica hangs or acts up, you won't have to resubscribe it should it run through the first 128 wal segments etc. > Also, going back to my original question. Once both are down, is it best > practice to perform patching/upgrades on the standby first (starting > furthest downstream if cascading)? e.g. patch/upgrade the standby (via > standard CentOS7 yum from the repo), then start the standby DB and verify > nothing has broken, then do the same to the upstream or primary? Lots of ways to do this. One way is to make a replicant, remove it from the cluster and fail it over and then upgrade it and test the crap out of it distinctly from production. Load test, proper behavior testing, and so on. If you're doing minor upgrades, you don't need to bring down the whole cluster, you can take a single replica out and upgrade pg on it, then put it back in the cluster. Again, if you have enough kept wal segments, or you use replication slots, then it will just catch right up.
On Fri, Aug 11, 2017 at 11:07 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
If you're doing minor upgrades, you don't need to bring down the whole
cluster, you can take a single replica out and upgrade pg on it, then
put it back in the cluster. Again, if you have enough kept wal
segments, or you use replication slots, then it will just catch right
up.
What do you mean by "take it out of the cluster" and "put it back in the cluster"? Shut it down and rename the recovery.conf? Or more complexity than that? Why would that be needed at all?
Don Seiler
www.seiler.us
www.seiler.us
On Fri, Aug 11, 2017 at 2:20 PM, Don Seiler <don@seiler.us> wrote: > On Fri, Aug 11, 2017 at 11:07 AM, Scott Marlowe <scott.marlowe@gmail.com> > wrote: >> >> If you're doing minor upgrades, you don't need to bring down the whole >> cluster, you can take a single replica out and upgrade pg on it, then >> put it back in the cluster. Again, if you have enough kept wal >> segments, or you use replication slots, then it will just catch right >> up. > > > What do you mean by "take it out of the cluster" and "put it back in the > cluster"? Shut it down and rename the recovery.conf? Or more complexity than > that? Why would that be needed at all? Yeah, basically touch the the file defined to make it fail over so it becomes its own master. Then test upgrade it and test the crap out of it separate from production. If all works fine, then resubscribe it and now you know your upgrades in production should work.