Re: adding more space to the existing server - Mailing list pgsql-general

From Alban Hertroys
Subject Re: adding more space to the existing server
Date
Msg-id 13358DF7-5700-44A2-A86B-2A3721AA9650@gmail.com
Whole thread Raw
In response to Re: adding more space to the existing server  (Julie Nishimura <juliezain@hotmail.com>)
Responses Re: adding more space to the existing server
List pgsql-general
> On 2 Aug 2019, at 21:45, Julie Nishimura <juliezain@hotmail.com> wrote:

> 1) We use streaming replication, and due to hardware limitation, we cannot add more drives to the existing host. That
iswhy we thought by breaking the existing streaming replication (from a->b), instead of currently identical standby
(b),we can introduce twice larger host, then start the replication to the newly larger host, and when it is caught up,
breakit again. Then break rep again, make modification to 'a" host, making it larger, then replicate b->a. After it is
caughtup, break the rep again, switch master->standby (if necessary). 

Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done (streaming) replication once and I
managedto mess that up in a minor way (disabled the wrong service during failover, so data still went to the database I
wasattempting to replace for, like, 15 minutes). 

> 2) I am not sure about the time, but it is understood it is required 2 full replication cycles, and might be up to 2
weekswith no standby situation 

No standby situation? Murphy is probably just waiting for that to strike…
I recall a fairly recent story on the FreeBSD ML about someone on Malta doing a migration of a couple dozen terabytes
fromher main server (because of some failing disks in her RAID set) using her backup server to move data around (with
backupsremoved to make room), when, due to an accident outside the building, an aerial 10KV power line hit another
powerline in the ground, causing a fire in one UPS and frying the other one. Losing power at that point meant that the
filesystems (ZFS) on both servers ended up in an unrecoverable state with no backups. It didn’t help that the UPS’s
wereat the bottom of the rack, with the heat and smoke going up into the servers. What are the chances, right? (And
thenit turned out that it is really hard to try to recover data from a ZFS file system in such a state, which is what
heractual inquiry was about) 

I would definitely prefer to add a 3rd machine into the mix, even if it were just a temporary machine - a rental
perhaps?

From there, I’m certain Adrian knows more about replication than I do. I’d go with the approach he suggested.

> 4) by pg_basebackup and restore
>
> As of now, we are thinking about possibly other solutions, as of splitting existing 37 databases on the cluster into
2hosts with their own standbys. This solution requires breaking up existing replication as well. Can you please point
meto some document which lists all steps describing breaking up the existing replication properly? we are using 9.6
postgres

I’m going to assume that you will have data coming in while this split is taking place and that you therefore cannot
offlinethe entire set of databases for as long as this takes. If not, that would probably allow for a simpler (faster)
scenario.

I think the easiest for this scenario would be to add two more machines (c and d) and replicate them off the current
setup.You want that to happen as parallel as possible, so perhaps replicate c off a and d off b. 

If you aren’t already using “replication slots”, I found that to make things both easier to understand and more
reliable.You can query their status, for one thing. 

Those replicas will take extra time of course (about double) because you’re replicating twice what you need, but I
don’tthink you can replicate parts of a cluster with your setup unless you go for a different replication approach (I
thinkper database replication requires statement level replication?). 

After that, decouple both sets into:
a —> b (your current machine)
c —> d (the new ones)

(Although any order should be fine, really, as long as they have caught up.)

At that point I would probably (temporarily) pause replication in at least one set and create a backup of that.

This is the point to start removing superfluous databases from a and c (so that a+c make up the complete set again).
After verifying that no databases are missing, unpause replication.

If instead you find that you accidentally removed a database from both a and c, you still have replicas to recover it
from.And the backups, of course, but that will not contain the data that came in after replication was paused. 

I do hope the remaining 3% disk space is enough to cover all that, though...

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: adding more space to the existing server
Next
From: Julie Nishimura
Date:
Subject: Re: adding more space to the existing server