Re: Shortest offline window on database migration - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Shortest offline window on database migration
Date
Msg-id 20190530152914.l2h5f4fomro37ieh@development
Whole thread Raw
In response to Shortest offline window on database migration  (Haroldo Kerry <hkerry@callix.com.br>)
Responses RE: Shortest offline window on database migration
List pgsql-performance
On Thu, May 30, 2019 at 12:08:04PM -0300, Haroldo Kerry wrote:
>Hello,
>
>We are migrating our PostgreSQL 9.6.10 database (with streaming replication
>active) to a faster disk array.
>We are using this opportunity to enable checksums, so we will have to do a
>full backup-restore.
>The database size is about 500GB, it takes about 2h:30min for a full
>backup, and then about 1h to fully restore it with checksum enabled on the
>new array, plus 2h to recreate the replica on the old array.
>
>Although all synthetic tests (pgbench) indicate the new disk array is
>faster, we will only be 100% confident once we see its performance on
>production, so our backup plan is using our replica database on the older
>array. If the new array performance is poor during production ramp up, we
>can switch to the replica with little impact to our customers.
>
>Problem is the offline window for backup, restore the full database with
>checksum and recreate the replica is about 5h:30m.
>
>One thing that occurred to us to shorten the offline window was restoring
>the database to both the master and replica in parallel (of course we would
>configure the replica as master do restore the database), that would shave
>1h of the total time. Although this is not documented we thought that
>restoring the same database to identical servers would result in binary
>identical data files.
>
>We tried this in lab. As this is not a kosher way to create a replica, we
>ran a checksum comparison of all data files, and we ended up having a lot
>of differences. Bummer. Both master and replica worked (no errors on logs),
>but we ended up insecure about this path because of the binary differences
>on data files.
>But in principle it should work, right?

What should work? Backup using pg_dump and restore certainly won't give
you the same binary files - the commit timestamps will be different,
operations may happen in a different order (esp. with parallel restore),
and so on. And the instances don't start as a copy anyway, so there will
be different system IDs, etc.

So no, this is not a valid way to provision master/standby cluster.

>Has anyone been through this type of problem?
>

Unfortunately, I don't think there's a much better solution that what you
initially described - dump/restore, and then creating a replica.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




pgsql-performance by date:

Previous
From: Haroldo Kerry
Date:
Subject: Shortest offline window on database migration
Next
From: Steven Winfield
Date:
Subject: RE: Shortest offline window on database migration