Re: Shortest offline window on database migration - Mailing list pgsql-performance

From Stephen Frost
Subject Re: Shortest offline window on database migration
Date
Msg-id 20190604201835.GV2480@tamriel.snowman.net
Whole thread Raw
In response to Re: Shortest offline window on database migration  (Haroldo Kerry <hkerry@callix.com.br>)
List pgsql-performance
Greetings,

* Haroldo Kerry (hkerry@callix.com.br) wrote:
> The bottleneck at dump is CPU (a single one, on a 44-thread server), as we
> are using the -Fc option, which does not allow multiple jobs.
> We tried some time ago to use the --jobs option of pg_dump but it was
> slower, even with more threads. Our guess is the sheer volume of files
> outweighs the processing gains of using a compressed file output. Also
> pg_dump even with multiple jobs spends a lot of time (1h+) on the "reading
> dependency data" section that seems to be single threaded (our application
> is multi-tenant and we use schemas to separate tenant data, hence we have a
> lot of tables).

You might want to reconsider using the separate-schemas-for-tenants
approach.  This isn't the only annoyance you can run into with lots and
lots of tables.  That said, are you using the newer version of pg_dump
(which is what you should be doing when migrating to a newer version of
PG, always)?  We've improved it over time, though I can't recall off-hand
if this particular issue was improved in between the releases being
discussed here.  Of course, dealing with lots of little files could drag
down performance when working in parallel, but I'm still a bit surprised
that it's ending up slower than -Fc.
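For what it's worth, if you want to retry the parallel dump, it's the
directory format (-Fd) that enables --jobs; -Fc is single-process by
design.  A rough sketch (database name, paths, and job counts here are
just illustrative, tune them to your hardware):

```shell
# Parallel dump requires the directory output format (-Fd);
# the custom format (-Fc) is written by a single process.
pg_dump -Fd --jobs=8 -f /backup/mydb.dir mydb

# pg_restore can restore in parallel from either -Fd or -Fc output:
pg_restore --jobs=8 -d mydb /backup/mydb.dir
```

Note that even with -Fd, the initial "reading dependency data" phase you
mention runs in a single process, so with very many tables that part
won't speed up.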

> We are creating the replica using :
> docker exec pg-2 pg_basebackup -h 192.168.0.107 -U replication -P --xlog -D
> /var/lib/postgresql/data_9.6
> and it is taking 1h10m instead of the 2h I reported initially; we were
> previously using rsync with checksums to do it, but after experimenting
> with pg_basebackup we found it is faster: rsync was taking 1h just to
> calculate all the checksums. Thanks for your insight on this taking too long.

So, it's still a bit awkward, unfortunately, but you can use pgbackrest
to effectively give you a parallel replica build.  The steps are
something like:

Get pgbackrest WAL archiving up and going, with the repo on the
destination server/filesystem, but have 'compress=n' in the
pgbackrest.conf for the repo.
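A minimal pgbackrest.conf for that first step might look something like
this (the stanza name, paths, and option spellings here are illustrative
and should be checked against the pgbackrest version you're running):

```
[global]
# Repo lives on the destination server/filesystem
repo1-path=/path/to/repo
# Leave the backup uncompressed so pg_data can be used directly
compress=n

[mydb]
pg1-path=/var/lib/postgresql/data_9.6
```

You'd also need WAL archiving enabled on the source, e.g. in
postgresql.conf: archive_mode = on and
archive_command = 'pgbackrest --stanza=mydb archive-push %p'.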

Run: pgbackrest --stanza=mydb --type=full --process-max=8 backup

Once that's done, just do:

mv /path/to/repo/backup/mydb/20190605-120000F/pg_data /new/pgdata
chmod -R g-rwx /new/pgdata

Then in /new/pgdata, create a recovery.conf file like:

restore_command = 'pgbackrest --stanza=mydb archive-get %f "%p"'

And start up the DB server.
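Put together, those last steps might be sketched like this (the backup
label, paths, and stanza name are illustrative):

```shell
# Promote the uncompressed backup to be the new data directory
mv /path/to/repo/backup/mydb/20190605-120000F/pg_data /new/pgdata
chmod -R g-rwx /new/pgdata

# Point recovery at the pgbackrest WAL archive
cat > /new/pgdata/recovery.conf <<'EOF'
restore_command = 'pgbackrest --stanza=mydb archive-get %f "%p"'
EOF

# Start the server; it will replay archived WAL via restore_command
pg_ctl -D /new/pgdata start
```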

We have some ideas about how to make that whole thing cleaner, but the
rewrite into C has delayed our efforts; perhaps once that's done (this
fall), we can look at it.

Of course, you won't have an actual backup of the new database server at
that point yet, so you'll want to clean things up and make that happen
ASAP.  Another option, which is what I usually recommend, is just to
take a new backup (properly) and then do a restore from it, but that'll
obviously take longer since there's two copies being done instead of one
(though you can parallelize to your heart's content, so it can still be
quite fast if you have enough CPU and I/O).
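That backup-then-restore route could be sketched as (stanza name and
process counts illustrative; both sides parallelize):

```shell
# Take a proper full backup of the new server, in parallel...
pgbackrest --stanza=mydb --type=full --process-max=8 backup

# ...and on the target, restore it, also in parallel
pgbackrest --stanza=mydb --process-max=8 restore
```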

Thanks,

Stephen

