Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade
Date
Msg-id CAEze2WhfkYQbyRaBptT2BQ5Sv-DN0gEWCuSMWbQzCwyb8qjkBA@mail.gmail.com
Whole thread Raw
In response to Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade
Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade
List pgsql-hackers
On Tue, 11 Jun 2024 at 04:01, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Fri, Jun 07, 2024 at 11:10:25AM +0200, Matthias van de Meent wrote:
> > My primary concern isn't the IO, but the O(shared_buffers) that we
> > have to go through during a checkpoint. As I mentioned upthread, it is
> > reasonably possible the new cluster is already setup with a good
> > fraction of the old system's shared_buffers configured. Every
> > checkpoint has to scan all those buffers, which IMV can get (much)
> > more expensive than the IO overhead caused by the WAL_LOG strategy. It
> > may be a baseless fear as I haven't done the performance benchmarks
> > for this, but I wouldn't be surprised if shared_buffers=8GB would
> > measurably impact the upgrade performance in the current patch (vs the
> > default 128MB).
>
> I did a handful of benchmarks on an r5.24xlarge that seem to prove your
> point.  The following are the durations of the pg_restore step of
> pg_upgrade:
>
> * 10k empty databases, 128MB shared_buffers
>   WAL_LOG:   1m 01s
>   FILE_COPY: 0m 22s
>
> * 10k empty databases, 100GB shared_buffers
>   WAL_LOG:   2m 03s
>   FILE_COPY: 5m 08s
>
> * 2.5k databases with 10k tables each, 128MB shared_buffers
>   WAL_LOG:   17m 20s
>   FILE_COPY: 16m 44s
>
> * 2.5k databases with 10k tables each, 100GB shared_buffers
>   WAL_LOG:   16m 39s
>   FILE_COPY: 15m 21s
>
> I was surprised with the last result, but there's enough other stuff
> happening during such a test that I hesitate to conclude much.

If you still have the test data set up, could you test the attached
patch (which does skip the checkpoints in FILE_COPY mode during binary
upgrades)?

>>>> If such a change were implemented (i.e. no checkpoints for FILE_COPY
>>>> in binary upgrade, with a single manual checkpoint after restoring
>>>> template1 in create_new_objects) I think most of my concerns with this
>>>> patch would be alleviated.
>>>
>>> Yeah, I think that's a valid point. The second checkpoint is to ensure
>>> that the XLOG_DBASE_CREATE_FILE_COPY never gets replayed. However, for
>>> binary upgrades, we don't need that guarantee because a checkpoint
>>> will be performed during shutdown at the end of the upgrade anyway.
>>
>> Indeed.
>
> It looks like pg_dump always uses template0, so AFAICT we don't even need
> the suggested manual checkpoint after restoring template1.

Thanks for reminding me. It seems I misunderstood the reason why we
first process template1 in create_new_objects, as I didn't read the
comments thoroughly enough.

> I do like the idea of skipping a bunch of unnecessary operations in binary
> upgrade mode, since it'll help me in my goal of speeding up pg_upgrade.
> But I'm a bit hesitant to get too fancy here and to introduce a bunch of
> new "if (IsBinaryUpgrade)" checks if the gains in the field won't be all
> that exciting.  However, we've already sprinkled such checks quite
> liberally, so maybe I'm being too cautious...

Hmm, yes. From an IO perspective I think this could be an improvement,
but let's check the numbers first.

Kind regards,

Matthias van de Meent

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: confirmed flush lsn seems to be move backward in certain error cases
Next
From: Amit Kapila
Date:
Subject: Re: 001_rep_changes.pl fails due to publisher stuck on shutdown