On Tue, 11 Jun 2024 at 04:01, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Fri, Jun 07, 2024 at 11:10:25AM +0200, Matthias van de Meent wrote:
> > My primary concern isn't the IO, but the O(shared_buffers) that we
> > have to go through during a checkpoint. As I mentioned upthread, it is
> > reasonably possible the new cluster is already setup with a good
> > fraction of the old system's shared_buffers configured. Every
> > checkpoint has to scan all those buffers, which IMV can get (much)
> > more expensive than the IO overhead caused by the WAL_LOG strategy. It
> > may be a baseless fear as I haven't done the performance benchmarks
> > for this, but I wouldn't be surprised if shared_buffers=8GB would
> > measurably impact the upgrade performance in the current patch (vs the
> > default 128MB).
>
> I did a handful of benchmarks on an r5.24xlarge that seem to prove your
> point. The following are the durations of the pg_restore step of
> pg_upgrade:
>
> * 10k empty databases, 128MB shared_buffers
> WAL_LOG: 1m 01s
> FILE_COPY: 0m 22s
>
> * 10k empty databases, 100GB shared_buffers
> WAL_LOG: 2m 03s
> FILE_COPY: 5m 08s
>
> * 2.5k databases with 10k tables each, 128MB shared_buffers
> WAL_LOG: 17m 20s
> FILE_COPY: 16m 44s
>
> * 2.5k databases with 10k tables each, 100GB shared_buffers
> WAL_LOG: 16m 39s
> FILE_COPY: 15m 21s
>
> I was surprised with the last result, but there's enough other stuff
> happening during such a test that I hesitate to conclude much.
If you still have the test data set up, could you test the attached
patch (which does skip the checkpoints in FILE_COPY mode during binary
upgrades)?
>>>> If such a change were implemented (i.e. no checkpoints for FILE_COPY
>>>> in binary upgrade, with a single manual checkpoint after restoring
>>>> template1 in create_new_objects) I think most of my concerns with this
>>>> patch would be alleviated.
>>>
>>> Yeah, I think that's a valid point. The second checkpoint is to ensure
>>> that the XLOG_DBASE_CREATE_FILE_COPY never gets replayed. However, for
>>> binary upgrades, we don't need that guarantee because a checkpoint
>>> will be performed during shutdown at the end of the upgrade anyway.
>>
>> Indeed.
>
> It looks like pg_dump always uses template0, so AFAICT we don't even need
> the suggested manual checkpoint after restoring template1.
Thanks for reminding me. It seems I misunderstood the reason why we
first process template1 in create_new_objects, as I didn't read the
comments thoroughly enough.
> I do like the idea of skipping a bunch of unnecessary operations in binary
> upgrade mode, since it'll help me in my goal of speeding up pg_upgrade.
> But I'm a bit hesitant to get too fancy here and to introduce a bunch of
> new "if (IsBinaryUpgrade)" checks if the gains in the field won't be all
> that exciting. However, we've already sprinkled such checks quite
> liberally, so maybe I'm being too cautious...
Hmm, yes. From an IO perspective I think this could be an improvement,
but let's check the numbers first.
Kind regards,
Matthias van de Meent