Re: Pg_upgrade speed for many tables - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Pg_upgrade speed for many tables
Date
Msg-id CAMkU=1wiOoSt3gPvqyv_1zehCYfRyjTwVPFbZO0Y6q8ZDAd=tw@mail.gmail.com
In response to Re: Pg_upgrade speed for many tables  (Bruce Momjian <bruce@momjian.us>)
Responses Use of fsync; was Re: Pg_upgrade speed for many tables  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Wed, Nov 14, 2012 at 3:55 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:
>>
>> Is turning off synchronous_commit enough?  What about turning off fsync?
>
> I did some testing with the attached patch on a magnetic disk with no
> BBU that turns off fsync;

With which file system? I wouldn't expect you to see a benefit with
ext2 or ext3; it seems to be a peculiarity of ext4 that it inhibits
"group fsync" of new file creations and instead does each one serially.
Whether it is worth applying a fix that is only needed for that one
file system, I don't know.  The trade-offs are not all that clear to
me yet.

>  I got these results
>
>                  sync_com=off  fsync=off
>             1        15.90     13.51
>          1000        26.09     24.56
>          2000        33.41     31.20
>          4000        57.39     57.74
>          8000       102.84    116.28
>         16000       189.43    207.84
>
> It shows fsync faster for < 4k, and slower for > 4k.  Not sure why this
> is the cause but perhaps the buffering of the fsync is actually faster
> than doing a no-op fsync.

synchronous_commit=off turns off not only the fsync at each commit,
but also the write-to-kernel at each commit; so it is not surprising
that it is faster at large scale.  I would specify both
synchronous_commit=off and fsync=off.


>> When I'm doing a pg_upgrade with thousands of tables, the shutdown
>> checkpoint after restoring the dump to the new cluster takes a very
>> long time, as the writer drains its operation table by opening and
>> individually fsync-ing thousands of files.  This takes about 40 ms per
>> file, which I assume is a combination of slow lap-top disk drive, and
>> a strange deal with ext4 which makes fsyncing a recently created file
>> very slow.   But even with faster hdd, this would still be a problem
>> if it works the same way, with every file needing 4 rotations to be
>> fsynced and this happens in serial.
>
> Is this with the current code that does synchronous_commit=off?  If not,
> can you test to see if this is still a problem?

Yes, it is with synchronous_commit=off (or if it wasn't originally,
it is now, with the same result).

Applying your fsync patch does solve the problem for me on ext4.
Having the new cluster be on ext3 rather than ext4 also solves the
problem, without the need for a patch; but it would be nice to be more
friendly to ext4, which is popular even though not recommended.
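
To check whether a given file system shows this per-file cost, a rough
sketch along the following lines could be used.  It only approximates
what the shutdown checkpoint does (many newly created files, each
reopened and fsynced serially).  The function name time_serial_fsync,
the file names, and the default sizes are hypothetical, and a POSIX
system is assumed.

```python
import os
import tempfile
import time


def time_serial_fsync(directory, n_files=100, size=8192):
    """Create n_files new files in `directory`, then fsync each one in
    turn. Returns the mean number of seconds spent per file."""
    payload = b"\0" * size
    paths = []
    # Phase 1: create and write the files without syncing them.
    for i in range(n_files):
        path = os.path.join(directory, f"relfile_{i}")
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    # Phase 2: reopen each file and fsync it, one at a time.
    start = time.perf_counter()
    for path in paths:
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    elapsed = time.perf_counter() - start
    return elapsed / n_files


if __name__ == "__main__":
    # Run in the current directory so the file system under test is the
    # one the script is started from.
    with tempfile.TemporaryDirectory(dir=".") as d:
        per_file = time_serial_fsync(d)
        print(f"{per_file * 1000:.2f} ms per fsync")
```

Running it once from a directory on ext4 and once from a directory on
ext3 would give a crude comparison; the absolute numbers depend heavily
on the drive and its write cache.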

>>
>> Anyway, the reason I think turning fsync off might be reasonable is
>> that as soon as the new cluster is shut down, pg_upgrade starts
>> overwriting most of those just-fsynced file with other files from the
>> old cluster, and AFAICT makes no effort to fsync them.  So until there
>> is a system-wide sync after the pg_upgrade finishes, your new cluster
>> is already in mortal danger anyway.
>
> pg_upgrade does a cluster shutdown before overwriting those files.

Right.  So as far as the cluster is concerned, those files have been
fsynced.  But then the next step is to go behind the cluster's back and
replace those fsynced files with different files, which may or may not
have been fsynced.  This is what makes me think the new cluster is in
mortal danger.  Not only have the new files perhaps not been fsynced,
but the cluster is not even aware of this fact, so you can start it
up, and then shut it down, and it still won't bother to fsync them,
because as far as it is concerned they already have been.

Given that, how much extra danger would be added by having the new
cluster schema restore run with fsync=off?

In any event, I think the documentation should caution that the
upgrade should not be deemed to be a success until after a system-wide
sync has been done.  Even if we use the link rather than copy method,
are we sure that that is safe if the directories recording those links
have not been fsynced?
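
As an alternative to a system-wide sync, the new data directory could
be walked and synced explicitly.  The following is a minimal sketch
under stated assumptions, not something pg_upgrade does: the function
name fsync_tree is hypothetical, and it assumes a POSIX system where a
directory can be opened read-only and passed to fsync.  It syncs every
regular file and then every directory, so that the directory entries
(including hard links) are also flushed.

```python
import os


def fsync_tree(root):
    """fsync every regular file and every directory under `root`.
    Returns the number of fsync calls made."""
    count = 0
    # Walk bottom-up so each directory is synced after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)
                count += 1
            finally:
                os.close(fd)
        # Sync the directory itself so its entries reach stable storage.
        dfd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(dfd)
            count += 1
        finally:
            os.close(dfd)
    return count
```

Calling fsync_tree on the new cluster's data directory after pg_upgrade
finishes, and before starting the cluster, would be the intended use.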

Cheers,

Jeff


