Re: Speed up pg_checksums in cases where checksum already set - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Speed up pg_checksums in cases where checksum already set
Date
Msg-id YK8BH74lQNdb8Ro+@paquier.xyz
In response to Speed up pg_checksums in cases where checksum already set  (Greg Sabino Mullane <htamfids@gmail.com>)
Responses Re: Speed up pg_checksums in cases where checksum already set
List pgsql-hackers
On Wed, May 26, 2021 at 05:23:55PM -0400, Greg Sabino Mullane wrote:
> The attached patch makes an optimization to pg_checksums which prevents
> rewriting the block if the checksum is already what we expect. This can
> lead to much faster runs in cases where it is already set (e.g. enabled ->
> disabled -> enable, external helper process, interrupted runs, future
> parallel processes).

Makes sense.
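
For readers following along, here is a minimal sketch of the idea as I understand it: recompute the checksum for each block, and only rewrite the block when the stored value differs. This is not the patch itself. ToyPage, toy_checksum(), ScanStats and enable_checksums_in_file() are hypothetical names made up for illustration, and the checksum is a simple stand-in rather than pg_checksum_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 64		/* toy value; a real cluster uses BLCKSZ */

/* Hypothetical page layout: stored checksum followed by the payload. */
typedef struct ToyPage
{
	uint16_t	checksum;
	uint8_t		data[TOY_BLOCK_SIZE - sizeof(uint16_t)];
} ToyPage;

/* Hypothetical counters, mirroring the "scanned" and "modified" output. */
typedef struct ScanStats
{
	long		blocks_scanned;
	long		blocks_modified;
} ScanStats;

/* Stand-in checksum mixing in the block number; NOT pg_checksum_page(). */
static uint16_t
toy_checksum(const ToyPage *page, uint32_t blkno)
{
	uint32_t	a = 1;
	uint32_t	b = blkno % 251;

	for (size_t i = 0; i < sizeof(page->data); i++)
	{
		a = (a + page->data[i]) % 251;
		b = (b + a) % 251;
	}
	return (uint16_t) ((b << 8) | a);
}

/*
 * Walk the blocks of one in-memory "file".  A block is rewritten only
 * when its stored checksum differs from the expected one.  Returns true
 * if at least one block was modified.
 */
static bool
enable_checksums_in_file(ToyPage *pages, uint32_t nblocks, ScanStats *stats)
{
	bool		modified = false;

	assert(pages != NULL && stats != NULL);

	for (uint32_t blkno = 0; blkno < nblocks; blkno++)
	{
		uint16_t	expected = toy_checksum(&pages[blkno], blkno);

		stats->blocks_scanned++;

		if (pages[blkno].checksum == expected)
			continue;			/* already correct, skip the write */

		/* A real tool would write the block back to disk here. */
		pages[blkno].checksum = expected;
		stats->blocks_modified++;
		modified = true;
	}
	return modified;
}
```

Running the function twice over the same pages shows the effect: the first pass modifies every block whose checksum is stale, while the second pass scans the same blocks and modifies none.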

> There is also an effort to not sync the data directory
> if no changes were written. Finally, added a bit more output on how many
> files were actually changed, e.g.:

-    if (do_sync)
+    if (do_sync && total_files_modified)
     {
         pg_log_info("syncing data directory");
         fsync_pgdata(DataDir, PG_VERSION_NUM);

Here, I am on the fence.  It could be an advantage to force a flush of
the data folder anyway, no?  Say, all the pages have a correct
checksum and they are in the OS cache, but they may not have been
flushed yet.  That would emulate what initdb -S does already.

> Checksum operation completed
> Files scanned:   1236
> Blocks scanned:  23283
> Files modified:  38
> Blocks modified: 19194
> pg_checksums: syncing data directory
> pg_checksums: updating control file
> Checksums enabled in cluster

The addition of the number of files modified looks like an advantage.
--
Michael
