Re: parallel pg_restore blocks on heavy random read I/O on all children processes - Mailing list pgsql-performance

From Dimitrios Apostolou
Subject Re: parallel pg_restore blocks on heavy random read I/O on all children processes
Msg-id 1cbb9bd6-60cd-92cb-c3c2-4cf4fd8a7b64@gmx.net
In response to Re: parallel pg_restore blocks on heavy random read I/O on all children processes  (Dimitrios Apostolou <jimis@gmx.net>)
Responses Re: parallel pg_restore blocks on heavy random read I/O on all children processes
List pgsql-performance
Hello again,

I traced the seek-and-read behaviour of parallel pg_restore to
_skipData(), when called from _PrintTocData(). Since most of today's I/O
devices (both rotating and solid state) can read 1MB sequentially faster
than they can seek and read 4KB, I tried the following change:

diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index 55107b20058..262ba509829 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -618,31 +618,31 @@ _skipLOs(ArchiveHandle *AH)
   * Skip data from current file position.
   * Data blocks are formatted as an integer length, followed by data.
   * A zero length indicates the end of the block.
  */
  static void
  _skipData(ArchiveHandle *AH)
  {
         lclContext *ctx = (lclContext *) AH->formatData;
         size_t          blkLen;
         char       *buf = NULL;
         int                     buflen = 0;

         blkLen = ReadInt(AH);
         while (blkLen != 0)
         {
-               if (ctx->hasSeek)
+               if (ctx->hasSeek && blkLen > 1024 * 1024)
                 {
                         if (fseeko(AH->FH, blkLen, SEEK_CUR) != 0)
                                 pg_fatal("error during file seek: %m");
                 }
                 else
                 {
                         if (blkLen > buflen)
                         {
                                 free(buf);
                                 buf = (char *) pg_malloc(blkLen);
                                 buflen = blkLen;
                         }
                         if (fread(buf, 1, blkLen, AH->FH) != blkLen)
                         {
                                 if (feof(AH->FH))


This simple change speeds up the offset-table building phase of the
parallel restore immensely (maybe 10x, depending on the number of workers).
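
For readers without the PostgreSQL tree at hand, here is a minimal
standalone sketch of the same idea: seek only past chunks larger than a
threshold, and read-and-discard the rest. It is not the pg_dump code. The
names skip_data(), read_len(), SkipStats and SEEK_THRESHOLD are
hypothetical, and the 4-byte little-endian length is a simplified stand-in
for what ReadInt() actually parses. It uses plain fseek() instead of
fseeko() to stay within standard C.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cutoff mirroring the 1MB threshold in the patch above. */
#define SEEK_THRESHOLD ((size_t) 1024 * 1024)

/* Counters so a caller can see which path each chunk took. */
typedef struct SkipStats
{
    long seeks;   /* chunks skipped with fseek() */
    long reads;   /* chunks skipped by reading and discarding */
} SkipStats;

/*
 * Simplified stand-in for ReadInt(): a 4-byte little-endian length.
 * Returns 0 on success, -1 on a short read.
 */
static int
read_len(FILE *fp, size_t *len)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *len = (size_t) b[0] | ((size_t) b[1] << 8) |
           ((size_t) b[2] << 16) | ((size_t) b[3] << 24);
    return 0;
}

/*
 * Skip one data block: a sequence of (length, payload) chunks ended by a
 * zero length. Chunks larger than 'threshold' are skipped with a seek,
 * smaller ones are read sequentially and thrown away.
 * Returns 0 on success, -1 on I/O error, truncation or out of memory.
 */
static int
skip_data(FILE *fp, int has_seek, size_t threshold, SkipStats *stats)
{
    char   *buf = NULL;
    size_t  buflen = 0;
    size_t  len;
    int     rc = -1;

    assert(stats != NULL);

    for (;;)
    {
        if (read_len(fp, &len) != 0)
            break;              /* truncated input */
        if (len == 0)
        {
            rc = 0;             /* terminator reached */
            break;
        }
        if (has_seek && len > threshold)
        {
            if (fseek(fp, (long) len, SEEK_CUR) != 0)
                break;
            stats->seeks++;
        }
        else
        {
            if (len > buflen)
            {
                char *nbuf = realloc(buf, len);

                if (nbuf == NULL)
                    break;
                buf = nbuf;
                buflen = len;
            }
            if (fread(buf, 1, len, fp) != len)
                break;
            stats->reads++;
        }
    }
    free(buf);
    return rc;
}
```

A caller would invoke it as skip_data(fp, 1, SEEK_THRESHOLD, &stats). With
small chunks the file is then consumed strictly sequentially, which lets
the kernel's readahead do its job.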

A remaining problem is that this offset-table building phase runs in
every worker process, so all workers scan the whole archive at almost the
same time. A more intrusive improvement would be to move this phase to the
parent process, before it spawns the children.
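
To illustrate what "move this phase to the parent" could look like, here
is a rough sketch that scans a file once and records where each data block
starts. It is only an assumption about the shape of such a change, not a
patch against pg_restore: build_offset_table() and read_len() are
hypothetical names, and the file layout (back-to-back blocks of 4-byte
little-endian length plus payload, each ended by a zero length) is a
simplification of the custom archive format.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Simplified stand-in for ReadInt(): a 4-byte little-endian length.
 * Returns 1 on success, 0 on clean end of file, -1 on a partial read.
 */
static int
read_len(FILE *fp, size_t *len)
{
    unsigned char b[4];
    size_t        got = fread(b, 1, sizeof b, fp);

    if (got == 0 && feof(fp))
        return 0;
    if (got != sizeof b)
        return -1;
    *len = (size_t) b[0] | ((size_t) b[1] << 8) |
           ((size_t) b[2] << 16) | ((size_t) b[3] << 24);
    return 1;
}

/*
 * Scan the whole file once and return a malloc'd array holding the start
 * offset of every data block; the number of blocks goes into *count.
 * Returns NULL on error. The caller frees the array.
 */
static long *
build_offset_table(FILE *fp, size_t *count)
{
    size_t  cap = 8;
    size_t  n = 0;
    long   *offsets = malloc(cap * sizeof *offsets);

    assert(count != NULL);
    *count = 0;
    if (offsets == NULL)
        return NULL;

    for (;;)
    {
        long    start = ftell(fp);
        size_t  len;
        int     rc = read_len(fp, &len);

        if (start < 0 || rc < 0)
            goto fail;
        if (rc == 0)
            break;              /* clean end of file: scan complete */

        if (n == cap)
        {
            long *grown = realloc(offsets, 2 * cap * sizeof *offsets);

            if (grown == NULL)
                goto fail;
            offsets = grown;
            cap *= 2;
        }
        offsets[n++] = start;

        /* Walk the chunks of this block up to its zero-length terminator. */
        while (len != 0)
        {
            if (fseek(fp, (long) len, SEEK_CUR) != 0)
                goto fail;
            if (read_len(fp, &len) != 1)
                goto fail;
        }
    }
    *count = n;
    return offsets;

fail:
    free(offsets);
    return NULL;
}
```

Assuming the workers are forked child processes, the parent would call
this once before forking, and each child would inherit the finished table
and seek directly to its blocks instead of rescanning the archive.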

What do you think?

Regards,
Dimitris


P.S. I also have a small patch that makes the -j1 switch mean "parallel,
but with one worker process", which I wrote for debugging purposes. Not
sure if it is of interest here.


