Hello again,
I traced the seek-and-read behaviour of parallel pg_restore inside
_skipData() when it is called from _PrintTocData(). Since most of today's
I/O devices (both rotating and solid state) can read 1MB sequentially
faster than they can seek and read 4KB, I tried the following change:
diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index 55107b20058..262ba509829 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -618,31 +618,31 @@ _skipLOs(ArchiveHandle *AH)
  * Skip data from current file position.
  * Data blocks are formatted as an integer length, followed by data.
  * A zero length indicates the end of the block.
  */
 static void
 _skipData(ArchiveHandle *AH)
 {
 	lclContext *ctx = (lclContext *) AH->formatData;
 	size_t		blkLen;
 	char	   *buf = NULL;
 	int			buflen = 0;
 
 	blkLen = ReadInt(AH);
 	while (blkLen != 0)
 	{
-		if (ctx->hasSeek)
+		if (ctx->hasSeek && blkLen > 1024 * 1024)
 		{
 			if (fseeko(AH->FH, blkLen, SEEK_CUR) != 0)
 				pg_fatal("error during file seek: %m");
 		}
 		else
 		{
 			if (blkLen > buflen)
 			{
 				free(buf);
 				buf = (char *) pg_malloc(blkLen);
 				buflen = blkLen;
 			}
 			if (fread(buf, 1, blkLen, AH->FH) != blkLen)
 			{
 				if (feof(AH->FH))
This simple change greatly speeds up (perhaps 10x, depending on the number
of workers) the offset-table building phase of the parallel restore.
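For anyone who wants to play with the idea outside pg_restore, here is a
minimal standalone sketch of the same skip strategy: seek over blocks larger
than 1MB, read and discard smaller ones. It assumes a simplified,
hypothetical block format (a native-endian uint32_t length before each
block, a zero length as terminator) rather than the real ReadInt()
encoding, and skip_blocks(), write_blocks() and SkipStats are made-up names
for illustration only.

```c
#define _POSIX_C_SOURCE 200809L	/* for fseeko() and off_t */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Seek only over blocks larger than this, as in the patch. */
#define SEEK_THRESHOLD ((size_t) 1024 * 1024)

/* Hypothetical counters, only to show which path each block took. */
typedef struct
{
	int			seeks;			/* blocks skipped with fseeko() */
	int			reads;			/* blocks read and discarded */
} SkipStats;

/*
 * Skip zero-terminated, length-prefixed blocks.  Hypothetical format: each
 * length is a native-endian uint32_t, not pg_dump's ReadInt() encoding.
 * Returns 0 on success, -1 on a read, seek or allocation error.
 */
static int
skip_blocks(FILE *fp, int has_seek, SkipStats *stats)
{
	uint32_t	len;
	char	   *buf = NULL;
	size_t		buflen = 0;

	stats->seeks = stats->reads = 0;
	for (;;)
	{
		if (fread(&len, sizeof(len), 1, fp) != 1)
			goto fail;
		if (len == 0)
			break;				/* end-of-data marker */
		if (has_seek && len > SEEK_THRESHOLD)
		{
			/* Large block: one seek is cheaper than reading it all. */
			if (fseeko(fp, (off_t) len, SEEK_CUR) != 0)
				goto fail;
			stats->seeks++;
		}
		else
		{
			/* Small block: read it sequentially and throw it away. */
			if (len > buflen)
			{
				free(buf);
				buf = malloc(len);
				if (buf == NULL)
					goto fail;
				buflen = len;
			}
			if (fread(buf, 1, len, fp) != len)
				goto fail;
			stats->reads++;
		}
	}
	free(buf);
	return 0;

fail:
	free(buf);
	return -1;
}

/* Demo helper (hypothetical): write blocks of zero bytes, then a terminator. */
static int
write_blocks(FILE *fp, const uint32_t *lens, int n)
{
	const uint32_t zero = 0;

	for (int i = 0; i < n; i++)
	{
		if (fwrite(&lens[i], sizeof(lens[i]), 1, fp) != 1)
			return -1;
		for (uint32_t j = 0; j < lens[i]; j++)
			if (fputc(0, fp) == EOF)
				return -1;
	}
	return fwrite(&zero, sizeof(zero), 1, fp) == 1 ? 0 : -1;
}
```

With blocks of 4KB, 2MB and 4KB written by write_blocks(), calling
skip_blocks() with has_seek set reports one seek and two reads; with
has_seek clear, all three blocks are read.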
One remaining problem is that this offset-table building phase runs in
every worker process, so all workers scan the whole archive, almost in
parallel. A more intrusive improvement would be to move this phase to the
parent process, before spawning the children.
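Roughly, the parent-side variant could look like the sketch below: one
sequential pass records where each entry's data starts, and a worker then
seeks straight to its entry instead of rescanning the archive. Everything
here is hypothetical: the archive layout (a uint32_t entry id followed by
zero-terminated, length-prefixed blocks), OffsetEntry,
build_offset_table(), entry_data_size() and write_entry() are illustrative
names, not pg_dump code. In pg_restore each worker would presumably inherit
the table from the parent across fork() and use its own file handle; the
sketch reuses a single FILE for brevity.

```c
#define _POSIX_C_SOURCE 200809L	/* for fseeko(), ftello() and off_t */
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical table row: where an entry's first block length starts. */
typedef struct
{
	uint32_t	id;
	off_t		data_pos;
} OffsetEntry;

/* Skip one entry's blocks by seeking; returns 0 on success, -1 on error. */
static int
skip_entry_blocks(FILE *fp)
{
	uint32_t	len;

	for (;;)
	{
		if (fread(&len, sizeof(len), 1, fp) != 1)
			return -1;
		if (len == 0)
			return 0;
		if (fseeko(fp, (off_t) len, SEEK_CUR) != 0)
			return -1;
	}
}

/*
 * Single pass over the whole archive, meant to run once in the parent
 * before workers start.  Returns the number of entries, or -1 on error.
 */
static int
build_offset_table(FILE *fp, OffsetEntry *table, int max_entries)
{
	uint32_t	id;
	int			n = 0;

	if (fseeko(fp, 0, SEEK_SET) != 0)
		return -1;
	while (fread(&id, sizeof(id), 1, fp) == 1)
	{
		if (n == max_entries)
			return -1;
		table[n].id = id;
		table[n].data_pos = ftello(fp);
		if (table[n].data_pos < 0 || skip_entry_blocks(fp) != 0)
			return -1;
		n++;
	}
	return ferror(fp) ? -1 : n;
}

/*
 * Worker side: jump directly to an entry using the shared table and sum
 * its block lengths.  Returns -1 if the id is unknown or on error.
 */
static long
entry_data_size(FILE *fp, const OffsetEntry *table, int n, uint32_t id)
{
	for (int i = 0; i < n; i++)
	{
		uint32_t	len;
		long		total = 0;

		if (table[i].id != id)
			continue;
		if (fseeko(fp, table[i].data_pos, SEEK_SET) != 0)
			return -1;
		while (fread(&len, sizeof(len), 1, fp) == 1 && len != 0)
		{
			total += len;
			if (fseeko(fp, (off_t) len, SEEK_CUR) != 0)
				return -1;
		}
		return (ferror(fp) || feof(fp)) ? -1 : total;
	}
	return -1;
}

/* Demo helper (hypothetical): write one entry whose blocks are zero bytes. */
static int
write_entry(FILE *fp, uint32_t id, const uint32_t *lens, int nblocks)
{
	const uint32_t zero = 0;

	if (fwrite(&id, sizeof(id), 1, fp) != 1)
		return -1;
	for (int i = 0; i < nblocks; i++)
	{
		if (fwrite(&lens[i], sizeof(lens[i]), 1, fp) != 1)
			return -1;
		for (uint32_t j = 0; j < lens[i]; j++)
			if (fputc(0, fp) == EOF)
				return -1;
	}
	return fwrite(&zero, sizeof(zero), 1, fp) == 1 ? 0 : -1;
}
```

The point of the design is that the archive is scanned once in total,
instead of once per worker.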
What do you think?
Regards,
Dimitris
P.S. I also have a simple patch, written for debugging purposes, that
changes the -j1 switch to mean "parallel, but with one worker process".
Not sure if it is of interest here.