Re: [PATCH v4] parallel pg_restore: avoid disk seeks when jumping short distance forward - Mailing list pgsql-hackers

From Dimitrios Apostolou
Subject Re: [PATCH v4] parallel pg_restore: avoid disk seeks when jumping short distance forward
Date
Msg-id 9o35ps41-040p-n471-qoos-6q8648594768@tzk.arg
In response to Re: [PATCH v4] parallel pg_restore: avoid disk seeks when jumping short distance forward  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCH v4] parallel pg_restore: avoid disk seeks when jumping short distance forward
List pgsql-hackers
On Tuesday 2025-10-21 00:15, Tom Lane wrote:

>> So for me, the proposed patch actually makes it 2X slower.
>
> I went and tried this same test case on a 2024 Mac Mini M4 Pro.
> Cutting to the chase:
>
> HEAD:
>
> $ time pg_restore -f /dev/null -t zedtable bench10000.dump
>
> real    1m26.525s
> user    0m0.364s
> sys     0m6.806s
>
> Patched:
>
> $ time pg_restore -f /dev/null -t zedtable bench10000.dump
>
> real    0m15.419s
> user    0m0.279s
> sys     0m8.224s
>
> So on this hardware it *does* win (although maybe things would
> be different for a parallel restore).  The patched pg_restore
> takes just about the same amount of time as "cat", and iostat
> shows both of them reaching a bit more than 6GB/s read speed.
>
> My feeling at this point is that we'd probably drop the block
> size test as irrelevant, and instead simply ignore ctx->hasSeek
> within this loop if we think we're on a platform where that's
> the right thing.  But how do we figure that out?
>
> Not sure where we go from here, but clearly a bunch of research
> is going to be needed to decide whether this is committable.

pg_dump files from before your latest fix still exist, and they
possibly contain a block header every 30 bytes (or however wide the
table rows are). A patch in pg_restore would vastly improve this use
case.

May I suggest the attached patch, which replaces fseeko() with fread()
when the forward distance is 32KB or less? It sounds rather improbable
that this would make things worse, but maybe it's possible to generate
a dump file with 32KB-wide rows and try restoring it on various
hardware?

If this too is controversial, then we can reduce the threshold to 4KB,
which is the buffer size glibc's stdio uses internally. By using the
same value in the patch, we avoid all the repeated lseek(same-offset)
calls in between the 4KB reads. This should be a strict gain, with no
downsides.
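
For anyone wanting to check the syscall pattern, something like the
following (on Linux, reusing your test case from above) should show
the lseek traffic disappearing once short skips go through fread():

$ strace -c -e trace=read,lseek pg_restore -f /dev/null -t zedtable bench10000.dump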



Dimitris

Attachment
