Dimitrios Apostolou <jimis@gmx.net> writes:
> On Thu, 20 Mar 2025, Tom Lane wrote:
>> I am betting that the problem is that the dump's TOC (table of
>> contents) lacks offsets to the actual data of the database objects,
>> and thus the readers have to reconstruct that information by scanning
>> the dump file. Normally, pg_dump will back-fill offset data in the
>> TOC at completion of the dump, but if it's told to write to an
>> un-seekable output file then it cannot do that.
> Further questions:
> * Does the same happen in an uncompressed dump? Or maybe the offsets are
> pre-filled because they are predictable without compression?
Yes, the same happens; and no, the offsets are not pre-filled. We
don't know the size of a table's data as-dumped until we've dumped it,
compressed or not.
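
To illustrate the back-filling point, here is a toy sketch. It is not
pg_dump's actual archive format: the layout, OFFSET_UNKNOWN,
write_toy_archive() and read_toc() are all invented for illustration,
and the "seekable" flag is passed in by hand rather than detected.

```python
import io
import struct

OFFSET_UNKNOWN = 0xFFFFFFFFFFFFFFFF  # hypothetical "offset not set" marker


def write_toy_archive(out, tables, seekable):
    """Write a toy archive: a TOC of 8-byte offsets, then length-prefixed data.

    `tables` is a list of bytes objects standing in for each table's dumped
    data. `out` is assumed to start at position 0.
    """
    # The TOC goes out first, before any data size is known, so every
    # offset starts life as a placeholder.
    for _ in tables:
        out.write(struct.pack("<Q", OFFSET_UNKNOWN))

    offsets = []
    pos = 8 * len(tables)  # first data block starts right after the TOC
    for data in tables:
        offsets.append(pos)
        out.write(struct.pack("<I", len(data)))
        out.write(data)
        pos += 4 + len(data)

    # Only now are the real offsets known. Back-filling them requires
    # seeking backwards, which a pipe or similar output cannot do.
    if seekable:
        out.seek(0)
        for off in offsets:
            out.write(struct.pack("<Q", off))
        out.seek(0, io.SEEK_END)


def read_toc(raw, ntables):
    """Return the list of offsets stored in the TOC of a toy archive."""
    return [struct.unpack_from("<Q", raw, 8 * i)[0] for i in range(ntables)]
```

With seekable=False the TOC keeps its placeholders, and a reader is
left to reconstruct the offsets by scanning the data.
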
> * Should pg_dump print some warning for generating a lower quality format?
I don't think so. In many use cases this is irrelevant and the
warning would just be an annoyance.
> * The seeking pattern in pg_restore seems nonsensical to me: reading 4K,
> jumping 8-12K, repeat for the whole file? Consuming 15K IOPS for an
> hour. /Maybe/ something to improve there... Where can I read more about
> the format?
It's reading data blocks (or at least the headers thereof), which have
a limited size. I don't think that size has changed since circa 1999,
so maybe we could consider increasing it; but I doubt we could move
the needle very far that way.
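
A rough sketch of why locating data without offsets looks like "read a
little, skip a little". Again this is a made-up format, not the real
one: the (id, length) header, build_toy_file() and find_block() are
assumptions for illustration only.

```python
import io
import struct

HEADER = struct.Struct("<II")  # hypothetical header: (block id, payload length)


def build_toy_file(blocks):
    """Build an in-memory file of header+payload blocks from (id, bytes) pairs."""
    buf = io.BytesIO()
    for block_id, payload in blocks:
        buf.write(HEADER.pack(block_id, len(payload)))
        buf.write(payload)
    buf.seek(0)
    return buf


def find_block(f, wanted_id):
    """Return (offset, headers_read) for wanted_id, or (None, headers_read)."""
    f.seek(0)
    headers_read = 0
    while True:
        offset = f.tell()
        raw = f.read(HEADER.size)       # small read: just the header
        if len(raw) < HEADER.size:
            return None, headers_read   # hit end of file without a match
        headers_read += 1
        block_id, length = HEADER.unpack(raw)
        if block_id == wanted_id:
            return offset, headers_read
        f.seek(length, io.SEEK_CUR)     # skip the payload we don't want
```

The number of header reads grows with how far into the file the wanted
block sits, and the size of each skip is bounded by the block size, so
bigger blocks would mean fewer but not zero such reads.
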
> * Why doesn't it happen in single-process pg_restore?
A single-process restore is going to restore all the data in the order
it appears in the archive file, so no seeking is required. Of course,
as soon as you ask for parallelism, that no longer holds: each worker
has to go find the data for whichever table it was handed.
Hypothetically, maybe the algorithm for handing out tables-to-restore
to parallel workers could pay attention to the distance to the data
... except that in the problematic case we don't have that
information. I don't recall for sure, but I think that the order of
the TOC entries is not necessarily a usable proxy for the order of the
data entries. It's unclear to me that overriding the existing
heuristic (biggest tables first, I think) would be a win anyway.
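
For concreteness, the two orderings being compared might be sketched
like this. TocEntry, biggest_first() and nearest_first() are
hypothetical names, not pg_restore's code, and the fallback shown is
just one way to express "we don't have that information".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TocEntry:  # hypothetical stand-in for a TOC entry
    name: str
    size: int               # estimated table size
    offset: Optional[int]   # data offset, or None if the TOC lacks offsets


def biggest_first(entries):
    """Order entries by size, largest first."""
    return sorted(entries, key=lambda e: e.size, reverse=True)


def nearest_first(entries, current_pos):
    """Order entries by distance from current_pos in the file.

    If any offset is missing, distance cannot be computed, so fall
    back to biggest-first.
    """
    if any(e.offset is None for e in entries):
        return biggest_first(entries)
    return sorted(entries, key=lambda e: abs(e.offset - current_pos))
```
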
regards, tom lane