pgsql: Cope with data-offset-less archive files during out-of-order res - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Cope with data-offset-less archive files during out-of-order res
Date
Msg-id E1jwTmW-0005So-HX@gemulon.postgresql.org
List pgsql-committers
Cope with data-offset-less archive files during out-of-order restores.

pg_dump produces custom-format archive files that lack data offsets
when it is unable to seek its output.  Up to now that's been a hazard
for pg_restore.  But if pg_restore is able to seek in the archive
file, there is no reason to throw up our hands when asked to restore
data blocks out of order.  Instead, whenever we are searching for a
data block, record the locations of the blocks we passed over (that
is, fill in the missing data-offset fields in our in-memory copy of
the TOC data).  Then, when we hit a case that requires going
backwards, we can just seek back.
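
A minimal C sketch of this idea follows. It is not pg_restore's actual code. It assumes a hypothetical toy archive in which each data block is an int32 id and an int32 payload length followed by the payload, with no offsets stored in the file, plus an in-memory TOC whose offsets start out unknown (-1). The names `TocEntry`, `write_block`, `toc_lookup` and `find_data_block` are invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for fseeko()/ftello() and off_t */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical toy format: int32 id, int32 payload length, payload.
 * The file itself records no data offsets. */
typedef struct
{
    int32_t id;
    off_t   offset;     /* -1 until we learn where the block starts */
} TocEntry;

static void write_block(FILE *fp, int32_t id, const char *payload)
{
    int32_t hdr[2] = {id, (int32_t) strlen(payload)};

    fwrite(hdr, sizeof(int32_t), 2, fp);
    fwrite(payload, 1, strlen(payload), fp);
}

static TocEntry *toc_lookup(TocEntry *toc, int ntoc, int32_t id)
{
    for (int i = 0; i < ntoc; i++)
        if (toc[i].id == id)
            return &toc[i];
    return NULL;
}

/*
 * Leave fp at the payload of block `id` and return the payload length,
 * or -1 on failure. If the offset is already known, seek straight to it
 * (possibly backwards). Otherwise scan forward from the current position,
 * which is assumed to be a block boundary, recording the offset of every
 * header we pass over.
 */
static int32_t find_data_block(FILE *fp, TocEntry *toc, int ntoc, int32_t id)
{
    TocEntry *want;
    int32_t   hdr[2];

    assert(fp != NULL && toc != NULL);
    want = toc_lookup(toc, ntoc, id);
    if (want == NULL)
        return -1;
    if (want->offset >= 0 && fseeko(fp, want->offset, SEEK_SET) != 0)
        return -1;
    for (;;)
    {
        off_t     here = ftello(fp);
        TocEntry *seen;

        if (here < 0 || fread(hdr, sizeof(int32_t), 2, fp) != 2)
            return -1;                  /* I/O error or EOF */
        seen = toc_lookup(toc, ntoc, hdr[0]);
        if (seen != NULL && seen->offset < 0)
            seen->offset = here;        /* fill in the missing offset */
        if (hdr[0] == id)
            return hdr[1];
        if (fseeko(fp, hdr[1], SEEK_CUR) != 0)
            return -1;                  /* skip a payload we don't need */
    }
}
```

In this toy, asking for block 3 first records where blocks 1 and 2 start as a side effect, so a later request for block 1 costs a single backward `fseeko()` instead of being impossible.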

Also track the furthest point that we've searched to, and seek back
to there when beginning a search for a new data block.  This avoids
possible O(N^2) time consumption, by ensuring that each data block
is examined at most twice.  (On Unix systems, that's at most twice
per parallel-restore job; but since Windows uses threads here, the
threads can share block location knowledge, reducing the amount of
duplicated work.)
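
The furthest-point bookkeeping can be sketched the same way, again as a hypothetical toy rather than the real pg_backup_custom.c logic. It reuses the same invented block layout, and the toy file starts directly with data blocks. `Archive`, `scan_end` and the `headers_read` counter are invented names; the counter exists only to make the cost of each search visible.

```c
#define _POSIX_C_SOURCE 200809L  /* for fseeko()/ftello() and off_t */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical toy format: int32 id, int32 payload length, payload. */
typedef struct
{
    int32_t id;
    off_t   offset;             /* -1 until discovered */
} TocEntry;

typedef struct
{
    TocEntry *toc;
    int       ntoc;
    off_t     scan_end;         /* all blocks starting before here are recorded */
    long      headers_read;     /* instrumentation for this example only */
} Archive;

static void write_block(FILE *fp, int32_t id, const char *payload)
{
    int32_t hdr[2] = {id, (int32_t) strlen(payload)};

    fwrite(hdr, sizeof(int32_t), 2, fp);
    fwrite(payload, 1, strlen(payload), fp);
}

/* Leave fp at the payload of block `id`; return its length, or -1. */
static int32_t find_data_block(FILE *fp, Archive *ar, int32_t id)
{
    TocEntry *want = NULL;
    int32_t   hdr[2];

    assert(fp != NULL && ar != NULL);
    for (int i = 0; i < ar->ntoc; i++)
        if (ar->toc[i].id == id)
            want = &ar->toc[i];
    if (want == NULL)
        return -1;

    /*
     * A known block is one seek away. An unknown block cannot start before
     * scan_end, because everything before it has been scanned contiguously,
     * so resume the search there rather than re-reading passed headers.
     */
    if (fseeko(fp, want->offset >= 0 ? want->offset : ar->scan_end,
               SEEK_SET) != 0)
        return -1;
    for (;;)
    {
        off_t here = ftello(fp);
        off_t next;

        if (here < 0 || fread(hdr, sizeof(int32_t), 2, fp) != 2)
            return -1;                  /* I/O error or EOF */
        ar->headers_read++;
        for (int i = 0; i < ar->ntoc; i++)
            if (ar->toc[i].id == hdr[0] && ar->toc[i].offset < 0)
                ar->toc[i].offset = here;
        next = here + (off_t) sizeof hdr + hdr[1];
        if (next > ar->scan_end)
            ar->scan_end = next;        /* advance the furthest point */
        if (hdr[0] == id)
            return hdr[1];
        if (fseeko(fp, hdr[1], SEEK_CUR) != 0)
            return -1;
    }
}
```

With five blocks requested in the order 2, 1, 4, 3, 5, this sketch reads seven headers in total: blocks 1 and 3 are examined twice (once while being passed over, once when fetched), and the rest once. No search ever restarts from the beginning of the data.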

We can also improve the code a bit by using fseeko() to skip over
data blocks during the search.
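
A hypothetical helper contrasting the two ways of skipping a block's payload is sketched below. `skip_data` and `input_is_seekable` are invented names, and the read-and-discard branch merely stands in for what any reader must do when its input cannot seek; POSIX specifies that `fseeko()` fails with `ESPIPE` on a pipe, FIFO or socket, which is what the probe relies on.

```c
#define _POSIX_C_SOURCE 200809L  /* for fseeko() and off_t */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical probe: a no-op seek fails (ESPIPE) on pipes, FIFOs, sockets. */
static int input_is_seekable(FILE *fp)
{
    return fseeko(fp, 0, SEEK_CUR) == 0;
}

/*
 * Skip n bytes of block data. On a seekable input this is one fseeko()
 * call regardless of n; otherwise the bytes must be read and discarded.
 */
static int skip_data(FILE *fp, off_t n, int seekable)
{
    char buf[4096];

    assert(n >= 0);
    if (seekable)
        return fseeko(fp, n, SEEK_CUR) == 0 ? 0 : -1;
    while (n > 0)
    {
        size_t chunk = n > (off_t) sizeof buf ? sizeof buf : (size_t) n;

        if (fread(buf, 1, chunk, fp) != chunk)
            return -1;          /* short read: EOF or error */
        n -= (off_t) chunk;
    }
    return 0;
}
```

The seek path avoids pulling every skipped byte through a buffer. Note that, unlike the read path, `fseeko()` does not report an error when asked to move past end-of-file; the next header read is what detects a truncated archive.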

This is all of some use even in simple restores, but it's really
significant for parallel pg_restore.  In that case, we require
seekability of the input already, and we will very probably need
to do out-of-order restores.

Back-patch to v12, as this fixes a regression introduced by commit
548e50976.  Before that, parallel restore avoided requesting
out-of-order restores, so it would work on a data-offset-less
archive.  Now it will again.

Ideally this patch would include some test coverage, but there are
other open bugs that need to be fixed before we can extend our
coverage of parallel restore very much.  Plan to revisit that later.

David Gilman and Tom Lane; reviewed by Justin Pryzby

Discussion: https://postgr.es/m/CALBH9DDuJ+scZc4MEvw5uO-=vRyR2=QF9+Yh=3hPEnKHWfS81A@mail.gmail.com

Branch
------
REL_13_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/71e8e66f783c143b04d381566d7a2a61e8d7598f

Modified Files
--------------
doc/src/sgml/ref/pg_restore.sgml   |  15 ++--
src/bin/pg_dump/pg_backup_custom.c | 136 +++++++++++++++++++++++++++++--------
2 files changed, 117 insertions(+), 34 deletions(-)

