Re: Warn when parallel restoring a custom dump without data offsets - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: Warn when parallel restoring a custom dump without data offsets
Date
Msg-id 20200523224751.GP4472@telsasoft.com
Whole thread Raw
In response to Re: Warn when parallel restoring a custom dump without data offsets  (David Gilman <davidgilman1@gmail.com>)
Responses Re: Warn when parallel restoring a custom dump without data offsets
List pgsql-hackers
On Sat, May 23, 2020 at 03:54:30PM -0400, David Gilman wrote:
> I've rounded this patch out with a test and I've set up the commitfest
> website for this thread. The latest patches are attached and I think
> they are ready for review.

Thanks.  https://commitfest.postgresql.org/28/2568/
I'm not sure this will be considered a bugfix, since the behavior is known.
Maybe eligible for backpatch though (?)

Your patch was encoded, so this is failing:
http://cfbot.cputube.org/david-gilman.html

Ideally CFBOT would deal with that (maybe by using git-am - adding Thomas), but
I think you sent using gmail web interface, which also reordered the patches.
(CFBOT *does* sort them, but it's a known annoyance).

> dump file was written with data offsets pg_restore can seek directly to

offsets COMMA

> pg_restore would only find the TOC if it happened to be immediately

"immediately" is wrong, no ?  I thought the problem was if we seeked to D and
then looked for C, we wouldn't attempt to go backwards.

> read request only when restoring a custom dump file without data offsets.

remove "only"

> of a bunch of extra tiny reads when pg_restore starts up.

I would have thought to mention the seeks() ; but it's true that the read()s now
grow quadratically.  I did run a test, but I don't know how many objects would
be unreasonable or how many it'd take to show a problem.

Maybe we should avoid fseeko(0, SEEK_SET) unless we need to wrap around after
EOF - I'm not sure.

Maybe the cleanest way would be to pre-populate a structure with all the TOC
data and loop around that instead of seeking around the file ?  Can we use the
same structure as pg_dump ?

Otherwise, that makes me think of commit 42f70cd9c.  Make it's not a good
parallel or example for this case, though.

+        The custom archive format may not work with the <option>-j</option>
+        option if the archive was originally created by writing the archive
+        to an unseekable output file. For the best concurrent restoration

Can I suggest something like: pg_restore with parallel jobs may fail if the
archive dump was written to an unseekable output stream, like stdout.

+             * If the input file can't be seeked we're at the mercy of the

seeked COMMA

>Subject: [PATCH 3/3] Add integration test for out-of-order TOC requests in pg_restore

Well done - thanks for that.

>Also add undocumented --disable-seeking argument to pg_dump to emulate
>writing to an unseekable output file.

Remove "also".

Is it possible to dump to stdout (or pipe to cat or dd) to avoid a new option ?

Maybe that would involve changing the test process to use the shell (system() vs
execve()), or maybe you could write:

/* sh handles output redirection and arg splitting */
'sh', '-c', 'pg_dump -Fc -Z6 --no-sync --disable-seeking postgres >
$tempdir/defaults_custom_format_no_seek_parallel_restore.dump',

But I think that would need to then separately handle WIN32, so maybe it's not
worth it.



pgsql-hackers by date:

Previous
From: David Gilman
Date:
Subject: Re: Warn when parallel restoring a custom dump without data offsets
Next
From: Alvaro Herrera
Date:
Subject: Re: Adding missing object access hook invocations