Re: Warn when parallel restoring a custom dump without data offsets - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: Warn when parallel restoring a custom dump without data offsets |
Date | |
Msg-id | 20200523224751.GP4472@telsasoft.com Whole thread Raw |
In response to | Re: Warn when parallel restoring a custom dump without data offsets (David Gilman <davidgilman1@gmail.com>) |
Responses |
Re: Warn when parallel restoring a custom dump without data offsets
|
List | pgsql-hackers |
On Sat, May 23, 2020 at 03:54:30PM -0400, David Gilman wrote: > I've rounded this patch out with a test and I've set up the commitfest > website for this thread. The latest patches are attached and I think > they are ready for review. Thanks. https://commitfest.postgresql.org/28/2568/ I'm not sure this will be considered a bugfix, since the behavior is known. Maybe eligible for backpatch though (?) Your patch was encoded, so this is failing: http://cfbot.cputube.org/david-gilman.html Ideally CFBOT would deal with that (maybe by using git-am - adding Thomas), but I think you sent using gmail web interface, which also reordered the patches. (CFBOT *does* sort them, but it's a known annoyance). > dump file was written with data offsets pg_restore can seek directly to offsets COMMA > pg_restore would only find the TOC if it happened to be immediately "immediately" is wrong, no ? I thought the problem was if we seeked to D and then looked for C, we wouldn't attempt to go backwards. > read request only when restoring a custom dump file without data offsets. remove "only" > of a bunch of extra tiny reads when pg_restore starts up. I would have thought to mention the seeks() ; but it's true that the read()s now grow quadratically. I did run a test, but I don't know how many objects would be unreasonable or how many it'd take to show a problem. Maybe we should avoid fseeko(0, SEEK_SET) unless we need to wrap around after EOF - I'm not sure. Maybe the cleanest way would be to pre-populate a structure with all the TOC data and loop around that instead of seeking around the file ? Can we use the same structure as pg_dump ? Otherwise, that makes me think of commit 42f70cd9c. Make it's not a good parallel or example for this case, though. + The custom archive format may not work with the <option>-j</option> + option if the archive was originally created by writing the archive + to an unseekable output file. For the best concurrent restoration Can I suggest something like: pg_restore with parallel jobs may fail if the archive dump was written to an unseekable output stream, like stdout. + * If the input file can't be seeked we're at the mercy of the seeked COMMA >Subject: [PATCH 3/3] Add integration test for out-of-order TOC requests in pg_restore Well done - thanks for that. >Also add undocumented --disable-seeking argument to pg_dump to emulate >writing to an unseekable output file. Remove "also". Is it possible to dump to stdout (or pipe to cat or dd) to avoid a new option ? Maybe that would involve changing the test process to use the shell (system() vs execve()), or maybe you could write: /* sh handles output redirection and arg splitting */ 'sh', '-c', 'pg_dump -Fc -Z6 --no-sync --disable-seeking postgres > $tempdir/defaults_custom_format_no_seek_parallel_restore.dump', But I think that would need to then separately handle WIN32, so maybe it's not worth it.
pgsql-hackers by date: