Parallel pg_restore versus old dump files - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Parallel pg_restore versus old dump files |
Date | |
Msg-id | 7040.1277237238@sss.pgh.pa.us |
Responses |
Re: Parallel pg_restore versus old dump files
Re: Parallel pg_restore versus old dump files |
List | pgsql-hackers |
I've dug into the problem reported by Igor Neyman:
http://archives.postgresql.org/pgsql-admin/2010-06/msg00148.php

Unlike previous complainants, Igor was kind enough to supply a pg_dump archive file that triggers the problem. What I find is that his dump file contains no data offsets, ie, dataState == K_OFFSET_POS_NOT_SET for every TABLE DATA item. This causes _PrintTocData to take the same path taken for a non-seekable input file, ie, search forward looking for the desired item. In a parallel restore, all threads will start from the same file location, right after the last serially-restored item. Therefore, of course every one of them fails, except for the one told to process the very first parallel-restore item.

The reason the dump file contains no offsets is that pg_dump can't write them unless it thinks the dump file is seekable *at dump time* --- otherwise it can't rewind to modify the dump's table of contents. And guess what: pre-8.4 pg_dump on Windows will NEVER believe that the output file is seekable, because we didn't bother to define HAVE_FSEEKO in the Windows port until 8.4.

In short, parallel pg_restore is guaranteed to fail on any input file made with a pre-8.4 pg_dump on Windows.

It may be that there's some other mechanism involved in the reports we've gotten of parallel restore failing only some of the time, but I'm thinking that the heretofore unrecognized dependency on pg_dump-time seekability could well explain those too.

I see several action items here:

1. The error message emitted by _PrintTocData is incredibly misleading. It needs to be fixed to tell people if the problem is lack of data offsets rather than lack of seek capability.

2. The reason that _PrintTocData thinks it's an error to hit a restorable data item other than the one it wants is that, lacking seek capability, there'd be no way to rewind to get at that data item later. However, this is only an issue in serial restore. In a parallel restore worker thread, we're not going to need to seek back on that file pointer anyway, so we should just allow the code to continue forward. There seem to be two plausible ways of implementing that:

   * Just skip the error test altogether if in a worker child.
   * Modify the error test so that the only data item considered "wanted" is the specific one the current worker wants.

   The existing parallel restore logic in pg_backup_archiver.c doesn't appear to export enough state to allow either of these strategies to be implemented. In the Unix implementation I'd be inclined to export the state by creating a suitable static variable, but that's not going to work in the thread-based Windows code. It looks like we'd need some thread-local storage, which the current code hasn't got any of.

   Another possibility is to just remove the inside-the-loop error test altogether: make it just skip till it finds the desired item, and only throw an error if it hits EOF without finding it. In the case that the error test is trying to catch, this would mean significantly more work done before reporting the error, but do we really care? I'm leaning to this solution because it would not require exporting state from the parallel restore control logic.

3. Perhaps pg_dump ought to emit a warning when it can't seek, instead of just silently not writing the data offsets. That behavior was okay before when lack of data offsets didn't really matter that much, but lack of data offsets is a serious performance handicap for parallel restore even after we fix the outright failure condition (because each worker is going to read through a lot of data to find what it needs).

4. Is there any value in back-porting the Windows FSEEKO support into 8.3 and 8.2? Arguably, not writing the data offsets is a performance bug. However a back-port won't do anything for people who are dumping with less than the latest minor release of pg_dump, so doing this might be largely wasted effort.

Comments?
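To make the difference in item 2 concrete, here is a rough standalone sketch of the forward scan with and without the inside-the-loop error test. This is not the actual _PrintTocData code: the Block struct, the ScanResult enum and scan_for_item() are all made-up names for illustration, and an in-memory array stands in for the blocks of an archive that has no data offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one data block in an archive without offsets. */
typedef struct
{
    int dump_id;     /* id of the TOC entry this block belongs to */
    int restorable;  /* nonzero if the restore would want this block's data */
} Block;

typedef enum
{
    SCAN_FOUND,          /* reached the wanted block */
    SCAN_HIT_OTHER_ITEM, /* strict mode: passed over data we could not rewind to */
    SCAN_EOF             /* ran off the end without finding the wanted block */
} ScanResult;

/*
 * Scan forward from *pos looking for wanted_id.
 *
 * strict != 0 models the existing behavior: hitting any restorable block
 * other than the wanted one is treated as an error.
 * strict == 0 models the "just skip" proposal: keep going, and fail only
 * if EOF is reached without finding the item.
 *
 * On SCAN_FOUND, *pos is the index of the matching block.
 */
ScanResult
scan_for_item(const Block *blocks, size_t nblocks, size_t *pos,
              int wanted_id, int strict)
{
    assert(blocks != NULL && pos != NULL);

    for (; *pos < nblocks; (*pos)++)
    {
        const Block *b = &blocks[*pos];

        if (b->dump_id == wanted_id)
            return SCAN_FOUND;
        if (strict && b->restorable)
            return SCAN_HIT_OTHER_ITEM;
        /* otherwise skip this block's data and keep reading */
    }
    return SCAN_EOF;
}
```

With three restorable blocks in file order, a worker that wants the third one fails in strict mode on the very first block it reads, but finds its item in the non-strict mode. An item that isn't in the file at all is reported only after the whole remainder has been read, which is the extra work mentioned in item 2.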
regards, tom lane