Thread: BUG #16147: postgresql 12.1 (from homebrew) - pg_restore -h localhost --jobs=2 crashes
BUG #16147: postgresql 12.1 (from homebrew) - pg_restore -h localhost --jobs=2 crashes
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 16147
Logged by: Bill Tihen
Email address: btihen@gmail.com
PostgreSQL version: 12.1
Operating system: MacOS 10.15.1
Description:

The following command crashes with every database I've tried (both large and small):
`pg_restore -U wti0405 -d stage3 -h localhost --jobs=8 -Fc database_12_04-01-00.bak -x`

I get the error others have associated with a locking problem in earlier versions:
`pg_restore: error: a worker process died unexpectedly`

Interestingly, both of the following commands work:
`pg_restore -U wti0405 -d stage3 --jobs=8 -Fc database_12_04-01-00.bak -x`
and
`pg_restore -U wti0405 -d stage3 -h localhost -Fc database_12_04-01-00.bak -x`

So it appears this is only a problem when using parallelization over a TCP connection rather than a local socket.
Re: BUG #16147: postgresql 12.1 (from homebrew) - pg_restore -h localhost --jobs=2 crashes
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> The following bug has been logged on the website:
> Bug reference: 16147
> Logged by: Bill Tihen
> Email address: btihen@gmail.com
> PostgreSQL version: 12.1
> Operating system: MacOS 10.15.1
> Description:
> The following command crashes with any database I've tried (both large and
> small) DBs:
> `pg_restore -U wti0405 -d stage3 -h localhost --jobs=8 -Fc
> database_12_04-01-00.bak -x`

I failed to reproduce this on my own 10.15.1 laptop, using manual builds of either HEAD or the v12 branch. Plausible reasons for the difference in results might include:

* There's something different about the homebrew build (could we see the output of pg_config?)

* There's something unusual about your configuration (one thought that comes to mind: do you have SSL turned on for localhost connections?)

* There's something about the data in this specific database (your report that it happens for multiple databases puts a crimp in this idea, though maybe they all share a common feature)

Anyway, we need more info to investigate. You might try looking into the server log to see what the failure looks like from that side --- is there a query error, or just the worker disconnecting unexpectedly?

regards, tom lane
Re: BUG #16147: postgresql 12.1 (from homebrew) - pg_restore -h localhost --jobs=2 crashes
From
PikachuEXE
Date:
I encountered a similar issue, with the following differences from the OP:

- Operating system: MacOS 10.14.6
- PostgreSQL (both server and pg_restore) installed via Postgres.app https://github.com/PostgresApp/PostgresApp/releases/tag/v2.3.2
  pg_restore's path is /Applications/Postgres.app/Contents/Versions/latest/bin/pg_restore

Same as the OP: if the number of jobs is 1, or the host is set to a Unix socket, the error "pg_restore: error: a worker process died unexpectedly" is not raised.

However, this error doesn't happen with the Ubuntu 18.04 package of PostgreSQL ( https://www.ubuntuupdates.org/package/postgresql/bionic-pgdg/main/base/postgresql-12 ).

So it might or might not be a build issue with the MacOS version, though it seems unlikely that both distribution channels (Homebrew and Postgres.app) would share the same build problem.

Anyway, I have also reported this to Postgres.app: https://github.com/PostgresApp/PostgresApp/issues/538

-- Sent from: https://www.postgresql-archive.org/PostgreSQL-bugs-f2117394.html
Re: BUG #16147: postgresql 12.1 (from homebrew) - pg_restore -h localhost --jobs=2 crashes
From
sailor
Date:
Hello,

It doesn't work for me either, even on PG version 12.2.

OS: Ubuntu 18.04.4 LTS
postgresql-12 12.2-1.pgdg18.04+1 amd64
pg_dump (PostgreSQL) 12.2 (Ubuntu 12.2-1.pgdg18.04+1)
pg_restore (PostgreSQL) 12.2 (Ubuntu 12.2-1.pgdg18.04+1)

pg_restore: error: could not find block ID 11550 in archive -- possibly due to out-of-order restore request, which cannot be handled due to lack of data offsets in archive
pg_restore: error: could not find block ID 11546 in archive -- possibly due to out-of-order restore request, which cannot be handled due to lack of data offsets in archive
pg_restore: error: a worker process died unexpectedly

However, it works with --jobs=1.
Re: BUG #16147: postgresql 12.1 (from homebrew) - pg_restore -h localhost --jobs=2 crashes
From
David Gilman
Date:
I did some digging here. The issue still exists in the latest git master. A git bisect pointed me to commit 548e50976ce721b5e927d42a105c2f05b51b52a6, which did touch the pg_restore work. The bug doesn't seem to be in the pg_dump -Fc format: I was using a dump made by an older version to do the bisect, and it could still trigger the issue in recent PostgreSQL.

I have noticed this issue on several Macs. A Google search also turns up people complaining about it on Mac:
https://dba.stackexchange.com/questions/257398/pg-restore-with-jobs-flag-results-in-pg-restore-error-a-worker-process-di
https://github.com/thoughtbot/parity/issues/175
mentioning things like Homebrew and Postgres.app that are only on Mac. The recommendation seems to be to go back to pg_restore from postgresql 11.

I tried a bit of sleuthing here. I've attached a file toc_contents.txt, which is the pg_restore -l output of my dump file with some relevant bits grepped. I've attached a file printf_output.txt that has the output of some manual printfs shoved into pg_restore to try to see what's going on. Note that I dropped all but the last few restore_toc_entry calls. And finally I attached printf.patch so you can see what I'm logging.

The issue seems to be that pg_restore is trying to find a TOC entry that was earlier in the file than the current offset, never finds it, and fails with the "possibly due to out-of-order restore request" error. I've run pg_restore with these patches a few times now and it always fails in the same way (it never seeks backward in the file) but in different places each time.

Tom, if you or anyone else with PostgreSQL would appreciate the pg_dump file, I can send it to you out of band; it's only a few megabytes. I have pg_restore with debug symbols too if you want me to try anything.
On Fri, May 15, 2020 at 9:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> PG Bug reporting form <noreply@postgresql.org> writes:
> > The following bug has been logged on the website:
> > Bug reference: 16147
> > Logged by: Bill Tihen
> > Email address: btihen@gmail.com
> > PostgreSQL version: 12.1
> > Operating system: MacOS 10.15.1
> > Description:
> >
> > The following command crashes with any database I've tried (both large and
> > small) DBs:
> > `pg_restore -U wti0405 -d stage3 -h localhost --jobs=8 -Fc
> > database_12_04-01-00.bak -x`
>
> I failed to reproduce this on my own 10.15.1 laptop, using manual
> builds of either HEAD or the v12 branch. Plausible reasons for
> the difference in results might include:
>
> * There's something different about the homebrew build (could we
> see the output of pg_config?)
>
> * There's something unusual about your configuration (one thought
> that comes to mind: do you have SSL turned on for localhost
> connections?)
>
> * There's something about the data in this specific database
> (your report that it happens for multiple databases puts a crimp
> in this idea, though maybe they all share a common feature)
>
> Anyway, we need more info to investigate. You might try looking
> into the server log to see what the failure looks like from that
> side --- is there a query error, or just the worker disconnecting
> unexpectedly?
>
> regards, tom lane

--
David Gilman :DG<
Attachment
Re: BUG #16147: postgresql 12.1 (from homebrew) - pg_restore -h localhost --jobs=2 crashes
From
David Gilman
Date:
It looks like there is a whole bunch of problems here:

- Bill Tihen's original post about restoring over sockets
- sailor's post about the "out-of-order restore request" failure
- David Zhang's crash inside of GSS

I was running into the "out-of-order restore request" issue. My post above was the first half of my investigation and was incorrect.

The "out-of-order restore request" error comes from custom-format dumps that were written to a non-seekable file descriptor. They silently lack certain metadata and can't be used for parallel restores in PostgreSQL 12 or later. If you're running into "out-of-order restore request" errors, try running pg_dump with the -f flag to force it to create a dump file with the correct metadata. If you can't change how the dumps are made, you can't use parallel pg_restore with them.

There isn't a code fix for this problem, since everything is working as expected, but I've sent in a patch to help document the behavior.
https://www.postgresql.org/message-id/CALBH9DDuJ%2BscZc4MEvw5uO-%3DvRyR2%3DQF9%2BYh%3D3hPEnKHWfS81A%40mail.gmail.com

On Mon, May 18, 2020 at 10:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Thu, Mar 05, 2020 at 07:53:35PM -0800, David Zhang wrote:
> > I can reproduce this pg_restore crash issue (pg_dump crash too when running
> > with multiple jobs) on MacOS 10.14 Mojave and MacOS 10.15 Catalina using
> > following steps.
>
> Isn't this the same as here?
> https://www.postgresql.org/message-id/flat/16041-b44f9931ad91fc3d%40postgresql.org
> ..concluding that macos library fails after forking.
>
> ..I found that via: https://github.com/PostgresApp/PostgresApp/issues/538
> => https://www.postgresql.org/message-id/1575881854624-0.post%40n3.nabble.com
>
> --
> Justin

--
David Gilman :DG<
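
For readers hitting the "out-of-order restore request" error, the dump-side workaround described in this message might look roughly like the sketch below. It is only an illustration: the function names, the database name and the file names are hypothetical, and piping through `cat` is an assumed stand-in for any non-seekable destination (a pipe to a compressor, ssh, and so on). Only the pg_dump options `-Fc` and `-f` come from the thread.

```shell
# Sketch only: function names and arguments are hypothetical.
# Usage: dump_seekable DBNAME OUTFILE
dump_seekable() {
    # With -f, pg_dump opens the output file itself, so the descriptor is a
    # regular (seekable) file and the archive can carry its data offsets.
    pg_dump -Fc -f "$2" "$1"
}

# Usage: dump_via_pipe DBNAME OUTFILE
dump_via_pipe() {
    # Here pg_dump writes to a pipe, which is not seekable. Per the message
    # above, such an archive silently lacks the metadata that a parallel
    # restore in PostgreSQL 12 or later relies on.
    pg_dump -Fc "$1" | cat > "$2"
}
```

With these definitions, `dump_seekable stage3 stage3.dump` followed by `pg_restore -d stage3 --jobs=8 stage3.dump` would be the combination expected to work, while an archive produced by `dump_via_pipe` would have to be restored with `--jobs=1`, as sailor reports above.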