Thread: BUG #16147: postgresql 12.1 (from homebrew) - pg_restore -h localhost --jobs=2 crashes

From: PG Bug reporting form

The following bug has been logged on the website:

Bug reference:      16147
Logged by:          Bill Tihen
Email address:      btihen@gmail.com
PostgreSQL version: 12.1
Operating system:   MacOS 10.15.1
Description:

The following command crashes with every database I've tried (both large
and small):
`pg_restore -U wti0405 -d stage3 -h localhost --jobs=8 -Fc database_12_04-01-00.bak -x`

I get the error others have associated with a locking problem in earlier
versions:
`pg_restore: error: a worker process died unexpectedly`

Interestingly, both of the following commands work:
`pg_restore -U wti0405 -d stage3 --jobs=8 -Fc database_12_04-01-00.bak -x`
and
`pg_restore -U wti0405 -d stage3 -h localhost -Fc database_12_04-01-00.bak -x`

So it appears this is only a problem when combining parallel jobs with a
TCP connection rather than a local socket.


PG Bug reporting form <noreply@postgresql.org> writes:
> The following bug has been logged on the website:
> Bug reference:      16147
> Logged by:          Bill Tihen
> Email address:      btihen@gmail.com
> PostgreSQL version: 12.1
> Operating system:   MacOS 10.15.1
> Description:        

> The following command crashes with any database I've tried (both large and
> small) DBs:
> `pg_restore -U wti0405 -d stage3 -h localhost --jobs=8 -Fc
> database_12_04-01-00.bak -x`

I failed to reproduce this on my own 10.15.1 laptop, using manual
builds of either HEAD or the v12 branch.  Plausible reasons for
the difference in results might include:

* There's something different about the homebrew build (could we
see the output of pg_config?)

* There's something unusual about your configuration (one thought
that comes to mind: do you have SSL turned on for localhost
connections?)

* There's something about the data in this specific database
(your report that it happens for multiple databases puts a crimp
in this idea, though maybe they all share a common feature)

Anyway, we need more info to investigate.  You might try looking
into the server log to see what the failure looks like from that
side --- is there a query error, or just the worker disconnecting
unexpectedly?

            regards, tom lane



I encountered a similar issue, with the following differences from the OP:

- Operating system:   MacOS 10.14.6
- PostgreSQL (both server & pg_restore) installed via Postgres.app
https://github.com/PostgresApp/PostgresApp/releases/tag/v2.3.2

pg_restore's path is
/Applications/Postgres.app/Contents/Versions/latest/bin/pg_restore

As with the OP, the error "pg_restore: error: a worker process died
unexpectedly" is not raised if the number of jobs is 1 or if the host is
set to a Unix socket.

However, this error doesn't happen with the Ubuntu 18.04 package
(https://www.ubuntuupdates.org/package/postgresql/bionic-pgdg/main/base/postgresql-12).
So it might or might not be a build issue specific to the MacOS version,
though it would be unusual for both distribution channels (Homebrew and
Postgres.app) to share the same build problem.

Anyway, I have also reported this to Postgres.app:
https://github.com/PostgresApp/PostgresApp/issues/538



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-bugs-f2117394.html



Hello,
It doesn't work for me either, even on PG version 12.2.

OS: Ubuntu 18.04.4 LTS
postgresql-12  12.2-1.pgdg18.04+1  amd64
pg_dump (PostgreSQL) 12.2 (Ubuntu 12.2-1.pgdg18.04+1)
pg_restore (PostgreSQL) 12.2 (Ubuntu 12.2-1.pgdg18.04+1)

pg_restore: error: could not find block ID 11550 in archive -- possibly due
to out-of-order restore request, which cannot be handled due to lack of data
offsets in archive
pg_restore: error: could not find block ID 11546 in archive -- possibly due
to out-of-order restore request, which cannot be handled due to lack of data
offsets in archive
pg_restore: error: a worker process died unexpectedly

However, it works with --jobs=1 






I did some digging here. The issue still exists in the latest
git master.  A git bisect pointed me to commit
548e50976ce721b5e927d42a105c2f05b51b52a6, which did touch
pg_restore. The bug doesn't seem to be with the pg_dump -Fc
format, as I was using an older dump version to do the bisect and it
could still trigger the issue in recent PostgreSQL. I have noticed
this issue on several Macs. A Google search also turns up
people complaining about it on Mac:


https://dba.stackexchange.com/questions/257398/pg-restore-with-jobs-flag-results-in-pg-restore-error-a-worker-process-di

https://github.com/thoughtbot/parity/issues/175

mentioning things like Homebrew and Postgres.app that are only on Mac.
The recommendation seems to be to go back to pg_restore from
postgresql 11.

I tried a bit of sleuthing here.  I've attached a file
toc_contents.txt which is the pg_restore -l output of my dump file
with some relevant bits grepped.  I've attached a file
printf_output.txt that has some manual printfs shoved into pg_restore
to try and see what's going on. Note that I dropped all but the last
few restore_toc_entry calls. And finally I attached printf.patch so
you can see what I'm logging.  The issue seems to be that pg_restore
is trying to find a TOC that was earlier in the file than the current
offset, never finds that TOC and fails with the "possibly due to
out-of-order restore request". I've run pg_restore with these patches
a few times now and it always fails in the same way (never seeks up in
the file) but in different places each time.
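
The failure mode described above can be illustrated with a toy model. This
is only a sketch in Python, not pg_restore's actual code: the
`ForwardOnlyArchive` class, its method name, and the block IDs are all
invented for illustration. It assumes a reader that has no recorded offsets
and can therefore only scan forward from its current position.

```python
from typing import List, Optional


class ForwardOnlyArchive:
    """Toy stand-in for an archive read without any recorded data offsets."""

    def __init__(self, block_ids: List[int]) -> None:
        # Block IDs in the order they physically appear in the file.
        self.block_ids = block_ids
        # Index of the next block the reader will look at.
        self.position = 0

    def find_block(self, wanted: int) -> Optional[int]:
        """Scan forward for `wanted`; return its index, or None if not found.

        The reader never moves backwards, so a block that lies before the
        current position can no longer be found.
        """
        while self.position < len(self.block_ids):
            index = self.position
            self.position += 1
            if self.block_ids[index] == wanted:
                return index
        return None
```

With blocks stored in the order 10, 20, 30, 40, asking for 30 and then for
20 fails on the second request, because the reader is already past block 20
and never seeks back.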

Tom, if you or anyone else with PostgreSQL would appreciate the
pg_dump file I can send it to you out of band, it's only a few
megabytes. I have pg_restore with debug symbols too if you want me to
try anything.


On Fri, May 15, 2020 at 9:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> PG Bug reporting form <noreply@postgresql.org> writes:
> > The following bug has been logged on the website:
> > Bug reference:      16147
> > Logged by:          Bill Tihen
> > Email address:      btihen@gmail.com
> > PostgreSQL version: 12.1
> > Operating system:   MacOS 10.15.1
> > Description:
>
> > The following command crashes with any database I've tried (both large and
> > small) DBs:
> > `pg_restore -U wti0405 -d stage3 -h localhost --jobs=8 -Fc
> > database_12_04-01-00.bak -x`
>
> I failed to reproduce this on my own 10.15.1 laptop, using manual
> builds of either HEAD or the v12 branch.  Plausible reasons for
> the difference in results might include:
>
> * There's something different about the homebrew build (could we
> see the output of pg_config?)
>
> * There's something unusual about your configuration (one thought
> that comes to mind: do you have SSL turned on for localhost
> connections?)
>
> * There's something about the data in this specific database
> (your report that it happens for multiple databases puts a crimp
> in this idea, though maybe they all share a common feature)
>
> Anyway, we need more info to investigate.  You might try looking
> into the server log to see what the failure looks like from that
> side --- is there a query error, or just the worker disconnecting
> unexpectedly?
>
>                         regards, tom lane


-- 
David Gilman
:DG<

It looks like there are a whole bunch of problems here:

- Bill Tihen's original post about restoring over sockets
- sailor's post about "out-of-order restore request" failure
- David Zhang's crash inside of GSS

I was running into the "out-of-order restore request" issue. My post
above was the first half of my investigation and was incorrect. The
"out-of-order restore request" error comes from custom-format dumps that
were written to a non-seekable file descriptor. They silently lack certain
metadata (the data offsets that the error message refers to) and can't be
used for parallel restores in PostgreSQL 12 or later. If you're running
into "out-of-order restore request" errors, try running pg_dump with the
-f flag to force it to create a dump file with the correct metadata. If
you can't change how the dumps are made, you can't use parallel pg_restore
with them. There isn't a code fix for this problem, since everything is
working as expected, but I've sent in a patch to help document the
behavior:
https://www.postgresql.org/message-id/CALBH9DDuJ%2BscZc4MEvw5uO-%3DvRyR2%3DQF9%2BYh%3D3hPEnKHWfS81A%40mail.gmail.com
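
Whether a descriptor is seekable is decided by the operating system, and it
can be checked directly. The sketch below is Python, not anything taken from
pg_dump; `is_seekable` and `demo` are hypothetical helper names. It assumes
a POSIX-like system where `lseek` fails on a pipe and succeeds on a regular
file.

```python
import os
import tempfile


def is_seekable(fd: int) -> bool:
    """Return True if the descriptor accepts lseek, False otherwise."""
    try:
        # Asking for the current offset does not move the file position.
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError:
        # Pipes and sockets reject lseek.
        return False


def demo() -> dict:
    """Compare a pipe with a regular temporary file."""
    read_end, write_end = os.pipe()
    try:
        with tempfile.TemporaryFile() as regular:
            return {
                "pipe": is_seekable(write_end),
                "file": is_seekable(regular.fileno()),
            }
    finally:
        os.close(read_end)
        os.close(write_end)
```

A writer that finds its output is not seekable cannot go back and fill in
offsets after the data has been written, which is consistent with the
missing metadata described above.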


On Mon, May 18, 2020 at 10:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Thu, Mar 05, 2020 at 07:53:35PM -0800, David Zhang wrote:
> > I can reproduce this pg_restore crash issue (pg_dump crash too when running
> > with multiple jobs) on MacOS 10.14 Mojave and MacOS 10.15 Catalina using
> > following steps.
>
> Isn't this the same as here?
> https://www.postgresql.org/message-id/flat/16041-b44f9931ad91fc3d%40postgresql.org
> ..concluding that macos library fails after forking.
>
> ..I found that via: https://github.com/PostgresApp/PostgresApp/issues/538
> => https://www.postgresql.org/message-id/1575881854624-0.post%40n3.nabble.com
>
> --
> Justin


-- 
David Gilman
:DG<