Thread: Possible alpha5 SR bug

Possible alpha5 SR bug

From
Jeff Davis
Date:
During the testing day organized a week ago, Quinn
Weaver ran into what looks like a problem. I have attached the log output at
the end of this email. Note that he was running on a Mac, but replicating
from a Linux machine (both 64-bit). I know this is not a supported
configuration, but a segfault seems like a problem anyway.

Quinn helpfully provided a tarball of his data directory here:

http://fairpath.com/QuinnPgBug.tar.gz

and described his machine:

"My machine is a Mac with an Intel Core 2 Duo processor (64-bit)
running Mac OS X 10.6.3.  It has 2 GB of RAM, which should be plenty
for the config we used."

I was trying to sort this bug out somewhat before posting, but we
weren't able to reproduce it (it happened near the end of testing, and
people were leaving), and I didn't have much chance to investigate in
the last week.

Regards,
    Jeff Davis

postgres@tao:/usr/local/pgsql-9.0alpha5-build1/data/data9.0$ ../../bin/postmaster -D
/usr/local/pgsql-9.0alpha5-build1/data/data9.0
LOG:  database system was interrupted; last known up at 2010-04-03 16:55:20 PDT
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG:  entering standby mode
LOG:  redo starts at 0/BC0000B8
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG:  unexpected pageaddr 0/9B000000 in log file 0, segment 189, offset 0
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
FATAL:  could not load library "/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so":
dlopen(/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so,10): Library not loaded:
/usr/local/pgsql/lib/libpq.5.dylib
          Referenced from: /usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so
          Reason: no suitable image found.  Did find:
                /Users/quinn/lib/libpq.5.dylib: stat() failed with errno=13
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG:  unexpected pageaddr 0/9B000000 in log file 0, segment 189, offset 0
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG:  WAL receiver process (PID 1011) was terminated by signal 11: Segmentation fault
LOG:  terminating any other active server processes
postgres@tao:/usr/local/pgsql-9.0alpha5-build1/data/data9.0$ fg
bash: fg: current: no such job

Re: Possible alpha5 SR bug

From
Fujii Masao
Date:
Thanks for the test and report!

On Tue, Apr 13, 2010 at 1:36 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> FATAL:  could not load library "/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so":
> dlopen(/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so,10): Library not loaded:
> /usr/local/pgsql/lib/libpq.5.dylib
>          Referenced from: /usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so
>          Reason: no suitable image found.  Did find:
>                /Users/quinn/lib/libpq.5.dylib: stat() failed with errno=13

Seems to have failed in loading libpq.5.dylib. I guess that "errno=13" means
"permission denied". Please ensure that the permission is appropriate.
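For reference, errno 13 is EACCES on both Linux and Mac OS X, and its standard message is "Permission denied". The following is only a minimal sketch of how a failed stat() can be turned into readable text; the helper names stat_errno and errno_text are hypothetical and are not part of PostgreSQL or the dynamic loader.

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 0 if stat() succeeds on the path,
 * otherwise the errno value that stat() set.
 */
static int
stat_errno(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == 0)
        return 0;
    return errno;
}

/* Hypothetical helper: human-readable text for an errno value. */
static const char *
errno_text(int err)
{
    return strerror(err);
}
```

Calling errno_text(stat_errno("/Users/quinn/lib/libpq.5.dylib")) as the postgres user would be one way to confirm whether that user can actually reach the file.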

> LOG:  WAL receiver process (PID 1011) was terminated by signal 11: Segmentation fault

Oops! I guess that this happened because walrcv_disconnect() was called in
WalRcvDie() even though libpqwalreceiver.so couldn't be loaded. The attached
patch would fix the problem.
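The failure mode described above can be sketched in isolation. This is not the attached patch, only an illustration under assumptions: walrcv_disconnect is modeled as a function pointer that stays NULL until the library is loaded, and load_walreceiver_library, fake_disconnect, and wal_rcv_die are hypothetical stand-ins for the real loading and exit-time code.

```c
#include <stddef.h>

/* Assumed shape of the hook; the real PostgreSQL declaration may differ. */
typedef void (*walrcv_disconnect_type) (void);

/* Stays NULL until the library has been loaded and has filled it in. */
static walrcv_disconnect_type walrcv_disconnect = NULL;

static int disconnect_calls = 0;

/* Hypothetical stand-in for the function the library would provide. */
static void
fake_disconnect(void)
{
    disconnect_calls++;
}

/*
 * Hypothetical loader: returns -1 to simulate a failed dlopen(),
 * otherwise installs the hook and returns 0.
 */
static int
load_walreceiver_library(int simulate_failure)
{
    if (simulate_failure)
        return -1;
    walrcv_disconnect = fake_disconnect;
    return 0;
}

/*
 * Exit-time cleanup in the spirit of WalRcvDie(). Calling through a NULL
 * pointer would crash, so the hook is invoked only if it was installed.
 * Returns 1 if the hook was called, 0 if it was skipped.
 */
static int
wal_rcv_die(void)
{
    if (walrcv_disconnect != NULL)
    {
        walrcv_disconnect();
        return 1;
    }
    return 0;
}
```

Without the NULL check, the cleanup path after a failed library load would jump through an unset pointer, which would be consistent with the signal 11 seen in the log.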

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Possible alpha5 SR bug

From
Magnus Hagander
Date:
On Tue, Apr 13, 2010 at 08:22, Fujii Masao <masao.fujii@gmail.com> wrote:
> Thanks for the test and report!
>
> On Tue, Apr 13, 2010 at 1:36 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> FATAL:  could not load library "/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so": dlopen(/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so, 10): Library not loaded: /usr/local/pgsql/lib/libpq.5.dylib
>>          Referenced from: /usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so
>>          Reason: no suitable image found.  Did find:
>>                /Users/quinn/lib/libpq.5.dylib: stat() failed with errno=13
>
> Seems to have failed in loading libpq.5.dylib. I guess that "errno=13" means
> "permission denied". Please ensure that the permission is appropriate.
>
>> LOG:  WAL receiver process (PID 1011) was terminated by signal 11: Segmentation fault
>
> Oops! I guess that this happened because walrcv_disconnect() was called in
> WalRcvDie() even though libpqwalreceiver.so couldn't be loaded. The attached
> patch would fix the problem.

Applied, thanks.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/