Solaris 10u9, PG 8.4.6, 'c' lang function, fails on 1 of 5 servers - Mailing list pgsql-general

From dennis jenkins
Subject Solaris 10u9, PG 8.4.6, 'c' lang function, fails on 1 of 5 servers
Msg-id CAAEzAp8JTS=3uprEQh=ZJS5qXrSZDGuwKjkXnq1xkaXB2KLuRw@mail.gmail.com
List pgsql-general
Hello Postgresql Community Members,

    I am stumped trying to install a few 'c' language functions
on a particular Solaris server (64-bit, AMD CPU architecture, not
SPARC).  I actually have 5 PostgreSQL servers, and the .so loads fine
into 4 of them but refuses to load into the 5th.  I've
quintuple-checked the file permissions, the build of the .so, gcc
versions, PostgreSQL versions, etc.  I've had a colleague double-check
my work.  We're both stumped.  Details to follow.

    All servers are running Solaris 10u9 on 64-bit hardware inside
Solaris zones.  Two of the servers are X4270s with 144 GB RAM and 24
Intel CPU cores.  These two servers host the 4 working Solaris zones
that are able to load the function implemented in the .so files.
PostgreSQL is version 8.4.6, compiled from source (not a binary
package).

    The server that is misbehaving is an X4600 with 128 GB RAM and 16
AMD CPU cores, but is otherwise identical: Solaris 10u9, 64-bit OS,
PostgreSQL 8.4.6.  All 5 systems use the stock gcc that ships with
Solaris (v3.4.3; it's old, I know).

    Here are the permissions on the files and PostgreSQL directories,
first on a working server, then on the server that is not working as
expected.

(root@working: </db>) # ls -ld /db /db/*.so
drwx------  11 pgsql    root          23 Sep 27 10:39 /db
-rwxr-xr-x   1 root     root       57440 Sep 27 10:39 /db/pgsql_micr_parser_64.so

(root@working: </db>) # psql -Upgsql -dpostgres -c"select version();"
 PostgreSQL 8.4.6 on x86_64-pc-solaris2.11, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-20050802), 64-bit

(root@working: </db>) # file /opt/local/x64/postgresql-8.4.6/bin/postgres
/opt/local/x64/postgresql-8.4.6/bin/postgres:   ELF 64-bit LSB executable AMD64 Version 1 [SSE], dynamically linked, not stripped

(root@working: </db>) # psql -Upgsql -dmy_db -c"create or replace function parse_micr(text) returns micr_struct
  as '/db/pgsql_micr_parser_64.so', 'pgsql_micr_parser' language c volatile cost 1;"
CREATE FUNCTION

(root@working: </db>) # psql -Upgsql -dmy_db -t -c"select transit from parse_micr(':8888=8888: <45800=100<');"
 8888=8888



(root@failed: </db>) # ls -ld /db /db/*.so
drwx------  11 pgsql    root          24 Sep 29 11:16 /db
-rwxr-xr-x   1 root     root       57440 Sep 29 09:46 /db/pgsql_micr_parser_64.so

(root@failed: </db>) # psql -Upgsql -dpostgres -c"select version();"
 PostgreSQL 8.4.6 on x86_64-pc-solaris2.11, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-20050802), 64-bit

(root@failed: </db>) # file /opt/local/x64/postgresql-8.4.6/bin/postgres
/opt/local/x64/postgresql-8.4.6/bin/postgres:   ELF 64-bit LSB executable AMD64 Version 1 [SSE], dynamically linked, not stripped

(root@failed: </db>) #  psql -Upgsql -dmy_db -c"create or replace function parse_micr(text) returns micr_struct
 as '/db/pgsql_micr_parser_64.so', 'pgsql_micr_parser' language c volatile cost 1;"
ERROR:  could not load library "/db/pgsql_micr_parser_64.so": ld.so.1: postgres: fatal: /db/pgsql_micr_parser_64.so: Permission denied
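The listings above show /db as mode 0700 owned by pgsql and the .so as mode 0755 owned by root, so the pgsql user should be able to reach and read the file. A minimal C sketch of checking that programmatically, assuming it is compiled and run as the pgsql user inside the zone in question: check_path_access is a hypothetical helper (not part of PostgreSQL) that calls access(2) for search permission on each parent directory and then for the requested mode on the file itself. Note that access() checks the real UID/GID, which is why it should be run as pgsql.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define CHECK_PATH_MAX 4096  /* hypothetical buffer limit for this sketch */

/* Returns 0 if the caller's real UID/GID can search (X_OK) every parent
 * directory of `path` and has `mode` access (e.g. R_OK | X_OK) to `path`
 * itself; otherwise returns the errno of the first access() that failed. */
int check_path_access(const char *path, int mode)
{
    char buf[CHECK_PATH_MAX];
    size_t len = strlen(path);
    char *p;

    if (len == 0)
        return EINVAL;
    if (len >= sizeof buf)
        return ENAMETOOLONG;
    memcpy(buf, path, len + 1);

    /* Walk each '/' after the first character: "/db/x.so" checks "/db". */
    for (p = buf + 1; *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';                    /* truncate to the directory prefix */
        if (access(buf, X_OK) != 0)
            return errno;             /* no search permission (or missing) */
        *p = '/';                     /* restore and keep walking */
    }

    /* Finally, the requested access on the file itself. */
    return access(buf, mode) == 0 ? 0 : errno;
}
```

For example, check_path_access("/db/pgsql_micr_parser_64.so", R_OK | X_OK) returning 0 would agree with what the ls listings suggest.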



  Ok.  Well, the file permissions are correct, so what gives?  Next
step is to trace the backend process as it attempts to load the .so. 
So I connect to the "failed" server via pgAdmin and run "select pg_backend_pid();".
I then run "truss -p <PID>" from my shell and, in pgAdmin, execute the
SQL to create the function.  This is the resulting system call trace:

(root@failed: </db>) # truss -p 10369
recv(9, 0x0097C103, 5, 0)       (sleeping...)
recv(9, "170301\0  ", 5, 0)                     = 5
recv(9, " TBEE5 n J\0 VF6E4DDCF84".., 32, 0)    = 32
recv(9, "170301\0B0", 5, 0)                     = 5
recv(9, "AAD5A5 L97B0CEA5A9F0CD89".., 176, 0)   = 176
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9520) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9530) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF8F50) = 0
resolvepath("/db/pgsql_micr_parser_64.so", "/db/pgsql_micr_parser_64.so", 1023) = 27
open("/db/pgsql_micr_parser_64.so", O_RDONLY)   = 22
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 22, 0) Err#13 EACCES
close(22)                                       = 0
setcontext(0xFFFFFD7FFFDF9050)
setcontext(0xFFFFFD7FFFDF9BB0)


    We can see that the backend is able to open the .so file for
reading (fd 22), but the first mmap() fails with EACCES.  From the
Solaris man page on mmap:

ERRORS
     The mmap() function will fail if:

     EACCES          The fildes file descriptor is not  open  for
                     read,  regardless  of  the protection speci-
                     fied; or fildes is not open  for  write  and
                     PROT_WRITE  was  specified  for a MAP_SHARED
                     type mapping.


My analysis:

    1) The file descriptor (#22) is open for O_RDONLY.
    2) PROT_WRITE and MAP_SHARED are not specified, so write access is not relevant.
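The failing call can be isolated from PostgreSQL with a small standalone program. A minimal sketch, assuming it is run as the pgsql user inside the affected zone: probe_map is a hypothetical helper that opens the file O_RDONLY and attempts a MAP_PRIVATE mapping with a caller-chosen protection. This is similar to ld.so.1's first mmap in the truss output, but it omits the Solaris-specific MAP_ALIGN flag and maps the whole file rather than 32768 bytes.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical diagnostic helper: open `path` read-only, then try a
 * MAP_PRIVATE mapping of the whole file with protection `prot`.
 * Returns 0 if the mapping succeeds, otherwise the errno of the
 * first call that failed (open, fstat or mmap). */
int probe_map(const char *path, int prot)
{
    struct stat st;
    void *addr;
    int err = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return errno;

    if (fstat(fd, &st) != 0) {
        err = errno;
    } else if (st.st_size == 0) {
        err = EINVAL;               /* mmap() rejects zero-length mappings */
    } else {
        /* Same prot/flags shape as ld.so.1's first mmap in the trace,
         * minus the Solaris-specific MAP_ALIGN. */
        addr = mmap(NULL, (size_t)st.st_size, prot, MAP_PRIVATE, fd, 0);
        if (addr == MAP_FAILED)
            err = errno;
        else
            munmap(addr, (size_t)st.st_size);
    }

    close(fd);                      /* err was captured before close() */
    return err;
}
```

Calling it first with PROT_READ and then with PROT_READ | PROT_EXEC on /db/pgsql_micr_parser_64.so separates the two cases. If only the second call returns EACCES, the denial is specific to executable mappings rather than to reading the file. If the standalone program reproduces EACCES at all, the problem lies below PostgreSQL.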
   
   
    Things that I tried, unsuccessfully:
   
1) I recompiled the .so on the target system (X4600, AMD chips) just
   in case it is somehow different from the .so that got built on the
   working system (X4270, Intel chips).
  
2) Tested with a different .so (I have another that implements forward
   and reverse DNS lookups, so one may invoke DNS functions inside SQL
   statements).  Same behavior.  Loads fine on the X4270 systems, but
   fails on the X4600 system.
  
3) Compiled both .so's on 32-bit and 64-bit Gentoo Linux and loaded
   them into PostgreSQL 9.0.4.  Works fine.
  
4) Compiled both .so's on 64-bit Solaris 10u9 with PostgreSQL 9.1 on
   an X4270, and they load fine there too.

5) Examined a truss on a working system while loading the function. 
   Since it loaded fine already, I had to drop the function, then
   disconnect pgAdmin (to make the backend exit), reconnect and redo
   the "create function":
  
(root@working: </db>) # truss -p 16921
## (I elided a bunch of irrelevant grovelling through the FSM mapped file)
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9520) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9530) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF8F50) = 0
resolvepath("/db/pgsql_micr_parser_64.so", "/db/pgsql_micr_parser_64.so", 1023) = 27
open("/db/pgsql_micr_parser_64.so", O_RDONLY)   = 22
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 22, 0) = 0xFFFFFD7FFED80000
mmap(0x00010000, 90112, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, 4294967295, 0) = 0xFFFFFD7FFED00000
mmap(0xFFFFFD7FFED00000, 21997, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 22, 0) = 0xFFFFFD7FFED00000
mmap(0xFFFFFD7FFED15000, 2576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 22, 20480) = 0xFFFFFD7FFED15000
munmap(0xFFFFFD7FFED06000, 61440)               = 0
memcntl(0xFFFFFD7FFED00000, 7008, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(22)                                       = 0

6) There is nothing interesting in dmesg or syslog.

7) Disconnected and reconnected a few times to get a freshly
   launched backend.  No luck.
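The working trace in item 5 shows what ld.so.1 does once its first mapping succeeds. It reserves address space with an anonymous PROT_NONE mapping (the fd printed as 4294967295 is -1 viewed as an unsigned 32-bit value), maps the text and data segments over that reservation with MAP_FIXED, and then munmaps the 61440-byte gap between them. A simplified C sketch of the reserve-then-overlay step follows. It assumes a single read-only segment at file offset 0. reserve_and_map is a hypothetical name, and this is not the runtime linker's actual code.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose MAP_ANON and MAP_NORESERVE */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Hypothetical sketch of the reserve-then-overlay pattern:
 *   1. reserve `reserve_len` bytes of address space with an anonymous,
 *      PROT_NONE, MAP_NORESERVE mapping (fd -1);
 *   2. map the first `seg_len` bytes of `path` at the start of that
 *      reservation with MAP_FIXED and protection `seg_prot`;
 *   3. unmap the whole range again.
 * Returns 0 on success, otherwise the errno of the first failing call. */
int reserve_and_map(const char *path, size_t reserve_len,
                    size_t seg_len, int seg_prot)
{
    void *base, *seg;
    int err = 0;
    int fd;

    /* The segment must fit inside the reservation, or MAP_FIXED would
     * clobber whatever happens to be mapped after it. */
    if (seg_len == 0 || seg_len > reserve_len)
        return EINVAL;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    base = mmap(NULL, reserve_len, PROT_NONE,
                MAP_PRIVATE | MAP_ANON | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED) {
        err = errno;
        close(fd);
        return err;
    }

    /* MAP_FIXED replaces the reserved pages at `base` with file pages. */
    seg = mmap(base, seg_len, seg_prot, MAP_PRIVATE | MAP_FIXED, fd, 0);
    if (seg == MAP_FAILED)
        err = errno;

    /* One munmap drops both the file segment and the rest of the reservation. */
    munmap(base, reserve_len);
    close(fd);
    return err;
}
```

Reserving first guarantees that every segment lands at a fixed offset from one base address without overwriting unrelated mappings. This sketch covers only the reservation and the first overlay, not the separate data segment (MAP_INITDATA) or the gap munmap.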

Any thoughts or suggestions?
