Re: [HACKERS] plperl intial pass - Mailing list pgsql-hackers

From wieck@debis.com (Jan Wieck)
Subject Re: [HACKERS] plperl intial pass
Msg-id m118tke-0003kvC@orion.SAPserv.Hamburg.dsh.de
In response to Re: [HACKERS] plperl intial pass  ("Mark Hollomon" <mhh@nortelnetworks.com>)
List pgsql-hackers
Mark Hollomon wrote:

> >     A  dynamically  loadable  Tcl  module  contains  one  special
> >     function  named  <libname>_Init()  where  first  character of
> >     libname is capitalized.  On dynamic load,  this  function  is
> >     called  with  the  invoking  interpreter  as  argument.  This
> >     function then calls  Tcl_CreateCommand()  etc.  to  tell  Tcl
>                            ^^^^^^^^^^^^^^^^^
>
> And here-in lies the problem. Tcl_CreateCommand is sitting, not
> in the executable, but in the shared-lib with the function call
> handler. dlopen(), by default will not link across shared-libs.
>
>           postgres
>      /-----/  \-----\
>      |              |
>   plperl.so ---> Opcode.so
>              ^^
> This link doesn't happen.

    But it does for PL/Tcl, at least under Linux-ELF. (C = call
    to, L = location of the function's code segment):

      +-------------------------+
      |       postgres          |
      +-------------------------+
                   |
                   | dynamic load
                   |
                   v
      +---------------------------+          +---------------------------+
      | pltcl.so                  |--------->| libtcl8.0.so              |
      |                           |  auto-   |                           |
      | C Tcl_CreateInterp()      |  dynamic | L Tcl_CreateInterp()      |
      | C Tcl_CreateCommand()     |  load    | L Tcl_CreateCommand()     |
      | L static pltcl_SPI_exec() |          | C pltcl_SPI_exec()        |
      +---------------------------+          +---------------------------+

    After pltcl.so is loaded, it calls Tcl_CreateInterp() to
    build a Tcl interpreter, and then calls Tcl_CreateCommand()
    to tell that interpreter the address of one of its hidden
    (static) functions plus a name for it on the script side.
    The interpreter just remembers this in its command hash
    table, and if that keyword occurs where it expects a
    command/procedure name, it calls the function via the stored
    pointer.
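
    To make the function pointer part concrete, here is a toy
    sketch in C. It is NOT Tcl's implementation: toy_interp,
    toy_create_command(), toy_eval(), toy_init() and
    toy_spi_exec() are all made-up names, and a small array
    stands in for the real command hash table. It only shows why
    a static function can be called by the interpreter without
    the dynamic linker ever seeing its symbol.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a command procedure signature. */
typedef int (*cmd_proc)(int argc, const char **argv);

#define MAX_CMDS 16

/* Toy "interpreter": a fixed-size table instead of a real hash table. */
typedef struct {
    const char *name[MAX_CMDS];
    cmd_proc    proc[MAX_CMDS];
    int         ncmds;
} toy_interp;

/* Similar in spirit to Tcl_CreateCommand(): remember name -> pointer. */
void toy_create_command(toy_interp *ip, const char *name, cmd_proc proc)
{
    assert(ip->ncmds < MAX_CMDS);
    ip->name[ip->ncmds] = name;
    ip->proc[ip->ncmds] = proc;
    ip->ncmds++;
}

/* Look the command up by name and call it through the stored pointer. */
int toy_eval(toy_interp *ip, int argc, const char **argv)
{
    int i;

    for (i = 0; i < ip->ncmds; i++)
        if (strcmp(ip->name[i], argv[0]) == 0)
            return ip->proc[i](argc, argv);
    return -1;                  /* unknown command */
}

/* Static: no exported symbol, reachable only through its address. */
static int toy_spi_exec(int argc, const char **argv)
{
    (void) argv;
    return argc - 1;            /* pretend result: number of arguments */
}

/* What the module does after load: hand its static function over. */
void toy_init(toy_interp *ip)
{
    toy_create_command(ip, "spi_exec", toy_spi_exec);
}
```

    After toy_init(), evaluating "spi_exec" ends up in
    toy_spi_exec() although no symbol lookup for that name ever
    happens. Only the address was passed across.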

    There  is  no  -ltcl8.0  switch in the link step of postgres.
    The fact that pltcl.so needs something out of libtcl8.0.so is
    told when linking pltcl.so:

        gcc -shared -o pltcl.so pltcl.o -L/usr/local/lib -ltcl8.0

    That results in this:

    [pgsql@hot] ~ > ldd bin/postgres
            libdl.so.1 => /lib/libdl.so.1 (0x4000a000)
            libm.so.5 => /lib/libm.so.5 (0x4000d000)
            libtermcap.so.2 => /usr/lib/libtermcap.so.2 (0x40016000)
            libncurses.so.3.0 => /lib/libncurses.so.3.0 (0x4001a000)
            libc.so.5 => /lib/libc.so.5 (0x4005b000)
    [pgsql@hot] ~ > ldd lib/pltcl.so
            ./lib/pltcl.so => ./lib/pltcl.so (0x4000a000)
            libc.so.5 => /lib/libc.so.5 (0x40010000)
            libtcl8.0.so => /usr/local/lib/libtcl8.0.so (0x400cb000)

    As you see, there is no libtcl mentioned in the shared lib
    dependencies of the postgres backend. It's the pltcl.so
    shared object that remembers this. And if you invoke "ldd -r
    -d pltcl.so" it will print a lot of unresolvable symbols, but
    most of them are backend symbols (the others are math ones
    because the above gcc -shared call is in fact incomplete,
    but since the backend is already linked against libm.so it
    doesn't matter :-).

    So if I want to use My dynamically loadable package for Tcl
    from inside the PL/Tcl interpreter, I would have to call
    My_Init() from pltcl.so AND add My.so to the linkage of
    pltcl.so. Calling My_Init() gives "pltcl.o" an unresolved
    reference to symbol _My_Init. The linker finds it in My.so
    and saves this info in pltcl.so so the dynamic loader can
    (and does) resolve it whenever something loads pltcl.so.

    The important point is to reference at least one symbol in
    the shared lib you want to get automatically loaded. You can
    add as many link libs with -l as you want. If none of their
    symbols is needed, the linker will not save this dependency
    (because there is none) in the resulting .so.

    I'll give it a try and USE  some  binary  Tcl  packages  from
    inside.  Will tell ya soon.

> Getting those two to play together is more than I care to attempt.
> I am researching a fix now to let linux installations use dlopen
> if it is available.

    Don't think you need to.

> >     This is just the way I would do it for Tcl and I'll surely do
> >     it   someday.    I  would  like  to  have  a  second,  unsafe
> >     interpreter in the module.  That could then modify  files  or
> >     use  the  frontend  library to access a different database on
> >     another server. Needless to say that this then  would  be  an
> >     untrusted language, available only for db superusers.
> >
>
> Yes, I've been thinking about that as well. It would be nice to have
> permissions based on userid. Maybe the 'suid' stuff that is being
> discussed in another thread will gives us a mechanism.

    I know, I know - and I know how. It cannot work for
    "internal" language functions. But for anything that goes
    through some loading (dynloader or PL call handler), the fmgr
    looks up pg_proc and puts information into the FmgrInfo
    struct. Adding a setuid field to pg_proc and remembering that
    too wouldn't be too much, and fmgr would then know when it is
    calling such a beast. Fmgr then manages a current user stack
    which must be reset on a transaction abort. Anything that
    needs the current user simply looks at the top stack entry.

    This   is   totally  transparent  then  for  all  non-builtin
    functions and all non-builtin triggers (where I don't know of
    one).
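
    Roughly, the current user stack could look like the sketch
    below. Nothing of this exists in the backend; func_info,
    is_setuid, call_function(), current_user() and the rest are
    names I made up for illustration, and a real version would
    hang off FmgrInfo and the transaction abort code instead of
    these toy globals.

```c
#include <assert.h>

typedef unsigned int user_id;   /* hypothetical stand-in for a user id */

#define USER_STACK_MAX 32

static user_id user_stack[USER_STACK_MAX];
static int     user_depth = 0;  /* entry 0 holds the session user */

/* Session start: the stack holds only the login user. */
void user_stack_init(user_id session_user)
{
    user_stack[0] = session_user;
    user_depth = 1;
}

/* Anything that needs the current user looks only at the top entry. */
user_id current_user(void)
{
    assert(user_depth > 0);
    return user_stack[user_depth - 1];
}

/* Transaction abort: drop everything above the session user. */
void user_stack_reset(void)
{
    assert(user_depth > 0);
    user_depth = 1;
}

typedef int (*func_ptr)(int arg);

/* Hypothetical subset of what the function manager caches from pg_proc. */
typedef struct {
    func_ptr fn;
    int      is_setuid;         /* assumed new pg_proc flag */
    user_id  owner;             /* owner of the function */
} func_info;

/* Switch user around a setuid function and restore it afterwards. */
int call_function(const func_info *fi, int arg)
{
    int result;

    if (fi->is_setuid) {
        assert(user_depth < USER_STACK_MAX);
        user_stack[user_depth++] = fi->owner;
    }
    result = fi->fn(arg);
    if (fi->is_setuid)
        user_depth--;           /* skipped on abort, hence the reset */
    return result;
}

/* Sample function body: just reports who it is running as. */
int report_user(int arg)
{
    (void) arg;
    return (int) current_user();
}
```

    A function without the flag sees the session user, one with
    the flag sees its owner, and after the call the old user is
    back on top. If the function bails out in the middle, the pop
    never happens, which is why the abort path has to call the
    reset.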

    Maybe I kept this in mind far too long. But for a while I
    have thought about some more complicated changes to the
    function call interface that would require touching several
    dozen source files (single argument NULL identification,
    returning tuples and tuple SETs). Doing SETUID would have
    been some DONE WHILE AT IT. I really should do it earlier
    than the SETs, because they require subselecting RTEs (which
    is the third thread now - eh - I better shut up).


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
