RTLD_LAZY considered harmful (Re: pltlc and pltlcu problems) - Mailing list pgsql-hackers

From Tom Lane
Subject RTLD_LAZY considered harmful (Re: pltlc and pltlcu problems)
Msg-id 4640.1011552017@sss.pgh.pa.us
In response to Re: pltlc and pltlcu problems  (Brent Verner <brent@rcfile.org>)
Responses Re: RTLD_LAZY considered harmful (Re: pltlc and pltlcu problems)  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: RTLD_LAZY considered harmful (Re: pltlc and pltlcu  (Peter Eisentraut <peter_e@gmx.net>)
Re: RTLD_LAZY considered harmful (Re: pltlc and pltlcu problems)  (David Terrell <dbt@meat.net>)
Re: RTLD_LAZY considered harmful (Re: pltlc and pltlcu problems)  (Patrick Welche <prlw1@newn.cam.ac.uk>)
List pgsql-hackers
Brent Verner <brent@rcfile.org> writes:
> Can someone verify that pltcl works on
> their stock redhat 7.2 system?

Indeed it does not.  On a straight-from-the-CD RH 7.2 install and
CVS-tip Postgres, I see both of the behaviors Murray complained of.

What I think is particularly nasty is that we get an exit(127) when
symbol resolution fails, leading to a database restart.  This will
probably happen on *most* systems, not only Linux, because we are
specifying RTLD_LAZY in our dlopen() calls, meaning that missing
symbols are only detected when they are first referenced at runtime ---
and if we call a function that should be there and isn't, there's not
much the dynamic loader can do except raise a signal or call exit().

What we should be doing is passing RTLD_NOW to dlopen(), so that
any unresolved symbol failure occurs during dlopen(), when we are
prepared to deal with it in a clean fashion.
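To make the difference concrete, here is a minimal sketch of a loader helper, assuming a POSIX <dlfcn.h>. The name load_library_strict() and its error-buffer convention are hypothetical and not the backend's actual code. Because it passes RTLD_NOW, a library with an unresolved function reference makes dlopen() itself return NULL, and dlerror() supplies a message the caller can report as an ordinary error instead of losing the process later.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/*
 * HYPOTHETICAL helper (not the backend's real API): open a shared library
 * with RTLD_NOW so that unresolved symbols are reported here, by dlopen(),
 * rather than killing the process when a missing function is first called.
 * Returns the handle, or NULL with a readable message copied into errbuf.
 */
static void *
load_library_strict(const char *path, char *errbuf, size_t errlen)
{
	void	   *handle;

	assert(errbuf != NULL && errlen > 0);

	(void) dlerror();			/* discard any stale error from an earlier call */
	handle = dlopen(path, RTLD_NOW);
	if (handle == NULL)
	{
		const char *msg = dlerror();

		snprintf(errbuf, errlen, "could not load \"%s\": %s",
				 path != NULL ? path : "(main program)",
				 msg != NULL ? msg : "unknown error");
	}
	else
		errbuf[0] = '\0';		/* success: leave an empty message */
	return handle;
}
```

On glibc versions older than 2.34 the program must be linked with -ldl.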

I ran into this same behavior years ago on HPUX and fixed it by using
what they call BIND_IMMEDIATE mode; but I now see that most of the
other ports are specifying RTLD_LAZY, and thus have this problem.

Unless I hear a credible counter-argument, I am going to change
RTLD_LAZY to RTLD_NOW in src/backend/port/dynloader/linux.h.  I have
tested that and it produces a clean error with no backend crash.

What I would *like* to do is make the same change in all the
port/dynloader files that reference RTLD_LAZY:

	src/backend/port/dynloader/aix.h
	src/backend/port/dynloader/bsdi.h
	src/backend/port/dynloader/dgux.h
	src/backend/port/dynloader/freebsd.h
	src/backend/port/dynloader/irix5.h
	src/backend/port/dynloader/linux.h
	src/backend/port/dynloader/netbsd.h
	src/backend/port/dynloader/openbsd.h
	src/backend/port/dynloader/osf.h
	src/backend/port/dynloader/sco.h
	src/backend/port/dynloader/solaris.h
	src/backend/port/dynloader/svr4.h
	src/backend/port/dynloader/univel.h
	src/backend/port/dynloader/unixware.h
	src/backend/port/dynloader/win.h

However, I'm a bit scared to do that at this late stage of the release
cycle, because perhaps some of these platforms don't support the full
dlopen() API.  Comments?  Can anyone test whether RTLD_NOW works on
any of the above-mentioned ports?
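For anyone willing to try, a rough probe might look like the sketch below. It is built on assumptions: the pg_dlopen() macro stands in for whatever wrapper each port header actually defines, and the #ifndef fallback for a <dlfcn.h> that lacks RTLD_NOW is a hypothetical safety net, not something the current headers do. The probe opens the main program itself (a NULL path, which POSIX permits), so it only shows that dlopen() accepts the flag; it does not exercise the unresolved-symbol failure path.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * HYPOTHETICAL fallback for a platform whose <dlfcn.h> lacks RTLD_NOW:
 * degrade to lazy binding so the port still compiles.  In that case the
 * probe below reports success without proving immediate binding.
 */
#ifndef RTLD_NOW
#define RTLD_NOW RTLD_LAZY
#endif

/* ASSUMED port-header wrapper; the real macro in each header may differ. */
#define pg_dlopen(f)	dlopen((f), RTLD_NOW)

/*
 * Probe whether dlopen() accepts RTLD_NOW on this platform.
 * Returns 1 on success, 0 (after printing dlerror()) on failure.
 */
static int
rtld_now_works(void)
{
	void	   *handle;
	int			rc;

	handle = pg_dlopen(NULL);	/* NULL path: the main program's own handle */
	if (handle == NULL)
	{
		const char *msg = dlerror();

		fprintf(stderr, "dlopen(NULL, RTLD_NOW) failed: %s\n",
				msg != NULL ? msg : "unknown error");
		return 0;
	}
	rc = dlclose(handle);
	assert(rc == 0);			/* closing the main-program handle should not fail */
	(void) rc;
	return 1;
}
```

A tester could call rtld_now_works() from a throwaway main() and report the result; checking the clean-error path itself would additionally need a small test library with a deliberately missing symbol.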
        regards, tom lane

