Thread: Detecting glibc getopt?

Detecting glibc getopt?

From
Tom Lane
Date:
I have traced down the postmaster-option-processing failure that Thomas
reported this morning.  It appears to be specific to systems running
glibc: the problem is that resetting optind to 1 is not enough to
put glibc's getopt() subroutine into a good state to process a fresh
set of options.  (Internally it has a "nextchar" pointer that is still
pointing at the old argv list, and only if the pointer points to a null
character will it wake up enough to reexamine the argv pointer you give
it.)  The reason we see this now, and didn't see it before, is that
I rearranged startup to set the ps process title as soon as possible
after forking a subprocess --- and at least on Linux machines, that
"nextchar" pointer is pointing into the argv array that's overwritten
by init_ps_display.

While I could revert that change, I don't want to.  The idea was to be
sure that a postmaster child running its authentication cycle could be
identified, and I still think that's an important feature.  So I want to
find a way to make it work.

Looking at the source code of glibc's getopt, it seems there are two
ways to force a reset:

* set __getopt_initialized to 0.  I thought this was an ideal solution
since configure could check for the presence of __getopt_initialized.
Unfortunately it seems that glibc is built in such a way that that
symbol isn't exported :-(, even though it looks global in the source.

* set optind to 0, instead of the more usual 1.  This will work, but
it requires us to know that we're dealing with glibc getopt and not
anyone else's getopt.

I have thought of two ways to detect glibc getopt: one is to assume that
if getopt_long() is available, we should set optind=0.  The other is to
try a runtime test in configure and see if it works to set optind=0.
Runtime configure tests aren't very appealing, but I don't much care
for equating HAVE_GETOPT_LONG to how we should reset optind, either.

Opinions anyone?  Better ideas?
        regards, tom lane


Re: Detecting glibc getopt?

From
Thomas Lockhart
Date:
(I still see the symptom btw; did a make distclean and configure after
updating my tree)


Re: Detecting glibc getopt?

From
Tom Lane
Date:
Thomas Lockhart <lockhart@fourpalms.org> writes:
> (I still see the symptom btw; did a make distclean and configure after
> updating my tree)

Yeah, it's still busted; my first try was wrong.  I have confirmed the
"optind = 0" fix works on my LinuxPPC machine, but we need to decide
how to autoconfigure that hack.
        regards, tom lane


Re: Detecting glibc getopt?

From
Peter Eisentraut
Date:
Tom Lane writes:

> The reason we see this now, and didn't see it before, is that
> I rearranged startup to set the ps process title as soon as possible
> after forking a subprocess --- and at least on Linux machines, that
> "nextchar" pointer is pointing into the argv array that's overwritten
> by init_ps_display.

How about copying the entire argv[] array to a new location before the
very first call to getopt().  Then you can use getopt() without hackery
and can do anything you want to the "real" argv area.  That should be a
lot safer.  (We don't know yet what other platforms might play
optimization tricks in getopt().)
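
[Editor's sketch, not part of the original message.] Copying argv as suggested above could be done along these lines. The name `copy_argv` is hypothetical, and this is only a minimal illustration of the idea, not the code that was committed.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: deep-copy argv so that getopt() can scan the
 * copy while the original argv area is overwritten (for example by a
 * ps-title update).  Returns NULL if any allocation fails.
 */
static char **
copy_argv(int argc, char *const argv[])
{
    char **copy = malloc(((size_t) argc + 1) * sizeof(char *));
    int i;

    if (copy == NULL)
        return NULL;

    for (i = 0; i < argc; i++)
    {
        size_t len = strlen(argv[i]) + 1;

        copy[i] = malloc(len);
        if (copy[i] == NULL)
        {
            /* Release the strings copied so far, then the vector. */
            while (i > 0)
                free(copy[--i]);
            free(copy);
            return NULL;
        }
        memcpy(copy[i], argv[i], len);
    }
    copy[argc] = NULL;      /* keep the conventional NULL terminator */
    return copy;
}
```

The copy would be made once, before the first getopt() call, and every later scan would use it instead of the original vector.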

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: Detecting glibc getopt?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> How about copying the entire argv[] array to a new location before the
> very first call to getopt().  Then you can use getopt() without hackery
> and can do anything you want to the "real" argv area.  That should be a
> lot safer.  (We don't know yet what other platforms might play
> optimization tricks in getopt().)

Well, mumble --- strictly speaking, there is *NO* way to use getopt
over multiple cycles "without hackery".  The standard for getopt
(http://www.opengroup.org/onlinepubs/7908799/xsh/getopt.html)
doesn't say you're allowed to scribble on optind in the first place.
But you're probably right that having a read-only copy of the argv
vector will make things safer.  Will do it that way.
        regards, tom lane


Re: Detecting glibc getopt?

From
Bruce Momjian
Date:
Is this resolved?


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Detecting glibc getopt?

From
Thomas Lockhart
Date:
> Is this resolved?

Sure. Within a day or two of the initial problem report.
                 - Thomas