Thread: Solaris getopt_long and PostgreSQL

Solaris getopt_long and PostgreSQL

From
Chuck McDevitt
Date:
<div class="Section1"><p class="MsoNormal">About a year ago, you talked to the PostgreSQL people about some problem
withSolaris getopt_long, and they changed the build to use the internal getopt_long instead of the Solaris one?<p
class="MsoNormal"> <pclass="MsoNormal">What was the problem with Solaris getopt_long?  Does the problem still exist in
Solaris10?<p class="MsoNormal"> <p class="MsoNormal">My users are unhappy at the change, since normal getopt_long
reordersthe args, and apparently the built-in one doesn’t, so “psql database –p port” no longer works, since it treats
–pas the user name.<p class="MsoNormal"> <p class="MsoNormal">I don’t know if I should revert that change, or port
netBSDgetopt_long and replace the PostgreSQL one with that.</div> 

Re: Solaris getopt_long and PostgreSQL

From
Zdenek Kotala
Date:
Dne 17.03.09 16:38, Chuck McDevitt napsal(a):
> About a year ago, you talked to the PostgreSQL people about some problem 
> with Solaris getopt_long, and they changed the build to use the internal 
> getopt_long instead of the Solaris one?
> 

The problem was with getopt which is works little bit differently when - 
is specified in optstring. If you look in POSIX it does not define how 
getopt should work in this case. Solaris getopt implementation has 
extension for long args. Unfortunately PostgreSQL clashes with it, 
because it does not use getopt_long for long argument, but getopt(argc, 
argv,"xy-").

> 
> What was the problem with Solaris getopt_long?  Does the problem still 
> exist in Solaris 10?

yes and it will exist forever, because it is public API.

see  man -s3C getopt

> 
>  
> 
> My users are unhappy at the change, since normal getopt_long reorders 
> the args, and apparently the built-in one doesn’t, so “psql database –p 
> port” no longer works, since it treats –p as the user name.

I understand, I'm not happy too :(, but how Tom mentioned it has been 
never supposed to work. See documentation

> I don’t know if I should revert that change, or port netBSD getopt_long 
> and replace the PostgreSQL one with that.

getopt_long is OK. Problem is getopt. getopt in core is currently taken 
from *BSD but it could be updated.

One possible solution should be to use internal getopt only for postgres 
binary and for other to use solaris libc version.
Zdenek



Re: Solaris getopt_long and PostgreSQL

From
Zdenek Kotala
Date:
Dne 17.03.09 17:13, Zdenek Kotala napsal(a):

>> I don’t know if I should revert that change, or port netBSD 
>> getopt_long and replace the PostgreSQL one with that.
> 
> getopt_long is OK. Problem is getopt. getopt in core is currently taken 
> from *BSD but it could be updated.
> 
> One possible solution should be to use internal getopt only for postgres 
> binary and for other to use solaris libc version.

I'm now looking into the code and most binaries uses getopt_long which 
works fine. I think getopt_long can be used from Solaris libc. Only 
postgres and pg_resetxlog uses getopt. I think nobody has problem with 
postgres, because it is called from pg_ctl or other start/stop script 
and pg_resetxlog should be modified to use getopt_long.
Comments?
    Zdenek



Re: Solaris getopt_long and PostgreSQL

From
Zdenek Kotala
Date:
Dne 17.03.09 17:24, Zdenek Kotala napsal(a):
> Dne 17.03.09 17:13, Zdenek Kotala napsal(a):
>
>>> I don’t know if I should revert that change, or port netBSD
>>> getopt_long and replace the PostgreSQL one with that.
>>
>> getopt_long is OK. Problem is getopt. getopt in core is currently
>> taken from *BSD but it could be updated.
>>
>> One possible solution should be to use internal getopt only for
>> postgres binary and for other to use solaris libc version.
>
> I'm now looking into the code and most binaries uses getopt_long which
> works fine. I think getopt_long can be used from Solaris libc. Only
> postgres and pg_resetxlog uses getopt. I think nobody has problem with
> postgres, because it is called from pg_ctl or other start/stop script
> and pg_resetxlog should be modified to use getopt_long.
>

There is a patch, please test it if it satisfy your expectation. Do not
forget run autoconf.

    Zdenek
*** pgsql.ed0f426df3ee/configure.in    2009-03-17 17:42:16.131828743 +0100
--- pgsql/configure.in    2009-03-17 17:38:58.639284000 +0100
***************
*** 1274,1280 ****
  # our versions on that platform.
  if test "$PORTNAME" = "solaris"; then
    AC_LIBOBJ(getopt)
-   AC_LIBOBJ(getopt_long)
  elif test x"$ac_cv_type_struct_option" = xyes ; then
    AC_REPLACE_FUNCS([getopt_long])
  else
--- 1274,1279 ----

Re: Solaris getopt_long and PostgreSQL

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> [ use Solaris' version of getopt_long ]

The reason not to do that was discussed in this thread:

http://archives.postgresql.org//pgsql-patches/2008-02/msg00075.php

While Chuck is certainly free to build his local copy however he wants,
I don't think we're going to revert this change in the community
sources.
        regards, tom lane


Re: Solaris getopt_long and PostgreSQL

From
Chuck McDevitt
Date:
What about the idea of updating our getopt and getopt_long to a more modern version by porting over netBSD getopt?

The current situation is confusing for users, as "psql databasename -p port" type of calls works on almost all
platforms,but not on those using the internal getopt/getopt_long.  For those, you get "-p" is not a valid user. 
This is because MAC, BSD and GNU getopt_long permutes the arguments, and our getopt_long does not.

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, March 17, 2009 11:02 AM
> To: Zdenek Kotala
> Cc: Chuck McDevitt; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Solaris getopt_long and PostgreSQL
>
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> > [ use Solaris' version of getopt_long ]
>
> The reason not to do that was discussed in this thread:
>
> http://archives.postgresql.org//pgsql-patches/2008-02/msg00075.php
>
> While Chuck is certainly free to build his local copy however he wants,
> I don't think we're going to revert this change in the community
> sources.
>
>             regards, tom lane


Re: Solaris getopt_long and PostgreSQL

From
Tom Lane
Date:
Chuck McDevitt <cmcdevitt@greenplum.com> writes:
> This is because MAC, BSD and GNU getopt_long permutes the arguments, and our getopt_long does not.

AFAIK those all work by scribbling on the original argv[] array, a
behavior that seems pretty risky from a portability standpoint.
Since our port/ module is only going to get used on old platforms with
no or broken getopt_long(), it needs to be pretty conservative about
what it assumes the system environment can handle.
        regards, tom lane


Re: Solaris getopt_long and PostgreSQL

From
Chuck McDevitt
Date:
Perhaps I could use the same test pg_status uses to decide PS_USE_CHANGE_ARGV and PS_USE_CLOBBER_ARGV?

Any obviously, we don't just use ours for platforms with no or broken getopt_long, since we are talking Solaris (which
hasa bug in getopt, but getopt_long works fine) 

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, March 17, 2009 11:26 AM
> To: Chuck McDevitt
> Cc: Zdenek Kotala; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Solaris getopt_long and PostgreSQL
>
> Chuck McDevitt <cmcdevitt@greenplum.com> writes:
> > This is because MAC, BSD and GNU getopt_long permutes the arguments,
> and our getopt_long does not.
>
> AFAIK those all work by scribbling on the original argv[] array, a
> behavior that seems pretty risky from a portability standpoint.
> Since our port/ module is only going to get used on old platforms with
> no or broken getopt_long(), it needs to be pretty conservative about
> what it assumes the system environment can handle.
>
>             regards, tom lane


Re: Solaris getopt_long and PostgreSQL

From
Zdenek Kotala
Date:
Dne 17.03.09 19:48, Chuck McDevitt napsal(a):

> Any obviously, we don't just use ours for platforms with no or broken getopt_long, > since we are talking Solaris
(whichhas a bug in getopt, but 
 
getopt_long works fine)

Just for clarification:

It is not bug in Solaris. It is Solaris' getopt extension for long arg 
processing from days when getopt_long did not exist (hmm it seems that 
it is still not in POSIX :( ). By my opinion PostgreSQL does not work 
correctly here, because it uses construction which is marked as a 
implementation depended in POSIX standard.

See:
http://www.opengroup.org/onlinepubs/009695399/functions/getopt.html
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap12.html#tag_12_02
    Zdenek


Re: Solaris getopt_long and PostgreSQL

From
Peter Eisentraut
Date:
On Tuesday 17 March 2009 20:02:14 Tom Lane wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> > [ use Solaris' version of getopt_long ]
>
> The reason not to do that was discussed in this thread:
>
> http://archives.postgresql.org//pgsql-patches/2008-02/msg00075.php

That discussion was about option parsing in postgres/postmaster, which does 
not use getopt_long at all.


Re: Solaris getopt_long and PostgreSQL

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Dne 17.03.09 19:48, Chuck McDevitt napsal(a):
>> Any obviously, we don't just use ours for platforms with no or broken getopt_long, 
>> since we are talking Solaris (which has a bug in getopt, but 
>> getopt_long works fine)

> Just for clarification:

> It is not bug in Solaris.

Well, "bug" in the sense that it doesn't do what we want it to.

After reviewing this thread and the one that led up to the 8.3 behavior,
it seems clear that we failed to draw a distinction between getopt and
getopt_long when we should have.  We don't like Solaris' getopt but
there seems no reason not to use Solaris' getopt_long.  So Zdenek's
suggestion to change configure seems the correct fix, and I've done
that.
        regards, tom lane


Re: Solaris getopt_long and PostgreSQL

From
Tom Lane
Date:
I wrote:
> After reviewing this thread and the one that led up to the 8.3 behavior,
> it seems clear that we failed to draw a distinction between getopt and
> getopt_long when we should have.  We don't like Solaris' getopt but
> there seems no reason not to use Solaris' getopt_long.  So Zdenek's
> suggestion to change configure seems the correct fix, and I've done
> that.

So my reward for that is to find that every one of the Solaris 11
buildfarm members is failing today:

pg_regress: initdb failed
Examine /export/home/tmp/pg-test/build-suncc/HEAD/pgsql.6320/src/test/regress/log/initdb.log for the reason.
Command was:
"/export/home/tmp/pg-test/build-suncc/HEAD/pgsql.6320/src/test/regress/./tmp_check/install//export/home/tmp/pg-test/build-suncc/HEAD/inst/bin/initdb"
-D"/export/home/tmp/pg-test/build-suncc/HEAD/pgsql.6320/src/test/regress/./tmp_check/data" -L
"/export/home/tmp/pg-test/build-suncc/HEAD/pgsql.6320/src/test/regress/./tmp_check/install//export/home/tmp/pg-test/build-suncc/HEAD/inst/share/postgresql"
--noclean--no-locale > "/export/home/tmp/pg-test/build-suncc/HEAD/pgsql.6320/src/test/regress/log/initdb.log" 2>&1
 
make: *** [check] Error 2

================== pgsql.6320/src/test/regress/log/initdb.log ===================
initdb: too many command-line arguments (first is "-L")
Try "initdb --help" for more information.
Running in noclean mode.  Mistakes will not be cleaned up.

Apparently the system version of getopt_long is broken on Solaris 11.
My patience for this grows short.
        regards, tom lane


Re: Solaris getopt_long and PostgreSQL

From
Zdenek Kotala
Date:
Tom Lane píše v so 28. 03. 2009 v 14:36 -0400:
> I wrote:
> > After reviewing this thread and the one that led up to the 8.3 behavior,
> > it seems clear that we failed to draw a distinction between getopt and
> > getopt_long when we should have.  We don't like Solaris' getopt but
> > there seems no reason not to use Solaris' getopt_long.  So Zdenek's
> > suggestion to change configure seems the correct fix, and I've done
> > that.
> 
> So my reward for that is to find that every one of the Solaris 11
> buildfarm members is failing today:

I'm going to look on that. 
Zdenek



Re: Solaris getopt_long and PostgreSQL

From
Zdenek Kotala
Date:
Tom Lane píše v so 28. 03. 2009 v 14:36 -0400:

> Apparently the system version of getopt_long is broken on Solaris 11.
> My patience for this grows short.

It is not problem with getopt_long itself, but with symbol overriding.
getopt_long uses optind and so on from libc, but e.g. initdb takes
optind which is defined in port/getopt.c. It causes problem in following
code:

02607     /* Non-option argument specifies data directory */
02608     if (optind < argc)

which compares different variable.

I think it is general problem. HAVE_GETOPT_LONG cannot be defined when
HAVE_GETOPT_H is not defined.

I'm sorry for my previous patch. I had to make some mistake in
autoconf/make magic.

I attached a fix. Only problem what I see there is getopt_long.h which
contains

#ifdef HAVE_GETOPT_H
#include <getopt.h>
#endif

but getopt.h is required for getopt_long(). Fortunately, content is
similar with getopt_long.h and there is no problem with it on Solaris.

        Zdenek




Attachment

Re: Solaris getopt_long and PostgreSQL

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> I attached a fix. Only problem what I see there is getopt_long.h which
> contains 

I'm more concerned about the "static int optreset", which is 100%
guaranteed to break things.
        regards, tom lane


Re: Solaris getopt_long and PostgreSQL

From
Zdenek Kotala
Date:
Dne 31.03.09 04:55, Tom Lane napsal(a):
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> I attached a fix. Only problem what I see there is getopt_long.h which
>> contains
>
> I'm more concerned about the "static int optreset", which is 100%
> guaranteed to break things.

Yeah correct, I overlooked that optreset is also defined as a extern.
There is updated patch.

    Thanks Zdenek

*** pgsql.c598eae30145/src/port/getopt.c    2009-03-31 13:31:56.896353577 +0200
--- pgsql/src/port/getopt.c    2009-03-31 12:22:04.619485958 +0200
***************
*** 36,47 ****
  static char sccsid[] = "@(#)getopt.c    8.3 (Berkeley) 4/27/95";
  #endif   /* LIBC_SCCS and not lint */

!
! int            opterr = 1,            /* if error message should be printed */
!             optind = 1,            /* index into parent argv vector */
!             optopt,                /* character checked for validity */
!             optreset;            /* reset getopt */
  char       *optarg;                /* argument associated with option */

  #define BADCH    (int)'?'
  #define BADARG    (int)':'
--- 36,54 ----
  static char sccsid[] = "@(#)getopt.c    8.3 (Berkeley) 4/27/95";
  #endif   /* LIBC_SCCS and not lint */

! /* In situation when we use getopt_long from libc, we needs to use libc variable,
!  * else it causes symbol overriding and optind contains nonsens.
!  */
! #ifndef HAVE_GETOPT_LONG
! int            opterr = 1;            /* if error message should be printed */
! int            optind = 1;            /* index into parent argv vector */
! int            optopt;                /* character checked for validity */
  char       *optarg;                /* argument associated with option */
+ #endif
+
+ #ifndef HAVE_INT_OPTRESET
+ int            optreset;            /* reset getopt */
+ #endif

  #define BADCH    (int)'?'
  #define BADARG    (int)':'

Re: Solaris getopt_long and PostgreSQL

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Yeah correct, I overlooked that optreset is also defined as a extern. 
> There is updated patch.

On looking at this I still can't see how it's not broken.  You are
effectively assuming that getopt_long.c must define those variables.
But surely getopt_long.c should be assuming that getopt.c defines them.
Aren't we likely to end up with the situation that everyone is
extern'ing them?

What appears to me to be happening is that Solaris' linker is failing to
merge global variable definitions when it should (must) do so.  We need
to find out why rather than solve it with a patch that will certainly
break other platforms.

If you can't come up with a real solution, we might have to do this
with "#ifndef SOLARIS" or something similar.
        regards, tom lane


Re: Solaris getopt_long and PostgreSQL

From
Zdenek Kotala
Date:
Dne 31.03.09 18:21, Tom Lane napsal(a):
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> Yeah correct, I overlooked that optreset is also defined as a extern. 
>> There is updated patch.
> 
> On looking at this I still can't see how it's not broken.  You are
> effectively assuming that getopt_long.c must define those variables.

No I assuming that when we use getopt_long from libc then libc already 
has these variables defined. It must.

> But surely getopt_long.c should be assuming that getopt.c defines them.
> Aren't we likely to end up with the situation that everyone is
> extern'ing them?

Yeah, getopt_long assumes that optind, opterr and so on are already 
defined by getopt. If you look in current implementation in BDS you can 
see that there is getopt_internal() and getopt(), getopt_long are only 
wrapper.

http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdlib/getopt_long.c?rev=1.23;content-type=text%2Fplain

The main problem what I see here is that getopt and getopt_long works 
together. Use one from system and one ported is not good idea. I think 
best solution is to port new BSD version into postgreSQL and use both 
function from libc version or ported versin. Mixing then is risky.

See also e.g. initdb.c, there is declaration of opterr...

> What appears to me to be happening is that Solaris' linker is failing to
> merge global variable definitions when it should (must) do so.  We need
> to find out why rather than solve it with a patch that will certainly
> break other platforms.

Linker cannot do it in general. Libc contains these variables and 
compiled code points to them. When you declare them again into your 
application you override them but only for your application, not for libc.

> If you can't come up with a real solution, we might have to do this
> with "#ifndef SOLARIS" or something similar.

I prefer another solution then this, but it is also possible.

Personally prefer to port new getopt_long. It is benefit also for win users.

    Zdenek


Re: Solaris getopt_long and PostgreSQL

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> The main problem what I see here is that getopt and getopt_long works 
> together. Use one from system and one ported is not good idea.

Well, the expected (and pretty-well-tested) case is that your system has
getopt but not getopt_long.  I don't see any reason why using ported
getopt_long in that case is "not good idea".

I agree that substituting getopt without substituting getopt_long is a
tad risky, and probably hasn't been tested anyplace else previously.
It may well be that we should revert to the previous state of affairs
where we don't trust Solaris for either function.

> I think best solution is to port new BSD version into postgreSQL and
> use both function from libc version or ported versin.

I'm not sure which part of "no" you didn't understand.  Changing the
contents of argv[] is going to be system-specific and there is no reason
to believe that a BSD implementation will work everywhere.
        regards, tom lane


Re: Solaris getopt_long and PostgreSQL

From
Zdenek Kotala
Date:
Tom Lane píše v út 31. 03. 2009 v 13:10 -0400:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> > The main problem what I see here is that getopt and getopt_long works 
> > together. Use one from system and one ported is not good idea.
> 
> Well, the expected (and pretty-well-tested) case is that your system has
> getopt but not getopt_long.  I don't see any reason why using ported
> getopt_long in that case is "not good idea".

I'm looking on to POSIX and all opt* variable are specified there and
getopt_long use only what is specified. It should be OK.

> I agree that substituting getopt without substituting getopt_long is a
> tad risky, and probably hasn't been tested anyplace else previously.


It seems to me, that optind,... is same case lake optreset. I'm thinking
to add  HAVE_INT_OPTIND macro (similar to HAVE_INT_OPTRESET) and use it
instead of HAVE_GETOPT_LONG in my previous patch.

Another possibility is to rewrite postgres(and pg_resetxlog) to use
getopt_long() instead of getopt().

> It may well be that we should revert to the previous state of affairs
> where we don't trust Solaris for either function.

I would like to solve it rather then revert back. 
Zdenek




Re: Solaris getopt_long and PostgreSQL

From
Zdenek Kotala
Date:
Zdenek Kotala píše v út 31. 03. 2009 v 21:25 +0200:

> Another possibility is to rewrite postgres(and pg_resetxlog) to use
> getopt_long() instead of getopt().

Attached patch rewrites postgres to use getopt_long instead of getopt.
Patch also removes configure part for Solaris related to getopt.

        Zdenek

Attachment

Re: Solaris getopt_long and PostgreSQL

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Zdenek Kotala píše v út 31. 03. 2009 v 21:25 +0200:
>> Another possibility is to rewrite postgres(and pg_resetxlog) to use
>> getopt_long() instead of getopt().

> Attached patch rewrites postgres to use getopt_long instead of getopt.

Actually, I fooled around with it last night and seem to have fixed it
(buildfarm is all green today) by the expedient of not defining our own
optind etc. variables if the system supplies them.  So that seemed like
a clean fix to me --- the old handling of optreset in particular was a
huge kluge, whereas it's clear how this code is supposed to work.

I don't think we need to mess around with changing our option parsing
logic, especially not to the extent that you propose here.
        regards, tom lane


Re: Solaris getopt_long and PostgreSQL

From
Zdenek Kotala
Date:
Tom Lane píše v ne 05. 04. 2009 v 17:44 -0400:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> > Zdenek Kotala píše v út 31. 03. 2009 v 21:25 +0200:
> >> Another possibility is to rewrite postgres(and pg_resetxlog) to use
> >> getopt_long() instead of getopt().
> 
> > Attached patch rewrites postgres to use getopt_long instead of getopt.
> 
> Actually, I fooled around with it last night and seem to have fixed it
> (buildfarm is all green today) by the expedient of not defining our own
> optind etc. variables if the system supplies them.  So that seemed like
> a clean fix to me --- the old handling of optreset in particular was a
> huge kluge, whereas it's clear how this code is supposed to work.

Yeah, everything is green today. Thanks. Is it possible to backport it
at least to 8.3?

> I don't think we need to mess around with changing our option parsing
> logic, especially not to the extent that you propose here.

When previous solution works well on all platforms there is no reason to
use getopt_long.
Thanks Zdenek