Thread: Solaris getopt_long and PostgreSQL
<div class="Section1"><p class="MsoNormal">About a year ago, you talked to the PostgreSQL people about some problem withSolaris getopt_long, and they changed the build to use the internal getopt_long instead of the Solaris one?<p class="MsoNormal"> <pclass="MsoNormal">What was the problem with Solaris getopt_long? Does the problem still exist in Solaris10?<p class="MsoNormal"> <p class="MsoNormal">My users are unhappy at the change, since normal getopt_long reordersthe args, and apparently the built-in one doesn’t, so “psql database –p port” no longer works, since it treats –pas the user name.<p class="MsoNormal"> <p class="MsoNormal">I don’t know if I should revert that change, or port netBSDgetopt_long and replace the PostgreSQL one with that.</div>
Dne 17.03.09 16:38, Chuck McDevitt napsal(a): > About a year ago, you talked to the PostgreSQL people about some problem > with Solaris getopt_long, and they changed the build to use the internal > getopt_long instead of the Solaris one? > The problem was with getopt which is works little bit differently when - is specified in optstring. If you look in POSIX it does not define how getopt should work in this case. Solaris getopt implementation has extension for long args. Unfortunately PostgreSQL clashes with it, because it does not use getopt_long for long argument, but getopt(argc, argv,"xy-"). > > What was the problem with Solaris getopt_long? Does the problem still > exist in Solaris 10? yes and it will exist forever, because it is public API. see man -s3C getopt > > > > My users are unhappy at the change, since normal getopt_long reorders > the args, and apparently the built-in one doesn’t, so “psql database –p > port” no longer works, since it treats –p as the user name. I understand, I'm not happy too :(, but how Tom mentioned it has been never supposed to work. See documentation > I don’t know if I should revert that change, or port netBSD getopt_long > and replace the PostgreSQL one with that. getopt_long is OK. Problem is getopt. getopt in core is currently taken from *BSD but it could be updated. One possible solution should be to use internal getopt only for postgres binary and for other to use solaris libc version. Zdenek
Dne 17.03.09 17:13, Zdenek Kotala napsal(a): >> I don’t know if I should revert that change, or port netBSD >> getopt_long and replace the PostgreSQL one with that. > > getopt_long is OK. Problem is getopt. getopt in core is currently taken > from *BSD but it could be updated. > > One possible solution should be to use internal getopt only for postgres > binary and for other to use solaris libc version. I'm now looking into the code and most binaries uses getopt_long which works fine. I think getopt_long can be used from Solaris libc. Only postgres and pg_resetxlog uses getopt. I think nobody has problem with postgres, because it is called from pg_ctl or other start/stop script and pg_resetxlog should be modified to use getopt_long. Comments? Zdenek
Dne 17.03.09 17:24, Zdenek Kotala napsal(a): > Dne 17.03.09 17:13, Zdenek Kotala napsal(a): > >>> I don’t know if I should revert that change, or port netBSD >>> getopt_long and replace the PostgreSQL one with that. >> >> getopt_long is OK. Problem is getopt. getopt in core is currently >> taken from *BSD but it could be updated. >> >> One possible solution should be to use internal getopt only for >> postgres binary and for other to use solaris libc version. > > I'm now looking into the code and most binaries uses getopt_long which > works fine. I think getopt_long can be used from Solaris libc. Only > postgres and pg_resetxlog uses getopt. I think nobody has problem with > postgres, because it is called from pg_ctl or other start/stop script > and pg_resetxlog should be modified to use getopt_long. > There is a patch, please test it if it satisfy your expectation. Do not forget run autoconf. Zdenek *** pgsql.ed0f426df3ee/configure.in 2009-03-17 17:42:16.131828743 +0100 --- pgsql/configure.in 2009-03-17 17:38:58.639284000 +0100 *************** *** 1274,1280 **** # our versions on that platform. if test "$PORTNAME" = "solaris"; then AC_LIBOBJ(getopt) - AC_LIBOBJ(getopt_long) elif test x"$ac_cv_type_struct_option" = xyes ; then AC_REPLACE_FUNCS([getopt_long]) else --- 1274,1279 ----
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: > [ use Solaris' version of getopt_long ] The reason not to do that was discussed in this thread: http://archives.postgresql.org//pgsql-patches/2008-02/msg00075.php While Chuck is certainly free to build his local copy however he wants, I don't think we're going to revert this change in the community sources. regards, tom lane
What about the idea of updating our getopt and getopt_long to a more modern version by porting over netBSD getopt? The current situation is confusing for users, as "psql databasename -p port" type of calls works on almost all platforms,but not on those using the internal getopt/getopt_long. For those, you get "-p" is not a valid user. This is because MAC, BSD and GNU getopt_long permutes the arguments, and our getopt_long does not. > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Tuesday, March 17, 2009 11:02 AM > To: Zdenek Kotala > Cc: Chuck McDevitt; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Solaris getopt_long and PostgreSQL > > Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: > > [ use Solaris' version of getopt_long ] > > The reason not to do that was discussed in this thread: > > http://archives.postgresql.org//pgsql-patches/2008-02/msg00075.php > > While Chuck is certainly free to build his local copy however he wants, > I don't think we're going to revert this change in the community > sources. > > regards, tom lane
Chuck McDevitt <cmcdevitt@greenplum.com> writes: > This is because MAC, BSD and GNU getopt_long permutes the arguments, and our getopt_long does not. AFAIK those all work by scribbling on the original argv[] array, a behavior that seems pretty risky from a portability standpoint. Since our port/ module is only going to get used on old platforms with no or broken getopt_long(), it needs to be pretty conservative about what it assumes the system environment can handle. regards, tom lane
Perhaps I could use the same test pg_status uses to decide PS_USE_CHANGE_ARGV and PS_USE_CLOBBER_ARGV? Any obviously, we don't just use ours for platforms with no or broken getopt_long, since we are talking Solaris (which hasa bug in getopt, but getopt_long works fine) > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Tuesday, March 17, 2009 11:26 AM > To: Chuck McDevitt > Cc: Zdenek Kotala; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Solaris getopt_long and PostgreSQL > > Chuck McDevitt <cmcdevitt@greenplum.com> writes: > > This is because MAC, BSD and GNU getopt_long permutes the arguments, > and our getopt_long does not. > > AFAIK those all work by scribbling on the original argv[] array, a > behavior that seems pretty risky from a portability standpoint. > Since our port/ module is only going to get used on old platforms with > no or broken getopt_long(), it needs to be pretty conservative about > what it assumes the system environment can handle. > > regards, tom lane
Dne 17.03.09 19:48, Chuck McDevitt napsal(a): > Any obviously, we don't just use ours for platforms with no or broken getopt_long, > since we are talking Solaris (whichhas a bug in getopt, but getopt_long works fine) Just for clarification: It is not bug in Solaris. It is Solaris' getopt extension for long arg processing from days when getopt_long did not exist (hmm it seems that it is still not in POSIX :( ). By my opinion PostgreSQL does not work correctly here, because it uses construction which is marked as a implementation depended in POSIX standard. See: http://www.opengroup.org/onlinepubs/009695399/functions/getopt.html http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap12.html#tag_12_02 Zdenek
On Tuesday 17 March 2009 20:02:14 Tom Lane wrote: > Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: > > [ use Solaris' version of getopt_long ] > > The reason not to do that was discussed in this thread: > > http://archives.postgresql.org//pgsql-patches/2008-02/msg00075.php That discussion was about option parsing in postgres/postmaster, which does not use getopt_long at all.
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: > Dne 17.03.09 19:48, Chuck McDevitt napsal(a): >> Any obviously, we don't just use ours for platforms with no or broken getopt_long, >> since we are talking Solaris (which has a bug in getopt, but >> getopt_long works fine) > Just for clarification: > It is not bug in Solaris. Well, "bug" in the sense that it doesn't do what we want it to. After reviewing this thread and the one that led up to the 8.3 behavior, it seems clear that we failed to draw a distinction between getopt and getopt_long when we should have. We don't like Solaris' getopt but there seems no reason not to use Solaris' getopt_long. So Zdenek's suggestion to change configure seems the correct fix, and I've done that. regards, tom lane
I wrote: > After reviewing this thread and the one that led up to the 8.3 behavior, > it seems clear that we failed to draw a distinction between getopt and > getopt_long when we should have. We don't like Solaris' getopt but > there seems no reason not to use Solaris' getopt_long. So Zdenek's > suggestion to change configure seems the correct fix, and I've done > that. So my reward for that is to find that every one of the Solaris 11 buildfarm members is failing today: pg_regress: initdb failed Examine /export/home/tmp/pg-test/build-suncc/HEAD/pgsql.6320/src/test/regress/log/initdb.log for the reason. Command was: "/export/home/tmp/pg-test/build-suncc/HEAD/pgsql.6320/src/test/regress/./tmp_check/install//export/home/tmp/pg-test/build-suncc/HEAD/inst/bin/initdb" -D"/export/home/tmp/pg-test/build-suncc/HEAD/pgsql.6320/src/test/regress/./tmp_check/data" -L "/export/home/tmp/pg-test/build-suncc/HEAD/pgsql.6320/src/test/regress/./tmp_check/install//export/home/tmp/pg-test/build-suncc/HEAD/inst/share/postgresql" --noclean--no-locale > "/export/home/tmp/pg-test/build-suncc/HEAD/pgsql.6320/src/test/regress/log/initdb.log" 2>&1 make: *** [check] Error 2 ================== pgsql.6320/src/test/regress/log/initdb.log =================== initdb: too many command-line arguments (first is "-L") Try "initdb --help" for more information. Running in noclean mode. Mistakes will not be cleaned up. Apparently the system version of getopt_long is broken on Solaris 11. My patience for this grows short. regards, tom lane
Tom Lane píše v so 28. 03. 2009 v 14:36 -0400: > I wrote: > > After reviewing this thread and the one that led up to the 8.3 behavior, > > it seems clear that we failed to draw a distinction between getopt and > > getopt_long when we should have. We don't like Solaris' getopt but > > there seems no reason not to use Solaris' getopt_long. So Zdenek's > > suggestion to change configure seems the correct fix, and I've done > > that. > > So my reward for that is to find that every one of the Solaris 11 > buildfarm members is failing today: I'm going to look on that. Zdenek
Tom Lane píše v so 28. 03. 2009 v 14:36 -0400: > Apparently the system version of getopt_long is broken on Solaris 11. > My patience for this grows short. It is not problem with getopt_long itself, but with symbol overriding. getopt_long uses optind and so on from libc, but e.g. initdb takes optind which is defined in port/getopt.c. It causes problem in following code: 02607 /* Non-option argument specifies data directory */ 02608 if (optind < argc) which compares different variable. I think it is general problem. HAVE_GETOPT_LONG cannot be defined when HAVE_GETOPT_H is not defined. I'm sorry for my previous patch. I had to make some mistake in autoconf/make magic. I attached a fix. Only problem what I see there is getopt_long.h which contains #ifdef HAVE_GETOPT_H #include <getopt.h> #endif but getopt.h is required for getopt_long(). Fortunately, content is similar with getopt_long.h and there is no problem with it on Solaris. Zdenek
Attachment
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: > I attached a fix. Only problem what I see there is getopt_long.h which > contains I'm more concerned about the "static int optreset", which is 100% guaranteed to break things. regards, tom lane
Dne 31.03.09 04:55, Tom Lane napsal(a): > Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: >> I attached a fix. Only problem what I see there is getopt_long.h which >> contains > > I'm more concerned about the "static int optreset", which is 100% > guaranteed to break things. Yeah correct, I overlooked that optreset is also defined as a extern. There is updated patch. Thanks Zdenek *** pgsql.c598eae30145/src/port/getopt.c 2009-03-31 13:31:56.896353577 +0200 --- pgsql/src/port/getopt.c 2009-03-31 12:22:04.619485958 +0200 *************** *** 36,47 **** static char sccsid[] = "@(#)getopt.c 8.3 (Berkeley) 4/27/95"; #endif /* LIBC_SCCS and not lint */ ! ! int opterr = 1, /* if error message should be printed */ ! optind = 1, /* index into parent argv vector */ ! optopt, /* character checked for validity */ ! optreset; /* reset getopt */ char *optarg; /* argument associated with option */ #define BADCH (int)'?' #define BADARG (int)':' --- 36,54 ---- static char sccsid[] = "@(#)getopt.c 8.3 (Berkeley) 4/27/95"; #endif /* LIBC_SCCS and not lint */ ! /* In situation when we use getopt_long from libc, we needs to use libc variable, ! * else it causes symbol overriding and optind contains nonsens. ! */ ! #ifndef HAVE_GETOPT_LONG ! int opterr = 1; /* if error message should be printed */ ! int optind = 1; /* index into parent argv vector */ ! int optopt; /* character checked for validity */ char *optarg; /* argument associated with option */ + #endif + + #ifndef HAVE_INT_OPTRESET + int optreset; /* reset getopt */ + #endif #define BADCH (int)'?' #define BADARG (int)':'
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: > Yeah correct, I overlooked that optreset is also defined as a extern. > There is updated patch. On looking at this I still can't see how it's not broken. You are effectively assuming that getopt_long.c must define those variables. But surely getopt_long.c should be assuming that getopt.c defines them. Aren't we likely to end up with the situation that everyone is extern'ing them? What appears to me to be happening is that Solaris' linker is failing to merge global variable definitions when it should (must) do so. We need to find out why rather than solve it with a patch that will certainly break other platforms. If you can't come up with a real solution, we might have to do this with "#ifndef SOLARIS" or something similar. regards, tom lane
Dne 31.03.09 18:21, Tom Lane napsal(a): > Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: >> Yeah correct, I overlooked that optreset is also defined as a extern. >> There is updated patch. > > On looking at this I still can't see how it's not broken. You are > effectively assuming that getopt_long.c must define those variables. No I assuming that when we use getopt_long from libc then libc already has these variables defined. It must. > But surely getopt_long.c should be assuming that getopt.c defines them. > Aren't we likely to end up with the situation that everyone is > extern'ing them? Yeah, getopt_long assumes that optind, opterr and so on are already defined by getopt. If you look in current implementation in BDS you can see that there is getopt_internal() and getopt(), getopt_long are only wrapper. http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdlib/getopt_long.c?rev=1.23;content-type=text%2Fplain The main problem what I see here is that getopt and getopt_long works together. Use one from system and one ported is not good idea. I think best solution is to port new BSD version into postgreSQL and use both function from libc version or ported versin. Mixing then is risky. See also e.g. initdb.c, there is declaration of opterr... > What appears to me to be happening is that Solaris' linker is failing to > merge global variable definitions when it should (must) do so. We need > to find out why rather than solve it with a patch that will certainly > break other platforms. Linker cannot do it in general. Libc contains these variables and compiled code points to them. When you declare them again into your application you override them but only for your application, not for libc. > If you can't come up with a real solution, we might have to do this > with "#ifndef SOLARIS" or something similar. I prefer another solution then this, but it is also possible. Personally prefer to port new getopt_long. It is benefit also for win users. Zdenek
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: > The main problem what I see here is that getopt and getopt_long works > together. Use one from system and one ported is not good idea. Well, the expected (and pretty-well-tested) case is that your system has getopt but not getopt_long. I don't see any reason why using ported getopt_long in that case is "not good idea". I agree that substituting getopt without substituting getopt_long is a tad risky, and probably hasn't been tested anyplace else previously. It may well be that we should revert to the previous state of affairs where we don't trust Solaris for either function. > I think best solution is to port new BSD version into postgreSQL and > use both function from libc version or ported versin. I'm not sure which part of "no" you didn't understand. Changing the contents of argv[] is going to be system-specific and there is no reason to believe that a BSD implementation will work everywhere. regards, tom lane
Tom Lane píše v út 31. 03. 2009 v 13:10 -0400: > Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: > > The main problem what I see here is that getopt and getopt_long works > > together. Use one from system and one ported is not good idea. > > Well, the expected (and pretty-well-tested) case is that your system has > getopt but not getopt_long. I don't see any reason why using ported > getopt_long in that case is "not good idea". I'm looking on to POSIX and all opt* variable are specified there and getopt_long use only what is specified. It should be OK. > I agree that substituting getopt without substituting getopt_long is a > tad risky, and probably hasn't been tested anyplace else previously. It seems to me, that optind,... is same case lake optreset. I'm thinking to add HAVE_INT_OPTIND macro (similar to HAVE_INT_OPTRESET) and use it instead of HAVE_GETOPT_LONG in my previous patch. Another possibility is to rewrite postgres(and pg_resetxlog) to use getopt_long() instead of getopt(). > It may well be that we should revert to the previous state of affairs > where we don't trust Solaris for either function. I would like to solve it rather then revert back. Zdenek
Zdenek Kotala píše v út 31. 03. 2009 v 21:25 +0200: > Another possibility is to rewrite postgres(and pg_resetxlog) to use > getopt_long() instead of getopt(). Attached patch rewrites postgres to use getopt_long instead of getopt. Patch also removes configure part for Solaris related to getopt. Zdenek
Attachment
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: > Zdenek Kotala píše v út 31. 03. 2009 v 21:25 +0200: >> Another possibility is to rewrite postgres(and pg_resetxlog) to use >> getopt_long() instead of getopt(). > Attached patch rewrites postgres to use getopt_long instead of getopt. Actually, I fooled around with it last night and seem to have fixed it (buildfarm is all green today) by the expedient of not defining our own optind etc. variables if the system supplies them. So that seemed like a clean fix to me --- the old handling of optreset in particular was a huge kluge, whereas it's clear how this code is supposed to work. I don't think we need to mess around with changing our option parsing logic, especially not to the extent that you propose here. regards, tom lane
Tom Lane píše v ne 05. 04. 2009 v 17:44 -0400: > Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes: > > Zdenek Kotala píše v út 31. 03. 2009 v 21:25 +0200: > >> Another possibility is to rewrite postgres(and pg_resetxlog) to use > >> getopt_long() instead of getopt(). > > > Attached patch rewrites postgres to use getopt_long instead of getopt. > > Actually, I fooled around with it last night and seem to have fixed it > (buildfarm is all green today) by the expedient of not defining our own > optind etc. variables if the system supplies them. So that seemed like > a clean fix to me --- the old handling of optreset in particular was a > huge kluge, whereas it's clear how this code is supposed to work. Yeah, everything is green today. Thanks. Is it possible to backport it at least to 8.3? > I don't think we need to mess around with changing our option parsing > logic, especially not to the extent that you propose here. When previous solution works well on all platforms there is no reason to use getopt_long. Thanks Zdenek