Thread: Complier warnings on mingw gcc 4.5.0
I compiled the source with mingw gcc 4.5.0, that has been released recently. The compile was succeeded and worked well at least for simple queries, but there were many warnings during the compile. ---- 1. warning: '<symbol>' redeclared without dllimport attribute: previous dllimport ignored 2. warning: unknown conversion type character 'm' in format 3. warning: unknown conversion type character 'l' in format ---- 1 is easy to fix with the attached patch. I wonder why mingw gcc < 4.5 can build codes without the fix... For 2, we could remove __attribute__((format(printf))) on mingw, but it also disables type checking for formatters. Are there better solutions? I have not yet researched for 3, that might be most dangerous. =# select version(); version ----------------------------------------------------------------------------------PostgreSQL 9.0.1 on i686-pc-mingw32, compiledby GCC gcc.exe (GCC) 4.5.0, 32-bit (1 row) OS: Windows 7 64bit diff --git a/src/include/port/win32.h b/src/include/port/win32.h index 988c1c9..31c877f 100644 *** a/src/include/port/win32.h --- b/src/include/port/win32.h *************** *** 58,64 **** #define PGDLLIMPORT __declspec (dllimport) #endif ! #ifdef _MSC_VER #define PGDLLEXPORT __declspec (dllexport) #else #define PGDLLEXPORT __declspec (dllimport) --- 58,64 ---- #define PGDLLIMPORT __declspec (dllimport) #endif ! #if defined(_MSC_VER) || __GNUC__ >= 4 #define PGDLLEXPORT __declspec (dllexport) #else #define PGDLLEXPORT __declspec(dllimport) -- Itagaki Takahiro
(2010/11/01 19:24), Itagaki Takahiro wrote: > I compiled the source with mingw gcc 4.5.0, that has been released recently. > The compile was succeeded and worked well at least for simple queries, > but there were many warnings during the compile. > ---- > 1. warning: '<symbol>' redeclared without dllimport attribute: > previous dllimport ignored > 2. warning: unknown conversion type character 'm' in format > 3. warning: unknown conversion type character 'l' in format > ---- > > 1 is easy to fix with the attached patch. Is it safe to put back the patch you applied in http://archives.postgresql.org/pgsql-committers/2010-05/msg00338.php in the case __GNUC__ >=4? regards, Hiroshi Inoue > I wonder why mingw gcc< 4.5 can build codes without the fix... > > *** a/src/include/port/win32.h > --- b/src/include/port/win32.h > *************** > *** 58,64 **** > #define PGDLLIMPORT __declspec (dllimport) > #endWindows 7 64bit > > > diff --git a/src/include/port/win32.h b/src/include/port/win32.h > indexif > > ! #ifdef _MSC_VER > #define PGDLLEXPORT __declspec (dllexport) > #else > #define PGDLLEXPORT __declspec (dllimport) > --- 58,64 ---- > #define PGDLLIMPORT __declspec (dllimport) > #endif > > ! #if defined(_MSC_VER) || __GNUC__>= 4 > #define PGDLLEXPORT __declspec (dllexport) > #else > #define PGDLLEXPORT __declspec (dllimport)
On Tue, Nov 2, 2010 at 6:02 AM, Hiroshi Inoue <inoue@tpf.co.jp> wrote: >> 1. warning: '<symbol>' redeclared without dllimport attribute: >> previous dllimport ignored > > Is it safe to put back the patch you applied in > http://archives.postgresql.org/pgsql-committers/2010-05/msg00338.php > in the case __GNUC__ >=4? Hmmm, I didn't test compiling with gcc version between 3.5 and 4.4. Does anyone test them? If it works only on gcc 4.5, I'll also add _GNUC_MINOR__ >= 5 for the check. -- Itagaki Takahiro
(2010/11/02 8:31), Itagaki Takahiro wrote: > On Tue, Nov 2, 2010 at 6:02 AM, Hiroshi Inoue<inoue@tpf.co.jp> wrote: >>> 1. warning: '<symbol>' redeclared without dllimport attribute: >>> previous dllimport ignored >> >> Is it safe to put back the patch you applied in >> http://archives.postgresql.org/pgsql-committers/2010-05/msg00338.php >> in the case __GNUC__>=4? > > Hmmm, I didn't test compiling with gcc version between 3.5 and 4.4. > Does anyone test them? If it works only on gcc 4.5, I'll also add > _GNUC_MINOR__>= 5 for the check. The problem which was fixed by your old patch is at runtime not at compilation time. Is it fixed with gcc 4.5? regards, Hiroshi Inoue
On Tue, Nov 2, 2010 at 8:43 AM, Hiroshi Inoue <inoue@tpf.co.jp> wrote: > The problem which was fixed by your old patch is at runtime not > at compilation time. Is it fixed with gcc 4.5? Now it works as far as simple test, including core functions and dynamic modules. So, I think the fix for dllexport is safe enough for mingw gcc 4.5. BTW, with out without the above fix, regression test failed with weird error if the server is built with gcc 4.5. However, server can run if I started it manually with PGPORT or -o "--port=X". We might need another fix for the issue. ---- LOG: database system was shut down at 2010-11-02 10:32:13 JST LOG: database system is ready to accept connections LOG: autovacuum launcher started FATAL: parameter "port" cannot be changed without restarting the server (repeated) ---- -- Itagaki Takahiro
On Mon, Nov 1, 2010 at 7:24 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > 1. warning: '<symbol>' redeclared without dllimport attribute: > previous dllimport ignored I discussed with Hiroshi-san about the dllimport issue, and he pointed out that __declspec(dllexport) and dllwrap cannot work well together. So, the most suitable fix would be just removing __declspec (dllimport) from PGDLLEXPORT for mingw. diff --git a/src/include/port/win32.h b/src/include/port/win32.h index 988c1c9..3417ab5 100644 *** a/src/include/port/win32.h --- b/src/include/port/win32.h *************** *** 61,67 **** #ifdef _MSC_VER #define PGDLLEXPORT __declspec (dllexport) #else ! #define PGDLLEXPORT __declspec (dllimport) #endif #else /* not CYGWIN, not MSVC, not MingW */ #definePGDLLIMPORT --- 61,67 ---- #ifdef _MSC_VER #define PGDLLEXPORT __declspec (dllexport) #else ! #define PGDLLEXPORT #endif #else /* not CYGWIN, not MSVC, not MingW */ #define PGDLLIMPORT -- Itagaki Takahiro
On 11/01/2010 10:10 PM, Itagaki Takahiro wrote: > On Tue, Nov 2, 2010 at 8:43 AM, Hiroshi Inoue<inoue@tpf.co.jp> wrote: >> The problem which was fixed by your old patch is at runtime not >> at compilation time. Is it fixed with gcc 4.5? > Now it works as far as simple test, including core functions and > dynamic modules. So, I think the fix for dllexport is safe enough > for mingw gcc 4.5. > > > BTW, with out without the above fix, regression test failed with > weird error if the server is built with gcc 4.5. However, server > can run if I started it manually with PGPORT or -o "--port=X". > We might need another fix for the issue. > ---- > LOG: database system was shut down at 2010-11-02 10:32:13 JST > LOG: database system is ready to accept connections > LOG: autovacuum launcher started > FATAL: parameter "port" cannot be changed without restarting the server > (repeated) > ---- > I have just run into this when trying to set up a new buildfarm member running Mingw. Is there a fix? Is anyone working on it? Or do I just need to downgrade the compiler? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 11/01/2010 10:10 PM, Itagaki Takahiro wrote: >> BTW, with out without the above fix, regression test failed with >> weird error if the server is built with gcc 4.5. However, server >> can run if I started it manually with PGPORT or -o "--port=X". >> We might need another fix for the issue. >> ---- >> LOG: database system was shut down at 2010-11-02 10:32:13 JST >> LOG: database system is ready to accept connections >> LOG: autovacuum launcher started >> FATAL: parameter "port" cannot be changed without restarting the server >> (repeated) > I have just run into this when trying to set up a new buildfarm member > running Mingw. Is there a fix? Is anyone working on it? Or do I just > need to downgrade the compiler? It smells a little bit like an optimization bug. Does dialing down to -O0 make it go away? regards, tom lane
On 12/13/2010 12:01 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 11/01/2010 10:10 PM, Itagaki Takahiro wrote: >>> BTW, with out without the above fix, regression test failed with >>> weird error if the server is built with gcc 4.5. However, server >>> can run if I started it manually with PGPORT or -o "--port=X". >>> We might need another fix for the issue. >>> ---- >>> LOG: database system was shut down at 2010-11-02 10:32:13 JST >>> LOG: database system is ready to accept connections >>> LOG: autovacuum launcher started >>> FATAL: parameter "port" cannot be changed without restarting the server >>> (repeated) >> I have just run into this when trying to set up a new buildfarm member >> running Mingw. Is there a fix? Is anyone working on it? Or do I just >> need to downgrade the compiler? > It smells a little bit like an optimization bug. Does dialing down to > -O0 make it go away? > > Sadly, no. I'm testing downgrading the compiler now. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 12/13/2010 12:01 PM, Tom Lane wrote: >> It smells a little bit like an optimization bug. Does dialing down to >> -O0 make it go away? > Sadly, no. I'm testing downgrading the compiler now. Mph. FWIW, I see that my last build of Postgres for Fedora 14 would have been with gcc 4.5.1, because that's what F14 is shipping. And that passed its regression tests on at least x86 and x86_64. Maybe you should pester the mingw folk for a compiler update. regards, tom lane
On 12/13/2010 01:12 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 12/13/2010 12:01 PM, Tom Lane wrote: >>> It smells a little bit like an optimization bug. Does dialing down to >>> -O0 make it go away? >> Sadly, no. I'm testing downgrading the compiler now. > Mph. FWIW, I see that my last build of Postgres for Fedora 14 would > have been with gcc 4.5.1, because that's what F14 is shipping. And > that passed its regression tests on at least x86 and x86_64. Maybe > you should pester the mingw folk for a compiler update. > > Further digging shows some weirdness. This doesn't appear to be compiler-related. I've rolled back all the way to gcc 3.5. It is triggered by the following line in pg_regress.c, commenting out of which causes the problem to go away (although of course it causes the regression tests to fail): putenv(new_pgoptions); cheers andrew
On Mon, Dec 13, 2010 at 22:29, Andrew Dunstan <andrew@dunslane.net> wrote: > > > On 12/13/2010 01:12 PM, Tom Lane wrote: >> >> Andrew Dunstan<andrew@dunslane.net> writes: >>> >>> On 12/13/2010 12:01 PM, Tom Lane wrote: >>>> >>>> It smells a little bit like an optimization bug. Does dialing down to >>>> -O0 make it go away? >>> >>> Sadly, no. I'm testing downgrading the compiler now. >> >> Mph. FWIW, I see that my last build of Postgres for Fedora 14 would >> have been with gcc 4.5.1, because that's what F14 is shipping. And >> that passed its regression tests on at least x86 and x86_64. Maybe >> you should pester the mingw folk for a compiler update. >> >> > > Further digging shows some weirdness. This doesn't appear to be > compiler-related. I've rolled back all the way to gcc 3.5. It is triggered > by the following line in pg_regress.c, commenting out of which causes the > problem to go away (although of course it causes the regression tests to > fail): > > putenv(new_pgoptions); Take a look at 741e4ad7de9e0069533d90efdd5b1fc9f3a64c81. If you enable that codepath to run on mingw, does it fix it? (it's msvc only now) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Andrew Dunstan <andrew@dunslane.net> writes: > Further digging shows some weirdness. This doesn't appear to be > compiler-related. I've rolled back all the way to gcc 3.5. It is > triggered by the following line in pg_regress.c, commenting out of which > causes the problem to go away (although of course it causes the > regression tests to fail): > putenv(new_pgoptions); Oh really ... are we using src/port/unsetenv.c on that platform? I wonder if that little hack is incompatible with latest mingw libraries ... regards, tom lane
On 12/13/2010 04:34 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> Further digging shows some weirdness. This doesn't appear to be >> compiler-related. I've rolled back all the way to gcc 3.5. It is >> triggered by the following line in pg_regress.c, commenting out of which >> causes the problem to go away (although of course it causes the >> regression tests to fail): >> putenv(new_pgoptions); > Oh really ... are we using src/port/unsetenv.c on that platform? > I wonder if that little hack is incompatible with latest mingw > libraries ... It is using pgwin32_putenv() and pgwin32_unsetenv(). It appears not to be related to how the environment is set at all, but to how the backend is handling PGOPTIONS. Here's a TCP level dump of traffic showing the problem. The client is on Linux. 18:34:03.106882 IP aurelia.34700 > 192.168.10.109.postgres: Flags [P.], seq 9:86, ack 2, win 46, options [nop,nop,TS val 1504831233 ecr 1085898], length 77 0x0000: 4500 0081 f95d 4000 4006 aaf3 c0a8 0a68 E....]@.@......h 0x0010: c0a8 0a6d 878c 1538a55b 18ce c920 b723 ...m...8.[.....# 0x0020: 8018 002e 07ae 0000 0101 080a 59b1 e701 ............Y... 0x0030: 0010 91ca 0000 004d 0003 0000 7573 6572 .......M....user 0x0040: 0070 6772 756e 6e65 7200 6461 7461 6261 .pgrunner.databa 0x0050: 7365 0070 6f73 7467 7265 7300 6f70 7469 se.postgres.opti 0x0060: 6f6e 7300 2d63 206c 6f675f6d 696e 5f6d ons.-c.log_min_m 0x0070: 6573 7361 6765 733d 7761 726e 696e 6700 essages=warning. 0x0080: 00 . 18:34:03.209847 IP 192.168.10.109.postgres > aurelia.34700: Flags [P.], seq 2:129, ack 86, win 65450, options [nop,nop,TS val 1085900 ecr 1504831233], length 127 0x0000: 4500 00b3 9fd3 4000 8006 c44b c0a8 0a6d E.....@....K...m 0x0010: c0a8 0a68 1538878c c920 b723 a55b 191b ...h.8.....#.[.. 0x0020: 8018 ffaa df54 0000 0101 080a 0010 91cc .....T.......... 0x0030: 59b1 e701 5200 0000 0800 0000 0045 0000 Y...R........E.. 0x0040: 0075 5346 4154 414c 0043 3535 5030 3200 .uSFATAL.C55P02. 0x0050: 4d70 6172 616d 6574 6572 2022 706f 7274 Mparameter."port 0x0060: 2220 6361 6e6e 6f74 20626520 6368 616e ".cannot.be.chan 0x0070: 6765 6420 7769 7468 6f75 7420 7265 7374 ged.without.rest 0x0080: 61727469 6e67 2074 6865 2073 6572 7665 arting.the.serve 0x0090: 7200 4667 7563 2e63 004c 3437 3934 0052 r.Fguc.c.L4794.R 0x00a0: 7365 745f 636f 6e66 6967 5f6f 7074 696f set_config_optio 0x00b0: 6e00 00 n.. cheers andrew
On 12/13/2010 06:45 PM, I wrote: [ problem on Mingw with 'FATAL: parameter "port" cannot be changed without restarting the server' if client connection options are sent ] > > > > It appears not to be related to how the environment is set at all, but > to how the backend is handling PGOPTIONS. > > Regina Obe has pointed out to me, and I have confirmed, that this does not happen with REL8_4_STABLE. So either we have introduced a bug since then or something we have done since then has tickled a Mingw bug. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 12/13/2010 06:45 PM, I wrote: >>> [ problem on Mingw with 'FATAL: parameter "port" cannot be changed >>> without restarting the server' if client connection options are sent ] > Regina Obe has pointed out to me, and I have confirmed, that this does > not happen with REL8_4_STABLE. So either we have introduced a bug since > then or something we have done since then has tickled a Mingw bug. Hm. Does it happen in 9.0, or just HEAD? regards, tom lane
On 12/14/2010 12:10 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 12/13/2010 06:45 PM, I wrote: >>>> [ problem on Mingw with 'FATAL: parameter "port" cannot be changed >>>> without restarting the server' if client connection options are sent ] >> Regina Obe has pointed out to me, and I have confirmed, that this does >> not happen with REL8_4_STABLE. So either we have introduced a bug since >> then or something we have done since then has tickled a Mingw bug. > Hm. Does it happen in 9.0, or just HEAD? > > Yes, it happens on 9.0. I guess I need to start some git triangulation. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 12/13/2010 04:34 PM, Tom Lane wrote: >> Oh really ... are we using src/port/unsetenv.c on that platform? >> I wonder if that little hack is incompatible with latest mingw >> libraries ... > It is using pgwin32_putenv() and pgwin32_unsetenv(). It appears not to > be related to how the environment is set at all, but to how the backend > is handling PGOPTIONS. Mmm, right. Even if that was messed up, it could only manifest as pg_regress shipping a bogus connection request packet. But speaking of which ... > Here's a TCP level dump of traffic showing the problem. The client is on > Linux. > 18:34:03.106882 IP aurelia.34700 > 192.168.10.109.postgres: Flags [P.], > seq 9:86, ack 2, win 46, options [nop,nop,TS val 1504831233 ecr > 1085898], length 77 > 0x0000: 4500 0081 f95d 4000 4006 aaf3 c0a8 0a68 E....]@.@......h > 0x0010: c0a8 0a6d 878c 1538 a55b 18ce c920 b723 ...m...8.[.....# > 0x0020: 8018 002e 07ae 0000 0101 080a 59b1 e701 ............Y... > 0x0030: 0010 91ca 0000 004d 0003 0000 7573 6572 .......M....user > 0x0040: 0070 6772 756e 6e65 7200 6461 7461 6261 .pgrunner.databa > 0x0050: 7365 0070 6f73 7467 7265 7300 6f70 7469 se.postgres.opti > 0x0060: 6f6e 7300 2d63 206c 6f67 5f6d 696e 5f6d ons.-c.log_min_m > 0x0070: 6573 7361 6765 733d 7761 726e 696e 6700 essages=warning. > 0x0080: 00 . This seems quite odd now that I look at it. The packet contents imply that libpq saw PGOPTIONS="-c log_min_messages=warning" and no other environment variables that would cause it to append stuff to the connection request. Which is not at all how pg_regress ought to behave, even assuming that the buildfarm script sets up PGOPTIONS that way. I'd expect to see settings for timezone, datestyle, and intervalstyle in there. What was the client here exactly? Another line of attack is that we know from the response packet that the failure is being reported at guc.c:4794. It would be really useful to know what the call stack is there. Could you change that elog to an elog(PANIC) and get a stack trace from the ensuing core dump? regards, tom lane
On 12/14/2010 12:42 PM, Tom Lane wrote: > This seems quite odd now that I look at it. The packet contents imply > that libpq saw PGOPTIONS="-c log_min_messages=warning" and no other > environment variables that would cause it to append stuff to the > connection request. Which is not at all how pg_regress ought to behave, > even assuming that the buildfarm script sets up PGOPTIONS that way. > I'd expect to see settings for timezone, datestyle, and intervalstyle > in there. What was the client here exactly? Maybe I didn't explain this properly. The trace was not from pg_regress. It was from a connection from a standard Linux psql client. > Another line of attack is that we know from the response packet that the > failure is being reported at guc.c:4794. It would be really useful to > know what the call stack is there. Could you change that elog to an > elog(PANIC) and get a stack trace from the ensuing core dump? > > I can try that. Not sure how easy that is on Windows. cheers andrew
On 12/14/2010 12:42 PM, Tom Lane wrote: > Another line of attack is that we know from the response packet that the > failure is being reported at guc.c:4794. It would be really useful to > know what the call stack is there. Could you change that elog to an > elog(PANIC) and get a stack trace from the ensuing core dump? > That didn't work. But git bisect says it's this commit that's to blame: <https://github.com/postgres/postgres/commit/e710b65c1c56ca7b91f662c63d37ff2e72862a94> It's too late to dig further now. cheers andrew
Excerpts from Andrew Dunstan's message of mié dic 15 02:08:24 -0300 2010: > > On 12/14/2010 12:42 PM, Tom Lane wrote: > > > Another line of attack is that we know from the response packet that the > > failure is being reported at guc.c:4794. It would be really useful to > > know what the call stack is there. Could you change that elog to an > > elog(PANIC) and get a stack trace from the ensuing core dump? > > > > That didn't work. But git bisect says it's this commit that's to blame: > <https://github.com/postgres/postgres/commit/e710b65c1c56ca7b91f662c63d37ff2e72862a94> Hmm I wonder if this is reproducible in a non-Windows EXEC_BACKEND scenario. This bug seems closely related to process_postgres_switches. I guess it'd be useful to add some debugging printouts there to figure out what's being passed the second time around. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Andrew Dunstan's message of mié dic 15 02:08:24 -0300 2010: >> That didn't work. But git bisect says it's this commit that's to blame: >> <https://github.com/postgres/postgres/commit/e710b65c1c56ca7b91f662c63d37ff2e72862a94> > Hmm I wonder if this is reproducible in a non-Windows EXEC_BACKEND > scenario. I'm pretty sure I tried the no-flat-files code in that scenario while writing it. But it might be worth trying that again. You'd think though that if EXEC_BACKEND were sufficient to provoke it, all Windows builds would fail. I'm still mystified by what is the difference between Andrew's non-working installation and working mingw builds. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > This bug seems closely related to process_postgres_switches. I guess > it'd be useful to add some debugging printouts there to figure out > what's being passed the second time around. It strikes me that the most obvious source for a platform dependency there would be getopt(), in particular the arrangements to cause getopt to behave sanely when we invoke it on a different argc array the second time around. If that were failing for some reason, you could imagine getopt seeing 'postgres' as the next switch to parse, which could lead to the reported failure. Hence: 1. Is that build using src/port/getopt.c, or a library-supplied getopt? What about getopt_long.c? 2. Is HAVE_INT_OPTRESET getting defined? Should it be? regards, tom lane
On 12/15/2010 10:17 AM, Tom Lane wrote: > Alvaro Herrera<alvherre@commandprompt.com> writes: >> Excerpts from Andrew Dunstan's message of mié dic 15 02:08:24 -0300 2010: >>> That didn't work. But git bisect says it's this commit that's to blame: >>> <https://github.com/postgres/postgres/commit/e710b65c1c56ca7b91f662c63d37ff2e72862a94> >> Hmm I wonder if this is reproducible in a non-Windows EXEC_BACKEND >> scenario. > I'm pretty sure I tried the no-flat-files code in that scenario while > writing it. But it might be worth trying that again. You'd think > though that if EXEC_BACKEND were sufficient to provoke it, all Windows > builds would fail. I'm still mystified by what is the difference > between Andrew's non-working installation and working mingw builds. > > This is a new installation of Mingw. The buildfarm animals were set up years ago, with substantially older versions of Mingw. SO ISTM that either we have tickled a new bug of theirs or their new setup has tickled a bug of ours. cheers andrew
On 12/15/2010 11:12 AM, Tom Lane wrote: > Alvaro Herrera<alvherre@commandprompt.com> writes: >> This bug seems closely related to process_postgres_switches. I guess >> it'd be useful to add some debugging printouts there to figure out >> what's being passed the second time around. > It strikes me that the most obvious source for a platform dependency > there would be getopt(), in particular the arrangements to cause getopt > to behave sanely when we invoke it on a different argc array the second > time around. If that were failing for some reason, you could imagine > getopt seeing 'postgres' as the next switch to parse, which could lead > to the reported failure. > > Hence: > > 1. Is that build using src/port/getopt.c, or a library-supplied getopt? > What about getopt_long.c? > > 2. Is HAVE_INT_OPTRESET getting defined? Should it be? > > I had the same thought. I did try forcing use of our getopt and getopt_long, without success, but didn't look at optreset. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 12/15/2010 11:12 AM, Tom Lane wrote: >> 1. Is that build using src/port/getopt.c, or a library-supplied getopt? >> What about getopt_long.c? >> >> 2. Is HAVE_INT_OPTRESET getting defined? Should it be? > I had the same thought. I did try forcing use of our getopt and > getopt_long, without success, but didn't look at optreset. Do we use configure at all on a mingw build? If we don't, then HAVE_INT_OPTRESET is surely not getting defined. It looks to me like it might be a good idea to force HAVE_INT_OPTRESET on when we are using our own versions of getopt/getopt_long. If we don't set that, then correct behavior depends on the assumption that the internal variable "place" is pointing at a null when the second series of getopt calls starts. While I'm prepared to believe that the last call of getopt left it that way, it's not clear that we can safely assume that the underlying argv array hasn't been clobbered meanwhile. You might try adding some debug printouts to src/port/getopt.c to see if you can trace exactly what's happening there. regards, tom lane
On Wed, Dec 15, 2010 at 17:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 12/15/2010 11:12 AM, Tom Lane wrote: >>> 1. Is that build using src/port/getopt.c, or a library-supplied getopt? >>> What about getopt_long.c? >>> >>> 2. Is HAVE_INT_OPTRESET getting defined? Should it be? > >> I had the same thought. I did try forcing use of our getopt and >> getopt_long, without success, but didn't look at optreset. > > Do we use configure at all on a mingw build? If we don't, then > HAVE_INT_OPTRESET is surely not getting defined. We do use configure on mingw. The output from a regular mingw configure run formed the base for the config file we use for MSVC where we can't run it, but an actual mingw build will re-run configure every time. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > On Wed, Dec 15, 2010 at 17:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Do we use configure at all on a mingw build? �If we don't, then >> HAVE_INT_OPTRESET is surely not getting defined. > We do use configure on mingw. The output from a regular mingw > configure run formed the base for the config file we use for MSVC > where we can't run it, but an actual mingw build will re-run configure > every time. Hm. It still seems pretty likely to me that the root cause is a change in mingw's getopt library function, but I don't have a theory about the precise mechanism. Is there any convenient place where we can look at the current version of their library sources, as well as the version in use in the working buildfarm members? regards, tom lane
On 12/15/2010 11:49 AM, Tom Lane wrote: > Magnus Hagander<magnus@hagander.net> writes: >> On Wed, Dec 15, 2010 at 17:43, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> Do we use configure at all on a mingw build? If we don't, then >>> HAVE_INT_OPTRESET is surely not getting defined. >> We do use configure on mingw. The output from a regular mingw >> configure run formed the base for the config file we use for MSVC >> where we can't run it, but an actual mingw build will re-run configure >> every time. > Hm. It still seems pretty likely to me that the root cause is a change > in mingw's getopt library function, but I don't have a theory about the > precise mechanism. Is there any convenient place where we can look at > the current version of their library sources, as well as the version in > use in the working buildfarm members? > > I think you're probably right. narwhal reports having optreset, but my Mingw reports not having it, so this looks like a likely culprit. cheers andrew
On 12/15/2010 12:28 PM, Andrew Dunstan wrote: > > > > I think you're probably right. narwhal reports having optreset, but my > Mingw reports not having it, so this looks like a likely culprit. And the attached hack allowed "make check" to succeed. I think the logic in tcop/postgres.c and postmaster/postmaster.c is probably wrong. If we are using our getopt/getopt_long, we want to be setting optreset, whether or not configure found one in the system libraries. cheers andrew
Attachment
Andrew Dunstan <andrew@dunslane.net> writes: > And the attached hack allowed "make check" to succeed. > I think the logic in tcop/postgres.c and postmaster/postmaster.c is > probably wrong. If we are using our getopt/getopt_long, we want to be > setting optreset, whether or not configure found one in the system > libraries. Yeah, that's what I suggested earlier; but if your build *wasn't* using our versions before, we're still no closer to understanding why it was failing then. Another small problem is that a close inspection of our getopt.c says that it does reset "place" to point at a constant before returning -1, in every path except the "--" case which I doubt is being invoked. So my idea that we were clobbering argv underneath it doesn't seem to hold up. I'm still feeling that we don't understand what's happening. regards, tom lane
On 12/15/2010 02:06 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> And the attached hack allowed "make check" to succeed. >> I think the logic in tcop/postgres.c and postmaster/postmaster.c is >> probably wrong. If we are using our getopt/getopt_long, we want to be >> setting optreset, whether or not configure found one in the system >> libraries. > Yeah, that's what I suggested earlier; but if your build *wasn't* using > our versions before, we're still no closer to understanding why it was > failing then. Another small problem is that a close inspection of our > getopt.c says that it does reset "place" to point at a constant before > returning -1, in every path except the "--" case which I doubt is being > invoked. So my idea that we were clobbering argv underneath it doesn't > seem to hold up. I'm still feeling that we don't understand what's > happening. > > Sure we are closer to understanding it. It seems quite clear to me that Mingw's getopt, which we have been using, has changed between versions, as indicated by the fact that on my mingw optreset is not found, but on narwhal it is found. I haven't looked into our getopt. cheers andrew
On 12/15/2010 02:22 PM, Andrew Dunstan wrote: > > > On 12/15/2010 02:06 PM, Tom Lane wrote: >> Andrew Dunstan<andrew@dunslane.net> writes: >>> And the attached hack allowed "make check" to succeed. >>> I think the logic in tcop/postgres.c and postmaster/postmaster.c is >>> probably wrong. If we are using our getopt/getopt_long, we want to be >>> setting optreset, whether or not configure found one in the system >>> libraries. >> Yeah, that's what I suggested earlier; but if your build *wasn't* using >> our versions before, we're still no closer to understanding why it was >> failing then. Another small problem is that a close inspection of our >> getopt.c says that it does reset "place" to point at a constant before >> returning -1, in every path except the "--" case which I doubt is being >> invoked. So my idea that we were clobbering argv underneath it doesn't >> seem to hold up. I'm still feeling that we don't understand what's >> happening. >> >> > > Sure we are closer to understanding it. It seems quite clear to me > that Mingw's getopt, which we have been using, has changed between > versions, as indicated by the fact that on my mingw optreset is not > found, but on narwhal it is found. And here is where it changed: <http://sourceforge.net/project/shownotes.php?release_id=24832> * A replacement implementation for the getopt() family of functions, adding support for the GNU getopt_long_only()function. Users should note that this intentionally *removes* support for the BSD or Mac OS-Xspecific, and non-standard, `optreset' global variable; to reset the getopt() scanner, use `optind = 0;' insteadof relying on this non-standard, non-portable and now-unsupported feature. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > And here is where it changed: > <http://sourceforge.net/project/shownotes.php?release_id=24832> > * A replacement implementation for the getopt() family of functions, > adding support for the GNU getopt_long_only() function. Users > should note that this intentionally *removes* support for the BSD > or Mac OS-X specific, and non-standard, `optreset' global variable; > to reset the getopt() scanner, use `optind = 0;' instead of relying > on this non-standard, non-portable and now-unsupported feature. Great. So instead of a nonstandard but pretty portable API, they decided on a nonstandard interpretation of optind ... which absolutely will not work for our usage, because we need to be able to tell getopt to skip over --single, even if we were willing to figure out whether getopt behaves this way or the more usual way. Dolts. While I don't mind forcing use of our getopt() on mingw, I'm a mite concerned by the idea that this might represent an upstream change we'll soon see elsewhere, rather than just mingw-specific brain damage. Anybody know? regards, tom lane
On 12/15/2010 03:52 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> And here is where it changed: >> <http://sourceforge.net/project/shownotes.php?release_id=24832> >> * A replacement implementation for the getopt() family of functions, >> adding support for the GNU getopt_long_only() function. Users >> should note that this intentionally *removes* support for the BSD >> or Mac OS-X specific, and non-standard, `optreset' global variable; >> to reset the getopt() scanner, use `optind = 0;' instead of relying >> on this non-standard, non-portable and now-unsupported feature. > Great. So instead of a nonstandard but pretty portable API, they > decided on a nonstandard interpretation of optind ... which absolutely > will not work for our usage, because we need to be able to tell getopt > to skip over --single, even if we were willing to figure out whether > getopt behaves this way or the more usual way. Dolts. > > While I don't mind forcing use of our getopt() on mingw, I'm a mite > concerned by the idea that this might represent an upstream change we'll > soon see elsewhere, rather than just mingw-specific brain damage. > Anybody know? > > On my Fedora box, man 3 getopt says this: A program that scans multiple argument vectors, or rescans the same vector more than once, and wants to make use ofGNU extensions such as '+' and '-' at the start of optstring, or changes the value of POSIXLY_CORRECT between scans,must reinitialize getopt() by resetting optind to 0, rather than the traditional value of 1. (Resetting to 0 forcesthe invocation of an internal initialization routine that rechecks POSIXLY_CORRECT and checks for GNU extensions in optstring.) Modulo the --single issue, we don't have to force use of our getopt on Mingw. This patch seems to work, at least to get regression working: diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 90854f4..9ae3767 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -753,6 +753,8 @@ PostmasterMain(intargc, char *argv[]) optind = 1; #ifdef HAVE_INT_OPTRESET optreset = 1; /* some systems need this too */ +#elsif defined (WIN32) && !defined(_MSC_VER) + optind = 0; /*modern Mingw needs this instead */ #endif /* For debugging: display postmaster environment */ diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index ff2e9bd..ea4ae79 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -3444,6 +3444,8 @@ process_postgres_switches(int argc, char *argv[], GucContext ctx) optind = 1; #ifdef HAVE_INT_OPTRESET optreset = 1; /* some systems need this too */ +#elsif defined (WIN32) && !defined(_MSC_VER) + optind = 0; /* modern Mingw needsthis instead */ #endif cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On my Fedora box, man 3 getopt says this: > A program that scans multiple argument vectors, or rescans the same > vector more than once, and wants to make use of GNU extensions such > as '+' and '-' at the start of optstring, or changes the value of > POSIXLY_CORRECT between scans, must reinitialize getopt() by > resetting optind to 0, rather than the traditional value of 1. > (Resetting to 0 forces the invocation of an internal initialization > routine that rechecks POSIXLY_CORRECT and checks for GNU extensions > in optstring.) Hmm, mine says the same, but it's not entirely clear how to parse the AND and OR conditions there. The fact that it works on Fedora suggests to me that the "multiple vectors" case is somehow ANDed with one of the other conditions. Anyway seems like the next step is to compare the Fedora getopt code with mingw's ... regards, tom lane
On 12/15/2010 04:26 PM, Tom Lane wrote: > Anyway seems like the next step is to compare the > Fedora getopt code with mingw's ... > > Mingw code attached. cheers andrew
Attachment
Andrew Dunstan <andrew@dunslane.net> writes: > Mingw code attached. Hm, where did you get this? Because it does have optreset, albeit in a gratuitously ABI-incompatible fashion: > #ifdef _BSD_SOURCE > /* > * BSD adds the non-standard `optreset' feature, for reinitialisation > * of `getopt' parsing. We support this feature, for applications which > * proclaim their BSD heritage, before including this header; however, > * to maintain portability, developers are advised to avoid it. > */ > # define optreset __mingw_optreset > > extern int optreset; > #endif However, I pulled down the allegedly current mingw source tarball from sourceforge, and what I found in it is an older version that has *not* got that change. The CVS tree there doesn't seem to have it either. So I'm disinclined to want to rely on setting _BSD_SOURCE, as I first thought might be the answer --- it looks to me like only some versions of mingw will respond to that. regards, tom lane
On 12/15/2010 06:42 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> Mingw code attached. > Hm, where did you get this? Because it does have optreset, albeit in a > gratuitously ABI-incompatible fashion: > >> #ifdef _BSD_SOURCE >> /* >> * BSD adds the non-standard `optreset' feature, for reinitialisation >> * of `getopt' parsing. We support this feature, for applications which >> * proclaim their BSD heritage, before including this header; however, >> * to maintain portability, developers are advised to avoid it. >> */ >> # define optreset __mingw_optreset >> >> extern int optreset; >> #endif > However, I pulled down the allegedly current mingw source tarball from > sourceforge, and what I found in it is an older version that has *not* > got that change. The CVS tree there doesn't seem to have it either. > So I'm disinclined to want to rely on setting _BSD_SOURCE, as I first > thought might be the answer --- it looks to me like only some versions > of mingw will respond to that. > > I downloaded <http://softlayer.dl.sourceforge.net/project/mingw/MinGW/BaseSystem/RuntimeLibrary/MinGW-RT/mingwrt-3.18/mingwrt-3.18-mingw32-src.tar.gz> which is allegedly the source for the latest released runtime. The section you cite is indeed in my system's getopt.h. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 12/15/2010 06:42 PM, Tom Lane wrote: >> Hm, where did you get this? > I downloaded > <http://softlayer.dl.sourceforge.net/project/mingw/MinGW/BaseSystem/RuntimeLibrary/MinGW-RT/mingwrt-3.18/mingwrt-3.18-mingw32-src.tar.gz> > which is allegedly the source for the latest released runtime. Huh, I wonder why it doesn't match what's in sourceforge CVS? Anyway, the short answer is that this code has got no visible commonality with glibc's getopt(), so we need not fear that what it's doing is likely to start happening elsewhere. I didn't take the time to trace through the glibc code exactly; I figure the lack of trouble reports is sufficient proof that we're not doing anything that it won't cope with. So I concur with the previous suggestions: 1. Make this problem go away by forcing use of our getopt code on mingw. 2. Make sure we reset optreset when using our code. (Probably not really necessary, but let's just be careful.) Should we backpatch either of these things? regards, tom lane
On 12/15/2010 07:24 PM, Tom Lane wrote: > > So I concur with the previous suggestions: > > 1. Make this problem go away by forcing use of our getopt code on > mingw. > > 2. Make sure we reset optreset when using our code. (Probably not > really necessary, but let's just be careful.) > > Should we backpatch either of these things? > > Yes. We need it to back at least to 9.0. I believe #2 is in fact necessary. When I tried just #1 before it failed. What's the best way to do #2 cleanly? cheers andrew
On 12/15/2010 07:24 PM, Tom Lane wrote: > > Huh, I wonder why it doesn't match what's in sourceforge CVS? Sourceforge's CVS is way out of date. CVS tip for the getopt.h is here: <http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/mingw/include/getopt.h?rev=1.4&content-type=text/x-cvsweb-markup&cvsroot=src> See <http://mingw.org/wiki/Official_CVS_Repository> That does mean we should also look at Cygwin, though, as they supposedly share the runtime. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 12/15/2010 07:24 PM, Tom Lane wrote: >> Should we backpatch either of these things? > Yes. We need it to back at least to 9.0. On reflection I think we probably better fix it back to 8.2, since we're supposedly supporting Windows on all those branches, and somebody might try to build any of them on modern mingw. > I believe #2 is in fact necessary. When I tried just #1 before it > failed. What's the best way to do #2 cleanly? We can't change the meaning of HAVE_INT_OPTRESET because that would break the declaration logic in getopt.c. I'm thinking we have to complicate the #if logic in postmaster.c and postgres.c. Will look into it as soon as I get done with the contrib/seg patch (ie in an hour or so). regards, tom lane
On 12/15/2010 08:46 PM, Tom Lane wrote: > >> I believe #2 is in fact necessary. When I tried just #1 before it >> failed. What's the best way to do #2 cleanly? > We can't change the meaning of HAVE_INT_OPTRESET because that would > break the declaration logic in getopt.c. I'm thinking we have to > complicate the #if logic in postmaster.c and postgres.c. I agree. > Will look > into it as soon as I get done with the contrib/seg patch (ie in an > hour or so). > > OK. I'll test any patch you post ASAP. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 12/15/2010 08:46 PM, Tom Lane wrote: >> We can't change the meaning of HAVE_INT_OPTRESET because that would >> break the declaration logic in getopt.c. I'm thinking we have to >> complicate the #if logic in postmaster.c and postgres.c. > I agree. >> Will look >> into it as soon as I get done with the contrib/seg patch (ie in an >> hour or so). > OK. I'll test any patch you post ASAP. OK, patch committed so we can get testing from the existing buildfarm members, but please try on your new installation too. regards, tom lane
On 12/15/2010 11:53 PM, Tom Lane wrote: > > OK, patch committed so we can get testing from the existing buildfarm > members, but please try on your new installation too. > > It's working, but I don't think it's right :-) In particular, I don't believe this, or rather I don't believe that its converse is false: /* If not HAVE_GETOPT, we are using src/port/getopt.c, which has optreset */ The trouble is that while we've forced use of our getopt.c, we haven't inhibited configure from checking the system's getopt. So on my test system HAVE_GETOPT is defined. Possibly I was wrong about needing to have optreset set with use of our port. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > It's working, but I don't think it's right :-) In particular, I don't > believe this, or rather I don't believe that its converse is false: > /* If not HAVE_GETOPT, we are using src/port/getopt.c, which has > optreset */ Yeah, that was a 90% solution. I had first tried to look at whether LIBOBJS contains "getopt.o", but it turns out that some versions of autoconf actively prevent you from referencing that variable at all in configure.in :-(. > The trouble is that while we've forced use of our getopt.c, we haven't > inhibited configure from checking the system's getopt. So on my test > system HAVE_GETOPT is defined. Possibly I was wrong about needing to > have optreset set with use of our port. That's what I thought before. At least in src/port/getopt.c (haven't looked at getopt_long), it would only take one more line of code to make it safe against optreset not being used. All that we need is an invariant that "place" is not pointing at potentially volatile storage when we return -1, and most of the returns have that already. (We don't really care about whether it can be reset without a previous set of calls having been run to completion, because we don't need that.) If the same is true in getopt_long, then what I think we should do is actually *remove* optreset from our implementations, and instead specify that you can only switch to a new argv array after -1 has been returned. That would remove a whole lot of unecessarily tense interactions with the system libraries. This is just in the nature of a cleanup so it probably needn't be backpatched, unless people would rather that code be the same across all branches (which does have its advantages). Comments? regards, tom lane