Thread: ERRORDATA_STACK_SIZE panic crashes on Windows
Have any Windows-using hackers tried to look into the reports of $SUBJECT on 8.3? We have two fresh reports: http://archives.postgresql.org/pgsql-bugs/2008-05/msg00106.php http://archives.postgresql.org/pgsql-bugs/2008-05/msg00109.php and this isn't the first time we've heard of it. I spent some time chasing the theory that the conversion from UTF8 to the client encoding was failing; but if that's the case it should be reproducible on non-Windows machines, and I didn't have any luck with that. What I'm currently thinking is that maybe gettext() isn't on the same page as us concerning what encoding it's supposed to produce its output in. This is reinforced by the mention of changing lc_messages in the first report above. We had had some discussions of trying to adjust the LC_XXX values to ensure that they all represent the same encoding choice, but I don't believe that got done. It might also be significant that both complainants are using UTF8 database encoding; IIRC that has some weird status in the Windows locale world. I am also toying with the idea of disabling gettext usage once we get past two or so levels of error nesting, in order to prevent the recursion panic in this type of scenario --- but it would be real nice to know for certain what is happening before we try to fix it. regards, tom lane
I wrote: > Have any Windows-using hackers tried to look into the reports of > $SUBJECT on 8.3? We have two fresh reports: > http://archives.postgresql.org/pgsql-bugs/2008-05/msg00106.php > http://archives.postgresql.org/pgsql-bugs/2008-05/msg00109.php > and this isn't the first time we've heard of it. And now we have another report: http://archives.postgresql.org/pgsql-bugs/2008-05/msg00159.php for which the complainant was kind enough to supply the requested details, and they seem to confirm my previous theory: > What I'm currently thinking is that maybe gettext() isn't on the > same page as us concerning what encoding it's supposed to produce > its output in. After digging around in the source code a bit, the reason why (and why it's only seen on Windows) became blindingly obvious :-(. On Windows we specifically allow UTF-8 database encoding to be selected regardless of the system locale settings. This is (AFAIK) safe for the basic locale-dependent stuff we do, because that all goes through special UTF-16-using code paths. But it leaves gettext utterly misinformed about what it is supposed to do. Fortunately there is a way to tell gettext what to do, and accordingly I propose the attached patch. I am not in a position to test it however. Would somebody replicate the failure and confirm this fixes it? regards, tom lane Index: mbutils.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v retrieving revision 1.70 diff -c -r1.70 mbutils.c *** mbutils.c 12 Apr 2008 23:21:04 -0000 1.70 --- mbutils.c 26 May 2008 17:15:49 -0000 *************** *** 697,702 **** --- 697,721 ---- DatabaseEncoding = &pg_enc2name_tbl[encoding]; Assert(DatabaseEncoding->encoding == encoding); + + /* + * On Windows, we allow UTF-8 database encoding to be used with any + * locale setting, because UTF-8 requires special handling anyway. + * But this means that gettext() might be misled about what output + * encoding it should use, so we have to tell it explicitly. + * + * In future we might want to call bind_textdomain_codeset + * unconditionally, but that requires knowing how to spell the codeset + * name properly for all encodings on all platforms, which might be + * problematic. + * + * This is presently unnecessary, but harmless, on non-Windows platforms. + */ + #ifdef ENABLE_NLS + if (encoding == PG_UTF8) + if (bind_textdomain_codeset("postgres", "UTF-8") == NULL) + elog(LOG, "bind_textdomain_codeset failed"); + #endif } void
Tom Lane wrote: > I wrote: > > Have any Windows-using hackers tried to look into the reports of > > $SUBJECT on 8.3? We have two fresh reports: > > http://archives.postgresql.org/pgsql-bugs/2008-05/msg00106.php > > http://archives.postgresql.org/pgsql-bugs/2008-05/msg00109.php > > and this isn't the first time we've heard of it. > > And now we have another report: > http://archives.postgresql.org/pgsql-bugs/2008-05/msg00159.php > for which the complainant was kind enough to supply the requested > details, and they seem to confirm my previous theory: > > > What I'm currently thinking is that maybe gettext() isn't on the > > same page as us concerning what encoding it's supposed to produce > > its output in. > > After digging around in the source code a bit, the reason why (and why > it's only seen on Windows) became blindingly obvious :-(. On Windows > we specifically allow UTF-8 database encoding to be selected > regardless of the system locale settings. This is (AFAIK) safe for > the basic locale-dependent stuff we do, because that all goes through > special UTF-16-using code paths. But it leaves gettext utterly > misinformed about what it is supposed to do. > > Fortunately there is a way to tell gettext what to do, and accordingly > I propose the attached patch. I am not in a position to test it > however. Would somebody replicate the failure and confirm this > fixes it? After some work, I've managed to reproduce it in my test environment for Swedish, and I can confirm that the patch fixes the issue. Just for kicks, I've applied this patch so you, so you get to be on the receiving side of that ;-) //Magnus //Magnus
Magnus Hagander <magnus@hagander.net> writes: > Tom Lane wrote: >> Fortunately there is a way to tell gettext what to do, and accordingly >> I propose the attached patch. I am not in a position to test it >> however. Would somebody replicate the failure and confirm this >> fixes it? > After some work, I've managed to reproduce it in my test environment for > Swedish, and I can confirm that the patch fixes the issue. Thanks. > Just for kicks, I've applied this patch so you, so you get to be on the > receiving side of that ;-) No objection here. I noticed that you applied the patch to 8.2 as well. It should be harmless enough, but we weren't having the problem in 8.2 were we? Or am I just confused? regards, tom lane
Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: > > Tom Lane wrote: > >> Fortunately there is a way to tell gettext what to do, and accordingly > >> I propose the attached patch. I am not in a position to test it > >> however. Would somebody replicate the failure and confirm this > >> fixes it? > > > After some work, I've managed to reproduce it in my test environment for > > Swedish, and I can confirm that the patch fixes the issue. > > Thanks. > > > Just for kicks, I've applied this patch so you, so you get to be on the > > receiving side of that ;-) > > No objection here. > > I noticed that you applied the patch to 8.2 as well. It should be > harmless enough, but we weren't having the problem in 8.2 were we? > Or am I just confused? I am seeing a compile falure after this patch on BSD/OS 4.3.1. The failure is during link of the backend binary: -lssl -lcrypto -lgetopt -ldl -lutil -lm -o postgresutils/mb/mbutils.o: In function `SetDatabaseEncoding':utils/mb/mbutils.o(.text+0xbbc):undefined reference to `bind_textdomain_codeset'gmake: *** [postgres]Error 1 -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > I am seeing a compile falure after this patch on BSD/OS 4.3.1. The > failure is during link of the backend binary: > -lssl -lcrypto -lgetopt -ldl -lutil -lm -o postgres > utils/mb/mbutils.o: In function `SetDatabaseEncoding': > utils/mb/mbutils.o(.text+0xbbc): undefined reference to `bind_textdomain_codeset' > gmake: *** [postgres] Error 1 Hm, I assume you used --enable-nls ... why isn't libintl mentioned in the link? regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > I am seeing a compile falure after this patch on BSD/OS 4.3.1. The > > failure is during link of the backend binary: > > > -lssl -lcrypto -lgetopt -ldl -lutil -lm -o postgres > > utils/mb/mbutils.o: In function `SetDatabaseEncoding': > > utils/mb/mbutils.o(.text+0xbbc): undefined reference to `bind_textdomain_codeset' > > gmake: *** [postgres] Error 1 > > Hm, I assume you used --enable-nls ... why isn't libintl mentioned > in the link? It was cut off --- the libraries are: ../../src/port/libpgport_srv.a -lintl -lssl -lcrypto -lgetopt -ldl-lutil -lm -o postgres -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> Hm, I assume you used --enable-nls ... why isn't libintl mentioned >> in the link? > It was cut off --- the libraries are: > ../../src/port/libpgport_srv.a -lintl -lssl -lcrypto -lgetopt -ldl > -lutil -lm -o postgres OK, so your version of libintl doesn't have bind_textdomain_codeset? Is that functionality missing altogether, or just named differently? regards, tom lane
I wrote: > OK, so your version of libintl doesn't have bind_textdomain_codeset? Some digging in the gettext changelog suggests that bind_textdomain_codeset was added in gettext-0.10.36, released 2001-03-29. We can either add a configure test or say that we don't support such old versions of gettext ... regards, tom lane
> We can either add a configure test or say that we don't support > such old versions of gettext ... We don't support seems like a very reasonable response considering the age. Sincerely, Joshua D. Drake > > regards, tom lane >
Tom Lane wrote: > I wrote: > > OK, so your version of libintl doesn't have bind_textdomain_codeset? > > Some digging in the gettext changelog suggests that > bind_textdomain_codeset was added in gettext-0.10.36, released > 2001-03-29. > > We can either add a configure test or say that we don't support > such old versions of gettext ... Or we could just #ifdef the whole thing to win32, since it's not really needed on other platforms, pushing that decision to later... (when that version of gettext will be even more obsolete) //Magnus
Tom Lane wrote: > > Just for kicks, I've applied this patch so you, so you get to be on > > the receiving side of that ;-) > > No objection here. > > I noticed that you applied the patch to 8.2 as well. It should be > harmless enough, but we weren't having the problem in 8.2 were we? > Or am I just confused? I think the underlying problem is still there, it's just that it doesn't manifest itself in a crash here. I figured it couldn't hurt to fix it in that case. //Magnus
Magnus Hagander <magnus@hagander.net> writes: > Tom Lane wrote: >> We can either add a configure test or say that we don't support >> such old versions of gettext ... > Or we could just #ifdef the whole thing to win32, since it's not > really needed on other platforms, pushing that decision to later... > (when that version of gettext will be even more obsolete) That would work for the moment, but we're almost certainly going to have to insist on bind_textdomain_codeset being available eventually; AFAICS there's no hope of multi-locale/multi-encoding support without it. I was considering either: 1. Add a probe for bind_textdomain_codeset to configure, and conditionalize the new patch on HAVE_BIND_TEXTDOMAIN_CODESET. 2. Adjust the AC_SEARCH_LIBS call to probe for bind_textdomain_codeset() instead of gettext() as it does now. This would have the effect of rejecting pre-0.10.36 versions of the library. Magnus' suggestion gives a third possibility. I notice that the PGAC_CHECK_GETTEXT macro already contains the commentdnl FIXME: We should probably check for version >=0.10.36. So depending on what that's about, there might be some other good reasons to go with choice #2. Peter, it appears you put that comment in when you first added the macro, on 2001-06-02. Do you remember why? regards, tom lane
Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: > > Tom Lane wrote: > >> We can either add a configure test or say that we don't support > >> such old versions of gettext ... > > > Or we could just #ifdef the whole thing to win32, since it's not > > really needed on other platforms, pushing that decision to later... > > (when that version of gettext will be even more obsolete) > > That would work for the moment, but we're almost certainly going to > have to insist on bind_textdomain_codeset being available eventually; > AFAICS there's no hope of multi-locale/multi-encoding support without > it. Yes, that's why I said it would only push the decision to the future. Perhaps doing this #ifdef would be a good idea for back-branches, and then we look at one of the other solutions for 8.4? > I was considering either: > > 1. Add a probe for bind_textdomain_codeset to configure, and > conditionalize the new patch on HAVE_BIND_TEXTDOMAIN_CODESET. > > 2. Adjust the AC_SEARCH_LIBS call to probe for > bind_textdomain_codeset() instead of gettext() as it does now. This > would have the effect of rejecting pre-0.10.36 versions of the > library. Depending on the buildfarm issue it may be that the software is antique enough that almost only Bruce runs such an old version. If so, I think #2 is just fine (except in back branches, of course) > Magnus' suggestion gives a third possibility. > > I notice that the PGAC_CHECK_GETTEXT macro already contains the > comment dnl FIXME: We should probably check for version >=0.10.36. > So depending on what that's about, there might be some other good > reasons to go with choice #2. Peter, it appears you put that comment > in when you first added the macro, on 2001-06-02. Do you remember > why? Could it possibly be for this very reason? //Magnus
Magnus Hagander wrote: > > 2. Adjust the AC_SEARCH_LIBS call to probe for > > bind_textdomain_codeset() instead of gettext() as it does now. This > > would have the effect of rejecting pre-0.10.36 versions of the > > library. > > Depending on the buildfarm issue it may be that the software is antique > enough that almost only Bruce runs such an old version. If so, I think > #2 is just fine (except in back branches, of course) You don't have to fix it just for me --- I can remove --enable-nls; the big question is who else is going to hit this. I think the buildfarm has safe-enough checking for 8.4, but I am concerned about the back branches where testing isn't as complete. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Magnus Hagander wrote: > > > 2. Adjust the AC_SEARCH_LIBS call to probe for > > > bind_textdomain_codeset() instead of gettext() as it does now. > > > This would have the effect of rejecting pre-0.10.36 versions of > > > the library. > > > > Depending on the buildfarm issue it may be that the software is > > antique enough that almost only Bruce runs such an old version. If > > so, I think #2 is just fine (except in back branches, of course) > > You don't have to fix it just for me --- I can remove --enable-nls; > the big question is who else is going to hit this. I think the > buildfarm has safe-enough checking for 8.4, but I am concerned about > the back branches where testing isn't as complete. This is why I'm suggesting the #ifdef WIN32 for back branches. //Magnus
On Tue, May 27, 2008 at 8:05 PM, Bruce Momjian <bruce@momjian.us> wrote: > You don't have to fix it just for me --- I can remove --enable-nls; the > big question is who else is going to hit this. I think the buildfarm > has safe-enough checking for 8.4, but I am concerned about the back > branches where testing isn't as complete. We'll find out in a few hours. My guess is that anyone happy to be running such an old version of gettext is probably running old versions of everything, including PG. Your case is obviously a little different. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
Tom Lane wrote: > I notice that the PGAC_CHECK_GETTEXT macro already contains the comment > dnl FIXME: We should probably check for version >=0.10.36. > So depending on what that's about, there might be some other good > reasons to go with choice #2. Peter, it appears you put that comment in > when you first added the macro, on 2001-06-02. Do you remember why? I think that was the first version that worked sanely in general.
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane wrote: >> I notice that the PGAC_CHECK_GETTEXT macro already contains the comment >> dnl FIXME: We should probably check for version >=0.10.36. >> So depending on what that's about, there might be some other good >> reasons to go with choice #2. Peter, it appears you put that comment in >> when you first added the macro, on 2001-06-02. Do you remember why? > I think that was the first version that worked sanely in general. Hmm. Bruce, what gettext version are you running exactly, and how much have you ever tested the localization behavior with it? regards, tom lane
Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > Tom Lane wrote: > >> I notice that the PGAC_CHECK_GETTEXT macro already contains the comment > >> dnl FIXME: We should probably check for version >=0.10.36. > >> So depending on what that's about, there might be some other good > >> reasons to go with choice #2. Peter, it appears you put that comment in > >> when you first added the macro, on 2001-06-02. Do you remember why? > > > I think that was the first version that worked sanely in general. > > Hmm. Bruce, what gettext version are you running exactly, and how much > have you ever tested the localization behavior with it? Uh, I can't seem to find the libintl version --- my bet is that it was installed as part of another package but I am not sure which one. I have never used it to test localization support so turning it off is fine --- I just enabled it to catch compile failures. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> Peter Eisentraut <peter_e@gmx.net> writes: >>> I think that was the first version that worked sanely in general. >> >> Hmm. Bruce, what gettext version are you running exactly, and how much >> have you ever tested the localization behavior with it? > Uh, I can't seem to find the libintl version --- my bet is that it was > installed as part of another package but I am not sure which one. > I have never used it to test localization support so turning it off is > fine --- I just enabled it to catch compile failures. I dug through the gettext changelogs a bit more. There were *three years* and a lot of fixes between 0.10.35 and 0.10.36; the one that is most directly relevant to us is * Locales which differ only in the character encoding, for example ja_JP and ja_JP.UTF-8, can now share the same messagecatalogs. gettext converts the messages to the appropriate character encoding on the fly. Since we supply only one message catalog per language, it's probably fair to say that gettext versions without this ability are broken for our purposes. So I'm thinking that the correct fix is to require bind_textdomain_codeset to be present when --enable-nls is requested. I will go make that happen in the versions that have the new patch (ie, 8.2 and up). regards, tom lane