Thread: More Code Page weirdness
Spent a few hours today diagnosing some errors on Win32 (on Windows Server 2003). These were, I think, wrongly identified as being Windows Installer problems, so I believe Magnus was chasing his tail also.

The problem seemed to be code page related...

If you set listen_addresses="*" then the pgsql server doesn't recognise this because * in one code page is different from * in another. It looks like a *, but it isn't...

Setting listen_addresses to other valid values works just fine... tested with "localhost" using a local TCP/IP connection, and "localhost, 10.0.0.x" with access from 10.0.0.x; all working fine.

Changing the default Language setting to match that of your keyboard is only a temporary workaround, since you can't be sure which code page is in use by any particular application or window. The only way to be sure is to set the default code page to the current locale and reboot, but I'm not sure that catches everything either once things have been edited.

ISTM that the code page comments for psql are NOT the only ones that get affected by any code page mismatch. (And BTW, Windows 1252 is the UK default, not just the German; it is "Latin I".) ***Please put them back onto the Windows Installation page, where they should be***

Anyway, the main point here, guys, is that deep code page weirdness is a wider problem than was at first thought...

More testing required... and more on this particular problem to follow.

I may suggest adding "any" as an option in the listen_addresses GUC, to ease the pain of this... but that seems too narrow a solution, much like the earlier suggestion to put the code page comments only on psql...

These are definitely not Windows Installer problems, because it is a perfectly valid action to change the Language of a server, at least in Europe. The server should work no matter what any installer did or does, just as the server knows not to start if the installer incorrectly set up the rights of the instance-owning userid.
-- Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > The problem seemed to be code page related... > If you set listen_addresses="*" then the pgsql server doesn't recognise > this because * in one code page is different from * in another. It looks > like a *, but it isn't... I don't even know what a code page is, let alone what you envision us doing about it. How can my response to the above be anything except "Windows is too horribly broken to even contemplate supporting"? regards, tom lane
If you change the code page of a server after installing applications on it, then it is your job to re-install those applications and choose the correct locale. Even M$ SQL Server will not automatically do this for you. It will throw an error / fail to execute any queries or procedures that depend upon the tempdb or master databases. The only way to fix it is to uninstall SQL Server and re-install using a compatible code page, as it will not let you change the locale setting of the system databases such as tempdb and master. SQL Server 2000 will also not warn you that the locale you have chosen does not match the operating system. Perhaps 2003 is a little smarter, but I doubt it.

Mike

On Mon, Jan 10, 2005 at 09:46:46PM +0000, Simon Riggs wrote:
> These are definitely not Windows Installer problems because it is
> perfectly valid action to change the Language of a server, at least in
> Europe. The server should work, no matter what any installer
> did/does....just the same as the server knows not to start if the
> installer incorrectly set up the rights of the instance owning userid.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
Is this codepage problem related to people getting the "can't map shared memory to a fixed address" problem some were having when using "*" for listen_addresses? I don't think so, but wanted to ask.

Also, is there any way we can detect this problem when it happens and throw a clear error? I don't like adding 'any' as a new value at this stage.

---------------------------------------------------------------------------

Simon Riggs wrote:
> If you set listen_addresses="*" then the pgsql server doesn't recognise
> this because * in one code page is different from * in another. It looks
> like a *, but it isn't...
>
> I may suggest adding "any" as an option in the listen_addresses GUC, to
> ease the pain of this...but that seems too narrow a solution, much like
> the earlier suggestion to put the code page comments only on psql....

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Also, is there anyway we can detect this problem when it happens and > throw a clear error? Our documentation claims that psql will throw an error at startup time if you've bollixed your code page setting, but I'm darned if I see any code that does so. Is this something that happens down in the guts of Microsoft's libc equivalent, or is the documentation simply lying? regards, tom lane
> Is this codepage problem related to people getting the "can't > map shared memory to a fixed address" problem some were > having when using "*" for listen_addresses? I don't think so > but wanted to ask. Um. That issue is *fixed*. Could be there is a different issue with '*' as listen_address, but everybody who reported the issue that I've spoken to reports it fixed in current RCs. The fix being that Tom moved the shmem allocation to an earlier point. //Magnus
> > Also, is there anyway we can detect this problem when it > happens and > > throw a clear error? > > Our documentation claims that psql will throw an error at > startup time if you've bollixed your code page setting, but > I'm darned if I see any code that does so. Is this something > that happens down in the guts of Microsoft's libc equivalent, > or is the documentation simply lying? Not an error, but a warning. It's in src/bin/psql/startup.c, line 663 and forward. Called on line 304, right before going into the MainLoop(). //Magnus
> Spent a few hours today diagnosing some errors on Win32 (on > Windows Server 2003). These were, I think, wrongly identified > as being Windows Installer problems, so I believe Magnus was > chasing his tail also. I assume you are talking about the "Access Denied" on initdb error that David Saunders reported? If so, I really don't think that specific problem has to do with encoding - it happens *before* initdb even starts. I think it's two different issues. > The problem seemed to be code page related... > > If you set listen_addresses="*" then the pgsql server doesn't > recognise this because * in one code page is different from * > in another. It looks like a *, but it isn't... Yikes! > Setting listen_addresses to other valid values works just > fine....tested with "localhost" using local TCP/IP > connection; "localhost, 10.0.0.x" > with access from 10.0.0.x...all working fine. > > Changing the default Language setting to match that of your > keyboard is only a temporary workaround, since you can't be > sure which code page is in use by any particular application > or window. The only way to be sure is to set the default code > page to the current locale and reboot, but I'm not sure that > catches everything either once things have been edited. You can log in as the postgres user and change it there. I've not seen this myself, and I've run on systems in US English and in Swedish. But you're saying this occurs if, say, I have the system default set to something that uses Latin1 and then use a different encoding when I edit the file? I guess that can be a problem, since notepad doesn't let you choose encoding. Or are you saying it occurs even if the encodings are the same? One way to solve this would be to keep the file in UTF-8 or something, I guess, but that's a fairly major code change... Or find some way to make sure it's always saved in whatever encoding postgres expects (US-ASCII?)
> These are definitely not Windows Installer problems because > it is perfectly valid action to change the Language of a > server, at least in Europe. The server should work, no matter > what any installer did/does....just the same as the server > knows not to start if the installer incorrectly set up the > rights of the instance owning userid. Definitely. The issue appears to be that you have an invalid encoding in the config file. How exactly did you get there - did the installer edit it into the wrong encoding, or did you edit it manually? Using what editor? //Magnus
> > The problem seemed to be code page related... > > > If you set listen_addresses="*" then the pgsql server doesn't > > recognise this because * in one code page is different from * in > > another. It looks like a *, but it isn't... > > I don't even know what a code page is, let alone what you > envision us doing about it. How can my response to the above > be anything except "Windows is too horribly broken to even > contemplate supporting"? Code page = encoding. Or to use the exact definition from MSDN: "A table that associates specific ASCII or EBCDIC values with specific characters." //Magnus
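The MSDN definition ("a table that associates specific ASCII or EBCDIC values with specific characters") can be made concrete with a small sketch. Python is used purely for illustration, on the assumption that its codec names `cp437`, `cp850` and `cp1252` match the Windows code pages of the same numbers; `decode_under` is a hypothetical helper. The same three bytes are decoded through each table: the ASCII bytes for '*' and '#' come out identically, while the upper-half byte 0x9C does not.

```python
# Sketch: a code page behaves like a byte -> character lookup table.
# Python's codec names stand in for the Windows code pages of the same numbers.

RAW = bytes([0x2A, 0x23, 0x9C])  # '*', '#', then one byte from the upper half


def decode_under(raw: bytes, codepages: list[str]) -> dict[str, str]:
    """Decode the same raw bytes through each code page's table."""
    return {cp: raw.decode(cp) for cp in codepages}


if __name__ == "__main__":
    for cp, text in decode_under(RAW, ["cp437", "cp850", "cp1252"]).items():
        print(f"{cp:>6}: {text!r}")
```

Under these codecs the OEM pages render the last byte as '£' and cp1252 renders it as 'œ', while '*' and '#' are unchanged.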
On Tue, 2005-01-11 at 10:18 +0100, Magnus Hagander wrote: > > Spent a few hours today diagnosing some errors on Win32 (on > > Windows Server 2003). These were, I think, wrongly identified > > as being Windows Installer problems, so I believe Magnus was > > chasing his tail also. > > I assume you are talking about the "Access Denied" on initdb error that > David Saunders reported? If so, I really don't think that specific > problem has to do with encoding - it happens *before* initdb even > starts. I think it's two different issues. Sounds like a different issue, but yes, it was David's problem. > > > The problem seemed to be code page related... > > > > If you set listen_addresses="*" then the pgsql server doesn't > > recognise this because * in one code page is different from * > > in another. It looks like a *, but it isn't... > > Yikes! ...top of my most-weird list. > > > Setting listen_addresses to other valid values works just > > fine....tested with "localhost" using local TCP/IP > > connection; "localhost, 10.0.0.x" > > with access from 10.0.0.x...all working fine. > > > > Changing the default Language setting to match that of your > > keyboard is only a temporary workaround, since you can't be > > sure which code page is in use by any particular application > > or window. The only way to be sure is to set the default code > > page to the current locale and reboot, but I'm not sure that > > catches everything either once things have been edited. > > You can log in as the postgres user and change it there. > > I've not seen this myself, and I've run on systems in US English and in > Swedish. But you're saying this occurs if, say, I have the system default > set to something that uses Latin1 and then use a different encoding when > I edit the file? I guess that can be a problem, since notepad doesn't > let you choose encoding. That is correct. > Or are you saying it occurs even if the encodings are the same? No, the first one.
> One way to solve this would be to keep the file in UTF-8 or something, > I guess, but that's a fairly major code change... Or find some way to > make sure it's always saved in whatever encoding postgres expects > (US-ASCII?) > > > These are definitely not Windows Installer problems because > > it is perfectly valid action to change the Language of a > > server, at least in Europe. The server should work, no matter > > what any installer did/does....just the same as the server > > knows not to start if the installer incorrectly set up the > > rights of the instance owning userid. > > Definitely. The issue appears to be that you have an invalid encoding in > the config file. How exactly did you get there - did the installer edit > it into the wrong encoding, or did you edit it manually? Using what > editor? Looks like Windows Server 2003 was setup with "English - United States", then PostgreSQL was installed using "English -UK", and the system was being edited with a UK keyboard (which shows things like British pound, hash and star all in their correct (!) place ...i.e. different to US). -- Best Regards, Simon Riggs
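To see which of the keys Simon mentions (pound, hash, star) are even capable of changing between code pages, here is a minimal Python sketch of a write-in-one-page, read-in-another round trip. The pairing is an assumption for illustration: the file is saved under the ANSI page cp1252 (what a GUI editor would use) and read back under cp850, an OEM page common on Western European installs. `round_trip` is a hypothetical helper.

```python
# Hypothetical scenario: text typed on a UK keyboard is saved by a GUI editor
# under the ANSI code page (cp1252) and later read under an OEM code page
# (cp850). Both choices are assumptions made for illustration.


def round_trip(text: str, written_as: str, read_as: str) -> str:
    """Encode with one code page, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    typed = "£ # *"
    print(round_trip(typed, "cp1252", "cp850"))  # pound sign changes, '#' and '*' do not
```

Under these assumptions only the pound sign is corrupted (it comes back as 'ú'); '#' and '*' survive the round trip unchanged.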
On Tue, 2005-01-11 at 10:20 +0100, Magnus Hagander wrote: > > > The problem seemed to be code page related... > > > > > If you set listen_addresses="*" then the pgsql server doesn't > > > recognise this because * in one code page is different from * in > > > another. It looks like a *, but it isn't... > > > > I don't even know what a code page is, let alone what you > > envision us doing about it. How can my response to the above > > be anything except "Windows is too horribly broken to even > > contemplate supporting"? > > Code page = encoding. Or to use the exact definition from MSDN: "A table > that associates specific ASCII or EBCDIC values with specific > characters." > The code page is a mechanism for dynamically changing the character encoding. Saying "I have an encoding issue" is slightly different to having the wrong code page loaded and then experiencing the resulting weirdness. Not sure whether to recommend a solution or not yet, still trying to make it work consistently - i.e. reinstalling. -- Best Regards, Simon Riggs
> > > These are definitely not Windows Installer problems because it is > > > perfectly valid action to change the Language of a > server, at least > > > in Europe. The server should work, no matter what any installer > > > did/does....just the same as the server knows not to start if the > > > installer incorrectly set up the rights of the instance owning > > > userid. > > > > Definitely. The issue appears to be that you have an invalid > encoding > > in the config file. How exactly did you get there - did the > installer > > edit it into the wrong encoding, or did you edit it manually? Using > > what editor? > > Looks like Windows Server 2003 was setup with "English - > United States", then PostgreSQL was installed using "English > -UK", and the system was being edited with a UK keyboard > (which shows things like British pound, hash and star all in > their correct (!) place ...i.e. different to US). My question remains - was the '*' put there by the installer, or manually using notepad (or whatever) later? (The checkbox in the installer to modify it) I wonder if a partial solution to this would be to set the codepage before we load the config file? IIRC, the config file is loaded in whatever codepage happens to be active when the server starts. The code page is later changed when loading pg_control (LC_CTYPE should affect it). That's how I think it works without specifically looking at the code. Then we could document which codepage should always be used to edit it. It's not a beautiful solution, but it would at least make the behaviour predictable. Not sure what would be involved in requiring the file to be UTF8. Notepad can certainly handle UTF8, but I wonder how much would need to be changed in pg... //Magnus
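The "make the behaviour predictable" idea can be sketched outside the server. This is not how PostgreSQL reads postgresql.conf; it is a Python illustration, assuming ASCII is the documented encoding, of decoding a config file with one fixed encoding in strict mode instead of whatever code page is active, and failing with a line number when the bytes do not fit. `read_conf` is a hypothetical name.

```python
import locale


def read_conf(raw: bytes, encoding: str = "ascii") -> list[str]:
    """Decode config bytes with one fixed, documented encoding.

    Fails loudly instead of depending on whichever code page happens to be
    active when the process starts.
    """
    try:
        text = raw.decode(encoding)  # strict mode: mismatched bytes raise
    except UnicodeDecodeError as exc:
        line_no = raw.count(b"\n", 0, exc.start) + 1
        raise ValueError(
            f"byte 0x{raw[exc.start]:02X} on line {line_no} is not valid {encoding}"
        ) from exc
    return text.splitlines()


if __name__ == "__main__":
    # Shown only for comparison: the encoding an unpinned reader would fall back on.
    print("active locale encoding:", locale.getpreferredencoding(False))
    print(read_conf(b"listen_addresses = '*'\n"))
```

The design choice is to reject rather than guess: a byte such as 0xA3 (a pound sign in cp1252) produces an error naming the line, which is the kind of clear diagnostic asked for earlier in the thread.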
On Tue, 2005-01-11 at 11:20 +0100, Magnus Hagander wrote: > > > > These are definitely not Windows Installer problems because it is > > > > perfectly valid action to change the Language of a > > server, at least > > > > in Europe. The server should work, no matter what any installer > > > > did/does....just the same as the server knows not to start if the > > > > installer incorrectly set up the rights of the instance owning > > > > userid. > > > > > > Definitely. The issue appears to be that you have an invalid > > encoding > > > in the config file. How exactly did you get there - did the > > installer > > > edit it into the wrong encoding, or did you edit it manually? Using > > > what editor? > > > > Looks like Windows Server 2003 was setup with "English - > > United States", then PostgreSQL was installed using "English > > -UK", and the system was being edited with a UK keyboard > > (which shows things like British pound, hash and star all in > > their correct (!) place ...i.e. different to US). > > My question remains - was the '*' put there by the installer, or > manually using notepad (or whatever) later? (The checkbox in the > installer to modify it) I think it was originally put there by the installer, then as part of faffing backwards and forwards trying to get connectivity to work, it was set to localhost, then to a looks-like-a-* by hand editing in the wrong code page. > I wonder if a partial solution to this would be to set the codepage > before we load the config file? IIRC, the config file is loaded in > whatever codepage happens to be active when the server starts. The code > page is later changed when loading pg_control (LC_CTYPE should affect > it). That's how I think it works without specifically looking at the > code. Yes, that's along the right lines.
The issue also affects hash, since the looks-like-a-hash character in different code pages is not actually a hash either; so if you comment out a line, it could still be in effect, but you can't tell by looking! This also affects all other .conf files, including recovery.conf, so if you tried to recover a system using a recovery.conf written on another system it could give you problems... The problem is this: editors don't tell you what code page you're using, so when you edit a file you see one thing, and then the server sees another. We could test the ASCII codes: if we see a single-character, non-* value in listen_addresses, issue a WARNING; if we see a line beginning with a non-alpha character, assume it is a comment. But we need a catch-all solution... -- Best Regards, Simon Riggs
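The two heuristics proposed above can be sketched as follows. This is a hypothetical Python checker working on already-decoded text, not PostgreSQL's config parser, and its "name = value" handling is much simpler than the real postgresql.conf grammar. The look-alike characters in the usage example (U+2217 ASTERISK OPERATOR, U+FF03 FULLWIDTH NUMBER SIGN) are stand-ins chosen for illustration, not characters observed in Simon's file.

```python
def lint_conf_lines(lines: list[str]) -> list[str]:
    """Return warnings for the two heuristics suggested in the thread.

    Works on already-decoded text; the real postgresql.conf grammar
    (quoting rules, units, trailing comments) is richer than this.
    """
    warnings = []
    for no, line in enumerate(lines, start=1):
        s = line.strip()
        if not s or s.startswith("#"):
            continue  # blank line or a genuine ASCII '#' comment
        if not (s[0].isascii() and s[0].isalpha()):
            # Heuristic 2: the line does not start with an ASCII letter, so it
            # may hold a look-alike comment marker. Warn, then skip it as if
            # it were a comment.
            warnings.append(f"line {no}: unexpected leading character {s[0]!r}, treated as a comment")
            continue
        name, _, value = s.partition("=")
        value = value.split("#", 1)[0].strip().strip("'")
        if name.strip() == "listen_addresses" and len(value) == 1 and value != "*":
            # Heuristic 1: a one-character value that is not an ASCII '*'.
            warnings.append(
                f"line {no}: listen_addresses is {value!r} (U+{ord(value):04X}), not '*'"
            )
    return warnings


if __name__ == "__main__":
    sample = [
        "listen_addresses = '\u2217'",  # U+2217 ASTERISK OPERATOR, a stand-in look-alike
        "\uff03port = 5433",            # U+FF03 FULLWIDTH NUMBER SIGN, another stand-in
        "port = 5432",
    ]
    for w in lint_conf_lines(sample):
        print("WARNING:", w)
```

As Simon notes, this only catches the specific cases it looks for; it is not the catch-all solution.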
"Magnus Hagander" <mha@sollentuna.net> writes: > Not an error, but a warning. It's in src/bin/psql/startup.c, line 663 > and forward. Called on line 304, right before going into the MainLoop(). Ah-hah. I was grepping for "code page" not "codepage". (BTW, this warning appears not to be internationalized --- boo hiss --- and we will need to adjust it to point to wherever we decide to put the documentation about this issue.) So: what the heck does it mean when GetACP() != GetConsoleCP() ? I suppose those control different areas of functionality, but what exactly? regards, tom lane
"Magnus Hagander" <mha@sollentuna.net> writes: > Not sure what would be involved in requiring the file to be UTF8. > Notepad can certainly handle UTF8, but I wonder how much would need to > be changed in pg... I think it would Just Work, since UTF8 is an ASCII superset, whereas apparently some of Windows' code pages are not :-( (which is proof of brain death in Redmond if I ever saw it). regards, tom lane
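The "ASCII superset" point can be checked directly: pure-ASCII text encodes to exactly the same bytes in UTF-8 and in a Windows code page, so a parser that only cares about ASCII syntax sees no difference. A minimal Python sketch (helper names are hypothetical) follows. One caveat worth checking, stated here as background rather than something raised in the thread: classic Notepad writes a UTF-8 byte-order mark (EF BB BF) at the start of files it saves as UTF-8, so the first line would not begin with plain ASCII unless the mark is skipped.

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 byte-order mark


def same_bytes_in_utf8(text: str, codepage: str) -> bool:
    """True if `text` encodes to identical bytes in UTF-8 and in `codepage`."""
    return text.encode("utf-8") == text.encode(codepage)


def strip_utf8_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 byte-order mark, if present."""
    return raw[len(BOM):] if raw.startswith(BOM) else raw


if __name__ == "__main__":
    print(same_bytes_in_utf8("listen_addresses = '*'", "cp1252"))  # True: pure ASCII
    print(same_bytes_in_utf8("£", "cp1252"))  # False: b'\xc2\xa3' vs b'\xa3'
    print(strip_utf8_bom(BOM + b"port = 5432"))
```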
> > Not an error, but a warning. It's in > src/bin/psql/startup.c, line 663 > > and forward. Called on line 304, right before going into > the MainLoop(). > > Ah-hah. I was grepping for "code page" not "codepage". > (BTW, this warning appears not to be internationalized --- > boo hiss --- and we will need to adjust it to point to > wherever we decide to put the documentation about this issue.) > > So: what the heck does it mean when GetACP() != GetConsoleCP() ? > I suppose those control different areas of functionality, but > what exactly? Dunno who's responsible for that code initially ;-) Anyway. GetACP() returns the "current ANSI codepage for the system". GetConsoleCP() "retrieves the input code page used by the console". The console by default uses what's called the OEM charset. I assume GetACP() returns the default charset for *non-console* programs. And no, I don't know exactly why that matters in the case of psql. I've always been able to "fix" things by just changing the font to Lucida Console. But I only use very few characters that are affected by this (åäö only). //Magnus
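A sketch of why a mismatch between GetConsoleCP() and GetACP() matters for characters like åäö. GetACP and GetConsoleCP are real kernel32 functions, reached here through Python's `ctypes` only when running on Windows; the helper names and the warning wording are hypothetical and are not copied from psql's startup.c. The pairing 850 (console) versus 1252 (ANSI) is an assumed Western European setup.

```python
import sys
from typing import Optional


def codepage_mismatch_warning(acp: int, console_cp: int) -> Optional[str]:
    """Return a warning when the console input code page differs from the
    ANSI code page, else None. Wording is illustrative, not psql's."""
    if acp == console_cp:
        return None
    return (f"console code page ({console_cp}) differs from ANSI code page ({acp}); "
            "8-bit characters may be misinterpreted")


def misread(text: str, console_cp: int, ansi_cp: int) -> str:
    """Bytes produced under the console (OEM) code page, reinterpreted under
    the ANSI code page."""
    return text.encode(f"cp{console_cp}").decode(f"cp{ansi_cp}")


if __name__ == "__main__":
    if sys.platform == "win32":
        import ctypes  # kernel32 exposes GetACP and GetConsoleCP
        k32 = ctypes.windll.kernel32
        print(codepage_mismatch_warning(k32.GetACP(), k32.GetConsoleCP()) or "code pages match")
    print(misread("åäö", 850, 1252))
```

With these assumed pages, "åäö" typed at the console turns into "†„”" when read as cp1252, while plain ASCII such as "abc*#" is unaffected, which fits Magnus only noticing it for åäö.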
> > Not sure what would be involved in requiring the file to be UTF8. > > Notepad can certainly handle UTF8, but I wonder how much > would need to > > be changed in pg... > > I think it would Just Work, since UTF8 is an ASCII superset, Ok. > whereas apparently some of Windows' code pages are not :-( > (which is proof of brain death in Redmond if I ever saw it). I've never seen one that changes any chars <= 127. But it's possible the editor in this case wrote down a file in the wrong charset. Looking at http://www.microsoft.com/globaldev/reference/wincp.mspx, "*" is 002A in *every single one*, including Japanese. Looking at http://www.microsoft.com/globaldev/reference/oem/437.htm, it shows that it's 002A in the very encoding Simon said was used in this case. So it is in http://www.microsoft.com/globaldev/reference/oem/850.htm, which is the OEM codepage used in Swedish Windows. I'm having trouble seeing why '*' would have a different value in different codepages. I'm wondering if we're going down the wrong road completely here? //Magnus
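Magnus's table lookups can be generalised into a quick sweep: for several single-byte Windows code pages, check whether any printable ASCII byte (0x20 to 0x7E) decodes to something other than its ASCII character. The sketch assumes Python's codec names match the Windows pages; it leaves out the multi-byte Japanese page so the sweep stays byte-by-byte. EBCDIC (Python's `cp037`), the other family named in the MSDN definition, is included only as a contrast.

```python
# Single-byte Windows code pages, via Python's codec names. (Multi-byte
# pages such as Japanese cp932 are left out to keep the sweep byte-by-byte.)
CODEPAGES = ["cp437", "cp850", "cp1250", "cp1251", "cp1252", "cp1253"]


def ascii_differences(codepages: list[str]) -> dict[str, list[int]]:
    """For each code page, list printable ASCII byte values (0x20-0x7E)
    that do not decode to the same character as in US-ASCII."""
    return {
        cp: [b for b in range(0x20, 0x7F) if bytes([b]).decode(cp) != chr(b)]
        for cp in codepages
    }


if __name__ == "__main__":
    print(ascii_differences(CODEPAGES))                    # every list is expected to be empty
    print({cp: "*".encode(cp).hex() for cp in CODEPAGES})  # expected: '2a' everywhere
    print("*".encode("cp037").hex())                       # EBCDIC (cp037), for contrast
```

Every list comes back empty and '*' is 0x2A in each of these pages; only in EBCDIC does '*' get a different byte (0x5C), which is consistent with Magnus's doubt that the ASCII asterisk itself changed.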
Magnus Hagander wrote: > I've never seen one that changes any chars <= 127. But it's possible the > editor in this case wrote down a file in the wrong charset. > Looking at http://www.microsoft.com/globaldev/reference/wincp.mspx, "*" > is 002A in *every single one*, including Japanese. > > Looking at http://www.microsoft.com/globaldev/reference/oem/437.htm, it > shows that it's 002A in the very encoding Simon said was used in this > case. So it is in > http://www.microsoft.com/globaldev/reference/oem/850.htm, which is the > OEM codepage used in Swedish Windows. > > I'm having trouble seeing why '*' would have a different value in > different codepages. I'm wondering if we're going down the wrong road > completely here? Agreed. The idea that <=127 codes are different for different encodings (code pages) seems impossible. -- Bruce Momjian
Magnus Hagander wrote: >>whereas apparently some of Windows' code pages are not :-( >>(which is proof of brain death in Redmond if I ever saw it). > >I've never seen one that changes any chars <= 127. But it's possible the >editor in this case wrote down a file in the wrong charset. >Looking at http://www.microsoft.com/globaldev/reference/wincp.mspx, "*" >is 002A in *every single one*, including Japanese. > >Looking at http://www.microsoft.com/globaldev/reference/oem/437.htm, it >shows that it's 002A in the very encoding Simon said was used in this >case. So it is in >http://www.microsoft.com/globaldev/reference/oem/850.htm, which is the >OEM codepage used in Swedish Windows. > >I'm having trouble seeing why '*' would have a different value in >different codepages. I'm wondering if we're going down the wrong road >completely here? So it's not just me. I was looking at similar pages here: http://www.kostis.net/charsets/ and did not find one with 2A != asterisk. cheers andrew
"Magnus Hagander" <mha@sollentuna.net> writes: > I'm having trouble seeing why '*' would have a different value in > different codepages. I'm wondering if we're going down the wrong road > completely here? That was what was bothering me too. It's hard to interpret Simon's report in another way though. regards, tom lane
On Tue, 2005-01-11 at 11:49 -0500, Tom Lane wrote: > "Magnus Hagander" <mha@sollentuna.net> writes: > > I'm having trouble seeing why '*' would have a different value in > > different codepages. I'm wondering if we're going down the wrong road > > completely here? > > That was what was bothering me too. It's hard to interpret Simon's > report in another way though. I'm open to other suggestions as to why "*" would not work, when other explicit settings do.... ...especially something that gets fixed by changing the code page. I do agree that it's hard to see what's going on... I spent 4 hrs on a * -- Best Regards, Simon Riggs
Simon Riggs wrote: >On Tue, 2005-01-11 at 11:49 -0500, Tom Lane wrote: > > >>"Magnus Hagander" <mha@sollentuna.net> writes: >> >> >>>I'm having trouble seeing why '*' would have a different value in >>>different codepages. I'm wondering if we're going down the wrong road >>>completely here? >>> >>> >>That was what was bothering me too. It's hard to interpret Simon's >>report in another way though. >> >> > >I'm open to other suggestions as to why "*" would not work, when other >explicit settings do.... > >...especially something that gets fixed by changing the code page. > >I do agree that its hard to see whats going on... I spent 4 hrs on a * > > Can you hex dump a file edited with various code pages in place and see what different values an asterisk gets? That should settle the question, shouldn't it? cheers andrew
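Andrew's hex-dump test is easy to script. A minimal Python sketch (the helper name and the command-line usage are hypothetical) prints the bytes of every line mentioning listen_addresses. In an ASCII-compatible file the value should end in `27 2A 27` (quote, asterisk, quote); any other byte in the middle would confirm Simon's reading, and a `2A` there would rule it out.

```python
import sys
from pathlib import Path


def hexdump_setting(raw: bytes, name: bytes = b"listen_addresses") -> list[str]:
    """Hex-dump every line of a config file's raw bytes that mentions `name`."""
    return [" ".join(f"{b:02X}" for b in line) for line in raw.splitlines() if name in line]


if __name__ == "__main__" and len(sys.argv) > 1 and Path(sys.argv[1]).is_file():
    # Usage (file name illustrative): python hexdump_conf.py postgresql.conf
    for dump in hexdump_setting(Path(sys.argv[1]).read_bytes()):
        print(dump)
```

Running it on copies of the file saved under different code pages, as Andrew suggests, shows directly which byte each editor wrote for the asterisk.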
On Tue, 2005-01-11 at 12:08 -0500, Andrew Dunstan wrote: > >On Tue, 2005-01-11 at 11:49 -0500, Tom Lane wrote: > > > > > >>"Magnus Hagander" <mha@sollentuna.net> writes: > >> > >> > >>>I'm having trouble seeing why '*' would have a different value in > >>>different codepages. I'm wondering if we're going down the wrong road > >>>completely here? > >>> > >>That was what was bothering me too. It's hard to interpret Simon's > >>report in another way though. > >> > >I'm open to other suggestions as to why "*" would not work, when other > >explicit settings do.... > > > >...especially something that gets fixed by changing the code page. > > > >I do agree that its hard to see whats going on... I spent 4 hrs on a * > > > > Can you hex dump a file edited with various code pages in place and see > what different values an asterisk gets? That should settle the question, > shouldn't it? Your questions are now bothering me too, yet I lack another explanation for why "*" would not work... Unfortunately I do not have further access at this time to that system, but am still involved. rc5 is being installed soon and we can try to repeat the issue. Please don't spend any more time on this until we have a re-verified problem on rc5. -- Best Regards, Simon Riggs