Thread: logfile character encoding
Hello, please help me with this problem: My postgreSQL logfiles generally are encoded in UTF-8, but some entries are in Windows-1252. The server status in pgAdmin III doesn't cope with this; it is unusable because of continuous error popups. I found several discussions regarding this issue, e. g. http://www.postgresql.org/message-id/CAD8_UcbfncTLTPeVcihXGJtGoB5BjfqBH48R3Zor0Y91Ph0hUw@mail.gmail.com http://www.postgresql.org/message-id/201012211547.oBLFl3qC019008@wwwmaster.postgresql.org but unfortunately I don't understand cause and, if possible, repair. log destination is configured as stderr, with logging collector on (default setup). So it may be one issue that stderr is written in 1252 (not sure about that), but then why are most logfile entries encoded in UTF-8? Is only a part of all messages routed via stderr? Another idea is that the 1252 encodings may be related to pgAdmin III? I send almost all SQL via its query window. Windows 8.1 PostgreSQL 9.3.5 64 bit pgAdmin 1.18.1 database encoding UTF8 all on the same PC, no multiple servers or database encodings involved Some examples of entries with 8 bit encoded chars (Windows 1252): === 2014-08-13 09:37:32 CEST LOG: Datenbanksystem wurde nicht richtig heruntergefahren; automatische Wiederherstellung läuft 2014-08-14 10:34:46 CEST LOG: Parameter „autovacuum“ auf „off“ gesetzt 2014-08-14 10:34:46 CEST LOG: Autovacuum-Launcher fährt herunter 2014-08-14 10:51:52 CEST TIPP: Erhöhen Sie eventuell den Konfigurationsparameter „checkpoint_segments“. 2014-08-14 14:32:43 CEST LOG: Parameter „fsync“ auf „off“ gesetzt 2014-08-14 14:32:43 CEST LOG: Parameter „full_page_writes“ auf „off“ gesetzt 2014-08-14 14:32:43 CEST LOG: Parameter „checkpoint_segments“ auf „128“ gesetzt === and some entries with UTF8-encoded non ASCII chars: === 2014-08-14 09:16:20 CEST FEHLER: Syntaxfehler bei „DROPT“ bei Zeichen 1 2014-08-14 09:16:48 CEST FEHLER: NULL-Wert in Spalte „id“ verletzt Not-Null-Constraint 2014-08-14 09:16:48 CEST DETAIL: Fehlgeschlagene Zeile enthält (null, 5). 2014-08-14 09:56:41 CEST TIPP: Konnte keine beste Kandidatfunktion auswählen. Sie müssen möglicherweise ausdrückliche Typumwandlungen hinzufügen. 2014-08-14 17:21:33 CEST FEHLER: „g“ ist keine bekannte Variable bei Zeichen 79 2014-08-14 17:21:33 CEST TIPP: Keine Funktion stimmt mit dem angegebenen Namen und den Argumenttypen überein. Sie müssen möglicherweise ausdrückliche Typumwandlungen hinzufügen. 2014-08-14 17:21:33 CEST FEHLER: Relation „relation“ existiert nicht bei Zeichen 74 2014-08-14 19:05:46 CEST FEHLER: Spalte „ways.id“ muss in der GROUP-BY-Klausel erscheinen oder in einer Aggregatfunktion verwendet werden bei Zeichen 34 2014-08-14 19:28:08 CEST WARNUNG: Spalte „source“ hat Typ „unknown“ 2014-08-14 19:28:08 CEST FEHLER: Spaltenverweis „id“ ist nicht eindeutig bei Zeichen 49 2014-08-14 20:19:51 CEST FEHLER: kann Sicht mpways nicht löschen, weil andere Objekte davon abhängen 2014-08-14 20:19:51 CEST DETAIL: Sicht mplines hängt von Sicht mpways ab Sicht mpgeoms hängt von Sicht mplines ab 2014-08-14 20:19:51 CEST TIPP: Verwenden Sie DROP ... CASCADE, um die abhängigen Objekte ebenfalls zu löschen. 2014-08-14 20:19:51 CEST FEHLER: Relation „mpways“ existiert bereits === Thanks, Redoute
Redoute <redoute@tortenboxer.de> writes: > please help me with this problem: My postgreSQL logfiles generally are > encoded in UTF-8, but some entries are in Windows-1252. The server > status in pgAdmin III doesn't cope with this; it is unusable because of > continuous error popups. > I found several discussions regarding this issue, e. g. > http://www.postgresql.org/message-id/CAD8_UcbfncTLTPeVcihXGJtGoB5BjfqBH48R3Zor0Y91Ph0hUw@mail.gmail.com > http://www.postgresql.org/message-id/201012211547.oBLFl3qC019008@wwwmaster.postgresql.org > but unfortunately I don't understand cause and, if possible, repair. Individual server sessions will log in their database encoding, but the postmaster (parent process) will probably log in an encoding determined by its environment. You need to make sure the postmaster's environment selects a UTF8 locale. I'm not familiar with how to do that on Windows, unfortunately. regards, tom lane
On 08/16/2014 12:26 PM, Redoute wrote: > Hello, > > please help me with this problem: My postgreSQL logfiles generally are > encoded in UTF-8, but some entries are in Windows-1252. The server > status in pgAdmin III doesn't cope with this; it is unusable because of > continuous error popups. > log destination is configured as stderr, with logging collector on > (default setup). So it may be one issue that stderr is written in 1252 > (not sure about that), but then why are most logfile entries encoded in > UTF-8? Is only a part of all messages routed via stderr? > > Another idea is that the 1252 encodings may be related to pgAdmin III? I > send almost all SQL via its query window. > > Windows 8.1 > PostgreSQL 9.3.5 64 bit > pgAdmin 1.18.1 > database encoding UTF8 > all on the same PC, no multiple servers or database encodings involved > > Some examples of entries with 8 bit encoded chars (Windows 1252): > === > 2014-08-13 09:37:32 CEST LOG: Datenbanksystem wurde nicht richtig > heruntergefahren; automatische Wiederherstellung läuft > 2014-08-14 10:34:46 CEST LOG: Parameter „autovacuum“ auf „off“ gesetzt > 2014-08-14 10:34:46 CEST LOG: Autovacuum-Launcher fährt herunter > 2014-08-14 10:51:52 CEST TIPP: Erhöhen Sie eventuell den > Konfigurationsparameter „checkpoint_segments“. > 2014-08-14 14:32:43 CEST LOG: Parameter „fsync“ auf „off“ gesetzt > 2014-08-14 14:32:43 CEST LOG: Parameter „full_page_writes“ auf „off“ gesetzt > 2014-08-14 14:32:43 CEST LOG: Parameter „checkpoint_segments“ auf „128“ > gesetzt > === > > > and some entries with UTF8-encoded non ASCII chars: > === > 2014-08-14 09:16:20 CEST FEHLER: Syntaxfehler bei „DROPT“ bei Zeichen 1 > 2014-08-14 09:16:48 CEST FEHLER: NULL-Wert in Spalte „id“ verletzt > Not-Null-Constraint > 2014-08-14 09:16:48 CEST DETAIL: Fehlgeschlagene Zeile enthält (null, 5). > 2014-08-14 09:56:41 CEST TIPP: Konnte keine beste Kandidatfunktion > auswählen. Sie müssen möglicherweise ausdrückliche Typumwandlungen > hinzufügen. > 2014-08-14 17:21:33 CEST FEHLER: „g“ ist keine bekannte Variable bei > Zeichen 79 > 2014-08-14 17:21:33 CEST TIPP: Keine Funktion stimmt mit dem angegebenen > Namen und den Argumenttypen überein. Sie müssen möglicherweise > ausdrückliche Typumwandlungen hinzufügen. > 2014-08-14 17:21:33 CEST FEHLER: Relation „relation“ existiert nicht bei > Zeichen 74 > 2014-08-14 19:05:46 CEST FEHLER: Spalte „ways.id“ muss in der > GROUP-BY-Klausel erscheinen oder in einer Aggregatfunktion verwendet > werden bei Zeichen 34 > 2014-08-14 19:28:08 CEST WARNUNG: Spalte „source“ hat Typ „unknown“ > 2014-08-14 19:28:08 CEST FEHLER: Spaltenverweis „id“ ist nicht eindeutig > bei Zeichen 49 > 2014-08-14 20:19:51 CEST FEHLER: kann Sicht mpways nicht löschen, weil > andere Objekte davon abhängen > 2014-08-14 20:19:51 CEST DETAIL: Sicht mplines hängt von Sicht mpways ab > Sicht mpgeoms hängt von Sicht mplines ab > 2014-08-14 20:19:51 CEST TIPP: Verwenden Sie DROP ... CASCADE, um die > abhängigen Objekte ebenfalls zu löschen. > 2014-08-14 20:19:51 CEST FEHLER: Relation „mpways“ existiert bereits > === Based on Google Translate I observed the following from the above. The first batch of messages are entirely server related messages, either start up or config file changes. The second batch are entirely query based messages. This would seem to be the reverse of your suspicion that sending queries through pgAdmin is changing the encoding to 1252. So some questions: Are you positive the database is UTF8? Is there something setting the encoding different for a connection? > > Thanks, Redoute > > -- Adrian Klaver adrian.klaver@aklaver.com
On 08/16/2014 12:26 PM, Redoute wrote: > Hello, > > > log destination is configured as stderr, with logging collector on > (default setup). So it may be one issue that stderr is written in 1252 > (not sure about that), but then why are most logfile entries encoded in > UTF-8? Is only a part of all messages routed via stderr? > > Another idea is that the 1252 encodings may be related to pgAdmin III? I > send almost all SQL via its query window. > > Windows 8.1 > PostgreSQL 9.3.5 64 bit > pgAdmin 1.18.1 > database encoding UTF8 > all on the same PC, no multiple servers or database encodings involved Aaah, missed Toms post that explained the discrepancy better. In other words database is operating in one encoding, the postmaster in another. That would explain my previous observation. > Thanks, Redoute > > -- Adrian Klaver adrian.klaver@aklaver.com
Am 16.08.2014 21:40, schrieb Tom Lane: > You need to make sure the postmaster's > environment selects a UTF8 locale. So is this possible at all in Windows? http://msdn.microsoft.com/en-us/goglobal/bb896001.aspx According to this table Locales support ANSI and OEM codepages, but there is no mention of Unicode encodings. Don't know what ANSI codepage "0" or OEM codepage "1" means, anyway this is not given for german or english language. I tried to change the lc_messages configuration variable in postgresql.conf. Setting has been > # These settings are initialized by initdb, but they can be changed. > lc_messages = 'German_Germany.1252' # locale for system error message I tried to change it to German_Germany.UTF-8, which caused a fallback to english messages. May be because the value is invalid at all, or may be because there are no translations for this locale. (BTW having messages in english seems to be a usable work-around, since these messages are ASCII only. It is also possible to manually set lc_messages back to 'German_Germany.1252' per session, so that the logfile contains english messages from the "postmaster" and UTF-8 encoded german messages from the sessions. But would this be a solution for default Windows installations?) Thank you, Redoute
On 08/17/2014 03:30 AM, Redoute wrote: > Am 16.08.2014 21:40, schrieb Tom Lane: > >> You need to make sure the postmaster's >> environment selects a UTF8 locale. > > So is this possible at all in Windows? > http://msdn.microsoft.com/en-us/goglobal/bb896001.aspx > According to this table Locales support ANSI and OEM codepages, but > there is no mention of Unicode encodings. Don't know what ANSI codepage > "0" or OEM codepage "1" means, anyway this is not given for german or > english language. > > I tried to change the lc_messages configuration variable in > postgresql.conf. Setting has been > >> # These settings are initialized by initdb, but they can be changed. >> lc_messages = 'German_Germany.1252' # locale for system error message > > I tried to change it to German_Germany.UTF-8, which caused a fallback to > english messages. May be because the value is invalid at all, or may be > because there are no translations for this locale. (BTW having messages > in english seems to be a usable work-around, since these messages are > ASCII only. It is also possible to manually set lc_messages back to > 'German_Germany.1252' per session, so that the logfile contains english > messages from the "postmaster" and UTF-8 encoded german messages from > the sessions. But would this be a solution for default Windows > installations?) I am afraid Windows locale setting is a mystery to me. Here is a blog that seems to address your issue somewhat: http://www.g-loaded.eu/2011/02/27/locale-windows/ Hope it helps. > > Thank you, > Redoute > > -- Adrian Klaver adrian.klaver@aklaver.com
Adrian Klaver <adrian.klaver@aklaver.com> writes: > On 08/17/2014 03:30 AM, Redoute wrote: >> Am 16.08.2014 21:40, schrieb Tom Lane: >>> You need to make sure the postmaster's >>> environment selects a UTF8 locale. >> So is this possible at all in Windows? > I am afraid Windows locale setting is a mystery to me. Me too, but I repeat the point that what you have to twiddle is the postmaster's environment; in particular you have to set LC_CTYPE (or maybe better LC_ALL), which cannot be set from the config file. I doubt it will work to have LC_MESSAGES trying to select a different encoding than LC_CTYPE specifies. regards, tom lane
Am 17.08.2014 16:49, schrieb Adrian Klaver: > Here is a blog > that seems to address your issue somewhat: > > http://www.g-loaded.eu/2011/02/27/locale-windows/ This seems somewhat confused. But it made me playing with initdb, which I did not try before. I got: > >initdb.exe --locale=american_usa --encoding=UTF-8 --pgdata C:/Daten/pgdata2/ > The database cluster will be initialized with locale "English_United States.1252". > The default text search configuration will be set to "english". So also Blogger George will have Windows-1252 messages, if ever an english message from Postmaster will contain Non-ASCII. > >initdb.exe --locale=English --encoding=UTF-8 --pgdata C:/Daten/pgdata2/ the same > >initdb.exe --locale=English --pgdata C:/Daten/pgdata2/ > The database cluster will be initialized with locale "English_United States.1252". > The default database encoding has accordingly been set to "WIN1252". > The default text search configuration will be set to "english". Additional message when I leave out the encoding parameter. > >initdb.exe --pgdata C:/Daten/pgdata2/ > The database cluster will be initialized with locale "German_Germany.1252". > The default database encoding has accordingly been set to "WIN1252". > The default text search configuration will be set to "german". using default values > >initdb.exe --locale=German_Germany.874 --pgdata C:/Daten/pgdata2/ > The database cluster will be initialized with locale "German_Germany.874". > The default database encoding has accordingly been set to "WIN874". > The default text search configuration will be set to "german". I can haz Thai Codepage =:-} > >initdb.exe --locale=German_Germany.UTF-8 --pgdata C:/Daten/pgdata2/ > initdb: invalid locale name "German_Germany.UTF-8" > The database cluster will be initialized with locale "German_Germany.1252". > The default database encoding has accordingly been set to "WIN1252". > The default text search configuration will be set to "german". but no UTF-8; same for ".UTF8" or ".65001" > >initdb.exe --encoding=UTF-8 --pgdata C:/Daten/pgdata2/ > The database cluster will be initialized with locale "German_Germany.1252". > The default text search configuration will be set to "german". So this probably is the configuration I started with. Thanks, Redoute
Am 17.08.2014 17:17, schrieb Tom Lane: > Me too, but I repeat the point that what you have to twiddle is the > postmaster's environment; in particular you have to set LC_CTYPE > (or maybe better LC_ALL), which cannot be set from the config file. With "set LC_CTYPE" do you mean to create an simple environment variable? > >SET PGDATA=C:\Daten\pgdata2\ > >SET LC_ALL=German_Germany.UTF-8 > >initdb.exe > The database cluster will be initialized with locale "German_Germany.1252". > The default database encoding has accordingly been set to "WIN1252". > The default text search configuration will be set to "german". initdb uses PGDATA, but ignores LC_ALL, otherwise it would complain > initdb: invalid locale name "German_Germany.UTF-8" like seen with parameter --locale==German_Germany.UTF-8 I also tried to set environment variable LC_ALL global via control panel with different values. Restarted postgreSQL service; it doesn't care. Thanks, Redoute
On 08/17/2014 03:32 PM, Redoute wrote: > Am 17.08.2014 17:17, schrieb Tom Lane: > >> Me too, but I repeat the point that what you have to twiddle is the >> postmaster's environment; in particular you have to set LC_CTYPE >> (or maybe better LC_ALL), which cannot be set from the config file. > > With "set LC_CTYPE" do you mean to create an simple environment variable? > >>> SET PGDATA=C:\Daten\pgdata2\ >>> SET LC_ALL=German_Germany.UTF-8 >>> initdb.exe >> The database cluster will be initialized with locale "German_Germany.1252". >> The default database encoding has accordingly been set to "WIN1252". >> The default text search configuration will be set to "german". > > initdb uses PGDATA, but ignores LC_ALL, otherwise it would complain >> initdb: invalid locale name "German_Germany.UTF-8" > like seen with parameter --locale==German_Germany.UTF-8 > > I also tried to set environment variable LC_ALL global via control panel > with different values. Restarted postgreSQL service; it doesn't care. Yeah, more digging on my part seems to indicate that no matter what you do Windows is going to set the locale to some variety of a codepage. Unfortunately, I do not use Windows enough to go much further with this. Hopefully someone more Microsoft fluent can weigh in on this, as this issue comes up periodically and it would be nice to have a definitive answer one way or another. I for one would appreciate the edification. > > Thanks, > Redoute > > -- Adrian Klaver adrian.klaver@aklaver.com
➢ Yeah, more digging on my part seems to indicate that no matter what you ➢ do Windows is going to set the locale to some variety of a codepage. ➢ Unfortunately, I do not use Windows enough to go much further with this. ➢ Hopefully someone more Microsoft fluent can weigh in on this, as this ➢ issue comes up periodically and it would be nice to have a definitive ➢ answer one way or another. I for one would appreciate the edification. Wikipedia says that UTF-8 is code page 65001, in Microsoft notation (see http://en.wikipedia.org/wiki/Code_page). Does thishelp in any way (i.e. does German_Germany.65001 work for you)?
Am 18.08.2014 15:31, schrieb Holger.Friedrich-Fa-Trivadis@it.nrw.de: > Wikipedia says that UTF-8 is code page 65001, in Microsoft notation > (see http://en.wikipedia.org/wiki/Code_page). Does this help in any way > (i.e. does German_Germany.65001 work for you)? No, I tried that value yesterday, see my answer to Adrian. It seems Unicode encodings are just not target of Windows Locales. Which in my opinion is reasonable: Why should it be a localization issue, how a program writes Unicode to a file? When a localized Windows suggests two different 8bit-charsets for usage (ANSI and OEM), this doesn't hinder a program to write Unicode. Why can't PostgreSQLs "Postmaster" do it? Thanks, Redoute
What about taking a cue from the link provided by Adrian, http://www.g-loaded.eu/2011/02/27/locale-windows/, and initdb anew empty database cluster not like initdb.exe --locale=German_Germany.UTF-8 which you say does not work, but rather like initdb.exe --locale=German_Germany --encoding=UTF-8 which the link implies seems to have worked for its author? You could then check the postgresql.conf file for the new databasecluster and see what lc_messages has been set to, maybe that will work for you. Sorry I haven't tried out this idea myself -- around here we have PostgreSQL installed either on Linux servers or on virtualLinux machines, so I use Windows basically to read my e-mail (and, of course, to run VirtualBox, PuTTY and WinSCP)-- PostgreSQL isn't supposed to be installed on the Windows box, and anyway I don't have an admin password to do soregardless. -----Original Message----- From: Redoute [mailto:redoute@tortenboxer.de] Sent: Monday, August 18, 2014 5:00 PM To: Friedrich-Fa-Trivadis, Holger (IT.NRW); pgsql-general@postgresql.org Subject: Re: [GENERAL] logfile character encoding Am 18.08.2014 15:31, schrieb Holger.Friedrich-Fa-Trivadis@it.nrw.de: > Wikipedia says that UTF-8 is code page 65001, in Microsoft notation > (see http://en.wikipedia.org/wiki/Code_page). Does this help in any > way (i.e. does German_Germany.65001 work for you)? No, I tried that value yesterday, see my answer to Adrian. It seems Unicode encodings are just not target of Windows Locales. Which in my opinion is reasonable: Why should it be alocalization issue, how a program writes Unicode to a file? When a localized Windows suggests two different 8bit-charsetsfor usage (ANSI and OEM), this doesn't hinder a program to write Unicode. Why can't PostgreSQLs "Postmaster"do it? Thanks, Redoute
Sorry for misquoting, the command line that reportedly worked had --encoding=UTF8 not --encoding=UTF-8. I apologize fornot paying more attention. -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Holger.Friedrich-Fa-Trivadis@it.nrw.de Sent: Monday, August 18, 2014 5:51 PM To: pgsql-general@postgresql.org Cc: redoute@tortenboxer.de Subject: Re: [GENERAL] logfile character encoding What about taking a cue from the link provided by Adrian, http://www.g-loaded.eu/2011/02/27/locale-windows/, and initdb anew empty database cluster not like initdb.exe --locale=German_Germany.UTF-8 which you say does not work, but rather like initdb.exe --locale=German_Germany --encoding=UTF-8 which the link implies seems to have worked for its author? You could then check the postgresql.conf file for the new databasecluster and see what lc_messages has been set to, maybe that will work for you. Sorry I haven't tried out this idea myself -- around here we have PostgreSQL installed either on Linux servers or on virtualLinux machines, so I use Windows basically to read my e-mail (and, of course, to run VirtualBox, PuTTY and WinSCP)-- PostgreSQL isn't supposed to be installed on the Windows box, and anyway I don't have an admin password to do soregardless. -----Original Message----- From: Redoute [mailto:redoute@tortenboxer.de] Sent: Monday, August 18, 2014 5:00 PM To: Friedrich-Fa-Trivadis, Holger (IT.NRW); pgsql-general@postgresql.org Subject: Re: [GENERAL] logfile character encoding Am 18.08.2014 15:31, schrieb Holger.Friedrich-Fa-Trivadis@it.nrw.de: > Wikipedia says that UTF-8 is code page 65001, in Microsoft notation > (see http://en.wikipedia.org/wiki/Code_page). Does this help in any > way (i.e. does German_Germany.65001 work for you)? No, I tried that value yesterday, see my answer to Adrian. It seems Unicode encodings are just not target of Windows Locales. Which in my opinion is reasonable: Why should it be alocalization issue, how a program writes Unicode to a file? When a localized Windows suggests two different 8bit-charsetsfor usage (ANSI and OEM), this doesn't hinder a program to write Unicode. Why can't PostgreSQLs "Postmaster"do it? Thanks, Redoute -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Am 18.08.2014 17:50, schrieb Holger.Friedrich-Fa-Trivadis@it.nrw.de: > initdb.exe --locale=German_Germany --encoding=UTF-8 > > which the link implies seems to have worked for its author? I have tried a bunch of similar program calls and posted them last night as answer to Adrian. --encoding=UTF-8 sets the default database encoding, but this leaves the "Postmaster" itself at 1252. This is indicated by initdb's output, and your proposal gives exactly the setup that the installer does by default and which causes mixed logfile encodings. As all my tests show, the "Postmaster" is bound to a valid Locale, and Unicode encodings are never part of valid Locales in Windows, so a Windows "Postmaster" will never write UTF-8 output. Please note that the example in the block uses english language, so the author probably will not have noticed 1252 encodings. The english messages also don't have fancy (if not silly) quotation marks like the german ones. Thanks, Redoute
Am 18.08.2014 15:31, schrieb Holger.Friedrich-Fa-Trivadis@it.nrw.de: > Wikipedia says that UTF-8 is code page 65001, in Microsoft notation (see http://en.wikipedia.org/wiki/Code_page). Doesthis help in any way (i.e. does German_Germany.65001 work for you)? No, I tried that value yesterday, see my answer to Adrian. It seems Unicode encodings are just not target of Windows Locales. Which in my opinion is reasonable: Why should it be a localization issue, how a program writes Unicode to a file? When a localized Windows suggests two different 8bit-charsets for usage (ANSI and OEM), this doesn't hinder a program to write Unicode. Why can't PostgreSQL "Postmaster" do it? -- Kai Borgolte, Bonn