Thread: nls and server log
Hi,

Currently the same message goes to both the server log and the client
app. Sometimes this bothers me, since I have to analyze server logs and
discover that lc_messages is set to pt_BR and, to make things worse,
that the stup^H^H^H application parses some error messages in
Portuguese. My solution has been a modified version of pgBadger (before
that, pgFouine), which has its own problems: (i) translations are not as
stable as English messages, (ii) translations are not always available,
which means there is a mix of translated and untranslated messages, and
(iii) it is minor-version dependent. I'm tired of fighting those
problems and started looking into whether there is a good solution in
the backend.

I'm thinking of carrying both translated and untranslated messages if
asked to. We would store the untranslated messages if the new GUC (say
server_lc_messages) is set. The cost is copying into five new variables
(message, detail, detail_log, hint, and context) in the ErrorData
struct, which will be used iff server_lc_messages is set. A possible
optimization is not to use the new variables if lc_messages and
server_lc_messages match. My use case is a server log in English, but
I'm perfectly fine allowing a server log in Spanish and client messages
in French. Is this an acceptable plan? Ideas?

-- 
Euler Taveira                   Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training
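To make the proposal concrete, here is a minimal C sketch of the data layout it implies: each of the five text fields is kept twice, once translated per lc_messages for the client and once per server_lc_messages for the log, and the second copy is skipped when the two settings match. The type and function names (ErrorTextSet, ErrorDataSketch, need_separate_log_text, log_text) are hypothetical, the struct is not PostgreSQL's actual ErrorData, and server_lc_messages is only the GUC name proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: the five text fields named in the proposal. */
typedef struct ErrorTextSet
{
    const char *message;
    const char *detail;
    const char *detail_log;
    const char *hint;
    const char *context;
} ErrorTextSet;

/* Hypothetical stand-in for ErrorData, not PostgreSQL's real struct. */
typedef struct ErrorDataSketch
{
    ErrorTextSet client;     /* translated per lc_messages */
    ErrorTextSet server;     /* filled only when need_separate_log_text() */
    bool         has_server; /* false: no separate log copy was made */
} ErrorDataSketch;

/* NULL or "" means the (proposed) server_lc_messages GUC is unset. */
static bool
need_separate_log_text(const char *lc_messages, const char *server_lc_messages)
{
    if (server_lc_messages == NULL || server_lc_messages[0] == '\0')
        return false;  /* GUC unset: log exactly what the client gets */

    /* The optimization: identical locales need no second copy. */
    return lc_messages == NULL || strcmp(lc_messages, server_lc_messages) != 0;
}

/* Pick the set of texts that should go to the server log. */
static const ErrorTextSet *
log_text(const ErrorDataSketch *ed)
{
    assert(ed != NULL);
    return ed->has_server ? &ed->server : &ed->client;
}
```

In this sketch a caller would check need_separate_log_text() once per error, fill `server` only when it returns true, and pass log_text() to whatever writes the log line, so the extra copying is paid only when the two locales differ.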
On Wed, Dec 24, 2014 at 1:35 PM, Euler Taveira <euler@timbira.com.br> wrote:
> Currently the same message goes to both the server log and the client
> app. Sometimes this bothers me, since I have to analyze server logs and
> discover that lc_messages is set to pt_BR and, to make things worse,
> that the stup^H^H^H application parses some error messages in
> Portuguese. My solution has been a modified version of pgBadger (before
> that, pgFouine), which has its own problems: (i) translations are not as
> stable as English messages, (ii) translations are not always available,
> which means there is a mix of translated and untranslated messages, and
> (iii) it is minor-version dependent. I'm tired of fighting those
> problems and started looking into whether there is a good solution in
> the backend.
>
> I'm thinking of carrying both translated and untranslated messages if
> asked to. We would store the untranslated messages if the new GUC (say
> server_lc_messages) is set. The cost is copying into five new variables
> (message, detail, detail_log, hint, and context) in the ErrorData
> struct, which will be used iff server_lc_messages is set. A possible
> optimization is not to use the new variables if lc_messages and
> server_lc_messages match. My use case is a server log in English, but
> I'm perfectly fine allowing a server log in Spanish and client messages
> in French. Is this an acceptable plan? Ideas?

Seems reasonable to me, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 24, 2014 at 1:35 PM, Euler Taveira <euler@timbira.com.br> wrote:
>> Currently the same message goes to both the server log and the client
>> app.
>> ...
>> I'm thinking of carrying both translated and untranslated messages if
>> asked to. We would store the untranslated messages if the new GUC (say
>> server_lc_messages) is set. The cost is copying into five new variables
>> (message, detail, detail_log, hint, and context) in the ErrorData
>> struct, which will be used iff server_lc_messages is set. A possible
>> optimization is not to use the new variables if lc_messages and
>> server_lc_messages match. My use case is a server log in English, but
>> I'm perfectly fine allowing a server log in Spanish and client messages
>> in French. Is this an acceptable plan? Ideas?

> Seems reasonable to me, I think.

The core problem that we've worried about in previous discussions of
this is what to do about translation failures and encoding conversion
failures. That is, there's been worry that a poor choice of "log locale"
could result in failures that don't occur otherwise; failures that could
be particularly nasty if they result in the inability to log important
conditions, or perhaps even prevent reporting them to the client. While
I don't say that we cannot accept any risk of that sort, I think we
should consider what risks exist and whether they can be minimized
before we plow ahead.

It would also be useful to think about the requests we get from time to
time to ensure that log messages appear in a uniform choice of encoding.
I don't know whether trying to enforce a uniform log message locale
would make that easier or harder.

			regards, tom lane
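One way to keep the failure case Tom describes from suppressing a log entry is to treat conversion of the translated text into the log encoding as best-effort, falling back to the untranslated English text on any error. The sketch below assumes a POSIX iconv(3) (part of glibc; some platforms need libiconv linked in). convert_text and text_for_log are hypothetical names, not PostgreSQL functions, and PostgreSQL's own conversion machinery would look different.

```c
#include <assert.h>
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Convert NUL-terminated `in` from encoding `from` to encoding `to`,
 * writing a NUL-terminated result into `out`. Returns false on any
 * failure: unknown encoding name, invalid or unconvertible input, or
 * `out` too small. */
static bool
convert_text(const char *in, const char *from, const char *to,
             char *out, size_t outsize)
{
    assert(in != NULL && out != NULL && outsize > 0);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return false;

    char   *inp = (char *) in;       /* iconv() takes char **, not const */
    size_t  inleft = strlen(in);
    char   *outp = out;
    size_t  outleft = outsize - 1;   /* reserve one byte for the NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t) -1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t) -1)
        return false;
    *outp = '\0';
    return true;
}

/* Hypothetical policy: a conversion failure must never cost us the log
 * entry, so fall back to the untranslated English text instead. */
static const char *
text_for_log(const char *translated, const char *untranslated,
             const char *from, const char *to, char *buf, size_t bufsize)
{
    if (translated != NULL &&
        convert_text(translated, from, to, buf, bufsize))
        return buf;
    return untranslated;
}
```

The fallback is only as safe as the untranslated text: the English message templates are generally plain ASCII, but values interpolated into them, such as object names or RAISE strings, can still carry bytes in the database encoding.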
On 12/25/2014 02:35 AM, Euler Taveira wrote:
> Hi,
>
> Currently the same message goes to both the server log and the client
> app. Sometimes this bothers me, since I have to analyze server logs and
> discover that lc_messages is set to pt_BR and, to make things worse,
> that the stup^H^H^H application parses some error messages in
> Portuguese.

IMO logging is simply broken for platforms where the postmaster and all
DBs don't share an encoding. We mix different encodings in log messages
and provide no way to separate them out. Nor is there a way to log
different messages to different files.

It's not just an issue with translations. We mix and mangle encodings of
user-supplied text, like RAISE strings in procs, for example.

We really need to treat encoding for logging and for the client much
more separately than we currently do. I think any consideration of
translations for logging should be done with the underlying encoding
issues in mind.

My personal opinion is that we should require the server log to be
capable of representing all chars in the encodings used by any DB. In
practice, that means we always just log in UTF-8 if the user wants to
permit DBs with different encodings. An alternative would be one file
per database, always in the encoding of that database.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
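Craig's rule for choosing the log encoding can be stated as a small decision function. This is a hypothetical sketch (choose_log_encoding is not a PostgreSQL function); it uses PostgreSQL's encoding names such as UTF8 and LATIN1 and assumes the caller already knows the encoding of every database.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if every database shares one encoding, the log can
 * use it; otherwise only UTF-8 ("UTF8" in PostgreSQL's naming) can
 * represent text coming from all of them. */
static const char *
choose_log_encoding(const char *const db_encodings[], size_t ndbs)
{
    assert(ndbs == 0 || db_encodings != NULL);
    if (ndbs == 0)
        return "UTF8";          /* no databases yet: universal choice */

    for (size_t i = 1; i < ndbs; i++)
        if (strcmp(db_encodings[i], db_encodings[0]) != 0)
            return "UTF8";      /* mixed encodings: only UTF-8 covers all */

    return db_encodings[0];     /* uniform: log in the shared encoding */
}
```

A SQL_ASCII database does not validate its bytes, so even a UTF8 log cannot guarantee clean output for text coming from one; the sketch ignores that case.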
On 12/28/14, 2:56 AM, Craig Ringer wrote:
> On 12/25/2014 02:35 AM, Euler Taveira wrote:
>> Hi,
>>
>> Currently the same message goes to both the server log and the client
>> app. Sometimes this bothers me, since I have to analyze server logs and
>> discover that lc_messages is set to pt_BR and, to make things worse,
>> that the stup^H^H^H application parses some error messages in
>> Portuguese.
>
> IMO logging is simply broken for platforms where the postmaster and all
> DBs don't share an encoding. We mix different encodings in log messages
> and provide no way to separate them out. Nor is there a way to log
> different messages to different files.
>
> It's not just an issue with translations. We mix and mangle encodings of
> user-supplied text, like RAISE strings in procs, for example.
>
> We really need to treat encoding for logging and for the client much
> more separately than we currently do. I think any consideration of
> translations for logging should be done with the underlying encoding
> issues in mind.

Agreed.

> My personal opinion is that we should require the server log to be
> capable of representing all chars in the encodings used by any DB. In
> practice, that means we always just log in UTF-8 if the user wants to
> permit DBs with different encodings. An alternative would be one file
> per database, always in the encoding of that database.

How much of this issue is caused by trying to machine-parse log files?
Would a better option be to improve that case, possibly by doing
something like including a field in each line that tells you the
encoding for that entry?

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 12/30/2014 06:39 AM, Jim Nasby wrote:
> How much of this issue is caused by trying to machine-parse log files?
> Would a better option be to improve that case, possibly by doing
> something like including a field in each line that tells you the
> encoding for that entry?

That'd be absolutely ghastly. You couldn't just view the logs with
'less' or a text editor if your logs had mixed encodings; you'd need
some kind of special PostgreSQL log viewer tool.

Why would we possibly do that when we could just emit UTF-8 instead?

-- 
 Craig Ringer
On 12/29/14, 7:40 PM, Craig Ringer wrote:
> On 12/30/2014 06:39 AM, Jim Nasby wrote:
>> How much of this issue is caused by trying to machine-parse log files?
>> Would a better option be to improve that case, possibly by doing
>> something like including a field in each line that tells you the
>> encoding for that entry?
>
> That'd be absolutely ghastly. You couldn't just view the logs with
> 'less' or a text editor if your logs had mixed encodings; you'd need
> some kind of special PostgreSQL log viewer tool.

I was specifically talking about logs intended for machine reading
(i.e., CSV), not human reading. Just as client logging (where encoding
is a lot more important) and server logging aren't exactly the same use
case, logs read by humans and logs read by machines aren't the same
thing either.

BTW, before someone makes an argument for using tools like cut or grep
with CSV, that actually falls apart spectacularly at the first
multi-line log message. I think that's just another example that trying
to make one log file serve two different purposes just won't work well.

Perhaps the solution here is to include a tool that makes it easier to
deal with CSV logs, including encoding. I've certainly wished for such a
tool, one that would let me deal with CSV logs effectively without
having to load them into a table.

> Why would we possibly do that when we could just emit UTF-8 instead?

What happens if we get a translation/encoding failure (the case Tom's
worried about)?

-- 
Jim Nasby
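Jim's point about cut and grep comes from CSV quoting: a field wrapped in double quotes may contain newlines, so one record can span several physical lines. This minimal C sketch assumes RFC 4180-style quoting (fields in double quotes, an embedded quote written as two quotes) and only counts records to show the mismatch; it is not a full CSV parser, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count CSV records, treating newlines inside double-quoted fields as
 * data. An escaped quote ("") toggles in_quotes twice, a net no-op. */
static size_t
count_csv_records(const char *buf)
{
    assert(buf != NULL);
    size_t records = 0;
    bool   in_quotes = false;
    bool   have_data = false;   /* current record has at least one byte */

    for (const char *p = buf; *p != '\0'; p++)
    {
        if (*p == '"')
        {
            in_quotes = !in_quotes;
            have_data = true;
        }
        else if (*p == '\n' && !in_quotes)
        {
            records++;          /* unquoted newline ends a record */
            have_data = false;
        }
        else
            have_data = true;
    }
    if (have_data)
        records++;              /* final record without trailing newline */
    return records;
}

/* What a line-oriented tool sees: one unit per physical line. */
static size_t
count_lines(const char *buf)
{
    assert(buf != NULL);
    size_t lines = 0;
    const char *p = buf;

    for (; *p != '\0'; p++)
        if (*p == '\n')
            lines++;
    if (p != buf && p[-1] != '\n')
        lines++;                /* final line without trailing newline */
    return lines;
}
```

For example, an input with one single-line record and one record whose message field contains a newline gives count_csv_records() == 2 but count_lines() == 3, which is exactly where a line-oriented tool splits one entry in two.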