Re: handling unconvertible error messages - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: handling unconvertible error messages
Date
Msg-id CAMsr+YFL0b1886tMYF9RPeDdpWryG1cr8ew3pYfiXgrJofpHjA@mail.gmail.com
In response to handling unconvertible error messages  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: handling unconvertible error messages
List pgsql-hackers
On 25 July 2016 at 22:43, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> Example: I have a database cluster initialized with --locale=ru_RU.UTF-8
> (built with NLS).  Let's say for some reason, I have client encoding set
> to LATIN1.  All error messages come back like this:
>
> test=> select * from notthere;
> ERROR:  character with byte sequence 0xd0 0x9e in encoding "UTF8" has no
> equivalent in encoding "LATIN1"
>
> There is no straightforward way for the client to learn that there is a
> real error message, but it could not be converted.
>
> I think ideally we could make this better in two ways:
>
> 1) Send the original error message untranslated.  That would require
> saving the original error message in errmsg(), errdetail(), etc.  That
> would be a lot of work for only the occasional use.  But it would also
> facilitate an occasionally-requested feature of writing untranslated
> error messages into the server log or the csv log, while sending
> translated messages to the client (or some variant thereof).
>
> 2) Send an indication that there was an encoding problem.  Maybe a
> NOTICE, or an error context?  Wiring all this into elog.c looks a bit
> tricky, however.


We have a similar problem with the server logs. But there, there's also an additional problem: if there isn't any character mapping issue, we just totally ignore text encoding concerns and write the message into the log files in whatever encoding the client asked the backend to use. So log files can be a line-by-line mix of UTF-8, ISO-8859-1, and whatever other fun encodings someone asks for. There is *no* way to correctly read such a file, since lines carry no marking of their encoding, and no tools out there support text files whose lines are each in a different encoding anyway.

I'm not sure how closely it ties in to the issue you mention, but I think it's at least related enough to keep in mind while considering the client_encoding issue.

I suggest (3): "log the message with unmappable characters masked". That said, I would definitely like to also be able to send the raw original, along with a field indicating its encoding (which won't be the client_encoding), since we need some way to get at the information.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
