Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding - Mailing list pgsql-hackers

From Alexander Law
Subject Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding
Date
Msg-id 500FDE6F.3060202@gmail.com
Responses Re: Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
<div class="moz-cite-prefix">Hello,<br /> I would like to fix this bug, but it looks like it will not be a one-line
patch.<br/> Looking at the pg_dump code, I see that the object names pass through the following chain:<br /> 1. pg_dump
executes 'SELECT c.tableoid, c.oid, c.relname, ... ' and gets the object name in the encoding chosen for the db
connection/dump.<br/> 2. It invokes the write_msg function or similar:<br />     write_msg(NULL, "finding the columns and
types of table \"%s\"\n", tbinfo->dobj.name);<br /> 3. vwrite_msg localizes the text message, but not the argument(s):<br
/>    vfprintf(stderr, _(fmt), ap);<br /> Here gettext (_) internally translates fmt and converts it to the OS encoding (if that differs
from UTF-8, the encoding of the localized strings).<br /><br /> I can see only a few solutions to the problem:<br /> 1.
To convert the object name at the backend, i.e. to modify all the similar SELECTs like this:<br /> 'SELECT c.tableoid, c.oid,
c.relname, convert_to(c.relname, 'OS_ENCODING') AS locrelname, ...' <br /> and then do     write_msg(NULL, "finding the
columns and types of table \"%s\"\n", tbinfo->dobj.local_name);<br /> The downside of this approach is that it
requires rewriting all the SELECTs for all the objects. And it doesn't help us write out any other text from the backend,
such as a localized backend error.<br /><br /> 2. To set up another connection to the backend with the OS encoding, and to get
all the object names through it. It looks insane too. And we have the same problem with the localized backend errors
coming over the "main" connection.<br /><br /> 3. To write a convert_to_os_encoding(text, encoding) function for the frontend
utilities. Unfortunately the frontend can't use the internal PostgreSQL conversion functions, and modifying them for use through
libpq looks infeasible.<br /> So the only way to implement such a function is to use another encoding conversion framework
(library).<br/> And my question is: is it possible to include libiconv (add this dependency) in the frontend utilities
code?<br/><br /> 4. To force users to use the OS encoding as the database encoding. Or not to use non-ASCII characters in
db object names and to disable NLS on Windows completely. It doesn't look like a solution at all.<br /><br /> BTW,
it's not the only instance of the issue. For example, when I try to use vacuumdb, I get completely unreadable
messages:<br/><a class="moz-txt-link-freetext"
href="http://oi48.tinypic.com/1c8j9.jpg">http://oi48.tinypic.com/1c8j9.jpg</a><br/> (blue marks what is in Russian or
English; all the other text is gibberish).<br /><br /> Best regards,<br /> Alexander<br /><br /><br /> 18.07.2012 12:51,
Alexander Law wrote:<br /></div><blockquote cite="mid:50067916.8010508@gmail.com" type="cite"> Hello,<br /><br /> The
dump file itself is correct. The issue is only with the non-ASCII object names in pg_dump messages.<br /> The message
text (which is non-ASCII too) is displayed consistently in the right encoding (i.e. the OS encoding, thanks to
libintl/gettext), but the encoding of db object names depends on the dump encoding, so they become unreadable when
a different encoding is used.<br /> The same can be reproduced on Linux (where the console encoding is UTF-8) when doing a dump
with Windows-1251 or Latin1 (for Western European languages).<br /><br /> Thanks,<br /> Alexander<br /><br /><br
/><blockquote style="border-left: #5555EE solid 0.2em; margin: 0em;       padding-left: 0.85em"><pre style="margin:
0em;">The following bug has been logged on the website:
 

Bug reference:      6742
Logged by:          Alexander LAW
Email address:      exclusion(at)gmail(dot)com
PostgreSQL version: 9.1.4
Operating system:   Windows
Description:

When I try to dump a database with UTF-8 encoding on Windows, I get unreadable
object names.
Please look at the screenshot (<a href="http://oi50.tinypic.com/2lw6ipf.jpg" moz-do-not-send="true"
rel="nofollow">http://oi50.tinypic.com/2lw6ipf.jpg</a>). In the
left window all the pg_dump messages are displayed correctly (except for the
password prompt (bug #6510)), but the non-ASCII object name is gibberish. In
the right window (where the dump is done with the Windows-1251 encoding, the OS
encoding for the Russian locale) everything is right.

</pre></blockquote><pre style="margin: 0em;">Did you check the dump file using an editor that can handle UTF-8?
The Windows console is not known for properly handling that encoding.

Thomas




</pre></blockquote><br />
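As a rough illustration of option 3 in the message above, here is a minimal sketch of what a frontend-side convert_to_os_encoding helper might look like if it were built on iconv. This is not code from the thread or from PostgreSQL: the signature (explicit source and target encoding names, caller-supplied output buffer) and the return convention are assumptions made for the example. Only iconv_open, iconv and iconv_close are real POSIX/libiconv calls.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): convert the NUL-terminated
 * string src from from_enc to to_enc and store a NUL-terminated result in
 * dst, which has room for dstlen bytes.
 *
 * Returns 0 on success, -1 on failure (unknown encoding name, input that
 * is invalid in from_enc or unrepresentable in to_enc, or dst too small).
 * On failure dst holds whatever was converted before the error.
 */
int
convert_to_os_encoding(const char *src, const char *from_enc,
                       const char *to_enc, char *dst, size_t dstlen)
{
    iconv_t cd;
    char   *in;
    char   *out;
    size_t  inleft;
    size_t  outleft;
    int     rc = 0;

    assert(src != NULL && dst != NULL && dstlen > 0);

    in = (char *) src;          /* iconv() takes char **, but only reads it */
    inleft = strlen(src);
    out = dst;
    outleft = dstlen - 1;       /* keep one byte for the terminating NUL */

    cd = iconv_open(to_enc, from_enc);
    if (cd == (iconv_t) -1)
    {
        dst[0] = '\0';
        return -1;
    }

    /* Convert the input, then flush any pending shift sequence. */
    if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t) -1)
        rc = -1;
    else if (iconv(cd, NULL, NULL, &out, &outleft) == (size_t) -1)
        rc = -1;

    iconv_close(cd);
    *out = '\0';
    return rc;
}
```

In the pg_dump chain described in the message, such a helper would presumably be applied to tbinfo->dobj.name before the name is passed to write_msg, with the dump encoding as from_enc. How the OS encoding name is obtained is left open here; on POSIX systems nl_langinfo(CODESET) is one candidate, while Windows would need a different mechanism.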
