Thread: Backup messages displayed with wrong encoding
Hello,

I am using pgAdmin3 1.14.1 on Windows 2008 R2 (Russian locale, SBCS Win1251) with PostgreSQL 9.1.2. With a UTF-8 encoded database containing objects whose names include non-ASCII (Russian) characters, I get unreadable object names when performing a backup in pgAdmin3.

I think this happens because pgAdmin assumes that pg_dump always streams its output in the current locale encoding. That is not the case: the output can be in the database encoding (the default) or in the encoding specified explicitly in frmBackup.

Thanks,
Alexander
On Mon, 2011-12-12 at 09:49 +0400, Alexander LAW wrote:
> Hello,
>
> I am using pgAdmin3 1.14.1 on Windows 2008 R2 (Russian locale, SBCS
> Win1251) with Postgresql 9.1.2.
> Having database with UTF-8 encoding and objects, those names contain
> non-ASCII (Russian) characters, I get non-readable object names when
> performing backup in pgAdmin3.

You mean when you restore it? pgAdmin is UTF-8 only, but it allows other encodings to be used for the dump.

> I think it's happened because pgAdmin assumes that pg_dump always
> streams output using current locale encoding. But it's not the case,
> cause it can be a database encoding (by default) or the encoding
> specified explicitly in frmBackup.

pgAdmin doesn't assume anything. It simply launches pg_dump with the options you set in the frmBackup dialog. Right now, I cannot say where the issue is. I would need more information to work that out.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
Hi,

To make this clear, I am posting two screenshots. ss_backup_win1251 shows a valid table name (which is "Test" in Russian), but in ss_backup_utf8 you can see the name in the wrong encoding.

When I said "pgAdmin assumes", I meant that it converts the pg_dump output stream to a string as if it were ANSI-encoded, which is not always the case. In fact, the opposite is common on Windows with a Russian locale (and non-ASCII object names): UTF-8 is the default encoding for a database, but the locale encoding (SBCS) is Win1251, so when you do a backup with the default encoding, you get an unreadable log.

Best regards,
Alexander

13.12.2011 00:13, Guillaume Lelarge wrote:
> You mean when you restore it? pgAdmin is UTF-8 only but it accepts to
> use other encodings to do the dump.
>
> pgAdmin doesn't assume anything. It simply launches pg_dump with the
> options you set in the frmBackup dialog. Right now, I cannot say where the
> issue is. I would need more info to guess that.
On Tue, 2011-12-13 at 08:15 +0400, Alexander LAW wrote:
> Hi,
> To make it clear I am posting two screenshots.
> ss_backup_win1251 shows valid table name (which is "Test" in Russian),
> but in ss_backup_utf8 you can see the name with wrong encoding.
>
> When I said "pgAdmin assumes", I meant that it converts pg_dump output
> stream to string as ANSI-encoded, but it's not always the case.
> In fact, the opposite is common on Windows with Russian locale (and
> non-ASCII object names), cause UTF-8 is a default encoding for a
> database, but locale encoding (SBCS) is Win1251, and when you do backup
> with a default encoding, you get an unreadable log.

pgAdmin simply displays what pg_dump gives it. If the output is in the right encoding, you'll see your table names correctly. I'm not sure it would be a good idea to grab every line and convert it to whatever encoding pgAdmin would like, if that is possible at all.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
Hello,

I don't think that the Win1251 encoding is more "right" than UTF-8. IMHO, a program should either know what encoding its input text is in or let me choose the encoding used to read it. If you considered this behavior a problem (under the conditions described), you could solve it by providing a combobox in the Messages tab that lets me choose the encoding of the log. But I would then choose the same encoding there as I did before in the File Options tab, or the encoding of the database. So pgAdmin already knows which encoding to use when reading the pg_dump output stream.

As for the need for and possibility of conversion, I believe that every line of the log is converted anyway. Please look at sysProcess::ReadStream, where strings are read from the input and appended to txtMessages:

    str.Append(wxString::Format(wxT("%s"), wxString(buffer, wxConvLibc).c_str()));

As I understand it, wxConvLibc here specifies that the input strings are always in the OS locale encoding, but IMO this should depend on the backup encoding.

Best regards

14.12.2011 00:20, Guillaume Lelarge wrote:
> pgAdmin simply displays what pg_dump gives him. If it's in the right
> encoding, you'll see your tables' name correct. I'm not sure it would be
> a good idea to grab every line and to convert them in whatever encoding
> pgAdmin would like. If it's possible at all.
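To illustrate the point about wxConvLibc: the same bytes decode to different text depending on which encoding the reader assumes. What follows is a minimal, self-contained C++ sketch, not pgAdmin or wxWidgets code. The names Encoding, decodeUtf8, decodeWin1251 and decode are hypothetical, and both decoders are deliberately simplified (the Win1251 one handles only ASCII and the 0xC0-0xFF letter range). It assumes the caller already knows the backup encoding, for example from the File Options tab.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names for illustration only; not taken from pgAdmin.
enum class Encoding { Utf8, Win1251 };

constexpr char32_t kReplacement = 0xFFFD;  // U+FFFD REPLACEMENT CHARACTER

// Simplified UTF-8 decoder: handles 1- to 4-byte sequences and checks
// continuation bytes, but does not reject overlong or out-of-range forms.
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        const auto b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 1;
        char32_t cp = b0;
        if (b0 >= 0xF0)      { len = 4; cp = b0 & 0x07; }
        else if (b0 >= 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if (b0 >= 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if (b0 >= 0x80) { cp = kReplacement; }  // stray continuation byte
        if (i + len > bytes.size()) {  // truncated sequence at end of input
            out.push_back(kReplacement);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F);
        }
        out.push_back(ok ? cp : kReplacement);
        i += ok ? len : 1;
    }
    return out;
}

// Simplified Windows-1251 decoder: ASCII maps to itself and 0xC0..0xFF map
// to U+0410..U+044F; every other byte becomes U+FFFD in this sketch.
std::u32string decodeWin1251(const std::string& bytes) {
    std::u32string out;
    for (const char c : bytes) {
        const auto b = static_cast<unsigned char>(c);
        if (b < 0x80)       out.push_back(static_cast<char32_t>(b));
        else if (b >= 0xC0) out.push_back(static_cast<char32_t>(0x0410 + (b - 0xC0)));
        else                out.push_back(kReplacement);
    }
    return out;
}

// Pick the decoder from the known dump encoding instead of always
// assuming the OS locale encoding.
std::u32string decode(const std::string& bytes, Encoding enc) {
    switch (enc) {
        case Encoding::Utf8:    return decodeUtf8(bytes);
        case Encoding::Win1251: return decodeWin1251(bytes);
    }
    assert(false && "unknown encoding");
    return {};
}
```

For example, decode("\xD0\xA2\xD0\xB5\xD1\x81\xD1\x82", Encoding::Utf8) yields the four code points of "Тест", whereas decoding the same bytes as Encoding::Win1251 yields eight code points that do not spell the name. In wxWidgets terms, the analogous change would be to pass a converter that matches the dump encoding (such as wxConvUTF8) instead of wxConvLibc; whether that is the right fix for pgAdmin is for the maintainers to decide.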
Hi,

I would like to clarify the encoding issue. Now that we have PostgreSQL localized (with Russian messages), I can see that the encoding of the pg_dump output does not change depending on the database setting or the pg_dump option. It seems that pg_dump changes only the encoding of database object names (see the screenshot). So it looks like a pg_dump bug, not a feature.

Thanks for your feedback.

Best regards,
Alexander

14.12.2011 09:12, Alexander LAW wrote:
> Hello,
> I don't think that Win1251 encoding is more right than UTF-8. IMHO, a
> program should understand what encoding is text in or let me choose
> the encoding to read the text. If you would consider this behavior as
> a problem (for the conditions described) you could solve it by
> providing a combobox in the Messages tab, that lets me choose the
> encoding of the log. But then I will choose there the same encoding as
> I did before in the File Options tab or the encoding of the database.
> So pgAdmin knows which encoding to use when reading the pg_dump
> output stream.
> And about the need and possibility of conversion, I believe that the
> every line of the log is converted anyway. Please look at
> sysProcess::ReadStream, there you have strings read from input and
> appended to txtMessages.
>
> str.Append(wxString::Format(wxT("%s"), wxString(buffer,
> wxConvLibc).c_str()));
>
> As I understand, wxConvLibc here specifies that the input strings
> always are in OS locale encoding, but IMO this should depend on the
> backup encoding.
>
> Best regards