Re: Encoding issues in console and eventlog on win32 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Encoding issues in console and eventlog on win32
Date
Msg-id 4AAE241E.8010406@enterprisedb.com
In response to Encoding issues in console and eventlog on win32  (Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Responses Re: Encoding issues in console and eventlog on win32
List pgsql-hackers
Itagaki Takahiro wrote:
> We can choose a database encoding that differs from the platform's
> native one, but postgres writes server logs in the database encoding.
> As a result, server logs are filled with broken characters.
> 
> The problem could occur on all platforms, however, there is a solution
> for win32. Since Windows supports wide characters to write logs, we can
> convert log texts => UTF-8 => UTF-16 and pass them to WriteConsoleW()
> and ReportEventW().
> 
> Especially in Japan, encoding troubles on Windows are unavoidable
> because postgres doesn't support Shift-JIS as a database encoding,
> and that is the native encoding for the Japanese edition of Windows.
> 
> If we also want to support the same functionality on non-win32 platforms,
> we might need a non-throwing version of pg_do_encoding_conversion():
> 
>     log_message_to_write = pg_do_encoding_conversion_nothrow(
>         log_message_in_database_encoding,
>         GetDatabaseEncoding() /* as src_encoding */,
>         GetPlatformEncoding() /* as dst_encoding */);
> 
> and pass the result to stderr and syslog. But it requires major rewrites
> of conversion functions, so I'd like to submit a solution only for win32
> for now. Also, the issue is not so serious on non-win32 platforms because
> we can choose UTF-8 or EUC_* on those platforms.

Something like that seems reasonable for the Windows event log; that is
clearly supposed to be written using a specific encoding. With the log
files, we have more freedom to do what we want, and IMHO we shouldn't put
a Windows-specific hack there because, as you say, we have the same
problem on all platforms.

There's no guarantee that conversion to UTF-8 won't fail, so this isn't
totally risk-free on Windows either. Theoretically, MultiByteToWideChar
could fail too (the patch neglects to check for that), although I
suppose it can't really happen for UTF-8 -> UTF-16 conversion.
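
To illustrate the kind of check I mean, here is a rough, portable sketch
of a UTF-8 -> UTF-16 conversion that reports failure instead of assuming
success. It is not code from the patch: utf8_to_utf16_checked() is a
hypothetical name, and the decoder is hand-rolled only so the example is
self-contained. On Windows the equivalent check would be testing whether
MultiByteToWideChar() returned 0.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption, not part of the patch).
 * Converts srclen bytes of UTF-8 into UTF-16 code units.
 * Returns the number of code units written, or -1 if the input is
 * malformed or dst is too small, so the caller can fall back to
 * writing the raw bytes.
 */
long
utf8_to_utf16_checked(const unsigned char *src, size_t srclen,
                      uint16_t *dst, size_t dstcap)
{
    size_t i = 0;
    size_t n = 0;

    while (i < srclen)
    {
        unsigned char c = src[i++];
        uint32_t cp;
        uint32_t min;   /* smallest code point legal for this length */
        int extra;      /* number of continuation bytes expected */

        if (c < 0x80)                { cp = c;        extra = 0; min = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; min = 0x10000; }
        else
            return -1;  /* stray continuation byte or invalid lead byte */

        for (; extra > 0; extra--)
        {
            if (i >= srclen || (src[i] & 0xC0) != 0x80)
                return -1;  /* truncated or malformed sequence */
            cp = (cp << 6) | (uint32_t) (src[i++] & 0x3F);
        }

        /* reject overlong forms, out-of-range values and surrogates */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (cp < 0x10000)
        {
            if (n + 1 > dstcap)
                return -1;
            dst[n++] = (uint16_t) cp;
        }
        else
        {
            /* encode as a surrogate pair */
            if (n + 2 > dstcap)
                return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t) (0xD800 | (cp >> 10));
            dst[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
    }
    return (long) n;
}
```

A caller would write the original bytes unchanged when this returns -1.
Note that Shift-JIS bytes mistakenly passed in as UTF-8 are rejected
rather than silently turned into garbage.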

Can't we use MultiByteToWideChar() to convert directly to the required
encoding, avoiding the double conversion?
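
Roughly what I have in mind, as an untested sketch: look up the Windows
code page for the database encoding and hand the original bytes straight
to MultiByteToWideChar(). The names encoding_to_codepage() and
direct_to_utf16() are made up for illustration, and the table is only a
small subset with the code page numbers as I understand them. This can
only work for encodings that Windows has a code page for.

```c
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Illustrative subset only; the numbers are Windows code page IDs. */
struct codepage_entry
{
    const char *name;
    int         codepage;
};

static const struct codepage_entry codepage_table[] = {
    {"UTF8", 65001},
    {"SJIS", 932},
    {"EUC_JP", 20932},
    {"LATIN1", 28591},
    {"WIN1252", 1252},
};

/* Hypothetical lookup: returns the code page, or -1 if none is known. */
int
encoding_to_codepage(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(codepage_table) / sizeof(codepage_table[0]); i++)
    {
        if (strcmp(codepage_table[i].name, name) == 0)
            return codepage_table[i].codepage;
    }
    return -1;
}

#ifdef _WIN32
/*
 * Hypothetical single-step conversion: database encoding -> UTF-16.
 * Returns the number of wide characters written, or -1 on failure.
 */
int
direct_to_utf16(const char *encoding, const char *src, int srclen,
                wchar_t *dst, int dstcap)
{
    int cp = encoding_to_codepage(encoding);
    int n;

    if (cp < 0 || srclen <= 0)
        return -1;

    /* MultiByteToWideChar() returns 0 on failure, so check for it. */
    n = MultiByteToWideChar((UINT) cp, 0, src, srclen, dst, dstcap);
    return (n > 0) ? n : -1;
}
#endif
```

The result could then be passed to WriteConsoleW() or ReportEventW()
as in the patch, with no intermediate UTF-8 step.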

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

