Itagaki Takahiro wrote:
> We can choose a database encoding that differs from the platform-dependent
> one, but postgres writes server logs in the database encoding. As a
> result, the server logs are filled with broken characters.
>
> The problem can occur on all platforms; however, there is a solution
> for win32. Since Windows supports wide characters for writing logs, we can
> convert log text => UTF-8 => UTF-16 and pass it to WriteConsoleW()
> and ReportEventW().
>
> Especially in Japan, encoding troubles on Windows are unavoidable
> because postgres doesn't support Shift-JIS as a database encoding,
> and Shift-JIS is the native encoding of the Japanese edition of Windows.
>
> If we also want to support the same functionality on non-win32 platforms,
> we might need a non-throwing version of pg_do_encoding_conversion():
>
>     log_message_to_write = pg_do_encoding_conversion_nothrow(
>         log_message_in_database_encoding,
>         GetDatabaseEncoding() /* as src_encoding */,
>         GetPlatformEncoding() /* as dst_encoding */);
>
> and pass the result to stderr and syslog. But that requires major rewrites
> of the conversion functions, so I'd like to submit a solution only for
> win32 for now. Also, the issue is less serious on non-win32 platforms
> because we can choose UTF-8 or EUC_* on those platforms.

Something like that seems reasonable for the Windows event log; that is
clearly supposed to be written using a specific encoding. With the log
files we have more freedom to do what we want, and IMHO we shouldn't put
a Windows-specific hack there because, as you say, we have the same
problem on all platforms.
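
A minimal sketch of the "non-throwing conversion with fallback" idea for log output, assuming a UTF-8 database and a LATIN1 platform encoding. The functions utf8_to_latin1_nothrow() and message_for_log() are hypothetical illustrations, not PostgreSQL's pg_do_encoding_conversion() API. On failure the converter returns NULL instead of raising an error, and the caller then writes the original bytes unchanged, so the message is never lost.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical non-throwing conversion from UTF-8 to LATIN1 (ISO-8859-1).
 * Returns a malloc'd NUL-terminated string, or NULL if the input is not
 * valid UTF-8, contains a code point above U+00FF, or memory runs out.
 * It never raises an error, so it is safe to call while emitting a log line.
 */
static char *
utf8_to_latin1_nothrow(const char *src)
{
    size_t      len, i = 0, j = 0;
    char       *dst;

    assert(src != NULL);
    len = strlen(src);
    dst = malloc(len + 1);      /* LATIN1 output is never longer than UTF-8 */
    if (dst == NULL)
        return NULL;

    while (i < len)
    {
        unsigned char c = (unsigned char) src[i];

        if (c < 0x80)
        {
            dst[j++] = (char) c;    /* ASCII passes through unchanged */
            i++;
        }
        else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
                 ((unsigned char) src[i + 1] & 0xC0) == 0x80)
        {
            /* two-byte sequence encoding U+0080..U+00FF */
            dst[j++] = (char) (((c & 0x1F) << 6) |
                               ((unsigned char) src[i + 1] & 0x3F));
            i += 2;
        }
        else
        {
            free(dst);              /* unmappable or invalid: report failure */
            return NULL;
        }
    }
    dst[j] = '\0';
    return dst;
}

/*
 * Choose the text to log: the converted message if conversion succeeded,
 * otherwise the original bytes, so the message itself is never lost.
 * *to_free receives the buffer the caller must free(), or NULL.
 */
static const char *
message_for_log(const char *msg, char **to_free)
{
    char       *converted = utf8_to_latin1_nothrow(msg);

    *to_free = converted;
    return converted != NULL ? converted : msg;
}
```

For example, message_for_log("caf\xC3\xA9", &buf) yields the 4-byte LATIN1 string for "café", while a message containing Japanese text has no LATIN1 representation, so buf is set to NULL and the original UTF-8 bytes are returned for logging.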

There's no guarantee that the conversion to UTF-8 won't fail, so this
isn't totally risk-free on Windows either. Theoretically,
MultiByteToWideChar() could fail too (the patch neglects to check for
that), although I suppose that can't really happen for a UTF-8 -> UTF-16
conversion.
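
To illustrate why the result of the UTF-8 -> UTF-16 step deserves a check, here is a portable sketch of that step; it is a stand-in, not the Windows API. The function utf8_to_utf16() is hypothetical. It reports failure with -1 where MultiByteToWideChar() reports failure by returning 0, and it fails if the bytes are not valid UTF-8 (for example, a message truncated in the middle of a multibyte character) or if the output buffer is too small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical converter: UTF-8 bytes src[0..len) -> UTF-16 code units in
 * dst (capacity dstcap).  Returns the number of code units written, or -1
 * if the input is not valid UTF-8 or dst is too small.  Callers must check
 * for -1, just as they must check for a 0 return from MultiByteToWideChar().
 */
static ptrdiff_t
utf8_to_utf16(const unsigned char *src, size_t len, uint16_t *dst, size_t dstcap)
{
    size_t      i = 0, n = 0;

    assert(src != NULL && dst != NULL);
    while (i < len)
    {
        uint32_t    cp;
        size_t      extra, k;
        unsigned char c = src[i];

        if (c < 0x80)
            cp = c, extra = 0;
        else if (c >= 0xC2 && c <= 0xDF)
            cp = c & 0x1F, extra = 1;
        else if (c >= 0xE0 && c <= 0xEF)
            cp = c & 0x0F, extra = 2;
        else if (c >= 0xF0 && c <= 0xF4)
            cp = c & 0x07, extra = 3;
        else
            return -1;              /* invalid lead byte */

        if (extra > len - i - 1)
            return -1;              /* sequence truncated at end of input */
        for (k = 1; k <= extra; k++)
        {
            if ((src[i + k] & 0xC0) != 0x80)
                return -1;          /* expected a continuation byte */
            cp = (cp << 6) | (src[i + k] & 0x3F);
        }
        /* reject overlong forms, surrogate code points and values > U+10FFFF */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;
        i += extra + 1;

        if (cp < 0x10000)
        {
            if (n + 1 > dstcap)
                return -1;          /* output buffer too small */
            dst[n++] = (uint16_t) cp;
        }
        else
        {
            if (n + 2 > dstcap)
                return -1;          /* output buffer too small */
            cp -= 0x10000;
            dst[n++] = (uint16_t) (0xD800 | (cp >> 10));    /* high surrogate */
            dst[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));  /* low surrogate */
        }
    }
    return (ptrdiff_t) n;
}
```

With the UTF-8 bytes E6 97 A5 E6 9C AC ("日本") it writes the code units 0x65E5 and 0x672C and returns 2; given the lone byte C3 it returns -1.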

Can't we use MultiByteToWideChar() to convert directly to the required
encoding, avoiding the double conversion?

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com