
Re: Encoding issues in console and eventlog on win32 - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Encoding issues in console and eventlog on win32
Date	September 14, 2009 08:08:32
Msg-id	4AAE241E.8010406@enterprisedb.com
In response to	Encoding issues in console and eventlog on win32 (Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Responses	Re: Encoding issues in console and eventlog on win32
List	pgsql-hackers


Itagaki Takahiro wrote:
> We can choose a database encoding that differs from the platform's
> native one, but postgres writes server logs in the database encoding.
> As a result, the server logs are filled with garbled characters.
> 
> The problem can occur on all platforms; however, there is a solution
> for win32. Since Windows supports wide characters for writing logs, we
> can convert log text => UTF-8 => UTF-16 and pass it to WriteConsoleW()
> and ReportEventW().
> 
> Especially in Japan, encoding trouble on Windows is unavoidable,
> because postgres doesn't support Shift-JIS as a database encoding,
> and Shift-JIS is the native encoding of the Japanese edition of Windows.
> 
> If we also want to support the same functionality on non-win32 platforms,
> we might need a non-throwing version of pg_do_encoding_conversion():
> 
>     log_message_to_write = pg_do_encoding_conversion_nothrow(
>         log_message_in_database_encoding,
>         GetDatabaseEncoding() /* as src_encoding */,
>         GetPlatformEncoding() /* as dst_encoding */);
> 
> and pass the result to stderr and syslog. But that requires major rewrites
> of the conversion functions, so I'd like to submit a solution only for win32
> for now. Also, the issue is less serious on non-win32 platforms because
> we can choose UTF-8 or EUC_* on those platforms.
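To make the "non-throwing conversion" idea concrete, here is a minimal, self-contained C sketch. It is not PostgreSQL code: utf8_to_latin1_nothrow() and write_log_line() are hypothetical names, with UTF-8 standing in for the database encoding and LATIN1 for the platform encoding. The converter reports failure by returning NULL instead of raising an error, so the log writer can fall back to the unconverted bytes rather than losing the message.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical non-throwing conversion: UTF-8 (standing in for the database
 * encoding) to LATIN1 (standing in for the platform encoding).
 * Returns a newly malloc'd NUL-terminated string, or NULL if the input is
 * malformed or contains a character LATIN1 cannot represent. It never
 * aborts or jumps out, so it is safe to call while writing a log line.
 */
char *utf8_to_latin1_nothrow(const char *src)
{
    size_t len;
    size_t o = 0;
    char *out;

    assert(src != NULL);
    len = strlen(src);
    out = malloc(len + 1);          /* LATIN1 output is never longer than UTF-8 input */
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) src[i];

        if (c < 0x80) {
            out[o++] = (char) c;    /* ASCII passes through unchanged */
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
                   ((unsigned char) src[i + 1] & 0xC0) == 0x80) {
            /* C2/C3 followed by a continuation byte encodes U+0080..U+00FF */
            out[o++] = (char) (((c & 0x1F) << 6) |
                               ((unsigned char) src[i + 1] & 0x3F));
            i++;
        } else {
            free(out);              /* unmappable or malformed: report failure */
            return NULL;
        }
    }
    out[o] = '\0';
    return out;
}

/*
 * Write one log line, converted when possible; if conversion fails, write
 * the original bytes so the message itself is never lost.
 */
void write_log_line(FILE *out, const char *msg)
{
    char *converted = utf8_to_latin1_nothrow(msg);

    fputs(converted != NULL ? converted : msg, out);
    fputc('\n', out);
    free(converted);                /* free(NULL) is a no-op */
}
```

With this design, a message containing a character LATIN1 cannot represent (the euro sign, for example) is still written, only unconverted; the trade-off is that such a line may look garbled again in the log file.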

Something like that seems reasonable for the Windows event log; that is
clearly supposed to be written using a specific encoding. With the log
files, we're more free to do what we want, and IMHO we shouldn't put a
Windows-specific hack there because as you say we have the same problem
on all platforms.

There's no guarantee that conversion to UTF-8 won't fail, so this isn't
totally risk-free on Windows either. Theoretically, MultiByteToWideChar
could fail too (the patch neglects to check for that), although I
suppose it can't really happen for UTF-8 -> UTF-16 conversion.
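The failure check the patch omits is easy to illustrate. On Windows, MultiByteToWideChar() returns 0 on failure, and passing MB_ERR_INVALID_CHARS makes it fail on invalid input instead of silently dropping or replacing characters. The portable sketch below performs the same UTF-8 => UTF-16 step with explicit validation; utf8_to_utf16() is a hypothetical helper, not a Windows or PostgreSQL API. It returns -1 for malformed input, overlong forms, surrogate code points, values above U+10FFFF, or an undersized buffer, giving the caller a clear signal to fall back on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Portable stand-in for MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...).
 * Returns the number of UTF-16 code units written to dst, or -1 on any error.
 * (MultiByteToWideChar itself signals failure by returning 0.)
 */
long utf8_to_utf16(const unsigned char *src, size_t len,
                   uint16_t *dst, size_t cap)
{
    size_t i = 0;
    size_t o = 0;

    assert(src != NULL || len == 0);
    assert(dst != NULL || cap == 0);

    while (i < len) {
        unsigned char c = src[i];
        uint32_t cp;    /* decoded code point */
        size_t n;       /* number of continuation bytes */
        uint32_t min;   /* smallest code point legal for this length */

        if (c < 0x80) {
            cp = c; n = 0; min = 0;
        } else if ((c & 0xE0) == 0xC0) {
            cp = c & 0x1F; n = 1; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            cp = c & 0x0F; n = 2; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            cp = c & 0x07; n = 3; min = 0x10000;
        } else {
            return -1;  /* stray continuation byte or invalid lead byte */
        }

        if (n > len - i - 1)
            return -1;  /* sequence truncated at end of input */
        for (size_t k = 1; k <= n; k++) {
            if ((src[i + k] & 0xC0) != 0x80)
                return -1;  /* expected a continuation byte */
            cp = (cp << 6) | (src[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;  /* overlong, out of range, or a surrogate */

        if (cp < 0x10000) {
            if (o + 1 > cap)
                return -1;
            dst[o++] = (uint16_t) cp;
        } else {
            if (o + 2 > cap)
                return -1;
            cp -= 0x10000;
            dst[o++] = (uint16_t) (0xD800 | (cp >> 10));    /* high surrogate */
            dst[o++] = (uint16_t) (0xDC00 | (cp & 0x3FF));  /* low surrogate */
        }
        i += n + 1;
    }
    return (long) o;
}
```

Characters outside the Basic Multilingual Plane become surrogate pairs, as UTF-16 requires. The output is not NUL-terminated, so a caller handing it to an API that expects a terminated wide string must append a 0 unit itself.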

Can't we use MultiByteToWideChar() to convert directly to the required
encoding, avoiding the double conversion?
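A sketch of the single-step alternative: MultiByteToWideChar() takes a code page as its first argument, so if each server encoding maps to a Windows code page, the message can go straight from the database encoding to UTF-16. The table below is illustrative and incomplete; codepage_for_encoding() and message_to_utf16() are hypothetical names, and the code page numbers should be checked against Microsoft's code page identifier list before relying on them. The Windows-only part is guarded by #ifdef _WIN32 so the lookup compiles on any platform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Illustrative (hypothetical) mapping from PostgreSQL server encoding names
 * to Windows code page numbers. 0 means "no direct code page known".
 */
struct enc_codepage {
    const char *pg_name;
    unsigned int codepage;
};

static const struct enc_codepage enc_codepages[] = {
    { "UTF8",      65001 },  /* CP_UTF8 */
    { "EUC_JP",    20932 },  /* Japanese EUC (JIS X 0208 / JIS X 0212) */
    { "LATIN1",    28591 },  /* ISO 8859-1 */
    { "WIN1252",    1252 },
    { "KOI8R",     20866 },
    { "SQL_ASCII",     0 },  /* bytes have no defined encoding */
};

/* Return the Windows code page for a server encoding name, or 0 if unknown. */
unsigned int codepage_for_encoding(const char *pg_name)
{
    assert(pg_name != NULL);
    for (size_t i = 0; i < sizeof enc_codepages / sizeof enc_codepages[0]; i++)
        if (strcmp(enc_codepages[i].pg_name, pg_name) == 0)
            return enc_codepages[i].codepage;
    return 0;
}

#ifdef _WIN32
/*
 * Single-step conversion: database encoding -> UTF-16 via the code page.
 * Returns the number of WCHARs written, or 0 on failure (including invalid
 * input, because of MB_ERR_INVALID_CHARS); the caller must check for 0.
 */
int message_to_utf16(const char *pg_name, const char *msg, int len,
                     WCHAR *buf, int cap)
{
    UINT cp = codepage_for_encoding(pg_name);

    if (cp == 0)
        return 0;  /* no direct path: caller needs another strategy */
    return MultiByteToWideChar(cp, MB_ERR_INVALID_CHARS, msg, len, buf, cap);
}
#endif
```

Encodings without a known code page (0 here, e.g. SQL_ASCII) would still need some other handling, such as writing the bytes unconverted.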

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
