Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows - Mailing list pgsql-bugs

From Tatsuo Ishii
Subject Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows
Date
Msg-id 20241207.081412.2050532354647835961.ishii@postgresql.org
In response to Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows
List pgsql-bugs
> Tatsuo Ishii <ishii@postgresql.org> writes:
>> I have looked into canonicalize_path() and found this:
> 
>>         if (*p == '\\')
>>             *p = '/';
> 
> Right, that's where the trouble is.  It'd be easy enough to make
> that loop (and the similar one in cleanup_path) encoding-aware,
> if we knew what encoding applies.  Deciding that is the sticky part.
> 
> After sleeping on it, I'm coming around to the opinion that
> client_encoding (pset.encoding) is what to use in psql, for
> two reasons:
> * we already do our best to set that correctly, and the user
> is able to change it if it's wrong;
> * as previously noted, psqlscan.l will do the wrong things
> if it's not set correctly, so you're probably already hosed
> if working in a non-server-safe encoding with the wrong
> setting of client_encoding.

I think the encoding we need to supply to canonicalize_path() is not
necessarily the same as client_encoding. For example, we could set
client_encoding to UTF-8 but use a file whose name is encoded in
Shift-JIS.  I think what we really need to supply to
canonicalize_path() is the "file system encoding", not
client_encoding.

Among the file system encodings, the only problematic one is
Shift-JIS. As far as I know, no OS other than Windows currently
uses Shift-JIS as the file system encoding. So if the OS is the
Japanese version of Windows, we can probably safely assume that the
file system encoding is Shift-JIS. If we can determine inside
canonicalize_path() that the OS is the Japanese version of Windows,
we don't need to change its API.
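
To illustrate why Shift-JIS is the problematic case, here is a minimal
sketch, not the actual canonicalize_path() code. It compares the
byte-wise loop quoted earlier with a hypothetical Shift-JIS-aware
variant. The function names and the lead-byte ranges (0x81-0x9F and
0xE0-0xFC, as commonly documented for Shift-JIS/CP932) are my
assumptions for illustration.

```c
#include <stddef.h>

/*
 * Assumption: a byte in 0x81-0x9F or 0xE0-0xFC starts a two-byte
 * Shift-JIS (CP932) character.  Hypothetical helper, not a PostgreSQL API.
 */
int
sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Byte-wise replacement, equivalent to the loop quoted above. */
void
backslash_to_slash_naive(char *path)
{
    char       *p;

    for (p = path; *p; p++)
    {
        if (*p == '\\')
            *p = '/';
    }
}

/*
 * Hypothetical Shift-JIS-aware variant: a 0x5C byte that is the second
 * byte of a two-byte character is skipped instead of being rewritten.
 */
void
backslash_to_slash_sjis(char *path)
{
    unsigned char *p = (unsigned char *) path;

    while (*p)
    {
        if (sjis_is_lead_byte(*p) && p[1] != '\0')
        {
            p += 2;             /* skip lead byte and trail byte together */
            continue;
        }
        if (*p == '\\')
            *p = '/';
        p++;
    }
}
```

For example, a two-byte character encoded as 0x95 0x5C has 0x5C as its
second byte. The naive loop rewrites that byte to '/', corrupting the
file name, while the aware variant leaves it alone and still converts
the real path separators.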

Quick googling found this page (sorry, in Japanese)
https://tarenagashi.hatenablog.jp/entry/2023/07/17/160149
and it says:

- In Windows "system locale" represents the language/country used.

- The code for system locale is called "LCID" and it's 1041 (decimal)
  for Japanese/Japan.

- There are some APIs to obtain the system locale (GetSystemDefaultLCID
  for the LCID, GetSystemDefaultLocaleName for the locale name, etc.)

I am not familiar with Windows and cannot test these. Can someone
confirm?
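
A rough sketch of the check described above might look like the
following. It is untested on Windows. The helper names are
hypothetical, and I am assuming GetSystemDefaultLCID() from <windows.h>
is the appropriate call and that the value 1041 from the page above is
the one to compare against.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* LCID for Japanese/Japan (1041 decimal = 0x0411), per the page cited above. */
#define LCID_JAPANESE_JAPAN 1041UL

/* Hypothetical helper: does this LCID value denote Japanese/Japan? */
bool
lcid_is_japanese(unsigned long lcid)
{
    return lcid == LCID_JAPANESE_JAPAN;
}

/*
 * Hypothetical helper: true if the Windows system locale is Japanese.
 * On non-Windows builds it always returns false, since Shift-JIS is
 * assumed not to be used as a file system encoding there.
 */
bool
system_locale_is_japanese(void)
{
#ifdef _WIN32
    return lcid_is_japanese((unsigned long) GetSystemDefaultLCID());
#else
    return false;
#endif
}
```

If something like this works, canonicalize_path() could call it
internally to decide whether to treat the path as Shift-JIS, leaving
its signature unchanged.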

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


