Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows - Mailing list pgsql-bugs

From Koichi Suzuki
Subject Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows
Date
Msg-id CABEZHFurokPoY+Vs0zhSf7J5ahZV1p7naOPiH5_3C-ubPdDWgA@mail.gmail.com
In response to Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows
List pgsql-bugs




On Fri, Dec 6, 2024 at 14:21, Tatsuo Ishii <ishii@postgresql.org> wrote:
> I don't believe Shift-JIS uses '/' as part of multibyte characters,

Correct.

> so it should be sufficient to consider '\'.

Agreed.

> BTW, according to wikipedia[1], backslash is not even part of the
> Shift-JIS character set:
>
>     The single-byte characters 0x00 to 0x7F match the ASCII encoding,
>     except for a yen sign (U+00A5) at 0x5C and an overline (U+203E) at
>     0x7E in place of the ASCII character set's backslash and tilde
>     respectively (these deviations from ASCII align with JIS X
>     0201). The single-byte characters from 0xA1 to 0xDF map to the
>     half-width katakana characters found in JIS X 0201.
>
>     For double-byte characters, the first byte is always in the range
>     0x81 to 0x9F or the range 0xE0 to 0xEF (these ranges are
>     unassigned in JIS X 0201). If the first byte is odd, the second
>     byte must be in the range 0x40 to 0x9E (but cannot be 0x7F); if
>     the first byte is even, the second byte must be in the range 0x9F to
>     0xFC.
>
> This might mean that it'd be okay to just skip the backslash-to-slash
> conversion loops altogether if we think the encoding is Shift-JIS.

I suggest not doing so, because the majority of Shift-JIS users treat 0x5C
as a backslash. They understand that 0x5C means a backslash in
Shift-JIS files used for programming (source code), technical
documentation, and so on.

A better way is to leave a 'backslash' byte value (0x5C) as is when it is the second byte of an SJIS-encoded character, rather than treating that byte as an escape character.
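
To illustrate the idea, here is a rough sketch in Python (not psql's actual code) of a backslash-to-slash conversion loop that steps over double-byte characters, so that a 0x5C in the second byte is left alone. The function names are made up for this example, and the lead-byte ranges are simply the ones from the description quoted above; vendor extensions that use further lead bytes are not handled.

```python
def is_sjis_lead_byte(b: int) -> bool:
    # Lead-byte ranges as given in the quoted description (assumption:
    # vendor extensions with additional lead bytes are ignored here).
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def backslashes_to_slashes_sjis(path: bytes) -> bytes:
    """Hypothetical helper: convert single-byte 0x5C to '/' in a
    Shift-JIS byte string, leaving 0x5C trail bytes untouched."""
    out = bytearray()
    i = 0
    while i < len(path):
        b = path[i]
        if is_sjis_lead_byte(b) and i + 1 < len(path):
            # Double-byte character: copy both bytes as is, even if
            # the second byte happens to be 0x5C.
            out += path[i:i + 2]
            i += 2
        else:
            # Single-byte character: only here is 0x5C a real backslash.
            out.append(0x2F if b == 0x5C else b)
            i += 1
    return bytes(out)


# 0x95 0x5C is one double-byte character whose second byte is 0x5C.
sample = b"C:\\tmp\\\x95\x5c.sql"
converted = backslashes_to_slashes_sjis(sample)
naive = sample.replace(b"\\", b"/")
```

With this input, `converted` keeps the 0x95 0x5C pair intact, while the byte-wise `naive` replacement turns the trail byte into '/' and corrupts the file name.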

I'm not sure whether we can fix src/fe_utils/psqlscan.l and/or src/bin/psql/psqlscanslash.l for this.  It needs some more investigation.


Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

All the best, and thanks for all the kind input.
