Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows |
Date | |
Msg-id | 2840430.1733510664@sss.pgh.pa.us |
In response to | Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows (Tatsuo Ishii <ishii@postgresql.org>) |
Responses |
Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows
Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows |
List | pgsql-bugs |
Tatsuo Ishii <ishii@postgresql.org> writes:
> I have looked into canonicalize_path() and found this:
>     if (*p == '\\')
>         *p = '/';

Right, that's where the trouble is. It'd be easy enough to make that loop (and the similar one in cleanup_path) encoding-aware, if we knew what encoding applies. Deciding that is the sticky part.

After sleeping on it, I'm coming around to the opinion that client_encoding (pset.encoding) is what to use in psql, for two reasons:

* we already do our best to set that correctly, and the user is able to change it if it's wrong;

* as previously noted, psqlscan.l will do the wrong things if it's not set correctly, so you're probably already hosed if working in a non-server-safe encoding with the wrong setting of client_encoding.

However, there are a bunch of callers of canonicalize_path() that are not in psql, and those arguments don't apply to them; in fact places like initdb and pg_ctl don't really have a concept of client encoding at all. So what to do?

After looking through the callers I think we might not be in as bad shape as this sounds, because all of the other callers are dealing with Postgres installation paths or data directory-related paths that are also dealt with by the server. So it's not unreasonable to require that those paths must be written in server-safe encodings. If they're not, you're going to have trouble with stuff like "show data_directory".

I wonder whether we ought to try to enforce that. It'd be feasible I think for initdb to verify that the selected paths are validly encoded according to whatever encoding it's about to set the server up with. If we were feeling draconian we could insist that the installation path and data directory path be all-ASCII, which is the only way to be sure that you won't have issues if you later create a database that uses some other encoding. But I think we'd likely get pushback from that.
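
For illustration only, the stricter of the two checks mentioned above (insisting on all-ASCII installation and data directory paths) could be sketched roughly as follows. The function name `path_is_all_ascii` is hypothetical and is not an existing initdb or libpgport routine; how and where such a check would be wired in is left open.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not an existing PostgreSQL function): returns true
 * if every byte of the NUL-terminated path is 7-bit ASCII.
 */
static bool
path_is_all_ascii(const char *path)
{
    const unsigned char *p;

    for (p = (const unsigned char *) path; *p; p++)
    {
        if (*p & 0x80)
            return false;       /* high bit set: not an ASCII byte */
    }
    return true;
}
```

A check like this is encoding-independent, which is why it is the only variant that stays safe if a database with a different encoding is created later; the softer variant (validating against the encoding initdb is about to use) would need the encoding as an extra input.
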

(This ties into the nearby discussion about encoding of shared-catalog names [1], which is more or less the same problem --- maybe the path encoding checks could vary depending on how we're setting that up?)

Anyway, what I'm now thinking is that we can have two variants of canonicalize_path:

    extern void canonicalize_path(char *path);
    extern void canonicalize_path_enc(char *path, int encoding);

The first one assumes a server-safe encoding, the second doesn't, and at least to start with only psql would bother with the second.

It looks like we don't need cleanup_path_enc, not yet anyway, since that's only applied to installation paths. I am also guessing that we don't need an encoding-aware variant of make_native_path: since it only changes '/' it can't create an incorrectly encoded path, assuming the input is OK. However, this is assuming that it's okay to use '\' as a Windows directory separator even in shift-JIS, which I'm not too sure about.

regards, tom lane

[1] https://www.postgresql.org/message-id/CA%2BhUKGKDC7tKMZ1v0JGH5D23F-%3DADf-3UfcriVepqoi7Q_SKgQ%40mail.gmail.com
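
To make the failure mode concrete: in Shift-JIS, the second byte of some two-byte characters is 0x5C, the same value as ASCII '\'. The character 表, for example, is encoded as 0x95 0x5C, so a byte-at-a-time loop like the one quoted at the top of the message rewrites half of that character. The sketch below contrasts that naive loop with one that steps over whole characters. It is not the proposed canonicalize_path_enc(): all function names here are hypothetical, and the Shift-JIS lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC, as commonly defined including the Windows code page 932 extensions) are hard-coded purely for illustration, whereas a real variant would presumably consult the encoding passed in.

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte length of the Shift-JIS character starting at s.
 * Lead bytes 0x81-0x9F and 0xE0-0xFC introduce a two-byte character.
 */
static size_t
sjis_char_len(const unsigned char *s)
{
    unsigned char c = s[0];

    if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC))
        return (s[1] != '\0') ? 2 : 1;  /* truncated input: don't pass the NUL */
    return 1;
}

/* Byte-at-a-time conversion, equivalent to the loop quoted in the message. */
static void
backslashes_to_slashes_naive(char *path)
{
    char *p;

    for (p = path; *p; p++)
    {
        if (*p == '\\')
            *p = '/';
    }
}

/* Character-at-a-time conversion: only single-byte '\' is rewritten. */
static void
backslashes_to_slashes_sjis(char *path)
{
    unsigned char *p = (unsigned char *) path;

    while (*p)
    {
        size_t len = sjis_char_len(p);

        if (len == 1 && *p == '\\')
            *p = '/';
        p += len;               /* skip trail bytes, including any 0x5C */
    }
}
```

Given the bytes `C:\` followed by 0x95 0x5C and `.sql`, the naive version turns the trailing 0x5C into '/', while the character-aware version changes only the directory separator after the drive letter.
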