Re: Reject Windows command lines not convertible to system locale - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Reject Windows command lines not convertible to system locale |
Date | |
Msg-id | CA+hUKG+=rA6C=C-Fdz=hHftPmnaE8dmwYXCJU-+PDUPtRHuYNw@mail.gmail.com Whole thread Raw |
Responses |
Re: Reject Windows command lines not convertible to system locale
|
List | pgsql-hackers |
On Sun, Dec 15, 2024 at 3:27 PM Noah Misch <noah@leadboat.com> wrote: > yielding mojibake Thank you for this magnificent terminology. > What we can do safely is exit if we could not convert the command line to the > Windows ANSI code page (losslessly). Patch attached. This seems to make sense generally, since (based only on googling, I am not a Windows person and I don't even play one on TV), it doesn't sound like there is a way for the system to tell you that the conversion was lossy. For the all-ASCII test, why not us pg_is_ascii()? I wonder if there are any legitimate uses that would fail spuriously due to the minutiae of normalisation, surrogates, etc. Or perhaps that type of thing is part of the problem you're talking about. > As I mention in a code comment, I decided not to do anything about the similar > hazard in GetEnvironmentStringsW(). A possible alternative would be to check > each var referenced in PostgreSQL code or each var where the name begins with > "PG". Should we use one of those or another alternative? I would vote for having a table of variables we know about and checking them too, just on principle (it's not the characters the user intended). Checking every variable might reject legitimate UTF-16 communication between parent and sub-processes other than ours, and even assuming that we own everything beginning "PG" seems an unnecessary overreach, which seems to leave only the known list idea. > The Perl testing calls powershell, which is new for our source tree. The > Internet thinks Windows Server 2008 R1 is the newest Windows where powershell > was optional. The buildfarm's oldest Windows is Windows Server 2008 R2. If > powershell is bad, there's likely a hairy way to write the test in any of > cmd.exe, cscript, Perl Win32::API, or C. My first choice was Perl > Win32::Process or IPC::Run. This test needs CreateProcessW(), but those two > use CreateProcessA(). Sounds OK from here if it's the only sane way, and you could always skip? > I didn't test a CP_UTF8 Windows system. The new test coverage has a condition > intended to avoid failure there. (My Windows development environments are all > too old, and I stopped short of building a new one for this.) CP_UTF8 Windows > raises broader issues. I'll start a separate thread about them. Is it really still in beta? Yeah, I can't immediately think of a good reason not to treat it the same as everything else and create more code paths, even if we think it should always succeed (but I'll look out for your thread, I'm quite interested in this general topic of "can we make a lot of Windows encoding troubles go away that way"). It's a few extra CPU cycles but I guess it doesn't matter much. > I wanted to put the check in some function that most main() instances already > call, instead of adding ~40 new calls for this esoteric topic. Options: > > 1. Call in pg_logging_init() and the backend. Add pg_logging_init() calls to > pg_config, ecpg (the ecpg preprocessor), and pg_test_timing. I picked > this, for two minor benefits. First, it facilitates the backend > initializing LC_MESSAGES before this check. Second, every frontend may > need pg_logging_init() eventually, so it's fair to add calls where missing. > > 2. Call in set_pglocale_pgservice() and the three programs that don't call > set_pglocale_pgservice(): oid2name, vacuumlo, pgbench. +1 FWIW, coincidentally, I assume, I wrote about environ and argv encoding problems last week, including a test program showing wchar_t <-> ACP weirdness before main() among other interesting related effects[1]. (My focus has been on how we sort out ACP vs database encoding and block stuff that is b0rked in *our* code, ie below main(), because we don't understand the platform's encoding model, not so much the stuff that is already b0rked in all non-UTF8, non-wmain() Windows programs in general before even laying hands on the program counter. I had got as far as wondering out loud if forcing or strongly recommending UTF-8 ACP should be on the agenda to head off various related problems and generally make Windows not so different from Unix, and also supporting it as a native encoding properly on that platform as experimented with in CF #3772, but much more study is required there.) [1] https://www.postgresql.org/message-id/CA%2BhUKGL%3DF0pSLLf3nLpA_-sBwYsLg7s%3DFD6YFo_PDvS84FE_hw%40mail.gmail.com
pgsql-hackers by date: