Re: Reject Windows command lines not convertible to system locale - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Reject Windows command lines not convertible to system locale
Date
Msg-id CA+hUKG+=rA6C=C-Fdz=hHftPmnaE8dmwYXCJU-+PDUPtRHuYNw@mail.gmail.com
Whole thread Raw
Responses Re: Reject Windows command lines not convertible to system locale
List pgsql-hackers
On Sun, Dec 15, 2024 at 3:27 PM Noah Misch <noah@leadboat.com> wrote:
> yielding mojibake

Thank you for this magnificent terminology.

> What we can do safely is exit if we could not convert the command line to the
> Windows ANSI code page (losslessly).  Patch attached.

This seems to make sense generally, since (based only on googling, I
am not a Windows person and I don't even play one on TV), it doesn't
sound like there is a way for the system to tell you that the
conversion was lossy.

For the all-ASCII test, why not us pg_is_ascii()?

I wonder if there are any legitimate uses that would fail spuriously
due to the minutiae of normalisation, surrogates, etc.  Or perhaps
that type of thing is part of the problem you're talking about.

> As I mention in a code comment, I decided not to do anything about the similar
> hazard in GetEnvironmentStringsW().  A possible alternative would be to check
> each var referenced in PostgreSQL code or each var where the name begins with
> "PG".  Should we use one of those or another alternative?

I would vote for having a table of variables we know about and
checking them too, just on principle (it's not the characters the user
intended).  Checking every variable might reject legitimate UTF-16
communication between parent and sub-processes other than ours, and
even assuming that we own everything beginning "PG" seems an
unnecessary overreach, which seems to leave only the known list idea.

> The Perl testing calls powershell, which is new for our source tree.  The
> Internet thinks Windows Server 2008 R1 is the newest Windows where powershell
> was optional.  The buildfarm's oldest Windows is Windows Server 2008 R2.  If
> powershell is bad, there's likely a hairy way to write the test in any of
> cmd.exe, cscript, Perl Win32::API, or C.  My first choice was Perl
> Win32::Process or IPC::Run.  This test needs CreateProcessW(), but those two
> use CreateProcessA().

Sounds OK from here if it's the only sane way, and you could always skip?

> I didn't test a CP_UTF8 Windows system.  The new test coverage has a condition
> intended to avoid failure there.  (My Windows development environments are all
> too old, and I stopped short of building a new one for this.)  CP_UTF8 Windows
> raises broader issues.  I'll start a separate thread about them.

Is it really still in beta?  Yeah, I can't immediately think of a good
reason not to treat it the same as everything else and create more
code paths, even if we think it should always succeed (but I'll look
out for your thread, I'm quite interested in this general topic of
"can we make a lot of Windows encoding troubles go away that way").
It's a few extra CPU cycles but I guess it doesn't matter much.

> I wanted to put the check in some function that most main() instances already
> call, instead of adding ~40 new calls for this esoteric topic.  Options:
>
> 1. Call in pg_logging_init() and the backend.  Add pg_logging_init() calls to
>    pg_config, ecpg (the ecpg preprocessor), and pg_test_timing.  I picked
>    this, for two minor benefits.  First, it facilitates the backend
>    initializing LC_MESSAGES before this check.  Second, every frontend may
>    need pg_logging_init() eventually, so it's fair to add calls where missing.
>
> 2. Call in set_pglocale_pgservice() and the three programs that don't call
>    set_pglocale_pgservice(): oid2name, vacuumlo, pgbench.

+1

FWIW, coincidentally, I assume, I wrote about environ and argv
encoding problems last week, including a test program showing wchar_t
<-> ACP weirdness before main() among other interesting related
effects[1].  (My focus has been on how we sort out ACP vs database
encoding and block stuff that is b0rked in *our* code, ie below
main(), because we don't understand the platform's encoding model, not
so much the stuff that is already b0rked in all non-UTF8, non-wmain()
Windows programs in general before even laying hands on the program
counter.  I had got as far as wondering out loud if forcing or
strongly recommending UTF-8 ACP should be on the agenda to head off
various related problems and generally make Windows not so different
from Unix, and also supporting it as a native encoding properly on
that platform as experimented with in CF #3772, but much more study is
required there.)

[1] https://www.postgresql.org/message-id/CA%2BhUKGL%3DF0pSLLf3nLpA_-sBwYsLg7s%3DFD6YFo_PDvS84FE_hw%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Roberto C. Sánchez
Date:
Subject: Backport of CVE-2024-10978 fix to older pgsql versions (11, 9.6, and 9.4)
Next
From: Noah Misch
Date:
Subject: Re: Reject Windows command lines not convertible to system locale