Thread: BUG #18219: libpq does not take into consideration UNICODE define

BUG #18219: libpq does not take into consideration UNICODE define

From

PG Bug reporting form

Date:

30 November 2023, 15:23:23

The following bug has been logged on the website:

Bug reference:      18219
Logged by:          Jan Březina
Email address:      2janbrezina@gmail.com
PostgreSQL version: 16.0
Operating system:   Windows
Description:

I use libpqxx via vcpkg in Windows project build for x64 system using Visual
Studio. There is a UNICODE define, which causes **W Windows functions to be
used and hence string parameters should be passed in using L"...".
I have sslrootcert file used in my connection string. During the connection
initialization stat() function is invoked for the file. It is implemented in
win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then
initialize_ntdll() (win32ntdll.c) is invoked. There is a use of
LoadLibraryEx which is just a define above LoadLibraryExA or LoadLibraryExW.
Using UNICODE implies LoadLibraryExW to be used, but the parameter passed in
is char*, not wchar_t*. This leads to failed initialization returning error
code 126 - The specified module could not be found.
There should be a switch if UNICODE is defined, then L"ntdll.dll" should be
passed in. The same applies to all other Windows functions consuming
strings.

Re: BUG #18219: libpq does not take into consideration UNICODE define

From

Thomas Munro

Date:

30 November 2023, 19:20:19

On Fri, Dec 1, 2023 at 4:50 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> I use libpqxx via vcpkg in Windows project build for x64 system using Visual
> Studio. There is a UNICODE define, which causes **W Windows functions to be
> used and hence string parameters should be passed in using L"...".
> I have sslrootcert file used in my connection string. During the connection
> initialization stat() function is invoked for the file. It is implemented in
> win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then
> initialize_ntdll() (win32ntdll.c) is invoked. There is a use of
> LoadLibraryEx which is just a define above LoadLibraryExA or LoadLibraryExW.
> Using UNICODE implies LoadLibraryExW to be used, but the parameter passed in
> is char*, not wchar_t*. This leads to failed initialization returning error
> code 126 - The specified module could not be found.
> There should be a switch if UNICODE is defined, then L"ntdll.dll" should be
> passed in. The same applies to all other Windows functions consuming
> strings.

I don't understand vcpkg so I don't even know where to look, but who
defined UNICODE when building libpq?

Re: BUG #18219: libpq does not take into consideration UNICODE define

From

Jan Březina

Date:

30 November 2023, 21:10:47

I defined UNICODE for the whole project and it's propagated also to the dependencies. Vcpkg is just the package manager. You can see it's port in the public repository: https://github.com/Microsoft/vcpkg/blob/master/ports/libpq/portfile.cmake

Dne čt 30. 11. 2023 20:20 uživatel Thomas Munro <thomas.munro@gmail.com> napsal:

On Fri, Dec 1, 2023 at 4:50 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> I use libpqxx via vcpkg in Windows project build for x64 system using Visual
> Studio. There is a UNICODE define, which causes **W Windows functions to be
> used and hence string parameters should be passed in using L"...".
> I have sslrootcert file used in my connection string. During the connection
> initialization stat() function is invoked for the file. It is implemented in
> win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then
> initialize_ntdll() (win32ntdll.c) is invoked. There is a use of
> LoadLibraryEx which is just a define above LoadLibraryExA or LoadLibraryExW.
> Using UNICODE implies LoadLibraryExW to be used, but the parameter passed in
> is char*, not wchar_t*. This leads to failed initialization returning error
> code 126 - The specified module could not be found.
> There should be a switch if UNICODE is defined, then L"ntdll.dll" should be
> passed in. The same applies to all other Windows functions consuming
> strings.

I don't understand vcpkg so I don't even know where to look, but who
defined UNICODE when building libpq?

Re: BUG #18219: libpq does not take into consideration UNICODE define

From

Thomas Munro

Date:

30 November 2023, 21:20:30

On Fri, Dec 1, 2023 at 10:10 AM Jan Březina <2janbrezina@gmail.com> wrote:
> I defined UNICODE for the whole project and it's propagated also to the dependencies. Vcpkg is just the package
manager.You can see it's port in the public repository:
https://github.com/Microsoft/vcpkg/blob/master/ports/libpq/portfile.cmake

I see.  Well, generally speaking PostgreSQL just isn't going to work
if some external thing #defines a bunch of stuff that changes the
meaning of system headers.  For example on Unix systems there are
various headers that will tell some systems to use various ancient
standards instead of the modern POSIX semantics, and if you #define
those, surprise!, PostgreSQL will not work.  But I'm not a Windows
person so I have no real opinion on whether that particular thing
would be reasonably expected to work and is something that people
typically go around doing.  What would you suggest?  Should we
explicitly #undef it, is that what people do?  My guess is that there
is 0% chance that we'll ever modify our OWN behaviour due to UNICODE
(that is, our C functions are not going to have a wchar_t version that
UNICODE silently selects that you can call, you'll always have to talk
to our code with char strings AFAICS), but as for how our own .c files
internally talk to the system headers, that's kinda private to our
source tree, so I guess it would at least be theoretically possible
for us to nail that down with a well placed #undef.  I just don't know
how sensible or typical that would be.

Re: BUG #18219: libpq does not take into consideration UNICODE define

From

Thomas Munro

Date:

30 November 2023, 21:21:49

On Fri, Dec 1, 2023 at 10:20 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> various headers that will tell some systems to use various ancient

s/headers/macros/

Re: BUG #18219: libpq does not take into consideration UNICODE define

From

Jan Březina

Date:

30 November 2023, 21:50:21

It's pretty common to handle UNICODE define on Windows AFAIK. I suggest two ways of handling the issue.

1. pass the appropriate type if UNICODE is defined:

#ifdef UNICODE

LoadLibraryEx(L"ntdll.dll", nullptr, 0);

#else

LoadLibraryEx("ntdll.dll", nullptr, 0);

#endif

2. Use **A functions no matter what (everywhere)

LoadLibraryExA("ntdll.dll", nullptr, 0);

https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa

See, LoadLibraryEx is affected by the UNICODE define within the Windows header files. It's LoadLibraryExW if it's defined, or LoadLibraryExA if it's not:

#ifdef UNICODE

#define LoadLibraryEx LoadLibraryExW

#else

#define LoadLibraryEx LoadLibraryExA

#endif

Dne čt 30. 11. 2023 22:21 uživatel Thomas Munro <thomas.munro@gmail.com> napsal:

On Fri, Dec 1, 2023 at 10:10 AM Jan Březina <2janbrezina@gmail.com> wrote:
> I defined UNICODE for the whole project and it's propagated also to the dependencies. Vcpkg is just the package manager. You can see it's port in the public repository: https://github.com/Microsoft/vcpkg/blob/master/ports/libpq/portfile.cmake

I see. Well, generally speaking PostgreSQL just isn't going to work
if some external thing #defines a bunch of stuff that changes the
meaning of system headers. For example on Unix systems there are
various headers that will tell some systems to use various ancient
standards instead of the modern POSIX semantics, and if you #define
those, surprise!, PostgreSQL will not work. But I'm not a Windows
person so I have no real opinion on whether that particular thing
would be reasonably expected to work and is something that people
typically go around doing. What would you suggest? Should we
explicitly #undef it, is that what people do? My guess is that there
is 0% chance that we'll ever modify our OWN behaviour due to UNICODE
(that is, our C functions are not going to have a wchar_t version that
UNICODE silently selects that you can call, you'll always have to talk
to our code with char strings AFAICS), but as for how our own .c files
internally talk to the system headers, that's kinda private to our
source tree, so I guess it would at least be theoretically possible
for us to nail that down with a well placed #undef. I just don't know
how sensible or typical that would be.

Re: BUG #18219: libpq does not take into consideration UNICODE define

From

Thomas Munro

Date:

30 November 2023, 21:58:22

On Fri, Dec 1, 2023 at 10:50 AM Jan Březina <2janbrezina@gmail.com> wrote:
> It's pretty common to handle UNICODE define on Windows AFAIK. I suggest two ways of handling the issue.
> 1. pass the appropriate type if UNICODE is defined:
> #ifdef UNICODE
> LoadLibraryEx(L"ntdll.dll", nullptr, 0);
> #else
> LoadLibraryEx("ntdll.dll", nullptr, 0);
> #endif

But this would apply to many, many places where we call system
functions, no?  I guess you've just picked on this one case because it
was the first thing to break.  In some cases the string to be provided
is not a constant, so would require passing through char/wchar_t
conversion routines.  Seems like a non-starter.

> 2. Use **A functions no matter what (everywhere)
> LoadLibraryExA("ntdll.dll", nullptr, 0);
>
> https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
>
> See, LoadLibraryEx is affected by the UNICODE define within the Windows header files. It's LoadLibraryExW if it's
defined,or LoadLibraryExA if it's not: 
>
> #ifdef UNICODE
> #define LoadLibraryEx LoadLibraryExW
> #else
> #define LoadLibraryEx LoadLibraryExA
> #endif

Right, but since some of the functions that are affected in this way
are also standardised functions that we call on POSIX systems, I think
we'd finish up scattering more #ifdef #else stuff all through the code
to explicitly select the 'ANSI' (sic) Windows variant or the
standardised-function-with-no-suffix variant.  Also seems like a
non-starter.

Re: BUG #18219: libpq does not take into consideration UNICODE define

From

Thomas Munro

Date:

30 November 2023, 22:19:53

On Fri, Dec 1, 2023 at 10:58 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Seems like a non-starter.

... and I should probably explain why.  This project has difficulty
maintaining Windows support, because of a general lack of PostgreSQL
developers working on that platform (but we welcome more! patches
welcome!) and a disproportionately high amount of maintenance work
that it generates.  One factor is that there are several different
Windows configurations (different compilers, C runtimes, build
systems, file system semantics, partial POSIX emulations like MSYS and
Cygwin...) to consider.  I doubt we'd want to add another dimension to
that problem space by saying we have to maintain and test the UNICODE
and non-UNICODE code paths in our libraries.

Hence my intuition that we should be figuring out where the right
place is to suppress or undefine UNICODE.

Re: BUG #18219: libpq does not take into consideration UNICODE define

From

Tom Lane

Date:

30 November 2023, 22:32:47

Thomas Munro <thomas.munro@gmail.com> writes:
>> Seems like a non-starter.

> ... and I should probably explain why.  This project has difficulty
> maintaining Windows support, because of a general lack of PostgreSQL
> developers working on that platform (but we welcome more! patches
> welcome!) and a disproportionately high amount of maintenance work
> that it generates.  One factor is that there are several different
> Windows configurations (different compilers, C runtimes, build
> systems, file system semantics, partial POSIX emulations like MSYS and
> Cygwin...) to consider.  I doubt we'd want to add another dimension to
> that problem space by saying we have to maintain and test the UNICODE
> and non-UNICODE code paths in our libraries.

Yeah, I think there's no chance that we'd accept such a patch even if
one were submitted.  Quite aside from the initial development effort,
the ongoing maintenance cost would be large, and the value is just not
there (as evidenced by the approximately zero previous requests we've
had for this).

> Hence my intuition that we should be figuring out where the right
> place is to suppress or undefine UNICODE.

I'm not even excited about that.  Would we accept patches to #undef
the random other system-API-changing symbols that exist on other
platforms?  We've not done so in the past and I don't see a great
argument to start here.  IMO the submitter has simply misconfigured
his project.  He should not assume that he can build other peoples'
code with whatever configuration suits his own code.

            regards, tom lane

Re: BUG #18219: libpq does not take into consideration UNICODE define

From

Jan Březina

Date:

03 December 2023, 19:55:56

The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specify what Windows API they are using and there are good reasons not to mix the two.

You should read the following documentation:

https://learn.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api

and mainly

https://learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes

So if you are not planning to support UNICODE define, then you should be concrete in what Windows API you are using and call the **A (ANSI) API directly, as I suggested. It is a correct way, doesn't conflict with any defines and doesn't conflict with any standard API, as UNICODE define affects only the behavior of Windows API.

I bet there won't be so much work to fix this, as Windows specific implementations are in separate *win32.c files and hence it should not affect any other platforms and it has to work with every compiler, as it is the standard Windows API.

I'm surprised no one raised this problem up until now and I believe there must be a lot of workarounds to make libpq library work "correctly".

If the patch would be accepted, I can help with it's implementation.

Dne čt 30. 11. 2023 23:32 uživatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Thomas Munro <thomas.munro@gmail.com> writes:
>> Seems like a non-starter.

> ... and I should probably explain why. This project has difficulty
> maintaining Windows support, because of a general lack of PostgreSQL
> developers working on that platform (but we welcome more! patches
> welcome!) and a disproportionately high amount of maintenance work
> that it generates. One factor is that there are several different
> Windows configurations (different compilers, C runtimes, build
> systems, file system semantics, partial POSIX emulations like MSYS and
> Cygwin...) to consider. I doubt we'd want to add another dimension to
> that problem space by saying we have to maintain and test the UNICODE
> and non-UNICODE code paths in our libraries.

Yeah, I think there's no chance that we'd accept such a patch even if
one were submitted. Quite aside from the initial development effort,
the ongoing maintenance cost would be large, and the value is just not
there (as evidenced by the approximately zero previous requests we've
had for this).

> Hence my intuition that we should be figuring out where the right
> place is to suppress or undefine UNICODE.

I'm not even excited about that. Would we accept patches to #undef
the random other system-API-changing symbols that exist on other
platforms? We've not done so in the past and I don't see a great
argument to start here. IMO the submitter has simply misconfigured
his project. He should not assume that he can build other peoples'
code with whatever configuration suits his own code.

regards, tom lane

Re: BUG #18219: libpq does not take into consideration UNICODE define

From

Thomas Munro

Date:

03 December 2023, 22:20:01

On Mon, Dec 4, 2023 at 8:56 AM Jan Březina <2janbrezina@gmail.com> wrote:
> The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to
specifywhat Windows API they are using and there are good reasons not to mix the two. 
> You should read the following documentation:
> https://learn.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api
> and mainly
> https://learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes

We know about that stuff at least in general terms, but we are arguing
that *we* should be in control of whether our .c files are compiled
with UNICODE defined, not someone compiling our library.  That advice
is about applications.  libpq is a shared library.

But let's actually check the cost of this:

> So if you are not planning to support UNICODE define, then you should be concrete in what Windows API you are using
andcall the **A (ANSI) API directly, as I suggested. It is a correct way, doesn't conflict with any defines and doesn't
conflictwith any standard API, as UNICODE define affects only the behavior of Windows API. 
> I bet there won't be so much work to fix this, as Windows specific implementations are in separate *win32.c files and
henceit should not affect any other platforms and it has to work with every compiler, as it is the standard Windows
API.

What symbols is libpq.dll referencing?  Maybe "dumpbin /imports
libpq.dll" can tell you?  I guess that'd show us how many *A /*W
functions are reached, and would be changed to call *A directly.  If
it's really just Win32 APIs that are in Windows-only code paths that'd
be one thing.  I wasn't sure if it might also include standard C-ish
code that is shared with Unix.

RE: BUG #18219: libpq does not take into consideration UNICODE define

From

Jeff Laing

Date:

03 December 2023, 22:31:20

Looks like there is a mix to me…

c:\...mumble…> dumpbin.exe /imports libpq.dll | grep A$

23 InitializeSecurityContextA

1 AcquireCredentialsHandleA

26A GetModuleHandleA

3A8 LoadLibraryA

237 GetFileAttributesA

BA CreateFileA

3A9 LoadLibraryExA

19F FormatMessageA

179 GetUserNameA

CC SHGetFolderPathA

c:\mumble…> dumpbin.exe /imports libpq.dll | grep W$

2C5 GetStartupInfoW

26D GetModuleHandleW

From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, December 4, 2023 9:20 AM
To: Jan Březina <2janbrezina@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #18219: libpq does not take into consideration UNICODE define

On Mon, Dec 4, 2023 at 8: 56 AM Jan Březina <2janbrezina@ gmail. com> wrote: > The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specify what Windows API they are using and

On Mon, Dec 4, 2023 at 8:56 AM Jan Březina <2janbrezina@gmail.com> wrote:

> The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specify what Windows API they are using and there are good reasons not to mix the two.

> You should read the following documentation:

> https://urldefense.com/v3/__https://learn.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api__;!!LlG_G4lo9h6Y!NGBMDMpFtxNbtb155g1HCcZB7if4EEXqwlnDws8-7_cM4q7QKLQtHy960vxoB2eM1FGocuofqDDQHEKE9HLunNjlOeY$

> and mainly

> https://urldefense.com/v3/__https://learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes__;!!LlG_G4lo9h6Y!NGBMDMpFtxNbtb155g1HCcZB7if4EEXqwlnDws8-7_cM4q7QKLQtHy960vxoB2eM1FGocuofqDDQHEKE9HLupjSS5eI$

We know about that stuff at least in general terms, but we are arguing

that *we* should be in control of whether our .c files are compiled

with UNICODE defined, not someone compiling our library.  That advice

is about applications.  libpq is a shared library.

But let's actually check the cost of this:

> So if you are not planning to support UNICODE define, then you should be concrete in what Windows API you are using and call the **A (ANSI) API directly, as I suggested. It is a correct way, doesn't conflict with any defines and doesn't conflict with any standard API, as UNICODE define affects only the behavior of Windows API.

> I bet there won't be so much work to fix this, as Windows specific implementations are in separate *win32.c files and hence it should not affect any other platforms and it has to work with every compiler, as it is the standard Windows API.

What symbols is libpq.dll referencing?  Maybe "dumpbin /imports

libpq.dll" can tell you?  I guess that'd show us how many *A /*W

functions are reached, and would be changed to call *A directly.  If

it's really just Win32 APIs that are in Windows-only code paths that'd

be one thing.  I wasn't sure if it might also include standard C-ish

code that is shared with Unix.