Thread: BUG #18219: libpq does not take into consideration UNICODE define
The following bug has been logged on the website: Bug reference: 18219 Logged by: Jan Březina Email address: 2janbrezina@gmail.com PostgreSQL version: 16.0 Operating system: Windows Description: I use libpqxx via vcpkg in Windows project build for x64 system using Visual Studio. There is a UNICODE define, which causes **W Windows functions to be used and hence string parameters should be passed in using L"...". I have sslrootcert file used in my connection string. During the connection initialization stat() function is invoked for the file. It is implemented in win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then initialize_ntdll() (win32ntdll.c) is invoked. There is a use of LoadLibraryEx which is just a define above LoadLibraryExA or LoadLibraryExW. Using UNICODE implies LoadLibraryExW to be used, but the parameter passed in is char*, not wchar_t*. This leads to failed initialization returning error code 126 - The specified module could not be found. There should be a switch if UNICODE is defined, then L"ntdll.dll" should be passed in. The same applies to all other Windows functions consuming strings.
On Fri, Dec 1, 2023 at 4:50 AM PG Bug reporting form <noreply@postgresql.org> wrote: > I use libpqxx via vcpkg in Windows project build for x64 system using Visual > Studio. There is a UNICODE define, which causes **W Windows functions to be > used and hence string parameters should be passed in using L"...". > I have sslrootcert file used in my connection string. During the connection > initialization stat() function is invoked for the file. It is implemented in > win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then > initialize_ntdll() (win32ntdll.c) is invoked. There is a use of > LoadLibraryEx which is just a define above LoadLibraryExA or LoadLibraryExW. > Using UNICODE implies LoadLibraryExW to be used, but the parameter passed in > is char*, not wchar_t*. This leads to failed initialization returning error > code 126 - The specified module could not be found. > There should be a switch if UNICODE is defined, then L"ntdll.dll" should be > passed in. The same applies to all other Windows functions consuming > strings. I don't understand vcpkg so I don't even know where to look, but who defined UNICODE when building libpq?
On Fri, Dec 1, 2023 at 4:50 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> I use libpqxx via vcpkg in Windows project build for x64 system using Visual
> Studio. There is a UNICODE define, which causes **W Windows functions to be
> used and hence string parameters should be passed in using L"...".
> I have sslrootcert file used in my connection string. During the connection
> initialization stat() function is invoked for the file. It is implemented in
> win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then
> initialize_ntdll() (win32ntdll.c) is invoked. There is a use of
> LoadLibraryEx which is just a define above LoadLibraryExA or LoadLibraryExW.
> Using UNICODE implies LoadLibraryExW to be used, but the parameter passed in
> is char*, not wchar_t*. This leads to failed initialization returning error
> code 126 - The specified module could not be found.
> There should be a switch if UNICODE is defined, then L"ntdll.dll" should be
> passed in. The same applies to all other Windows functions consuming
> strings.
I don't understand vcpkg so I don't even know where to look, but who
defined UNICODE when building libpq?
On Fri, Dec 1, 2023 at 10:10 AM Jan Březina <2janbrezina@gmail.com> wrote: > I defined UNICODE for the whole project and it's propagated also to the dependencies. Vcpkg is just the package manager.You can see it's port in the public repository: https://github.com/Microsoft/vcpkg/blob/master/ports/libpq/portfile.cmake I see. Well, generally speaking PostgreSQL just isn't going to work if some external thing #defines a bunch of stuff that changes the meaning of system headers. For example on Unix systems there are various headers that will tell some systems to use various ancient standards instead of the modern POSIX semantics, and if you #define those, surprise!, PostgreSQL will not work. But I'm not a Windows person so I have no real opinion on whether that particular thing would be reasonably expected to work and is something that people typically go around doing. What would you suggest? Should we explicitly #undef it, is that what people do? My guess is that there is 0% chance that we'll ever modify our OWN behaviour due to UNICODE (that is, our C functions are not going to have a wchar_t version that UNICODE silently selects that you can call, you'll always have to talk to our code with char strings AFAICS), but as for how our own .c files internally talk to the system headers, that's kinda private to our source tree, so I guess it would at least be theoretically possible for us to nail that down with a well placed #undef. I just don't know how sensible or typical that would be.
On Fri, Dec 1, 2023 at 10:20 AM Thomas Munro <thomas.munro@gmail.com> wrote: > various headers that will tell some systems to use various ancient s/headers/macros/
On Fri, Dec 1, 2023 at 10:10 AM Jan Březina <2janbrezina@gmail.com> wrote:
> I defined UNICODE for the whole project and it's propagated also to the dependencies. Vcpkg is just the package manager. You can see it's port in the public repository: https://github.com/Microsoft/vcpkg/blob/master/ports/libpq/portfile.cmake
I see. Well, generally speaking PostgreSQL just isn't going to work
if some external thing #defines a bunch of stuff that changes the
meaning of system headers. For example on Unix systems there are
various headers that will tell some systems to use various ancient
standards instead of the modern POSIX semantics, and if you #define
those, surprise!, PostgreSQL will not work. But I'm not a Windows
person so I have no real opinion on whether that particular thing
would be reasonably expected to work and is something that people
typically go around doing. What would you suggest? Should we
explicitly #undef it, is that what people do? My guess is that there
is 0% chance that we'll ever modify our OWN behaviour due to UNICODE
(that is, our C functions are not going to have a wchar_t version that
UNICODE silently selects that you can call, you'll always have to talk
to our code with char strings AFAICS), but as for how our own .c files
internally talk to the system headers, that's kinda private to our
source tree, so I guess it would at least be theoretically possible
for us to nail that down with a well placed #undef. I just don't know
how sensible or typical that would be.
On Fri, Dec 1, 2023 at 10:50 AM Jan Březina <2janbrezina@gmail.com> wrote: > It's pretty common to handle UNICODE define on Windows AFAIK. I suggest two ways of handling the issue. > 1. pass the appropriate type if UNICODE is defined: > #ifdef UNICODE > LoadLibraryEx(L"ntdll.dll", nullptr, 0); > #else > LoadLibraryEx("ntdll.dll", nullptr, 0); > #endif But this would apply to many, many places where we call system functions, no? I guess you've just picked on this one case because it was the first thing to break. In some cases the string to be provided is not a constant, so would require passing through char/wchar_t conversion routines. Seems like a non-starter. > 2. Use **A functions no matter what (everywhere) > LoadLibraryExA("ntdll.dll", nullptr, 0); > > https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa > > See, LoadLibraryEx is affected by the UNICODE define within the Windows header files. It's LoadLibraryExW if it's defined,or LoadLibraryExA if it's not: > > #ifdef UNICODE > #define LoadLibraryEx LoadLibraryExW > #else > #define LoadLibraryEx LoadLibraryExA > #endif Right, but since some of the functions that are affected in this way are also standardised functions that we call on POSIX systems, I think we'd finish up scattering more #ifdef #else stuff all through the code to explicitly select the 'ANSI' (sic) Windows variant or the standardised-function-with-no-suffix variant. Also seems like a non-starter.
On Fri, Dec 1, 2023 at 10:58 AM Thomas Munro <thomas.munro@gmail.com> wrote: > Seems like a non-starter. ... and I should probably explain why. This project has difficulty maintaining Windows support, because of a general lack of PostgreSQL developers working on that platform (but we welcome more! patches welcome!) and a disproportionately high amount of maintenance work that it generates. One factor is that there are several different Windows configurations (different compilers, C runtimes, build systems, file system semantics, partial POSIX emulations like MSYS and Cygwin...) to consider. I doubt we'd want to add another dimension to that problem space by saying we have to maintain and test the UNICODE and non-UNICODE code paths in our libraries. Hence my intuition that we should be figuring out where the right place is to suppress or undefine UNICODE.
Thomas Munro <thomas.munro@gmail.com> writes: >> Seems like a non-starter. > ... and I should probably explain why. This project has difficulty > maintaining Windows support, because of a general lack of PostgreSQL > developers working on that platform (but we welcome more! patches > welcome!) and a disproportionately high amount of maintenance work > that it generates. One factor is that there are several different > Windows configurations (different compilers, C runtimes, build > systems, file system semantics, partial POSIX emulations like MSYS and > Cygwin...) to consider. I doubt we'd want to add another dimension to > that problem space by saying we have to maintain and test the UNICODE > and non-UNICODE code paths in our libraries. Yeah, I think there's no chance that we'd accept such a patch even if one were submitted. Quite aside from the initial development effort, the ongoing maintenance cost would be large, and the value is just not there (as evidenced by the approximately zero previous requests we've had for this). > Hence my intuition that we should be figuring out where the right > place is to suppress or undefine UNICODE. I'm not even excited about that. Would we accept patches to #undef the random other system-API-changing symbols that exist on other platforms? We've not done so in the past and I don't see a great argument to start here. IMO the submitter has simply misconfigured his project. He should not assume that he can build other peoples' code with whatever configuration suits his own code. regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes:
>> Seems like a non-starter.
> ... and I should probably explain why. This project has difficulty
> maintaining Windows support, because of a general lack of PostgreSQL
> developers working on that platform (but we welcome more! patches
> welcome!) and a disproportionately high amount of maintenance work
> that it generates. One factor is that there are several different
> Windows configurations (different compilers, C runtimes, build
> systems, file system semantics, partial POSIX emulations like MSYS and
> Cygwin...) to consider. I doubt we'd want to add another dimension to
> that problem space by saying we have to maintain and test the UNICODE
> and non-UNICODE code paths in our libraries.
Yeah, I think there's no chance that we'd accept such a patch even if
one were submitted. Quite aside from the initial development effort,
the ongoing maintenance cost would be large, and the value is just not
there (as evidenced by the approximately zero previous requests we've
had for this).
> Hence my intuition that we should be figuring out where the right
> place is to suppress or undefine UNICODE.
I'm not even excited about that. Would we accept patches to #undef
the random other system-API-changing symbols that exist on other
platforms? We've not done so in the past and I don't see a great
argument to start here. IMO the submitter has simply misconfigured
his project. He should not assume that he can build other peoples'
code with whatever configuration suits his own code.
regards, tom lane
On Mon, Dec 4, 2023 at 8:56 AM Jan Březina <2janbrezina@gmail.com> wrote: > The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specifywhat Windows API they are using and there are good reasons not to mix the two. > You should read the following documentation: > https://learn.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api > and mainly > https://learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes We know about that stuff at least in general terms, but we are arguing that *we* should be in control of whether our .c files are compiled with UNICODE defined, not someone compiling our library. That advice is about applications. libpq is a shared library. But let's actually check the cost of this: > So if you are not planning to support UNICODE define, then you should be concrete in what Windows API you are using andcall the **A (ANSI) API directly, as I suggested. It is a correct way, doesn't conflict with any defines and doesn't conflictwith any standard API, as UNICODE define affects only the behavior of Windows API. > I bet there won't be so much work to fix this, as Windows specific implementations are in separate *win32.c files and henceit should not affect any other platforms and it has to work with every compiler, as it is the standard Windows API. What symbols is libpq.dll referencing? Maybe "dumpbin /imports libpq.dll" can tell you? I guess that'd show us how many *A /*W functions are reached, and would be changed to call *A directly. If it's really just Win32 APIs that are in Windows-only code paths that'd be one thing. I wasn't sure if it might also include standard C-ish code that is shared with Unix.
Looks like there is a mix to me…
c:\...mumble…> dumpbin.exe /imports libpq.dll | grep A$
23 InitializeSecurityContextA
1 AcquireCredentialsHandleA
26A GetModuleHandleA
3A8 LoadLibraryA
237 GetFileAttributesA
BA CreateFileA
3A9 LoadLibraryExA
19F FormatMessageA
179 GetUserNameA
CC SHGetFolderPathA
c:\mumble…> dumpbin.exe /imports libpq.dll | grep W$
2C5 GetStartupInfoW
26D GetModuleHandleW
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, December 4, 2023 9:20 AM
To: Jan Březina <2janbrezina@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #18219: libpq does not take into consideration UNICODE define
On Mon, Dec 4, 2023 at 8: 56 AM Jan Březina <2janbrezina@ gmail. com> wrote: > The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specify what Windows API they are using and
On Mon, Dec 4, 2023 at 8:56 AM Jan Březina <2janbrezina@gmail.com> wrote:
> The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specify what Windows API they are using and there are good reasons not to mix the two.
> You should read the following documentation:
> https://urldefense.com/v3/__https://learn.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api__;!!LlG_G4lo9h6Y!NGBMDMpFtxNbtb155g1HCcZB7if4EEXqwlnDws8-7_cM4q7QKLQtHy960vxoB2eM1FGocuofqDDQHEKE9HLunNjlOeY$
> and mainly
> https://urldefense.com/v3/__https://learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes__;!!LlG_G4lo9h6Y!NGBMDMpFtxNbtb155g1HCcZB7if4EEXqwlnDws8-7_cM4q7QKLQtHy960vxoB2eM1FGocuofqDDQHEKE9HLupjSS5eI$
We know about that stuff at least in general terms, but we are arguing
that *we* should be in control of whether our .c files are compiled
with UNICODE defined, not someone compiling our library. That advice
is about applications. libpq is a shared library.
But let's actually check the cost of this:
> So if you are not planning to support UNICODE define, then you should be concrete in what Windows API you are using and call the **A (ANSI) API directly, as I suggested. It is a correct way, doesn't conflict with any defines and doesn't conflict with any standard API, as UNICODE define affects only the behavior of Windows API.
> I bet there won't be so much work to fix this, as Windows specific implementations are in separate *win32.c files and hence it should not affect any other platforms and it has to work with every compiler, as it is the standard Windows API.
What symbols is libpq.dll referencing? Maybe "dumpbin /imports
libpq.dll" can tell you? I guess that'd show us how many *A /*W
functions are reached, and would be changed to call *A directly. If
it's really just Win32 APIs that are in Windows-only code paths that'd
be one thing. I wasn't sure if it might also include standard C-ish
code that is shared with Unix.