[oauth] Stabilize the libpq-oauth ABI (and allow alternative implementations?) - Mailing list pgsql-hackers
| From | Jacob Champion |
|---|---|
| Subject | [oauth] Stabilize the libpq-oauth ABI (and allow alternative implementations?) |
| Date | |
| Msg-id | CAOYmi+mrGg+n_X2MOLgeWcj3v_M00gR8uz_D7mM8z=dX1JYVbg@mail.gmail.com Whole thread Raw |
| List | pgsql-hackers |
Hi everybody, We introduced the libpq-oauth module late in the cycle for PG18, and to put it mildly, its interface isn't great. The original implementation depended on libpq internals, and we had to make sure that we didn't start crashing during major or minor version upgrades. So there were a bunch of compromises made to keep things safe, including the restriction that the module name has to change every major release. Separately, but closely related: PG18's OAuth support allows you to customize the client flow via a libpq hook function. Third-party applications can make use of that, but our own utilities can still only use the builtin device flow. That's annoying. I've been working to replace the internal ABI with a stable one, hopefully to solve both problems at the same time. A dlopen() is a pretty clear seam for other people to use to modify and extend. Unfortunately my first attempt (not pictured) ended up in a rabbit hole, because I tried to tackle the third-party use case first. My second attempt, attached, focuses on the ABI stabilization instead, which I think is more likely to succeed. (This took enough thinking that I'm really glad we didn't try this for PG18. Thanks for letting me take on some technical debt for a bit.) = Design = Here's the train of thought behind the core changes (which are in patch 0004): The builtin-flow code in fe-auth-oauth.c is similar to the custom-flow code, but it's just ever-so-slightly different. I'd like to unify the two, so I want libpq-oauth to make use of the public PGoauthBearerRequest API, and that means that almost all of the injections made in the PG18 ABI need to be replaced. Most of those injections are simply subsumed by the hook API (hooray!). A couple of others can be replaced by PQconninfo(). Four are left over: - pgthreadlock_t - libpq_gettext - conn->errorMessage - conn->oauth_issuer_id I think we should keep injecting libpq_gettext; no third-party implementations would be able to use that. And application hooks are presumably capable of figuring out top-level locking already, since the application has to have called PQregisterThreadLock() if it needed to coordinate with libpq. That leaves error messages and the issuer identifier. I think both would be useful for hooks to have, too, so I'd like to add them to PQAUTHDATA_OAUTH_BEARER_TOKEN. = PQAUTHDATA_OAUTH_BEARER_TOKEN, version 2 = My original plan for authdata extensions was to add new members to the end of the structs that libpq passes into the hook. Applications would gate on a feature macro during compilation to see whether they could use the new members. That should work fine for an application hook; you're not allowed to downgrade libpq past the version that your applications are compiled against, lest you lose symbols (or other feature-flag functionality) you're relying on. Plugins, unfortunately, can't rely on the feature macro. As we found out during the libpq-oauth split [1], we have to handle a long-running application with an old libpq that loads an upgraded libpq-oauth, even if the OS package dependencies suggest otherwise. (A plugin architecture flips the direction of the runtime dependency arrow.) There are a couple ways we could handle this. For this draft, I've implemented what I think is a middle ground between verbosity and type-safety: introduce a new V2 struct that "inherits" the V1 struct and can be down-cast in the callbacks, kinda similar to our Node hierarchy. We could go even more verbose, and duplicate the entire PGoauthBearerToken struct -- matching the callback parameter types for maximum safety -- but I'm not convinced that this would be a good use of maintenance effort. The ability to cast between the two means we can share v1 and v2 implementations in our tests. We could also just add the new members at the end, say that you're only allowed to use them if the V2 hook type is in use, and scribble on them in V1 hooks to try to get misbehaving implementations to crash outright. This arguably has the same amount of type-safety as the downcast, and the resulting code looks nicer. (The libcurl API we use does something similar with curl_version_info().) But it is definitely more "magic". Also of note: this adds a PQExpBuffer to libpq-fe.h. Technically, that type is "internal". But... is it really, though? It doesn't seem possible for us to make incompatible changes there without crashing earlier psqls, in which case I would like to make use of it too. Maybe this deserves its own minithread. Okay, on to the full patchset. = Roadmap: Prep = The first few patches are bugfixes I intend to backpatch to 18. - 0001: I stomped on the SOCKTYPE name in libpq-fe.h, but that's not in our namespace and it's conceivable that it might collide with someone else. (It collided with my own test code during my work on this.) - 0002 fixes a copy-paste bug in meson.build, which luckily hadn't caused problems yet. - 0003 ports Tom's debug2 fix for Test::Cluster::connect_fails() over to 002_client.pl. (Currently, log checks in this test aren't made after connection failures, but I don't really want to chase that down later after I've forgotten about it.) = Roadmap: Implementation = Next three patches are the core implementation, which stabilizes the ABI for libpq-oauth. I feel fairly confident that this, or something close to it, could land in PG19. - 0004 introduces the new PQAUTHDATA_OAUTH_BEARER_TOKEN_V2 API. As described above. - 0005 makes use of the new API in libpq-oauth. This removes more code than it introduces, which is exciting. Now we can rename libpq-oauth-19.so to just libpq-oauth.so, since we no longer depend on anything that could change between major versions. (We still need lock and locale support from libpq, as mentioned above, so they continue to be decoupled via injection at init time.) Some of the code in this patch overlaps with the translation fixes thread [2], which I need to get to first. I'm hoping some additional simplifications can be made after those localization bugs are fixed. I think I'd also like to get the threadlock into the module API (but not the hook API). A third-party flow might need to perform its own one-time initialization, and if it relies on an old version of Curl (or worse, Kerberos [3]), it'll need to use the same lock that the top-level application has registered for libpq. So I imagine I'll need to break out an initialization function here. Alternatively, I could introduce an API into libpq to retrieve the threadlock function in use? - 0006 removes a potential ABI-compatibility pitfall for future devs. libpq-oauth needs to use the shared-library variant of libpgcommon, but it can no longer assume that the encoding support exported by libpq is compatible. So it must not accidentally link against those functions (see [4]). I don't imagine anyone will try adding code that does this in practice; we're pretty UTF8-centric in OAuth. But just to be safe, define USE_PRIVATE_ENCODING_FUNCS so that anyone who tries will fail the build. = Roadmap: Plugins? = So now we have a stable ABI, which technically means that any enterprising developer who wants to play games with LD_LIBRARY_PATH could implement their own version of an OAuth flow, and have our utilities make use of it into the future. That brings us to patch 0007, which experimentally promotes the stable API to a public header, and introduces a really janky environment variable so that people don't have to play games. It will be obvious from the code that this is not well-baked yet. I hope the ability to dlopen() a custom flow can make it for 19; I just don't know that this envvar approach is any good. The ideal situation, IMO, is for a flow to be selected in the connection string. But we have to lock that down, similarly to how we protect local_preload_libraries, to prevent horrible exploits. At which point we'll have essentially designed a generic libpq plugin system. Not necessarily a terrible thing, but I don't think I have time to take it on for PG19. WDYT? --Jacob [1] https://postgr.es/m/aAkJnDQq3mOUvmQV%40msg.df7cb.de [2] https://postgr.es/m/TY4PR01MB1690746DB91991D1E9A47F57E94CDA%40TY4PR01MB16907.jpnprd01.prod.outlook.com [3] https://postgr.es/m/aSSp03wmNMngi/Oe%40ubby [4] https://postgr.es/c/b6c7cfac8
Attachment
- 0001-libpq-fe.h-Don-t-claim-SOCKTYPE-in-the-global-namesp.patch
- 0002-libpq-oauth-use-correct-c_args-in-meson.build.patch
- 0003-oauth_validator-Avoid-races-in-log_check.patch
- 0004-libpq-Introduce-PQAUTHDATA_OAUTH_BEARER_TOKEN_V2.patch
- 0005-libpq-oauth-Use-the-PGoauthBearerRequestV2-API.patch
- 0006-libpq-oauth-Never-link-against-libpq-s-encoding-func.patch
- 0007-WIP-Introduce-third-party-OAuth-flow-plugins.patch
pgsql-hackers by date: