Thread: [PoC] Federated Authn/z with OAUTHBEARER
Hi all,

We've been working on ways to expand the list of third-party auth methods that Postgres provides. Some example use cases might be "I want to let anyone with a Google account read this table" or "let anyone who belongs to this GitHub organization connect as a superuser".

Attached is a proof of concept that implements pieces of OAuth 2.0 federated authorization, via the OAUTHBEARER SASL mechanism from RFC 7628 [1]. Currently, only Linux is supported due to some ugly hacks in the backend.

The architecture can support the following use cases, as long as your OAuth issuer of choice implements the necessary specs, and you know how to write a validator for your issuer's bearer tokens:

- Authentication only, where an external validator uses the bearer token to determine the end user's identity, and Postgres decides whether that user ID is authorized to connect via the standard pg_ident user mapping.

- Authorization only, where the validator uses the bearer token to determine the allowed roles for the end user, and then checks to make sure that the connection's role is one of those. This bypasses pg_ident and allows pseudonymous connections, where Postgres doesn't care who you are as long as the token proves you're allowed to assume the role you want.

- A combination, where the validator provides both an authn_id (for later audits of database access) and an authorization decision based on the bearer token and role provided.

It looks kinda like this during use:

    $ psql 'host=example.org oauth_client_id=f02c6361-0635-...'
    Visit https://oauth.example.org/login and enter the code: FPQ2-M4BG

= Quickstart =

For anyone who likes building and seeing green tests ASAP. Prerequisite software:

- iddawc v0.9.9 [2], library and dev headers, for client support
- Python 3, for the test suite only

(Some newer distributions have dev packages for iddawc, but mine did not.)

Configure using --with-oauth (and, if you've installed iddawc into a non-standard location, be sure to use --with-includes and --with-libraries. Make sure either rpath or LD_LIBRARY_PATH will get you what you need). Install as usual.

To run the test suite, make sure the contrib/authn_id extension is installed, then init and start your dev cluster. No other configuration is required; the test will do it for you. Switch to the src/test/python directory, point your PG* envvars to a superuser connection on the cluster (so that a "bare" psql will connect automatically), and run `make installcheck`.

= Production Setup =

(but don't use this in production, please)

Actually setting up a "real" system requires knowing the specifics of your third-party issuer of choice. Your issuer MUST implement OpenID Discovery and the OAuth Device Authorization flow! Seriously, check this before spending a lot of time writing a validator against an issuer that can't actually talk to libpq. The broad strokes are as follows:

1. Register a new public client with your issuer to get an OAuth client ID for libpq. You'll use this as the oauth_client_id in the connection string. (If your issuer doesn't support public clients and gives you a client secret, you can use the oauth_client_secret connection parameter to provide that too.) The client you register must be able to use a device authorization flow; some issuers require additional setup for that.

2. Set up your HBA with the 'oauth' auth method, and set the 'issuer' and 'scope' options.
'issuer' is the base URL identifying your third-party issuer (for example, https://accounts.google.com), and 'scope' is the set of OAuth scopes that the client and server will need to authenticate and/or authorize the user (e.g. "openid email"). So a sample HBA line might look like

    host all all samehost oauth issuer="https://accounts.google.com" scope="openid email"

3. In postgresql.conf, set up an oauth_validator_command that's capable of verifying bearer tokens and implements the validator protocol. This is the hardest part. See below.

= Design =

On the client side, I've implemented the Device Authorization flow (RFC 8628, [3]). What this means in practice is that libpq reaches out to a third-party issuer (e.g. Google, Azure, etc.), identifies itself with a client ID, and requests permission to act on behalf of the end user. The issuer responds with a login URL and a one-time code, which libpq presents to the user using the notice hook. The end user then navigates to that URL, presents their code, authenticates to the issuer, and grants permission for libpq to retrieve a bearer token. libpq grabs a token and sends it to the server for verification.

(The bearer token, in this setup, is essentially a plaintext password, and you must secure it like you would a plaintext password. The token has an expiration date and can be explicitly revoked, which makes it slightly better than a password, but this is still a step backwards from something like SCRAM with channel binding. There are ways to bind a bearer token to a client certificate [4], which would mitigate the risk of token theft -- but your issuer has to support that, and I haven't found much support in the wild.)

The server side is where things get more difficult for the DBA. The OAUTHBEARER spec has this to say about the server side implementation:

    The server validates the response according to the specification for
    the OAuth Access Token Types used.

And here's what the Bearer Token specification [5] says:

    This document does not specify the encoding or the contents of the
    token; hence, detailed recommendations about the means of guaranteeing
    token integrity protection are outside the scope of this document.

It's the Wild West. Every issuer does their own thing in their own special way. Some don't really give you a way to introspect information about a bearer token at all, because they assume that the issuer of the token and the consumer of the token are essentially the same service. Some major players provide their own custom libraries, implemented in your-language-of-choice, to deal with their particular brand of magic.

So I punted and added the oauth_validator_command GUC. A token validator command reads the bearer token from a file descriptor that's passed to it, then does whatever magic is necessary to validate that token and find out who owns it. Optionally, it can look at the role that's being connected and make sure that the token authorizes the user to actually use that role. Then it says yea or nay to Postgres, and optionally tells the server who the user is so that their ID can be logged and mapped through pg_ident.

(See the commit message in 0005 for a full description of the protocol. The test suite also has two toy implementations that illustrate the protocol, but they provide zero security.)
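To give a feel for the shape of such a command, here is a rough sketch of a toy validator in C. The fd number, the "identity on stdout" convention, and the exit-code meanings below are assumptions invented for this illustration only; the real protocol is the one documented in 0005, and a real validator would introspect the token with the issuer instead of comparing against a hard-coded string:

    /* toy_validator.c -- illustration only, NOT the actual protocol.
     * Assumed conventions: token arrives on fd 3; printing an identity on
     * stdout and exiting 0 means "authorized"; nonzero means "rejected". */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        char    token[4096];
        ssize_t len = read(3, token, sizeof(token) - 1);    /* assumed fd */

        if (len <= 0)
            return 1;
        token[len] = '\0';

        /* Real code would verify the token against the issuer here. */
        if (strcmp(token, "super-secret-demo-token") == 0)
        {
            printf("alice@example.com\n");   /* assumed: authn_id on stdout */
            return 0;
        }
        return 1;
    }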
This is easily the worst part of the patch, not only because my implementation is a bad hack on OpenPipeStream(), but because it balances the security of the entire system on the shoulders of a DBA who does not have time to read umpteen OAuth specifications cover to cover. More thought and coding effort is needed here, but I didn't want to gold-plate a bad design. I'm not sure what alternatives there are within the rules laid out by OAUTHBEARER. And the system is _extremely_ flexible, in the way that only code that's maintained by somebody else can be.

= Patchset Roadmap =

The seven patches can be grouped into three:

1. Prep

   0001 decouples the SASL code from the SCRAM implementation.
   0002 makes it possible to use common/jsonapi from the frontend.
   0003 lets the json_errdetail() result be freed, to avoid leaks.

2. OAUTHBEARER Implementation

   0004 implements the client with libiddawc.
   0005 implements server HBA support and oauth_validator_command.

3. Testing

   0006 adds a simple test extension to retrieve the authn_id.
   0007 adds the Python test suite I've been developing against.

The first three patches are, hopefully, generally useful outside of this implementation, and I'll plan to register them in the next commitfest. The middle two patches are the "interesting" pieces, and I've split them into client and server for ease of understanding, though neither is particularly useful without the other.

The last two patches grew out of a test suite that I originally built to be able to exercise NSS corner cases at the protocol/byte level. It was incredibly helpful during implementation of this new SASL mechanism, since I could write the client and server independently of each other and get high coverage of broken/malicious implementations. It's based on pytest and Construct, and the Python 3 requirement might turn some away, but I wanted to include it in case anyone else wanted to hack on the code. src/test/python/README explains more.

= Thoughts/Reflections =

...in no particular order.

I picked OAuth 2.0 as my first experiment in federated auth mostly because I was already familiar with pieces of it. I think SAML (via the SAML20 mechanism, RFC 6595) would be a good companion to this proof of concept, if there is general interest in federated deployments.

I don't really like the OAUTHBEARER spec, but I'm not sure there's a better alternative. Everything is left as an exercise for the reader. It's not particularly extensible. Standard OAuth is built for authorization, not authentication, and from reading the RFC's history, it feels like it was a hack to just get something working. New standards like OpenID Connect have begun to fill in the gaps, but the SASL mechanisms have not kept up. (The OPENID20 mechanism is, to my understanding, unrelated/obsolete.) And support for helpful OIDC features seems to be spotty in the real world.

The iddawc dependency for client-side OAuth was extremely helpful to develop this proof of concept quickly, but I don't think it would be an appropriate component to build a real feature on. It's extremely heavyweight -- it incorporates a huge stack of dependencies, including a logging framework and a web server, to implement features we would probably never use -- and it's fairly difficult to debug in practice. If a device authorization flow were the only thing that libpq needed to support natively, I think we should just depend on a widely used HTTP client, like libcurl or neon, and implement the minimum spec directly against the existing test suite.
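For a rough sense of scale, the first half of RFC 8628 is little more than an HTTP POST. A sketch of the initial device authorization request using nothing but libcurl might look like the following; the endpoint URL and client ID are placeholders, and the JSON parsing and subsequent token polling are omitted:

    /* Sketch only: issue an RFC 8628 device authorization request with
     * libcurl. A real client would discover the endpoint via OpenID
     * Discovery and parse the JSON reply (device_code, user_code,
     * verification_uri, interval) before polling the token endpoint. */
    #include <stdio.h>
    #include <curl/curl.h>

    int
    main(void)
    {
        CURL     *curl;
        CURLcode  rc;

        curl_global_init(CURL_GLOBAL_DEFAULT);
        curl = curl_easy_init();
        if (!curl)
            return 1;

        curl_easy_setopt(curl, CURLOPT_URL,
                         "https://issuer.example.org/oauth2/device/code");
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS,
                         "client_id=f02c6361-0635&scope=openid%20email");

        /* The JSON response body is written to stdout by default. */
        rc = curl_easy_perform(curl);
        if (rc != CURLE_OK)
            fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return (rc == CURLE_OK) ? 0 : 1;
    }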
There are a huge number of other authorization flows besides Device Authorization; most would involve libpq automatically opening a web browser for you. I felt like that wasn't an appropriate thing for a library to do by default, especially when one of the most important clients is a command-line application. Perhaps there could be a hook for applications to be able to override the builtin flow and substitute their own.

Since bearer tokens are essentially plaintext passwords, the relevant specs require the use of transport-level protection, and I think it'd be wise for the client to require TLS to be in place before performing the initial handshake or sending a token.

Not every OAuth issuer is also an OpenID Discovery provider, so it's frustrating that OAUTHBEARER (which is purportedly an OAuth 2.0 feature) requires OIDD for real-world implementations. Perhaps we could hack around this with a data: URI or something.

The client currently performs the OAuth login dance every single time a connection is made, but a proper OAuth client would cache its tokens to reuse later, and keep an eye on their expiration times. This would make daily use a little more like that of Kerberos, but we would have to design a way to create and secure a token cache on disk.

If you've read this far, thank you for your interest, and I hope you enjoy playing with it!

--Jacob

[1] https://datatracker.ietf.org/doc/html/rfc7628
[2] https://github.com/babelouest/iddawc
[3] https://datatracker.ietf.org/doc/html/rfc8628
[4] https://datatracker.ietf.org/doc/html/rfc8705
[5] https://datatracker.ietf.org/doc/html/rfc6750#section-5.2
Attachment
- 0001-auth-generalize-SASL-mechanisms.patch
- 0002-src-common-remove-logging-from-jsonapi-for-shlib.patch
- 0003-common-jsonapi-always-palloc-the-error-strings.patch
- 0004-libpq-add-OAUTHBEARER-SASL-mechanism.patch
- 0005-backend-add-OAUTHBEARER-SASL-mechanism.patch
- 0006-Add-a-very-simple-authn_id-extension.patch
- 0007-Add-pytest-suite-for-OAuth.patch
On Tue, Jun 08, 2021 at 04:37:46PM +0000, Jacob Champion wrote:
> 1. Prep
>
> 0001 decouples the SASL code from the SCRAM implementation.
> 0002 makes it possible to use common/jsonapi from the frontend.
> 0003 lets the json_errdetail() result be freed, to avoid leaks.
>
> The first three patches are, hopefully, generally useful outside of
> this implementation, and I'll plan to register them in the next
> commitfest. The middle two patches are the "interesting" pieces, and
> I've split them into client and server for ease of understanding,
> though neither is particularly useful without the other.

Beginning with the beginning, could you spawn two threads for the jsonapi rework and the SASL/SCRAM business? I agree that these look independently useful. Glad to see someone improving the code with SASL and SCRAM which are too inter-dependent now. I saw in the RFCs dedicated to OAUTH the need for the JSON part as well.

+# define check_stack_depth()
+# ifdef JSONAPI_NO_LOG
+# define json_log_and_abort(...) \
+     do { fprintf(stderr, __VA_ARGS__); exit(1); } while(0)
+# else

In patch 0002, this is the wrong approach. libpq will not be able to feed on such reports, and you cannot use any of the APIs from the palloc() family either as these just fail on OOM. libpq should be able to know about the error, and would fill in the error back to the application. This abstraction is not necessary on HEAD as pg_verifybackup is fine with this level of reporting. My rough guess is that we will need to split the existing jsonapi.c into two files, one that can be used in shared libraries and a second that handles the errors.

+ /* TODO: SASL_EXCHANGE_FAILURE with output is forbidden in SASL */
  if (result == SASL_EXCHANGE_SUCCESS)
      sendAuthRequest(port, AUTH_REQ_SASL_FIN, output, outputlen);

Perhaps that's an issue we need to worry on its own? I didn't recall this part..
--
Michael
Attachment
On 08/06/2021 19:37, Jacob Champion wrote:
> We've been working on ways to expand the list of third-party auth
> methods that Postgres provides. Some example use cases might be "I want
> to let anyone with a Google account read this table" or "let anyone who
> belongs to this GitHub organization connect as a superuser".

Cool!

> The iddawc dependency for client-side OAuth was extremely helpful to
> develop this proof of concept quickly, but I don't think it would be an
> appropriate component to build a real feature on. It's extremely
> heavyweight -- it incorporates a huge stack of dependencies, including
> a logging framework and a web server, to implement features we would
> probably never use -- and it's fairly difficult to debug in practice.
> If a device authorization flow were the only thing that libpq needed to
> support natively, I think we should just depend on a widely used HTTP
> client, like libcurl or neon, and implement the minimum spec directly
> against the existing test suite.

You could punt and let the application implement that stuff. I'm imagining that the application code would look something like this:

    conn = PQconnectStartParams(...);
    for (;;)
    {
        status = PQconnectPoll(conn)
        switch (status)
        {
            case CONNECTION_SASL_TOKEN_REQUIRED:
                /* open a browser for the user, get token */
                token = open_browser()
                PQauthResponse(token);
                break;
            ...
        }
    }

It would be nice to have a simple default implementation, though, for psql and all the other client applications that come with PostgreSQL itself.

> If you've read this far, thank you for your interest, and I hope you
> enjoy playing with it!

A few small things caught my eye in the backend oauth_exchange function:

> + /* Handle the client's initial message. */
> + p = strdup(input);

this strdup() should be pstrdup().

In the same function, there are a bunch of reports like this:

> ereport(ERROR,
> + (errcode(ERRCODE_PROTOCOL_VIOLATION),
> + errmsg("malformed OAUTHBEARER message"),
> + errdetail("Comma expected, but found character \"%s\".",
> + sanitize_char(*p))));

I don't think the double quotes are needed here, because sanitize_char will return quotes if it's a single character. So it would end up looking like this: ... found character "'x'".

- Heikki
On Fri, 2021-06-18 at 11:31 +0300, Heikki Linnakangas wrote:
> On 08/06/2021 19:37, Jacob Champion wrote:
> > We've been working on ways to expand the list of third-party auth
> > methods that Postgres provides. Some example use cases might be "I want
> > to let anyone with a Google account read this table" or "let anyone who
> > belongs to this GitHub organization connect as a superuser".
>
> Cool!

Glad you think so! :D

> > The iddawc dependency for client-side OAuth was extremely helpful to
> > develop this proof of concept quickly, but I don't think it would be an
> > appropriate component to build a real feature on. It's extremely
> > heavyweight -- it incorporates a huge stack of dependencies, including
> > a logging framework and a web server, to implement features we would
> > probably never use -- and it's fairly difficult to debug in practice.
> > If a device authorization flow were the only thing that libpq needed to
> > support natively, I think we should just depend on a widely used HTTP
> > client, like libcurl or neon, and implement the minimum spec directly
> > against the existing test suite.
>
> You could punt and let the application implement that stuff. I'm
> imagining that the application code would look something like this:
>
>     conn = PQconnectStartParams(...);
>     for (;;)
>     {
>         status = PQconnectPoll(conn)
>         switch (status)
>         {
>             case CONNECTION_SASL_TOKEN_REQUIRED:
>                 /* open a browser for the user, get token */
>                 token = open_browser()
>                 PQauthResponse(token);
>                 break;
>             ...
>         }
>     }

I was toying with the idea of having a callback for libpq clients, where they could take full control of the OAuth flow if they wanted to. Doing it inline with PQconnectPoll seems like it would work too. It has a couple of drawbacks that I can see:

- If a client isn't currently using a poll loop, they'd have to switch to one to be able to use OAuth connections. Not a difficult change, but considering all the other hurdles to making this work, I'm hoping to minimize the hoop-jumping.

- A client would still have to receive a bunch of OAuth parameters from some new libpq API in order to construct the correct URL to visit, so the overall complexity for implementers might be higher than if we just passed those params directly in a callback.

> It would be nice to have a simple default implementation, though, for
> psql and all the other client applications that come with PostgreSQL itself.

I agree. I think having a bare-bones implementation in libpq itself would make initial adoption *much* easier, and then if specific applications wanted to have richer control over an authorization flow, then they could implement that themselves with the aforementioned callback.

The Device Authorization flow was the most minimal working implementation I could find, since it doesn't require a web browser on the system, just the ability to print a prompt to the console. But if anyone knows of a better flow for this use case, I'm all ears.

> > If you've read this far, thank you for your interest, and I hope you
> > enjoy playing with it!
>
> A few small things caught my eye in the backend oauth_exchange function:
>
> > + /* Handle the client's initial message. */
> > + p = strdup(input);
>
> this strdup() should be pstrdup().

Thanks, I'll fix that in the next re-roll.

> In the same function, there are a bunch of reports like this:
>
> > ereport(ERROR,
> > + (errcode(ERRCODE_PROTOCOL_VIOLATION),
> > + errmsg("malformed OAUTHBEARER message"),
> > + errdetail("Comma expected, but found character \"%s\".",
> > + sanitize_char(*p))));
>
> I don't think the double quotes are needed here, because sanitize_char
> will return quotes if it's a single character. So it would end up
> looking like this: ... found character "'x'".

I'll fix this too. Thanks!

--Jacob
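To sketch what such a callback might look like: everything below is hypothetical -- neither the struct, the typedef, nor the setter exists in libpq or in the patch; the names are invented purely to make the idea concrete:

    /* Hypothetical sketch of an application-controlled OAuth flow hook.
     * None of these names exist in libpq; they are placeholders for
     * discussion only. */
    #include <libpq-fe.h>

    typedef struct
    {
        const char *verification_uri;   /* where the user should log in */
        const char *user_code;          /* one-time code to enter there */
        const char *issuer;             /* issuer announced by the server */
        const char *scope;              /* scopes requested by the server */
    } HypotheticalOAuthParams;

    /* Return a malloc'd Bearer token, or NULL to fail the connection. */
    typedef char *(*HypotheticalOAuthFlowHook) (PGconn *conn,
                                                const HypotheticalOAuthParams *params,
                                                void *user_data);

    /* An application would register its own flow before connecting,
     * e.g. with a hypothetical
     *     PQsetOAuthFlowHook(conn, my_flow, my_state);
     * and libpq would fall back to its builtin flow otherwise. */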
On Fri, 2021-06-18 at 13:07 +0900, Michael Paquier wrote:
> On Tue, Jun 08, 2021 at 04:37:46PM +0000, Jacob Champion wrote:
> > 1. Prep
> >
> > 0001 decouples the SASL code from the SCRAM implementation.
> > 0002 makes it possible to use common/jsonapi from the frontend.
> > 0003 lets the json_errdetail() result be freed, to avoid leaks.
> >
> > The first three patches are, hopefully, generally useful outside of
> > this implementation, and I'll plan to register them in the next
> > commitfest. The middle two patches are the "interesting" pieces, and
> > I've split them into client and server for ease of understanding,
> > though neither is particularly useful without the other.
>
> Beginning with the beginning, could you spawn two threads for the
> jsonapi rework and the SASL/SCRAM business?

Done [1, 2]. I've copied your comments into those threads with my responses, and I'll have them registered in commitfest shortly. Thanks!

--Jacob

[1] https://www.postgresql.org/message-id/3d2a6f5d50e741117d6baf83eb67ebf1a8a35a11.camel%40vmware.com
[2] https://www.postgresql.org/message-id/a250d475ba1c0cc0efb7dfec8e538fcc77cdcb8e.camel%40vmware.com
On Tue, Jun 22, 2021 at 11:26:03PM +0000, Jacob Champion wrote:
> Done [1, 2]. I've copied your comments into those threads with my
> responses, and I'll have them registered in commitfest shortly.

Thanks!
--
Michael
Attachment
On Tue, 2021-06-22 at 23:22 +0000, Jacob Champion wrote:
> On Fri, 2021-06-18 at 11:31 +0300, Heikki Linnakangas wrote:
> >
> > A few small things caught my eye in the backend oauth_exchange function:
> >
> > > + /* Handle the client's initial message. */
> > > + p = strdup(input);
> >
> > this strdup() should be pstrdup().
>
> Thanks, I'll fix that in the next re-roll.
>
> > In the same function, there are a bunch of reports like this:
> >
> > > ereport(ERROR,
> > > + (errcode(ERRCODE_PROTOCOL_VIOLATION),
> > > + errmsg("malformed OAUTHBEARER message"),
> > > + errdetail("Comma expected, but found character \"%s\".",
> > > + sanitize_char(*p))));
> >
> > I don't think the double quotes are needed here, because sanitize_char
> > will return quotes if it's a single character. So it would end up
> > looking like this: ... found character "'x'".
>
> I'll fix this too. Thanks!

v2, attached, incorporates Heikki's suggested fixes and also rebases on top of latest HEAD, which had the SASL refactoring changes committed last month.

The biggest change from the last patchset is 0001, an attempt at enabling jsonapi in the frontend without the use of palloc(), based on suggestions by Michael and Tom from last commitfest. I've also made some improvements to the pytest suite. No major changes to the OAuth implementation yet.

--Jacob
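As a simplified illustration of the frontend/backend split being discussed (this is not the patch itself; the helper names here are invented, only the PQExpBuffer and StringInfo calls are real), the idea is roughly that the lexer's scratch string lives in a PQExpBuffer in frontend builds, where allocation failure is reported as a result code instead of being thrown:

    #ifdef FRONTEND
    #include "pqexpbuffer.h"

    typedef PQExpBuffer jsonapi_strbuf;      /* invented name for the sketch */

    static int
    strbuf_append(jsonapi_strbuf buf, const char *s, size_t len)
    {
        appendBinaryPQExpBuffer(buf, s, len);
        /* Out of memory marks the buffer "broken" rather than aborting. */
        return PQExpBufferBroken(buf) ? -1 /* e.g. JSON_OUT_OF_MEMORY */ : 0;
    }
    #else
    #include "lib/stringinfo.h"

    typedef StringInfo jsonapi_strbuf;

    static int
    strbuf_append(jsonapi_strbuf buf, const char *s, size_t len)
    {
        /* In the backend, OOM is handled by ereport(ERROR) inside palloc. */
        appendBinaryStringInfo(buf, s, len);
        return 0;
    }
    #endif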
Attachment
On Wed, Aug 25, 2021 at 11:42 AM Jacob Champion <pchampion@vmware.com> wrote:
> The biggest change from the last patchset is 0001, an attempt at
> enabling jsonapi in the frontend without the use of palloc(), based on
> suggestions by Michael and Tom from last commitfest. I've also made
> some improvements to the pytest suite. No major changes to the OAuth
> implementation yet.
Hi,
For v2-0001-common-jsonapi-support-FRONTEND-clients.patch :
+ /* Clean up. */
+ termJsonLexContext(&lex);
At the end of termJsonLexContext(), empty is copied to lex. For stack based JsonLexContext, the copy seems unnecessary.
Maybe introduce a boolean parameter for termJsonLexContext() to signal that the copy can be omitted ?
+#ifdef FRONTEND
+ /* make sure initialization succeeded */
+ if (lex->strval == NULL)
+ return JSON_OUT_OF_MEMORY;
Should PQExpBufferBroken(lex->strval) be used for the check ?
Thanks
Hi,
For v2-0002-libpq-add-OAUTHBEARER-SASL-mechanism.patch :
+ i_init_session(&session);
+
+ if (!conn->oauth_client_id)
+ {
+ /* We can't talk to a server without a client identifier. */
+ appendPQExpBufferStr(&conn->errorMessage,
+ libpq_gettext("no oauth_client_id is set for the connection"));
+ goto cleanup;
Can conn->oauth_client_id check be performed ahead of i_init_session() ? That way, ```goto cleanup``` can be replaced with return.
+ if (!error_code || (strcmp(error_code, "authorization_pending")
+ && strcmp(error_code, "slow_down")))
What if, in the future, there is an error code different from the above two which doesn't represent the "OAuth token retrieval failed" scenario?
For client_initial_response(),
+ token_buf = createPQExpBuffer();
+ if (!token_buf)
+ goto cleanup;
If token_buf is NULL, there doesn't seem to be anything to free. We can return directly.
Cheers
On Wed, 2021-08-25 at 15:25 -0700, Zhihong Yu wrote:
>
> Hi,
> For v2-0001-common-jsonapi-support-FRONTEND-clients.patch :
>
> + /* Clean up. */
> + termJsonLexContext(&lex);
>
> At the end of termJsonLexContext(), empty is copied to lex. For stack
> based JsonLexContext, the copy seems unnecessary.
> Maybe introduce a boolean parameter for termJsonLexContext() to
> signal that the copy can be omitted ?

Do you mean heap-based? i.e. destroyJsonLexContext() does an unnecessary copy before free? Yeah, in that case it's not super useful, but I think I'd want some evidence that the performance hit matters before optimizing it.

Are there any other internal APIs that take a boolean parameter like that? If not, I think we'd probably just want to remove the copy entirely if it's a problem.

> +#ifdef FRONTEND
> + /* make sure initialization succeeded */
> + if (lex->strval == NULL)
> + return JSON_OUT_OF_MEMORY;
>
> Should PQExpBufferBroken(lex->strval) be used for the check ?

It should be okay to continue if the strval is broken but non-NULL, since it's about to be reset. That has the fringe benefit of allowing the function to go as far as possible without failing, though that's probably a pretty weak justification.

In practice, do you think that the probability of success is low enough that we should just short-circuit and be done with it?

On Wed, 2021-08-25 at 16:24 -0700, Zhihong Yu wrote:
>
> For v2-0002-libpq-add-OAUTHBEARER-SASL-mechanism.patch :
>
> + i_init_session(&session);
> +
> + if (!conn->oauth_client_id)
> + {
> + /* We can't talk to a server without a client identifier. */
> + appendPQExpBufferStr(&conn->errorMessage,
> + libpq_gettext("no oauth_client_id is set for the connection"));
> + goto cleanup;
>
> Can conn->oauth_client_id check be performed ahead
> of i_init_session() ? That way, ```goto cleanup``` can be replaced
> with return.

Yeah, I think that makes sense. FYI, this is probably one of the functions that will be rewritten completely once iddawc is removed.

> + if (!error_code || (strcmp(error_code, "authorization_pending")
> + && strcmp(error_code, "slow_down")))
>
> What if, in the future, there is error code different from the above
> two which doesn't represent "OAuth token retrieval failed" scenario ?

We'd have to update our code; that would be a breaking change to the Device Authorization spec. Here's what it says today [1]:

    The "authorization_pending" and "slow_down" error codes define
    particularly unique behavior, as they indicate that the OAuth client
    should continue to poll the token endpoint by repeating the token
    request (implementing the precise behavior defined above). If the
    client receives an error response with any other error code, it MUST
    stop polling and SHOULD react accordingly, for example, by displaying
    an error to the user.

> For client_initial_response(),
>
> + token_buf = createPQExpBuffer();
> + if (!token_buf)
> + goto cleanup;
>
> If token_buf is NULL, there doesn't seem to be anything to free. We
> can return directly.

That's true today, but implementations have a habit of changing. I personally prefer not to introduce too many exit points from a function that's already using goto. In my experience, that makes future maintenance harder.

Thanks for the reviews! Have you been able to give the patchset a try with an OAuth deployment?

--Jacob

[1] https://datatracker.ietf.org/doc/html/rfc8628#section-3.5
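To illustrate the polling rule quoted above, a device-flow client ends up with a loop shaped roughly like the following. This is a hedged sketch, not the patch code; get_token_error() is a made-up helper standing in for the real HTTP request and JSON parsing:

    /* Sketch of RFC 8628 token-endpoint polling; only the error-code
     * handling is shown. get_token_error() is hypothetical and returns
     * the "error" field of the token response, or NULL on success. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    extern const char *get_token_error(void);   /* hypothetical helper */

    static int
    poll_for_token(int interval)
    {
        for (;;)
        {
            const char *err;

            sleep(interval);
            err = get_token_error();

            if (err == NULL)
                return 0;                   /* token acquired */
            if (strcmp(err, "authorization_pending") == 0)
                continue;                   /* user hasn't finished; keep polling */
            if (strcmp(err, "slow_down") == 0)
            {
                interval += 5;              /* RFC 8628: add 5 seconds, then retry */
                continue;
            }

            /* Any other error code means the client MUST stop polling. */
            fprintf(stderr, "token retrieval failed: %s\n", err);
            return -1;
        }
    }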
Hi,
bq. destroyJsonLexContext() does an unnecessary copy before free? Yeah, in that case it's not super useful, but I think I'd want some evidence that the performance hit matters before optimizing it.

Yes I agree.
bq. In practice, do you think that the probability of success is low enough that we should just short-circuit and be done with it?
Haven't had a chance to try your patches out yet.
I will leave this to people who are more familiar with OAuth implementation(s).
bq. I personally prefer not to introduce too many exit points from a function that's already using goto.
Fair enough.
Cheers
On Thu, Aug 26, 2021 at 04:13:08PM +0000, Jacob Champion wrote:
> Do you mean heap-based? i.e. destroyJsonLexContext() does an
> unnecessary copy before free? Yeah, in that case it's not super useful,
> but I think I'd want some evidence that the performance hit matters
> before optimizing it.

As an authentication code path, the impact is minimal and my take on that would be to keep the code simple. Now if you'd really wish to stress that without relying on the backend, one simple way is to use pgbench -C -n with a mostly-empty script (one meta-command) coupled with some profiling.
--
Michael
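For instance, one plausible way to set that up (the script contents and transaction count here are arbitrary; -C opens a new connection per transaction and -n skips vacuuming):

    $ echo '\set dummy 0' > connect_only.sql
    $ pgbench -n -C -f connect_only.sql -t 1000 postgres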
Attachment
On Fri, 2021-08-27 at 11:32 +0900, Michael Paquier wrote:
> Now if you'd really wish to
> stress that without relying on the backend, one simple way is to use
> pgbench -C -n with a mostly-empty script (one meta-command) coupled
> with some profiling.

Ah, thanks! I'll add that to the toolbox.

--Jacob
Hi all,

v3 rebases this patchset over the top of Samay's pluggable auth provider API [1], included here as patches 0001-3. The final patch in the set ports the server implementation from a core feature to a contrib module; to switch between the two approaches, simply leave out that final patch.

There are still some backend changes that must be made to get this working, as pointed out in 0009, and obviously libpq support still requires code changes.

--Jacob

[1] https://www.postgresql.org/message-id/flat/CAJxrbyxTRn5P8J-p%2BwHLwFahK5y56PhK28VOb55jqMO05Y-DJw%40mail.gmail.com
Attachment
- v3-0001-Add-support-for-custom-authentication-methods.patch
- v3-0002-Add-sample-extension-to-test-custom-auth-provider.patch
- v3-0003-Add-tests-for-test_auth_provider-extension.patch
- v3-0004-common-jsonapi-support-FRONTEND-clients.patch
- v3-0005-libpq-add-OAUTHBEARER-SASL-mechanism.patch
- v3-0006-backend-add-OAUTHBEARER-SASL-mechanism.patch
- v3-0007-Add-a-very-simple-authn_id-extension.patch
- v3-0008-Add-pytest-suite-for-OAuth.patch
- v3-0009-contrib-oauth-switch-to-pluggable-auth-API.patch
Hi Jacob,
Thank you for porting this on top of the pluggable auth methods API. I've addressed the feedback around other backend changes in my latest patch, but the client side changes still remain. I had a few questions to understand them better.
(a) What specifically do the client side changes in the patch implement?
(b) Are the changes you made on the client side specific to OAUTH or are they about making SASL more generic? As an additional question, if someone wanted to implement something similar on top of your patch, would they still have to make client side changes?
Regards,
Samay
On Tue, 2022-03-22 at 14:48 -0700, samay sharma wrote:
> Thank you for porting this on top of the pluggable auth methods API.
> I've addressed the feedback around other backend changes in my latest
> patch, but the client side changes still remain. I had a few
> questions to understand them better.
>
> (a) What specifically do the client side changes in the patch implement?

Hi Samay,

The client-side changes are an implementation of the OAuth 2.0 Device Authorization Grant [1] in libpq. The majority of the OAuth logic is handled by the third-party iddawc library. The server tells the client what OIDC provider to contact, and then libpq prompts you to log into that provider on your smartphone/browser/etc. using a one-time code. After you give libpq permission to act on your behalf, the Bearer token gets sent to libpq via a direct connection, and libpq forwards it to the server so that the server can determine whether you're allowed in.

> (b) Are the changes you made on the client side specific to OAUTH or
> are they about making SASL more generic?

The original patchset included changes to make SASL more generic. Many of those changes have since been merged, and the remaining code is mostly OAuth-specific, but there are still improvements to be made. (And there's some JSON crud to sift through in the first couple of patches. I'm still mad that the OAUTHBEARER spec requires clients to parse JSON in the first place.)

> As an additional question,
> if someone wanted to implement something similar on top of your
> patch, would they still have to make client side changes?

Any new SASL mechanisms require changes to libpq at this point. You need to implement a new pg_sasl_mech, modify pg_SASL_init() to select the mechanism correctly, and add whatever connection string options you need, along with the associated state in pg_conn. Patch 0004 has all the client-side magic for OAUTHBEARER.

--Jacob

[1] https://datatracker.ietf.org/doc/html/rfc8628
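For reference, the JSON mentioned above comes from the mechanism itself: in OAUTHBEARER (RFC 7628) the client's initial response is a GS2 header plus 0x01-delimited key/value pairs carrying the token, and on failure the server replies with a small JSON status document. Roughly, with ^A standing for the 0x01 separator, the token shortened, and the issuer URL being a placeholder:

    client:  n,,^Aauth=Bearer eyJhbGciOi...^A^A
    server:  {"status":"invalid_token",
              "scope":"openid email",
              "openid-configuration":
                "https://accounts.example.org/.well-known/openid-configuration"}
    client:  ^A    (dummy response that terminates the failed exchange)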
On Fri, 2022-03-04 at 19:13 +0000, Jacob Champion wrote:
> v3 rebases this patchset over the top of Samay's pluggable auth
> provider API [1], included here as patches 0001-3.

v4 rebases over the latest version of the pluggable auth patchset (included as 0001-4). Note that there's a recent conflict as of d4781d887; use an older commit as the base (or wait for the other thread to be updated).

--Jacob
Attachment
- v4-0001-Add-support-for-custom-authentication-methods.patch
- v4-0002-Add-sample-extension-to-test-custom-auth-provider.patch
- v4-0003-Add-tests-for-test_auth_provider-extension.patch
- v4-0004-Add-support-for-map-and-custom-auth-options.patch
- v4-0005-common-jsonapi-support-FRONTEND-clients.patch
- v4-0006-libpq-add-OAUTHBEARER-SASL-mechanism.patch
- v4-0007-backend-add-OAUTHBEARER-SASL-mechanism.patch
- v4-0008-Add-a-very-simple-authn_id-extension.patch
- v4-0009-Add-pytest-suite-for-OAuth.patch
- v4-0010-contrib-oauth-switch-to-pluggable-auth-API.patch
Hi Hackers,

We are trying to implement AAD (Azure AD) support in PostgreSQL, and it can be achieved with support of the OAuth method. To support AAD on top of OAuth in a generic fashion (i.e. for all other OAuth providers), we are proposing this patch. It basically exposes two new hooks (one for error reporting and one for OAuth provider-specific token validation) and passes the OAuth bearer token to the backend. It also adds support for OAuth's client credentials flow, in addition to the device code flow which Jacob has proposed.

The changes for each component are summarized below.

1. Provider-specific extension:
   Each OAuth provider implements their own token validator as an extension. The extension registers an OAuth provider hook which is matched to a line in the HBA file.

2. Add support to pass on the OAuth bearer token. In this case, obtaining the bearer token is left to the 3rd party application or user.

   ./psql -U <username> -d 'dbname=postgres oauth_client_id=<client_id> oauth_bearer_token=<token>'

3. HBA: An additional param 'provider' is added for the oauth method, defining "oauth" as the method + passing the provider, issuer endpoint and expected audience:

   * * * * oauth provider=<token validation extension> issuer=.... scope=....

4. Engine Backend: Support for the generic OAUTHBEARER type, requesting the client to provide a token and passing the token to the provider-specific extension.

5. Engine Frontend: Two-tiered approach.
   a) libpq transparently passes on the token received from the 3rd party client as-is to the backend.
   b) libpq optionally compiled for the clients which explicitly need libpq to orchestrate OAuth communication with the issuer (it depends heavily on the 3rd party library iddawc, as Jacob already pointed out. The library seems to be supporting all the OAuth flows.)

Please let us know your thoughts, as the proposed method supports different OAuth flows with the use of provider-specific hooks. We think that the proposal would be useful for various OAuth providers.

Thanks,
Mahendrakar.

On Tue, 20 Sept 2022 at 10:18, Jacob Champion <pchampion@vmware.com> wrote:
> > ...
>
> v2, attached, incorporates Heikki's suggested fixes and also rebases on
> top of latest HEAD, which had the SASL refactoring changes committed
> last month.
>
> The biggest change from the last patchset is 0001, an attempt at
> enabling jsonapi in the frontend without the use of palloc(), based on
> suggestions by Michael and Tom from last commitfest. I've also made
> some improvements to the pytest suite. No major changes to the OAuth
> implementation yet.
>
> --Jacob
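To make the shape of item 1 a bit more concrete, a provider extension might end up looking vaguely like the following. Every hook name and signature below is hypothetical -- the actual API is whatever the proposed patch defines, not anything in core today:

    /* Hypothetical sketch of an AAD-style validator extension. The callback
     * shape and the registration call are invented for illustration only. */
    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    /* Invented callback: return true if the token is valid, and report the
     * authenticated identity through *authn_id. */
    static bool
    aad_validate_token(const char *token, const char *role, char **authn_id)
    {
        /*
         * A real implementation would verify the token's signature and
         * audience against the issuer (for AAD, via its published keys)
         * and extract the user identity from its claims.
         */
        *authn_id = pstrdup("user@example.com");
        return true;
    }

    void
    _PG_init(void)
    {
        (void) aad_validate_token;   /* silence unused warning in this sketch */

        /* Invented registration call, matching the HBA 'provider' name: */
        /*     RegisterOAuthProvider("aad", aad_validate_token);          */
    }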
Attachment
Hi Mahendrakar, thanks for your interest and for the patch!

On Mon, Sep 19, 2022 at 10:03 PM mahendrakar s <mahendrakarforpg@gmail.com> wrote:
> The changes for each component are summarized below.
>
> 1. Provider-specific extension:
> Each OAuth provider implements their own token validator as an
> extension. Extension registers an OAuth provider hook which is matched
> to a line in the HBA file.

How easy is it to write a Bearer validator using C? My limited understanding was that most providers were publishing libraries in higher-level languages.

Along those lines, sample validators will need to be provided, both to help in review and to get the pytest suite green again. (And coverage for the new code is important, too.)

> 2. Add support to pass on the OAuth bearer token. In this
> obtaining the bearer token is left to 3rd party application or user.
>
> ./psql -U <username> -d 'dbname=postgres
> oauth_client_id=<client_id> oauth_bearer_token=<token>

This hurts, but I think people are definitely going to ask for it, given the frightening practice of copy-pasting these (incredibly sensitive secret) tokens all over the place...

Ideally I'd like to implement sender constraints for the Bearer token, to *prevent* copy-pasting (or, you know, outright theft). But I'm not sure that sender constraints are well-implemented yet for the major providers.

> 3. HBA: An additional param 'provider' is added for the oauth method.
> Defining "oauth" as method + passing provider, issuer endpoint
> and expected audience
>
> * * * * oauth provider=<token validation extension>
> issuer=.... scope=....

Naming aside (this conflicts with Samay's previous proposal, I think), I have concerns about the implementation. There's this code:

> + if (oauth_provider && oauth_provider->name)
> + {
> + ereport(ERROR,
> + (errmsg("OAuth provider \"%s\" is already loaded.",
> + oauth_provider->name)));
> + }

which appears to prevent loading more than one global provider. But there's also code that deals with a provider list? (Again, it'd help to have test code covering the new stuff.)

> b) libpq optionally compiled for the clients which
> explicitly need libpq to orchestrate OAuth communication with the
> issuer (it depends heavily on 3rd party library iddawc as Jacob
> already pointed out. The library seems to be supporting all the OAuth
> flows.)

Speaking of iddawc, I don't think it's a dependency we should choose to rely on. For all the code that it has, it doesn't seem to provide compatibility with several real-world providers. Google, for one, chose not to follow the IETF spec it helped author, and iddawc doesn't support its flavor of Device Authorization. At another point, I think iddawc tried to decode Azure's Bearer tokens, which is incorrect... I haven't been able to check if those problems have been fixed in a recent version, but if we're going to tie ourselves to a huge dependency, I'd at least like to believe that said dependency is battle-tested and solid, and personally I don't feel like iddawc is.

> - auth_method = I_TOKEN_AUTH_METHOD_NONE;
> - if (conn->oauth_client_secret && *conn->oauth_client_secret)
> - auth_method = I_TOKEN_AUTH_METHOD_SECRET_BASIC;

This code got moved, but I'm not sure why? It doesn't appear to have made a change to the logic.

> + if (conn->oauth_client_secret && *conn->oauth_client_secret)
> + {
> + session_response_type = I_RESPONSE_TYPE_CLIENT_CREDENTIALS;
> + }

Is this an Azure-specific requirement? Ideally a public client (which psql is) shouldn't have to provide a secret to begin with, if I understand that bit of the protocol correctly. I think Google also required provider-specific changes in this part of the code, and unfortunately I don't think they looked the same as yours. We'll have to figure all that out... Standards are great; everyone has one of their own. :)

Thanks,
--Jacob
On Tue, Sep 20, 2022 at 4:19 PM Jacob Champion <jchampion@timescale.com> wrote:
> > 2. Add support to pass on the OAuth bearer token. In this
> > obtaining the bearer token is left to 3rd party application or user.
> >
> > ./psql -U <username> -d 'dbname=postgres
> > oauth_client_id=<client_id> oauth_bearer_token=<token>
>
> This hurts, but I think people are definitely going to ask for it, given
> the frightening practice of copy-pasting these (incredibly sensitive
> secret) tokens all over the place...

After some further thought -- in this case, you already have an opaque Bearer token (and therefore you already know, out of band, which provider needs to be used), you're willing to copy-paste it from whatever service you got it from, and you have an extension plugged into Postgres on the backend that verifies this Bearer blob using some procedure that Postgres knows nothing about.

Why do you need the OAUTHBEARER mechanism logic at that point? Isn't that identical to a custom password scheme? It seems like that could be handled completely by Samay's pluggable auth proposal.

--Jacob
We can support both passing the token from an upstream client and libpq implementing the OAUTH2 protocol to obtain one.

Libpq implementing OAUTHBEARER is needed for community/3rd party tools to have a user-friendly authentication experience:

1. For community client tools, like pg_admin, psql etc.
   Example experience: pg_admin would be able to open a popup dialog to authenticate the customer and keep a refresh token to avoid asking the user frequently.
2. For 3rd party connectors supporting generic OAUTH with any provider. Useful for datawiz clients, like Tableau or ETL tools. Those can support both user and client OAUTH flows.

Libpq passing the token directly from an upstream client is useful in other scenarios:

1. Enterprise clients, built with .Net / Java and using provider-specific authentication libraries, like MSAL for AAD. Those can also support more advanced provider-specific token acquisition flows.
2. Resource-tight (like IoT) clients. Those can be compiled without the optional libpq flag, not including the iddawc or other dependency.

Thanks!
Andrey.

-----Original Message-----
From: Jacob Champion <jchampion@timescale.com>
Sent: Wednesday, September 21, 2022 9:03 AM
To: mahendrakar s <mahendrakarforpg@gmail.com>
Cc: pgsql-hackers@postgresql.org; smilingsamay@gmail.com; andres@anarazel.de; Andrey Chudnovskiy <Andrey.Chudnovskiy@microsoft.com>; Mahendrakar Srinivasarao <mahendrakars@microsoft.com>
Subject: [EXTERNAL] Re: [PoC] Federated Authn/z with OAUTHBEARER

On Tue, Sep 20, 2022 at 4:19 PM Jacob Champion <jchampion@timescale.com> wrote:
> > 2. Add support to pass on the OAuth bearer token. In this
> > obtaining the bearer token is left to 3rd party application or user.
> >
> > ./psql -U <username> -d 'dbname=postgres
> > oauth_client_id=<client_id> oauth_bearer_token=<token>
>
> This hurts, but I think people are definitely going to ask for it,
> given the frightening practice of copy-pasting these (incredibly
> sensitive secret) tokens all over the place...

After some further thought -- in this case, you already have an opaque Bearer token (and therefore you already know, out of band, which provider needs to be used), you're willing to copy-paste it from whatever service you got it from, and you have an extension plugged into Postgres on the backend that verifies this Bearer blob using some procedure that Postgres knows nothing about.

Why do you need the OAUTHBEARER mechanism logic at that point? Isn't that identical to a custom password scheme? It seems like that could be handled completely by Samay's pluggable auth proposal.

--Jacob
On Wed, Sep 21, 2022 at 3:10 PM Andrey Chudnovskiy <Andrey.Chudnovskiy@microsoft.com> wrote:
> We can support both passing the token from an upstream client and libpq implementing the OAUTH2 protocol to obtain one.

Right, I agree that we could potentially do both.

> Libpq passing the token directly from an upstream client is useful in other scenarios:
> 1. Enterprise clients, built with .Net / Java and using provider-specific authentication libraries, like MSAL for AAD. Those can also support more advanced provider-specific token acquisition flows.
> 2. Resource-tight (like IoT) clients. Those can be compiled without the optional libpq flag, not including the iddawc or other dependency.

What I don't understand is how the OAUTHBEARER mechanism helps you in this case. You're short-circuiting the negotiation where the server tells the client what provider to use and what scopes to request, and instead you're saying "here's a secret string, just take it and validate it with magic."

I realize the ability to pass an opaque token may be useful, but from the server's perspective, I don't see what differentiates it from the password auth method plus a custom authenticator plugin. Why pay for the additional complexity of OAUTHBEARER if you're not going to use it?

--Jacob
First, My message from corp email wasn't displayed in the thread, That is what Jacob replied to, let me post it here for context: > We can support both passing the token from an upstream client and libpq implementing OAUTH2 protocol to obtain one. > > Libpq implementing OAUTHBEARER is needed for community/3rd party tools to have user-friendly authentication experience: > > 1. For community client tools, like pg_admin, psql etc. > Example experience: pg_admin would be able to open a popup dialog to authenticate customers and keep refresh tokens toavoid asking the user frequently. > 2. For 3rd party connectors supporting generic OAUTH with any provider. Useful for datawiz clients, like Tableau or ETLtools. Those can support both user and client OAUTH flows. > > Libpq passing toked directly from an upstream client is useful in other scenarios: > 1. Enterprise clients, built with .Net / Java and using provider-specific authentication libraries, like MSAL for AAD.Those can also support more advanced provider-specific token acquisition flows. > 2. Resource-tight (like IoT) clients. Those can be compiled without the optional libpq flag not including the iddawc orother dependency. ----------------------------------------------------------------------------------------------------- On this: > What I don't understand is how the OAUTHBEARER mechanism helps you in > this case. You're short-circuiting the negotiation where the server > tells the client what provider to use and what scopes to request, and > instead you're saying "here's a secret string, just take it and > validate it with magic." > > I realize the ability to pass an opaque token may be useful, but from > the server's perspective, I don't see what differentiates it from the > password auth method plus a custom authenticator plugin. Why pay for > the additional complexity of OAUTHBEARER if you're not going to use > it? Yes, passing a token as a new auth method won't make much sense in isolation. However: 1. Since OAUTHBEARER is supported in the ecosystem, passing a token as a way to authenticate with OAUTHBEARER is more consistent (IMO), then passing it as a password. 2. Validation on the backend side doesn't depend on whether the token is obtained by libpq or transparently passed by the upstream client. 3. Single OAUTH auth method on the server side for both scenarios, would allow both enterprise clients with their own Token acquisition and community clients using libpq flows to connect as the same PG users/roles. On Wed, Sep 21, 2022 at 8:36 PM Jacob Champion <jchampion@timescale.com> wrote: > > On Wed, Sep 21, 2022 at 3:10 PM Andrey Chudnovskiy > <Andrey.Chudnovskiy@microsoft.com> wrote: > > We can support both passing the token from an upstream client and libpq implementing OAUTH2 protocol to obtaining one. > > Right, I agree that we could potentially do both. > > > Libpq passing toked directly from an upstream client is useful in other scenarios: > > 1. Enterprise clients, built with .Net / Java and using provider-specific authentication libraries, like MSAL for AAD.Those can also support more advance provider-specific token acquisition flows. > > 2. Resource-tight (like IoT) clients. Those can be compiled without optional libpq flag not including the iddawc or otherdependency. > > What I don't understand is how the OAUTHBEARER mechanism helps you in > this case. 
On 9/21/22 21:55, Andrey Chudnovsky wrote: > First, My message from corp email wasn't displayed in the thread, I see it on the public archives [1]. Your client is choosing some pretty confusing quoting tactics, though, which you may want to adjust. :D I have what I'll call some "skeptical curiosity" here -- you don't need to defend your use cases to me by any means, but I'd love to understand more about them. > Yes, passing a token as a new auth method won't make much sense in > isolation. However: > 1. Since OAUTHBEARER is supported in the ecosystem, passing a token as > a way to authenticate with OAUTHBEARER is more consistent (IMO), then > passing it as a password. Agreed. It's probably not a very strong argument for the new mechanism, though, especially if you're not using the most expensive code inside it. > 2. Validation on the backend side doesn't depend on whether the token > is obtained by libpq or transparently passed by the upstream client. Sure. > 3. Single OAUTH auth method on the server side for both scenarios, > would allow both enterprise clients with their own Token acquisition > and community clients using libpq flows to connect as the same PG > users/roles. Okay, this is a stronger argument. With that in mind, I want to revisit your examples and maybe provide some counterproposals: >> Libpq passing toked directly from an upstream client is useful in other scenarios: >> 1. Enterprise clients, built with .Net / Java and using provider-specific authentication libraries, like MSAL for AAD.Those can also support more advanced provider-specific token acquisition flows. I can see that providing a token directly would help you work around limitations in libpq's "standard" OAuth flows, whether we use iddawc or not. And it's cheap in terms of implementation. But I have a feeling it would fall apart rapidly with error cases, where the server is giving libpq information via the OAUTHBEARER mechanism, but libpq can only communicate to your wrapper through human-readable error messages on stderr. This seems like clear motivation for client-side SASL plugins (which were also discussed on Samay's proposal thread). That's a lot more expensive to implement in libpq, but if it were hypothetically available, wouldn't you rather your provider-specific code be able to speak OAUTHBEARER directly with the server? >> 2. Resource-tight (like IoT) clients. Those can be compiled without the optional libpq flag not including the iddawc orother dependency. I want to dig into this much more; resource-constrained systems are near and dear to me. I can see two cases here: Case 1: The device is an IoT client that wants to connect on its own behalf. Why would you want to use OAuth in that case? And how would the IoT device get its Bearer token to begin with? I'm much more used to architectures that provision high-entropy secrets for this, whether they're incredibly long passwords per device (in which case, channel-bound SCRAM should be a fairly strong choice?) or client certs (which can be better decentralized, but make for a lot of bookkeeping). If the answer to that is, "we want an IoT client to be able to connect using the same role as a person", then I think that illustrates a clear need for SASL negotiation. That would let the IoT client choose SCRAM-*-PLUS or EXTERNAL, and the person at the keyboard can choose OAUTHBEARER. Then we have incredible flexibility, because you don't have to engineer one mechanism to handle them all. Case 2: The constrained device is being used as a jump point. 
So there's an actual person at a keyboard, trying to get into a backend server (maybe behind a firewall layer, etc.), and the middlebox is either not web-connected or is incredibly tiny for some reason. That might be a good use case for a copy-pasted Bearer token, but is there actual demand for that use case? What motivation would you (or your end user) have for choosing a fairly heavy, web-centric authentication method in such a constrained environment? Are there other resource-constrained use cases I've missed? Thanks, --Jacob [1] https://www.postgresql.org/message-id/MN0PR21MB31694BAC193ECE1807FD45358F4F9%40MN0PR21MB3169.namprd21.prod.outlook.com
On Fri, Mar 25, 2022 at 5:00 PM Jacob Champion <pchampion@vmware.com> wrote: > v4 rebases over the latest version of the pluggable auth patchset > (included as 0001-4). Note that there's a recent conflict as > of d4781d887; use an older commit as the base (or wait for the other > thread to be updated). Here's a newly rebased v5. (They're all zipped now, which I probably should have done a while back, sorry.) - As before, 0001-4 are the pluggable auth set; they've now diverged from the official version over on the other thread [1]. - I'm not sure that 0005 is still completely coherent after the rebase, given the recent changes to jsonapi.c. But for now, the tests are green, and that should be enough to keep the conversation going. - 0008 will hopefully be obsoleted when the SYSTEM_USER proposal [2] lands. Thanks, --Jacob [1] https://www.postgresql.org/message-id/CAJxrbyxgFzfqby%2BVRCkeAhJnwVZE50%2BZLPx0JT2TDg9LbZtkCg%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/7e692b8c-0b11-45db-1cad-3afc5b57409f@amazon.com
Attachment
- v5-0004-Add-support-for-map-and-custom-auth-options.patch.gz
- v5-0001-Add-support-for-custom-authentication-methods.patch.gz
- v5-0002-Add-sample-extension-to-test-custom-auth-provider.patch.gz
- v5-0005-common-jsonapi-support-FRONTEND-clients.patch.gz
- v5-0003-Add-tests-for-test_auth_provider-extension.patch.gz
- v5-0006-libpq-add-OAUTHBEARER-SASL-mechanism.patch.gz
- v5-0009-Add-pytest-suite-for-OAuth.patch.gz
- v5-0010-contrib-oauth-switch-to-pluggable-auth-API.patch.gz
- v5-0008-Add-a-very-simple-authn_id-extension.patch.gz
- v5-0007-backend-add-OAUTHBEARER-SASL-mechanism.patch.gz
>>> Libpq passing the token directly from an upstream client is useful in other scenarios: >>> 1. Enterprise clients, built with .Net / Java and using provider-specific authentication libraries, like MSAL for AAD. Those can also support more advanced provider-specific token acquisition flows. > I can see that providing a token directly would help you work around > limitations in libpq's "standard" OAuth flows, whether we use iddawc or > not. And it's cheap in terms of implementation. But I have a feeling it > would fall apart rapidly with error cases, where the server is giving > libpq information via the OAUTHBEARER mechanism, but libpq can only > communicate to your wrapper through human-readable error messages on stderr. For providing the token directly, that would primarily be used for scenarios where the same party controls both the server and the client-side wrapper. I.e. the client knows how to get a token for a particular principal and doesn't need any additional information other than human-readable messages. Please clarify the scenarios where you see this falling apart. I can provide an example from the cloud world. We (Azure), as well as other providers, offer ways to obtain OAUTH tokens for service-to-service communication at the IaaS / PaaS level. On Azure, the "Managed Identity" feature integrated into Compute VMs allows a client to make a local HTTP call to get a token. The VM itself manages the certificate lifecycle, as well as implements the corresponding OAUTH flow. This capability is used by both our 1st-party PaaS offerings, as well as 3rd-party services deploying on VMs or managed K8S clusters. Here, the client doesn't need libpq assistance in obtaining the token. > This seems like clear motivation for client-side SASL plugins (which > were also discussed on Samay's proposal thread). That's a lot more > expensive to implement in libpq, but if it were hypothetically > available, wouldn't you rather your provider-specific code be able to > speak OAUTHBEARER directly with the server? I generally agree that pluggable auth layers in libpq could be beneficial. However, as you pointed out in Samay's thread, that would require a new distribution model for libpq / clients to optionally include provider-specific logic. My optimistic plan here would be to implement several core OAUTH flows in libpq core which would be generic enough to support major enterprise OAUTH providers: 1. Client Credentials flow (Client_id + Client_secret) for backend applications. 2. Authorization Code flow with PKCE and/or Device Code flow for GUI applications. (2.) above would require a protocol between libpq and upstream clients to exchange several messages. Your patch includes a way for libpq to deliver to the client a message about the next authentication steps, so we planned to build on top of that. A little about the scenarios we are looking at. What we're trying to achieve here is an easy integration path for multiple players in the ecosystem: - Managed PaaS Postgres providers (both us and multi-cloud solutions) - SaaS providers deploying Postgres on IaaS/PaaS providers' clouds - Tools - pg_admin, psql and other ones. - BI, ETL, Federation and other scenarios where Postgres is used as the data source. If we can offer a provider-agnostic solution for the Backend <=> libpq <=> Upstream client path, we can have all the players above build support for OAUTH credentials, managed by the cloud provider of their choice.
For us, that would mean: - A better administrator experience with pg_admin / psql handling of the AAD (Azure Active Directory) authentication flows. - A path for integration solutions using Postgres to build AAD authentication into their management experience. - The ability to use the AAD identity provider for any Postgres deployments other than our 1st-party PaaS offering. - The ability to offer GitHub as the identity provider for the PaaS Postgres offering. Other players in the ecosystem above would be able to get the same benefits. Does that make sense, and is it possible without a provider-specific libpq plugin? ------------------------- On resource-constrained scenarios: > I want to dig into this much more; resource-constrained systems are near > and dear to me. I can see two cases here: I just referred to the ability to compile libpq without extra dependencies to save some kilobytes. I'm not sure OAUTH is widely used in those cases. It involves overhead anyway, and requires the device to talk to an additional party (the OAUTH provider). Likely cert authentication is easier. If needed, such a client can get libpq with full OAUTH support and use client code. But I didn't think about this scenario.
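To make the out-of-band acquisition described above concrete, here is a minimal sketch of the Managed Identity pattern, assuming the documented Azure IMDS token endpoint; the resource URI and the oauth_bearer_token connection parameter are placeholders only, since the patches in this thread have not settled on a name for direct token pass-through:

    import json
    import urllib.parse
    import urllib.request

    # Azure Instance Metadata Service (Managed Identity) token endpoint.
    # The resource/audience below is only a placeholder.
    IMDS = "http://169.254.169.254/metadata/identity/oauth2/token"
    query = urllib.parse.urlencode({
        "api-version": "2018-02-01",
        "resource": "https://postgres.example.org",  # placeholder audience
    })
    req = urllib.request.Request(f"{IMDS}?{query}", headers={"Metadata": "true"})
    with urllib.request.urlopen(req) as resp:
        access_token = json.load(resp)["access_token"]

    # Hand the pre-acquired bearer token to libpq. "oauth_bearer_token" is a
    # hypothetical parameter name, used here purely for illustration.
    conninfo = (
        "host=example.org dbname=postgres user=app_role "
        f"oauth_bearer_token={access_token}"
    )
    print(conninfo)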
On Mon, Sep 26, 2022 at 6:39 PM Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > For the providing token directly, that would be primarily used for > scenarios where the same party controls both the server and the client > side wrapper. > I.e. The client knows how to get a token for a particular principal > and doesn't need any additional information other than human readable > messages. > Please clarify the scenarios where you see this falling apart. The most concrete example I can see is with the OAUTHBEARER error response. If you want to eventually handle differing scopes per role, or different error statuses (which the proof-of-concept currently hardcodes as `invalid_token`), then the client can't assume it knows what the server is going to say there. I think that's true even if you control both sides and are hardcoding the provider. How should we communicate those pieces to a custom client when it's passing a token directly? The easiest way I can see is for the custom client to speak the OAUTHBEARER protocol directly (e.g. SASL plugin). If you had to parse the libpq error message, I don't think that'd be particularly maintainable. > I can provide an example in the cloud world. We (Azure) as well as > other providers offer ways to obtain OAUTH tokens for > Service-to-Service communication at IAAS / PAAS level. > on Azure "Managed Identity" feature integrated in Compute VM allows a > client to make a local http call to get a token. VM itself manages the > certificate livecycle, as well as implements the corresponding OAUTH > flow. > This capability is used by both our 1st party PAAS offerings, as well > as 3rd party services deploying on VMs or managed K8S clusters. > Here, the client doesn't need libpq assistance in obtaining the token. Cool. To me that's the strongest argument yet for directly providing tokens to libpq. > My optimistic plan here would be to implement several core OAUTH flows > in libpq core which would be generic enough to support major > enterprise OAUTH providers: > 1. Client Credentials flow (Client_id + Client_secret) for backend applications. > 2. Authorization Code Flow with PKCE and/or Device code flow for GUI > applications. As long as it's clear to DBAs when to use which flow (because existing documentation for that is hit-and-miss), I think it's reasonable to eventually support multiple flows. Personally my preference would be to start with one or two core flows, and expand outward once we're sure that we do those perfectly. Otherwise the explosion of knobs and buttons might be overwhelming, both to users and devs. Related to the question of flows is the client implementation library. I've mentioned that I don't think iddawc is production-ready. As far as I'm aware, there is only one certified OpenID relying party written in C, and that's... an Apache server plugin. That leaves us either choosing an untested library, scouring the web for a "tested" library (and hoping we're right in our assessment), or implementing our own (which is going to tamp down enthusiasm for supporting many flows, though that has its own set of benefits). If you know of any reliable implementations with a C API, please let me know. > (2.) above would require a protocol between libpq and upstream clients > to exchange several messages. > Your patch includes a way for libpq to deliver to the client a message > about the next authentication steps, so planned to build on top of > that. Specifically it delivers that message to an end user. 
If you want a generic machine client to be able to use that, then we'll need to talk about how. > A little about scenarios, we look at. > What we're trying to achieve here is an easy integration path for > multiple players in the ecosystem: > - Managed PaaS Postgres providers (both us and multi-cloud solutions) > - SaaS providers deploying postgres on IaaS/PaaS providers' clouds > - Tools - pg_admin, psql and other ones. > - BI, ETL, Federation and other scenarios where postgres is used as > the data source. > > If we can offer a provider agnostic solution for Backend <=> libpq <=> > Upstreal client path, we can have all players above build support for > OAUTH credentials, managed by the cloud provider of their choice. Well... I don't quite understand why we'd go to the trouble of providing a provider-agnostic communication solution only to have everyone write their own provider-specific client support. Unless you're saying Microsoft would provide an officially blessed plugin for the *server* side only, and Google would provide one of their own, and so on. The server side authorization is the only place where I think it makes sense to specialize by default. libpq should remain agnostic, with the understanding that we'll need to make hard decisions when a major provider decides not to follow a spec. > For us, that would mean: > - Better administrator experience with pg_admin / psql handling of the > AAD (Azure Active Directory) authentication flows. > - Path for integration solutions using Postgres to build AAD > authentication in their management experience. > - Ability to use AAD identity provider for any Postgres deployments > other than our 1st party PaaS offering. > - Ability to offer github as the identity provider for PaaS Postgres offering. GitHub is unfortunately a bit tricky, unless they've started supporting OpenID recently? > Other players in the ecosystem above would be able to get the same benefits. > > Does that make sense and possible without provider specific libpq plugin? If the players involved implement the flows and follow the specs, yes. That's a big "if", unfortunately. I think GitHub and Google are two major players who are currently doing things their own way. > I just referred to the ability to compile libpq without extra > dependencies to save some kilobytes. > Not sure if OAUTH is widely used in those cases. It involves overhead > anyway, and requires the device to talk to an additional party (OAUTH > provider). > Likely Cert authentication is easier. > If needed, it can get libpq with full OAUTH support and use a client > code. But I didn't think about this scenario. Makes sense. Thanks! --Jacob
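As a point of reference for the error-handling concern: in OAUTHBEARER (RFC 7628) the server's failure response is a JSON document with "status", "scope", and an optional "openid-configuration" member. A minimal sketch of picking out the fields a client-side flow would need to react to, with illustrative values only:

    import json

    # Example server challenge body per RFC 7628 (values are illustrative only).
    challenge = json.dumps({
        "status": "invalid_token",
        "scope": "openid email",
        "openid-configuration":
            "https://accounts.example.org/.well-known/openid-configuration",
    })

    def parse_oauthbearer_error(body):
        """Extract the fields a client-side flow would need to react to."""
        doc = json.loads(body)
        return {
            "status": doc.get("status"),                   # e.g. invalid_token
            "scope": doc.get("scope"),                     # scopes the server wants
            "discovery": doc.get("openid-configuration"),  # optional per the RFC
        }

    print(parse_oauthbearer_error(challenge))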
> The most concrete example I can see is with the OAUTHBEARER error
> response. If you want to eventually handle differing scopes per role,
> or different error statuses (which the proof-of-concept currently
> hardcodes as `invalid_token`), then the client can't assume it knows
> what the server is going to say there. I think that's true even if you
> control both sides and are hardcoding the provider.
Ok, I see the point. It's related to the topic of communication
between libpq and the upstream client.
> How should we communicate those pieces to a custom client when it's
> passing a token directly? The easiest way I can see is for the custom
> client to speak the OAUTHBEARER protocol directly (e.g. SASL plugin).
> If you had to parse the libpq error message, I don't think that'd be
> particularly maintainable.
I agree that parsing the message is not a sustainable way.
Could you provide more details on the SASL plugin approach you propose?
Specifically, is this basically a set of extension hooks for the client side?
With the need for the client to be compiled with the plugins based on
the set of providers it needs.
> Well... I don't quite understand why we'd go to the trouble of
> providing a provider-agnostic communication solution only to have
> everyone write their own provider-specific client support. Unless
> you're saying Microsoft would provide an officially blessed plugin for
> the *server* side only, and Google would provide one of their own, and
> so on.
Yes, via extensions. Identity providers can open source extensions to
use their auth services outside of first party PaaS offerings.
For 3rd party Postgres PaaS or on premise deployments.
> The server side authorization is the only place where I think it makes
> sense to specialize by default. libpq should remain agnostic, with the
> understanding that we'll need to make hard decisions when a major
> provider decides not to follow a spec.
Completely agree with an agnostic libpq. Though it needs validation with
several major providers to know if this is possible.
> Specifically it delivers that message to an end user. If you want a
> generic machine client to be able to use that, then we'll need to talk
> about how.
Yes, that's what needs to be decided.
In both Device code and Authorization code scenarios, libpq and the
client would need to exchange a couple of pieces of metadata.
Plus, after success, the client should be able to access a refresh token for further use.
Can we implement a generic protocol for this between libpq and the clients?
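As one way to picture the metadata exchange being asked about here: a hypothetical pair of structures libpq could hand to, and receive from, the upstream client during a device-code flow. The field names mirror the RFC 8628 device authorization response, but the structures themselves are illustrative only and are not part of any of the patches:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class DeviceCodePrompt:
        verification_uri: str   # where the user should go
        user_code: str          # code the user enters there
        expires_in: int         # seconds until the codes expire
        interval: int           # suggested polling interval, in seconds

    @dataclass
    class FlowResult:
        access_token: str
        refresh_token: Optional[str]  # returned so the client can cache it securely

    def show_prompt(prompt: DeviceCodePrompt) -> None:
        # A GUI client would render a dialog instead of printing.
        print(f"Visit {prompt.verification_uri} and enter the code: {prompt.user_code}")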
On Fri, Sep 30, 2022 at 7:47 AM Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > > How should we communicate those pieces to a custom client when it's > > passing a token directly? The easiest way I can see is for the custom > > client to speak the OAUTHBEARER protocol directly (e.g. SASL plugin). > > If you had to parse the libpq error message, I don't think that'd be > > particularly maintainable. > > I agree that parsing the message is not a sustainable way. > Could you provide more details on the SASL plugin approach you propose? > > Specifically, is this basically a set of extension hooks for the client side? > With the need for the client to be compiled with the plugins based on > the set of providers it needs. That's a good question. I can see two broad approaches, with maybe some ability to combine them into a hybrid: 1. If there turns out to be serious interest in having libpq itself handle OAuth natively (with all of the web-facing code that implies, and all of the questions still left to answer), then we might be able to provide a "token hook" in the same way that we currently provide a passphrase hook for OpenSSL keys. By default, libpq would use its internal machinery to take the provider details, navigate its builtin flow, and return the Bearer token. If you wanted to override that behavior as a client, you could replace the builtin flow with your own, by registering a set of callbacks. 2. Alternatively, OAuth support could be provided via a mechanism plugin for some third-party SASL library (GNU libgsasl, Cyrus libsasl2). We could provide an OAuth plugin in contrib that handles the default flow. Other providers could publish their alternative plugins to completely replace the OAUTHBEARER mechanism handling. Approach (2) would make for some duplicated effort since every provider has to write code to speak the OAUTHBEARER protocol. It might simplify provider-specific distribution, since (at least for Cyrus) I think you could build a single plugin that supports both the client and server side. But it would be a lot easier to unknowingly (or knowingly) break the spec, since you'd control both the client and server sides. There would be less incentive to interoperate. Finally, we could potentially take pieces from both, by having an official OAuth mechanism plugin that provides a client-side hook to override the flow. I have no idea if the benefits would offset the costs of a plugin-for-a-plugin style architecture. And providers would still be free to ignore it and just provide a full mechanism plugin anyway. > > Well... I don't quite understand why we'd go to the trouble of > > providing a provider-agnostic communication solution only to have > > everyone write their own provider-specific client support. Unless > > you're saying Microsoft would provide an officially blessed plugin for > > the *server* side only, and Google would provide one of their own, and > > so on. > > Yes, via extensions. Identity providers can open source extensions to > use their auth services outside of first party PaaS offerings. > For 3rd party Postgres PaaS or on premise deployments. Sounds reasonable. > > The server side authorization is the only place where I think it makes > > sense to specialize by default. libpq should remain agnostic, with the > > understanding that we'll need to make hard decisions when a major > > provider decides not to follow a spec. > > Completely agree with agnostic libpq. Though needs validation with > several major providers to know if this is possible. Agreed. 
> > Specifically it delivers that message to an end user. If you want a > > generic machine client to be able to use that, then we'll need to talk > > about how. > > Yes, that's what needs to be decided. > In both Device code and Authorization code scenarios, libpq and the > client would need to exchange a couple of pieces of metadata. > Plus, after success, the client should be able to access a refresh token for further use. > > Can we implement a generic protocol like for this between libpq and the clients? I think we can probably prototype a callback hook for approach (1) pretty quickly. (2) is a lot more work and investigation, but it's work that I'm interested in doing (when I get the time). I think there are other very good reasons to consider a third-party SASL library, and some good lessons to be learned, even if the community decides not to go down that road. Thanks, --Jacob
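A rough sketch of what approach (1) could look like from the application's point of view; the registration function and callback signature are purely hypothetical, drawn by analogy with the existing OpenSSL passphrase hook rather than from any actual libpq API:

    # Hypothetical sketch of approach (1): the application swaps out libpq's
    # builtin OAuth flow by registering a callback. Neither the callback
    # signature nor register_oauth_token_hook() exists in libpq today.

    def acquire_token_somehow(discovery_url: str, scope: str) -> str:
        # Provider-specific logic goes here: call a metadata service, pop a
        # browser, read a token cache, etc.
        raise NotImplementedError

    def my_token_hook(discovery_url: str, scope: str) -> str:
        """Given the provider details from the server's OAUTHBEARER error
        result, run a custom flow and return a bearer token for libpq to send."""
        return acquire_token_somehow(discovery_url, scope)

    # A wrapper around libpq would then install the hook, e.g.:
    # register_oauth_token_hook(my_token_hook)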
> I think we can probably prototype a callback hook for approach (1) > pretty quickly. (2) is a lot more work and investigation, but it's > work that I'm interested in doing (when I get the time). I think there > are other very good reasons to consider a third-party SASL library, > and some good lessons to be learned, even if the community decides not > to go down that road. Makes sense. We will work on (1.) and do some checks to see if there are any blockers for a shared solution to support GitHub and Google.
Hi, We validated libpq handling OAuth natively, with different flows and different OIDC-certified providers. Flows: Device Code, Client Credentials and Refresh Token. Providers: Microsoft, Google and Okta. We also validated with the OAuth provider GitHub. We propose using OpenID Connect (OIDC) as the protocol, instead of OAuth, as it provides: - A discovery mechanism to bridge the differences and provide metadata. - A stricter protocol and certification process to reliably identify which providers can be supported. - OIDC is designed for authentication, while the main purpose of OAUTH is to authorize applications on behalf of the user. GitHub is not OIDC-certified, so it won't be supported with this proposal. However, it may be supported in the future through the ability for the extension to provide custom discovery document content. OpenID configuration has a well-known discovery mechanism for the provider configuration URI, which is defined in OpenID Connect. It allows libpq to fetch metadata about the provider (i.e. endpoints, supported grants, response types, etc.). In the attached patch (based on the V2 patch in the thread; it does not contain Samay's changes): - The provider can configure the issuer URL and scope through the options hook. - The server passes an open discovery URL and scope on to libpq. - Libpq handles the OAuth flow based on the flow_type sent in the connection string [1]. - Callbacks were added to notify client tools with a structure if the OAuth flow requires user interaction. - The PG backend uses hooks to validate the bearer token. Note that the authorization code flow with PKCE for GUI clients is not implemented yet. Proposed next steps: - Broaden the discussion to reach agreement on the approach. - Implement the libpq changes without iddawc. - Prototype the GUI flow with pgAdmin. Thanks, Mahendrakar. [1]: connection string for the refresh token flow: ./psql -U <user> -d 'dbname=postgres oauth_client_id=<client_id> oauth_flow_type=<flowtype> oauth_refresh_token=<refresh token>'
Attachment
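The discovery step referred to above is a single fetch of the provider's well-known configuration document, as defined by OpenID Connect Discovery. A minimal sketch, with a placeholder issuer URL:

    import json
    import urllib.request

    # OpenID Connect Discovery: the provider configuration lives at a
    # well-known path under the issuer URL. The issuer below is a placeholder.
    issuer = "https://accounts.example.org"
    url = issuer.rstrip("/") + "/.well-known/openid-configuration"

    with urllib.request.urlopen(url) as resp:
        config = json.load(resp)

    # Endpoints libpq would need for the flows discussed in this thread.
    print(config["token_endpoint"])
    print(config.get("device_authorization_endpoint"))  # absent for some providers
    print(config.get("grant_types_supported"))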
On 11/23/22 01:58, mahendrakar s wrote: > We validated on libpq handling OAuth natively with different flows > with different OIDC certified providers. > > Flows: Device Code, Client Credentials and Refresh Token. > Providers: Microsoft, Google and Okta. Great, thank you! > Also validated with OAuth provider Github. (How did you get discovery working? I tried this and had to give up eventually.) > We propose using OpenID Connect (OIDC) as the protocol, instead of > OAuth, as it is: > - Discovery mechanism to bridge the differences and provide metadata. > - Stricter protocol and certification process to reliably identify > which providers can be supported. > - OIDC is designed for authentication, while the main purpose of OAUTH is to > authorize applications on behalf of the user. How does this differ from the previous proposal? The OAUTHBEARER SASL mechanism already relies on OIDC for discovery. (I think that decision is confusing from an architectural and naming standpoint, but I don't think they really had an alternative...) > Github is not OIDC certified, so won’t be supported with this proposal. > However, it may be supported in the future through the ability for the > extension to provide custom discovery document content. Right. > OpenID configuration has a well-known discovery mechanism > for the provider configuration URI which is > defined in OpenID Connect. It allows libpq to fetch > metadata about provider (i.e endpoints, supported grants, response types, etc). Sure, but this is already how the original PoC works. The test suite implements an OIDC provider, for instance. Is there something different to this that I'm missing? > In the attached patch (based on V2 patch in the thread and does not > contain Samay's changes): > - Provider can configure issuer url and scope through the options hook.) > - Server passes on an open discovery url and scope to libpq. > - Libpq handles OAuth flow based on the flow_type sent in the > connection string [1]. > - Added callbacks to notify a structure to client tools if OAuth flow > requires user interaction. > - Pg backend uses hooks to validate bearer token. Thank you for the sample! > Note that authentication code flow with PKCE for GUI clients is not > implemented yet. > > Proposed next steps: > - Broaden discussion to reach agreement on the approach. High-level thoughts on this particular patch (I assume you're not looking for low-level implementation comments yet): 0) The original hook proposal upthread, I thought, was about allowing libpq's flow implementation to be switched out by the application. I don't see that approach taken here. It's fine if that turned out to be a bad idea, of course, but this patch doesn't seem to match what we were talking about. 1) I'm really concerned about the sudden explosion of flows. We went from one flow (Device Authorization) to six. It's going to be hard enough to validate that *one* flow is useful and can be securely deployed by end users; I don't think we're going to be able to maintain six, especially in combination with my statement that iddawc is not an appropriate dependency for us. I'd much rather give applications the ability to use their own OAuth code, and then maintain within libpq only the flows that are broadly useful. This ties back to (0) above. 2) Breaking the refresh token into its own pseudoflow is, I think, passing the buck onto the user for something that's incredibly security sensitive. 
The refresh token is powerful; I don't really want it to be printed anywhere, let alone copy-pasted by the user. Imagine the phishing opportunities. If we want to support refresh tokens, I believe we should be developing a plan to cache and secure them within the client. They should be used as an accelerator for other flows, not as their own flow. 3) I don't like the departure from the OAUTHBEARER mechanism that's presented here. For one, since I can't see a sample plugin that makes use of the "flow type" magic numbers that have been added, I don't really understand why the extension to the mechanism is necessary. For two, if we think OAUTHBEARER is insufficient, the people who wrote it would probably like to hear about it. Claiming support for a spec, and then implementing an extension without review from the people who wrote the spec, is not something I'm personally interested in doing. 4) The test suite is still broken, so it's difficult to see these things in practice for review purposes. > - Implement libpq changes without iddawc This in particular will be much easier with a functioning test suite, and with a smaller number of flows. > - Prototype GUI flow with pgAdmin Cool! Thanks, --Jacob
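For comparison, the one flow the original proof of concept implements (Device Authorization, RFC 8628) reduces to two HTTP requests plus polling. A bare-bones sketch against a generic provider; the endpoints and client ID are placeholders, and error handling covers only the pending/slow-down cases:

    import json
    import time
    import urllib.error
    import urllib.parse
    import urllib.request

    # Placeholders; in the PoC these come from OIDC discovery and the
    # oauth_client_id connection parameter.
    DEVICE_ENDPOINT = "https://oauth.example.org/device/code"
    TOKEN_ENDPOINT = "https://oauth.example.org/token"
    CLIENT_ID = "f02c6361-0635-example"

    def post(url, fields):
        data = urllib.parse.urlencode(fields).encode()
        with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
            return json.load(resp)

    # Step 1: request a device code and show the prompt to the user.
    dev = post(DEVICE_ENDPOINT, {"client_id": CLIENT_ID, "scope": "openid email"})
    print(f"Visit {dev['verification_uri']} and enter the code: {dev['user_code']}")

    # Step 2: poll the token endpoint until the user finishes logging in.
    while True:
        time.sleep(dev.get("interval", 5))
        try:
            tok = post(TOKEN_ENDPOINT, {
                "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
                "device_code": dev["device_code"],
                "client_id": CLIENT_ID,
            })
            break
        except urllib.error.HTTPError as e:
            err = json.load(e)
            if err.get("error") not in ("authorization_pending", "slow_down"):
                raise

    # This bearer token is what libpq would then send to the server.
    print(tok["access_token"])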
> How does this differ from the previous proposal? The OAUTHBEARER SASL > mechanism already relies on OIDC for discovery. (I think that decision > is confusing from an architectural and naming standpoint, but I don't > think they really had an alternative...) Mostly terminology questions here. OAUTHBEARER SASL appears to be the spec about using OAUTH2 tokens for authentication. While any OAUTH2 provider can generally work, we propose to specifically highlight that only OIDC providers can be supported, as we need the discovery document. And we won't be able to support GitHub under that requirement. Since the original patch used that too - no change on that, just confirmation that we need OIDC compliance. > 0) The original hook proposal upthread, I thought, was about allowing > libpq's flow implementation to be switched out by the application. I > don't see that approach taken here. It's fine if that turned out to be a > bad idea, of course, but this patch doesn't seem to match what we were > talking about. We still plan to allow the client to pass the token, which is a generic way for it to implement its own OAUTH flows. > 1) I'm really concerned about the sudden explosion of flows. We went > from one flow (Device Authorization) to six. It's going to be hard > enough to validate that *one* flow is useful and can be securely > deployed by end users; I don't think we're going to be able to maintain > six, especially in combination with my statement that iddawc is not an > appropriate dependency for us. > I'd much rather give applications the ability to use their own OAuth > code, and then maintain within libpq only the flows that are broadly > useful. This ties back to (0) above. We consider the following set of flows to be the minimum required: - Client Credentials - for service-to-service scenarios. - Authorization Code with PKCE - for rich clients, including pgAdmin. - Device Code - for psql (and possibly other non-GUI clients). - Refresh Token (separate discussion) That is pretty much the list described at https://oauth.net/2/grant-types/ and in the OAUTH2 specs. Client Credentials is very simple, and so is Refresh Token. If you prefer to pick one of the richer flows, Authorization Code for GUI scenarios is probably much more widely used. Plus it's easier to implement too, as the interaction goes through a series of callbacks. No polling required. > 2) Breaking the refresh token into its own pseudoflow is, I think, > passing the buck onto the user for something that's incredibly security > sensitive. The refresh token is powerful; I don't really want it to be > printed anywhere, let alone copy-pasted by the user. Imagine the > phishing opportunities. > If we want to support refresh tokens, I believe we should be developing > a plan to cache and secure them within the client. They should be used > as an accelerator for other flows, not as their own flow. It's considered a separate "grant_type" in the specs / APIs: https://openid.net/specs/openid-connect-core-1_0.html#RefreshTokens For the clients, it would mean storing the token and using it to authenticate. On the question of sensitivity, secure credential stores are different for each platform, with a lot of cloud offerings for this. pgAdmin, for example, has its own way to secure credentials to avoid asking users for passwords every time the app is opened. I believe we should delegate refresh token management to the clients. > 3) I don't like the departure from the OAUTHBEARER mechanism that's > presented here.
For one, since I can't see a sample plugin that makes > use of the "flow type" magic numbers that have been added, I don't > really understand why the extension to the mechanism is necessary. I don't think it's much of a departure, but rather a separation of responsibilities between libpq and upstream clients. As libpq can be used in different apps, the client would need different types of flows/grants. I.e. those need to be provided to libpq at connection initialization or some other point. We will change it to "grant_type", though, and use a string to be closer to the spec. What do you think is the best way for the client to signal which OAUTH flow should be used?
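At the protocol level, the refresh token "pseudoflow" being debated here is just one more POST to the token endpoint with grant_type=refresh_token. A minimal sketch, with the endpoint, client ID, and cached token as placeholders:

    import json
    import urllib.parse
    import urllib.request

    # Placeholders; a real client would have cached the refresh token securely
    # after an earlier interactive flow.
    TOKEN_ENDPOINT = "https://oauth.example.org/token"
    CLIENT_ID = "f02c6361-0635-example"
    refresh_token = "cached-refresh-token"

    data = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": CLIENT_ID,
    }).encode()

    with urllib.request.urlopen(urllib.request.Request(TOKEN_ENDPOINT, data=data)) as resp:
        tok = json.load(resp)

    access_token = tok["access_token"]
    # Some providers also rotate the refresh token on every use:
    refresh_token = tok.get("refresh_token", refresh_token)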
Hi Jacob, I validated GitHub by skipping the discovery mechanism and letting the provider extension pass on the endpoints. This was just for validation purposes. If GitHub needs to be supported, then we need a way to send the discovery document content from the extension. Thanks, Mahendrakar.
> https://openid.net/specs/openid-connect-core-1_0.html#RefreshTokens > > For the clients, it would be storing the token and using it to authenticate. > On the question of sensitivity, secure credentials stores are > different for each platform, with a lot of cloud offerings for this. > pgAdmin, for example, has its own way to secure credentials to avoid > asking users for passwords every time the app is opened. > I believe we should delegate the refresh token management to the clients. > > >3) I don't like the departure from the OAUTHBEARER mechanism that's > > presented here. For one, since I can't see a sample plugin that makes > > use of the "flow type" magic numbers that have been added, I don't > > really understand why the extension to the mechanism is necessary. > I don't think it's much of a departure, but rather a separation of > responsibilities between libpq and upstream clients. > As libpq can be used in different apps, the client would need > different types of flows/grants. > I.e. those need to be provided to libpq at connection initialization > or some other point. > We will change to "grant_type" though and use string to be closer to the spec. > What do you think is the best way for the client to signal which OAUTH > flow should be used? > > On Wed, Nov 23, 2022 at 12:05 PM Jacob Champion <jchampion@timescale.com> wrote: > > > > On 11/23/22 01:58, mahendrakar s wrote: > > > We validated on libpq handling OAuth natively with different flows > > > with different OIDC certified providers. > > > > > > Flows: Device Code, Client Credentials and Refresh Token. > > > Providers: Microsoft, Google and Okta. > > > > Great, thank you! > > > > > Also validated with OAuth provider Github. > > > > (How did you get discovery working? I tried this and had to give up > > eventually.) > > > > > We propose using OpenID Connect (OIDC) as the protocol, instead of > > > OAuth, as it is: > > > - Discovery mechanism to bridge the differences and provide metadata. > > > - Stricter protocol and certification process to reliably identify > > > which providers can be supported. > > > - OIDC is designed for authentication, while the main purpose of OAUTH is to > > > authorize applications on behalf of the user. > > > > How does this differ from the previous proposal? The OAUTHBEARER SASL > > mechanism already relies on OIDC for discovery. (I think that decision > > is confusing from an architectural and naming standpoint, but I don't > > think they really had an alternative...) > > > > > Github is not OIDC certified, so won’t be supported with this proposal. > > > However, it may be supported in the future through the ability for the > > > extension to provide custom discovery document content. > > > > Right. > > > > > OpenID configuration has a well-known discovery mechanism > > > for the provider configuration URI which is > > > defined in OpenID Connect. It allows libpq to fetch > > > metadata about provider (i.e endpoints, supported grants, response types, etc). > > > > Sure, but this is already how the original PoC works. The test suite > > implements an OIDC provider, for instance. Is there something different > > to this that I'm missing? > > > > > In the attached patch (based on V2 patch in the thread and does not > > > contain Samay's changes): > > > - Provider can configure issuer url and scope through the options hook.) > > > - Server passes on an open discovery url and scope to libpq. > > > - Libpq handles OAuth flow based on the flow_type sent in the > > > connection string [1]. 
> > > - Added callbacks to notify a structure to client tools if OAuth flow > > > requires user interaction. > > > - Pg backend uses hooks to validate bearer token. > > > > Thank you for the sample! > > > > > Note that authentication code flow with PKCE for GUI clients is not > > > implemented yet. > > > > > > Proposed next steps: > > > - Broaden discussion to reach agreement on the approach. > > > > High-level thoughts on this particular patch (I assume you're not > > looking for low-level implementation comments yet): > > > > 0) The original hook proposal upthread, I thought, was about allowing > > libpq's flow implementation to be switched out by the application. I > > don't see that approach taken here. It's fine if that turned out to be a > > bad idea, of course, but this patch doesn't seem to match what we were > > talking about. > > > > 1) I'm really concerned about the sudden explosion of flows. We went > > from one flow (Device Authorization) to six. It's going to be hard > > enough to validate that *one* flow is useful and can be securely > > deployed by end users; I don't think we're going to be able to maintain > > six, especially in combination with my statement that iddawc is not an > > appropriate dependency for us. > > > > I'd much rather give applications the ability to use their own OAuth > > code, and then maintain within libpq only the flows that are broadly > > useful. This ties back to (0) above. > > > > 2) Breaking the refresh token into its own pseudoflow is, I think, > > passing the buck onto the user for something that's incredibly security > > sensitive. The refresh token is powerful; I don't really want it to be > > printed anywhere, let alone copy-pasted by the user. Imagine the > > phishing opportunities. > > > > If we want to support refresh tokens, I believe we should be developing > > a plan to cache and secure them within the client. They should be used > > as an accelerator for other flows, not as their own flow. > > > > 3) I don't like the departure from the OAUTHBEARER mechanism that's > > presented here. For one, since I can't see a sample plugin that makes > > use of the "flow type" magic numbers that have been added, I don't > > really understand why the extension to the mechanism is necessary. > > > > For two, if we think OAUTHBEARER is insufficient, the people who wrote > > it would probably like to hear about it. Claiming support for a spec, > > and then implementing an extension without review from the people who > > wrote the spec, is not something I'm personally interested in doing. > > > > 4) The test suite is still broken, so it's difficult to see these things > > in practice for review purposes. > > > > > - Implement libpq changes without iddawc > > > > This in particular will be much easier with a functioning test suite, > > and with a smaller number of flows. > > > > > - Prototype GUI flow with pgAdmin > > > > Cool! > > > > Thanks, > > --Jacob
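For concreteness, the discovery step discussed above -- fetching provider metadata from the issuer's well-known endpoint -- is small enough to sketch with the Python standard library. The issuer URL below is only an example; the field names come from the OIDC Discovery spec.

    # Sketch: fetch an OIDC discovery document and pull out the endpoints a
    # client-side flow would need for the device and token exchanges.
    import json
    import urllib.request

    def fetch_provider_metadata(issuer):
        url = issuer.rstrip("/") + "/.well-known/openid-configuration"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    if __name__ == "__main__":
        meta = fetch_provider_metadata("https://accounts.google.com")
        print("token endpoint:  ", meta["token_endpoint"])
        print("device endpoint: ", meta.get("device_authorization_endpoint"))
        print("grants supported:", meta.get("grant_types_supported"))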
On 11/23/22 19:45, Andrey Chudnovsky wrote: > Mostly terminology questions here. OAUTHBEARER SASL appears to be the > spec about using OAUTH2 tokens for Authentication. > While any OAUTH2 can generally work, we propose to specifically > highlight that only OIDC providers can be supported, as we need the > discovery document. *If* you're using in-band discovery, yes. But I thought your use case was explicitly tailored to out-of-band token retrieval: > The client knows how to get a token for a particular principal > and doesn't need any additional information other than human readable > messages. In that case, isn't OAuth sufficient? There's definitely a need to document the distinction, but I don't think we have to require OIDC as long as the client application makes up for the missing information. (OAUTHBEARER makes the openid-configuration error member optional, presumably for this reason.) >> 0) The original hook proposal upthread, I thought, was about allowing >> libpq's flow implementation to be switched out by the application. I >> don't see that approach taken here. It's fine if that turned out to be a >> bad idea, of course, but this patch doesn't seem to match what we were >> talking about. > We still plan to allow the client to pass the token. Which is a > generic way to implement its own OAUTH flows. Okay. But why push down the implementation into the server? To illustrate what I mean, here's the architecture of my proposed patchset: +-------+ +----------+ | | -------------- Empty Token ------------> | | | libpq | <----- Error Result (w/ Discovery ) ---- | | | | | | | +--------+ +--------------+ | | | | iddawc | <--- [ Flow ] ----> | Issuer/ | | Postgres | | | | <-- Access Token -- | Authz Server | | | | +--------+ +--------------+ | +-----------+ | | | | | | | -------------- Access Token -----------> | > | Validator | | | <---- Authorization Success/Failure ---- | < | | | | | +-----------+ +-------+ +----------+ In this implementation, there's only one black box: the validator, which is responsible for taking an access token from an untrusted client, verifying that it was issued correctly for the Postgres service, and either 1) determining whether the bearer is authorized to access the database, or 2) determining the authenticated ID of the bearer so that the HBA can decide whether they're authorized. (Or both.) This approach is limited by the flows that we explicitly enable within libpq and its OAuth implementation library. You mentioned that you wanted to support other flows, including clients with out-of-band knowledge, and I suggested: > If you wanted to override [iddawc's] > behavior as a client, you could replace the builtin flow with your > own, by registering a set of callbacks. In other words, the hooks would replace iddawc in the above diagram. In my mind, something like this: +-------+ +----------+ +------+ | ----------- Empty Token ------------> | Postgres | | | < | <---------- Error Result ------------ | | | Hook | | | +-----------+ | | | | | | +------+ > | ------------ Access Token ----------> | > | Validator | | | <--- Authorization Success/Failure -- | < | | | libpq | | +-----------+ +-------+ +----------+ Now there's a second black box -- the client hook -- which takes an OAUTHBEARER error result (which may or may not have OIDC discovery information) and returns the access token. How it does this is unspecified -- it'll probably use some OAuth 2.0 flow, but maybe not. 
Maybe it sends the user to a web browser; maybe it uses some of the magic provider-specific libraries you mentioned upthread. It might have a refresh token cached so it doesn't have to involve the user at all. Crucially, though, the two black boxes remain independent of each other. They have well-defined inputs and outputs (the client hook could be roughly described as "implement get_auth_token()"). Their correctness can be independently verified against published OAuth specs and/or provider documentation. And the client application still makes a single call to PQconnect*(). Compare this to the architecture proposed by your patch: Client App +----------------------+ | +-------+ +----------+ | | libpq | | Postgres | | PQconnect > | | | +-------+ | +------+ | ------- Flow Type (!) -------> | > | | | +- < | Hook | < | <------- Error Result -------- | < | | | [ get +------+ | | | | | token ] | | | | | | | | | | | Hooks | | v | | | | | | PQconnect > | ----> | ------ Access Token ---------> | > | | | | | <--- Authz Success/Failure --- | < | | | +-------+ | +-------+ +----------------------+ +----------+ Rather than decouple things, I think this proposal drives a spike through the client app, libpq, and the server. Please correct me if I've misunderstood pieces of the patch, but the following is my view of it: What used to be a validator hook on the server side now actively participates in the client-side flow for some reason. (I still don't understand what the server is supposed to do with that knowledge. Changing your authz requirements based on the flow the client wants to use seems like a good way to introduce bugs.) The client-side hook is now coupled to the application logic: you have to know to expect an error from the first PQconnect*() call, then check whatever magic your hook has done for you to be able to set up the second call to PQconnect*() with the correctly scoped bearer token. So if you want to switch between the internal libpq OAuth implementation and your own hook, you have to rewrite your app logic. On top of all that, the "flow type code" being sent is a custom extension to OAUTHBEARER that appears to be incompatible with the RFC's discovery exchange (which is done by sending an empty auth token during the first round trip). > We consider the following set of flows to be minimum required: > - Client Credentials - For Service to Service scenarios. Okay, that's simple enough that I think it could probably be maintained inside libpq with minimal cost. At the same time, is it complicated enough that you need libpq to do it for you? Maybe once we get the hooks ironed out, it'll be more obvious what the tradeoff is... > If you prefer to pick one of the richer flows, Authorization code for > GUI scenarios is probably much more widely used. > Plus it's easier to implement too, as interaction goes through a > series of callbacks. No polling required. I don't think flows requiring the invocation of web browsers and custom URL handlers are a clear fit for libpq. For a first draft, at least, I think that use case should be pushed upward into the client application via a custom hook. >> If we want to support refresh tokens, I believe we should be developing >> a plan to cache and secure them within the client. They should be used >> as an accelerator for other flows, not as their own flow. > It's considered a separate "grant_type" in the specs / APIs. 
> https://openid.net/specs/openid-connect-core-1_0.html#RefreshTokens Yes, but that doesn't mean we have to expose it to users via a connection option. You don't get a refresh token out of the blue; you get it by going through some other flow, and then you use it in preference to going through that flow again later. > For the clients, it would be storing the token and using it to authenticate. > On the question of sensitivity, secure credentials stores are > different for each platform, with a lot of cloud offerings for this. > pgAdmin, for example, has its own way to secure credentials to avoid > asking users for passwords every time the app is opened. > I believe we should delegate the refresh token management to the clients. Delegating to client apps would be fine (and implicitly handled by a token hook, because the client app would receive the refresh token directly rather than going through libpq). Delegating to end users, not so much. Printing a refresh token to stderr as proposed here is, I think, making things unnecessarily difficult (and/or dangerous) for users. >> 3) I don't like the departure from the OAUTHBEARER mechanism that's >> presented here. For one, since I can't see a sample plugin that makes >> use of the "flow type" magic numbers that have been added, I don't >> really understand why the extension to the mechanism is necessary. > I don't think it's much of a departure, but rather a separation of > responsibilities between libpq and upstream clients. Given the proposed architectures above, 1) I think this is further coupling the components, not separating them; and 2) I can't agree that an incompatible discovery mechanism is "not much of a departure". If OAUTHBEARER's functionality isn't good enough for some reason, let's talk about why. > As libpq can be used in different apps, the client would need > different types of flows/grants. > I.e. those need to be provided to libpq at connection initialization > or some other point. Why do libpq (or the server!) need to know those things at all, if they're not going to implement the flow? > We will change to "grant_type" though and use string to be closer to the spec. > What do you think is the best way for the client to signal which OAUTH > flow should be used? libpq should not need to know the grant type in use if the client is bypassing its internal implementation entirely. Thanks, --Jacob
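To make the "implement get_auth_token()" description above a bit more concrete, here is a rough sketch of what such a client hook could look like, with a cached refresh token used as an accelerator rather than as its own user-facing flow. The hook signature and the obtain_token_interactively() helper are hypothetical; only the refresh grant itself follows the OAuth spec.

    # Sketch of a hypothetical client-side token hook: given the discovery
    # information carried in the server's OAUTHBEARER error result, hand back
    # an access token for libpq to send. A cached refresh token short-circuits
    # the interactive flow and is never shown to the user.
    import json
    import urllib.parse
    import urllib.request

    TOKEN_CACHE = {}  # stand-in for a platform-specific credential store

    def refresh_access_token(token_endpoint, client_id, refresh_token):
        body = urllib.parse.urlencode({
            "grant_type": "refresh_token",
            "refresh_token": refresh_token,
            "client_id": client_id,
        }).encode()
        with urllib.request.urlopen(token_endpoint, data=body) as resp:
            return json.load(resp)

    def obtain_token_interactively(discovery, client_id):
        # Placeholder: a real application would run a device or
        # authorization-code flow here (RFC 8628 / RFC 6749).
        raise NotImplementedError("no cached refresh token; run a full flow")

    def get_auth_token(discovery, client_id):
        # Hypothetical hook body: return a bearer token for libpq to send.
        cached = TOKEN_CACHE.get("refresh_token")
        if cached:
            grant = refresh_access_token(discovery["token_endpoint"],
                                         client_id, cached)
        else:
            grant = obtain_token_interactively(discovery, client_id)
        TOKEN_CACHE["refresh_token"] = grant.get("refresh_token", cached)
        return grant["access_token"]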
On 11/24/22 00:20, mahendrakar s wrote: > I had validated Github by skipping the discovery mechanism and letting > the provider extension pass on the endpoints. This is just for > validation purposes. > If it needs to be supported, then need a way to send the discovery > document from extension. Yeah. I had originally bounced around the idea that we could send a data:// URL, but I think that opens up problems. You're supposed to be able to link the issuer URI with the URI you got the configuration from, and if they're different, you bail out. If a server makes up its own OpenID configuration, we'd have to bypass that safety check, and decide what the risks and mitigations are... Not sure it's worth it. Especially if you could just lobby GitHub to, say, provide an OpenID config. (Maybe there's a security-related reason they don't.) --Jacob
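A minimal sketch of the safety check described above (the OIDC Discovery rule that the issuer value inside the document must match where the configuration was fetched from), assuming a plain HTTPS issuer:

    # Sketch: verify that an OpenID configuration actually belongs to the
    # issuer we asked about. If a server made up its own document (or sent a
    # data:// URL), this is the check that would have to be bypassed.
    import json
    import urllib.request

    def fetch_and_check(issuer):
        config_url = issuer.rstrip("/") + "/.well-known/openid-configuration"
        with urllib.request.urlopen(config_url) as resp:
            config = json.load(resp)
        if config.get("issuer", "").rstrip("/") != issuer.rstrip("/"):
            raise ValueError("issuer mismatch: document says %r, but the "
                             "configuration came from %r"
                             % (config.get("issuer"), issuer))
        return config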
Jacob, Thanks for your feedback. I think we can focus on the roles and responsibilities of the components first. Details of the patch can be elaborated. Like "flow type code" is a mistake on our side, and we will use the term "grant_type" which is defined by OIDC spec. As well as details of usage of refresh_token. > Rather than decouple things, I think this proposal drives a spike > through the client app, libpq, and the server. Please correct me if I've > misunderstood pieces of the patch, but the following is my view of it: > What used to be a validator hook on the server side now actively > participates in the client-side flow for some reason. (I still don't > understand what the server is supposed to do with that knowledge. > Changing your authz requirements based on the flow the client wants to > use seems like a good way to introduce bugs.) > The client-side hook is now coupled to the application logic: you have > to know to expect an error from the first PQconnect*() call, then check > whatever magic your hook has done for you to be able to set up the > second call to PQconnect*() with the correctly scoped bearer token. So > if you want to switch between the internal libpq OAuth implementation > and your own hook, you have to rewrite your app logic. Basically Yes. We propose an increase of the server side hook responsibility. From just validating the token, to also return the provider root URL and required audience. And possibly provide more metadata in the future. Which is in our opinion aligned with SASL protocol, where the server side is responsible for telling the client auth requirements based on the requested role in the startup packet. Our understanding is that in the original patch that information came purely from hba, and we propose extension being able to control that metadata. As we see extension as being owned by the identity provider, compared to HBA which is owned by the server administrator or cloud provider. This change of the roles is based on the vision of 4 independent actor types in the ecosystem: 1. Identity Providers (Okta, Google, Microsoft, other OIDC providers). - Publish open source extensions for PostgreSQL. - Don't have to own the server deployments, and must ensure their extensions can work in any environment. This is where we think additional hook responsibility helps. 2. Server Owners / PAAS providers (On premise admins, Cloud providers, multi-cloud PAAS providers). - Install extensions and configure HBA to allow clients to authenticate with the identity providers of their choice. 3. Client Application Developers (Data Wis, integration tools, PgAdmin, monitoring tools, e.t.c.) - Independent from specific Identity providers or server providers. Write one code for all identity providers. - Rely on application deployment owners to configure which OIDC provider to use across client and server setups. 4. Application Deployment Owners (End customers setting up applications) - The only actor actually aware of which identity provider to use. Configures the stack based on the Identity and PostgreSQL deployments they have. The critical piece of the vision is (3.) above is applications agnostic of the identity providers. Those applications rely on properly configured servers and rich driver logic (libpq, com.postgresql, npgsql) to allow their application to popup auth windows or do service-to-service authentication with any provider. In our view that would significantly democratize the deployment of OAUTH authentication in the community. 
In order to allow this separation, we propose: 1. HBA + Extension is the single source of truth of Provider root URL + Required Audience for each role. If some backfill for missing OIDC discovery is needed, the provider-specific extension would be providing it. 2. Client Application knows which grant_type to use in which scenario. But can be coded without knowledge of a specific provider. So can't provide discovery details. 3. Driver (libpq, others) - coordinate the authentication flow based on client grant_type and identity provider metadata to allow client applications to use any flow with any provider in a unified way. Yes, this would require a little more complicated flow between components than in your original patch. And yes, more complexity comes with more opportunity to make bugs. However, I see PG Server and Libpq as the places which can have more complexity. For the purpose of making work for the community participants easier and simplify adoption. Does this make sense to you? On Tue, Nov 29, 2022 at 1:20 PM Jacob Champion <jchampion@timescale.com> wrote: > > On 11/24/22 00:20, mahendrakar s wrote: > > I had validated Github by skipping the discovery mechanism and letting > > the provider extension pass on the endpoints. This is just for > > validation purposes. > > If it needs to be supported, then need a way to send the discovery > > document from extension. > > Yeah. I had originally bounced around the idea that we could send a > data:// URL, but I think that opens up problems. > > You're supposed to be able to link the issuer URI with the URI you got > the configuration from, and if they're different, you bail out. If a > server makes up its own OpenID configuration, we'd have to bypass that > safety check, and decide what the risks and mitigations are... Not sure > it's worth it. > > Especially if you could just lobby GitHub to, say, provide an OpenID > config. (Maybe there's a security-related reason they don't.) > > --Jacob
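As a rough illustration of point 1 above -- the server side owning the provider root URL and required audience -- a validator might check those two claims on a JWT-style access token as sketched below. The audience value is a made-up placeholder, and the unverified decode is for illustration only; a real validator has to verify the token signature against the issuer's published keys before trusting any claim.

    # Sketch: check the issuer and audience claims of a JWT-style access token
    # against values configured through HBA/extension. Signature verification
    # is deliberately omitted here and flagged as required.
    import base64
    import json

    REQUIRED_ISSUER = "https://accounts.google.com"   # from HBA/extension
    REQUIRED_AUDIENCE = "my-postgres-resource-id"     # hypothetical value

    def decode_claims_unverified(token):
        # WARNING: no signature verification -- illustration only. A real
        # validator must verify the token against the issuer's JWKS first.
        payload = token.split(".")[1]
        payload += "=" * (-len(payload) % 4)          # restore base64 padding
        return json.loads(base64.urlsafe_b64decode(payload))

    def check_issuer_and_audience(token):
        claims = decode_claims_unverified(token)
        if claims.get("iss") != REQUIRED_ISSUER:
            return False
        aud = claims.get("aud")
        auds = aud if isinstance(aud, list) else [aud]
        return REQUIRED_AUDIENCE in auds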
On Mon, Dec 5, 2022 at 4:15 PM Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > I think we can focus on the roles and responsibilities of the components first. > Details of the patch can be elaborated. Like "flow type code" is a > mistake on our side, and we will use the term "grant_type" which is > defined by OIDC spec. As well as details of usage of refresh_token. (For the record, whether we call it "flow type" or "grant type" doesn't address my concern.) > Basically Yes. We propose an increase of the server side hook responsibility. > From just validating the token, to also return the provider root URL > and required audience. And possibly provide more metadata in the > future. I think it's okay to have the extension and HBA collaborate to provide discovery information. Your proposal goes further than that, though, and makes the server aware of the chosen client flow. That appears to be an architectural violation: why does an OAuth resource server need to know the client flow at all? > Which is in our opinion aligned with SASL protocol, where the server > side is responsible for telling the client auth requirements based on > the requested role in the startup packet. You've proposed an alternative SASL mechanism. There's nothing wrong with that, per se, but I think it should be clear why we've chosen something nonstandard. > Our understanding is that in the original patch that information came > purely from hba, and we propose extension being able to control that > metadata. > As we see extension as being owned by the identity provider, compared > to HBA which is owned by the server administrator or cloud provider. That seems reasonable, considering how tightly coupled the Issuer and the token validation process are. > 2. Server Owners / PAAS providers (On premise admins, Cloud providers, > multi-cloud PAAS providers). > - Install extensions and configure HBA to allow clients to > authenticate with the identity providers of their choice. (For a future conversation: they need to set up authorization, too, with custom scopes or some other magic. It's not enough to check who the token belongs to; even if Postgres is just using the verified email from OpenID as an authenticator, you have to also know that the user authorized the token -- and therefore the client -- to access Postgres on their behalf.) > 3. Client Application Developers (Data Wis, integration tools, > PgAdmin, monitoring tools, e.t.c.) > - Independent from specific Identity providers or server providers. > Write one code for all identity providers. Ideally, yes, but that only works if all identity providers implement the same flows in compatible ways. We're already seeing instances where that's not the case and we'll necessarily have to deal with that up front. > - Rely on application deployment owners to configure which OIDC > provider to use across client and server setups. > 4. Application Deployment Owners (End customers setting up applications) > - The only actor actually aware of which identity provider to use. > Configures the stack based on the Identity and PostgreSQL deployments > they have. (I have doubts that the roles will be as decoupled in practice as you have described them, but I'd rather defer that for now.) > The critical piece of the vision is (3.) above is applications > agnostic of the identity providers. 
Those applications rely on > properly configured servers and rich driver logic (libpq, > com.postgresql, npgsql) to allow their application to popup auth > windows or do service-to-service authentication with any provider. In > our view that would significantly democratize the deployment of OAUTH > authentication in the community. That seems to be restating the goal of OAuth and OIDC. Can you explain how the incompatible change allows you to accomplish this better than standard implementations? > In order to allow this separation, we propose: > 1. HBA + Extension is the single source of truth of Provider root URL > + Required Audience for each role. If some backfill for missing OIDC > discovery is needed, the provider-specific extension would be > providing it. > 2. Client Application knows which grant_type to use in which scenario. > But can be coded without knowledge of a specific provider. So can't > provide discovery details. > 3. Driver (libpq, others) - coordinate the authentication flow based > on client grant_type and identity provider metadata to allow client > applications to use any flow with any provider in a unified way. > > Yes, this would require a little more complicated flow between > components than in your original patch. Why? I claim that standard OAUTHBEARER can handle all of that. What does your proposed architecture (the third diagram) enable that my proposed hook (the second diagram) doesn't? > And yes, more complexity comes > with more opportunity to make bugs. > However, I see PG Server and Libpq as the places which can have more > complexity. For the purpose of making work for the community > participants easier and simplify adoption. > > Does this make sense to you? Some of it, but it hasn't really addressed the questions from my last mail. Thanks, --Jacob
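To sketch the authorization point in the parenthetical above: beyond knowing who the token belongs to, the validator also has to see that the token was actually granted a scope covering Postgres. The introspection response shape below follows RFC 7662; the scope name is a made-up example.

    # Sketch: given an RFC 7662 token-introspection response, require both
    # that the token is active and that it carries a scope authorizing
    # database access. "postgres:connect" is a hypothetical scope name.
    REQUIRED_SCOPE = "postgres:connect"

    def authorizes_postgres(introspection):
        if not introspection.get("active", False):
            return False
        granted = introspection.get("scope", "").split()
        return REQUIRED_SCOPE in granted

    # authorizes_postgres({"active": True, "sub": "1234",
    #                      "scope": "openid email postgres:connect"})  # True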
> I think it's okay to have the extension and HBA collaborate to provide > discovery information. Your proposal goes further than that, though, > and makes the server aware of the chosen client flow. That appears to > be an architectural violation: why does an OAuth resource server need > to know the client flow at all? Ok. It may have left there from intermediate iterations. We did consider making extension drive the flow for specific grant_type, but decided against that idea. For the same reason you point to. Is it correct that your main concern about use of grant_type was that it's propagated to the server? Then yes, we will remove sending it to the server. > Ideally, yes, but that only works if all identity providers implement > the same flows in compatible ways. We're already seeing instances > where that's not the case and we'll necessarily have to deal with that > up front. Yes, based on our analysis OIDC spec is detailed enough, that providers implementing that one, can be supported with generic code in libpq / client. Github specifically won't fit there though. Microsoft Azure AD, Google, Okta (including Auth0) will. Theoretically discovery documents can be returned from the extension (server-side) which is provider specific. Though we didn't plan to prioritize that. > That seems to be restating the goal of OAuth and OIDC. Can you explain > how the incompatible change allows you to accomplish this better than > standard implementations? Do you refer to passing grant_type to the server? Which we will get rid of in the next iteration. Or other incompatible changes as well? > Why? I claim that standard OAUTHBEARER can handle all of that. What > does your proposed architecture (the third diagram) enable that my > proposed hook (the second diagram) doesn't? The hook proposed on the 2nd diagram effectively delegates all Oauth flows implementations to the client. We propose libpq takes care of pulling OpenId discovery and coordination. Which is effectively Diagram 1 + more flows + server hook providing root url/audience. Created the diagrams with all components for 3 flows: 1. Authorization code grant (Clients with Browser access): +----------------------+ +----------+ | +-------+ | | | PQconnect | | | | | [auth_code] | | | +-----------+ | -> | | -------------- Empty Token ------------> | > | | | | libpq | <----- Error(w\ Root URL + Audience ) -- | < | Pre-Auth | | | | | | Hook | | | | | +-----------+ | | | +--------------+ | | | | | -------[GET]---------> | OIDC | | Postgres | | +------+ | <--Provider Metadata-- | Discovery | | | | +- < | Hook | < | +--------------+ | | | | +------+ | | | | v | | | | | [get auth | | | | | code] | | | | |<user action>| | | | | | | | | | | + | | | | | PQconnect > | +--------+ +--------------+ | | | | | iddawc | <-- [ Auth code ]-> | Issuer/ | | | | | | | <-- Access Token -- | Authz Server | | | | | +--------+ +--------------+ | | | | | | +-----------+ | | | -------------- Access Token -----------> | > | Validator | | | | <---- Authorization Success/Failure ---- | < | Hook | | +------+ | | +-----------+ | +-< | Hook | | | | | v +------+ | | | |[store +-------+ | | | refresh_token] +----------+ +----------------------+ 2. 
Device code grant +----------------------+ +----------+ | +-------+ | | | PQconnect | | | | | [auth_code] | | | +-----------+ | -> | | -------------- Empty Token ------------> | > | | | | libpq | <----- Error(w\ Root URL + Audience ) -- | < | Pre-Auth | | | | | | Hook | | | | | +-----------+ | | | +--------------+ | | | | | -------[GET]---------> | OIDC | | Postgres | | +------+ | <--Provider Metadata-- | Discovery | | | | +- < | Hook | < | +--------------+ | | | | +------+ | | | | v | | | | | [device | +---------+ +--------------+ | | | code] | | iddawc | | Issuer/ | | | |<user action>| | | --[ Device code ]-> | Authz Server | | | | | |<polling>| --[ Device code ]-> | | | | | | | | --[ Device code ]-> | | | | | | | | | | | | | | | | <-- Access Token -- | | | | | | +---------+ +--------------+ | | | | | | +-----------+ | | | -------------- Access Token -----------> | > | Validator | | | | <---- Authorization Success/Failure ---- | < | Hook | | +------+ | | +-----------+ | +-< | Hook | | | | | v +------+ | | | |[store +-------+ | | | refresh_token] +----------+ +----------------------+ 3. Non-interactive flows (Client Secret / Refresh_Token) +----------------------+ +----------+ | +-------+ | | | PQconnect | | | | | [grant_type]| | | | | -> | | | +-----------+ | | | -------------- Empty Token ------------> | > | | | | libpq | <----- Error(w\ Root URL + Audience ) -- | < | Pre-Auth | | | | | | Hook | | | | | +-----------+ | | | +--------------+ | | | | | -------[GET]---------> | OIDC | | Postgres | | | | <--Provider Metadata-- | Discovery | | | | | | +--------------+ | | | | | | | | | +--------+ +--------------+ | | | | | iddawc | <-- [ Secret ]----> | Issuer/ | | | | | | | <-- Access Token -- | Authz Server | | | | | +--------+ +--------------+ | | | | | | +-----------+ | | | -------------- Access Token -----------> | > | Validator | | | | <---- Authorization Success/Failure ---- | < | Hook | | | | | +-----------+ | +-------+ +----------+ +----------------------+ I think what was the most confusing in our latest patch is that flow_type was passed to the server. We are not proposing this going forward. > (For a future conversation: they need to set up authorization, too, > with custom scopes or some other magic. It's not enough to check who > the token belongs to; even if Postgres is just using the verified > email from OpenID as an authenticator, you have to also know that the > user authorized the token -- and therefore the client -- to access > Postgres on their behalf.) My understanding is that metadata in the tokens is provider specific, so server side hook would be the right place to handle that. Plus I can envision for some providers it can make sense to make a remote call to pull some information. The way we implement Azure AD auth today in PAAS PostgreSQL offering: - Server administrator uses special extension functions to create Azure AD enabled PostgreSQL roles. - PostgreSQL extension maps Roles to unique identity Ids (UID) in the Directory. - Connection flow: If the token is valid and Role => UID mapping matches, we authenticate as the Role. - Then its native PostgreSQL role based access control takes care of privileges. This is the same for both User- and System-to-system authorization. Though I assume different providers may treat user- and system- identities differently. So their extension would handle that. Thanks! Andrey. 
On Wed, Dec 7, 2022 at 11:06 AM Jacob Champion <jchampion@timescale.com> wrote: > > On Mon, Dec 5, 2022 at 4:15 PM Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > > I think we can focus on the roles and responsibilities of the components first. > > Details of the patch can be elaborated. Like "flow type code" is a > > mistake on our side, and we will use the term "grant_type" which is > > defined by OIDC spec. As well as details of usage of refresh_token. > > (For the record, whether we call it "flow type" or "grant type" > doesn't address my concern.) > > > Basically Yes. We propose an increase of the server side hook responsibility. > > From just validating the token, to also return the provider root URL > > and required audience. And possibly provide more metadata in the > > future. > > I think it's okay to have the extension and HBA collaborate to provide > discovery information. Your proposal goes further than that, though, > and makes the server aware of the chosen client flow. That appears to > be an architectural violation: why does an OAuth resource server need > to know the client flow at all? > > > Which is in our opinion aligned with SASL protocol, where the server > > side is responsible for telling the client auth requirements based on > > the requested role in the startup packet. > > You've proposed an alternative SASL mechanism. There's nothing wrong > with that, per se, but I think it should be clear why we've chosen > something nonstandard. > > > Our understanding is that in the original patch that information came > > purely from hba, and we propose extension being able to control that > > metadata. > > As we see extension as being owned by the identity provider, compared > > to HBA which is owned by the server administrator or cloud provider. > > That seems reasonable, considering how tightly coupled the Issuer and > the token validation process are. > > > 2. Server Owners / PAAS providers (On premise admins, Cloud providers, > > multi-cloud PAAS providers). > > - Install extensions and configure HBA to allow clients to > > authenticate with the identity providers of their choice. > > (For a future conversation: they need to set up authorization, too, > with custom scopes or some other magic. It's not enough to check who > the token belongs to; even if Postgres is just using the verified > email from OpenID as an authenticator, you have to also know that the > user authorized the token -- and therefore the client -- to access > Postgres on their behalf.) > > > 3. Client Application Developers (Data Wis, integration tools, > > PgAdmin, monitoring tools, e.t.c.) > > - Independent from specific Identity providers or server providers. > > Write one code for all identity providers. > > Ideally, yes, but that only works if all identity providers implement > the same flows in compatible ways. We're already seeing instances > where that's not the case and we'll necessarily have to deal with that > up front. > > > - Rely on application deployment owners to configure which OIDC > > provider to use across client and server setups. > > 4. Application Deployment Owners (End customers setting up applications) > > - The only actor actually aware of which identity provider to use. > > Configures the stack based on the Identity and PostgreSQL deployments > > they have. > > (I have doubts that the roles will be as decoupled in practice as you > have described them, but I'd rather defer that for now.) > > > The critical piece of the vision is (3.) 
above is applications > > agnostic of the identity providers. Those applications rely on > > properly configured servers and rich driver logic (libpq, > > com.postgresql, npgsql) to allow their application to popup auth > > windows or do service-to-service authentication with any provider. In > > our view that would significantly democratize the deployment of OAUTH > > authentication in the community. > > That seems to be restating the goal of OAuth and OIDC. Can you explain > how the incompatible change allows you to accomplish this better than > standard implementations? > > > In order to allow this separation, we propose: > > 1. HBA + Extension is the single source of truth of Provider root URL > > + Required Audience for each role. If some backfill for missing OIDC > > discovery is needed, the provider-specific extension would be > > providing it. > > 2. Client Application knows which grant_type to use in which scenario. > > But can be coded without knowledge of a specific provider. So can't > > provide discovery details. > > 3. Driver (libpq, others) - coordinate the authentication flow based > > on client grant_type and identity provider metadata to allow client > > applications to use any flow with any provider in a unified way. > > > > Yes, this would require a little more complicated flow between > > components than in your original patch. > > Why? I claim that standard OAUTHBEARER can handle all of that. What > does your proposed architecture (the third diagram) enable that my > proposed hook (the second diagram) doesn't? > > > And yes, more complexity comes > > with more opportunity to make bugs. > > However, I see PG Server and Libpq as the places which can have more > > complexity. For the purpose of making work for the community > > participants easier and simplify adoption. > > > > Does this make sense to you? > > Some of it, but it hasn't really addressed the questions from my last mail. > > Thanks, > --Jacob
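For reference alongside diagram 2 above, a bare-bones Device Authorization grant (RFC 8628) against a generic OIDC provider might look like the sketch below. The endpoints come from the discovery document, the client_id is whatever was registered with the issuer, and handling of slow_down/expired_token errors is omitted.

    # Sketch: RFC 8628 device authorization grant. Request a device code,
    # show the user the verification URI, then poll the token endpoint
    # until the user finishes logging in at the issuer.
    import json
    import time
    import urllib.error
    import urllib.parse
    import urllib.request

    def post_form(url, fields):
        data = urllib.parse.urlencode(fields).encode()
        with urllib.request.urlopen(url, data=data) as resp:
            return json.load(resp)

    def device_flow(device_endpoint, token_endpoint, client_id, scope):
        dev = post_form(device_endpoint,
                        {"client_id": client_id, "scope": scope})
        print("Visit %s and enter the code: %s"
              % (dev["verification_uri"], dev["user_code"]))

        while True:
            time.sleep(dev.get("interval", 5))
            try:
                grant = post_form(token_endpoint, {
                    "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
                    "device_code": dev["device_code"],
                    "client_id": client_id,
                })
                return grant["access_token"]
            except urllib.error.HTTPError as err:
                if json.load(err).get("error") != "authorization_pending":
                    raise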
That being said, the Diagram 2 would look like this with our proposal: +----------------------+ +----------+ | +-------+ | Postgres | | PQconnect ->| | | | | | | | +-----------+ | | | -------------- Empty Token ------------> | > | | | | libpq | <----- Error(w\ Root URL + Audience ) -- | < | Pre-Auth | | +------+ | | | Hook | | +- < | Hook | | | +-----------+ | | +------+ | | | | v | | | | | [get token]| | | | | | | | | | | + | | | +-----------+ | PQconnect > | | -------------- Access Token -----------> | > | Validator | | | | <---- Authorization Success/Failure ---- | < | Hook | | | | | +-----------+ | +-------+ | | +----------------------+ +----------+ With the application taking care of all Token acquisition logic. While the server-side hook is participating in the pre-authentication reply. That is definitely a required scenario for the long term and the easiest to implement in the client core. And if we can do at least that flow in PG16 it will be a strong foundation to provide more support for specific grants in libpq going forward. Does the diagram above look good to you? We can then start cleaning up the patch to get that in first. Thanks! Andrey. On Wed, Dec 7, 2022 at 3:22 PM Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > > > I think it's okay to have the extension and HBA collaborate to provide > > discovery information. Your proposal goes further than that, though, > > and makes the server aware of the chosen client flow. That appears to > > be an architectural violation: why does an OAuth resource server need > > to know the client flow at all? > > Ok. It may have left there from intermediate iterations. We did > consider making extension drive the flow for specific grant_type, but > decided against that idea. For the same reason you point to. > Is it correct that your main concern about use of grant_type was that > it's propagated to the server? Then yes, we will remove sending it to > the server. > > > Ideally, yes, but that only works if all identity providers implement > > the same flows in compatible ways. We're already seeing instances > > where that's not the case and we'll necessarily have to deal with that > > up front. > > Yes, based on our analysis OIDC spec is detailed enough, that > providers implementing that one, can be supported with generic code in > libpq / client. > Github specifically won't fit there though. Microsoft Azure AD, > Google, Okta (including Auth0) will. > Theoretically discovery documents can be returned from the extension > (server-side) which is provider specific. Though we didn't plan to > prioritize that. > > > That seems to be restating the goal of OAuth and OIDC. Can you explain > > how the incompatible change allows you to accomplish this better than > > standard implementations? > > Do you refer to passing grant_type to the server? Which we will get > rid of in the next iteration. Or other incompatible changes as well? > > > Why? I claim that standard OAUTHBEARER can handle all of that. What > > does your proposed architecture (the third diagram) enable that my > > proposed hook (the second diagram) doesn't? > > The hook proposed on the 2nd diagram effectively delegates all Oauth > flows implementations to the client. > We propose libpq takes care of pulling OpenId discovery and coordination. > Which is effectively Diagram 1 + more flows + server hook providing > root url/audience. > > Created the diagrams with all components for 3 flows: > 1. 
Authorization code grant (Clients with Browser access): [quoted diagram snipped; see the previous message] > > 2. Device code grant [quoted diagram snipped] > > 3. Non-interactive flows (Client Secret / Refresh_Token) [quoted diagram snipped] > > I think what was the most confusing in our latest patch is that > flow_type was passed to the server. > We are not proposing this going forward.
> > > (For a future conversation: they need to set up authorization, too, > > with custom scopes or some other magic. It's not enough to check who > > the token belongs to; even if Postgres is just using the verified > > email from OpenID as an authenticator, you have to also know that the > > user authorized the token -- and therefore the client -- to access > > Postgres on their behalf.) > > My understanding is that metadata in the tokens is provider specific, > so server side hook would be the right place to handle that. > Plus I can envision for some providers it can make sense to make a > remote call to pull some information. > > The way we implement Azure AD auth today in PAAS PostgreSQL offering: > - Server administrator uses special extension functions to create > Azure AD enabled PostgreSQL roles. > - PostgreSQL extension maps Roles to unique identity Ids (UID) in the Directory. > - Connection flow: If the token is valid and Role => UID mapping > matches, we authenticate as the Role. > - Then its native PostgreSQL role based access control takes care of privileges. > > This is the same for both User- and System-to-system authorization. > Though I assume different providers may treat user- and system- > identities differently. So their extension would handle that. > > Thanks! > Andrey. > > On Wed, Dec 7, 2022 at 11:06 AM Jacob Champion <jchampion@timescale.com> wrote: > > > > On Mon, Dec 5, 2022 at 4:15 PM Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > > > I think we can focus on the roles and responsibilities of the components first. > > > Details of the patch can be elaborated. Like "flow type code" is a > > > mistake on our side, and we will use the term "grant_type" which is > > > defined by OIDC spec. As well as details of usage of refresh_token. > > > > (For the record, whether we call it "flow type" or "grant type" > > doesn't address my concern.) > > > > > Basically Yes. We propose an increase of the server side hook responsibility. > > > From just validating the token, to also return the provider root URL > > > and required audience. And possibly provide more metadata in the > > > future. > > > > I think it's okay to have the extension and HBA collaborate to provide > > discovery information. Your proposal goes further than that, though, > > and makes the server aware of the chosen client flow. That appears to > > be an architectural violation: why does an OAuth resource server need > > to know the client flow at all? > > > > > Which is in our opinion aligned with SASL protocol, where the server > > > side is responsible for telling the client auth requirements based on > > > the requested role in the startup packet. > > > > You've proposed an alternative SASL mechanism. There's nothing wrong > > with that, per se, but I think it should be clear why we've chosen > > something nonstandard. > > > > > Our understanding is that in the original patch that information came > > > purely from hba, and we propose extension being able to control that > > > metadata. > > > As we see extension as being owned by the identity provider, compared > > > to HBA which is owned by the server administrator or cloud provider. > > > > That seems reasonable, considering how tightly coupled the Issuer and > > the token validation process are. > > > > > 2. Server Owners / PAAS providers (On premise admins, Cloud providers, > > > multi-cloud PAAS providers). > > > - Install extensions and configure HBA to allow clients to > > > authenticate with the identity providers of their choice. 
> > > > (For a future conversation: they need to set up authorization, too, > > with custom scopes or some other magic. It's not enough to check who > > the token belongs to; even if Postgres is just using the verified > > email from OpenID as an authenticator, you have to also know that the > > user authorized the token -- and therefore the client -- to access > > Postgres on their behalf.) > > > > > 3. Client Application Developers (Data Wis, integration tools, > > > PgAdmin, monitoring tools, e.t.c.) > > > - Independent from specific Identity providers or server providers. > > > Write one code for all identity providers. > > > > Ideally, yes, but that only works if all identity providers implement > > the same flows in compatible ways. We're already seeing instances > > where that's not the case and we'll necessarily have to deal with that > > up front. > > > > > - Rely on application deployment owners to configure which OIDC > > > provider to use across client and server setups. > > > 4. Application Deployment Owners (End customers setting up applications) > > > - The only actor actually aware of which identity provider to use. > > > Configures the stack based on the Identity and PostgreSQL deployments > > > they have. > > > > (I have doubts that the roles will be as decoupled in practice as you > > have described them, but I'd rather defer that for now.) > > > > > The critical piece of the vision is (3.) above is applications > > > agnostic of the identity providers. Those applications rely on > > > properly configured servers and rich driver logic (libpq, > > > com.postgresql, npgsql) to allow their application to popup auth > > > windows or do service-to-service authentication with any provider. In > > > our view that would significantly democratize the deployment of OAUTH > > > authentication in the community. > > > > That seems to be restating the goal of OAuth and OIDC. Can you explain > > how the incompatible change allows you to accomplish this better than > > standard implementations? > > > > > In order to allow this separation, we propose: > > > 1. HBA + Extension is the single source of truth of Provider root URL > > > + Required Audience for each role. If some backfill for missing OIDC > > > discovery is needed, the provider-specific extension would be > > > providing it. > > > 2. Client Application knows which grant_type to use in which scenario. > > > But can be coded without knowledge of a specific provider. So can't > > > provide discovery details. > > > 3. Driver (libpq, others) - coordinate the authentication flow based > > > on client grant_type and identity provider metadata to allow client > > > applications to use any flow with any provider in a unified way. > > > > > > Yes, this would require a little more complicated flow between > > > components than in your original patch. > > > > Why? I claim that standard OAUTHBEARER can handle all of that. What > > does your proposed architecture (the third diagram) enable that my > > proposed hook (the second diagram) doesn't? > > > > > And yes, more complexity comes > > > with more opportunity to make bugs. > > > However, I see PG Server and Libpq as the places which can have more > > > complexity. For the purpose of making work for the community > > > participants easier and simplify adoption. > > > > > > Does this make sense to you? > > > > Some of it, but it hasn't really addressed the questions from my last mail. > > > > Thanks, > > --Jacob
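Diagram 3's non-interactive path is the simplest of the three; a minimal client-credentials exchange, with placeholder endpoint and credentials, is sketched below. A refresh_token grant is the same POST with grant_type=refresh_token and a stored refresh token in place of the secret.

    # Sketch: OAuth 2.0 client credentials grant (service-to-service). The
    # token endpoint comes from the provider's discovery document; the client
    # ID and secret are placeholders registered with the issuer.
    import json
    import urllib.parse
    import urllib.request

    def client_credentials_token(token_endpoint, client_id, client_secret, scope):
        body = urllib.parse.urlencode({
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        }).encode()
        with urllib.request.urlopen(token_endpoint, data=body) as resp:
            return json.load(resp)["access_token"]

    # token = client_credentials_token(
    #     "https://issuer.example.org/oauth2/token",
    #     "my-client-id", "my-client-secret", "postgres:connect")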
On Wed, Dec 7, 2022 at 3:22 PM Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > >> I think it's okay to have the extension and HBA collaborate to >> provide discovery information. Your proposal goes further than >> that, though, and makes the server aware of the chosen client flow. >> That appears to be an architectural violation: why does an OAuth >> resource server need to know the client flow at all? > > Ok. It may have left there from intermediate iterations. We did > consider making extension drive the flow for specific grant_type, > but decided against that idea. For the same reason you point to. Is > it correct that your main concern about use of grant_type was that > it's propagated to the server? Then yes, we will remove sending it > to the server. Okay. Yes, that was my primary concern. >> Ideally, yes, but that only works if all identity providers >> implement the same flows in compatible ways. We're already seeing >> instances where that's not the case and we'll necessarily have to >> deal with that up front. > > Yes, based on our analysis OIDC spec is detailed enough, that > providers implementing that one, can be supported with generic code > in libpq / client. Github specifically won't fit there though. > Microsoft Azure AD, Google, Okta (including Auth0) will. > Theoretically discovery documents can be returned from the extension > (server-side) which is provider specific. Though we didn't plan to > prioritize that. As another example, Google's device authorization grant is incompatible with the spec (which they co-authored). I want to say I had problems with Azure AD not following that spec either, but I don't remember exactly what they were. I wouldn't be surprised to find more tiny departures once we get deeper into implementation. >> That seems to be restating the goal of OAuth and OIDC. Can you >> explain how the incompatible change allows you to accomplish this >> better than standard implementations? > > Do you refer to passing grant_type to the server? Which we will get > rid of in the next iteration. Or other incompatible changes as well? Just the grant type, yeah. >> Why? I claim that standard OAUTHBEARER can handle all of that. >> What does your proposed architecture (the third diagram) enable >> that my proposed hook (the second diagram) doesn't? > > The hook proposed on the 2nd diagram effectively delegates all Oauth > flows implementations to the client. We propose libpq takes care of > pulling OpenId discovery and coordination. Which is effectively > Diagram 1 + more flows + server hook providing root url/audience. > > Created the diagrams with all components for 3 flows: [snip] (I'll skip ahead to your later mail on this.) >> (For a future conversation: they need to set up authorization, >> too, with custom scopes or some other magic. It's not enough to >> check who the token belongs to; even if Postgres is just using the >> verified email from OpenID as an authenticator, you have to also >> know that the user authorized the token -- and therefore the client >> -- to access Postgres on their behalf.) > > My understanding is that metadata in the tokens is provider > specific, so server side hook would be the right place to handle > that. Plus I can envision for some providers it can make sense to > make a remote call to pull some information. The server hook is the right place to check the scopes, yes, but I think the DBA should be able to specify what those scopes are to begin with. 
The provider of the extension shouldn't be expected by the architecture to hardcode those decisions, even if Azure AD chooses to short-circuit that choice and provide magic instead. On 12/7/22 20:25, Andrey Chudnovsky wrote: > That being said, the Diagram 2 would look like this with our proposal: > [snip] > > With the application taking care of all Token acquisition logic. While > the server-side hook is participating in the pre-authentication reply. > > That is definitely a required scenario for the long term and the > easiest to implement in the client core.> And if we can do at least that flow in PG16 it will be a strong > foundation to provide more support for specific grants in libpq going > forward. Agreed. > Does the diagram above look good to you? We can then start cleaning up > the patch to get that in first. I maintain that the hook doesn't need to hand back artifacts to the client for a second PQconnect call. It can just use those artifacts to obtain the access token and hand that right back to libpq. (I think any requirement that clients be rewritten to call PQconnect twice will probably be a sticking point for adoption of an OAuth patch.) That said, now that your proposal is also compatible with OAUTHBEARER, I can pony up some code to hopefully prove my point. (I don't know if I'll be able to do that by the holidays though.) Thanks! --Jacob
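As one concrete example of such a departure (a commonly reported one, not something pinned down in this thread): Google's device endpoint returns verification_url where RFC 8628 specifies verification_uri, so a client that wants to cover both ends up with defensive handling along these lines:

    # Sketch: tolerate a known naming deviation in device-authorization
    # responses (verification_uri per RFC 8628 vs. verification_url as
    # returned by some providers). Purely defensive client-side handling.
    def verification_uri(device_response):
        uri = (device_response.get("verification_uri")
               or device_response.get("verification_url"))
        if uri is None:
            raise KeyError("device response carries no verification URI")
        return uri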
> The server hook is the right place to check the scopes, yes, but I think > the DBA should be able to specify what those scopes are to begin with. > The provider of the extension shouldn't be expected by the architecture > to hardcode those decisions, even if Azure AD chooses to short-circuit > that choice and provide magic instead. Hardcode is definitely not expected, but customization for identity provider specific, I think, should be allowed. I can provide a couple of advanced use cases which happen in the cloud deployments world, and require per-role management: - Multi-tenant deployments, when root provider URL would be different for different roles, based on which tenant they come from. - Federation to multiple providers. Solutions like Amazon Cognito which offer a layer of abstraction with several providers transparently supported. If your concern is extension not honoring the DBA configured values: Would a server-side logic to prefer HBA value over extension-provided resolve this concern? We are definitely biased towards the cloud deployment scenarios, where direct access to .hba files is usually not offered at all. Let's find the middle ground here. A separate reason for creating this pre-authentication hook is further extensibility to support more metadata. Specifically when we add support for OAUTH flows to libpq, server-side extensions can help bridge the gap between the identity provider implementation and OAUTH/OIDC specs. For example, that could allow the Github extension to provide an OIDC discovery document. I definitely see identity providers as institutional actors here which can be given some power through the extension hooks to customize the behavior within the framework. > I maintain that the hook doesn't need to hand back artifacts to the > client for a second PQconnect call. It can just use those artifacts to > obtain the access token and hand that right back to libpq. (I think any > requirement that clients be rewritten to call PQconnect twice will > probably be a sticking point for adoption of an OAuth patch.) Obtaining a token is an asynchronous process with a human in the loop. Not sure if expecting a hook function to return a token synchronously is the best option here. Can that be an optional return value of the hook in cases when a token can be obtained synchronously? On Thu, Dec 8, 2022 at 4:41 PM Jacob Champion <jchampion@timescale.com> wrote: > > On Wed, Dec 7, 2022 at 3:22 PM Andrey Chudnovsky > <achudnovskij@gmail.com> wrote: > > > >> I think it's okay to have the extension and HBA collaborate to > >> provide discovery information. Your proposal goes further than > >> that, though, and makes the server aware of the chosen client flow. > >> That appears to be an architectural violation: why does an OAuth > >> resource server need to know the client flow at all? > > > > Ok. It may have left there from intermediate iterations. We did > > consider making extension drive the flow for specific grant_type, > > but decided against that idea. For the same reason you point to. Is > > it correct that your main concern about use of grant_type was that > > it's propagated to the server? Then yes, we will remove sending it > > to the server. > > Okay. Yes, that was my primary concern. > > >> Ideally, yes, but that only works if all identity providers > >> implement the same flows in compatible ways. We're already seeing > >> instances where that's not the case and we'll necessarily have to > >> deal with that up front. 
> > > > Yes, based on our analysis OIDC spec is detailed enough, that > > providers implementing that one, can be supported with generic code > > in libpq / client. Github specifically won't fit there though. > > Microsoft Azure AD, Google, Okta (including Auth0) will. > > Theoretically discovery documents can be returned from the extension > > (server-side) which is provider specific. Though we didn't plan to > > prioritize that. > > As another example, Google's device authorization grant is incompatible > with the spec (which they co-authored). I want to say I had problems > with Azure AD not following that spec either, but I don't remember > exactly what they were. I wouldn't be surprised to find more tiny > departures once we get deeper into implementation. > > >> That seems to be restating the goal of OAuth and OIDC. Can you > >> explain how the incompatible change allows you to accomplish this > >> better than standard implementations? > > > > Do you refer to passing grant_type to the server? Which we will get > > rid of in the next iteration. Or other incompatible changes as well? > > Just the grant type, yeah. > > >> Why? I claim that standard OAUTHBEARER can handle all of that. > >> What does your proposed architecture (the third diagram) enable > >> that my proposed hook (the second diagram) doesn't? > > > > The hook proposed on the 2nd diagram effectively delegates all Oauth > > flows implementations to the client. We propose libpq takes care of > > pulling OpenId discovery and coordination. Which is effectively > > Diagram 1 + more flows + server hook providing root url/audience. > > > > Created the diagrams with all components for 3 flows: [snip] > > (I'll skip ahead to your later mail on this.) > > >> (For a future conversation: they need to set up authorization, > >> too, with custom scopes or some other magic. It's not enough to > >> check who the token belongs to; even if Postgres is just using the > >> verified email from OpenID as an authenticator, you have to also > >> know that the user authorized the token -- and therefore the client > >> -- to access Postgres on their behalf.) > > > > My understanding is that metadata in the tokens is provider > > specific, so server side hook would be the right place to handle > > that. Plus I can envision for some providers it can make sense to > > make a remote call to pull some information. > > The server hook is the right place to check the scopes, yes, but I think > the DBA should be able to specify what those scopes are to begin with. > The provider of the extension shouldn't be expected by the architecture > to hardcode those decisions, even if Azure AD chooses to short-circuit > that choice and provide magic instead. > > On 12/7/22 20:25, Andrey Chudnovsky wrote: > > That being said, the Diagram 2 would look like this with our proposal: > > [snip] > > > > With the application taking care of all Token acquisition logic. While > > the server-side hook is participating in the pre-authentication reply. > > > > That is definitely a required scenario for the long term and the > > easiest to implement in the client core.> And if we can do at least that flow in PG16 it will be a strong > > foundation to provide more support for specific grants in libpq going > > forward. > > Agreed. > > Does the diagram above look good to you? We can then start cleaning up > > the patch to get that in first. > > I maintain that the hook doesn't need to hand back artifacts to the > client for a second PQconnect call. 
It can just use those artifacts to > obtain the access token and hand that right back to libpq. (I think any > requirement that clients be rewritten to call PQconnect twice will > probably be a sticking point for adoption of an OAuth patch.) > > That said, now that your proposal is also compatible with OAUTHBEARER, I > can pony up some code to hopefully prove my point. (I don't know if I'll > be able to do that by the holidays though.) > > Thanks! > --Jacob
On Mon, Dec 12, 2022 at 9:06 PM Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > If your concern is extension not honoring the DBA configured values: > Would a server-side logic to prefer HBA value over extension-provided > resolve this concern? Yeah. It also seals the role of the extension here as "optional". > We are definitely biased towards the cloud deployment scenarios, where > direct access to .hba files is usually not offered at all. > Let's find the middle ground here. Sure. I don't want to make this difficult in cloud scenarios -- obviously I'd like for Timescale Cloud to be able to make use of this too. But if we make this easy for a lone DBA (who doesn't have any institutional power with the providers) to use correctly and securely, then it should follow that the providers who _do_ have power and resources will have an easy time of it as well. The reverse isn't necessarily true. So I'm definitely planning to focus on the DBA case first. > A separate reason for creating this pre-authentication hook is further > extensibility to support more metadata. > Specifically when we add support for OAUTH flows to libpq, server-side > extensions can help bridge the gap between the identity provider > implementation and OAUTH/OIDC specs. > For example, that could allow the Github extension to provide an OIDC > discovery document. > > I definitely see identity providers as institutional actors here which > can be given some power through the extension hooks to customize the > behavior within the framework. We'll probably have to make some compromises in this area, but I think they should be carefully considered exceptions and not a core feature of the mechanism. The gaps you point out are just fragmentation, and adding custom extensions to deal with it leads to further fragmentation instead of providing pressure on providers to just implement the specs. Worst case, we open up new exciting security flaws, and then no one can analyze them independently because no one other than the provider knows how the two sides work together anymore. Don't get me wrong; it would be naive to proceed as if the OAUTHBEARER spec were perfect, because it's clearly not. But if we need to make extensions to it, we can participate in IETF discussions and make our case publicly for review, rather than enshrining MS/GitHub/Google/etc. versions of the RFC and enabling that proliferation as a Postgres core feature. > Obtaining a token is an asynchronous process with a human in the loop. > Not sure if expecting a hook function to return a token synchronously > is the best option here. > Can that be an optional return value of the hook in cases when a token > can be obtained synchronously? I don't think the hook is generally going to be able to return a token synchronously, and I expect the final design to be async-first. As far as I know, this will need to be solved for the builtin flows as well (you don't want a synchronous HTTP call to block your PQconnectPoll architecture), so the hook should be able to make use of whatever solution we land on for that. This is hand-wavy, and I don't expect it to be easy to solve. I just don't think we have to solve it twice. Have a good end to the year! --Jacob
Hi All,
Changes added to Jacob's patch(v2) as per the discussion in the thread.
The changes allow the client to send the OAUTH BEARER token through the psql connection string.
Example:
psql -U user@example.com -d 'dbname=postgres oauth_bearer_token=abc'
To configure OAUTH, the pg_hba.conf line looks like:
local all all oauth provider=oauth_provider issuer="https://example.com" scope="openid email"
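For illustration, a client could also pass a pre-acquired token programmatically. Below is a minimal sketch using the proposed oauth_bearer_token connection parameter; that parameter exists only in this patch, not in released libpq.

    /* Sketch only: oauth_bearer_token is a parameter proposed by this
     * patch and is not part of released libpq. */
    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("host=example.org dbname=postgres "
                                   "user=user@example.com "
                                   "oauth_bearer_token=abc");

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

        PQfinish(conn);
        return 0;
    }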
We also added a hook to libpq to pass on metadata about the issuer.
Thanks,
Mahendrakar.
On Sat, 17 Dec 2022 at 04:48, Jacob Champion <jchampion@timescale.com> wrote:
>
> On Mon, Dec 12, 2022 at 9:06 PM Andrey Chudnovsky
> <achudnovskij@gmail.com> wrote:
> > If your concern is extension not honoring the DBA configured values:
> > Would a server-side logic to prefer HBA value over extension-provided
> > resolve this concern?
>
> Yeah. It also seals the role of the extension here as "optional".
>
> > We are definitely biased towards the cloud deployment scenarios, where
> > direct access to .hba files is usually not offered at all.
> > Let's find the middle ground here.
>
> Sure. I don't want to make this difficult in cloud scenarios --
> obviously I'd like for Timescale Cloud to be able to make use of this
> too. But if we make this easy for a lone DBA (who doesn't have any
> institutional power with the providers) to use correctly and securely,
> then it should follow that the providers who _do_ have power and
> resources will have an easy time of it as well. The reverse isn't
> necessarily true. So I'm definitely planning to focus on the DBA case
> first.
>
> > A separate reason for creating this pre-authentication hook is further
> > extensibility to support more metadata.
> > Specifically when we add support for OAUTH flows to libpq, server-side
> > extensions can help bridge the gap between the identity provider
> > implementation and OAUTH/OIDC specs.
> > For example, that could allow the Github extension to provide an OIDC
> > discovery document.
> >
> > I definitely see identity providers as institutional actors here which
> > can be given some power through the extension hooks to customize the
> > behavior within the framework.
>
> We'll probably have to make some compromises in this area, but I think
> they should be carefully considered exceptions and not a core feature
> of the mechanism. The gaps you point out are just fragmentation, and
> adding custom extensions to deal with it leads to further
> fragmentation instead of providing pressure on providers to just
> implement the specs. Worst case, we open up new exciting security
> flaws, and then no one can analyze them independently because no one
> other than the provider knows how the two sides work together anymore.
>
> Don't get me wrong; it would be naive to proceed as if the OAUTHBEARER
> spec were perfect, because it's clearly not. But if we need to make
> extensions to it, we can participate in IETF discussions and make our
> case publicly for review, rather than enshrining MS/GitHub/Google/etc.
> versions of the RFC and enabling that proliferation as a Postgres core
> feature.
>
> > Obtaining a token is an asynchronous process with a human in the loop.
> > Not sure if expecting a hook function to return a token synchronously
> > is the best option here.
> > Can that be an optional return value of the hook in cases when a token
> > can be obtained synchronously?
>
> I don't think the hook is generally going to be able to return a token
> synchronously, and I expect the final design to be async-first. As far
> as I know, this will need to be solved for the builtin flows as well
> (you don't want a synchronous HTTP call to block your PQconnectPoll
> architecture), so the hook should be able to make use of whatever
> solution we land on for that.
>
> This is hand-wavy, and I don't expect it to be easy to solve. I just
> don't think we have to solve it twice.
>
> Have a good end to the year!
> --Jacob
More information on the latest patch. 1. We aligned the implementation with the barebone SASL for OAUTH described here - https://www.rfc-editor.org/rfc/rfc7628 The flow can be explained in the diagram below: +----------------------+ +----------+ | +-------+ | Postgres | | PQconnect ->| | | | | | | | +-----------+ | | | ---------- Empty Token---------> | > | | | | libpq | <-- Error(Discovery + Scope ) -- | < | Pre-Auth | | +------+ | | | Hook | | +- < | Hook | | | +-----------+ | | +------+ | | | | v | | | | | [get token]| | | | | | | | | | | + | | | +-----------+ | PQconnect > | | --------- Access Token --------> | > | Validator | | | | <---------- Auth Result -------- | < | Hook | | | | | +-----------+ | +-------+ | | +----------------------+ +----------+ 2. Removed Device Code implementation in libpq. Several reasons: - Reduce scope and focus on the protocol first. - Device code implementation uses iddawc dependency. Taking this dependency is a controversial step which requires broader discussion. - Device code implementation without iddaws would significantly increase the scope of the patch, as libpq needs to poll the token endpoint, setup different API calls, e.t.c. - That flow should canonically only be used for clients which can't invoke browsers. If it is the only flow to be implemented, it can be used in the context when it's not expected by the OAUTH protocol. 3. Temporarily removed test suite. We are actively working on aligning the tests with the latest changes. Will add a patch with tests soon. We will change the "V3" prefix to make it the next after the previous iterations. Thanks! Andrey. On Thu, Jan 12, 2023 at 11:08 AM mahendrakar s <mahendrakarforpg@gmail.com> wrote: > > Hi All, > > Changes added to Jacob's patch(v2) as per the discussion in the thread. > > The changes allow the customer to send the OAUTH BEARER token through psql connection string. > > Example: > psql -U user@example.com -d 'dbname=postgres oauth_bearer_token=abc' > > To configure OAUTH, the pg_hba.conf line look like: > local all all oauth provider=oauth_provider issuer="https://example.com"scope="openid email" > > We also added hook to libpq to pass on the metadata about the issuer. > > Thanks, > Mahendrakar. > > > On Sat, 17 Dec 2022 at 04:48, Jacob Champion <jchampion@timescale.com> wrote: > > > > On Mon, Dec 12, 2022 at 9:06 PM Andrey Chudnovsky > > <achudnovskij@gmail.com> wrote: > > > If your concern is extension not honoring the DBA configured values: > > > Would a server-side logic to prefer HBA value over extension-provided > > > resolve this concern? > > > > Yeah. It also seals the role of the extension here as "optional". > > > > > We are definitely biased towards the cloud deployment scenarios, where > > > direct access to .hba files is usually not offered at all. > > > Let's find the middle ground here. > > > > Sure. I don't want to make this difficult in cloud scenarios -- > > obviously I'd like for Timescale Cloud to be able to make use of this > > too. But if we make this easy for a lone DBA (who doesn't have any > > institutional power with the providers) to use correctly and securely, > > then it should follow that the providers who _do_ have power and > > resources will have an easy time of it as well. The reverse isn't > > necessarily true. So I'm definitely planning to focus on the DBA case > > first. > > > > > A separate reason for creating this pre-authentication hook is further > > > extensibility to support more metadata. 
> > > Specifically when we add support for OAUTH flows to libpq, server-side > > > extensions can help bridge the gap between the identity provider > > > implementation and OAUTH/OIDC specs. > > > For example, that could allow the Github extension to provide an OIDC > > > discovery document. > > > > > > I definitely see identity providers as institutional actors here which > > > can be given some power through the extension hooks to customize the > > > behavior within the framework. > > > > We'll probably have to make some compromises in this area, but I think > > they should be carefully considered exceptions and not a core feature > > of the mechanism. The gaps you point out are just fragmentation, and > > adding custom extensions to deal with it leads to further > > fragmentation instead of providing pressure on providers to just > > implement the specs. Worst case, we open up new exciting security > > flaws, and then no one can analyze them independently because no one > > other than the provider knows how the two sides work together anymore. > > > > Don't get me wrong; it would be naive to proceed as if the OAUTHBEARER > > spec were perfect, because it's clearly not. But if we need to make > > extensions to it, we can participate in IETF discussions and make our > > case publicly for review, rather than enshrining MS/GitHub/Google/etc. > > versions of the RFC and enabling that proliferation as a Postgres core > > feature. > > > > > Obtaining a token is an asynchronous process with a human in the loop. > > > Not sure if expecting a hook function to return a token synchronously > > > is the best option here. > > > Can that be an optional return value of the hook in cases when a token > > > can be obtained synchronously? > > > > I don't think the hook is generally going to be able to return a token > > synchronously, and I expect the final design to be async-first. As far > > as I know, this will need to be solved for the builtin flows as well > > (you don't want a synchronous HTTP call to block your PQconnectPoll > > architecture), so the hook should be able to make use of whatever > > solution we land on for that. > > > > This is hand-wavy, and I don't expect it to be easy to solve. I just > > don't think we have to solve it twice. > > > > Have a good end to the year! > > --Jacob
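A side note for readers following the barebone SASL exchange described above: per RFC 7628, the OAUTHBEARER initial client response is just a GS2 header followed by ^A (0x01)-separated key/value pairs, so the "Empty Token" and "Access Token" messages in the diagram differ only in the auth value. A rough sketch of building one -- a hypothetical helper, not code from the patch:

    /* Builds an RFC 7628 initial client response: "n,," GS2 header,
     * then ^A-separated key/value pairs, terminated by a double ^A.
     * Passing NULL models the bare "empty token" bootstrap message.
     * Illustrative only. */
    #include <stdio.h>

    static int
    build_initial_response(char *out, size_t outlen, const char *token)
    {
        return snprintf(out, outlen, "n,,\x01auth=Bearer %s\x01\x01",
                        token ? token : "");
    }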
On Sun, Jan 15, 2023 at 12:03 PM Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > 2. Removed Device Code implementation in libpq. Several reasons: > - Reduce scope and focus on the protocol first. > - Device code implementation uses iddawc dependency. Taking this > dependency is a controversial step which requires broader discussion. > - Device code implementation without iddaws would significantly > increase the scope of the patch, as libpq needs to poll the token > endpoint, setup different API calls, e.t.c. > - That flow should canonically only be used for clients which can't > invoke browsers. If it is the only flow to be implemented, it can be > used in the context when it's not expected by the OAUTH protocol. I'm not understanding the concern in the final point -- providers generally require you to opt into device authorization, at least as far as I can tell. So if you decide that it's not appropriate for your use case... don't enable it. (And I haven't seen any claims that opting into device authorization weakens the other flows in any way. So if we're going to implement a flow in libpq, I still think device authorization is the best choice, since it works on headless machines as well as those with browsers.) All of this points at a bigger question to the community: if we choose not to provide a flow implementation in libpq, is adding OAUTHBEARER worth the additional maintenance cost? My personal vote would be "no". I think the hook-only approach proposed here would ensure that only larger providers would implement it in practice, and in that case I'd rather spend cycles on generic SASL. > 3. Temporarily removed test suite. We are actively working on aligning > the tests with the latest changes. Will add a patch with tests soon. Okay. Case in point, the following change to the patch appears to be invalid JSON: > + appendStringInfo(&buf, > + "{ " > + "\"status\": \"invalid_token\", " > + "\"openid-configuration\": \"%s\"," > + "\"scope\": \"%s\" ", > + "\"issuer\": \"%s\" ", > + "}", Additionally, the "issuer" field added here is not part of the RFC. I've written my thoughts about unofficial extensions upthread but haven't received a response, so I'm going to start being more strident: Please, for the sake of reviewers, call out changes you've made to the spec, and why they're justified. The patches seem to be out of order now (and the documentation in the commit messages has been removed). Thanks, --Jacob
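For reference, a well-formed construction of that error document -- a sketch only, following RFC 7628's fields and dropping the nonstandard "issuer"; the variable names are placeholders, not identifiers from the patch -- would look like:

    /* Sketch of a valid RFC 7628 error status document; buf,
     * discovery_uri and scope are placeholder names. */
    appendStringInfo(&buf,
                     "{ "
                     "\"status\": \"invalid_token\", "
                     "\"openid-configuration\": \"%s\", "
                     "\"scope\": \"%s\" "
                     "}",
                     discovery_uri, scope);

Here each JSON field is separated by a comma and the whole document is a single format string.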
> All of this points at a bigger question to the community: if we choose > not to provide a flow implementation in libpq, is adding OAUTHBEARER > worth the additional maintenance cost? > My personal vote would be "no". I think the hook-only approach proposed > here would ensure that only larger providers would implement it in > practice Flow implementations in libpq are definitely a long term plan, and I agree that it would democratise the adoption. In the previous posts in this conversation I outlined the ones I think we should support. However, I don't see why it's strictly necessary to couple those. As long as the SASL exchange for OAUTHBEARER mechanism is supported by the protocol, the Client side can evolve at its own pace. At the same time, the current implementation allows clients to start building provider-agnostic OAUTH support. By using iddawc or OAUTH client implementations in the respective platforms. So I wouldn't refer to "larger providers", but rather "more motivated clients" here. Which definitely overlaps, but keeps the system open. > I'm not understanding the concern in the final point -- providers > generally require you to opt into device authorization, at least as far > as I can tell. So if you decide that it's not appropriate for your use > case... don't enable it. (And I haven't seen any claims that opting into > device authorization weakens the other flows in any way. So if we're > going to implement a flow in libpq, I still think device authorization > is the best choice, since it works on headless machines as well as those > with browsers.) I agree with the statement that Device code is the best first choice if we absolutely have to pick one. Though I don't think we have to. While device flow can be used for all kinds of user-facing applications, it's specifically designed for input-constrained scenarios. As clearly stated in the Abstract here - https://www.rfc-editor.org/rfc/rfc8628 The authorization code with pkce flow is recommended by the RFSc and major providers for cases when it's feasible. The long term goal is to provide both, though I don't see why the backbone protocol implementation first wouldn't add value. Another point is user authentication is one side of the whole story and the other critical one is system-to-system authentication. Where we have Client Credentials and Certificates. With the latter it is much harder to get generically implemented, as provider-specific tokens need to be signed. Adding the other reasoning, I think libpq support for specific flows can get in the further iterations, after the protocol support. > in that case I'd rather spend cycles on generic SASL. I see 2 approaches to generic SASL: (a). Generic SASL is a framework used in the protocol, with the mechanisms implemented on top and exposed to the DBAs as auth types to configure in hba. This is the direction we're going here, which is well aligned with the existing hba-based auth configuration. (b). Generic SASL exposed to developers on the server- and client- side to extend on. It seems to be a much longer shot. The specific points of large ambiguity are libpq distribution model (which you pointed to) and potential pluggability of insecure mechanisms. I do see (a) as a sweet spot with a lot of value for various participants with much less ambiguity. > Additionally, the "issuer" field added here is not part of the RFC. 
I've > written my thoughts about unofficial extensions upthread but haven't > received a response, so I'm going to start being more strident: Please, > for the sake of reviewers, call out changes you've made to the spec, and > why they're justified. Thanks for your feedback on this. We had this discussion as well, and added that as a convenience for the client to identify the provider. I don't see a reason why an issuer would be absolutely necessary, so we will get your point that sticking to RFCs is a safer choice. > The patches seem to be out of order now (and the documentation in the > commit messages has been removed). Feedback taken. Work in progress. On Tue, Jan 17, 2023 at 2:44 PM Jacob Champion <jchampion@timescale.com> wrote: > > On Sun, Jan 15, 2023 at 12:03 PM Andrey Chudnovsky > <achudnovskij@gmail.com> wrote: > > 2. Removed Device Code implementation in libpq. Several reasons: > > - Reduce scope and focus on the protocol first. > > - Device code implementation uses iddawc dependency. Taking this > > dependency is a controversial step which requires broader discussion. > > - Device code implementation without iddaws would significantly > > increase the scope of the patch, as libpq needs to poll the token > > endpoint, setup different API calls, e.t.c. > > - That flow should canonically only be used for clients which can't > > invoke browsers. If it is the only flow to be implemented, it can be > > used in the context when it's not expected by the OAUTH protocol. > > I'm not understanding the concern in the final point -- providers > generally require you to opt into device authorization, at least as far > as I can tell. So if you decide that it's not appropriate for your use > case... don't enable it. (And I haven't seen any claims that opting into > device authorization weakens the other flows in any way. So if we're > going to implement a flow in libpq, I still think device authorization > is the best choice, since it works on headless machines as well as those > with browsers.) > > All of this points at a bigger question to the community: if we choose > not to provide a flow implementation in libpq, is adding OAUTHBEARER > worth the additional maintenance cost? > > My personal vote would be "no". I think the hook-only approach proposed > here would ensure that only larger providers would implement it in > practice, and in that case I'd rather spend cycles on generic SASL. > > > 3. Temporarily removed test suite. We are actively working on aligning > > the tests with the latest changes. Will add a patch with tests soon. > > Okay. Case in point, the following change to the patch appears to be > invalid JSON: > > > + appendStringInfo(&buf, > > + "{ " > > + "\"status\": \"invalid_token\", " > > + "\"openid-configuration\": \"%s\"," > > + "\"scope\": \"%s\" ", > > + "\"issuer\": \"%s\" ", > > + "}", > > Additionally, the "issuer" field added here is not part of the RFC. I've > written my thoughts about unofficial extensions upthread but haven't > received a response, so I'm going to start being more strident: Please, > for the sake of reviewers, call out changes you've made to the spec, and > why they're justified. > > The patches seem to be out of order now (and the documentation in the > commit messages has been removed). > > Thanks, > --Jacob
Hi All, The "issuer" field has been removed to align with the RFC implementation - https://www.rfc-editor.org/rfc/rfc7628. This patch "v6" is a single patch to support the OAUTH BEARER token through psql connection string. Below flow is supported. Added the documentation in the commit messages. +----------------------+ +----------+ | +-------+ | Postgres | | PQconnect ->| | | | | | | | +-----------+ | | | ---------- Empty Token---------> | > | | | | libpq | <-- Error(Discovery + Scope ) -- | < | Pre-Auth | | +------+ | | | Hook | | +- < | Hook | | | +-----------+ | | +------+ | | | | v | | | | | [get token]| | | | | | | | | | | + | | | +-----------+ | PQconnect > | | --------- Access Token --------> | > | Validator | | | | <---------- Auth Result -------- | < | Hook | | | | | +-----------+ | +-------+ | | +----------------------+ +----------+ Please note that we are working on modifying/adding new tests (from Jacob's Patch) with the latest changes. Will add a patch with tests soon. Thanks, Mahendrakar. On Wed, 18 Jan 2023 at 07:24, Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > > > All of this points at a bigger question to the community: if we choose > > not to provide a flow implementation in libpq, is adding OAUTHBEARER > > worth the additional maintenance cost? > > > My personal vote would be "no". I think the hook-only approach proposed > > here would ensure that only larger providers would implement it in > > practice > > Flow implementations in libpq are definitely a long term plan, and I > agree that it would democratise the adoption. > In the previous posts in this conversation I outlined the ones I think > we should support. > > However, I don't see why it's strictly necessary to couple those. > As long as the SASL exchange for OAUTHBEARER mechanism is supported by > the protocol, the Client side can evolve at its own pace. > > At the same time, the current implementation allows clients to start > building provider-agnostic OAUTH support. By using iddawc or OAUTH > client implementations in the respective platforms. > So I wouldn't refer to "larger providers", but rather "more motivated > clients" here. Which definitely overlaps, but keeps the system open. > > > I'm not understanding the concern in the final point -- providers > > generally require you to opt into device authorization, at least as far > > as I can tell. So if you decide that it's not appropriate for your use > > case... don't enable it. (And I haven't seen any claims that opting into > > device authorization weakens the other flows in any way. So if we're > > going to implement a flow in libpq, I still think device authorization > > is the best choice, since it works on headless machines as well as those > > with browsers.) > I agree with the statement that Device code is the best first choice > if we absolutely have to pick one. > Though I don't think we have to. > > While device flow can be used for all kinds of user-facing > applications, it's specifically designed for input-constrained > scenarios. As clearly stated in the Abstract here - > https://www.rfc-editor.org/rfc/rfc8628 > The authorization code with pkce flow is recommended by the RFSc and > major providers for cases when it's feasible. > The long term goal is to provide both, though I don't see why the > backbone protocol implementation first wouldn't add value. > > Another point is user authentication is one side of the whole story > and the other critical one is system-to-system authentication. 
Where > we have Client Credentials and Certificates. > With the latter it is much harder to get generically implemented, as > provider-specific tokens need to be signed. > > Adding the other reasoning, I think libpq support for specific flows > can get in the further iterations, after the protocol support. > > > in that case I'd rather spend cycles on generic SASL. > I see 2 approaches to generic SASL: > (a). Generic SASL is a framework used in the protocol, with the > mechanisms implemented on top and exposed to the DBAs as auth types to > configure in hba. > This is the direction we're going here, which is well aligned with the > existing hba-based auth configuration. > (b). Generic SASL exposed to developers on the server- and client- > side to extend on. It seems to be a much longer shot. > The specific points of large ambiguity are libpq distribution model > (which you pointed to) and potential pluggability of insecure > mechanisms. > > I do see (a) as a sweet spot with a lot of value for various > participants with much less ambiguity. > > > Additionally, the "issuer" field added here is not part of the RFC. I've > > written my thoughts about unofficial extensions upthread but haven't > > received a response, so I'm going to start being more strident: Please, > > for the sake of reviewers, call out changes you've made to the spec, and > > why they're justified. > Thanks for your feedback on this. We had this discussion as well, and > added that as a convenience for the client to identify the provider. > I don't see a reason why an issuer would be absolutely necessary, so > we will get your point that sticking to RFCs is a safer choice. > > > The patches seem to be out of order now (and the documentation in the > > commit messages has been removed). > Feedback taken. Work in progress. > > On Tue, Jan 17, 2023 at 2:44 PM Jacob Champion <jchampion@timescale.com> wrote: > > > > On Sun, Jan 15, 2023 at 12:03 PM Andrey Chudnovsky > > <achudnovskij@gmail.com> wrote: > > > 2. Removed Device Code implementation in libpq. Several reasons: > > > - Reduce scope and focus on the protocol first. > > > - Device code implementation uses iddawc dependency. Taking this > > > dependency is a controversial step which requires broader discussion. > > > - Device code implementation without iddaws would significantly > > > increase the scope of the patch, as libpq needs to poll the token > > > endpoint, setup different API calls, e.t.c. > > > - That flow should canonically only be used for clients which can't > > > invoke browsers. If it is the only flow to be implemented, it can be > > > used in the context when it's not expected by the OAUTH protocol. > > > > I'm not understanding the concern in the final point -- providers > > generally require you to opt into device authorization, at least as far > > as I can tell. So if you decide that it's not appropriate for your use > > case... don't enable it. (And I haven't seen any claims that opting into > > device authorization weakens the other flows in any way. So if we're > > going to implement a flow in libpq, I still think device authorization > > is the best choice, since it works on headless machines as well as those > > with browsers.) > > > > All of this points at a bigger question to the community: if we choose > > not to provide a flow implementation in libpq, is adding OAUTHBEARER > > worth the additional maintenance cost? > > > > My personal vote would be "no". 
I think the hook-only approach proposed > > here would ensure that only larger providers would implement it in > > practice, and in that case I'd rather spend cycles on generic SASL. > > > > > 3. Temporarily removed test suite. We are actively working on aligning > > > the tests with the latest changes. Will add a patch with tests soon. > > > > Okay. Case in point, the following change to the patch appears to be > > invalid JSON: > > > > > + appendStringInfo(&buf, > > > + "{ " > > > + "\"status\": \"invalid_token\", " > > > + "\"openid-configuration\": \"%s\"," > > > + "\"scope\": \"%s\" ", > > > + "\"issuer\": \"%s\" ", > > > + "}", > > > > Additionally, the "issuer" field added here is not part of the RFC. I've > > written my thoughts about unofficial extensions upthread but haven't > > received a response, so I'm going to start being more strident: Please, > > for the sake of reviewers, call out changes you've made to the spec, and > > why they're justified. > > > > The patches seem to be out of order now (and the documentation in the > > commit messages has been removed). > > > > Thanks, > > --Jacob
Greetings, * mahendrakar s (mahendrakarforpg@gmail.com) wrote: > The "issuer" field has been removed to align with the RFC > implementation - https://www.rfc-editor.org/rfc/rfc7628. > This patch "v6" is a single patch to support the OAUTH BEARER token > through psql connection string. > Below flow is supported. Added the documentation in the commit messages. > > +----------------------+ +----------+ > | +-------+ | Postgres | > | PQconnect ->| | | | > | | | | +-----------+ > | | | ---------- Empty Token---------> | > | | > | | libpq | <-- Error(Discovery + Scope ) -- | < | Pre-Auth | > | +------+ | | | Hook | > | +- < | Hook | | | +-----------+ > | | +------+ | | | > | v | | | | > | [get token]| | | | > | | | | | | > | + | | | +-----------+ > | PQconnect > | | --------- Access Token --------> | > | Validator | > | | | <---------- Auth Result -------- | < | Hook | > | | | | +-----------+ > | +-------+ | | > +----------------------+ +----------+ > > Please note that we are working on modifying/adding new tests (from > Jacob's Patch) with the latest changes. Will add a patch with tests > soon. Having skimmed back through this thread again, I still feel that the direction that was originally being taken (actually support something in libpq and the backend, be it with libiddawc or something else or even our own code, and not just throw hooks in various places) makes a lot more sense and is a lot closer to how Kerberos and client-side certs and even LDAP auth work today. That also seems like a much better answer for our users when it comes to new authentication methods than having extensions and making libpq developers have to write their own custom code, not to mention that we'd still need to implement something in psql to provide such a hook if we are to have psql actually usefully exercise this, no? In the Kerberos test suite we have today, we actually bring up a proper Kerberos server, set things up, and then test end-to-end installing a keytab for the server, getting a TGT, getting a service ticket, testing authentication and encryption, etc. Looking around, it seems like the equivilant would perhaps be to use Glewlwyd and libiddawc or libcurl and our own code to really be able to test this and show that it works and that we're doing it correctly, and to let us know if we break something. Thanks, Stephen
On Mon, Feb 20, 2023 at 2:35 PM Stephen Frost <sfrost@snowman.net> wrote: > Having skimmed back through this thread again, I still feel that the > direction that was originally being taken (actually support something in > libpq and the backend, be it with libiddawc or something else or even > our own code, and not just throw hooks in various places) makes a lot > more sense and is a lot closer to how Kerberos and client-side certs and > even LDAP auth work today. Cool, that helps focus the effort. Thanks! > That also seems like a much better answer > for our users when it comes to new authentication methods than having > extensions and making libpq developers have to write their own custom > code, not to mention that we'd still need to implement something in psql > to provide such a hook if we are to have psql actually usefully exercise > this, no? I don't mind letting clients implement their own flows... as long as it's optional. So even if we did use a hook in the end, I agree that we've got to exercise it ourselves. > In the Kerberos test suite we have today, we actually bring up a proper > Kerberos server, set things up, and then test end-to-end installing a > keytab for the server, getting a TGT, getting a service ticket, testing > authentication and encryption, etc. Looking around, it seems like the > equivilant would perhaps be to use Glewlwyd and libiddawc or libcurl and > our own code to really be able to test this and show that it works and > that we're doing it correctly, and to let us know if we break something. The original patchset includes a test server in Python -- a major advantage being that you can test the client and server independently of each other, since the implementation is so asymmetric. Additionally testing against something like Glewlwyd would be a great way to stack coverage. (If we *only* test against a packaged server, though, it'll be harder to test our stuff in the presence of malfunctions and other corner cases.) Thanks, --Jacob
Thanks for the feedback, > Having skimmed back through this thread again, I still feel that the > direction that was originally being taken (actually support something in > libpq and the backend, be it with libiddawc or something else or even > our own code, and not just throw hooks in various places) makes a lot > more sense and is a lot closer to how Kerberos and client-side certs and > even LDAP auth work today. That also seems like a much better answer > for our users when it comes to new authentication methods than having > extensions and making libpq developers have to write their own custom > code, not to mention that we'd still need to implement something in psql > to provide such a hook if we are to have psql actually usefully exercise > this, no? libpq implementation is the long term plan. However, our intention is to start with the protocol implementation which allows us to build on top of. While device code is the right solution for psql, having that as the only one can result in incentive to use it in the cases it's not intended to. Reasonably good implementation should support all of the following: (1.) authorization code with pkce (for GUI applications) (2.) device code (for console user logins) (3.) client secret (4.) some support for client certificate flow (1.) and (4.) require more work to get implemented, though necessary for encouraging the most secure grant types. As we didn't have those pieces, we're proposing starting with the protocol, which can be used by the ecosystem to build token flow implementations. Then add the libpq support for individual grant types. We originally looked at starting with bare bone protocol for PG16 and adding libpq support in PG17. That plan won't happen, though still splitting the work into separate stages would make more sense in my opinion. Several questions to follow up: (a.) Would you support committing the protocol first? or you see libpq implementation for grants as the prerequisite to consider the auth type? (b.) As of today, the server side core does not validate that the token is actually a valid jwt token. Instead relies on the extensions to do the validation. Do you think server core should do the basic validation before passing to extensions to prevent the auth type being used for anything other than OAUTH flows? Tests are the plan for the commit-ready implementation. Thanks! Andrey. On Tue, Feb 21, 2023 at 2:24 PM Jacob Champion <jchampion@timescale.com> wrote: > > On Mon, Feb 20, 2023 at 2:35 PM Stephen Frost <sfrost@snowman.net> wrote: > > Having skimmed back through this thread again, I still feel that the > > direction that was originally being taken (actually support something in > > libpq and the backend, be it with libiddawc or something else or even > > our own code, and not just throw hooks in various places) makes a lot > > more sense and is a lot closer to how Kerberos and client-side certs and > > even LDAP auth work today. > > Cool, that helps focus the effort. Thanks! > > > That also seems like a much better answer > > for our users when it comes to new authentication methods than having > > extensions and making libpq developers have to write their own custom > > code, not to mention that we'd still need to implement something in psql > > to provide such a hook if we are to have psql actually usefully exercise > > this, no? > > I don't mind letting clients implement their own flows... as long as > it's optional. So even if we did use a hook in the end, I agree that > we've got to exercise it ourselves. 
> > > In the Kerberos test suite we have today, we actually bring up a proper > > Kerberos server, set things up, and then test end-to-end installing a > > keytab for the server, getting a TGT, getting a service ticket, testing > > authentication and encryption, etc. Looking around, it seems like the > > equivilant would perhaps be to use Glewlwyd and libiddawc or libcurl and > > our own code to really be able to test this and show that it works and > > that we're doing it correctly, and to let us know if we break something. > > The original patchset includes a test server in Python -- a major > advantage being that you can test the client and server independently > of each other, since the implementation is so asymmetric. Additionally > testing against something like Glewlwyd would be a great way to stack > coverage. (If we *only* test against a packaged server, though, it'll > be harder to test our stuff in the presence of malfunctions and other > corner cases.) > > Thanks, > --Jacob
Greetings, * Jacob Champion (jchampion@timescale.com) wrote: > On Mon, Feb 20, 2023 at 2:35 PM Stephen Frost <sfrost@snowman.net> wrote: > > Having skimmed back through this thread again, I still feel that the > > direction that was originally being taken (actually support something in > > libpq and the backend, be it with libiddawc or something else or even > > our own code, and not just throw hooks in various places) makes a lot > > more sense and is a lot closer to how Kerberos and client-side certs and > > even LDAP auth work today. > > Cool, that helps focus the effort. Thanks! Great, glad to hear that. > > That also seems like a much better answer > > for our users when it comes to new authentication methods than having > > extensions and making libpq developers have to write their own custom > > code, not to mention that we'd still need to implement something in psql > > to provide such a hook if we are to have psql actually usefully exercise > > this, no? > > I don't mind letting clients implement their own flows... as long as > it's optional. So even if we did use a hook in the end, I agree that > we've got to exercise it ourselves. This really doesn't feel like a great area to try and do hooks or similar in, not the least because that approach has been tried and tried again (PAM, GSSAPI, SASL would all be examples..) and frankly none of them has turned out great (which is why we can't just tell people "well, install the pam_oauth2 and watch everything work!") and this strikes me as trying to do that yet again but worse as it's not even a dedicated project trying to solve the problem but more like a side project. SCRAM was good, we've come a long way thanks to that, this feels like it should be more in line with that rather than trying to invent yet another new "generic" set of hooks/APIs that will just cause DBAs and our users headaches trying to make work. > > In the Kerberos test suite we have today, we actually bring up a proper > > Kerberos server, set things up, and then test end-to-end installing a > > keytab for the server, getting a TGT, getting a service ticket, testing > > authentication and encryption, etc. Looking around, it seems like the > > equivilant would perhaps be to use Glewlwyd and libiddawc or libcurl and > > our own code to really be able to test this and show that it works and > > that we're doing it correctly, and to let us know if we break something. > > The original patchset includes a test server in Python -- a major > advantage being that you can test the client and server independently > of each other, since the implementation is so asymmetric. Additionally > testing against something like Glewlwyd would be a great way to stack > coverage. (If we *only* test against a packaged server, though, it'll > be harder to test our stuff in the presence of malfunctions and other > corner cases.) Oh, that's even better- I agree entirely that having test code that can be instructed to return specific errors so that we can test that our code responds properly is great (and is why pgbackrest has things like a stub'd out libpq, fake s3, GCS, and Azure servers, and more) and would certainly want to keep that, even if we also build out a test that uses a real server to provide integration testing with not-our-code too. Thanks! Stephen
> This really doesn't feel like a great area to try and do hooks or > similar in, not the least because that approach has been tried and tried > again (PAM, GSSAPI, SASL would all be examples..) and frankly none of > them has turned out great (which is why we can't just tell people "well, > install the pam_oauth2 and watch everything work!") and this strikes me > as trying to do that yet again but worse as it's not even a dedicated > project trying to solve the problem but more like a side project. In this case it's not intended to be an open-ended hook, but rather an implementation of a specific rfc (rfc-7628) which defines a client-server communication for the authentication flow. The rfc itself does leave a lot of flexibility on specific parts of the implementation. Which do require hooks: (1.) Server side hook to validate the token, which is specific to the OAUTH provider. (2.) Client side hook to request the client to obtain the token. On (1.), we would need a hook for the OAUTH provider extension to do validation. We can though do some basic check that the credential is indeed a JWT token signed by the requested issuer. Specifically (2.) is where we can provide a layer in libpq to simplify the integration. i.e. implement some OAUTH flows. Though we would need some flexibility for the clients to bring their own token: For example there are cases where the credential to obtain the token is stored in a separate secure location and the token is returned from a separate service or pushed from a more secure environment. > another new "generic" set of hooks/APIs that will just cause DBAs and > our users headaches trying to make work. As I mentioned above, it's an rfc implementation, rather than our invention. When it comes to DBAs and the users. Builtin libpq implementations which allows psql and pgadmin to seamlessly connect should suffice those needs. While extensibility would allow the ecosystem to be open for OAUTH providers, SAAS developers, PAAS providers and other institutional players. Thanks! Andrey. On Thu, Feb 23, 2023 at 10:47 AM Stephen Frost <sfrost@snowman.net> wrote: > > Greetings, > > * Jacob Champion (jchampion@timescale.com) wrote: > > On Mon, Feb 20, 2023 at 2:35 PM Stephen Frost <sfrost@snowman.net> wrote: > > > Having skimmed back through this thread again, I still feel that the > > > direction that was originally being taken (actually support something in > > > libpq and the backend, be it with libiddawc or something else or even > > > our own code, and not just throw hooks in various places) makes a lot > > > more sense and is a lot closer to how Kerberos and client-side certs and > > > even LDAP auth work today. > > > > Cool, that helps focus the effort. Thanks! > > Great, glad to hear that. > > > > That also seems like a much better answer > > > for our users when it comes to new authentication methods than having > > > extensions and making libpq developers have to write their own custom > > > code, not to mention that we'd still need to implement something in psql > > > to provide such a hook if we are to have psql actually usefully exercise > > > this, no? > > > > I don't mind letting clients implement their own flows... as long as > > it's optional. So even if we did use a hook in the end, I agree that > > we've got to exercise it ourselves. > > This really doesn't feel like a great area to try and do hooks or > similar in, not the least because that approach has been tried and tried > again (PAM, GSSAPI, SASL would all be examples..) 
and frankly none of > them has turned out great (which is why we can't just tell people "well, > install the pam_oauth2 and watch everything work!") and this strikes me > as trying to do that yet again but worse as it's not even a dedicated > project trying to solve the problem but more like a side project. SCRAM > was good, we've come a long way thanks to that, this feels like it > should be more in line with that rather than trying to invent yet > another new "generic" set of hooks/APIs that will just cause DBAs and > our users headaches trying to make work. > > > > In the Kerberos test suite we have today, we actually bring up a proper > > > Kerberos server, set things up, and then test end-to-end installing a > > > keytab for the server, getting a TGT, getting a service ticket, testing > > > authentication and encryption, etc. Looking around, it seems like the > > > equivilant would perhaps be to use Glewlwyd and libiddawc or libcurl and > > > our own code to really be able to test this and show that it works and > > > that we're doing it correctly, and to let us know if we break something. > > > > The original patchset includes a test server in Python -- a major > > advantage being that you can test the client and server independently > > of each other, since the implementation is so asymmetric. Additionally > > testing against something like Glewlwyd would be a great way to stack > > coverage. (If we *only* test against a packaged server, though, it'll > > be harder to test our stuff in the presence of malfunctions and other > > corner cases.) > > Oh, that's even better- I agree entirely that having test code that can > be instructed to return specific errors so that we can test that our > code responds properly is great (and is why pgbackrest has things like > a stub'd out libpq, fake s3, GCS, and Azure servers, and more) and would > certainly want to keep that, even if we also build out a test that uses > a real server to provide integration testing with not-our-code too. > > Thanks! > > Stephen
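On the basic check mentioned above, the server core's pre-validation could be as thin as verifying that the credential is shaped like a JWT before handing it to the validator extension. A rough sketch -- illustrative only, not code from the patch; signature and issuer verification would remain the validator's job:

    #include <ctype.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Returns true if the token has the shape of a JWT: three
     * non-empty, dot-separated base64url segments. */
    static bool
    looks_like_jwt(const char *token)
    {
        int     segments = 1;
        size_t  seglen = 0;

        for (const char *p = token; *p; p++)
        {
            if (*p == '.')
            {
                if (seglen == 0)
                    return false;       /* empty segment */
                segments++;
                seglen = 0;
            }
            else if (isalnum((unsigned char) *p) || *p == '-' || *p == '_')
                seglen++;
            else
                return false;           /* not base64url */
        }
        return segments == 3 && seglen > 0;
    }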
Greetings, * Andrey Chudnovsky (achudnovskij@gmail.com) wrote: > > This really doesn't feel like a great area to try and do hooks or > > similar in, not the least because that approach has been tried and tried > > again (PAM, GSSAPI, SASL would all be examples..) and frankly none of > > them has turned out great (which is why we can't just tell people "well, > > install the pam_oauth2 and watch everything work!") and this strikes me > > as trying to do that yet again but worse as it's not even a dedicated > > project trying to solve the problem but more like a side project. > > In this case it's not intended to be an open-ended hook, but rather an > implementation of a specific rfc (rfc-7628) which defines a > client-server communication for the authentication flow. > The rfc itself does leave a lot of flexibility on specific parts of > the implementation. Which do require hooks: Color me skeptical on an RFC that requires hooks. > (1.) Server side hook to validate the token, which is specific to the > OAUTH provider. > (2.) Client side hook to request the client to obtain the token. Perhaps I'm missing it... but weren't these handled with what the original patch that Jacob had was doing? > On (1.), we would need a hook for the OAUTH provider extension to do > validation. We can though do some basic check that the credential is > indeed a JWT token signed by the requested issuer. > > Specifically (2.) is where we can provide a layer in libpq to simplify > the integration. i.e. implement some OAUTH flows. > Though we would need some flexibility for the clients to bring their own token: > For example there are cases where the credential to obtain the token > is stored in a separate secure location and the token is returned from > a separate service or pushed from a more secure environment. In those cases... we could, if we wanted, simply implement the code to actually pull the token, no? We don't *have* to have a hook here for this, we could just make it work. > > another new "generic" set of hooks/APIs that will just cause DBAs and > > our users headaches trying to make work. > As I mentioned above, it's an rfc implementation, rather than our invention. While I only took a quick look, I didn't see anything in that RFC that explicitly says that hooks or a plugin or a library or such is required to meet the RFC. Sure, there are places which say that the implementation is specific to a particular server or client but that's not the same thing. > When it comes to DBAs and the users. > Builtin libpq implementations which allows psql and pgadmin to > seamlessly connect should suffice those needs. > While extensibility would allow the ecosystem to be open for OAUTH > providers, SAAS developers, PAAS providers and other institutional > players. Each to end up writing their own code to do largely the same thing without the benefit of the larger community to be able to review and ensure that it's done properly? That doesn't sound like a great approach to me. Thanks, Stephen
On Fri, Sep 23, 2022 at 3:39 PM Jacob Champion <jchampion@timescale.com> wrote: > Here's a newly rebased v5. (They're all zipped now, which I probably > should have done a while back, sorry.) To keep this current, v7 is rebased over latest, without the pluggable authentication patches. This doesn't yet address the architectural feedback that was discussed previously, so if you're primarily interested in that, you can safely ignore this version of the patchset. The key changes here include - Meson support, for both the build and the pytest suite - Cirrus support (and unsurprisingly, Mac and Windows builds fail due to the Linux-oriented draft code) - A small tweak to support iddawc down to 0.9.8 (shipped with e.g. Debian Bullseye) - Removal of the authn_id test extension in favor of SYSTEM_USER The meson+pytest support was big enough that I split it into its own patch. It's not very polished yet, but it mostly works, and when running tests via Meson it'll now spin up a test server for you. My virtualenv approach apparently interacts poorly with the multiarch Cirrus setup (64-bit tests pass, 32-bit tests fail). Moving forward, the first thing I plan to tackle is asynchronous operation, so that polling clients can still operate sanely. If I can find a good solution there, the conversations about possible extension points should get a lot easier. Thanks, --Jacob
Attachment
On 4/27/23 10:35, Jacob Champion wrote: > Moving forward, the first thing I plan to tackle is asynchronous > operation, so that polling clients can still operate sanely. If I can > find a good solution there, the conversations about possible extension > points should get a lot easier. Attached is patchset v8, now with concurrency and 300% more cURL! And many more questions to answer. This is a full reimplementation of the client-side OAuth flow. It's an async-first engine built on top of cURL's multi handles. All pending operations are multiplexed into a single epoll set (the "altsock"), which is exposed through PQsocket() for the duration of the OAuth flow. Clients return to the flow on their next call to PQconnectPoll(). Andrey and Mahendrakar: you'll probably be interested in the conn->async_auth() callback, conn->altsock, and the pg_fe_run_oauth_flow entry point. This is intended to be the foundation for alternative flows. I've kept the blocking iddawc implementation for comparison, but if you're running the tests against it, be aware that the asynchronous tests will, predictably, hang. Skip them with `py.test -k 'not asynchronous'`. = The Good = - PQconnectPoll() is no longer indefinitely blocked on a single connection's OAuth handshake. (iddawc doesn't appear to have any asynchronous primitives in its API, unless I've missed something crucial.) - We now have a swappable entry point. Alternative flows could be implemented by applications without forcing clients to redesign their polling loops (PQconnect* should just work as expected). - We have full control over corner cases in our default flow. Debugging failures is much nicer, with explanations of exactly what has gone wrong and where, compared to iddawc's "I_ERROR" messages. - cURL is not a lightweight library by any means, but we're no longer bundling things like web servers that we're not going to use. = The Bad = - Unsurprisingly, there's a lot more code now that we're implementing the flow ourselves. The client patch has tripled in size, and we'd be on the hook for implementing and staying current with the RFCs. - The client implementation is currently epoll-/Linux-specific. I think kqueue shouldn't be too much trouble for the BSDs, but it's even more code to maintain. - Some clients in the wild (psycopg2/psycopg) suppress all notifications during PQconnectPoll(). To accommodate them, I no longer use the noticeHooks for communicating the user code, but that means we have to come up with some other way to let applications override the printing to stderr. Something like the OpenSSL decryption callback, maybe? = The Ugly = - Unless someone is aware of some amazing Winsock magic, I'm pretty sure the multiplexed-socket approach is dead in the water on Windows. I think the strategy there probably has to be a background thread plus a fake "self-pipe" (loopback socket) for polling... which may be controversial? - We have to figure out how to initialize cURL in a thread-safe manner. Newer versions of libcurl and OpenSSL improve upon this situation, but I don't think there's a way to check at compile time whether the initialization strategy is safe or not (and even at runtime, I think there may be a chicken-and-egg problem with the API, where it's not safe to check for thread-safe initialization until after you've safely initialized). 
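To make the altsock plumbing a bit more concrete, here's a rough sketch of the idea (the names, like register_socket, are illustrative rather than lifted from the patch): libcurl reports its socket interest through CURLMOPT_SOCKETFUNCTION, and those registrations get mirrored into the single epoll descriptor that PQsocket() exposes while the flow runs.

    #include <errno.h>
    #include <sys/epoll.h>
    #include <curl/curl.h>

    /* CURLMOPT_SOCKETFUNCTION: mirror libcurl's socket interest into the altsock. */
    static int
    register_socket(CURL *easy, curl_socket_t sock, int what, void *ctx, void *per_sock)
    {
        int         epfd = *(int *) ctx;    /* the epoll set handed out via PQsocket() */
        struct epoll_event ev = {0};

        if (what == CURL_POLL_REMOVE)
        {
            epoll_ctl(epfd, EPOLL_CTL_DEL, sock, NULL);
            return 0;
        }

        if (what == CURL_POLL_IN)
            ev.events = EPOLLIN;
        else if (what == CURL_POLL_OUT)
            ev.events = EPOLLOUT;
        else
            ev.events = EPOLLIN | EPOLLOUT; /* CURL_POLL_INOUT */
        ev.data.fd = sock;

        /* add the descriptor, or update it if libcurl already registered it */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev) < 0 && errno == EEXIST)
            epoll_ctl(epfd, EPOLL_CTL_MOD, sock, &ev);

        return 0;
    }

    static int
    setup_altsock(CURLM *curlm, int *altsock)
    {
        *altsock = epoll_create1(EPOLL_CLOEXEC);
        if (*altsock < 0)
            return -1;

        curl_multi_setopt(curlm, CURLMOPT_SOCKETFUNCTION, register_socket);
        curl_multi_setopt(curlm, CURLMOPT_SOCKETDATA, altsock);
        return 0;   /* PQsocket() returns *altsock for the duration of the flow */
    }

A client that was already select()ing on PQsocket() between PQconnectPoll() calls keeps working unmodified, since it only ever sees the one descriptor.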
= Next Steps = There are so many TODOs in the cURL implementation: it's been a while since I've done any libcurl programming, it all needs to be hardened, and I need to comb through the relevant specs again. But I don't want to gold-plate it if this overall approach is unacceptable. So, questions for the gallery: 1) Would starting up a background thread (pooled or not) be acceptable on Windows? Alternatively, does anyone know enough Winsock deep magic to combine multiple pending events into one (selectable!) socket? 2) If a background thread is acceptable on one platform, does it make more sense to use one on every platform and just have synchronous code everywhere? Or should we use a threadless async implementation when we can? 3) Is the current conn->async_auth() entry point sufficient for an application to implement the Microsoft flows discussed upthread? 4) Would we want to try to require a new enough cURL/OpenSSL to avoid thread safety problems during initialization, or do we need to introduce some API equivalent to PQinitOpenSSL? 5) Does this maintenance tradeoff (full control over the client vs. a large amount of RFC-governed code) seem like it could be okay? Thanks, --Jacob
Attachment
On Sat, 20 May 2023 at 00:01, Jacob Champion <jchampion@timescale.com> wrote: > - Some clients in the wild (psycopg2/psycopg) suppress all notifications > during PQconnectPoll(). If there is anything we can improve in psycopg please reach out. -- Daniele
On Tue, May 23, 2023 at 4:22 AM Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: > On Sat, 20 May 2023 at 00:01, Jacob Champion <jchampion@timescale.com> wrote: > > - Some clients in the wild (psycopg2/psycopg) suppress all notifications > > during PQconnectPoll(). > > If there is anything we can improve in psycopg please reach out. Will do, thank you! But in this case, I think there's nothing to improve in psycopg -- in fact, it highlighted the problem with my initial design, and now I think the notice processor will never be an appropriate avenue for communication of the user code. The biggest issue is that there's a chicken-and-egg situation: if you're using the synchronous PQconnect* API, you can't override the notice hooks while the handshake is in progress, because you don't have a connection handle yet. The second problem is that there are a bunch of parameters coming back from the server (user code, verification URI, expiration time) that the application may choose to display or use, and communicating those pieces in a (probably already translated) flat text string is a pretty hostile API. So I think we'll probably need to provide a global handler API, similar to the passphrase hook we currently provide, that can receive these pieces separately and assemble them however the application desires. The hard part will be to avoid painting ourselves into a corner, because this particular information is specific to the device authorization flow, and if we ever want to add other flows into libpq, we'll probably not want to add even more hooks. Thanks, --Jacob
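P.S. To sketch what such a registration could look like -- all of the names below are invented for illustration; nothing like this exists yet -- it might mirror the style of PQsetSSLKeyPassHook_OpenSSL(), with the device-flow pieces passed separately rather than as a flat, translated string:

    #include <stdio.h>
    #include <libpq-fe.h>

    /* hypothetical hook type; not a real libpq API */
    typedef void (*PQoauthDevicePromptHook_type) (PGconn *conn,
                                                   const char *verification_uri,
                                                   const char *user_code,
                                                   int expires_in_secs);
    extern void PQsetOAuthDevicePromptHook(PQoauthDevicePromptHook_type hook);

    static void
    show_device_prompt(PGconn *conn, const char *uri, const char *code, int expires_in_secs)
    {
        /* render a QR code, pop up a dialog, etc. -- entirely application-specific */
        fprintf(stderr, "Visit %s and enter the code %s (expires in %d seconds)\n",
                uri, code, expires_in_secs);
    }

    /* registered once, before any PQconnect* call:
     *     PQsetOAuthDevicePromptHook(show_device_prompt);
     */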
On 5/19/23 15:01, Jacob Champion wrote: > But I don't want to > gold-plate it if this overall approach is unacceptable. So, questions > for the gallery: > > 1) Would starting up a background thread (pooled or not) be acceptable > on Windows? Alternatively, does anyone know enough Winsock deep magic to > combine multiple pending events into one (selectable!) socket? > > 2) If a background thread is acceptable on one platform, does it make > more sense to use one on every platform and just have synchronous code > everywhere? Or should we use a threadless async implementation when we can? > > 3) Is the current conn->async_auth() entry point sufficient for an > application to implement the Microsoft flows discussed upthread? > > 4) Would we want to try to require a new enough cURL/OpenSSL to avoid > thread safety problems during initialization, or do we need to introduce > some API equivalent to PQinitOpenSSL? > > 5) Does this maintenance tradeoff (full control over the client vs. a > large amount of RFC-governed code) seem like it could be okay? There was additional interest at PGCon, so I've registered this in the commitfest. Potential reviewers should be aware that the current implementation requires Linux (or, more specifically, epoll), as the cfbot shows. But if you have any opinions on the above questions, those will help me tackle the other platforms. :D Thanks! --Jacob
On Sat, May 20, 2023 at 10:01 AM Jacob Champion <jchampion@timescale.com> wrote: > - The client implementation is currently epoll-/Linux-specific. I think > kqueue shouldn't be too much trouble for the BSDs, but it's even more > code to maintain. I guess you also need a fallback that uses plain old POSIX poll()? I see you're not just using epoll but also timerfd. Could that be converted to plain old timeout bookkeeping? That should be enough to get every other Unix and *possibly* also Windows to work with the same code path. > - Unless someone is aware of some amazing Winsock magic, I'm pretty sure > the multiplexed-socket approach is dead in the water on Windows. I think > the strategy there probably has to be a background thread plus a fake > "self-pipe" (loopback socket) for polling... which may be controversial? I am not a Windows user or hacker, but there are certainly several ways to multiplex sockets. First there is the WSAEventSelect() + WaitForMultipleObjects() approach that latch.c uses. It has the advantage that it allows socket readiness to be multiplexed with various other things that use Windows "events". But if you don't need that, ie you *only* need readiness-based wakeup for a bunch of sockets and no other kinds of fd or object, you can use winsock's plain old select() or its fairly faithful poll() clone called WSAPoll(). It looks a bit like that'd be true here if you could kill the timerfd? It's a shame to write modern code using select(), but you can find lots of shouting all over the internet about WSAPoll()'s defects, most famously the cURL guys[1] whose blog is widely cited, so people still do it. Possibly some good news on that front: by my reading of the docs, it looks like that problem was fixed in Windows 10 2004[2] which itself is by now EOL, so all systems should have the fix? I suspect that means that, finally, you could probably just use the same poll() code path for Unix (when epoll is not available) *and* Windows these days, making porting a lot easier. But I've never tried it, so I don't know what other problems there might be. Another thing people complain about is the lack of socketpair() or similar in winsock which means you unfortunately can't easily make anonymous select/poll-compatible local sockets, but that doesn't seem to be needed here. [1] https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/ [2] https://learn.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsapoll
On Fri, Jun 30, 2023 at 9:29 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Sat, May 20, 2023 at 10:01 AM Jacob Champion <jchampion@timescale.com> wrote: > > - The client implementation is currently epoll-/Linux-specific. I think > > kqueue shouldn't be too much trouble for the BSDs, but it's even more > > code to maintain. > > I guess you also need a fallback that uses plain old POSIX poll()? The use of the epoll API here is to combine several sockets into one, not to actually call epoll_wait() itself. kqueue descriptors should let us do the same, IIUC. > I see you're not just using epoll but also timerfd. Could that be > converted to plain old timeout bookkeeping? That should be enough to > get every other Unix and *possibly* also Windows to work with the same > code path. I might be misunderstanding your suggestion, but I think our internal bookkeeping is orthogonal to that. The use of timerfd here allows us to forward libcurl's timeout requirements up to the top-level PQsocket(). As an example, libcurl is free to tell us to call it again in ten milliseconds, and we have to make sure a nonblocking client calls us again after that elapses; otherwise they might hang waiting for data that's not coming. > > - Unless someone is aware of some amazing Winsock magic, I'm pretty sure > > the multiplexed-socket approach is dead in the water on Windows. I think > > the strategy there probably has to be a background thread plus a fake > > "self-pipe" (loopback socket) for polling... which may be controversial? > > I am not a Windows user or hacker, but there are certainly several > ways to multiplex sockets. First there is the WSAEventSelect() + > WaitForMultipleObjects() approach that latch.c uses. I don't think that strategy plays well with select() clients, though -- it requires a handle array, and we've just got the one socket. My goal is to maintain compatibility with existing PQconnectPoll() applications, where the only way we get to communicate with the client is through the PQsocket() for the connection. Ideally, you shouldn't have to completely rewrite your application loop just to make use of OAuth. (I assume a requirement like that would be a major roadblock to committing this -- and if that's not a correct assumption, then I guess my job gets a lot easier?) > It's a shame to write modern code using select(), but you can find > lots of shouting all over the internet about WSAPoll()'s defects, most > famously the cURL guys[1] whose blog is widely cited, so people still > do it. Right -- that's basically the root of my concern. I can't guarantee that existing Windows clients out there are all using WaitForMultipleObjects(). From what I can tell, whatever we hand up through PQsocket() has to be fully Winsock-/select-compatible. > Another thing people > complain about is the lack of socketpair() or similar in winsock which > means you unfortunately can't easily make anonymous > select/poll-compatible local sockets, but that doesn't seem to be > needed here. For the background-thread implementation, it probably would be. I've been looking at libevent (BSD-licensed) and its socketpair hack for Windows... Thanks! --Jacob
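P.S. For anyone following along, the timeout forwarding works roughly like this (a simplified sketch with invented names, not the patch itself): libcurl's CURLMOPT_TIMERFUNCTION requests are translated into a timerfd that sits in the same epoll set as the sockets, so a nonblocking client sleeping on PQsocket() still wakes up when libcurl wants to be called back.

    #include <curl/curl.h>
    #include <sys/timerfd.h>

    /* CURLMOPT_TIMERFUNCTION: arm or disarm the timerfd as libcurl asks. */
    static int
    register_timer(CURLM *multi, long timeout_ms, void *ctx)
    {
        int         timerfd = *(int *) ctx;    /* already added to the epoll set */
        struct itimerspec its = {0};

        if (timeout_ms == 0)
            its.it_value.tv_nsec = 1;           /* "call me back immediately" */
        else if (timeout_ms > 0)
        {
            its.it_value.tv_sec = timeout_ms / 1000;
            its.it_value.tv_nsec = (timeout_ms % 1000) * 1000000L;
        }
        /* timeout_ms < 0 leaves its zeroed, which disarms the timer */

        if (timerfd_settime(timerfd, 0, &its, NULL) < 0)
            return -1;
        return 0;
    }

    /* setup, error handling omitted:
     *     int timerfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
     *     ...add timerfd to the epoll set...
     *     curl_multi_setopt(curlm, CURLMOPT_TIMERFUNCTION, register_timer);
     *     curl_multi_setopt(curlm, CURLMOPT_TIMERDATA, &timerfd);
     */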
On Thu, Jul 6, 2023 at 9:00 AM Jacob Champion <jchampion@timescale.com> wrote: > My goal is to maintain compatibility with existing PQconnectPoll() > applications, where the only way we get to communicate with the client > is through the PQsocket() for the connection. Ideally, you shouldn't > have to completely rewrite your application loop just to make use of > OAuth. (I assume a requirement like that would be a major roadblock to > committing this -- and if that's not a correct assumption, then I > guess my job gets a lot easier?) Ah, right, I get it. I guess there are a couple of ways to do it if we give up the goal of no-code-change-for-the-client: 1. Generalised PQsocket(), so that a client can call something like: int PQpollset(const PGConn *conn, struct pollfd fds[], int fds_size, int *nfds, int *timeout_ms); That way, libpq could tell you about which events it would like to wait for on which fds, and when it would like you to call it back due to timeout, and you can either pass that information directly to poll() or WSAPoll() or some equivalent interface (we don't care, we just gave you the info you need), or combine it in obvious ways with whatever else you want to multiplex with in your client program. 2. Convert those events into new libpq events like 'I want you to call me back in 100ms', and 'call me back when socket #42 has data', and let clients handle that by managing their own poll set etc. (This is something I've speculated about to support more efficient postgres_fdw shard query multiplexing; gotta figure out how to get multiple connections' events into one WaitEventSet...) I guess there is a practical middle ground where client code on systems that have epoll/kqueue can use OAUTHBEARER without any code change, and the feature is available on other systems too but you'll have to change your client code to use one of those interfaces or else you get an error 'coz we just can't do it. Or, more likely in the first version, you just can't do it at all... Doesn't seem that bad to me. BTW I will happily do the epoll->kqueue port work if necessary.
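To illustrate how a client loop might consume option 1 (PQpollset() is hypothetical, of course -- just the signature sketched above):

    #include <poll.h>
    #include <libpq-fe.h>

    /* drive an in-progress connection using the proposed (hypothetical) PQpollset() */
    static PostgresPollingStatusType
    wait_for_connection(PGconn *conn)
    {
        PostgresPollingStatusType status = PGRES_POLLING_WRITING;

        while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED)
        {
            struct pollfd fds[8];
            int         nfds;
            int         timeout_ms;

            if (PQpollset(conn, fds, 8, &nfds, &timeout_ms) < 0)
                break;                           /* too many fds, or some other error */

            (void) poll(fds, nfds, timeout_ms);  /* or WSAPoll() on Windows */
            status = PQconnectPoll(conn);
        }

        return status;
    }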
On Wed, Jul 5, 2023 at 3:07 PM Thomas Munro <thomas.munro@gmail.com> wrote: > I guess there are a couple of ways to do it if we give up the goal of > no-code-change-for-the-client: > > 1. Generalised PQsocket(), that so that a client can call something like: > > int PQpollset(const PGConn *conn, struct pollfd fds[], int fds_size, > int *nfds, int *timeout_ms); > > That way, libpq could tell you about which events it would like to > wait for on which fds, and when it would like you to call it back due > to timeout, and you can either pass that information directly to > poll() or WSAPoll() or some equivalent interface (we don't care, we > just gave you the info you need), or combine it in obvious ways with > whatever else you want to multiplex with in your client program. I absolutely wanted something like this while I was writing the code (it would have made things much easier), but I'd feel bad adding that much complexity to the API if the vast majority of connections use exactly one socket. Are there other use cases in libpq where you think this expanded API could be useful? Maybe to lift some of the existing restrictions for PQconnectPoll(), add async DNS resolution, or something? Couple complications I can think of at the moment: 1. Clients using persistent pollsets will have to remove old descriptors, presumably by tracking the delta since the last call, which might make for a rough transition. Bookkeeping bugs probably wouldn't show up unless they used OAuth in their test suites. With the current model, that's more hidden and libpq takes responsibility for getting it right. 2. In the future, we might need to think carefully around situations where we want multiple PGConn handles to share descriptors (e.g. multiplexed backend connections). I avoid tricky questions at the moment by assigning only one connection per multi pool. > 2. Convert those events into new libpq events like 'I want you to > call me back in 100ms', and 'call me back when socket #42 has data', > and let clients handle that by managing their own poll set etc. (This > is something I've speculated about to support more efficient > postgres_fdw shard query multiplexing; gotta figure out how to get > multiple connections' events into one WaitEventSet...) Something analogous to libcurl's socket and timeout callbacks [1], then? Or is there an existing libpq API you were thinking about using? > I guess there is a practical middle ground where client code on > systems that have epoll/kqueue can use OAUTHBEARER without any code > change, and the feature is available on other systems too but you'll > have to change your client code to use one of those interfaces or else > you get an error 'coz we just can't do it. That's a possibility -- if your platform is able to do it nicely, might as well use it. (In a similar vein, I'd personally vote against having every platform use a background thread, even if we decided to implement it for Windows.) > Or, more likely in the > first version, you just can't do it at all... Doesn't seem that bad > to me. Any initial opinions on whether it's worse or better than a worker thread? > BTW I will happily do the epoll->kqueue port work if necessary. And I will happily take you up on that; thanks! --Jacob [1] https://curl.se/libcurl/c/CURLMOPT_SOCKETFUNCTION.html
On Fri, Jul 7, 2023 at 4:57 AM Jacob Champion <jchampion@timescale.com> wrote: > On Wed, Jul 5, 2023 at 3:07 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > 2. Convert those events into new libpq events like 'I want you to > > call me back in 100ms', and 'call me back when socket #42 has data', > > and let clients handle that by managing their own poll set etc. (This > > is something I've speculated about to support more efficient > > postgres_fdw shard query multiplexing; gotta figure out how to get > > multiple connections' events into one WaitEventSet...) > > Something analogous to libcurl's socket and timeout callbacks [1], > then? Or is there an existing libpq API you were thinking about using? Yeah. Libpq already has an event concept. I did some work on getting long-lived WaitEventSet objects to be used in various places, some of which got committed[1], but not yet the parts related to postgres_fdw (which uses libpq connections to talk to other PostgreSQL servers, and runs into the limitations of PQsocket()). Horiguchi-san had the good idea of extending the event system to cover socket changes, but I haven't actually tried it yet. One day. > > Or, more likely in the > > first version, you just can't do it at all... Doesn't seem that bad > > to me. > > Any initial opinions on whether it's worse or better than a worker thread? My vote is that it's perfectly fine to make a new feature that only works on some OSes. If/when someone wants to work on getting it going on Windows/AIX/Solaris (that's the complete set of no-epoll, no-kqueue OSes we target), they can write the patch. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
On Thu, Jul 6, 2023 at 1:48 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Fri, Jul 7, 2023 at 4:57 AM Jacob Champion <jchampion@timescale.com> wrote: > > Something analogous to libcurl's socket and timeout callbacks [1], > > then? Or is there an existing libpq API you were thinking about using? > > Yeah. Libpq already has an event concept. Thanks -- I don't know how I never noticed libpq-events.h before. Per-connection events (or callbacks) might bring up the same chicken-and-egg situation discussed above, with the notice hook. We'll be fine as long as PQconnectStart is guaranteed to return before the PQconnectPoll engine gets to authentication, and it looks like that's true with today's implementation, which returns pessimistically at several points instead of just trying to continue the exchange. But I don't know if that's intended as a guarantee for the future. At the very least we would have to pin that implementation detail. > > > Or, more likely in the > > > first version, you just can't do it at all... Doesn't seem that bad > > > to me. > > > > Any initial opinions on whether it's worse or better than a worker thread? > > My vote is that it's perfectly fine to make a new feature that only > works on some OSes. If/when someone wants to work on getting it going > on Windows/AIX/Solaris (that's the complete set of no-epoll, no-kqueue > OSes we target), they can write the patch. Okay. I'm curious to hear others' thoughts on that, too, if anyone's lurking. Thanks! --Jacob
Thanks Jacob for making progress on this. > 3) Is the current conn->async_auth() entry point sufficient for an > application to implement the Microsoft flows discussed upthread? Please confirm my understanding of the flow is correct: 1. Client calls PQconnectStart. - The client doesn't yet know what the issuer and the scope are. - Parameters are strings, so callback is not provided yet. 2. Client gets PgConn from PQconnectStart return value and updates conn->async_auth to its own callback. 3. Client polls PQconnectPoll and checks conn->sasl_state until the value is SASL_ASYNC 4. Client accesses conn->oauth_issuer and conn->oauth_scope and uses that info to trigger the token flow. 5. Expectations on async_auth: a. It returns PGRES_POLLING_READING while token acquisition is going on b. It returns PGRES_POLLING_OK and sets conn->sasl_state->token when token acquisition succeeds. 6. Is the client supposed to do anything with the altsock parameter? Is the above an accurate understanding? If yes, it looks workable with a couple of improvements I think would be nice: 1. Currently, the oauth_exchange function sets conn->async_auth = pg_fe_run_oauth_flow and starts the Device Code flow automatically when receiving the challenge and metadata from the server. There probably should be a way for the client to prevent the default Device Code flow from triggering. 2. The current signature and expectations of the async_auth function seem to be tightly coupled with the internal implementation: - Pieces of information need to be picked and updated in different places in the PgConn structure. - The function is expected to return PostgresPollingStatusType, which is used to communicate internal state to the client. Would it make sense to separate the internal callback used to communicate with the Device Code flow from the client-facing API? I.e. introduce a new client-facing structure and enum to facilitate the callback and its return value. ----------- On a separate note: The backend code currently spawns an external command for token validation. As we discussed before, an extension hook would be a more efficient extensibility option. We see clients make 10k+ connections using OAuth tokens per minute to our service, and starting external processes would be too much overhead here. ----------- > 5) Does this maintenance tradeoff (full control over the client vs. a > large amount of RFC-governed code) seem like it could be okay? It's nice for psql to have the Device Code flow. It can be made even more convenient with refresh token support. And for clients on resource-constrained devices to be able to authenticate with Client Credentials (app secret) without bringing in more dependencies. In most other cases, upstream PostgreSQL drivers written in higher-level languages have libraries / abstractions to implement OAUTH flows for the platforms they support. On Fri, Jul 7, 2023 at 11:48 AM Jacob Champion <jchampion@timescale.com> wrote: > > On Thu, Jul 6, 2023 at 1:48 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Fri, Jul 7, 2023 at 4:57 AM Jacob Champion <jchampion@timescale.com> wrote: > > > Something analogous to libcurl's socket and timeout callbacks [1], > > > then? Or is there an existing libpq API you were thinking about using? > > > > Yeah. Libpq already has an event concept. > > Thanks -- I don't know how I never noticed libpq-events.h before. > > Per-connection events (or callbacks) might bring up the same > chicken-and-egg situation discussed above, with the notice hook.
We'll > be fine as long as PQconnectStart is guaranteed to return before the > PQconnectPoll engine gets to authentication, and it looks like that's > true with today's implementation, which returns pessimistically at > several points instead of just trying to continue the exchange. But I > don't know if that's intended as a guarantee for the future. At the > very least we would have to pin that implementation detail. > > > > > Or, more likely in the > > > > first version, you just can't do it at all... Doesn't seem that bad > > > > to me. > > > > > > Any initial opinions on whether it's worse or better than a worker thread? > > > > My vote is that it's perfectly fine to make a new feature that only > > works on some OSes. If/when someone wants to work on getting it going > > on Windows/AIX/Solaris (that's the complete set of no-epoll, no-kqueue > > OSes we target), they can write the patch. > > Okay. I'm curious to hear others' thoughts on that, too, if anyone's lurking. > > Thanks! > --Jacob
On Fri, Jul 7, 2023 at 4:57 AM Jacob Champion <jchampion@timescale.com> wrote: > On Wed, Jul 5, 2023 at 3:07 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > BTW I will happily do the epoll->kqueue port work if necessary. > > And I will happily take you up on that; thanks! Some initial hacking, about 2 coffees' worth: https://github.com/macdice/postgres/commits/oauth-kqueue This compiles on FreeBSD and macOS, but I didn't have time to figure out all your Python testing magic so I don't know if it works yet and it's still red on CI... one thing I wondered about is the *altsock = timerfd part which I couldn't do. The situation on macOS is a little odd: the man page says EVFILT_TIMER is not implemented. But clearly it is, we can read the source code as I had to do to find out which unit of time it defaults to[1] (huh, Apple's github repo for Darwin appears to have been archived recently -- no more source code updates? that'd be a shame!), and it works exactly as expected in simple programs. So I would just assume it works until we see evidence otherwise. (We already use a couple of other things on macOS more or less by accident because configure finds them, where they are undocumented or undeclared.) [1] https://github.com/apple/darwin-xnu/blob/main/bsd/kern/kern_event.c#L1345
On Fri, Jul 7, 2023 at 2:16 PM Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > Please confirm my understanding of the flow is correct: > 1. Client calls PQconnectStart. > - The client doesn't know yet what is the issuer and the scope. Right. (Strictly speaking it doesn't even know that OAuth will be used for the connection, yet, though at some point we'll be able to force the issue with e.g. `require_auth=oauth`. That's not currently implemented.) > - Parameters are strings, so callback is not provided yet. > 2. Client gets PgConn from PQconnectStart return value and updates > conn->async_auth to its own callback. This is where some sort of official authn callback registration (see above reply to Daniele) would probably come in handy. > 3. Client polls PQconnectPoll and checks conn->sasl_state until the > value is SASL_ASYNC In my head, the client's custom callback would always be invoked during the call to PQconnectPoll, rather than making the client do work in between calls. That way, a client can use custom flows even with a synchronous PQconnectdb(). > 4. Client accesses conn->oauth_issuer and conn->oauth_scope and uses > those info to trigger the token flow. Right. > 5. Expectations on async_auth: > a. It returns PGRES_POLLING_READING while token acquisition is going on > b. It returns PGRES_POLLING_OK and sets conn->sasl_state->token > when token acquisition succeeds. Yes. Though the token should probably be returned through some explicit part of the callback, now that you mention it... > 6. Is the client supposed to do anything with the altsock parameter? The callback needs to set the altsock up with a select()able descriptor, which wakes up the client when more work is ready to be done. Without that, you can't handle multiple connections on a single thread. > If yes, it looks workable with a couple of improvements I think would be nice: > 1. Currently, oauth_exchange function sets conn->async_auth = > pg_fe_run_oauth_flow and starts Device Code flow automatically when > receiving challenge and metadata from the server. > There probably should be a way for the client to prevent default > Device Code flow from triggering. Agreed. I'd like the client to be able to override this directly. > 2. The current signature and expectations from async_auth function > seems to be tightly coupled with the internal implementation: > - Pieces of information need to be picked and updated in different > places in the PgConn structure. > - Function is expected to return PostgresPollingStatusType which > is used to communicate internal state to the client. > Would it make sense to separate the internal callback used to > communicate with Device Code flow from client facing API? > I.e. introduce a new client facing structure and enum to facilitate > callback and its return value. Yep, exactly right! I just wanted to check that the architecture *looked* sufficient before pulling it up into an API. > On a separate note: > The backend code currently spawns an external command for token validation. > As we discussed before, an extension hook would be a more efficient > extensibility option. > We see clients make 10k+ connections using OAuth tokens per minute to > our service, and stating external processes would be too much overhead > here. +1. I'm curious, though -- what language do you expect to use to write a production validator hook? Surely not low-level C...? > > 5) Does this maintenance tradeoff (full control over the client vs. a > > large amount of RFC-governed code) seem like it could be okay? 
> > It's nice for psql to have Device Code flow. Can be made even more > convenient with refresh tokens support. > And for clients on resource constrained devices to be able to > authenticate with Client Credentials (app secret) without bringing > more dependencies. > > In most other cases, upstream PostgreSQL drivers written in higher > level languages have libraries / abstractions to implement OAUTH flows > for the platforms they support. Yeah, I'm really interested in seeing which existing high-level flows can be mixed in through a driver. Trying not to get too far ahead of myself :D Thanks for the review! --Jacob
On Fri, Jul 7, 2023 at 6:01 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Fri, Jul 7, 2023 at 4:57 AM Jacob Champion <jchampion@timescale.com> wrote: > > On Wed, Jul 5, 2023 at 3:07 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > BTW I will happily do the epoll->kqueue port work if necessary. > > > > And I will happily take you up on that; thanks! > > Some initial hacking, about 2 coffees' worth: > https://github.com/macdice/postgres/commits/oauth-kqueue > > This compiles on FreeBSD and macOS, but I didn't have time to figure > out all your Python testing magic so I don't know if it works yet and > it's still red on CI... This is awesome, thank you! I need to look into the CI more, but it looks like the client tests are passing, which is a good sign. (I don't understand why the server-side tests are failing on FreeBSD, but they shouldn't be using the libpq code at all, so I think your kqueue implementation is in the clear. Cirrus doesn't have the logs from the server-side test failures anywhere -- probably a bug in my Meson patch.) > one thing I wondered about is the *altsock = > timerfd part which I couldn't do. I did that because I'm not entirely sure that libcurl is guaranteed to have cleared out all its sockets from the mux, and I didn't want to invite spurious wakeups. I should probably verify whether or not that's possible. If so, we could just make that code resilient to early wakeup, so that it matters less, or set up a second kqueue that only holds the timer if that turns out to be unacceptable? > The situation on macOS is a little odd: the man page says EVFILT_TIMER > is not implemented. But clearly it is, we can read the source code as > I had to do to find out which unit of time it defaults to[1] (huh, > Apple's github repo for Darwin appears to have been archived recently > -- no more source code updates? that'd be a shame!), and it works > exactly as expected in simple programs. So I would just assume it > works until we see evidence otherwise. (We already use a couple of > other things on macOS more or less by accident because configure finds > them, where they are undocumented or undeclared.) Huh. Something to keep an eye on... might be a problem with older versions? Thanks! --Jacob
On Mon, Jul 10, 2023 at 4:50 PM Jacob Champion <jchampion@timescale.com> wrote: > I don't understand why the > server-side tests are failing on FreeBSD, but they shouldn't be using > the libpq code at all, so I think your kqueue implementation is in the > clear. Oh, whoops, it's just the missed CLOEXEC flag in the final patch. (If the write side of the pipe gets copied around, it hangs open and the validator never sees the "end" of the token.) I'll switch the logic around to set the flag on the write side instead of unsetting it on the read side. I have a WIP patch that passes tests on FreeBSD, which I'll clean up and post Sometime Soon. macOS builds now but still fails before it runs the test; looks like it's having trouble finding OpenSSL during `pip install` of the test modules... Thanks! --Jacob
On Wed, Jul 12, 2023 at 5:50 AM Jacob Champion <jchampion@timescale.com> wrote: > Oh, whoops, it's just the missed CLOEXEC flag in the final patch. (If > the write side of the pipe gets copied around, it hangs open and the > validator never sees the "end" of the token.) I'll switch the logic > around to set the flag on the write side instead of unsetting it on > the read side. Oops, sorry about that. Glad to hear it's all working! (FTR my parenthetical note about macOS/XNU sources on Github was a false alarm: the "apple" account has stopped publishing a redundant copy of that, but "apple-oss-distributions" is the account I should have been looking at and it is live. I guess it migrated at some point, or something. Phew.)
> > - Parameters are strings, so callback is not provided yet.
> > 2. Client gets PgConn from PQconnectStart return value and updates
> > conn->async_auth to its own callback.
>
> This is where some sort of official authn callback registration (see
> above reply to Daniele) would probably come in handy.
+1
> > 3. Client polls PQconnectPoll and checks conn->sasl_state until the
> > value is SASL_ASYNC
>
> In my head, the client's custom callback would always be invoked
> during the call to PQconnectPoll, rather than making the client do
> work in between calls. That way, a client can use custom flows even
> with a synchronous PQconnectdb().
The way I see this API working is that the asynchronous client needs at least 2 PQconnectPoll calls:
1. To be notified of what the authentication requirements are and get parameters.
2. When it acquires the token, the callback is used to inform libpq of the token and return PGRES_POLLING_OK.
For the synchronous client, the callback implementation would need to be aware that the synchronous code path invokes the callback frequently, and be written accordingly.
Bottom line, I don't see much of a problem with the current proposal. It's just that the callback's way of knowing that an OAuth token is requested, and of getting the parameters, relies on PQconnectPoll being invoked after the corresponding parameters of the conn object are populated.
> > > 5. Expectations on async_auth:
> > > a. It returns PGRES_POLLING_READING while token acquisition is going on
> > > b. It returns PGRES_POLLING_OK and sets conn->sasl_state->token
> > > when token acquisition succeeds.
> >
> > Yes. Though the token should probably be returned through some
> > explicit part of the callback, now that you mention it...
>
> > 6. Is the client supposed to do anything with the altsock parameter?
>
> The callback needs to set the altsock up with a select()able
> descriptor, which wakes up the client when more work is ready to be
> done. Without that, you can't handle multiple connections on a single
> thread.
Ok, thanks for the clarification.
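So, in rough pseudocode, my understanding of the whole interaction is something like this (using the PoC's internal names, which I realize are draft and not a stable API, and hypothetical my_flow_* helpers for the provider-specific part):

    static PostgresPollingStatusType
    my_token_flow(PGconn *conn)
    {
        if (!my_flow_started())
        {
            /* conn->oauth_issuer and conn->oauth_scope are populated by now */
            my_flow_start(conn->oauth_issuer, conn->oauth_scope);
            conn->altsock = my_flow_wakeup_fd();    /* a select()able descriptor */
            return PGRES_POLLING_READING;
        }

        if (!my_flow_done())
            return PGRES_POLLING_READING;           /* not there yet; poll again later */

        conn->sasl_state->token = my_flow_take_token();
        return PGRES_POLLING_OK;
    }

    /* after PQconnectStart(): conn->async_auth = my_token_flow; */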
> > On a separate note:
> > The backend code currently spawns an external command for token validation.
> > As we discussed before, an extension hook would be a more efficient
> > extensibility option.
> > We see clients make 10k+ connections using OAuth tokens per minute to
> > our service, and stating external processes would be too much overhead
> > here.
>
> +1. I'm curious, though -- what language do you expect to use to write
> a production validator hook? Surely not low-level C...?
For the server side code, it would likely be identity providers publishing extensions to validate their tokens.
Those can do that in C too. Or extensions can now be implemented in Rust using pgrx, which is developer-friendly enough in my opinion.
> Yeah, I'm really interested in seeing which existing high-level flows
> can be mixed in through a driver. Trying not to get too far ahead of
> myself :D
I can think of the following as the most common:
1. Authorization code with PKCE. This is by far the most common for user login flows. It requires spinning up a browser and listening on a redirect URL/port. Most high-level platforms have libraries to do both.
2. Client Certificates. This requires an identity provider specific library to construct and sign the token. The providers publish SDKs to do that for most common app development platforms.
On Tue, Jul 11, 2023 at 10:50 AM Jacob Champion <jchampion@timescale.com> wrote: > I have a WIP patch that passes tests on FreeBSD, which I'll clean up > and post Sometime Soon. macOS builds now but still fails before it > runs the test; looks like it's having trouble finding OpenSSL during > `pip install` of the test modules... Hi Thomas, v9 folds in your kqueue implementation (thanks again!) and I have a quick question to check my understanding: > + case CURL_POLL_REMOVE: > + /* > + * We don't know which of these is currently registered, perhaps > + * both, so we try to remove both. This means we need to tolerate > + * ENOENT below. > + */ > + EV_SET(&ev[nev], socket, EVFILT_READ, EV_DELETE, 0, 0, 0); > + nev++; > + EV_SET(&ev[nev], socket, EVFILT_WRITE, EV_DELETE, 0, 0, 0); > + nev++; > + break; We're not setting EV_RECEIPT for these -- is that because none of the filters we're using are EV_CLEAR, and so it doesn't matter if we accidentally pull pending events off the queue during the kevent() call? v9 also improves the Cirrus debugging experience and fixes more issues on macOS, so the tests should be green there now. The final patch in the series works around what I think is a build bug in psycopg2 2.9 [1] for the BSDs+meson. Thanks, --Jacob [1] https://github.com/psycopg/psycopg2/issues/1599
Attachment
- since-v8.diff.txt
- v9-0001-common-jsonapi-support-FRONTEND-clients.patch.gz
- v9-0002-libpq-add-OAUTHBEARER-SASL-mechanism.patch.gz
- v9-0003-backend-add-OAUTHBEARER-SASL-mechanism.patch.gz
- v9-0004-Add-pytest-suite-for-OAuth.patch.gz
- v9-0005-squash-Add-pytest-suite-for-OAuth.patch.gz
- v9-0006-XXX-work-around-psycopg2-build-failures.patch.gz
On Tue, Jul 18, 2023 at 11:55 AM Jacob Champion <jchampion@timescale.com> wrote: > We're not setting EV_RECEIPT for these -- is that because none of the > filters we're using are EV_CLEAR, and so it doesn't matter if we > accidentally pull pending events off the queue during the kevent() call? +1 for EV_RECEIPT ("just tell me about errors, don't drain any events"). I had a vague memory that it caused portability problems. Just checked... it was OpenBSD I was thinking of, but they finally added that flag in 6.2 (2017). Our older-than-that BF OpenBSD animal recently retired so that should be fine. (Yes, without EV_CLEAR it's "level triggered" not "edge triggered" in epoll terminology, so the way I had it was not broken, but the way you're suggesting would be nicer.) Note that you'll have to skip data == 0 (no error) too. + #ifdef HAVE_SYS_EVENT_H + /* macOS doesn't define the time unit macros, but uses milliseconds by default. */ + #ifndef NOTE_MSECONDS + #define NOTE_MSECONDS 0 + #endif + #endif While comparing the cousin OSs' man pages just now, I noticed that it's not only macOS that lacks NOTE_MSECONDS, it's also OpenBSD and NetBSD < 10. Maybe just delete that cruft ^^^ and use literal 0 in fflags directly. FreeBSD, and recently also NetBSD, decided to get fancy with high resolution timers, but 0 gets the traditional unit of milliseconds on all platforms (I just wrote it like that because I started from FreeBSD and didn't know the history/portability story).
On Tue, Jul 18, 2023 at 4:04 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Tue, Jul 18, 2023 at 11:55 AM Jacob Champion <jchampion@timescale.com> wrote: > +1 for EV_RECEIPT ("just tell me about errors, don't drain any > events"). Sounds good. > While comparing the cousin OSs' man pages just now, I noticed that > it's not only macOS that lacks NOTE_MSECONDS, it's also OpenBSD and > NetBSD < 10. Maybe just delete that cruft ^^^ and use literal 0 in > fflags directly. So I don't lose track of it, here's a v10 with those two changes. Thanks! --Jacob
Attachment
- since-v9.diff.txt
- v10-0001-common-jsonapi-support-FRONTEND-clients.patch.gz
- v10-0003-backend-add-OAUTHBEARER-SASL-mechanism.patch.gz
- v10-0002-libpq-add-OAUTHBEARER-SASL-mechanism.patch.gz
- v10-0004-Add-pytest-suite-for-OAuth.patch.gz
- v10-0005-squash-Add-pytest-suite-for-OAuth.patch.gz
- v10-0006-XXX-work-around-psycopg2-build-failures.patch.gz
v11 is a quick rebase over the recent Cirrus changes, and I've dropped 0006 now that psycopg2 can build against BSD/Meson setups (thanks Daniele!). --Jacob
Attachment
v12 implements a first draft of a client hook, so applications can replace either the device prompt or the entire OAuth flow. (Andrey and Mahendrakar: hopefully this is close to what you need.) It also cleans up some of the JSON tech debt. Since (IMO) we don't want to introduce new hooks every time we make improvements to the internal flows, the new hook is designed to retrieve multiple pieces of data from the application. Clients either declare their ability to get that data, or delegate the job to the next link in the chain, which by default is a no-op. That lets us add new data types to the end, and older clients will ignore them until they're taught otherwise. (I'm trying hard not to over-engineer this, but it seems like the concept of "give me some piece of data to continue authenticating" could pretty easily subsume things like the PQsslKeyPassHook if we wanted.) The PQAUTHDATA_OAUTH_BEARER_TOKEN case is the one that replaces the flow entirely, as discussed upthread. Your application gets the discovery URI and the requested scope for the connection. It can then either delegate back to libpq (e.g. if the issuer isn't one it can help with), immediately return a token (e.g. if one is already cached for the current user), or install a nonblocking callback to implement a custom async flow. When the connection is closed (or fails), the hook provides a cleanup function to free any resources it may have allocated. Thanks, --Jacob
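P.S. As a sketch of the "return a token immediately" case -- treat the type and field names here as illustrative, since the exact shape in v12 may differ:

    static int
    my_auth_data_hook(PGauthData type, PGconn *conn, void *data)
    {
        if (type == PQAUTHDATA_OAUTH_BEARER_TOKEN)
        {
            PGoauthBearerRequest *req = data;   /* field names are illustrative */

            if (issuer_is_ours(req->openid_configuration))      /* hypothetical */
            {
                req->token = lookup_cached_token(req->scope);   /* hypothetical */
                return 1;   /* handled; libpq skips its built-in flow */
            }
        }

        return 0;           /* delegate to the next link in the chain */
    }

    /* registered once, before connecting: PQsetAuthDataHook(my_auth_data_hook); */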
Attachment
Hi, On Fri, 3 Nov 2023 at 17:14, Jacob Champion <jchampion@timescale.com> wrote: > > v12 implements a first draft of a client hook, so applications can > replace either the device prompt or the entire OAuth flow. (Andrey and > Mahendrakar: hopefully this is close to what you need.) It also cleans > up some of the JSON tech debt. I went through CFbot and found that the build is failing; links: https://cirrus-ci.com/task/6061898244816896 https://cirrus-ci.com/task/6624848198238208 https://cirrus-ci.com/task/5217473314684928 https://cirrus-ci.com/task/6343373221527552 Just want to make sure you are aware of these failures. Thanks, Shlok Kumar Kyal
On Fri, Nov 3, 2023 at 5:28 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote: > Just want to make sure you are aware of these failures. Thanks for the nudge! Looks like I need to reconcile with the changes to JsonLexContext in 1c99cde2. I should be able to get to that next week; in the meantime I'll mark it Waiting on Author. --Jacob
On Fri, Nov 3, 2023 at 4:48 PM Jacob Champion <champion.p@gmail.com> wrote: > Thanks for the nudge! Looks like I need to reconcile with the changes > to JsonLexContext in 1c99cde2. I should be able to get to that next > week; in the meantime I'll mark it Waiting on Author. v13 rebases over latest. The JsonLexContext changes have simplified 0001 quite a bit, and there's probably a bit more minimization that could be done. Unfortunately the configure/Makefile build of libpq now seems to be pulling in an `exit()` dependency in a way that Meson does not. (Or maybe Meson isn't checking?) I still need to investigate that difference and fix it, so I recommend Meson if you're looking to test-drive a build. Thanks, --Jacob
Attachment
Hi Jacob,
Wanted to follow up on one of the topics discussed here in the past:
Do you plan to support adding an extension hook to validate the token?
It would allow a more efficient integration than spinning up a separate process.
Thanks!
Andrey.
On Thu, Nov 9, 2023 at 5:43 PM Andrey Chudnovsky <achudnovskij@gmail.com> wrote: > Do you plan to support adding an extension hook to validate the token? > > It would allow a more efficient integration, then spinning a separate process. I think an API in the style of archive modules might probably be a good way to go, yeah. It's probably not very high on the list of priorities, though, since the inputs and outputs are going to "look" the same whether you're inside or outside of the server process. The client side is going to need the bulk of the work/testing/validation. Speaking of which -- how is the current PQauthDataHook design doing when paired with MS AAD (er, Entra now I guess)? I haven't had an Azure test bed available for a while. Thanks, --Jacob
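P.S. For illustration, an archive-module-style validator API could be shaped roughly like this (names invented, not from any patch):

    #include <stdbool.h>

    /* hypothetical validator-module API, by analogy with archive modules */
    typedef struct OAuthValidatorCallbacks
    {
        /* check the bearer token; return the authenticated identity in *authn_id */
        bool        (*validate_token) (const char *token, const char *role,
                                       char **authn_id);
    } OAuthValidatorCallbacks;

    /* the library named by a GUC (say, oauth_validator_library) would export: */
    extern const OAuthValidatorCallbacks *_PG_oauth_validator_module_init(void);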
> On 8 Nov 2023, at 20:00, Jacob Champion <champion.p@gmail.com> wrote: > Unfortunately the configure/Makefile build of libpq now seems to be > pulling in an `exit()` dependency in a way that Meson does not. I believe this comes from libcurl, and specifically the ntlm_wb support, which is often enabled in system and package manager provided installations. There isn't really a fix here apart from requiring a libcurl not compiled with ntlm_wb support, or adding an exception to the exit() check in the Makefile. After bringing this up with other curl developers to see if it could be fixed, we instead decided to deprecate the whole module, as it's quirky and not used much. This won't help with existing installations, but at least it will be deprecated and removed by the time v17 ships, so gating on a version shipped after its removal will avoid it. https://github.com/curl/curl/commit/04540f69cfd4bf16e80e7c190b645f1baf505a84 > (Or maybe Meson isn't checking?) I still need to investigate that > difference and fix it, so I recommend Meson if you're looking to > test-drive a build. There is no corresponding check in the Meson build, which seems like a TODO. -- Daniel Gustafsson
On Tue, Dec 5, 2023 at 1:44 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 8 Nov 2023, at 20:00, Jacob Champion <champion.p@gmail.com> wrote: > > > Unfortunately the configure/Makefile build of libpq now seems to be > > pulling in an `exit()` dependency in a way that Meson does not. > > I believe this comes from the libcurl and specifically the ntlm_wb support > which is often enabled in system and package manager provided installations. > There isn't really a fix here apart from requiring a libcurl not compiled with > ntlm_wb support, or adding an exception to the exit() check in the Makefile. > > Bringing this up with other curl developers to see if it could be fixed we > instead decided to deprecate the whole module as it's quirky and not used much. > This won't help with existing installations but at least it will be deprecated > and removed by the time v17 ships, so gating on a version shipped after its > removal will avoid it. > > https://github.com/curl/curl/commit/04540f69cfd4bf16e80e7c190b645f1baf505a84 Ooh, thank you for looking into that and fixing it! > > (Or maybe Meson isn't checking?) I still need to investigate that > > difference and fix it, so I recommend Meson if you're looking to > > test-drive a build. > > There is no corresponding check in the Meson build, which seems like a TODO. Okay, I'll look into that too when I get time. Thanks, --Jacob
Hi all, v14 rebases over latest and fixes a warning when assertions are disabled. 0006 is temporary and hacks past a couple of issues I have not yet root caused -- one of which makes me wonder if 0001 needs to be considered alongside the recent pg_combinebackup and incremental JSON work...? --Jacob
Attachment
- since-v13.diff.txt
- v14-0003-backend-add-OAUTHBEARER-SASL-mechanism.patch.gz
- v14-0001-common-jsonapi-support-FRONTEND-clients.patch.gz
- v14-0002-libpq-add-OAUTHBEARER-SASL-mechanism.patch.gz
- v14-0004-Add-pytest-suite-for-OAuth.patch.gz
- v14-0005-squash-Add-pytest-suite-for-OAuth.patch.gz
- v14-0006-XXX-temporary-patches-to-build-and-test.patch.gz
On Tue, Feb 20, 2024 at 5:00 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > v14 rebases over latest and fixes a warning when assertions are > disabled. v15 is a housekeeping update that adds typedefs.list entries and runs pgindent. It also includes a temporary patch from Daniel to get the cfbot a bit farther (see above discussion on libcurl/exit). --Jacob
Attachment
- since-v14.diff.txt
- v15-0004-Add-pytest-suite-for-OAuth.patch.gz
- v15-0003-backend-add-OAUTHBEARER-SASL-mechanism.patch.gz
- v15-0002-libpq-add-OAUTHBEARER-SASL-mechanism.patch.gz
- v15-0001-common-jsonapi-support-FRONTEND-clients.patch.gz
- v15-0005-squash-Add-pytest-suite-for-OAuth.patch.gz
- v15-0007-REVERT-temporarily-skip-the-exit-check.patch.gz
- v15-0006-XXX-temporary-patches-to-build-and-test.patch.gz
On Thu, Feb 22, 2024 at 6:08 AM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > v15 is a housekeeping update that adds typedefs.list entries and runs > pgindent. v16 is more transformational! Daniel contributed 0004, which completely replaces the validator_command architecture with a C module API. This solves a bunch of problems as discussed upthread and vastly simplifies the test framework for the server side. 0004 also adds a set of Perl tests, which will begin to subsume some of the Python server-side tests as I get around to porting them. (@Daniel: 0005 is my diff against your original patch, for review.) 0008 has been modified to quickfix the pgcommon linkage on the Makefile side; my previous attempt at this only fixed Meson. The patchset is now carrying a lot of squash-cruft, and I plan to flatten it in the next version. Thanks, --Jacob
Attachment
- since-v15.diff.txt
- v16-0005-squash-Introduce-OAuth-validator-libraries.patch.gz
- v16-0001-common-jsonapi-support-FRONTEND-clients.patch.gz
- v16-0002-libpq-add-OAUTHBEARER-SASL-mechanism.patch.gz
- v16-0003-backend-add-OAUTHBEARER-SASL-mechanism.patch.gz
- v16-0004-Introduce-OAuth-validator-libraries.patch.gz
- v16-0008-XXX-temporary-patches-to-build-and-test.patch.gz
- v16-0009-REVERT-temporarily-skip-the-exit-check.patch.gz
- v16-0007-squash-Add-pytest-suite-for-OAuth.patch.gz
- v16-0006-Add-pytest-suite-for-OAuth.patch.gz
On Tue, Feb 27, 2024 at 11:20 AM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > This is done in v17, which is also now based on the two patches pulled > out by Daniel in [1]. It looks like my patchset has been eaten by a malware scanner: 550 Message content failed content scanning (Sanesecurity.Foxhole.Mail_gz.UNOFFICIAL) Was there a recent change to the lists? Is anyone able to see what the actual error was so I don't do it again? Thanks, --Jacob
[Trying again, with all patches unzipped and the CC list temporarily removed to avoid flooding people's inboxes. Original message follows.] On Fri, Feb 23, 2024 at 5:01 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > The > patchset is now carrying a lot of squash-cruft, and I plan to flatten > it in the next version. This is done in v17, which is also now based on the two patches pulled out by Daniel in [1]. Besides the squashes, which make up most of the range-diff, I've fixed a call to strncasecmp() which is not available on Windows. Daniel and I discussed trying a Python version of the test server, since the standard library there should give us more goodies to work with. A proof of concept is in 0009. I think the big question I have for it is, how would we communicate what we want the server to do for the test? (We could perhaps switch on magic values of the client ID?) In the end I'd like to be testing close to 100% of the failure modes, and that's likely to mean a lot of back-and-forth if the server implementation isn't in the Perl process. --Jacob [1] https://postgr.es/m/flat/F51F8777-FAF5-49F2-BC5E-8F9EB423ECE0%40yesql.se
Attachment
- since-v16.diff.txt
- v17-0003-Explicitly-require-password-for-SCRAM-exchange.patch
- v17-0004-libpq-add-OAUTHBEARER-SASL-mechanism.patch
- v17-0001-common-jsonapi-support-FRONTEND-clients.patch
- v17-0002-Refactor-SASL-exchange-to-return-tri-state-statu.patch
- v17-0006-Introduce-OAuth-validator-libraries.patch
- v17-0007-Add-pytest-suite-for-OAuth.patch
- v17-0008-XXX-temporary-patches-to-build-and-test.patch
- v17-0005-backend-add-OAUTHBEARER-SASL-mechanism.patch
- v17-0009-WIP-Python-OAuth-provider-implementation.patch
> On 28 Feb 2024, at 15:05, Jacob Champion <jacob.champion@enterprisedb.com> wrote: > > [Trying again, with all patches unzipped and the CC list temporarily > removed to avoid flooding people's inboxes. Original message follows.] > > On Fri, Feb 23, 2024 at 5:01 PM Jacob Champion > <jacob.champion@enterprisedb.com> wrote: >> The >> patchset is now carrying a lot of squash-cruft, and I plan to flatten >> it in the next version. > > This is done in v17, which is also now based on the two patches pulled > out by Daniel in [1]. Besides the squashes, which make up most of the > range-diff, I've fixed a call to strncasecmp() which is not available > on Windows. > > Daniel and I discussed trying a Python version of the test server, > since the standard library there should give us more goodies to work > with. A proof of concept is in 0009. I think the big question I have > for it is, how would we communicate what we want the server to do for > the test? (We could perhaps switch on magic values of the client ID?) > In the end I'd like to be testing close to 100% of the failure modes, > and that's likely to mean a lot of back-and-forth if the server > implementation isn't in the Perl process. Thanks for the new version. I'm digesting the test patches, but for now I have a few smaller comments: +#define ALLOC(size) malloc(size) I wonder if we should use pg_malloc_extended(size, MCXT_ALLOC_NO_OOM) instead to self-document the code. We clearly don't want feature-parity with server-side palloc here. I know we use malloc in similar ALLOC macros so it's not unique in that regard, but maybe? +#ifdef FRONTEND + destroyPQExpBuffer(lex->errormsg); +#else + pfree(lex->errormsg->data); + pfree(lex->errormsg); +#endif Wouldn't it be nicer if we abstracted this into a destroyStrVal function to a) avoid the ifdefs and b) make it more like the rest of the new API? While it's only used in two places (close to each other) it's a shame to let the underlying API bleed through the abstraction. + CURLM *curlm; /* top-level multi handle for cURL operations */ Nitpick, but curl is not capitalized cURL anymore (for some value of "anymore" since it changed in 2016 [0]). I do wonder if we should consistently write "libcurl" as well since we don't use curl but libcurl. + PQExpBufferData work_data; /* scratch buffer for general use (remember + to clear out prior contents first!) */ This seems like asking for subtle bugs due to uncleared buffers bleeding into another operation (especially since we are writing this data across the wire). How about having an array the size of OAuthStep of unallocated buffers where each step uses its own? Storing the content of each step could also be useful for debugging. Looking at the state machine here it's not an obvious change but also not impossible. + * TODO: This disables DNS resolution timeouts unless libcurl has been + * compiled against alternative resolution support. We should check that. curl_version_info() can be used to check for c-ares support. + * so you don't have to write out the error handling every time. They assume + * that they're embedded in a function returning bool, however. It feels a bit iffy to encode the return type in the macro; we can use the same trick that DISABLE_SIGPIPE employs where a failaction is passed in. + if (!strcmp(name, field->name)) Project style is to test for (strcmp(x,y) == 0) rather than (!strcmp()) to improve readability. 
+ libpq_append_conn_error(conn, "out of memory"); While not introduced in this patch, it's not an ideal pattern to report "out of memory" errors via a function which may allocate memory. + appendPQExpBufferStr(&conn->errorMessage, + libpq_gettext("server's error message contained an embedded NULL")); We should maybe add ", discarding" or something similar after this string to indicate that there was an actual error which has been thrown away, the error wasn't that the server passed an embedded NULL. +#ifdef USE_OAUTH + else if (strcmp(mechanism_buf.data, OAUTHBEARER_NAME) == 0 && + !selected_mechanism) I wonder if we instead should move the guards inside the statement and error out with "not built with OAuth support" or something similar like how we do with TLS and other optional components? + errdetail("Comma expected, but found character %s.", + sanitize_char(*p)))); The %s formatter should be wrapped like '%s' to indicate that the message part is the character in question (and we can then reuse the translation since the error message already exist for SCRAM). + temp = curl_slist_append(temp, "authorization_code"); + if (!temp) + oom = true; + + temp = curl_slist_append(temp, "implicit"); While not a bug per se, it reads a bit odd to call another operation that can allocate memory when the oom flag has been set. I think we can move some things around a little to make it clearer. The attached diff contains some (most?) of the above as a patch on top of your v17, but as a .txt to keep the CFBot from munging on it. -- Daniel Gustafsson
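For illustration, here is roughly what a failaction-style CHECK_SETOPT could look like, modeled on DISABLE_SIGPIPE -- the caller supplies the failure action rather than the macro assuming a bool-returning function. This is only a sketch; the error text and the struct member names are illustrative, not taken from the actual patch:

    #define CHECK_SETOPT(actx, opt, val, failaction) \
        do { \
            if (curl_easy_setopt((actx)->curl, (opt), (val)) != CURLE_OK) \
            { \
                actx_error((actx), "failed to set " #opt); \
                failaction; \
            } \
        } while (0)

    /* callers then decide what failure means in their own context */
    CHECK_SETOPT(actx, CURLOPT_VERBOSE, 1L, return false);
    CHECK_SETOPT(actx, CURLOPT_ERRORBUFFER, actx->curl_err, goto cleanup);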
Attachment
On 2024-02-28 We 09:05, Jacob Champion wrote: > > Daniel and I discussed trying a Python version of the test server, > since the standard library there should give us more goodies to work > with. A proof of concept is in 0009. I think the big question I have > for it is, how would we communicate what we want the server to do for > the test? (We could perhaps switch on magic values of the client ID?) > In the end I'd like to be testing close to 100% of the failure modes, > and that's likely to mean a lot of back-and-forth if the server > implementation isn't in the Perl process. Can you give some more details about what this python gadget would buy us? I note that there are a couple of CPAN modules that provide OAuth2 servers, not sure if they would be of any use. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
> On 28 Feb 2024, at 22:50, Andrew Dunstan <andrew@dunslane.net> wrote: > > On 2024-02-28 We 09:05, Jacob Champion wrote: >> >> Daniel and I discussed trying a Python version of the test server, >> since the standard library there should give us more goodies to work >> with. A proof of concept is in 0009. I think the big question I have >> for it is, how would we communicate what we want the server to do for >> the test? (We could perhaps switch on magic values of the client ID?) >> In the end I'd like to be testing close to 100% of the failure modes, >> and that's likely to mean a lot of back-and-forth if the server >> implementation isn't in the Perl process. > > Can you give some more details about what this python gadget would buy us? I note that there are a couple of CPAN modules that provide OAuth2 servers, not sure if they would be of any use. The main benefit would be to be able to provide a full test harness without adding any additional dependencies over what we already have (Python being required by meson). That should ideally make it easy to get good coverage from BF animals as no installation is needed. -- Daniel Gustafsson
[re-adding the CC list I dropped earlier] On Wed, Feb 28, 2024 at 1:52 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 28 Feb 2024, at 22:50, Andrew Dunstan <andrew@dunslane.net> wrote: > > Can you give some more details about what this python gadget would buy us? I note that there are a couple of CPAN modules that provide OAuth2 servers, not sure if they would be of any use. > > The main benefit would be to be able to provide a full test harness without > adding any additional dependencies over what we already have (Python being > required by meson). That should ideally make it easy to get good coverage from > BF animals as no installation is needed. As an additional note, the test suite ideally needs to be able to exercise failure modes where the provider itself is malfunctioning. So we hand-roll responses rather than deferring to an external OAuth/OpenID implementation, which adds HTTP and JSON dependencies at minimum, and Python includes both. See also the discussion with Stephen upthread [1]. (I do think it'd be nice to eventually include a prepackaged OAuth server in the test suite, to stack coverage for the happy path and further test interoperability.) Thanks, --Jacob [1] https://postgr.es/m/CAAWbhmh%2B6q4t3P%2BwDmS%3DJuHBpcgF-VM2cXNft8XV02yk-cHCpQ%40mail.gmail.com
> On 27 Feb 2024, at 20:20, Jacob Champion <jacob.champion@enterprisedb.com> wrote: > > On Fri, Feb 23, 2024 at 5:01 PM Jacob Champion > <jacob.champion@enterprisedb.com> wrote: >> The >> patchset is now carrying a lot of squash-cruft, and I plan to flatten >> it in the next version. > > This is done in v17, which is also now based on the two patches pulled > out by Daniel in [1]. Besides the squashes, which make up most of the > range-diff, I've fixed a call to strncasecmp() which is not available > on Windows. Two quick questions: + /* TODO */ + CHECK_SETOPT(actx, CURLOPT_WRITEDATA, stderr); I might be missing something, but what is this intended for in setup_curl_handles()? --- /dev/null +++ b/src/interfaces/libpq/fe-auth-oauth-iddawc.c As discussed off-list I think we should leave iddawc support for later and focus on getting one library properly supported to start with. If you agree, let's drop this from the patchset to make it easier to digest. We should make sure we keep pluggability such that another library can be supported though, much like the libpq TLS support. -- Daniel Gustafsson
On Wed, Feb 28, 2024 at 9:40 AM Daniel Gustafsson <daniel@yesql.se> wrote: > +#define ALLOC(size) malloc(size) > I wonder if we should use pg_malloc_extended(size, MCXT_ALLOC_NO_OOM) instead > to self document the code. We clearly don't want feature-parity with server- > side palloc here. I know we use malloc in similar ALLOC macros so it's not > unique in that regard, but maybe? I have a vague recollection that linking fe_memutils into libpq tripped the exit() checks, but I can try again and see. > +#ifdef FRONTEND > + destroyPQExpBuffer(lex->errormsg); > +#else > + pfree(lex->errormsg->data); > + pfree(lex->errormsg); > +#endif > Wouldn't it be nicer if we abstracted this into a destroyStrVal function to a) > avoid the ifdefs and b) make it more like the rest of the new API? While it's > only used in two places (close to each other) it's a shame to let the > underlying API bleed through the abstraction. Good idea. I'll fold this from your patch into the next set (and do the same for the ones I've marked +1 below). > + CURLM *curlm; /* top-level multi handle for cURL operations */ > Nitpick, but curl is not capitalized cURL anymore (for some value of "anymore" > since it changed in 2016 [0]). I do wonder if we should consistently write > "libcurl" as well since we don't use curl but libcurl. Huh, I missed that memo. Thanks -- that makes it much easier to type! > + PQExpBufferData work_data; /* scratch buffer for general use (remember > + to clear out prior contents first!) */ > This seems like asking for subtle bugs due to uncleared buffers bleeding into > another operation (especially since we are writing this data across the wire). > How about having an array the size of OAuthStep of unallocated buffers where > each step use it's own? Storing the content of each step could also be useful > for debugging. Looking at the statemachine here it's not an obvious change but > also not impossible. I like that idea; I'll give it a look. > + * TODO: This disables DNS resolution timeouts unless libcurl has been > + * compiled against alternative resolution support. We should check that. > curl_version_info() can be used to check for c-ares support. +1 > + * so you don't have to write out the error handling every time. They assume > + * that they're embedded in a function returning bool, however. > It feels a bit iffy to encode the returntype in the macro, we can use the same > trick that DISABLE_SIGPIPE employs where a failaction is passed in. +1 > + if (!strcmp(name, field->name)) > Project style is to test for (strcmp(x,y) == 0) rather than (!strcmp()) to > improve readability. +1 > + libpq_append_conn_error(conn, "out of memory"); > While not introduced in this patch, it's not an ideal pattern to report "out of > memory" errors via a function which may allocate memory. Does trying (and failing) to allocate more memory cause any harm? Best case, we still have enough room in the errorMessage to fit the whole error; worst case, we mark the errorMessage broken and then PQerrorMessage() can handle it correctly. > + appendPQExpBufferStr(&conn->errorMessage, > + libpq_gettext("server's error message contained an embedded NULL")); > We should maybe add ", discarding" or something similar after this string to > indicate that there was an actual error which has been thrown away, the error > wasn't that the server passed an embedded NULL. 
+1 > +#ifdef USE_OAUTH > + else if (strcmp(mechanism_buf.data, OAUTHBEARER_NAME) == 0 && > + !selected_mechanism) > I wonder if we instead should move the guards inside the statement and error > out with "not built with OAuth support" or something similar like how we do > with TLS and other optional components? This one seems like a step backwards. IIUC, the client can currently handle a situation where the server returns multiple mechanisms (though the server doesn't support that yet), and I'd really like to make use of that property without making users upgrade libpq. That said, it'd be good to have a more specific error message in the case where we don't have a match... > + errdetail("Comma expected, but found character %s.", > + sanitize_char(*p)))); > The %s formatter should be wrapped like '%s' to indicate that the message part > is the character in question (and we can then reuse the translation since the > error message already exist for SCRAM). +1 > + temp = curl_slist_append(temp, "authorization_code"); > + if (!temp) > + oom = true; > + > + temp = curl_slist_append(temp, "implicit"); > While not a bug per se, it reads a bit odd to call another operation that can > allocate memory when the oom flag has been set. I think we can move some > things around a little to make it clearer. I'm not a huge fan of nested happy paths/pyramids of doom, but in this case it's small enough that I'm not opposed. :D > The attached diff contains some (most?) of the above as a patch on top of your > v17, but as a .txt to keep the CFBot from munging on it. Thanks very much! I plan to apply all but the USE_OAUTH guard change (but let me know if you feel strongly about it). --Jacob
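As a concrete illustration of the rearrangement being discussed for the curl_slist hunk (a sketch only, using the list contents from the quoted code but not the actual patch structure): keep the previous list pointer whenever an append fails, so later appends remain safe and the OOM is reported once at the end.

    struct curl_slist *grants = NULL;
    struct curl_slist *temp;
    bool        oom = false;

    /* curl_slist_append() returns NULL on failure and leaves the old list untouched */
    temp = curl_slist_append(grants, "authorization_code");
    if (temp == NULL)
        oom = true;
    else
        grants = temp;

    temp = curl_slist_append(grants, "implicit");
    if (temp == NULL)
        oom = true;
    else
        grants = temp;

    if (oom)
    {
        curl_slist_free_all(grants);
        return false;
    }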
On Thu, Feb 29, 2024 at 1:08 PM Daniel Gustafsson <daniel@yesql.se> wrote: > + /* TODO */ > + CHECK_SETOPT(actx, CURLOPT_WRITEDATA, stderr); > I might be missing something, but what this is intended for in > setup_curl_handles()? Ah, that's cruft left over from early debugging, just so that I could see what was going on. I'll remove it. > --- /dev/null > +++ b/src/interfaces/libpq/fe-auth-oauth-iddawc.c > As discussed off-list I think we should leave iddawc support for later and > focus on getting one library properly supported to start with. If you agree, > let's drop this from the patchset to make it easier to digest. We should make > sure we keep pluggability such that another library can be supported though, > much like the libpq TLS support. Agreed. The number of changes being folded into the next set is already pretty big so I think this will wait until next+1. --Jacob
On Thu, Feb 29, 2024 at 4:04 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > On Wed, Feb 28, 2024 at 9:40 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > + temp = curl_slist_append(temp, "authorization_code"); > > + if (!temp) > > + oom = true; > > + > > + temp = curl_slist_append(temp, "implicit"); > > While not a bug per se, it reads a bit odd to call another operation that can > > allocate memory when the oom flag has been set. I think we can move some > > things around a little to make it clearer. > > I'm not a huge fan of nested happy paths/pyramids of doom, but in this > case it's small enough that I'm not opposed. :D I ended up rewriting this patch hunk a bit to handle earlier OOM failures; let me know what you think. -- v18 is the result of plenty of yak shaving now that the Windows build is working. In addition to Daniel's changes as discussed upthread, - I have rebased over v2 of the SASL-refactoring patches - the last CompilerWarnings failure has been fixed - the py.test suite now runs on Windows (but does not yet completely pass) - py.test has been completely disabled for the 32-bit Debian test in Cirrus; I don't know if there's a way to install 32-bit Python side-by-side with 64-bit We are now very, very close to green. The new oauth_validator tests can't work on Windows, since the client doesn't support OAuth there. The python/server tests can handle this case, since they emulate the client behavior; do we want to try something similar in Perl? --Jacob
Attachment
- since-v17.diff.txt
- v18-0005-backend-add-OAUTHBEARER-SASL-mechanism.patch
- v18-0003-Explicitly-require-password-for-SCRAM-exchange.patch
- v18-0002-Refactor-SASL-exchange-to-return-tri-state-statu.patch
- v18-0001-common-jsonapi-support-FRONTEND-clients.patch
- v18-0004-libpq-add-OAUTHBEARER-SASL-mechanism.patch
- v18-0009-WIP-Python-OAuth-provider-implementation.patch
- v18-0008-XXX-temporary-patches-to-build-and-test.patch
- v18-0007-Add-pytest-suite-for-OAuth.patch
- v18-0006-Introduce-OAuth-validator-libraries.patch
On Thu, Feb 29, 2024 at 5:08 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > We are now very, very close to green. v19 gets us a bit closer by adding a missed import for Windows. I've also removed iddawc support, so the client patch is lighter. > The new oauth_validator tests can't work on Windows, since the client > doesn't support OAuth there. The python/server tests can handle this > case, since they emulate the client behavior; do we want to try > something similar in Perl? In addition to this question, I'm starting to notice intermittent failures of the form error: ... failed to fetch OpenID discovery document: failed to queue HTTP request This corresponds to a TODO in the libcurl implementation -- if the initial call to curl_multi_socket_action() reports that no handles are running, I treated that as an error. But it looks like it's possible for libcurl to finish a request synchronously if the remote responds quickly enough, so that needs to change. --Jacob
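For context, a sketch of the alternative being described (illustrative only; the helper at the end is hypothetical and not part of the patch): when the initial call reports zero running handles, consult the multi handle's message queue instead of failing, since the transfer may already have completed synchronously.

    int         running = 0;
    int         queued = 0;
    CURLMsg    *msg;

    if (curl_multi_socket_action(actx->curlm, CURL_SOCKET_TIMEOUT, 0,
                                 &running) != CURLM_OK)
        return false;

    if (running == 0)
    {
        /* Nothing is in flight; look for a synchronously completed transfer. */
        while ((msg = curl_multi_info_read(actx->curlm, &queued)) != NULL)
        {
            if (msg->msg == CURLMSG_DONE)
                return handle_finished_request(actx, msg);  /* hypothetical helper */
        }
        return false;           /* truly nothing queued: report the error */
    }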
Attachment
- since-v18.diff.txt
- v19-0002-Refactor-SASL-exchange-to-return-tri-state-statu.patch
- v19-0004-libpq-add-OAUTHBEARER-SASL-mechanism.patch
- v19-0001-common-jsonapi-support-FRONTEND-clients.patch
- v19-0003-Explicitly-require-password-for-SCRAM-exchange.patch
- v19-0005-backend-add-OAUTHBEARER-SASL-mechanism.patch
- v19-0007-Add-pytest-suite-for-OAuth.patch
- v19-0008-XXX-temporary-patches-to-build-and-test.patch
- v19-0006-Introduce-OAuth-validator-libraries.patch
- v19-0009-WIP-Python-OAuth-provider-implementation.patch
On Fri, Mar 1, 2024 at 9:46 AM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > v19 gets us a bit closer by adding a missed import for Windows. I've > also removed iddawc support, so the client patch is lighter. v20 fixes a bunch more TODOs: 1) the client initial response is validated more closely 2) the server's invalid_token parameters are properly escaped into the containing JSON (though, eventually, we probably want to just reject invalid HBA settings instead of passing them through to the client) 3) Windows-specific responses have been recorded in the test suite While poking at item 2, I was reminded that there's an alternative way to get OAuth parameters from the server, and it's subtly incompatible with the OpenID spec because OpenID didn't follow the rules for .well-known URI construction [1]. :( Some sort of knob will be required to switch the behaviors. I renamed the API for the validator module from res->authenticated to res->authorized. Authentication is optional, but a validator *must* check that the client it's talking to was authorized by the user to access the server, whether or not the user is authenticated. (It may additionally verify that the user is authorized to access the database, or it may simply authenticate the user and defer to the usermap.) Documenting that particular subtlety is going to be interesting... The tests now exercise different issuers for different users, which will also be a good way to signal the server to respond in different ways during the validator tests. It does raise the question: if a third party provides an issuer-specific module, how do we switch between that and some other module for a different user? Andrew asked over at [2] if we could perhaps get 0001 in as well. I think the main thing to figure out there is, is requiring linkage against libpq (see 0008) going to be okay for the frontend binaries that need JSON support? Or do we need to do something like moving PQExpBuffer into src/common to simplify the dependency tree? --Jacob [1] https://www.rfc-editor.org/rfc/rfc8414.html#section-5 [2] https://www.postgresql.org/message-id/682c8fff-355c-a04f-57ac-81055c4ccda8%40dunslane.net
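To make the .well-known incompatibility mentioned above concrete (an illustration, not output from the patch): for an issuer of https://example.com/tenant, OpenID Discovery appends its suffix to the issuer, while RFC 8414 inserts its suffix between the host and the path, so the two conventions produce different URLs and some knob is needed to pick one:

    https://example.com/tenant/.well-known/openid-configuration        (OpenID Discovery)
    https://example.com/.well-known/oauth-authorization-server/tenant  (RFC 8414)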
Attachment
- since-v19.diff.txt
- v20-0003-Explicitly-require-password-for-SCRAM-exchange.patch
- v20-0002-Refactor-SASL-exchange-to-return-tri-state-statu.patch
- v20-0005-backend-add-OAUTHBEARER-SASL-mechanism.patch
- v20-0001-common-jsonapi-support-FRONTEND-clients.patch
- v20-0004-libpq-add-OAUTHBEARER-SASL-mechanism.patch
- v20-0006-Introduce-OAuth-validator-libraries.patch
- v20-0007-Add-pytest-suite-for-OAuth.patch
- v20-0009-WIP-Python-OAuth-provider-implementation.patch
- v20-0008-XXX-temporary-patches-to-build-and-test.patch
v21 is a quick rebase over HEAD, which has adopted a few pieces of v20. I've also fixed a race condition in the tests. On Mon, Mar 11, 2024 at 3:51 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > Andrew asked over at [2] if we could perhaps get 0001 in as well. I > think the main thing to figure out there is, is requiring linkage > against libpq (see 0008) going to be okay for the frontend binaries > that need JSON support? Or do we need to do something like moving > PQExpBuffer into src/common to simplify the dependency tree? 0001 has been pared down to the part that teaches jsonapi.c to use PQExpBuffer and track out-of-memory conditions; the linkage questions remain. Thanks, --Jacob
Attachment
- since-v20.diff.txt
- v21-0003-backend-add-OAUTHBEARER-SASL-mechanism.patch
- v21-0002-libpq-add-OAUTHBEARER-SASL-mechanism.patch
- v21-0004-Introduce-OAuth-validator-libraries.patch
- v21-0001-common-jsonapi-support-libpq-as-a-client.patch
- v21-0005-Add-pytest-suite-for-OAuth.patch
- v21-0006-XXX-temporary-patches-to-build-and-test.patch
- v21-0007-WIP-Python-OAuth-provider-implementation.patch
> On 22 Mar 2024, at 19:21, Jacob Champion <jacob.champion@enterprisedb.com> wrote: > > v21 is a quick rebase over HEAD, which has adopted a few pieces of > v20. I've also fixed a race condition in the tests. Thanks for the rebase. I have a few comments from working with it a bit: In jsonapi.c, makeJsonLexContextCstringLen initializes a JsonLexContext with palloc0, which would need to be ported over to use ALLOC for frontend code. On that note, the error handling in parse_oauth_json() for content-type checks attempts to free the JsonLexContext even before it has been created. Here we can just return false. - echo 'libpq must not be calling any function which invokes exit'; exit 1; \ + echo 'libpq must not be calling any function which invokes exit'; \ The offending codepath in libcurl was in the NTLM_WB module, a very old and obscure form of NTLM support which was replaced (yet remained in the tree) a long time ago by a full NTLM implementation. Based on the findings in this thread it was deprecated with a removal date set to April 2024 [0]. A bug in the 8.4.0 release, however, disconnected NTLM_WB from the build, and given the lack of complaints it was decided to leave it as is, so we can base our libcurl requirements on 8.4.0 while keeping the exit() check intact. + else if (strcasecmp(content_type, "application/json") != 0) This needs to handle parameters as well since it will now fail if the charset parameter is appended (which undoubtedly will be pretty common). The easiest way is probably to just verify the mediatype and skip the parameters since we know it can only be charset? + /* TODO investigate using conn->Pfdebug and CURLOPT_DEBUGFUNCTION here */ + CHECK_SETOPT(actx, CURLOPT_VERBOSE, 1L, return false); + CHECK_SETOPT(actx, CURLOPT_ERRORBUFFER, actx->curl_err, return false); CURLOPT_ERRORBUFFER is the old and finicky way of extracting error messages; we should absolutely move to using CURLOPT_DEBUGFUNCTION instead. + /* && response_code != 401 TODO */ ) Why is this marked with a TODO, do you remember? + print("# OAuth provider (PID $pid) is listening on port $port\n"); Code running under Test::More needs to use diag() for printing non-test output like this. Another issue I have is the sheer size and the fact that so much code is replaced by subsequent commits, so I took the liberty to squash some of this down into something less daunting. The attached v22 retains 0001 and then condenses the rest into two commits for the frontend and backend parts. I did drop the Python pytest patch since I feel that it's unlikely to go in from this thread (adding pytest seems worthy of its own thread and discussion), and the weight of it makes this seem scarier than it is. For using it, it can be easily applied from the v21 patchset independently. I did tweak the commit message to match reality a bit better, but there is a lot of work left there. The final patch contains fixes for all of the above review comments as well as some refactoring, smaller clean-ups and TODO fixing. If these fixes are accepted I'll incorporate them into the two commits. Next I intend to work on writing documentation for this. -- Daniel Gustafsson [0] https://curl.se/dev/deprecate.html [1] https://github.com/curl/curl/pull/12479
Attachment
On Thu, Mar 28, 2024 at 3:34 PM Daniel Gustafsson <daniel@yesql.se> wrote: > In jsonapi.c, makeJsonLexContextCstringLen initialize a JsonLexContext with > palloc0 which would need to be ported over to use ALLOC for frontend code. Seems reasonable (but see below, too). > On > that note, the errorhandling in parse_oauth_json() for content-type checks > attempts to free the JsonLexContext even before it has been created. Here we > can just return false. Agreed. They're zero-initialized, so freeJsonLexContext() is safe IIUC, but it's clearer not to call the free function at all. But for these additions: > - makeJsonLexContextCstringLen(&lex, resp->data, resp->len, PG_UTF8, true); > + if (!makeJsonLexContextCstringLen(&lex, resp->data, resp->len, PG_UTF8, true)) > + { > + actx_error(actx, "out of memory"); > + return false; > + } ...since we're using the stack-based API as opposed to the heap-based API, they shouldn't be possible to hit. Any failures in createStrVal() are deferred to parse time on purpose. > - echo 'libpq must not be calling any function which invokes exit'; exit 1; \ > + echo 'libpq must not be calling any function which invokes exit'; \ > The offending codepath in libcurl was in the NTLM_WB module, a very old and > obscure form of NTLM support which was replaced (yet remained in the tree) a > long time ago by a full NTLM implementatin. Based on the findings in this > thread it was deprecated with a removal date set to April 2024 [0]. A bug in > the 8.4.0 release however disconnected NTLM_WB from the build and given the > lack of complaints it was decided to leave as is, so we can base our libcurl > requirements on 8.4.0 while keeping the exit() check intact. Of the Cirrus machines, it looks like only FreeBSD has a new enough libcurl for that. Ubuntu won't until 24.04, Debian Bookworm doesn't have it unless you're running backports, RHEL 9 is still on 7.x... I think requiring libcurl 8 is effectively saying no one will be able to use this for a long time. Is there an alternative? > + else if (strcasecmp(content_type, "application/json") != 0) > This needs to handle parameters as well since it will now fail if the charset > parameter is appended (which undoubtedly will be pretty common). The easiest > way is probably to just verify the mediatype and skip the parameters since we > know it can only be charset? Good catch. application/json no longer defines charsets officially [1], so I think we should be able to just ignore them. The new strncasecmp needs to handle a spurious prefix, too; I have that on my TODO list. > + /* TODO investigate using conn->Pfdebug and CURLOPT_DEBUGFUNCTION here */ > + CHECK_SETOPT(actx, CURLOPT_VERBOSE, 1L, return false); > + CHECK_SETOPT(actx, CURLOPT_ERRORBUFFER, actx->curl_err, return false); > CURLOPT_ERRORBUFFER is the old and finicky way of extracting error messages, we > should absolutely move to using CURLOPT_DEBUGFUNCTION instead. This new way doesn't do the same thing. Here's a sample error: connection to server at "127.0.0.1", port 56619 failed: failed to fetch OpenID discovery document: Weird server reply ( Trying 127.0.0.1:36647... Connected to localhost (127.0.0.1) port 36647 (#0) Mark bundle as not supporting multiuse HTTP 1.0, assume close after body Invalid Content-Length: value Closing connection 0 ) IMO that's too much noise. 
Prior to the change, the same error would have been connection to server at "127.0.0.1", port 56619 failed: failed to fetch OpenID discovery document: Weird server reply (Invalid Content-Length: value) The error buffer is finicky for sure, but it's also a generic one-line explanation of what went wrong... Is there an alternative API for that I'm missing? > + /* && response_code != 401 TODO */ ) > Why is this marked with a TODO, do you remember? Yeah -- I have a feeling that 401s coming back are going to need more helpful hints to the user, since it implies that libpq itself hasn't authenticated correctly as opposed to some user-related auth failure. I was hoping to find some sample behaviors in the wild and record those into the suite. > + print("# OAuth provider (PID $pid) is listening on port $port\n"); > Code running under Test::More need to use diag() for printing non-test output > like this. Ah, thanks. > +#if LIBCURL_VERSION_MAJOR <= 8 && LIBCURL_VERSION_MINOR < 4 I don't think this catches versions like 7.76, does it? Maybe `LIBCURL_VERSION_MAJOR < 8 || (LIBCURL_VERSION_MAJOR == 8 && LIBCURL_VERSION_MINOR < 4)`, or else `LIBCURL_VERSION_NUM < 0x080400`? > my $pid = open(my $read_fh, "-|", $ENV{PYTHON}, "t/oauth_server.py") > - // die "failed to start OAuth server: $!"; > + or die "failed to start OAuth server: $!"; > > - read($read_fh, $port, 7) // die "failed to read port number: $!"; > + read($read_fh, $port, 7) or die "failed to read port number: $!"; The first hunk here looks good (thanks for the catch!) but I think the second is not correct behavior. $! doesn't get set unless undef is returned, if I'm reading the docs correctly. Yay Perl. > + /* Sanity check the previous operation */ > + if (actx->running != 1) > + { > + actx_error(actx, "failed to queue HTTP request"); > + return false; > + } `running` can be set to zero on success, too. I'm having trouble forcing that code path in a test so far, but we're going to have to do something special in that case. > Another issue I have is the sheer size and the fact that so much code is > replaced by subsequent commits, so I took the liberty to squash some of this > down into something less daunting. The attached v22 retains the 0001 and then > condenses the rest into two commits for frontent and backend parts. Looks good. > I did drop > the Python pytest patch since I feel that it's unlikely to go in from this > thread (adding pytest seems worthy of its own thread and discussion), and the > weight of it makes this seem scarier than it is. Until its coverage gets ported over, can we keep it as a `DO NOT MERGE` patch? Otherwise there's not much to run in Cirrus. > The final patch contains fixes for all of the above review comments as well as > a some refactoring, smaller clean-ups and TODO fixing. If these fixes are > accepted I'll incorporate them into the two commits. > > Next I intend to work on writing documentation for this. Awesome, thank you! I will start adding coverage to the new code paths. --Jacob [1] https://datatracker.ietf.org/doc/html/rfc7159#section-11
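For illustration, a minimal version of the relaxed media-type check discussed above might look like this (a sketch only; the function name is hypothetical and not from the patch):

    /*
     * Accept "application/json" case-insensitively, optionally followed by
     * parameters (e.g. ";charset=utf-8"), while rejecting both spurious
     * prefixes and longer media types such as "application/json-seq".
     */
    static bool
    is_json_content_type(const char *content_type)
    {
        const char *mediatype = "application/json";
        size_t      len = strlen(mediatype);

        if (pg_strncasecmp(content_type, mediatype, len) != 0)
            return false;

        return content_type[len] == '\0' || content_type[len] == ';';
    }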
On Mon, Apr 1, 2024 at 3:07 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > > Awesome, thank you! I will start adding coverage to the new code paths. This patchset rotted more than I thought it would with the new incremental JSON, and I got stuck in rebase hell. Rather than chip away at that while the cfbot is red, here's a rebase of v22 to get the CI up again, and I will port what I've been working on over that. (So, for prior reviewers: recent upthread and offline feedback is not yet incorporated, sorry, come back later.) The big change in v23 is that I've removed fe_memutils.c from libpgcommon_shlib completely, to try to reduce my own hair-pulling when it comes to keeping exit() out of libpq. (It snuck in several ways with incremental JSON.) As far as I can tell, removing fe_memutils causes only one problem, which is that Informix ECPG is relying on pnstrdup(). And I think that may be a bug in itself? There's code in deccvasc() right after the pnstrdup() call that takes care of a failed allocation, but the frontend pnstrdup() is going to call exit() on failure. So my 0001 patch reverts that change, which was made in 0b9466fce. If that can go in, and I'm not missing something that makes that call okay, maybe 0002 can be peeled off as well. Thanks, --Jacob
Attachment
- since-v22.diff.txt
- v23-0002-Remove-fe_memutils-from-libpgcommon_shlib.patch
- v23-0003-common-jsonapi-support-libpq-as-a-client.patch
- v23-0005-backend-add-OAUTHBEARER-SASL-mechanism.patch
- v23-0004-libpq-add-OAUTHBEARER-SASL-mechanism.patch
- v23-0006-Review-comments.patch
- v23-0001-Revert-ECPG-s-use-of-pnstrdup.patch
Hi Daniel, On Mon, Apr 1, 2024 at 3:07 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > Of the Cirrus machines, it looks like only FreeBSD has a new enough > libcurl for that. Ubuntu won't until 24.04, Debian Bookworm doesn't > have it unless you're running backports, RHEL 9 is still on 7.x... I > think requiring libcurl 8 is effectively saying no one will be able to > use this for a long time. Is there an alternative? Since the exit() checks appear to be happy now that fe_memutils is out, I've lowered the requirement to the version of libcurl that seems to be shipped in RHEL 8 (7.61.0). This also happens to be when TLS 1.3 ciphersuite control was added to Curl, which seems like something we may want in the very near future, so I'm taking that as a good sign for what is otherwise a very arbitrary cutoff point. Counterproposals welcome :D > Good catch. application/json no longer defines charsets officially > [1], so I think we should be able to just ignore them. The new > strncasecmp needs to handle a spurious prefix, too; I have that on my > TODO list. I've expanded this handling in v24, attached. > This new way doesn't do the same thing. Here's a sample error: > > connection to server at "127.0.0.1", port 56619 failed: failed to > fetch OpenID discovery document: Weird server reply ( Trying > 127.0.0.1:36647... > Connected to localhost (127.0.0.1) port 36647 (#0) > Mark bundle as not supporting multiuse > HTTP 1.0, assume close after body > Invalid Content-Length: value > Closing connection 0 > ) > > IMO that's too much noise. Prior to the change, the same error would have been > > connection to server at "127.0.0.1", port 56619 failed: failed to > fetch OpenID discovery document: Weird server reply (Invalid > Content-Length: value) I have reverted this change for now, but I'm still hoping there's an alternative that can help us clean up? > `running` can be set to zero on success, too. I'm having trouble > forcing that code path in a test so far, but we're going to have to do > something special in that case. For whatever reason, the magic timing for this is popping up more and more often on Cirrus, leading to really annoying test failures. So I may have to abandon the search for a perfect test case and just fix it. > > I did drop > > the Python pytest patch since I feel that it's unlikely to go in from this > > thread (adding pytest seems worthy of its own thread and discussion), and the > > weight of it makes this seem scarier than it is. > > Until its coverage gets ported over, can we keep it as a `DO NOT > MERGE` patch? Otherwise there's not much to run in Cirrus. I have added this back (marked loudly as don't-merge) so that we keep the test coverage for now. The Perl suite (plus Python server) has been tricked out a lot more in v24, so it shouldn't be too bad to get things ported. > > Next I intend to work on writing documentation for this. > > Awesome, thank you! I will start adding coverage to the new code paths. Peter E asked for some documentation stubs to ease review, which I've added. Hopefully that doesn't step on your toes any. A large portion of your "Review comments" patch has been pulled backwards into the previous commits; the remaining pieces are things I'm still peering at and/or writing tests for. I also owe this thread an updated roadmap and summary, to make it a little less daunting for new reviewers. Soon (tm). Thanks! --Jacob
Attachment
- since-v23.diff.txt
- v24-0003-common-jsonapi-support-libpq-as-a-client.patch
- v24-0004-libpq-add-OAUTHBEARER-SASL-mechanism.patch
- v24-0002-Remove-fe_memutils-from-libpgcommon_shlib.patch
- v24-0001-Revert-ECPG-s-use-of-pnstrdup.patch
- v24-0006-Review-comments.patch
- v24-0005-backend-add-OAUTHBEARER-SASL-mechanism.patch
- v24-0007-DO-NOT-MERGE-Add-pytest-suite-for-OAuth.patch
I have some comments about the first three patches, that deal with memory management. v24-0001-Revert-ECPG-s-use-of-pnstrdup.patch This looks right. I suppose another approach would be to put a full replacement for strndup() into src/port/. But since there is currently only one user, and most other users should be using pnstrdup(), the presented approach seems ok. We should take the check for exit() calls from libpq and expand it to cover the other libraries as well. Maybe there are other problems like this? v24-0002-Remove-fe_memutils-from-libpgcommon_shlib.patch I don't quite understand how this problem can arise. The description says """ libpq appears to have no need for this, and the exit() references cause our libpq-refs-stamp test to fail if the linker doesn't strip out the unused code. """ But under what circumstances does "the linker doesn't strip out" happen? If this happens accidentally, then we should have seen some buildfarm failures or something? Also, one could look further and notice that restricted_token.c and sprompt.c both a) are not needed by libpq and b) can trigger exit() calls. Then it's not clear why those are not affected. v24-0003-common-jsonapi-support-libpq-as-a-client.patch I'm reminded of thread [0]. I think there is quite a bit of confusion about the pqexpbuffer vs. stringinfo APIs, and they are probably used incorrectly quite a bit. There are now also programs that use both of them! This patch now introduces another layer on top of them. I fear, at the end, nobody is going to understand any of this anymore. Also, changing all the programs to link in libpq for pqexpbuffer seems like the opposite direction from what was suggested in [0]. I think we need to do some deeper thinking here about how we want the memory management on the client side to work. Maybe we could just use one API but have some flags or callbacks to control the out-of-memory behavior. [0]: https://www.postgresql.org/message-id/flat/16d0beac-a141-e5d3-60e9-323da75f49bf%40eisentraut.org
Thanks for working on this patchset. I'm looking over 0004 and 0005, but I came across one thing I wanted to bring up sooner rather than waiting for the full review. In parse_device_authz we have this: {"user_code", JSON_TOKEN_STRING, {&authz->user_code}, REQUIRED}, {"verification_uri", JSON_TOKEN_STRING, {&authz->verification_uri}, REQUIRED}, /* * The following fields are technically REQUIRED, but we don't use * them anywhere yet: * * - expires_in */ {"interval", JSON_TOKEN_NUMBER, {&authz->interval_str}, OPTIONAL}, Together with a colleague, we found that the Azure provider uses "verification_url" rather than xxx_uri. Another discrepancy is that it uses a string for the interval (i.e. "interval":"5"). One can of course argue that Azure is wrong and should feel bad, but I fear that virtually all (major) providers will have differences like this, so we will have to deal with it in an extensible fashion (compile time, not runtime configurable). I was toying with making the json_field name member an array, to allow variations. That won't help with the field type differences though, so another train of thought was to have some form of REQUIRED_XOR where fields can be tied together. What do you think about something along these lines? Another thing: shouldn't we really parse and interpret *all* REQUIRED fields even if we don't use them to ensure that the JSON is well-formed? If the JSON we get is malformed in any way it seems like the safe/conservative option to error out. -- Daniel Gustafsson
On Mon, Jul 29, 2024 at 5:02 AM Peter Eisentraut <peter@eisentraut.org> wrote: > We should take the check for exit() calls from libpq and expand it to > cover the other libraries as well. Maybe there are other problems like > this? Seems reasonable, yeah. > But under what circumstances does "the linker doesn't strip out" happen? > If this happens accidentally, then we should have seen some buildfarm > failures or something? On my machine, for example, I see differences with optimization levels. Say you inadvertently call pfree() in a _shlib build, as I did multiple times upthread. By itself, that shouldn't actually be a problem (it eventually redirects to free()), so it should be legal to call pfree(), and with -O2 the build succeeds. But with -Og, the exit() check trips, and when I disassemble I see that pg_malloc() et al. have infected the shared object. After all, we did tell the linker to put that object file in, and we don't ask it to garbage-collect sections. > Also, one could look further and notice that restricted_token.c and > sprompt.c both a) are not needed by libpq and b) can trigger exit() > calls. Then it's not clear why those are not affected. I think it's easier for the linker to omit whole object files rather than partial ones. If libpq doesn't use any of those APIs there's not really a reason to trip over it. (Maybe the _shlib variants should just contain the minimum objects required to compile.) > I'm reminded of thread [0]. I think there is quite a bit of confusion > about the pqexpbuffer vs. stringinfo APIs, and they are probably used > incorrectly quite a bit. There are now also programs that use both of > them! This patch now introduces another layer on top of them. I fear, > at the end, nobody is going to understand any of this anymore. "anymore"? :) In all seriousness -- I agree that this isn't sustainable. At the moment the worst pain (the new layer) is isolated to jsonapi.c, which seems like an okay place to try something new, since there aren't that many clients. But to be honest I'm not excited about deciding the Best Way Forward based on a sample size of JSON. > Also, > changing all the programs to link in libpq for pqexpbuffer seems like > the opposite direction from what was suggested in [0]. (I don't really want to keep that new libpq dependency. We'd just have to decide where PQExpBuffer is going to go if we're not okay with it.) > I think we need to do some deeper thinking here about how we want the > memory management on the client side to work. Maybe we could just use > one API but have some flags or callbacks to control the out-of-memory > behavior. Any src/common code that needs to handle both in-band and out-of-band failure modes will still have to decide whether it's going to 1) duplicate code paths or 2) just act as if in-band failures can always happen. I think that's probably essential complexity; an ideal API might make it nicer to deal with but it can't abstract it away. Thanks! --Jacob
On Mon, Jul 29, 2024 at 1:51 PM Daniel Gustafsson <daniel@yesql.se> wrote: > Together with a colleage we found the Azure provider use "verification_url" > rather than xxx_uri. Yeah, I think that's originally a Google-ism. (As far as I can tell they helped author the spec for this and then didn't follow it. :/ ) I didn't recall Azure having used it back when I was testing against it, though, so that's good to know. > Another discrepancy is that it uses a string for the > interval (ie: "interval":"5"). Oh, that's a new one. I don't remember needing to hack around that either; maybe iddawc handled it silently? > One can of course argue that Azure is wrong and > should feel bad, but I fear that virtually all (major) providers will have > differences like this, so we will have to deal with it in an extensible fashion > (compile time, not runtime configurable). Such is life... verification_url we will just have to deal with by default, I think, since Google does/did it too. Not sure about interval -- but do we want to make our distribution maintainers deal with a compile-time setting for libpq, just to support various OAuth flavors? To me it seems like we should just hold our noses and support known (large) departures in the core. > I was toying with making the name json_field name member an array, to allow > variations. That won't help with the fieldtype differences though, so another > train of thought was to have some form of REQUIRED_XOR where fields can tied > together. What do you think about something along these lines? If I designed it right, just adding alternative spellings directly to the fields list should work. (The "required" check is by struct member, not name, so both spellings can point to the same destination.) The alternative typing on the other hand might require something like a new sentinel "type" that will accept both... I hadn't expected that. > Another thing, shouldn't we really parse and interpret *all* REQUIRED fields > even if we don't use them to ensure that the JSON is wellformed? If the JSON > we get is malformed in any way it seems like the safe/conservative option to > error out. Good, I was hoping to have a conversation about that. I am fine with either option in principle. In practice I expect to add code to use `expires_in` (so that we can pass it to custom OAuth hook implementations) and `scope` (to check if the server has changed it on us). That leaves the provider... Forcing the provider itself to implement unused stuff in order to interoperate seems like it could backfire on us, especially since IETF standardized an alternate .well-known URI [1] that changes some of these REQUIRED things into OPTIONAL. (One way for us to interpret this: those fields may be required for OpenID, but your OAuth provider might not be an OpenID provider, and our code doesn't require OpenID.) I think we should probably tread lightly in that particular case. Thoughts on that? Thanks! --Jacob [1] https://www.rfc-editor.org/rfc/rfc8414.html
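A sketch of what that could look like in the parse_device_authz fields list (following the excerpt Daniel quoted; whether both entries can simply be marked REQUIRED this way is an assumption about how the patch keys its required-field check):

    /*
     * Both spellings point at the same struct member, so whichever one the
     * provider sends satisfies the requirement, which is keyed on the member
     * rather than the name.
     */
    {"verification_uri", JSON_TOKEN_STRING, {&authz->verification_uri}, REQUIRED},
    {"verification_url", JSON_TOKEN_STRING, {&authz->verification_uri}, REQUIRED},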
On 30.07.24 00:30, Jacob Champion wrote: >> But under what circumstances does "the linker doesn't strip out" happen? >> If this happens accidentally, then we should have seen some buildfarm >> failures or something? > On my machine, for example, I see differences with optimization > levels. Say you inadvertently call pfree() in a _shlib build, as I did > multiple times upthread. By itself, that shouldn't actually be a > problem (it eventually redirects to free()), so it should be legal to > call pfree(), and with -O2 the build succeeds. But with -Og, the > exit() check trips, and when I disassemble I see that pg_malloc() et > all have infected the shared object. After all, we did tell the linker > to put that object file in, and we don't ask it to garbage-collect > sections. I'm tempted to say, this is working as intended. libpgcommon is built as a static library. So we can put all the object files in the library, and its users only use the object files they really need. So this garbage collection you allude to actually does happen, on an object-file level. You shouldn't use pfree() interchangeably with free(), even if that is not enforced because it's the same thing underneath. First, it just makes sense to keep the alloc and free pairs matched up. And second, on Windows there is some additional restriction (vague knowledge) that the allocate and free functions must be in the same library, so mixing them freely might not even work.
On Fri, Aug 2, 2024 at 10:13 AM Peter Eisentraut <peter@eisentraut.org> wrote: > You shouldn't use pfree() interchangeably with free(), even if that is > not enforced because it's the same thing underneath. First, it just > makes sense to keep the alloc and free pairs matched up. And second, on > Windows there is some additional restriction (vague knowledge) that the > allocate and free functions must be in the same library, so mixing them > freely might not even work. Ah, I forgot about the CRT problems on Windows. So my statement of "the linker might not garbage collect" is pretty much irrelevant. But it sounds like we agree that we shouldn't be using fe_memutils at all in shlib builds. (If you can't use palloc -- it calls exit -- then you can't use pfree either.) Is 0002 still worth pursuing, once I've correctly wordsmithed the commit? Or did I misunderstand your point? Thanks! --Jacob
On 02.08.24 19:51, Jacob Champion wrote: > But it sounds like we agree that we shouldn't be using fe_memutils at > all in shlib builds. (If you can't use palloc -- it calls exit -- then > you can't use pfree either.) Is 0002 still worth pursuing, once I've > correctly wordsmithed the commit? Or did I misunderstand your point? Yes, I think with an adjusted comment and commit message, the actual change makes sense.
On Fri, Aug 2, 2024 at 11:48 AM Peter Eisentraut <peter@eisentraut.org> wrote: > Yes, I think with an adjusted comment and commit message, the actual > change makes sense. Done in v25. ...along with a bunch of other stuff: 1. All the debug-mode things that we want for testing but not in production have now been hidden behind a PGOAUTHDEBUG environment variable, instead of being enabled by default. At the moment, that means 1) sensitive HTTP traffic gets printed on stderr, 2) plaintext HTTP is allowed, and 3) servers may DoS the client by sending a zero-second retry interval (which speeds up testing a lot). I've resurrected some of Daniel's CURLOPT_DEBUGFUNCTION implementation for this. I think this feature needs more thought, but I'm not sure how much. In particular I don't think a connection string option would be appropriate (imagine the "fun" a proxy solution would have with a spray-my-password-to-stderr switch). But maybe it makes sense to further divide the dangerous behavior up, so that for example you can debug the HTTP stream without also allowing plaintext connections, or something. And maybe stricter maintainers would like to compile the feature out entirely? 2. The verification_url variant from Azure and Google is now directly supported. @Daniel: I figured out why I wasn't seeing the string-based-interval issue in my testing. I've been using Azure's v2.0 OpenID endpoint, which seems to be much more compliant than the original. Since this is a new feature, would it be okay to just push new users to that endpoint rather than supporting the previous weirdness in our code? (Either way, I think we should support verification_url.) Along those lines, with Azure I'm now seeing that device_code is not advertised in grant_types_supported... is that new behavior? Or did iddawc just not care? 3. I've restructured the libcurl calls to allow curl_multi_socket_action() to synchronously succeed on its first call, which we've been seeing a lot in the CI as mentioned upthread. This led to a bunch of refactoring of the top-level state machine, which had gotten too complex. I'm much happier with the code organization now, but it's a big diff. 4. I've changed things around to get rid of two modern libcurl deprecation warnings. I need to ask curl-library about my use of curl_multi_socket_all(), which seems like it's exactly what our use case needs. Thanks, --Jacob
Attachment
- since-v24.diff.txt
- v25-0005-backend-add-OAUTHBEARER-SASL-mechanism.patch
- v25-0004-libpq-add-OAUTHBEARER-SASL-mechanism.patch
- v25-0001-Revert-ECPG-s-use-of-pnstrdup.patch
- v25-0002-Remove-fe_memutils-from-libpgcommon_shlib.patch
- v25-0003-common-jsonapi-support-libpq-as-a-client.patch
- v25-0006-Review-comments.patch
- v25-0007-DO-NOT-MERGE-Add-pytest-suite-for-OAuth.patch
On 05.08.24 19:53, Jacob Champion wrote: > On Fri, Aug 2, 2024 at 11:48 AM Peter Eisentraut <peter@eisentraut.org> wrote: >> Yes, I think with an adjusted comment and commit message, the actual >> change makes sense. > > Done in v25. > > ...along with a bunch of other stuff: I have committed 0001, and I plan to backpatch it once the release freeze lifts. I'll work on 0002 next.
On 07.08.24 09:34, Peter Eisentraut wrote: > On 05.08.24 19:53, Jacob Champion wrote: >> On Fri, Aug 2, 2024 at 11:48 AM Peter Eisentraut >> <peter@eisentraut.org> wrote: >>> Yes, I think with an adjusted comment and commit message, the actual >>> change makes sense. >> >> Done in v25. >> >> ...along with a bunch of other stuff: > > I have committed 0001, and I plan to backpatch it once the release > freeze lifts. > > I'll work on 0002 next. I have committed 0002 now.
On Sun, Aug 11, 2024 at 11:37 PM Peter Eisentraut <peter@eisentraut.org> wrote: > I have committed 0002 now. Thanks Peter! Rebased over both in v26. --Jacob
Attachment
On 13.08.24 23:11, Jacob Champion wrote: > On Sun, Aug 11, 2024 at 11:37 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> I have committed 0002 now. > > Thanks Peter! Rebased over both in v26. I have looked again at the jsonapi memory management patch (v26-0001). As previously mentioned, I think adding a third or fourth (depending on how you count) memory management API is maybe something we should avoid. Also, the weird layering where src/common/ now (sometimes) depends on libpq seems not great. I'm thinking, maybe we leave the use of StringInfo at the source code level, but #define the symbols to use PQExpBuffer. Something like #ifdef JSONAPI_USE_PQEXPBUFFER #define StringInfo PQExpBuffer #define appendStringInfo appendPQExpBuffer #define appendBinaryStringInfo appendBinaryPQExpBuffer #define palloc malloc //etc. #endif (simplified, the argument lists might differ) Or, if people find that too scary, something like #ifdef JSONAPI_USE_PQEXPBUFFER #define jsonapi_StringInfo PQExpBuffer #define jsonapi_appendStringInfo appendPQExpBuffer #define jsonapi_appendBinaryStringInfo appendBinaryPQExpBuffer #define jsonapi_palloc malloc //etc. #else #define jsonapi_StringInfo StringInfo #define jsonapi_appendStringInfo appendStringInfo #define jsonapi_appendBinaryStringInfo appendBinaryStringInfo #define jsonapi_palloc palloc //etc. #endif That way, it's at least more easy to follow the source code because you see a mostly-familiar API. Also, we should make this PQExpBuffer-using mode only used by libpq, not by frontend programs. So libpq takes its own copy of jsonapi.c and compiles it using JSONAPI_USE_PQEXPBUFFER. That will make the libpq build descriptions a bit more complicated, but everyone who is not libpq doesn't need to change. Once you get past all the function renaming, the logic changes in jsonapi.c all look pretty reasonable. Refactoring like allocate_incremental_state() makes sense. You could add pg_nodiscard attributes to makeJsonLexContextCstringLen() and makeJsonLexContextIncremental() so that callers who are using the libpq mode are forced to check for errors. Or maybe there is a clever way to avoid even that: Create a fixed JsonLexContext like static const JsonLexContext failed_oom; and on OOM you return that one from makeJsonLexContext*(). And then in pg_parse_json(), when you get handed that context, you return JSON_OUT_OF_MEMORY immediately. Other than that detail and the need to use freeJsonLexContext(), it looks like this new mode doesn't impose any additional burden on callers, since during parsing they need to check for errors anyway, and this just adds one more error type for out of memory. That's a good outcome.
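A sketch of the sentinel approach, for concreteness (fragments only; the surrounding function bodies are elided and the exact placement is an assumption):

    static const JsonLexContext failed_oom;

    /* in makeJsonLexContextCstringLen(), when allocating the context fails: */
        return unconstify(JsonLexContext *, &failed_oom);

    /* in pg_parse_json() and friends, before touching the context: */
        if (lex == &failed_oom)
            return JSON_OUT_OF_MEMORY;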
On Mon, Aug 26, 2024 at 1:18 AM Peter Eisentraut <peter@eisentraut.org> wrote: > Or, if people find that too scary, something like > > #ifdef JSONAPI_USE_PQEXPBUFFER > > #define jsonapi_StringInfo PQExpBuffer > [...] > > That way, it's at least more easy to follow the source code because > you see a mostly-familiar API. I was having trouble reasoning about the palloc-that-isn't-palloc code during the first few drafts, so I will try a round with the jsonapi_ prefix. > Also, we should make this PQExpBuffer-using mode only used by libpq, > not by frontend programs. So libpq takes its own copy of jsonapi.c > and compiles it using JSONAPI_USE_PQEXPBUFFER. That will make the > libpq build descriptions a bit more complicated, but everyone who is > not libpq doesn't need to change. Sounds reasonable. It complicates the test coverage situation a little bit, but I think my current patch was maybe insufficient there anyway, since the coverage for the backend flavor silently dropped... > Or maybe there is a clever way to avoid even that: Create a > fixed JsonLexContext like > > static const JsonLexContext failed_oom; > > and on OOM you return that one from makeJsonLexContext*(). And then > in pg_parse_json(), when you get handed that context, you return > JSON_OUT_OF_MEMORY immediately. I like this idea. Thanks! --Jacob
On 28.08.24 18:31, Jacob Champion wrote:
> On Mon, Aug 26, 2024 at 4:23 PM Jacob Champion
> <jacob.champion@enterprisedb.com> wrote:
>> I was having trouble reasoning about the palloc-that-isn't-palloc code
>> during the first few drafts, so I will try a round with the jsonapi_
>> prefix.
>
> v27 takes a stab at that. I have kept the ALLOC/FREE naming to match
> the strategy in other src/common source files.

This looks pretty good to me. Maybe on the naming side, this seems like a gratuitous divergence:

+#define jsonapi_createStringInfo makeStringInfo

> The name of the variable JSONAPI_USE_PQEXPBUFFER leads to sections of
> code that look like this:
>
> +#ifdef JSONAPI_USE_PQEXPBUFFER
> +	if (!new_prediction || !new_fnames || !new_fnull)
> +		return false;
> +#endif
>
> To me it wouldn't be immediately obvious why "using PQExpBuffer" has
> anything to do with this code; the key idea is that we expect any
> allocations to be able to fail. Maybe a name like JSONAPI_ALLOW_OOM or
> JSONAPI_SHLIB_ALLOCATIONS or...?

Seems ok to me as is. I think the purpose of JSONAPI_USE_PQEXPBUFFER is adequately explained by this comment

+/*
+ * By default, we will use palloc/pfree along with StringInfo. In libpq,
+ * use malloc and PQExpBuffer, and return JSON_OUT_OF_MEMORY on out-of-memory.
+ */
+#ifdef JSONAPI_USE_PQEXPBUFFER

For some of the other proposed names, I'd be afraid that someone might think you are free to mix and match APIs, OOM behavior, and compilation options.

Some comments on src/include/common/jsonapi.h:

-#include "lib/stringinfo.h"

I suspect this will fail headerscheck? Probably needs an exception added there.

+#ifdef JSONAPI_USE_PQEXPBUFFER
+#define StrValType PQExpBufferData
+#else
+#define StrValType StringInfoData
+#endif

Maybe use jsonapi_StrValType here.

+typedef struct StrValType StrValType;

I don't think that is needed. It would just duplicate typedefs that already exist elsewhere, depending on what StrValType is set to.

+	bool		parse_strval;
+	StrValType *strval;			/* only used if parse_strval == true */

The parse_strval field could use a better explanation.

I actually don't understand the need for this field. AFAICT, this is just used to record whether strval is valid. But in the cases where it's not valid, why do we need to record that? Couldn't you just return failed_oom in those cases?
On 03.09.24 22:56, Jacob Champion wrote: >> The parse_strval field could use a better explanation. >> >> I actually don't understand the need for this field. AFAICT, this is >> just used to record whether strval is valid. > No, it's meant to track the value of the need_escapes argument to the > constructor. I've renamed it and moved the assignment to hopefully > make that a little more obvious. WDYT? Yes, this is clearer. This patch (v28-0001) looks good to me now.
On 04.09.24 11:28, Peter Eisentraut wrote: > On 03.09.24 22:56, Jacob Champion wrote: >>> The parse_strval field could use a better explanation. >>> >>> I actually don't understand the need for this field. AFAICT, this is >>> just used to record whether strval is valid. >> No, it's meant to track the value of the need_escapes argument to the >> constructor. I've renamed it and moved the assignment to hopefully >> make that a little more obvious. WDYT? > > Yes, this is clearer. > > This patch (v28-0001) looks good to me now. This has been committed. About the subsequent patches: Is there any sense in dealing with the libpq and backend patches separately in sequence, or is this split just for ease of handling? (I suppose the 0004 "review comments" patch should be folded into the respective other patches?) What could be the next steps to keep this moving along, other than stare at the remaining patches until we're content with them? ;-)
(Thanks for the commit, Peter!)

On Wed, Sep 11, 2024 at 6:44 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 11 Sep 2024, at 09:37, Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> Is there any sense in dealing with the libpq and backend patches
>> separately in sequence, or is this split just for ease of handling?
>
> I think it's just to make reviewing a bit easier. At this point I think they can
> be merged together, it's mostly out of historic reasons IIUC since the patchset
> earlier on supported more than one library.

I can definitely do that (and yeah, it was to make the review slightly less daunting). The server side could potentially be committed independently, if you want to parallelize a bit, but it'd have to be torn back out if the libpq stuff didn't land in time.

>> (I suppose the 0004 "review comments" patch should be folded into the
>> respective other patches?)

Yes. I'm using that patch as a holding area while I write tests for the hunks, and then moving them backwards.

> I added a warning to autoconf in case --with-oauth is used without --with-python
> since this combination will error out in running the tests. Might be
> superfluous but I had an embarrassingly long headscratcher myself as to why the
> tests kept failing =)

Whoops, sorry. I guess we should just skip them if Python isn't there?

> CURL_IGNORE_DEPRECATION(x;) broke pgindent, it needs to keep the semicolon on
> the outside like CURL_IGNORE_DEPRECATION(x);. This doesn't really work well
> with how the macro is defined, not sure how we should handle that best (the
> attached makes the style as per how pgindent wants it with the semicolon
> returned).

Ugh... maybe a case for a pre_indent rule in pgindent?

> The oauth_validator test module needs to load Makefile.global before exporting
> the symbols from there.

Hm. Why was that passing the CI, though...?

> There is a first stab at documenting the validator module API, more to come (it
> doesn't compile right now).
>
> It contains a pgindent and pgperltidy run to keep things as close to in final
> sync as we can to catch things like the curl deprecation macro mentioned above
> early.

Thanks!

>> What could be the next steps to keep this moving along, other than stare
>> at the remaining patches until we're content with them? ;-)
>
> I'm in the "stare at things" stage now to try and get this into the tree =)

Yeah, and I still owe you all an updated roadmap.

While I fix up the tests, I've also been picking away at the JSON encoding problem that was mentioned in [1]; the recent SASLprep fix was fallout from that, since I'm planning to pull in pieces of its UTF-8 validation. I will eventually want to fuzz the heck out of this.

> To further pick away at this huge patch I propose to merge the SASL message
> length hunk which can be extracted separately. The attached .txt (to keep the
> CFBot from poking at it) contains a diff which can be committed ahead of the
> rest of this patch to make it a tad smaller and to keep the history of that
> change a bit clearer.

LGTM!

--

Peter asked me if there were plans to provide a "standard" validator module, say as part of contrib. The tricky thing is that Bearer validation is issuer-specific, and many providers give you an opaque token that you're not supposed to introspect at all.

We could use token introspection (RFC 7662) for online verification, but last I looked at it, no one had actually implemented those endpoints.
For offline verification, I think the best we could do would be to provide a generic JWT Profile (RFC 9068) validator, but again I don't know if anyone is actually providing those token formats in practice. I'm inclined to push that out into the future. Thanks, --Jacob [1] https://www.postgresql.org/message-id/ZjxQnOD1OoCkEeMN%40paquier.xyz
On Wed, Sep 11, 2024 at 3:54 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > Yeah, and I still owe you all an updated roadmap. Okay, here goes. New reviewers: start here! == What is This? == OAuth 2.0 is a way for a trusted third party (a "provider") to tell a server whether a client on the other end of the line is allowed to do something. This patchset adds OAuth support to libpq with libcurl, provides a server-side API so that extension modules can add support for specific OAuth providers, and extends our SASL support to carry the OAuth access tokens over the OAUTHBEARER mechanism. Most OAuth clients use a web browser to perform the third-party handshake. (These are your "Okta logins", "sign in with XXX", etc.) But there are plenty of people who use psql without a local browser, and invoking a browser safely across all supported platforms is actually surprisingly fraught. So this patchset implements something called device authorization, where the client will display a link and a code, and then you can log in on whatever device is convenient for you. Once you've told your provider that you trust libpq to connect to Postgres on your behalf, it'll give libpq an access token, and libpq will forward that on to the server. == How This Fits, or: The Sales Pitch == The most popular third-party auth methods we have today are probably the Kerberos family (AD/GSS/SSPI) and LDAP. If you're not already in an MS ecosystem, it's unlikely that you're using the former. And users of the latter are, in my experience, more-or-less resigned to its use, in spite of LDAP's architectural security problems and the fact that you have to run weird synchronization scripts to tell Postgres what certain users are allowed to do. OAuth provides a decently mature and widely-deployed third option. You don't have to be running the infrastructure yourself, as long as you have a provider you trust. If you are running your own infrastructure (or if your provider is configurable), the tokens being passed around can carry org-specific user privileges, so that Postgres can figure out who's allowed to do what without out-of-band synchronization scripts. And those access tokens are a straight upgrade over passwords: even if they're somehow stolen, they are time-limited, they are optionally revocable, and they can be scoped to specific actions. == Extension Points == This patchset provides several points of customization: Server-side validation is farmed out entirely to an extension, which we do not provide. (Each OAuth provider is free to come up with its own proprietary method of verifying its access tokens, and so far the big players have absolutely not standardized.) Depending on the provider, the extension may need to contact an external server to see what the token has been authorized to do, or it may be able to do that offline using signing keys and an agreed-upon token format. The client driver using libpq may replace the device authorization prompt (which by default is done on standard error), for example to move it into an existing GUI, display a scannable QR code instead of a link, and so on. The driver may also replace the entire OAuth flow. For example, a client that already interacts with browsers may be able to use one of the more standard web-based methods to get an access token. And clients attached to a service rather than an end user could use a more straightforward server-to-server flow, with pre-established credentials. 
== Architecture == The client needs to speak HTTP, which is implemented entirely with libcurl. Originally, I used another OAuth library for rapid prototyping, but the quality just wasn't there and I ported the implementation. An internal abstraction layer remains in the libpq code, so if a better client library comes along, switching to it shouldn't be too painful. The client-side hooks all go through a single extension point, so that we don't continually add entry points in the API for each new piece of authentication data that a driver may be able to provide. If we wanted to, we could potentially move the existing SSL passphrase hook into that, or even handle password retries within libpq itself, but I don't see any burning reason to do that now. I wanted to make sure that OAuth could be dropped into existing deployments without driver changes. (Drivers will probably *want* to look at the extension hooks for better UX, but they shouldn't necessarily *have* to.) That has driven several parts of the design. Drivers using the async APIs should continue to work without blocking, even during the long HTTP handshakes. So the new client code is structured as a typical event-driven state machine (similar to PQconnectPoll). The protocol machine hands off control to the OAuth machine during authentication, without really needing to know how it works, because the OAuth machine replaces the PQsocket with a general-purpose multiplexer that handles all of the HTTP sockets and events. Once that's completed, the OAuth machine hands control right back and we return to the Postgres protocol on the wire. This decision led to a major compromise: Windows client support is nonexistent. Multiplexer handles exist in Windows (for example with WSAEventSelect, IIUC), but last I checked they were completely incompatible with Winsock select(), which means existing async-aware drivers would fail. We could compromise by providing synchronous-only support, or by cobbling together a socketpair plus thread pool (or IOCP?), or simply by saying that existing Windows clients need a new API other than PQsocket() to be able to work properly. None of those approaches have been attempted yet, though. == Areas of Concern == Here are the iffy things that a committer is signing up for: The client implementation is roughly 3k lines, requiring domain knowledge of Curl, HTTP, JSON, and OAuth, the specifications of which are spread across several separate standards bodies. (And some big providers ignore those anyway.) The OAUTHBEARER mechanism is extensible, but not in the same way as HTTP. So sometimes, it looks like people design new OAuth features that rely heavily on HTTP and forget to "port" them over to SASL. That may be a point of future frustration. C is not really anyone's preferred language for implementing an extensible authn/z protocol running on top of HTTP, and constant vigilance is going to be required to maintain safety. What's more, we don't really "trust" the endpoints we're talking to in the same way that we normally trust our servers. It's a fairly hostile environment for maintainers. Along the same lines, our JSON implementation assumes some level of trust in the JSON data -- which is true for the backend, and can be assumed for a DBA running our utilities, but is absolutely not the case for a libpq client downloading data from Some Server on the Internet. I've been working to fuzz the implementation and there are a few known problems registered in the CF already. Curl is not a lightweight dependency by any means. 
Typically, libcurl is configured with a wide variety of nice options, a tiny subset of which we're actually going to use, but all that code (and its transitive dependencies!) is going to arrive in our process anyway. That might not be a lot of fun if you're not using OAuth. It's possible that the application embedding libpq is also a direct client of libcurl. We need to make sure we're not stomping on their toes at any point. == TODOs/Known Issues == The client does not deal with verification failure well at the moment; it just keeps retrying with a new OAuth handshake. Some people are not going to be okay with just contacting any web server that Postgres tells them to. There's a more paranoid mode sketched out that lets the connection string specify the trusted issuer, but it's not complete. The new code still needs to play well with orthogonal connection options, like connect_timeout and require_auth. The server does not deal well with multi-issuer setups yet. And you only get one oauth_validator_library... Harden, harden, harden. There are still a handful of inline TODOs around double-checking certain pieces of the response before continuing with the handshake. Servers should not be able to run our recursive descent parser out of stack. And my JSON code is using assertions too liberally, which will turn bugs into DoS vectors. I've been working to fit a fuzzer into more and more places, and I'm hoping to eventually drive it directly from the socket. Documentation still needs to be filled in. (Thanks Daniel for your work here!) == Future Features == There is no support for token caching (refresh or otherwise). Each new connection needs a new approval, and the only way to change that for v1 is to replace the entire flow. I think that's eventually going to annoy someone. The question is, where do you persist it? Does that need to be another extensibility point? We already have pretty good support for client certificates, and it'd be great if we could bind our tokens to those. That way, even if you somehow steal the tokens, you can't do anything with them without the private key! But the state of proof-of-possession in OAuth is an absolute mess, involving at least three competing standards (Token Binding, mTLS, DPoP). I don't know what's going to win. -- Hope this helps! Next I'll be working to fold the patches together, as discussed upthread. Thanks, --Jacob
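[To make the async-compatibility goal from the Architecture section concrete, here is a sketch of the standard non-blocking connection loop that existing drivers already use. Nothing in it is new API from the patchset; the claim above is that this loop keeps working unchanged because PQsocket() reports the multiplexer handle while the OAuth machine is running.]

#include <sys/select.h>
#include "libpq-fe.h"

PGconn *
connect_nonblocking(const char *conninfo)
{
	PGconn	   *conn = PQconnectStart(conninfo);

	if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
		return conn;

	for (;;)
	{
		PostgresPollingStatusType st = PQconnectPoll(conn);
		fd_set		rset,
					wset;
		int			sock;

		if (st == PGRES_POLLING_OK || st == PGRES_POLLING_FAILED)
			break;

		/*
		 * During the OAuth handshake, PQsocket() reports the multiplexer
		 * handle instead of the Postgres socket, so this same wait also
		 * covers the HTTP traffic without any driver changes.
		 */
		sock = PQsocket(conn);
		FD_ZERO(&rset);
		FD_ZERO(&wset);
		if (st == PGRES_POLLING_READING)
			FD_SET(sock, &rset);
		else
			FD_SET(sock, &wset);

		if (select(sock + 1, &rset, &wset, NULL, NULL) < 0)
			break;
	}
	return conn;
}

The Windows limitation described above is exactly that the handle returned in the OAuth phase cannot be waited on with Winsock select() in this fashion.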
Jacob Champion <jacob.champion@enterprisedb.com> wrote:
> Peter asked me if there were plans to provide a "standard" validator
> module, say as part of contrib. The tricky thing is that Bearer
> validation is issuer-specific, and many providers give you an opaque
> token that you're not supposed to introspect at all.
>
> We could use token introspection (RFC 7662) for online verification,
> but last I looked at it, no one had actually implemented those
> endpoints. For offline verification, I think the best we could do
> would be to provide a generic JWT Profile (RFC 9068) validator, but
> again I don't know if anyone is actually providing those token formats
> in practice. I'm inclined to push that out into the future.

Have you considered sending the token for validation to the server, like this

curl -X GET "https://www.googleapis.com/oauth2/v3/userinfo" -H "Authorization: Bearer $TOKEN"

and getting the userid (e.g. email address) from the response, as described in [1]? ISTM that this is what pgadmin4 does - in particular, see the get_user_profile() function in web/pgadmin/authenticate/oauth2.py.

[1] https://www.oauth.com/oauth2-servers/signing-in-with-google/verifying-the-user-info/

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Fri, Sep 27, 2024 at 10:58 AM Antonin Houska <ah@cybertec.at> wrote: > Have you considered sending the token for validation to the server, like this > > curl -X GET "https://www.googleapis.com/oauth2/v3/userinfo" -H "Authorization: Bearer $TOKEN" In short, no, but I'm glad you asked. I think it's going to be a common request, and I need to get better at explaining why it's not safe, so we can document it clearly. Or else someone can point out that I'm misunderstanding, which honestly would make all this much easier and less complicated. I would love to be able to do it that way. We cannot, for the same reason libpq must send the server an access token instead of an ID token. The /userinfo endpoint tells you who the end user is, but it doesn't tell you whether the Bearer is actually allowed to access the database. That difference is critical: it's entirely possible for an end user to be authorized to access the database, *and yet* the Bearer token may not actually carry that authorization on their behalf. (In fact, the user may have actively refused to give the Bearer that permission.) That's why people are so pedantic about saying that OAuth is an authorization framework and not an authentication framework. To illustrate, think about all the third-party web services out there that ask you to Sign In with Google. They ask Google for permission to access your personal ID, and Google asks you if you're okay with that, and you either allow or deny it. Now imagine that I ran one of those services, and I decided to become evil. I could take my legitimately acquired Bearer token -- which should only give me permission to query your Google ID -- and send it to a Postgres database you're authorized to access. The server is supposed to introspect it, say, "hey, this token doesn't give the bearer access to the database at all," and shut everything down. For extra credit, the server could notice that the client ID tied to the access token isn't even one that it recognizes! But if all the server does is ask Google, "what's the email address associated with this token's end user?", then it's about to make some very bad decisions. The email address it gets back doesn't belong to Jacob the Evil Bearer; it belongs to you. Now, the token introspection endpoint I mentioned upthread should give us the required information (scopes, etc.). But Google doesn't implement that one. In fact they don't seem to have implemented custom scopes at all in the years since I started work on this feature, which makes me think that people are probably not going to be able to safely log into Postgres using Google tokens. Hopefully there's some feature buried somewhere that I haven't seen. Let me know if that makes sense. (And again: I'd love to be proven wrong. It would improve the reach of the feature considerably if I am.) Thanks, --Jacob
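[A purely illustrative sketch of the kind of check a validator has to make, to make the authentication/authorization distinction above concrete. The struct mirrors a few fields from an RFC 7662 token introspection response; none of the names come from the patchset's actual validator API.]

#include <stdbool.h>
#include <string.h>

/* A few fields from an RFC 7662 token introspection response. */
typedef struct IntrospectionResult
{
	bool		active;			/* token is not expired or revoked */
	const char *scope;			/* space-separated scopes granted to the bearer */
	const char *client_id;		/* OAuth client the token was issued to */
	const char *username;		/* end user the token was issued for */
} IntrospectionResult;

/*
 * It is not enough to learn *who* the end user is (authentication); the
 * token must also carry the scope that authorizes database access on that
 * user's behalf (authorization).
 */
bool
token_authorizes_connection(const IntrospectionResult *res,
							const char *required_scope)
{
	if (!res->active)
		return false;

	/* Naive scope check for brevity; real code must split on spaces. */
	if (res->scope == NULL || strstr(res->scope, required_scope) == NULL)
		return false;

	/* For extra credit: reject tokens issued to client IDs we don't recognize. */
	return true;
}

Only once a check like this has passed does it make sense to use the username (or an email claim) for pg_ident mapping and audit logging.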
Jacob Champion <jacob.champion@enterprisedb.com> wrote: > On Fri, Sep 27, 2024 at 10:58 AM Antonin Houska <ah@cybertec.at> wrote: > > Have you considered sending the token for validation to the server, like this > > > > curl -X GET "https://www.googleapis.com/oauth2/v3/userinfo" -H "Authorization: Bearer $TOKEN" > > In short, no, but I'm glad you asked. I think it's going to be a > common request, and I need to get better at explaining why it's not > safe, so we can document it clearly. Or else someone can point out > that I'm misunderstanding, which honestly would make all this much > easier and less complicated. I would love to be able to do it that > way. > > We cannot, for the same reason libpq must send the server an access > token instead of an ID token. The /userinfo endpoint tells you who the > end user is, but it doesn't tell you whether the Bearer is actually > allowed to access the database. That difference is critical: it's > entirely possible for an end user to be authorized to access the > database, *and yet* the Bearer token may not actually carry that > authorization on their behalf. (In fact, the user may have actively > refused to give the Bearer that permission.) > That's why people are so pedantic about saying that OAuth is an > authorization framework and not an authentication framework. This statement alone sounds as if you missed *authentication*, but you seem to admit above that the /userinfo endpoint provides it ("tells you who the end user is"). I agree that it does. My understanding is that this endpoint, as well as the concept of "claims" and "scopes", is introduced by OpenID, which is an *authentication* framework, although it's built on top of OAuth. Regarding *authorization*, I agree that the bearer token may not contain enough information to determine whether the owner of the token is allowed to access the database. However, I consider database a special kind of "application", which can handle authorization on its own. In this case, the authorization can be controlled by (not) assigning the user the LOGIN attribute, as well as by (not) granting it privileges on particular database objects. In short, I think that *authentication* is all we need. > To illustrate, think about all the third-party web services out there > that ask you to Sign In with Google. They ask Google for permission to > access your personal ID, and Google asks you if you're okay with that, > and you either allow or deny it. Now imagine that I ran one of those > services, and I decided to become evil. I could take my legitimately > acquired Bearer token -- which should only give me permission to query > your Google ID -- and send it to a Postgres database you're authorized > to access. > > The server is supposed to introspect it, say, "hey, this token doesn't > give the bearer access to the database at all," and shut everything > down. For extra credit, the server could notice that the client ID > tied to the access token isn't even one that it recognizes! But if all > the server does is ask Google, "what's the email address associated > with this token's end user?", then it's about to make some very bad > decisions. The email address it gets back doesn't belong to Jacob the > Evil Bearer; it belongs to you. Are you sure you can legitimately acquire the bearer token containing my email address? I think the email address returned by the /userinfo endpoint is one of the standard claims [1]. 
Thus, by returning the particular value of "email" from the endpoint, the identity provider asserts that the token owner does have this address. (And, if the "email_verified" claim is "true", that it spent some effort to verify that the email address is controlled by that user.)

> Now, the token introspection endpoint I mentioned upthread

Can you please point me to the particular message?

> should give us the required information (scopes, etc.). But Google doesn't
> implement that one. In fact they don't seem to have implemented custom
> scopes at all in the years since I started work on this feature, which makes
> me think that people are probably not going to be able to safely log into
> Postgres using Google tokens. Hopefully there's some feature buried
> somewhere that I haven't seen.
>
> Let me know if that makes sense. (And again: I'd love to be proven
> wrong. It would improve the reach of the feature considerably if I
> am.)

Another question, assuming the token verification is resolved somehow: wouldn't it be sufficient for the initial implementation if the client could pass the bearer token to libpq in the connection string?

Obviously, one use case is that an application or web server which needs the token to authenticate the user could then pass that token on to the database server. Thus, if users could authenticate to the database using their individual ids, it would no longer be necessary to store a separate userid / password for the application in a configuration file.

Also, if libpq accepted the bearer token via the connection string, it would be possible to implement the authorization as a separate front-end application (e.g. pg_oauth_login) rather than adding more complexity to libpq itself.

(I'm learning this stuff on-the-fly, so there might be something naive in my comments.)

[1] https://openid.net/specs/openid-connect-core-1_0.html#StandardClaims

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Antonin Houska <ah@cybertec.at> wrote: > Jacob Champion <jacob.champion@enterprisedb.com> wrote: > > Now, the token introspection endpoint I mentioned upthread > > Can you please point me to the particular message? Please ignore this dumb question. You probably referred to the email I was responding to. -- Antonin Houska Web: https://www.cybertec-postgresql.com
On Mon, Sep 30, 2024 at 6:38 AM Antonin Houska <ah@cybertec.at> wrote: > > Jacob Champion <jacob.champion@enterprisedb.com> wrote: > > > On Fri, Sep 27, 2024 at 10:58 AM Antonin Houska <ah@cybertec.at> wrote: > > That's why people are so pedantic about saying that OAuth is an > > authorization framework and not an authentication framework. > > This statement alone sounds as if you missed *authentication*, but you seem to > admit above that the /userinfo endpoint provides it ("tells you who the end > user is"). I agree that it does. My understanding is that this endpoint, as > well as the concept of "claims" and "scopes", is introduced by OpenID, which > is an *authentication* framework, although it's built on top of OAuth. OpenID is an authentication framework, but it's generally focused on a type of client known as a Relying Party. In the architecture of this patchset, the Relying Party would be libpq, which has the option of retrieving authentication claims from the provider. Unfortunately for us, libpq has no use for those claims. It's not trying to authenticate the user for its own purposes. The Postgres server, on the other hand, is not a Relying Party. (It's an OAuth resource server, in this architecture.) It's not performing any of the OIDC flows, it's not talking to the end user and the provider at the same time, and it is very restricted in its ability to influence the client exchange via the SASL mechanism. > Regarding *authorization*, I agree that the bearer token may not contain > enough information to determine whether the owner of the token is allowed to > access the database. However, I consider database a special kind of > "application", which can handle authorization on its own. In this case, the > authorization can be controlled by (not) assigning the user the LOGIN > attribute, as well as by (not) granting it privileges on particular database > objects. In short, I think that *authentication* is all we need. Authorizing the *end user's* access to the database using scopes is optional. Authorizing the *bearer's* ability to connect on behalf of the end user, however, is mandatory. Hopefully the below clarifies. (I agree that most people probably want to use authentication, so that the database can then make decisions based on HBA settings. OIDC is a fine way to do that.) > Are you sure you can legitimately acquire the bearer token containing my email > address? Yes. In general that's how OpenID-based "Sign in with <Service>" works. All those third-party services are running around with tokens that identify you, but unless they've asked for more abilities and you've granted them the associated scopes, identifying you is all they can do. > I think the email address returned by the /userinfo endpoint is one > of the standard claims [1]. Thus by returning the particular value of "email" > from the endpoint the identity provider asserts that the token owner does have > this address. We agree that /userinfo gives authentication claims for the end user. It's just insufficient for our use case. For example, there are enterprise applications out there that will ask for read access to your Google Calendar. If you're willing to grant that, then you probably won't mind if those applications also know your email address, but you probably do mind if they're suddenly able to access your production databases just because you gave them your email. 
Put another way: if you log into Postgres using OAuth, and your provider doesn't show you a big message saying "this application is about to access *your* prod database using *your* identity; do you want to allow that?", then your DBA has deployed a really dangerous configuration. That's a critical protection feature you get from your OAuth provider. Otherwise, what's stopping somebody else from setting up their own malicious service to farm access tokens? All they'd have to do is ask for your email. > Another question, assuming the token verification is resolved somehow: > wouldn't it be sufficient for the initial implementation if the client could > pass the bearer token to libpq in the connection string? It was discussed wayyy upthread: https://postgr.es/m/CAAWbhmhmBe9v3aCffz5j8Sg4HMWWkB5FvTDCSZ_Vh8E1fX91Gw%40mail.gmail.com Basically, at that point the entire implementation becomes an exercise for the reader. I want to avoid that if possible. I'm not adamantly opposed to it, but I think the client-side hook implementation is going to be better for the use cases that have been discussed so far. > Also, if libpq accepted the bearer token via the connection string, it would > be possible to implement the authorization as a separate front-end application > (e.g. pg_oauth_login) rather than adding more complexity to libpq itself. The application would still need to parse the server error response. There was (a small) consensus at the time [1] that parsing error messages for that purpose would be really unpleasant; hence the hook architecture. > (I'm learning this stuff on-the-fly, so there might be something naive in my > comments.) No worries! Please keep the questions coming; this OAuth architecture is unintuitive, and I need to be able to defend it. Thanks, --Jacob [1] https://postgr.es/m/CACrwV54_euYe%2Bv7bcLrxnje-JuM%3DKRX5azOcmmrXJ5qrffVZfg%40mail.gmail.com
Jacob Champion <jacob.champion@enterprisedb.com> wrote: > On Mon, Sep 30, 2024 at 6:38 AM Antonin Houska <ah@cybertec.at> wrote: > > > > Are you sure you can legitimately acquire the bearer token containing my email > > address? > > Yes. In general that's how OpenID-based "Sign in with <Service>" > works. All those third-party services are running around with tokens > that identify you, but unless they've asked for more abilities and > you've granted them the associated scopes, identifying you is all they > can do. > > > I think the email address returned by the /userinfo endpoint is one > > of the standard claims [1]. Thus by returning the particular value of "email" > > from the endpoint the identity provider asserts that the token owner does have > > this address. > > We agree that /userinfo gives authentication claims for the end user. > It's just insufficient for our use case. > > For example, there are enterprise applications out there that will ask > for read access to your Google Calendar. If you're willing to grant > that, then you probably won't mind if those applications also know > your email address, but you probably do mind if they're suddenly able > to access your production databases just because you gave them your > email. > > Put another way: if you log into Postgres using OAuth, and your > provider doesn't show you a big message saying "this application is > about to access *your* prod database using *your* identity; do you > want to allow that?", then your DBA has deployed a really dangerous > configuration. That's a critical protection feature you get from your > OAuth provider. Otherwise, what's stopping somebody else from setting > up their own malicious service to farm access tokens? All they'd have > to do is ask for your email. Perhaps I understand now. I use getmail [2] to retrieve email messages from my Google account. What made me confused is that the getmail application, although installed on my workstation (and thus the bearer token it eventually gets contains my email address), it's "someone else" (in particular the "Relying Party") from the perspective of the OpenID protocol. And the same applies to "psql" in the context of your patch. Thus, in addition to the email, we'd need special claims which authorize the RPs to access the database and only the database. Does this sound correct? > > (I'm learning this stuff on-the-fly, so there might be something naive in my > > comments.) > > No worries! Please keep the questions coming; this OAuth architecture > is unintuitive, and I need to be able to defend it. I'd like to play with the code a bit and provide some review before or during the next CF. That will probably generate some more questions. > > [1] https://postgr.es/m/CACrwV54_euYe%2Bv7bcLrxnje-JuM%3DKRX5azOcmmrXJ5qrffVZfg%40mail.gmail.com [2] https://github.com/getmail6/getmail6/ -- Antonin Houska Web: https://www.cybertec-postgresql.com
On Tue, Oct 8, 2024 at 3:46 AM Antonin Houska <ah@cybertec.at> wrote: > Perhaps I understand now. I use getmail [2] to retrieve email messages from my > Google account. What made me confused is that the getmail application, > although installed on my workstation (and thus the bearer token it eventually > gets contains my email address), it's "someone else" (in particular the > "Relying Party") from the perspective of the OpenID protocol. And the same > applies to "psql" in the context of your patch. > > Thus, in addition to the email, we'd need special claims which authorize the > RPs to access the database and only the database. Does this sound correct? Yes. (One nitpick: the "special claims" in this case are not OpenID claims at all, but OAuth scopes. The HBA will be configured with the list of scopes that the server requires, and it requests those from the client during the SASL handshake.) > I'd like to play with the code a bit and provide some review before or during > the next CF. That will probably generate some more questions. Thanks very much for the review! --Jacob
Hello Peter,

11.09.2024 10:37, Peter Eisentraut wrote:
>
> This has been committed.
>

I've discovered that starting from 0785d1b8b,
make check -C src/bin/pg_combinebackup
fails under Valgrind, with the following diagnostics:

2024-10-15 14:29:52.883 UTC [3338981] 002_compare_backups.pl STATEMENT:  UPLOAD_MANIFEST
==00:00:00:20.028 3338981== Conditional jump or move depends on uninitialised value(s)
==00:00:00:20.028 3338981==    at 0xA3E68F: json_lex (jsonapi.c:1496)
==00:00:00:20.028 3338981==    by 0xA3ED13: json_lex (jsonapi.c:1666)
==00:00:00:20.028 3338981==    by 0xA3D5AF: pg_parse_json_incremental (jsonapi.c:822)
==00:00:00:20.028 3338981==    by 0xA40ECF: json_parse_manifest_incremental_chunk (parse_manifest.c:194)
==00:00:00:20.028 3338981==    by 0x31656B: FinalizeIncrementalManifest (basebackup_incremental.c:237)
==00:00:00:20.028 3338981==    by 0x73B4A4: UploadManifest (walsender.c:709)
==00:00:00:20.028 3338981==    by 0x73DF4A: exec_replication_command (walsender.c:2185)
==00:00:00:20.028 3338981==    by 0x7C58C3: PostgresMain (postgres.c:4762)
==00:00:00:20.028 3338981==    by 0x7BBDA7: BackendMain (backend_startup.c:107)
==00:00:00:20.028 3338981==    by 0x6CF60F: postmaster_child_launch (launch_backend.c:274)
==00:00:00:20.028 3338981==    by 0x6D546F: BackendStartup (postmaster.c:3415)
==00:00:00:20.028 3338981==    by 0x6D2B21: ServerLoop (postmaster.c:1648)
==00:00:00:20.028 3338981==

(Initializing
	dummy_lex.inc_state = NULL;
before
	partial_result = json_lex(&dummy_lex);
makes these TAP tests pass for me.)

Best regards,
Alexander
On 15.10.24 20:10, Jacob Champion wrote: > On Tue, Oct 15, 2024 at 11:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: >> I've discovered that starting from 0785d1b8b, >> make check -C src/bin/pg_combinebackup >> fails under Valgrind, with the following diagnostics: > > Yep, sorry for that (and thanks for the report!). It's currently > tracked over at [1], but I should have mentioned it here. The patch I > used is attached, renamed to not stress out the cfbot. I have committed this fix.
Antonin Houska <ah@cybertec.at> wrote:
> I'd like to play with the code a bit and provide some review before or during
> the next CF. That will probably generate some more questions.

This is the 1st round, based on reading the code. I'll continue paying attention to the project and possibly post some more comments in the future.

* Information on the new method should be added to pg_hba.conf.sample.method.

* Is it important that fe_oauth_state.token also contains the "Bearer" keyword? I'd expect only the actual token value here. The keyword can be added to the authentication message w/o storing it.

  The same applies to the 'token' structure in fe-auth-oauth-curl.c.

* Does PQdefaultAuthDataHook() have to be declared extern and exported via libpq/exports.txt? Even if the user was interested in it, he can use PQgetAuthDataHook() to get the pointer (unless he already installed his custom hook).

* I wonder if the hooks (PQauthDataHook) can be implemented in a separate diff. Couldn't the first version of the feature be committable without these hooks?

* Instead of allocating an instance of PQoauthBearerRequest, assigning it to fe_oauth_state.async_ctx, and eventually having to call its cleanup() function, wouldn't it be simpler to embed PQoauthBearerRequest as a member in fe_oauth_state?

* oauth_validator_library is defined as PGC_SIGHUP - is that intentional?

  And regardless, the library appears to be loaded by every backend during authentication. Why isn't it loaded by postmaster like libraries listed in shared_preload_libraries? fork() would then ensure that the backends do have the library in their address space.

* pg_fe_run_oauth_flow()

  When first time here

	case OAUTH_STEP_TOKEN_REQUEST:
		if (!handle_token_response(actx, &state->token))
			goto error_return;

  the user hasn't been prompted yet so ISTM that the first token request must always fail. It seems more logical if the prompt is set to the user before sending the token request to the server. (Although the user probably won't be that fast to make the first request succeed, so consider this just a hint.)

* As long as I understand, the following comment would make sense:

diff --git a/src/interfaces/libpq/fe-auth-oauth.c b/src/interfaces/libpq/fe-auth-oauth.c
index f943a31cc08..97259fb5654 100644
--- a/src/interfaces/libpq/fe-auth-oauth.c
+++ b/src/interfaces/libpq/fe-auth-oauth.c
@@ -518,6 +518,7 @@ oauth_exchange(void *opaq, bool final,
 	switch (state->state)
 	{
 		case FE_OAUTH_INIT:
+			/* Initial Client Response */
 			Assert(inputlen == -1);

 			if (!derive_discovery_uri(conn))

  Or, doesn't the FE_OAUTH_INIT branch of the switch statement actually fit better into oauth_init()? A side-effect of that might be (I only judge from reading the code, haven't tried to implement this suggestion) that oauth_exchange() would no longer return the SASL_ASYNC status. Furthermore, I'm not sure if pg_SASL_continue() can receive the SASL_ASYNC at all. So I wonder if moving that part from oauth_exchange() to oauth_init() would make the SASL_ASYNC state unnecessary.

* Finally, the user documentation is almost missing. I say that just for the sake of completeness, you obviously know it. (On the other hand, I think that the lack of user information might discourage some people from running the code and testing it.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Thu, Oct 17, 2024 at 10:51 PM Antonin Houska <ah@cybertec.at> wrote: > This is the 1st round, based on reading the code. I'll continue paying > attention to the project and possibly post some more comments in the future. Thanks again for the reviews! > * Information on the new method should be added to pg_hba.conf.sample.method. Whoops, this will be fixed in v34. > * Is it important that fe_oauth_state.token also contains the "Bearer" > keyword? I'd expect only the actual token value here. The keyword can be > added to the authentication message w/o storing it. > > The same applies to the 'token' structure in fe-auth-oauth-curl.c. Excellent question; I've waffled a bit on that myself. I think you're probably right, but here's some background on why I originally made that decision. RFC 7628 defines not only OAUTHBEARER but also a generic template for future OAuth-based SASL methods, and as part of that, the definition of the "auth" key is incredibly vague: auth (REQUIRED): The payload that would be in the HTTP Authorization header if this OAuth exchange was being carried out over HTTP. I was worried that forcing a specific format would prevent future extensibility, if say the Bearer scheme were updated to add additional auth-params. I was also wondering if maybe a future specification would allow OAUTHBEARER to carry a different scheme altogether, such as DPoP [1]. However: - auth-param support for Bearer was considered at the draft stage and explicitly removed, with the old drafts stating "If additional parameters are needed in the future, a different scheme would need to be defined." - I think the intent of RFC 7628 is that a new SASL mechanism will be named for each new scheme (even if the new scheme shares all of the bones of the old one). So DPoP tokens wouldn't piggyback on OAUTHBEARER, and instead something like an OAUTHDPOP mech would need to be defined. So: the additional complexity in the current API is probably a YAGNI violation, and I should just hardcode the Bearer format as you suggest. Any future OAuth SASL mechanisms we support will have to go through a different PQAUTHDATA type, e.g. PQAUTHDATA_OAUTH_DPOP_TOKEN. And I'll need to make sure that I'm not improperly coupling the concepts elsewhere in the API. > * Does PQdefaultAuthDataHook() have to be declared extern and exported via > libpq/exports.txt ? Even if the user was interested in it, he can use > PQgetAuthDataHook() to get the pointer (unless he already installed his > custom hook). I guess I don't have a strongly held opinion, but is there a good reason not to? Exposing it means that a client application may answer questions like "is the current hook set to the default?" and so on. IME, hook-chain maintenance is not a lot of fun in general, and having more visibility can be nice for third-party developers. > * I wonder if the hooks (PQauthDataHook) can be implemented in a separate > diff. Couldn't the first version of the feature be commitable without these > hooks? I am more than happy to split things up as needed! But in the end, I think this is a question that can only be answered by the first brave committer to take a bite. :) (The original patchset didn't have these hooks; they were added as a compromise, to prevent the builtin implementation from having to be all things for all people.) 
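[For reference alongside the "auth" key discussion above, this is roughly what an OAUTHBEARER initial client response looks like on the wire per RFC 7628, and where the "Bearer" keyword ends up. The token value is a meaningless placeholder, not a real credential.]

/* kvsep is the 0x01 control character; the message ends with two of them. */
static const char client_initial_response[] =
	"n,,"								/* GS2 header: no channel binding, no authzid */
	"\x01" "auth=Bearer eyJhbGciOi.example.token"	/* HTTP Authorization-style payload */
	"\x01\x01";							/* terminating kvsep pair */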
> * Instead of allocating an instance of PQoauthBearerRequest, assigning it to > fe_oauth_state.async_ctx, and eventually having to all its cleanup() > function, wouldn't it be simpler to embed PQoauthBearerRequest as a member > in fe_oauth_state ? Hmm, that would maybe be simpler. But you'd still have to call cleanup() and set the async_ctx, right? The primary gain would be in reducing the number of malloc calls. > * oauth_validator_library is defined as PGC_SIGHUP - is that intentional? Yes, I think it's going to be important to let DBAs migrate their authentication modules without a full restart. That probably deserves more explicit testing, now that you mention it. Is there a specific concern that you have with that? > And regardless, the library appears to be loaded by every backend during > authentication. Why isn't it loaded by postmaster like libraries listed in > shared_preload_libraries? fork() would then ensure that the backends do have > the library in their address space. It _can_ be, if you want -- there's nothing that I know of preventing the validator from also being preloaded with its own _PG_init(), is there? But I don't think it's a good idea to force that, for the same reason we want to allow SIGHUP. > * pg_fe_run_oauth_flow() > > When first time here > case OAUTH_STEP_TOKEN_REQUEST: > if (!handle_token_response(actx, &state->token)) > goto error_return; > > the user hasn't been prompted yet so ISTM that the first token request must > always fail. It seems more logical if the prompt is set to the user before > sending the token request to the server. (Although the user probably won't > be that fast to make the first request succeed, so consider this just a > hint.) That's also intentional -- if the first token response fails for a reason _other_ than "we're waiting for the user", then we want to immediately fail hard instead of making them dig out their phone and go on a two-minute trip, because they're going to come back and find that it was all for nothing. There's a comment immediately below the part you quoted that mentions this briefly; maybe I should move it up a bit? > * As long as I understand, the following comment would make sense: > > diff --git a/src/interfaces/libpq/fe-auth-oauth.c b/src/interfaces/libpq/fe-auth-oauth.c > index f943a31cc08..97259fb5654 100644 > --- a/src/interfaces/libpq/fe-auth-oauth.c > +++ b/src/interfaces/libpq/fe-auth-oauth.c > @@ -518,6 +518,7 @@ oauth_exchange(void *opaq, bool final, > switch (state->state) > { > case FE_OAUTH_INIT: > + /* Initial Client Response */ > Assert(inputlen == -1); > > if (!derive_discovery_uri(conn)) There are multiple "initial client response" cases, though. What questions are you hoping to clarify with the comment? Maybe we can find a more direct answer. > Or, doesn't the FE_OAUTH_INIT branch of the switch statement actually fit > better into oauth_init()? oauth_init() is the mechanism initialization for the SASL framework itself, which is shared with SCRAM. In the current architecture, the init callback doesn't take the initial client response into consideration at all. Generating the client response is up to the exchange callback -- and even if we moved the SASL_ASYNC processing elsewhere, I don't think we can get rid of its added complexity. Something has to signal upwards that it's time to transfer control to an async engine. And we can't make the asynchronicity a static attribute of the mechanism itself, because we can skip the flow if something gives us a cached token. 
> * Finally, the user documentation is almost missing. I say that just for the > sake of completeness, you obviously know it. (On the other hand, I think > that the lack of user information might discourage some people from running > the code and testing it.) Yeah, the catch-22 of writing huge features... By the way, if anyone's reading along and dissuaded by the lack of docs, please say so! (Daniel has been helping me out so much with the docs; thanks again, Daniel.) --Jacob [1] https://datatracker.ietf.org/doc/html/rfc9449
On Fri, Oct 18, 2024 at 4:38 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> In validate() it seems to me we should clear out ret->authn_id on failure to
> pair belts with suspenders. Fixed by calling explicit_bzero on it in the error
> path.

The new hunk says:

>     cleanup:
>         /*
>          * Clear and free the validation result from the validator module once
>          * we're done with it to avoid accidental re-use.
>          */
>         if (ret->authn_id != NULL)
>         {
>             explicit_bzero(ret->authn_id, strlen(ret->authn_id));
>             pfree(ret->authn_id);
>         }
>         pfree(ret);

But I'm not clear on what's being protected against. Which code would reuse this result?

Thanks,
--Jacob
On Mon, Oct 28, 2024 at 6:24 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > On 25 Oct 2024, at 20:22, Jacob Champion <jacob.champion@enterprisedb.com> wrote: > > > I have combed almost all of Daniel's feedback backwards into the main > > patch (just the new bzero code remains, with the open question > > upthread), > > Re-reading I can't see a vector there, I guess I am just scarred from what > seemed to be harmless leaks in auth codepaths and treat every bit as > potentially important. Feel free to drop from the patchset for now. Okay. For authn_id specifically, which isn't secret and doesn't have any power unless it's somehow copied into the ClientConnectionInfo, I'm not sure that the bzero() gives us much. But I do see value in clearing out, say, the Bearer token once we're finished with it. Also in this validate() code path, I'm taking a look at the added memory management with the pfree(): 1. Should we add any more ceremony to the returned struct, to try to ensure that the ABI matches? Or is it good enough to declare that modules need to be compiled against a specific server version? 2. Should we split off a separate memory context to contain allocations made by the validator? > Looking more at the patchset I think we need to apply conditional compilation > of the backend for oauth like how we do with other opt-in schemes in configure > and meson. The attached .txt has a diff for making --with-oauth a requirement > for compiling support into backend libpq. Do we get the flexibility we need with that approach? With other opt-in schemes, the backend and the frontend both need some sort of third-party dependency, but that's not true for OAuth. I could see some people wanting to support an offline token validator on the server side but not wanting to build the HTTP dependency into their clients. I was considering going in the opposite direction: With the client hooks, a user could plug in their own implementation without ever having to touch the built-in flow, and I'm wondering if --with-oauth should really just be --with-builtin-oauth or similar. Then if the server sends OAUTHBEARER, the client only complains if it doesn't have a flow available to use, rather than checking USE_OAUTH. This kind of ties into the other big open question of "what do we do about users that don't want the additional overhead of something they're not using?" --Jacob
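[On the memory-context question raised above, a sketch of the separate-context idea using standard backend APIs. validate_token() here is a hypothetical stand-in for whatever entry point the validator module interface ends up exposing; it is not the patchset's actual API.]

#include "postgres.h"
#include "utils/memutils.h"

/* Hypothetical module entry point, for illustration only. */
extern bool validate_token(const char *token, const char *role);

static bool
run_validator_in_own_context(const char *token, const char *role)
{
	MemoryContext validator_cxt;
	MemoryContext oldcxt;
	bool		ok;

	validator_cxt = AllocSetContextCreate(CurrentMemoryContext,
										  "OAuth validator",
										  ALLOCSET_DEFAULT_SIZES);
	oldcxt = MemoryContextSwitchTo(validator_cxt);

	/* Anything the module palloc()s now lands in validator_cxt. */
	ok = validate_token(token, role);

	MemoryContextSwitchTo(oldcxt);

	/* Everything the module allocated, including leaks, goes away here. */
	MemoryContextDelete(validator_cxt);

	return ok;
}

With a layout like this, any result the server wants to keep (such as an authn_id string) would have to be copied out of the validator context before it is deleted, which also naturally bounds how long the token-derived data stays in memory.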
> On 28 Oct 2024, at 17:09, Jacob Champion <jacob.champion@enterprisedb.com> wrote:
> On Mon, Oct 28, 2024 at 6:24 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>> Looking more at the patchset I think we need to apply conditional compilation
>> of the backend for oauth like how we do with other opt-in schemes in configure
>> and meson. The attached .txt has a diff for making --with-oauth a requirement
>> for compiling support into backend libpq.
>
> Do we get the flexibility we need with that approach? With other
> opt-in schemes, the backend and the frontend both need some sort of
> third-party dependency, but that's not true for OAuth. I could see
> some people wanting to support an offline token validator on the
> server side but not wanting to build the HTTP dependency into their
> clients.

Currently we don't support any conditional compilation which only affects backend or frontend; all --without-XXX flags turn it off for both. Maybe this is something which should change, but I'm not sure that property should be altered as part of a patch rather than discussed on its own merit.

> I was considering going in the opposite direction: With the client
> hooks, a user could plug in their own implementation without ever
> having to touch the built-in flow, and I'm wondering if --with-oauth
> should really just be --with-builtin-oauth or similar. Then if the
> server sends OAUTHBEARER, the client only complains if it doesn't have
> a flow available to use, rather than checking USE_OAUTH. This kind of
> ties into the other big open question of "what do we do about users
> that don't want the additional overhead of something they're not
> using?"

We already know that GSS causes a measurable performance impact on connections even when compiled but not in use [0], so I think we should be careful about piling on more.

--
Daniel Gustafsson

[0] 20240610181212.auytluwmbfl7lb5n@awork3.anarazel.de
On Tue, Oct 29, 2024 at 3:52 AM Daniel Gustafsson <daniel@yesql.se> wrote: > Currently we don't support any conditional compilation which only affects > backend or frontend, all --without-XXX flags turn it off for both. I don't think that's strictly true; see --with-pam which affects only server-side code, since the hard part is in the server. Similarly, --with-oauth currently affects only client-side code. But in any case, that confusion is why I'm proposing a change to the option name. I chose --with-oauth way before the architecture solidified, and it doesn't reflect reality anymore. OAuth support on the server side doesn't require Curl, and likely never will. So if you want to support that on a Windows server, it's going to be strange if we also force you to build the client with a libcurl dependency that we won't even make use of on that platform. > We already know that GSS cause measurable performance impact on connections > even when compiled but not in use [0], so I think we should be careful about > piling on more. I agree, but if the server asks for OAUTHBEARER, that's the end of it. Either the client supports OAuth and initiates a token flow, or it doesn't and the connection fails. That's very different from the client-initiated transport negotiation. On the other hand, if we're concerned about the link-time overhead (time and/or RAM) of the new dependency, I think that's going to need something different from a build-time switch. My guess is that maintainers are only going to want to ship one libpq. Thanks, --Jacob
> On 29 Oct 2024, at 17:40, Jacob Champion <jacob.champion@enterprisedb.com> wrote: > > On Tue, Oct 29, 2024 at 3:52 AM Daniel Gustafsson <daniel@yesql.se> wrote: >> Currently we don't support any conditional compilation which only affects >> backend or frontend, all --without-XXX flags turn it off for both. > > I don't think that's strictly true; see --with-pam which affects only > server-side code, since the hard part is in the server. Similarly, > --with-oauth currently affects only client-side code. Fair, maybe it's an unwarranted concern. Question is though, if we added PAM today would we have done the same? > But in any case, that confusion is why I'm proposing a change to the > option name. +1 -- Daniel Gustafsson
On Tue, Oct 29, 2024 at 10:41 AM Daniel Gustafsson <daniel@yesql.se> wrote: > Question is though, if we added PAM > today would we have done the same? I assume so; the client can't tell PAM apart from LDAP or any other plaintext method. (In the same vein, the server can't tell if the client uses libcurl to grab a token, or something entirely different.) --Jacob
Jacob Champion <jacob.champion@enterprisedb.com> wrote: > On Thu, Oct 17, 2024 at 10:51 PM Antonin Houska <ah@cybertec.at> wrote: > > * oauth_validator_library is defined as PGC_SIGHUP - is that intentional? > > Yes, I think it's going to be important to let DBAs migrate their > authentication modules without a full restart. That probably deserves > more explicit testing, now that you mention it. Is there a specific > concern that you have with that? No concern. I was just trying to imagine when the module needs to be changed. > > And regardless, the library appears to be loaded by every backend during > > authentication. Why isn't it loaded by postmaster like libraries listed in > > shared_preload_libraries? fork() would then ensure that the backends do have > > the library in their address space. > > It _can_ be, if you want -- there's nothing that I know of preventing > the validator from also being preloaded with its own _PG_init(), is > there? But I don't think it's a good idea to force that, for the same > reason we want to allow SIGHUP. Loading the library by postmaster does not prevent the backends from reloading it on SIGHUP later. I was simply concerned about performance. (I proposed loading the library at another stage of backend initialization rather than adding _PG_init() to it.) > > * pg_fe_run_oauth_flow() > > > > When we get here for the first time > > case OAUTH_STEP_TOKEN_REQUEST: > > if (!handle_token_response(actx, &state->token)) > > goto error_return; > > > > the user hasn't been prompted yet so ISTM that the first token request must > > always fail. It seems more logical if the prompt is shown to the user before > > sending the token request to the server. (Although the user probably won't > > be that fast to make the first request succeed, so consider this just a > > hint.) > > That's also intentional -- if the first token response fails for a > reason _other_ than "we're waiting for the user", then we want to > immediately fail hard instead of making them dig out their phone and > go on a two-minute trip, because they're going to come back and find > that it was all for nothing. > > There's a comment immediately below the part you quoted that mentions > this briefly; maybe I should move it up a bit? That's fine, I understand now. > > * As long as I understand, the following comment would make sense: > > > > diff --git a/src/interfaces/libpq/fe-auth-oauth.c b/src/interfaces/libpq/fe-auth-oauth.c > > index f943a31cc08..97259fb5654 100644 > > --- a/src/interfaces/libpq/fe-auth-oauth.c > > +++ b/src/interfaces/libpq/fe-auth-oauth.c > > @@ -518,6 +518,7 @@ oauth_exchange(void *opaq, bool final, > > switch (state->state) > > { > > case FE_OAUTH_INIT: > > + /* Initial Client Response */ > > Assert(inputlen == -1); > > > > if (!derive_discovery_uri(conn)) > > There are multiple "initial client response" cases, though. What > questions are you hoping to clarify with the comment? Maybe we can > find a more direct answer. Ease of reading is the only "question" here :-) It might not always be obvious why a variable should have some particular value. In general, the Assert() statements are almost always preceded by a comment in the PG source. > > Or, doesn't the FE_OAUTH_INIT branch of the switch statement actually fit > > better into oauth_init()? > > oauth_init() is the mechanism initialization for the SASL framework > itself, which is shared with SCRAM. In the current architecture, the > init callback doesn't take the initial client response into > consideration at all. Sure.
The FE_OAUTH_INIT branch in oauth_exchange() (FE) also does not generate the initial client response. Based on reading the SCRAM implementation, I concluded that the init() callback can do authentication-method-specific things, but unlike exchange() it does not generate any output. > Generating the client response is up to the exchange callback -- and > even if we moved the SASL_ASYNC processing elsewhere, I don't think we > can get rid of its added complexity. Something has to signal upwards > that it's time to transfer control to an async engine. And we can't > make the asynchronicity a static attribute of the mechanism itself, > because we can skip the flow if something gives us a cached token. I didn't want to skip the flow. I thought that the init() callback could be made responsible for getting the token, but forgot that it still needs some way to signal to the caller that the async flow is needed. Anyway, are you sure that pg_SASL_continue() can also receive the SASL_ASYNC value from oauth_exchange()? My understanding is that pg_SASL_init() receives it if there is no token, but after that, oauth_exchange() is not called until the token is available, and thus it should not return SASL_ASYNC anymore. -- Antonin Houska Web: https://www.cybertec-postgresql.com
On Thu, Oct 31, 2024 at 4:05 AM Antonin Houska <ah@cybertec.at> wrote: > > > And regardless, the library appears to be loaded by every backend during > > > authentication. Why isn't it loaded by postmaster like libraries listed in > > > shared_preload_libraries? fork() would then ensure that the backends do have > > > the library in their address space. > > > > It _can_ be, if you want -- there's nothing that I know of preventing > > the validator from also being preloaded with its own _PG_init(), is > > there? But I don't think it's a good idea to force that, for the same > > reason we want to allow SIGHUP. > > Loading the library by postmaster does not prevent the backends from reloading > it on SIGHUP later. I was simply concerned about performance. (I proposed > loading the library at another stage of backend initialization rather than > adding _PG_init() to it.) Okay. I think this is going to be one of the slower authentication methods by necessity: the builtin flow in libpq requires a human in the loop, and an online validator is going to be making several HTTP calls from the backend. So if it turns out later that we need to optimize the backend logic, I'd prefer to have a case study in hand; otherwise I think we're likely to optimize the wrong things. > Ease of reading is the only "question" here :-) It might not always be > obvious why a variable should have some particular value. In general, the > Assert() statements are almost always preceded by a comment in the PG > source. Oh, an assertion label! I can absolutely add one; I originally thought you were proposing a label for the case itself. > > > Or, doesn't the FE_OAUTH_INIT branch of the switch statement actually fit > > > better into oauth_init()? > > > > oauth_init() is the mechanism initialization for the SASL framework > > itself, which is shared with SCRAM. In the current architecture, the > > init callback doesn't take the initial client response into > > consideration at all. > > Sure. The FE_OAUTH_INIT branch in oauth_exchange() (FE) also does not generate > the initial client response. It might, if it ends up falling through to FE_OAUTH_REQUESTING_TOKEN. There are two paths that can do that: the case where we have no discovery URI, and the case where a custom user flow returns a token synchronously (it was probably cached). > Anyway, are you sure that pg_SASL_continue() can also receive the SASL_ASYNC > value from oauth_exchange()? My understanding is that pg_SASL_init() receives > it if there is no token, but after that, oauth_exchange() is not called until > the token is available, and thus it should not return SASL_ASYNC anymore. Correct -- the only way for the current implementation of the OAUTHBEARER mechanism to return SASL_ASYNC is during the very first call. That's not an assumption I want to put into the higher levels, though; I think Michael will be unhappy with me if I introduce additional SASL coupling after the decoupling work that's been done over the last few releases. :D Thanks again, --Jacob
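[Editorial sketch] For readers following along, here is a self-contained C sketch of the control flow being described; the enum values, struct, and function are illustrative stand-ins, not the patch's actual API. It shows why an async signal can only come out of the very first exchange call: the INIT step either already has a token (no discovery URI, or a user flow returned one synchronously) and falls through, or it asks the caller to run the async engine exactly once.

#include <stddef.h>

enum sasl_status_sketch
{
    SASL_SKETCH_CONTINUE,
    SASL_SKETCH_ASYNC,
    SASL_SKETCH_FAILED,
};

enum oauth_step_sketch
{
    INIT_STEP,
    REQUESTING_TOKEN_STEP,
};

struct oauth_state_sketch
{
    enum oauth_step_sketch step;
    const char *token;          /* NULL until some flow produces one */
};

static enum sasl_status_sketch
exchange_sketch(struct oauth_state_sketch *state, const char **output)
{
    switch (state->step)
    {
        case INIT_STEP:
            if (state->token == NULL)
            {
                /* No token yet: hand control to the async flow, once. */
                state->step = REQUESTING_TOKEN_STEP;
                return SASL_SKETCH_ASYNC;
            }
            /* A cached/synchronous token lets us skip the flow entirely. */
            state->step = REQUESTING_TOKEN_STEP;
            /* FALLTHROUGH */

        case REQUESTING_TOKEN_STEP:
            if (state->token == NULL)
                return SASL_SKETCH_FAILED;  /* flow finished without a token */
            *output = state->token;         /* stand-in for the initial client response */
            return SASL_SKETCH_CONTINUE;
    }
    return SASL_SKETCH_FAILED;
}

Even though the sketch makes the "first call only" property obvious, the exchange above explains why the callers shouldn't bake that assumption in: it is a property of this mechanism, not of the SASL framework.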
Hi there. Zero knowledge of OAuth, just reading through the v35-0001. Forgive me if my comments are naive. +static int +parse_interval(struct async_ctx *actx, const char *interval_str) +{ + double parsed; + int cnt; + + /* + * The JSON lexer has already validated the number, which is stricter than + * the %f format, so we should be good to use sscanf(). + */ + cnt = sscanf(interval_str, "%lf", &parsed); + + if (cnt != 1) + { + /* + * Either the lexer screwed up or our assumption above isn't true, and + * either way a developer needs to take a look. + */ + Assert(cnt == 1); + return 1; /* don't fall through in release builds */ + } + + parsed = ceil(parsed); + + if (parsed < 1) + return actx->debugging ? 0 : 1; + + else if (INT_MAX <= parsed) + return INT_MAX; + + return parsed; +} The above Assert looks very wrong to me. we can also use PG_INT32_MAX, instead of INT_MAX (generally i think PG_INT32_MAX looks more intuitive to me) +/* + * The Device Authorization response, described by RFC 8628: + * + * https://www.rfc-editor.org/rfc/rfc8628#section-3.2 + */ +struct device_authz +{ + char *device_code; + char *user_code; + char *verification_uri; + char *interval_str; + + /* Fields below are parsed from the corresponding string above. */ + int interval; +}; Clicking through the link https://www.rfc-editor.org/rfc/rfc8628#section-3.2, it says " expires_in REQUIRED. The lifetime in seconds of the "device_code" and "user_code". interval OPTIONAL. The minimum amount of time in seconds that the client SHOULD wait between polling requests to the token endpoint. If no value is provided, clients MUST use 5 as the default. " These two fields seem to differ from struct device_authz.
> On 4 Nov 2024, at 06:00, jian he <jian.universality@gmail.com> wrote: > + if (cnt != 1) > + { > + /* > + * Either the lexer screwed up or our assumption above isn't true, and > + * either way a developer needs to take a look. > + */ > + Assert(cnt == 1); > + return 1; /* don't fall through in release builds */ > + } > The above Assert looks very wrong to me. I think the point is to fail hard in development builds to ensure whatever caused the disconnect between the json lexer and sscanf parsing is looked at. It should probably be changed to Assert(false); which is the common pattern for erroring out like this. -- Daniel Gustafsson
On Sun, Nov 3, 2024 at 9:00 PM jian he <jian.universality@gmail.com> wrote: > The above Assert looks very wrong to me. I can switch to Assert(false) if that's preferred, but it makes part of the libc assert() report useless. (I wish we had more fluent ways to say "this shouldn't happen, but if it does, we still need to get out safely.") > we can also use PG_INT32_MAX, instead of INT_MAX > (generally i think PG_INT32_MAX looks more intuitive to me) That's a fixed-width max; we want the maximum for the `int` type here. > expires_in > REQUIRED. The lifetime in seconds of the "device_code" and > "user_code". > interval > OPTIONAL. The minimum amount of time in seconds that the client > SHOULD wait between polling requests to the token endpoint. If no > value is provided, clients MUST use 5 as the default. > " > these two fields seem to differ from struct device_authz. Yeah, Daniel and I had talked about being stricter about REQUIRED fields that are not currently used. There's a comment making note of this in parse_device_authz(). The v1 code will need to make expires_in REQUIRED, so that future developers can develop features that depend on it without worrying about breaking currently-working-but-noncompliant deployments. (And if there are any noncompliant deployments out there now, we need to know about them so we can have that explicit discussion.) Thanks, --Jacob
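[Editorial sketch] As a sketch of what the stricter v1 behavior discussed above could look like, in C with stand-in names (this is not the patch's parse_device_authz()): treat the RFC 8628 REQUIRED members, including expires_in, as fatal if absent, and fall back to the spec's default of 5 seconds when the OPTIONAL interval is missing.

#include <stdbool.h>
#include <stddef.h>

struct device_authz_sketch
{
    const char *device_code;        /* REQUIRED */
    const char *user_code;          /* REQUIRED */
    const char *verification_uri;   /* REQUIRED */
    int         expires_in;         /* REQUIRED; 0 means "not provided" */
    int         interval;           /* OPTIONAL; 0 means "not provided" */
};

static bool
check_device_authz_sketch(struct device_authz_sketch *authz)
{
    /* A compliant issuer must send all of these. */
    if (authz->device_code == NULL ||
        authz->user_code == NULL ||
        authz->verification_uri == NULL ||
        authz->expires_in <= 0)
        return false;           /* refuse to continue with a noncompliant response */

    /* RFC 8628: if no interval is provided, clients MUST use 5 seconds. */
    if (authz->interval <= 0)
        authz->interval = 5;

    return true;
}

Rejecting a missing expires_in up front is what lets later features depend on it, as argued above, without silently breaking deployments that never sent it.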
On Tue, Nov 5, 2024 at 3:33 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > Done in v36, attached. Forgot to draw attention to this part: > +# XXX libcurl must link after libgssapi_krb5 on FreeBSD to avoid segfaults > +# during gss_acquire_cred(). This is possibly related to Curl's Heimdal > +# dependency on that platform? Best I can tell, libpq for FreeBSD has a dependency diamond for GSS symbols: libpq links against MIT krb5, libcurl links against Heimdal, libpq links against libcurl. Link order becomes critical to avoid nasty segfaults, but I have not dug deeply into the root cause. --Jacob
On 06.11.24 00:33, Jacob Champion wrote: > Done in v36, attached. Assorted review comments from me: Everything in the commit message between = Debug Mode = and Several TODOs: should be moved to the documentation. In some cases, it already is, but it doesn't always have the same level of detail. (You could point from the commit message to .sgml files if you want to highlight usage instructions, but I don't think this is generally necessary.) * config/programs.m4 Can we do the libcurl detection using pkg-config only? Seems simpler, and maintains consistency with meson. * doc/src/sgml/client-auth.sgml In the list of terms (this could be a <variablelist>), state how these terms map to a PostgreSQL installation. You already explain what the client and the resource server are, but not who the resource owner is and what the authorization server is. It would also be good to be explicit and upfront that the authorization server is a third-party component that needs to be obtained separately. trust_validator_authz: Personally, I'm not a fan of the "authz" and "authn" abbreviations. I know this is security jargon. But are regular users going to understand this? Can we just spell it out? * doc/src/sgml/config.sgml Also here maybe state that these OAuth libraries have to be obtained separately. * doc/src/sgml/installation.sgml I find the way the installation options are structured a bit odd. I would have expected --with-libcurl and -Dlibcurl (or --with-curl and -Dcurl). These build options usually just say, use this library. We don't spell out what, for example, libldap is used for, we just use it and enable all the features that require it. * doc/src/sgml/libpq.sgml Maybe oauth_issuer should be oauth_issuer_url? Otherwise one might expect to just write "google" here or something. Or there might be other ways to contact an issuer in the future? Just a thought. * doc/src/sgml/oauth-validators.sgml This chapter says "libpq" several times, but I think this is a server-side plugin, so libpq does not participate. Check please. * src/backend/libpq/auth-oauth.c I'm confused by the use of PG_MAX_AUTH_TOKEN_LENGTH in the pg_be_oauth_mech definition. What does that mean? +#define KVSEP 0x01 +#define AUTH_KEY "auth" +#define BEARER_SCHEME "Bearer " Add comments to these. Also, add comments to all functions defined here that don't have one yet. * src/backend/utils/misc/guc_tables.c Why is oauth_validator_library GUC_NOT_IN_SAMPLE? Also, shouldn't this be an hba option instead? What if you want to use different validators for different connections? * src/interfaces/libpq/fe-auth-oauth-curl.c The CURL_IGNORE_DEPRECATION thing needs clarification. Is that in progress? +#define MAX_OAUTH_RESPONSE_SIZE (1024 * 1024) Add a comment about why this value. + union + { + char **scalar; /* for all scalar types */ + struct curl_slist **array; /* for type == JSON_TOKEN_ARRAY_START */ + }; This is an anonymous union, which requires C11. Strangely, I cannot get clang to warn about this with -Wc11-extensions. Probably better to fix anyway. (The trailing supported MSVC versions don't support C11 yet.) * src/interfaces/libpq/fe-auth.h +extern const pg_fe_sasl_mech pg_oauth_mech; Should this rather be in fe-auth-oauth.h? * src/interfaces/libpq/libpq-fe.h The naming scheme of types and functions in this file is clearly obscure and has grown randomly over time. But at least my intuition is that the preferred way is that types start with PG, functions start with PQ, and the next letter is usually lower case.
(PQconnectdb, PQhost, PGconn, PQresult) Maybe check your additions against that. * src/interfaces/libpq/pqexpbuffer.c * src/interfaces/libpq/pqexpbuffer.h Let's try to do this without opening up additional APIs here. This is only used once, in append_urlencoded(), and there are other ways to communicate errors, for example returning a bool. * src/test/modules/oauth_validator/ Everything in this directory needs more comments, at least on a file level. Add a README in this directory. Also update the README in the upper directory. * src/test/modules/oauth_validator/t/001_server.pl On Cirrus CI Windows task, this test reports SKIP. Can't tell why, because the log is not kept. I suppose you expect this to work on Windows (but see my comment below), so it would be good to get this test running. * src/test/modules/oauth_validator/t/002_client.pl +my $issuer = "https://127.0.0.1:54321"; Use PostgreSQL::Test::Cluster::get_free_port() instead of hardcoding port numbers. Or is this a real port? I don't see it used anywhere else. + diag "running '" . join("' '", @cmd) . "'"; This should be "note" instead. Otherwise it garbles the output. * src/test/perl/PostgreSQL/Test/OAuthServer.pm Add some comments to this file, what it's for. Is this meant to work on Windows? Just thinking, things like kill(15, $self->{'pid'}); pgperlcritic complains: src/test/perl/PostgreSQL/Test/OAuthServer.pm: Return value of flagged function ignored - read at line 39, column 2. * src/tools/pgindent/typedefs.list We don't need to typedef every locally used enum or similar into a full typedef. I suggest the following might be unnecessary: AsyncAuthFunc OAuthStep fe_oauth_state_enum
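[Editorial sketch] One way to address the anonymous-union item above, sketched in C with stand-in type names rather than the patch's, is simply to name the union member so the struct stays within C99 for the sake of the older MSVC versions mentioned:

struct json_field_sketch
{
    const char *name;

    union
    {
        char      **scalar;     /* for all scalar types */
        void      **array;      /* stand-in for struct curl_slist ** */
    }           target;         /* named member: field.target.scalar, etc. */
};

The cost is only a slightly longer accessor at each use site; the layout and behavior are unchanged.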
On Fri, Nov 8, 2024 at 1:21 AM Peter Eisentraut <peter@eisentraut.org> wrote: > Assorted review comments from me: Thank you! I will cherry-pick some responses here and plan to address the rest in a future patchset. > trust_validator_authz: Personally, I'm not a fan of the "authz" and > "authn" abbreviations. I know this is security jargon. But are > regular users going to understand this? Can we just spell it out? Yes. That name's a holdover from the very first draft, actually. Is "trust_validator_authorization" a great name in the first place? The key concept is that user mapping is being delegated to the OAuth system itself, so you'd better make sure that the validator has been built to do that. (Anyone have any suggestions?) > I find the way the installation options are structured a bit odd. I > would have expected --with-libcurl and -Dlibcurl (or --with-curl and > -Dcurl). These build options usually just say, use this library. It's patterned directly off of -Dssl/--with-ssl (which I liberally borrowed from) because the builtin client implementation used to have multiple options for the library in use. I can change it if needed, but I thought it'd be helpful for future devs if I didn't undo the generalization. > Maybe oauth_issuer should be oauth_issuer_url? Otherwise one might > expect to just write "google" here or something. Or there might be > other ways to contact an issuer in the future? Just a thought. More specifically this is an "issuer identifier", as defined by the OAuth/OpenID discovery specs. It's a subset of a URL, and I want to make sure users know how to differentiate between an "issuer" they trust and the "discovery URI" that's in use for that issuer. They may want to set one or the other -- a discovery URI is associated with exactly one issuer, but unfortunately an issuer may have multiple discovery URIs, which I'm actively working on. (There is also some relation to the multiple-issuers problem mentioned below.) > I'm confused by the use of PG_MAX_AUTH_TOKEN_LENGTH in the > pg_be_oauth_mech definition. What does that mean? Just that Bearer tokens can be pretty long, so we don't want to limit them to 1k like SCRAM does. 64k is probably overkill, but I've seen anecdotal reports of tens of KBs and it seemed reasonable to match what we're doing for GSS tokens. > Also, shouldn't [oauth_validator_library] be an hba option instead? What if you want to > use different validators for different connections? Yes. This is again the multiple-issuers problem; I will split that off into its own email since this one's getting long. It has security implications. > The CURL_IGNORE_DEPRECATION thing needs clarification. Is that in > progress? Thanks for the nudge, I've started a thread: https://curl.se/mail/lib-2024-11/0028.html > This is an anonymous union, which requires C11. Strangely, I cannot > get clang to warn about this with -Wc11-extensions. Probably better > to fix anyway. (The trailing supported MSVC versions don't support > C11 yet.) Oh, that's not going to be fun. > This is only used once, in append_urlencoded(), and there are other > ways to communicate errors, for example returning a bool. I'd rather not introduce two parallel error indicators for the caller to have to check for that particular part. But I can change over to using the (identical!) termPQExpBuffer. I felt like the other API signaled the intent a little better, though. > On Cirrus CI Windows task, this test reports SKIP. Can't tell why, > because the log is not kept. 
I suppose you expect this to work on > Windows (but see my comment below) No, builtin client support does not exist on Windows. If/when it's added, the 001_server tests will need to be ported. > +my $issuer = "https://127.0.0.1:54321"; > > Use PostgreSQL::Test::Cluster::get_free_port() instead of hardcoding > port numbers. > > Or is this a real port? I don't see it used anywhere else. It's not real; 002_client.pl doesn't start an authorization server at all. I can make that more explicit. > src/test/perl/PostgreSQL/Test/OAuthServer.pm: Return value of flagged > function ignored - read at line 39, column 2. So perlcritic recognizes "or" but not the "//" operator... Lovely. Thanks! --Jacob
On Tue, Nov 12, 2024 at 1:47 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > On Fri, Nov 8, 2024 at 1:21 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > Also, shouldn't [oauth_validator_library] be an hba option instead? What if you want to > > use different validators for different connections? > > Yes. This is again the multiple-issuers problem; I will split that off > into its own email since this one's getting long. It has security > implications. Okay, so, how to use multiple issuers/providers. Here's my current plan, with justification below: 1. libpq connection strings must specify exactly one issuer 2. the discovery document coming from the server must belong to that libpq issuer 3. the HBA should allow a choice of discovery document and validator = Current Bug = First, I should point out a critical mistake I've made on the client side: I treat oauth_issuer and oauth_client_id as if they can be arbitrarily mixed and matched. Some of the providers I've been testing do allow you to use one registered client across multiple issuers, but that's the exception rather than the norm. Even if you have multiple issuers available, you still expect your registered client to be talking to only the provider you registered it with. And you don't want the Postgres server to switch providers for you. Imagine that you've registered a client application for use with a big provider, and that provider has given you a client secret. You expect to share that secret only with them, but with the current setup, if a DBA wants to steal that secret from you, all they have to do is stand up a provider of their own, and libpq will send the secret straight to it instead. Great. There's actually a worse scenario that's pointed out in the spec for the Device Authorization flow [1]: Note that if an authorization server used with this flow is malicious, then it could perform a man-in-the-middle attack on the backchannel flow to another authorization server. [...] For this to be possible, the device manufacturer must either be the attacker and shipping a device intended to perform the man-in-the-middle attack, or be using an authorization server that is controlled by an attacker, possibly because the attacker compromised the authorization server used by the device. Back when I implemented this, that paragraph seemed pointlessly obvious: of course you must trust your authorization server. What I missed was, the Postgres server MUST NOT be able to control the entry point into the device flow, because that means a malicious DBA can trivially start a device prompt with a different provider, forward you all the details through the endpoint they control, and hope you're too fatigued to notice the difference before clicking through. (This is easier if that provider is one of the big ones that you're already used to trusting.) Then they have a token with which to attack you on a completely different platform. So in my opinion, my patchset must be changed to require a trusted issuer in the libpq connection string. The server can tell you which discovery document to get from that issuer, and it can tell you which scopes are required (as long as the user hasn't hardcoded those too), but it shouldn't be able to force the client to talk to an arbitrary provider or swap out issuers. = Multiple Issuers = Okay, with that out of the way, let's talk about multiple issuer support. First, server-side. If a server wants different groups of users/databases/etc. 
to go through different issuers, then it stands to reason that a validator should be selectable in the HBA settings, since a validator for Provider A may not have any clue how to validate Provider B. I don't like the idea of pg_hba being used to load arbitrary libraries, though; I think the superuser should have to designate a pool of "blessed" validator libraries to load through a GUC. As a UX improvement for the common case, maybe we don't require the HBA to have an explicit validator parameter if the conf contains exactly one blessed library. In case someone does want to develop a multi-issuer validator (say, to deal with the providers that have multiple issuers underneath their umbrella), we need to make sure that the configured issuer in use is available to the validator, so that they aren't susceptible to a mix-up attack of their own. As for the client side, I think v1 should allow only one expected issuer per connection. There are OAuth features [2] that help clients handle multiple issuers more safely, but as far as I can tell they are not widely deployed yet, and I don't know if any of them apply to the device flow. (With the device flow, if the client allows multiple providers, those providers can attack each other as described above.) If a more complicated client application associates a single end user with multiple Postgres connections, and each connection needs its own issuer, then that application needs to be encouraged to use a flow which has been hardened for that use case. (Setting aside the security problems with mix-ups, the device flow won't be particularly pleasant for that anyway. "Here's a bunch of URLs and codes, go to all of them before they time out, good luck!") = Discovery Documents = There are two flavors of discovery document, OAuth and OpenID. And OIDC Discovery and RFC 8414 disagree on the rules, so for the issuer "https://example.com/abcd", you have two discovery document locations using postfix or infix styles for the path: - OpenID: https://example.com/abcd/.well-known/openid-configuration - OAuth: https://example.com/.well-known/oauth-authorization-server/abcd Some providers publish different information at each [3], so the difference may be important for some deployments. RFC 8414 claims the OpenID flavor should transition to the infix style at some point (a transition that is not happening as far as I can see), so now there are three standards. And Okta uses the construction "https://example.com/abcd/.well-known/oauth-authorization-server", which you may notice matches _neither_ of the two options above, so now there are four standards. To deal with all of this, I plan to better distinguish between the issuer and the discovery URL in the code, as well as allow DBAs and clients to specify the discovery URL explicitly to override the default OpenID flavor. For now I plan to support only "openid-configuration" and "oauth-authorization-server" in both postfix and infix notation (four options total, as seen in the wild). How's all that sound? --Jacob [1] https://datatracker.ietf.org/doc/html/rfc8628#section-5.3 [2] https://datatracker.ietf.org/doc/html/rfc9207 [3] https://devforum.okta.com/t/is-userinfo-endpoint-available-in-oauth-authorization-server/24284
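[Editorial sketch] To make the four observed variants concrete, here is a small, self-contained C program (the helper names are made up for illustration) that builds the postfix and infix well-known URLs for an issuer identifier like https://example.com/abcd:

#include <stdio.h>
#include <string.h>

/* Postfix style: <issuer>/.well-known/<kind> */
static void
postfix_url(char *buf, size_t len, const char *issuer, const char *kind)
{
    snprintf(buf, len, "%s/.well-known/%s", issuer, kind);
}

/* Infix style: scheme://host/.well-known/<kind><issuer path> */
static void
infix_url(char *buf, size_t len, const char *issuer, const char *kind)
{
    const char *path = strchr(issuer + strlen("https://"), '/');

    if (path == NULL)           /* issuer with no path: both styles coincide */
    {
        postfix_url(buf, len, issuer, kind);
        return;
    }
    snprintf(buf, len, "%.*s/.well-known/%s%s",
             (int) (path - issuer), issuer, kind, path);
}

int
main(void)
{
    const char *issuer = "https://example.com/abcd";
    char        url[256];

    postfix_url(url, sizeof(url), issuer, "openid-configuration");
    puts(url);  /* https://example.com/abcd/.well-known/openid-configuration */

    infix_url(url, sizeof(url), issuer, "oauth-authorization-server");
    puts(url);  /* https://example.com/.well-known/oauth-authorization-server/abcd */

    postfix_url(url, sizeof(url), issuer, "oauth-authorization-server");
    puts(url);  /* the Okta-style construction noted above */

    infix_url(url, sizeof(url), issuer, "openid-configuration");
    puts(url);  /* the RFC 8414 "transitional" OpenID form */

    return 0;
}

Whichever form ends up configured, the plan above still requires that the resulting discovery URL belong to the issuer named in the libpq connection string, so the server cannot redirect the client to a provider it doesn't trust.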
On 12.11.24 22:47, Jacob Champion wrote: > On Fri, Nov 8, 2024 at 1:21 AM Peter Eisentraut <peter@eisentraut.org> wrote: >> I find the way the installation options are structured a bit odd. I >> would have expected --with-libcurl and -Dlibcurl (or --with-curl and >> -Dcurl). These build options usually just say, use this library. > > It's patterned directly off of -Dssl/--with-ssl (which I liberally > borrowed from) because the builtin client implementation used to have > multiple options for the library in use. I can change it if needed, > but I thought it'd be helpful for future devs if I didn't undo the > generalization. Personally, I'm not even a fan of the -Dssl/--with-ssl system. I'm more attached to --with-openssl. But if you want to stick with that, a more suitable naming would be something like, say, --with-httplib=curl, which means, use curl for all your http needs. Because if we later add other functionality that can use some http, I don't think we want to enable or disable them all individually, or even mix different http libraries for different features. In practice, curl is a widely available and respected library, so I'd expect packagers to just turn it all on without much further consideration. >> I'm confused by the use of PG_MAX_AUTH_TOKEN_LENGTH in the >> pg_be_oauth_mech definition. What does that mean? > > Just that Bearer tokens can be pretty long, so we don't want to limit > them to 1k like SCRAM does. 64k is probably overkill, but I've seen > anecdotal reports of tens of KBs and it seemed reasonable to match > what we're doing for GSS tokens. Ah, ok, I totally misread that code. Could you maybe write this definition +/* Mechanism declaration */ +const pg_be_sasl_mech pg_be_oauth_mech = { + oauth_get_mechanisms, + oauth_init, + oauth_exchange, + + PG_MAX_AUTH_TOKEN_LENGTH, +}; with designated initializers: const pg_be_sasl_mech pg_be_oauth_mech = { .get_mechanisms = oauth_get_mechanisms, .init = oauth_init, .exchange = oauth_exchange, .max_message_length = PG_MAX_AUTH_TOKEN_LENGTH, }; >> The CURL_IGNORE_DEPRECATION thing needs clarification. Is that in >> progress? > > Thanks for the nudge, I've started a thread: > > https://curl.se/mail/lib-2024-11/0028.html It looks like this has been clarified, so let's put that URL into a code comment. >> This is only used once, in append_urlencoded(), and there are other >> ways to communicate errors, for example returning a bool. > > I'd rather not introduce two parallel error indicators for the caller > to have to check for that particular part. But I can change over to > using the (identical!) termPQExpBuffer. I felt like the other API > signaled the intent a little better, though. I think it's better to not drill a new hole into an established API for such a limited use. So termPQExpBuffer() seems better for now. If it later turns out that many callers are using termPQExpBuffer() for fake error handling purposes, then that can be considered independently. >> On Cirrus CI Windows task, this test reports SKIP. Can't tell why, >> because the log is not kept. I suppose you expect this to work on >> Windows (but see my comment below) > > No, builtin client support does not exist on Windows. If/when it's > added, the 001_server tests will need to be ported. Could you put some kind of explicit conditional or a comment in there. Right now, it's not possible to tell that Windows is not supported.
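[Editorial sketch] For what it's worth, here is a hedged sketch of how the termPQExpBuffer() suggestion could look from the caller's side. append_urlencoded() is the function named above, but its body and the surrounding names here are invented for illustration; the idea is that terminating the buffer on an encoding failure leaves it in the same "broken" state that PQExpBufferDataBroken() already reports for out-of-memory, so one check at the end covers both.

#include "postgres_fe.h"
#include "pqexpbuffer.h"

static void
append_urlencoded_sketch(PQExpBuffer buf, const char *value)
{
    /* ... percent-encode value into buf ... */
    if (value == NULL)          /* stand-in for a real encoding failure */
        termPQExpBuffer(buf);   /* leaves the buffer in the broken/empty state */
}

static char *
build_token_request_sketch(const char *grant_type, const char *device_code)
{
    PQExpBufferData buf;

    initPQExpBuffer(&buf);

    appendPQExpBufferStr(&buf, "grant_type=");
    append_urlencoded_sketch(&buf, grant_type);
    appendPQExpBufferStr(&buf, "&device_code=");
    append_urlencoded_sketch(&buf, device_code);

    /* One check covers allocation failures and encoding failures alike. */
    if (PQExpBufferDataBroken(buf))
        return NULL;            /* broken buffers hold no allocation to free */

    return buf.data;            /* caller frees */
}

Appends on a broken buffer are already no-ops, so the intermediate calls stay safe even after an early failure; that is what makes the single final check sufficient.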
On Tue, Nov 19, 2024 at 3:05 AM Peter Eisentraut <peter@eisentraut.org> wrote: > Personally, I'm not even a fan of the -Dssl/--with-ssl system. I'm more > attached to --with-openssl. But if you want to stick with that, a more > suitable naming would be something like, say, --with-httplib=curl, which > means, use curl for all your http needs. Because if we later add other > functionality that can use some http, I don't think we want to enable or > disable them all individually, or even mix different http libraries for > different features. In practice, curl is a widely available and > respected library, so I'd expect packagers to just turn it all on > without much further consideration. Okay, I can see that. I'll work on replacing --with-builtin-oauth. Any votes from the gallery on --with-httplib vs. --with-libcurl? The other suggestions look good and I've added them to my personal TODO list. Thanks again for all the feedback! --Jacob