Thread: [PoC] Delegating pg_ident to a third party
Hi all,

In keeping with my theme of expanding the authentication/authorization options for the server, attached is an experimental patchset that lets Postgres determine an authenticated user's allowed roles by querying an LDAP server, and enables SASL binding for those queries.

This lets you delegate pieces of pg_ident.conf to a central server, so that you don't have to run any synchronization scripts (or deal with associated staleness problems, repeated load on the LDAP deployment, etc.). And it lets you make those queries with a client certificate instead of a bind password, or at the very least protect your bind password with some SCRAM crypto. You don't have to use the LDAP auth method for this to work; you can combine it with Kerberos or certs or any auth method that already supports pg_ident.

The target users, in my mind, are admins who are already using an auth method with user maps, but have many deployments and want easier control over granting and revoking database access from one location. This won't help you so much if you need to have exactly one role per user -- there's no logic to automatically create roles, so it can't fully replace the existing synchronization scripts that are out there. But if all you need is "X, Y, and Z are allowed to log in as guest, and A and B may connect as admins", then this is meant to simplify your life.

This is a smaller step than my previous proof-of-concept, which handled fully federated authentication and authorization via an OAuth provider [1], and it should be a nice companion to my patch that adds user mappings to the LDAP auth method [2], though I haven't tried them together yet. (I've also been thinking about pulling group membership information out of Kerberos authorization data, for those of you using Active Directory. Things for later.)

= How-To =

If you want to try it out -- on a non-production system please -- take a look at the test suite in src/test/ldap, which has been filled out with some example usage.
The core features are the "ldapmap" HBA option (which you would use instead of "map" in your existing HBA) and the "ldapsaslmechs" HBA option, which you can set to a list of SASL mechanisms that you will accept. (The list of supported mechanisms is determined by both systems' LDAP and SASL libraries, not by Postgres.)

The tricky part is writing the pg_ident line correctly, because it's currently not a very good user experience. The query is in the form of an LDAP URL. It needs to return exactly one entry for the user being authorized; the attribute values contained in that entry will be interpreted as the list of roles that the user is allowed to connect as. Regex matching and substitution are supported as they are for regular maps. Here's a sample:

    pg_ident.conf:

    myldapmap /^(.*)$ ldap://example.com/dc=example,dc=com?postgresRole?sub?(uid=\1)

    pg_hba.conf:

    hostssl all all all cert ldapmap=myldapmap ldaptls=1 ldapsaslmechs=scram-sha-1 ldapbinddn=admin ldapbindpasswd=secret

This particular setup can be described as follows:

- Clients must use client certificates to authenticate to Postgres.
- Once the certificate is verified, Postgres will connect to the LDAP server at example.com, issue StartTLS, and begin a SCRAM-SHA-1 exchange using the bind username and password (admin/secret).
- Once that completes, Postgres will issue a query for the LDAP user that has a uid matching the CN of the client certificate. (If more than one user matches, authorization fails.)
- The client's PGUSER will be compared with the list of postgresRole attributes belonging to that LDAP user, and if one matches, authorization succeeds.

= Areas for Improvement =

I think it would be nice to support LDAP group membership in addition to object attributes.

Settings for the LDAP connection are currently spread between pg_hba, pg_ident, and environment variables like LDAPTLS_CERT. I made the situation worse by allowing the pg_ident query to contain a scheme, host, and port.
That makes it seem like you could send different users to different LDAP servers, but since they would all have to share exactly the same TLS settings anyway, I think this was a mistake on my part.

That mistake aside, I think the current URL query syntax is powerful but unintuitive. I would rather see that as an option for power users, and let other people just specify the user filter and role attribute separately. And there needs to be more logging around the feature, to help debug problems.

Regex substitution of user-controlled data into an LDAP query is perilous, and I don't like it. For now I have restricted the allowed characters as a first mitigation.

Is it safe to use listen_addresses in the test suite, as I have done, as long as the HBA requires authentication? Or is that reopening a security hole? I seem to recall discussion on this but my search-fu has failed me.

There's a lot of code duplication in the current patchset that would need to be undone.

...and more; see TODOs in the patches if you're interested.

= Patch Roadmap =

- 0001 fixes error messages that are printed when ldap_url_parse() fails. Since the pg_ident queries use LDAP URLs, and it's easy to get them wrong, that fix is particularly important for this patchset. But I think it could potentially be applied separately.
- 0002 implements the "ldapmap" HBA option and enables the ldaptls, ldapbinddn, and ldapbindpasswd options for it. It also adds corresponding tests to the LDAP suite.
- 0003 tests the use of client certificates via LDAP environment variables. (This is already supported today but I didn't see any coverage, which will be important for the last patch.)
- 0004 implements the "ldapsaslmechs" HBA option and adds enough SASL support for at least the EXTERNAL and SCRAM-* mechanisms. Others may work but I haven't tested them. This feature is available only if you have the <sasl/sasl.h> header on your system at build time.

WDYT?

(My responses here will be slower than usual. Hope you all have a great end to the year!)

--Jacob

[1] https://www.postgresql.org/message-id/flat/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com
[2] https://www.postgresql.org/message-id/flat/1a61806047c536e7528b943d0cfe12608118ca31.camel@vmware.com
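
To make the How-To flow concrete, here is a minimal Python sketch of the mapping step. It is not the patchset's C implementation. It only assumes the behavior described above: the pg_ident regex is matched against the authenticated identity, the capture is substituted for \1 in the LDAP URL, the query must return exactly one entry, and the requested role must appear among that entry's attribute values. The function names, the SAFE_CAPTURE whitelist, and the in-memory entry list standing in for a real LDAP response are all hypothetical.

```python
import re
from typing import Dict, List, Optional
from urllib.parse import urlsplit

# Hypothetical whitelist standing in for the "restricted characters"
# mitigation; the character set the patchset actually allows is not stated.
SAFE_CAPTURE = re.compile(r"[A-Za-z0-9._@-]+")


def build_ldap_url(pattern: str, template: str, authn_id: str) -> Optional[str]:
    """Match the authenticated identity and substitute \\1 into the URL."""
    match = re.search(pattern, authn_id)
    if match is None or not SAFE_CAPTURE.fullmatch(match.group(1)):
        return None  # no match, or capture contains filter metacharacters
    return template.replace("\\1", match.group(1))


def split_ldap_url(url: str) -> Dict[str, str]:
    """Split ldap://host/base?attribute?scope?filter into its pieces."""
    parts = urlsplit(url)
    attr, scope, ldap_filter = parts.query.split("?")
    return {"host": parts.netloc, "base": parts.path.lstrip("/"),
            "attr": attr, "scope": scope, "filter": ldap_filter}


def allowed_roles(entries: List[Dict[str, List[str]]], attr: str) -> List[str]:
    """Exactly one entry must match; its attribute values are the roles."""
    if len(entries) != 1:
        return []
    return list(entries[0].get(attr, []))


def is_authorized(requested_role: str, entries, attr: str) -> bool:
    return requested_role in allowed_roles(entries, attr)


# Example: the CN "alice" from a client certificate, connecting as "guest".
template = r"ldap://example.com/dc=example,dc=com?postgresRole?sub?(uid=\1)"
url = build_ldap_url(r"^(.*)$", template, "alice")
query = split_ldap_url(url)
fake_entries = [{"postgresRole": ["guest", "admin"]}]  # stand-in LDAP result
print(query["filter"], is_authorized("guest", fake_entries, query["attr"]))
```

Rejecting the capture outright, rather than escaping it, mirrors the "restricted characters" mitigation mentioned under Areas for Improvement. A production design would more likely escape filter metacharacters according to the LDAP filter grammar.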
On 17.12.21 00:48, Jacob Champion wrote:
> WDYT? (My responses here will be slower than usual. Hope you all have a
> great end to the year!)

Looks interesting. I wonder whether putting this into pg_ident.conf is sensible. I suspect people will want to eventually add more features around this, like automatically creating roles or role memberships, at which point pg_ident.conf doesn't seem appropriate anymore.

Should we have a new file for this? Do you have any further ideas?
On Fri, 2021-12-17 at 10:06 +0100, Peter Eisentraut wrote:
> On 17.12.21 00:48, Jacob Champion wrote:
> > WDYT? (My responses here will be slower than usual. Hope you all have a
> > great end to the year!)
>
> Looks interesting. I wonder whether putting this into pg_ident.conf is
> sensible. I suspect people will want to eventually add more features
> around this, like automatically creating roles or role memberships, at
> which point pg_ident.conf doesn't seem appropriate anymore.

Yeah, pg_ident is getting too cramped for this.

> Should we have a new file for this? Do you have any further ideas?

My experience with these configs is mostly limited to HTTP servers. That said, it's pretty hard to beat the flexibility of arbitrary key-value pairs inside nested contexts. It's nice to be able to say things like

    Everyone has to use LDAP auth
        With this server
        And these TLS settings

    Except admins
        who additionally need client certificates
        with this CA root

    And Jacob
        who isn't allowed in anymore

Are there any existing discussions along these lines that I should take a look at?

--Jacob
Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:
> On Fri, 2021-12-17 at 10:06 +0100, Peter Eisentraut wrote:
> > On 17.12.21 00:48, Jacob Champion wrote:
> > > WDYT? (My responses here will be slower than usual. Hope you all have a
> > > great end to the year!)
> >
> > Looks interesting. I wonder whether putting this into pg_ident.conf is
> > sensible. I suspect people will want to eventually add more features
> > around this, like automatically creating roles or role memberships, at
> > which point pg_ident.conf doesn't seem appropriate anymore.

This is the part that I really wonder about also ... I've always viewed pg_ident as being intended mainly for one-to-one kind of mappings and not the "map a bunch of different users into the same role" that this advocated for. Being able to have roles and memberships automatically created is much more the direction that I'd say we should be going in, so that in-database auditing has an actual user to go on and not some generic role that could be any number of people.

I'd go a step further and suggest that the way to do this is with a background worker that's started up and connects to an LDAP infrastructure and listens for changes, allowing the system to pick up on new roles/memberships as soon as they're created in the LDAP environment. That would then be controlled by appropriate settings in postgresql.conf/.auto.conf.

> Yeah, pg_ident is getting too cramped for this.

All that said, I do see how having the ability to call out to another system for mappings may be useful, so I'm not sure that we shouldn't consider this specific change and have it be specifically just for mappings, in which case pg_ident seems appropriate.

> > Should we have a new file for this? Do you have any further ideas?
>
> My experience with these configs is mostly limited to HTTP servers.
> That said, it's pretty hard to beat the flexibility of arbitrary key-value
> pairs inside nested contexts. It's nice to be able to say things
> like
>
>     Everyone has to use LDAP auth
>         With this server
>         And these TLS settings
>
>     Except admins
>         who additionally need client certificates
>         with this CA root
>
>     And Jacob
>         who isn't allowed in anymore

I certainly don't think we should have this be limited to LDAP auth- such an external mapping ability is suitable for any authentication method that supports a mapping (thinking specifically of GSSAPI, of course..). Not sure if that's what was meant above but did want to make sure that was clear.

The rest looks a lot more like pg_hba or perhaps in-database privileges like roles/memberships existing or not and CONNECT rights. I'm not really sold on the idea of adding yet even more different ways to control authorization.

Thanks,

Stephen
On Mon, 2022-01-03 at 12:36 -0500, Stephen Frost wrote:
> * Jacob Champion (pchampion@vmware.com) wrote:
> > On Fri, 2021-12-17 at 10:06 +0100, Peter Eisentraut wrote:
> > > On 17.12.21 00:48, Jacob Champion wrote:
> > > > WDYT? (My responses here will be slower than usual. Hope you all have a
> > > > great end to the year!)
> > >
> > > Looks interesting. I wonder whether putting this into pg_ident.conf is
> > > sensible. I suspect people will want to eventually add more features
> > > around this, like automatically creating roles or role memberships, at
> > > which point pg_ident.conf doesn't seem appropriate anymore.
>
> This is the part that I really wonder about also ... I've always viewed
> pg_ident as being intended mainly for one-to-one kind of mappings and
> not the "map a bunch of different users into the same role" that this
> advocated for. Being able to have roles and memberships automatically
> created is much more the direction that I'd say we should be going in,
> so that in-database auditing has an actual user to go on and not some
> generic role that could be any number of people.

That last point was my motivation for the authn_id patch [1] -- so that auditing could see the actual user _and_ the generic role. The information is already there to be used, it's just not exposed to the stats framework yet.

Forcing one role per individual end user is wasteful and isn't really making good use of the role-based system that you already have. Generally speaking, when administering hundreds or thousands of users, people start dividing them up into groups as opposed to dealing with them individually. So I don't think new features should be taking away flexibility in this area -- if one role per user already works well for you, great, but don't make everyone do the same.

> I'd go a step further and suggest that the way to do this is with a
> background worker that's started up and connects to an LDAP
> infrastructure and listens for changes, allowing the system to pick up
> on new roles/memberships as soon as they're created in the LDAP
> environment. That would then be controlled by appropriate settings in
> postgresql.conf/.auto.conf.

This is roughly what you can already do with existing (third-party) tools, and that approach isn't scaling out in practice for some of our existing customers. The load on the central server, for thousands of idle databases dialing in just to see if there are any new users, is huge.

> All that said, I do see how having the ability to call out to another
> system for mappings may be useful, so I'm not sure that we shouldn't
> consider this specific change and have it be specifically just for
> mappings, in which case pg_ident seems appropriate.

Yeah, this PoC was mostly an increment on the functionality that already existed. The division between what goes in pg_hba and what goes in pg_ident is starting to blur with this patchset, though, and I think Peter's point is sound.

> I certainly don't think we should have this be limited to LDAP auth-
> such an external mapping ability is suitable for any authentication
> method that supports a mapping (thinking specifically of GSSAPI, of
> course..). Not sure if that's what was meant above but did want to
> make sure that was clear.

You can't use usermaps with LDAP auth yet, so no, that's not what I meant. (I have another patch for that feature in commitfest, which would allow these two things to be used together.)

Thanks,

--Jacob

[1] https://www.postgresql.org/message-id/flat/E1lTwp4-0002l4-L9%40gemulon.postgresql.org
Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:
> On Mon, 2022-01-03 at 12:36 -0500, Stephen Frost wrote:
> > * Jacob Champion (pchampion@vmware.com) wrote:
> > > On Fri, 2021-12-17 at 10:06 +0100, Peter Eisentraut wrote:
> > > > On 17.12.21 00:48, Jacob Champion wrote:
> > > > > WDYT? (My responses here will be slower than usual. Hope you all have a
> > > > > great end to the year!)
> > > >
> > > > Looks interesting. I wonder whether putting this into pg_ident.conf is
> > > > sensible. I suspect people will want to eventually add more features
> > > > around this, like automatically creating roles or role memberships, at
> > > > which point pg_ident.conf doesn't seem appropriate anymore.
> >
> > This is the part that I really wonder about also ... I've always viewed
> > pg_ident as being intended mainly for one-to-one kind of mappings and
> > not the "map a bunch of different users into the same role" that this
> > advocated for. Being able to have roles and memberships automatically
> > created is much more the direction that I'd say we should be going in,
> > so that in-database auditing has an actual user to go on and not some
> > generic role that could be any number of people.
>
> That last point was my motivation for the authn_id patch [1] -- so that
> auditing could see the actual user _and_ the generic role. The
> information is already there to be used, it's just not exposed to the
> stats framework yet.

While that helps, and I generally support adding that information to the logs, it's certainly not nearly as good or useful as having the actual user known to the database.

> Forcing one role per individual end user is wasteful and isn't really
> making good use of the role-based system that you already have.
> Generally speaking, when administering hundreds or thousands of users,
> people start dividing them up into groups as opposed to dealing with
> them individually. So I don't think new features should be taking away
> flexibility in this area -- if one role per user already works well for
> you, great, but don't make everyone do the same.

Using the role system we have to assign privileges certainly is useful and sensible, of course, though I don't see where you've actually made an argument for why one role per individual is somehow wasteful or somehow takes away from the role system that we have for granting rights. I'm also not suggesting that we make everyone do the same thing, indeed, later on I was supportive of having an external system provide the mapping. Here, I'm just making the point that we should also be looking at automatic role/membership creation.

> > I'd go a step further and suggest that the way to do this is with a
> > background worker that's started up and connects to an LDAP
> > infrastructure and listens for changes, allowing the system to pick up
> > on new roles/memberships as soon as they're created in the LDAP
> > environment. That would then be controlled by appropriate settings in
> > postgresql.conf/.auto.conf.
>
> This is roughly what you can already do with existing (third-party)
> tools, and that approach isn't scaling out in practice for some of our
> existing customers. The load on the central server, for thousands of
> idle databases dialing in just to see if there are any new users, is
> huge.

If you're referring specifically to cron-based tools which are constantly hammering on the LDAP servers running the same queries over and over, sure, I agree that that's creating load on the LDAP infrastructure (though, well, it was kind of designed to be very scalable for exactly that kind of load, no? So I'm not really sure why that's such an issue..). That's also why I specifically wasn't suggesting that and was instead suggesting that we have something that's connected to one of the (hopefully, many, many) LDAP servers and is doing change monitoring, allowing changes to be pushed down to PG, rather than cronjobs constantly running the same queries and re-checking things over and over. I appreciate that that's also not free, but I don't believe it's nearly as bad as the cron-based approach and it's certainly something that an LDAP infrastructure should be really rather good at.

> > All that said, I do see how having the ability to call out to another
> > system for mappings may be useful, so I'm not sure that we shouldn't
> > consider this specific change and have it be specifically just for
> > mappings, in which case pg_ident seems appropriate.
>
> Yeah, this PoC was mostly an increment on the functionality that
> already existed. The division between what goes in pg_hba and what goes
> in pg_ident is starting to blur with this patchset, though, and I think
> Peter's point is sound.

This part I tend to disagree with- pg_ident for mappings and for ways to call out to other systems to provide those mappings strikes me as entirely appropriate and doesn't blur the lines and that's really what this patch seems to be primarily about. Peter noted that there might be other things we want to do and argued that those might not be appropriate in pg_ident, which I tend to agree with, but I don't think we need to invent something entirely new for mappings when we have pg_ident already.

When it comes to the question of "how to connect to an LDAP server for $whatever", it seems like it'd be nice to be able to configure that once and reuse that configuration. Not sure I have a great suggestion for how to do that. The approach this patch takes of adding options to pg_hba for that, just like other options in pg_hba do, strikes me as pretty reasonable. I would advocate for other methods to work when it comes to authenticating to LDAP from PG though (such as GSSAPI, in particular, of course...).

> > I certainly don't think we should have this be limited to LDAP auth-
> > such an external mapping ability is suitable for any authentication
> > method that supports a mapping (thinking specifically of GSSAPI, of
> > course..). Not sure if that's what was meant above but did want to
> > make sure that was clear.
>
> You can't use usermaps with LDAP auth yet, so no, that's not what I
> meant. (I have another patch for that feature in commitfest, which
> would allow these two things to be used together.)

Yes, I'm aware of the other patch, just wanted to make sure the intent is for this to work for all map-supporting auth methods. Figured that was the case but the examples in the prior email had me concerned and just wanted to make sure.

Thanks,

Stephen
On Mon, 2022-01-03 at 19:42 -0500, Stephen Frost wrote:
> * Jacob Champion (pchampion@vmware.com) wrote:
> >
> > That last point was my motivation for the authn_id patch [1] -- so that
> > auditing could see the actual user _and_ the generic role. The
> > information is already there to be used, it's just not exposed to the
> > stats framework yet.
>
> While that helps, and I generally support adding that information to the
> logs, it's certainly not nearly as good or useful as having the actual
> user known to the database.

Could you talk more about the use cases for which having the "actual user" is better? From an auditing perspective I don't see why "authenticated as jacob@example.net, logged in as admin" is any worse than "logged in as jacob".

> > Forcing one role per individual end user is wasteful and isn't really
> > making good use of the role-based system that you already have.
> > Generally speaking, when administering hundreds or thousands of users,
> > people start dividing them up into groups as opposed to dealing with
> > them individually. So I don't think new features should be taking away
> > flexibility in this area -- if one role per user already works well for
> > you, great, but don't make everyone do the same.
>
> Using the role system we have to assign privileges certainly is useful
> and sensible, of course, though I don't see where you've actually made
> an argument for why one role per individual is somehow wasteful or
> somehow takes away from the role system that we have for granting
> rights.

I was responding more to your statement that "Being able to have roles and memberships automatically created is much more the direction that I'd say we should be going in". It's not that one-role-per-user is inherently wasteful, but forcing role proliferation where it's not needed is. If all users have the same set of permissions, there doesn't need to be more than one role. But see below.

> I'm also not suggesting that we make everyone do the same
> thing, indeed, later on I was supportive of having an external system
> provide the mapping. Here, I'm just making the point that we should
> also be looking at automatic role/membership creation.

Gotcha. Agreed; that would open up the ability to administer role privileges externally too, which would be cool. That could be used in tandem with something like this patchset.

> > > I'd go a step further and suggest that the way to do this is with a
> > > background worker that's started up and connects to an LDAP
> > > infrastructure and listens for changes, allowing the system to pick up
> > > on new roles/memberships as soon as they're created in the LDAP
> > > environment. That would then be controlled by appropriate settings in
> > > postgresql.conf/.auto.conf.
> >
> > This is roughly what you can already do with existing (third-party)
> > tools, and that approach isn't scaling out in practice for some of our
> > existing customers. The load on the central server, for thousands of
> > idle databases dialing in just to see if there are any new users, is
> > huge.
>
> If you're referring specifically to cron-based tools which are
> constantly hammering on the LDAP servers running the same queries over
> and over, sure, I agree that that's creating load on the LDAP
> infrastructure (though, well, it was kind of designed to be very
> scalable for exactly that kind of load, no? So I'm not really sure why
> that's such an issue..).

I don't have hands-on experience here -- just going on what I've been told via field/product teams -- but it seems to me that there's a big difference between asking an LDAP server to give you information on a user at the time that user logs in, and asking it to give a list of _all_ users to every single Postgres instance you have on a regular timer. The latter is what seems to be problematic.

> That's also why I specifically wasn't
> suggesting that and was instead suggesting that we have something that's
> connected to one of the (hopefully, many, many) LDAP servers and is
> doing change monitoring, allowing changes to be pushed down to PG,
> rather than cronjobs constantly running the same queries and re-checking
> things over and over. I appreciate that that's also not free, but I
> don't believe it's nearly as bad as the cron-based approach and it's
> certainly something that an LDAP infrastructure should be really rather
> good at.

I guess I'd have to see an implementation -- I was under the impression that persistent search wasn't widely implemented?

> > > All that said, I do see how having the ability to call out to another
> > > system for mappings may be useful, so I'm not sure that we shouldn't
> > > consider this specific change and have it be specifically just for
> > > mappings, in which case pg_ident seems appropriate.
> >
> > Yeah, this PoC was mostly an increment on the functionality that
> > already existed. The division between what goes in pg_hba and what goes
> > in pg_ident is starting to blur with this patchset, though, and I think
> > Peter's point is sound.
>
> This part I tend to disagree with- pg_ident for mappings and for ways to
> call out to other systems to provide those mappings strikes me as
> entirely appropriate and doesn't blur the lines and that's really what
> this patch seems to be primarily about. Peter noted that there might be
> other things we want to do and argued that those might not be
> appropriate in pg_ident, which I tend to agree with, but I don't think
> we need to invent something entirely new for mappings when we have
> pg_ident already.

The current patchset here has pieces of what is usually contained in HBA (the LDAP host/port/base/filter/etc.) effectively moved into pg_ident, while other pieces (TLS settings) remain in the HBA and the environment. That's what I'm referring to. If that is workable for you in the end, that's fine, but for me it'd be much easier to maintain if the mapping query and the LDAP connection settings for that mapping query were next to each other.

> When it comes to the question of "how to connect to an LDAP server for
> $whatever", it seems like it'd be nice to be able to configure that once
> and reuse that configuration. Not sure I have a great suggestion for
> how to do that. The approach this patch takes of adding options to
> pg_hba for that, just like other options in pg_hba do, strikes me as
> pretty reasonable.

Right. That part seems less reasonable to me, given the current format of the HBA. YMMV.

> I would advocate for other methods to work when it comes to
> authenticating to LDAP from PG though (such as GSSAPI, in particular,
> of course...).

I can take a look at the Cyrus requirements for the GSSAPI mechanism. Might be tricky to add tests for it, though. Any others you're interested in?

> > > I certainly don't think we should have this be limited to LDAP auth-
> > > such an external mapping ability is suitable for any authentication
> > > method that supports a mapping (thinking specifically of GSSAPI, of
> > > course..). Not sure if that's what was meant above but did want to
> > > make sure that was clear.
> >
> > You can't use usermaps with LDAP auth yet, so no, that's not what I
> > meant. (I have another patch for that feature in commitfest, which
> > would allow these two things to be used together.)
>
> Yes, I'm aware of the other patch, just wanted to make sure the intent
> is for this to work for all map-supporting auth methods. Figured that
> was the case but the examples in the prior email had me concerned and
> just wanted to make sure.

Correct. The new tests use cert auth, for instance.

Thanks,

--Jacob
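
The difference between a timed full sync and a per-login lookup can be put into a back-of-envelope model. The Python sketch below is illustrative only: the function names are made up for this example, and every number is hypothetical rather than taken from any deployment mentioned in this thread.

```python
def sync_entries_per_day(instances: int, directory_users: int,
                         syncs_per_day: int) -> int:
    """Entries returned per day when every instance lists all users on a timer."""
    return instances * directory_users * syncs_per_day


def login_entries_per_day(instances: int, logins_per_instance: int) -> int:
    """Entries returned per day when each login does one single-user lookup."""
    return instances * logins_per_instance


# Hypothetical fleet: 1000 mostly idle instances, 5000 directory users,
# an hourly sync job, and 200 logins per instance per day.
print(sync_entries_per_day(1000, 5000, 24))   # 120000000
print(login_entries_per_day(1000, 200))       # 200000
```

The model counts returned entries only. It says nothing about login latency or about what happens when the LDAP server is unreachable, and a change-notification design would sit somewhere between the two figures.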
Greetings,
On Tue, Jan 4, 2022 at 18:56 Jacob Champion <pchampion@vmware.com> wrote:
> On Mon, 2022-01-03 at 19:42 -0500, Stephen Frost wrote:
> > * Jacob Champion (pchampion@vmware.com) wrote:
> > >
> > > That last point was my motivation for the authn_id patch [1] -- so that
> > > auditing could see the actual user _and_ the generic role. The
> > > information is already there to be used, it's just not exposed to the
> > > stats framework yet.
> >
> > While that helps, and I generally support adding that information to the
> > logs, it's certainly not nearly as good or useful as having the actual
> > user known to the database.
> Could you talk more about the use cases for which having the "actual
> user" is better? From an auditing perspective I don't see why
> "authenticated as jacob@example.net, logged in as admin" is any worse
> than "logged in as jacob".
The above case isn’t what we are talking about, as far as I understand anyway. You’re suggesting “authenticated as jacob@example.net, logged in as sales” where the user in the database is “sales”. Consider triggers which only have access to “sales”, or a tool like pgaudit which only has access to “sales”. Who was it in sales that updated that record though? We don’t know- we would have to go try to figure it out from the logs, but even if we had time stamps on the row update, there could be 50 sales people logged in at overlapping times.
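
The ambiguity described here can be sketched in a few lines of Python. The Session record, its field names, and the timestamps are hypothetical. The point is only that a row timestamp plus a shared role does not identify a single authenticated user once sessions overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Session:
    authn_id: str  # identity proven at login
    role: str      # database role the session runs as
    start: int     # session start, in seconds on a made-up clock
    end: int       # session end


def candidates(sessions: List[Session], role: str, update_time: int) -> List[str]:
    """Identities that could have made an update as `role` at `update_time`."""
    return sorted(
        s.authn_id
        for s in sessions
        if s.role == role and s.start <= update_time <= s.end
    )


sessions = [
    Session("alice@example.net", "sales", 0, 100),
    Session("bob@example.net", "sales", 50, 150),
    Session("carol@example.net", "carol", 0, 200),
]

# Shared role: two people were logged in as "sales" at t=75.
print(candidates(sessions, "sales", 75))  # ['alice@example.net', 'bob@example.net']
# One role per user: the role alone identifies the person.
print(candidates(sessions, "carol", 75))  # ['carol@example.net']
```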
> > > Forcing one role per individual end user is wasteful and isn't really
> > > making good use of the role-based system that you already have.
> > > Generally speaking, when administering hundreds or thousands of users,
> > > people start dividing them up into groups as opposed to dealing with
> > > them individually. So I don't think new features should be taking away
> > > flexibility in this area -- if one role per user already works well for
> > > you, great, but don't make everyone do the same.
> >
> > Using the role system we have to assign privileges certainly is useful
> > and sensible, of course, though I don't see where you've actually made
> > an argument for why one role per individual is somehow wasteful or
> > somehow takes away from the role system that we have for granting
> > rights.
> I was responding more to your statement that "Being able to have roles
> and memberships automatically created is much more the direction that
> I'd say we should be going in". It's not that one-role-per-user is
> inherently wasteful, but forcing role proliferation where it's not
> needed is. If all users have the same set of permissions, there doesn't
> need to be more than one role. But see below.
Just saying it’s wasteful isn’t actually saying what is wasteful about it.
> > I'm also not suggesting that we make everyone do the same
> > thing, indeed, later on I was supportive of having an external system
> > provide the mapping. Here, I'm just making the point that we should
> > also be looking at automatic role/membership creation.
> Gotcha. Agreed; that would open up the ability to administer role
> privileges externally too, which would be cool. That could be used in
> tandem with something like this patchset.
Not sure exactly what you’re referring to here by “administer role privileges externally too”..? Curious to hear what you are imagining specifically.
> > > I'd go a step further and suggest that the way to do this is with a
> > > background worker that's started up and connects to an LDAP
> > > infrastructure and listens for changes, allowing the system to pick up
> > > on new roles/memberships as soon as they're created in the LDAP
> > > environment. That would then be controlled by appropriate settings in
> > > postgresql.conf/.auto.conf.
> >
> > This is roughly what you can already do with existing (third-party)
> > tools, and that approach isn't scaling out in practice for some of our
> > existing customers. The load on the central server, for thousands of
> > idle databases dialing in just to see if there are any new users, is
> > huge.
>
> If you're referring specifically to cron-based tools which are
> constantly hammering on the LDAP servers running the same queries over
> and over, sure, I agree that that's creating load on the LDAP
> infrastructure (though, well, it was kind of designed to be very
> scalable for exactly that kind of load, no? So I'm not really sure why
> that's such an issue..).
I don't have hands-on experience here -- just going on what I've been
told via field/product teams -- but it seems to me that there's a big
difference between asking an LDAP server to give you information on a
user at the time that user logs in, and asking it to give a list of
_all_ users to every single Postgres instance you have on a regular
timer. The latter is what seems to be problematic.
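To make that difference concrete, here is a back-of-the-envelope sketch. Every number in it is a made-up assumption for illustration (fleet size, directory size, sync interval, login rate); none of them come from a real deployment, and the two helper functions are hypothetical. It only counts directory entries returned per day under each approach.

```python
def full_sync_entries_per_day(instances: int, directory_users: int,
                              syncs_per_day: int) -> int:
    """Entries returned per day when every instance re-lists all users on a timer."""
    return instances * directory_users * syncs_per_day


def per_login_entries_per_day(instances: int,
                              logins_per_instance_per_day: int) -> int:
    """Entries returned per day when each login triggers one single-user lookup."""
    return instances * logins_per_instance_per_day


# Hypothetical fleet: many mostly idle databases, one large directory.
instances = 1000
directory_users = 5000
syncs_per_day = 24 * 12            # assumed five-minute cron timer
logins_per_instance_per_day = 20   # assumed; real login rates vary widely

print(full_sync_entries_per_day(instances, directory_users, syncs_per_day))
print(per_login_entries_per_day(instances, logins_per_instance_per_day))
```

With these assumed inputs the timer-based full sync returns far more entries than the per-login lookups. The gap narrows as logins become more frequent or the sync interval gets longer, so the comparison depends heavily on the environment.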
And to be clear, I agree that’s not good (though, again, really, your ldap infrastructure shouldn’t be having all that much trouble with it- you can scale those out verryyyy far, and far more easily than a relational database..).
I’d also point out though that having to do an ldap lookup on every login to PG is *already* an issue in some environments, having to do multiple amplifies that. Not to mention that when the ldap servers can’t be reached for some reason, no one can log into the database and that’s rather unfortunate too. These are, of course, arguments for moving away from methods that require checking with some other system synchronously during login- which is another reason why it’s better to have the authentication credentials easily map to the PG role, without the need for external checks at login time. That’s done with today’s pg_ident, but this patch would change that.
Consider the approach I continue to advocate- GSSAPI based authentication, where a user only needs to contact the Kerberos server perhaps every 8 hours or so for an updated ticket but otherwise can authorize directly to PG using their existing ticket and credentials, where their role was previously created and their memberships already exist thanks to a background worker whose job it is to handle that and which deals with transient network failures or other issues. In this world, most logins to PG don’t require any other system to be involved besides the client, the PG server, and the networking between them; perhaps DNS if things aren’t cached on the client.
On the other hand, to use ldap authentication (which also happens to be demonstrably insecure without any reasonable way to fix that), with an ldap mapping setup, requires two logins to an ldap server every single time a user logs into PG and if the ldap environment is offline or overloaded for whatever reason, the login fails or takes an excessively long amount of time.
> That's also why I specifically wasn't
> suggesting that and was instead suggesting that we have something that's
> connected to one of the (hopefully, many, many) LDAP servers and is
> doing change monitoring, allowing changes to be pushed down to PG,
> rather than cronjobs constantly running the same queries and re-checking
> things over and over. I appreciate that that's also not free, but I
> don't believe it's nearly as bad as the cron-based approach and it's
> certainly something that an LDAP infrastructure should be really rather
> good at.
I guess I'd have to see an implementation -- I was under the impression
that persistent search wasn't widely implemented?
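I mean … let’s talk about the one that really matters here:
https://docs.microsoft.com/en-us/windows/win32/ad/change-notifications-in-active-directory-domain-services
OpenLDAP has an audit log system which can be used though it’s certainly not as nice and would require code specific to it.
This talks a bit about other directories:
https://docs.informatica.com/data-integration/powerexchange-adapters-for-powercenter/10-1/powerexchange-for-ldap-user-guide-for-powercenter/ldap-sessions/configuring-change-data-capture/methods-for-tracking-changes-in-different-directories.html
I do wish they all supported it cleanly in the same way.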
> > > All that said, I do see how having the ability to call out to another
> > > system for mappings may be useful, so I'm not sure that we shouldn't
> > > consider this specific change and have it be specifically just for
> > > mappings, in which case pg_ident seems appropriate.
> >
> > Yeah, this PoC was mostly an increment on the functionality that
> > already existed. The division between what goes in pg_hba and what goes
> > in pg_ident is starting to blur with this patchset, though, and I think
> > Peter's point is sound.
>
> This part I tend to disagree with- pg_ident for mappings and for ways to
> call out to other systems to provide those mappings strikes me as
> entirely appropriate and doesn't blur the lines and that's really what
> this patch seems to be primarily about. Peter noted that there might be
> other things we want to do and argued that those might not be
> appropriate in pg_ident, which I tend to agree with, but I don't think
> we need to invent something entirely new for mappings when we have
> pg_ident already.
The current patchset here has pieces of what is usually contained in
HBA (the LDAP host/port/base/filter/etc.) effectively moved into
pg_ident, while other pieces (TLS settings) remain in the HBA and the
environment. That's what I'm referring to. If that is workable for you
in the end, that's fine, but for me it'd be much easier to maintain if
the mapping query and the LDAP connection settings for that mapping
query were next to each other.
I can agree with the point that it would be nicer to have the ldap host/port/base/filter be in the hba instead, if there is a way to accomplish that reasonably. Did you have a suggestion in mind for how to do that..? If there’s an alternative approach to consider, it’d be useful to see them next to each other and then we could all contemplate which is better.
> When it comes to the question of "how to connect to an LDAP server for
> $whatever", it seems like it'd be nice to be able to configure that once
> and reuse that configuration. Not sure I have a great suggestion for
> how to do that. The approach this patch takes of adding options to
> pg_hba for that, just like other options in pg_hba do, strikes me as
> pretty reasonable.
Right. That part seems less reasonable to me, given the current format
of the HBA. YMMV.
If the ldap connection info and filters and such could all exist in the hba, then perhaps a way to define those credentials in one place in the hba file and then use them on other lines would be possible..? Seems like that would be easier than having them also in the ident or having the ident refer to something defined elsewhere.
Consider in the hba having:
LDAPSERVER[ldap1]=“ldaps://whatever other options go here”
Then later:
hostssl all all ::0/0 ldap ldapserver=ldap1 ldapmapserver=ldap1 map=myldapmap
Clearly needs more thought due to different requirements for ldap authentication vs. the map, but still, the general idea being to have all of it in the hba and then a way to define ldap server configuration in the hba once and then reused.
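A minimal sketch of the define-once, reuse-by-name idea, assuming the strawman syntax above. Nothing here exists in PostgreSQL or in the patchset: `LDAPSERVER[...]`, `ldapserver`, and `ldapmapserver` are the hypothetical names from the example, `resolve_hba` is an illustrative helper, and the sketch uses straight quotes and ignores the quoting rules that the real HBA parser would have to handle.

```python
import re

# Hypothetical definition syntax from the strawman: LDAPSERVER[name]="value"
DEFINITION = re.compile(r'^LDAPSERVER\[(?P<name>\w+)\]="(?P<value>[^"]*)"$')


def resolve_hba(lines):
    """Expand ldapserver=/ldapmapserver= references against named definitions.

    Returns a list of (fields, options) tuples, one per rule line.
    Raises KeyError if a rule names a server that was not defined earlier.
    """
    servers = {}
    rules = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        match = DEFINITION.match(line)
        if match:
            servers[match.group("name")] = match.group("value")
            continue
        tokens = line.split()
        # Simplification: anything containing "=" is treated as an option.
        fields = [t for t in tokens if "=" not in t]
        options = dict(t.split("=", 1) for t in tokens if "=" in t)
        for key in ("ldapserver", "ldapmapserver"):
            if key in options:
                options[key] = servers[options[key]]  # reuse by name
        rules.append((fields, options))
    return rules


example = [
    'LDAPSERVER[ldap1]="ldaps://ldap.example.com"',
    "hostssl all all ::0/0 ldap ldapserver=ldap1 ldapmapserver=ldap1 map=myldapmap",
]
for fields, options in resolve_hba(example):
    print(fields, options)
```

Both the authentication lookup and the map lookup resolve to the same connection string here, which is the reuse being proposed. The sketch does not address the point that authentication and mapping may need different settings.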
> I would advocate for other methods to work when it comes to
> authenticating to LDAP from PG though (such as GSSAPI, in particular,
> of course...).
I can take a look at the Cyrus requirements for the GSSAPI mechanism.
Might be tricky to add tests for it, though. Any others you're
interested in?
GSSAPI is the main one … I suppose client side certificates would be nice too if that’s possible. I suspect some would like a way to have username/pw ldap credentials in some other file besides the hba, but that isn’t as interesting to me, at least.
> > > I certainly don't think we should have this be limited to LDAP auth-
> > > such an external mapping ability is suitable for any authentication
> > > method that supports a mapping (thinking specifically of GSSAPI, of
> > > course..). Not sure if that's what was meant above but did want to
> > > make sure that was clear.
> >
> > You can't use usermaps with LDAP auth yet, so no, that's not what I
> > meant. (I have another patch for that feature in commitfest, which
> > would allow these two things to be used together.)
>
> Yes, I'm aware of the other patch, just wanted to make sure the intent
> is for this to work for all map-supporting auth methods. Figured that
> was the case but the examples in the prior email had me concerned and
> just wanted to make sure.
Correct. The new tests use cert auth, for instance.
Great.
Thanks!
Stephen
On Tue, 2022-01-04 at 22:24 -0500, Stephen Frost wrote:
> On Tue, Jan 4, 2022 at 18:56 Jacob Champion <pchampion@vmware.com> wrote:
> >
> > Could you talk more about the use cases for which having the "actual
> > user" is better? From an auditing perspective I don't see why
> > "authenticated as jacob@example.net, logged in as admin" is any worse
> > than "logged in as jacob".
>
> The above case isn’t what we are talking about, as far as I
> understand anyway. You’re suggesting “authenticated as
> jacob@example.net, logged in as sales” where the user in the database
> is “sales”. Consider triggers which only have access to “sales”, or
> a tool like pgaudit which only has access to “sales”.

Okay. So an additional getter function in miscadmin.h, and surfacing
that function to trigger languages, are needed to make authn_id more
generally useful. Any other cases you can think of?

> > I was responding more to your statement that "Being able to have roles
> > and memberships automatically created is much more the direction that
> > I'd say we should be going in". It's not that one-role-per-user is
> > inherently wasteful, but forcing role proliferation where it's not
> > needed is. If all users have the same set of permissions, there doesn't
> > need to be more than one role. But see below.
>
> Just saying it’s wasteful isn’t actually saying what is wasteful about it.

Well, I felt like it was irrelevant; you've already said you have no
intention to force one-user-per-role.

But to elaborate: *forcing* one-user-per-role is wasteful, because if I
have a thousand employees, and I want to give all my employees access
to a guest role in the database, then I have to administer a thousand
roles: maintaining them through dump/restores and pg_upgrades, auditing
them to figure out why Bob in Accounting somehow got a different
privilege GRANT than the rest of the users, adding new accounts,
purging old ones, maintaining the inevitable scripts that will result.

If none of the users need to be "special" in any way, that's all wasted
overhead. (If they do actually need to be special, then at least some
of that overhead becomes necessary. Otherwise it's waste.) You may be
able to mitigate the cost of the waste, or absorb the mitigations into
Postgres so that the user can't see the waste, or decide that the waste
is not costly enough to care about. It's still waste.

> > > I'm also not suggesting that we make everyone do the same
> > > thing, indeed, later on I was supportive of having an external system
> > > provide the mapping. Here, I'm just making the point that we should
> > > also be looking at automatic role/membership creation.
> >
> > Gotcha. Agreed; that would open up the ability to administer role
> > privileges externally too, which would be cool. That could be used in
> > tandem with something like this patchset.
>
> Not sure exactly what you’re referring to here by “administer role
> privileges externally too”..? Curious to hear what you are imagining
> specifically.

Just that it would be nice to centrally provision role GRANTs as well
as role membership, that's all. No specifics in mind, and I'm not even
sure if LDAP would be a helpful place to put that sort of config.

> I’d also point out though that having to do an ldap lookup on every
> login to PG is *already* an issue in some environments, having to do
> multiple amplifies that.

You can't use the LDAP auth method with this patch yet, so this concern
is based on code that doesn't exist. It's entirely possible that you
could do the role query as part of the first bound connection. If that
proves unworkable, then yes, I agree that it's a concern.

> Not to mention that when the ldap servers can’t be reached for some
> reason, no one can log into the database and that’s rather
> unfortunate too.

Assuming you have no caches, then yes.
That might be a pretty good argument for allowing ldapmap and map to be
used together, actually, so that you can have some critical users who
can always log in as "themselves" or "admin" or etc. Or maybe it's an
argument for allowing HBA to handle fallback methods of authentication.

Luckily I think it's pretty easy to communicate to LDAP users that if
*all* your login infrastructure goes down, you will no longer be able
to log in. They're probably used to that idea, if they haven't set up
any availability infra.

> These are, of course, arguments for moving away from methods that
> require checking with some other system synchronously during login-
> which is another reason why it’s better to have the authentication
> credentials easily map to the PG role, without the need for external
> checks at login time. That’s done with today’s pg_ident, but this
> patch would change that.

There are arguments for moving towards synchronous checks as well.
Central revocation of credentials (in timeframes shorter than ticket
expiration) is what comes to mind. Revocation is hard and usually
conflicts with the desire for availability.

What's "better" for me or you is not necessarily "better" overall; it's
all tradeoffs, all the time.

> Consider the approach I continue to advocate- GSSAPI based
> authentication, where a user only needs to contact the Kerberos
> server perhaps every 8 hours or so for an updated ticket but
> otherwise can authorize directly to PG using their existing ticket
> and credentials, where their role was previously created and their
> memberships already exist thanks to a background worker whose job it
> is to handle that and which deals with transient network failures or
> other issues. In this world, most logins to PG don’t require any
> other system to be involved besides the client, the PG server, and
> the networking between them; perhaps DNS if things aren’t cached on
> the client.
>
> On the other hand, to use ldap authentication (which also happens to
> be demonstrably insecure without any reasonable way to fix that),
> with an ldap mapping setup, requires two logins to an ldap server
> every single time a user logs into PG and if the ldap environment is
> offline or overloaded for whatever reason, the login fails or takes
> an excessively long amount of time.

The two systems have different architectures, and different security
properties, and you have me at a disadvantage in that you can see the
experimental code I have written and I cannot see the hypothetical code
in your head.

It sounds like I'm more concerned with the ability to have an online
central source of truth for access control, accepting that denial of
service may cause the system to fail shut; and you're more concerned
with availability in the face of network failure, accepting that denial
of service may cause the system to fail open. I think that's a design
decision that belongs to an end user.

The distributed availability problems you're describing are, in my
experience, typically solved by caching. With your not-yet-written
solution, the caching is built into Postgres, and it's on all of the
time, but may (see below) only actually perform well with Active
Directory. With my solution, any caching is optional, because it has to
be implemented/maintained external to Postgres, but because it's just
generic "LDAP caching" then it should be broadly compatible and we
don't have to maintain it. I can see arguments for and against both
approaches.

> > I guess I'd have to see an implementation -- I was under the impression
> > that persistent search wasn't widely implemented?
>
> I mean … let’s talk about the one that really matters here:
>
> https://docs.microsoft.com/en-us/windows/win32/ad/change-notifications-in-active-directory-domain-services

That would certainly be a useful thing to implement for deployments
that can use it.
But my personal interest in writing "LDAP" code that only works with AD
is nil, at least in the short term.

(The continued attitude that Microsoft Active Directory is "the one
that really matters" is really frustrating. I have users on LDAP
without Active Directory. Postgres tests are written against OpenLDAP.)

> OpenLDAP has an audit log system which can be used though it’s
> certainly not as nice and would require code specific to it.
>
> This talks a bit about other directories:
> https://docs.informatica.com/data-integration/powerexchange-adapters-for-powercenter/10-1/powerexchange-for-ldap-user-guide-for-powercenter/ldap-sessions/configuring-change-data-capture/methods-for-tracking-changes-in-different-directories.html
>
> I do wish they all supported it cleanly in the same way.

Okay. But the answer to "is persistent search widely implemented?"
appears to be "No."

> > The current patchset here has pieces of what is usually contained in
> > HBA (the LDAP host/port/base/filter/etc.) effectively moved into
> > pg_ident, while other pieces (TLS settings) remain in the HBA and the
> > environment. That's what I'm referring to. If that is workable for you
> > in the end, that's fine, but for me it'd be much easier to maintain if
> > the mapping query and the LDAP connection settings for that mapping
> > query were next to each other.
>
> I can agree with the point that it would be nicer to have the ldap
> host/port/base/filter be in the hba instead, if there is a way to
> accomplish that reasonably. Did you have a suggestion in mind for how
> to do that..? If there’s an alternative approach to consider, it’d
> be useful to see them next to each other and then we could all
> contemplate which is better.

I didn't say I necessarily wanted it all in the HBA, just that I wanted
it all in the same spot. I don't see a good way to push the filter back
into the HBA, because it may very well depend on the users being mapped
(i.e.
there may need to be multiple lines in the map). Same for the query
attributes. In fact if I'm already using AD Kerberos or SSPI and I want
to be able to handle users coming from multiple domains, couldn't I be
querying entirely different servers depending on the username
presented?

> > > When it comes to the question of "how to connect to an LDAP server for
> > > $whatever", it seems like it'd be nice to be able to configure that once
> > > and reuse that configuration. Not sure I have a great suggestion for
> > > how to do that. The approach this patch takes of adding options to
> > > pg_hba for that, just like other options in pg_hba do, strikes me as
> > > pretty reasonable.
> >
> > Right. That part seems less reasonable to me, given the current format
> > of the HBA. YMMV.
>
> If the ldap connection info and filters and such could all exist in
> the hba, then perhaps a way to define those credentials in one place
> in the hba file and then use them on other lines would be
> possible..? Seems like that would be easier than having them also in
> the ident or having the ident refer to something defined elsewhere.
>
> Consider in the hba having:
>
> LDAPSERVER[ldap1]=“ldaps://whatever other options go here”
>
> Then later:
>
> hostssl all all ::0/0 ldap ldapserver=ldap1 ldapmapserver=ldap1 map=myldapmap
>
> Clearly needs more thought due to different requirements for
> ldap authentication vs. the map, but still, the general idea being to
> have all of it in the hba and then a way to define ldap server
> configuration in the hba once and then reused.

You're open to the idea of bolting a new key/value grammar onto the HBA
parser, but not to the idea of brainstorming a different configuration
DSL?

> > I can take a look at the Cyrus requirements for the GSSAPI mechanism.
> > Might be tricky to add tests for it, though. Any others you're
> > interested in?
>
> GSSAPI is the main one … I suppose client side certificates would be
> nice too if that’s possible. I suspect some would like a way to have
> username/pw ldap credentials in some other file besides the hba, but
> that isn’t as interesting to me, at least.

Certificate auth is already there in the patch. See the end of
t/001_ldap.t.

Thanks,
--Jacob
Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:
> On Tue, 2022-01-04 at 22:24 -0500, Stephen Frost wrote:
> > On Tue, Jan 4, 2022 at 18:56 Jacob Champion <pchampion@vmware.com> wrote:
> > >
> > > Could you talk more about the use cases for which having the "actual
> > > user" is better? From an auditing perspective I don't see why
> > > "authenticated as jacob@example.net, logged in as admin" is any worse
> > > than "logged in as jacob".
> >
> > The above case isn’t what we are talking about, as far as I
> > understand anyway. You’re suggesting “authenticated as
> > jacob@example.net, logged in as sales” where the user in the database
> > is “sales”. Consider triggers which only have access to “sales”, or
> > a tool like pgaudit which only has access to “sales”.
>
> Okay. So an additional getter function in miscadmin.h, and surfacing
> that function to trigger languages, are needed to make authn_id more
> generally useful. Any other cases you can think of?

That would help but now you've got two different things that have to be
tracked, potentially, because for some people you might not want to use
their system auth'd-as ID. I don't see that as a great solution and
instead as a workaround. Yes, we should also do this but it's really
an argument for how to deal with such a setup, not a justification for
going down this route.

> > > I was responding more to your statement that "Being able to have roles
> > > and memberships automatically created is much more the direction that
> > > I'd say we should be going in". It's not that one-role-per-user is
> > > inherently wasteful, but forcing role proliferation where it's not
> > > needed is. If all users have the same set of permissions, there doesn't
> > > need to be more than one role. But see below.
> >
> > Just saying it’s wasteful isn’t actually saying what is wasteful about it.
>
> Well, I felt like it was irrelevant; you've already said you have no
> intention to force one-user-per-role.

Forcing one-user-per-role would be breaking things we already support
so, no, I certainly don't have any intention of requiring such a change.
That said, I do feel it's useful to have these discussions.

> But to elaborate: *forcing* one-user-per-role is wasteful, because if I
> have a thousand employees, and I want to give all my employees access
> to a guest role in the database, then I have to administer a thousand
> roles: maintaining them through dump/restores and pg_upgrades, auditing
> them to figure out why Bob in Accounting somehow got a different
> privilege GRANT than the rest of the users, adding new accounts,
> purging old ones, maintaining the inevitable scripts that will result.

pg_upgrade just handles it, no? pg_dumpall -g does too. Having to deal
with roles in general is a pain but the number of them isn't necessarily
an issue. A guest role which doesn't have any auditing requirements
might be a decent use-case for what you're talking about here but I
don't know that we'd implement this for just that case. Part of this
discussion was specifically about addressing the other challenges- like
having automation around the account addition/removal and sync'ing role
membership too. As for auditing privileges, that should be done
regardless and the case you outline isn't somehow different from others
(the same could be as easily said for how the 'guest' account got access
to whatever it did).

> If none of the users need to be "special" in any way, that's all wasted
> overhead. (If they do actually need to be special, then at least some
> of that overhead becomes necessary. Otherwise it's waste.) You may be
> able to mitigate the cost of the waste, or absorb the mitigations into
> Postgres so that the user can't see the waste, or decide that the waste
> is not costly enough to care about. It's still waste.

Except the amount of 'wasted' overhead being claimed here seems to be
hardly any.
The biggest complaint leveled at this seems to really be just
the issues around the load on the ldap systems from having to deal with
the frequent sync queries, and that's largely a solvable issue in the
majority of environments out there today.

> > > > I'm also not suggesting that we make everyone do the same
> > > > thing, indeed, later on I was supportive of having an external system
> > > > provide the mapping. Here, I'm just making the point that we should
> > > > also be looking at automatic role/membership creation.
> > >
> > > Gotcha. Agreed; that would open up the ability to administer role
> > > privileges externally too, which would be cool. That could be used in
> > > tandem with something like this patchset.
> >
> > Not sure exactly what you’re referring to here by “administer role
> > privileges externally too”..? Curious to hear what you are imagining
> > specifically.
>
> Just that it would be nice to centrally provision role GRANTs as well
> as role membership, that's all. No specifics in mind, and I'm not even
> sure if LDAP would be a helpful place to put that sort of config.

GRANTs on objects, you mean? I agree, that would be interesting to
consider though it would involve custom entries in the LDAP directory,
no? Role membership would be able to be sync'd as part of group
membership and that was something I was thinking would be handled as
part of this in a similar manner to what the 3rd party solutions provide
today using the cron-based approach.

> > I’d also point out though that having to do an ldap lookup on every
> > login to PG is *already* an issue in some environments, having to do
> > multiple amplifies that.
>
> You can't use the LDAP auth method with this patch yet, so this concern
> is based on code that doesn't exist. It's entirely possible that you
> could do the role query as part of the first bound connection. If that
> proves unworkable, then yes, I agree that it's a concern.

Perhaps it could be done as part of the same connection but that then
has an impact on what the configuration of the ident LDAP lookup would
be, no? That seems like an important thing to flesh out before we move
too much farther with this patch, to make sure that, if we want that to
work, that there's a clear way to configure it to avoid the double LDAP
connection. I'm guessing you already have an idea how that'll work
though..?

> > Not to mention that when the ldap servers can’t be reached for some
> > reason, no one can log into the database and that’s rather
> > unfortunate too.
>
> Assuming you have no caches, then yes. That might be a pretty good
> argument for allowing ldapmap and map to be used together, actually, so
> that you can have some critical users who can always log in as
> "themselves" or "admin" or etc. Or maybe it's an argument for allowing
> HBA to handle fallback methods of authentication.

Ok, so now we're talking about a cache that needs to be implemented
which will ... store the user's password for LDAP authentication? Or
what the mapping is for various LDAP IDs to PG roles? And how will that
cache be managed? Would it be handled by dump/restore? What about
pg_upgrade? How will entries in the cache be removed? And mainly- how
is this different from just having all the roles in PG to begin with..?

> Luckily I think it's pretty easy to communicate to LDAP users that if
> *all* your login infrastructure goes down, you will no longer be able
> to log in. They're probably used to that idea, if they haven't set up
> any availability infra.

Except that most of the rest of the infrastructure may continue to work
just fine except for logging in- which is something most folks only do
once a day. That is, why is the SQL Server system still happily
accepting connections while the AD is being rebooted? Or why can I
still log into the company website even though AD is down, but I can't
get into PG?
Not everything in an environment is tied to LDAP being up and running
all the time, so it's not nearly so cut and dry in many, many cases.

> > These are, of course, arguments for moving away from methods that
> > require checking with some other system synchronously during login-
> > which is another reason why it’s better to have the authentication
> > credentials easily map to the PG role, without the need for external
> > checks at login time. That’s done with today’s pg_ident, but this
> > patch would change that.
>
> There are arguments for moving towards synchronous checks as well.
> Central revocation of credentials (in timeframes shorter than ticket
> expiration) is what comes to mind. Revocation is hard and usually
> conflicts with the desire for availability.

Revocation in less time than ticket lifetime and everything falling over
due to the AD being restarted are very different. The approaches being
discussed are all much shorter than ticket lifetime and so that's hardly
an appropriate comparison to be making. I didn't suggest that waiting
for ticket expiration would be appropriate when it comes to syncing
accounts between AD and PG or that it would be appropriate for
revocation.

Regarding the caching proposed above- in such a case, clearly,
revocation wouldn't be synchronous either. Certainly in the cases today
where cronjobs are being used to perform the sync, revocation also isn't
synchronous (unless also using LDAP for authentication, of course,
though that wouldn't do anything for existing sessions, while removing
role memberships does...).

> What's "better" for me or you is not necessarily "better" overall; it's
> all tradeoffs, all the time.

Sure.

> > Consider the approach I continue to advocate- GSSAPI based
> > authentication, where a user only needs to contact the Kerberos
> > server perhaps every 8 hours or so for an updated ticket but
> > otherwise can authorize directly to PG using their existing ticket
> > and credentials, where their role was previously created and their
> > memberships already exist thanks to a background worker whose job it
> > is to handle that and which deals with transient network failures or
> > other issues. In this world, most logins to PG don’t require any
> > other system to be involved besides the client, the PG server, and
> > the networking between them; perhaps DNS if things aren’t cached on
> > the client.
> >
> > On the other hand, to use ldap authentication (which also happens to
> > be demonstrably insecure without any reasonable way to fix that),
> > with an ldap mapping setup, requires two logins to an ldap server
> > every single time a user logs into PG and if the ldap environment is
> > offline or overloaded for whatever reason, the login fails or takes
> > an excessively long amount of time.
>
> The two systems have different architectures, and different security
> properties, and you have me at a disadvantage in that you can see the
> experimental code I have written and I cannot see the hypothetical code
> in your head.

I've barely glanced at the code you've written and it largely hasn't
been driving my comments on this thread- merely the understanding of how
it works. Further, you've stated that you're already familiar with
systems that sync between LDAP and PG and the vast majority of this
discussion has been about that distinction- if we push the mappings into
PG as roles, or if we execute a query out to LDAP on connection to check
the mapping. The above references to tickets and GSSAPI/Kerberos are
all from existing code as well.
The only reference to hypothetical code is the idea of a background or
other worker that subscribes to changes in LDAP and implements those
changes in PG instead of having something cron-based do it, but that
doesn't really change anything about the architectural question of
whether we cache (either with an explicit cache, as you've suggested
adding above, though there is no code for it today, or just by using
PG's existing role/membership system) or call out to LDAP for every
login.

> It sounds like I'm more concerned with the ability to have an online
> central source of truth for access control, accepting that denial of
> service may cause the system to fail shut; and you're more concerned
> with availability in the face of network failure, accepting that denial
> of service may cause the system to fail open. I think that's a design
> decision that belongs to an end user.

There is more to it than just failing shut/closed. Part of the argument
being used to drive this change was that it would help to reduce the
load on the LDAP servers because there wouldn't be a need to run large
queries on them frequently out of cron to keep PG's understanding of
the roles and their mappings in sync with what's in LDAP.

> The distributed availability problems you're describing are, in my
> experience, typically solved by caching. With your not-yet-written
> solution, the caching is built into Postgres, and it's on all of the
> time, but may (see below) only actually perform well with Active
> Directory. With my solution, any caching is optional, because it has to
> be implemented/maintained external to Postgres, but because it's just
> generic "LDAP caching" then it should be broadly compatible and we
> don't have to maintain it. I can see arguments for and against both
> approaches.
I'm a bit confused by this- either you're referring to the cache being PG's existing system, which certainly has already been written, and has existed since it was committed and released as part of 8.1, and is, indeed, on all the time ... or you're talking about something else which hasn't been written and could therefore be anything, though I'm generally against the idea of having an independent cache for this, as described above. As for optional caching with some generic LDAP caching system, that strikes me as clearly even worse than building something into PG for this as it requires maintaining yet another system in order to have a reasonably well working system and that isn't good. While it's good that we have pgbouncer, it'd certainly be better if we didn't need it and it's got a bunch of downsides to it. I strongly suspect the same would be true of some external generic "LDAP caching" system as is referred to above, though as there isn't anything to look at, I can't say for sure. Regarding 'performing well', while lots of little queries may be better in some cases than less frequent larger queries, that's really going to depend on the frequency of each and therefore really be rather dependent on the environment and usage. In any case, however, being able to leverage change modifications instead of fully resyncing will definitely be better. At the same time, however, if we have the external generic LDAP caching system that's being claimed ... why wouldn't we simply use that with the cron-based system today to offload those from the main LDAP systems? > > > I guess I'd have to see an implementation -- I was under the impression > > > that persistent search wasn't widely implemented? > > > > I mean … let’s talk about the one that really matters here: > > > > https://docs.microsoft.com/en-us/windows/win32/ad/change-notifications-in-active-directory-domain-services > > That would certainly be a useful thing to implement for deployments > that can use it. 
But my personal interest in writing "LDAP" code that > only works with AD is nil, at least in the short term. > > (The continued attitude that Microsoft Active Directory is "the one > that really matters" is really frustrating. I have users on LDAP > without Active Directory. Postgres tests are written against OpenLDAP.) What would you consider the important directories to worry about beyond AD? I don't consider the PG testing framework to be particularly indicative of what enterprises are actually running. > > OpenLDAP has an audit log system which can be used though it’s > > certainly not as nice and would require code specific to it. > > > > This talks a bit about other directories: > > https://docs.informatica.com/data-integration/powerexchange-adapters-for-powercenter/10-1/powerexchange-for-ldap-user-guide-for-powercenter/ldap-sessions/configuring-change-data-capture/methods-for-tracking-changes-in-different-directories.html > > > > I do wish they all supported it cleanly in the same way. > > Okay. But the answer to "is persistent search widely implemented?" > appears to be "No." I'm curious as to how the large environments that you've worked with have generally solved this issue. Is there a generic LDAP cacheing system that's been used? What? > > > The current patchset here has pieces of what is usually contained in > > > HBA (the LDAP host/port/base/filter/etc.) effectively moved into > > > pg_ident, while other pieces (TLS settings) remain in the HBA and the > > > environment. That's what I'm referring to. If that is workable for you > > > in the end, that's fine, but for me it'd be much easier to maintain if > > > the mapping query and the LDAP connection settings for that mapping > > > query were next to each other. > > > > I can agree with the point that it would be nicer to have the ldap > > host/port/base/filter be in the hba instead, if there is a way to > > accomplish that reasonably. Did you have a suggestion in mind for how > > to do that..? 
If there’s an alternative approach to consider, it’d > > be useful to see them next to each other and then we could all > > contemplate which is better. > > I didn't say I necessarily wanted it all in the HBA, just that I wanted > it all in the same spot. > > I don't see a good way to push the filter back into the HBA, because it > may very well depend on the users being mapped (i.e. there may need to > be multiple lines in the map). Same for the query attributes. In fact > if I'm already using AD Kerberos or SSPI and I want to be able to > handle users coming from multiple domains, couldn't I be querying > entirely different servers depending on the username presented? Yeah, that's a good point and which argues for putting everything into the ident. In such a situation as you describe above, we wouldn't actually have any LDAP configuration in the HBA and I'm entirely fine with that- we'd just have it all in ident. I don't see how you'd make that work with, as you suggest above, LDAP-based authentication and the idea of having only one connection be used for the LDAP-based auth and the mapping lookup, but I'm also not generally worried about LDAP-based auth and would rather we rip it out entirely. :) As such, I'd say that you've largely convinced me that we should just move all of the LDAP configuration for the lookup into the ident and discourage people from using LDAP-based authentication and from putting LDAP configuration into the hba. I'm still a fan of the general idea of having a way to configure such ldap parameters in one place in whatever file they go into and then re-using that multiple times on the general assumption that folks are likely to need to reference a particular LDAP configuration more than once, wherever it's configured. > > > > When it comes to the question of "how to connect to an LDAP server for > > > > $whatever", it seems like it'd be nice to be able to configure that once > > > > and reuse that configuration. 
Not sure I have a great suggestion for > > > > how to do that. The approach this patch takes of adding options to > > > > pg_hba for that, just like other options in pg_hba do, strikes me as > > > > pretty reasonable. > > > > > > Right. That part seems less reasonable to me, given the current format > > > of the HBA. YMMV. > > > > If the ldap connection info and filters and such could all exist in > > the hba, then perhaps a way to define those credentials in one place > > in the hba file and then use them on other lines would be > > possible..? Seems like that would be easier than having them also in > > the ident or having the ident refer to something defined elsewhere. > > > > Consider in the hba having: > > > > LDAPSERVER[ldap1]=“ldaps://whatever other options go here” > > > > Then later: > > > > hostssl all all ::0/0 ldap ldapserver=ldap1 ldapmapserver=ldap1 map=myldapmap > > > > Clearly needs more thought due to different requirements for > > ldap authentication vs. the map, but still, the general idea being to > > have all of it in the hba and then a way to define ldap server > > configuration in the hba once and then reused. > > You're open to the idea of bolting a new key/value grammar onto the HBA > parser, but not to the idea of brainstorming a different configuration > DSL? Short answer- yes (or, as mentioned just above, into the ident file vs. the hba). I'd rather we build on the existing configuration systems that we have rather than invent something new that will then have to work with the others, as I don't see it as likely that we could just replace the existing ones with something new and make everyone change. Having yet another one strikes me as worse than making improvements to the existing ones (be those 'bolted on' or otherwise). Thanks, Stephen
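For readers following along, the "define an LDAP server once, then refer to it by name" idea from the message above could look roughly like the sketch below. This is only an illustration of the proposal: the `LDAPSERVER[name]=...` syntax and the `ldapserver`/`ldapmapserver` options are the hypothetical ones from the email, not anything pg_hba.conf accepts today, and `parse_hba`, `SAMPLE`, and the example URL are made up for the sketch.

```python
import re
import shlex

# Hypothetical syntax taken from the email above. PostgreSQL's pg_hba.conf
# has no named LDAP server definitions; this only illustrates the idea.
DEFINITION = re.compile(r'^LDAPSERVER\[(?P<name>\w+)\]=(?P<value>.+)$')

# Options whose value is the name of a previously defined server.
SERVER_OPTIONS = ("ldapserver", "ldapmapserver")

SAMPLE = '''
LDAPSERVER[ldap1]="ldaps://ldap.example.com"
hostssl all all ::0/0 ldap ldapserver=ldap1 ldapmapserver=ldap1 map=myldapmap
'''


def parse_hba(text):
    """Return (servers, rules); each rule is (fixed_fields, options)."""
    servers = {}
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = DEFINITION.match(line)
        if match:
            # Record the definition under its name for later lines to reuse.
            servers[match.group("name")] = match.group("value").strip('"')
            continue
        fields = shlex.split(line)
        # type, database, user, address, method; the rest are key=value.
        fixed, options = fields[:5], {}
        for opt in fields[5:]:
            key, _, value = opt.partition("=")
            options[key] = value
        # Swap each server name for the configuration it refers to.
        for key in SERVER_OPTIONS:
            if key in options:
                name = options[key]
                if name not in servers:
                    raise ValueError(f"undefined LDAP server: {name}")
                options[key] = servers[name]
        rules.append((fixed, options))
    return servers, rules
```

In this sketch a definition has to appear before the line that uses it, and the authentication and map lookups simply share one definition; the "different requirements for ldap authentication vs. the map" caveat from the email is not addressed.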
On Mon, 2022-01-10 at 15:09 -0500, Stephen Frost wrote: > Greetings, Sorry for the delay, the last few weeks have been insane. > * Jacob Champion (pchampion@vmware.com) wrote: > > On Tue, 2022-01-04 at 22:24 -0500, Stephen Frost wrote: > > > On Tue, Jan 4, 2022 at 18:56 Jacob Champion <pchampion@vmware.com> wrote: > > > > Could you talk more about the use cases for which having the "actual > > > > user" is better? From an auditing perspective I don't see why > > > > "authenticated as jacob@example.net, logged in as admin" is any worse > > > > than "logged in as jacob". > > > > > > The above case isn’t what we are talking about, as far as I > > > understand anyway. You’re suggesting “authenticated as > > > jacob@example.net, logged in as sales” where the user in the database > > > is “sales”. Consider triggers which only have access to “sales”, or > > > a tool like pgaudit which only has access to “sales”. > > > > Okay. So an additional getter function in miscadmin.h, and surfacing > > that function to trigger languages, are needed to make authn_id more > > generally useful. Any other cases you can think of? > > That would help but now you've got two different things that have to be > tracked, potentially, because for some people you might not want to use > their system auth'd-as ID. I don't see that as a great solution and > instead as a workaround. There's nothing to be worked around. If you have a user mapping set up using the features that exist today, and you want to audit who logged in at some point in the past, then you need to log both the authenticated ID and the authorized role. There's no getting around that. It's not enough to say "just check the configuration" because the config can change over time. 
> > But to elaborate: *forcing* one-user-per-role is wasteful, because if I > > have a thousand employees, and I want to give all my employees access > > to a guest role in the database, then I have to administer a thousand > > roles: maintaining them through dump/restores and pg_upgrades, auditing > > them to figure out why Bob in Accounting somehow got a different > > privilege GRANT than the rest of the users, adding new accounts, > > purging old ones, maintaining the inevitable scripts that will result. > > pg_upgrade just handles it, no? pg_dumpall -g does too. Having to deal > with roles in general is a pain but the number of them isn't necessarily > an issue. A guest role which doesn't have any auditing requirements > might be a decent use-case for what you're talking about here but I > don't know that we'd implement this for just that case. Part of this > discussion was specifically about addressing the other challenges- like > having automation around the account addition/removal and sync'ing role > membership too. As for auditing privileges, that should be done > regardless and the case you outline isn't somehow different from others > (the same could be as easily said for how the 'guest' account got access > to whatever it did). I think there's a difference between auditing a small fixed number of roles and auditing many thousands of them that change on a weekly or daily basis. I'd rather maintain the former, given the choice. It's harder for things to slip through the cracks with fewer moving pieces. > > If none of the users need to be "special" in any way, that's all wasted > > overhead. (If they do actually need to be special, then at least some > > of that overhead becomes necessary. Otherwise it's waste.) You may be > > able to mitigate the cost of the waste, or absorb the mitigations into > > Postgres so that the user can't see the waste, or decide that the waste > > is not costly enough to care about. It's still waste. 
> > Except the amount of 'wasted' overhead being claimed here seems to be > hardly any. The biggest complaint levied at this seems to really be > just the issues around the load on the ldap systems from having to deal > with the frequent sync queries, and that's largely a solvable issue in > the majority of environments out there today. As long as we're in agreement that there is waste, I don't think I'm going to convince you about the cost. It's tangential anyway if you're not going to remove many-to-many maps. > > > Not sure exactly what you’re referring to here by “administer role > > > privileges externally too”..? Curious to hear what you are imagining > > > specifically. > > > > Just that it would be nice to centrally provision role GRANTs as well > > as role membership, that's all. No specifics in mind, and I'm not even > > sure if LDAP would be a helpful place to put that sort of config. > > GRANT's on objects, you mean? I agree, that would be interesting to > consider though it would involve custom entries in the LDAP directory, > no? Role membership would be able to be sync'd as part of group > membership and that was something I was thinking would be handled as > part of this in a similar manner to what the 3rd party solutions provide > today using the cron-based approach. Agreed. I haven't put too much thought into those use cases yet. > > > I’d also point out though that having to do an ldap lookup on every > > > login to PG is *already* an issue in some environments, having to do > > > multiple amplifies that. > > > > You can't use the LDAP auth method with this patch yet, so this concern > > is based on code that doesn't exist. It's entirely possible that you > > could do the role query as part of the first bound connection. If that > > proves unworkable, then yes, I agree that it's a concern. > > Perhaps it could be done as part of the same connection but that then > has an impact on what the configuration of the ident LDAP lookup would > be, no? 
That seems like an important thing to flesh out before we move > too much farther with this patch, to make sure that, if we want that to > work, that there's a clear way to configure it to avoid the double LDAP > connection. I'm guessing you already have an idea how that'll work > though..? It's only relevant if the other thread (which you've said you're ignoring) progresses. The patch discussed here does not touch that code path. But yes, I have a general idea that as long as a user can look up (but not modify) their own role information, this should work just fine. > > > Not to mention that when the ldap servers can’t be reached for some > > > reason, no one can log into the database and that’s rather > > > unfortunate too. > > > > Assuming you have no caches, then yes. That might be a pretty good > > argument for allowing ldapmap and map to be used together, actually, so > > that you can have some critical users who can always log in as > > "themselves" or "admin" or etc. Or maybe it's an argument for allowing > > HBA to handle fallback methods of authentication. > > Ok, so now we're talking about a cache that needs to be implemented > which will ... store the user's password for LDAP authentication? Or > what the mapping is for various LDAP IDs to PG roles? And how will that > cache be managed? Would it be handled by dump/restore? What about > pg_upgrade? How will entries in the cache be removed? You keep pulling the authentication discussion, which this patch does not touch on purpose, into this discussion about authorization. The authz info requested by this patch seems like it can be cached. People currently using LDAP authentication (which again, this patch cannot use because there is no LDAP user mapping) either have existing HA infrastructure that they're happy with, or they don't. This patch shouldn't make that situation any better or worse -- *if* the lookup can be done on one connection. 
> And mainly- how is this different from just having all the roles in PG > to begin with..? This comment seems counterproductive. One major difference is that Postgres doesn't have to duplicate the authentication info that some other system already holds. > > Luckily I think it's pretty easy to communicate to LDAP users that if > > *all* your login infrastructure goes down, you will no longer be able > > to log in. They're probably used to that idea, if they haven't set up > > any availability infra. > > Except that most of the rest of the infrastructure may continue to work > just fine except for logging in- which is something most folks only do > once a day. That is, why is the SQL Server system still happily > accepting connections while the AD is being rebooted? Or why can I > still log into the company website even though AD is down, but I can't > get into PG? Not everything in an environment is tied to LDAP being up > and running all the time, so it's not nearly so cut and dry in many, > many cases. Whatever LDAP users currently deal with, this patch doesn't change their experience, right? It seems like it's a lot easier to add caching to a synchronous check, to make it asynchronous and a little more fault-tolerant, than it is to do the reverse. > > > These are, of course, arguments for moving away from methods that > > > require checking with some other system synchronously during login- > > > which is another reason why it’s better to have the authentication > > > credentials easily map to the PG role, without the need for external > > > checks at login time. That’s done with today’s pg_ident, but this > > > patch would change that. > > > > There are arguments for moving towards synchronous checks as well. > > Central revocation of credentials (in timeframes shorter than ticket > > expiration) is what comes to mind. Revocation is hard and usually > > conflicts with the desire for availability. 
> > Revocation in less time than ticket lifetime and everything falling over > due to the AD being restarted are very different. The approaches being > discussed are all much shorter than ticket lifetime and so that's hardly > an appropriate comparison to be making. I didn't suggest that waiting > for ticket expiration would be appropriate when it comes to syncing > accounts between AD and PG or that it would be appropriate for > revocation. Regarding the caching proposed above- in such a case, > clearly, revocation wouldn't be synchronous either. Certainly in the > cases today where cronjobs are being used to perform the sync, > revocation also isn't synchronous (unless also using LDAP for > authentication, of course, though that wouldn't do anything for existing > sessions, while removing role memberships does...). Sure. Again: tradeoffs. > > The two systems have different architectures, and different security > > properties, and you have me at a disadvantage in that you can see the > > experimental code I have written and I cannot see the hypothetical code > > in your head. > > I've barely glanced at the code you've written <snip> This is frustrating to read. I think we're talking past each other, because I'm trying to talk about this patch and you're talking about other things. > The only reference to hypothetical code > is the idea of a background or other worker that subscribes to changes > in LDAP and implements those changes in PG instead of having something > cron-based do it Yes. That's what I was referring to. > , but that doesn't really change anything about the > architectural question of if we cache (either with an explicit cache, as > you've opined us adding above, though which there is no code for today, LDAP caches exist... I'm not suggesting we implement a Postgres-branded LDAP cache. > or just by using PG's existing role/membership system) or call out to > LDAP for every login. 
> > > It sounds like I'm more concerned with the ability to have an online > > central source of truth for access control, accepting that denial of > > service may cause the system to fail shut; and you're more concerned > > with availability in the face of network failure, accepting that denial > > of service may cause the system to fail open. I think that's a design > > decision that belongs to an end user. > > There is more to it than just failing shut/closed. Part of the argument > being used to drive this change was that it would help to reduce the > load on the LDAP servers because there wouldn't be a need to run large > queries on them frequently out of cron to keep PG's understanding of > what the roles are and their mappings is matching what's in LDAP. Yes. > > The distributed availability problems you're describing are, in my > > experience, typically solved by caching. With your not-yet-written > > solution, the caching is built into Postgres, and it's on all of the > > time, but may (see below) only actually perform well with Active > > Directory. With my solution, any caching is optional, because it has to > > be implemented/maintained external to Postgres, but because it's just > > generic "LDAP caching" then it should be broadly compatible and we > > don't have to maintain it. I can see arguments for and against both > > approaches. > > I'm a bit confused by the this- either you're referring to the cache > being PG's existing system, which certainly has already been written, > and has existed since it was committed and released as part of 8.1, and > is, indeed, on all the time ... or you're talking about something else > which hasn't been written and could therefore be anything, though I'm > generally against the idea of having an independent cache for this, as > described above. 
You just proposed an internal caching system, immediately upthread: "I'd go a step further and suggest that the way to do this is with a background worker that's started up and connects to an LDAP infrastructure and listens for changes, allowing the system to pick up on new roles/memberships as soon as they're created in the LDAP environment." That proposal is what I was referring to by "your not-yet-written solution". > As for optional caching with some generic LDAP caching system, that > strikes me as clearly even worse than building something into PG for > this as it requires maintaining yet another system in order to have a > reasonably well working system and that isn't good. A choice for the end user. If they don't want to deal with LDAP infrastructure, they don't have to use it. > While it's good > that we have pgbouncer, it'd certainly be better if we didn't need it > and it's got a bunch of downsides to it. I strongly suspect the same > would be true of some external generic "LDAP caching" system as is > referred to above, though as there isn't anything to look at, I can't > say for sure. We can take a look at OpenLDAP's proxy caching for some info. That won't be perfectly representative but I don't think there's "nothing to look at". > Regarding 'performing well', while lots of little queries may be better > in some cases than less frequent larger queries, that's really going to > depend on the frequency of each and therefore really be rather dependent > on the environment and usage. In any case, however, being able to > leverage change modifications instead of fully resyncing will definitely > be better. At the same time, however, if we have the external generic > LDAP caching system that's being claimed ... why wouldn't we simply use > that with the cron-based system today to offload those from the main > LDAP systems? 
I think there's an architectural difference between a proxy cache that is set up to reduce load on a central server, and one that is set up to handle network partitions while ensuring liveness. To be fair, I don't know which use cases existing solutions can handle. But those two don't seem to be the same to me. I know that I have users who are okay with the query load from logins, but not with the query load of their role-sync scripts. That's a good enough datapoint for me. > > That would certainly be a useful thing to implement for deployments > > that can use it. But my personal interest in writing "LDAP" code that > > only works with AD is nil, at least in the short term. > > > > (The continued attitude that Microsoft Active Directory is "the one > > that really matters" is really frustrating. I have users on LDAP > > without Active Directory. Postgres tests are written against OpenLDAP.) > > What would you consider the important directories to worry about beyond > AD? I don't consider the PG testing framework to be particularly > indicative of what enterprises are actually running. I have end users running
- NetIQ/Novell eDirectory
- Oracle Directory Server
- Red Hat IdM
in addition to AD. > > > OpenLDAP has an audit log system which can be used though it’s > > > certainly not as nice and would require code specific to it. > > > > > > This talks a bit about other directories: > > > https://docs.informatica.com/data-integration/powerexchange-adapters-for-powercenter/10-1/powerexchange-for-ldap-user-guide-for-powercenter/ldap-sessions/configuring-change-data-capture/methods-for-tracking-changes-in-different-directories.html > > > > > > I do wish they all supported it cleanly in the same way. > > > > Okay. But the answer to "is persistent search widely implemented?" > > appears to be "No." > > I'm curious as to how the large environments that you've worked with > have generally solved this issue. Is there a generic LDAP caching > system that's been used? What? 
They haven't solved the issue; that's why I'm poking at it. Several users have to cobble together scripts because of poor interaction with their existing LDAP deployments (or complete lack of support, in the case of pgbouncer). > > I don't see a good way to push the filter back into the HBA, because it > > may very well depend on the users being mapped (i.e. there may need to > > be multiple lines in the map). Same for the query attributes. In fact > > if I'm already using AD Kerberos or SSPI and I want to be able to > > handle users coming from multiple domains, couldn't I be querying > > entirely different servers depending on the username presented? > > Yeah, that's a good point and which argues for putting everything into > the ident. In such a situation as you describe above, we wouldn't > actually have any LDAP configuration in the HBA and I'm entirely fine > with that- we'd just have it all in ident. I don't see how you'd make > that work with, as you suggest above, LDAP-based authentication and the > idea of having only one connection be used for the LDAP-based auth and > the mapping lookup, but I'm also not generally worried about LDAP-based > auth and would rather we rip it out entirely. :) > > As such, I'd say that you've largely convinced me that we should just > move all of the LDAP configuration for the lookup into the ident and > discourage people from using LDAP-based authentication and from putting > LDAP configuration into the hba. I'm willing to bet that Postgres dropping support will not result in my end users abandoning their LDAP infrastructure. Either I and others in my position will need to maintain forks, or my end users will find a different database. 
If there's widespread agreement that the project doesn't want to maintain an LDAP auth method -- so far I think you've provided the only such opinion, that I've seen at least -- that might be a good argument for introducing pluggable auth so that the community can maintain the methods that are important to them. > I'm still a fan of the general idea of > having a way to configure such ldap parameters in one place in whatever > file they go into and then re-using that multiple times on the general > assumption that folks are likely to need to reference a particular LDAP > configuration more than once, wherever it's configured. Sure. > > You're open to the idea of bolting a new key/value grammar onto the HBA > > parser, but not to the idea of brainstorming a different configuration > > DSL? > > Short answer- yes (or, as mentioned just above, into the ident file vs. > the hba). I'd rather we build on the existing configuration systems > that we have rather than invent something new that will then have to > work with the others, as I don't see it as likely that we could just > replace the existing ones with something new and make everyone > change. Having yet another one strikes me as worse than making > improvements to the existing ones (be those 'bolted on' or otherwise). I think the key to maintaining incrementally built systems is that at some point, eventually, you refactor the thing. There was a brief question on what that might look like, from Peter. You stepped in with some very strong opinions. --Jacob
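To make the caching tradeoff argued over in this message a bit more concrete, here is a minimal sketch of wrapping a synchronous role lookup in a time-limited cache. It is not code from the patchset, and it is not how any existing LDAP proxy cache works; `CachedRoleLookup`, `LookupUnavailable`, `ttl`, and `stale_grace` are all hypothetical names chosen for the illustration. The two knobs are the ones the thread keeps circling: `ttl` bounds how long a revocation can go unnoticed, and `stale_grace` decides whether an unreachable directory fails shut (zero) or keeps serving the last known answer for a while.

```python
import time


class LookupUnavailable(Exception):
    """Raised by the underlying lookup when the directory is unreachable."""


class CachedRoleLookup:
    """Wrap a synchronous user -> roles lookup with a TTL cache.

    lookup:      callable taking a user name and returning an iterable of
                 role names; expected to raise LookupUnavailable on outage.
    ttl:         seconds a cached answer is served without re-querying.
    stale_grace: extra seconds an expired answer may still be served, but
                 only when the directory cannot be reached. Zero means the
                 wrapper fails shut as soon as the TTL runs out.
    """

    def __init__(self, lookup, ttl, stale_grace=0.0, clock=time.monotonic):
        self._lookup = lookup
        self._ttl = ttl
        self._stale_grace = stale_grace
        self._clock = clock
        self._entries = {}  # user -> (fetched_at, frozenset of roles)

    def allowed_roles(self, user):
        now = self._clock()
        entry = self._entries.get(user)
        # Fresh entry: answer locally, no directory round trip.
        if entry is not None and now - entry[0] <= self._ttl:
            return entry[1]
        try:
            roles = frozenset(self._lookup(user))
        except LookupUnavailable:
            # Directory is down: serve the stale answer only within the
            # grace window, otherwise propagate the failure (fail shut).
            if entry is not None and now - entry[0] <= self._ttl + self._stale_grace:
                return entry[1]
            raise
        self._entries[user] = (now, roles)
        return roles
```

With `stale_grace=0` this behaves like the uncached synchronous check whenever the directory is down, and raising it trades revocation latency for availability during outages.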