Re: [PoC] Delegating pg_ident to a third party - Mailing list pgsql-hackers

From: Jacob Champion
Subject: Re: [PoC] Delegating pg_ident to a third party
Msg-id: 71e66c542d9ba5c9121caf7a858ab60b3c5f66fa.camel@vmware.com
In response to: Re: [PoC] Delegating pg_ident to a third party (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers

On Mon, 2022-01-10 at 15:09 -0500, Stephen Frost wrote:
> Greetings,

Sorry for the delay, the last few weeks have been insane.

> * Jacob Champion (pchampion@vmware.com) wrote:
> > On Tue, 2022-01-04 at 22:24 -0500, Stephen Frost wrote:
> > > On Tue, Jan 4, 2022 at 18:56 Jacob Champion <pchampion@vmware.com> wrote:
> > > > Could you talk more about the use cases for which having the "actual
> > > > user" is better? From an auditing perspective I don't see why
> > > > "authenticated as jacob@example.net, logged in as admin" is any worse
> > > > than "logged in as jacob".
> > > 
> > > The above case isn’t what we are talking about, as far as I
> > > understand anyway. You’re suggesting “authenticated as 
> > > jacob@example.net, logged in as sales” where the user in the database
> > > is “sales”.  Consider triggers which only have access to “sales”, or
> > > a tool like pgaudit which only has access to “sales”.
> > 
> > Okay. So an additional getter function in miscadmin.h, and surfacing
> > that function to trigger languages, are needed to make authn_id more
> > generally useful. Any other cases you can think of?
> 
> That would help but now you've got two different things that have to be
> tracked, potentially, because for some people you might not want to use
> their system auth'd-as ID.  I don't see that as a great solution and
> instead as a workaround.

There's nothing to be worked around. If you have a user mapping set up
using the features that exist today, and you want to audit who logged
in at some point in the past, then you need to log both the
authenticated ID and the authorized role. There's no getting around
that. It's not enough to say "just check the configuration" because the
config can change over time.
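
To illustrate why the configuration alone is not enough, here is a minimal
Python sketch. It is not Postgres code; the names LoginRecord, record_login,
and user_map are hypothetical and exist only for this example. It records
both the authenticated ID and the authorized role at login time, then edits
the map, and the earlier entry still shows what was true when it was written.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LoginRecord:
    """One audit entry; both identities are captured at login time."""
    authn_id: str           # identity proven by the auth method
    role: str               # database role the session was authorized as
    logged_in_at: datetime  # when the login happened (UTC)


def record_login(log, authn_id, role):
    """Append an immutable entry holding both the authn ID and the role."""
    entry = LoginRecord(authn_id, role, datetime.now(timezone.utc))
    log.append(entry)
    return entry


# Hypothetical user map (authenticated ID -> role); it changes over time.
user_map = {"jacob@example.net": "admin"}
audit_log = []

record_login(audit_log, "jacob@example.net", user_map["jacob@example.net"])

# Later the map is edited, so the current config no longer shows the old state.
user_map["jacob@example.net"] = "sales"
record_login(audit_log, "jacob@example.net", user_map["jacob@example.net"])

# The log still answers "who logged in as what, and when?" for both logins.
roles_seen = [entry.role for entry in audit_log]
```

Checking user_map after the edit would only ever report "sales"; the log
keeps the earlier "admin" login as well.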

> > But to elaborate: *forcing* one-user-per-role is wasteful, because if I
> > have a thousand employees, and I want to give all my employees access
> > to a guest role in the database, then I have to administer a thousand
> > roles: maintaining them through dump/restores and pg_upgrades, auditing
> > them to figure out why Bob in Accounting somehow got a different
> > privilege GRANT than the rest of the users, adding new accounts,
> > purging old ones, maintaining the inevitable scripts that will result.
> 
> pg_upgrade just handles it, no?  pg_dumpall -g does too.  Having to deal
> with roles in general is a pain but the number of them isn't necessarily
> an issue.  A guest role which doesn't have any auditing requirements
> might be a decent use-case for what you're talking about here but I
> don't know that we'd implement this for just that case.  Part of this
> discussion was specifically about addressing the other challenges- like
> having automation around the account addition/removal and sync'ing role
> membership too.  As for auditing privileges, that should be done
> regardless and the case you outline isn't somehow different from others
> (the same could be as easily said for how the 'guest' account got access
> to whatever it did).

I think there's a difference between auditing a small fixed number of
roles and auditing many thousands of them that change on a weekly or
daily basis. I'd rather maintain the former, given the choice. It's
harder for things to slip through the cracks with fewer moving pieces.

> > If none of the users need to be "special" in any way, that's all wasted
> > overhead. (If they do actually need to be special, then at least some
> > of that overhead becomes necessary. Otherwise it's waste.) You may be
> > able to mitigate the cost of the waste, or absorb the mitigations into
> > Postgres so that the user can't see the waste, or decide that the waste
> > is not costly enough to care about. It's still waste.
> 
> Except the amount of 'wasted' overhead being claimed here seems to be
> hardly any.  The biggest complaint levied at this seems to really be
> just the issues around the load on the ldap systems from having to deal
> with the frequent sync queries, and that's largely a solvable issue in
> the majority of environments out there today.

As long as we're in agreement that there is waste, I don't think I'm
going to convince you about the cost. It's tangential anyway if you're
not going to remove many-to-many maps.

> > > Not sure exactly what you’re referring to here by “administer role
> > > privileges externally too”..?  Curious to hear what you are imagining
> > > specifically.
> > 
> > Just that it would be nice to centrally provision role GRANTs as well
> > as role membership, that's all. No specifics in mind, and I'm not even
> > sure if LDAP would be a helpful place to put that sort of config.
> 
> GRANT's on objects, you mean?  I agree, that would be interesting to
> consider though it would involve custom entries in the LDAP directory,
> no?  Role membership would be able to be sync'd as part of group
> membership and that was something I was thinking would be handled as
> part of this in a similar manner to what the 3rd party solutions provide
> today using the cron-based approach.

Agreed. I haven't put too much thought into those use cases yet.

> > > I’d also point out though that having to do an ldap lookup on every
> > > login to PG is *already* an issue in some environments, having to do
> > > multiple amplifies that.
> > 
> > You can't use the LDAP auth method with this patch yet, so this concern
> > is based on code that doesn't exist. It's entirely possible that you
> > could do the role query as part of the first bound connection. If that
> > proves unworkable, then yes, I agree that it's a concern.
> 
> Perhaps it could be done as part of the same connection but that then
> has an impact on what the configuration of the ident LDAP lookup would
> be, no?  That seems like an important thing to flesh out before we move
> too much farther with this patch, to make sure that, if we want that to
> work, that there's a clear way to configure it to avoid the double LDAP
> connection.  I'm guessing you already have an idea how that'll work
> though..?

It's only relevant if the other thread (which you've said you're
ignoring) progresses. The patch discussed here does not touch that code
path.

But yes, I have a general idea that as long as a user can look up (but
not modify) their own role information, this should work just fine.

> > > Not to mention that when the ldap servers can’t be reached for some
> > > reason, no one can log into the database and that’s rather
> > > unfortunate too.
> > 
> > Assuming you have no caches, then yes. That might be a pretty good
> > argument for allowing ldapmap and map to be used together, actually, so
> > that you can have some critical users who can always log in as
> > "themselves" or "admin" or etc. Or maybe it's an argument for allowing
> > HBA to handle fallback methods of authentication.
> 
> Ok, so now we're talking about a cache that needs to be implemented
> which will ... store the user's password for LDAP authentication?  Or
> what the mapping is for various LDAP IDs to PG roles?  And how will that
> cache be managed?  Would it be handled by dump/restore?  What about
> pg_upgrade?  How will entries in the cache be removed?

You keep pulling the authentication discussion, which this patch does
not touch on purpose, into this discussion about authorization. The
authz info requested by this patch seems like it can be cached.

People currently using LDAP authentication (which again, this patch
cannot use because there is no LDAP user mapping) either have existing
HA infrastructure that they're happy with, or they don't. This patch
shouldn't make that situation any better or worse -- *if* the lookup
can be done on one connection.

> And mainly- how is this different from just having all the roles in PG
> to begin with..?

This comment seems counterproductive. One major difference is that
Postgres doesn't have to duplicate the authentication info that some
other system already holds.

> > Luckily I think it's pretty easy to communicate to LDAP users that if
> > *all* your login infrastructure goes down, you will no longer be able
> > to log in. They're probably used to that idea, if they haven't set up
> > any availability infra.
> 
> Except that most of the rest of the infrastructure may continue to work
> just fine except for logging in- which is something most folks only do
> once a day.  That is, why is the SQL Server system still happily
> accepting connections while the AD is being rebooted?  Or why can I
> still log into the company website even though AD is down, but I can't
> get into PG?  Not everything in an environment is tied to LDAP being up
> and running all the time, so it's not nearly so cut and dry in many,
> many cases.

Whatever LDAP users currently deal with, this patch doesn't change
their experience, right? It seems like it's a lot easier to add caching
to a synchronous check, to make it asynchronous and a little more
fault-tolerant, than it is to do the reverse.
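
As a rough sketch of what "adding caching to a synchronous check" could look
like, the Python below wraps a blocking lookup in a small TTL cache. Nothing
here comes from the patch: CachedRoleLookup, LookupUnavailable, ttl_seconds,
and max_stale_seconds are all hypothetical names chosen for illustration, and
the lookup callable stands in for whatever directory query a deployment uses.

```python
import time


class LookupUnavailable(Exception):
    """Raised by the backing lookup when the directory cannot be reached."""


class CachedRoleLookup:
    """Wrap a synchronous role lookup with a TTL cache (illustrative only)."""

    def __init__(self, lookup, ttl_seconds, max_stale_seconds=0.0,
                 clock=time.monotonic):
        self._lookup = lookup              # callable: authn_id -> iterable of roles
        self._ttl = ttl_seconds            # how long an entry counts as fresh
        self._max_stale = max_stale_seconds  # extra grace if the directory is down
        self._clock = clock
        self._entries = {}                 # authn_id -> (fetched_at, roles)

    def roles_for(self, authn_id):
        now = self._clock()
        cached = self._entries.get(authn_id)
        if cached is not None and now - cached[0] <= self._ttl:
            return cached[1]               # fresh hit: no directory traffic
        try:
            roles = frozenset(self._lookup(authn_id))
        except LookupUnavailable:
            # Directory is down: serve a stale entry only inside the grace
            # window; otherwise fail shut by re-raising.
            if cached is not None and now - cached[0] <= self._ttl + self._max_stale:
                return cached[1]
            raise
        self._entries[authn_id] = (now, roles)
        return roles
```

With max_stale_seconds left at 0 the wrapper fails shut as soon as an entry
expires and the directory is unreachable; raising it trades revocation
latency for availability, which is the tradeoff under discussion.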

> > >  These are, of course, arguments for moving away from methods that
> > > require checking with some other system synchronously during login-
> > > which is another reason why it’s better to have the authentication
> > > credentials easily map to the PG role, without the need for external
> > > checks at login time.  That’s done with today’s pg_ident, but this
> > > patch would change that.
> > 
> > There are arguments for moving towards synchronous checks as well.
> > Central revocation of credentials (in timeframes shorter than ticket
> > expiration) is what comes to mind. Revocation is hard and usually
> > conflicts with the desire for availability.
> 
> Revocation in less time than ticket lifetime and everything falling over
> due to the AD being restarted are very different.  The approaches being
> discussed are all much shorter than ticket lifetime and so that's hardly
> an appropriate comparison to be making.  I didn't suggest that waiting
> for ticket expiration would be appropriate when it comes to syncing
> accounts between AD and PG or that it would be appropriate for
> revocation.  Regarding the caching proposed above- in such a case,
> clearly, revocation wouldn't be synchronous either.  Certainly in the
> cases today where cronjobs are being used to perform the sync,
> revocation also isn't synchronous (unless also using LDAP for
> authentication, of course, though that wouldn't do anything for existing
> sessions, while removing role memberships does...).

Sure. Again: tradeoffs.

> > The two systems have different architectures, and different security
> > properties, and you have me at a disadvantage in that you can see the
> > experimental code I have written and I cannot see the hypothetical code
> > in your head.
> 
> I've barely glanced at the code you've written <snip>

This is frustrating to read. I think we're talking past each other,
because I'm trying to talk about this patch and you're talking about
other things.

> The only reference to hypothetical code
> is the idea of a background or other worker that subscribes to changes
> in LDAP and implements those changes in PG instead of having something
> cron-based do it

Yes. That's what I was referring to.

> , but that doesn't really change anything about the
> architectural question of if we cache (either with an explicit cache, as
> you've opined us adding above, though which there is no code for today,

LDAP caches exist... I'm not suggesting we implement a Postgres-branded 
LDAP cache.

> or just by using PG's existing role/membership system) or call out to
> LDAP for every login.
> 
> > It sounds like I'm more concerned with the ability to have an online
> > central source of truth for access control, accepting that denial of
> > service may cause the system to fail shut; and you're more concerned
> > with availability in the face of network failure, accepting that denial
> > of service may cause the system to fail open. I think that's a design
> > decision that belongs to an end user.
> 
> There is more to it than just failing shut/closed.  Part of the argument
> being used to drive this change was that it would help to reduce the
> load on the LDAP servers because there wouldn't be a need to run large
> queries on them frequently out of cron to keep PG's understanding of
> what the roles are and their mappings matching what's in LDAP.

Yes.

> > The distributed availability problems you're describing are, in my
> > experience, typically solved by caching. With your not-yet-written
> > solution, the caching is built into Postgres, and it's on all of the
> > time, but may (see below) only actually perform well with Active
> > Directory. With my solution, any caching is optional, because it has to
> > be implemented/maintained external to Postgres, but because it's just
> > generic "LDAP caching" then it should be broadly compatible and we
> > don't have to maintain it. I can see arguments for and against both
> > approaches.
> 
> I'm a bit confused by this- either you're referring to the cache
> being PG's existing system, which certainly has already been written,
> and has existed since it was committed and released as part of 8.1, and
> is, indeed, on all the time ... or you're talking about something else
> which hasn't been written and could therefore be anything, though I'm
> generally against the idea of having an independent cache for this, as
> described above.

You just proposed an internal caching system, immediately upthread:
"I'd go a step further and suggest that the way to do this is with a
background worker that's started up and connects to an LDAP
infrastructure and listens for changes, allowing the system to pick up
on new roles/memberships as soon as they're created in the LDAP
environment." That proposal is what I was referring to by "your not-
yet-written solution".

> As for optional caching with some generic LDAP caching system, that
> strikes me as clearly even worse than building something into PG for
> this as it requires maintaining yet another system in order to have a
> reasonably well working system and that isn't good.

A choice for the end user. If they don't want to deal with LDAP
infrastructure, they don't have to use it.

> While it's good
> that we have pgbouncer, it'd certainly be better if we didn't need it
> and it's got a bunch of downsides to it.  I strongly suspect the same
> would be true of some external generic "LDAP caching" system as is
> referred to above, though as there isn't anything to look at, I can't
> say for sure.

We can take a look at OpenLDAP's proxy caching for some info. That
won't be perfectly representative but I don't think there's "nothing to
look at".

> Regarding 'performing well', while lots of little queries may be better
> in some cases than less frequent larger queries, that's really going to
> depend on the frequency of each and therefore really be rather dependent
> on the environment and usage.  In any case, however, being able to
> leverage change modifications instead of fully resyncing will definitely
> be better.  At the same time, however, if we have the external generic
> LDAP caching system that's being claimed ... why wouldn't we simply use
> that with the cron-based system today to offload those from the main
> LDAP systems?

I think there's an architectural difference between a proxy cache that
is set up to reduce load on a central server, and one that is set up to
handle network partitions while ensuring liveness. To be fair, I don't
know which use cases existing solutions can handle. But those two don't
seem to be the same to me.

I know that I have users who are okay with the query load from logins,
but not with the query load of their role-sync scripts. That's a good
enough datapoint for me.

> > That would certainly be a useful thing to implement for deployments
> > that can use it. But my personal interest in writing "LDAP" code that
> > only works with AD is nil, at least in the short term.
> > 
> > (The continued attitude that Microsoft Active Directory is "the one
> > that really matters" is really frustrating. I have users on LDAP
> > without Active Directory. Postgres tests are written against OpenLDAP.)
> 
> What would you consider the important directories to worry about beyond
> AD?  I don't consider the PG testing framework to be particularly
> indicative of what enterprises are actually running.

I have end users running
- NetIQ/Novell eDirectory
- Oracle Directory Server
- Red Hat IdM
in addition to AD.

> > > OpenLDAP has an audit log system which can be used though it’s
> > > certainly not as nice and would require code specific to it.
> > > 
> > > This talks a bit about other directories: 
> > > https://docs.informatica.com/data-integration/powerexchange-adapters-for-powercenter/10-1/powerexchange-for-ldap-user-guide-for-powercenter/ldap-sessions/configuring-change-data-capture/methods-for-tracking-changes-in-different-directories.html
> > > 
> > > I do wish they all supported it cleanly in the same way.
> > 
> > Okay. But the answer to "is persistent search widely implemented?"
> > appears to be "No."
> 
> I'm curious as to how the large environments that you've worked with
> have generally solved this issue.  Is there a generic LDAP caching
> system that's been used?  What?

They haven't solved the issue; that's why I'm poking at it. Several
users have to cobble together scripts because of poor interaction with
their existing LDAP deployments (or complete lack of support, in the
case of pgbouncer).

> > I don't see a good way to push the filter back into the HBA, because it
> > may very well depend on the users being mapped (i.e. there may need to
> > be multiple lines in the map). Same for the query attributes. In fact
> > if I'm already using AD Kerberos or SSPI and I want to be able to
> > handle users coming from multiple domains, couldn't I be querying
> > entirely different servers depending on the username presented?
> 
> Yeah, that's a good point and which argues for putting everything into
> the ident.  In such a situation as you describe above, we wouldn't
> actually have any LDAP configuration in the HBA and I'm entirely fine
> with that- we'd just have it all in ident.  I don't see how you'd make
> that work with, as you suggest above, LDAP-based authentication and the
> idea of having only one connection be used for the LDAP-based auth and
> the mapping lookup, but I'm also not generally worried about LDAP-based
> auth and would rather we rip it out entirely. :)
> 
> As such, I'd say that you've largely convinced me that we should just
> move all of the LDAP configuration for the lookup into the ident and
> discourage people from using LDAP-based authentication and from putting
> LDAP configuration into the hba. 

I'm willing to bet that Postgres dropping support will not result in my
end users abandoning their LDAP infrastructure. Either I and others in
my position will need to maintain forks, or my end users will find a
different database.

If there's widespread agreement that the project doesn't want to
maintain an LDAP auth method -- so far I think you've provided the only
such opinion, that I've seen at least -- that might be a good argument
for introducing pluggable auth so that the community can maintain the
methods that are important to them.

> I'm still a fan of the general idea of
> having a way to configure such ldap parameters in one place in whatever
> file they go into and then re-using that multiple times on the general
> assumption that folks are likely to need to reference a particular LDAP
> configuration more than once, wherever it's configured.

Sure.

> > You're open to the idea of bolting a new key/value grammar onto the HBA
> > parser, but not to the idea of brainstorming a different configuration
> > DSL?
> 
> Short answer- yes (or, as mentioned just above, into the ident file vs.
> the hba).  I'd rather we build on the existing configuration systems
> that we have rather than invent something new that will then have to
> work with the others, as I don't see it as likely that we could just
> replace the existing ones with something new and make everyone
> change.  Having yet another one strikes me as worse than making
> improvements to the existing ones (be those 'bolted on' or otherwise).

I think the key to maintaining incrementally built systems is that at
some point, eventually, you refactor the thing. There was a brief
question on what that might look like, from Peter. You stepped in with
some very strong opinions.

--Jacob
