Thread: Feature Proposal: Connection Pool Optimization - Change the Connection User

Feature Proposal: Connection Pool Optimization - Change the Connection User

From
Todd Hubers
Date:
Hi,

I have just joined to start a community consultation process for a proposal. I just finished the proposal document, I spent time writing a Problem and Solution section, and I have done quite a bit of upfront exploration of the code.

See:
I am very keen to make this happen.

Thanks

--
--
Todd Hubers
Attachment

Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

From
Justin Pryzby
Date:
On Sun, Nov 21, 2021 at 03:11:03AM +1100, Todd Hubers wrote:
> I have just joined to start a community consultation process for a
> proposal. I just finished the proposal document, I spent time writing a
> Problem and Solution section, and I have done quite a bit of upfront
> exploration of the code.
> 
>    - Google Document with Commenting turned on
>    https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit?usp=sharing.

The stated goal is to support persistent connections with hundreds or thousands
of different users.

However, this doesn't attempt to address the similar issue if there's hundreds
or thousands of DBs.  Which I don't think could work at all, since switching to
a new DB requires connecting to a new backend process.

You proposed a PQ protocol version of SET ROLE/SET SESSION authorization.
You'd need to make sure that a client connected to the connection pooler cannot
itself run the PQ "set role".  The connection pooler would need to reject the
request (or maybe ignore requests to switch to the same/current user).  Maybe
you'd have two protocol requests "begin switch user" and "end switch to user",
and then the server-side could enforce that "begin switch" is not nested.
Maybe the "begin switch" could return some kind of "nonce" to the connection
pooler, and "end switch" would require the same nonce to be validated.

It'd be interesting to hear if you've tested with postgresql 14, which improves
scalability to larger number of connections.

-- 
Justin



Justin Pryzby <pryzby@telsasoft.com> writes:
> On Sun, Nov 21, 2021 at 03:11:03AM +1100, Todd Hubers wrote:
>> - Google Document with Commenting turned on
>> https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit?usp=sharing.

> You proposed a PQ protocol version of SET ROLE/SET SESSION authorization.
> You'd need to make sure that a client connected to the connection pooler cannot
> itself run the PQ "set role".

It's not really apparent to me how this mechanism wouldn't be a giant
security hole.  In particular, I see no meaningful difference between
granting the proposed "impersonate" privilege and granting superuser.
You could restrict it to not allow impersonating superusers, which'd
make it a little better, but what of all the predefined roles we keep
inventing that have privileges we don't want to be accessible to Joe
User?  I think by the time you got to something where "impersonate"
wasn't a security hole, it would be pretty much equivalent to SET ROLE,
ie you could only impersonate users you'd been specifically given the
privlege for.

It also seems like this is transferring the entire responsibility for
user authentication onto the middleware, with the server expected to
just believe that the session should now be allowed to act as user X.
Again, that seems more like a security problem than a good design.

Another issue is that the proposed implementation seems pretty far
short of being an invisible substitute for actually logging in as
the target user.  For starters, what of instantiating ALTER ROLE SET
parameter values that apply to the target user, and getting rid of
such settings that applied to the original user ID?  How would this
interact with the "on login" triggers that people keep asking for?

Also, you'd still have to do DISCARD ALL (at least) when switching
users, or else integrate all that cache-flushing right into the
switching mechanism.  So I'm not entirely convinced about how big
the performance benefits are compared to a fresh session.

One more point is that the proposed business about

* ImpersonateDatabaseUser will either succeed silently (0-RTT), or
  fail. Upon failure, no further commands will be processed until
  ImpersonateDatabaseUser succeeds.

seems to require adding a huge amount of complication on the server side,
and complication in the protocol spec itself, to save a rather minimal
amount of complication in the middleware.  Why can't we just say that
a failed "impersonate" command leaves the session in the same state
as before, and it's up to the pooler to do something about it?  We are
in any case trusting the pooler not to send commands from user A to
a session logged in as user B.

            regards, tom lane

PS: I wonder how we test such a feature meaningfully without
incorporating a pooler right into the Postgres tree.  I don't
want to do that, for sure.



Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

From
Todd Hubers
Date:
Hi Tom, Justin, and Andrey,

Thanks everybody for your feedback so far! I agree, there are a few unknowns for the design and impact and there are many details to iron out.

Benchmarking - Overall I think it's best to explore improvements with benchmarking. The key goal of this proposal pertains to performance: "fast". That means benchmarking is essential in the design phase, to ensure there are measurable improvements, and reward for effort. This should also primarily influence the selection of Solution Option (of which there are 6 or 7 listed). I have added the proposed benchmarks to the document.

1) Andrey said: "And these expenses would be not necessary if we could just send a new Startup message after the Terminate ('X') message." - Answer: I have added the "Terminate/Restartup" benchmark in the document accordingly.

2) Justin said: "However, this doesn't attempt to address the similar issue if there's hundreds or thousands of DBs.  Which I don't think could work at all, since switching to a new DB requires connecting to a new backend process."

Good point. I do have some loose thinking around that. I do have some comments throughout the proposal document to tackle that in the future. As you point out, user switching seems to be a simpler well-trodden path. For now, my proposal is that the middleware can assert a database change, but that should create an error for now. In the future, it might be a supported capability.

I have also added database-switching as a consideration for the benchmarking.

3) Justin said: "You'd need to make sure that a client connected to the connection pooler cannot itself run the PQ "set role".  The connection pooler would need to reject the request (or maybe ignore requests to switch to the same/current user)." - Answer: Agreed.

4) Justin said: "Maybe you'd have two protocol requests "begin switch user" and "end switch to user", and then the server-side could enforce that "begin switch" is not nested. Maybe the "begin switch" could return some kind of "nonce" to the connection pooler, and "end switch" would require the same nonce to be validated."

Agreed, something like this should be suitable. For performance, I would prefer to not use a Nonce because that would not be a 0-RTT approach. The PQ level ImpersonateDatabaseUser is effectively the "begin". I don't think an "end" is necessary, because it is expected to switch the user very often. The next "begin" is effectively ending the last Impersonation. The middleware could switch back to its own username, but as per my draft proposal, that user should only be allowed (as a connection bound authorisation for ImpersonateDatabaseUser) to impersonate and nothing else.

And then there are yet 5 other solution options which have an explicit Begin/End pattern.

5) Justin said: "It'd be interesting to hear if you've tested with postgresql 14, which improves scalability to larger number of connections."

Agreed. I will include a baseline in the benchmarks. Regardless of improvements, it won't be infinite and there will still be a need to "enable" pooling where there are MAX_CONNECTIONS users using the system.

6) Tom said: "It's not really apparent to me how this mechanism wouldn't be a giant security hole."

I have added your important concerns to a Security Considerations section of the document:
  • "What is the difference between granting the proposed "impersonate" privilege and granting superuser?"
  • "Is this merely transferring the entire responsibility for user authentication onto the middleware, with the server expected to just believe that the session should now be allowed to act as user X?"
7) Tom said: "I think by the time you got to something where "impersonate" wasn't a security hole, it would be pretty much equivalent to SET ROLE"

I don't think so. This is already contemplated in the draft. See Solution Option 4. SET ROLE comes with RESET ROLE. The frontend client could call RESET ROLE then change to whatever role they like. That means that feature is not currently suitable to be used for the context of this proposal.

8) Tom said: "Another issue is that the proposed implementation seems pretty far short of being an invisible substitute for actually logging in as the target user"

The goal of this proposal is performance and to enable shared connection pooling. Direct logging in doesn't allow the reuse of existing connections in the pool.

9) Tom said: "For starters, what of instantiating ALTER ROLE SET parameter values that apply to the target user, and getting rid of such settings that applied to the original user ID?"

I think you are suggesting to modify the already-logged-in user. That's interesting. However, many systems audit the logged in user by username - who they are. Furthermore, the modification of user privileges would not help to enable connection pooling as rehashed in my answer to [8].

10) Tom said: "How would this interact with the "on login" triggers that people keep asking for?

That's a good point. I would imagine that SET ROLE (which is currently unsuitable) would have the same requirement. The answer is Shared Functions. SET ROLE calls a function like "SetSessionUserId". Our implementation should call the same function(s). If OnLogin functionality is implemented they should trigger from there.

11) Tom said: "Also, you'd still have to do DISCARD ALL (at least) when switching users, or else integrate all that cache-flushing right into the switching mechanism.  So I'm not entirely convinced about how big the performance benefits are compared to a fresh session."

Agreed, nor am I convinced. We should only be guided by benchmarks and tests, not subjective assumptions. See the Benchmarking section in the document for details.

12 Tom said: "...[Upon failure, no further commands will be processed until ImpersonateDatabaseUser succeeds.] seems to require adding a huge amount of complication on the server side, and complication in the protocol spec itself, to save a rather minimal amount of complication in the middleware.  Why can't we just say that a failed "impersonate" command leaves the session in the same state as before, and it's up to the pooler to do something about it?  We are in any case trusting the pooler not to send commands from user A to a session logged in as user B. We are in any case trusting the pooler not to send commands from user A to a session logged in as user B."

I think you are overstating the complexity. It only requires a LastUserSwitchFailed boolean which is cleared to false when a UserSwitch succeeds. If LastUserSwitchFailed is true, tcop ignores the messages and sends back errors. This detail has been added to the proposal document.

It's important that the implementation is objectively faster. The 0-RTT design is proposed for efficiency. The Middleware might be able to fit BOTH the UserSwitch AND the Query within a 1500 MTU. If not, it shouldn't wait for a confirmation - for efficiency. The middleware might be on localhost, or it might be 1-5ms away on the LAN. Effectively, the UserSwitch is a sort of "Header" before a series of commands. Performance is the goal. Therefore, the connection cannot be left in the same state as before, or else the pending Query will run in the incorrect context. This is a rare failure mode, so failure is ideal.

Ultimately these are assumptions and benchmarking results should drive decisions around the implementation of every aspect.

13. Tom said: "I wonder how we test such a feature meaningfully without incorporating a pooler right into the Postgres tree."

We can benchmark without a pooler - see the Benchmark section for details. (Furthermore, I propose that general benchmark tooling does belong in Postgres for the benefit of the ecosystem of connection poolers. I have included such a remark in the Benchmarking section "PostgreSQL is not planning to incorporate Connection Pooling...".)

Thanks again everyone for the tough questions and the ideas!

Regards,

Todd

On Sun, 21 Nov 2021 at 08:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Sun, Nov 21, 2021 at 03:11:03AM +1100, Todd Hubers wrote:
>> - Google Document with Commenting turned on
>> https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit?usp=sharing.

> You proposed a PQ protocol version of SET ROLE/SET SESSION authorization.
> You'd need to make sure that a client connected to the connection pooler cannot
> itself run the PQ "set role".

It's not really apparent to me how this mechanism wouldn't be a giant
security hole.  In particular, I see no meaningful difference between
granting the proposed "impersonate" privilege and granting superuser.
You could restrict it to not allow impersonating superusers, which'd
make it a little better, but what of all the predefined roles we keep
inventing that have privileges we don't want to be accessible to Joe
User?  I think by the time you got to something where "impersonate"
wasn't a security hole, it would be pretty much equivalent to SET ROLE,
ie you could only impersonate users you'd been specifically given the
privlege for.

It also seems like this is transferring the entire responsibility for
user authentication onto the middleware, with the server expected to
just believe that the session should now be allowed to act as user X.
Again, that seems more like a security problem than a good design.

Another issue is that the proposed implementation seems pretty far
short of being an invisible substitute for actually logging in as
the target user.  For starters, what of instantiating ALTER ROLE SET
parameter values that apply to the target user, and getting rid of
such settings that applied to the original user ID?  How would this
interact with the "on login" triggers that people keep asking for?

Also, you'd still have to do DISCARD ALL (at least) when switching
users, or else integrate all that cache-flushing right into the
switching mechanism.  So I'm not entirely convinced about how big
the performance benefits are compared to a fresh session.

One more point is that the proposed business about

* ImpersonateDatabaseUser will either succeed silently (0-RTT), or
  fail. Upon failure, no further commands will be processed until
  ImpersonateDatabaseUser succeeds.

seems to require adding a huge amount of complication on the server side,
and complication in the protocol spec itself, to save a rather minimal
amount of complication in the middleware.  Why can't we just say that
a failed "impersonate" command leaves the session in the same state
as before, and it's up to the pooler to do something about it?  We are
in any case trusting the pooler not to send commands from user A to
a session logged in as user B.

                        regards, tom lane

PS: I wonder how we test such a feature meaningfully without
incorporating a pooler right into the Postgres tree.  I don't
want to do that, for sure.


--
--
Todd Hubers

Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

From
Daniel Gustafsson
Date:
> On 21 Nov 2021, at 03:05, Todd Hubers <todd.hubers@gmail.com> wrote:

> 10) Tom said: "How would this interact with the "on login" triggers that people keep asking for?
>
> That's a good point. I would imagine that SET ROLE (which is currently unsuitable) would have the same requirement.
Theanswer is Shared Functions. SET ROLE calls a function like "SetSessionUserId". Our implementation should call the
samefunction(s). If OnLogin functionality is implemented they should trigger from there. 

I think thats conflating session_user and current_user, SET ROLE is not a login
event. This is by design and discussed in the documentation:

    "SET ROLE does not process session variables as specified by the role's
    ALTER ROLE settings; this only happens during login."

The current patch proposal doesn't fire the login event trigger on SET ROLE,
only on actual logins.  That patch needs more review before landing, but I'm
not sure tying it to SET ROLE is a good idea.

> 13. Tom said: "I wonder how we test such a feature meaningfully without incorporating a pooler right into the
Postgrestree." 
>
> We can benchmark without a pooler - see the Benchmark section for details. (Furthermore, I propose that general
benchmarktooling does belong in Postgres for the benefit of the ecosystem of connection poolers. I have included such a
remarkin the Benchmarking section "PostgreSQL is not planning to incorporate Connection Pooling...".) 

We might be talking about the same things using different words, but it's
important to remember that we need to cover the functionality in terms of
*tests* first, performance benchmarking is another concern.

--
Daniel Gustafsson        https://vmware.com/




Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

From
Jacob Champion
Date:
On Sat, 2021-11-20 at 16:16 -0500, Tom Lane wrote:
> One more point is that the proposed business about 
> 
> * ImpersonateDatabaseUser will either succeed silently (0-RTT), or
>   fail. Upon failure, no further commands will be processed until
>   ImpersonateDatabaseUser succeeds.
> 
> seems to require adding a huge amount of complication on the server side,
> and complication in the protocol spec itself, to save a rather minimal
> amount of complication in the middleware.  Why can't we just say that
> a failed "impersonate" command leaves the session in the same state
> as before, and it's up to the pooler to do something about it?  We are
> in any case trusting the pooler not to send commands from user A to
> a session logged in as user B.

When combined with the 0-RTT goal, I think a silent ignore would just
invite more security problems. Todd is effectively proposing packet
pipelining, so the pipeline has to fail shut.

A more modern approach might be to attach the authentication to the
packet itself (e.g. cryptographically, with a MAC), if the goal is to
enable per-statement authentication anyway. In theory that turns the
middleware into a message passer instead of a confusable deputy. But it
requires more complicated setup between the client and server.

> PS: I wonder how we test such a feature meaningfully without
> incorporating a pooler right into the Postgres tree.  I don't
> want to do that, for sure.

Having protocol-level tests for bytes on the wire would not only help
proposals like this but also get coverage for a huge number of edge
cases. Magnus has added src/test/protocol for the server, written in
Perl, in his PROXY proposal. And I've added a protocol suite for both
the client and server, written in Python/pytest, in my OAuth proof of
concept. I think something is badly needed in this area.

--Jacob

Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

From
Todd Hubers
Date:

Hi Jacob and Daniel,

Thanks for your feedback.

>@Daniel - I think thats conflating session_user and current_user, SET ROLE is not a login event. This is by design and discussed in the documentation..

Agreed, I am using those terms loosely. I have updated option 4 in the proposal document. I have crossed it out. Option 5 is more suitable "SET SESSION AUTHORIZATION" for further consideration.

>@Daniel - but it's important to remember that we need to cover the functionality in terms of *tests* first, performance benchmarking is another concern.

For implementation absolutely, but not for a basic feasibility prototype. A quick non-secure non-reliable prototype is probably an important first-step to confirming which options work best for the stated goals. Importantly, if the improvement is only 5% (whatever that might mean), then the project is probably not work starting. But I do expect that a benchmark will prove benefits that justify the resources to build the feature(s).

>@Jacob - A more modern approach might be to attach the authentication to the packet itself (e.g. cryptographically, with a MAC), if the goal is to enable per-statement authentication anyway. In theory that turns the middleware into a message passer instead of a confusable deputy. But it requires more complicated setup between the client and server.

I did consider this, but I ruled it out. I have now added it to the proposal document, and included two Issues. Please review and let me know whether I might be mistaken.

>@Jacob - Having protocol-level tests for bytes on the wire would not only help proposals like this but also get coverage for a huge number of edge cases. Magnus has added src/test/protocol for the server, written in Perl, in his PROXY proposal. And I've added a protocol suite for both the client and server, written in Python/pytest, in my OAuth proof of concept. I think something is badly needed in this area.

Thanks for highlighting this emerging work. I have noted this in the proposal in the Next Steps section.

--Todd


On Tue, 23 Nov 2021 at 12:12, Jacob Champion <pchampion@vmware.com> wrote:
On Sat, 2021-11-20 at 16:16 -0500, Tom Lane wrote:
> One more point is that the proposed business about
>
> * ImpersonateDatabaseUser will either succeed silently (0-RTT), or
>   fail. Upon failure, no further commands will be processed until
>   ImpersonateDatabaseUser succeeds.
>
> seems to require adding a huge amount of complication on the server side,
> and complication in the protocol spec itself, to save a rather minimal
> amount of complication in the middleware.  Why can't we just say that
> a failed "impersonate" command leaves the session in the same state
> as before, and it's up to the pooler to do something about it?  We are
> in any case trusting the pooler not to send commands from user A to
> a session logged in as user B.

When combined with the 0-RTT goal, I think a silent ignore would just
invite more security problems. Todd is effectively proposing packet
pipelining, so the pipeline has to fail shut.

A more modern approach might be to attach the authentication to the
packet itself (e.g. cryptographically, with a MAC), if the goal is to
enable per-statement authentication anyway. In theory that turns the
middleware into a message passer instead of a confusable deputy. But it
requires more complicated setup between the client and server.

> PS: I wonder how we test such a feature meaningfully without
> incorporating a pooler right into the Postgres tree.  I don't
> want to do that, for sure.

Having protocol-level tests for bytes on the wire would not only help
proposals like this but also get coverage for a huge number of edge
cases. Magnus has added src/test/protocol for the server, written in
Perl, in his PROXY proposal. And I've added a protocol suite for both
the client and server, written in Python/pytest, in my OAuth proof of
concept. I think something is badly needed in this area.

--Jacob


--
--
Todd Hubers
Hi Everyone,

I have started working on this:
  • Benchmarking - increasingly more comprehensive benchmarking
  • Prototyping - to simulate the change of users (toggling back and forth)
  • Draft Implementation - of OPTION-1 (New Protocol Message)
  • (Then: Working with Odyssey and PgBouncer to add support (when the GRANT role privilege is available))
I hope to have a patch ready by the end of March.

Regards,

Todd

On Wed, 24 Nov 2021 at 02:46, Todd Hubers <todd.hubers@gmail.com> wrote:

Hi Jacob and Daniel,

Thanks for your feedback.

>@Daniel - I think thats conflating session_user and current_user, SET ROLE is not a login event. This is by design and discussed in the documentation..

Agreed, I am using those terms loosely. I have updated option 4 in the proposal document. I have crossed it out. Option 5 is more suitable "SET SESSION AUTHORIZATION" for further consideration.

>@Daniel - but it's important to remember that we need to cover the functionality in terms of *tests* first, performance benchmarking is another concern.

For implementation absolutely, but not for a basic feasibility prototype. A quick non-secure non-reliable prototype is probably an important first-step to confirming which options work best for the stated goals. Importantly, if the improvement is only 5% (whatever that might mean), then the project is probably not work starting. But I do expect that a benchmark will prove benefits that justify the resources to build the feature(s).

>@Jacob - A more modern approach might be to attach the authentication to the packet itself (e.g. cryptographically, with a MAC), if the goal is to enable per-statement authentication anyway. In theory that turns the middleware into a message passer instead of a confusable deputy. But it requires more complicated setup between the client and server.

I did consider this, but I ruled it out. I have now added it to the proposal document, and included two Issues. Please review and let me know whether I might be mistaken.

>@Jacob - Having protocol-level tests for bytes on the wire would not only help proposals like this but also get coverage for a huge number of edge cases. Magnus has added src/test/protocol for the server, written in Perl, in his PROXY proposal. And I've added a protocol suite for both the client and server, written in Python/pytest, in my OAuth proof of concept. I think something is badly needed in this area.

Thanks for highlighting this emerging work. I have noted this in the proposal in the Next Steps section.

--Todd


On Tue, 23 Nov 2021 at 12:12, Jacob Champion <pchampion@vmware.com> wrote:
On Sat, 2021-11-20 at 16:16 -0500, Tom Lane wrote:
> One more point is that the proposed business about
>
> * ImpersonateDatabaseUser will either succeed silently (0-RTT), or
>   fail. Upon failure, no further commands will be processed until
>   ImpersonateDatabaseUser succeeds.
>
> seems to require adding a huge amount of complication on the server side,
> and complication in the protocol spec itself, to save a rather minimal
> amount of complication in the middleware.  Why can't we just say that
> a failed "impersonate" command leaves the session in the same state
> as before, and it's up to the pooler to do something about it?  We are
> in any case trusting the pooler not to send commands from user A to
> a session logged in as user B.

When combined with the 0-RTT goal, I think a silent ignore would just
invite more security problems. Todd is effectively proposing packet
pipelining, so the pipeline has to fail shut.

A more modern approach might be to attach the authentication to the
packet itself (e.g. cryptographically, with a MAC), if the goal is to
enable per-statement authentication anyway. In theory that turns the
middleware into a message passer instead of a confusable deputy. But it
requires more complicated setup between the client and server.

> PS: I wonder how we test such a feature meaningfully without
> incorporating a pooler right into the Postgres tree.  I don't
> want to do that, for sure.

Having protocol-level tests for bytes on the wire would not only help
proposals like this but also get coverage for a huge number of edge
cases. Magnus has added src/test/protocol for the server, written in
Perl, in his PROXY proposal. And I've added a protocol suite for both
the client and server, written in Python/pytest, in my OAuth proof of
concept. I think something is badly needed in this area.

--Jacob


--
--
Todd Hubers


--
--
Todd Hubers

Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

From
Todd Hubers
Date:
Hi Everyone,

Benchmarking work has commenced, and is ongoing.
  • OPTIONS 5/6/7 - `SET SESSION AUTHORIZATION` takes double the time of a single separate SimpleQuery. This is to be expected, because double the amount of SimpleQuery messages are being sent, and that requires a full SimpleQuery/Result/Ready cycle. If there is significant latency between a Connection Pooler and the database, this delay is amplified. It would be possible to concatenate text into a single SimpleQuery. In the real world, the performance impact MAY be negligible.
  • OPTION 0 - The time to reconnect (start a new connection from scratch with a different username/password) was found to be faster than using `SET SESSION AUTHORIZATION`.
  • OPTION 1 - My team is continuing to explore a distinct Impersonate message (Option-1). We are completing a prototype-quality implementation, and then benchmarking it. Given that Option-1 is asynchronous (Request and expect to succeed) and it can even be included within the same TCP packet as the SimpleQuery (at times), we expect the performance will be better than restarting a connection, and not impacted by links of higher latency.
I will be recording benchmark results in the document: https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit# after completion of the OPTION-1 prototype and benchmarking of that prototype.

Note: In order to accommodate something like OPTION-8, an Impersonation message might have a flag (valid for 1x SimpleQuery only, then automatically restore back to the last user).

Regards,

Todd


On Fri, 7 Jan 2022 at 10:55, Todd Hubers <todd.hubers@gmail.com> wrote:
Hi Everyone,

I have started working on this:
  • Benchmarking - increasingly more comprehensive benchmarking
  • Prototyping - to simulate the change of users (toggling back and forth)
  • Draft Implementation - of OPTION-1 (New Protocol Message)
  • (Then: Working with Odyssey and PgBouncer to add support (when the GRANT role privilege is available))
I hope to have a patch ready by the end of March.

Regards,

Todd

On Wed, 24 Nov 2021 at 02:46, Todd Hubers <todd.hubers@gmail.com> wrote:

Hi Jacob and Daniel,

Thanks for your feedback.

>@Daniel - I think thats conflating session_user and current_user, SET ROLE is not a login event. This is by design and discussed in the documentation..

Agreed, I am using those terms loosely. I have updated option 4 in the proposal document. I have crossed it out. Option 5 is more suitable "SET SESSION AUTHORIZATION" for further consideration.

>@Daniel - but it's important to remember that we need to cover the functionality in terms of *tests* first, performance benchmarking is another concern.

For implementation absolutely, but not for a basic feasibility prototype. A quick non-secure non-reliable prototype is probably an important first-step to confirming which options work best for the stated goals. Importantly, if the improvement is only 5% (whatever that might mean), then the project is probably not work starting. But I do expect that a benchmark will prove benefits that justify the resources to build the feature(s).

>@Jacob - A more modern approach might be to attach the authentication to the packet itself (e.g. cryptographically, with a MAC), if the goal is to enable per-statement authentication anyway. In theory that turns the middleware into a message passer instead of a confusable deputy. But it requires more complicated setup between the client and server.

I did consider this, but I ruled it out. I have now added it to the proposal document, and included two Issues. Please review and let me know whether I might be mistaken.

>@Jacob - Having protocol-level tests for bytes on the wire would not only help proposals like this but also get coverage for a huge number of edge cases. Magnus has added src/test/protocol for the server, written in Perl, in his PROXY proposal. And I've added a protocol suite for both the client and server, written in Python/pytest, in my OAuth proof of concept. I think something is badly needed in this area.

Thanks for highlighting this emerging work. I have noted this in the proposal in the Next Steps section.

--Todd


On Tue, 23 Nov 2021 at 12:12, Jacob Champion <pchampion@vmware.com> wrote:
On Sat, 2021-11-20 at 16:16 -0500, Tom Lane wrote:
> One more point is that the proposed business about
>
> * ImpersonateDatabaseUser will either succeed silently (0-RTT), or
>   fail. Upon failure, no further commands will be processed until
>   ImpersonateDatabaseUser succeeds.
>
> seems to require adding a huge amount of complication on the server side,
> and complication in the protocol spec itself, to save a rather minimal
> amount of complication in the middleware.  Why can't we just say that
> a failed "impersonate" command leaves the session in the same state
> as before, and it's up to the pooler to do something about it?  We are
> in any case trusting the pooler not to send commands from user A to
> a session logged in as user B.

When combined with the 0-RTT goal, I think a silent ignore would just
invite more security problems. Todd is effectively proposing packet
pipelining, so the pipeline has to fail shut.

A more modern approach might be to attach the authentication to the
packet itself (e.g. cryptographically, with a MAC), if the goal is to
enable per-statement authentication anyway. In theory that turns the
middleware into a message passer instead of a confusable deputy. But it
requires more complicated setup between the client and server.

> PS: I wonder how we test such a feature meaningfully without
> incorporating a pooler right into the Postgres tree.  I don't
> want to do that, for sure.

Having protocol-level tests for bytes on the wire would not only help
proposals like this but also get coverage for a huge number of edge
cases. Magnus has added src/test/protocol for the server, written in
Perl, in his PROXY proposal. And I've added a protocol suite for both
the client and server, written in Python/pytest, in my OAuth proof of
concept. I think something is badly needed in this area.

--Jacob


--
--
Todd Hubers


--
--
Todd Hubers


--
--
Todd Hubers
Hi Everyone,

Here is a progress update. I have an established team of 2 fulltime systems programmers who have been working on this area for a couple of months now.
  • Past
    • Impersonation - a prototype has been completed for Option-1 "Impersonation"
    • Benchmarking - has been completed on a range of options, including Option-0 "Reconnection".
    • Both Impersonation and the Benchmarking is currently on the backburner
  • Current: Notification Concentrator - This is not PostgreSQL Codebase work. This project makes NOTIFY/LISTEN work in Odyssey (and others) while in Transaction mode. (Until now, NOTIFY/LISTEN can only work in SESSION mode). We intend to also build patches for PgBouncer, and other popular Connection Pool systems.
    • It works in Session mode, but my organisation needs it to work in Transaction mode. It works by intercepting LISTEN/UNLISTEN in SQL and redirecting them to a single shared connection. There will be a Pub/Sub system within Odyssey. The LISTEN/UNLISTEN is only sent for the first subscriber or last unsubscriber accordingly. The NOTIFICATION messages are then dispatched to the Subscriber list. At most only one SESSION connection is required.
  • Next:
    • Update Benchmarking: I then expect to update Benchmarks with a range of prototype solutions, with both Impersonation and Notification Concentrator for final review.
    • Publishing Benchmarking: I will send our results here, and offer a patch for such benchmarking code.
    • Final Implementation: The team will finalise code for production-grade implementations, and tests
    • Patches: Then my team will submit a patch for PostgreSQL, Odyssey and others; working to polish anything else that might be required of us.
Todd

On Wed, 2 Feb 2022 at 10:56, Todd Hubers <todd.hubers@gmail.com> wrote:
Hi Everyone,

Benchmarking work has commenced, and is ongoing.
  • OPTIONS 5/6/7 - `SET SESSION AUTHORIZATION` takes double the time of a single separate SimpleQuery. This is to be expected, because double the amount of SimpleQuery messages are being sent, and that requires a full SimpleQuery/Result/Ready cycle. If there is significant latency between a Connection Pooler and the database, this delay is amplified. It would be possible to concatenate text into a single SimpleQuery. In the real world, the performance impact MAY be negligible.
  • OPTION 0 - The time to reconnect (start a new connection from scratch with a different username/password) was found to be faster than using `SET SESSION AUTHORIZATION`.
  • OPTION 1 - My team is continuing to explore a distinct Impersonate message (Option-1). We are completing a prototype-quality implementation, and then benchmarking it. Given that Option-1 is asynchronous (Request and expect to succeed) and it can even be included within the same TCP packet as the SimpleQuery (at times), we expect the performance will be better than restarting a connection, and not impacted by links of higher latency.
I will be recording benchmark results in the document: https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit# after completion of the OPTION-1 prototype and benchmarking of that prototype.

Note: In order to accommodate something like OPTION-8, an Impersonation message might have a flag (valid for 1x SimpleQuery only, then automatically restore back to the last user).

Regards,

Todd


On Fri, 7 Jan 2022 at 10:55, Todd Hubers <todd.hubers@gmail.com> wrote:
Hi Everyone,

I have started working on this:
  • Benchmarking - increasingly more comprehensive benchmarking
  • Prototyping - to simulate the change of users (toggling back and forth)
  • Draft Implementation - of OPTION-1 (New Protocol Message)
  • (Then: Working with Odyssey and PgBouncer to add support (when the GRANT role privilege is available))
I hope to have a patch ready by the end of March.

Regards,

Todd

On Wed, 24 Nov 2021 at 02:46, Todd Hubers <todd.hubers@gmail.com> wrote:

Hi Jacob and Daniel,

Thanks for your feedback.

>@Daniel - I think thats conflating session_user and current_user, SET ROLE is not a login event. This is by design and discussed in the documentation..

Agreed, I am using those terms loosely. I have updated option 4 in the proposal document. I have crossed it out. Option 5 is more suitable "SET SESSION AUTHORIZATION" for further consideration.

>@Daniel - but it's important to remember that we need to cover the functionality in terms of *tests* first, performance benchmarking is another concern.

For implementation absolutely, but not for a basic feasibility prototype. A quick non-secure non-reliable prototype is probably an important first-step to confirming which options work best for the stated goals. Importantly, if the improvement is only 5% (whatever that might mean), then the project is probably not work starting. But I do expect that a benchmark will prove benefits that justify the resources to build the feature(s).

>@Jacob - A more modern approach might be to attach the authentication to the packet itself (e.g. cryptographically, with a MAC), if the goal is to enable per-statement authentication anyway. In theory that turns the middleware into a message passer instead of a confusable deputy. But it requires more complicated setup between the client and server.

I did consider this, but I ruled it out. I have now added it to the proposal document, and included two Issues. Please review and let me know whether I might be mistaken.

>@Jacob - Having protocol-level tests for bytes on the wire would not only help proposals like this but also get coverage for a huge number of edge cases. Magnus has added src/test/protocol for the server, written in Perl, in his PROXY proposal. And I've added a protocol suite for both the client and server, written in Python/pytest, in my OAuth proof of concept. I think something is badly needed in this area.

Thanks for highlighting this emerging work. I have noted this in the proposal in the Next Steps section.

--Todd


On Tue, 23 Nov 2021 at 12:12, Jacob Champion <pchampion@vmware.com> wrote:
On Sat, 2021-11-20 at 16:16 -0500, Tom Lane wrote:
> One more point is that the proposed business about
>
> * ImpersonateDatabaseUser will either succeed silently (0-RTT), or
>   fail. Upon failure, no further commands will be processed until
>   ImpersonateDatabaseUser succeeds.
>
> seems to require adding a huge amount of complication on the server side,
> and complication in the protocol spec itself, to save a rather minimal
> amount of complication in the middleware.  Why can't we just say that
> a failed "impersonate" command leaves the session in the same state
> as before, and it's up to the pooler to do something about it?  We are
> in any case trusting the pooler not to send commands from user A to
> a session logged in as user B.

When combined with the 0-RTT goal, I think a silent ignore would just
invite more security problems. Todd is effectively proposing packet
pipelining, so the pipeline has to fail shut.

A more modern approach might be to attach the authentication to the
packet itself (e.g. cryptographically, with a MAC), if the goal is to
enable per-statement authentication anyway. In theory that turns the
middleware into a message passer instead of a confusable deputy. But it
requires more complicated setup between the client and server.

> PS: I wonder how we test such a feature meaningfully without
> incorporating a pooler right into the Postgres tree.  I don't
> want to do that, for sure.

Having protocol-level tests for bytes on the wire would not only help
proposals like this but also get coverage for a huge number of edge
cases. Magnus has added src/test/protocol for the server, written in
Perl, in his PROXY proposal. And I've added a protocol suite for both
the client and server, written in Python/pytest, in my OAuth proof of
concept. I think something is badly needed in this area.

--Jacob


--
--
Todd Hubers


--
--
Todd Hubers


--
--
Todd Hubers


--
--
Todd Hubers