Thread: [HACKERS] Adding connection id in the startup message
As a cloud service, Azure Database for PostgreSQL uses a gateway proxy to route connections to a node hosting the actual server. Potentially there could be multiple hops (for example client, optional proxy at the client like pgbouncer for connection pooling, Azure gateway proxy, backend server) in between the client, and the server. For various reasons (client firewall rules, network issues etc.), the connection can be dropped before it is fully authenticated at one of these hops, and it becomes extremely difficult to say where and why the connection is dropped.
The proposal is to tweak the connectivity wire protocol, and add a connection id (GUID) filed in the startup message. We can trace the connection using this GUID and investigate further on where the connection failed.
Client adds a connection id in the startup message and send it to the server it is trying to connect to. Proxy logs the connection id information in its logs, and passes it to the server. Server logs the connection Id in the server log, and set it in the GUC variable (ConnectionId).
When an attempt to connection to the server fails, the connection failed message must include the connection id in the message. This Id can be used to trace the connection end to end.
Customers can provide this Id to the support team to investigate the connectivity issues to the server, along with the server information.
This field can be an optional field driven by the connection parameters for PSql (-C or--include-clientId).
P.S: I am looking for initial feedback on this idea and can provide more design details if needed.
Thanks,
Satya
On 6/15/17 03:11, Satyanarayana Narlapuram wrote: > Client adds a connection id in the startup message and send it to the > server it is trying to connect to. Proxy logs the connection id > information in its logs, and passes it to the server. Server logs the > connection Id in the server log, and set it in the GUC variable > (ConnectionId). If you're using pgbouncer or similar, then it might reuse an already existing connection, so your connection ID wouldn't be passed on. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 6/15/17 03:11, Satyanarayana Narlapuram wrote: > > Client adds a connection id in the startup message and send it to the > > server it is trying to connect to. Proxy logs the connection id > > information in its logs, and passes it to the server. Server logs the > > connection Id in the server log, and set it in the GUC variable > > (ConnectionId). > > If you're using pgbouncer or similar, then it might reuse an already > existing connection, so your connection ID wouldn't be passed on. This tends to be what people (as pgbouncer does) use application_name for. Thanks! Stephen
Satyanarayana Narlapuram <Satyanarayana.Narlapuram@microsoft.com> writes: > As a cloud service, Azure Database for PostgreSQL uses a gateway proxy to route connections to a node hosting the actualserver. Potentially there could be multiple hops (for example client, optional proxy at the client like pgbouncer forconnection pooling, Azure gateway proxy, backend server) in between the client, and the server. For various reasons (clientfirewall rules, network issues etc.), the connection can be dropped before it is fully authenticated at one of thesehops, and it becomes extremely difficult to say where and why the connection is dropped. > The proposal is to tweak the connectivity wire protocol, and add a connection id (GUID) filed in the startup message. Wecan trace the connection using this GUID and investigate further on where the connection failed. > Client adds a connection id in the startup message and send it to the server it is trying to connect to. Proxy logs theconnection id information in its logs, and passes it to the server. Server logs the connection Id in the server log, andset it in the GUC variable (ConnectionId). > When an attempt to connection to the server fails, the connection failed message must include the connection id in themessage. This Id can be used to trace the connection end to end. > Customers can provide this Id to the support team to investigate the connectivity issues to the server, along with theserver information. This seems like a lot of added mechanism for not very much gain. In particular, it wouldn't help at all unless the client side were also on board with generating a connection UUID and making it visible to the end user, and then you'd have to get proxy authors on board, etc etc, so you have to sell the idea to a lot more people than just the server hackers. Can you give a concrete example where this would have helped above and beyond knowing, eg, the source and time of the connection attempt? regards, tom lane
On Thu, Jun 15, 2017 at 3:11 AM, Satyanarayana Narlapuram <Satyanarayana.Narlapuram@microsoft.com> wrote: > The proposal is to tweak the connectivity wire protocol, and add a > connection id (GUID) filed in the startup message. We can trace the > connection using this GUID and investigate further on where the connection > failed. Wire protocol changes are scary, because they can result in a need to update every client connector and every bit of middleware that speaks PostgreSQL, not just the server. If we thought this feature had enough value to justify adding it, we could add it as an optional feature so that existing clients don't break. But if we added it and only libpq and the server ended up supporting it, that would be a disappointing outcome. > Client adds a connection id in the startup message and send it to the server > it is trying to connect to. Proxy logs the connection id information in its > logs, and passes it to the server. Server logs the connection Id in the > server log, and set it in the GUC variable (ConnectionId). Are you imagining that this would be done by the client or by the driver the client is using? If the latter, it's transparent but maybe useless, because the client won't even know that this ID got created. If it's done by the client code proper, then it only works if not only the server and every bit of middleware and every connector but also every client application in the world is updated, which does not seem like a thing that is likely to occur. > This field can be an optional field driven by the connection parameters for > PSql (-C or--include-clientId). This makes it sound like you're imagining it being added by an explicit action on the part of the client, which means we're in update-every-PostgreSQL-application-in-the-world territory. > P.S: I am looking for initial feedback on this idea and can provide more > design details if needed. I think you need to build a more compelling case for why this would be useful, why application_name isn't the right answer, and how we avoid forcing everybody in the world to worry about this new thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 15, 2017 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Can you give a concrete example where this would have > helped above and beyond knowing, eg, the source and time of the connection > attempt? I can imagine that in really high-volume use cases (such as the OP apparently has) the number of client connections might be so large that identification by timestamp isn't useful, and the source IP will be obscured after the first hop through a connection pooler or other middleware. If you've got 100 connections per second coming in, matching things up by timestamp across different machines is going to be tough. But I agree with your other concerns. I think the problem is real, but I'm not sure that this is the best solution. On the other hand, I'm also not entirely sure I understand the proposal yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jun 21, 2017 at 12:15 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I think the problem is real, > but I'm not sure that this is the best solution. On the other hand, > I'm also not entirely sure I understand the proposal yet. Given the problems with changing the protocol it does seem like generalizing away from "connection id" to "arbitrary payload" would be useful. Like, add a new "attachment map" area where any application-aware node that encounters the message can attach data. If some node further downstream sends a response message the contents of the attachment map would be sent back as-is. The advantage of "connection id" is that the messages are still of fixed size whereas the more flexible arrangement would necessitate that message sizes vary. In return middleware gets a place to store session information that can be used more broadly than a simple externally generated connection id. David J.