Re: POC: Extension for adding distributed tracing - pg_tracing - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: POC: Extension for adding distributed tracing - pg_tracing |
Date | |
Msg-id | 787b5ad0-f727-473c-ba25-5b9f6813fb80@iki.fi Whole thread Raw |
In response to | Re: POC: Extension for adding distributed tracing - pg_tracing (Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>) |
Responses |
Re: POC: Extension for adding distributed tracing - pg_tracing
|
List | pgsql-hackers |
Hi Anthonin, I'm only now catching up on this thread. Very exciting feature! My first observation is that you were able to implement this purely as an extension, without any core changes. That's very cool, because: - You don't need buy-in or blessing from anyone else. You can just put this on github and people can immediately start using it. - As an extension, it's not tied to the PostgreSQL release cycle. Users can immediately start using it with existing PostgreSQL versions. There are benefits to being in contrib/ vs. an extension outside the PostgreSQL source tree, but there are significant downsides too. Out-of-tree, you get more flexibility, and you can develop much faster. I think this should live out-of-tree, at least until it's relatively stable. By stable, I mean "not changing much", not that it's bug-free. # Core changes That said, there are a lot of things that would make sense to integrate in PostgreSQL itself. For example: - You're relying on various existing hooks, but it might be better to have the 'begin_span'/'end_span' calls directly in PostgreSQL source code in the right places. There are more places where spans might be nice, like when performing a large sort in tuplesort.c to name one example. Or even across every I/O; not sure how much overhead the spans incur and how much we'd be willing to accept. 'begin_span'/'end_span' could be new hooks that normally do nothing, but your extension would implement them. - Other extensions could include begin_span/end_span calls too. (It's complicated for an extension to call functions in another extension.) - Extensions like postgres_fdw could get the tracing context and propagate it to the remote system too. - How to pass the tracing context from the client? You went with SQLCommenter and others proposed a GUC. A protocol extension would make sense too. I can see merits to all of those. It probably would be good to support multiple mechanisms, but some might need core changes. It might be good to implement the mechanism for accepting trace context in core. Without the extension, we could still include the trace ID in the logs, for example. # Further ideas Some ideas on cool extra features on top of this: - SQL functions to begin/end a span. Could be handy if you have complicated PL/pgSQL functions, for example. - Have spans over subtransactions. - Include elog() messages in the traces. You might want to have a lower elevel for what's included in traces; something like the client_min_messages and log_min_messages settings. - Include EXPLAIN plans in the traces. # Comments on the implementation There was discussion already on push vs pull model. Currently, you collect the traces in memory / files, and require a client to poll them. A push model is more common in applications that support tracing. If Postgres could connect directly to the OTEL collector, you'd need one fewer running component. You could keep the traces in backend-private memory, no need to deal with shared memory and spilling to files. Admittedly supporting the OTEL wire protocol is complicated unless you use an existing library. Nevertheless, it might be a better tradeoff. OTEL/HTTP with JSON format seems just about feasible to implement from scratch. Or bite the bullet and use some external library. If this lives as an extension, you have more freedom to rely on 3rd party libraries. (If you publish this as an out-of-tree extension, then this is up to you, of course, and doesn't need consensus here on pgsql-hackers) # Suggested next steps with this Here's how I'd suggest to proceed with this: 1. Publish this as an extension on github, as it is. I think you'll get a lot of immediate adoption. 2. Start a new pgsql-hackers thread on in-core changes that would benefit the extension. Write patches for them. I'm thinking of the things I listed in the Core changes section above, but maybe there are others. PS. Does any other DBMS implement this? Any lessons to be learned from them? -- Heikki Linnakangas Neon (https://neon.tech)
pgsql-hackers by date: