Thread: protocol support for labels

protocol support for labels

From
Jeremy Schneider
Date:
pgconf.dev is coming up soon. I won't be able to make it to Montreal,
but I saw that Dave Cramer posted on twitter asking about postgres
protocol topics.

while I don't have a patch, I wanted to send an email about something:
it'd be great to have a place - perhaps like application_name - for
arbitrary labels; and we need protocol support to pass those labels
inline with the query instead of having to require a separate round
trip. it would also be good if these labels were somehow accessible in
logging formats.

observability frameworks like OpenTelemetry support tracing through all
layers of a stack, and trace_ids can even be passed into sql with
extensions like sqlcommenter. however sqlcommenter puts the trace_id
into a comment which effectively breaks all the utility of
pg_stat_statements since each query has a unique trace_id.

if protocol enhancements are on the table, I think it would be great
for this topic to get a little discussion.

-Jeremy



Re: protocol support for labels

From
Kirill Reshke
Date:
On Tue, 11 Mar 2025 at 11:09, Jeremy Schneider <schneider@ardentperf.com> wrote:

> observability frameworks like OpenTelemetry support tracing through all
> layers of a stack, and trace_ids can even be passed into sql with
> extensions like sqlcommenter. however sqlcommenter puts the trace_id
> into a comment which effectively breaks all the utility of
> pg_stat_statements since each query has a unique trace_id.
>
There are some other use-cases:
1) RO-RW routing. Users can pass target-session-attrs to the server
within query labels to hint about its need. Useful when some kind of
proxy (odyssey,pgbouncer,spqr,pgpool II, pgcat, pg_doorman) is used.
2) pg_comment_stats uses comments in the query to accumulate statistics. [0]

However, I don't think PostgreSQL community is open for this big
change for much (non-big) matters.

[0] git@github.com:munakoiso/pg_comment_stats.git

-- 
Best regards,
Kirill Reshke



Re: protocol support for labels

From
Jeremy Schneider
Date:
> On Mar 11, 2025, at 3:03 AM, Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> On Tue, 11 Mar 2025 at 11:09, Jeremy Schneider <schneider@ardentperf.com> wrote:
>
>> observability frameworks like OpenTelemetry support tracing through all
>> layers of a stack, and trace_ids can even be passed into sql with
>> extensions like sqlcommenter. however sqlcommenter puts the trace_id
>> into a comment which effectively breaks all the utility of
>> pg_stat_statements since each query has a unique trace_id.
>>
> There are some other use-cases:
> 1) RO-RW routing. Users can pass target-session-attrs to the server
> within query labels to hint about its need. Useful when some kind of
> proxy (odyssey,pgbouncer,spqr,pgpool II, pgcat, pg_doorman) is used.
> 2) pg_comment_stats uses comments in the query to accumulate statistics. [0]


Thinking a bit more, my root issue is specifically around pg_stat_statements so maybe it’s also solvable with some
changesto how query jumbling is done 

But that topic seems like one where we’d never get consensus

Should query jumbling for calculating query_id be customizable somehow? How would we resolve multiple concurrent
opinionsabout how queries should be jumbled (eg if comment_stats needs different tweaks than sqlcommenter)? Was there
previousdiscussion about this already? I’ll need to go search mailing list history a bit 

-Jeremy


Sent from my TI-83


Re: protocol support for labels

From
Frits Hoogland
Date:
The usecase that I think might be useful is to have a database client send metadata along with a query.
This partially is possible today by setting application_name, but that is a separate request, it would be great if that could be sent along with the query in one go.
Another option to pass metadata is to add a comment (/* .. */), but a comment cannot be set for a prepared statement, because the statement is prepared first and then later invoked on runtime, which executes a query that is fixed.

Frits Hoogland




On 11 Mar 2025, at 15:49, Jeremy Schneider <schneider@ardentperf.com> wrote:


On Mar 11, 2025, at 3:03 AM, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 11 Mar 2025 at 11:09, Jeremy Schneider <schneider@ardentperf.com> wrote:

observability frameworks like OpenTelemetry support tracing through all
layers of a stack, and trace_ids can even be passed into sql with
extensions like sqlcommenter. however sqlcommenter puts the trace_id
into a comment which effectively breaks all the utility of
pg_stat_statements since each query has a unique trace_id.

There are some other use-cases:
1) RO-RW routing. Users can pass target-session-attrs to the server
within query labels to hint about its need. Useful when some kind of
proxy (odyssey,pgbouncer,spqr,pgpool II, pgcat, pg_doorman) is used.
2) pg_comment_stats uses comments in the query to accumulate statistics. [0]


Thinking a bit more, my root issue is specifically around pg_stat_statements so maybe it’s also solvable with some changes to how query jumbling is done

But that topic seems like one where we’d never get consensus

Should query jumbling for calculating query_id be customizable somehow? How would we resolve multiple concurrent opinions about how queries should be jumbled (eg if comment_stats needs different tweaks than sqlcommenter)? Was there previous discussion about this already? I’ll need to go search mailing list history a bit

-Jeremy


Sent from my TI-83


Re: protocol support for labels

From
Dave Cramer
Date:

Dave Cramer
www.postgres.rocks


On Tue, 11 Mar 2025 at 12:23, Frits Hoogland <frits.hoogland@gmail.com> wrote:
The usecase that I think might be useful is to have a database client send metadata along with a query.
This partially is possible today by setting application_name, but that is a separate request, it would be great if that could be sent along with the query in one go.
Another option to pass metadata is to add a comment (/* .. */), but a comment cannot be set for a prepared statement, because the statement is prepared first and then later invoked on runtime, which executes a query that is fixed.

Frits Hoogland




On 11 Mar 2025, at 15:49, Jeremy Schneider <schneider@ardentperf.com> wrote:


On Mar 11, 2025, at 3:03 AM, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 11 Mar 2025 at 11:09, Jeremy Schneider <schneider@ardentperf.com> wrote:

observability frameworks like OpenTelemetry support tracing through all
layers of a stack, and trace_ids can even be passed into sql with
extensions like sqlcommenter. however sqlcommenter puts the trace_id
into a comment which effectively breaks all the utility of
pg_stat_statements since each query has a unique trace_id.

There are some other use-cases:
1) RO-RW routing. Users can pass target-session-attrs to the server
within query labels to hint about its need. Useful when some kind of
proxy (odyssey,pgbouncer,spqr,pgpool II, pgcat, pg_doorman) is used.
2) pg_comment_stats uses comments in the query to accumulate statistics. [0]


Thinking a bit more, my root issue is specifically around pg_stat_statements so maybe it’s also solvable with some changes to how query jumbling is done

But that topic seems like one where we’d never get consensus

Should query jumbling for calculating query_id be customizable somehow? How would we resolve multiple concurrent opinions about how queries should be jumbled (eg if comment_stats needs different tweaks than sqlcommenter)? Was there previous discussion about this already? I’ll need to go search mailing list history a bit

-Jeremy


Sent from my TI-83


Jeremy,

Thanks for this. I am hoping to get consensus of ideas for changes to the protocol. 
Please join the discussion here https://discord.gg/bWum3hbM as well.

Krill, change starts with requests and without them nothing will happen. Please post your ideas.

Dave 

Re: protocol support for labels

From
Nico Williams
Date:
On Tue, Mar 11, 2025 at 05:23:14PM +0100, Frits Hoogland wrote:
> The usecase that I think might be useful is to have a database client send metadata along with a query.
> This partially is possible today by setting application_name, but that is a separate request, it would be great if
thatcould be sent along with the query in one go.
 
> Another option to pass metadata is to add a comment (/* .. */), but a comment cannot be set for a prepared statement,
becausethe statement is prepared first and then later invoked on runtime, which executes a query that is fixed.
 

How about using a `set_config()` to deonte the "application_name" (and
any other details) for the _next_ query, then have those details appear
in the pg_stat_statements rows and logs?

Clients would send a `SELECT set_config(...)` and also the next query
one after the other without waiting for the response to the first.  The
server could similarly batch the two responses.

Look ma', no protocol change.

Nico
-- 



Re: protocol support for labels

From
Jeremy Schneider
Date:
On Tue, 11 Mar 2025 14:03:12 -0500
Nico Williams <nico@cryptonector.com> wrote:
> How about using a `set_config()` to deonte the "application_name" (and
> any other details) for the _next_ query, then have those details
> appear in the pg_stat_statements rows and logs?
> 
> Clients would send a `SELECT set_config(...)` and also the next query
> one after the other without waiting for the response to the first.
> The server could similarly batch the two responses.

Isn't multiple queries in one packet only possible with the simple
protocol, but not possible with the extended protocol? So this would be
entirely incompatible with prepared/parameterized statements?

-Jeremy




Re: protocol support for labels

From
Nico Williams
Date:
On Wed, Mar 12, 2025 at 11:39:57PM -0700, Jeremy Schneider wrote:
> Isn't multiple queries in one packet only possible with the simple
> protocol, but not possible with the extended protocol? So this would be
> entirely incompatible with prepared/parameterized statements?

It's TCP.  Packet and segment boundaries are not exposed to the
application, so PG doesn't know that you sent "two queries in one
packet" (or in one segment).  It just reads one query, processes it,
then it reads the next one, processes it, etc.

So all that's needed here is for a well-defined GUC that can be set with
set_config() and whose purpose is to decorate subsequent queries in
pg_stat_statements and/or logs.  You could set it once for all your
queries or once per-query -- whatever.

As long as the client-side API can batch one of these calls to
set_config() followed by the actual query you want to do in the same
write() on the socket (TCP_CORK helps) then this idea will be barely
noticeable performance-wise.

Nico
--