Re: Slowness of extended protocol - Mailing list pgsql-hackers

From Vladimir Sitnikov
Subject Re: Slowness of extended protocol
Msg-id CAB=Je-EN0X4Qux155BLVFs-32pO6qiD2W=WVkpGTQS7fn22kNQ@mail.gmail.com
In response to Re: Slowness of extended protocol  (Shay Rojansky <roji@roji.org>)
Responses Re: Slowness of extended protocol  (Shay Rojansky <roji@roji.org>)
List pgsql-hackers
Shay Rojansky <roji@roji.org>:
> Ah, I understand the proposal better now - you're not proposing encoding a new message type in an old one, but rather a magic statement name in Parse which triggers an optimized processing path in PostgreSQL, that wouldn't go through the query cache etc.

Exactly.
 
> If so, isn't that what the empty statement is already supposed to do? I know there are already some optimizations in place around the scenario of an empty statement name (and empty portal).

The problem with the empty statement name is that statements with an empty name can be reused (for instance, for batch insert executions), so the server side has to make a defensive copy of the plan (it cannot predict how many times this unnamed statement will be used).
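
To make the reuse concrete, here is a minimal Python sketch of the frontend messages for such a batch insert, assuming the protocol v3 layouts for Parse/Bind/Execute/Sync. The table `t` and all helper names are hypothetical. The unnamed statement "" is parsed once and then bound and executed once per row, which is exactly why the server cannot treat it as single-use.

```python
import struct


def _msg(tag: bytes, body: bytes) -> bytes:
    # Frontend message: 1-byte type tag, then an Int32 length that counts
    # itself and the body but not the tag.
    return tag + struct.pack("!i", len(body) + 4) + body


def _cstr(s: str) -> bytes:
    return s.encode() + b"\x00"  # null-terminated string


def parse(stmt: str, query: str) -> bytes:
    # No parameter type OIDs are sent: the server infers them.
    return _msg(b"P", _cstr(stmt) + _cstr(query) + struct.pack("!h", 0))


def bind(portal: str, stmt: str, params: list) -> bytes:
    body = _cstr(portal) + _cstr(stmt)
    body += struct.pack("!h", 0)  # no parameter format codes: all text
    body += struct.pack("!h", len(params))
    for p in params:
        data = p.encode()
        body += struct.pack("!i", len(data)) + data
    body += struct.pack("!h", 0)  # no result format codes: all text
    return _msg(b"B", body)


def execute(portal: str) -> bytes:
    return _msg(b"E", _cstr(portal) + struct.pack("!i", 0))  # 0 = no row limit


def sync() -> bytes:
    return _msg(b"S", b"")


def batch_insert(rows: list) -> list:
    # Parse the unnamed statement "" once, then Bind/Execute it once per row.
    msgs = [parse("", "INSERT INTO t(a) VALUES ($1)")]
    for row in rows:
        msgs.append(bind("", "", [row]))
        msgs.append(execute(""))
    msgs.append(sync())
    return msgs
```

For three rows this yields P, B, E, B, E, B, E, S: a single Parse whose plan is executed three times.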

Shay Rojansky <roji@roji.org>: 
> Also, part of the point here is to reduce the number of protocol messages needed to send a parameterized query (not having to do Parse/Describe/Bind/Execute/Sync), since part of the slowdown comes from that (although I'm not sure how much). Your proposal keeps the 5 messages.

As my benchmarks show, a notable part of the overhead is due to the defensive copying of the execution plan. So I would measure first, and only then claim where the overhead is.

More profiling is required to tell which part is the main time consumer.
Technically speaking, I would prefer a more realistic example than SELECT 1.
Do you have something in mind?
For instance, for more complex queries, Parse/Plan could cost much more than what we shave off by adding a special non-cached statement name or by collapsing 5 messages into 1.

There's a side problem: the Describe message requires a full round trip, since there are cases when the client needs to know how to encode parameters. An additional round trip hurts much more than an additional message that is pipelined (e.g. sent in the same TCP packet).
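
The difference can be sketched in Python, again assuming the protocol v3 message layouts; the helper names are hypothetical and real drivers may split the flights differently. Each list element is one write after which the client waits for the server, so the list length is the number of round trips.

```python
import struct


def _msg(tag: bytes, body: bytes) -> bytes:
    # Type tag + Int32 length (including itself) + body.
    return tag + struct.pack("!i", len(body) + 4) + body


def _cstr(s: str) -> bytes:
    return s.encode() + b"\x00"


def parse(stmt: str, query: str) -> bytes:
    return _msg(b"P", _cstr(stmt) + _cstr(query) + struct.pack("!h", 0))


def describe(kind: bytes, name: str) -> bytes:
    # kind b"S": prepared statement (reply: ParameterDescription, then
    # RowDescription or NoData); kind b"P": portal (reply: RowDescription or NoData).
    return _msg(b"D", kind + _cstr(name))


def bind(stmt: str, params: list) -> bytes:
    # Unnamed portal, no parameter format codes (text), then the values.
    body = _cstr("") + _cstr(stmt) + struct.pack("!hh", 0, len(params))
    for p in params:
        data = p.encode()
        body += struct.pack("!i", len(data)) + data
    return _msg(b"B", body + struct.pack("!h", 0))


def execute() -> bytes:
    return _msg(b"E", _cstr("") + struct.pack("!i", 0))


def sync() -> bytes:
    return _msg(b"S", b"")


def one_flight(query: str, params: list) -> list:
    # Parameter encoding is known up front: all five messages share one write.
    return [parse("", query) + bind("", params) + describe(b"P", "") + execute() + sync()]


def two_flights(query: str, params: list) -> list:
    # Parameter encoding is unknown: the client must read ParameterDescription
    # before it can build Bind, which costs an extra round trip.
    first = parse("", query) + describe(b"S", "") + sync()
    second = bind("", params) + execute() + sync()
    return [first, second]


def tags(buf: bytes) -> str:
    # Walk a buffer of frontend messages and return their type tags.
    out, i = [], 0
    while i < len(buf):
        out.append(chr(buf[i]))
        (length,) = struct.unpack("!i", buf[i + 1 : i + 5])
        i += 1 + length
    return "".join(out)
```

Here `one_flight` produces a single buffer tagged "PBDES", while `two_flights` produces "PDS" followed by "BES": the message count barely changes, but the client now waits for the server twice.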

Shay Rojansky <roji@roji.org>: 
> But people seem to be suggesting that a significant part of the overhead comes from the fact that there are 5 messages, meaning there's no way to optimize this without a new message type.

Of course 5 messages are slower than 1 message.
However, that does not mean "there's no way to optimize without a new message type".
Profiling can easily reveal which parts consume the time; then we can decide whether there's a solution.
Note: if we improve "SELECT 1" by 10%, it does not mean we have improved statement execution in general by 10%. Real-life statements matter for proper profiling and discussion.
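
The arithmetic behind that note, as a small Python helper. It assumes the saving is a fixed per-statement cost that does not grow with query complexity, and the timings used with it (100 µs for SELECT 1, 1 ms for a real statement) are made-up illustrations, not measurements.

```python
def relative_gain(select1_us: float, improvement: float, statement_us: float) -> float:
    """Share of a statement's time removed by a saving measured on SELECT 1.

    Assumes the saving is a fixed per-statement cost (protocol/bookkeeping)
    that does not grow with query complexity.
    """
    saved_us = select1_us * improvement  # absolute time shaved per statement
    return saved_us / statement_us
```

With those hypothetical numbers, a 10% win on a 100 µs SELECT 1 saves 10 µs, which is only about 1% of a 1 ms statement (`relative_gain(100, 0.10, 1000)`).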

Shay Rojansky <roji@roji.org>:
>> Note: it is quite easy to invent a name that is not yet used in the wild, so it is safe.

> That's problematic: how do you know what's being used in the wild and what isn't? The protocol has a specification; it's very problematic to get up one day and change it retroactively. But again, the empty statement seems to already be there for that.

The empty statement has different semantics, and it is widely used.
For instance, pgjdbc uses unnamed statements a lot.
On the other hand, a statement name like "!pq@#!@#42" is rather safe to use as a special case.
Note: statement names are not typically created by humans (a statement name is not SQL), and very few PG clients support named statements.
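
A toy Python model of how such a special case could be dispatched. Everything here is hypothetical: `ONE_SHOT_NAME`, `Plan` and `Session` are illustrative names, and this does not reflect PostgreSQL's actual code. It only shows the idea that a statement known to be single-use can hand its plan straight to the executor, while unnamed and named statements must be copied because they may be executed again.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field

# Hypothetical magic name taken from the proposal; not part of the real protocol.
ONE_SHOT_NAME = "!pq@#!@#42"


@dataclass
class Plan:
    query: str
    nodes: list = field(default_factory=lambda: ["planned"])


class Session:
    """Toy model of per-connection statement storage (not PostgreSQL's code)."""

    def __init__(self) -> None:
        self.named: dict[str, Plan] = {}
        self.unnamed: Plan | None = None
        self.one_shot: Plan | None = None
        self.copies = 0  # defensive plan copies made so far

    def parse(self, name: str, query: str) -> None:
        plan = Plan(query)  # stand-in for parse analysis + planning
        if name == ONE_SHOT_NAME:
            self.one_shot = plan
        elif name == "":
            self.unnamed = plan  # may be bound and executed many times
        else:
            self.named[name] = plan

    def execute(self, name: str) -> Plan:
        if name == ONE_SHOT_NAME:
            # Used exactly once, so the executor may consume it directly.
            plan, self.one_shot = self.one_shot, None
        else:
            source = self.unnamed if name == "" else self.named[name]
            # Reusable statement: copy so the stored plan stays pristine.
            plan = copy.deepcopy(source)
            self.copies += 1
        plan.nodes.append("executed")  # stand-in for the executor modifying the plan
        return plan
```

Executing the magic-named statement makes no copies, whereas two executions of the unnamed statement make two copies and leave the stored plan unchanged.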

Vladimir
