Re: Slowness of extended protocol - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: Slowness of extended protocol
Date
Msg-id CADT4RqBCMvRzZHO6iomBpdOT_=9nztNsYw-4rDmA4qtBbegjpw@mail.gmail.com
Whole thread Raw
In response to Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
Responses Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-hackers

Vladimir wrote:

Shay> As Tom said, if an application can benefit from preparing, the developer has the responsibility (and also the best knowledge) to manage preparation, not the driver. Magical behavior under the hood causes surprises, hard-to-diagnose bugs etc.

Why do you do C# then?
Aren't you supposed to love machine codes as the least magical thing?
Even x86 does not describe "the exact way the code should be executed".
All the CPUs shuffle the instructions to make it faster.

These are valid questions. I have several answers to that.

First, it depends on the extent to which the "under the hood" activity may have visible side-effects. CPU instruction reordering is 100% invisible, having no effect whatsoever on the application apart from speeding it up. Prepared statements can have very visible effects apart from the speedup they provide (e.g. failure because of schema changes). It's not that these effects can't be worked around - they can be - but programmers can be surprised by these effects, which can cause difficult-to-diagnose issues. Nobody needs to be really aware of CPU instruction reordering because it will never cause an application issue (barring a CPU bug).

Another important visible effect of preparing a statement is server-side cost (i.e. memory). I'm not sure what the overhead is, but it is there and an irresponsible "prepare everything everywhere" can create a significant resource drain on your server. I know pgjdbc has knobs for controlling how many statements are prepared, but in a large-scale performance-sensitive app a programmer may want to manually decide, on a per-query basis, whether they want to prepare or not.

But the most important question is that this is a choice that should be left to the developer, rather than imposed. In some cases developers *do* drop down to assembly, or prefer to call system calls directly to be in control of exactly what's happening. They should be able to do this. Now, of course you can provide an option which disables implicit preparing, but if you refuse to optimize non-prepared paths you're effectively forcing the programmer to go down the prepared statement path.

Shay>As Tom said, if an application can benefit from preparing, the developer has the responsibility

Does developer have the responsibility to choose between "index scan" and "table seq scan"? So your "developer has the responsibility" is like building on sand.

I'm very glad you raised that point. The programmers indeed outsources that decision to the database, because the database has the most *knowledge* on optimal execution (e.g. which indices are defined). Implicit preparing, on the other hand, has nothing to do with knowledge: as Tom said, the driver knows absolutely nothing about the application, and doesn't have any advantage over the programmer in making that decision.

But note that the DBA *does* have to decide which indices exist and which don't. Would you accept a database that automatically creates indices based on use, i.e. "x queries caused full table scans, so I'll silently create an index here"? I wouldn't, it would be utter nonsense and create lots of unpredictability in terms of performance and resource drain.

My experience shows, that people are very bad at predicting where the performance problem would be.
For 80% (or even 99%) of the cases, they just do not care thinking if a particular statement should be server-prepared or not.
They have absolutely no idea how much resources it would take and so on.

And for those people it's great to have an implicit preparation feature, most probably opt-in. But that shouldn't mean we shouldn't optimize things for the rest.
 
ORMs have no that notion of "this query must be server-prepared, while that one must not be".
And so on.

It is somewhat insane to assume people would use naked SQL. Of course they would use ORMs and alike, so they just would not be able to pass that additional parameter telling if a particular query out of thousands should be server-prepared or not.

Uh, I really wouldn't make statements like that if you want to be taken seriously. Tons of applications use naked SQL and avoid ORMs for many reasons (e.g. performance benefits of hand-crafted SQL), such general statements on what applications do in the world are, well, silly. Again, ORMs are an argument for why implicit preparation should exist (I made that argument myself above), but they're not an argument for why optimizing other paths should be excluded.
 
Vladimir> "cached plan cannot change result type" -- PostgreSQL just fails to execute the server-prepared statement if a table was altered.

Shay>How exactly do users cope with this in pgjdbc? Do they have some special API to flush (i.e. deallocate) prepared statements which they're supposed to use after a DDL?

First of all, pgjdbc does report those problems to hackers.
Unfortunately, it is still "not implemented".
Then, a workaround at pgjdbc side is made.
Here's relevant pgjdbc fix: https://github.com/pgjdbc/pgjdbc/pull/451

It analyzes error code, and if it finds "not_implemented from RevalidateCachedQuery", then it realizes it should re-prepare. Unfortunately, there is no dedicated error code, but at least there's a routine name.

That's exactly the kind of behavior I don't like in libraries/drivers - implicit behavior which may trigger errors, which the library than has to (transparently!) recover from. It makes applications very brittle. What happens if you happen to be in a transaction when this error occurs? When an error occurs, PostgreSQL puts the ongoing transaction in state "failed" and requires you to rollback - how can you possibly recover from that?

Shay>it have been listed many times - pgbouncer

Let's stop discussing pgbouncer issue here?
It has absolutely nothing to do with pgsql-hackers.

You have the right to think so, but I disagree. PostgreSQL is part of an ecosystem, and pgbouncer is a big part of that ecosystem (as are other potential pools).

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: new autovacuum criterion for visible pages
Next
From: Vladimir Sitnikov
Date:
Subject: Re: Slowness of extended protocol