Re: Slowness of extended protocol - Mailing list pgsql-hackers
From: Vladimir Sitnikov
Subject: Re: Slowness of extended protocol
Msg-id: CAB=Je-F-JsWyPvKS=G=wRU=SMoys7NT2LjtQkbSq1Wjawb9w4Q@mail.gmail.com
In response to: Re: Slowness of extended protocol (Shay Rojansky <roji@roji.org>)
Responses: Re: Slowness of extended protocol
List: pgsql-hackers
Vladimir>> Yes, that is what happens.
Vladimir>> The idea is not to mess with GUCs.

Shay> Wow... That is... insane...

Someone might say that "programming languages that enable side effects are insane". Lots of connection pools work by sharing connections, and it is up to the developer to behave well (see "It is not" below).

Shay> it's entirely reasonable for more than one app to use the same pool

Sharing connections between different applications might not be that good an idea. However, I would not agree that an out-of-process connection pool is the only sane way to go. I do admit there might be valid cases for out-of-process pooling, but I do not agree it is the only way. Not only is "in-process" one of the options, the in-process approach is widely used in at least one of the top enterprise languages. If you agree with that, you might agree that an in-process connection pool that serves exactly one application can work in a reasonable fashion even without DISCARD ALL.

Shay> Even with in-process pools it's standard practice (and a good idea) to
Shay> reset state.

It is not. Can you cite where you got that "standard practice is to reset state" from? The Oracle WebLogic application server does not reset connections. The JBoss WildFly application server does not reset connections. The HikariCP connection pool does not reset connections. I can easily continue the list. The above are heavily used Java servers (Hikari, to be exact, is a pool).

Shay> If some part of a big complex application modified state, and then some
Shay> other part happens to get that physical connection, it's extremely hard
Shay> to make sense of things.

Let's not go into a "functional programming" vs "imperative programming" discussion. Of course you might argue that programming in Haskell or OCaml or F# makes it "extremely easy to make sense of things", but that is another discussion entirely.
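To make the "in-process pool without DISCARD ALL" idea concrete, here is a minimal sketch (class and method names are illustrative, not taken from any of the pools named above): connections go back into the idle queue as-is, so session state, including server-prepared statements, survives across borrowers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Minimal in-process pool sketch: connections are reused as-is,
// with no DISCARD ALL / reset query between borrowers. The single
// application using the pool is trusted to leave connections in a
// sane state.
final class TinyPool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;

    TinyPool(Supplier<C> factory) {
        this.factory = factory;
    }

    synchronized C borrow() {
        // Reuse an idle connection if one exists; otherwise open a new one.
        C c = idle.pollFirst();
        return (c != null) ? c : factory.get();
    }

    synchronized void release(C c) {
        // No reset is performed here: the connection goes straight back
        // into the idle queue, keeping server-prepared statements alive.
        idle.addFirst(c);
    }
}
```

The trade-off is exactly the one under discussion: skipping the reset keeps borrow/release cheap and preserves the server-prepared statement cache, at the cost of session state leaking between borrowers within the same application.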
Shay> One note - in Npgsql's implementation of persistent prepared statements,
Shay> instead of sending DISCARD ALL Npgsql sends the commands listed in
Shay> https://www.postgresql.org/docs/current/static/sql-discard.html, except for
Shay> DEALLOCATE ALL. This cleans up state changes but leaves prepared statements
Shay> in place.

OK. At least you concede that "always discarding all the state" might be bad.

Shay> This is somewhat similar to the CPU reordering you keep coming back to -
Shay> it's totally invisible

I would disagree. CPU reordering is easily visible in the multithreaded case. It can easily result in application bugs if the application misses some synchronization. CPU reordering is very visible to regular programmers, and it is a compromise:
1) Developers let the compiler and CPU perform certain "reorderings".
2) Developers agree to follow rules like "missing synchronization might screw things up".
3) As a result, the code executes faster.

Vladimir> Just in case: PostgreSQL does not execute "discard all" on its own.

Shay> Of course it doesn't - it doesn't know anything about connection pooling,
Shay> it only knows about physical connections. When would it execute "discard
Shay> all" on its own?

My point there concerned "pgpool aiming to look like a regular PostgreSQL connection". The point was: "PostgreSQL does not discard on its own, so pgpool should not discard either".

Shay> To enforce isolation, which is maybe the most important way for programs to
Shay> be reliable - but this is a general theme which you don't seem to agree
Shay> with.

If you want to isolate something, you are better off with a per-application connection pool. That way, if a particular application consumes all the connections, it does not impact other applications. If all the applications share the same out-of-process pool, there is a risk of resource hogging.

Shay> Regardless, resetting state doesn't have to have a necessary effect
Shay> on response times/throughput.
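For reference, the partial reset described in Shay's note above — the per-command equivalents of DISCARD ALL from the sql-discard documentation, minus DEALLOCATE ALL — could be built by a driver roughly like this (an illustrative sketch; the exact statement list Npgsql actually sends may differ):

```java
// Sketch of a "reset everything except prepared statements" batch,
// based on the per-command equivalents listed in the DISCARD ALL docs.
// This is illustrative, not the literal list any particular driver sends.
final class PartialReset {
    static String sql() {
        return String.join("; ",
            "CLOSE ALL",                          // close open cursors
            "SET SESSION AUTHORIZATION DEFAULT",  // restore session authorization
            "RESET ALL",                          // reset GUC settings
            "UNLISTEN *",                         // drop LISTEN registrations
            "SELECT pg_advisory_unlock_all()",    // release advisory locks
            "DISCARD PLANS",                      // drop cached query plans
            "DISCARD SEQUENCES",                  // drop cached sequence state
            "DISCARD TEMP");                      // drop temporary tables
        // DEALLOCATE ALL is deliberately omitted, so server-prepared
        // statements survive the reset.
    }
}
```

A driver would send this batch on connection release (or checkout) in place of DISCARD ALL, paying the reset cost while keeping the server-prepared statement cache warm.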
Even if you do not reset prepared statements, a "reset query" takes time. For instance, there is a common requirement to "validate connections before use": the pooler should ensure a connection is working before handing it to the application. Both the WebLogic server and HikariCP have this connection validation built in, enabled by default. However, it turns out that connection validation takes too much time and is visible in application response times, so they both implement a grace period: "if the connection was used recently, it is assumed to be fine". WebLogic trusts connections for 15 seconds by default, so if you borrow a connection every 10 seconds, it will never be tested. There is additional background validation, but my main point is that even a "select 1" is visible in application response times.

Shay> This is something new - maybe it's part of the misunderstanding here. To
Shay> me, the term "prepared statements" always means "server-prepared
Shay> statements"; this seems to be supported by the documentation you quote: "If
Shay> the driver supports precompilation, the method prepareStatement will send
Shay> the statement to the database for precompilation". I don't understand the
Shay> concept of a prepared statements which isn't server-prepared - do you have
Shay> some sort of driver-only prepared statements, which

The concept is that the implementation of the PreparedStatement interface is free to choose how it executes the queries. It can go with "server-prepare on each execution", it can go with "cache server-prepared statements", and so on. The whole purpose of having that deliberately vague API is to let database vendors make the most of their databases.

Shay> Regardless of any optimizations you may be doing, in every database driver
Shay> I've ever seen in my life, "prepared" simply means "server-prepared". And
Shay> in every driver I've ever seen, there's an explicit API for that. Therefore
Shay> server-prepare is something that's exposed to the developer, rather than
Shay> some internal detail left to the driver.

Please take the CPU example (or the TCP example) seriously. Seriously. CPUs did not always have an L2 cache; L2 was invented to improve the performance of existing and future applications. The same applies to a prepared-statement cache at the database driver level: it is implemented in pgjdbc to improve the performance of existing and future applications.

Vladimir>> Suppose the backend can handle 20 server-prepared statements at most (if
Vladimir>> using more it would run out of memory).
Vladimir>> Suppose an application has 100 statements with ".prepare()" calls.
Vladimir>> I think it is reasonable for the DB driver to figure out which statements
Vladimir>> are most important and server-prepare just the "20 most important ones",
Vladimir>> and leave the remaining 80 as regular non-prepared statements.

Shay> I seriously don't find it reasonable at all.

Would you please answer the question: should the DB driver silently server-prepare all 100 statements and crash the DB?

Shay> Each connection fills the cache, ejecting the other component's prepared
Shay> statements from the cache. As we switch between the two components,
Shay> statements have to be reprepared.

This is exactly what happens when several applications use the same CPU. L2/L3 caches can be shared between cores, so if one cares about the performance of a particular application, one should isolate the critical task on its own CPU (or set of cores). The same principle applies to a connection pool: either multiple pools should be used, or the cache size increased, or more sophisticated algorithms used to provide a better hit rate.

Shay> Manual work here would allow picking truly hottest statements from each
Shay> app/component, and preparing those - allowing the hottest statements across
Shay> both components to always remain live.
Shay> This is exactly what is meant by "programmer knowledge" which the driver
Shay> doesn't have - it only has its recently-used logic which sometimes breaks
Shay> down.

Let me exaggerate a bit. Compiler optimizations sometimes break down (they produce poorly performing code), so all optimizations should be disabled by default, and every developer should manually examine each line of code, identify the hottest statements in each app/component, and assign variables to CPU registers by hand. This is exactly the "programmer knowledge" which the compiler doesn't have. Does that sound good to you?

Come on. Suppose you are writing a framework (an ORM or whatever) that happens to execute queries against the database. Should that API have a special "should_use_server_prepared" parameter for each and every method? How should the framework developer tell which statements should be server-prepared and which should not? Should the framework use server-prepared statements at all? What if the framework has to support PostgreSQL, MySQL, and Oracle? Should there be three parameters? And the same question in the other direction: can you show a library on GitHub that allows the user (the developer) to control which statements should be server-prepared and which should not?

Shay> You keep making this mistaken analogy... Caches (and CPU reordering) are
Shay> invisible, prepared statements aren't - they can trigger bugs (because of
Shay> DDL),

Note: the cache of server-prepared statements is invisible to the end user. When it comes to DDL it might trigger PostgreSQL's "not implemented" issues, but those cases are not frequent, and they can be solved (in no particular order):
1) By manual intervention: reset the connection pool after DDL.
2) By "automatic savepointing" at the driver level.
3) By fixing the "not implemented" cases at the database level. For instance, if the "cached plan must not change result type" error did not kill the transaction, the driver would be able to simply resend the full parse/describe/bind sequence and avoid throwing the "cached plan" error at the user.
4) I am sure there are other solutions. They might even be better than the three above, but the point is that reusing server-prepared statements is the way to go, so let's try fixing the database/protocol instead of reverting to "each connection should be taken through DISCARD ALL".

Shay> and much more importantly, they take up resources on the PostgreSQL
Shay> side. Period. The L2 cache is already there and takes no additional
Shay> resources whether you use it or not. It's invisible.

CPU caches take resources as well. The L2 cache drains power, and it costs money to produce; CPUs without L2/L3 would be cheaper. "It takes resources" does not justify "the feature must not exist". The statement cache is not completely invisible, partly due to current database limitations and partly due to the strange maintenance techniques end users are using.

Vladimir