Re: Slowness of extended protocol - Mailing list pgsql-hackers
From: Vladimir Sitnikov
Subject: Re: Slowness of extended protocol
Msg-id: CAB=Je-FnD8QLr4++Z+ADCYbsFxBEmxrQsyMq=2M5GpTW3UVciw@mail.gmail.com
In response to: Re: Slowness of extended protocol (Shay Rojansky <roji@roji.org>)
List: pgsql-hackers
Shay> What? I really didn't understand your point here. All the doc is saying is
Shay> that if the driver doesn't support prepared statements, then using them

Please read again. PreparedStatement is the only way to execute statements in
the JDBC API. There is no API that allows the user to specify "use
server-prepared here". Well, there is a non-prepared API in JDBC, however it
lacks "bind variables" support, so if bind variables are required, the
developer will use PreparedStatement.
Java's PreparedStatement does not have an option to distinguish which
statements should be server-prepared and which should not.

Vladimir>> My experience shows that people are very bad at predicting where the
Vladimir>> performance problem would be.
Vladimir>> For 80% (or even 99%) of the cases, they just do not care thinking if a
Vladimir>> particular statement should be server-prepared or not.
Vladimir>> They have absolutely no idea how much resources it would take and so on.

Shay> Maybe you're even right saying these things, I don't know. But that doesn't
Shay> mean I as a driver should solve their problems for them. And I also get
Shay> that you have an additional argument here besides programmer
Shay> laziness/stupidity - the ORM argument - which makes more sense.

Suppose the backend can handle 20 server-prepared statements at most (with
more, it would run out of memory). Suppose an application has 100 statements
with ".prepare()" calls.
I think it is reasonable for the DB driver to figure out which statements are
most important and server-prepare just the "20 most important ones", leaving
the remaining 80 as regular non-prepared statements (see the sketch below).
Do you think the DB driver should just follow the developer's advice and
server-prepare all 100 statements, causing a backend crash?
Do you think the application developer should keep a single list of all the
statements ever used in the application and make sure there are no more than
20 queries in it?

My main point is not "developers are stupid", but "people often guess wrong
when it comes to performance". There are too many moving parts, so it is hard
to predict the performance implications. Often it is much easier to execute a
series of benchmarks that validate a certain hypothesis. For instance, as
Tatsuo says, savepoint overhead for DML is higher than savepoint overhead for
SELECT, so I plan to have that benchmark as well.
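To make the server-prepare point concrete, here is a minimal sketch of how it
looks from the application side. It assumes pgjdbc's connection parameters
prepareThreshold (switch to a server-prepared statement after N executions) and
preparedStatementCacheQueries (cap on the number of cached server-prepared
statements), plus the driver-specific PGStatement interface; the host, table
and credentials are placeholders, and parameter names should be checked against
your pgjdbc version. The application only ever sees the plain JDBC API; the
server-prepare decision stays inside the driver.

import java.sql.*;
import org.postgresql.PGStatement;

public class PrepareThresholdSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection; prepareThreshold/preparedStatementCacheQueries
        // are pgjdbc-specific knobs, not part of the JDBC API.
        String url = "jdbc:postgresql://localhost/test"
                + "?prepareThreshold=5"                // server-prepare after 5 uses
                + "&preparedStatementCacheQueries=20"; // cap on cached statements

        try (Connection con = DriverManager.getConnection(url, "test", "test");
             PreparedStatement ps = con.prepareStatement(
                     "select * from users where id = ?")) {
            for (int i = 1; i <= 10; i++) {
                ps.setInt(1, i);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) { /* consume */ }
                }
                // JDBC itself has no "server-prepare this" flag; pgjdbc only
                // exposes its decision through a driver-specific interface.
                System.out.println("execution " + i + ", server-prepared: "
                        + ps.unwrap(PGStatement.class).isUseServerPrepare());
            }
        }
    }
}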
Shay> First, there's nothing stopping an ORM from optimizing multiple inserts
Shay> into a single multivalue insert. I do admit I'm not aware of any who do
Shay> this, but it's a good idea for an optimization - I happen to maintain the
Shay> Entity Framework Core provider for Npgsql, I might take a look at this
Shay> optimization (so again thanks for the idea).

Nothing stops them, but M frameworks times N database drivers results in M*N
effort for each feature. As you say: the application should just use the batch
API, and it is the driver's job to convert that into a sequence of bytes
suitable for the database. The same goes for Npgsql: if you implement the
rewrite at the Npgsql level, that would automatically improve all the
frameworks/applications running on top of Npgsql.

Shay> I'm going to repeat what I said
Shay> before and it would be good to get some reaction to this. Every software
Shay> component in the stack has a role, and maintaining those separations is
Shay> what keeps things simple and sane

You might be missing my comments on CPU, x86, etc. My reaction is: almost every
existing component is extremely hard to reason about.

For instance: a CPU has a certain number of registers, it has certain amounts
of L1/L2/... cache, and so on. Do you mean each and every developer should
explicitly specify which program variable should use a register and which one
should go into the L2 cache? This is a counter-example to your "sane"
"separation". The CPU is free to reorder the instruction stream as long as the
net result complies with the specification. In the same way, the CPU is free
to use the L1/L2 caches in whatever way it thinks is best.
Note: a typical DB driver developer does not try to maintain a set of "optimal
assembly instructions". The driver developer relies on the compiler and the CPU
to optimize the driver's code into the best machine code. Of course a driver
might contain inline assembly, but that is not how mainstream drivers are
written.

Another example: the TCP stack. When a DB driver sends some data, the kernel is
free to reorder packets, to interleave or delay them, or even to use multiple
network cards to send a single TCP stream. Windows 10 includes several
performance improvements to the TCP stack, and it is nowhere near "the kernel
is doing exactly what the application/driver coded". Once again: the
application/driver developer does not optimize for specific hardware (e.g. a
network card). Developers just use the common API, and it is the kernel's job
to apply the best optimizations for the particular HW.

The same goes for the ORM-DB combo. The ORM uses the DB driver's API, and it is
the driver's job to use the optimal command sequence for the specific database.
Note: of course an ORM might use vendor-specific features for good reason,
however simple cases like "batch insert" are both common and not a top priority
from the "ORM developer" perspective.

The same goes for application-driver-DB. The application just uses the driver's
API, and it is not the application's business which bytes are sent over the
wire. Nor should the application specify which particular packets to use.

Note: I do not suggest rewriting SELECTs at the driver level yet. I do not know
a common pattern there. For "insert multivalues" the pattern is simple: the
user sends insert statements via the batch API. That happens quite often, as it
is the standard way of inserting data in bulk. It is recommended in all the
books and articles on the JDBC API. Not all databases support multivalues, so a
cross-database application would likely just use "insert single values through
the batch API" and rely on the DB driver and the database to do the right
thing.

Of course, having that fix at the network stack level would automatically
improve all the PostgreSQL drivers, however:

1) It is NOT possible, since multivalues have different SEMANTICS. When sending
individual inserts, each individual response has a "number of rows inserted"
field. When using a multivalues insert, there is just one insert with the total
number of rows inserted. If the application logic expected separate row counts,
then the kernel stack is in trouble: it cannot recover the individual counts
out of a combined response. That is why a multivalues rewrite at the TCP level
is impossible. So the PostgreSQL network protocol v3 prohibits the kernel from
doing an automatic multivalues optimization.
Let me show you how the JDBC API enables PgJDBC to do that optimization: when
the application uses the JDBC batch API, it does receive a "per statement row
count" as the batch result. However, there is a special result code that says
"the statement did succeed, but the row count is not known". Developers are
prepared for the DB driver to return "not known" for all the rows, so PgJDBC is
free to use multivalues, or copy, or whatever makes most sense for a particular
batch (see the sketch below).

2) There are other implications of doing that optimization at the kernel level
(e.g. overhead for MySQL users), but since "the optimization is not possible
due to semantics", the other reasons are not important.
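Here is a hedged sketch of what that looks like on the application side. The
"items" table, host and credentials are placeholders; reWriteBatchedInserts is
the pgjdbc connection parameter that controls the multivalues rewrite (check
the name against your pgjdbc version). The key point is the standard JDBC
result code Statement.SUCCESS_NO_INFO, which is what allows the driver to fold
the batch into one multivalues insert without breaking the API contract.

import java.sql.*;

public class BatchInsertSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection; the rewrite is driven by the driver, not the app.
        String url = "jdbc:postgresql://localhost/test?reWriteBatchedInserts=true";

        try (Connection con = DriverManager.getConnection(url, "test", "test");
             PreparedStatement ps = con.prepareStatement(
                     "insert into items(id, name) values (?, ?)")) {
            for (int i = 0; i < 1000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "item-" + i);
                ps.addBatch();
            }
            int[] counts = ps.executeBatch();
            // SUCCESS_NO_INFO means "succeeded, row count not known" - the code
            // that frees the driver to use multivalues (or COPY) under the hood.
            long unknown = java.util.Arrays.stream(counts)
                    .filter(c -> c == Statement.SUCCESS_NO_INFO)
                    .count();
            System.out.println(unknown + " of " + counts.length
                    + " batch entries reported SUCCESS_NO_INFO");
        }
    }
}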
Shay> Or if the magic is implemented at the
Shay> driver level, it should be opt-in, or at least easy to disable entirely.

"Optimizations in the CPU should be opt-in or at least easy to disable". Does
that sound right to you? For debugging purposes it could make sense. However,
for regular production usage, optimizations should be enabled by default.
Regular Intel CPUs come with optimizations enabled by default.
The multivalues rewrite does comply with the JDBC batch API, so it causes no
surprise even if the optimization is enabled by default.

Shay> So the
Shay> several-microsecond difference translates to a 6% performance degradation.
Shay> This is probably due to the slowness of processing the extra Bind/Execute
Shay> pair for the savepoint, which maybe can be improved in PostgreSQL.
Shay> Regardless, you may say there's no business case or that SELECT 1 against
Shay> localhost means nothing, or that posting data is somehow wrong, but I don't
Shay> find this very acceptable.

In this particular case the purpose is to measure the overhead. In your case
the savepoint seems to cost 15us. It costs 3us for me, but that does not matter
much. If each and every query is prefixed with a savepoint, that is about 15ms
of overhead for 1000 queries. I think if an application issues 1000 queries, it
would anyway spend much more time on processing the results and figuring out
"which query to issue / which values to pass".

However, that voltmeter is broken, as Tatsuo has pointed out. He chimed in on
the pgjdbc issue tracker and pointed out that a
"savepoint-DML-savepoint-DML-..." scenario would allocate an extra xid for each
savepoint. The higher the xid allocation rate, the harder the case is for
vacuum. So we need to measure:

1) "savepoint-DML-savepoint-DML-..." in order to find the "savepoint overhead
in case of DML". Note: pgjdbc's autosave=conservative would not issue a
savepoint before regular DML, as DML cannot fail with the "cached plan ...
result type" error, so a "DML-select-DML-select" scenario is required.

2) "real life applications" in order to find how the extra savepoints affect
the xid allocation rate.

A rough shape of the per-query measurement is sketched below.
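This is only an illustrative skeleton, not the benchmark itself (no JMH, so
treat any numbers as indicative). It assumes pgjdbc's autosave connection
parameter (never/always/conservative); the host, credentials, iteration count
and the "select 1" statement are placeholders to swap for the DML and
mixed scenarios discussed above.

import java.sql.*;

public class SavepointOverheadSketch {
    // Runs the given statement `iterations` times on a connection opened with
    // the given pgjdbc autosave mode and returns nanoseconds per execution.
    static long nanosPerQuery(String autosave, String sql, int iterations)
            throws SQLException {
        String url = "jdbc:postgresql://localhost/test?autosave=" + autosave;
        try (Connection con = DriverManager.getConnection(url, "test", "test")) {
            con.setAutoCommit(false); // savepoints only matter inside a transaction
            long start = System.nanoTime();
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                for (int i = 0; i < iterations; i++) {
                    ps.execute(); // works for both SELECT and DML
                }
            }
            con.rollback();
            return (System.nanoTime() - start) / iterations;
        }
    }

    public static void main(String[] args) throws SQLException {
        int n = 10_000;
        String sql = "select 1"; // swap in a DML statement for the DML scenario
        System.out.println("autosave=never:  " + nanosPerQuery("never", sql, n) + " ns/query");
        System.out.println("autosave=always: " + nanosPerQuery("always", sql, n) + " ns/query");
    }
}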
Vladimir