Re: [HACKERS] Statement-level rollback - Mailing list pgsql-hackers
From | Tsunakawa, Takayuki |
---|---|
Subject | Re: [HACKERS] Statement-level rollback |
Date | |
Msg-id | 0A3221C70F24FB45833433255569204D1F80F931@G01JPEXMBYT05 Whole thread Raw |
In response to | Re: [HACKERS] Statement-level rollback (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>) |
Responses |
Re: [HACKERS] Statement-level rollback
|
List | pgsql-hackers |
From: Peter Eisentraut [mailto:peter.eisentraut@2ndquadrant.com] > The difference is how error recovery works. So this will necessarily be > tied to how the client code or other surrounding code is structured or what > the driver or framework is doing in the background to manage transactions. > It would also be bad if client code was not prepared for this new behavior, > reported the transaction as complete while some commands in the middle were > omitted. > > Drivers can already achieve this behavior and do do that by issuing savepoint > commands internally. The point raised in this thread was that that creates > too much network overhead, so a backend-based solution would be preferable. > We haven't seen any numbers or other evidence to quantify that claim, so > maybe it's worth looking into that some more. > > In principle, a backend-based solution that drivers just have to opt into > would save a lot of duplication. But the drivers that care or require it > according to their standards presumably already implement this behavior > in some other way, so it comes back to whether there is a performance or > other efficiency gain here. > > Another argument was that other SQL implementations have this behavior. > This appears to be the case. But as far as I can tell, it is also tied > to their particular interfaces and the structure and flow control they > provide. So a client-side solution like psql already provides or something > in the various drivers would work just fine here. > > So my summary for the moment is that a GUC or similar run-time setting might > be fine, with appropriate explanation and warnings. But it's not clear > whether it's worth it given the existing alternatives. I can think of four reasons why the server-side support is necessary or desirable. First, the server log could be filled with SAVEPOINT and RELEASE lines when you need to investigate performance or auditactivity. Second, the ease of use for those who migrate from other DBMSs. With the server-side support, only the DBA needs to be awareof the configuration in postgresql.conf. Other people don't need to be aware of the client-side parameter when theydeploy applications. Third, lack of server-side support causes trouble to driver developers. In a recent discussion with the psqlODBC committer,he had some trouble improving or fixing the statement-rollback support. Npgsql doesn't have the statement-rollbackyet. PgJDBC has supported the feature with autosave parameter only recently. Do the drivers for otherlanguages like Python, Go, JavaScript have the feature? We should reduce the burdon on the driver developers. Fourth, the runtime performance. In a performance benchmark of one of our customers, where a batch application ran 1.5 or5 million small SELECTs with primary key access, the execution time of the whole batch became shorter by more than 30%(IIRC) when the local connection was used instead of the remote TCP/IP one. The communication overhead is not small. Also, in the PostgreSQL documentation, the communication overhead is treated seriously as follows: https://www.postgresql.org/docs/devel/static/plpgsql-overview.html#plpgsql-advantages [Excerpt] -------------------------------------------------- That means that your client application must send each query to the database server, wait for it to be processed, receiveand process the results, do some computation, then send further queries to the server. All this incurs interprocesscommunication and will also incur network overhead if your client is on a different machine than the databaseserver. With PL/pgSQL you can group a block of computation and a series of queries inside the database server, thus having the powerof a procedural language and the ease of use of SQL, but with considerable savings of client/server communication overhead. •Extra round trips between client and server are eliminated •Intermediate results that the client does not need do not have to be marshaled or transferred between server and client •Multiple rounds of query parsing can be avoided This can result in a considerable performance increase as compared to an application that does not use stored functions. -------------------------------------------------- Craig reports the big communication overhead: PATCH: Batch/pipelining support for libpq https://www.postgresql.org/message-id/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com#CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com Re: foreign table batch insert https://www.postgresql.org/message-id/CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=wU3g@mail.gmail.com [Excerpt] -------------------------------------------------- The time difference for 10k inserts on the local host over a unix socket shows a solid improvement: batch insert elapsed: 0.244293s sequential insert elapsed: 0.375402s ... but over, say, a connection to a random AWS RDS instance fired up for the purpose that lives about 320ms away the difference is huge: batch insert elapsed: 9.029995s sequential insert elapsed: (I got bored after 10 minutes; it should take a bit less then an hour based on the latency numbers) With 500 rows on the remote AWS RDS instance, once the I/O quota is already saturated: batch insert elapsed: 1.229024s sequential insert elapsed: 156.962180s which is an improvement by a factor of over 120 -------------------------------------------------- Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: