Re: [HACKERS] Statement-level rollback - Mailing list pgsql-hackers

From Tsunakawa, Takayuki
Subject Re: [HACKERS] Statement-level rollback
Date
Msg-id 0A3221C70F24FB45833433255569204D1F80F931@G01JPEXMBYT05
Whole thread Raw
In response to Re: [HACKERS] Statement-level rollback  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: [HACKERS] Statement-level rollback
List pgsql-hackers
From: Peter Eisentraut [mailto:peter.eisentraut@2ndquadrant.com]
> The difference is how error recovery works.  So this will necessarily be
> tied to how the client code or other surrounding code is structured or what
> the driver or framework is doing in the background to manage transactions.
> It would also be bad if client code was not prepared for this new behavior,
> reported the transaction as complete while some commands in the middle were
> omitted.
> 
> Drivers can already achieve this behavior and do do that by issuing savepoint
> commands internally.  The point raised in this thread was that that creates
> too much network overhead, so a backend-based solution would be preferable.
> We haven't seen any numbers or other evidence to quantify that claim, so
> maybe it's worth looking into that some more.
> 
> In principle, a backend-based solution that drivers just have to opt into
> would save a lot of duplication.  But the drivers that care or require it
> according to their standards presumably already implement this behavior
> in some other way, so it comes back to whether there is a performance or
> other efficiency gain here.
> 
> Another argument was that other SQL implementations have this behavior.
> This appears to be the case.  But as far as I can tell, it is also tied
> to their particular interfaces and the structure and flow control they
> provide.  So a client-side solution like psql already provides or something
> in the various drivers would work just fine here.
> 
> So my summary for the moment is that a GUC or similar run-time setting might
> be fine, with appropriate explanation and warnings.  But it's not clear
> whether it's worth it given the existing alternatives.

I can think of four reasons why the server-side support is necessary or desirable.

First, the server log could be filled with SAVEPOINT and RELEASE lines when you need to investigate performance or
auditactivity.
 

Second, the ease of use for those who migrate from other DBMSs.  With the server-side support, only the DBA needs to be
awareof the configuration in postgresql.conf.  Other people don't need to be aware of the client-side parameter when
theydeploy applications.
 

Third, lack of server-side support causes trouble to driver developers.  In a recent discussion with the psqlODBC
committer,he had some trouble improving or fixing the statement-rollback support.  Npgsql doesn't have the
statement-rollbackyet.  PgJDBC has supported the feature with autosave parameter only recently.  Do the drivers for
otherlanguages like Python, Go, JavaScript have the feature?  We should reduce the burdon on the driver developers.
 

Fourth, the runtime performance.  In a performance benchmark of one of our customers, where a batch application ran 1.5
or5 million small SELECTs with primary key access, the execution time of the whole batch became shorter by more than
30%(IIRC) when the local connection was used instead of the remote TCP/IP one.  The communication overhead is not
small.

Also, in the PostgreSQL documentation, the communication overhead is treated seriously as follows:


https://www.postgresql.org/docs/devel/static/plpgsql-overview.html#plpgsql-advantages

[Excerpt]
--------------------------------------------------
That means that your client application must send each query to the database server, wait for it to be processed,
receiveand process the results, do some computation, then send further queries to the server. All this incurs
interprocesscommunication and will also incur network overhead if your client is on a different machine than the
databaseserver.
 

With PL/pgSQL you can group a block of computation and a series of queries inside the database server, thus having the
powerof a procedural language and the ease of use of SQL, but with considerable savings of client/server communication
overhead.


•Extra round trips between client and server are eliminated


•Intermediate results that the client does not need do not have to be marshaled or transferred between server and
client


•Multiple rounds of query parsing can be avoided


This can result in a considerable performance increase as compared to an application that does not use stored
functions.
--------------------------------------------------


Craig reports the big communication overhead:

PATCH: Batch/pipelining support for libpq

https://www.postgresql.org/message-id/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com#CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com

Re: foreign table batch insert
https://www.postgresql.org/message-id/CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=wU3g@mail.gmail.com

[Excerpt]
--------------------------------------------------
The time difference for 10k inserts on the local host over a unix socket
shows a solid improvement:

batch insert elapsed:      0.244293s
sequential insert elapsed: 0.375402s

... but over, say, a connection to a random AWS RDS instance fired up for
the purpose that lives about 320ms away the difference is huge:

batch insert elapsed:      9.029995s
sequential insert elapsed: (I got bored after 10 minutes; it should take a
bit less then an hour based on the latency numbers)

With 500 rows on the remote AWS RDS instance, once the I/O quota is already
saturated:

batch insert elapsed:      1.229024s
sequential insert elapsed: 156.962180s

which is an improvement by a factor of over 120
--------------------------------------------------


Regards
Takayuki Tsunakawa



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Account for cost and selectivity of HAVING quals
Next
From: "Tsunakawa, Takayuki"
Date:
Subject: Re: [HACKERS] Statement-level rollback