Insert values() per-statement overhead - Mailing list pgsql-hackers

From Vladimir Sitnikov
Subject Insert values() per-statement overhead
Date
Msg-id CAB=Je-EDZwVH7FgpDOa0ad8a7a6D=1QoYmvW2kQqSAML2gf+Uw@mail.gmail.com
Whole thread Raw
Responses Re: Insert values() per-statement overhead  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

There is a finding that insert(x) values(y);insert(x) values(z);  is
2-4 times slower than insert(..) values(y),(z);
see [1], [2].

In other words, there is a significant per-statement overhead even
though server-prepared statements are properly used.
The issue is reproducible in 9.5rc1.

Is it something that was discussed previously? (I was not able to find
that in archives)
Is it something that can be improved? (e.g. "insert node state"
caching across execute calls, improve performance of "INSERT %u %u"
generation, etc)

Even though I understand there will always be _some_ per-statement
overhead, such a hight overhead plays against common case of using
ORMs.
End-users are just stuck with insert(...) values(...);

1) Java's standard way of batching statements is
"PreparedStatement#addBatch()". Unfortunately, backend protocol does
not support statement batching.
One does not simply teach Hibernate/EclipseLink,etc etc to use
PostgreSQL's-specific COPY.
Note: I am not talking about network roundtrips here. I'm just
highlighting that there is no way to execute "bind bind bind
executebatch" sequence at the protocol level.

2) One might consider "transparent rewrite of insert()  batches into a
single insert() values(),(),() statement" at JDBC driver level, but it
is hard to get right as there is no easy way to parse a query. It is
really expected that every PostgreSQL connector would implement SQL
parser & insert rewriter?

3) Transparent rewrites (including "rewrite inserts to COPY") would
fail to provide "number of modified rows" for each row. Error
semantics is different as well.

4) COPY does not support UPSERT, does it?

My profiler (Instruments in Mac OS) shows that significant time is
spent in standard_ExecutorStart: see [3]
In fact, the time spent in standard_ExecutorStart even exceeds the
time spent in standard_ExecutorRun.

[1]: http://www.postgresql.org/message-id/55130DC8.2070508@redhat.com
[2]: https://github.com/pgjdbc/pgjdbc/pull/491#issuecomment-171780054
[3]: https://github.com/pgjdbc/pgjdbc/pull/491#issuecomment-171908974

Vladimir Sitnikov



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Stream consistent snapshot via a logical decoding plugin as a series of INSERTs
Next
From: Andres Freund
Date:
Subject: Re: Insert values() per-statement overhead