Thread: [JDBC] Efficient use of ON CONFLICT DO UPDATE with the PostgreSQL JDBC driver
[JDBC] Efficient use of ON CONFLICT DO UPDATE with the PostgreSQL JDBC driver
Christopher
Re: [JDBC] Efficient use of ON CONFLICT DO UPDATE with the PostgreSQLJDBC driver
Hello,I've at last had an opportunity to use PostgreSQL 9.6 (instead of 9.4) for a project, and have been able to use the ON CONFLICT DO UPDATE clause. As it can involve repeating parameter values (including, as is my case, large binary files), I'm wondering if there's any performance or resource-usage side effects (or workarounds) when doing something like this:INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?)ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ?On the Java side, that means invoking PreparedStatement::setBinaryStream TWICE for the same data. I'm not noticing side effects on localhost (but haven't profiled either). Is this fine, or a potential problem? Is there a better approach?
Hello,I've at last had an opportunity to use PostgreSQL 9.6 (instead of 9.4) for a project, and have been able to use the ON CONFLICT DO UPDATE clause. As it can involve repeating parameter values (including, as is my case, large binary files), I'm wondering if there's any performance or resource-usage side effects (or workarounds) when doing something like this:INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?)ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ?On the Java side, that means invoking PreparedStatement::setBinaryStream TWICE for the same data. I'm not noticing side effects on localhost (but haven't profiled either). Is this fine, or a potential problem? Is there a better approach?
Re: Efficient use of ON CONFLICT DO UPDATE with the PostgreSQLJDBC driver
Re: [JDBC] Efficient use of ON CONFLICT DO UPDATE with the PostgreSQLJDBC driver
Re: [JDBC] Efficient use of ON CONFLICT DO UPDATE with the PostgreSQLJDBC driver
On 13 January 2017 at 06:01, Christopher BROWN <brown@reflexe.fr> wrote:Hello,I've at last had an opportunity to use PostgreSQL 9.6 (instead of 9.4) for a project, and have been able to use the ON CONFLICT DO UPDATE clause. As it can involve repeating parameter values (including, as is my case, large binary files), I'm wondering if there's any performance or resource-usage side effects (or workarounds) when doing something like this:INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?)ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ?On the Java side, that means invoking PreparedStatement::setBinaryStream TWICE for the same data. I'm not noticing side effects on localhost (but haven't profiled either). Is this fine, or a potential problem? Is there a better approach?So to summarize you would like some sort of optimization where we figured out that two of the values were the same in the initial insert and the on conflict and only send it once ?
Re: Efficient use of ON CONFLICT DO UPDATE with the PostgreSQLJDBC driver
On 13 January 2017 at 06:01, Christopher BROWN <brown@reflexe.fr> wrote:Hello,I've at last had an opportunity to use PostgreSQL 9.6 (instead of 9.4) for a project, and have been able to use the ON CONFLICT DO UPDATE clause. As it can involve repeating parameter values (including, as is my case, large binary files), I'm wondering if there's any performance or resource-usage side effects (or workarounds) when doing something like this:INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?)ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ?On the Java side, that means invoking PreparedStatement::setBinaryStream TWICE for the same data. I'm not noticing side effects on localhost (but haven't profiled either). Is this fine, or a potential problem? Is there a better approach?So to summarize you would like some sort of optimization where we figured out that two of the values were the same in the initial insert and the on conflict and only send it once ?
Re: Efficient use of ON CONFLICT DO UPDATE with the PostgreSQL JDBCdriver
Christopher BROWN schrieb am 13.01.2017 um 15:01: > I've at last had an opportunity to use PostgreSQL 9.6 (instead of > 9.4) for a project, and have been able to use the ON CONFLICT DO > UPDATE clause. As it can involve repeating parameter values > (including, as is my case, large binary files), I'm wondering if > there's any performance or resource-usage side effects (or > workarounds) when doing something like this: > > INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?) > ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ? > > On the Java side, that means invoking PreparedStatement::setBinaryStream TWICE for the same data. > > Is there a better approach? Maybe I'm missing something but you don't need to the parameters in the UPDATE part: INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?) ON CONFLICT (c_uuid) DO UPDATE SET file_data = excluded.file_data, file_name = excluded.file_name; This assumes that in case of an update you actually want to use the same values as provided in the VALUES () clause
[JDBC] Re: Efficient use of ON CONFLICT DO UPDATE with the PostgreSQL JDBCdriver
Christopher BROWN schrieb am 13.01.2017 um 15:01: > I've at last had an opportunity to use PostgreSQL 9.6 (instead of > 9.4) for a project, and have been able to use the ON CONFLICT DO > UPDATE clause. As it can involve repeating parameter values > (including, as is my case, large binary files), I'm wondering if > there's any performance or resource-usage side effects (or > workarounds) when doing something like this: > > INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?) > ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ? > > On the Java side, that means invoking PreparedStatement::setBinaryStream TWICE for the same data. > > Is there a better approach? Maybe I'm missing something but you don't need to the parameters in the UPDATE part: INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?) ON CONFLICT (c_uuid) DO UPDATE SET file_data = excluded.file_data, file_name = excluded.file_name; This assumes that in case of an update you actually want to use the same values as provided in the VALUES () clause
Re: Efficient use of ON CONFLICT DO UPDATE with the PostgreSQLJDBC driver
BTW this use case don't really seems to be of high impact on performance, so adding an extension to the specification don't make sense IMO.
On 13 January 2017 at 15:20, Dave Cramer <pg@fastcrypt.com> wrote:On 13 January 2017 at 06:01, Christopher BROWN <brown@reflexe.fr> wrote:Hello,I've at last had an opportunity to use PostgreSQL 9.6 (instead of 9.4) for a project, and have been able to use the ON CONFLICT DO UPDATE clause. As it can involve repeating parameter values (including, as is my case, large binary files), I'm wondering if there's any performance or resource-usage side effects (or workarounds) when doing something like this:INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?)ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ?On the Java side, that means invoking PreparedStatement::setBinaryStream TWICE for the same data. I'm not noticing side effects on localhost (but haven't profiled either). Is this fine, or a potential problem? Is there a better approach?So to summarize you would like some sort of optimization where we figured out that two of the values were the same in the initial insert and the on conflict and only send it once ?To summarize, I'd like to avoid sending the same data twice, if that has any potentially significant performance issues and/or resource-usage issues. I don't know if that's the case or not.The SQL syntax (and the query workflow, and parameter order) is specific to PostgreSQL, so I'm not against something that varies from the basic "PreparedStatement" syntax. If there's a safe way of figuring out the intention, that'd be excellent. Maybe if the driver has "seen" the parameter set (via PreparedStatement::setXXX) in the INSERT part for field "X", and it "sees" the field "X" is unset in the ON CONFLICT part, or if you could have some sort of back-reference (either by index, e.g. $1, $2 or whatever, or by name if that's possible).
Re: [JDBC] Efficient use of ON CONFLICT DO UPDATE with the PostgreSQLJDBC driver
BTW this use case don't really seems to be of high impact on performance, so adding an extension to the specification don't make sense IMO.
On 13 January 2017 at 15:20, Dave Cramer <pg@fastcrypt.com> wrote:On 13 January 2017 at 06:01, Christopher BROWN <brown@reflexe.fr> wrote:Hello,I've at last had an opportunity to use PostgreSQL 9.6 (instead of 9.4) for a project, and have been able to use the ON CONFLICT DO UPDATE clause. As it can involve repeating parameter values (including, as is my case, large binary files), I'm wondering if there's any performance or resource-usage side effects (or workarounds) when doing something like this:INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?)ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ?On the Java side, that means invoking PreparedStatement::setBinaryStream TWICE for the same data. I'm not noticing side effects on localhost (but haven't profiled either). Is this fine, or a potential problem? Is there a better approach?So to summarize you would like some sort of optimization where we figured out that two of the values were the same in the initial insert and the on conflict and only send it once ?To summarize, I'd like to avoid sending the same data twice, if that has any potentially significant performance issues and/or resource-usage issues. I don't know if that's the case or not.The SQL syntax (and the query workflow, and parameter order) is specific to PostgreSQL, so I'm not against something that varies from the basic "PreparedStatement" syntax. If there's a safe way of figuring out the intention, that'd be excellent. Maybe if the driver has "seen" the parameter set (via PreparedStatement::setXXX) in the INSERT part for field "X", and it "sees" the field "X" is unset in the ON CONFLICT part, or if you could have some sort of back-reference (either by index, e.g. $1, $2 or whatever, or by name if that's possible).
Christopher Brown wrote:
> ... I'd like to avoid sending the same data twice ...
> ... when doing something like this:
> INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?)
> ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ?
> ... The SQL syntax (and the query workflow, and parameter order) is specific to PostgreSQL,
> so I'm not against something that varies from the basic "PreparedStatement" syntax.
> If there's a safe way of figuring out the intention, that'd be excellent. ...
I find it easiest to make the intention explicit like this. This is also a more conventient syntax for handling multiple rows because you just extend the "values (?, ?, ?)" to values "(?, ?, ?), (?, ?, ?), (?, ?, ?)".
with params (c_uuid, file_data, file_name) as (values (?, ?, ?))
insert into foo (c_uuid, file_data, file_name)
select c_uuid, file_data, file_name from params
on conflict (c_uuid) do update set (file_data, file_name) = (select file_data, file_name from params)
I ran this statement in 9.6 through pgAdmin 4 in order to verify the syntax (with the ?, ?, ? replaced with literals - I have not yet figured out how to bind parameters to parameterized statements in pgAdmin 4) and the messages were ...
INSERT 0 1
Query returned successfully in 182 msec.
Re: [JDBC] Efficient use of ON CONFLICT DO UPDATE with the PostgreSQLJDBC driver
Christopher Brown wrote:
> ... I'd like to avoid sending the same data twice ...
> ... when doing something like this:
> INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?)
> ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ?
> ... The SQL syntax (and the query workflow, and parameter order) is specific to PostgreSQL,
> so I'm not against something that varies from the basic "PreparedStatement" syntax.
> If there's a safe way of figuring out the intention, that'd be excellent. ...
I find it easiest to make the intention explicit like this. This is also a more conventient syntax for handling multiple rows because you just extend the "values (?, ?, ?)" to values "(?, ?, ?), (?, ?, ?), (?, ?, ?)".
with params (c_uuid, file_data, file_name) as (values (?, ?, ?))
insert into foo (c_uuid, file_data, file_name)
select c_uuid, file_data, file_name from params
on conflict (c_uuid) do update set (file_data, file_name) = (select file_data, file_name from params)
I ran this statement in 9.6 through pgAdmin 4 in order to verify the syntax (with the ?, ?, ? replaced with literals - I have not yet figured out how to bind parameters to parameterized statements in pgAdmin 4) and the messages were ...
INSERT 0 1
Query returned successfully in 182 msec.
Re: [JDBC] Efficient use of ON CONFLICT DO UPDATE with the PostgreSQLJDBC driver
Christopher Brown wrote:
> ...
> I'm wondering if there's any performance or resource-usage side effects (or workarounds) when doing something like this:
>
> INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?)
> ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ?
>
> On the Java side, that means invoking PreparedStatement::setBinaryStream TWICE for the same data. ...
Yes, there are lots of resource usage side effects.
pgjdbc transmits the byte stream over the network once for every time the stream is bound.
You will also need to provide independent InputStream values (or use mark()/reset()) because setBinaryStream() consumes the stream.
The nice thing with pgjdbc is that it consumes the stream at the time you make the setBinaryStream() call. "small" streams (50 KB or less)
are materialized in memory while larger streams are written to a temp file. The mark()/reset() trick doesn't work for JDBC drivers that
consume the stream lazily.
If your query uses simple protocol, the query is materialized as a StringBuilder/String in its entirety (including parameter replacement)
before the query is transmitted (when you execute the statement). Which, if you have large blob/clob values, can result in
OutOfMemoryError or running into the internal JVM limits on String/array size (normally Integer.MAX_VALUE).
Christopher Brown wrote:
> ...
> I'm wondering if there's any performance or resource-usage side effects (or workarounds) when doing something like this:
>
> INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?)
> ON CONFLICT (c_uuid) DO UPDATE SET file_data = ?, file_name = ?
>
> On the Java side, that means invoking PreparedStatement::setBinaryStream TWICE for the same data. ...
Yes, there are lots of resource usage side effects.
pgjdbc transmits the byte stream over the network once for every time the stream is bound.
You will also need to provide independent InputStream values (or use mark()/reset()) because setBinaryStream() consumes the stream.
The nice thing with pgjdbc is that it consumes the stream at the time you make the setBinaryStream() call. "small" streams (50 KB or less)
are materialized in memory while larger streams are written to a temp file. The mark()/reset() trick doesn't work for JDBC drivers that
consume the stream lazily.
If your query uses simple protocol, the query is materialized as a StringBuilder/String in its entirety (including parameter replacement)
before the query is transmitted (when you execute the statement). Which, if you have large blob/clob values, can result in
OutOfMemoryError or running into the internal JVM limits on String/array size (normally Integer.MAX_VALUE).
Re: Efficient use of ON CONFLICT DO UPDATE with the PostgreSQL JDBCdriver
Brad DeJong schrieb am 13.01.2017 um 18:18: > I find it easiest to make the intention explicit like this. This is > also a more conventient syntax for handling multiple rows because you > just extend the "values (?, ?, ?)" to values "(?, ?, ?), (?, ?, ?), > (?, ?, ?)". > > with params (c_uuid, file_data, file_name) as (values (?, ?, ?)) > insert into foo (c_uuid, file_data, file_name) > select c_uuid, file_data, file_name from params > on conflict (c_uuid) do update set (file_data, file_name) = (select file_data, file_name from params) > > > I ran this statement in 9.6 through pgAdmin 4 in order to verify the > syntax (with the ?, ?, ? replaced with literals - I have not yet > figured out how to bind parameters to parameterized statements in > pgAdmin 4) and the messages were ... That is overly complicated because you can do a simple: INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?) ON CONFLICT (c_uuid) DO UPDATE SET file_data = excluded.file_data, file_name = excluded.file_name; Plus: using EXCLUDED means this will also work for multi-row inserts which your workaround does not: INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?), (?, ?, ?), (?, ?, ?) ON CONFLICT (c_uuid) DO UPDATE SET file_data = excluded.file_data, file_name = excluded.file_name;
[JDBC] Re: Efficient use of ON CONFLICT DO UPDATE with the PostgreSQL JDBCdriver
Brad DeJong schrieb am 13.01.2017 um 18:18: > I find it easiest to make the intention explicit like this. This is > also a more conventient syntax for handling multiple rows because you > just extend the "values (?, ?, ?)" to values "(?, ?, ?), (?, ?, ?), > (?, ?, ?)". > > with params (c_uuid, file_data, file_name) as (values (?, ?, ?)) > insert into foo (c_uuid, file_data, file_name) > select c_uuid, file_data, file_name from params > on conflict (c_uuid) do update set (file_data, file_name) = (select file_data, file_name from params) > > > I ran this statement in 9.6 through pgAdmin 4 in order to verify the > syntax (with the ?, ?, ? replaced with literals - I have not yet > figured out how to bind parameters to parameterized statements in > pgAdmin 4) and the messages were ... That is overly complicated because you can do a simple: INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?) ON CONFLICT (c_uuid) DO UPDATE SET file_data = excluded.file_data, file_name = excluded.file_name; Plus: using EXCLUDED means this will also work for multi-row inserts which your workaround does not: INSERT INTO foo (c_uuid, file_data, file_name) VALUES (?, ?, ?), (?, ?, ?), (?, ?, ?) ON CONFLICT (c_uuid) DO UPDATE SET file_data = excluded.file_data, file_name = excluded.file_name;