Thread: Slow performance updating CLOB data
Most of the time is spent in method PgResultSet.isUpdateable because the table has a primary key but doesn't have an "oid" column.
So the code tries to get primary keys from the query, using getMetaData().getPrimaryKeys.
This is a very slow process.
Is there any workaround?
Nicola Zanaga wrote on 17.07.2016 at 11:39:
> Hi, using ResultSet.updateCharacterStream to update a CLOB is very slow.
>
> Most of the time is spent in method PgResultSet.isUpdateable because the table has a primary key but doesn't have an "oid" column.
>
> So the code tries to get primary keys from the query, using getMetaData().getPrimaryKeys.
> This is a very slow process.
>
> Is there any workaround?

Are you using setCharacterStream()? Why aren't you using setString()? Please show us the Java code.
Here's a snippet:
import java.io.StringReader;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

Properties props = new Properties();
props.put("user", "test1");
props.put("password", "test1");
Connection connection = java.sql.DriverManager.getConnection("jdbc:postgresql://<server>", props);

// Create a test table with a primary key and a TEXT column.
Statement st = connection.createStatement();
st.execute("CREATE TABLE TestClob1 (id int8 NOT NULL, data TEXT, PRIMARY KEY(id))");
st.close();

st = connection.createStatement();
st.execute("INSERT INTO TestClob1 (id) VALUES (1)");
st.close();

// Update the TEXT column through an updatable result set.
st = connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
ResultSet resultSet = st.executeQuery("SELECT id, data FROM TestClob1 WHERE id = 1");
resultSet.next();
resultSet.updateCharacterStream(2, new StringReader("hello"), "hello".length());
resultSet.updateRow();
resultSet.close();
From: Nicola Zanaga
Sent: Sunday, 17 July 2016 11:39
To: pgsql-jdbc@postgresql.org
Subject: Slow performance updating CLOB data
Hi, using ResultSet.updateCharacterStream to update a CLOB is very slow.
Most of the time is spent in method PgResultSet.isUpdateable because the table has a primary key but doesn't have an "oid" column.
So the code tries to get primary keys from the query, using getMetaData().getPrimaryKeys.
This is a very slow process.
Is there any workaround?
Nicola Zanaga wrote on 18.07.2016 at 10:13:
> Here's a snippet:
>
> st = connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
> ResultSet resultSet = st.executeQuery("SELECT id, data FROM TestClob1 WHERE id = 1");
>
> resultSet.next();
> resultSet.updateCharacterStream(2, new StringReader("hello"), "hello".length());
> resultSet.updateRow();
> resultSet.close();

Did you try resultSet.updateString(2, "hello") instead?

> Most of the time is spent in method PgResultSet.isUpdateable because the table has a primary key but doesn't have an "oid" column.
> So the code tries to get primary keys from the query, using getMetaData().getPrimaryKeys.

But in order to run an UPDATE on the table, the driver will need to retrieve the PK; there is no way around that. I guess using updateString() won't change that.

Thomas
Is there a reason for using an updatable result set? A simple "UPDATE table SET ..." should work just fine.

PS. Having a table -> PK cache in pgjdbc would be good.

Vladimir
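For reference, the plain UPDATE that Vladimir suggests could look roughly like the sketch below. It reuses the TestClob1 table and columns from the snippet earlier in the thread; the helper class and method names are invented for the example, and this is not code posted by anyone on the thread.

import java.sql.Connection;
import java.sql.PreparedStatement;

class PlainClobUpdate {
    // Update the TEXT column with a single parameterized statement;
    // no updatable result set, so no isUpdateable() metadata lookup.
    static void updateData(Connection connection, long id, String value) throws Exception {
        try (PreparedStatement ps = connection.prepareStatement(
                "UPDATE TestClob1 SET data = ? WHERE id = ?")) {
            ps.setString(1, value);
            ps.setLong(2, id);
            ps.executeUpdate();
        }
    }
}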
Thanks for the quick replies.

I'm using an updatable result set only to update CLOB data.
My application uses many databases, and this method can be used with many drivers (I think it is the "standard" way to update CLOB data).

I took a look at the code in PgResultSet, and I think the driver will be slow for any operation on an updatable result set, because it always calls isUpdateable() before doing anything. So, in my opinion, this makes updatable result sets quite useless.

To check whether the query is updatable, the code first checks that it uses a single table, so it should be easy to cache the table and its primary keys.
I'm also using this technique with Oracle, SQL Server, DB2 and H2. The PostgreSQL driver is about 5-10 times slower.
I can change strategy for Postgres, but I don't think it's a good idea to issue a query like "UPDATE table SET clob = 'value' WHERE key = x" if the value is more than 10 MB.
So I'm a bit confused about how to do this operation.
PS: about the cache, in my opinion it's acceptable to cache primary keys; I don't expect primary keys to change that often.
From: Vladimir Sitnikov [mailto:sitnikov.vladimir@gmail.com]
Sent: Monday, 18 July 2016 13:48
To: Dave Cramer; Nicola Zanaga
Cc: Thomas Kellerer; pgsql-jdbc@postgresql.org
Subject: Re: [JDBC] Slow performance updating CLOB data
Dave> Well all drivers have to do something similar. Not all result sets are updatable. What do other drivers do?
Technically speaking, pgjdbc could cache the names of primary keys for a given table.
It would be useful at least in two places:
1) updateable resultset
2) return generated keys
However, as of now no such cache exists in pgjdbc.
The second issue is that the backend does not send notifications on DDL changes. Thus the cache can easily get out of sync: Java thinks there's a column named A, when in reality the column was dropped long ago.
Vladimir
On 18 July 2016 at 07:48, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
> However, as of now no such cache exists in pgjdbc.
> The second issue is that the backend does not send notifications on DDL changes. Thus the cache can easily get out of sync: Java thinks there's a column named A, when in reality the column was dropped long ago.

This error would happen far fewer times than the cache would help the problem. I think if we can figure out how to gracefully recover from the error, we would be far further ahead.

Dave Cramer
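Purely for illustration, a connection-scoped table-to-primary-key cache of the kind discussed here might look roughly like the sketch below. No such cache exists in pgjdbc (as Vladimir notes), and every name in it is invented; the invalidate() hook follows Dave's point about recovering gracefully when cached keys go stale after a DDL change.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cache primary-key column names per table for the
// lifetime of one Connection, and forget an entry when an update built from
// it fails, so the next attempt re-reads the metadata.
class PrimaryKeyCache {
    private final Connection connection;
    private final Map<String, List<String>> pkColumns = new ConcurrentHashMap<>();

    PrimaryKeyCache(Connection connection) {
        this.connection = connection;
    }

    List<String> lookupPrimaryKeys(String schema, String table) {
        return pkColumns.computeIfAbsent(schema + "." + table, k -> readPrimaryKeys(schema, table));
    }

    private List<String> readPrimaryKeys(String schema, String table) {
        List<String> columns = new ArrayList<>();
        try (ResultSet rs = connection.getMetaData().getPrimaryKeys(null, schema, table)) {
            while (rs.next()) {
                columns.add(rs.getString("COLUMN_NAME"));
            }
        } catch (SQLException e) {
            throw new RuntimeException("Failed to read primary keys for " + table, e);
        }
        return columns;
    }

    // Called when an UPDATE built from cached keys fails: drop the entry so
    // the next operation goes back to getPrimaryKeys().
    void invalidate(String schema, String table) {
        pkColumns.remove(schema + "." + table);
    }
}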
Nicola Zanaga wrote on 18.07.2016 at 14:28:
> I can change strategy for Postgres, but I don't think it's a good idea to issue a query like "UPDATE table SET clob = 'value' WHERE key = x" if the value is more than 10 MB.

You should use a PreparedStatement, not string literals. But apart from that, it won't be any different from the SQL that the driver uses. Why do you think that would be a problem? The client needs to send 10 MB of data, regardless of _how_ it sends it.

Thomas
Please see my comment at the end of this post...

On 19/07/16 00:55, Christopher BROWN wrote:
> Hello,
>
> What would the scope and capacity of the cache be? For example, would it be scoped to the lifetime of a (pooled) Connection, to that of a Statement, or something else? How could the cache capacity be controlled (avoiding excessive garbage collection pressure, etc.) and instrumented (cache hits/misses, cache filling and emptying rates, etc.)? Would it be possible for the application to issue a command to clear the cache immediately if the application is aware of structural changes (this can happen a lot in development and might lead to unexpected behavior)?
>
> --
> Christopher

In this list, the convention is to post replies at the end (with some rare exceptions), or interspersed when appropriate, and to omit parts that are no longer relevant. The motivation for bottom posting like this is that people get to see the context before the reply, AND emails don't end up getting longer and longer as people reply at the beginning and forget to trim the now-irrelevant material at the end.

Cheers,
Gavin
-----Original message-----
From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On behalf of Thomas Kellerer
Sent: Monday, 18 July 2016 15:01
To: pgsql-jdbc@postgresql.org
Subject: Re: [JDBC] Slow performance updating CLOB data

> You should use a PreparedStatement, not string literals. But apart from that, it won't be any different from the SQL that the driver uses. Why do you think that would be a problem? The client needs to send 10 MB of data, regardless of _how_ it sends it.

I solved my problem by switching to a prepared statement. The performance is now comparable to the other drivers.

However, in general, sending the full SQL query is not the same as using setCharacterStream or setBinaryStream (for a prepared statement), or updateCharacterStream or updateBinaryStream (for an updatable result set). Using streams, a driver could optimize sending the data to the server in small packets.

Thanks
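As a final illustration, the prepared-statement variant that still streams the value (instead of binding one large String) could look like the sketch below. This is only an assumption about the kind of code Nicola switched to; his actual code is not shown in the thread, and the class and method names are invented.

import java.io.Reader;
import java.sql.Connection;
import java.sql.PreparedStatement;

class StreamingClobUpdate {
    // Same parameterized UPDATE as before, but the value is bound as a
    // character stream, e.g. new java.io.StringReader(bigValue).
    static void updateData(Connection connection, long id, Reader reader, int length) throws Exception {
        try (PreparedStatement ps = connection.prepareStatement(
                "UPDATE TestClob1 SET data = ? WHERE id = ?")) {
            ps.setCharacterStream(1, reader, length);
            ps.setLong(2, id);
            ps.executeUpdate();
        }
    }
}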