Thread: refreshRow is slow
I am having no luck getting responses to a question I posted earlier about optimistic locking with multiple rows. I'll ask a more targeted question. How do I efficiently refresh a single row in a resultset? Suppose I allow users to view multiple records with TYPE_SCROLL_INSENSITIVE, CONCUR_READ_ONLY. The user might select one row or 1,000 rows and scroll back and forth at will. The user might choose to change one of the rows. I do this with a separate FOR UPDATE query. The update query will fetch the current row, which might be different from the row in the original resultset. My programmingcompares the two rows and can report to the user the differences. After the row has been updated, I need to refreshthat row in the result set so that if the user continues to scroll back and forth, it will see the updated information. I am using refreshRow. I do setFetchSize(1) first. This works, but sometimes it is very slow. It can take many seconds overa remote connection. Can I fix that? What's going on? Note: I don't want to use TYPE_SCROLL_SENSITIVE because it would update the row in the result set silently and I wouldn'tbe able to tell the user that somebody had changed anything. (Besides, I understand that TYPE_SCROLL_SENSITIVE doesn'twork reliably.) Thanks John
John T. Dow wrote: > I am having no luck getting responses to a question I posted earlier about optimistic locking with multiple rows. > > I'll ask a more targeted question. How do I efficiently refresh a single row in a resultset? > > Suppose I allow users to view multiple records with TYPE_SCROLL_INSENSITIVE, CONCUR_READ_ONLY. > > The user might select one row or 1,000 rows and scroll back and forth at will. > > The user might choose to change one of the rows. I do this with a separate FOR UPDATE query. > > The update query will fetch the current row, which might be different from the row in the original resultset. My programmingcompares the two rows and can report to the user the differences. After the row has been updated, I need to refreshthat row in the result set so that if the user continues to scroll back and forth, it will see the updated information. > > I am using refreshRow. I do setFetchSize(1) first. This works, but sometimes it is very slow. It can take many secondsover a remote connection. Can I fix that? What's going on? What's your original query? What's the query that the driver ends up synthesizing when you call refreshRow()? Have you run that query separately under EXPLAIN ANALYZE etc? Do you have knowledge of the structure of the original query (i.e. are you building it yourself, or is it user-supplied?) If you do, have you considered running suitable update/refresh query yourself directly? (The driver has limited knowledge of your database structure and may not be able to produce 'good' queries for update/refresh via the resultset interface) > Note: I don't want to use TYPE_SCROLL_SENSITIVE because it would update the row in the result set silently and I wouldn'tbe able to tell the user that somebody had changed anything. (Besides, I understand that TYPE_SCROLL_SENSITIVE doesn'twork reliably.) From memory the driver doesn't implement TYPE_SCROLL_SENSITIVE at all (that type of resultset is quite strange..) -O
On Fri, 15 Jan 2010 10:59:00 +1300, Oliver Jowett wrote: >John T. Dow wrote: >> I am having no luck getting responses to a question I posted earlier about optimistic locking with multiple rows. >> >> I'll ask a more targeted question. How do I efficiently refresh a single row in a resultset? >> >> Suppose I allow users to view multiple records with TYPE_SCROLL_INSENSITIVE, CONCUR_READ_ONLY. >> >> The user might select one row or 1,000 rows and scroll back and forth at will. >> >> The user might choose to change one of the rows. I do this with a separate FOR UPDATE query. >> >> The update query will fetch the current row, which might be different from the row in the original resultset. My programmingcompares the two rows and can report to the user the differences. After the row has been updated, I need to refreshthat row in the result set so that if the user continues to scroll back and forth, it will see the updated information. >> >> I am using refreshRow. I do setFetchSize(1) first. This works, but sometimes it is very slow. It can take many secondsover a remote connection. Can I fix that? What's going on? > >What's your original query? >What's the query that the driver ends up synthesizing when you call >refreshRow()? >Have you run that query separately under EXPLAIN ANALYZE etc? > >Do you have knowledge of the structure of the original query (i.e. are >you building it yourself, or is it user-supplied?) If you do, have you >considered running suitable update/refresh query yourself directly? (The >driver has limited knowledge of your database structure and may not be >able to produce 'good' queries for update/refresh via the resultset >interface) The original query is user supplied. It can range from a query using a primary key that selects a single row to a query selectinghalf a dozen rows using their primary keys to a complicated query testing ranges on several columns and resultingin a thousand rows. If I knew the query would only return a small number of rows, I'd just redo the query. (That'swhat I have to do with Derby.) But you'd think that if there are a thousand rows, refreshing one row would be betterthan rereading all of them. That's the theory anyway. That's why I'm hoping refreshrow would be one solution that fitsall user supplied queries. Each row in a postgres database has a unique identifier. I would think that refreshrow could use that to target the specificrow to be refreshed. I realize that I only want to refresh one row, and that refreshrow actually uses the fetch size and may get more than oneif required. Refreshing many rows using their unique identifiers might not be desirable. My program of course knows what the refreshed row should look like. Too bad there's not a way to switch out a row in theresult set. > >> Note: I don't want to use TYPE_SCROLL_SENSITIVE because it would update the row in the result set silently and I wouldn'tbe able to tell the user that somebody had changed anything. (Besides, I understand that TYPE_SCROLL_SENSITIVE doesn'twork reliably.) > > From memory the driver doesn't implement TYPE_SCROLL_SENSITIVE at all >(that type of resultset is quite strange..) I kind of got that impression regarding postgres. For another database I was reading about (Oracle?), it was said to notbe reliable as to whether it would pick up all changes. > >-O >
John T. Dow wrote: > Each row in a postgres database has a unique identifier. I would think that refreshrow could use that to target the specificrow to be refreshed. refreshRow() uses the primary key of the table to synthesize a SELECT to pick out just the one row, yes. If you explicitly select a column called "oid" in your query, it'll use that instead of the PK columns. (I looked at that code and there are a few ways to confuse it. That's why I want to know what the actual query that's causing problems is.) > > I realize that I only want to refresh one row, and that refreshrow actually uses the fetch size and may get more than oneif required. Refreshing many rows using their unique identifiers might not be desirable. Huh? Fetch size should have no effect on refreshRow() - it should only be selecting a single row anyway. Again: What is the actual query being synthesized by refreshRow(), and have you tried analysing that query to work out why it's slow? Also, if you just want to do an UPDATE and have the freshly updated data appear in your resultset, why don't you use the ResultSet update interface? updateRow() populates the ResultSet row with the data sent to the backend as part of the update. -O
I observe the following in the jdbc log. There's the original read only query, with 722 rows. The actual query was "SELECT * FROM doctor WHERE 1 = 1". Yes, I knowit could just be "SELECT * FROM doctor" but the query is generated based on user input, and that's my version of a nullwhere clause. After the first row of the result set was displayed, I changed a field and updated the table. The log shows the query "forupdate" and later a commit. Following that, where my program would do the refreshRow on the original resultset, I find a 21 instances of this block (somelines omitted): 17:25:50.812 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@1142196, maxRows=0,fetchSize=0, flags=17 17:25:50.812 (1) FE=> Parse(stmt=null,query="SELECT attname FROM pg_catalog.pg_attribute WHERE attrelid = $1 AND attnum= $2",oids={23,23}) 17:25:50.812 (1) FE=> Bind(stmt=null,portal=null,$1=<16404>,$2=<1>) 17:25:50.812 (1) FE=> Describe(portal=null) 17:25:50.812 (1) FE=> Execute(portal=null,limit=0) 17:25:50.812 (1) FE=> Sync 17:25:50.812 (1) <=BE ParseComplete [null] 17:25:50.812 (1) <=BE BindComplete [null] 17:25:50.812 (1) <=BE RowDescription(1) 17:25:50.812 (1) <=BE DataRow 17:25:50.812 (1) <=BE CommandStatus(SELECT) 17:25:50.812 (1) <=BE ReadyForQuery(I) The only difference between the 21 blocks is that the $2 parameter goes 1, 2, 3, 4, ... 21 After the 21st block comes this, and that ends the log. 17:38:47.031 (1) selecting select id, address, citystatezip, combined, degree, email, fax, firstname, groupname, lastname,nfiid, pabsid, practicetype, ssn, telephone, insertby, insertdate, inserttime, updateby, updatedate, updatetimefrom doctor where id= ? 17:38:47.046 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@d3c6a3, maxRows=0,fetchSize=0, flags=17 17:38:47.046 (1) FE=> Parse(stmt=null,query="select id, address, citystatezip, combined, degree, email, fax, firstname,groupname, lastname, nfiid, pabsid, practicetype, ssn, telephone, insertby, insertdate, inserttime, updateby, updatedate,updatetime from doctor where id= $1",oids={1043}) 17:38:47.046 (1) FE=> Bind(stmt=null,portal=null,$1=<0984>) 17:38:47.046 (1) FE=> Describe(portal=null) 17:38:47.046 (1) FE=> Execute(portal=null,limit=0) 17:38:47.046 (1) FE=> Sync 17:38:47.046 (1) <=BE ParseComplete [null] 17:38:47.046 (1) <=BE BindComplete [null] 17:38:47.046 (1) <=BE RowDescription(21) 17:38:47.046 (1) <=BE DataRow 17:38:47.046 (1) <=BE CommandStatus(SELECT) 17:38:47.046 (1) <=BE ReadyForQuery(I) 17:38:47.046 (1) done updates 17:38:49.765 (1) FE=> Terminate The value 0984 is the primary key value for the row I changed. I repeated this with the same original resultset, but movedto the 25th row before making a change. Again it did 21 instances of this block (the primary key value was of coursedifferent). To me, who knows nothing, the mystery is why 21 repetitions? I repeated this with another table. The original query had 792 rows. I went to the last row and made a change. This timethere were 408 repetitions. Again, it used the primary key for the $1 parameter shown above. So why 408 repetitions this time? John
John T. Dow wrote: > Following that, where my program would do the refreshRow on the original resultset, I find a 21 instances of this block(some lines omitted): > > 17:25:50.812 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@1142196, maxRows=0,fetchSize=0, flags=17 > 17:25:50.812 (1) FE=> Parse(stmt=null,query="SELECT attname FROM pg_catalog.pg_attribute WHERE attrelid = $1 AND attnum= $2",oids={23,23}) > 17:25:50.812 (1) FE=> Bind(stmt=null,portal=null,$1=<16404>,$2=<1>) This is a metadata query, getting the canonical name of each field/column in your resultset so that updateRow() knows how to build the refresh query. > 17:38:47.031 (1) selecting select id, address, citystatezip, combined, degree, email, fax, firstname, groupname, lastname,nfiid, pabsid, practicetype, ssn, telephone, insertby, insertdate, inserttime, updateby, updatedate, updatetimefrom doctor where id= ? > 17:38:47.046 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@d3c6a3, maxRows=0,fetchSize=0, flags=17 > 17:38:47.046 (1) FE=> Parse(stmt=null,query="select id, address, citystatezip, combined, degree, email, fax, firstname,groupname, lastname, nfiid, pabsid, practicetype, ssn, telephone, insertby, insertdate, inserttime, updateby, updatedate,updatetime from doctor where id= $1",oids={1043}) > 17:38:47.046 (1) FE=> Bind(stmt=null,portal=null,$1=<0984>) > 17:38:47.046 (1) FE=> Describe(portal=null) > 17:38:47.046 (1) FE=> Execute(portal=null,limit=0) > 17:38:47.046 (1) FE=> Sync And this is the actual refreshRow() query. I assume that this WHERE clause looks correct for your table and that query should be pretty fast? > To me, who knows nothing, the mystery is why 21 repetitions? 21 columns. > I repeated this with another table. The original query had 792 rows. I went to the last row and made a change. This timethere were 408 repetitions. Again, it used the primary key for the $1 parameter shown above. > > So why 408 repetitions this time? 408 columns. Is this a very wide SELECT? I can see that 408 roundtrips would slow things down a lot over a high-latency connection such as over the internet. Perhaps that's the cause of the slowdown you see.. -O
Oliver Jowett wrote: > This is a metadata query, getting the canonical name of each > field/column in your resultset so that updateRow() knows how to build > the refresh query. Sorry, I meant refreshRow() not updateRow() -O
>> I repeated this with another table. The original query had 792 rows. I went to the last row and made a change. This timethere were 408 repetitions. Again, it used the primary key for the $1 parameter shown above. >> >> So why 408 repetitions this time? > >408 columns. Is this a very wide SELECT? > >I can see that 408 roundtrips would slow things down a lot over a >high-latency connection such as over the internet. Perhaps that's the >cause of the slowdown you see.. While your email was on its way to me, I was looking at the source for refreshrow and came to the conclusion that it's thenumber of columns. (Actually, 408 is a bit high, but close. Not important.) Yes, it is a wide select. Virtually every column in a table is displayable so they all have to be retrieved. The user who has been complaining (virtually identical code) has a table with only 93 columns, but still that's a lot ofround trips. Certainly that piece of code can be made more efficient. One query ought to be able to return all the primary keys. I guess something like this won't work, unless "pkey" is reserved: select * from pg_indexes where tablename = 'patinfo' andindexname = 'patinfo_pkey'. That returns indexdev, which can be parsed to get the columns if needed. John
John T. Dow wrote: > Certainly that piece of code can be made more efficient. One query ought to be able to return all the primary keys. It's not the PKs it is after (it's already worked those out earlier), it's the column names to select to populate the rest of the resultset. But yes, it could be more efficient. We'd need a static String[] Field.getColumnNames(Connection,Field[]) method or similar. -O
On Sat, 16 Jan 2010 14:23:33 +1300, Oliver Jowett wrote: >John T. Dow wrote: > >> Certainly that piece of code can be made more efficient. One query ought to be able to return all the primary keys. > >It's not the PKs it is after (it's already worked those out earlier), >it's the column names to select to populate the rest of the resultset. I realized that when I looked more carefully at the code. >But yes, it could be more efficient. We'd need a static String[] >Field.getColumnNames(Connection,Field[]) method or similar. That would be really nice. Another approach is to reread the entire resultset, but with hundreds of rows that's not pretty. Probably faster than manyround trips to get the column names. I could have my programming look at the size of the resultset and the number ofcolumns and decide whether to execute the query again or refreshRow. John
Having decided that refreshRow is going to be too slow sometimes (because it does a query for each column to see if it'sa primary key), I am for the time being revising my standard application logic to execute the original query again. A small complication has developed. There is no guarantee that the rows will be returned in the same order, unless the orderis explicitly stated in the query. Since my queries are usually user-defined (that is, generated by the applicationbased on user requests in a non-SQL syntax), I'll have to do something like sort on the primary keys in all cases.Not necessarily a bad thing to do I suppose. But another complication might be confusing to the users. Suppose someone else has added a row. Executing the original querybrings up more rows than originally. If the user has been scrolling back and forth through the result set, suddenlya new row will appear. Also possible, someone deletes a row, so it disappears. That is likely to generate a phonecall to me ("What happened?"). It sure would be nice if refreshRow was more efficient in learning which are the primary keys. John
Several months ago, on January 14, I posted a question about JDBC's refreshRow being slow. The problem is that the resultset code is accessing the server for each column. It needs to know the column names so it canbuild a query to get the values for that single row. It knows the position of the resultset columns in the actual table, but not the column names. Oliver Jowett suggested thata method like this might help because it would get all column names at once: static String[] Field.getColumnNames(Connection,Field[]) Having poked around in the code, I notice that not only refreshing rows but also updating column values suffers from thisproblem, namely a visit to the server for each column to be updated. I should think that the proper fix would be to get the names of all the columns when the resultset is created so that anynumber of updates and refreshes could be performed without accessing the server for names. In the meantime, since the names returned by the result set agree with the actual table names if there are no aliases inthe original list of columns, I am about to try a small modification -- to use the column label (which is in memory) insteadof the base column name (which isn't). It seems to work in the lab, haven't tried it in the real world yet. Question: should I expect something as simple as this to work, or am I overlooking something? John
On Sat, 16 Jan 2010, Oliver Jowett wrote: > John T. Dow wrote: > >> Certainly that piece of code can be made more efficient. One query ought to >> be able to return all the primary keys. > > It's not the PKs it is after (it's already worked those out earlier), it's > the column names to select to populate the rest of the resultset. > > But yes, it could be more efficient. We'd need a static String[] > Field.getColumnNames(Connection,Field[]) method or similar. > I've adjusted the fetching of all of the ResultSetMetaData attributes to happen in one query for the entire ResultSet instead of one attribute of one column at a time. I've adjusted the updateable ResultSet code to use ResultSetMetaData to take advantage of this, so refreshRow should be much faster now. Kris Jurka