Thread: FW: Question about the postgres resultset implementation
I wrote Peter about the JDBC driver implementation of findColumn and he suggested I go here for help. I know the implementation that the Postgres driver uses (iterating through an array, doing case insensitive searches on each field name) is common and probably came from an old reference driver, but it's slow. I'm using an ORM tool that currently asks for all values by their column name (Hibernate) and I'm actually pestering both sides to see if I can get some improvements. Hibernate should really use indexes, but the postgres driver should really do a better job than it's doing now. I did some tinkering of my own and found that one of the problems with making the driver less cumbersome is the fact that findColumn() is currently case insensitive. I don't know if this is required by the jdbc spec, but it seems all the field names come back lower case (in the fields[] array), and so I could replace the implementation without understanding how to prevent the case of the field names from changing (changing from the case they were sent in as). At any rate, if case insensitivity could be thrown out then a very fast implementation could be worked out. As is, the following code was a marked improvement: int i; final int flen = fields.length; final String lowerCaseColumnName = columnName.toLowerCase(); for (i = 0 ; i < flen; ++i) if ((fields[I].getName().hashCode() == lowerCaseColumnName.hashCode()) && (fields[i].getName().equalsIgnoreCase(columnName))) return (i + 1); The int comparisons to hashCode() values are much faster than case insensitive String compares. The use of a HashMap would be even better, especially for very wide resultsets. Anyways, if we could make a change to the driver I'd be very pleased. And I'm sure I'm not the only one. Actually, the level 4 Oracle driver has a very similar implementation that's causing Hibernate users some pain, so speeding up the Postgres implementation would actually make it a much more performant option with that particular tool. :) Thanks, Phill P.S. I'm not subscribed to the list (nor do I know how to be), so please include me in any replies. ------ Forwarded Message From: Peter Mount <peter@retep.org.uk> Reply-To: <peter@retep.org.uk> Date: Wed, 13 Oct 2004 08:44:20 +0100 To: "Tornroth, Phill" <ptornroth@intellidot.net> Subject: Re: Question about the postgres resultset implementation Tornroth, Phill wrote: >I noticed that getting values from the Postgres JDBC ResultSet by column >name was expensive. I looked at the implementation, and findColumn(String) >does the following: > > int i; > > final int flen = fields.length; > for (i = 0 ; i < flen; ++i) > if (fields[i].getName().equalsIgnoreCase(columnName)) > return (i + 1); > > >If you're getting results by name for a query with 30 columns and a hundred >rows, you'll do 4650 case insensitive string comparisons. I believe it would >be much, much faster to use a Hashtable for this task. Is there a reason the >implementation is as it is? > > I think that was a legacy from the original two drivers, and as it's always worked it's stayed there. I agree with you a Map of some sort (HashMap would be ideal, Hashtable would be overkill as it's syncronized and technically only one thread should access a ResultSet) would help. NB: I've not been part of the drivers development for some three years now, so it's best to ask on the jdbc list. Peter > >Phill > > ------ End of Forwarded Message
Tornroth, Phill wrote: > I did some tinkering of my own and found that one of the problems with > making the driver less cumbersome is the fact that findColumn() is currently > case insensitive. I don't know if this is required by the jdbc spec, but it > seems all the field names come back lower case (in the fields[] array), and > so I could replace the implementation without understanding how to prevent > the case of the field names from changing (changing from the case they were > sent in as). They come back lower-case because that's how they are in the actual schema. Unless you quote identifiers, they get smashed to lowercase by the backend: >> test=> create table t1(lowercase integer, UPPERCASE integer, "QUOTEDUPPERCASE" integer); >> CREATE TABLE >> test=> select * from t1; >> lowercase | uppercase | QUOTEDUPPERCASE >> -----------+-----------+----------------- >> (0 rows) > At any rate, if case insensitivity could be thrown out then a very fast > implementation could be worked out. As is, the following code was a marked > improvement: That change is slightly buggy as you *can* get uppercase characters in a field name. We can't entirely discard case-insensitivy as JDBC requires that column names are found case-insensitively. This actually makes the use of column names somewhat unreliable if you ever have two columns with names that only differ by case. From the JDBC javadoc: >> Column names used as input to getter methods are case insensitive. When >> a getter method is called with a column name and several columns have >> the same name, the value of the first matching column will be returned. >> The column name option is designed to be used when column names are used >> in the SQL query that generated the result set. For columns that are NOT >> explicitly named in the query, it is best to use column numbers. If >> column names are used, there is no way for the programmer to guarantee >> that they actually refer to the intended columns. (I am assuming that "findColumn" is considered a getter method.. the javadoc is vague as ever) Nevertheless, we could certainly use a hashmap to speed up field lookups. This should be pretty trivial to implement, expect a patch shortly. (hah, in fact I see that Kris got there first..) -O
Phill, this is largely because postgres is case insensitive. all column and table names are folded into lower case unless you surround them with quotes. Dave On Wed, 2004-10-13 at 11:04, Tornroth, Phill wrote: > I wrote Peter about the JDBC driver implementation of findColumn and he > suggested I go here for help. > > I know the implementation that the Postgres driver uses (iterating through > an array, doing case insensitive searches on each field name) is common and > probably came from an old reference driver, but it's slow. > > I'm using an ORM tool that currently asks for all values by their column > name (Hibernate) and I'm actually pestering both sides to see if I can get > some improvements. Hibernate should really use indexes, but the postgres > driver should really do a better job than it's doing now. > > I did some tinkering of my own and found that one of the problems with > making the driver less cumbersome is the fact that findColumn() is currently > case insensitive. I don't know if this is required by the jdbc spec, but it > seems all the field names come back lower case (in the fields[] array), and > so I could replace the implementation without understanding how to prevent > the case of the field names from changing (changing from the case they were > sent in as). > > At any rate, if case insensitivity could be thrown out then a very fast > implementation could be worked out. As is, the following code was a marked > improvement: > > int i; > > final int flen = fields.length; > final String lowerCaseColumnName = columnName.toLowerCase(); > > for (i = 0 ; i < flen; ++i) > if ((fields[I].getName().hashCode() == lowerCaseColumnName.hashCode()) && > (fields[i].getName().equalsIgnoreCase(columnName))) > return (i + 1); > > > The int comparisons to hashCode() values are much faster than case > insensitive String compares. The use of a HashMap would be even better, > especially for very wide resultsets. > > Anyways, if we could make a change to the driver I'd be very pleased. And > I'm sure I'm not the only one. Actually, the level 4 Oracle driver has a > very similar implementation that's causing Hibernate users some pain, so > speeding up the Postgres implementation would actually make it a much more > performant option with that particular tool. :) > > Thanks, > Phill > > P.S. I'm not subscribed to the list (nor do I know how to be), so please > include me in any replies. > > ------ Forwarded Message > From: Peter Mount <peter@retep.org.uk> > Reply-To: <peter@retep.org.uk> > Date: Wed, 13 Oct 2004 08:44:20 +0100 > To: "Tornroth, Phill" <ptornroth@intellidot.net> > Subject: Re: Question about the postgres resultset implementation > > Tornroth, Phill wrote: > > >I noticed that getting values from the Postgres JDBC ResultSet by column > >name was expensive. I looked at the implementation, and findColumn(String) > >does the following: > > > > int i; > > > > final int flen = fields.length; > > for (i = 0 ; i < flen; ++i) > > if (fields[i].getName().equalsIgnoreCase(columnName)) > > return (i + 1); > > > > > >If you're getting results by name for a query with 30 columns and a hundred > >rows, you'll do 4650 case insensitive string comparisons. I believe it would > >be much, much faster to use a Hashtable for this task. Is there a reason the > >implementation is as it is? > > > > > I think that was a legacy from the original two drivers, and as it's > always worked it's stayed there. I agree with you a Map of some sort > (HashMap would be ideal, Hashtable would be overkill as it's syncronized > and technically only one thread should access a ResultSet) would help. > > NB: I've not been part of the drivers development for some three years > now, so it's best to ask on the jdbc list. > > Peter > > > > >Phill > > > > > > > ------ End of Forwarded Message > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match -- Dave Cramer 519 939 0336 ICQ # 14675561 www.postgresintl.com
On Wed, 13 Oct 2004, Kris Jurka wrote: > I believe Peter's suggestion was to use a hashmap to cache the > string->index mapping so that findColumn would only need to do the > equalsIgnoreCase on the first call of that string, meaning once per column > instead of for every call. > I've committed a fix to cvs following this suggestion. The new findColumn method looks like this: Kris Jurka public int findColumn(String columnName) throws SQLException { Integer index = (Integer)columnNameIndexMap.get(columnName); if (index != null) { return index.intValue(); } final int flen = fields.length; for (int i = 0 ; i < flen; ++i) { if (fields[i].getColumnLabel().equalsIgnoreCase(columnName)) { index = new Integer(i+1); columnNameIndexMap.put(columnName, index); return index.intValue(); } } throw new PSQLException (GT.tr("The column name '{0}' was not found in this ResultSet.", columnName)); }
On Wed, 13 Oct 2004, Tornroth, Phill wrote: > I did some tinkering of my own and found that one of the problems with > making the driver less cumbersome is the fact that findColumn() is currently > case insensitive. I don't know if this is required by the jdbc spec, but it > seems all the field names come back lower case (in the fields[] array), and > so I could replace the implementation without understanding how to prevent > the case of the field names from changing (changing from the case they were > sent in as). Case insensitivity is required. Note that field names won't come back in lowercase if they aren't created as lowercase. Postgresql folds all non-quoted identifiers to lower case, so you need to quote them to retain case. > > At any rate, if case insensitivity could be thrown out then a very fast > implementation could be worked out. As is, the following code was a marked > improvement: I believe Peter's suggestion was to use a hashmap to cache the string->index mapping so that findColumn would only need to do the equalsIgnoreCase on the first call of that string, meaning once per column instead of for every call. Kris Jurka
Wow, thanks! This will be very helpful to us. I can't believe the speed at which you've responded to this. I really appreciateit. Are official releases of the JDBC driver tied to official releases of Postgres? Phill -----Original Message----- From: Kris Jurka [mailto:books@ejurka.com] Sent: Wed 10/13/2004 2:03 PM To: Tornroth, Phill Cc: pgsql-jdbc@postgresql.org Subject: Re: [JDBC] FW: Question about the postgres resultset implementation On Wed, 13 Oct 2004, Kris Jurka wrote: > I believe Peter's suggestion was to use a hashmap to cache the > string->index mapping so that findColumn would only need to do the > equalsIgnoreCase on the first call of that string, meaning once per column > instead of for every call. > I've committed a fix to cvs following this suggestion. The new findColumn method looks like this: Kris Jurka public int findColumn(String columnName) throws SQLException { Integer index = (Integer)columnNameIndexMap.get(columnName); if (index != null) { return index.intValue(); } final int flen = fields.length; for (int i = 0 ; i < flen; ++i) { if (fields[i].getColumnLabel().equalsIgnoreCase(columnName)) { index = new Integer(i+1); columnNameIndexMap.put(columnName, index); return index.intValue(); } } throw new PSQLException (GT.tr("The column name '{0}' was not found in this ResultSet.", columnName)); }
On Wed, 13 Oct 2004, Tornroth, Phill wrote: > Are official releases of the JDBC driver tied to official releases of > Postgres? Yes. Up until the upcoming 8.0 release the JDBC driver was included in the main postgresql source tree, so it was released together. Now it's packaged separately, but based on that history and the fact that JDBC features and new server features match up well we have decided to keep the same release schedule. Kris Jurka