Thread: FW: Question about the postgres resultset implementation

FW: Question about the postgres resultset implementation

From
"Tornroth, Phill"
Date:
I wrote Peter about the JDBC driver implementation of findColumn and he
suggested I go here for help.

I know the implementation that the Postgres driver uses (iterating through
an array, doing case insensitive searches on each field name) is common and
probably came from an old reference driver, but it's slow.

I'm using an ORM tool that currently asks for all values by their column
name (Hibernate) and I'm actually pestering both sides to see if I can get
some improvements. Hibernate should really use indexes, but the postgres
driver should really do a better job than it's doing now.

I did some tinkering of my own and found that one of the problems with
making the driver less cumbersome is the fact that findColumn() is currently
case insensitive. I don't know if this is required by the jdbc spec, but it
seems all the field names come back lower case (in the fields[] array), and
so I could replace the implementation without understanding how to prevent
the case of the field names from changing (changing from the case they were
sent in as).

At any rate, if case insensitivity could be thrown out then a very fast
implementation could be worked out. As is, the following code was a marked
improvement:

int i;

final int flen = fields.length;
final String lowerCaseColumnName = columnName.toLowerCase();

for (i = 0 ; i < flen; ++i)
if ((fields[I].getName().hashCode() == lowerCaseColumnName.hashCode()) &&
(fields[i].getName().equalsIgnoreCase(columnName)))
return (i + 1);


The int comparisons to hashCode() values are much faster than case
insensitive String compares. The use of a HashMap would be even better,
especially for very wide resultsets.

Anyways, if we could make a change to the driver I'd be very pleased. And
I'm sure I'm not the only one. Actually, the level 4 Oracle driver has a
very similar implementation that's causing Hibernate users some pain, so
speeding up the Postgres implementation would actually make it a much more
performant option with that particular tool. :)

Thanks,
Phill

P.S. I'm not subscribed to the list (nor do I know how to be), so please
include me in any replies.

------ Forwarded Message
From: Peter Mount <peter@retep.org.uk>
Reply-To: <peter@retep.org.uk>
Date: Wed, 13 Oct 2004 08:44:20 +0100
To: "Tornroth, Phill" <ptornroth@intellidot.net>
Subject: Re: Question about the postgres resultset implementation

Tornroth, Phill wrote:

>I noticed that getting values from the Postgres JDBC ResultSet by column
>name was expensive. I looked at the implementation, and findColumn(String)
>does the following:
>
>        int i;
>
>        final int flen = fields.length;
>        for (i = 0 ; i < flen; ++i)
>            if (fields[i].getName().equalsIgnoreCase(columnName))
>                return (i + 1);
>
>
>If you're getting results by name for a query with 30 columns and a hundred
>rows, you'll do 4650 case insensitive string comparisons. I believe it would
>be much, much faster to use a Hashtable for this task. Is there a reason the
>implementation is as it is?
>
>
I think that was a legacy from the original two drivers, and as it's
always worked it's stayed there. I agree with you a Map of some sort
(HashMap would be ideal, Hashtable would be overkill as it's syncronized
and technically only one thread should access a ResultSet) would help.

NB: I've not been part of the drivers development for some three years
now, so it's best to ask on the jdbc list.

Peter

>
>Phill
>
>


------ End of Forwarded Message


Re: FW: Question about the postgres resultset implementation

From
Oliver Jowett
Date:
Tornroth, Phill wrote:

> I did some tinkering of my own and found that one of the problems with
> making the driver less cumbersome is the fact that findColumn() is currently
> case insensitive. I don't know if this is required by the jdbc spec, but it
> seems all the field names come back lower case (in the fields[] array), and
> so I could replace the implementation without understanding how to prevent
> the case of the field names from changing (changing from the case they were
> sent in as).

They come back lower-case because that's how they are in the actual
schema. Unless you quote identifiers, they get smashed to lowercase by
the backend:

>> test=> create table t1(lowercase integer, UPPERCASE integer, "QUOTEDUPPERCASE" integer);
>> CREATE TABLE
>> test=> select * from t1;
>>  lowercase | uppercase | QUOTEDUPPERCASE
>> -----------+-----------+-----------------
>> (0 rows)

> At any rate, if case insensitivity could be thrown out then a very fast
> implementation could be worked out. As is, the following code was a marked
> improvement:

That change is slightly buggy as you *can* get uppercase characters in a
field name.

We can't entirely discard case-insensitivy as JDBC requires that column
names are found case-insensitively. This actually makes the use of
column names somewhat unreliable if you ever have two columns with names
that only differ by case. From the JDBC javadoc:

>> Column names used as input to getter methods are case insensitive. When
>> a getter method is called with a column name and several columns have
>> the same name, the value of the first matching column will be returned.
>> The column name option is designed to be used when column names are used
>> in the SQL query that generated the result set. For columns that are NOT
>> explicitly named in the query, it is best to use column numbers. If
>> column names are used, there is no way for the programmer to guarantee
>> that they actually refer to the intended columns.

(I am assuming that "findColumn" is considered a getter method.. the
javadoc is vague as ever)

Nevertheless, we could certainly use a hashmap to speed up field
lookups. This should be pretty trivial to implement, expect a patch shortly.

(hah, in fact I see that Kris got there first..)

-O

Re: FW: Question about the postgres resultset implementation

From
Dave Cramer
Date:
Phill,

this is largely because postgres is case insensitive.
all column and table names are folded into lower case unless you
surround them with quotes.

Dave
On Wed, 2004-10-13 at 11:04, Tornroth, Phill wrote:
> I wrote Peter about the JDBC driver implementation of findColumn and he
> suggested I go here for help.
>
> I know the implementation that the Postgres driver uses (iterating through
> an array, doing case insensitive searches on each field name) is common and
> probably came from an old reference driver, but it's slow.
>
> I'm using an ORM tool that currently asks for all values by their column
> name (Hibernate) and I'm actually pestering both sides to see if I can get
> some improvements. Hibernate should really use indexes, but the postgres
> driver should really do a better job than it's doing now.
>
> I did some tinkering of my own and found that one of the problems with
> making the driver less cumbersome is the fact that findColumn() is currently
> case insensitive. I don't know if this is required by the jdbc spec, but it
> seems all the field names come back lower case (in the fields[] array), and
> so I could replace the implementation without understanding how to prevent
> the case of the field names from changing (changing from the case they were
> sent in as).
>
> At any rate, if case insensitivity could be thrown out then a very fast
> implementation could be worked out. As is, the following code was a marked
> improvement:
>
> int i;
>
> final int flen = fields.length;
> final String lowerCaseColumnName = columnName.toLowerCase();
>
> for (i = 0 ; i < flen; ++i)
> if ((fields[I].getName().hashCode() == lowerCaseColumnName.hashCode()) &&
> (fields[i].getName().equalsIgnoreCase(columnName)))
> return (i + 1);
>
>
> The int comparisons to hashCode() values are much faster than case
> insensitive String compares. The use of a HashMap would be even better,
> especially for very wide resultsets.
>
> Anyways, if we could make a change to the driver I'd be very pleased. And
> I'm sure I'm not the only one. Actually, the level 4 Oracle driver has a
> very similar implementation that's causing Hibernate users some pain, so
> speeding up the Postgres implementation would actually make it a much more
> performant option with that particular tool. :)
>
> Thanks,
> Phill
>
> P.S. I'm not subscribed to the list (nor do I know how to be), so please
> include me in any replies.
>
> ------ Forwarded Message
> From: Peter Mount <peter@retep.org.uk>
> Reply-To: <peter@retep.org.uk>
> Date: Wed, 13 Oct 2004 08:44:20 +0100
> To: "Tornroth, Phill" <ptornroth@intellidot.net>
> Subject: Re: Question about the postgres resultset implementation
>
> Tornroth, Phill wrote:
>
> >I noticed that getting values from the Postgres JDBC ResultSet by column
> >name was expensive. I looked at the implementation, and findColumn(String)
> >does the following:
> >
> >        int i;
> >
> >        final int flen = fields.length;
> >        for (i = 0 ; i < flen; ++i)
> >            if (fields[i].getName().equalsIgnoreCase(columnName))
> >                return (i + 1);
> >
> >
> >If you're getting results by name for a query with 30 columns and a hundred
> >rows, you'll do 4650 case insensitive string comparisons. I believe it would
> >be much, much faster to use a Hashtable for this task. Is there a reason the
> >implementation is as it is?
> >
> >
> I think that was a legacy from the original two drivers, and as it's
> always worked it's stayed there. I agree with you a Map of some sort
> (HashMap would be ideal, Hashtable would be overkill as it's syncronized
> and technically only one thread should access a ResultSet) would help.
>
> NB: I've not been part of the drivers development for some three years
> now, so it's best to ask on the jdbc list.
>
> Peter
>
> >
> >Phill
> >
> >
>
>
> ------ End of Forwarded Message
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
--
Dave Cramer
519 939 0336
ICQ # 14675561
www.postgresintl.com


Re: FW: Question about the postgres resultset implementation

From
Kris Jurka
Date:

On Wed, 13 Oct 2004, Kris Jurka wrote:

> I believe Peter's suggestion was to use a hashmap to cache the
> string->index mapping so that findColumn would only need to do the
> equalsIgnoreCase on the first call of that string, meaning once per column
> instead of for every call.
>

I've committed a fix to cvs following this suggestion.  The new findColumn
method looks like this:

Kris Jurka

public int findColumn(String columnName) throws SQLException
{
        Integer index = (Integer)columnNameIndexMap.get(columnName);
        if (index != null) {
                return index.intValue();
        }

        final int flen = fields.length;
        for (int i = 0 ; i < flen; ++i) {
                if (fields[i].getColumnLabel().equalsIgnoreCase(columnName)) {
                        index = new Integer(i+1);
                        columnNameIndexMap.put(columnName, index);
                        return index.intValue();
                }
        }

        throw new PSQLException (GT.tr("The column name '{0}' was not found in this ResultSet.", columnName));
}


Re: FW: Question about the postgres resultset implementation

From
Kris Jurka
Date:

On Wed, 13 Oct 2004, Tornroth, Phill wrote:

> I did some tinkering of my own and found that one of the problems with
> making the driver less cumbersome is the fact that findColumn() is currently
> case insensitive. I don't know if this is required by the jdbc spec, but it
> seems all the field names come back lower case (in the fields[] array), and
> so I could replace the implementation without understanding how to prevent
> the case of the field names from changing (changing from the case they were
> sent in as).

Case insensitivity is required.  Note that field names won't come back in
lowercase if they aren't created as lowercase.  Postgresql folds all
non-quoted identifiers to lower case, so you need to quote them to retain
case.

>
> At any rate, if case insensitivity could be thrown out then a very fast
> implementation could be worked out. As is, the following code was a marked
> improvement:

I believe Peter's suggestion was to use a hashmap to cache the
string->index mapping so that findColumn would only need to do the
equalsIgnoreCase on the first call of that string, meaning once per column
instead of for every call.

Kris Jurka

Re: FW: Question about the postgres resultset implementation

From
"Tornroth, Phill"
Date:
Wow, thanks! This will be very helpful to us. I can't believe the speed at which you've responded to this. I really
appreciateit. 

Are official releases of the JDBC driver tied to official releases of Postgres?

Phill


-----Original Message-----
From: Kris Jurka [mailto:books@ejurka.com]
Sent: Wed 10/13/2004 2:03 PM
To: Tornroth, Phill
Cc: pgsql-jdbc@postgresql.org
Subject: Re: [JDBC] FW: Question about the postgres resultset implementation



On Wed, 13 Oct 2004, Kris Jurka wrote:

> I believe Peter's suggestion was to use a hashmap to cache the
> string->index mapping so that findColumn would only need to do the
> equalsIgnoreCase on the first call of that string, meaning once per column
> instead of for every call.
>

I've committed a fix to cvs following this suggestion.  The new findColumn
method looks like this:

Kris Jurka

public int findColumn(String columnName) throws SQLException
{
        Integer index = (Integer)columnNameIndexMap.get(columnName);
        if (index != null) {
                return index.intValue();
        }

        final int flen = fields.length;
        for (int i = 0 ; i < flen; ++i) {
                if (fields[i].getColumnLabel().equalsIgnoreCase(columnName)) {
                        index = new Integer(i+1);
                        columnNameIndexMap.put(columnName, index);
                        return index.intValue();
                }
        }

        throw new PSQLException (GT.tr("The column name '{0}' was not found in this ResultSet.", columnName));
}





Re: FW: Question about the postgres resultset implementation

From
Kris Jurka
Date:

On Wed, 13 Oct 2004, Tornroth, Phill wrote:

> Are official releases of the JDBC driver tied to official releases of
> Postgres?

Yes.  Up until the upcoming 8.0 release the JDBC driver was included in
the main postgresql source tree, so it was released together.  Now it's
packaged separately, but based on that history and the fact that JDBC
features and new server features match up well we have decided to keep the
same release schedule.

Kris Jurka