Re: new String(byte[]) performance - Mailing list pgsql-jdbc

From Barry Lind
Subject Re: new String(byte[]) performance
Msg-id 3DB4BA0F.2000804@xythos.com
In response to new String(byte[]) performance  (Teofilis Martisius <teo@teohome.lzua.lt>)
List pgsql-jdbc
Teofilis,

I don't think the problem you are seeing is a result of using java.
It is more a result of the architecture of the jdbc driver.  I know
other followups to this email have suggested fixes at the IO level, and
while I think that may be interesting to look into, I think there is a
lot that can be done to improve performance within the existing code
that can work on all jdks (1.1, 1.2, 1.3 and 1.4).

If you look at what is happening in the driver when you do something as
simple as 'select 1', you can see many areas of improvement.

The first thing that the driver does is allocate byte[] objects for each
value being selected (a two dimensional array of rows and columns).
This makes sense since the values are being read raw off of the socket
and need to be stored somewhere and byte[] seems a reasonable datatype.
However this results in allocating many, many, many small objects off
of the java heap which then need to be garbage collected later (and
garbage collection isn't free, it takes a lot of CPU that could be used
for other things).  One design pattern to deal with this problem is to
use an object pool and reuse byte[] objects to avoid the excessive
overhead of the object creation and garbage collection.  There have been
two attempts at this in the past: one I did (but lost due to a hard
drive crash) and another that was checked into CVS but had so many
issues that it was never used.
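
As a rough illustration of the object pool idea, here is a minimal sketch.
It is not the code from either earlier attempt; the class name
ByteArrayPool, its methods, and the fixed-size buffer policy are all
assumptions made for the example, and it uses generics, which the older
JDKs mentioned above do not have.

```java
import java.util.ArrayDeque;

// Hypothetical sketch of a byte[] pool; not taken from the driver source.
final class ByteArrayPool {
    private final int bufferSize;   // every pooled buffer has this length
    private final int maxPooled;    // cap on how many idle buffers are kept
    private final ArrayDeque<byte[]> free = new ArrayDeque<byte[]>();

    ByteArrayPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    // Returns a buffer with room for at least `length` bytes.
    // Pooled buffers are always bufferSize long, so the caller has to
    // track how many bytes of the buffer are actually valid.
    synchronized byte[] acquire(int length) {
        if (length > bufferSize) {
            return new byte[length];          // oversized request: never pooled
        }
        byte[] buf = free.poll();             // reuse an idle buffer if any
        return buf != null ? buf : new byte[bufferSize];
    }

    // Hands a buffer back for reuse; buffers of the wrong size are dropped.
    synchronized void release(byte[] buf) {
        if (buf != null && buf.length == bufferSize && free.size() < maxPooled) {
            free.push(buf);
        }
    }
}
```

The trade-off is that values no longer own an exactly sized array, so
every consumer has to carry a length alongside the buffer and must
release the buffer only once nothing refers to it any more.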

However the byte[] objects are only the first problem.  For example take
a call to the getInt() method.  It converts the raw byte[] data to a
String and then that string is converted to an int.  So a bunch more
String objects are created (and then later garbage collected).  These
String objects are there because the java API doesn't provide any
methods to convert from byte[] to other objects like int, long,
BigDecimal, Timestamp, etc.  So a String intermediary is used.

So to get the int returned by getInt() both a byte[] and a String object
get created only to be garbage collected later as they are just
temporary objects.
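
The conversion path described above looks roughly like the sketch below.
The class and method names are made up for illustration and the ASCII
charset is an assumption; the driver's actual getInt() code is not
reproduced here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the byte[] -> String -> int path.
final class StringIntermediary {
    // `raw` stands for the bytes of one column value as read off the socket.
    static int getIntViaString(byte[] raw) {
        // Temporary object: a String built only to be parsed and discarded.
        String text = new String(raw, StandardCharsets.US_ASCII);
        return Integer.parseInt(text);
    }
}
```

For example, StringIntermediary.getIntViaString(new byte[] {'4', '2'})
creates one short-lived String on top of the byte[] that was already
allocated for the value.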

Now using object pools can help with the allocation of byte[] objects,
but it doesn't help with String objects.  However if the driver started using
char[] objects internally instead of Strings, these could be pooled as
well.  But this would probably mean that code like
Integer.parseInt(String) would need to be reimplemented in the driver
since there is no corresponding Integer.parseInt(char[]).
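
To give an idea of what reimplementing the parsing would involve, here
is a minimal sketch of a decimal int parser that reads from a region of
a char[].  The class name CharArrayParser and the (buf, offset, length)
signature are assumptions for the example, not an existing JDK or
driver API.

```java
// Hypothetical sketch: parse a decimal int from a char[] region
// without creating a String.
final class CharArrayParser {
    static int parseInt(char[] buf, int offset, int length) {
        if (buf == null || offset < 0 || length <= 0 || length > buf.length - offset) {
            throw new NumberFormatException("empty or out-of-range input");
        }
        int i = offset;
        int end = offset + length;
        boolean negative = false;
        if (buf[i] == '-' || buf[i] == '+') {
            negative = (buf[i] == '-');
            i++;
            if (i == end) {
                throw new NumberFormatException("sign without digits");
            }
        }
        // Accumulate as a negative number so Integer.MIN_VALUE fits.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int multMin = limit / 10;
        int result = 0;
        while (i < end) {
            int digit = buf[i++] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit");
            }
            if (result < multMin) {
                throw new NumberFormatException("overflow");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("overflow");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }
}
```

Taking an offset and a length lets the caller parse straight out of a
larger pooled buffer.  Similar routines would be needed for long,
BigDecimal, Timestamp and so on, which is where most of the work would
be.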


Now while I realize that there is a lot of room for improvement, I find
that the overall performance of the PostgreSQL JDBC driver is similar to
the drivers I have used for other databases (i.e. Oracle and MSSQL).  So
I wouldn't characterize the performance as bad, but it could be improved.

thanks,
--Barry





Teofilis Martisius wrote:
> On Sat, Oct 19, 2002 at 08:02:37PM -0700, Barry Lind wrote:
>
>>Teofilis,
>>
>>I have applied this patch.  I also made a change so that when
>>connected to a 7.3 database this optimization will always be used.  This
>>is done by having the server do the character set encoding/decoding and
>>always using UTF-8 when dealing with the jdbc client.
>>
>>thanks,
>>--Barry
>>
>
>
> Hi,
>
> Ok, thanks for applying that. Well, after doing some benchmarks, I can
> say that java sucks. Don't get me wrong- it is still my language of
> choice and it is better than many other alternatives, but I have yet to
> see a JVM that has good performance, and no strange bottlenecks. I was
> quite annoyed to see that executing a query  via JDBC and iterating over
> it from java took 6x the time it takes to execute it with psql. This
> patch helps a bit, but the performance overhead is still huge. Well, I
> looked over PostgreSQL JDBC driver code several times, and now I don't
> see anything more that can be optimized. The things that take up most
> time now are transferring everything over network
> (PG_Stream.receiveTuple if I remember correctly) and allocating memory
> for byte[] arrays. But I don't know any way to speed them up.
>
> Teofilis Martisius,
> teo@mediaworks.lt



