Re: new String(byte[]) performance - Mailing list pgsql-jdbc
From | Barry Lind |
---|---|
Subject | Re: new String(byte[]) performance |
Date | |
Msg-id | 3DB4BA0F.2000804@xythos.com |
In response to | new String(byte[]) performance (Teofilis Martisius <teo@teohome.lzua.lt>) |
Responses | Re: new String(byte[]) performance; Re: new String(byte[]) performance |
List | pgsql-jdbc |
Teofilis,

I don't think the problem you are seeing is a result of using Java. It is more a result of the architecture of the JDBC driver. I know other followups to this email have suggested fixes at the IO level, and while that may be interesting to look into, I think a lot can be done to improve performance within the existing code in a way that works on all JDKs (1.1, 1.2, 1.3 and 1.4).

If you look at what happens in the driver when you do something as simple as 'select 1', you can see many areas for improvement. The first thing the driver does is allocate a byte[] object for each value being selected (a two-dimensional array of rows and columns). This makes sense, since the values are read raw off the socket and need to be stored somewhere, and byte[] seems a reasonable datatype. However, it results in allocating a very large number of small objects on the Java heap, which then need to be garbage collected later (and garbage collection isn't free; it takes a lot of CPU that could be used for other things).

One design pattern for dealing with this problem is to use an object pool and reuse byte[] objects, avoiding the overhead of object creation and garbage collection. There have been two attempts at this in the past: one I did (but lost due to a hard drive crash), and another that was checked into CVS but had so many issues that it was never used.

However, the byte[] objects are only the first problem. Take, for example, a call to the getInt() method. It converts the raw byte[] data to a String, and then that String is converted to an int. So a bunch more String objects are created (and later garbage collected). These String objects are there because the Java API doesn't provide any methods to convert from byte[] to other types like int, long, BigDecimal, Timestamp, etc. So a String intermediary is used.
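To make the getInt() conversion chain concrete, here is a minimal sketch. It is not the driver's code: the class name `RawIntParser` and both method names are hypothetical, the value is assumed to arrive as ASCII decimal digits, and `StandardCharsets` requires a newer JDK than the ones discussed in this thread. The first method follows the byte[] to String to int path described above; the second shows one possible way to skip the temporary String.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the PostgreSQL JDBC driver.
final class RawIntParser {

    // The path described above: byte[] -> String -> int.
    // Every call creates one temporary String that is garbage soon after.
    static int viaString(byte[] raw) {
        return Integer.parseInt(new String(raw, StandardCharsets.US_ASCII));
    }

    // Assumed alternative: parse ASCII decimal digits straight from the bytes.
    // Accepts an optional leading '-' and rejects anything else.
    static int direct(byte[] raw) {
        if (raw == null || raw.length == 0) {
            throw new NumberFormatException("empty value");
        }
        int i = 0;
        boolean negative = raw[0] == '-';
        if (negative) {
            i++;
        }
        if (i == raw.length) {
            throw new NumberFormatException("no digits");
        }
        // Accumulate in a long so the range check below cannot overflow.
        long acc = 0;
        for (; i < raw.length; i++) {
            int digit = raw[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            acc = acc * 10 + digit;
            // The magnitude may reach 2^31 only for Integer.MIN_VALUE.
            if (acc > (long) Integer.MAX_VALUE + 1) {
                throw new NumberFormatException("out of int range");
            }
        }
        if (!negative && acc > Integer.MAX_VALUE) {
            throw new NumberFormatException("out of int range");
        }
        return (int) (negative ? -acc : acc);
    }
}
```

Calling `RawIntParser.direct(bytes)` on the raw column bytes returns the same int as `viaString` for valid input, without allocating a String. Whether that saving is measurable in practice would need benchmarking.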
So to get the int returned by getInt(), both a byte[] and a String object get created, only to be garbage collected later, as they are just temporary objects. Using object pools can help with the allocation of byte[] objects, but doesn't help with String objects. However, if the driver started using char[] objects internally instead of Strings, these could be pooled as well. But this would probably mean that code like Integer.parseInt(String) would need to be reimplemented in the driver, since there is no corresponding Integer.parseInt(char[]).

While I realize that there is a lot of room for improvement, I find that the overall performance of the PostgreSQL JDBC driver is similar to that of the drivers I have used for other databases (i.e. Oracle and MSSQL). So I wouldn't characterize the performance as bad, but it could be improved.

thanks,
--Barry

Teofilis Martisius wrote:

> On Sat, Oct 19, 2002 at 08:02:37PM -0700, Barry Lind wrote:
>
>>Teofilis,
>>
>>I have applied this patch. I also made the change so that when
>>connected to a 7.3 database this optimization will always be used. This
>>is done by having the server do the character set encoding/decoding and
>>always using UTF-8 when dealing with the jdbc client.
>>
>>thanks,
>>--Barry
>>
>
>
> Hi,
>
> Ok, thanks for applying that. Well, after doing some benchmarks, I can
> say that java sucks. Don't get me wrong- it is still my language of
> choice and it is better than many other alternatives, but I have yet to
> see a JVM that has good performance, and no strange bottlenecks. I was
> quite annoyed to see that executing a query via JDBC and iterating over
> it from java took 6x the time it takes to execute it with psql. This
> patch helps a bit, but the performance overhead is still huge. Well, I
> looked over PostgreSQL JDBC driver code several times, and now I don't
> see anything more that can be optimized.
> The things that take up most
> time now are transferring everything over network
> (PG_Stream.receiveTuple if I remember correctly) and allocating memory
> for byte[] arrays. But I don't know any way to speed them up.
>
> Teofilis Martisius,
> teo@mediaworks.lt
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>
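For the byte[] allocation cost raised in the quoted message, the object pool idea from the reply might be sketched roughly as follows. This is an illustration under stated assumptions, not either of the earlier pooling attempts mentioned in the thread: the class `ByteArrayPool`, its methods, and the fixed buffer size are all hypothetical, and the sketch is not thread-safe.

```java
import java.util.ArrayDeque;

// Hypothetical sketch of a fixed-size byte[] pool; not driver code.
final class ByteArrayPool {
    private final int bufferSize;   // size of every pooled buffer
    private final int maxPooled;    // upper bound on idle buffers kept
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private int allocations = 0;    // counts real heap allocations

    ByteArrayPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    // Returns a buffer holding at least `length` bytes.
    // Pooled buffers are always `bufferSize` long, so the caller
    // must track how many bytes are actually valid.
    byte[] acquire(int length) {
        if (length <= bufferSize) {
            byte[] pooled = free.poll();   // null when the pool is empty
            if (pooled != null) {
                return pooled;
            }
            allocations++;
            return new byte[bufferSize];
        }
        // Oversized values bypass the pool entirely.
        allocations++;
        return new byte[length];
    }

    // Hands a buffer back for reuse; oversized or surplus buffers are dropped.
    void release(byte[] buffer) {
        if (buffer != null && buffer.length == bufferSize && free.size() < maxPooled) {
            free.push(buffer);
        }
    }

    int allocations() {
        return allocations;
    }
}
```

A caller would `acquire` a buffer per column value while reading a row and `release` it once the value has been converted. The hard part, which this sketch leaves out, is knowing when a buffer is no longer referenced by a ResultSet.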