Thread: new String(byte[]) performance

new String(byte[]) performance

From

Teofilis Martisius

Date:

12 September 2002, 03:50:46

Hello,

While looking through the PostgreSQL JDBC driver sources and profiling, I
noticed that the driver uses new String(byte[]) a lot while iterating over a
ResultSet, and that this String constructor takes a lot of time. I wrote a
custom byte[]->String conversion method for UTF-8 that makes iterating over
a ResultSet 2 times faster or even more. I have a patch for the PostgreSQL
JDBC driver, but this is a workaround and I am not sure it will be
accepted. It does speed things up by quite a noticeable amount.

Hmm, maybe decodeUTF8() should be synchronized on cdata, or maybe cdata
should be allocated for each call. The static cdata version was faster.

By the way, what should a JDBC driver do when, e.g., ResultSet.getInt() is
called for a VARCHAR field? I would suggest converting byte arrays to
Strings, or even to more precisely typed values (Integers, Doubles and so
on), in QueryExecutor.execute(). This should save some RAM allocation
for receiveTuple, because right now memory gets allocated several times:
once for the byte[], a second time for the String, and a third time for the
Integer or other object in getObject(). Memory allocation takes a
considerable amount of time. But this stronger typing would remove some of
the flexibility to call any getXXX on any SQL type field. And it would
probably make the querying itself (QueryExecutor.execute()) slower, I
don't know :/

Teofilis Martisius

Anyway, here is the patch to fix string decoding:

diff -r -u ./org/postgresql/core/Encoding.java /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java
--- ./org/postgresql/core/Encoding.java    2001-11-20 00:33:37.000000000 +0200
+++ /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java    2002-09-11 15:56:10.000000000 +0200
@@ -155,6 +155,9 @@
             }
             else
             {
+                if (encoding.equals("UTF-8")) {
+                    return decodeUTF8(encodedString, offset, length);
+                }
                 return new String(encodedString, offset, length, encoding);
             }
         }
@@ -163,6 +166,43 @@
             throw new PSQLException("postgresql.stream.encoding", e);
         }
     }
+    /**
+     * custom byte[] -> String conversion routine, 3x-10x faster than standard new String(byte[])
+     */
+    static final int pow2_6 = 64;        // 2^6
+    static final int pow2_12 = 4096;    // 2^12
+    static char cdata[] = new char[50];
+
+    public static final String decodeUTF8(byte data[], int offset, int length) {
+        if (cdata.length < (length-offset)) {
+            cdata = new char[length-offset];
+        }
+        int i = offset;
+        int j = 0;
+        int z, y, x, val;
+        while (i < length) {
+            z = data[i] & 0xFF;
+            if (z < 0x80) {
+                cdata[j++] = (char)data[i];
+                i++;
+            } else if (z >= 0xE0) {        // length == 3
+                y = data[i+1] & 0xFF;
+                x = data[i+2] & 0xFF;
+                val = (z-0xE0)*pow2_12 + (y-0x80)*pow2_6 + (x-0x80);
+                cdata[j++] = (char) val;
+                i+= 3;
+            } else {        // length == 2 (maybe add checking for length > 3, throw exception if it is)
+                y = data[i+1] & 0xFF;
+                val = (z - 0xC0)* (pow2_6)+(y-0x80);
+                cdata[j++] = (char) val;
+                i+=2;
+            }
+        }
+
+        String s = new String(cdata, 0, j);
+        return s;
+    }
+

     /*
      * Decode an array of bytes into a string.
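
For reference, the idea behind the patch can be sketched as a standalone
method. This is not the code that went into the driver. It assumes that
length is a byte count (as in new String(byte[], int, int, String)), it
allocates the char[] per call instead of sharing a static buffer, and it
rejects 4-byte sequences instead of mis-decoding them. The class name
Utf8Sketch is made up for illustration, and continuation bytes are not
validated.

```java
final class Utf8Sketch {
    /**
     * Decodes `length` bytes of UTF-8 starting at `offset`.
     * Handles 1-, 2- and 3-byte sequences only; anything else is rejected.
     */
    static String decodeUTF8(byte[] data, int offset, int length) {
        int end = offset + length;        // assumption: length is a count, not an end index
        char[] cdata = new char[length];  // per-call buffer; at most one char per input byte
        int i = offset;
        int j = 0;
        while (i < end) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {                                      // 0xxxxxxx
                cdata[j++] = (char) z;
                i += 1;
            } else if (z >= 0xC0 && z < 0xE0 && i + 1 < end) {   // 110xxxxx 10xxxxxx
                int y = data[i + 1] & 0x3F;
                cdata[j++] = (char) (((z & 0x1F) << 6) | y);
                i += 2;
            } else if (z >= 0xE0 && z < 0xF0 && i + 2 < end) {   // 1110xxxx 10xxxxxx 10xxxxxx
                int y = data[i + 1] & 0x3F;
                int x = data[i + 2] & 0x3F;
                cdata[j++] = (char) (((z & 0x0F) << 12) | (y << 6) | x);
                i += 3;
            } else {
                throw new IllegalArgumentException(
                        "unsupported or truncated UTF-8 sequence at byte " + i);
            }
        }
        return new String(cdata, 0, j);
    }
}
```

Allocating cdata per call sidesteps the synchronization question raised
above, at the cost of the extra allocation that the static version avoided.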

Re: new String(byte[]) performance

From

Teofilis Martisius

Date:

20 October 2002, 23:31:46

On Sat, Oct 19, 2002 at 08:02:37PM -0700, Barry Lind wrote:
>
> Teofilis,
>
> I have applied this patch.  I also made a change so that when
> connected to a 7.3 database this optimization will always be used.  This
> is done by having the server do the character set encoding/decoding and
> always using UTF-8 when dealing with the jdbc client.
>
> thanks,
> --Barry
>

Hi,

Ok, thanks for applying that. Well, after doing some benchmarks, I can
say that Java sucks. Don't get me wrong: it is still my language of
choice and it is better than many other alternatives, but I have yet to
see a JVM that has good performance and no strange bottlenecks. I was
quite annoyed to see that executing a query via JDBC and iterating over
it from Java took 6x the time it takes to execute it with psql. This
patch helps a bit, but the performance overhead is still huge. Well, I
looked over the PostgreSQL JDBC driver code several times, and now I don't
see anything more that can be optimized. The things that take up the most
time now are transferring everything over the network
(PG_Stream.receiveTuple, if I remember correctly) and allocating memory
for byte[] arrays. But I don't know any way to speed them up.

Teofilis Martisius,
teo@mediaworks.lt

Re: new String(byte[]) performance

From

Aaron Mulder

Date:

20 October 2002, 23:39:52

On Mon, 21 Oct 2002, Teofilis Martisius wrote:
> The things that take up the most time now are transferring everything over
> the network (PG_Stream.receiveTuple, if I remember correctly) and allocating
> memory for byte[] arrays. But I don't know any way to speed them up.

    There is probably room for improvement here under JDK 1.4, if we
want to get really fancy.  I think we could manipulate IO buffers to read
directly from the network into byte arrays, rather than reading in from
the network at the hardware level, allocating a new buffer in the program,
and then copying the data from the network buffer to the program buffer.
    But of course, IANAIOGuy; I've never actually tried that... :)

Aaron
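
A minimal sketch of the kind of java.nio read described above, assuming
the driver had a ReadableByteChannel for the connection (in practice that
would be a SocketChannel). NioReadSketch and readFully are hypothetical
names, not driver code. The ByteBuffer is only a view over the caller's
byte[]; whether this actually saves a copy depends on the JVM and the
channel implementation, so it would need to be measured.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

final class NioReadSketch {
    /** Fills target[0..count) from the channel; throws if the stream ends early. */
    static void readFully(ReadableByteChannel in, byte[] target, int count)
            throws IOException {
        // wrap() does not allocate a new array; reads land directly in target.
        ByteBuffer buf = ByteBuffer.wrap(target, 0, count);
        while (buf.hasRemaining()) {
            if (in.read(buf) < 0) {
                throw new IOException("unexpected end of stream");
            }
        }
    }
}
```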

Re: new String(byte[]) performance

From

"Michael Paesold"

Date:

21 October 2002, 04:42:54

Aaron Mulder <ammulder@alumni.princeton.edu> wrote:

> On Mon, 21 Oct 2002, Teofilis Martisius wrote:
> > The things that take up the most time now are transferring everything over
> > the network (PG_Stream.receiveTuple, if I remember correctly) and allocating
> > memory for byte[] arrays. But I don't know any way to speed them up.
>
> There is probably room for improvement here under JDK 1.4, if we
> want to get really fancy.  I think we could manipulate IO buffers to read
> directly from the network into byte arrays, rather than reading in from
> the network at the hardware level, allocating a new buffer in the program,
> and then copying the data from the network buffer to the program buffer.
> But of course, IANAIOGuy; I've never actually tried that... :)
>
> Aaron

You are talking about the new I/O system in java.nio.*, right? I think there
could be a major improvement. On the other hand, this would definitely
increase the maintenance overhead in the driver. Anyway, what stronger
reason for object-oriented programming could there be than this? ;-)

Regards,
Michael Paesold

Re: new String(byte[]) performance

From

Barry Lind

Date:

21 October 2002, 08:56:23

Teofilis,

I have applied this patch.  I also made a change so that when
connected to a 7.3 database this optimization will always be used.  This
is done by having the server do the character set encoding/decoding and
always using UTF-8 when dealing with the jdbc client.

thanks,
--Barry




Re: new String(byte[]) performance

From

Barry Lind

Date:

22 October 2002, 11:43:56

Teofilis,

I don't think the problem you are seeing is a result of using Java.
It is more the result of the architecture of the JDBC driver.  I know
other followups to this email have suggested fixes at the IO level, and
while I think that may be interesting to look into, I think there is a
lot that can be done to improve performance within the existing code
that can work on all JDKs (1.1, 1.2, 1.3 and 1.4).

If you look at what is happening in the driver when you do something as
simple as 'select 1', you can see many areas of improvement.

The first thing that the driver does is allocate byte[] objects for each
value being selected (a two dimensional array of rows and columns).
This makes sense since the values are being read raw off of the socket
and need to be stored somewhere, and byte[] seems a reasonable datatype.
However, this results in allocating many, many, many small objects off
of the Java heap which then need to be garbage collected later (and
garbage collection isn't free; it takes a lot of CPU that could be used
for other things).  One design pattern to deal with this problem is to
use an object pool and reuse byte[] objects to avoid the excessive
overhead of object creation and garbage collection.  There have been
two attempts at this in the past: one I did (but lost due to a hard
drive crash) and another that was checked into CVS but had such a
significant number of issues that it was never used.
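
To make the pooling idea concrete, here is a small sketch of a fixed-size
byte[] pool. It is not either of the earlier attempts mentioned above;
ByteArrayPool and its parameters are invented for illustration, and it uses
generics, so it would not compile on the JDKs the driver targeted at the
time.

```java
import java.util.ArrayDeque;

final class ByteArrayPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<byte[]>();
    private final int bufferSize;   // every pooled buffer has exactly this length
    private final int maxPooled;    // cap so the pool itself cannot grow without bound

    ByteArrayPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    /** Returns a buffer of at least `size` bytes; oversized requests bypass the pool. */
    synchronized byte[] acquire(int size) {
        if (size > bufferSize) {
            return new byte[size];
        }
        byte[] pooled = free.pollFirst();
        return pooled != null ? pooled : new byte[bufferSize];
    }

    /** Hands a buffer back; buffers of the wrong size or beyond the cap are dropped. */
    synchronized void release(byte[] buffer) {
        if (buffer.length == bufferSize && free.size() < maxPooled) {
            free.addFirst(buffer);
        }
    }
}
```

A pooled buffer is usually longer than the value stored in it, so callers
have to track the valid length separately, and old contents are not cleared
on release.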

However the byte[] objects are only the first problem.  For example take
a call to the getInt() method.  It converts the raw byte[] data to a
String and then that string is converted to an int.  So a bunch more
String objects are created (and then later garbage collected).  These
String objects are there because the java API doesn't provide any
methods to convert from byte[] to other objects like int, long,
BigDecimal, Timestamp, etc.  So a String intermediary is used.

So to get the int returned by getInt() both a byte[] and a String object
get created only to be garbage collected later as they are just
temporary objects.

Now using object pools can help the allocation of byte[] objects, but
doesn't help with String objects.  However if the driver started using
char[] objects internally instead of Strings, these could be pooled as
well.  But this would probably mean that code like
Integer.parseInt(String) would need to be reimplemented in the driver
since there is no corresponding Integer.parseInt(char[]).
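
A rough sketch of what parsing an int straight from the received bytes
could look like, skipping the String intermediary. It assumes the server
sends plain ASCII decimal text with an optional sign; AsciiNumbers is a
hypothetical helper, not part of the driver. The overflow handling follows
the usual trick of accumulating the value as a negative number.

```java
final class AsciiNumbers {
    /** Parses a decimal int from ASCII digits in data[offset..offset+length). */
    static int parseInt(byte[] data, int offset, int length) {
        if (length <= 0) {
            throw new NumberFormatException("empty value");
        }
        int i = offset;
        int end = offset + length;
        boolean negative = data[i] == '-';
        if (negative || data[i] == '+') {
            i++;
        }
        if (i == end) {
            throw new NumberFormatException("sign without digits");
        }
        // Accumulate negatively so that Integer.MIN_VALUE is representable.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int multmin = limit / 10;
        int result = 0;
        while (i < end) {
            int digit = data[i++] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a decimal digit");
            }
            if (result < multmin) {
                throw new NumberFormatException("value out of int range");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("value out of int range");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }
}
```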

Now while I realize that there is a lot of room for improvement, I find
that the overall performance of the PostgreSQL JDBC driver is similar to
the drivers I have used for other databases (i.e. Oracle and MSSQL).  So
I wouldn't characterize the performance as bad, but it could be improved.

thanks,
--Barry


Re: new String(byte[]) performance

From

Aaron Mulder

Date:

22 October 2002, 12:15:41

Barry,
    Are you saying that the server returns everything as
strings/characters no matter what?  Like if it sends the number "123456"
that will be 7 bytes (6+null), not 4 bytes (an int)?  Can we make it send
the 4-byte int value instead?

Aaron


Re: new String(byte[]) performance

From

Barry Lind

Date:

22 October 2002, 13:02:40

Aaron Mulder wrote:
> Barry,
>     Are you saying that the server returns everything as
> strings/characters no matter what?  Like if it sends the number "123456"
> that will be 7 bytes (6+null), not 4 bytes (an int)?  Can we make it send
> the 4-byte int value instead?

That is correct.  The FE/BE protocol sends the data back and forth as
strings for all the data types.  The only exception to this is if you
are using a 'binary cursor', in which case the data is sent in binary
(however, the byte order is platform dependent, which makes it a pain to
use binary cursors).

--Barry

Re: new String(byte[]) performance

From

Teofilis Martisius

Date:

22 October 2002, 13:18:03

On Mon, Oct 21, 2002 at 07:38:07PM -0700, Barry Lind wrote:
> Teofilis,
>
> I don't think the problem you are seeing is as a result of using java.
> It is more the result of the architecture of the jdbc driver.  I know
> other followups to this email have suggested fixes at the IO level, and
> while I think that may be interesting to look into, I think there is a
> lot that can be done to improve performance within the existing code
> that can work on all jdks (1.1, 1.2, 1.3 and 1.4).

Ok, I took a look at the 1.4 java.nio, but I also don't like tying the JDBC
driver to 1.4, because I still use 1.3 in production myself; I had
stability problems with 1.4. And I'm not sure how much java.nio would help.
Heh, I wish Java had #ifdefs. Anyway, I'm not doing 1.4 stuff until 1.4
is more widespread. Or at least 1.4 'features' should be optional: separate
class files or something.

>
> If you look at what is happening in the driver when you do something as
> simple as 'select 1', you can see many areas of improvement.

Testing simple selects, hmm, it doesn't really matter. I can measure how
much time is spent in the JDBC driver, and I don't think postgres server/query
delays really distort the results I get. Besides, I compare with 'psql'
performance for the same queries.

>
> The first thing that the driver does is allocate byte[] ...

> However the byte[] objects are only the first problem....
>
> Now using object pools can help the allocation of byte[] objects, but
> doesn't help with String objects.  However if the driver started using
> char[] objects internally instead of Strings, these could be pooled as
> well.  But this would probably mean that code like
> Integer.parseInt(String) would need to be reimplemented in the driver
> since there is no corresponding Integer.parseInt(char[]).

Hmm, I know how all this works. I read the JDBC driver code. However, I did
not find a much better solution. First, when transferring data from the stream,
the only logical solution is to put it into byte[]. And as far as I
understand, byte[] arrays are already pooled. I doubt it is
possible/better to read other things than byte[] from the stream.

About converting char[] -> everything: well, new String(char[]) is
really cheap, but it does COPY the char[] array. String is in fact just
a wrapper for char[]. It uses System.arraycopy() AFAIK. There is a
String constructor that doesn't copy the array, but it is
package private for java.lang. Too bad there isn't Integer.parseInt(char[]).
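
The copying behaviour is easy to check, and it is also what makes a reused
(or pooled) char[] safe: once the String is built, overwriting the buffer
does not affect it. A tiny illustration, with StringCopyDemo as a made-up
name:

```java
final class StringCopyDemo {
    /** Builds a String from the first `count` chars of a reusable buffer. */
    static String fromBuffer(char[] buffer, int count) {
        // The public constructor copies the chars, so the buffer can be reused afterwards.
        return new String(buffer, 0, count);
    }
}
```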

There are 2 ways I think performance can be improved. One is to strongly
type the received data into the field type. E.g. for integer fields, receive
byte[], then convert it to java.lang.Integer at once, and store it in
memory as java.lang.Integer. But this does remove quite a lot of
flexibility, i.e. doing resultset.getString() on an integer field, or even
resultset.getLong() on an integer field, would cause a ClassCastException.
So I don't think this is a good solution. Well, more precisely, it would
be quite hard to make this solution flexible enough. It could e.g.
return the default object for getObject() and getInt(), obj.toString() for
getString(), and e.g. convert the object to another object via String when
getSomethingElse() is called on the resultset. Hmm, and the conversion would
still be done via String (are there other ways?), so temporary String
allocation would still be a problem... What do you think?

The second solution is to store received data as Strings. I don't exactly
know how much better it would be. It would make the temporary string
allocation permanent, and one byte[] array for receiving data would be
enough, i.e. no more byte[] allocation bottlenecks. But I don't think it
would make very much difference in the end.

Hmm, so things that can be done:

1. Strong typing after receive, store data as specific objects.
2. Converting into Strings after receive, store data as strings.
3. Maybe receive all the data into a single big byte[] array? Or at
least an entire row into a single big byte[] array? Less trouble for
the garbage collector/pool? Is it possible?
4. Examine the possibility of receiving Strings or char[] arrays directly
from the stream, maybe using java.nio for that. I read the following message
from Aaron. If the backend sends everything except binary cursors as strings,
then receiving it as strings seems a logical solution :/
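
Option 3 could look roughly like the following: one byte[] per row plus an
int[] of column end positions, so a row costs two allocations no matter how
many columns it has. This is only a sketch under assumptions: PackedRow is
a hypothetical class, SQL NULLs are not represented, values are assumed to
be UTF-8 text, and StandardCharsets needs a much newer JDK than the ones
discussed in this thread.

```java
import java.nio.charset.StandardCharsets;

final class PackedRow {
    private final byte[] data;   // all column values of one row, back to back
    private final int[] ends;    // ends[c] = index just past the last byte of column c

    PackedRow(byte[] data, int[] ends) {
        this.data = data;
        this.ends = ends;
    }

    int columnCount() {
        return ends.length;
    }

    int offset(int column) {
        return column == 0 ? 0 : ends[column - 1];
    }

    int length(int column) {
        return ends[column] - offset(column);
    }

    /** Decodes one column on demand; nothing is converted until it is asked for. */
    String getString(int column) {
        return new String(data, offset(column), length(column), StandardCharsets.UTF_8);
    }
}
```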

Ok, I could look at these things when I have time. Tell me which
solution you prefer or what I should work on first.

And one more thing: I think SQL queries are a bottleneck in many
applications, so every millisecond saved in the JDBC driver counts.

Teofilis Martisius

Re: new String(byte[]) performance

From

Aaron Mulder

Date:

22 October 2002, 13:23:08

On Tue, 22 Oct 2002, Barry Lind wrote:
> That is correct.  The FE/BE protocol sends the data back and forth as
> strings for all the data types.  The only exception to this is if you
> are using a 'binary cursor', in which case the data is sent in binary
> (however, the byte order is platform dependent, which makes it a pain to
> use binary cursors).

    I think it would be less of a pain to deal with byte
order/endianness than to constantly decode text into numbers.  I mean,
we'd use one routine to read all ints/floats/etc., and it's not hard to
have a flag for whether to read abcd or dcba before we stuff it into an
int/float/whatever.  Java even has the methods to do this (see DataInput
JavaDoc, Float.intBitsToFloat, etc.)  All we need is some indicator of the
native byte order of the server, which could be achieved by sending a
single well-known number in the connection process (you know, the
byte-level analog of "did the server report version 7.3 or 3.7?").
    Truly, one flag for endianness vs creating extra objects for every
numeric value ever read from the server, and getting bugs like can't read
"" as an integer?  Which is really the pain?

    Not that you probably wanted me to agitate for architecture
changes during the beta process... :)
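
The "one flag for endianness" idea can be sketched in a few lines. The
helper below is hypothetical (BinaryValues is not a driver class), and it
assumes the server sends 4-byte two's-complement ints and IEEE 754 floats,
which would have to be confirmed per datatype.

```java
final class BinaryValues {
    /** Reads a 4-byte int at `offset`, in either byte order. */
    static int readInt(byte[] data, int offset, boolean littleEndian) {
        int b0 = data[offset] & 0xFF;
        int b1 = data[offset + 1] & 0xFF;
        int b2 = data[offset + 2] & 0xFF;
        int b3 = data[offset + 3] & 0xFF;
        return littleEndian
                ? (b3 << 24) | (b2 << 16) | (b1 << 8) | b0
                : (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    }

    /** Reads a 4-byte float by reinterpreting the int bits; no objects are created. */
    static float readFloat(byte[] data, int offset, boolean littleEndian) {
        return Float.intBitsToFloat(readInt(data, offset, littleEndian));
    }
}
```

The littleEndian flag would be set once per connection, for example from a
well-known number exchanged during startup as suggested above.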

Aaron

Re: new String(byte[]) performance

From

Barry Lind

Date:

22 October 2002, 13:38:33

Aaron Mulder wrote:
> On Tue, 22 Oct 2002, Barry Lind wrote:
>
>>That is correct.  The FE/BE protocol sends the data back and forth as
>>strings for all the data types.  The only exception to this is if you
>>are using a 'binary cursor', in which case the data is sent in binary
>>(however, the byte order is platform dependent, which makes it a pain to
>>use binary cursors).
>
>
>     I think it would be less of a pain to deal with byte
> order/endianness than to constantly decode text into numbers.  I mean,
> we'd use one routine to read all ints/floats/etc., and it's not hard to
> have a flag for whether to read abcd or dcba before we stuff it into an
> int/float/whatever.  Java even has the methods to do this (see DataInput
> JavaDoc, Float.intBitsToFloat, etc.)  All we need is some indicator of the
> native byte order of the server, which could be achieved by sending a
> single well-known number in the connection process (you know, the
> byte-level analog of "did the server report version 7.3 or 3.7?").
>     Truly, one flag for endianness vs creating extra objects for every
> numeric value ever read from the server, and getting bugs like can't read
> "" as an integer?  Which is really the pain?
>
>     Not that you probably wanted me to agitate for architecture
> changes during the beta process... :)

There really isn't anything that can be done in the JDBC driver until a
change occurs in the FE/BE protocol.  I agree that dealing with the
byte order isn't too difficult (I actually do it in my code where I do
use binary cursors).  But there are other issues as well, not the least of
which is knowing the binary representation of each datatype (and types
like Date in 7.3 have two different formats depending on configure-time
switches).  It can get ugly :-)

thanks,
--Barry

PS.  When I say binary cursor, I am talking explicitly about the SQL
statement:  declare foo binary cursor for select ...
All other SQL statements are handled as strings.

Re: new String(byte[]) performance

From

Barry Lind

Date:

22 October 2002, 13:40:52


Teofilis Martisius wrote:
> On Mon, Oct 21, 2002 at 07:38:07PM -0700, Barry Lind wrote:
>

>
> Hmm, I know how all this works. I read the JDBC driver code. However, I did
> not find a much better solution. First, when transferring data from the stream,
> the only logical solution is to put it into byte[]. And as far as I
> understand, byte[] arrays are already pooled. I doubt it is
> possible/better to read other things than byte[] from the stream.
>

byte[] objects are not pooled.  As for the rest of your email I will
think about the options you have laid out and respond in more detail later.

thanks,
--Barry

Re: new String(byte[]) performance

From

"Michael Paesold"

Date:

22 October 2002, 16:30:07

Barry Lind <blind@xythos.com> wrote:

> Teofilis Martisius wrote:
> > On Mon, Oct 21, 2002 at 07:38:07PM -0700, Barry Lind wrote:
> >
> > Hmm, I know how all this works. I read the JDBC driver code. However, I did
> > not find a much better solution. First, when transferring data from the stream,
> > the only logical solution is to put it into byte[]. And as far as I
> > understand, byte[] arrays are already pooled. I doubt it is
> > possible/better to read other things than byte[] from the stream.
> >
>
> byte[] objects are not pooled.  As for the rest of your email I will
> think about the options you have laid out and respond in more detail
> later.

IIRC, the JVM, at least in the newest versions, does object pooling itself for
some classes. Maybe for byte[] too. There are some documents on the Sun Java
homepage that discourage using object pools for anything that is not
related to external resources (like a database connection). They say the
internal algorithms are more efficient and object pooling can disturb the
generational garbage collection.

I am not sure about this. In older books I always read about pooling
everything, but nowadays there seem to be many opinions against it, at least
with modern JVMs. Does anyone have deeper knowledge of the topic or some
significant experience?

Best Regards,
Michael Paesold