new String(byte[]) performance - Mailing list pgsql-jdbc
From | Teofilis Martisius |
---|---|
Subject | new String(byte[]) performance |
Date | |
Msg-id | 20020911095735.GA6185@teohome.lzua.lt |
List | pgsql-jdbc |
Hello,

While looking through the PostgreSQL JDBC driver sources and profiling, I noticed that the driver calls new String(byte[]) a lot while iterating over a ResultSet, and that this String constructor takes a lot of time. I wrote a custom byte[] -> String conversion method for UTF-8 that makes iterating over a ResultSet 2 times faster or even more.

I have a patch for the PostgreSQL JDBC driver, but this is a workaround and I am not sure it will be accepted. It does speed things up by a noticeable amount. Maybe decodeUTF8() should be synchronized on cdata, or maybe cdata should be allocated on each call. The static cdata version was faster.

By the way, what should a JDBC driver do when, e.g., ResultSet.getInt() is called for a VARCHAR field? I would suggest converting byte arrays to Strings, or even to more precisely typed values (Integers, Doubles and so on), in QueryExecutor.execute(). This should save some memory allocation for receiveTuple, because at the moment memory gets allocated several times: once for the byte[], a second time for the String, and a third time for the Integer or other object in getObject(). Memory allocation takes a considerable amount of time. But this stronger typing would remove some of the flexibility of calling any getXXX on a field of any SQL type. It would also probably make the querying itself (QueryExecutor.execute()) slower, I don't know :/

Teofilis Martisius

Anyway, here is the patch to fix string decoding:

diff -r -u ./org/postgresql/core/Encoding.java /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java
--- ./org/postgresql/core/Encoding.java 2001-11-20 00:33:37.000000000 +0200
+++ /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java 2002-09-11 15:56:10.000000000 +0200
@@ -155,6 +155,9 @@
 			}
 			else
 			{
+				if (encoding.equals("UTF-8")) {
+					return decodeUTF8(encodedString, offset, length);
+				}
 				return new String(encodedString, offset, length, encoding);
 			}
 		}
@@ -163,6 +166,43 @@
 			throw new PSQLException("postgresql.stream.encoding", e);
 		}
 	}
+	/**
+	 * custom byte[] -> String conversion routine, 3x-10x faster than standard new String(byte[])
+	 */
+	static final int pow2_6 = 64; // 2^6
+	static final int pow2_12 = 4096; // 2^12
+	static char cdata[] = new char[50];
+
+	public static final String decodeUTF8(byte data[], int offset, int length) {
+		if (cdata.length < (length-offset)) {
+			cdata = new char[length-offset];
+		}
+		int i = offset;
+		int j = 0;
+		int z, y, x, val;
+		while (i < length) {
+			z = data[i] & 0xFF;
+			if (z < 0x80) {
+				cdata[j++] = (char)data[i];
+				i++;
+			} else if (z >= 0xE0) { // length == 3
+				y = data[i+1] & 0xFF;
+				x = data[i+2] & 0xFF;
+				val = (z-0xE0)*pow2_12 + (y-0x80)*pow2_6 + (x-0x80);
+				cdata[j++] = (char) val;
+				i+= 3;
+			} else { // length == 2 (maybe add checking for length > 3, throw exception if it is)
+				y = data[i+1] & 0xFF;
+				val = (z - 0xC0)* (pow2_6)+(y-0x80);
+				cdata[j++] = (char) val;
+				i+=2;
+			}
+		}
+
+		String s = new String(cdata, 0, j);
+		return s;
+	}
+
 	/*
 	 * Decode an array of bytes into a string.
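For comparison, here is a minimal sketch of how the same fast path could be written without the shared static buffer. It is not part of the patch. The class name Utf8Sketch and the parameter name count are made up for illustration. It assumes the third argument is a byte count, as in new String(byte[], int, int, ...), whereas the loop in the patch compares i against length directly, which agrees with a count only when offset is 0. It also assumes a JDK that provides java.nio.charset.StandardCharsets. The buffer is allocated per call, so no synchronization is needed, and anything the fast path does not recognise (4-byte sequences, truncated or malformed input) is handed to the JDK decoder instead of being guessed at.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; a sketch, not the driver's actual code. */
public final class Utf8Sketch {
    private Utf8Sketch() {}

    /** Decodes count bytes of UTF-8 starting at data[offset]. */
    public static String decodeUtf8(byte[] data, int offset, int count) {
        if (offset < 0 || count < 0 || count > data.length - offset) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", count=" + count);
        }
        int end = offset + count;
        // Every 1-, 2- or 3-byte sequence yields exactly one char,
        // so count chars is always enough for the fast path.
        char[] out = new char[count];
        int i = offset;
        int j = 0;
        while (i < end) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {
                // 1-byte sequence (ASCII)
                out[j++] = (char) z;
                i += 1;
            } else if (z >= 0xC2 && z <= 0xDF && i + 1 < end
                    && isContinuation(data[i + 1])) {
                // 2-byte sequence: 110xxxxx 10xxxxxx
                out[j++] = (char) (((z & 0x1F) << 6) | (data[i + 1] & 0x3F));
                i += 2;
            } else if (z >= 0xE0 && z <= 0xEF && i + 2 < end
                    && isContinuation(data[i + 1]) && isContinuation(data[i + 2])) {
                // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
                int val = ((z & 0x0F) << 12)
                        | ((data[i + 1] & 0x3F) << 6)
                        | (data[i + 2] & 0x3F);
                if (val < 0x800 || (val >= 0xD800 && val <= 0xDFFF)) {
                    // Overlong form or encoded surrogate: let the JDK handle it.
                    return new String(data, offset, count, StandardCharsets.UTF_8);
                }
                out[j++] = (char) val;
                i += 3;
            } else {
                // 4-byte sequence, truncated or malformed input: use the JDK decoder.
                return new String(data, offset, count, StandardCharsets.UTF_8);
            }
        }
        return new String(out, 0, j);
    }

    private static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }
}
```

Allocating the char[] on every call gives up some of the speed of the static cdata version, which is the trade-off mentioned above. Whether the fast path is still worth it would have to be measured on the JVM in use.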