new String(byte[]) performance - Mailing list pgsql-jdbc

From Teofilis Martisius
Subject new String(byte[]) performance
Date
Msg-id 20020911095735.GA6185@teohome.lzua.lt
List pgsql-jdbc
Hello,

While looking through the PostgreSQL JDBC driver sources and profiling, I
noticed that the driver calls new String(byte[]) a lot while iterating over a
ResultSet, and that this String constructor takes a lot of
time. I wrote a custom byte[] -> String conversion method for UTF-8 that
makes iterating over a ResultSet two or more times faster. I have a patch
for the PostgreSQL JDBC driver, but it is a workaround and I am not
sure it will be accepted. It does speed things up by a noticeable amount.

Hmm, the static cdata buffer is shared by all threads, so maybe decodeUTF8()
should be synchronized on cdata, or maybe cdata should be allocated on each
call. The static cdata version was faster.
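
To make the second option concrete, here is a minimal sketch of a decoder that allocates its buffer on every call, so it has no shared state and needs no synchronization. The class name Utf8DecodeSketch is hypothetical and not part of the driver. It assumes that length is a byte count (as in new String(byte[], int, int, String)), that the input is well-formed UTF-8, and it rejects 4-byte sequences rather than decoding them. Whether it is still faster than new String(byte[]) would have to be measured.

```java
// Hypothetical standalone class for illustration; not part of the PostgreSQL JDBC driver.
final class Utf8DecodeSketch {

    // Decodes `length` bytes of well-formed UTF-8 starting at `offset`.
    // Handles 1-, 2- and 3-byte sequences only, like the patch.
    static String decodeUTF8(byte[] data, int offset, int length) {
        // Each 1- to 3-byte sequence yields exactly one char, so `length` chars always suffice.
        char[] cdata = new char[length];   // per-call buffer: nothing is shared between threads
        int end = offset + length;
        int i = offset;
        int j = 0;
        while (i < end) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {                // 1 byte: plain ASCII
                cdata[j++] = (char) z;
                i += 1;
            } else if (z < 0xE0) {         // 2 bytes: 110xxxxx 10xxxxxx
                int y = data[i + 1] & 0xFF;
                cdata[j++] = (char) (((z & 0x1F) << 6) | (y & 0x3F));
                i += 2;
            } else if (z < 0xF0) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                int y = data[i + 1] & 0xFF;
                int x = data[i + 2] & 0xFF;
                cdata[j++] = (char) (((z & 0x0F) << 12) | ((y & 0x3F) << 6) | (x & 0x3F));
                i += 3;
            } else {                       // 4-byte sequences are out of scope for this sketch
                throw new IllegalArgumentException("4-byte UTF-8 sequence at index " + i);
            }
        }
        return new String(cdata, 0, j);
    }
}
```

The other option, keeping the static buffer and wrapping the method body in synchronized (cdata), avoids the allocation but serializes all decoding across threads.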

By the way, what should a JDBC driver do when, for example, ResultSet.getInt()
is called on a VARCHAR field? I would suggest converting byte arrays to
Strings, or even to more precisely typed values (Integers, Doubles and so
on), in QueryExecutor.execute(). This should save some memory allocation
for receiveTuple, because currently memory gets allocated several times: once
for the byte[], a second time for the String, and a third time for the Integer
or other object in getObject(). Memory allocation takes a considerable
amount of time. But this stronger typing would remove some of the
flexibility of calling any getXXX on a field of any SQL type. And it would
probably make the querying itself (QueryExecutor.execute()) slower. I don't
know.
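
As a rough illustration of skipping the intermediate String, here is a sketch that parses a decimal integer straight from the received bytes. The class and method names are hypothetical and nothing like this exists in the driver as far as this mail is concerned. It assumes the value arrives as plain ASCII digits with an optional sign.

```java
// Hypothetical helper for illustration; not part of the PostgreSQL JDBC driver.
final class DirectIntParseSketch {

    // Parses an optionally signed decimal integer from ASCII bytes
    // without creating an intermediate String.
    static int parseInt(byte[] data, int offset, int length) {
        if (length <= 0) {
            throw new NumberFormatException("empty value");
        }
        int i = offset;
        int end = offset + length;
        boolean negative = data[i] == '-';
        if (negative || data[i] == '+') {
            i++;
        }
        if (i == end) {
            throw new NumberFormatException("no digits");
        }
        long value = 0;   // accumulate in a long so the range check itself cannot overflow
        for (; i < end; i++) {
            int d = data[i] - '0';
            if (d < 0 || d > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            value = value * 10 + d;
            if (value > (long) Integer.MAX_VALUE + 1) {
                throw new NumberFormatException("out of int range");
            }
        }
        if (negative) {
            value = -value;
        }
        if (value > Integer.MAX_VALUE) {
            throw new NumberFormatException("out of int range");
        }
        return (int) value;
    }
}
```

This saves the String allocation for getInt(), but it only works when the driver knows the bytes are numeric text, which is exactly the flexibility trade-off mentioned above.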

Teofilis Martisius

Anyway, here is the patch to fix string decoding:

diff -r -u ./org/postgresql/core/Encoding.java /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java
--- ./org/postgresql/core/Encoding.java    2001-11-20 00:33:37.000000000 +0200
+++ /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java    2002-09-11 15:56:10.000000000 +0200
@@ -155,6 +155,9 @@
             }
             else
             {
+                if (encoding.equals("UTF-8")) {
+                    return decodeUTF8(encodedString, offset, length);
+                }
                 return new String(encodedString, offset, length, encoding);
             }
         }
@@ -163,6 +166,43 @@
             throw new PSQLException("postgresql.stream.encoding", e);
         }
     }
+    /**
+     * custom byte[] -> String conversion routine, 3x-10x faster than standard new String(byte[])
+      */
+    static final int pow2_6 = 64;        // 2^6
+    static final int pow2_12 = 4096;    // 2^12
+    static char cdata[] = new char[50];
+
+    public static final String decodeUTF8(byte data[], int offset, int length) {
+        if (cdata.length < length) {    // length is a byte count, not an end index
+            cdata = new char[length];
+        }
+        int i = offset;
+        int j = 0;
+        int z, y, x, val;
+        while (i < offset + length) {
+            z = data[i] & 0xFF;
+            if (z < 0x80) {
+                cdata[j++] = (char)data[i];
+                i++;
+            } else if (z >= 0xE0) {        // length == 3
+                y = data[i+1] & 0xFF;
+                x = data[i+2] & 0xFF;
+                val = (z-0xE0)*pow2_12 + (y-0x80)*pow2_6 + (x-0x80);
+                cdata[j++] = (char) val;
+                i+= 3;
+            } else {        // length == 2 (maybe add checking for length > 3, throw exception if it is
+                y = data[i+1] & 0xFF;
+                val = (z - 0xC0)* (pow2_6)+(y-0x80);
+                cdata[j++] = (char) val;
+                i+=2;
+            }
+        }
+
+        String s = new String(cdata, 0, j);
+        return s;
+    }
+

     /*
      * Decode an array of bytes into a string.
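
The subtraction-based arithmetic in the patch can be checked against the JDK's own encoder. For example, the euro sign is encoded as 0xE2 0x82 0xAC, and (0xE2-0xE0)*4096 + (0x82-0x80)*64 + (0xAC-0x80) = 8192 + 128 + 44 = 8364 = 0x20AC. The sketch below repeats that check for every character that fits in one Java char, excluding surrogates. The class name PatchArithmeticCheck is hypothetical; the formulas are copied from the patch, with the constants 64 and 4096 written inline.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical standalone check; not part of the patch.
final class PatchArithmeticCheck {

    // Applies the patch's arithmetic to the UTF-8 bytes of a single character.
    static int decodeOne(byte[] b) {
        int z = b[0] & 0xFF;
        if (z < 0x80) {
            return z;                                    // 1 byte
        }
        if (z >= 0xE0) {                                 // 3 bytes
            int y = b[1] & 0xFF;
            int x = b[2] & 0xFF;
            return (z - 0xE0) * 4096 + (y - 0x80) * 64 + (x - 0x80);
        }
        int y = b[1] & 0xFF;                             // 2 bytes
        return (z - 0xC0) * 64 + (y - 0x80);
    }

    // Counts characters U+0000..U+FFFF (surrogates excluded) where the
    // arithmetic disagrees with the JDK's UTF-8 encoding of that character.
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                continue;                                // a lone surrogate is not valid text
            }
            byte[] b = String.valueOf((char) cp).getBytes(StandardCharsets.UTF_8);
            if (decodeOne(b) != cp) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

This only covers well-formed 1- to 3-byte sequences. A lead byte of 0xF0 or above starts a 4-byte sequence, and the z >= 0xE0 branch would treat it as a 3-byte one, which is the missing check that the comment in the patch alludes to.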
