Re: JDBC Performance - Mailing list pgsql-general

From Tim Kientzle
Subject Re: JDBC Performance
Msg-id 39D2B87C.32C3DB4B@acm.org
List pgsql-general
> I'm finding that ... my CPU spends about 60% of the time in JDBC, and about
> 40% of the time in the postmaster backend.
> (Each query takes 2 msec on my machine, RH Linux 6.2, 733 MHz Intel, w/
> lots of memory.)

This doesn't sound too bad to me, to be honest.  I've not tried using
JDBC with PostgreSQL, but I've done a lot with MySQL (and some with
Oracle, although not as recently).  I'm used to seeing 5-10ms for
a fairly basic indexed query on a PII/266.

A large portion of the client-side overhead you're seeing involves
the conversion of strings into bytes for transfer over the network
(and the reverse conversion on the other side).  Java strings use
Unicode, and this has to be translated into bytes for the network.
This surprises people familiar with C, but it is the "right" way
to do it; characters and bytes are not the same thing.
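To make the character/byte distinction concrete, here is a minimal sketch of the conversion step. It assumes UTF-8 as the wire encoding, which is only an illustration; the encoding a real driver uses depends on the driver and the database configuration. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Convert a Java (Unicode) string into the bytes that would be
    // written to the socket. UTF-8 is an assumption for this sketch.
    static byte[] encode(String query) {
        return query.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String plain = "SELECT name FROM users WHERE id = 42";
        String accented = "SELECT 'caf\u00e9'";  // contains one non-ASCII character

        // For pure ASCII text the two counts match; for other characters
        // the byte count is larger than the character count.
        System.out.println(plain.length() + " chars, " + encode(plain).length + " bytes");
        System.out.println(accented.length() + " chars, " + encode(accented).length + " bytes");
    }
}
```

Every query string and every returned text column goes through a step like this, which is why the cost shows up on the client side.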

Some of this overhead can be reduced with a really good JIT, but
not all.  Experiment with different JVMs and see if that helps any.

Several standard suggestions for improving JDBC performance:

* Cache.  Keep data within the client whenever you can to reduce
  the number of round-trips to the database.
* Minimize the number of queries.  It often pays off big to
  do a single SELECT that returns many rows rather than to do
  a bunch of smaller SELECTs.  Each query involves query construction
  at the client, network overhead and parsing and execution overhead;
  after all that, each row is relatively cheap.
* Use multi-threading, but cautiously.  Because of the intrinsic delays
  of communicating with a separate server, you can improve performance
  by opening a couple of database connections and issuing queries over
  each one.  This only helps up to a point, though, and good
  multi-threaded code is hard to write in any language, including Java.
  This helps less with a local server than a networked one, of course.
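As a rough sketch of the "one SELECT instead of many" suggestion: the code below fetches names for a whole list of ids in a single round-trip and returns them in a map that the client can keep around as a simple cache. The table `users`, its columns `id` and `name`, and the class and method names are all made up for illustration; only standard `java.sql` calls are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchedLookup {
    // Build "SELECT ... WHERE id IN (?, ?, ...)" with one placeholder per id.
    static String buildQuery(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        StringBuilder sql =
            new StringBuilder("SELECT id, name FROM users WHERE id IN (");
        for (int i = 0; i < idCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    // One query, one round-trip, many rows. The returned map can be
    // held by the client to avoid asking for the same ids again.
    static Map<Integer, String> fetchNames(Connection conn, List<Integer> ids)
            throws SQLException {
        Map<Integer, String> names = new HashMap<>();
        try (PreparedStatement stmt = conn.prepareStatement(buildQuery(ids.size()))) {
            for (int i = 0; i < ids.size(); i++) {
                stmt.setInt(i + 1, ids.get(i));  // JDBC parameters are 1-based
            }
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    names.put(rs.getInt("id"), rs.getString("name"));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // No database needed to see the generated SQL.
        System.out.println(buildQuery(3));
    }
}
```

The per-query costs (construction, network, parsing) are paid once here rather than once per id.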

C can be significantly faster, simply because you can build the query
directly as an ASCII string and then just pump it over the socket
without the character-to-byte conversion overhead.  Of course, that
only applies if you're using pretty simple queries.  For more complex
queries or large databases, the database processing time dominates,
and nothing else really matters.

There are a lot of other factors to consider, of course.  In particular,
time per query is usually less important than queries per second.
During the wait time for one transaction, other transactions can be
going on simultaneously.  If you're writing servlet-based systems,
for example, you can get pretty good parallelism, especially on SMP
machines, where the DB and Java can actually run on separate processors.
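Here is a small sketch of why queries per second can improve even when time per query does not. It uses no database at all: `fakeQuery` is a hypothetical stand-in that just sleeps for 20 ms to imitate the wait for a server, and the pool sizes are arbitrary. In a real program each worker would need its own connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelQueries {
    // Stand-in for a database round-trip: waits, then returns a value.
    static int fakeQuery(int id) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return id * 2;
    }

    // Run `queries` fake queries on a pool of `workers` threads and
    // return the results in submission order.
    static List<Integer> runAll(int queries, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < queries; i++) {
                final int id = i;
                futures.add(pool.submit(() -> fakeQuery(id)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (int workers : new int[] {1, 4}) {
            long start = System.nanoTime();
            runAll(20, workers);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(workers + " worker(s): " + millis + " ms for 20 queries");
        }
    }
}
```

Each individual call still takes about as long as before; the total drops because the waits overlap. The gain levels off once the server or the client CPU becomes the bottleneck.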

                - Tim
