Re: A few more questions - Mailing list pgsql-jdbc

From Oliver Jowett
Subject Re: A few more questions
Date
Msg-id 402E9D9B.8070607@opencloud.com
Whole thread Raw
In response to A few more questions  ("Chris Smith" <chris@mindiq.com>)
List pgsql-jdbc
Chris Smith wrote:

> 3. For someone who knows the v3 protocol; there are a number of cases where
> multiple queries are issued by one statement in the driver.  My understanding
> is that this is not allowed in extended query.  Does that mean that we're
> stuck with multiple round-trips if we substitute extended query?  Or may I
> send multiple consecutive parse, bind, and execute commands without an
> intervening read of the result; and then read all the results at once?

The latter. Look at the Sync and Flush messages too -- they're designed
to deal with that sort of batching of queries.

> This latter bit seems to me like it risks walking into a trap where I'm
> blocked on my send buffer waiting for the server to read more of a statement,
> and the server is blocked on its send buffer waiting for me to read the
> results from a previous execute.  But basically, I'm thinking there has to be
> a way in v3 extended query to avoid this problem; otherwise, it would be a
> potentially serious performance issue with no solution.  Is there a way?

I raised this with Barry Lind a while back when I was looking at batch
updates. There doesn't seem to be an elegant way around it. Ideas:

Option #1: limit the total size of outstanding sent data

   while (more queries) {
     send next query
     size = 0
     while (more queries && size + next query size < threshold) {
        send next query
        size += next query size
     }

     send Sync (or possibly Flush depending on how you deal with errors)
     read results for sent queries, deal with errors, etc
   }

Then we set threshold based on the expected buffer sizes between us and
the server (kernel buffer etc).

The above code is a bit subtle .. we only need to count queries after
the first towards the threshold, since we assume we only block when
there are greater than threshold bytes unread by the server, and at a
minimum the server will read the first query from the stream before
blocking.

Also we could probably avoid waiting ignore packets that don't generate
large resultsets in this calculation (or track our idea of how much data
should be coming back and only start throttling our queries when it gets
dangerous).

Option #2: use a separate write (probably easier) or read thread. This
means creating threads from the driver, which is new, and I'm not sure
how the additional context switches on every query will affect performance.

Option #3: switch to NIO but I don't think that's really feasible since
we'd require 1.4, and NIO has a bunch of resource-exhaustion issues in
current JVMs anyway.

Option #4: set a socket timeout on writes. I'm not sure how reliable
this is, though, or if we can safely recover from timeouts that do occur.

> 4. As I plan the changes I'd like to make for v3 protocol support, it's
> starting to look tempting to segregate a good bit of the code into separate
> classes for v2 and v3.  Thus, we'd have:
>
>     V2QueryExecutor - For v2 code
>     V3QueryExecutor - For v3 code
>     QueryExecutorUtil - For code shared between both classes above

Or subclass BaseQueryExecutor -> V2/V3QueryExecutor. (six of one..)

This was pretty high on my list of refactoring to do. Also consider
using a single query executor instance per connection and killing the
existing static methods -- currently we check the protocol version and
create a new instance per query which seems entirely unnecessary.

> and the same for AbstractJdbc*Statement and AbstractJdbc*ResultSet.

I'm not so sure about this one though. You run the danger of a lot of
code duplication here.

I was looking at a design where the statement-execution and
result-gathering code was broken out into a internal
interface/implementation classes, with separate implementations for V2
and V3. These talked only in the native postgresql types known to the
backend (int2, int4, int8, bytea, text, etc). Then there was a single
Statement/ResultSet implementation that delegated to an instance of the
interface, i.e. all the JDBC<->postgresql type mapping and
Statement/ResultSet behaviour glue was in this common code.

The other thing I was doing here was to move the server-side preparation
and translation/parsing of queries down into those protocol-specific
classes -- as V2 and V3 work quite differently in these areas (the most
obvious bit being that using the V3 extended protocol, you have to parse
the query for multiple statements; you'd also want to maintain different
state about cursors/PREPARE vs. named portals and statements).

I have some very early interface code I can send you if you'd like to
take a look.

-O

pgsql-jdbc by date:

Previous
From: Dave Cramer
Date:
Subject: Re: A few more questions
Next
From: Tom Ansley
Date:
Subject: unsubscribe