Thread: A few more questions

A few more questions

From
"Chris Smith"
Date:
Here are a few more questions to throw out.

1. There's a comment in AbstractJdbc1Connection saying "TODO: handle
transaction status".  Did someone have something specific in mind regarding
how transaction status was to be "handled"?

2. What's the oldest pgsql version that we intend to support with the drivers
in 7.4?  Is it 7.0?  6.x?

3. For someone who knows the v3 protocol; there are a number of cases where
multiple queries are issued by one statement in the driver.  My understanding
is that this is not allowed in extended query.  Does that mean that we're
stuck with multiple round-trips if we substitute extended query?  Or may I
send multiple consecutive parse, bind, and execute commands without an
intervening read of the result; and then read all the results at once?

This latter bit seems to me like it risks walking into a trap where I'm
blocked on my send buffer waiting for the server to read more of a statement,
and the server is blocked on its send buffer waiting for me to read the
results from a previous execute.  But basically, I'm thinking there has to be
a way in v3 extended query to avoid this problem; otherwise, it would be a
potentially serious performance issue with no solution.  Is there a way?

4. As I plan the changes I'd like to make for v3 protocol support, it's
starting to look tempting to segregate a good bit of the code into separate
classes for v2 and v3.  Thus, we'd have:

    V2QueryExecutor - For v2 code
    V3QueryExecutor - For v3 code
    QueryExecutorUtil - For code shared between both classes above

and the same for AbstractJdbc*Statement and AbstractJdbc*ResultSet.  Would
this meet with objections if I tried to get this incorporated into the actual
drivers?

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation


Re: A few more questions

From
Dave Cramer
Date:
On Fri, 2004-02-13 at 20:09, Chris Smith wrote:
> Here are a few more questions to throw out.
>
> 1. There's a comment in AbstractJdbc1Connection saying "TODO: handle
> transaction status".  Did someone have something specific in mind regarding
> how transaction status was to be "handled"?
>
> 2. What's the oldest pgsql version that we intend to support with the drivers
> in 7.4?  Is it 7.0?  6.x?

7.2 is about as far back as I'd like to see handled
>
> 3. For someone who knows the v3 protocol; there are a number of cases where
> multiple queries are issued by one statement in the driver.  My understanding
> is that this is not allowed in extended query.  Does that mean that we're
> stuck with multiple round-trips if we substitute extended query?  Or may I
> send multiple consecutive parse, bind, and execute commands without an
> intervening read of the result; and then read all the results at once?
>
> This latter bit seems to me like it risks walking into a trap where I'm
> blocked on my send buffer waiting for the server to read more of a statement,
> and the server is blocked on its send buffer waiting for me to read the
> results from a previous execute.  But basically, I'm thinking there has to be
> a way in v3 extended query to avoid this problem; otherwise, it would be a
> potentially serious performance issue with no solution.  Is there a way?
>
> 4. As I plan the changes I'd like to make for v3 protocol support, it's
> starting to look tempting to segregate a good bit of the code into separate
> classes for v2 and v3.  Thus, we'd have:
>
>     V2QueryExecutor - For v2 code
>     V3QueryExecutor - For v3 code
>     QueryExecutorUtil - For code shared between both classes above
Any reason why you wouldn't extend V*QueryExecuter from
BaseQueryExecutor ?
>
> and the same for AbstractJdbc*Statement and AbstractJdbc*ResultSet.  Would
> this meet with objections if I tried to get this incorporated into the actual
> drivers?
These are a bit different, not sure what you have in mind here, but the
reason they are named like that is to allow you to build under various
JDK's, look in the build.xml file to see how that is handled.

Dave
--
Dave Cramer
519 939 0336
ICQ # 14675561


Re: A few more questions

From
Oliver Jowett
Date:
Chris Smith wrote:

> 3. For someone who knows the v3 protocol; there are a number of cases where
> multiple queries are issued by one statement in the driver.  My understanding
> is that this is not allowed in extended query.  Does that mean that we're
> stuck with multiple round-trips if we substitute extended query?  Or may I
> send multiple consecutive parse, bind, and execute commands without an
> intervening read of the result; and then read all the results at once?

The latter. Look at the Sync and Flush messages too -- they're designed
to deal with that sort of batching of queries.

> This latter bit seems to me like it risks walking into a trap where I'm
> blocked on my send buffer waiting for the server to read more of a statement,
> and the server is blocked on its send buffer waiting for me to read the
> results from a previous execute.  But basically, I'm thinking there has to be
> a way in v3 extended query to avoid this problem; otherwise, it would be a
> potentially serious performance issue with no solution.  Is there a way?

I raised this with Barry Lind a while back when I was looking at batch
updates. There doesn't seem to be an elegant way around it. Ideas:

Option #1: limit the total size of outstanding sent data

   while (more queries) {
     send next query
     size = 0
     while (more queries && size + next query size < threshold) {
        send next query
        size += next query size
     }

     send Sync (or possibly Flush depending on how you deal with errors)
     read results for sent queries, deal with errors, etc
   }

Then we set threshold based on the expected buffer sizes between us and
the server (kernel buffer etc).

The above code is a bit subtle .. we only need to count queries after
the first towards the threshold, since we assume we only block when
there are greater than threshold bytes unread by the server, and at a
minimum the server will read the first query from the stream before
blocking.

Also we could probably avoid waiting ignore packets that don't generate
large resultsets in this calculation (or track our idea of how much data
should be coming back and only start throttling our queries when it gets
dangerous).

Option #2: use a separate write (probably easier) or read thread. This
means creating threads from the driver, which is new, and I'm not sure
how the additional context switches on every query will affect performance.

Option #3: switch to NIO but I don't think that's really feasible since
we'd require 1.4, and NIO has a bunch of resource-exhaustion issues in
current JVMs anyway.

Option #4: set a socket timeout on writes. I'm not sure how reliable
this is, though, or if we can safely recover from timeouts that do occur.

> 4. As I plan the changes I'd like to make for v3 protocol support, it's
> starting to look tempting to segregate a good bit of the code into separate
> classes for v2 and v3.  Thus, we'd have:
>
>     V2QueryExecutor - For v2 code
>     V3QueryExecutor - For v3 code
>     QueryExecutorUtil - For code shared between both classes above

Or subclass BaseQueryExecutor -> V2/V3QueryExecutor. (six of one..)

This was pretty high on my list of refactoring to do. Also consider
using a single query executor instance per connection and killing the
existing static methods -- currently we check the protocol version and
create a new instance per query which seems entirely unnecessary.

> and the same for AbstractJdbc*Statement and AbstractJdbc*ResultSet.

I'm not so sure about this one though. You run the danger of a lot of
code duplication here.

I was looking at a design where the statement-execution and
result-gathering code was broken out into a internal
interface/implementation classes, with separate implementations for V2
and V3. These talked only in the native postgresql types known to the
backend (int2, int4, int8, bytea, text, etc). Then there was a single
Statement/ResultSet implementation that delegated to an instance of the
interface, i.e. all the JDBC<->postgresql type mapping and
Statement/ResultSet behaviour glue was in this common code.

The other thing I was doing here was to move the server-side preparation
and translation/parsing of queries down into those protocol-specific
classes -- as V2 and V3 work quite differently in these areas (the most
obvious bit being that using the V3 extended protocol, you have to parse
the query for multiple statements; you'd also want to maintain different
state about cursors/PREPARE vs. named portals and statements).

I have some very early interface code I can send you if you'd like to
take a look.

-O

Re: A few more questions

From
"Chris Smith"
Date:
Dave,

Dave Cramer wrote:
> > 2. What's the oldest pgsql version that we intend to support with
> > the drivers in 7.4?  Is it 7.0?  6.x?
>
> 7.2 is about as far back as I'd like to see handled

That would be great, from the perspective of trying to simplify the code.
However, there's a good bit of code there that exists to support at least as
old as 7.0 (the transaction isolation level stuff comes to mind).  That code
also just happens to be interfering with what I'm doing right now.  Is your
answer just a one of many opinions, or an okay to nix it all?

> >     V2QueryExecutor - For v2 code
> >     V3QueryExecutor - For v3 code
> >     QueryExecutorUtil - For code shared between both classes above
> Any reason why you wouldn't extend V*QueryExecuter from
> BaseQueryExecutor ?

Actually, I would extend BaseQueryExecutor.  I just hadn't completely thought
it through when I posted.

> These are a bit different, not sure what you have in mind here, but
> the reason they are named like that is to allow you to build under
> various JDK's, look in the build.xml file to see how that is handled.

I do understand that general situation with the separate packages and
interfaces for various JDBC/core API versions.  I'm not entirely sure what I
have in mind here either; I suppose it's mostly QueryExecutor that I want to
split at this point, and perhaps to relocate some code from
AbstractJdbc1Statement and/or PGStream into the split QueryExecutor classes as
well.  There seems to be general agreement with splitting QueryExecutor, I'll
write something first, and then see if there are remaining problems.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation


Re: A few more questions

From
"Chris Smith"
Date:
Oliver Jowett wrote:
> I raised this with Barry Lind a while back when I was looking at batch
> updates.

Good to see I'm not the first person to deal with it.  Thanks for your
comments.

> There doesn't seem to be an elegant way around it. Ideas:
>
> Option #1: limit the total size of outstanding sent data

This looks like the best bet to me.  It is ugly, but at least its ugliness is
confined to one section of code and doesn't have a global impact in the manner
of threads or aborting the whole connection.  I'll probably take a look at
this early next week and see what I can come up with.

> This was pretty high on my list of refactoring to do. Also consider
> using a single query executor instance per connection and killing the
> existing static methods -- currently we check the protocol version and
> create a new instance per query which seems entirely unnecessary.

Thanks... yes I'll do that.  Since the sending and receiving of queries is
serialized by a monitor on the PGStream anyway, it shouldn't cost anything
this way.  Preferably, we could move a lot of protocol-specific code into
these polymorphic classes.

> I'm not so sure about this one though. You run the danger of a lot of
> code duplication here.

Right.  I am mainly concerned about AbstractJdbc1Statement and all the
convolutions it goes through with breaking down and transforming the query
into gazillions of forms (server-side prepared statements, cursors, etc.)
Trying to do much with that class while retaining all that state is perhaps
not the best of ideas.  But that could possibly be handled by encapsulating
into a separate object, and splitting *that* class into v2 and v3 versions.
Probably much cleaner that way.

> I have some very early interface code I can send you if you'd like to
> take a look.

Sure, I'll take a look.  That'd be great.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation


Re: A few more questions

From
Barry Lind
Date:
Chris,

Chris Smith wrote:
> Here are a few more questions to throw out.
>
> 1. There's a comment in AbstractJdbc1Connection saying "TODO: handle
> transaction status".  Did someone have something specific in mind regarding
> how transaction status was to be "handled"?

Well in general someone could include a transaction control statement
(begin, commit, rollback) in a sql statement.  This would confuse the
driver since it is assuming that it is controlling transaction state as
a result of the current setting of autoCommit().  The transaction state
reported by the server would allow the driver to verify that the state
was what the driver expected and allow the driver to correct it if it
wasn't (for example if autoCommit() was false and the user issued a
commit, then the driver would detect that fact that the transaction had
ended and start a new one).

>
> 2. What's the oldest pgsql version that we intend to support with the drivers
> in 7.4?  Is it 7.0?  6.x?

In the past when this has been discussed on the mail lists we decided to
support two past releases.  Thus the 7.4 driver would support back to
7.2.  However we have also never removed code that enables older
versions than that.  So one could say we support two versions back, but
expect older versions to generally work, but we wouldn't consider it a
bug if they didn't.  I am fine with actually beginning to remove the
older support, but we really should then document the policy on the web
site and in the documentation.

>
> 3. For someone who knows the v3 protocol; there are a number of cases where
> multiple queries are issued by one statement in the driver.  My understanding
> is that this is not allowed in extended query.  Does that mean that we're
> stuck with multiple round-trips if we substitute extended query?  Or may I
> send multiple consecutive parse, bind, and execute commands without an
> intervening read of the result; and then read all the results at once?
>

Which examples of multiple statements are you referring to?  As I see
it, many of the multiple statements in a single query are there because
of the limitations of the V2 protocol, but under the V3 protocol they
can be eliminated by using the features of the V3 protocol.  So which
ones are you looking at that you feel are a problem?  I am sure there
are some and I think Oliver has already responded with good ideas on how
to handle those cases, but in general, I think we should be trying to
eliminate them were possible.

> This latter bit seems to me like it risks walking into a trap where I'm
> blocked on my send buffer waiting for the server to read more of a statement,
> and the server is blocked on its send buffer waiting for me to read the
> results from a previous execute.  But basically, I'm thinking there has to be
> a way in v3 extended query to avoid this problem; otherwise, it would be a
> potentially serious performance issue with no solution.  Is there a way?
>
> 4. As I plan the changes I'd like to make for v3 protocol support, it's
> starting to look tempting to segregate a good bit of the code into separate
> classes for v2 and v3.  Thus, we'd have:
>
>     V2QueryExecutor - For v2 code
>     V3QueryExecutor - For v3 code
>     QueryExecutorUtil - For code shared between both classes above
>

Breaking this code out would be a very good idea.

> and the same for AbstractJdbc*Statement and AbstractJdbc*ResultSet.  Would
> this meet with objections if I tried to get this incorporated into the actual
> drivers?

The AbstractJdbc* stuff is much more tricky.  But I think from later
emails in this thread you understand the issues and are on the right track.

thanks,
--Barry