Thread: PreparedStatement parameters and mutable objects

PreparedStatement parameters and mutable objects

From

Oliver Jowett

Date:

06 January 2004, 21:06:06

I'm examining ways to reduce the amount of garbage generated by the
driver, and one approach I'm looking at is to delay converting the
parameters of a PreparedStatement to string form until we're actually
streaming data to the backend, to avoid creating an intermediate copy.

This leads to a question .. does this code have well-defined behaviour:

   Connection conn = /* ... */;
   PreparedStatement pstmt =
      conn.prepareStatement("INSERT INTO t(data) VALUES (?)");
   byte[] data = new byte[] { (byte)1, (byte)2, (byte)3 };
   pstmt.setBytes(1, data);
   data[0] = (byte)42;
   pstmt.executeUpdate();

i.e. is the driver required to capture the value of the byte array
passed to setBytes() at a particular time (either the time of the call
or the time of execution), or is it free to choose? Currently we capture
the value at the time of call.

The same problem applies to all parameter-setting methods that take
mutable objects.

I can't see anything about this in the (PDF) JDBC specification. The
javadoc says:

   The driver converts this to an SQL VARBINARY or LONGVARBINARY
   (depending on the argument's size relative to the driver's limits on
   VARBINARY values) when it sends it to the database.

which implies the driver can delay the conversion (or perhaps must delay
the conversion).

Anyone have an opinion on this?

-O

Re: PreparedStatement parameters and mutable objects

From

Dave Cramer

Date:

07 January 2004, 09:00:43

Oliver,

while I think the code snippet below has defined behaviour I think it is
fraught with danger.
As long as everyone knows that the code does this and writes code that
does not break then yes it would work, however I don't believe this is
the case.

Also reading the spec that you provided, I think the phrase "when it
sends" refers to the actual conversion that takes place not the specific
time that it is done.

In general while I can see that this would indeed speed up the driver, I
don't think I can agree to the mechanism proposed however, I'm willing
to be convinced.

Dave
On Tue, 2004-01-06 at 20:05, Oliver Jowett wrote:
> I'm examining ways to reduce the amount of garbage generated by the
> driver, and one approach I'm looking at is to delay converting the
> parameters of a PreparedStatement to string form until we're actually
> streaming data to the backend, to avoid creating an intermediate copy.
>
> This leads to a question .. does this code have well-defined behaviour:
>
>    Connection conn = /* ... */;
>    PreparedStatement pstmt =
>       conn.prepareStatement("INSERT INTO t(data) VALUES (?)");
>    byte[] data = new byte[] { (byte)1, (byte)2, (byte)3 };
>    pstmt.setBytes(1, data);
>    data[0] = (byte)42;
>    pstmt.executeUpdate();
>
> i.e. is the driver required to capture the value of the byte array
> passed to setBytes() at a particular time (either the time of the call
> or the time of execution), or is it free to choose? Currently we capture
> the value at the time of call.
>
> The same problem applies to all parameter-setting methods that take
> mutable objects.
>
> I can't see anything about this in the (PDF) JDBC specification. The
> javadoc says:
>
>    The driver converts this to an SQL VARBINARY or LONGVARBINARY
>    (depending on the argument's size relative to the driver's limits on
>    VARBINARY values) when it sends it to the database.
>
> which implies the driver can delay the conversion (or perhaps must delay
> the conversion).
>
> Anyone have an opinion on this?
>
> -O
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>
--
Dave Cramer
519 939 0336
ICQ # 1467551

Re: PreparedStatement parameters and mutable objects

From

Barry Lind

Date:

07 January 2004, 13:47:47

Oliver,

I think we are free to do it either way.  I suspect that most jdbc
drivers delay this work until the actual call to the server.

I am curious why you are choosing to optimize this particular aspect?
It seems the bigger win is on query results where the common code path
does the following:  read from socket, put value into a new byte[],
convert byte[] to a String, convert String to required object/type.
There are three copies in different forms of the same data being
generated for each value.  Ugh!

I have long wanted to fix this, but it always ends up being too big a
project for the time I have to work on it.  Have you given any thought
to this bigger problem?

Also, your email below indicates to me that you are trying to do this
with the old V2 protocol methods of executing queries.  The new V3
protocol provides a much, much better interface to the driver to do this
more efficiently.  Have you looked at using the V3 protocol features to
execute the prepared statements?

thanks,
--Barry

Oliver Jowett wrote:

> I'm examining ways to reduce the amount of garbage generated by the
> driver, and one approach I'm looking at is to delay converting the
> parameters of a PreparedStatement to string form until we're actually
> streaming data to the backend, to avoid creating an intermediate copy.
>
> This leads to a question .. does this code have well-defined behaviour:
>
>   Connection conn = /* ... */;
>   PreparedStatement pstmt =
>      conn.prepareStatement("INSERT INTO t(data) VALUES (?)");
>   byte[] data = new byte[] { (byte)1, (byte)2, (byte)3 };
>   pstmt.setBytes(1, data);
>   data[0] = (byte)42;
>   pstmt.executeUpdate();
>
> i.e. is the driver required to capture the value of the byte array
> passed to setBytes() at a particular time (either the time of the call
> or the time of execution), or is it free to choose? Currently we capture
> the value at the time of call.
>
> The same problem applies to all parameter-setting methods that take
> mutable objects.
>
> I can't see anything about this in the (PDF) JDBC specification. The
> javadoc says:
>
>   The driver converts this to an SQL VARBINARY or LONGVARBINARY
>   (depending on the argument's size relative to the driver's limits on
>   VARBINARY values) when it sends it to the database.
>
> which implies the driver can delay the conversion (or perhaps must delay
> the conversion).
>
> Anyone have an opinion on this?
>
> -O
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>      joining column's datatypes do not match
>

Re: PreparedStatement parameters and mutable objects

From

Oliver Jowett

Date:

07 January 2004, 19:59:02

Barry Lind wrote:
> Oliver,
>
> I think we are free to do it either way.  I suspect that most jdbc
> drivers delay this work until the actual call to the server.

Good!

> I am curious why you are choosing to optimize this particular aspect? It
> seems the bigger win is on query results where the common code path does
> the following:  read from socket, put value into a new byte[], convert
> byte[] to a String, convert String to required object/type. There are
> three copies in different forms of the same data being generated for
> each value.  Ugh!

In our case, we don't pull much data from the DB in normal operation.
The normal case is for each transaction to query the DB to check the
state hasn't shifted unexpectedly (this is a tiny ResultSet -- one row
with one int8 column), then push out a large batch of
inserts/updates/deletes where the bulk of the data is bytearrays set via
PreparedStatement.setBytes(). We only have large ResultSets when pulling
data out of the database at startup, and at that point we're not so
worried about garbage as we're not doing latency-sensitive processing yet.

> I have long wanted to fix this, but it always ends up being too big a
> project for the time I have to work on it.  Have you given any thought
> to this bigger problem?

It's somewhere on my list of things to do.. but that list currently
fills a whiteboard. I've given a bit of thought to it but didn't get
much beyond looking at the current code. It seems like we should be
doing something like maintaining the original bytearray as a single
object (or perhaps one per row), and doing conversions from ranges of
that array on demand .. would need to maintain some metadata to easily
locate column and row boundaries .. using the V3 binary tuple format
might make things easier.

> Also, your email below indicates to me that you are trying to do this
> with the old V2 protocol methods of executing queries.  The new V3
> protocol provides a much, much better interface to the driver to do this
> more efficiently.  Have you looked at using the V3 protocol features to
> execute the prepared statements?

Yes, I'm looking at using V3 (also probably for COPY support). There's a
fair amount of reorganization needed to do this properly, though; the
main part would be abstracting the conversion of parameters to wire
representations, so we can support different representations for V2 and
V3 easily, and possibly something similar for query execution methods
(to support reusing queries via V3's Parse message).

There's also a whole slew of other changes that could be made to use
V3's features, especially around cursors and not materializing all the
rows in a resultset at once; it looks like we can use a named portal and
set a row limit, then use FETCH or another Execute to navigate around
the rows within that portal, for any row-returning command -- much
cleaner than the current "transform-to-DECLARE" approach. But I'm not
sure if I want to take that on at the moment :)

-O

Re: PreparedStatement parameters and mutable objects

From

Oliver Jowett

Date:

08 January 2004, 02:04:35

Dave Cramer wrote:
> Oliver,
>
> while I think the code snippet below has defined behaviour

What is the defined behaviour, then? It's not obvious to me whether
{1,2,3}, {42,2,3}, or a driver-dependent value (i.e. undefined
behaviour) gets inserted.

 > I think it is
> fraught with danger.

Yes :)

> As long as everyone knows that the code does this and writes code that
> does not break then yes it would work, however I don't believe this is
> the case.

I'm not sure what you mean here.

Are you saying that the driver would be incorrect to delay the
conversion, as there is existing (correct) code that expects other
behaviour?

Or are you saying that delaying the conversion is valid, but there is
existing (buggy) code that expects it not to be delayed, and we
shouldn't break that code?

Or something else entirely? :)

> Also reading the spec that you provided, I think the phrase "when it
> sends" refers to the actual conversion that takes place not the specific
> time that it is done.

Yeah .. the spec just doesn't seem to address this area at all.

-O

Re: PreparedStatement parameters and mutable objects

From

Dave Cramer

Date:

08 January 2004, 08:22:05

On Thu, 2004-01-08 at 01:03, Oliver Jowett wrote:
> Dave Cramer wrote:
> > Oliver,
> >
> > while I think the code snippet below has defined behaviour
>
> What is the defined behaviour, then? It's not obvious to me whether
> {1,2,3}, {42,2,3}, or a driver-dependent value (i.e. undefined
> behaviour) gets inserted.
defined by who, I meant it was defined in that it didn't have any
surprises. If you implement your idea, then 42,2,3 will be inserted
otherwise 1,2,3

>
>  > I think it is
> > fraught with danger.
>
> Yes :)
>
> > As long as everyone knows that the code does this and writes code that
> > does not break then yes it would work, however I don't believe this is
> > the case.
>
> I'm not sure what you mean here.
>
> Are you saying that the driver would be incorrect to delay the
> conversion, as there is existing (correct) code that expects other
> behaviour?
>
> Or are you saying that delaying the conversion is valid, but there is
> existing (buggy) code that expects it not to be delayed, and we
> shouldn't break that code?
>
> Or something else entirely? :)
I'm saying that doing it this way will likely expose buggy code, which
we will end up having to figure out why it doesn't work, when the user
says "my code used to work, and now it doesn't", plus they don't send us
code to look at.
>
> > Also reading the spec that you provided, I think the phrase "when it
> > sends" refers to the actual conversion that takes place not the specific
> > time that it is done.
>
> Yeah .. the spec just doesn't seem to address this area at all.

:(

>
> -O
>
--
Dave Cramer
519 939 0336
ICQ # 1467551

Re: PreparedStatement parameters and mutable objects

From

Barry Lind

Date:

08 January 2004, 12:03:26


Oliver Jowett wrote:
> Barry Lind wrote:
>> I am curious why you are choosing to optimize this particular aspect?
>> It seems the bigger win is on query results where the common code path
>> does the following:  read from socket, put value into a new byte[],
>> convert byte[] to a String, convert String to required object/type.
>> There are three copies in different forms of the same data being
>> generated for each value.  Ugh!
 >>
>> I have long wanted to fix this, but it always ends up being too big a
>> project for the time I have to work on it.  Have you given any thought
>> to this bigger problem?
>
> It's somewhere on my list of things to do.. but that list currently
> fills a whiteboard. I've given a bit of thought to it but didn't get
> much beyond looking at the current code. It seems like we should be
> doing something like maintaining the original bytearray as a single
> object (or perhaps one per row), and doing conversions from ranges of
> that array on demand .. would need to maintain some metadata to easily
> locate column and row boundaries .. using the V3 binary tuple format
> might make things easier.

My idea for handling this problem was to extend the use of PGobject to
cover all datatypes (not just special ones as is the case today).  And
then store data as PGobject[][]  where each column value would be some
subtype of PGobject specific for that datatype.  Thus 'PGint' would know
how to handle the v2 and v3 protocol format for ints and have accessor
methods for getting the value as a String or possibly other formats,
etc.  I think this would allow the code in most cases to only need one
copy of the data around at a time.

>> Also, your email below indicates to me that you are trying to do this
>> with the old V2 protocol methods of executing queries.  The new V3
>> protocol provides a much, much better interface to the driver to do
>> this more efficiently.  Have you looked at using the V3 protocol
>> features to execute the prepared statements?
>
> Yes, I'm looking at using V3 (also probably for COPY support). There's a
> fair amount of reorganization needed to do this properly, though; the
> main part would be abstracting the conversion of parameters to wire
> representations, so we can support different representations for V2 and
> V3 easily, and possibly something similar for query execution methods
> (to support reusing queries via V3's Parse message).
>
> There's also a whole slew of other changes that could be made to use
> V3's features, especially around cursors and not materializing all the
> rows in a resultset at once; it looks like we can use a named portal and
> set a row limit, then use FETCH or another Execute to navigate around
> the rows within that portal, for any row-returning command -- much
> cleaner than the current "transform-to-DECLARE" approach. But I'm not
> sure if I want to take that on at the moment :)
>

Yup.  There is a lot of work to do here to fully utilize the V3
protocol.  I worked with Tom to insure that his protocol rewrite had all
the features needed by jdbc (which I believe it does), we just need to
take advantage of them.

thanks,
--Barry

Re: PreparedStatement parameters and mutable objects

From

Oliver Jowett

Date:

08 January 2004, 18:57:23

Dave Cramer wrote:
> On Thu, 2004-01-08 at 01:03, Oliver Jowett wrote:
>
>>Dave Cramer wrote:
>>
>>>Oliver,
>>>
>>>while I think the code snippet below has defined behaviour
>>
>>What is the defined behaviour, then? It's not obvious to me whether
>>{1,2,3}, {42,2,3}, or a driver-dependent value (i.e. undefined
>>behaviour) gets inserted.
>
> defined by who, I meant it was defined in that it didn't have any
> surprises. If you implement your idea, then 42,2,3 will be inserted
> otherwise 1,2,3

Ok. I meant "defined by the JDBC spec" i.e. portable between drivers. It
sounds like you think it's undefined in this sense.

> I'm saying that doing it this way will likely expose buggy code, which
> we will end up having to figure out why it doesn't work, when the user
> says "my code used to work, and now it doesn't", plus they don't send us
> code to look at.

How far do we go to support buggy code though? If we can't make this
sort of change, we lose several opportunities for optimization.

If the only objection is supporting buggy code, I'll probably do the
changes anyway and apply the patch locally even if it doesn't hit CVS,
since we don't have buggy code here ;)

-O

Re: PreparedStatement parameters and mutable objects

From

Kris Jurka

Date:

09 January 2004, 04:19:12

On Fri, 9 Jan 2004, Oliver Jowett wrote:

> > I'm saying that doing it this way will likely expose buggy code, which
> > we will end up having to figure out why it doesn't work, when the user
> > says "my code used to work, and now it doesn't", plus they don't send us
> > code to look at.
>
> How far do we go to support buggy code though? If we can't make this
> sort of change, we lose several opportunities for optimization.

I don't think you can label this as buggy code unless you can point to the
spec and say where it is disallowed.  It is certainly something that looks
dangerous and is unlikely to be written by the average developer, but that
doesn't make it illegal.  I lean towards the notion that when I say setXXX
that's the time the value must be saved because that's certainly more
intuitive and specific than "sometime later when the driver decides to."

Kris Jurka

Re: PreparedStatement parameters and mutable objects

From

Dave Cramer

Date:

09 January 2004, 09:47:47

Well there is some hint that this is incorrect:


pstmnt = con.prepareStatment("insert into foo values(?,?)");

pstmnt.setString(1,"abcd");

for( i=0; i< 5; i++)
{
    Integer intx = new Integer(i);
    pstmnt.setInt(2,intx)
    pstmnt.executeUpdate();
}

The above code should insert ("abcd", 0 .. 4 ) into five rows

The point being that the value for setXXX does not change until setXXX
is called.

This is from section 24.1.2 in the JDBC api tutorial

In all I'm sort of sitting on the fence here. I guess we could try it
and see how many people get tripped up?

Dave

On Fri, 2004-01-09 at 03:18, Kris Jurka wrote:
> On Fri, 9 Jan 2004, Oliver Jowett wrote:
>
> > > I'm saying that doing it this way will likely expose buggy code, which
> > > we will end up having to figure out why it doesn't work, when the user
> > > says "my code used to work, and now it doesn't", plus they don't send us
> > > code to look at.
> >
> > How far do we go to support buggy code though? If we can't make this
> > sort of change, we lose several opportunities for optimization.
>
> I don't think you can label this as buggy code unless you can point to the
> spec and say where it is disallowed.  It is certainly something that looks
> dangerous and is unlikely to be written by the average developer, but that
> doesn't make it illegal.  I lean towards the notion that when I say setXXX
> that's the time the value must be saved because that's certainly more
> intuitive and specific than "sometime later when the driver decides to."
>
> Kris Jurka
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>
--
Dave Cramer
519 939 0336
ICQ # 1467551

Re: PreparedStatement parameters and mutable objects

From

Andrew Rawnsley

Date:

09 January 2004, 12:10:13

If anyone is looking for behavior from others, Oracle (my apologies for
using the 'O' word...) behaves like this:

        byte[] b = new byte[] {(byte)1,(byte)2,(byte)3};
        stmnt.setBytes(1,b);
        b[2] = (byte) 33;
        stmnt.setBytes(1,b);
        stmnt.execute();

results in {1,2,33}. (Disclaimer - I'm not advocating something just
because 'Oracle does it this way')

Why someone would do this is, of course, I don't know.

While I am all for the idea of improved performance, I find myself
forced to agree with the idea that, while something
like the above is kooky, it isn't illegal, and falls into the realm of
expected (but not necessarily mandated) behavior from a driver.
Ambiguity tends to have to favor the idiot, unfortunately.

On Jan 9, 2004, at 8:46 AM, Dave Cramer wrote:

> Well there is some hint that this is incorrect:
>
>
> pstmnt = con.prepareStatment("insert into foo values(?,?)");
>
> pstmnt.setString(1,"abcd");
>
> for( i=0; i< 5; i++)
> {
>     Integer intx = new Integer(i);
>     pstmnt.setInt(2,intx)
>     pstmnt.executeUpdate();
> }
>
> The above code should insert ("abcd", 0 .. 4 ) into five rows
>
> The point being that the value for setXXX does not change until setXXX
> is called.
>
> This is from section 24.1.2 in the JDBC api tutorial
>
> In all I'm sort of sitting on the fence here. I guess we could try it
> and see how many people get tripped up?
>
> Dave
>
> On Fri, 2004-01-09 at 03:18, Kris Jurka wrote:
>> On Fri, 9 Jan 2004, Oliver Jowett wrote:
>>
>>>> I'm saying that doing it this way will likely expose buggy code,
>>>> which
>>>> we will end up having to figure out why it doesn't work, when the
>>>> user
>>>> says "my code used to work, and now it doesn't", plus they don't
>>>> send us
>>>> code to look at.
>>>
>>> How far do we go to support buggy code though? If we can't make this
>>> sort of change, we lose several opportunities for optimization.
>>
>> I don't think you can label this as buggy code unless you can point
>> to the
>> spec and say where it is disallowed.  It is certainly something that
>> looks
>> dangerous and is unlikely to be written by the average developer, but
>> that
>> doesn't make it illegal.  I lean towards the notion that when I say
>> setXXX
>> that's the time the value must be saved because that's certainly more
>> intuitive and specific than "sometime later when the driver decides
>> to."
>>
>> Kris Jurka
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 4: Don't 'kill -9' the postmaster
>>
> --
> Dave Cramer
> 519 939 0336
> ICQ # 1467551
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>
--------------------

Andrew Rawnsley
President
The Ravensfield Digital Resource Group, Ltd.
(740) 587-0114
www.ravensfield.com

Re: PreparedStatement parameters and mutable objects

From

Dave Cramer

Date:

09 January 2004, 12:10:42

Andrew,

What happens in Oracle if he second setBytes is not called?
does it still insert {1,2,33} ?

I think my biggest concern is Andrew's last statement:
"Ambiguity tends to have to favor the idiot, unfortunately"


Dave

On Fri, 2004-01-09 at 09:28, Andrew Rawnsley wrote:
> If anyone is looking for behavior from others, Oracle (my apologies for
> using the 'O' word...) behaves like this:
>
>         byte[] b = new byte[] {(byte)1,(byte)2,(byte)3};
>         stmnt.setBytes(1,b);
>         b[2] = (byte) 33;
>         stmnt.setBytes(1,b);
>         stmnt.execute();
>
> results in {1,2,33}. (Disclaimer - I'm not advocating something just
> because 'Oracle does it this way')
>
> Why someone would do this is, of course, I don't know.
>
> While I am all for the idea of improved performance, I find myself
> forced to agree with the idea that, while something
> like the above is kooky, it isn't illegal, and falls into the realm of
> expected (but not necessarily mandated) behavior from a driver.
> Ambiguity tends to have to favor the idiot, unfortunately.
>
> On Jan 9, 2004, at 8:46 AM, Dave Cramer wrote:
>
> > Well there is some hint that this is incorrect:
> >
> >
> > pstmnt = con.prepareStatment("insert into foo values(?,?)");
> >
> > pstmnt.setString(1,"abcd");
> >
> > for( i=0; i< 5; i++)
> > {
> >     Integer intx = new Integer(i);
> >     pstmnt.setInt(2,intx)
> >     pstmnt.executeUpdate();
> > }
> >
> > The above code should insert ("abcd", 0 .. 4 ) into five rows
> >
> > The point being that the value for setXXX does not change until setXXX
> > is called.
> >
> > This is from section 24.1.2 in the JDBC api tutorial
> >
> > In all I'm sort of sitting on the fence here. I guess we could try it
> > and see how many people get tripped up?
> >
> > Dave
> >
> > On Fri, 2004-01-09 at 03:18, Kris Jurka wrote:
> >> On Fri, 9 Jan 2004, Oliver Jowett wrote:
> >>
> >>>> I'm saying that doing it this way will likely expose buggy code,
> >>>> which
> >>>> we will end up having to figure out why it doesn't work, when the
> >>>> user
> >>>> says "my code used to work, and now it doesn't", plus they don't
> >>>> send us
> >>>> code to look at.
> >>>
> >>> How far do we go to support buggy code though? If we can't make this
> >>> sort of change, we lose several opportunities for optimization.
> >>
> >> I don't think you can label this as buggy code unless you can point
> >> to the
> >> spec and say where it is disallowed.  It is certainly something that
> >> looks
> >> dangerous and is unlikely to be written by the average developer, but
> >> that
> >> doesn't make it illegal.  I lean towards the notion that when I say
> >> setXXX
> >> that's the time the value must be saved because that's certainly more
> >> intuitive and specific than "sometime later when the driver decides
> >> to."
> >>
> >> Kris Jurka
> >>
> >>
> >> ---------------------------(end of
> >> broadcast)---------------------------
> >> TIP 4: Don't 'kill -9' the postmaster
> >>
> > --
> > Dave Cramer
> > 519 939 0336
> > ICQ # 1467551
> >
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 5: Have you checked our extensive FAQ?
> >
> >                http://www.postgresql.org/docs/faqs/FAQ.html
> >
> --------------------
>
> Andrew Rawnsley
> President
> The Ravensfield Digital Resource Group, Ltd.
> (740) 587-0114
> www.ravensfield.com
>
--
Dave Cramer
519 939 0336
ICQ # 1467551

Re: PreparedStatement parameters and mutable objects

From

Barry Lind

Date:

09 January 2004, 13:07:10

Kris Jurka wrote:
>
> On Fri, 9 Jan 2004, Oliver Jowett wrote:
>
>
>>>I'm saying that doing it this way will likely expose buggy code, which
>>>we will end up having to figure out why it doesn't work, when the user
>>>says "my code used to work, and now it doesn't", plus they don't send us
>>>code to look at.
>>
>>How far do we go to support buggy code though? If we can't make this
>>sort of change, we lose several opportunities for optimization.
>
>
> I don't think you can label this as buggy code unless you can point to the
> spec and say where it is disallowed.  It is certainly something that looks
> dangerous and is unlikely to be written by the average developer, but that
> doesn't make it illegal.  I lean towards the notion that when I say setXXX
> that's the time the value must be saved because that's certainly more
> intuitive and specific than "sometime later when the driver decides to."
>

A counter argument is that by definition the setXXXStream() methods do
it the other way.  The spec defines them (IIRC) to be read at execute
time.  There purpose after all is to allow the driver not to load the
entire contents of the stream into memory, but to simply stream it to
the server when executing the statement.  So are the setXXXStream()
methods a special case, or is the intention of the spec to be consistent
and have all values bound at execute time.

--Barry

Re: PreparedStatement parameters and mutable objects

From

Andrew Rawnsley

Date:

09 January 2004, 13:16:05

Ah, yes, my bad, tested wrong. Whenever I use the word 'idiot' in post
it ends up pointing back to me...

Oracle produces {1,2,3} with the following:

        byte[] b = new byte[] {(byte)1,(byte)2,(byte)3};
        stmnt.setInt(1,1);
        stmnt.setBytes(2,b);
        b[2] = (byte) 33;
        stmnt.executeUpdate();

So its doing it at the actual 'setXXX' call.



On Jan 9, 2004, at 9:37 AM, Dave Cramer wrote:

> Andrew,
>
> What happens in Oracle if he second setBytes is not called?
> does it still insert {1,2,33} ?
>
> I think my biggest concern is Andrew's last statement:
> "Ambiguity tends to have to favor the idiot, unfortunately"
>
>
> Dave
>
> On Fri, 2004-01-09 at 09:28, Andrew Rawnsley wrote:
>> If anyone is looking for behavior from others, Oracle (my apologies
>> for
>> using the 'O' word...) behaves like this:
>>
>>         byte[] b = new byte[] {(byte)1,(byte)2,(byte)3};
>>         stmnt.setBytes(1,b);
>>         b[2] = (byte) 33;
>>         stmnt.setBytes(1,b);
>>         stmnt.execute();
>>
>> results in {1,2,33}. (Disclaimer - I'm not advocating something just
>> because 'Oracle does it this way')
>>
>> Why someone would do this is, of course, I don't know.
>>
>> While I am all for the idea of improved performance, I find myself
>> forced to agree with the idea that, while something
>> like the above is kooky, it isn't illegal, and falls into the realm of
>> expected (but not necessarily mandated) behavior from a driver.
>> Ambiguity tends to have to favor the idiot, unfortunately.
>>
>> On Jan 9, 2004, at 8:46 AM, Dave Cramer wrote:
>>
>>> Well there is some hint that this is incorrect:
>>>
>>>
>>> pstmnt = con.prepareStatment("insert into foo values(?,?)");
>>>
>>> pstmnt.setString(1,"abcd");
>>>
>>> for( i=0; i< 5; i++)
>>> {
>>>     Integer intx = new Integer(i);
>>>     pstmnt.setInt(2,intx)
>>>     pstmnt.executeUpdate();
>>> }
>>>
>>> The above code should insert ("abcd", 0 .. 4 ) into five rows
>>>
>>> The point being that the value for setXXX does not change until
>>> setXXX
>>> is called.
>>>
>>> This is from section 24.1.2 in the JDBC api tutorial
>>>
>>> In all I'm sort of sitting on the fence here. I guess we could try it
>>> and see how many people get tripped up?
>>>
>>> Dave
>>>
>>> On Fri, 2004-01-09 at 03:18, Kris Jurka wrote:
>>>> On Fri, 9 Jan 2004, Oliver Jowett wrote:
>>>>
>>>>>> I'm saying that doing it this way will likely expose buggy code,
>>>>>> which
>>>>>> we will end up having to figure out why it doesn't work, when the
>>>>>> user
>>>>>> says "my code used to work, and now it doesn't", plus they don't
>>>>>> send us
>>>>>> code to look at.
>>>>>
>>>>> How far do we go to support buggy code though? If we can't make
>>>>> this
>>>>> sort of change, we lose several opportunities for optimization.
>>>>
>>>> I don't think you can label this as buggy code unless you can point
>>>> to the
>>>> spec and say where it is disallowed.  It is certainly something that
>>>> looks
>>>> dangerous and is unlikely to be written by the average developer,
>>>> but
>>>> that
>>>> doesn't make it illegal.  I lean towards the notion that when I say
>>>> setXXX
>>>> that's the time the value must be saved because that's certainly
>>>> more
>>>> intuitive and specific than "sometime later when the driver decides
>>>> to."
>>>>
>>>> Kris Jurka
>>>>
>>>>
>>>> ---------------------------(end of
>>>> broadcast)---------------------------
>>>> TIP 4: Don't 'kill -9' the postmaster
>>>>
>>> --
>>> Dave Cramer
>>> 519 939 0336
>>> ICQ # 1467551
>>>
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 5: Have you checked our extensive FAQ?
>>>
>>>                http://www.postgresql.org/docs/faqs/FAQ.html
>>>
>> --------------------
>>
>> Andrew Rawnsley
>> President
>> The Ravensfield Digital Resource Group, Ltd.
>> (740) 587-0114
>> www.ravensfield.com
>>
> --
> Dave Cramer
> 519 939 0336
> ICQ # 1467551
>
--------------------

Andrew Rawnsley
President
The Ravensfield Digital Resource Group, Ltd.
(740) 587-0114
www.ravensfield.com

Re: PreparedStatement parameters and mutable objects

From

Oliver Jowett

Date:

11 January 2004, 18:41:09

Dave Cramer wrote:
> Well there is some hint that this is incorrect:
>
>
> pstmnt = con.prepareStatment("insert into foo values(?,?)");
>
> pstmnt.setString(1,"abcd");
>
> for( i=0; i< 5; i++)
> {
>     Integer intx = new Integer(i);
>     pstmnt.setInt(2,intx)
>     pstmnt.executeUpdate();
> }
>
> The above code should insert ("abcd", 0 .. 4 ) into five rows
>
> The point being that the value for setXXX does not change until setXXX
> is called.

Well, this is a different case -- Integer is immutable, so when the
driver transforms it to a DB representation is irrelevant. (I assume you
mean setObject(), above, since setInt() expects a primitive int..).

Even if the implication of the above code is that a parameter doesn't
change from what you last set it to on execution, that doesn't really
help us answer the question .. so you don't set it, and have the same
mutable object reference, but that's no different to the case where you
set it to a new mutable object reference .. it could still mutate before
execution.

If we were using e.g. setBytes() the analogous case is something like this:

   byte[] data = new byte[10];
   // (1)
   for (int i = 0; i < 5; ++i) {
      data[0] = i;
      pstmt.setBytes(2, data);       // (2)
      pstmt.executeUpdate();
   }

If binding was guaranteed to happen at execution you could get away with
a setBytes() at (1) and remove (2). But I think the above code is the
only way to be safe under all drivers.

The above case is relatively boring as 'data' does not change between
setBytes() and executeUpdate() anyway. More interesting is this one..

   byte[] data = new byte[10];
   for (int i = 0; i < 5; ++i) {
      data[0] = i;
      pstmt.setBytes(2, data);
      pstmt.addBatch();
   }
   pstmt.executeBatch();

I think this is the case that is more likely to bite us. "set
parameters, modify mutable objects, execute" seems like an unlikely
pattern to use in real code. "reuse mutable objects, set parameters, add
to batch, loop" seems more likely.

I'm still in favour of an "undefined behaviour" interpretation here.
There's not much benefit to application code in nailing down one
behaviour or the other, and leaving it undefined gives the driver the
flexibility to do whichever is a better implementation for the DB in
question.

I might send a query to the JDBC expert group about this. Anyone have a
better contact address than the spec feedback address? (the Sun
JDBC-INTEREST list seems to be pretty useless, it seems to be the "where
can I find an oracle driver?" mailing list)

-O

Re: PreparedStatement parameters and mutable objects

From

Kris Jurka

Date:

11 January 2004, 19:53:05

On Mon, 12 Jan 2004, Oliver Jowett wrote:

> I'm still in favour of an "undefined behaviour" interpretation here.
> There's not much benefit to application code in nailing down one
> behaviour or the other, and leaving it undefined gives the driver the
> flexibility to do whichever is a better implementation for the DB in
> question.
>

The question that has yet been unanswered is how much taking advantage of
the "undefined behavior" will get us.  You personally seem most interested
in the setBytes() case because it is relevent to your application and it
can potentially be quite large.  I don't know how much this would gain on
say a Date object.  For your particular problem it seems you could simply
wrap the byte array in question inside a ByteArrayInputStream (which does
not copy it) and use setBinaryStream which would allow the delayed reading
of it.

Kris Jurka

Re: PreparedStatement parameters and mutable objects

From

Paul Thomas

Date:

11 January 2004, 19:58:13

On 11/01/2004 22:40 Oliver Jowett wrote:
> [snip]
> I'm still in favour of an "undefined behaviour" interpretation here.
> There's not much benefit to application code in nailing down one
> behaviour or the other, and leaving it undefined gives the driver the
> flexibility to do whichever is a better implementation for the DB in
> question.

Having followed this very interesting thread, I'm still wondering exactly
how much measurable improvement could be achieved. I read an article on
IBM developerWorks (sorry can't remember the URL) which stated that, on
modern VMs, things like object creation aren't the performance bogeys that
they once were. So I'm thinking that before we make a decision about
committing to a change which might break someones app, is there any way by
which we could measure the effects of the proposed change?
> I might send a query to the JDBC expert group about this.

I, for one, would be interested to know.


--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller
Business |
| Computer Consultants         |
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

Re: PreparedStatement parameters and mutable objects

From

Oliver Jowett

Date:

11 January 2004, 20:39:54

Paul Thomas wrote:
>
> On 11/01/2004 22:40 Oliver Jowett wrote:
>
>> [snip]
>> I'm still in favour of an "undefined behaviour" interpretation here.
>> There's not much benefit to application code in nailing down one
>> behaviour or the other, and leaving it undefined gives the driver the
>> flexibility to do whichever is a better implementation for the DB in
>> question.
>
>
> Having followed this very interesting thread, I'm still wondering
> exactly how much measurable improvement could be achieved. I read an
> article on IBM developerWorks (sorry can't remember the URL) which
> stated that, on modern VMs, things like object creation aren't the
> performance bogeys that they once were. So I'm thinking that before we
> make a decision about committing to a change which might break someones
> app, is there any way by which we could measure the effects of the
> proposed change?

The problem is that it's very specific to the application workload; what
case do we measure?

The reason I'm interested in doing this for is not the direct CPU
overhead of object creation (none of the JDBC code is on our main
execution path), but the effect that object creation has on GC interval
and pause. We're running a low-latency app where an extra 50ms pause due
to GC has a large impact on our latency figures .. so the less garbage
we generate in the first place, the better. We could farm the JDBC work
out to a separate VM, but that gets complex quite fast.

Aside from the change in behaviour, the change I'm proposing has no real
downside I can see -- it's not a CPU-vs-memory tradeoff, we're just
generating fewer intermediate copies altogether. More of a code
complexity-vs-runtime cost tradeoff I suppose.

-O

Re: PreparedStatement parameters and mutable objects

From

Oliver Jowett

Date:

11 January 2004, 20:50:09

Kris Jurka wrote:
> On Mon, 12 Jan 2004, Oliver Jowett wrote:
>
>
>>I'm still in favour of an "undefined behaviour" interpretation here.
>>There's not much benefit to application code in nailing down one
>>behaviour or the other, and leaving it undefined gives the driver the
>>flexibility to do whichever is a better implementation for the DB in
>>question.
>>
>
>
> The question that has yet been unanswered is how much taking advantage of
> the "undefined behavior" will get us.  You personally seem most interested
> in the setBytes() case because it is relevent to your application and it
> can potentially be quite large.  I don't know how much this would gain on
> say a Date object.  For your particular problem it seems you could simply
> wrap the byte array in question inside a ByteArrayInputStream (which does
> not copy it) and use setBinaryStream which would allow the delayed reading
> of it.

Yeah, modifying the driver to support setBinaryStream better is my
second choice (the current implementation just delegates to setBytes).
Most of that work overlaps with setBytes() though, it'll be almost as
easy to do both. And wrapping the array just to have the driver unwrap
it again seems a bit perverse :)

I'm interested in the official word on this anyway since even if it's
not useful to implement for some types, it's still a correctness issue.

-O

Re: PreparedStatement parameters and mutable objects

From

Dave Cramer

Date:

12 January 2004, 08:55:00

Oliver,

I think finding the "offical" word on this is going to be unlikely, at
best it will be someone's opinion. It's not in the spec so it will be an
interpretation. I think more important is meeting the expected behaviour
from the users POV.

That being said, my example showing mutable objects and the expected
behaviour was just that an example, I think the behaviour should be the
same for mutable/un-mutable objects. I would expect there would be a
population of programmers out there that isn't even aware of the fact
that some objects are un-mutable/mutable. Unfortunately we need to code
to the lowest common denominator.

Dave

On Sun, 2004-01-11 at 19:49, Oliver Jowett wrote:
> Kris Jurka wrote:
> > On Mon, 12 Jan 2004, Oliver Jowett wrote:
> >
> >
> >>I'm still in favour of an "undefined behaviour" interpretation here.
> >>There's not much benefit to application code in nailing down one
> >>behaviour or the other, and leaving it undefined gives the driver the
> >>flexibility to do whichever is a better implementation for the DB in
> >>question.
> >>
> >
> >
> > The question that has yet been unanswered is how much taking advantage of
> > the "undefined behavior" will get us.  You personally seem most interested
> > in the setBytes() case because it is relevent to your application and it
> > can potentially be quite large.  I don't know how much this would gain on
> > say a Date object.  For your particular problem it seems you could simply
> > wrap the byte array in question inside a ByteArrayInputStream (which does
> > not copy it) and use setBinaryStream which would allow the delayed reading
> > of it.
>
> Yeah, modifying the driver to support setBinaryStream better is my
> second choice (the current implementation just delegates to setBytes).
> Most of that work overlaps with setBytes() though, it'll be almost as
> easy to do both. And wrapping the array just to have the driver unwrap
> it again seems a bit perverse :)
>
> I'm interested in the official word on this anyway since even if it's
> not useful to implement for some types, it's still a correctness issue.
>
> -O
>
--
Dave Cramer
519 939 0336
ICQ # 1467551

Re: PreparedStatement parameters and mutable objects

From

Paul Thomas

Date:

12 January 2004, 09:26:35

On 12/01/2004 00:39 Oliver Jowett wrote:
> Paul Thomas wrote:
>> Having followed this very interesting thread, I'm still wondering
>> exactly how much measurable improvement could be achieved. I read an
>> article on IBM developerWorks (sorry can't remember the URL) which
>> stated that, on modern VMs, things like object creation aren't the
>> performance bogeys that they once were. So I'm thinking that before we
>> make a decision about committing to a change which might break someones
>> app, is there any way by which we could measure the effects of the
>> proposed change?
>
> The problem is that it's very specific to the application workload; what
> case do we measure?

I think you would need something which closely models you app's work load.

> The reason I'm interested in doing this for is not the direct CPU
> overhead of object creation (none of the JDBC code is on our main
> execution path), but the effect that object creation has on GC interval
> and pause. We're running a low-latency app where an extra 50ms pause due
> to GC has a large impact on our latency figures .. so the less garbage
> we generate in the first place, the better. We could farm the JDBC work
> out to a separate VM, but that gets complex quite fast.

Reducing the amount of garbage created is never a bad thing. Do you have
an estimate of the % reduction you think you could achieve?

>
> Aside from the change in behaviour, the change I'm proposing has no real
> downside I can see -- it's not a CPU-vs-memory tradeoff, we're just
> generating fewer intermediate copies altogether. More of a code
> complexity-vs-runtime cost tradeoff I suppose.

Sounds fine to me.

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller
Business |
| Computer Consultants         |
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

Re: PreparedStatement parameters and mutable objects

From

Oliver Jowett

Date:

12 January 2004, 18:45:14

Dave Cramer wrote:
> Oliver,
>
> I think finding the "offical" word on this is going to be unlikely, at
> best it will be someone's opinion. It's not in the spec so it will be an
> interpretation. I think more important is meeting the expected behaviour
> from the users POV.

I've sent an email asking for clarification to the feedback address in
the 3.0 specification. Hopefully whatever response I get will make it
into future versions of the JDBC specification.

> That being said, my example showing mutable objects and the expected
> behaviour was just that an example, I think the behaviour should be the
> same for mutable/un-mutable objects. I would expect there would be a
> population of programmers out there that isn't even aware of the fact
> that some objects are un-mutable/mutable. Unfortunately we need to code
> to the lowest common denominator.

Ouch. Pass-by-reference semantics are pretty fundamental to Java. Do we
really have to assume that our users aren't aware of this?

BTW the behaviour for mutable vs. immutable objects *will* be the same
.. it's just that for immutable objects, they don't happen to expose an
interface that allows you to change their value (that's why you call
them immutable ;). The reference semantics are identical.

-O

Re: PreparedStatement parameters and mutable objects

From

Oliver Jowett

Date:

12 January 2004, 22:03:16

Paul Thomas wrote:
> On 12/01/2004 00:39 Oliver Jowett wrote:
>> The problem is that it's very specific to the application workload;
>> what case do we measure?
>
> I think you would need something which closely models you app's work load.
>
>> The reason I'm interested in doing this for is not the direct CPU
>> overhead of object creation (none of the JDBC code is on our main
>> execution path), but the effect that object creation has on GC
>> interval and pause. We're running a low-latency app where an extra
>> 50ms pause due to GC has a large impact on our latency figures .. so
>> the less garbage we generate in the first place, the better. We could
>> farm the JDBC work out to a separate VM, but that gets complex quite
>> fast.
>
>
> Reducing the amount of garbage created is never a bad thing. Do you have
> an estimate of the % reduction you think you could achieve?

See the attached testcase, it does batched inserts of byte arrays. I've
run this as:

   java -Xrunhprof:heap=all,file=insertgarbage.txt,depth=10,cutoff=0 \
      -verbose:gc -cp .:/net/java/java_libs/postgresql.jar \
      TestInsertGarbage \
      'jdbc:postgresql:rhino?user=oliver&password=oliver' \
      100000 100 100

which does 100k inserts each with a 100 byte array, in batches of 100.
This is roughly the same as our application workload when it is doing a
DB resync of a reasonably-sized provisioning database.

In the resulting heap dump, the allocation hotspots are:

  QueryExecutor.sendQueryV2 -> Encoding.encode -> String.getBytes
   (100000 byte[], 143738400 bytes)
   (100000 byte[], 37144800 bytes)
   (100000 byte[], 20000000 bytes) // This is encoding the query text
  Statement.setBytes -> setString -> escapeString -> StringBuffer.append
   (63000 char[], 81608000 bytes)
   (19200 char[], 25018400 bytes)
   (17800 char[], 23091200 bytes)
  Statement.setBytes -> setString -> StringBuffer.ensureCapacity
   (100000 char[], 65445600 bytes)
  Statement.setBytes -> PGbytea.toPGString -> StringBuffer.<init>
   (100000 char[], 41600000 bytes)
  Statement.setBytes -> PGbytea.toPGString -> StringBuffer.append
   (21800 char[], 17788800 bytes)
   (21400 char[], 17462400 bytes)
   (21300 char[], 17380800 bytes)
   (20800 char[], 16972800 bytes)
   (14400 char[], 11750400 bytes)
  QueryExecutor.executeV2 -> Jdbc3PreparedStatement.createResultSet
   (100000 Jdbc3ResultSet, 13600000 bytes)

All other allocation points allocate <10MB. There's quite a bit of other
object allocation that could be cleaned up, e.g. repeated allocation of
QueryExecutor objects.

Total allocations were around 631800000 bytes. I estimate that streaming
the parameter data would generate about 499000000 bytes less garbage
(basically all of the above goes away except for encoding the query text
and creating the resultsets), which is a 79% reduction (this is actually
a lot more than I expected!).

There are improvements we can make here without streaming, though. Not
creating resultset objects for updates, and doing all necessary escaping
in a single pass rather than two, both seem like low-hanging fruit.

-O
import java.sql.*;

public class TestInsertGarbage {
    public static void main(String args[]) throws Exception {
        if (args.length < 4) {
            System.err.println("usage: java TestInsertGarbage <jdbc: URL> <# of inserts> <array size> <batch size>");
            return;
        }

        String driverURL = args[0];
        int numInserts = Integer.parseInt(args[1]);
        int arraySize = Integer.parseInt(args[2]);
        int batchSize = Integer.parseInt(args[3]);

        // We don't want to count the data to be inserted itself, since that's
        // mostly static and isn't garbage. So just create a single array
        // to represent all the data. We'll reuse that on each insert.

        // we also give it a random distribution of values since that affects
        // the exact length needed to escape the value.
        byte[] dummyArray = new byte[arraySize];

        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(driverURL);

        // Schema setup.
        Statement stmt = conn.createStatement();
        try {
            stmt.executeUpdate("DROP TABLE test_insert_garbage");
        } catch (SQLException e) {}
        stmt.executeUpdate("CREATE TABLE test_insert_garbage(data bytea)");

        // Testcase itself.
        conn.setAutoCommit(false);
        PreparedStatement inserter =
            conn.prepareStatement("INSERT INTO test_insert_garbage(data) VALUES (?)");

        // Main loop.
        java.util.Random randomizer = new java.util.Random();
        for (int i = 0; i < numInserts; ) {
            randomizer.nextBytes(dummyArray); // Hee hee.
            for (int j = 0; j < batchSize && i < numInserts; ++i, ++j) {
                inserter.setBytes(1, dummyArray);
                inserter.addBatch();
            }
            inserter.executeBatch();
            System.err.println("Inserted " + i + " rows");
        }

        // Done.
        conn.commit();
        System.err.println("Committed transaction.");
    }
}

Re: PreparedStatement parameters and mutable objects

From

Dave Cramer

Date:

12 January 2004, 22:08:08

On Mon, 2004-01-12 at 17:44, Oliver Jowett wrote:
> Dave Cramer wrote:
> > Oliver,
> >
> > I think finding the "offical" word on this is going to be unlikely, at
> > best it will be someone's opinion. It's not in the spec so it will be an
> > interpretation. I think more important is meeting the expected behaviour
> > from the users POV.
>
> I've sent an email asking for clarification to the feedback address in
> the 3.0 specification. Hopefully whatever response I get will make it
> into future versions of the JDBC specification.
Ok, lets see what they say.
>
> > That being said, my example showing mutable objects and the expected
> > behaviour was just that an example, I think the behaviour should be the
> > same for mutable/un-mutable objects. I would expect there would be a
> > population of programmers out there that isn't even aware of the fact
> > that some objects are un-mutable/mutable. Unfortunately we need to code
> > to the lowest common denominator.
>
> Ouch. Pass-by-reference semantics are pretty fundamental to Java. Do we
> really have to assume that our users aren't aware of this?

This is essentially what Andrew Rawnsley was referring to in his post
where he said
"Ambiguity tends to have to favor the idiot, unfortunately"

>
> BTW the behaviour for mutable vs. immutable objects *will* be the same
> .. it's just that for immutable objects, they don't happen to expose an
> interface that allows you to change their value (that's why you call
> them immutable ;). The reference semantics are identical.
>
> -O
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>
--
Dave Cramer
519 939 0336
ICQ # 1467551

Re: PreparedStatement parameters and mutable objects

From

Dave Cramer

Date:

13 January 2004, 08:35:39

Oliver,


OK, now how do we test your estimates? If it truly shows this kind of
improvement you would sway me!

Dave
On Mon, 2004-01-12 at 21:02, Oliver Jowett wrote:
> Paul Thomas wrote:
> > On 12/01/2004 00:39 Oliver Jowett wrote:
> >> The problem is that it's very specific to the application workload;
> >> what case do we measure?
> >
> > I think you would need something which closely models you app's work load.
> >
> >> The reason I'm interested in doing this for is not the direct CPU
> >> overhead of object creation (none of the JDBC code is on our main
> >> execution path), but the effect that object creation has on GC
> >> interval and pause. We're running a low-latency app where an extra
> >> 50ms pause due to GC has a large impact on our latency figures .. so
> >> the less garbage we generate in the first place, the better. We could
> >> farm the JDBC work out to a separate VM, but that gets complex quite
> >> fast.
> >
> >
> > Reducing the amount of garbage created is never a bad thing. Do you have
> > an estimate of the % reduction you think you could achieve?
>
> See the attached testcase, it does batched inserts of byte arrays. I've
> run this as:
>
>    java -Xrunhprof:heap=all,file=insertgarbage.txt,depth=10,cutoff=0 \
>       -verbose:gc -cp .:/net/java/java_libs/postgresql.jar \
>       TestInsertGarbage \
>       'jdbc:postgresql:rhino?user=oliver&password=oliver' \
>       100000 100 100
>
> which does 100k inserts each with a 100 byte array, in batches of 100.
> This is roughly the same as our application workload when it is doing a
> DB resync of a reasonably-sized provisioning database.
>
> In the resulting heap dump, the allocation hotspots are:
>
>   QueryExecutor.sendQueryV2 -> Encoding.encode -> String.getBytes
>    (100000 byte[], 143738400 bytes)
>    (100000 byte[], 37144800 bytes)
>    (100000 byte[], 20000000 bytes) // This is encoding the query text
>   Statement.setBytes -> setString -> escapeString -> StringBuffer.append
>    (63000 char[], 81608000 bytes)
>    (19200 char[], 25018400 bytes)
>    (17800 char[], 23091200 bytes)
>   Statement.setBytes -> setString -> StringBuffer.ensureCapacity
>    (100000 char[], 65445600 bytes)
>   Statement.setBytes -> PGbytea.toPGString -> StringBuffer.<init>
>    (100000 char[], 41600000 bytes)
>   Statement.setBytes -> PGbytea.toPGString -> StringBuffer.append
>    (21800 char[], 17788800 bytes)
>    (21400 char[], 17462400 bytes)
>    (21300 char[], 17380800 bytes)
>    (20800 char[], 16972800 bytes)
>    (14400 char[], 11750400 bytes)
>   QueryExecutor.executeV2 -> Jdbc3PreparedStatement.createResultSet
>    (100000 Jdbc3ResultSet, 13600000 bytes)
>
> All other allocation points allocate <10MB. There's quite a bit of other
> object allocation that could be cleaned up, e.g. repeated allocation of
> QueryExecutor objects.
>
> Total allocations were around 631800000 bytes. I estimate that streaming
> the parameter data would generate about 499000000 bytes less garbage
> (basically all of the above goes away except for encoding the query text
> and creating the resultsets), which is a 79% reduction (this is actually
> a lot more than I expected!).
>
> There are improvements we can make here without streaming, though. Not
> creating resultset objects for updates, and doing all necessary escaping
> in a single pass rather than two, both seem like low-hanging fruit.
>
> -O
>
> ______________________________________________________________________
>
> import java.sql.*;
>
> public class TestInsertGarbage {
>     public static void main(String args[]) throws Exception {
>         if (args.length < 4) {
>             System.err.println("usage: java TestInsertGarbage <jdbc: URL> <# of inserts> <array size> <batch size>");
>             return;
>         }
>
>         String driverURL = args[0];
>         int numInserts = Integer.parseInt(args[1]);
>         int arraySize = Integer.parseInt(args[2]);
>         int batchSize = Integer.parseInt(args[3]);
>
>         // We don't want to count the data to be inserted itself, since that's
>         // mostly static and isn't garbage. So just create a single array
>         // to represent all the data. We'll reuse that on each insert.
>
>         // we also give it a random distribution of values since that affects
>         // the exact length needed to escape the value.
>         byte[] dummyArray = new byte[arraySize];
>
>         Class.forName("org.postgresql.Driver");
>         Connection conn = DriverManager.getConnection(driverURL);
>
>         // Schema setup.
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("DROP TABLE test_insert_garbage");
>         } catch (SQLException e) {}
>         stmt.executeUpdate("CREATE TABLE test_insert_garbage(data bytea)");
>
>         // Testcase itself.
>         conn.setAutoCommit(false);
>         PreparedStatement inserter =
>             conn.prepareStatement("INSERT INTO test_insert_garbage(data) VALUES (?)");
>
>         // Main loop.
>         java.util.Random randomizer = new java.util.Random();
>         for (int i = 0; i < numInserts; ) {
>             randomizer.nextBytes(dummyArray); // Hee hee.
>             for (int j = 0; j < batchSize && i < numInserts; ++i, ++j) {
>                 inserter.setBytes(1, dummyArray);
>                 inserter.addBatch();
>             }
>             inserter.executeBatch();
>             System.err.println("Inserted " + i + " rows");
>         }
>
>         // Done.
>         conn.commit();
>         System.err.println("Committed transaction.");
>     }
> }
>
> ______________________________________________________________________
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
--
Dave Cramer
519 939 0336
ICQ # 1467551