Thread: Performance problem with timestamps in result sets

Performance problem with timestamps in result sets

From
"Thomas Dudziak"
Date:
Hi,

while profiling an application of ours, I noticed a performance
problem with the PostgreSQL JDBC driver (the same for any version:
8.0, 8.1, 8.2dev-501).
In short, whenever the ResultSet contains a timestamp value, the JDBC
driver takes nearly one second to extract it (this is against a 8.0 on
Windows, database and application on the same machine; with a
non-development configuration, the number would be a bit lower).
The profiling shows that the culprit is

-> AbstractJdbc2ResultSet.getTimestamp(int) : 7130ms | 8190 calls
    -> TimestampUtils.loadCalendar(GregorianCalendar, String, String)
: 5417ms | 8190 calls
        -> TimestampUtils.firstNonDigit : 2163ms | 49140 times
        -> TimestampUtils.number(String, int, int) : 1562ms | 49140 times

Now my question is, is there something to make this faster ? Nearly
1sec for a ResulSet.getTimestamp call is too much for us.

cheers,
Tom

Re: Performance problem with timestamps in result sets

From
Dave Cramer
Date:
Thomas,

That can't possibly be ...

Dave
On 8-Mar-06, at 8:37 AM, Thomas Dudziak wrote:

> Hi,
>
> while profiling an application of ours, I noticed a performance
> problem with the PostgreSQL JDBC driver (the same for any version:
> 8.0, 8.1, 8.2dev-501).
> In short, whenever the ResultSet contains a timestamp value, the JDBC
> driver takes nearly one second to extract it (this is against a 8.0 on
> Windows, database and application on the same machine; with a
> non-development configuration, the number would be a bit lower).
> The profiling shows that the culprit is
>
> -> AbstractJdbc2ResultSet.getTimestamp(int) : 7130ms | 8190 calls
>     -> TimestampUtils.loadCalendar(GregorianCalendar, String, String)
> : 5417ms | 8190 calls
>         -> TimestampUtils.firstNonDigit : 2163ms | 49140 times
>         -> TimestampUtils.number(String, int, int) : 1562ms | 49140
> times
>
> Now my question is, is there something to make this faster ? Nearly
> 1sec for a ResulSet.getTimestamp call is too much for us.
>
> cheers,
> Tom
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that
> your
>        message can get through to the mailing list cleanly
>


Performance problem with timestamps in result sets

From
"Thomas Dudziak"
Date:
On 3/8/06, Dave Cramer <pg@fastcrypt.com> wrote:

> That can't possibly be ...

But it is: in profiling the getTimestamp method takes 10x longer than
the getString or getInt methods.

Tom

Re: Performance problem with timestamps in result sets

From
"Michael Paesold"
Date:
Thomas Dudziak wrote

> On 3/8/06, Dave Cramer <pg@fastcrypt.com> wrote:
>
> > That can't possibly be ...
>
> But it is: in profiling the getTimestamp method takes 10x longer than
> the getString or getInt methods.

What profiler are you using? What are the settings?
Profiling can add significant overhead...

Best Regards,
Michael Paesold



Performance problem with timestamps in result sets

From
"Thomas Dudziak"
Date:
On 3/8/06, Michael Paesold <mpaesold@gmx.at> wrote:

> What profiler are you using? What are the settings?
> Profiling can add significant overhead...

YourKit 5.1.1 (newest version) against Java 5 Update 6, with method
timing and invocation counting (which makes it somewhat slower, I'd
say 10-15%).
The interesting figure is that getTimestamp takes 10x more than
getString or getInt.

Tom

Re: Performance problem with timestamps in result sets

From
"Michael Paesold"
Date:
Thomas Dudziak wrote:

> -> AbstractJdbc2ResultSet.getTimestamp(int) : 7130ms | 8190 calls
>
> Now my question is, is there something to make this faster ? Nearly
> 1sec for a ResulSet.getTimestamp call is too much for us.

This should be 1 ms per getTimestamp call, shouldn't it? The time is the
aggregate time for ~8000 calls. That is still rather slow, yeah, but it is
so with profiling.

Best Regards,
Michael Paesold



Re: Performance problem with timestamps in result sets

From
"Thomas Dudziak"
Date:
On 3/8/06, Michael Paesold <mpaesold@gmx.at> wrote:

> This should be 1 ms per getTimestamp call, shouldn't it? The time is the
> aggregate time for ~8000 calls. That is still rather slow, yeah, but it is
> so with profiling.

Yep, but as I said, the others are much faster. E.g. getString takes
140ms for 5460 calls, and getInt 570ms for 10920 calls, so its
probably not so much the profiler.
Also, I was merely asking whether there is something that could be
done to bring getTimestamp at least in the same region.

cheers,
Tom

Re: Performance problem with timestamps in result sets

From
Oliver Jowett
Date:
Thomas Dudziak wrote:

> In short, whenever the ResultSet contains a timestamp value, the JDBC
> driver takes nearly one second to extract it

> -> AbstractJdbc2ResultSet.getTimestamp(int) : 7130ms | 8190 calls

> Now my question is, is there something to make this faster ? Nearly
> 1sec for a ResulSet.getTimestamp call is too much for us.

Your profiling numbers show about 0.9 milliseconds per call, not 1
second per call, if I am reading them right?

That's still slower than I would have expected, though. I did no
profiling at the time I rewrote the timestamp stuff to actually work
correctly. Perhaps you'd like to profile in more detail and submit some
patches?

-O

Re: Performance problem with timestamps in result sets

From
"Thomas Dudziak"
Date:
On 3/8/06, Oliver Jowett <oliver@opencloud.com> wrote:

> Your profiling numbers show about 0.9 milliseconds per call, not 1
> second per call, if I am reading them right?

Yep, my mistake in the formulating the sentence.

> That's still slower than I would have expected, though. I did no
> profiling at the time I rewrote the timestamp stuff to actually work
> correctly. Perhaps you'd like to profile in more detail and submit some
> patches?

Yep, I'll put together a testcase (can't give you my original
profiling data because they are against our application, and way to
complicated anyway) and give you the profiling data.

cheers,
Tom

Re: Performance problem with timestamps in result sets

From
Oliver Jowett
Date:
Thomas Dudziak wrote:
> On 3/8/06, Michael Paesold <mpaesold@gmx.at> wrote:
>
>
>>This should be 1 ms per getTimestamp call, shouldn't it? The time is the
>>aggregate time for ~8000 calls. That is still rather slow, yeah, but it is
>>so with profiling.
>
>
> Yep, but as I said, the others are much faster. E.g. getString takes
> 140ms for 5460 calls, and getInt 570ms for 10920 calls, so its
> probably not so much the profiler.

getTimestamp() does many more internal method invocations than getInt()
so the per-method overhead of your profiler will have a larger effect on
getTimestamp(). Consider that some of the utility methods that
getTimestamp() calls internally are very small and will be inlined by
the JIT in normal operation..

I would try an external benchmark without profiling to get more
comparable numbers: record start time, call getTimestamp() 1000000
times, record end time; repeat for getInt(). A warmup loop is a good
idea too if you want to measure the JITted performance, 8000 invocations
is not much in terms of what the JIT does.

> Also, I was merely asking whether there is something that could be
> done to bring getTimestamp at least in the same region.

There is no simple go-faster change I know of. Then again I've not heard
of any performance problems in this area before.

If you do come up with some performance-improving changes, by all means
post them to the list for integration into the official driver..

-O

Re: Performance problem with timestamps in result sets

From
"Thomas Dudziak"
Date:
On 3/8/06, mikael-aronsson <mikael-aronsson@telia.com> wrote:

> I do not think there is much more to do, the timestamp is just a 64 bit
> integer and you say an int takes 570ms/10920 calls and a timestamp takes
> 7130ms/8190 calls, this is about twice as much to get twice as much data.

No, thats about 16x slower (and most of the time is spent in the JDBC
driver, not on the wire).

Tom

Re: Performance problem with timestamps in result sets

From
"mikael-aronsson"
Date:
I do not think there is much more to do, the timestamp is just a 64 bit
integer and you say an int takes 570ms/10920 calls and a timestamp takes
7130ms/8190 calls, this is about twice as much to get twice as much data.

Mikael

----- Original Message -----
From: "Thomas Dudziak" <tomdzk@gmail.com>
To: "Michael Paesold" <mpaesold@gmx.at>
Cc: <pgsql-jdbc@postgresql.org>
Sent: Wednesday, March 08, 2006 3:57 PM
Subject: Re: [JDBC] Performance problem with timestamps in result sets


On 3/8/06, Michael Paesold <mpaesold@gmx.at> wrote:

> This should be 1 ms per getTimestamp call, shouldn't it? The time is the
> aggregate time for ~8000 calls. That is still rather slow, yeah, but it is
> so with profiling.

Yep, but as I said, the others are much faster. E.g. getString takes
140ms for 5460 calls, and getInt 570ms for 10920 calls, so its
probably not so much the profiler.
Also, I was merely asking whether there is something that could be
done to bring getTimestamp at least in the same region.

cheers,
Tom

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


Re: Performance problem with timestamps in result sets

From
Steve Wampler
Date:
On Wed, Mar 08, 2006 at 04:08:13PM +0100, mikael-aronsson wrote:
> I do not think there is much more to do, the timestamp is just a 64 bit
> integer and you say an int takes 570ms/10920 calls and a timestamp takes
> 7130ms/8190 calls, this is about twice as much to get twice as much data.

Eh?  7130 / 8190.0 = 0.8705738705738706
      570 /10920.0 = 0.0521978021978022

Which is >16X difference.  While may be as much as can be done, it's
more than twice as much to get twice the data.

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.

Re: Performance problem with timestamps in result sets

From
"Thomas Dudziak"
Date:
Hi,

I've run the attached sample with these results (1.000.000 rows):

no profiler: 7.850ms for the int's, 58.902ms for the timestamp's

cpu profiling: 8.865ms for the int's, 60.357ms for the timestamp's

cpu profiling with method count: 262.802ms for the ints, 695.188ms for
the timestamps

 - getTimestamp(int) took 499.047ms for the  1.000.000 invocations
   with toTimestamp(Calendar, String) 378.854ms of that
   in contrast, getLong(int) took 169.703ms

It does not seem linear (the profiler definitely has some influence,
though a strange one).

Tom

Attachment

Re: Performance problem with timestamps in result sets

From
Oliver Jowett
Date:
mikael-aronsson wrote:
> I do not think there is much more to do, the timestamp is just a 64 bit
> integer and you say an int takes 570ms/10920 calls and a timestamp takes
> 7130ms/8190 calls, this is about twice as much to get twice as much data.

This is wrong, and not just because your maths is wrong ;-)  Timestamps
are not "just a 64 bit integer" on the wire.

getTimestamp() is parsing a string representation of a timestamp
(multiple numeric fields, separators, etc) and massaging it into a
Timestamp in the right timezone (retriving the right Calendar and
converting to a milliseconds-per-epoch value from the individual
components and timezone info)

getInt() is parsing a string representation of a single number.

So I'd expect getTimestamp() to be noticably slower than getInt(), which
is what we see.

Going to binary-format results would help, as then you can transfer the
underlying value directly rather than having the backend format it all
then have the JDBC driver parse it, but noone has done that yet. One big
obstacle is that binary format results are somewhat all-or-nothing, it's
tricky to do it for only a single datatype. You'd also still have to do
some timezone conversion work, I think.

-O

binary protocol was Performance problem with timestamps in result sets

From
Dave Cramer
Date:
As Oliver points out the timestamp is not a 64bit integer, or even a
floatingpoint number. It is a textual representation of the timestamp
which needs to be parsed. I looked at the parsing and I was unable to
see anything that could be significantly optimized.

So the option of going to the binary protocol exists. There are a
number of challenges with this. As Oliver points out this is an all
or nothing proposition. In other words you can't ask for just
timestamps to be returned in binary. The entire row comes back as
binary. Additionally, there are two possible representations of
timestamps in postgresql. One is a 64 bit integer, the other is
floating point. Added to this there may be endian issues ( do we know
the answer to this question ?)

Significant performance improvement exists for dates, times,
timestamps, however the advantages for the rest of the types is
questionable given the above assertions.

Comments ?

Dave