Thread: Performance problem with timestamps in result sets
Hi, while profiling an application of ours, I noticed a performance problem with the PostgreSQL JDBC driver (the same for any version: 8.0, 8.1, 8.2dev-501). In short, whenever the ResultSet contains a timestamp value, the JDBC driver takes nearly one second to extract it (this is against a 8.0 on Windows, database and application on the same machine; with a non-development configuration, the number would be a bit lower). The profiling shows that the culprit is -> AbstractJdbc2ResultSet.getTimestamp(int) : 7130ms | 8190 calls -> TimestampUtils.loadCalendar(GregorianCalendar, String, String) : 5417ms | 8190 calls -> TimestampUtils.firstNonDigit : 2163ms | 49140 times -> TimestampUtils.number(String, int, int) : 1562ms | 49140 times Now my question is, is there something to make this faster ? Nearly 1sec for a ResulSet.getTimestamp call is too much for us. cheers, Tom
Thomas, That can't possibly be ... Dave On 8-Mar-06, at 8:37 AM, Thomas Dudziak wrote: > Hi, > > while profiling an application of ours, I noticed a performance > problem with the PostgreSQL JDBC driver (the same for any version: > 8.0, 8.1, 8.2dev-501). > In short, whenever the ResultSet contains a timestamp value, the JDBC > driver takes nearly one second to extract it (this is against a 8.0 on > Windows, database and application on the same machine; with a > non-development configuration, the number would be a bit lower). > The profiling shows that the culprit is > > -> AbstractJdbc2ResultSet.getTimestamp(int) : 7130ms | 8190 calls > -> TimestampUtils.loadCalendar(GregorianCalendar, String, String) > : 5417ms | 8190 calls > -> TimestampUtils.firstNonDigit : 2163ms | 49140 times > -> TimestampUtils.number(String, int, int) : 1562ms | 49140 > times > > Now my question is, is there something to make this faster ? Nearly > 1sec for a ResulSet.getTimestamp call is too much for us. > > cheers, > Tom > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that > your > message can get through to the mailing list cleanly >
On 3/8/06, Dave Cramer <pg@fastcrypt.com> wrote: > That can't possibly be ... But it is: in profiling the getTimestamp method takes 10x longer than the getString or getInt methods. Tom
Thomas Dudziak wrote > On 3/8/06, Dave Cramer <pg@fastcrypt.com> wrote: > > > That can't possibly be ... > > But it is: in profiling the getTimestamp method takes 10x longer than > the getString or getInt methods. What profiler are you using? What are the settings? Profiling can add significant overhead... Best Regards, Michael Paesold
On 3/8/06, Michael Paesold <mpaesold@gmx.at> wrote: > What profiler are you using? What are the settings? > Profiling can add significant overhead... YourKit 5.1.1 (newest version) against Java 5 Update 6, with method timing and invocation counting (which makes it somewhat slower, I'd say 10-15%). The interesting figure is that getTimestamp takes 10x more than getString or getInt. Tom
Thomas Dudziak wrote: > -> AbstractJdbc2ResultSet.getTimestamp(int) : 7130ms | 8190 calls > > Now my question is, is there something to make this faster ? Nearly > 1sec for a ResulSet.getTimestamp call is too much for us. This should be 1 ms per getTimestamp call, shouldn't it? The time is the aggregate time for ~8000 calls. That is still rather slow, yeah, but it is so with profiling. Best Regards, Michael Paesold
On 3/8/06, Michael Paesold <mpaesold@gmx.at> wrote: > This should be 1 ms per getTimestamp call, shouldn't it? The time is the > aggregate time for ~8000 calls. That is still rather slow, yeah, but it is > so with profiling. Yep, but as I said, the others are much faster. E.g. getString takes 140ms for 5460 calls, and getInt 570ms for 10920 calls, so its probably not so much the profiler. Also, I was merely asking whether there is something that could be done to bring getTimestamp at least in the same region. cheers, Tom
Thomas Dudziak wrote: > In short, whenever the ResultSet contains a timestamp value, the JDBC > driver takes nearly one second to extract it > -> AbstractJdbc2ResultSet.getTimestamp(int) : 7130ms | 8190 calls > Now my question is, is there something to make this faster ? Nearly > 1sec for a ResulSet.getTimestamp call is too much for us. Your profiling numbers show about 0.9 milliseconds per call, not 1 second per call, if I am reading them right? That's still slower than I would have expected, though. I did no profiling at the time I rewrote the timestamp stuff to actually work correctly. Perhaps you'd like to profile in more detail and submit some patches? -O
On 3/8/06, Oliver Jowett <oliver@opencloud.com> wrote: > Your profiling numbers show about 0.9 milliseconds per call, not 1 > second per call, if I am reading them right? Yep, my mistake in the formulating the sentence. > That's still slower than I would have expected, though. I did no > profiling at the time I rewrote the timestamp stuff to actually work > correctly. Perhaps you'd like to profile in more detail and submit some > patches? Yep, I'll put together a testcase (can't give you my original profiling data because they are against our application, and way to complicated anyway) and give you the profiling data. cheers, Tom
Thomas Dudziak wrote: > On 3/8/06, Michael Paesold <mpaesold@gmx.at> wrote: > > >>This should be 1 ms per getTimestamp call, shouldn't it? The time is the >>aggregate time for ~8000 calls. That is still rather slow, yeah, but it is >>so with profiling. > > > Yep, but as I said, the others are much faster. E.g. getString takes > 140ms for 5460 calls, and getInt 570ms for 10920 calls, so its > probably not so much the profiler. getTimestamp() does many more internal method invocations than getInt() so the per-method overhead of your profiler will have a larger effect on getTimestamp(). Consider that some of the utility methods that getTimestamp() calls internally are very small and will be inlined by the JIT in normal operation.. I would try an external benchmark without profiling to get more comparable numbers: record start time, call getTimestamp() 1000000 times, record end time; repeat for getInt(). A warmup loop is a good idea too if you want to measure the JITted performance, 8000 invocations is not much in terms of what the JIT does. > Also, I was merely asking whether there is something that could be > done to bring getTimestamp at least in the same region. There is no simple go-faster change I know of. Then again I've not heard of any performance problems in this area before. If you do come up with some performance-improving changes, by all means post them to the list for integration into the official driver.. -O
On 3/8/06, mikael-aronsson <mikael-aronsson@telia.com> wrote: > I do not think there is much more to do, the timestamp is just a 64 bit > integer and you say an int takes 570ms/10920 calls and a timestamp takes > 7130ms/8190 calls, this is about twice as much to get twice as much data. No, thats about 16x slower (and most of the time is spent in the JDBC driver, not on the wire). Tom
I do not think there is much more to do, the timestamp is just a 64 bit integer and you say an int takes 570ms/10920 calls and a timestamp takes 7130ms/8190 calls, this is about twice as much to get twice as much data. Mikael ----- Original Message ----- From: "Thomas Dudziak" <tomdzk@gmail.com> To: "Michael Paesold" <mpaesold@gmx.at> Cc: <pgsql-jdbc@postgresql.org> Sent: Wednesday, March 08, 2006 3:57 PM Subject: Re: [JDBC] Performance problem with timestamps in result sets On 3/8/06, Michael Paesold <mpaesold@gmx.at> wrote: > This should be 1 ms per getTimestamp call, shouldn't it? The time is the > aggregate time for ~8000 calls. That is still rather slow, yeah, but it is > so with profiling. Yep, but as I said, the others are much faster. E.g. getString takes 140ms for 5460 calls, and getInt 570ms for 10920 calls, so its probably not so much the profiler. Also, I was merely asking whether there is something that could be done to bring getTimestamp at least in the same region. cheers, Tom ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster
On Wed, Mar 08, 2006 at 04:08:13PM +0100, mikael-aronsson wrote: > I do not think there is much more to do, the timestamp is just a 64 bit > integer and you say an int takes 570ms/10920 calls and a timestamp takes > 7130ms/8190 calls, this is about twice as much to get twice as much data. Eh? 7130 / 8190.0 = 0.8705738705738706 570 /10920.0 = 0.0521978021978022 Which is >16X difference. While may be as much as can be done, it's more than twice as much to get twice the data. -- Steve Wampler -- swampler@noao.edu The gods that smiled on your birth are now laughing out loud.
Hi, I've run the attached sample with these results (1.000.000 rows): no profiler: 7.850ms for the int's, 58.902ms for the timestamp's cpu profiling: 8.865ms for the int's, 60.357ms for the timestamp's cpu profiling with method count: 262.802ms for the ints, 695.188ms for the timestamps - getTimestamp(int) took 499.047ms for the 1.000.000 invocations with toTimestamp(Calendar, String) 378.854ms of that in contrast, getLong(int) took 169.703ms It does not seem linear (the profiler definitely has some influence, though a strange one). Tom
Attachment
mikael-aronsson wrote: > I do not think there is much more to do, the timestamp is just a 64 bit > integer and you say an int takes 570ms/10920 calls and a timestamp takes > 7130ms/8190 calls, this is about twice as much to get twice as much data. This is wrong, and not just because your maths is wrong ;-) Timestamps are not "just a 64 bit integer" on the wire. getTimestamp() is parsing a string representation of a timestamp (multiple numeric fields, separators, etc) and massaging it into a Timestamp in the right timezone (retriving the right Calendar and converting to a milliseconds-per-epoch value from the individual components and timezone info) getInt() is parsing a string representation of a single number. So I'd expect getTimestamp() to be noticably slower than getInt(), which is what we see. Going to binary-format results would help, as then you can transfer the underlying value directly rather than having the backend format it all then have the JDBC driver parse it, but noone has done that yet. One big obstacle is that binary format results are somewhat all-or-nothing, it's tricky to do it for only a single datatype. You'd also still have to do some timezone conversion work, I think. -O
As Oliver points out the timestamp is not a 64bit integer, or even a floatingpoint number. It is a textual representation of the timestamp which needs to be parsed. I looked at the parsing and I was unable to see anything that could be significantly optimized. So the option of going to the binary protocol exists. There are a number of challenges with this. As Oliver points out this is an all or nothing proposition. In other words you can't ask for just timestamps to be returned in binary. The entire row comes back as binary. Additionally, there are two possible representations of timestamps in postgresql. One is a 64 bit integer, the other is floating point. Added to this there may be endian issues ( do we know the answer to this question ?) Significant performance improvement exists for dates, times, timestamps, however the advantages for the rest of the types is questionable given the above assertions. Comments ? Dave