Thread: Question about speed: Weird Behavior

Question about speed: Weird Behavior

From
Israel Ben Guilherme Fonseca
Date:
I'm doing some speed tests, and I noticed something very strange.

My test code:

        Connection con = DriverManager.getConnection("jdbc:postgresql://localhost:5432/tcc","postgres","postgres");
        PreparedStatement pst;
        ResultSet rs;
        Timer timer = new Timer();
       
        timer.start();
        pst = con.prepareStatement("select count(*) from Curso");
        rs = pst.executeQuery();
        rs.next();
        timer.stop();
       
        timer.start();
        pst = con.prepareStatement("select * from Curso");
        rs = pst.executeQuery();
        rs.next();
        timer.stop();

At every stop, I print the milliseconds occurred between the begin and the end of the operation. The results were basically

1st  operation: 0.05 miliseconds
2nd operation: 0.1 miliseconds

That's ok, the strange thing is: When I comment the first operation and re-execute the test, the execution time of the 2nd operations increases by 50% (going to 0.15 miliseconds)

If i inverse the order of operations, the speed is changed again, the 2nd (now 1st) get the 50% percent increase and the count(*) get even faster going to 0.005 miliseconds.

At first glance, I thought that it could be the statement pre-compiled but the two statements are different so it doesn't apply (i think). Using the Wireshark tool i could notice that the package sizes get bigger automatically (going from 8258 to 16450 and finally to 49218). From my readings, the Java Sockets have some configurations about buffer size that maybe is doing this.

Even when I close the connection and open a new one, the new queries still get faster.

--

Anyway, that speed increases is definitely a good thing, but why this happens it's a mystery for me. The problem is that my benchmark tests are influenced with this, cause the order of the tests can influence the outcome of the total time.

Does someone know what's is this? And how could I disable this (if possible)?

Thanks in advance,

Israel

Re: Question about speed: Weird Behavior

From
Oliver Jowett
Date:
On 13 June 2011 15:14, Israel Ben Guilherme Fonseca
<israel.bgf@gmail.com> wrote:
> I'm doing some speed tests, and I noticed something very strange.

> Does someone know what's is this? And how could I disable this (if
> possible)?

Classloading and JIT overhead? Benchmarking on only one iteration is
fairly meaningless.

Oliver

Re: Question about speed: Weird Behavior

From
Radosław Smogura
Date:
Israel Ben Guilherme Fonseca <israel.bgf@gmail.com> Monday 13 of June 2011
05:14:07
> I'm doing some speed tests, and I noticed something very strange.
>
> My test code:
>
>         Connection con =
> DriverManager.getConnection("jdbc:postgresql://localhost:5432/tcc","postgre
> s","postgres"); PreparedStatement pst;
>         ResultSet rs;
>         Timer timer = new Timer();
>
>         timer.start();
>         pst = con.prepareStatement("select count(*) from Curso");
>         rs = pst.executeQuery();
>         rs.next();
>         timer.stop();
>
>         timer.start();
>         pst = con.prepareStatement("select * from Curso");
>         rs = pst.executeQuery();
>         rs.next();
>         timer.stop();
>
> At every stop, I print the milliseconds occurred between the begin and the
> end of the operation. The results were basically
>
> 1st  operation: 0.05 miliseconds
> 2nd operation: 0.1 miliseconds
>
> That's ok, the strange thing is: When I comment the first operation and
> re-execute the test, the execution time of the 2nd operations increases by
> 50% (going to 0.15 miliseconds)
>
> If i inverse the order of operations, the speed is changed again, the 2nd
> (now 1st) get the 50% percent increase and the count(*) get even faster
> going to 0.005 miliseconds.
It's not stranga, server may keep some data in buffers or in system cache.
Compare your times on only 1st operations, but invoke select * in terminal.

> At first glance, I thought that it could be the statement pre-compiled but
> the two statements are different so it doesn't apply (i think). Using the
> Wireshark tool i could notice that the package sizes get bigger
> automatically (going from 8258 to 16450 and finally to 49218). From my
> readings, the Java Sockets have some configurations about buffer size that
> maybe is doing this.
This is system autotuning - a good solution.

> Even when I close the connection and open a new one, the new queries still
> get faster.
System cache.

> --
>
> Anyway, that speed increases is definitely a good thing, but why this
> happens it's a mystery for me. The problem is that my benchmark tests are
> influenced with this, cause the order of the tests can influence the
> outcome of the total time.
>
> Does someone know what's is this? And how could I disable this (if
> possible)?
>
> Thanks in advance,
>
> Israel

Re: Question about speed: Weird Behavior

From
Craig Ringer
Date:
On 13/06/11 11:14, Israel Ben Guilherme Fonseca wrote:

> At every stop, I print the milliseconds occurred between the begin and
> the end of the operation. The results were basically
>
> 1st  operation: 0.05 miliseconds
> 2nd operation: 0.1 miliseconds

I suspect you're right on the border of the precision the timers you are
using can offer you. It's hard to be sure as you didn't mention you
platform, CPU, or JDK version.

Try using System.nanoTime() for your timing, and/or (much better) do
many iterations rather than just one to try to smooth out error and jitter.

http://download.oracle.com/javase/1,5.0/docs/api/java/lang/System.html#nanoTime()

> Anyway, that speed increases is definitely a good thing, but why this
> happens it's a mystery for me. The problem is that my benchmark tests
> are influenced with this, cause the order of the tests can influence the
> outcome of the total time.

You really need to tweak your benchmarks to reflect more of a real-world
workload. Timing tiny statements stand-alone will not give you data with
any relationship to what will happen under real load with many such
operations happening concurrently or thousands of them being executed in
rapid sequence.

--
Craig Ringer

Re: Question about speed: Weird Behavior

From
Israel Ben Guilherme Fonseca
Date:
The actual tests are executed many times, I just tried to make a simpler test to be easier to understand.

I executed this code some times to get the average of the result.

Other point:

Executing this code 10 times give different averages of executing the same code in 10 times loop. So the system cache is actually linked to the same JVM instance? At each separated invocation the values keep somewhat constant, differently from one execution with a loop.

I'll try the "select *" approach in the terminal later.


2011/6/13 Craig Ringer <craig@postnewspapers.com.au>
On 13/06/11 11:14, Israel Ben Guilherme Fonseca wrote:

> At every stop, I print the milliseconds occurred between the begin and
> the end of the operation. The results were basically
>
> 1st  operation: 0.05 miliseconds
> 2nd operation: 0.1 miliseconds

I suspect you're right on the border of the precision the timers you are
using can offer you. It's hard to be sure as you didn't mention you
platform, CPU, or JDK version.

Try using System.nanoTime() for your timing, and/or (much better) do
many iterations rather than just one to try to smooth out error and jitter.

http://download.oracle.com/javase/1,5.0/docs/api/java/lang/System.html#nanoTime()

> Anyway, that speed increases is definitely a good thing, but why this
> happens it's a mystery for me. The problem is that my benchmark tests
> are influenced with this, cause the order of the tests can influence the
> outcome of the total time.

You really need to tweak your benchmarks to reflect more of a real-world
workload. Timing tiny statements stand-alone will not give you data with
any relationship to what will happen under real load with many such
operations happening concurrently or thousands of them being executed in
rapid sequence.

--
Craig Ringer

Re: Question about speed: Weird Behavior

From
Radosław Smogura
Date:
 On Mon, 13 Jun 2011 08:31:41 -0300, Israel Ben Guilherme Fonseca wrote:
> The actual tests are executed many times, I just tried to make a
> simpler test to be easier to understand.
>
> I executed this code some times to get the average of the result.
>
> Other point:
>
> Executing this code 10 times give different averages of executing the
> same code in 10 times loop. So the system cache is actually linked to
> the same JVM instance?
 No system cache where PostgreSQL server resides, if you query database
 for something DB need to read disk, if those data are in cache then it
 will go faster.

 Of course whole image is much more complicated. There may be some
 fluctuations with JIT, as Craig wrote (actually you should take times
 after 1000 of invocations - if it's client JVM , or change compile
 threshold in JVM to be sure if JVM doesn't gives you overhead). There
 may be some GC overhead, you should call System.gc after or before each
 pass to start with similar environment. As well, some if you connect
 through TCP/IP this may randomize your results too.

 Regards,
 Radek

>At each separated invocation the values keep
> somewhat constant, differently from one execution with a loop.
>
> Ill try the "select *" approach in the terminal later.
>
> 2011/6/13 Craig Ringer
>
>> On 13/06/11 11:14, Israel Ben Guilherme Fonseca wrote:
>>
>> > At every stop, I print the milliseconds occurred between the
>> begin and
>> > the end of the operation. The results were basically
>> >
>> > 1st  operation: 0.05 miliseconds
>> > 2nd operation: 0.1 miliseconds
>>
>> I suspect youre right on the border of the precision the timers you
>> are
>> using can offer you. Its hard to be sure as you didnt mention you
>> platform, CPU, or JDK version.
>>
>> Try using System.nanoTime() for your timing, and/or (much better)
>> do
>> many iterations rather than just one to try to smooth out error and
>> jitter.
>>
>>
>
> http://download.oracle.com/javase/1,5.0/docs/api/java/lang/System.html#nanoTime()
>> [1]
>>
>> > Anyway, that speed increases is definitely a good thing, but why
>> this
>> > happens its a mystery for me. The problem is that my benchmark
>> tests
>> > are influenced with this, cause the order of the tests can
>> influence the
>> > outcome of the total time.
>>
>> You really need to tweak your benchmarks to reflect more of a
>> real-world
>> workload. Timing tiny statements stand-alone will not give you data
>> with
>> any relationship to what will happen under real load with many such
>> operations happening concurrently or thousands of them being
>> executed in
>> rapid sequence.
>>
>> --
>> Craig Ringer
>
>
>
> Links:
> ------
> [1]
>
> http://download.oracle.com/javase/1,5.0/docs/api/java/lang/System.html#nanoTime%28%29
> [2] mailto:craig@postnewspapers.com.au


Re: Question about speed: Weird Behavior

From
Craig Ringer
Date:
On 06/13/2011 08:11 PM, Radosław Smogura wrote:

> Of course whole image is much more complicated. There may be some
> fluctuations with JIT, as Craig wrote (actually you should take times
> after 1000 of invocations - if it's client JVM , or change compile
> threshold in JVM to be sure if JVM doesn't gives you overhead). There
> may be some GC overhead, you should call System.gc after or before each
> pass to start with similar environment. As well, some if you connect
> through TCP/IP this may randomize your results too.

Yep. Yet more good reasons to try to benchmark real-world tasks instead
of synthetic micro-benchmarks, which are *REALLY* hard to get right.

Israel Ben Guilherme Fonseca: Is there a certain number of repetitions
after which the numbers stop changing?

--
Craig Ringer

Re: Question about speed: Weird Behavior

From
Maciek Sakrejda
Date:
> There may be some GC
> overhead, you should call System.gc after or before each pass to start with
> similar environment.

Note that the contract of System.gc() is really more like "Hey, I
think it might be fun to run GC right now," than "You'll run GC right
now and you'll like it." In other words, it's technically only a
suggestion, so you should make sure it's actually doing something
before you start depending on it for performance numbers.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: Question about speed: Weird Behavior

From
Israel Ben Guilherme Fonseca
Date:
Thanks for all the answers,

Smogura: "This is system autotuning - a good solution."

Where can I find more about this subject, and how can I weak it? Is it hidden on postresql.conf? Can I configure it to always use an specifc size like 8258 bytes per-package?

I'm doing tests on other drivers of other programming languages to postgresql, and for those, there wasn't any package size differences, only with the Java tests. Maybe the others drivers are setting specific values for this.


2011/6/13 Maciek Sakrejda <msakrejda@truviso.com>
> There may be some GC
> overhead, you should call System.gc after or before each pass to start with
> similar environment.

Note that the contract of System.gc() is really more like "Hey, I
think it might be fun to run GC right now," than "You'll run GC right
now and you'll like it." In other words, it's technically only a
suggestion, so you should make sure it's actually doing something
before you start depending on it for performance numbers.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: Question about speed: Weird Behavior

From
Israel Ben Guilherme Fonseca
Date:
(Craig, this post answer your question: yes with few iterations the values get stable)

Here it is my new test:

http://pastebin.com/efK34FW1

I execute it passing the parameter "select" or "count", they define what statement will be used for "warm up". With this warm up, CPU/Java Initialization/random things, get less chance to affect the main test.

When I use the count, my timings get around of

100 miliseconds.

When I use the select it goes to

30 miliseconds.

I executed those tests many times with this bash script:

#!/bin/bash
for i in {1..20}
do
   eval "java -jar tests.jar $1"
done

The reason for this is probably because the size of the packages of the "count statement" aren't enough to trigger the "tunning" of automatically changing the size of the packets. With the select it does.

My question is simple, can I change this behavior somewhere? Is it a Java (Socket maybe) or Postgresql configuration?

Thanks in advance,

2011/6/13 Israel Ben Guilherme Fonseca <israel.bgf@gmail.com>
Thanks for all the answers,

Smogura: "This is system autotuning - a good solution."

Where can I find more about this subject, and how can I weak it? Is it hidden on postresql.conf? Can I configure it to always use an specifc size like 8258 bytes per-package?

I'm doing tests on other drivers of other programming languages to postgresql, and for those, there wasn't any package size differences, only with the Java tests. Maybe the others drivers are setting specific values for this.



2011/6/13 Maciek Sakrejda <msakrejda@truviso.com>
> There may be some GC
> overhead, you should call System.gc after or before each pass to start with
> similar environment.

Note that the contract of System.gc() is really more like "Hey, I
think it might be fun to run GC right now," than "You'll run GC right
now and you'll like it." In other words, it's technically only a
suggestion, so you should make sure it's actually doing something
before you start depending on it for performance numbers.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com


Re: Question about speed: Weird Behavior

From
Samuel Gendler
Date:
On Mon, Jun 13, 2011 at 6:07 PM, Israel Ben Guilherme Fonseca <israel.bgf@gmail.com> wrote:

When I use the count, my timings get around of

100 miliseconds.

When I use the select it goes to

30 miliseconds.


I don't know much about the internals of the protocol used to move data around for jdbc, but doesn't this actually make some sense?  If the query returns as soon as the first row has arrived, then select * should definitely be faster than select count(*), since count(*) can't return until the entire table has been scanned, whereas select * returns with the first row received.  That, at least, seems like a likely candidate for the difference between them, no?

--sam


Re: Question about speed: Weird Behavior

From
Radosław Smogura
Date:
 On Mon, 13 Jun 2011 14:12:24 -0300, Israel Ben Guilherme Fonseca wrote:
> Thanks for all the answers,
>
> Smogura: "This is system autotuning - a good solution."
>
> Where can I find more about this subject, and how can I weak it? Is
> it
> hidden on postresql.conf? Can I configure it to always use an specifc
> size like 8258 bytes per-package?
>
> Im doing tests on other drivers of other programming languages to
> postgresql, and for those, there wasnt any package size differences,
> only with the Java tests. Maybe the others drivers are setting
> specific values for this.
>
> 2011/6/13 Maciek Sakrejda
>
>>> There may be some GC
>> > overhead, you should call System.gc after or before each pass to
>> start with
>> > similar environment.
>>
>> Note that the contract of System.gc() is really more like "Hey, I
>> think it might be fun to run GC right now," than "Youll run GC
>> right
>> now and youll like it." In other words, its technically only a
>> suggestion, so you should make sure its actually doing something
>> before you start depending on it for performance numbers.
>>
>> ---
>> Maciek Sakrejda | System Architect | Truviso
>>
>> 1065 E. Hillsdale Blvd., Suite 215
>> Foster City, CA 94404
>> (650) 242-3500 Main
>> www.truviso.com [1]
>
>
>
> Links:
> ------
> [1] http://www.truviso.com
> [2] mailto:msakrejda@truviso.com
 Actually there is no way to tune it. On server side on Linux platforms
 kernel will adjust send/recv buffer for you and it, in many situations,
 does it good. On Windows I think this buffer (or send/recv) has fixed
 size. You may try to change some Windows's registry settings, or proc
 TCP/IP to decrease / increase max buffer size.

 On Client side send/recv buffer should be auto tuned too, as Java uses
 system sockets.

 Of course, PGSQL internally has small buffer, as sockets doesn't use
 TCP delay it size is 8k (I think it should be 16k).

 I don't think it's good to manually tune this buffers (actually I done
 two approaches for this), and system tuning is much more better.

 If you are looking for some information about this search "TCP/IP
 tuning" (you will get many how to determine good size of socket
 recv/send buffers), and as well few links for kernel auto tuning.

 Regards,
 Radosław Smogura

Re: Question about speed: Weird Behavior

From
Radosław Smogura
Date:
 On Mon, 13 Jun 2011 21:44:45 -0700, Samuel Gendler wrote:
> On Mon, Jun 13, 2011 at 6:07 PM, Israel Ben Guilherme Fonseca  wrote:
>
>> When I use the count, my timings get around of
>>
>> 100 miliseconds.
>>
>> When I use the select it goes to
>>
>> 30 miliseconds.
>
> I dont know much about the internals of the protocol used to move
> data
> around for jdbc, but doesnt this actually make some sense?  If the
> query returns as soon as the first row has arrived, then select *
> should definitely be faster than select count(*), since count(*) cant
> return until the entire table has been scanned, whereas select *
> returns with the first row received.  That, at least, seems like a
> likely candidate for the difference between them, no?
>
> --sam
>
>
>
> Links:
> ------
> [1] mailto:israel.bgf@gmail.com
 Did you ran your queries in above order or did you changed order of
 queries. Please bear in mind, that if you set fetch size, then cursor
 approch will be used and not all data will be transferred to client
 side.

 Regards,
 Radek

Re: Question about speed: Weird Behavior

From
Israel Ben Guilherme Fonseca
Date:
I followed the code order.

First warm up, and for last the select * from Curso

No fetch size settings. The code is only what I showed.

--

Another thing, I didn't say it, but if you notice my warm up code do a select in a different table (Pessoa), not Curso. It isn't scanning the Curso table, so it won't do any optimization.


I'll give a look in this TCP/IP autotunning.

--

In the psycopg2 postgresql driver (A python driver), the packages are with constant size (around 8000 bytes), with it the time value are totally constant, no matter if I do loops on the queries, If all this is controlled by kernel/postgres side, I keep wondering why this is so different of the java driver.

I think that I'll ask them too,

Thanks,

2011/6/14 Radosław Smogura <rsmogura@softperience.eu>
On Mon, 13 Jun 2011 21:44:45 -0700, Samuel Gendler wrote:
On Mon, Jun 13, 2011 at 6:07 PM, Israel Ben Guilherme Fonseca  wrote:


When I use the count, my timings get around of

100 miliseconds.

When I use the select it goes to

30 miliseconds.

I dont know much about the internals of the protocol used to move data
around for jdbc, but doesnt this actually make some sense?  If the

query returns as soon as the first row has arrived, then select *
should definitely be faster than select count(*), since count(*) cant

return until the entire table has been scanned, whereas select *
returns with the first row received.  That, at least, seems like a
likely candidate for the difference between them, no?

--sam



Links:
------
[1] mailto:israel.bgf@gmail.com
Did you ran your queries in above order or did you changed order of queries. Please bear in mind, that if you set fetch size, then cursor approch will be used and not all data will be transferred to client side.

Regards,
Radek