Thread: Fun little performance IMPROVEMENT...

Fun little performance IMPROVEMENT...

From
grant@amadensor.com
Date:
I was doing a little testing to see how machine load affected the
performance of different types of queries, index range scans, hash joins,
full scans, a mix, etc.

To do this, I isolated different performance hits: spinning only the
CPU, loading the disk to create high I/O wait states, and using most of
the physical memory.   This was on a 4 CPU Xen virtual machine running
PostgreSQL 8.1.22 on CentOS.


Here is the fun part.   When running 8 threads spinning calculating square
roots (using the stress package), the full scan returned consistently 60%
faster than the machine with no load.   It was returning 44,000 out of
5,000,000 rows.   Here is the explain analyze.   I am hoping that this
triggers something (I can run more tests as needed) that can help us make
it always better.

Idling:
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on schedule_details  (cost=0.00..219437.90 rows=81386 width=187) (actual time=0.053..2915.966 rows=44320 loops=1)
   Filter: (schedule_type = '5X'::bpchar)
 Total runtime: 2986.764 ms

Loaded:
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on schedule_details  (cost=0.00..219437.90 rows=81386 width=187) (actual time=0.034..1698.068 rows=44320 loops=1)
   Filter: (schedule_type = '5X'::bpchar)
 Total runtime: 1733.084 ms
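
A quick check of the arithmetic behind the two plans, as a minimal sketch. The two constants are copied from the "Total runtime" lines above; the function name `compare` is only illustrative. Depending on how "faster" is defined, the same pair of timings reads as roughly 42% less elapsed time or a speedup of about 1.7x, so it helps to state which one is meant.

```python
# Timings copied from the two "Total runtime" lines above (milliseconds).
idle_ms = 2986.764
loaded_ms = 1733.084


def compare(baseline_ms, other_ms):
    """Return (fraction of time saved, speedup ratio) of other vs. baseline."""
    time_saved = 1.0 - other_ms / baseline_ms  # share of elapsed time removed
    speedup = baseline_ms / other_ms           # how many times faster
    return time_saved, speedup


time_saved, speedup = compare(idle_ms, loaded_ms)
print(f"time saved: {time_saved:.1%}, speedup: {speedup:.2f}x")
```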






Re: Fun little performance IMPROVEMENT...

From
Andy Colson
Date:
On 1/21/2011 12:12 PM, grant@amadensor.com wrote:
> I was doing a little testing to see how machine load affected the
> performance of different types of queries, index range scans, hash joins,
> full scans, a mix, etc.
>
> In order to do this, I isolated different performance hits, spinning only
> CPU, loading the disk to create high I/O wait states, and using most of
> the physical memory.   This was on a 4 CPU Xen virtual machine running
> 8.1.22 on CENTOS.
>
>
> Here is the fun part.   When running 8 threads spinning calculating square
> roots (using the stress package), the full scan returned consistently 60%
> faster than the machine with no load.   It was returning 44,000 out of
> 5,000,000 rows.   Here is the explain analyze.   I am hoping that this
> triggers something (I can run more tests as needed) that can help us make
> it always better.
>
> Idling:
>                                                          QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on schedule_details  (cost=0.00..219437.90 rows=81386 width=187) (actual time=0.053..2915.966 rows=44320 loops=1)
>    Filter: (schedule_type = '5X'::bpchar)
>  Total runtime: 2986.764 ms
>
> Loaded:
>                                                          QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on schedule_details  (cost=0.00..219437.90 rows=81386 width=187) (actual time=0.034..1698.068 rows=44320 loops=1)
>    Filter: (schedule_type = '5X'::bpchar)
>  Total runtime: 1733.084 ms
>

Odd.  Did'ja by chance run the select more than once... maybe three or
four times, and always get the same (or close) results?

Is the stress package running niced?

-Andy

Re: Fun little performance IMPROVEMENT...

From
Tom Lane
Date:
grant@amadensor.com writes:
> Here is the fun part.   When running 8 threads spinning calculating square
> roots (using the stress package), the full scan returned consistently 60%
> faster than the machine with no load.

Possibly the synchronized-seqscans logic kicking in, resulting in this
guy not having to do all his own I/Os.  It would be difficult to make
any trustworthy conclusions about performance in such cases from a view
of only one process's results --- you'd need to look at the aggregate
behavior to understand what's happening.

            regards, tom lane

Re: Fun little performance IMPROVEMENT...

From
Greg Smith
Date:
grant@amadensor.com wrote:
> This was on a 4 CPU Xen virtual machine running
> 8.1.22 on CENTOS.
>

You're not going to get anyone to spend a minute trying to figure out what's
happening on virtual hardware with an ancient version of PostgreSQL.  If
this was an actual full test case against PostgreSQL 8.4 or later on a
physical machine, it might be possible to draw some conclusions about it
that impact current PostgreSQL development.  Note where 8.1 is on
http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy for
example.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Fun little performance IMPROVEMENT...

From
grant@amadensor.com
Date:
>
> Odd.  Did'ja by chance run the select more than once... maybe three or
> four times, and always get the same (or close) results?
>
> Is the stress package running niced?
>
The stress package is not running niced.  I initially ran each case 5
times, and the results were very consistent.  At first I just sent
everything to files.  Later, when I looked over the output, I was
confused, so I tried again, several times for each case, with very little
deviation, and the run with the CPU stressed was always faster.

The only deviation, which is understandable, was that the first run of
anything after the memory stress (using 7G of the available 8G) was slow
as it swapped back in, so I did a swapoff/swapon to clear up swap, and
still got the same results.
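
The repeat-and-compare procedure described above can be sketched as a small timing harness. This is only an illustration, not the script used in the thread: `time_runs`, `summarize`, and `placeholder_query` are hypothetical names, and the placeholder must be replaced with code that actually runs the query against the database.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Call fn() `repeats` times and return each wall-clock duration in ms."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations_ms):
    """Return mean, sample standard deviation, and relative spread."""
    mean = statistics.mean(durations_ms)
    stdev = statistics.stdev(durations_ms) if len(durations_ms) > 1 else 0.0
    rel_spread = stdev / mean if mean else 0.0
    return {"mean_ms": mean, "stdev_ms": stdev, "rel_spread": rel_spread}


def placeholder_query():
    # Hypothetical stand-in: replace with code that runs the real query.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(summarize(time_runs(placeholder_query, repeats=5)))
```

Running the same harness once on the idle machine and once under CPU load, and comparing the means against the spread, shows whether the difference is larger than the run-to-run noise.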



Re: Fun little performance IMPROVEMENT...

From
grant@amadensor.com
Date:
> grant@amadensor.com writes:
>> Here is the fun part.   When running 8 threads spinning calculating
>> square
>> roots (using the stress package), the full scan returned consistently
>> 60%
>> faster than the machine with no load.
>
> Possibly the synchronized-seqscans logic kicking in, resulting in this
> guy not having to do all his own I/Os.  It would be difficult to make
> any trustworthy conclusions about performance in such cases from a view
> of only one process's results --- you'd need to look at the aggregate
> behavior to understand what's happening.
>
>             regards, tom lane
>
My thought was that it was one of these:

1)  It was preventing some other I/O or memory intensive process from
happening, opening the resources up.
2)  It was keeping the machine busy from the hypervisor's point of view,
preventing it from waiting for a slot on the host machine.
3)  The square roots happen quickly, resulting in more yields, and
therefore more time slices for my process than if the system was in its
idle loop.

Any way you look at it, it is fun and interesting that a load can make
something unrelated happen more quickly.   I will continue to try to find
out why it is the case.



Re: Fun little performance IMPROVEMENT...

From
Scott Carey
Date:

On 1/21/11 12:23 PM, "grant@amadensor.com" <grant@amadensor.com> wrote:

>> grant@amadensor.com writes:
>>> Here is the fun part.   When running 8 threads spinning calculating
>>> square
>>> roots (using the stress package), the full scan returned consistently
>>> 60%
>>> faster than the machine with no load.
>>
>> Possibly the synchronized-seqscans logic kicking in, resulting in this
>> guy not having to do all his own I/Os.  It would be difficult to make
>> any trustworthy conclusions about performance in such cases from a view
>> of only one process's results --- you'd need to look at the aggregate
>> behavior to understand what's happening.
>>
>>             regards, tom lane
>>
>My thought was that it was one of these:
>
>1)  It was preventing some other I/O or memory intensive process from
>happening, opening the resources up.
>2)  It was keeping the machine busy from the hypervisor's point of view,
>preventing it from waiting for a slot on the host machine.

My guess is it's something hypervisor related.   If this happened on direct
hardware I'd be more surprised.  Hypervisors have all sorts of stuff going
on, like throttling the number of CPU cycles a VM gets.  In your idle
case, your VM might effectively occupy 1 GHz of a CPU, but 2 GHz in the
loaded case.

>3)  The square roots happen quickly, resulting in more yields, and
>therefore more time slices for my process than if the system was in its
>idle loop.
>
>Any way you look at it, it is fun and interesting that a load can make
>something unrelated happen more quickly.   I will continue to try to find
>out why it is the case.
>
>
>
>--
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance


Re: Fun little performance IMPROVEMENT...

From
grant@amadensor.com
Date:
>
> Odd.  Did'ja by chance run the select more than once... maybe three or
> four times, and always get the same (or close) results?
>
> Is the stress package running niced?
>
> -Andy
>
I got a little crazy, and upgraded the DB to 8.4.5.   It still reacts the
same.

I am hoping someone has an idea of a metric I can run to see why it is
different.
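
One candidate metric on Linux, sketched under assumptions: the aggregate `cpu` line of /proc/stat lists tick counters for user, nice, system, idle, iowait, irq, softirq, and steal, where steal is time the hypervisor gave to something else. Sampling it before and after the query in both the idle and loaded cases would show whether steal or iowait differs. Whether a given Xen guest kernel reports steal at all is an assumption to verify, and the function names here are hypothetical.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    fields = ("user", "nice", "system", "idle",
              "iowait", "irq", "softirq", "steal")
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(fields)]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(fields) - len(values))
    return dict(zip(fields, values))


def share_between(before, after):
    """Fraction of elapsed CPU ticks spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in before}
    total = sum(deltas.values())
    return {k: (d / total if total else 0.0) for k, d in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read and parse the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

Usage would be `before = read_cpu_line()`, run the query, `after = read_cpu_line()`, then inspect `share_between(before, after)["steal"]` and `["iowait"]`.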


Re: Fun little performance IMPROVEMENT...

From
grant@amadensor.com
Date:
>
> My guess is it's something hypervisor related.   If this happened on direct
> hardware I'd be more surprised.  Hypervisors have all sorts of stuff going
> on, like throttling the number of CPU cycles a VM gets.  In your idle
> case, your VM might effectively occupy 1 GHz of a CPU, but 2 GHz in the
> loaded case.
>
I will be building a new machine this weekend on bare hardware.   It won't
be very big on specs, but this is only 5 million rows, so it should be
fine.   I will try it there.


Re: Fun little performance IMPROVEMENT...

From
Ivan Voras
Date:
On 21/01/2011 19:12, grant@amadensor.com wrote:
> I was doing a little testing to see how machine load affected the
> performance of different types of queries, index range scans, hash joins,
> full scans, a mix, etc.
>
> In order to do this, I isolated different performance hits, spinning only
> CPU, loading the disk to create high I/O wait states, and using most of
> the physical memory.   This was on a 4 CPU Xen virtual machine running
> 8.1.22 on CENTOS.
>
>
> Here is the fun part.   When running 8 threads spinning calculating square
> roots (using the stress package), the full scan returned consistently 60%
> faster than the machine with no load.   It was returning 44,000 out of
> 5,000,000 rows.   Here is the explain analyze.   I am hoping that this
> triggers something (I can run more tests as needed) that can help us make
> it always better.

Looks like a virtualization artifact. Here's a list of some such noticed
artifacts:

http://wiki.freebsd.org/WhyNotBenchmarkUnderVMWare

>
> Idling:
>                                                          QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on schedule_details  (cost=0.00..219437.90 rows=81386 width=187) (actual time=0.053..2915.966 rows=44320 loops=1)
>    Filter: (schedule_type = '5X'::bpchar)
>  Total runtime: 2986.764 ms
>
> Loaded:
>                                                          QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on schedule_details  (cost=0.00..219437.90 rows=81386 width=187) (actual time=0.034..1698.068 rows=44320 loops=1)
>    Filter: (schedule_type = '5X'::bpchar)
>  Total runtime: 1733.084 ms

In this case it looks like the IO generated by the VM is causing the
Hypervisor to frequently "sleep" the machine while waiting for the IO,
but if the machine is also generating CPU load, it is not put to sleep
as often.
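
One rough way to probe this idea, as a sketch rather than a proof: measure how much longer short sleeps take than requested, once on the idle VM and once with the CPU load running. If the guest is slow to be rescheduled after it goes idle, the overshoot should be visibly larger in the idle case. The function name and the default values are hypothetical, and a sleep is only a stand-in for a process waiting on I/O.

```python
import time


def wakeup_overshoot_ms(sleep_ms=1.0, samples=200):
    """Sleep repeatedly and record how much longer each sleep took than asked."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(sleep_ms / 1000.0)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        overshoots.append(elapsed_ms - sleep_ms)
    return overshoots


if __name__ == "__main__":
    results = sorted(wakeup_overshoot_ms())
    print(f"median overshoot: {results[len(results) // 2]:.3f} ms")
    print(f"worst overshoot:  {results[-1]:.3f} ms")
```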