Thread: PostgreSQL and Xeon MP

PostgreSQL and Xeon MP

From
"Guillaume Smet"
Date:
Hello,

We have been experiencing performance problems with a quad Xeon MP and
PostgreSQL 7.4 for a year now. Our context switch rate is not very high,
but the server load stays stuck at 4 even under very heavy load, and
we still have 60% CPU idle in that case. Our database fits in RAM and
we don't have any I/O problems. I saw this post from Tom Lane
http://archives.postgresql.org/pgsql-performance/2004-04/msg00249.php
and several other references to problems with Xeon MP, and I suspect our
problems are related to this.
We tried putting our production load on a standard dual Xeon on Monday,
and it performs far better with the same configuration parameters.

I know that Tom has done work on multiprocessor support for
PostgreSQL 8.1, but I couldn't find any information on whether it
solves the problem with Xeon MP.

My question is: should we expect switching to 8.1 to resolve our
problem, or will we still have problems and need to consider a
hardware change? We will try to upgrade next Tuesday, so we will have
the real answer soon, but if anyone has any experience or information
on this, it would be very welcome.

Thanks for your help.

--
Guillaume

Re: PostgreSQL and Xeon MP

From
Richard Huxton
Date:
Guillaume Smet wrote:
> Hello,
>
> We are experiencing performances problem with a quad Xeon MP and
> PostgreSQL 7.4 for a year now.

I had a similar issue with a client the other week.

> Our context switch rate is not so high
> but the load of the server is blocked to 4 even on very high load and
> we have 60% cpu idle even in this case. Our database fits in RAM and
> we don't have any IO problem.

Actually, I think that's part of the problem - it's the memory bandwidth.

> I saw this post from Tom Lane
> http://archives.postgresql.org/pgsql-performance/2004-04/msg00249.php
> and several other references to problem with Xeon MP and I suspect our
> problems are related to this.

You should be seeing context-switching jump dramatically if it's the
"classic" multi-Xeon problem. There's a point at which it seems to just
escalate without a corresponding jump in activity.

> We tried to put our production load on a dual standard Xeon on monday
> and it performs far better with the same configuration parameters.
>
> I know that work has been done by Tom for PostgreSQL 8.1 on
> multiprocessor support but I didn't find any information on if it
> solves the problem with Xeon MP or not.

I checked with Tom last week. Thread starts below:
   http://archives.postgresql.org/pgsql-hackers/2006-02/msg01118.php

He's of the opinion that 8.1.3 will be an improvement.

> My question is should we expect a resolution of our problem by
> switching to 8.1 or will we still have problems and should we consider
> a hardware change? We will try to upgrade next tuesday so we will have
> the real answer soon but if anyone has any experience or information
> on this, he will be very welcome.

--
   Richard Huxton
   Archonet Ltd

Re: PostgreSQL and Xeon MP

From
"Guillaume Smet"
Date:
Richard,

> You should be seeing context-switching jump dramatically if it's the
> "classic" multi-Xeon problem. There's a point at which it seems to just
> escalate without a corresponding jump in activity.

No, we don't see very high context switching in our case, even when
the database is very slow. By very slow, I mean that pages which load
in a few seconds in the normal case (load between 3 and 4) take
several minutes (up to 5-10 minutes) to be generated in the worst case
(load at 4 but really bad performance).
If I look at our CPU load graph, over one year the load was never
higher than 5, even in the worst cases...

> I checked with Tom last week. Thread starts below:
>    http://archives.postgresql.org/pgsql-hackers/2006-02/msg01118.php
>
> He's of the opinion that 8.1.3 will be an improvement.

Thanks for pointing me to this thread. I searched in -performance, not
in -hackers, as the original thread was in -performance. We have planned
a migration to 8.1.3, so we'll see what happens with this version.

Do you plan to test it before next Tuesday? If so, I'm interested in
your results. I'll post our results here as soon as we complete the
upgrade.

--
Guillaume

Re: PostgreSQL and Xeon MP

From
Sven Geisler
Date:
Hi Guillaume,

I had a similar issue last summer. Could you please provide details
about your XEON MP server and some statistics (context-switches/load/CPU
usage)?

I tried different servers (x86) with different results. I saw a
difference between XEON MP with and without EM64T. Memory bandwidth
also makes a difference.

What version of XEON MP does your server have?
Which type of RAM does your server have?
Do you use Hyper-Threading?

Could you also provide details of the XEON DP?

Regards
Sven.

--
Sven Geisler <sgeisler@aeccom.com> Tel +49.30.5362.1627 Fax .1638
Senior Developer,    AEC/communications GmbH    Berlin,   Germany

Re: PostgreSQL and Xeon MP

From
Richard Huxton
Date:
Guillaume Smet wrote:
> Richard,
>
>> You should be seeing context-switching jump dramatically if it's the
>> "classic" multi-Xeon problem. There's a point at which it seems to just
>> escalate without a corresponding jump in activity.
>
> No we don't have this problem of very high context switching in our
> case even when the database is very slow. When I mean very slow, we
> have pages which loads in a few seconds in the normal case (load
> between 3 and 4) which takes several minutes (up to 5-10 minutes) to
> be generated in the worst case (load at 4 but really bad
> performances).

Very strange.

> If I take a look on our cpu load graph, in one year, the cpu load was
> never higher than 5 even in the worst cases...
>
>> I checked with Tom last week. Thread starts below:
>>    http://archives.postgresql.org/pgsql-hackers/2006-02/msg01118.php
>>
>> He's of the opinion that 8.1.3 will be an improvement.
>
> Thanks for pointing me this thread, I searched in -performance not in
> -hackers as the original thread was in -performance. We planned a
> migration to 8.1.3 so we'll see what happen with this version.
>
> Do you plan to test it before next tuesday? If so, I'm interested in
> your results. I'll post our results here as soon as we complete the
> upgrade.

The client has just bought an Opteron to run on, I'm afraid. I might try
8.1 on the Xeon but it'll just be to see what happens and that won't be
for a while.

--
   Richard Huxton
   Archonet Ltd

Re: PostgreSQL and Xeon MP

From
"Guillaume Smet"
Date:
On 3/16/06, Richard Huxton <dev@archonet.com> wrote:
> Very strange.

Sure. I can't find any logical explanation for it, but this is the
behaviour we have had for more than a year now (the site was migrated
from Oracle to PostgreSQL in January 2005).
We have checked iostat, vmstat and so on without finding any hint as to
why we see this behaviour.

> The client has just bought an Opteron to run on, I'm afraid. I might try
> 8.1 on the Xeon but it'll just be to see what happens and that won't be
> for a while.

I don't think it will be an option for us so I will have more
information next week.

Re: PostgreSQL and Xeon MP

From
"Guillaume Smet"
Date:
Sven,

On 3/16/06, Sven Geisler <sgeisler@aeccom.com> wrote:
> What version of XEON MP does your server have?

The server is a Dell 6650 from the end of 2004 with 4 Xeon MP 2.2 GHz
and 2MB cache per processor.

Here is the information from Dell:
4x PROCESSOR, 80532, 2.2GHZ, 2MB cache, 400Mhz, SOCKET F
8x DUAL IN-LINE MEMORY MODULE, 512MB, 266MHz

> Do you use Hyperthreading?

No, we don't use it.

> You should provide details from the XEON DP?

The only problem is that the Xeon DP is installed with a 2.6 kernel
and PostgreSQL 8.1.3 (it is used to test the migration from 7.4 to
8.1.3), so it's very difficult to really compare the two behaviours.

It's a Dell 2850 with:
2 x PROCESSOR, 80546K, 2.8G, 1MB cache, XEON NOCONA, 800MHz
4 x DUAL IN-LINE MEMORY MODULE, 1GB, 400MHz

This server is obviously newer than the other one.

--
Guillaume

Re: PostgreSQL and Xeon MP

From
"Guillaume Smet"
Date:
On 3/16/06, Sven Geisler <sgeisler@aeccom.com> wrote:
> Hi Guillaume,
>
> I had a similar issue last summer. Could you please provide details
> about your XEON MP server and some statistics (context-switches/load/CPU
> usage)?

I forgot the statistics:
CPU load is usually between 1 and 4.
CPU usage is usually below 40% for each processor; sometimes, when the
server completely hangs, it grows to 60%.

Here is top output from the server right now:
 15:21:17  up 138 days, 13:25,  1 user,  load average: 1.29, 1.25, 1.38
82 processes: 81 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   25.7%    0.0%    3.9%   0.0%     0.3%    0.1%   69.7%
           cpu00   29.3%    0.0%    4.7%   0.1%     0.5%    0.0%   65.0%
           cpu01   20.7%    0.0%    1.9%   0.0%     0.3%    0.0%   76.8%
           cpu02   25.5%    0.0%    5.5%   0.0%     0.1%    0.3%   68.2%
           cpu03   27.3%    0.0%    3.3%   0.0%     0.1%    0.1%   68.8%
Mem:  3857224k av, 3298580k used,  558644k free,       0k shrd,  105172k buff
                   2160124k actv,  701304k in_d,   56400k in_c
Swap: 4281272k av,    6488k used, 4274784k free                 2839348k cached

We currently have between 3000 and 13000 context switches/s, with an
average of about 5000, judging by eye.

Here is top output I captured on November 17 when the server was
completely hung (several minutes for each page of the website); it is
typical of this server's behaviour:
17:08:41  up 19 days, 15:16,  1 user,  load average: 4.03, 4.26, 4.36
288 processes: 285 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   59.0%    0.0%    8.8%   0.2%     0.0%    0.0%   31.9%
           cpu00   52.3%    0.0%   13.3%   0.9%     0.0%    0.0%   33.3%
           cpu01   65.7%    0.0%    7.6%   0.0%     0.0%    0.0%   26.6%
           cpu02   58.0%    0.0%    7.6%   0.0%     0.0%    0.0%   34.2%
           cpu03   60.0%    0.0%    6.6%   0.0%     0.0%    0.0%   33.3%
Mem:  3857224k av, 3495880k used,  361344k free,       0k shrd,   92160k buff
                   2374048k actv,  463576k in_d,   37708k in_c
Swap: 4281272k av,   25412k used, 4255860k free                 2173392k cached

As you can see, the load is stuck at 4, with no iowait and about 30%
CPU idle.

vmstat showed 5000 context switches/s on average, so we had no context
switch storm.
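
For anyone wanting to back the "no context switch storm" reading with
numbers rather than by eye, here is a minimal sketch in Python. It
assumes the usual vmstat column layout with a "cs" header; the function
names and the sample figures are made up for illustration and are not
output from this server. The first vmstat report is skipped because it
gives averages since boot.

```python
def context_switch_rates(vmstat_text, skip_first=True):
    """Return the values of the 'cs' column from captured vmstat output."""
    cs_index = None
    rates = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "cs" in fields:
            # Column header line: remember where the cs column sits.
            cs_index = fields.index("cs")
            continue
        if cs_index is None or len(fields) <= cs_index:
            continue  # banner line such as "procs ---memory--- ..."
        if not fields[cs_index].isdigit():
            continue
        rates.append(int(fields[cs_index]))
    # The first report is an average since boot, not a live sample.
    return rates[1:] if skip_first else rates


def summarize(rates):
    """Return min, max and mean of the samples, or None if there are none."""
    if not rates:
        return None
    return {"min": min(rates), "max": max(rates), "mean": sum(rates) / len(rates)}


# Invented sample, only to show the expected input shape.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0   6488 558644 105172 2839348    0    0     5    40  300  1200 20  3 77  0
 1  0   6488 558600 105172 2839348    0    0     0    12  410  3000 25  4 71  0
 3  0   6488 558590 105172 2839348    0    0     0    20  520  5000 27  4 69  0
 4  0   6488 558500 105172 2839348    0    0     0    16  610 13000 30  5 65  0
"""

print(summarize(context_switch_rates(SAMPLE)))
```

To use it on real data, something like "vmstat 1 60 > vmstat.log"
followed by summarize(context_switch_rates(open("vmstat.log").read()))
should do.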

Re: PostgreSQL and Xeon MP

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> Here is a top output I had on november 17 when the server completely
> hangs (several minutes for each page of the website) and it is typical
> of this server behaviour:
> 17:08:41  up 19 days, 15:16,  1 user,  load average: 4.03, 4.26, 4.36
> 288 processes: 285 sleeping, 3 running, 0 zombie, 0 stopped
> CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
>            total   59.0%    0.0%    8.8%   0.2%     0.0%    0.0%   31.9%
>            cpu00   52.3%    0.0%   13.3%   0.9%     0.0%    0.0%   33.3%
>            cpu01   65.7%    0.0%    7.6%   0.0%     0.0%    0.0%   26.6%
>            cpu02   58.0%    0.0%    7.6%   0.0%     0.0%    0.0%   34.2%
>            cpu03   60.0%    0.0%    6.6%   0.0%     0.0%    0.0%   33.3%
> Mem:  3857224k av, 3495880k used,  361344k free,       0k shrd,   92160k buff
>                    2374048k actv,  463576k in_d,   37708k in_c
> Swap: 4281272k av,   25412k used, 4255860k free                 2173392k cached

> As you can see, load is blocked to 4, no iowait and cpu idle of 30%.

Can you try strace'ing some of the backend processes while the system is
behaving like this?  I suspect what you'll find is a whole lot of
delaying select() calls due to high contention for spinlocks ...

            regards, tom lane

Re: PostgreSQL and Xeon MP

From
Sven Geisler
Date:
Hi Guillaume,

Guillaume Smet wrote:
>
> The server is a dell 6650 from end of 2004 with 4 xeon mp 2.2 and 2MB
> cache per proc.
>
> Here are the information from Dell:
> 4x PROCESSOR, 80532, 2.2GHZ, 2MB cache, 400Mhz, SOCKET F
> 8x DUAL IN-LINE MEMORY MODULE, 512MB, 266MHz
>
....
>
>> You should provide details from the XEON DP?
>
> The only problem is that the Xeon DP is installed with a 2.6 kernel
> and a postgresql 8.1.3 (it is used to test the migration from 7.4 to
> 8.1.3). So it's very difficult to really compare the two behaviours.
>
> It's a Dell 2850 with:
> 2 x PROCESSOR, 80546K, 2.8G, 1MB cache, XEON NOCONA, 800MHz
> 4 x DUAL IN-LINE MEMORY MODULE, 1GB, 400MHz
>

Did you compare 7.4 on a 4-way with 8.1 on a 2-way?
How many queries and clients did you use to test the performance?
How much faster is the XEON DP?

I think you can expect your XEON DP to be faster on a single query
because its CPU and RAM are faster. The overall performance can also be
better on your XEON DP if you only have a few clients.

I guess the newer hardware and the newer PostgreSQL version are the
cause of the better performance.

Regards
Sven.

Re: PostgreSQL and Xeon MP

From
"Guillaume Smet"
Date:
On 3/16/06, Sven Geisler <sgeisler@aeccom.com> wrote:
> Did you compare 7.4 on a 4-way with 8.1 on a 2-way?

I know there are too many parameters changing between the two servers
but I can't really change anything before tuesday. On tuesday, we will
be able to compare both servers with the same software.

> How many queries and clients did you use to test the performance?

Googlebot is indexing this site, generating 2-3 Mbit/s of traffic, so
we use Googlebot to stress this server. There were a lot of clients
and a lot of queries.

> How much faster is the XEON DP?

Well, under high load, PostgreSQL scales well on the DP (load at 40,
queries slower but still performing well) and is awfully slow on the
MP box.

Re: PostgreSQL and Xeon MP

From
"Guillaume Smet"
Date:
On 3/16/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you try strace'ing some of the backend processes while the system is
> behaving like this?  I suspect what you'll find is a whole lot of
> delaying select() calls due to high contention for spinlocks ...

Tom,

I think we can try to do it.

You mean strace -p pid with the pid of some of the postgres processes,
not of the postmaster itself, don't you? Do we need other options?
Which pattern should we expect? I'm not really familiar with strace
and its output.

Thanks for your help.

Re: PostgreSQL and Xeon MP

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> You mean strace -p pid with pid on some of the postgres process not on
> the postmaster itself, does you?

Right, pick a couple that are accumulating CPU time.

> Do we need other options?

strace will generate a *whole lot* of output to stderr.  I usually do
something like
    strace -p pid 2>outfile
and then control-C it after a few seconds.

> Which pattern should we expect?

What we want to find out is if there's a lot of select()s and/or
semop()s shown in the result.  Ideally there wouldn't be any, but
I fear that's not what you'll find.

            regards, tom lane
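
A minimal sketch of how the captured file could be summarized, written
in Python. It assumes plain "strace -p pid 2>outfile" output, where each
line starts with the syscall name (no -t or -f prefixes); the function
names and the sample lines are invented for illustration and are not
taken from the server discussed here.

```python
import re
from collections import Counter

# Syscall name at the start of a plain strace line, e.g. "select(0, NULL, ...".
SYSCALL_RE = re.compile(r"^([a-z_][a-z0-9_]*)\(")


def count_syscalls(lines):
    """Count how often each syscall appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def contention_share(counts, names=("select", "semop")):
    """Fraction of all traced calls that are select() or semop()."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[name] for name in names) / total


# Invented sample, only to show the expected input shape.
SAMPLE = """\
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
recv(7, "Q", 8192, 0) = 1
semop(32769, 0xbfffe3a0, 1) = 0
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
send(7, "T", 120, 0) = 1
_llseek(12, 0, [8192], SEEK_SET) = 0
"""

counts = count_syscalls(SAMPLE.splitlines())
print(counts.most_common())
print("select/semop share: %.0f%%" % (100 * contention_share(counts)))
```

On a real capture, count_syscalls(open("outfile")) gives the per-call
totals; a large select/semop share would match the spinlock contention
suspected above, while a share near zero would point elsewhere.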

Re: PostgreSQL and Xeon MP

From
Sven Geisler
Date:
Hi Guillaume,

Guillaume Smet wrote:
>> How much faster is the XEON DP?
>
> Well, on high load, PostgreSQL scales well on the DP (load at 40,
> queries slower but still performing well) and is awfully slow on the
> MP box.

I know what you mean by awfully slow.
I think your application is facing contention. The contention grows
with the number of CPUs you have. PostgreSQL 8.1 addresses contention on
multiprocessor servers, as you mentioned before.

I guess you will see that your 4-way XEON MP isn't that bad if you
compare both servers with the same PostgreSQL version.

Regards
Sven.

Re: PostgreSQL and Xeon MP

From
"Guillaume Smet"
Date:
On 3/16/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What we want to find out is if there's a lot of select()s and/or
> semop()s shown in the result.  Ideally there wouldn't be any, but
> I fear that's not what you'll find.

OK, I'll try to do it on Monday before our upgrade, then see what
happens with PostgreSQL 8.1.3.

Thanks for your help.

Re: PostgreSQL and Xeon MP

From
Kenneth Marshall
Date:
On Thu, Mar 16, 2006 at 11:45:12AM +0100, Guillaume Smet wrote:
> Hello,
>
> We are experiencing performances problem with a quad Xeon MP and
> PostgreSQL 7.4 for a year now. Our context switch rate is not so high
> but the load of the server is blocked to 4 even on very high load and
> we have 60% cpu idle even in this case. Our database fits in RAM and
> we don't have any IO problem. I saw this post from Tom Lane
> http://archives.postgresql.org/pgsql-performance/2004-04/msg00249.php
> and several other references to problem with Xeon MP and I suspect our
> problems are related to this.
> We tried to put our production load on a dual standard Xeon on monday
> and it performs far better with the same configuration parameters.
>
> I know that work has been done by Tom for PostgreSQL 8.1 on
> multiprocessor support but I didn't find any information on if it
> solves the problem with Xeon MP or not.
>
> My question is should we expect a resolution of our problem by
> switching to 8.1 or will we still have problems and should we consider
> a hardware change? We will try to upgrade next tuesday so we will have
> the real answer soon but if anyone has any experience or information
> on this, he will be very welcome.
>
> Thanks for your help.
>
> --
> Guillaume
>

Guillaume,

We had a similar problem with poor performance on a Xeon DP and
PostgreSQL 7.4.x. 8.0 came out in time for preliminary testing but
it did not solve the problem and our production systems went live
using a different database product. We are currently testing against
8.1.x and the seemingly bizarre lack of performance is gone. I would
suspect that a quad-processor box would have the same issue. I would
definitely recommend giving 8.1 a try.

Ken

Re: PostgreSQL and Xeon MP

From
"Guillaume Smet"
Date:
On 3/16/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you try strace'ing some of the backend processes while the system is
> behaving like this?  I suspect what you'll find is a whole lot of
> delaying select() calls due to high contention for spinlocks ...

As announced, we migrated our production server from 7.4.8 to
8.1.3 this morning. We did some strace'ing before the migration and
you were right about the select() calls. We had a lot of them even when
the database was not highly loaded (about one every 3-4 lines of strace
output).

After the upgrade, we see the expected behaviour, with more linear
scalability and a growing CPU load when the database is highly loaded
(and no CPU idle anymore in this case). We have fewer context switches
too.

8.1.3 is definitely far better on a quad Xeon MP, and I recommend the
upgrade to everyone having this sort of problem.

Tom, thanks for your great work on this problem.

--
Guillaume