Thread: zLinux Load Testing Experience

From
Andrew Hastie
Date:
I'm currently working on a project porting an application from RedHat
Linux on Intel onto IBM zLinux. Our application requires PostgreSQL at
version 9.n, so the PostgreSQL binaries have been built using the
standard build tools from source. Everything appears to run correctly.
However as part of performance testing, our IBM and Linux SysProgs have
been "poking around" using strace and have reported the following (which
they think is an error condition) when hooking up to the postmaster
processes:-

read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0 (Timeout)
read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0 (Timeout)
... repeated many times

From researching the archives, I "believe" the above to be "as
designed" and that it simply indicates the Postmaster is attempting to
read data from an IP socket which is timing out. Could I ask :-
   1. Is this "normal"?
   2. If abnormal, any pointers as to where to start investigating?

The reason they latched onto the postmaster process was due to a
perceived high CPU utilisation. For info, we are load testing with 100
connections being accessed from an IBM WebSphere hosted EJB based
application.

Many thanks,
Andrew


Re: zLinux Load Testing Experience

From
Jeff Janes
Date:
On Tue, Apr 30, 2013 at 8:28 AM, Andrew Hastie <andrew@ahastie.net> wrote:
> I'm currently working on a project porting an application from RedHat Linux on Intel onto IBM zLinux. Our application requires PostgreSQL at version 9.n, so the PostgreSQL binaries have been built using the standard build tools from source. Everything appears to run correctly. However as part of performance testing, our IBM and Linux SysProgs have been "poking around" using strace and have reported the following (which they think is an error condition) when hooking up to the postmaster processes:-
>
> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0 (Timeout)
> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0 (Timeout)
> ... repeated many times



That does not look like the postmaster process.  It looks like probably the background writer process.

It is normal, and doesn't explain high CPU utilization.


Cheers,

Jeff
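
The read/poll pattern in the trace can be reproduced outside PostgreSQL. What follows is a minimal Python sketch, not PostgreSQL's own code. It assumes the traced process is idling on a non-blocking pipe, which is one plausible reading of the trace; the function name idle_wait and the freshly created pipe are illustrative assumptions. It relies on select.poll, so it is meant for Linux or another Unix.

```python
import errno
import os
import select


def idle_wait(timeout_ms):
    """Read an empty non-blocking pipe, then poll it until the timeout expires.

    Returns (got_eagain, ready_fds), mirroring one read()/poll() pair
    from the strace output.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    try:
        try:
            os.read(read_fd, 16)            # nothing has been written yet
            got_eagain = False
        except BlockingIOError as exc:      # strace reports this as -1 EAGAIN
            got_eagain = exc.errno == errno.EAGAIN
        poller = select.poll()
        poller.register(read_fd, select.POLLIN)
        ready_fds = poller.poll(timeout_ms)  # an empty list means a timeout
        return got_eagain, ready_fds
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

On Linux, idle_wait(200) should return (True, []): an EAGAIN from the read followed by a poll timeout. Neither result indicates a fault, and a process sleeping in poll() consumes no CPU while it waits, which is consistent with the point that this trace does not explain high CPU utilisation.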

Re: zLinux Load Testing Experience

From
Merlin Moncure
Date:
On Tue, Apr 30, 2013 at 12:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Apr 30, 2013 at 8:28 AM, Andrew Hastie <andrew@ahastie.net> wrote:
>>
>> I'm currently working on a project porting an application from RedHat
>> Linux on Intel onto IBM zLinux. Our application requires PostgreSQL at
>> version 9.n, so the PostgreSQL binaries have been built using the standard
>> build tools from source. Everything appears to run correctly. However as part
>> of performance testing, our IBM and Linux SysProgs have been "poking around"
>> using strace and have reported the following (which they think is an error
>> condition) when hooking up to the postmaster processes:-
>>
>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0 (Timeout)
>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0
>> (Timeout)
>> ... repeated many times
>>
>
>
> That does not look like the postmaster process.  It looks like probably the
> background writer process.
>
> It is normal, and doesn't explain high CPU utilization.

yeah: we're probably a couple of steps in front of deep system
profiling.   Helpful things to provide to help diagnose would be:

*) 'explain analyze' of the queries that are eating cpu
*) more details about the hardware -- how many cpu, etc.
*) better definition of 'perceived high CPU utilisation'
*) some correlating performance tests, especially cpu bound pgbench
tests (pgbench -S)

merlin
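
One way to give "perceived high CPU utilisation" a concrete definition is to compute it from two samples of the aggregate cpu line in /proc/stat. The sketch below is an illustration only: the function name is made up, and it assumes the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal). Steal time is reported separately because, on a virtualised guest, it is time the hypervisor spent running something else.

```python
def cpu_percentages(before, after):
    """Return busy/idle/steal percentages between two '/proc/stat' cpu lines."""

    def counters(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        values = [int(v) for v in parts[1:9]]
        # Older kernels report fewer fields; treat the missing ones as zero.
        return values + [0] * (8 - len(values))

    delta = [b - a for a, b in zip(counters(before), counters(after))]
    total = sum(delta)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    idle = delta[3] + delta[4]       # idle + iowait
    steal = delta[7]
    busy = total - idle - steal      # user, nice, system, irq, softirq
    return {
        "busy": 100.0 * busy / total,
        "idle": 100.0 * idle / total,
        "steal": 100.0 * steal / total,
    }
```

To use it, read the first line of /proc/stat twice, a few seconds apart, while the load test is running, and pass both strings in. Reporting the three figures together makes it easier to tell database work apart from time lost to the hypervisor.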


Re: zLinux Load Testing Experience

From
Andrew Hastie
Date:
On 30/04/13 20:46, Merlin Moncure wrote:
> On Tue, Apr 30, 2013 at 12:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Tue, Apr 30, 2013 at 8:28 AM, Andrew Hastie <andrew@ahastie.net> wrote:
>>> I'm currently working on a project porting an application from RedHat
>>> Linux on Intel onto IBM zLinux. Our application requires PostgreSQL at
>>> version 9.n, so the PostgreSQL binaries have been built using the standard
>>> build tools from source. Everything appears to run correctly. However as part
>>> of performance testing, our IBM and Linux SysProgs have been "poking around"
>>> using strace and have reported the following (which they think is an error
>>> condition) when hooking up to the postmaster processes:-
>>>
>>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
>>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0 (Timeout)
>>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
>>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0
>>> (Timeout)
>>> ... repeated many times
>>>
>>
>> That does not look like the postmaster process.  It looks like probably the
>> background writer process.
>>
>> It is normal, and doesn't explain high CPU utilization.
> yeah: we're probably a couple of steps in front of deep system
> profiling.   Helpful things to provide to help diagnose would be:
>
> *) 'explain analyze' of the queries that are eating cpu
> *) more details about the hardware -- how many cpu, etc.
> *) better definition of 'perceived high CPU utilisation'
> *) some correlating performance tests, especially cpu bound pgbench
> tests (pgbench -S)
>
> merlin
>
>
I'm not sure how much experience the community has on tuning PostgreSQL
running on RedHat which in turn is hosted on an IBM mainframe under VM
(using zLinux). So I'm happy to start posting further details and
benchmark results and see where we go. Should I be moving this thread
over into the pg-performance list, or is pg-general the right place?


Re: zLinux Load Testing Experience

From
Merlin Moncure
Date:
On Wed, May 1, 2013 at 8:01 AM, Andrew Hastie <andrew@ahastie.net> wrote:
>
> On 30/04/13 20:46, Merlin Moncure wrote:
>>
>> On Tue, Apr 30, 2013 at 12:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>>
>>> On Tue, Apr 30, 2013 at 8:28 AM, Andrew Hastie <andrew@ahastie.net>
>>> wrote:
>>>>
>>>> I'm currently working on a project porting an application from RedHat
>>>> Linux on Intel onto IBM zLinux. Our application requires PostgreSQL at
>>>> version 9.n, so the PostgreSQL binaries have been built using the
>>>> standard
>>>> build tools from source. Everything appears to run correctly. However as
>>>> part
>>>> of performance testing, our IBM and Linux SysProgs have been "poking
>>>> around"
>>>> using strace and have reported the following (which they think is an
>>>> error
>>>> condition) when hooking up to the postmaster processes:-
>>>>
>>>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily
>>>> unavailable)
>>>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0
>>>> (Timeout)
>>>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily
>>>> unavailable)
>>>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0
>>>> (Timeout)
>>>> ... repeated many times
>>>>
>>>
>>> That does not look like the postmaster process.  It looks like probably
>>> the
>>> background writer process.
>>>
>>> It is normal, and doesn't explain high CPU utilization.
>>
>> yeah: we're probably a couple of steps in front of deep system
>> profiling.   Helpful things to provide to help diagnose would be:
>>
>> *) 'explain analyze' of the queries that are eating cpu
>> *) more details about the hardware -- how many cpu, etc.
>> *) better definition of 'perceived high CPU utilisation'
>> *) some correlating performance tests, especially cpu bound pgbench
>> tests (pgbench -S)
>>
>> merlin
>>
>>
> I'm not sure how much experience the community has on tuning PostgreSQL
> running on RedHat which in turn is hosted on an IBM mainframe under VM
> (using zLinux). So I'm happy to start posting further details and benchmark
> results and see where we go. Should I be moving this thread over into the
> pg-performance list, or is pg-general the right place?

certainly performance.   and yes, zLinux is less well traveled.  Did
you compile postgres from source?  Did you confirm that there is a
native spinlocks implementation and it is being used?

merlin


Re: zLinux Load Testing Experience

From
Andrew Hastie
Date:

On 01/05/13 15:34, Merlin Moncure wrote:
> On Wed, May 1, 2013 at 8:01 AM, Andrew Hastie <andrew@ahastie.net> wrote:
>> On 30/04/13 20:46, Merlin Moncure wrote:
>>> On Tue, Apr 30, 2013 at 12:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>>> On Tue, Apr 30, 2013 at 8:28 AM, Andrew Hastie <andrew@ahastie.net>
>>>> wrote:
>>>>> I'm currently working on a project porting an application from RedHat
>>>>> Linux on Intel onto IBM zLinux. Our application requires PostgreSQL at
>>>>> version 9.n, so the PostgreSQL binaries have been built using the
>>>>> standard
>>>>> build tools from source. Everything appears to run correctly. However as
>>>>> part
>>>>> of performance testing, our IBM and Linux SysProgs have been "poking
>>>>> around"
>>>>> using strace and have reported the following (which they think is an
>>>>> error
>>>>> condition) when hooking up to the postmaster processes:-
>>>>>
>>>>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily
>>>>> unavailable)
>>>>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0
>>>>> (Timeout)
>>>>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily
>>>>> unavailable)
>>>>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0
>>>>> (Timeout)
>>>>> ... repeated many times
>>>>>
>>>> That does not look like the postmaster process.  It looks like probably
>>>> the
>>>> background writer process.
>>>>
>>>> It is normal, and doesn't explain high CPU utilization.
>>> yeah: we're probably a couple of steps in front of deep system
>>> profiling.   Helpful things to provide to help diagnose would be:
>>>
>>> *) 'explain analyze' of the queries that are eating cpu
>>> *) more details about the hardware -- how many cpu, etc.
>>> *) better definition of 'perceived high CPU utilisation'
>>> *) some correlating performance tests, especially cpu bound pgbench
>>> tests (pgbench -S)
>>>
>>> merlin
>>>
>>>
>> I'm not sure how much experience the community has on tuning PostgreSQL
>> running on RedHat which in turn is hosted on an IBM mainframe under VM
>> (using zLinux). So I'm happy to start posting further details and benchmark
>> results and see where we go. Should I be moving this thread over into the
>> pg-performance list, or is pg-general the right place?
> certainly performance.   and yes, zLinux is less well traveled.  Did
> you compile postgres from source?  Did you confirm that there is a
> native spinlocks implementation and it is being used?
>
> merlin

Did you compile postgres from source? - Yes (I need PG v9.n as v8.n shipped with RedHat Ent6 does not have several v9 specific features we need).
Did you confirm that there is a native spinlocks implementation and it is being used? - I believe so as no errors or warnings logged during the build. Is there a simple way to check whether spin-locks are running native?

I've started looking at several articles covering pgbench and running some initial tests, so I plan to start a new thread on pg-performance in the next day or so.

Thanks for the advice so far  - Appreciated :-)

Andrew





Re: zLinux Load Testing Experience

From
Merlin Moncure
Date:
On Wed, May 1, 2013 at 11:34 AM, Andrew Hastie <andrew@ahastie.net> wrote:
>
> On 01/05/13 15:34, Merlin Moncure wrote:
>
> On Wed, May 1, 2013 at 8:01 AM, Andrew Hastie <andrew@ahastie.net> wrote:
>
> On 30/04/13 20:46, Merlin Moncure wrote:
>
> On Tue, Apr 30, 2013 at 12:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> On Tue, Apr 30, 2013 at 8:28 AM, Andrew Hastie <andrew@ahastie.net>
> wrote:
>
> I'm currently working on a project porting an application from RedHat
> Linux on Intel onto IBM zLinux. Our application requires PostgreSQL at
> version 9.n, so the PostgreSQL binaries have been built using the
> standard
> build tools from source. Everything appears to run correctly. However as
> part
> of performance testing, our IBM and Linux SysProgs have been "poking
> around"
> using strace and have reported the following (which they think is an
> error
> condition) when hooking up to the postmaster processes:-
>
> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily
> unavailable)
> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0
> (Timeout)
> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily
> unavailable)
> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0
> (Timeout)
> ... repeated many times
>
> That does not look like the postmaster process.  It looks like probably
> the
> background writer process.
>
> It is normal, and doesn't explain high CPU utilization.
>
> yeah: we're probably a couple of steps in front of deep system
> profiling.   Helpful things to provide to help diagnose would be:
>
> *) 'explain analyze' of the queries that are eating cpu
> *) more details about the hardware -- how many cpu, etc.
> *) better definition of 'perceived high CPU utilisation'
> *) some correlating performance tests, especially cpu bound pgbench
> tests (pgbench -S)
>
> merlin
>
>
> I'm not sure how much experience the community has on tuning PostgreSQL
> running on RedHat which in turn is hosted on an IBM mainframe under VM
> (using zLinux). So I'm happy to start posting further details and benchmark
> results and see where we go. Should I be moving this thread over into the
> pg-performance list, or is pg-general the right place?
>
> certainly performance.   and yes, zLinux is less well traveled.  Did
> you compile postgres from source?  Did you confirm that there is a
> native spinlocks implementation and it is being used?
>
> merlin
>
> Did you compile postgres from source? - Yes (I need PG v9.n as v8.n shipped
> with RedHat Ent6 does not have several v9 specific features we need).
>
> Did you confirm that there is a native spinlocks implementation and it is
> being used? - I believe so as no errors or warnings logged during the build.
> Is there a simple way to check whether spin-locks are running native?
>
> I've started looking at several articles covering pgbench and running some
> initial tests, so I plan to start a new thread on pg-performance in the next
> day or so.
>
> Thanks for the advice so far  - Appreciated :-)

I can't remember off the top of my head if configure forces you to
specifically unset spinlocks to get through a build on a non-hardware
spinlock platform.  Point being: the interesting stuff happens during
configure, not build.

Check the contents of src/include/pg_config.h and look for this line:
#define HAVE_SPINLOCKS 1

to see if you have hardware spinlocks.

merlin
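
The header check described above can also be scripted. This is a small Python sketch under stated assumptions: the helper name macro_is_defined is hypothetical, and it only looks for an active "#define" line, on the assumption that a configure-generated header writes a disabled option as a commented-out "#undef".

```python
import re


def macro_is_defined(header_text, name):
    """Return True if header_text contains an active '#define <name>' line."""
    pattern = re.compile(
        r"^\s*#\s*define\s+" + re.escape(name) + r"\b",
        re.MULTILINE,
    )
    return pattern.search(header_text) is not None
```

For example, macro_is_defined(open("src/include/pg_config.h").read(), "HAVE_SPINLOCKS") reports whether the line quoted above is present in the build tree. It inspects only the text of the header, so it says what configure decided, not which spinlock code path the running binaries use.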


Re: zLinux Load Testing Experience

From
Merlin Moncure
Date:
On Wed, May 1, 2013 at 1:21 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, May 1, 2013 at 11:34 AM, Andrew Hastie <andrew@ahastie.net> wrote:
>>
>> On 01/05/13 15:34, Merlin Moncure wrote:
>>
>> On Wed, May 1, 2013 at 8:01 AM, Andrew Hastie <andrew@ahastie.net> wrote:
>>
>> On 30/04/13 20:46, Merlin Moncure wrote:
>>
>> On Tue, Apr 30, 2013 at 12:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>> On Tue, Apr 30, 2013 at 8:28 AM, Andrew Hastie <andrew@ahastie.net>
>> wrote:
>>
>> I'm currently working on a project porting an application from RedHat
>> Linux on Intel onto IBM zLinux. Our application requires PostgreSQL at
>> version 9.n, so the PostgreSQL binaries have been built using the
>> standard
>> build tools from source. Everything appears to run correctly. However as
>> part
>> of performance testing, our IBM and Linux SysProgs have been "poking
>> around"
>> using strace and have reported the following (which they think is an
>> error
>> condition) when hooking up to the postmaster processes:-
>>
>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily
>> unavailable)
>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0
>> (Timeout)
>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily
>> unavailable)
>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0
>> (Timeout)
>> ... repeated many times
>>
>> That does not look like the postmaster process.  It looks like probably
>> the
>> background writer process.
>>
>> It is normal, and doesn't explain high CPU utilization.
>>
>> yeah: we're probably a couple of steps in front of deep system
>> profiling.   Helpful things to provide to help diagnose would be:
>>
>> *) 'explain analyze' of the queries that are eating cpu
>> *) more details about the hardware -- how many cpu, etc.
>> *) better definition of 'perceived high CPU utilisation'
>> *) some correlating performance tests, especially cpu bound pgbench
>> tests (pgbench -S)
>>
>> merlin
>>
>>
>> I'm not sure how much experience the community has on tuning PostgreSQL
>> running on RedHat which in turn is hosted on an IBM mainframe under VM
>> (using zLinux). So I'm happy to start posting further details and benchmark
>> results and see where we go. Should I be moving this thread over into the
>> pg-performance list, or is pg-general the right place?
>>
>> certainly performance.   and yes, zLinux is less well traveled.  Did
>> you compile postgres from source?  Did you confirm that there is a
>> native spinlocks implementation and it is being used?
>>
>> merlin
>>
>> Did you compile postgres from source? - Yes (I need PG v9.n as v8.n shipped
>> with RedHat Ent6 does not have several v9 specific features we need).
>>
>> Did you confirm that there is a native spinlocks implementation and it is
>> being used? - I believe so as no errors or warnings logged during the build.
>> Is there a simple way to check whether spin-locks are running native?
>>
>> I've started looking at several articles covering pgbench and running some
>> initial tests, so I plan to start a new thread on pg-performance in the next
>> day or so.
>>
>> Thanks for the advice so far  - Appreciated :-)
>
> I can't remember off the top of my head if configure forces you to
> specifically unset spinlocks to get through a build on a non-hardware
> spinlock platform.  Point being: the interesting stuff happens during
> configure, not build.
>
> Check the contents of src/include/pg_config.h and look for this line:
> #define HAVE_SPINLOCKS 1
>
> to see if you have hardware spinlocks.


Just a follow up here since I'm about to go on vacation and will be
out of pocket for the next several days.  If you do indeed find out
that you are using non TAS spinlocks, and are suspicious that this is
causing your load issues, and are feeling experimental, and are using
gcc to compile postgres, and have determined that the
HAVE_GCC_INT_ATOMICS macro is set, I'd maybe consider hacking s_lock.h
to use the gcc __sync_lock_test_and_set variant of TAS (see around
line 300 of that file).

merlin


Re: zLinux Load Testing Experience

From
Tom Lane
Date:
Andrew Hastie <andrew@ahastie.net> writes:
> Did you confirm that there is a native spinlocks implementation and it is being used?  - I believe so as no errors or
> warnings logged during the build. Is there a simple way to check whether spin-locks are running native?

All non-ancient versions of PG force you to say "configure --disable-spinlocks"
to get a build without native spinlocks.  Such builds are only
considered suitable for zero-order port testing, because the performance
hit is so bad.

            regards, tom lane


Re: zLinux Load Testing Experience

From
Andrew Hastie
Date:
On 01/05/13 19:21, Merlin Moncure wrote:
> On Wed, May 1, 2013 at 11:34 AM, Andrew Hastie <andrew@ahastie.net> wrote:
>> On 01/05/13 15:34, Merlin Moncure wrote:
>>
>> On Wed, May 1, 2013 at 8:01 AM, Andrew Hastie <andrew@ahastie.net> wrote:
>>
>> On 30/04/13 20:46, Merlin Moncure wrote:
>>
>> On Tue, Apr 30, 2013 at 12:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>> On Tue, Apr 30, 2013 at 8:28 AM, Andrew Hastie <andrew@ahastie.net>
>> wrote:
>>
>> I'm currently working on a project porting an application from RedHat
>> Linux on Intel onto IBM zLinux. Our application requires PostgreSQL at
>> version 9.n, so the PostgreSQL binaries have been built using the
>> standard
>> build tools from source. Everything appears to run correctly. However as
>> part
>> of performance testing, our IBM and Linux SysProgs have been "poking
>> around"
>> using strace and have reported the following (which they think is an
>> error
>> condition) when hooking up to the postmaster processes:-
>>
>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily
>> unavailable)
>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0
>> (Timeout)
>> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily
>> unavailable)
>> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0
>> (Timeout)
>> ... repeated many times
>>
>> That does not look like the postmaster process.  It looks like probably
>> the
>> background writer process.
>>
>> It is normal, and doesn't explain high CPU utilization.
>>
>> yeah: we're probably a couple of steps in front of deep system
>> profiling.   Helpful things to provide to help diagnose would be:
>>
>> *) 'explain analyze' of the queries that are eating cpu
>> *) more details about the hardware -- how many cpu, etc.
>> *) better definition of 'perceived high CPU utilisation'
>> *) some correlating performance tests, especially cpu bound pgbench
>> tests (pgbench -S)
>>
>> merlin
>>
>>
>> I'm not sure how much experience the community has on tuning PostgreSQL
>> running on RedHat which in turn is hosted on an IBM mainframe under VM
>> (using zLinux). So I'm happy to start posting further details and benchmark
>> results and see where we go. Should I be moving this thread over into the
>> pg-performance list, or is pg-general the right place?
>>
>> certainly performance.   and yes, zLinux is less well traveled.  Did
>> you compile postgres from source?  Did you confirm that there is a
>> native spinlocks implementation and it is being used?
>>
>> merlin
>>
>> Did you compile postgres from source? - Yes (I need PG v9.n as v8.n shipped
>> with RedHat Ent6 does not have several v9 specific features we need).
>>
>> Did you confirm that there is a native spinlocks implementation and it is
>> being used? - I believe so as no errors or warnings logged during the build.
>> Is there a simple way to check whether spin-locks are running native?
>>
>> I've started looking at several articles covering pgbench and running some
>> initial tests, so I plan to start a new thread on pg-performance in the next
>> day or so.
>>
>> Thanks for the advice so far  - Appreciated :-)
> I can't remember off the top of my head if configure forces you to
> specifically unset spinlocks to get through a build on a non-hardware
> spinlock platform.  Point being: the interesting stuff happens during
> configure, not build.
>
> Check the contents of src/include/pg_config.h and look for this line:
> #define HAVE_SPINLOCKS 1
>
> to see if you have hardware spinlocks.
>
> merlin
>
>
I can confirm that #define HAVE_SPINLOCKS 1 is present and correct.

Will move any performance related issues I find onto the pg-performance
list.
Many thanks for all the help and advice so far :-)
Andrew