Re: Wait free LW_SHARED acquisition - v0.9 - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Wait free LW_SHARED acquisition - v0.9 |
Date | |
Msg-id | CAA4eK1+3AEdJpKBoAR7-_GqqaO3ZXCEVAL177sVj=nQa7OX5zw@mail.gmail.com |
In response to | Re: Wait free LW_SHARED acquisition - v0.9 (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: Wait free LW_SHARED acquisition - v0.9
|
List | pgsql-hackers |
On Fri, Oct 17, 2014 at 11:41 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-10-17 17:14:16 +0530, Amit Kapila wrote:
> > On Tue, Oct 14, 2014 at 11:34 AM, Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > HEAD – commit 494affb + wait free lw_shared_v2
> >
> > Shared_buffers=8GB; Scale Factor = 3000
> >
> > Client Count/No. Of Runs (tps) | 64 | 128 |
> > Run-1 | 286209 | 274922 |
> > Run-2 | 289101 | 274495 |
> > Run-3 | 289639 | 273633 |
>
> So here the results with LW_SHARED were consistently better, right?
Yes.
> You
> saw performance degradations here earlier?
Yes.
> > So I am planning to proceed further with the review/test of your
> > latest patch.
>
> > According to me, below things are left from myside:
> > a. do some basic tpc-b tests with patch
I have done a few tests; the results are below. The data indicates
that there is neither any noticeable gain nor any noticeable loss on
tpc-b tests, which I think is what could have been expected of this patch.
There is slight variation at a few client counts (for sync_commit=off,
at 32 and 128); however, I feel that is just noise, as I don't see any
general trend.
Performance Data
----------------------------
IBM POWER-8 24 cores, 192 hardware threads
RAM = 492GB
Database Locale =C
max_connections =300
checkpoint_segments=300
checkpoint_timeout =15min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
Duration of each individual run = 30mins
Test mode - tpc-b
The data below is the median of 3 runs; detailed data is attached to this
mail.
Scale_factor =3000; shared_buffers=8GB;
Patch/Client_count | 8 | 16 | 32 | 64 | 128 |
HEAD | 3849 | 4889 | 3569 | 3845 | 4547 |
LW_SHARED | 3844 | 4787 | 3532 | 3814 | 4408 |
Scale_factor =3000; shared_buffers=8GB; synchronous_commit=off;
Patch/Client_count | 8 | 16 | 32 | 64 | 128 |
HEAD | 5966 | 8297 | 10084 | 9348 | 8836 |
LW_SHARED | 6070 | 8612 | 8839 | 9503 | 8584 |
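To make the comparison between the two rows easier to read, here is a small sketch that computes the relative difference of LW_SHARED against HEAD at each client count. The numbers are copied from the two tables above; the helper name `percent_change` and the table labels are only illustrative and not part of any PostgreSQL tooling.

```python
# Median tps values copied from the two tables above.
CLIENTS = [8, 16, 32, 64, 128]
RESULTS = {
    # First table: no synchronous_commit setting is stated in the mail.
    "table 1": {
        "HEAD": [3849, 4889, 3569, 3845, 4547],
        "LW_SHARED": [3844, 4787, 3532, 3814, 4408],
    },
    # Second table: synchronous_commit=off.
    "table 2 (synchronous_commit=off)": {
        "HEAD": [5966, 8297, 10084, 9348, 8836],
        "LW_SHARED": [6070, 8612, 8839, 9503, 8584],
    },
}


def percent_change(base, patched):
    """Relative change of patched vs. base, in percent, rounded to 2 places."""
    return [round(100.0 * (p - b) / b, 2) for b, p in zip(base, patched)]


for label, data in RESULTS.items():
    deltas = percent_change(data["HEAD"], data["LW_SHARED"])
    print(label)
    for clients, delta in zip(CLIENTS, deltas):
        print(f"  {clients:>3} clients: {delta:+.2f}%")
```

A negative value means LW_SHARED was slower than HEAD at that client count. Since each input is already a median of only 3 runs, the output says nothing about run-to-run variance.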
While doing performance tests, I noticed a hang at higher client
counts with the patch. I checked the call stacks of a few of the
processes, and they are as below:
#0 0x0000008010933e54 in .semop () from /lib64/libc.so.6
#1 0x0000000010286e48 in .PGSemaphoreLock ()
#2 0x00000000102f68bc in .LWLockAcquire ()
#3 0x00000000102d1ca0 in .ReadBuffer_common ()
#4 0x00000000102d2ae0 in .ReadBufferExtended ()
#5 0x00000000100a57d8 in ._bt_getbuf ()
#6 0x00000000100a6210 in ._bt_getroot ()
#7 0x00000000100aa910 in ._bt_search ()
#8 0x00000000100ab494 in ._bt_first ()
#9 0x00000000100a8e84 in .btgettuple ()
..
#0 0x0000008010933e54 in .semop () from /lib64/libc.so.6
#1 0x0000000010286e48 in .PGSemaphoreLock ()
#2 0x00000000102f68bc in .LWLockAcquire ()
#3 0x00000000102d1ca0 in .ReadBuffer_common ()
#4 0x00000000102d2ae0 in .ReadBufferExtended ()
#5 0x00000000100a57d8 in ._bt_getbuf ()
#6 0x00000000100a6210 in ._bt_getroot ()
#7 0x00000000100aa910 in ._bt_search ()
#8 0x00000000100ab494 in ._bt_first ()
...
The test configuration is as below:
Test env - Power - 7 (hydra)
scale_factor - 3000
shared_buffers - 8GB
test mode - pgbench read only
test execution -
./pgbench -c 128 -j 128 -T 1800 -S -M prepared postgres
I ran it with a duration of half an hour, but it had not finished even
after ~2 hours. It is not reproducible every time; currently I am
able to reproduce it and the machine is still in the same state, so if
you want any info, let me know (unfortunately the binaries are built in
release mode, so they might not yield enough information).
> > b. re-review latest version posted by you
>
> Cool!
I will post my feedback for code separately, once I am able to
completely review the new versions.