Re: Possible performance regression in version 10.1 with pgbench read-write tests. - Mailing list pgsql-hackers

From Mithun Cy
Subject Re: Possible performance regression in version 10.1 with pgbench read-write tests.
Date
Msg-id CAD__OuiWigmaYRec3A4H3EuyNp0nJqqPF_+_BGiWtDs32mY64Q@mail.gmail.com
In response to Re: Possible performance regression in version 10.1 with pgbench read-write tests.  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: Possible performance regression in version 10.1 with pgbench read-write tests.  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Fri, Jul 20, 2018 at 10:52 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Fri, Jul 20, 2018 at 7:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> It's not *that* noticeable, as I failed to demonstrate any performance
> difference before committing the patch.  I think some more investigation
> is warranted to find out why some other people are getting different
> results
Maybe false sharing is a factor, since sizeof(sem_t) is 32 bytes on
Linux/amd64 and we're probably hitting elements clustered at one end
of the array?  Let's see... I tried sticking padding into
PGSemaphoreData and I got ~8% more TPS (72 clients on a multi-socket
box, pgbench scale 100, only running for a minute but otherwise the
same settings that Mithun showed).

--- a/src/backend/port/posix_sema.c
+++ b/src/backend/port/posix_sema.c
@@ -45,6 +45,7 @@
 typedef struct PGSemaphoreData
 {
        sem_t           pgsem;
+       char            padding[PG_CACHE_LINE_SIZE - sizeof(sem_t)];
 } PGSemaphoreData;

That's probably not the right idiom and my tests probably weren't long
enough, but there seems to be some effect here.
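
One way to express the padding without the subtraction is a union sized to a whole cache line. The array length PG_CACHE_LINE_SIZE - sizeof(sem_t) in the diff above stops being meaningful if sizeof(sem_t) ever exceeds the line size (the unsigned subtraction wraps around), and padding by itself does not guarantee that the array starts on a line boundary. What follows is only a minimal sketch, not the committed fix: SKETCH_CACHE_LINE_SIZE, PaddedSem and alloc_padded_sems are hypothetical names, 128 bytes is an assumed line size, and aligned_alloc stands in for PostgreSQL's shared-memory allocation.

```c
#include <assert.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption for this sketch: pad every semaphore out to 128 bytes. */
#define SKETCH_CACHE_LINE_SIZE 128

/* Hypothetical padded type: the union is as large as its largest member. */
typedef union PaddedSem
{
    sem_t pgsem;
    char  pad[SKETCH_CACHE_LINE_SIZE];
} PaddedSem;

/* Fail at compile time instead of silently computing a bogus pad size. */
static_assert(sizeof(sem_t) <= SKETCH_CACHE_LINE_SIZE,
              "sem_t does not fit in the assumed cache line size");
static_assert(sizeof(PaddedSem) == SKETCH_CACHE_LINE_SIZE,
              "PaddedSem must occupy exactly one assumed cache line");

/*
 * Allocate 'count' semaphores starting on a cache line boundary, so that
 * each element occupies its own line.  Returns NULL on failure; the caller
 * releases the array with free().  The semaphores are not initialized here.
 */
PaddedSem *
alloc_padded_sems(size_t count)
{
    return aligned_alloc(SKETCH_CACHE_LINE_SIZE, count * sizeof(PaddedSem));
}
```

With this layout, &sems[i].pgsem and &sems[i + 1].pgsem are always SKETCH_CACHE_LINE_SIZE bytes apart, so two backends waiting on neighbouring semaphores do not touch the same line.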

I ran a quick test with the patch applied, using the same settings I reported in my initial mail (on the latest PostgreSQL 10 code), with 72 clients.

CASE 1:
Without patch: TPS 29269.823540

With patch: TPS 36005.544960 (a 23% jump)

Unpatched, with unnamed POSIX semaphores disabled: TPS 34481.207959
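
For reference, the relative gains over the unpatched baseline can be recomputed from the three TPS figures above. This is a minimal sketch; relative_gain_percent and print_reported_gains are hypothetical helper names, and the constants are simply the numbers from this run.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: percentage change of 'tps' relative to 'baseline_tps'. */
double
relative_gain_percent(double baseline_tps, double tps)
{
    assert(baseline_tps > 0.0);
    return (tps - baseline_tps) / baseline_tps * 100.0;
}

/* Print the gains for the TPS figures reported in this mail. */
void
print_reported_gains(void)
{
    const double without_patch = 29269.823540;
    const double with_patch = 36005.544960;
    const double no_unnamed_posix_sema = 34481.207959;

    printf("patch vs. baseline: %.1f%%\n",
           relative_gain_percent(without_patch, with_patch));
    printf("unnamed POSIX semaphores disabled vs. baseline: %.1f%%\n",
           relative_gain_percent(without_patch, no_unnamed_posix_sema));
}
```

That works out to about 23.0% for the padding patch and about 17.8% for disabling unnamed POSIX semaphores, both measured against the same unpatched run.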

So false sharing does seem to be the issue, which fits with the test being run on an 8-node NUMA machine.
I also came across a presentation [1] (slide 20) which says that one of the futex architectures is bad for NUMA machines. I am not sure whether the newer fix for this is included in Linux version 3.10.0-693.5.2.el7.x86_64, which is what my test machine runs.




--
Thanks and Regards
Mithun C Y
