Thread: Latch implementation
Hi,

I've been playing around with measuring the latch implementation in 9.1,
and here are the results of a ping-pong test with 2 processes signalling
and waiting on the latch. I did three variations (linux 2.6.18, nehalem
processor).

One is the current one.

The second is built on native semaphores on linux. This one cannot
implement WaitLatchOrSocket, since there's no select() involved.

The third is an implementation based on pipe() and poll(). Note: in its
current incarnation it's essentially a hack to measure performance; it's
not usable in postgres as-is, since it assumes all latches are created
before any process is forked. We'd need to use mkfifo or similar to sort
that out if we really want to go this route.

- Current implementation: 1 pingpong is avg 15 usecs
- Pipe+poll: 9 usecs
- Semaphore: 6 usecs

The test program & modified unix_latch.c are attached; you can compile it
like "gcc -DPIPE -O2 sema.c", "gcc -DLINUX_SEM -O2 sema.c", or
"gcc -O2 sema.c".

Thanks,
--Ganesh
On Wed, Sep 22, 2010 at 4:31 PM, Ganesh Venkitachalam-1 <ganesh@vmware.com> wrote:
> I've been playing around with measuring the latch implementation in 9.1, and
> here are the results of a ping-pong test with 2 processes signalling and
> waiting on the latch. I did three variations (linux 2.6.18, nehalem
> processor).
>
> One is the current one.
>
> The second is built on native semaphores on linux. This one cannot
> implement WaitLatchOrSocket, there's no select involved.
>
> The third is an implementation based on pipe() and poll. Note: in its
> current incarnation it's essentially a hack to measure performance, it's not
> usable in postgres, this assumes all latches are created before any process
> is forked. We'd need to use mkfifo to sort that out if we really want to go
> this route, or similar.
>
> - Current implementation: 1 pingpong is avg 15 usecs
> - Pipe+poll: 9 usecs
> - Semaphore: 6 usecs

Interesting numbers. I guess one question is how much improving the
performance of the latch implementation would affect overall system
performance. Synchronous replication is obviously going to be highly
sensitive to latency, but even in that context I'm not really sure
whether this is enough to matter. Do you have any sense of that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On 22/09/10 23:31, Ganesh Venkitachalam-1 wrote:
> I've been playing around with measuring the latch implementation in 9.1,
> and here are the results of a ping-pong test with 2 processes signalling
> and waiting on the latch. I did three variations (linux 2.6.18, nehalem
> processor).
>
> One is the current one.
>
> The second is built on native semaphores on linux. This one cannot
> implement WaitLatchOrSocket, there's no select involved.
>
> The third is an implementation based on pipe() and poll. Note: in its
> current incarnation it's essentially a hack to measure performance, it's
> not usable in postgres, this assumes all latches are created before any
> process is forked. We'd need to use mkfifo to sort that out if we really
> want to go this route, or similar.
>
> - Current implementation: 1 pingpong is avg 15 usecs
> - Pipe+poll: 9 usecs
> - Semaphore: 6 usecs
>
> The test program & modified unix_latch.c is attached, you can compile it
> like "gcc -DPIPE -O2 sema.c" or "gcc -DLINUX_SEM -O2 sema.c" or "gcc -O2
> sema.c".

Interesting, thanks for the testing! Could you also test how much faster
the current implementation gets by just replacing select() with poll()?
That should shave off some of the overhead.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
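The change Heikki is suggesting amounts to swapping the fd_set machinery in the wait loop for a single pollfd. A minimal sketch of the poll() side, assuming a self-pipe read fd like the one unix_latch.c uses (the function name and timeout handling here are illustrative, not the real code):

```c
#include <poll.h>

/* Wait for the self-pipe to become readable, or for timeout_ms to
 * elapse (-1 = block forever). Returns >0 when readable, 0 on timeout,
 * -1 on error. */
static int
wait_on_selfpipe(int selfpipe_readfd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = selfpipe_readfd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Unlike select(), poll() has no FD_SETSIZE ceiling and no fd_set
     * bitmaps to zero and rebuild on every call, which is where the
     * per-wait overhead saving comes from. */
    return poll(&pfd, 1, timeout_ms);
}
```

The latch semantics are unchanged; only the kernel entry point differs, so this is a drop-in experiment for the benchmark.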
On Wed, 2010-09-22 at 13:31 -0700, Ganesh Venkitachalam-1 wrote:
> Hi,
>
> I've been playing around with measuring the latch implementation in 9.1,
> and here are the results of a ping-pong test with 2 processes signalling
> and waiting on the latch. I did three variations (linux 2.6.18, nehalem
> processor).
>
> One is the current one.
>
> The second is built on native semaphores on linux. This one cannot
> implement WaitLatchOrSocket, there's no select involved.

That looks interesting. If we had a need for a latch that did not need
to wait on a socket as well, this would be better. In sync rep, we
certainly do. Thanks for measuring this.

The question is: in that case, would we use latches or a PGSemaphore?

If the answer is "latch", then we could just have an additional boolean
option when we request InitLatch() to say which kind of latch we want.

> The third is an implementation based on pipe() and poll. Note: in its
> current incarnation it's essentially a hack to measure performance, it's
> not usable in postgres, this assumes all latches are created before any
> process is forked. We'd need to use mkfifo to sort that out if we really
> want to go this route, or similar.
>
> - Current implementation: 1 pingpong is avg 15 usecs
> - Pipe+poll: 9 usecs
> - Semaphore: 6 usecs

Pipe+poll is not worth it, then.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
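Simon's boolean-option idea could look something like the toy sketch below: a flag at init time chooses the backing implementation, and only self-pipe latches support multiplexing with a socket. Everything here (InitLatchEx, the enum, the struct layout) is invented for illustration; it is not the real latch API or a proposed patch.

```c
#include <stdbool.h>

typedef enum
{
    LATCH_SELFPIPE,     /* select/poll-able: supports WaitLatchOrSocket */
    LATCH_SEMAPHORE     /* faster, but latch-only waits */
} LatchKind;

typedef struct
{
    LatchKind kind;
    bool is_set;
} Latch;

/* waitable_on_socket = true forces the self-pipe implementation,
 * since only a file descriptor can be multiplexed with a socket;
 * otherwise the cheaper semaphore backing can be used. */
static void
InitLatchEx(Latch *latch, bool waitable_on_socket)
{
    latch->kind = waitable_on_socket ? LATCH_SELFPIPE : LATCH_SEMAPHORE;
    latch->is_set = false;
}
```

Callers that only ever WaitLatch() (e.g. a sync-rep commit wait, per this subthread) would pass false and get the 6 usec path; walsender-style loops that mix socket and latch waits would pass true.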
Attached is the current implementation redone with poll(). It lands at
around 10.5 usecs: just above pipe, but better than the current
implementation.

As to the other questions: yes, this would matter for sync replication.
Consider an enterprise use case with a 10Gb network & SSDs (not at all
uncommon): a 10Gb network can do a roundtrip with the commit log in
<10 usecs, and SSDs have write latency <50 usecs. Now if the latch takes
tens of usecs (this stuff scales somewhat with the number of processes;
my data is all with 2 processes), that becomes a very significant part
of the net commit latency. So I'd think this is worth fixing.

Thanks,
--Ganesh

On Thu, 23 Sep 2010, Simon Riggs wrote:

> Date: Thu, 23 Sep 2010 06:56:38 -0700
> From: Simon Riggs <simon@2ndQuadrant.com>
> To: Ganesh Venkitachalam <ganesh@vmware.com>
> Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Latch implementation
>
> On Wed, 2010-09-22 at 13:31 -0700, Ganesh Venkitachalam-1 wrote:
>> Hi,
>>
>> I've been playing around with measuring the latch implementation in 9.1,
>> and here are the results of a ping-pong test with 2 processes signalling
>> and waiting on the latch. I did three variations (linux 2.6.18, nehalem
>> processor).
>>
>> One is the current one.
>>
>> The second is built on native semaphores on linux. This one cannot
>> implement WaitLatchOrSocket, there's no select involved.
>
> That looks interesting. If we had a need for a latch that would not need
> to wait on a socket as well, this would be better. In sync rep, we
> certainly do. Thanks for measuring this.
>
> Question is: in that case would we use latches or a PGsemaphore?
>
> If the answer is "latch" then we could just have an additional boolean
> option when we request InitLatch() to see what kind of latch we want.
>
>> The third is an implementation based on pipe() and poll. Note: in its
>> current incarnation it's essentially a hack to measure performance, it's
>> not usable in postgres, this assumes all latches are created before any
>> process is forked. We'd need to use mkfifo to sort that out if we really
>> want to go this route, or similar.
>>
>> - Current implementation: 1 pingpong is avg 15 usecs
>> - Pipe+poll: 9 usecs
>> - Semaphore: 6 usecs
>
> Pipe+poll not worth it then.
>
> --
> Simon Riggs           www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Training and Services
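For reference, the pipe-based variant being dismissed here is conceptually tiny: SetLatch writes a byte, WaitLatch polls and drains it. The sketch below shows the shape and why the fork-ordering limitation exists; the names are illustrative, not the measured hack.

```c
#include <poll.h>
#include <unistd.h>

typedef struct
{
    int fds[2];        /* fds[0] = read end, fds[1] = write end */
} PipeLatch;

/* The pipe must be created BEFORE fork() so both processes inherit the
 * same fds -- the limitation Ganesh notes above. A named fifo (mkfifo)
 * would let unrelated processes open it later instead. */
static int
pipe_latch_init(PipeLatch *latch)
{
    return pipe(latch->fds);
}

/* "SetLatch": one byte in the pipe marks the latch set. */
static void
pipe_latch_set(PipeLatch *latch)
{
    char c = 0;
    (void) write(latch->fds[1], &c, 1);
}

/* "WaitLatch": block until readable, then drain the byte so the next
 * wait blocks again. */
static void
pipe_latch_wait(PipeLatch *latch)
{
    struct pollfd pfd;
    char c;

    pfd.fd = latch->fds[0];
    pfd.events = POLLIN;
    (void) poll(&pfd, 1, -1);
    (void) read(latch->fds[0], &c, 1);
}
```

Because the wait side is just a pollable fd, this variant (unlike the semaphore one) could still back WaitLatchOrSocket, which is why it stayed in the comparison despite being slower than semaphores.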