Re: Strange behavior: pgbench and new Linux kernels - Mailing list pgsql-performance

From Greg Smith
Subject Re: Strange behavior: pgbench and new Linux kernels
Msg-id alpine.GSO.2.01.0904041147540.27286@westnet.com
In response to Re: Strange behavior: pgbench and new Linux kernels  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Strange behavior: pgbench and new Linux kernels  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
On Tue, 31 Mar 2009, Kevin Grittner wrote:

>>>> On Thu, Apr 17, 2008 at  7:26 PM, Greg Smith wrote:
>
>> On this benchmark 2.6.25 is the worst kernel yet:
>
> I don't remember seeing a follow-up on this issue from last year.
> Are there still any particular kernels to avoid based on this?

I just discovered something really fascinating here.  The problem is
strictly limited to when you're connecting via Unix-domain sockets; use
TCP/IP instead, and it goes away.

To refresh everyone's memory, I reported this problem to the LKML at
http://lkml.org/lkml/2008/5/21/292 and got some patches and kernel tweaks
for the scheduler, but never a clear resolution of the cause, which kept
anybody from getting too excited about merging anything.  Test results
comparing various tweaks on the hardware I'm still using now are at
http://lkml.org/lkml/2008/5/26/288

For example, here's kernel 2.6.25 running pgbench with 50 clients on a
Q6000 processor, demonstrating poor performance.  I'd get >20K TPS here
with a pre-CFS kernel:

$ pgbench -S -t 4000 -c 50 -n pgbench
transaction type: SELECT only
scaling factor: 10
query mode: simple
number of clients: 50
number of transactions per client: 4000
number of transactions actually processed: 200000/200000
tps = 8288.047442 (including connections establishing)
tps = 8319.702195 (excluding connections establishing)

If I now execute exactly the same test, but using localhost, performance
returns to normal:

$ pgbench -S -t 4000 -c 50 -n -h localhost pgbench
transaction type: SELECT only
scaling factor: 10
query mode: simple
number of clients: 50
number of transactions per client: 4000
number of transactions actually processed: 200000/200000
tps = 17575.277771 (including connections establishing)
tps = 17724.651090 (excluding connections establishing)

That's 100% repeatable; I ran each test several times each way.
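
A rough sketch of how the two runs above could be scripted and compared.  The pgbench flags are the ones shown in this message; the function names (extract_tps, tps_ratio, compare_socket_vs_tcp) and the default of three runs are hypothetical, and the parsing assumes the pgbench output format shown above:

```shell
#!/bin/sh
# Pull the "excluding connections establishing" TPS figure out of pgbench
# output read from stdin (assumes the output format shown in this message).
extract_tps() {
    awk '/^tps = .*excluding connections establishing/ { print $3 }'
}

# Print the ratio of two TPS figures to two decimal places.
tps_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Hypothetical driver: run the same SELECT-only test over the Unix-domain
# socket and over TCP/IP to localhost, printing one line per run.
# $1 = number of runs (defaults to 3, an arbitrary choice).
compare_socket_vs_tcp() {
    runs=${1:-3}
    i=1
    while [ "$i" -le "$runs" ]; do
        unix_tps=$(pgbench -S -t 4000 -c 50 -n pgbench | extract_tps)
        tcp_tps=$(pgbench -S -t 4000 -c 50 -n -h localhost pgbench | extract_tps)
        echo "run $i: unix=$unix_tps tcp=$tcp_tps ratio=$(tps_ratio "$tcp_tps" "$unix_tps")"
        i=$((i + 1))
    done
}
```

Calling compare_socket_vs_tcp 5 would alternate the two connection methods five times, which helps rule out warm-cache effects favoring whichever test happens to run second.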

So the new summary of what I've found is that if all of these are true:

1) You're running Linux 2.6.23 or greater (confirmed up through 2.6.26)
2) You connect over a Unix-domain socket
3) Your client count is relatively high (>8 clients/core)

then you can expect your pgbench results to tank.  Switch to connecting
over TCP/IP to localhost, and everything is fine: in some cases it's not
quite as fast as the pre-CFS kernels, but in others it's faster.
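
As a quick way to see whether a given setup matches the conditions listed above, something like the following could be used.  It is only a sketch: the function names are hypothetical, the kernel check covers just the 2.6.23 through 2.6.26 range confirmed in this message (it says nothing about newer kernels), and the core count has to be supplied by the caller:

```shell
#!/bin/sh
# Succeed if a kernel release string (e.g. the output of `uname -r`) falls in
# the 2.6.23 .. 2.6.26 range confirmed in this message.
kernel_in_reported_range() {
    echo "$1" | awk -F'[.-]' '{ exit !($1 == 2 && $2 == 6 && $3 >= 23 && $3 <= 26) }'
}

# Succeed if the client count exceeds 8 clients per core.
# $1 = client count (the pgbench -c value), $2 = number of cores.
high_client_ratio() {
    [ "$1" -gt $(( $2 * 8 )) ]
}

# Example use (not executed here):
#   if kernel_in_reported_range "$(uname -r)" && high_client_ratio 50 4; then
#       echo "try pgbench with -h localhost instead of the Unix-domain socket"
#   fi
```

The third condition, connecting over a Unix-domain socket, is simply whether pgbench was run without -h.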

I haven't gotten to testing kernels newer than 2.6.26 yet; when I saw a
17K TPS result during one of my tests on 2.6.25, I screeched to a halt to
isolate this instead.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
