Thread: pgbench could not send data to client: Broken pipe

pgbench could not send data to client: Broken pipe

From: David Kerr
Howdy,

I'm running pgbench with a fairly large # of clients and getting this error in my PG log file.

Here's the command:
./pgbench -c 1100 testdb -l

I get:
LOG:  could not send data to client: Broken pipe

(I had to modify the pgbench.c file to make it go that high; I changed
MAXCLIENTS to 2048.)

I thought maybe I was running out of resources, so I checked my ulimits:

 ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 2048
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) unlimited
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


This is PG 8.3.10 on Red Hat 64-bit; the system has 48 cores and 256 GB of RAM.


Any idea what would be causing the error?

thanks

Dave

Re: pgbench could not send data to client: Broken pipe

From: Tom Lane
David Kerr <dmk@mr-paradox.net> writes:
> I'm running pgbench with a fairly large # of clients and getting this error in my PG log file.
> LOG:  could not send data to client: Broken pipe

That error suggests that pgbench dropped the connection.  You might be
running into some bug or internal limitation in pgbench.  Did you check
to make sure pgbench isn't crashing?

> (I had to modify the pgbench.c file to make it go that high, i changed:
> MAXCLIENTS = 2048

Hm, you can't just arbitrarily change that number; it has to be less
than whatever number of open files select(2) supports.  A look on my
Fedora 13 box suggests that 1024 is the limit there; I'm not sure which
Red Hat variant you're using but I suspect it might have the same limit.
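
Here's a standalone illustration of that ceiling -- this isn't pgbench
code, just the select(2) constraint itself.  FD_SETSIZE comes from your C
library headers (sys/select.h pulls it in), and calling FD_SET() on a
descriptor at or above it is undefined behavior, so a select()-based
client has to refuse such sockets:

#include <stdio.h>
#include <sys/select.h>

int main(void)
{
    fd_set readfds;
    int fd = 1500;      /* pretend this is a socket from PQsocket() */

    printf("FD_SETSIZE here is %d\n", (int) FD_SETSIZE);

    FD_ZERO(&readfds);
    if (fd >= FD_SETSIZE)
    {
        /* FD_SET() on this descriptor would be undefined behavior */
        fprintf(stderr, "fd %d is unusable with select()\n", fd);
        return 1;
    }
    FD_SET(fd, &readfds);
    return 0;
}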

As of the 9.0 release, it's possible to run pgbench in a "multi thread"
mode, and if you forced the subprocess rather than thread model it looks
like the select() limit would be per subprocess rather than global.
So I think you could get above the FD_SETSIZE limit with a bit of
hacking if you were using 9.0's pgbench.  No chance with 8.3 though.

(This suggests BTW that we might want to expose the thread-versus-fork
choice in a slightly more user-controllable fashion, rather than
assuming that threads are always better.)

            regards, tom lane

Re: pgbench could not send data to client: Broken pipe

From: Greg Smith
Tom Lane wrote:
> As of the 9.0 release, it's possible to run pgbench in a "multi thread"
> mode, and if you forced the subprocess rather than thread model it looks
> like the select() limit would be per subprocess rather than global.
> So I think you could get above the FD_SETSIZE limit with a bit of
> hacking if you were using 9.0's pgbench.  No chance with 8.3 though.
>

I believe David can do this easily enough by compiling a 9.0 source code
tree with the "--disable-thread-safety" option.  That's the simplest way
to force the pgbench client to build itself using the multi-process
model, rather than the multi-threaded one.
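
Concretely, that's just building the 9.0 tree with "./configure
--disable-thread-safety" and then using the pgbench binary that gets
built under contrib/pgbench in that tree.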

It's kind of futile to run pgbench simulating much more than a hundred
or two clients before 9.0 anyway.  Without multiple workers, you're
likely to just run into the process switching limitations within pgbench
itself rather than testing server performance usefully.  I've watched
the older pgbench program fail to come close to saturating an 8 core
server without running into its own limitations first.

You might run a 9.0 pgbench client against an 8.3 server though, if you
did the whole thing starting from pgbench database initialization over
again--the built-in tables like "accounts" changed to "pgbench_accounts"
in 8.4.  That might work, can't recall any changes that would prevent
it; but as I haven't tested it yet I can't say for sure.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: pgbench could not send data to client: Broken pipe

From: Tom Lane
Greg Smith <greg@2ndquadrant.com> writes:
> Tom Lane wrote:
>> So I think you could get above the FD_SETSIZE limit with a bit of
>> hacking if you were using 9.0's pgbench.  No chance with 8.3 though.

> I believe David can do this easily enough by compiling a 9.0 source code
> tree with the "--disable-thread-safety" option.

It would take a bit more work than that, because the code still tries to
limit the client count based on FD_SETSIZE.  He'd need to hack it so
that in non-thread mode, the limit is FD_SETSIZE per subprocess.  I was
suggesting that an official patch to that effect would be a good thing.
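
Roughly the shape of check I have in mind -- this is a hypothetical
sketch with made-up names, not the actual pgbench source:

#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>

static void
check_client_limit(int nclients, int nworkers, int use_threads)
{
    /* Threads share one process, so every client socket must fit under
     * FD_SETSIZE together; fork()ed workers each get their own fd_set,
     * so only the clients assigned to one worker have to fit. */
    int per_process = use_threads ? nclients
                                  : (nclients + nworkers - 1) / nworkers;

    if (per_process >= FD_SETSIZE)
    {
        fprintf(stderr, "too many clients per process for select() (limit %d)\n",
                (int) FD_SETSIZE);
        exit(1);
    }
}

int main(void)
{
    check_client_limit(2000, 4, 0);   /* 500 clients per worker: OK    */
    check_client_limit(2000, 1, 0);   /* all in one process: rejected  */
    return 0;
}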

> It's kind of futile to run pgbench simulating much more than a hundred
> or two clients before 9.0 anyway.

Yeah ...

            regards, tom lane

Re: pgbench could not send data to client: Broken pipe

From: David Kerr
On Wed, Sep 08, 2010 at 03:27:34PM -0400, Greg Smith wrote:
- Tom Lane wrote:
- >As of the 9.0 release, it's possible to run pgbench in a "multi thread"
- >mode, and if you forced the subprocess rather than thread model it looks
- >like the select() limit would be per subprocess rather than global.
- >So I think you could get above the FD_SETSIZE limit with a bit of
- >hacking if you were using 9.0's pgbench.  No chance with 8.3 though.
- >
-
- I believe David can do this easily enough by compiling a 9.0 source code
- tree with the "--disable-thread-safety" option.  That's the simplest way
- to force the pgbench client to build itself using the multi-process
- model, rather than the multi-threaded one.
-
- It's kind of futile to run pgbench simulating much more than a hundred
- or two clients before 9.0 anyway.  Without multiple workers, you're
- likely to just run into the process switching limitations within pgbench
- itself rather than testing server performance usefully.  I've watched
- the older pgbench program fail to come close to saturating an 8 core
- server without running into its own limitations first.
-
- You might run a 9.0 pgbench client against an 8.3 server though, if you
- did the whole thing starting from pgbench database initialization over
- again--the built-in tables like "accounts" changed to "pgbench_accounts"
- in 8.4.  That might work, can't recall any changes that would prevent
- it; but as I haven't tested it yet I can't say for sure.

Thanks, I compiled the 9.0 RC1 branch with the --disable-thread-safety option
and ran pgbench against my 8.3 DB; it seemed to work fine.

However, MAXCLIENTS is still 1024; if I hack it up to 2048 I
get this:
starting vacuum...end.
select failed: Bad file descriptor  <---------------
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1900
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 3723/19000
tps = 52.007642 (including connections establishing)
tps = 82.579077 (excluding connections establishing)


I'm not sure what Tom is referring to with the select(2) limitation; maybe I'm
running into it. (Where do I find that? /usr/include/sys/select.h?)

Should I be running pgbench differently? I tried increasing the # of threads,
but that didn't increase the number of backends, and I'm trying to simulate
2000 physical backend processes.

thanks

Dave

Re: pgbench could not send data to client: Broken pipe

From: David Kerr
On Wed, Sep 08, 2010 at 03:44:36PM -0400, Tom Lane wrote:
- Greg Smith <greg@2ndquadrant.com> writes:
- > Tom Lane wrote:
- >> So I think you could get above the FD_SETSIZE limit with a bit of
- >> hacking if you were using 9.0's pgbench.  No chance with 8.3 though.
-
- > I believe David can do this easily enough by compiling a 9.0 source code
- > tree with the "--disable-thread-safety" option.
-
- It would take a bit more work than that, because the code still tries to
- limit the client count based on FD_SETSIZE.  He'd need to hack it so
- that in non-thread mode, the limit is FD_SETSIZE per subprocess.  I was
- suggesting that an official patch to that effect would be a good thing.

Yeah, that might be beyond me =)

Dave

Re: pgbench could not send data to client: Broken pipe

From: Tom Lane
David Kerr <dmk@mr-paradox.net> writes:
> should i be running pgbench differently? I tried increasing the # of threads
> but that didn't increase the number of backend's and i'm trying to simulate
> 2000 physical backend processes.

The odds are good that if you did get up that high, what you'd find is
pgbench itself being the bottleneck, not the server.  What I'd suggest
is running several copies of pgbench *on different machines*, all
beating on the one database server.  Collating the results will be a bit
more of a PITA than if there were only one pgbench instance, but it'd
be a truer picture of real-world behavior.

It's probably also worth pointing out that 2000 backend processes is
likely to be a loser anyhow.  If you're just doing this for academic
purposes, fine, but if you're trying to set up a real system for 2000
clients you almost certainly want to stick some connection pooling in
there.

            regards, tom lane

Re: pgbench could not send data to client: Broken pipe

From: David Kerr
On Wed, Sep 08, 2010 at 04:35:28PM -0400, Tom Lane wrote:
- David Kerr <dmk@mr-paradox.net> writes:
- > should i be running pgbench differently? I tried increasing the # of threads
- > but that didn't increase the number of backend's and i'm trying to simulate
- > 2000 physical backend processes.
-
- The odds are good that if you did get up that high, what you'd find is
- pgbench itself being the bottleneck, not the server.  What I'd suggest
- is running several copies of pgbench *on different machines*, all
- beating on the one database server.  Collating the results will be a bit
- more of a PITA than if there were only one pgbench instance, but it'd
- be a truer picture of real-world behavior.
-
- It's probably also worth pointing out that 2000 backend processes is
- likely to be a loser anyhow.  If you're just doing this for academic
- purposes, fine, but if you're trying to set up a real system for 2000
- clients you almost certainly want to stick some connection pooling in
- there.
-
-             regards, tom lane
-

Ah, that's a good idea, I'll have to give that a shot.

Actually, this is real: that's 2000 connections, connection-pooled out to
20k or so (although I'm pushing for closer to 1000 connections).

I know that's not the ideal way to go, but it's what I've got to work with.

It IS a huge box though...

Thanks

Dave

Re: pgbench could not send data to client: Broken pipe

From: "Kevin Grittner"
David Kerr <dmk@mr-paradox.net> wrote:

> Actually, this is real.. that's 2000 connections - connection
> pooled out to 20k or so. (although i'm pushing for closer to 1000
> connections).
>
> I know that's not the ideal way to go, but it's what i've got to
> work with.
>
> It IS a huge box though...

FWIW, my benchmarks (and I've had a couple people tell me this is
consistent with what they've seen) show best throughput and best
response time when the connection pool is sized such that the number
of active PostgreSQL connections is limited to about twice the
number of CPU cores plus the number of effective spindles.  Either
you've got one heck of a machine, or your "sweet spot" for the
connection pool will be well under 1000 connections.
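
To put a rough number on that with your 48 cores (the spindle count here
is a pure guess, since I don't know your storage): (2 * 48) + 16 = 112
active connections, give or take.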

It is important that your connection pool queues requests when
things are maxed out, and quickly submits a new request when
completion brings the number of busy connections below the maximum.

-Kevin

Re: pgbench could not send data to client: Broken pipe

From: David Kerr
On Wed, Sep 08, 2010 at 03:56:24PM -0500, Kevin Grittner wrote:
- David Kerr <dmk@mr-paradox.net> wrote:
-
- > Actually, this is real.. that's 2000 connections - connection
- > pooled out to 20k or so. (although i'm pushing for closer to 1000
- > connections).
- >
- > I know that's not the ideal way to go, but it's what i've got to
- > work with.
- >
- > It IS a huge box though...
-
- FWIW, my benchmarks (and I've had a couple people tell me this is
- consistent with what they've seen) show best throughput and best
- response time when the connection pool is sized such that the number
- of active PostgreSQL connections is limited to about twice the
- number of CPU cores plus the number of effective spindles.  Either
- you've got one heck of a machine, or your "sweet spot" for the
- connection pool will be well under 1000 connections.
-
- It is important that your connection pool queues requests when
- things are maxed out, and quickly submit a new request when
- completion brings the number of busy connections below the maximum.
-
- -Kevin

Hmm, I'm not following you. I've got 48 cores; that means my sweet-spot
active connections would be 96 (i.e., less than the default max_connections
shipped with PG), and this is a very, very expensive machine.

Now if I were to connection-pool that out to 15 people per connection,
that's 1440 users "total" able to use my app at one time (with only
96 actually doing anything). Not really great for a web-based app that
will have millions of users accessing it when we're fully ramped up.

I've got a few plans to spread the load out across multiple machines,
but at 1440 users per machine this wouldn't be sustainable.

I know that other people are hosting more than that on larger machines,
so I hope I'm OK.

Thanks

Dave

Re: pgbench could not send data to client: Broken pipe

From: "Kevin Grittner"
David Kerr <dmk@mr-paradox.net> wrote:

> Hmm, i'm not following you. I've got 48 cores. that means my
> sweet-spot active connections would be 96.

Plus your effective spindle count.  That can be hard to calculate,
but you could start by just counting spindles on your drive array.

> Now if i were to connection pool that out to 15 people per
> connection,

Where did you get that number?  We routinely route hundreds of
requests per second (many of them with 10 or 20 joins) from five or
ten thousand connected users through a pool of 30 connections.  It
started out bigger; we kept shrinking it until we hit our sweet
spot.
with the 16 cores, but the active portion of the database is cached,
which reduces our effective spindle count to zero.

> that's 1440 users "total" able to use my app at one time. (with
> only 96 actually doing anything). not really great for a web-based
> app that will have millions of users accessing it when we're fully
> ramped up.

Once you have enough active connections to saturate the resources,
adding more connections just adds contention for resources and
context switching cost -- it does nothing to help you service more
concurrent users.  The key is, as I mentioned before, to have the
pooler queue requests above the limit and promptly get them running
as slots are freed.

-Kevin

Re: pgbench could not send data to client: Broken pipe

From: David Kerr
On Wed, Sep 08, 2010 at 04:51:17PM -0500, Kevin Grittner wrote:
- David Kerr <dmk@mr-paradox.net> wrote:
-
- > Hmm, i'm not following you. I've got 48 cores. that means my
- > sweet-spot active connections would be 96.
-
- Plus your effective spindle count.  That can be hard to calculate,
- but you could start by just counting spindles on your drive array.

We've got this weird LPAR thing at our hosting center. It's tough
for me to do.

- > Now if i were to connection pool that out to 15 people per
- > connection,
-
- Where did you get that number?  We routinely route hundreds of
- requests per second (many of them with 10 or 20 joins) from five or
- ten thousand connected users through a pool of 30 connections.  It
- started out bigger, we kept shrinking it until we hit our sweet
- spot.  The reason we started bigger is we've got 40 spindles to go
- with the 16 cores, but the active portion of the database is cached,
- which reduces our effective spindle count to zero.

That's encouraging. I don't remember where I got the number from,
but my pooler will be Geronimo, so I think it came in that context.

- > that's 1440 users "total" able to use my app at one time. (with
- > only 96 actually doing anything). not really great for a web-based
- > app that will have millions of users accessing it when we're fully
- > ramped up.
-
- Once you have enough active connections to saturate the resources,
- adding more connections just adds contention for resources and
- context switching cost -- it does nothing to help you service more
- concurrent users.  The key is, as I mentioned before, to have the
- pooler queue requests above the limit and promptly get them running
- as slots are freed.

Right, I understand that. My assertion/hope is that the saturation point
on this machine should be higher than most.

Dave

Re: pgbench could not send data to client: Broken pipe

From: "Kevin Grittner"
David Kerr <dmk@mr-paradox.net> wrote:

> My assertian/hope is that the saturation point
> on this machine should be higher than most.

Here's another way to think about it -- how long do you expect your
average database request to run?  (Our top 20 transaction functions
average about 3ms per execution.)  What does that work out to in
transactions per second?  That's the TPS you can achieve *on each
connection* if your pooler is efficient.  If you've determined a
connection pool size based on hardware resources, divide your
anticipated requests per second by that pool size.  If the result is
less than the TPS each connection can handle, you're in good shape.
If it's higher, you may need more hardware to satisfy the load.
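
For example (with made-up numbers): at 3 ms per request a connection can
handle about 1000 / 3 = 333 requests per second, so a pool of 100
connections tops out somewhere around 33,000 requests per second.  If you
expect 20,000 requests per second you're fine; if you expect 50,000,
you're not.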

Of course, the only way to really know some of these numbers is to
test your actual application on the real hardware under realistic
load; but sometimes you can get a reasonable approximation from
early tests or "gut feel" based on experience with similar
applications.  I strongly recommend trying incremental changes to
various configuration parameters once you have real load, and
monitor the impact.  The optimal settings are often not what you
expect.

And if the pooling isn't producing the results you expect, you
should look at its configuration, or (if you can) try other pooler
products.

-Kevin

Re: pgbench could not send data to client: Broken pipe

From: David Kerr
On Wed, Sep 08, 2010 at 05:27:24PM -0500, Kevin Grittner wrote:
- David Kerr <dmk@mr-paradox.net> wrote:
-
- > My assertian/hope is that the saturation point
- > on this machine should be higher than most.
-
- Here's another way to think about it -- how long do you expect your
- average database request to run?  (Our top 20 transaction functions
- average about 3ms per execution.)  What does that work out to in
- transactions per second?  That's the TPS you can achieve *on each
- connection* if your pooler is efficient.  If you've determined a
- connection pool size based on hardware resources, divide your
- anticipated requests per second by that pool size.  If the result is
- less than the TPS each connection can handle, you're in good shape.
- If it's higher, you may need more hardware to satisfy the load.
-
- Of course, the only way to really know some of these numbers is to
- test your actual application on the real hardware under realistic
- load; but sometimes you can get a reasonable approximation from
- early tests or "gut feel" based on experience with similar
- applications.  I strongly recommend trying incremental changes to
- various configuration parameters once you have real load, and
- monitor the impact.  The optimal settings are often not what you
- expect.
-
- And if the pooling isn't producing the results you expect, you
- should look at its configuration, or (if you can) try other pooler
- products.
-
- -Kevin
-

Thanks for the insight. We're currently in performance testing of the
app. Currently the JVM is the bottleneck; once we get past that
I'm sure it will be the database, at which point I'll have the kind
of data you're talking about.

Dave

Re: pgbench could not send data to client: Broken pipe

From: Alvaro Herrera
Excerpts from David Kerr's message of Wed Sep 08 18:29:59 -0400 2010:

> Thanks for the insight. we're currently in performance testing of the
> app. Currently, the JVM is the bottleneck, once we get past that
> i'm sure it will be the database at which point I'll have the kind
> of data you're talking about.

Hopefully you're not running the JVM stuff on the same machine.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: pgbench could not send data to client: Broken pipe

From: David Kerr
On Thu, Sep 09, 2010 at 10:38:16AM -0400, Alvaro Herrera wrote:
- Excerpts from David Kerr's message of mié sep 08 18:29:59 -0400 2010:
-
- > Thanks for the insight. we're currently in performance testing of the
- > app. Currently, the JVM is the bottleneck, once we get past that
- > i'm sure it will be the database at which point I'll have the kind
- > of data you're talking about.
-
- Hopefully you're not running the JVM stuff in the same machine.

Nope, this server is 100% allocated to the database.

Dave

Re: pgbench could not send data to client: Broken pipe

From: Greg Smith
Kevin Grittner wrote:
> Of course, the only way to really know some of these numbers is to
> test your actual application on the real hardware under realistic
> load; but sometimes you can get a reasonable approximation from
> early tests or "gut feel" based on experience with similar
> applications.

And that latter part only works if your gut is as accurate as Kevin's.
For most people, even a rough direct measurement is much more useful
than any estimate.

Anyway, Kevin's point--that ultimately you cannot really be executing
more things at once than you have CPUs--is an accurate one to remember
here.  One reason to put connection pooling in front of your database is
that it cannot handle thousands of active connections at once without
switching between them very frequently.  That wastes both CPU and other
resources with contention that could be avoided.

If you expect, say, 1000 simultaneous users, and you have 48 CPUs, there
is only 48ms worth of CPU time available to each user per second on
average.  If you drop that to 100 users using a pooler, they'll each get
480ms worth of it.  But when the CPUs are busy enough to always have a
queued backlog, they will clear at best 48 * 1 second = 48000 ms of work
from that queue each second, no matter how you set up the ratios here.

Now, imagine that the average query takes 24ms.  The two scenarios work
out like this:

Without pooler:  takes 24 / 48 = 0.5 seconds to execute in parallel with
999 other processes

With pooler:  Worst-case, the pooler queue is filled and there are 900
users ahead of this one, representing 21600 ms worth of work to clear
before this request will become active.  The query waits 21600 / 48000 =
0.45 seconds to get runtime on the CPU.  Once it starts, though, it's
only contending with 99 other processes, so it gets 1/100 of the
available resources.  480 ms of CPU time executes per second for this
query; it runs in 0.05 seconds at that rate.  Total runtime:  0.45 +
0.05 = 0.5 seconds!
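
If you want to plug in your own numbers, the arithmetic is trivial to
script; this little program just reproduces the figures above (48 CPUs,
24 ms of CPU work per query, 1000 users versus a pool of 100 -- all
illustrative values, not measurements from David's system):

#include <stdio.h>

int main(void)
{
    double cpus = 48.0, query_ms = 24.0;
    double users = 1000.0, pool = 100.0;

    /* no pooler: each of the 1000 users gets a 48/1000 share of a CPU */
    double unpooled_s = (query_ms / 1000.0) / (cpus / users);

    /* pooler, worst case: 900 queued queries' worth of work ahead of you,
     * cleared at 48000 ms of CPU per second, then you run while competing
     * with only 99 other active queries */
    double wait_s = ((users - pool) * query_ms) / (cpus * 1000.0);
    double run_s  = (query_ms / 1000.0) / (cpus / pool);

    printf("without pooler: %.2f s\n", unpooled_s);        /* 0.50 */
    printf("with pooler:    %.2f s (%.2f + %.2f)\n",
           wait_s + run_s, wait_s, run_s);                  /* 0.50 */
    return 0;
}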

So the incoming query in this not completely contrived case (I just
picked the numbers to make the math even) takes the same amount of time
to deliver a result either way.  It's just a matter of whether it spends
that time waiting for a clear slice of CPU time, or fighting with a lot
of other processes the whole way.  Once the incoming connections exceed
CPUs by enough of a margin that a pooler can expect to keep all the CPUs
busy, it delivers results at the same speed as using a larger number of
connections.  And since the "without pooler" case assumes perfect
slicing of time into units, it's the unrealistic one; contention among
the 1000 processes will actually make it slower than the pooled version
in the real world.  You won't see anywhere close to 48000 ms worth of
work delivered per second anymore if the server is constantly losing its
CPU cache, swapping among an average of 21 connections per CPU.
per CPU, each of them should alternate between the two processes easily
enough.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: pgbench could not send data to client: Broken pipe

From: "Kevin Grittner"
Greg Smith <greg@2ndquadrant.com> wrote:
> Kevin Grittner wrote:
>> Of course, the only way to really know some of these numbers is
>> to test your actual application on the real hardware under
>> realistic load; but sometimes you can get a reasonable
>> approximation from early tests or "gut feel" based on experience
>> with similar applications.
>
> And that latter part only works if your gut is as accurate as
> Kevin's.  For most people, even a rough direct measurement is much
> more useful than any estimate.

:-)  Indeed, when I talk about "'gut feel' based on experience with
similar applications" I'm thinking of something like, "When I had a
query with the same number of joins against tables about this size
with the same number and types of key columns, metrics showed that
it took n ms and was CPU bound, and this new CPU and RAM hardware
benchmarks twice as fast, so I'll ballpark this at 2/3 the runtime
as a gut feel, and follow up with measurements as soon as
practical."  That may not have been entirely clear....

> So the incoming query in this not completely contrived case (I
> just picked the numbers to make the math even) takes the same
> amount of time to deliver a result either way.

I'm gonna quibble with you here.  Even if it gets done with the last
request at the same time either way (which discounts the very real
contention and context switch costs), if you release the thundering
herd of requests all at once they will all finish at about the same
time as that last request, while a queue allows a stream of
responses throughout.  Since results start coming back almost
immediately, and stream through evenly, your *average response time*
is nearly cut in half with the queue.  And that's without figuring
the network congestion issues of having all those requests complete
at the same time.
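
Using Greg's own numbers as a toy model (not a measurement): without the
queue, all 1000 queries finish together at about 0.5 seconds, so the
average response time is about 0.5 seconds.  With the queue, a pool's
worth of results comes back roughly every 0.05 seconds, so completions
are spread evenly across that same 0.5 second window and the average
lands somewhere around 0.27 seconds.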

In my experience you can expect the response time benefit of
reducing the size of your connection pool to match available
resources to be more noticeable than the throughput improvements.
This directly contradicts many people's intuition, revealing the
downside of "gut feel".

-Kevin

Re: pgbench could not send data to client: Broken pipe

From: Greg Smith
Kevin Grittner wrote:
> In my experience you can expect the response time benefit of
> reducing the size of your connection pool to match available
> resources to be more noticeable than the throughput improvements.
> This directly contradicts many people's intuition, revealing the
> downside of "gut feel".
>

This is why I focused on showing there won't actually be a significant
throughput reduction, because that part is the most counterintuitive I
think.  Accurately modeling the latency improvements of pooling requires
much harder math, and it depends quite a bit on whether incoming traffic
is even or in bursts.  Easier in many cases to just swallow expectations
and estimates and just try it instead.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us