Re: MusicBrainz postgres performance issues - Mailing list pgsql-performance

From michael@sqlexec.com
Subject Re: MusicBrainz postgres performance issues
Date
Msg-id 55062128.7060804@sqlexec.com
In response to Re: MusicBrainz postgres performance issues  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: MusicBrainz postgres performance issues  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-performance
I agree with your counter-argument about how a high max_connections "can" cause problems, but max_connections may not be part of the problem here.  There's a lot of "it depends" in there based on workload details, # of CPUs, RAM, etc.

I'm still waiting to find out how many CPUs are on this DB server.  Did I miss it somewhere in the email thread below?
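For reference, a quick way to gather those numbers on a Linux box (the psql connection details below are assumptions, adjust for your environment):

  # CPU count and total RAM
  nproc
  grep MemTotal /proc/meminfo

  # configured limit vs. what is actually connected right now
  psql -c "SHOW max_connections;"
  psql -c "SELECT count(*) FROM pg_stat_activity;"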

Sunday, March 15, 2015 8:07 PM

If all the connections are active at the same time (i.e. running
queries), they have to share the 32 cores somehow. Or I/O, if that's the
bottleneck.

In other words, you're not improving the throughput of the system,
you're merely increasing latencies. And it may easily happen that the
latency increase is not linear, but grows faster - because of locking,
context switches and other process-management overhead.

Imagine you have a query taking 1 second of CPU time. If you have 64
such queries running concurrently on 32 cores, each gets only 1/2 a CPU
and so takes >=2 seconds. With 500 queries, it's >=15 seconds per query, etc.
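Back-of-envelope version of that, assuming purely CPU-bound queries and fair scheduling (bc used just for the arithmetic):

  # minimum per-query latency ~= (active queries / cores) * CPU seconds per query
  echo "scale=1; 64/32 * 1" | bc     # ~2.0 seconds
  echo "scale=1; 500/32 * 1" | bc    # ~15.6 seconds

In practice locking and context-switch overhead push it above that floor.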

If those queries are acquiring the same locks (e.g. updating the same
rows, or so), you can imagine what happens ...

Also, if a query requires a certain amount of memory for part of its
plan, it now holds that memory for much longer too. That only
increases the chance of OOM issues.

It may work fine when most of the connections are idle, but it makes
storms like this possible.
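One way to see whether a storm like that is made of genuinely active queries or of mostly-idle pooled sessions (assuming you can run psql on the server):

  psql -c "SELECT state, count(*)
             FROM pg_stat_activity
            GROUP BY state
            ORDER BY count(*) DESC;"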


Sunday, March 15, 2015 7:55 PM
Why is 500 connections "insane"?  We have 32 CPUs with 96GB and max_connections = 3000, and we are doing fine, even when hitting our max concurrent connection peaks of around 4500.  At a previous site, we were using max_connections = 2000 on 24 CPUs and 64GB RAM, with about 1500 max concurrent connections.  So I wouldn't be too hasty in saying more than 500 is asking for trouble.  Just as long as you have your kernel resources set high enough to sustain it (SHMMAX, SHMALL, SEMMNI, and ulimits), and enough RAM for work_mem.
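If anyone wants to verify those limits on their own box, a minimal check on Linux (these only display the current settings; the right values depend on the workload):

  # shared memory and semaphore limits
  sysctl kernel.shmmax kernel.shmall kernel.sem

  # per-user limits for the postgres account
  su - postgres -c 'ulimit -n; ulimit -u'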
Sunday, March 15, 2015 7:41 PM
On 15.3.2015 23:47, Andres Freund wrote:
> On 2015-03-15 12:25:07 -0600, Scott Marlowe wrote:
>> Here's the problem with a large shared_buffers on a machine that's
>> getting pushed into swap. It starts to swap BUFFERs. Once buffers
>> start getting swapped you're not just losing performance, that huge
>> shared_buffers is now working against you because what you THINK are
>> buffers in RAM to make things faster are in fact blocks on a hard
>> drive being swapped in and out during reads. It's the exact opposite
>> of fast. :)
> IMNSHO that's tackling things from the wrong end. If 12GB of shared
> buffers drive your 48GB dedicated OLTP postgres server into swapping
> out actively used pages, the problem isn't the 12GB of shared
> buffers, but that you require so much memory for other things. That
> needs to be fixed.
I second this opinion.

As was already pointed out, the 500 connections is rather insane
(assuming the machine does not have hundreds of cores).

If there are memory pressure issues, it's likely because many queries
are performing memory-expensive operations at the same time (might even
be a bad estimate causing hashagg to use much more than work_mem).
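A rough way to check for that is to compare work_mem against a representative aggregate's actual row counts (the table and query here are hypothetical):

  psql -c "SHOW work_mem;"
  psql -c "EXPLAIN ANALYZE SELECT some_col, count(*) FROM some_table GROUP BY some_col;"

If the HashAggregate's actual rows come out far above the planner's estimate, the hash table can grow well past work_mem.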


> But! We haven't even established that swapping is an actual problem
> here. The ~2GB of swapped out memory could just as well be the java raid
> controller management monstrosity or something similar. Those pages
> won't ever be used and thus can better be used to buffer IO.
>
> You can check what's actually swapped out using:
> grep ^VmSwap /proc/[0-9]*/status|grep -v '0 kB'
>
> For swapping to be actually harmful you need to have pages that are
> regularly swapped in. vmstat will tell.
I've already asked for vmstat logs, so let's wait.
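For completeness, the vmstat check referred to above (the interval is arbitrary; run it on the DB host):

  # watch the si/so columns (memory swapped in/out per second);
  # sustained non-zero si is the harmful case
  vmstat 5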

> In a concurrent OLTP workload (~450 established connections do
> suggest that) with a fair amount of data keeping the hot data set in
> shared_buffers can significantly reduce problems. Constantly
> searching for victim buffers isn't a nice thing, and that will happen
> if your most frequently used data doesn't fit into s_b. On the other
> hand, if your data set is so large that even the hottest part doesn't
> fit into memory (perhaps because there's no hottest part as there's
> no locality at all), a smaller shared buffers can make things more
> efficient, because the search for replacement buffers is cheaper with
> a smaller shared buffers setting.
I've met many systems with max_connections values this high, and it was
mostly idle connections because of separate connection pools on each
application server. So mostly idle (90% of the time), but at peak time
all the application servers want to do stuff at the same time. And it
all goes KABOOOM! just like here.
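The usual way to tame that failure mode is a single shared pooler in front of the database rather than one pool per application server. A minimal pgbouncer sketch, with names and sizes purely as placeholders:

  ; pgbouncer.ini (illustrative only)
  [databases]
  musicbrainz = host=127.0.0.1 port=5432 dbname=musicbrainz

  [pgbouncer]
  listen_port = 6432
  pool_mode = transaction
  default_pool_size = 64       ; on the order of a few per core, tune to workload
  max_client_conn = 3000       ; client connections the pooler will accept

The application servers then point at port 6432, and the database itself only ever sees default_pool_size backends per database/user pair.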


Sunday, March 15, 2015 6:47 PM

IMNSHO that's tackling things from the wrong end. If 12GB of shared
buffers drive your 48GB dedicated OLTP postgres server into swapping out
actively used pages, the problem isn't the 12GB of shared buffers, but
that you require so much memory for other things. That needs to be
fixed.

But! We haven't even established that swapping is an actual problem
here. The ~2GB of swapped out memory could just as well be the java raid
controller management monstrosity or something similar. Those pages
won't ever be used and thus can better be used to buffer IO.

You can check what's actually swapped out using:
grep ^VmSwap /proc/[0-9]*/status|grep -v '0 kB'

For swapping to be actually harmful you need to have pages that are
regularly swapped in. vmstat will tell.

In a concurrent OLTP workload (~450 established connections do suggest
that) with a fair amount of data keeping the hot data set in
shared_buffers can significantly reduce problems. Constantly searching
for victim buffers isn't a nice thing, and that will happen if your most
frequently used data doesn't fit into s_b. On the other hand, if your
data set is so large that even the hottest part doesn't fit into memory
(perhaps because there's no hottest part as there's no locality at all),
a smaller shared buffers can make things more efficient, because the
search for replacement buffers is cheaper with a smaller shared buffers
setting.

Greetings,

Andres Freund

Sunday, March 15, 2015 2:25 PM

Here's the problem with a large shared_buffers on a machine that's
getting pushed into swap. It starts to swap BUFFERs. Once buffers
start getting swapped you're not just losing performance, that huge
shared_buffers is now working against you because what you THINK are
buffers in RAM to make things faster are in fact blocks on a hard
drive being swapped in and out during reads. It's the exact opposite
of fast. :)


