Thread: Recursive query performance issue

Recursive query performance issue

From: Jamie Koceniak

Version:

-----------------------------------------------------------------------------------------------

PostgreSQL 9.1.14 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit

 

Query Plan

http://explain.depesz.com/s/4s37

 

Normally, this query takes around 200-300 ms to execute.

However, when several queries are run concurrently, query performance drops to 30-60 seconds.

 

 

Re: Recursive query performance issue

From: Pavel Stehule
Hi



2015-10-20 19:34 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:

Normally, this query takes around 200-300 ms to execute.

However, when several queries are run concurrently, query performance drops to 30-60 seconds.

 


There can be a few reasons:

1. Locking - are you sure your queries aren't waiting on locks? (See the sketch below.)

2. Cache instability - is there high I/O load? You can try increasing effective_cache_size (or decreasing it if you do not have enough memory).
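
For example, a minimal lock-wait check on 9.1 (the database name here is taken from later in this thread; the procpid/waiting/current_query columns were renamed in newer releases):

# any rows returned are backends currently blocked on a lock
psql -d adama_prod -c "SELECT procpid, waiting, current_query FROM pg_stat_activity WHERE waiting;"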

Regards

Pavel
 

 


Re: Recursive query performance issue

From: Merlin Moncure
On Tue, Oct 20, 2015 at 12:34 PM, Jamie Koceniak
<jkoceniak@mediamath.com> wrote:
> Normally, this query takes around 200-300 ms to execute.
>
> However, when several queries are run concurrently, query performance drops
> to 30-60 seconds.

Please define 'several'.  Essential information here is a capture of
'top' and possibly 'perf top'.  Also, if the problem is storage related,
iostat can be very useful (or vmstat in a pinch).

FYI, you can use pgbench with -f to measure the concurrency
performance of any query.
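
A minimal sketch of that (the script file name and client counts are placeholders):

# put the problem query into recursive_query.sql, then replay it
# from 16 concurrent clients for 60 seconds:
pgbench -n -f recursive_query.sql -c 16 -j 4 -T 60 adama_prod
# -n skips pgbench's own vacuum step; -c/-j set clients/threads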

The very first thing to rule out is a storage bottleneck via measured
iowait.  Assuming that's not the bottleneck, this problem is interesting if:
*) Scaling is much worse than it should be
*) You can confirm this on more modern postgres  (interesting problems
are only interesting if they are unsolved)

merlin


Re: Recursive query performance issue

From: Pavel Stehule


2015-10-21 19:55 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:

Hi Pavel,

 

Thanks for the reply.

 

1. The queries aren’t waiting on any locks.

The query has a recursive join that uses a table with only 80k records and that table is not updated often.

 

2. The I/O load was not high. CPU utilization was very high and load was very high.

We have a large effective_cache_size = 512GB (25% of total memory)


So your server has 2TB of RAM? That is not a usual server, so this issue can be pretty strange :(

What is the size of shared memory? It is probably significantly lower than effective_cache_size. Try reducing effective_cache_size to be lower than shared_buffers.
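
For example (read-only checks; the database name is a placeholder):

psql -d adama_prod -c "SHOW shared_buffers;"
psql -d adama_prod -c "SHOW effective_cache_size;"
df -h /dev/shm    # size of the tmpfs shared memory mount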

Regards

Pavel

 

 



Re: Recursive query performance issue

From: Pavel Stehule


2015-10-21 20:51 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:

Hi Pavel,

 

Or were you referring to SHMMAX?


The value of shared_buffers - run the SQL statement SHOW shared_buffers;

Regards

Pavel

 

Thanks

 

From: Jamie Koceniak
Sent: Wednesday, October 21, 2015 11:40 AM
To: 'Pavel Stehule'
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Recursive query performance issue

 

Ok

 

df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           406G     0  406G   0% /run/shm

 

Ok I will try lowering it.

 



Re: Recursive query performance issue

From: Pavel Stehule


2015-10-21 21:32 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:

adama_prod=# SHOW shared_buffers;
 shared_buffers
----------------
 64GB


Can you try increasing shared_buffers to 200GB and decreasing effective_cache_size to 180GB? If it is possible - I am not sure this setting is good for production usage, but the result can be interesting for bottleneck identification.
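
In postgresql.conf that would be (a diagnostic experiment only, not a recommended production setting; changing shared_buffers requires a restart):

shared_buffers = 200GB          # was 64GB
effective_cache_size = 180GB    # was 512GB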

 



Re: Recursive query performance issue

From: Merlin Moncure
On Wed, Oct 21, 2015 at 2:45 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2015-10-21 21:32 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
>>
>> adama_prod=# SHOW shared_buffers;
>>
>> shared_buffers
>>
>> ----------------
>>
>> 64GB
>
>
> can you try to increase shared buffers to 200GB and decrease effective cache
> size to 180GB? If it is possible - I am not sure if this setting is good
> for production usage, but the result can be interesting for bottleneck
> identification.

we need to see a snapshot from
*) top
*) perf top

merlin


Re: Recursive query performance issue

From: vincent elschot


On 20-10-15 19:34, Jamie Koceniak wrote:

Normally, this query takes around 200-300 ms to execute.

However, when several queries are run concurrently, query performance drops to 30-60 seconds.

 

 

Is the concurrency the cause or the result of the slowdown?
Are you executing the same query with the same parameters, or do the parameters differ, perhaps making PostgreSQL choose a different query plan?
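
For example, comparing plans across parameter values (table and column names here are only placeholders; EXPLAIN prints the plan without executing the query):

psql -d adama_prod -c "EXPLAIN SELECT * FROM category_tree WHERE root_id = 1;"
psql -d adama_prod -c "EXPLAIN SELECT * FROM category_tree WHERE root_id = 2;"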

Re: Recursive query performance issue

From: Pavel Stehule
Hi

This extremely high load looks like a different issue - maybe a spinlock issue or a virtual memory issue.

You probably need some low-level debugging tools like perf or dtrace :(

http://www.postgresql.org/message-id/20131206095629.GI7814@awork2.anarazel.de

When was your last PostgreSQL upgrade?

The output of "perf top" while this issue is active is really needed.

Regards

Pavel

2015-10-23 19:45 GMT+02:00 Jamie Koceniak <jkoceniak@mediamath.com>:
Hi,

We just had the performance problem again today.
Here is some of the top output. Unfortunately, we don't have perf top installed.

top - 16:22:16 up 29 days, 13:00,  2 users,  load average: 164.63, 158.62, 148.52
Tasks: 1369 total, 181 running, 1188 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.2 us,  0.7 sy,  0.0 ni, 93.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem:   2068265 total,   433141 used,  1635124 free,      586 buffers
MiB Swap:     7812 total,        0 used,     7812 free,   412641 cached

<snip - full process listing appears in Jamie's message below>

Re: Recursive query performance issue

From: Jamie Koceniak

Hi Pavel,

 

Thanks for the reply.

 

1. The queries aren’t waiting on any locks.

The query has a recursive join that uses a table with only 80k records and that table is not updated often.

 

2. The I/O load was not high. CPU utilization was very high and load was very high.

We have a large effective_cache_size = 512GB (25% of total memory)

 

Thanks,

Jamie

 


Re: Recursive query performance issue

From: Jamie Koceniak

Ok

 

df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           406G     0  406G   0% /run/shm

 

Ok I will try lowering it.

 


Re: Recursive query performance issue

From: Jamie Koceniak

Hi Pavel,

 

Or were you referring to SHMMAX?

 

Thanks

 


Re: Recursive query performance issue

From: Jamie Koceniak

adama_prod=# SHOW shared_buffers;
 shared_buffers
----------------
 64GB

 


Re: Recursive query performance issue

From: Jamie Koceniak
Hi,

We just had the performance problem again today.
Here is some of the top output. Unfortunately, we don't have perf top installed.

top - 16:22:16 up 29 days, 13:00,  2 users,  load average: 164.63, 158.62, 148.52
Tasks: 1369 total, 181 running, 1188 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.2 us,  0.7 sy,  0.0 ni, 93.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem:   2068265 total,   433141 used,  1635124 free,      586 buffers
MiB Swap:     7812 total,        0 used,     7812 free,   412641 cached

   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 81745 postgres  20   0 65.7g  51m  34m R   101  0.0   0:09.20 postgres: user1 db 0.0.0.2(52307) SELECT
 81782 postgres  20   0 65.7g  51m  34m R   101  0.0   0:08.50 postgres: user1 db 0.0.0.3(44630) SELECT
 81797 postgres  20   0 65.7g  51m  34m R   101  0.0   0:08.03 postgres: user1 db 0.0.0.6(60752) SELECT
 67103 postgres  20   0 65.7g  81m  56m R    97  0.0   2:01.89 postgres: user1 db 0.0.0.4(46337) SELECT
 82527 postgres  20   0 65.7g  25m  20m R    93  0.0   0:02.35 postgres: user1 db 0.0.0.2(52490) SELECT
 82559 postgres  20   0 65.7g  25m  20m R    93  0.0   0:02.17 postgres: user1 db 0.0.0.2(52496) SELECT
 82728 postgres  20   0 65.7g  80m  76m R    93  0.0   0:00.60 postgres: user1 db 0.0.0.6(60957) SELECT
 65588 postgres  20   0 65.7g  76m  56m R    89  0.0   2:12.27 postgres: user1 db 0.0.0.6(57195) SELECT
 80594 postgres  20   0 65.7g  34m  28m R    89  0.0   0:22.81 postgres: user1 db 0.0.0.2(52071) SELECT
 25176 postgres  20   0 65.7g  74m  57m R    85  0.0   7:24.42 postgres: user1 db 0.0.0.2(39410) SELECT
 82182 postgres  20   0 65.7g 513m 502m R    85  0.0   0:04.85 postgres: user1 db 0.0.0.4(49789) SELECT
 82034 postgres  20   0 65.7g 523m 510m R    81  0.0   0:05.79 postgres: user1 db 0.0.0.3(44683) SELECT
 82439 postgres  20   0 65.7g 262m 258m R    81  0.0   0:02.64 postgres: user1 db 0.0.0.6(60887) SELECT
 82624 postgres  20   0 65.7g 148m 143m R    81  0.0   0:01.20 postgres: user1 db 0.0.0.4(49888) SELECT
 82637 postgres  20   0 65.7g 139m 134m R    81  0.0   0:01.17 postgres: user1 db 0.0.0.3(44805) SELECT
 82669 postgres  20   0 65.7g 119m 114m R    81  0.0   0:00.97 postgres: user1 db 0.0.0.6(60939) SELECT
 82723 postgres  20   0 65.7g  79m  75m R    81  0.0   0:00.56 postgres: user1 db 0.0.0.4(49907) SELECT
 29160 postgres  20   0 65.7g  79m  54m R    77  0.0   6:52.13 postgres: user1 db 0.0.0.6(48802) SELECT
 51095 postgres  20   0 65.7g  81m  57m R    77  0.0   4:01.51 postgres: user1 db 0.0.0.4(42914) SELECT
 81833 postgres  20   0 65.7g 528m 515m R    77  0.0   0:07.23 postgres: user1 db 0.0.0.3(44644) SELECT
 81978 postgres  20   0 65.7g 528m 515m R    77  0.0   0:06.05 postgres: user1 db 0.0.0.2(52364) SELECT
 82099 postgres  20   0 65.7g 523m 510m R    77  0.0   0:05.18 postgres: user1 db 0.0.0.3(44692) SELECT
 82111 postgres  20   0 65.7g 523m 510m R    77  0.0   0:05.14 postgres: user1 db 0.0.0.4(49773) SELECT
 82242 postgres  20   0 65.7g 433m 429m R    77  0.0   0:04.27 postgres: user1 db 0.0.0.2(52428) SELECT
 82292 postgres  20   0 65.7g 407m 402m R    77  0.0   0:04.10 postgres: user1 db 0.0.0.2(52440) SELECT
 82408 postgres  20   0 65.7g 292m 288m R    77  0.0   0:02.98 postgres: user1 db 0.0.0.4(49835) SELECT
 82542 postgres  20   0 65.7g 207m 202m R    77  0.0   0:01.98 postgres: user1 db 0.0.0.4(49868) SELECT
 63638 postgres  20   0 65.7g  80m  56m R    73  0.0   2:30.10 postgres: user1 db 0.0.0.2(48699) SELECT
 71572 postgres  20   0 65.7g  80m  56m R    73  0.0   1:31.13 postgres: user1 db 0.0.0.2(50223) SELECT
 80580 postgres  20   0 65.7g  34m  28m R    73  0.0   0:22.93 postgres: user1 db 0.0.0.2(52065) SELECT
 81650 postgres  20   0 65.8g 622m 555m R    73  0.0   0:08.84 postgres: user1 db 0.0.0.2(52290) SELECT
 81728 postgres  20   0 65.7g 523m 510m R    73  0.0   0:08.28 postgres: user1 db 0.0.0.4(49684) SELECT
 81942 postgres  20   0 65.7g 528m 515m R    73  0.0   0:06.46 postgres: user1 db 0.0.0.2(52355) SELECT
 81958 postgres  20   0 65.7g 528m 514m R    73  0.0   0:06.48 postgres: user1 db 0.0.0.4(49744) SELECT
 81980 postgres  20   0 65.7g 528m 515m R    73  0.0   0:06.02 postgres: user1 db 0.0.0.3(44671) SELECT
 82007 postgres  20   0 65.7g 523m 510m R    73  0.0   0:06.27 postgres: user1 db 0.0.0.3(44676) SELECT
 82374 postgres  20   0 65.7g 367m 362m R    73  0.0   0:03.48 postgres: user1 db 0.0.0.6(60873) SELECT
 82385 postgres  20   0 65.7g 310m 306m R    73  0.0   0:03.03 postgres: user1 db 0.0.0.6(60876) SELECT
 82520 postgres  20   0 65.7g 220m 215m R    73  0.0   0:02.00 postgres: user1 db 0.0.0.3(44785) SELECT
 82676 postgres  20   0 65.7g 116m 111m R    73  0.0   0:00.90 postgres: user1 db 0.0.0.2(52531) SELECT
 18471 postgres  20   0 65.7g  73m  56m R    69  0.0   8:14.08 postgres: user1 db 0.0.0.6(46144) SELECT
 43890 postgres  20   0 65.7g  76m  56m R    69  0.0   5:04.46 postgres: user1 db 0.0.0.3(36697) SELECT
 46130 postgres  20   0 65.7g  70m  57m R    69  0.0   4:46.56 postgres: user1 db 0.0.0.4(41871) SELECT
 55604 postgres  20   0 65.7g  81m  57m R    69  0.0   3:27.67 postgres: user1 db 0.0.0.3(39292) SELECT
 59139 postgres  20   0 65.7g  81m  57m R    69  0.0   3:01.18 postgres: user1 db 0.0.0.2(47670) SELECT
 63523 postgres  20   0 65.7g  80m  56m R    69  0.0   2:28.04 postgres: user1 db 0.0.0.2(48680) SELECT
 81707 postgres  20   0 65.7g 528m 515m S    69  0.0   0:08.44 postgres: user1 db 0.0.0.6(60737) SELECT
 81830 postgres  20   0 65.7g 523m 510m R    69  0.0   0:07.60 postgres: user1 db 0.0.0.4(49707) SELECT
 81932 postgres  20   0 65.7g 528m 515m R    69  0.0   0:06.65 postgres: user1 db 0.0.0.2(52352) SELECT
 81950 postgres  20   0 65.7g 528m 515m R    69  0.0   0:05.92 postgres: user1 db 0.0.0.6(60783) SELECT
 81973 postgres  20   0 65.7g 522m 510m R    69  0.0   0:06.18 postgres: user1 db 0.0.0.6(60789) SELECT
 82193 postgres  20   0 65.7g 487m 479m R    69  0.0   0:04.61 postgres: user1 db 0.0.0.2(52415) SELECT
 82358 postgres  20   0 65.7g 299m 295m R    69  0.0   0:03.11 postgres: user1 db 0.0.0.2(52453) SELECT
 82372 postgres  20   0 65.7g 318m 313m R    69  0.0   0:03.22 postgres: user1 db 0.0.0.4(49827) SELECT
 82381 postgres  20   0 65.7g 331m 326m R    69  0.0   0:03.30 postgres: user1 db 0.0.0.3(44757) SELECT
 82404 postgres  20   0 65.7g 294m 289m R    69  0.0   0:02.86 postgres: user1 db 0.0.0.3(44761) SELECT
 82415 postgres  20   0 65.7g 270m 266m R    69  0.0   0:02.80 postgres: user1 db 0.0.0.3(44767) SELECT
 82521 postgres  20   0 65.7g 209m 205m R    69  0.0   0:02.00 postgres: user1 db 0.0.0.3(44786) SELECT
 82526 postgres  20   0 65.7g  35m  29m R    69  0.0   0:01.20 postgres: user1 db 0.0.0.6(60906) SELECT
 82550 postgres  20   0 65.7g 188m 184m R    69  0.0   0:01.72 postgres: user1 db 0.0.0.4(49870) SELECT
 82587 postgres  20   0 65.7g 183m 178m R    69  0.0   0:01.64 postgres: user1 db 0.0.0.4(49882) SELECT
 82683 postgres  20   0 65.7g  97m  93m R    69  0.0   0:00.77 postgres: user1 db 0.0.0.4(49899) SELECT
 82685 postgres  20   0 65.7g 103m  99m R    69  0.0   0:00.84 postgres: user1 db 0.0.0.2(52532) SELECT
 82687 postgres  20   0 65.7g 109m 104m R    69  0.0   0:00.85 postgres: user1 db 0.0.0.3(44809) SELECT
 82712 postgres  20   0 65.7g  68m  64m R    69  0.0   0:00.55 postgres: user1 db 0.0.0.3(44814) SELECT
 82715 postgres  20   0 65.7g  75m  70m R    69  0.0   0:00.58 postgres: user1 db 0.0.0.4(49905) SELECT
 19548 postgres  20   0 65.7g  79m  56m R    65  0.0   8:02.44 postgres: user1 db 0.0.0.2(37887) SELECT
 36714 postgres  20   0 65.7g  80m  56m R    65  0.0   5:56.08 postgres: user1 db 0.0.0.3(35177) SELECT
 43599 postgres  20   0 65.7g  80m  56m R    65  0.0   5:05.03 postgres: user1 db 0.0.0.3(36638) SELECT


Re: Recursive query performance issue

From: Merlin Moncure
On Fri, Oct 23, 2015 at 12:45 PM, Jamie Koceniak
<jkoceniak@mediamath.com> wrote:
> Hi,
>
> We just had the performance problem again today.
> Here is some of the top output. Unfortunately, we don't have perf top installed.
>
> top - 16:22:16 up 29 days, 13:00,  2 users,  load average: 164.63, 158.62, 148.52
> Tasks: 1369 total, 181 running, 1188 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  6.2 us,  0.7 sy,  0.0 ni, 93.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> MiB Mem:   2068265 total,   433141 used,  1635124 free,      586 buffers
> MiB Swap:     7812 total,        0 used,     7812 free,   412641 cached
>
>    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
>  81745 postgres  20   0 65.7g  51m  34m R   101  0.0   0:09.20 postgres: user1 db 0.0.0.2(52307) SELECT
>  81782 postgres  20   0 65.7g  51m  34m R   101  0.0   0:08.50 postgres: user1 db 0.0.0.3(44630) SELECT
>  81797 postgres  20   0 65.7g  51m  34m R   101  0.0   0:08.03 postgres: user1 db 0.0.0.6(60752) SELECT
<snip>

OK, this rules out iowait.

Load is 160+, yet the system is reporting 6.2% user, 93.1% idle, 0 iowait.
This is very odd.
*) how many processors do you have?
*) Can we have more details about the hardware platform?
*) Is this system virtualized? If so, what solution?
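
For reference, these can be answered with standard tools:

nproc                               # logical CPU count
lscpu                               # sockets, cores, NUMA layout; shows a hypervisor vendor if virtualized
grep -ci hypervisor /proc/cpuinfo   # non-zero output also suggests a virtualized guest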

we need a perf top and a capture of 'vmstat 1' for context switches

merlin


Re: Recursive query performance issue

From: Jamie Koceniak
Had the issue again today.

Here is vmstat:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
24  0      0 1591718656 605656 499370336    0    0     0   371    0    0  7  1 93  0
25  0      0 1591701376 605656 499371936    0    0     0   600 13975 20168 20  1 79  0
26  0      0 1591654784 605656 499372064    0    0     0  5892 12725 14627 20  1 79  0
25  0      0 1591614336 605656 499372128    0    0     0   600 11665 12642 21  1 78  0
27  0      0 1591549952 605656 499372192    0    0     0   408 16939 23387 23  1 76  0
29  0      0 1591675392 605656 499372288    0    0     0   836 15380 22564 23  1 76  0
27  0      0 1591608704 605656 499372352    0    0     0   456 17593 27955 23  1 76  0
34  0      0 1591524608 605656 499372480    0    0     0  5904 18963 30915 23  1 75  0
23  0      0 1591632384 605656 499372576    0    0     0   704 18190 31002 22  1 77  0
25  0      0 1591551360 605656 499372640    0    0     0   944 12532 14095 21  1 78  0
24  0      0 1591613568 605656 499372704    0    0     0   416 11183 12553 20  1 79  0
23  0      0 1591531520 605656 499372768    0    0     0   400 12648 15540 19  1 80  0
22  0      0 1591510528 605656 499372800    0    0     0  6024 14670 21993 19  1 80  0
31  0      0 1591388800 605656 499372896    0    0     0   472 20605 28242 20  1 79  0

We have a 120 CPU server :)

processor       : 119
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E7-4880 v2 @ 2.50GHz



Re: Recursive query performance issue

From: Merlin Moncure
On Sat, Nov 14, 2015 at 12:58 AM, Jamie Koceniak
<jkoceniak@mediamath.com> wrote:
> Had the issue again today.
>
> Here is vmstat:
<snip>
> We have a 120 CPU server :)
> model name      : Intel(R) Xeon(R) CPU E7-4880 v2 @ 2.50GHz

Per the numbers above, this server is very healthy.  Something is not
adding up here: I would really have liked to see a snapshot from 'top'
and 'perf top' taken at the same time.  Via top we could have seen
whether some of the processors were completely loaded down while others
were not being utilized at all.  That would suggest a problem with the
operating system, likely NUMA related.

*) Are you counting hyperthreading to get to the 120 CPU count?

*) Is this server virtualized?

*) what is the output of:
lscpu | grep NUMA

*) do you have 'taskset' installed?  Can we check affinity via:
taskset -c -p <pid>

where <pid> is the pid of a few randomly sampled postgres processes at work

*) Can you report exact kernel version

*) what is the output of (see the sketch after this list):
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

*) Is installing a newer postgres an option?  Configuring highly SMP
systems for reliable scaling may require some progressive thinking.
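
If transparent hugepages turn out to be enabled, a common experiment on kernels of this era is to switch them off at runtime and see whether the load spikes stop (a sketch - these are the standard sysfs paths, and the change lasts until reboot):

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag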

merlin