Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6 - Mailing list pgsql-hackers

From	Naytro Naytro
Subject	Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6
Date	March 10, 2017 04:38:46
Msg-id	CAHgVxQEZfP+UK8O1wqq+tn-yDsbKqBAhRMG8EhNCS+z_fXLLHw@mail.gmail.com
In response to	Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6 (Andres Freund <andres@anarazel.de>)
List	pgsql-hackers

2017-03-09 20:19 GMT+01:00 Andres Freund <andres@anarazel.de>:

Hi,

On 2017-03-09 13:47:35 +0100, Naytro Naytro wrote:
> We are having some performance issues after we upgraded to the newest
> version of PostgreSQL; before that, everything was fast and smooth.
>
> Upgrade was done by pg_upgrade from 9.4 directly to 9.6.1. Now we
> upgraded to 9.6.2 with no improvement.
>
> Some information about our setup: Freebsd, Solaris (SmartOS), simple
> master-slave using streaming replication.

Which node is on which of those, and where is the high load?

High load is only on the slaves: FreeBSD (master+slave) and Solaris (only slaves).

> Problem:
> Very high system CPU when master is streaming replication data, CPU
> goes up to 77%. Only one process is generating this load, it's a
> postgresql startup process. When I attached a truss to this process I
> saw a lot of read calls with almost the same number of errors (EAGAIN).

Hm. Just to clarify: The load is on the *receiving* side, in the startup
process? Because the load doesn't quite look that way...

Yes

> read(6,0x7fffffffa0c7,1) ERR#35 'Resource temporarily unavailable'
>
> Descriptor 6 is a pipe

That's presumably a latch's internal pipe. Could you redo that
truss/strace with timestamps attached? Does truss show signals
received? The above profile would e.g. make a lot more sense if not. Is
the wal receiver sending signals?
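
For readers unfamiliar with the pattern behind that truss line, here is a minimal sketch of how a non-blocking read on an empty pipe produces EAGAIN (reported as ERR#35 on FreeBSD). It is an illustration only, not PostgreSQL's code: the helper name drain_pipe and the one-byte chunk size are assumptions chosen to mirror the read(6, ..., 1) call in the trace, and it makes no claim about which pipe descriptor 6 actually is.

```python
import os

def drain_pipe(read_fd, chunk=1):
    """Read from a non-blocking pipe until it is empty.

    Hypothetical helper for illustration; returns (bytes_drained, eagain_count).
    """
    drained = 0
    eagain = 0
    while True:
        try:
            data = os.read(read_fd, chunk)
        except BlockingIOError:
            # The pipe is empty: the kernel returned EAGAIN/EWOULDBLOCK
            # instead of blocking. truss reports this as a failed read().
            eagain += 1
            break
        if not data:
            break  # the write end was closed
        drained += len(data)
    return drained, eagain

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)

os.write(write_fd, b"\0")      # a single wakeup byte
first = drain_pipe(read_fd)    # one byte read, then one EAGAIN
second = drain_pipe(read_fd)   # nothing pending: an immediate EAGAIN

os.close(read_fd)
os.close(write_fd)
```

Every poll of an empty non-blocking pipe costs one system call that fails with EAGAIN, so a process that checks such a pipe in a tight loop shows up as high system CPU with a read count close to its error count.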

Truss from Solaris: http://pastebin.com/WajedZ8Y and FreeBSD: http://pastebin.com/DB5iT8na

FreeBSD truss should show signals by default

DTrace from Solaris: http://pastebin.com/u03uVKbr

> The read call tries to read one byte over and over. I looked at the
> source code, and I think this file is responsible for this behavior:
> src/backend/storage/ipc/latch.c. There was no such file in 9.4.

It was "just" moved (and expanded), used to be at
src/backend/port/unix_latch.c.

There normally shouldn't be that much "latch traffic" in the startup
process, we'd expect to block from within WaitForWALToBecomeAvailable().

Hm. Any chance you've configured a recovery_min_apply_delay? Although
I'd expect more timestamp calls in that case.

No, we don't have this option configured

Greetings,

Andres Freund
