Thread: 8.3 on FreeBSD 6.3, sudden performance degradations

8.3 on FreeBSD 6.3, sudden performance degradations

From: Ivan Zolotukhin
Hello,

We are experiencing sudden performance degradations on a PostgreSQL server
used as the backend for a fairly large web application.

It is a dedicated PostgreSQL 8.3.3 server with 16 GB of RAM and 2 x
quad-core Xeon 2.33 GHz CPUs, running FreeBSD 6.3-PRERELEASE.
postgresql.conf is tuned to match the current configuration, and
PostgreSQL routinely serves 200+ queries/sec (taken from pgbouncer logs;
pgbouncer usually funnels all client connections into no more than 15
PostgreSQL backends) on a 19 GB database (the active portion is smaller
than RAM, though) without any problems: average query time is 10 ms and
load average is around 1. But sometimes the load average jumps to 30
(about 90% of it is system time and only ~10% is userspace) and average
query time increases to 1000 ms. The only other thing worth noting is the
context switch rate, which is usually around 5000/s and during these
extreme load events is somewhere between 15000 and 30000/s; that does not
seem too high to me.
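As a rough back-of-the-envelope check on these numbers, Little's law
(average queries in flight = arrival rate multiplied by mean time per
query) can be applied. This is only a sketch assuming steady state and
assuming the 200 queries/sec arrival rate persists during a spike; the
function name is hypothetical.

```python
def queries_in_flight(arrival_rate_per_sec: float, mean_latency_sec: float) -> float:
    """Little's law, L = lambda * W: average number of queries in flight."""
    return arrival_rate_per_sec * mean_latency_sec


POOL_SIZE = 15  # pgbouncer usually keeps no more than ~15 backends (from the post)

normal = queries_in_flight(200, 0.010)  # 10 ms average query time
spike = queries_in_flight(200, 1.0)     # 1000 ms average; assumes the rate stays at 200/s

print(f"normal: {normal:.0f} queries in flight vs {POOL_SIZE} backends")
print(f"spike:  {spike:.0f} queries in flight vs {POOL_SIZE} backends")
```

Under these assumptions about 2 backends are busy on average in normal
operation, while a 1000 ms average would need about 200 concurrent
queries, far more than the ~15 backends, so either requests queue in
pgbouncer or throughput drops. This only quantifies the symptom; it says
nothing about why system CPU time rises.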

Here are simultaneous iostat 5 and vmstat 5 outputs captured during an
extreme load event, just to show that there are no I/O problems:

vmstat 5
 procs     memory              page          disk      faults         cpu
 r  b w     avm    fre   flt re pi po   fr sr am0   in    sy    cs us sy id
16 67 2 2392632 753664   193  0  0  0  230  1   0  195   211   111  7 16 77
28 77 0 2328792 793424 34813  0  0  0 4351  0  41 1913 21230 20337 14 86  0
23 84 1 2328388 942960 32744  0  0  0 1657  0  37 1655 16272 14521 12 88  0
18 86 2 2331980 869076 42569  0  0  0 9280  0  81 2481 15525 12303  3 97  0
10 65 0 2348716 817712 38242  0  0  0  892  0  38 1454 11198 10788  8 92  0
16 92 0 2351880 794680 34259  0  0  0  992  0  23 1197  8870  8950  7 93  0
10 89 0 2356904 776628 33605  0  0  0 1025  0  27 1255  9337  9481  8 92  0
23 84 0 2346688 777452 34393  0  0  0 1880  0  26 1431 13960 16467 12 88  0
25 81 0 2337260 778816 33082  0  0  0 2620  0  36 1585 15153 16051 12 88  0
35 78 1 2353940 931432 17842  0  0  0 9317  0  19 1048  8805  9204  4 96  0
23 84 0 2349364 862716 37562  0  0  0  738  0 142 1521 10611 11704  5 95  0
21 84 1 2343716 820540 37195  0  0  0  994  0  31 1415 11397 12853  7 93  0
10 95 0 2348456 795484 36345  0  0  0 1418  0  28 1270 10688 13094  8 92  0
24 77 0 2367896 771416 34100  0  0  0 1094  0  22 1136  9162  9912  9 91  0
23 84 0 2362272 768856 38150  0  0  0 2283  0  29 1757 14849 14004 12 88  0
16 89 0 2339412 774664 41673  0  0  0 3526  0 141 2953 28610 23840 15 85  0

iostat 5
      tty           amrd0             cpu
 tin tout  KB/t tps  MB/s  us ni sy in id
   0   25 53.66  42  2.22   7  0 16  0 77
   0   71 21.26  43  0.90  14  0 86  0  0
   0   25 22.26  32  0.70  13  0 87  0  0
   0    6 21.51   8  0.18   2  0 98  0  0
   0   40 18.64 178  3.24   7  0 92  1  0
   0    9 21.89  38  0.81   8  0 92  0  0
   0   25 24.62  23  0.56   7  0 92  0  0
   0   24 23.52  27  0.62   8  0 92  0  0
   0   24 23.41  27  0.62  11  0 88  0  0
   0   25 23.31  35  0.79  13  0 87  0  0
   0   14 20.27  14  0.28   4  0 82 14  0
   0   24 19.42 153  2.89   5  0 95  0  0
   0   25 23.79  31  0.72   7  0 93  0  0
   0   25 24.82  29  0.71   8  0 92  0  0
   0   25 29.23  21  0.59   9  0 91  0  0
   0   24 24.81  32  0.77  12  0 88  0  0
   0   25 19.70 145  2.79  13  0 86  1  0
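To check the "no paging" reading of the vmstat output mechanically, a
small parser like the following could summarise the data rows. It is a
sketch assuming the 18-column FreeBSD layout in the vmstat header above
(r b w avm fre flt re pi po fr sr am0 in sy cs us sy id); the first "sy"
(system calls) is renamed so it does not collide with the second "sy"
(CPU system time). The first vmstat row is left out because vmstat's
first line reports averages since boot rather than the current interval.

```python
# Column names follow the FreeBSD `vmstat 5` header shown above. The header
# uses "sy" twice, so the system-call column is renamed "sy_calls" here.
COLUMNS = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr", "sr",
           "am0", "in", "sy_calls", "cs", "us", "sy", "id"]


def parse_row(line: str) -> dict:
    """Turn one whitespace-separated vmstat data row into a dict of ints."""
    values = [int(field) for field in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def summarise(lines) -> dict:
    """Total paging activity plus mean context switches and CPU split over the rows."""
    rows = [parse_row(line) for line in lines if line.strip()]
    n = len(rows)
    return {
        "pages_in_out": sum(r["pi"] + r["po"] for r in rows),
        "mean_cs": sum(r["cs"] for r in rows) / n,
        "mean_user_pct": sum(r["us"] for r in rows) / n,
        "mean_system_pct": sum(r["sy"] for r in rows) / n,
    }


# Three consecutive rows from the spike above (the since-boot first row is skipped).
SPIKE_ROWS = [
    "28 77 0 2328792 793424 34813 0 0 0 4351 0 41 1913 21230 20337 14 86 0",
    "23 84 1 2328388 942960 32744 0 0 0 1657 0 37 1655 16272 14521 12 88 0",
    "18 86 2 2331980 869076 42569 0 0 0 9280 0 81 2481 15525 12303 3 97 0",
]

print(summarise(SPIKE_ROWS))
```

For these rows pi and po are zero throughout, system time averages about
90%, and context switches average roughly 15,700/s, which matches the
description in the message.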

And yes, update_process_title = off, in case that matters. There are no
severe locks, no long-running (idle) transactions, no unexpected pg_dump
runs, only the usual connections from the web application at their usual
rate.
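For reference, one way to double-check the "no severe locks, no
long-running (idle) transactions" statement is to classify rows from
pg_stat_activity. This is a sketch assuming PostgreSQL 8.3's column names
(procpid, current_query, waiting, xact_start; later releases renamed or
replaced several of these) and rows fetched by any DB-API driver, which
is not shown. The function name and the five-minute threshold are
hypothetical choices.

```python
from datetime import datetime, timedelta

# Query for PostgreSQL 8.3. In later releases procpid became pid, current_query
# became query (plus a separate state column), and waiting was eventually removed.
ACTIVITY_SQL = """
SELECT procpid, current_query, waiting, xact_start
FROM pg_stat_activity
"""

IDLE_IN_XACT = "<IDLE> in transaction"  # how 8.3 reports idle-in-transaction backends


def suspicious_backends(rows, now, max_xact_age=timedelta(minutes=5)):
    """Return (procpid, reason) pairs for lock waiters and old open transactions.

    rows: iterable of (procpid, current_query, waiting, xact_start) tuples,
    e.g. cursor.fetchall() after executing ACTIVITY_SQL.
    now: must be comparable with xact_start (both naive or both tz-aware).
    """
    flagged = []
    for procpid, query, waiting, xact_start in rows:
        if waiting:
            flagged.append((procpid, "waiting on a lock"))
        elif xact_start is not None and now - xact_start > max_xact_age:
            reason = ("idle in transaction" if query == IDLE_IN_XACT
                      else "long-running transaction")
            flagged.append((procpid, reason))
    return flagged


# Illustrative rows only, not data from this server.
now = datetime(2008, 9, 8, 12, 0)
rows = [
    (101, "SELECT 1", False, now - timedelta(seconds=1)),
    (102, IDLE_IN_XACT, False, now - timedelta(minutes=30)),
    (103, "UPDATE t SET x = 1", True, now - timedelta(seconds=2)),
]
print(suspicious_backends(rows, now))
```

An empty result during a spike would support the statement above; any
flagged backend is worth a closer look.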

Any ideas how to debug this?

--
Regards,
 Ivan

Re: 8.3 on FreeBSD 6.3, sudden performance degradations

From: Tomasz Ostrowski
On 2008-09-08 11:46, Ivan Zolotukhin wrote:

> vmstat 5
>   procs      memory      page                   disk     faults      cpu
>  r b  w     avm    fre   flt  re  pi  po   fr  sr am0   in    sy    cs us sy id
> 28 77 0 2328792 793424 34813   0   0   0 4351   0  41 1913 21230 20337 14 86  0

I do not know FreeBSD, but can you check what this "flt" stat is? Is it
page faults? That would be a lot of page faults for 5 seconds, which
could mean that this server is memory-starved and swapping a lot.
Check for a process using insane amounts of memory. Maybe you have tuned
Postgres too aggressively.

Show us the output of "free" when there is a slowdown.

Once more: I do not know FreeBSD and am just guessing what this "flt" stat is.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
                                                      Winnie the Pooh

Re: 8.3 on FreeBSD 6.3, sudden performance degradations

From: Tomasz Ostrowski
On 2008-09-09 09:30, Tomasz Ostrowski wrote:
> On 2008-09-08 11:46, Ivan Zolotukhin wrote:
>
>> vmstat 5
>>   procs      memory      page                   disk     faults      cpu
>>  r b  w     avm    fre   flt  re  pi  po   fr  sr am0   in    sy    cs us sy id
>> 28 77 0 2328792 793424 34813   0   0   0 4351   0  41 1913 21230 20337 14 86  0
>
> I do not know FreeBSD but can you check what is this "flt" stat? Is this
> page fault? That would be a lot of page faults for 5 seconds, which
> could mean that this server is memory starved and is swapping a lot.

Disregard this. I just found a man page for FreeBSD vmstat, and "flt" is
indeed page faults, but because there is no pi (pages paged in) or po
(pages paged out), the system is not thrashing.

Regards
Tometzky

Re: 8.3 on FreeBSD 6.3, sudden performance degradations

From: Greg Smith
On Mon, 8 Sep 2008, Ivan Zolotukhin wrote:

> Yep, update_process_title = off if it is important.

Have you considered turning it on so you can see which processes are
most involved in the spike? Normally, in your situation, I'd try to
capture the output from top during the problem period and match it
against what the processes involved are doing at the time; you'll need
update_process_title on to see inside the PG processes. If the event is
random, this data can be hard to capture, and you might have to write
some scripts to save top output in batch mode along with the output from
ps.
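A minimal sketch of such a script, assuming Python 3.7+ on the database
host: it polls the 1-minute load average and, whenever it reaches a
threshold, saves batch-mode top output and ps output to timestamped
files. The command lines (top -b, ps auxww), the threshold, the output
directory name, and the function names are assumptions to adjust for the
system at hand. With update_process_title on, the ps output also shows
what each postgres backend is doing.

```python
import os
import subprocess
import time
from datetime import datetime
from pathlib import Path

# Assumed commands: FreeBSD's top supports -b (batch mode); `ps auxww` prints full
# command lines, which include the backend status when update_process_title = on.
DEFAULT_COMMANDS = {
    "top": ["top", "-b"],
    "ps": ["ps", "auxww"],
}


def snapshot(commands, out_dir):
    """Run each command once and write its output to <timestamp>-<name>.txt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    paths = []
    for name, argv in commands.items():
        result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
        path = out_dir / f"{stamp}-{name}.txt"
        path.write_text(result.stdout + result.stderr)
        paths.append(path)
    return paths


def watch(threshold, interval=5.0, commands=None,
          out_dir=Path("spike-captures"), max_snapshots=None):
    """Poll the 1-minute load average; snapshot whenever it reaches `threshold`.

    Runs until interrupted unless max_snapshots is given; returns the count taken.
    """
    commands = commands or DEFAULT_COMMANDS
    taken = 0
    while max_snapshots is None or taken < max_snapshots:
        load1, _, _ = os.getloadavg()
        if load1 >= threshold:
            snapshot(commands, out_dir)
            taken += 1
        time.sleep(interval)
    return taken
```

For example, calling watch(threshold=10.0) from a long-running process
(started under nohup or similar) would keep collecting captures until it
is stopped; with a 5-second interval the one-second timestamps in the
file names do not collide.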

Did you confirm that the slow periods aren't checkpoint-related? I've
seen some wacky behavior on Linux where the system CPU percentage went up
dramatically because the background disk I/O process got hyperactive.
Again, the usual way to figure that out is to look at what top shows
during that period. I suspect you'll find some daemon going crazy. The
vmstat and iostat figures you included are certainly strange, but I don't
know enough about FreeBSD to say exactly what would cause that.
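One way to test the checkpoint theory, sketched here under the
assumption of PostgreSQL 8.3 (which added both the log_checkpoints
setting and the pg_stat_bgwriter view), is to snapshot pg_stat_bgwriter
just before and after a slow period and see whether the checkpoint
counters moved. The helper name is hypothetical, and the rows are assumed
to have been fetched as dicts by whatever driver is at hand.

```python
# Columns from PostgreSQL 8.3's pg_stat_bgwriter view.
BGWRITER_SQL = """
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
FROM pg_stat_bgwriter
"""


def checkpoint_delta(before, after):
    """Compare two pg_stat_bgwriter snapshots (dicts keyed by column name).

    Returns how many checkpoints (scheduled plus requested) were counted between
    the snapshots and how many buffers checkpoints wrote in that window.
    """
    timed = after["checkpoints_timed"] - before["checkpoints_timed"]
    requested = after["checkpoints_req"] - before["checkpoints_req"]
    return {
        "checkpoints": timed + requested,
        "buffers_written": after["buffers_checkpoint"] - before["buffers_checkpoint"],
    }


# Illustrative numbers only, not measurements from this server.
before = {"checkpoints_timed": 120, "checkpoints_req": 4, "buffers_checkpoint": 500000}
after = {"checkpoints_timed": 121, "checkpoints_req": 4, "buffers_checkpoint": 512000}
print(checkpoint_delta(before, after))
```

If slow periods consistently coincide with a non-zero checkpoint count,
checkpoints are a likely suspect; if the counters do not move during a
spike, checkpoints can probably be ruled out. Setting log_checkpoints = on
records the same events with timestamps in the server log.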

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD