Re: SSI slows down over time - Mailing list pgsql-performance

From Ryan Johnson
Subject Re: SSI slows down over time
Msg-id 5342B7DE.3020604@cs.utoronto.ca
In response to Re: SSI slows down over time  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 06/04/2014 10:55 AM, Tom Lane wrote:
> Ryan Johnson <ryan.johnson@cs.utoronto.ca> writes:
>> I get a strange behavior across repeated runs: each 100-second run is a
>> bit slower than the one preceding it, when run with SSI (SERIALIZABLE).
>> ... So the question: what should I look for to diagnose/triage this problem?
> In the past I've seen behaviors like this that traced to the range of
> "interesting" transaction IDs getting wider as time went on, so that
> more pages of pg_clog were hot, leading to more I/O traffic in the
> clog SLRU buffers.  Possibly this is some effect like that.
The effect disappears completely if I run under SI instead of SSI, though. That makes me strongly suspect that the problem lurks in SSI-specific infrastructure.

However, I did notice that the SLRU buffer that holds "old" SSI transactions sometimes spikes from hundreds to millions of entries (I annotated the source to ereport a warning whenever the difference between head and tail is at least 10% higher than the previous record). I'm not sure that's related, though: I'm pretty sure SSI never scans that SLRU; it is only used for random lookups.
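
For concreteness, the annotation was essentially the following (a simplified sketch rather than the exact patch; the helper name, the argument names, and the idea of calling it from wherever predicate.c advances the head of the OldSerXid SLRU are illustrative, not copied from the tree):

/*
 * Sketch of the diagnostic, meant to live in predicate.c and be called
 * whenever the head of the OldSerXid SLRU advances.  Per-backend
 * high-water mark; xid wraparound is ignored for simplicity.
 */
static uint32 maxOldSerXidSpread = 0;

static void
ReportOldSerXidSpread(TransactionId headXid, TransactionId tailXid)
{
    /* head-to-tail distance as an unsigned difference */
    uint32      spread = (uint32) (headXid - tailXid);

    /* warn only when the spread beats the previous record by ~10% or more */
    if (spread > maxOldSerXidSpread + maxOldSerXidSpread / 10)
    {
        maxOldSerXidSpread = spread;
        ereport(WARNING,
                (errmsg("OldSerXid SLRU spread: head %u tail %u distance %u",
                        headXid, tailXid, spread)));
    }
}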

>> I've tried linux perf, but all it says is that lots of time is going to
>> LWLock (but callgraph tracing doesn't work in my not-bleeding-edge
>> kernel).
> You could recompile with -DLWLOCK_STATS to enable gathering stats on
> which LWLocks are heavily contended.  That would be a starting point
> for localizing the cause.

Here are the offenders (100-second run, 24 clients, ~2.2ktps):
lwlock        7 shacq         0 exacq 7002810 blk  896235 spindelay  213
lwlock       28 shacq  94984166 exacq   3938085 blk 572762 spindelay  163
lwlock       65 shacq   3347000 exacq   2933440 blk 255927 spindelay   90
lwlock       79 shacq   1742574 exacq   3009663 blk 216178 spindelay   41
lwlock       76 shacq   2293386 exacq   2892242 blk 205199 spindelay   70
lwlock       66 shacq    498909 exacq   2987485 blk 171508 spindelay   48
lwlock       80 shacq    512107 exacq   3181753 blk 165772 spindelay   43
lwlock       71 shacq    815733 exacq   3088157 blk 165579 spindelay   48
lwlock       74 shacq    603321 exacq   3065391 blk 159953 spindelay   56
lwlock       67 shacq    695465 exacq   2918970 blk 149339 spindelay   28
lwlock       69 shacq    411203 exacq   3044007 blk 148655 spindelay   34
lwlock       72 shacq    515260 exacq   2973321 blk 147533 spindelay   43
lwlock       30 shacq  41628636 exacq      8799 blk 143889 spindelay  186
lwlock       75 shacq    409472 exacq   2987227 blk 143196 spindelay   38
lwlock       77 shacq    409401 exacq   2946972 blk 139507 spindelay   34
lwlock       73 shacq    402544 exacq   2943467 blk 139380 spindelay   43
lwlock       78 shacq    404220 exacq   2912665 blk 137625 spindelay   21
lwlock       70 shacq    603643 exacq   2816730 blk 135851 spindelay   37
lwlock       68 shacq    403533 exacq   2862017 blk 131946 spindelay   30
lwlock       29 shacq         0 exacq    255302 blk 75838 spindelay    1
lwlock        0 shacq         0 exacq    561508 blk 12445 spindelay    3
lwlock       11 shacq   1245499 exacq    219717 blk 5501 spindelay   10
lwlock        4 shacq    381211 exacq    209146 blk 1273 spindelay    4
lwlock        3 shacq        16 exacq    209081 blk 522 spindelay    0
lwlock        8 shacq         0 exacq    137961 blk 50 spindelay    0
lwlock  2097366 shacq         0 exacq    384586 blk 1 spindelay    0
lwlock  2097365 shacq         0 exacq    370176 blk 1 spindelay    0
lwlock  2097359 shacq         0 exacq    363845 blk 1 spindelay    0

The above aggregates the per-lock stats from all processes, filters out locks with fewer than 10000 accesses (shared+exclusive) or with zero blk, then sorts by highest blk first.
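
In case anyone wants to reproduce the aggregation, a stand-alone sketch of it is below (not the exact script I ran; it assumes the raw per-backend lines on stderr look like "PID 1234 lwlock 7: shacq 0 exacq 42 blk 1 spindelay 0" -- adjust the sscanf format if your build prints something different):

/* Aggregator for LWLOCK_STATS output: sums the per-process counters by
 * lock id, drops locks with fewer than 10000 acquisitions or zero blk,
 * and prints the survivors sorted by blk, highest first. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_LOCK_ID 4194304     /* generous: covers the per-buffer locks */

typedef struct
{
    long        id, shacq, exacq, blk, spindelay;
} LockStats;

static int
cmp_blk_desc(const void *a, const void *b)
{
    const LockStats *x = a;
    const LockStats *y = b;

    if (x->blk != y->blk)
        return (y->blk > x->blk) ? 1 : -1;
    return 0;
}

int
main(void)
{
    LockStats  *stats = calloc(MAX_LOCK_ID, sizeof(LockStats));
    char        line[1024];
    long        id, shacq, exacq, blk, spin;
    long        i, n = 0;

    if (stats == NULL)
        return 1;

    /* sum the counters reported by each backend */
    while (fgets(line, sizeof(line), stdin))
    {
        if (sscanf(line, "PID %*d lwlock %ld: shacq %ld exacq %ld blk %ld spindelay %ld",
                   &id, &shacq, &exacq, &blk, &spin) != 5)
            continue;           /* not a stats line */
        if (id < 0 || id >= MAX_LOCK_ID)
            continue;
        stats[id].id = id;
        stats[id].shacq += shacq;
        stats[id].exacq += exacq;
        stats[id].blk += blk;
        stats[id].spindelay += spin;
    }

    /* keep locks with >= 10000 accesses (shared+exclusive) and nonzero blk */
    for (i = 0; i < MAX_LOCK_ID; i++)
        if (stats[i].shacq + stats[i].exacq >= 10000 && stats[i].blk > 0)
            stats[n++] = stats[i];

    /* highest blk first */
    qsort(stats, n, sizeof(LockStats), cmp_blk_desc);
    for (i = 0; i < n; i++)
        printf("lwlock %8ld shacq %9ld exacq %9ld blk %7ld spindelay %4ld\n",
               stats[i].id, stats[i].shacq, stats[i].exacq,
               stats[i].blk, stats[i].spindelay);

    free(stats);
    return 0;
}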

According to [1], locks {28, 29, 30} are {SerializableXactHashLock, SerializableFinishedListLock, SerializablePredicateLockListLock}, all SSI-related; locks 65-80 are the sixteen PredicateLockMgrLocks that the post mentions. Looking in lwlock.h, lock 7 (which tops the list) is the WALInsertLock. That lock was *not* mentioned in the pgsql-hackers post.

Re-running the same analysis for SI instead of SSI gives 4.6ktps and a much shorter list:
lwlock        7 shacq         0 exacq 14050121 blk  3429384 spindelay  347
lwlock       11 shacq   3133994 exacq    450325 blk 23456 spindelay   29
lwlock        0 shacq         0 exacq    684775 blk 19158 spindelay    3
lwlock        4 shacq    780846 exacq    428771 blk 4539 spindelay    6
lwlock        3 shacq        19 exacq    428705 blk 1147 spindelay    0
lwlock       59 shacq         0 exacq    125943 blk 203 spindelay    0
lwlock        8 shacq         0 exacq    287629 blk 116 spindelay    0
lwlock  2097358 shacq         0 exacq    752361 blk 1 spindelay    0
lwlock  2097355 shacq         0 exacq    755086 blk 1 spindelay    0
lwlock  2097352 shacq         0 exacq    760232 blk 1 spindelay    0

However, all of this only confirms that SSI has a lock bottleneck; it doesn't say why the bottleneck gets worse over time.

[1] http://www.postgresql.org/message-id/CA+TgmoYAiSM2jWEndReY5PL0sKbhgg7dbDH6r=oXKYzi9B7KJA@mail.gmail.com

Thoughts?
Ryan
