Thread: Linux/PostgreSQL scalability issue - problem with 8 cores
Hi all,

we have a PostgreSQL-dedicated Linux server with 8 cores (2x X5355). We came across a strange issue: when running with all 8 cores enabled, approximately once a minute (the period differs) the system is very busy for a few seconds (~5-10 s) and we don't know why. The issue doesn't show up when we tell Linux to use only 2 cores; with 4 cores the problem is there but it is still better than with 8 cores - all on the same machine, same config, same workload. We don't see any apparent reason for these peaks. We'd like to investigate it further but we don't know what to try next. Any suggestions? Any tuning tips for Linux+PostgreSQL on an 8-way system? Can this be connected with our heavy use of LISTEN/NOTIFY and hundreds of backends in listen mode?

More details are below.

Thanks,

Kuba

System: HP DL360 2x5355, 8G RAM, P600+MSA50 - internal 2x72GB RAID 10 for OS, 10x72G disks RAID 10 for PostgreSQL data and WAL
OS: Linux 2.6 64bit (kernel 2.6.21, 22, 23 makes little difference)
PostgreSQL: 8.2.4 (64bit), shared buffers 1G

Nothing other than PostgreSQL is running on the server. Circa 800 concurrent backends. The majority of backends sit in LISTEN doing nothing. The client interface for most backends is ecpg+libpq.

Problem description:

The system is usually 80-95% idle. Approximately once a minute, for circa 5-10 s, there is a peak in activity which looks like this:

vmstat (and top or atop) reports 0% idle, 100% in user mode, very low iowait, low IO activity, a higher number of context switches than usual but not exceedingly high (2000-4000 cs/s, usually 1500 cs/s), and a few hundred waiting processes per second (usually 0-1/s). From looking at top and the running processes we can't see any obvious reason for the peak. According to the PostgreSQL log, the long-running commands from these moments are e.g. BEGIN TRANSACTION lasting several seconds.

When only 2 cores are enabled (via the kernel command line) everything runs smoothly. 4 cores exhibit slightly better behavior than 8 cores but worse than 2 cores - the peaks are visible.

We've tried kernel versions 2.6.21-23 (latest revisions as of the beginning of December from kernel.org); the pattern changed slightly, but it may also be that the workload changed slightly.

pgbench or any other stress testing runs smoothly on the server.

The only strange thing about our usage pattern I can think of is heavy use of LISTEN/NOTIFY, especially hundreds of backends in listen mode.

When restarting our connected clients the peaks are not there from time 0; they become visible after a while - it seems something gets synchronized and causes trouble then.

Since the server is PostgreSQL-dedicated and none of our client applications run on it - and there is a difference between 2 and 8 cores enabled - we think the peaks are not caused by our client applications.

How can we diagnose what is happening during the peaks?
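A quick way to get a rough picture of the situation described above (how many backends are idle, how many LISTENs are registered) on 8.2 - pg_listener is the pre-9.0 mechanism, and pg_stat_activity.current_query needs stats_command_string enabled:

    -- How many backends are currently idle?
    SELECT count(*) FROM pg_stat_activity WHERE current_query = '<IDLE>';

    -- How many LISTEN registrations exist, and on how many distinct conditions?
    SELECT count(*) AS listens, count(DISTINCT relname) AS conditions
    FROM pg_listener;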
Hi Jakub,

I do have a similar server (from DELL), which performs well with our PostgreSQL application. I guess the peak in context switches is the only thing you can see. Anyhow, I think it is your LISTEN/NOTIFY approach which causes that behaviour. I guess all backends listen to the same notification. I don't know the exact implementation, but I can imagine that all backends access the same section of shared memory, which causes the increase in context switches. More cores means more access at the same time.

Can you change your implementation?
- split your problem - create multiple notifications if possible
- do an UNLISTEN if possible
- use another signalling technique

Regards
Sven

--
Sven Geisler <sgeisler@aeccom.com>    Tel +49.30.921017.81  Fax .50
Senior Developer, AEC/communications GmbH & Co. KG Berlin, Germany
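A minimal sketch of what Sven is suggesting - per-event notification conditions plus explicit UNLISTEN - with made-up condition names:

    -- Listener side: subscribe only to the conditions this backend cares about.
    LISTEN orders_changed;
    LISTEN invoices_changed;

    -- Notifier side: wake only the backends listening on the relevant condition.
    NOTIFY orders_changed;

    -- Drop a registration as soon as it is no longer needed
    -- (UNLISTEN * drops all of this backend's registrations).
    UNLISTEN orders_changed;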
Jakub Ouhrabka <kuba@comgate.cz> writes: > we have a PostgreSQL dedicated Linux server with 8 cores (2xX5355). We > came accross a strange issue: when running with all 8 cores enabled > approximatly once a minute (period differs) the system is very busy for > a few seconds (~5-10s) and we don't know why - this issue don't show up > when we tell Linux to use only 2 cores, with 4 cores the problem is here > but it is still better than with 8 cores - all on the same machine, same > config, same workload. We don't see any apparent reason for these peaks. Interesting. Maybe you could use oprofile to try to see what's happening? It sounds a bit like momentary contention for a spinlock, but exactly what isn't clear. > Can this be connected with our heavy use of listen/notify and hundreds > backends in listen mode? Perhaps. Have you tried logging executions of NOTIFY to see if they are correlated with the spikes? regards, tom lane
Hi Tom,

> Interesting. Maybe you could use oprofile to try to see what's
> happening? It sounds a bit like momentary contention for a spinlock,
> but exactly what isn't clear.

OK, we're going to try oprofile and will let you know...

> Perhaps. Have you tried logging executions of NOTIFY to see if they
> are correlated with the spikes?

We didn't log the notifies, but I don't think they're correlated. We'll have a detailed look next time we try it (with oprofile).

Thanks for the suggestions!

Kuba
Hi Sven,

> I guess all backends listen to the same notification.

Unfortunately no. The backends are listening for different notifications in different databases. Usually there are only a few listens per database, with one exception - there are many (hundreds of) listens in one database, but all for different notifications.

> Can you change your implementation?
> - split your problem - create multiple notifications if possible

Yes, it is already like this.

> - do an UNLISTEN if possible

Yes, we're issuing UNLISTENs when appropriate.

> - use another signalling technique

We're planning to reduce the number of databases/backends/listens, but anyway we'd like to run the system on 8 cores if it runs without any problems on 2 cores...

Thanks for the suggestions!

Kuba
Jakub Ouhrabka wrote: > > - do an UNLISTEN if possible > > Yes, we're issuing unlistens when appropriate. You are vacuuming pg_listener periodically, yes? Not that this seems to have any relationship to your problem, but ... -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro,

>>> - do an UNLISTEN if possible
>> Yes, we're issuing UNLISTENs when appropriate.
>
> You are vacuuming pg_listener periodically, yes? Not that this seems
> to have any relationship to your problem, but ...

Yes, autovacuum should take care of this. But we're looking forward to multiple autovacuum workers in 8.3, as that should help us during high-load periods (some tables may wait too long for autovacuum now - but it's not that big a problem for us...).

Thanks for the great work!

Kuba
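For completeness, the periodic maintenance Alvaro is referring to looks like this (only needed if autovacuum is not keeping up; pg_listener exists only up to 8.4):

    -- Reclaim the dead rows left in pg_listener by LISTEN/UNLISTEN/NOTIFY churn
    -- and report how large it has grown:
    VACUUM VERBOSE pg_listener;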
Jakub Ouhrabka wrote: > How can we diagnose what is happening during the peaks? Can you try forcing a core from a bunch of the busy processes? (Hmm - does Linux have an equivalent to the useful Solaris pstacks?) James
James Mansion wrote: > Jakub Ouhrabka wrote: >> How can we diagnose what is happening during the peaks? > Can you try forcing a core from a bunch of the busy processes? (Hmm - > does Linux have an equivalent to the useful Solaris pstacks?) There's a 'pstack' for Linux, shipped at least in Red Hat distributions (and possibly others, I'm not sure). It's a shell script wrapper around gdb, so easily ported to any Linux.
Hi Tom & all,

>> It sounds a bit like momentary contention for a spinlock,
>> but exactly what isn't clear.

> ok, we're going to try oprofile, will let you know...

Yes, it does look like contention for a spinlock, if I'm interpreting oprofile correctly: around 60% of the time during the spikes is spent in s_lock (for details see below).

We've tried several times to get a stack trace from some of the running backends during the spikes; we always got this:

0x00002b005d00a9a9 in semop () from /lib/libc.so.6
#0  0x00002b005d00a9a9 in semop () from /lib/libc.so.6
#1  0x000000000054fe53 in PGSemaphoreLock (sema=0x2b00a04e5090, interruptOK=0 '\0') at pg_sema.c:411
#2  0x0000000000575d95 in LWLockAcquire (lockid=SInvalLock, mode=LW_EXCLUSIVE) at lwlock.c:455
#3  0x000000000056fbfe in ReceiveSharedInvalidMessages (invalFunction=0x5e9a30 <LocalExecuteInvalidationMessage>, resetFunction=0x5e9df0 <InvalidateSystemCaches>) at sinval.c:159
#4  0x0000000000463505 in StartTransactionCommand () at xact.c:1439
#5  0x000000000056fa4b in ProcessCatchupEvent () at sinval.c:347
#6  0x000000000056fb20 in CatchupInterruptHandler (postgres_signal_arg=<value optimized out>) at sinval.c:221
#7  0x00002b005cf6f110 in killpg () from /lib/libc.so.6
#8  0x0000000000000000 in ?? ()

Is this enough info to guess what's happening? What should we try next?

Thanks,

Kuba

oprofile results [I've shortened the path from /usr/local/pg... to just pg for better readability]:

# opreport --long-filenames
CPU: Core 2, speed 2666.76 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
   125577 90.7584 pg-8.2.4/bin/postgres
     3792  2.7406 /lib/libc-2.3.6.so
     3220  2.3272 /usr/src/linux-2.6.22.15/vmlinux
     2145  1.5503 /usr/bin/oprofiled
     1540  1.1130 /xfs
      521  0.3765 pg-8.2.4/lib/plpgsql.so
      441  0.3187 /cciss
      374  0.2703 /oprofile
...
CPU: Core 2, speed 2666.76 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000 samples % app name symbol name 85355 61.6887 pg-8.2.4/bin/postgres s_lock 9803 7.0849 pg-8.2.4/bin/postgres LWLockRelease 5535 4.0003 pg-8.2.4/bin/postgres LWLockAcquire 3792 2.7406 /lib/libc-2.3.6.so (no symbols) 3724 2.6915 pg-8.2.4/bin/postgres DropRelFileNodeBuffers 2145 1.5503 /usr/bin/oprofiled (no symbols) 2069 1.4953 pg-8.2.4/bin/postgres GetSnapshotData 1540 1.1130 /xfs (no symbols) 1246 0.9005 pg-8.2.4/bin/postgres hash_search_with_hash_value 1052 0.7603 pg-8.2.4/bin/postgres AllocSetAlloc 1015 0.7336 pg-8.2.4/bin/postgres heapgettup 879 0.6353 pg-8.2.4/bin/postgres hash_any 862 0.6230 /usr/src/linux-2.6.22.15/vmlinux mwait_idle 740 0.5348 pg-8.2.4/bin/postgres hash_seq_search 674 0.4871 pg-8.2.4/bin/postgres HeapTupleSatisfiesNow 557 0.4026 pg-8.2.4/bin/postgres SIGetDataEntry 552 0.3989 pg-8.2.4/bin/postgres equal 469 0.3390 pg-8.2.4/bin/postgres SearchCatCache 441 0.3187 /cciss (no symbols) 433 0.3129 /usr/src/linux-2.6.22.15/vmlinux find_busiest_group 413 0.2985 pg-8.2.4/bin/postgres PinBuffer 393 0.2840 pg-8.2.4/bin/postgres MemoryContextAllocZeroAligned 374 0.2703 /oprofile (no symbols) 275 0.1988 pg-8.2.4/bin/postgres ExecInitExpr 253 0.1829 pg-8.2.4/bin/postgres base_yyparse 206 0.1489 pg-8.2.4/bin/postgres CatalogCacheFlushRelation 201 0.1453 pg-8.2.4/bin/postgres MemoryContextAlloc 194 0.1402 pg-8.2.4/bin/postgres _bt_compare 188 0.1359 /nf_conntrack (no symbols) 158 0.1142 /bnx2 (no symbols) 147 0.1062 pg-8.2.4/bin/postgres pgstat_initstats 139 0.1005 pg-8.2.4/bin/postgres fmgr_info_cxt_security 132 0.0954 /usr/src/linux-2.6.22.15/vmlinux task_rq_lock 131 0.0947 /bin/bash (no symbols) 129 0.0932 pg-8.2.4/bin/postgres AllocSetFree 125 0.0903 pg-8.2.4/bin/postgres ReadBuffer 124 0.0896 pg-8.2.4/bin/postgres MemoryContextCreate 124 0.0896 pg-8.2.4/bin/postgres SyncOneBuffer 124 0.0896 pg-8.2.4/bin/postgres XLogInsert 123 0.0889 pg-8.2.4/bin/postgres _equalAggref 122 0.0882 pg-8.2.4/bin/postgres HeapTupleSatisfiesSnapshot 112 0.0809 pg-8.2.4/bin/postgres copyObject 102 0.0737 pg-8.2.4/bin/postgres UnpinBuffer 99 0.0716 pg-8.2.4/bin/postgres _SPI_execute_plan 99 0.0716 pg-8.2.4/bin/postgres nocachegetattr 98 0.0708 /usr/src/linux-2.6.22.15/vmlinux __wake_up_bit 97 0.0701 pg-8.2.4/bin/postgres TransactionIdIsInProgress 94 0.0679 pg-8.2.4/bin/postgres check_stack_depth 93 0.0672 pg-8.2.4/bin/postgres base_yylex 91 0.0658 pg-8.2.4/bin/postgres pfree 89 0.0643 pg-8.2.4/lib/plpgsql.so exec_stmts 86 0.0622 /usr/src/linux-2.6.22.15/vmlinux __switch_to 85 0.0614 pg-8.2.4/bin/postgres LockAcquire 83 0.0600 pg-8.2.4/bin/postgres FunctionCall2 82 0.0593 pg-8.2.4/bin/postgres ExecInitAgg 82 0.0593 /usr/src/linux-2.6.22.15/vmlinux system_call 79 0.0571 pg-8.2.4/bin/postgres pgstat_write_statsfile 77 0.0557 pg-8.2.4/bin/postgres heap_getsysattr 73 0.0528 pg-8.2.4/bin/postgres .plt 72 0.0520 /lib/ld-2.3.6.so (no symbols) 71 0.0513 pg-8.2.4/bin/postgres SearchSysCache 71 0.0513 pg-8.2.4/bin/postgres _bt_checkkeys 68 0.0491 /usr/src/linux-2.6.22.15/vmlinux apic_timer_interrupt 67 0.0484 pg-8.2.4/bin/postgres slot_deform_tuple 64 0.0463 pg-8.2.4/bin/postgres TupleDescInitEntry 64 0.0463 pg-8.2.4/bin/postgres newarc 63 0.0455 /usr/src/linux-2.6.22.15/vmlinux __wake_up 62 0.0448 pg-8.2.4/bin/postgres LocalExecuteInvalidationMessage 61 0.0441 /usr/src/linux-2.6.22.15/vmlinux try_to_wake_up 60 0.0434 pg-8.2.4/bin/postgres 
ReceiveSharedInvalidMessages 59 0.0426 pg-8.2.4/bin/postgres ExecutorStart 58 0.0419 pg-8.2.4/bin/postgres DirectFunctionCall1 58 0.0419 pg-8.2.4/bin/postgres ScanKeywordLookup 57 0.0412 pg-8.2.4/bin/postgres hash_search 56 0.0405 pg-8.2.4/bin/postgres CatalogCacheComputeHashValue 56 0.0405 pg-8.2.4/bin/postgres ExecProject 55 0.0398 pg-8.2.4/bin/postgres _bt_first 54 0.0390 pg-8.2.4/lib/plpgsql.so exec_eval_simple_expr 53 0.0383 pg-8.2.4/bin/postgres AllocSetDelete 53 0.0383 pg-8.2.4/bin/postgres CleanupTempFiles 52 0.0376 pg-8.2.4/bin/postgres ExecCreateTupleTable 52 0.0376 pg-8.2.4/lib/plpgsql.so copy_plpgsql_datum 49 0.0354 pg-8.2.4/bin/postgres MemoryContextAllocZero 49 0.0354 pg-8.2.4/bin/postgres SIDelExpiredDataEntries 49 0.0354 pg-8.2.4/bin/postgres fix_opfuncids_walker 47 0.0340 pg-8.2.4/bin/postgres expression_tree_walker 46 0.0332 pg-8.2.4/bin/postgres LockBuffer 45 0.0325 pg-8.2.4/bin/postgres lappend 45 0.0325 /usr/src/linux-2.6.22.15/vmlinux do_IRQ 44 0.0318 pg-8.2.4/bin/postgres LockReleaseAll 43 0.0311 pg-8.2.4/bin/postgres ExecutorRun 43 0.0311 pg-8.2.4/bin/postgres exprTypmod 42 0.0304 pg-8.2.4/bin/postgres ExecClearTuple 42 0.0304 pg-8.2.4/bin/postgres heap_fill_tuple 41 0.0296 pg-8.2.4/bin/postgres ExecIndexBuildScanKeys 40 0.0289 pg-8.2.4/bin/postgres _bt_readpage 40 0.0289 pg-8.2.4/lib/plpgsql.so plpgsql_exec_function 40 0.0289 /usr/src/linux-2.6.22.15/vmlinux lock_timer_base 39 0.0282 pg-8.2.4/bin/postgres heap_release_fetch 38 0.0275 pg-8.2.4/bin/postgres ExecEvalVar 38 0.0275 pg-8.2.4/bin/postgres ExecTypeFromTLInternal 37 0.0267 pg-8.2.4/bin/postgres LockReassignCurrentOwner 37 0.0267 /usr/src/linux-2.6.22.15/vmlinux scheduler_tick 36 0.0260 pg-8.2.4/bin/postgres ReleaseAndReadBuffer 36 0.0260 pg-8.2.4/bin/postgres _bt_checkpage 36 0.0260 pg-8.2.4/bin/postgres freearc 36 0.0260 pg-8.2.4/bin/postgres heapgetpage 36 0.0260 /usr/src/linux-2.6.22.15/vmlinux resched_task 35 0.0253 pg-8.2.4/bin/postgres PGSemaphoreLock 35 0.0253 pg-8.2.4/bin/postgres optimize 35 0.0253 pg-8.2.4/bin/postgres slot_getattr 35 0.0253 /usr/src/linux-2.6.22.15/vmlinux effective_prio 35 0.0253 /usr/src/linux-2.6.22.15/vmlinux hrtimer_run_queues 34 0.0246 /ip_tables (no symbols) 34 0.0246 pg-8.2.4/bin/postgres ExecCountSlotsNode 34 0.0246 pg-8.2.4/bin/postgres ExecMakeFunctionResultNoSets 34 0.0246 pg-8.2.4/bin/postgres ResourceOwnerForgetBuffer 34 0.0246 pg-8.2.4/bin/postgres TransactionIdPrecedes 34 0.0246 pg-8.2.4/bin/postgres _bt_getroot 34 0.0246 pg-8.2.4/lib/plpgsql.so exec_stmt_block 34 0.0246 /usr/src/linux-2.6.22.15/vmlinux do_page_fault 34 0.0246 /usr/src/linux-2.6.22.15/vmlinux run_rebalance_domains 33 0.0239 pg-8.2.4/bin/postgres AtEOXact_GUC 33 0.0239 /usr/src/linux-2.6.22.15/vmlinux __exit_idle 33 0.0239 /usr/src/linux-2.6.22.15/vmlinux sched_clock 32 0.0231 pg-8.2.4/bin/postgres _bt_moveright 32 0.0231 pg-8.2.4/bin/postgres compact 32 0.0231 /usr/src/linux-2.6.22.15/vmlinux __mod_timer 31 0.0224 pg-8.2.4/bin/postgres CreateExprContext 31 0.0224 pg-8.2.4/bin/postgres ExecInitNode 31 0.0224 pg-8.2.4/bin/postgres ExecMakeFunctionResult 31 0.0224 pg-8.2.4/bin/postgres ReleaseCatCache 31 0.0224 pg-8.2.4/bin/postgres exprType 30 0.0217 pg-8.2.4/bin/postgres _bt_binsrch 30 0.0217 pg-8.2.4/bin/postgres new_tail_cell 30 0.0217 /usr/src/linux-2.6.22.15/vmlinux run_timer_softirq 29 0.0210 pg-8.2.4/bin/postgres ExecAssignScanProjectionInfo 29 0.0210 pg-8.2.4/bin/postgres heap_form_tuple 28 0.0202 /usr/lib/gconv/ISO8859-1.so (no symbols) 28 0.0202 pg-8.2.4/bin/postgres AllocSetContextCreate 
28 0.0202 pg-8.2.4/bin/postgres ResourceOwnerForgetCatCacheRef 28 0.0202 pg-8.2.4/bin/postgres hashoid 28 0.0202 pg-8.2.4/bin/postgres new_list 27 0.0195 pg-8.2.4/bin/postgres CreateTemplateTupleDesc 27 0.0195 pg-8.2.4/bin/postgres MemoryContextStrdup 27 0.0195 pg-8.2.4/bin/postgres SPI_execute_plan 27 0.0195 pg-8.2.4/bin/postgres btint4cmp 27 0.0195 /usr/src/linux-2.6.22.15/vmlinux IRQ0xa9_interrupt 26 0.0188 pg-8.2.4/bin/postgres ExecBuildProjectionInfo 26 0.0188 pg-8.2.4/bin/postgres ExecDropTupleTable 26 0.0188 pg-8.2.4/bin/postgres ExecEndNode 26 0.0188 pg-8.2.4/bin/postgres ExecEvalParam 26 0.0188 pg-8.2.4/bin/postgres FreeExecutorState 26 0.0188 pg-8.2.4/bin/postgres MemoryContextReset 26 0.0188 /usr/src/linux-2.6.22.15/vmlinux smp_apic_timer_interrupt 25 0.0181 pg-8.2.4/bin/postgres ExecInitSubPlan 25 0.0181 pg-8.2.4/bin/postgres FuncnameGetCandidates 25 0.0181 pg-8.2.4/bin/postgres MemoryContextDelete 25 0.0181 pg-8.2.4/bin/postgres SearchCatCacheList 25 0.0181 /usr/src/linux-2.6.22.15/vmlinux handle_edge_irq 25 0.0181 /usr/src/linux-2.6.22.15/vmlinux msecs_to_jiffies 24 0.0173 /nf_conntrack_ipv4 (no symbols) 24 0.0173 pg-8.2.4/bin/postgres ExecInitIndexScan 24 0.0173 pg-8.2.4/bin/postgres ExecProcNode 24 0.0173 pg-8.2.4/bin/postgres InternalCreateExecutorState 24 0.0173 pg-8.2.4/bin/postgres PostgresMain 24 0.0173 pg-8.2.4/bin/postgres RelationIncrementReferenceCount 24 0.0173 pg-8.2.4/bin/postgres heapgettup_pagemode 24 0.0173 /usr/src/linux-2.6.22.15/vmlinux __wake_up_common 24 0.0173 /usr/src/linux-2.6.22.15/vmlinux get_task_mm 24 0.0173 /usr/src/linux-2.6.22.15/vmlinux page_waitqueue 23 0.0166 pg-8.2.4/bin/postgres ExecCheckRTPerms 23 0.0166 pg-8.2.4/bin/postgres TransactionIdFollowsOrEquals 23 0.0166 pg-8.2.4/bin/postgres internal_putbytes 23 0.0166 pg-8.2.4/bin/postgres scanner_init 23 0.0166 pg-8.2.4/lib/plpgsql.so exec_stmt_execsql 23 0.0166 /usr/src/linux-2.6.22.15/vmlinux IRQ0xc1_interrupt 23 0.0166 /usr/src/linux-2.6.22.15/vmlinux __do_softirq 22 0.0159 pg-8.2.4/bin/postgres AfterTriggerBeginXact 22 0.0159 pg-8.2.4/bin/postgres GetTransactionSnapshot 22 0.0159 pg-8.2.4/bin/postgres ResourceOwnerReleaseInternal 22 0.0159 pg-8.2.4/bin/postgres lcons 22 0.0159 pg-8.2.4/bin/postgres markcanreach 22 0.0159 pg-8.2.4/lib/plpgsql.so exec_eval_datum 22 0.0159 pg-8.2.4/lib/plpgsql.so exec_eval_expr 22 0.0159 pg-8.2.4/lib/utf8_and_win.so win_to_utf8 22 0.0159 /usr/src/linux-2.6.22.15/vmlinux enqueue_task 22 0.0159 /usr/src/linux-2.6.22.15/vmlinux math_state_restore 21 0.0152 pg-8.2.4/bin/postgres AtCommit_Notify 21 0.0152 pg-8.2.4/bin/postgres CreateTupleDescCopy 21 0.0152 pg-8.2.4/bin/postgres ExecScan 21 0.0152 pg-8.2.4/bin/postgres ResourceOwnerEnlargeCatCacheRefs 21 0.0152 pg-8.2.4/bin/postgres ResourceOwnerForgetRelationRef 21 0.0152 pg-8.2.4/bin/postgres ResourceOwnerRememberCatCacheRef 21 0.0152 pg-8.2.4/bin/postgres SHMQueueInsertBefore 21 0.0152 pg-8.2.4/bin/postgres exec_simple_query 21 0.0152 pg-8.2.4/bin/postgres get_hash_value 21 0.0152 pg-8.2.4/bin/postgres index_getnext 21 0.0152 pg-8.2.4/bin/postgres next_token 21 0.0152 pg-8.2.4/bin/postgres pgstat_report_activity 21 0.0152 pg-8.2.4/lib/plpgsql.so exec_run_select 20 0.0145 pg-8.2.4/bin/postgres AllocSetReset 20 0.0145 pg-8.2.4/bin/postgres AtProcExit_Buffers 20 0.0145 pg-8.2.4/bin/postgres LocalToUtf 20 0.0145 pg-8.2.4/bin/postgres index_getprocinfo 20 0.0145 pg-8.2.4/bin/postgres spi_dest_startup 20 0.0145 pg-8.2.4/bin/postgres tag_hash 20 0.0145 /usr/src/linux-2.6.22.15/vmlinux sysret_check 19 0.0137 
pg-8.2.4/bin/postgres Async_Unlisten 19 0.0137 pg-8.2.4/bin/postgres FileSeek 19 0.0137 pg-8.2.4/bin/postgres PortalStart 19 0.0137 pg-8.2.4/bin/postgres heap_open 19 0.0137 pg-8.2.4/bin/postgres pg_mblen 19 0.0137 /usr/src/linux-2.6.22.15/vmlinux prepare_to_wait 18 0.0130 /iptable_filter (no symbols) 18 0.0130 /iptable_nat (no symbols) 18 0.0130 /lib/libm-2.3.6.so (no symbols) 18 0.0130 pg-8.2.4/bin/postgres CopySnapshot 18 0.0130 pg-8.2.4/bin/postgres ExecSetSlotDescriptor 18 0.0130 pg-8.2.4/bin/postgres SPI_connect 18 0.0130 pg-8.2.4/bin/postgres cleartraverse 18 0.0130 pg-8.2.4/bin/postgres pg_mbcliplen 18 0.0130 pg-8.2.4/bin/postgres strlcpy 18 0.0130 pg-8.2.4/lib/plpgsql.so plpgsql_compile 18 0.0130 /usr/src/linux-2.6.22.15/vmlinux account_system_time 18 0.0130 /usr/src/linux-2.6.22.15/vmlinux bit_waitqueue 17 0.0123 pg-8.2.4/bin/postgres AtEOXact_MultiXact 17 0.0123 pg-8.2.4/bin/postgres CreateTupleDescCopyConstr 17 0.0123 pg-8.2.4/bin/postgres ExecAgg 17 0.0123 pg-8.2.4/bin/postgres ExecResult 17 0.0123 pg-8.2.4/bin/postgres FlushBuffer 17 0.0123 pg-8.2.4/bin/postgres RelationGetIndexScan 17 0.0123 pg-8.2.4/bin/postgres SHMQueueDelete 17 0.0123 pg-8.2.4/bin/postgres SendRowDescriptionMessage 17 0.0123 pg-8.2.4/bin/postgres _bt_search 17 0.0123 pg-8.2.4/bin/postgres _copyTargetEntry 17 0.0123 pg-8.2.4/bin/postgres heap_getnext 17 0.0123 pg-8.2.4/bin/postgres markreachable 17 0.0123 pg-8.2.4/bin/postgres miss 17 0.0123 pg-8.2.4/bin/postgres scanRTEForColumn 17 0.0123 pg-8.2.4/lib/plpgsql.so exec_assign_value 17 0.0123 /usr/src/linux-2.6.22.15/vmlinux do_softirq 16 0.0116 pg-8.2.4/bin/postgres ExecEvalConst 16 0.0116 pg-8.2.4/bin/postgres ExecReScan 16 0.0116 pg-8.2.4/bin/postgres ExecSetParamPlan 16 0.0116 pg-8.2.4/bin/postgres downcase_truncate_identifier 16 0.0116 pg-8.2.4/bin/postgres transformExpr 16 0.0116 pg-8.2.4/bin/postgres varstr_cmp 16 0.0116 pg-8.2.4/lib/plpgsql.so plpgsql_call_handler 16 0.0116 pg-8.2.4/lib/plpgsql.so plpgsql_estate_setup 16 0.0116 /usr/src/linux-2.6.22.15/vmlinux cpu_idle 15 0.0108 pg-8.2.4/bin/postgres CommitTransactionCommand 15 0.0108 pg-8.2.4/bin/postgres ExecEvalFuncArgs 15 0.0108 pg-8.2.4/bin/postgres GrantLockLocal 15 0.0108 pg-8.2.4/bin/postgres PrepareToInvalidateCacheTuple 15 0.0108 pg-8.2.4/bin/postgres UtfToLocal 15 0.0108 pg-8.2.4/bin/postgres cost_nestloop 15 0.0108 pg-8.2.4/bin/postgres errstart 15 0.0108 pg-8.2.4/bin/postgres index_open 15 0.0108 pg-8.2.4/bin/postgres init_fcache 15 0.0108 pg-8.2.4/bin/postgres list_delete_cell 14 0.0101 pg-8.2.4/bin/postgres ExecStoreTuple 14 0.0101 pg-8.2.4/bin/postgres ReadNewTransactionId 14 0.0101 pg-8.2.4/bin/postgres ReleaseBuffer 14 0.0101 pg-8.2.4/bin/postgres ResourceOwnerRememberBuffer 14 0.0101 pg-8.2.4/bin/postgres SHMQueueNext 14 0.0101 pg-8.2.4/bin/postgres btgettuple 14 0.0101 pg-8.2.4/bin/postgres duptraverse 14 0.0101 pg-8.2.4/bin/postgres heap_compute_data_size 14 0.0101 pg-8.2.4/bin/postgres index_rescan 14 0.0101 pg-8.2.4/bin/postgres list_copy 14 0.0101 pg-8.2.4/bin/postgres smgrclosenode 14 0.0101 pg-8.2.4/lib/plpgsql.so .plt 14 0.0101 /usr/src/linux-2.6.22.15/vmlinux idle_cpu 14 0.0101 /usr/src/linux-2.6.22.15/vmlinux run_workqueue 13 0.0094 pg-8.2.4/bin/postgres Async_Listen 13 0.0094 pg-8.2.4/bin/postgres DLMoveToFront 13 0.0094 pg-8.2.4/bin/postgres FreeExprContext 13 0.0094 pg-8.2.4/bin/postgres IndexNext 13 0.0094 pg-8.2.4/bin/postgres LockRelease 13 0.0094 pg-8.2.4/bin/postgres ResourceOwnerEnlargeBuffers 13 0.0094 pg-8.2.4/bin/postgres ShutdownExprContext 13 0.0094 
pg-8.2.4/bin/postgres cleanup 13 0.0094 pg-8.2.4/bin/postgres datumCopy 13 0.0094 pg-8.2.4/bin/postgres grouping_planner 13 0.0094 pg-8.2.4/bin/postgres newstate 13 0.0094 pg-8.2.4/bin/postgres pgstat_start 13 0.0094 pg-8.2.4/bin/postgres spi_printtup 13 0.0094 pg-8.2.4/bin/postgres transformStmt 12 0.0087 pg-8.2.4/bin/postgres ExecEvalOper 12 0.0087 pg-8.2.4/bin/postgres ExecQual 12 0.0087 pg-8.2.4/bin/postgres IsSharedRelation 12 0.0087 pg-8.2.4/bin/postgres RelationIdGetRelation 12 0.0087 pg-8.2.4/bin/postgres StartTransactionCommand 12 0.0087 pg-8.2.4/bin/postgres _bt_insertonpg 12 0.0087 pg-8.2.4/bin/postgres advance_aggregates 12 0.0087 pg-8.2.4/bin/postgres getvacant 12 0.0087 pg-8.2.4/bin/postgres list_delete_ptr 12 0.0087 pg-8.2.4/bin/postgres namestrcpy 12 0.0087 pg-8.2.4/bin/postgres oideq 12 0.0087 /usr/src/linux-2.6.22.15/vmlinux local_bh_enable_ip 12 0.0087 /usr/src/linux-2.6.22.15/vmlinux sys_rt_sigreturn 11 0.0080 pg-8.2.4/bin/postgres CommitTransaction 11 0.0080 pg-8.2.4/bin/postgres ExecEvalNullTest 11 0.0080 pg-8.2.4/bin/postgres LockRelationOid 11 0.0080 pg-8.2.4/bin/postgres ProcessUtility 11 0.0080 pg-8.2.4/bin/postgres UnGrantLock 11 0.0080 pg-8.2.4/bin/postgres XidInSnapshot 11 0.0080 pg-8.2.4/bin/postgres _bt_doinsert 11 0.0080 pg-8.2.4/bin/postgres _bt_steppage 11 0.0080 pg-8.2.4/bin/postgres appendBinaryStringInfo 11 0.0080 pg-8.2.4/bin/postgres okcolors 11 0.0080 pg-8.2.4/bin/postgres pq_getmessage 11 0.0080 pg-8.2.4/lib/plpgsql.so exec_move_row 11 0.0080 /usr/src/linux-2.6.22.15/vmlinux clocksource_get_next 11 0.0080 /usr/src/linux-2.6.22.15/vmlinux handle_IRQ_event 11 0.0080 /usr/src/linux-2.6.22.15/vmlinux local_bh_enable 11 0.0080 /usr/src/linux-2.6.22.15/vmlinux mod_timer 10 0.0072 pg-8.2.4/bin/postgres CommandEndInvalidationMessages 10 0.0072 pg-8.2.4/bin/postgres CopyTriggerDesc 10 0.0072 pg-8.2.4/bin/postgres ExecAssignExprContext 10 0.0072 pg-8.2.4/bin/postgres ExecEvalScalarArrayOp 10 0.0072 pg-8.2.4/bin/postgres FunctionCall5 10 0.0072 pg-8.2.4/bin/postgres GetCurrentSubTransactionId 10 0.0072 pg-8.2.4/bin/postgres RecordTransactionCommit 10 0.0072 pg-8.2.4/bin/postgres StrategyGetBuffer 10 0.0072 pg-8.2.4/bin/postgres _bt_getbuf 10 0.0072 pg-8.2.4/bin/postgres _bt_preprocess_keys 10 0.0072 pg-8.2.4/bin/postgres eval_const_expressions_mutator 10 0.0072 pg-8.2.4/bin/postgres index_beginscan_internal 10 0.0072 pg-8.2.4/bin/postgres new_head_cell 10 0.0072 pg-8.2.4/bin/postgres perform_default_encoding_conversion 10 0.0072 pg-8.2.4/bin/postgres pg_regcomp 10 0.0072 pg-8.2.4/lib/plpgsql.so exec_eval_boolean 10 0.0072 pg-8.2.4/lib/plpgsql.so exec_stmt_if 10 0.0072 /usr/src/linux-2.6.22.15/vmlinux __rcu_pending 10 0.0072 /usr/src/linux-2.6.22.15/vmlinux __rcu_process_callbacks
Jakub Ouhrabka wrote: > We've tried several times to get stacktrace from some of the running > backends during spikes, we got always this: > > 0x00002b005d00a9a9 in semop () from /lib/libc.so.6 > #0 0x00002b005d00a9a9 in semop () from /lib/libc.so.6 > #1 0x000000000054fe53 in PGSemaphoreLock (sema=0x2b00a04e5090, > interruptOK=0 '\0') at pg_sema.c:411 > #2 0x0000000000575d95 in LWLockAcquire (lockid=SInvalLock, > mode=LW_EXCLUSIVE) at lwlock.c:455 > #3 0x000000000056fbfe in ReceiveSharedInvalidMessages > (invalFunction=0x5e9a30 <LocalExecuteInvalidationMessage>, > resetFunction=0x5e9df0 <InvalidateSystemCaches>) at sinval.c:159 > #4 0x0000000000463505 in StartTransactionCommand () at xact.c:1439 > #5 0x000000000056fa4b in ProcessCatchupEvent () at sinval.c:347 > #6 0x000000000056fb20 in CatchupInterruptHandler > (postgres_signal_arg=<value optimized out>) at sinval.c:221 > #7 0x00002b005cf6f110 in killpg () from /lib/libc.so.6 > #8 0x0000000000000000 in ?? () Perhaps it would make sense to try to take the "fast path" in SIDelExpiredDataEntries with only a shared lock rather than exclusive. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Jakub Ouhrabka <kuba@comgate.cz> writes: > We've tried several times to get stacktrace from some of the running > backends during spikes, we got always this: > 0x00002b005d00a9a9 in semop () from /lib/libc.so.6 > #0 0x00002b005d00a9a9 in semop () from /lib/libc.so.6 > #1 0x000000000054fe53 in PGSemaphoreLock (sema=0x2b00a04e5090, > interruptOK=0 '\0') at pg_sema.c:411 > #2 0x0000000000575d95 in LWLockAcquire (lockid=SInvalLock, > mode=LW_EXCLUSIVE) at lwlock.c:455 > #3 0x000000000056fbfe in ReceiveSharedInvalidMessages > (invalFunction=0x5e9a30 <LocalExecuteInvalidationMessage>, > resetFunction=0x5e9df0 <InvalidateSystemCaches>) at sinval.c:159 > #4 0x0000000000463505 in StartTransactionCommand () at xact.c:1439 > #5 0x000000000056fa4b in ProcessCatchupEvent () at sinval.c:347 > #6 0x000000000056fb20 in CatchupInterruptHandler > (postgres_signal_arg=<value optimized out>) at sinval.c:221 CatchupInterruptHandler, eh? That seems to let NOTIFY off the hook, and instead points in the direction of sinval processing; which is to say, propagation of changes to system catalogs. Does your app create and destroy a tremendous number of temp tables, or anything else in the way of frequent DDL commands? regards, tom lane
Jakub Ouhrabka <jakub.ouhrabka@comgate.cz> writes: >>> Does your app create and destroy a tremendous number of temp tables, >>> or anything else in the way of frequent DDL commands? > Hmm. I can't think of anything like this. Maybe there are few backends > which create temp tables but not tremendous number. I don't think our > applications issue DDL statements either. Can LOCK TABLE IN EXCLUSIVE > MODE cause this? No. I did some experimenting to see exactly how large the sinval message buffer is in today's terms, and what I find is that about 22 cycles of create temp table foo (f1 int, f2 text); drop table foo; is enough to force a CatchupInterrupt to a sleeping backend. This case is a bit more complex than it appears since the text column forces the temp table to have a toast table; but even with only fixed-width columns, if you were creating one temp table a second that would be plenty to explain once-a-minute-or-so CatchupInterrupt processing. And if you've got a lot of backends that normally *don't* wake up that often, then they'd be sitting and failing to absorb the sinval traffic in any more timely fashion. So I'm betting that we've got the source of the spike identified. You could check this theory out by strace'ing some of the idle backends and seeing if their activity spikes are triggered by receipt of SIGUSR1 signals. regards, tom lane
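For anyone wanting to reproduce this on a test system, a hypothetical helper like the one below generates the kind of catalog churn Tom describes (each temp table with a text column also brings a toast table and toast index with it); it is a sketch, not something from the thread:

    CREATE OR REPLACE FUNCTION churn_temp_tables(n integer) RETURNS void AS $$
    BEGIN
        FOR i IN 1 .. n LOOP
            -- EXECUTE sidesteps plan-caching surprises in pre-8.3 plpgsql
            EXECUTE 'CREATE TEMP TABLE sinval_probe (f1 int, f2 text)';
            EXECUTE 'DROP TABLE sinval_probe';
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    -- Roughly the number of cycles Tom found sufficient to trigger a catchup:
    SELECT churn_temp_tables(25);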
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Perhaps it would make sense to try to take the "fast path" in
> SIDelExpiredDataEntries with only a shared lock rather than exclusive.

I think the real problem here is that sinval catchup processing is well designed to create contention :-(. Once we've decided that the message queue is getting too full, we SIGUSR1 all the backends at once (or as fast as the postmaster can do it anyway), then they all go off and try to touch the sinval queue. Backends that haven't awoken even once since the last time will have to process the entire queue contents, and they're all trying to do that at the same time. What's worse, they take and release the SInvalLock once for each message they take off the queue. This isn't so horrid for one-core machines (since each process will monopolize the CPU for probably less than one timeslice while it's catching up) but it's pretty obvious where all the contention is coming from on an 8-core.

Some ideas for improving matters:

1. Try to avoid having all the backends hit the queue at once. Instead of SIGUSR1'ing everybody at the same time, maybe hit only the process with the oldest message pointer, and have him hit the next oldest after he's done reading the queue.

2. Try to take more than one message off the queue per SInvalLock cycle. (There is a tuning tradeoff here, since it would mean holding the lock for longer at a time.)

3. Try to avoid having every backend run SIDelExpiredDataEntries every time through ReceiveSharedInvalidMessages. It's not critical to delete entries until the queue starts getting full --- maybe we could rejigger the logic so it only happens once when somebody notices the queue is getting full, or so that only the guy(s) who had nextMsgNum == minMsgNum do it, or something like that?

			regards, tom lane
> You could check this theory out by strace'ing some of the idle
> backends and seeing if their activity spikes are triggered by receipt
> of SIGUSR1 signals.

Yes, I can confirm that it's triggered by SIGUSR1 signals.

If I understand it correctly, we have the following choices now:

1) Use only 2 cores (out of 8 cores).

2) Lower the number of idle backends - or at least force the backends to do something at different times to eliminate the spikes - is "SELECT 1" enough to force processing of the queue?

3) Is there any chance of this being fixed/improved in 8.3 or even 8.2? It's a (performance) bug from our point of view. I realize we're the first to notice it and it's not a typical use case to have so many idle backends. But a large installation with connection pooling is something similar, and that's not that uncommon - one active backend doing DDL can then cause unexpected spikes during otherwise quiet hours...

4) Sure, we'll try to reduce the number of DDL statements (though in fact we're not sure where exactly they are coming from), but I guess that would only lengthen the time between spikes, not make them any smoother.

Any other suggestions?

Thanks,

Kuba
Jakub Ouhrabka <kuba@comgate.cz> writes: > Yes, I can confirm that it's triggered by SIGUSR1 signals. OK, that confirms the theory that it's sinval-queue contention. > If I understand it correctly we have following choices now: > 1) Use only 2 cores (out of 8 cores) > 2) Lower the number of idle backends - at least force backends to do > something at different times to eliminate spikes - is "select 1" enough > to force processing the queue? Yeah, if you could get your clients to issue trivial queries every few seconds (not all at the same time) the spikes should go away. If you don't want to change your clients, one possible amelioration is to reduce the signaling threshold in SIInsertDataEntry --- instead of 70% of MAXNUMMESSAGES, maybe signal at 10% or 20%. That would make the spikes more frequent but smaller, which might help ... or not. > 3) Is there any chance of this being fixed/improved in 8.3 or even 8.2? I doubt we'd risk destabilizing 8.3 at this point, for a problem that affects so few people; let alone back-patching into 8.2. There are some other known performance problems in the sinval signaling (for instance, that a queue overflow results in cache resets for all backends, not only the slowest), so I think addressing all of them at once would be the thing to do. That would be a large enough patch that it would certainly need to go through beta testing before I'd want to inflict it on the world... This discussion has raised the priority of the problem in my mind, so I'm thinking it should be worked on in 8.4; but it's too late for 8.3. regards, tom lane
Hi Tom,

> I doubt we'd risk destabilizing 8.3 at this point, for a problem that
> affects so few people; let alone back-patching into 8.2.

Understood.

> OK, that confirms the theory that it's sinval-queue contention.

We've tried hard to identify what's filling the sinval queue. We went through the query logs as well as the function bodies stored in the database. We were not able to find any DDL, temp table creation, etc.

We did the following experiment: we stopped one of our clients, so a queue of events (i.e. rows in the db) for it to process built up. Then we started the client again and it started processing the queue - that means calling simple selects, updates and complex plpgsql function(s). And at that moment the spike started, even though it shouldn't have, given the usual periodicity. It's consistent. We are pretty sure that this client is not doing any DDL...

What should we look for to find the cause?

Thanks for any hints,

Kuba
Jakub Ouhrabka <kuba@comgate.cz> writes:
> We've tried hard to identify what's filling the sinval queue.
> We went through the query logs as well as the function bodies stored in
> the database. We were not able to find any DDL, temp table creation, etc.

Strange. The best idea that comes to mind is to add some debugging code to SendSharedInvalidMessage to log the content of each message that's sent out. That would at least tell us *what* is going into the queue, even if not directly *why*. Try something like (untested)

void
SendSharedInvalidMessage(SharedInvalidationMessage *msg)
{
	bool	insertOK;

+	elog(LOG, "sending inval msg %d %u %u %u %u %u",
+		 msg->cc.id,
+		 msg->cc.tuplePtr.ip_blkid.bi_hi,
+		 msg->cc.tuplePtr.ip_blkid.bi_lo,
+		 msg->cc.tuplePtr.ip_posid,
+		 msg->cc.dbId,
+		 msg->cc.hashValue);
+
	LWLockAcquire(SInvalLock, LW_EXCLUSIVE);
	insertOK = SIInsertDataEntry(shmInvalBuffer, msg);
	LWLockRelease(SInvalLock);

			regards, tom lane
Hi Tom, > Strange. The best idea that comes to mind is to add some debugging > code to SendSharedInvalidMessage to log the content of each message > that's sent out. That would at least tell us *what* is going into > the queue, even if not directly *why*. we've patched postgresql and run one of our plpgsql complex procedures. There are many of sinval messages - log output is below. What does it mean? Thanks, Kuba LOG: sending inval msg 30 0 26 13 30036 4294936593 LOG: sending inval msg 29 0 26 13 30036 337030170 LOG: sending inval msg 30 0 25 46 30036 4294936593 LOG: sending inval msg 29 0 25 46 30036 337030170 LOG: sending inval msg 30 0 26 13 30036 4294936593 LOG: sending inval msg 29 0 26 13 30036 337030170 LOG: sending inval msg 30 0 25 45 30036 4294936595 LOG: sending inval msg 29 0 25 45 30036 2019111801 LOG: sending inval msg 30 0 26 11 30036 4294936595 LOG: sending inval msg 29 0 26 11 30036 2019111801 LOG: sending inval msg 30 0 25 44 30036 4294936597 LOG: sending inval msg 29 0 25 44 30036 3703878920 LOG: sending inval msg 30 0 26 10 30036 4294936597 LOG: sending inval msg 29 0 26 10 30036 3703878920 LOG: sending inval msg 30 0 26 9 30036 4294936616 LOG: sending inval msg 29 0 26 9 30036 3527122063 LOG: sending inval msg 30 0 25 43 30036 4294936616 LOG: sending inval msg 29 0 25 43 30036 3527122063 LOG: sending inval msg 30 0 26 9 30036 4294936616 LOG: sending inval msg 29 0 26 9 30036 3527122063 LOG: sending inval msg 30 0 25 41 30036 4294936618 LOG: sending inval msg 29 0 25 41 30036 2126866956 LOG: sending inval msg 30 0 26 7 30036 4294936618 LOG: sending inval msg 29 0 26 7 30036 2126866956 LOG: sending inval msg 30 0 25 40 30036 4294936620 LOG: sending inval msg 29 0 25 40 30036 1941919314 LOG: sending inval msg 30 0 26 5 30036 4294936620 LOG: sending inval msg 29 0 26 5 30036 1941919314 LOG: sending inval msg 30 0 26 4 30036 4294936633 LOG: sending inval msg 29 0 26 4 30036 544523647 LOG: sending inval msg 30 0 25 39 30036 4294936633 LOG: sending inval msg 29 0 25 39 30036 544523647 LOG: sending inval msg 30 0 26 4 30036 4294936633 LOG: sending inval msg 29 0 26 4 30036 544523647 LOG: sending inval msg 30 0 25 38 30036 4294936635 LOG: sending inval msg 29 0 25 38 30036 2557582018 LOG: sending inval msg 30 0 26 3 30036 4294936635 LOG: sending inval msg 29 0 26 3 30036 2557582018 LOG: sending inval msg 30 0 25 37 30036 4294936637 LOG: sending inval msg 29 0 25 37 30036 2207280630 LOG: sending inval msg 30 0 26 2 30036 4294936637 LOG: sending inval msg 29 0 26 2 30036 2207280630 LOG: sending inval msg 30 0 26 1 30036 4294936669 LOG: sending inval msg 29 0 26 1 30036 1310188568 LOG: sending inval msg 30 0 25 36 30036 4294936669 LOG: sending inval msg 29 0 25 36 30036 1310188568 LOG: sending inval msg 30 0 26 1 30036 4294936669 LOG: sending inval msg 29 0 26 1 30036 1310188568 LOG: sending inval msg 30 0 25 35 30036 4294936671 LOG: sending inval msg 29 0 25 35 30036 2633053415 LOG: sending inval msg 30 0 25 48 30036 4294936671 LOG: sending inval msg 29 0 25 48 30036 2633053415 LOG: sending inval msg 30 0 25 33 30036 4294936673 LOG: sending inval msg 29 0 25 33 30036 2049964857 LOG: sending inval msg 30 0 25 47 30036 4294936673 LOG: sending inval msg 29 0 25 47 30036 2049964857 LOG: sending inval msg -1 0 30036 0 30700 3218341912 LOG: sending inval msg -2 2084 1663 0 30036 50335 LOG: sending inval msg -2 0 1663 0 30036 50336 LOG: sending inval msg -1 2075 30036 0 30702 30036 LOG: sending inval msg -2 0 1663 0 30036 50324 LOG: sending inval msg -1 0 30036 0 30702 30036 LOG: 
sending inval msg -2 0 1663 0 30036 50336 LOG: sending inval msg -2 0 1663 0 30036 50323 LOG: sending inval msg -1 0 30036 0 30700 30036 LOG: sending inval msg -2 0 1663 0 30036 50335 LOG: sending inval msg -2 0 1663 0 30036 50322 LOG: sending inval msg -1 0 30036 0 30698 30036 LOG: sending inval msg -2 0 1663 0 30036 50334 LOG: sending inval msg -1 0 30036 0 30677 3218341912 LOG: sending inval msg -2 2084 1663 0 30036 50332 LOG: sending inval msg -2 0 1663 0 30036 50333 LOG: sending inval msg -1 2075 30036 0 30679 30036 LOG: sending inval msg -2 0 1663 0 30036 50321 LOG: sending inval msg -1 0 30036 0 30679 30036 LOG: sending inval msg -2 0 1663 0 30036 50333 LOG: sending inval msg -2 0 1663 0 30036 50320 LOG: sending inval msg -1 0 30036 0 30677 30036 LOG: sending inval msg -2 0 1663 0 30036 50332 LOG: sending inval msg -2 0 1663 0 30036 50319 LOG: sending inval msg -1 0 30036 0 30675 30036 LOG: sending inval msg -2 0 1663 0 30036 50331 LOG: sending inval msg -1 0 30036 0 30660 3218341912 LOG: sending inval msg -2 2084 1663 0 30036 50329 LOG: sending inval msg -2 0 1663 0 30036 50330 LOG: sending inval msg -1 2075 30036 0 30662 30036 LOG: sending inval msg -2 0 1663 0 30036 50318 LOG: sending inval msg -1 0 30036 0 30662 30036 LOG: sending inval msg -2 0 1663 0 30036 50330 LOG: sending inval msg -2 0 1663 0 30036 50317 LOG: sending inval msg -1 0 30036 0 30660 30036 LOG: sending inval msg -2 0 1663 0 30036 50329 LOG: sending inval msg -2 0 1663 0 30036 50316 LOG: sending inval msg -1 0 30036 0 30658 30036 LOG: sending inval msg -2 0 1663 0 30036 50328 LOG: sending inval msg -1 0 30036 0 30624 3218341912 LOG: sending inval msg -2 2084 1663 0 30036 50326 LOG: sending inval msg -2 0 1663 0 30036 50327 LOG: sending inval msg -1 2075 30036 0 30626 30036 LOG: sending inval msg -2 0 1663 0 30036 50315 LOG: sending inval msg -1 0 30036 0 30626 30036 LOG: sending inval msg -2 0 1663 0 30036 50327 LOG: sending inval msg -2 0 1663 0 30036 50314 LOG: sending inval msg -1 0 30036 0 30624 30036 LOG: sending inval msg -2 0 1663 0 30036 50326 LOG: sending inval msg -2 0 1663 0 30036 50313 LOG: sending inval msg -1 0 30036 0 30622 30036 LOG: sending inval msg -2 0 1663 0 30036 50325 Tom Lane napsal(a): > Jakub Ouhrabka <kuba@comgate.cz> writes: >> We'we tried hard to identify what's the cause of filling sinval-queue. >> We went through query logs as well as function bodies stored in the >> database. We were not able to find any DDL, temp table creations etc. > > Strange. The best idea that comes to mind is to add some debugging > code to SendSharedInvalidMessage to log the content of each message > that's sent out. That would at least tell us *what* is going into > the queue, even if not directly *why*. Try something like (untested) > > void > SendSharedInvalidMessage(SharedInvalidationMessage *msg) > { > bool insertOK; > > + elog(LOG, "sending inval msg %d %u %u %u %u %u", > + msg->cc.id, > + msg->cc.tuplePtr.ip_blkid.bi_hi, > + msg->cc.tuplePtr.ip_blkid.bi_lo, > + msg->cc.tuplePtr.ip_posid, > + msg->cc.dbId, > + msg->cc.hashValue); > + > LWLockAcquire(SInvalLock, LW_EXCLUSIVE); > insertOK = SIInsertDataEntry(shmInvalBuffer, msg); > LWLockRelease(SInvalLock); > > regards, tom lane
Jakub Ouhrabka <kuba@comgate.cz> writes: > What does it mean? Look at src/include/storage/sinval.h and src/include/utils/syscache.h. What you seem to have here is a bunch of tuple updates in pg_class (invalidating caches 29 and 30, which in 8.2 correspond to RELNAMENSP and RELOID), followed by a bunch of SharedInvalRelcacheMsg and SharedInvalSmgrMsg. What I find interesting is that the hits are coming against nearly-successive tuple CTIDs in pg_class, eg these are all on pages 25 and 26 of pg_class: > LOG: sending inval msg 30 0 25 45 30036 4294936595 > LOG: sending inval msg 29 0 25 45 30036 2019111801 > LOG: sending inval msg 30 0 26 11 30036 4294936595 > LOG: sending inval msg 29 0 26 11 30036 2019111801 > LOG: sending inval msg 30 0 25 44 30036 4294936597 > LOG: sending inval msg 29 0 25 44 30036 3703878920 > LOG: sending inval msg 30 0 26 10 30036 4294936597 > LOG: sending inval msg 29 0 26 10 30036 3703878920 > LOG: sending inval msg 30 0 26 9 30036 4294936616 > LOG: sending inval msg 29 0 26 9 30036 3527122063 > LOG: sending inval msg 30 0 25 43 30036 4294936616 > LOG: sending inval msg 29 0 25 43 30036 3527122063 The ordering is a little strange --- not sure what's producing that. I can think of three things that might be producing this: 1. DDL operations ... but most sorts of DDL on a table would touch more catalogs than just pg_class, so this is a bit hard to credit. 2. VACUUM. 3. Some sort of direct update of pg_class. The fact that we have a bunch of catcache invals followed by relcache/smgr invals says that this all happened in one transaction, else they'd have been intermixed better. That lets VACUUM off the hook, because it processes each table in a separate transaction. I am wondering if maybe your app does one of those sneaky things like fooling with pg_class.reltriggers. If so, the problem might be soluble by just avoiding unnecessary updates. regards, tom lane
Hi Tom,

> I can think of three things that might be producing this:

We've found it: TRUNCATE.

We'll try to eliminate the use of TRUNCATE, and the periodic spikes should go away. There will still be the possibility of spikes because of database creation etc. - we'll try to handle this by issuing trivial commands from the idle backends, and in the longer run by decreasing the number of backends/databases/listens. This is the way to go, right?

One more question: is it OK to do a mass regexp update of pg_proc.prosrc changing TRUNCATEs to DELETEs? Anything we should be aware of? Sure, we'll do our own testing, but in case you see any obvious downsides...

Many thanks for your guidance!

Kuba
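For the record, the kind of mass update Kuba means would look roughly like the sketch below - it edits pg_proc directly, so dump the database and try it on a copy first, and review the result by hand (the pattern is crude and would also rewrite the word inside comments or string literals):

    UPDATE pg_proc
    SET    prosrc = regexp_replace(prosrc,
                                   'truncate[[:space:]]+(table[[:space:]]+)?',
                                   'DELETE FROM ',
                                   'gi')
    WHERE  prosrc ~* 'truncate'
      AND  prolang = (SELECT oid FROM pg_language WHERE lanname = 'plpgsql');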
Hi > > I can think of three things that might be producing this: > > we've found it: TRUNCATE I haven't been following this thread. Can someone please explain to me why TRUNCATE causes these spikes? -- Adrian Moisey System Administrator | CareerJunction | Your Future Starts Here. Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842
Jakub Ouhrabka <kuba@comgate.cz> writes: > we've found it: TRUNCATE Huh. One transaction truncating a dozen tables? That would match the sinval trace all right ... > One more question: is it ok to do mass regexp update of pg_proc.prosrc > changing TRUNCATEs to DELETEs? You might be throwing the baby out with the bathwater, performance-wise. Mass DELETEs will require cleanup by VACUUM, and that'll likely eat more cycles and I/O than you save. I'd think in terms of trying to spread out the TRUNCATEs or check to see if you really need one (maybe the table's already empty), rather than this. I do plan to look at the sinval code once 8.3 is out the door, so another possibility if you can wait a few weeks/months is to leave your app alone and see if the eventual patch looks sane to back-patch. (I don't think the community would consider doing so, but you could run a locally modified Postgres with it.) regards, tom lane
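To make the first point concrete: the DELETE-based variant only stays cheap if the dead rows get reclaimed, e.g. (table and column names are made up):

    -- Inside the application's transaction, instead of TRUNCATE:
    DELETE FROM queue_state;
    INSERT INTO queue_state (last_processed_id) VALUES (12345);

    -- Periodically, outside any transaction block (or just let autovacuum do it),
    -- so the dead row versions left by repeated DELETEs don't bloat the table:
    VACUUM queue_state;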
> Huh. One transaction truncating a dozen tables? That would match the
> sinval trace all right ...

It should be 4 tables - does the log shown suggest there were more truncates?

> You might be throwing the baby out with the bathwater,
> performance-wise.

Yes, performance was the initial reason to use TRUNCATE instead of DELETE many years ago. But today the truncated tables usually contain exactly one row - quick measurements now show that it's faster to issue DELETE instead of TRUNCATE in this case.

Again, many thanks for your invaluable advice!

Kuba
Adrian Moisey <adrian@careerjunction.co.za> writes: >> we've found it: TRUNCATE > I haven't been following this thread. Can someone please explain to me > why TRUNCATE causes these spikes? It's not so much the TRUNCATE as the overhead of broadcasting the resultant catalog changes to the many hundreds of (mostly idle) backends he's got --- all of which respond by trying to lock the shared sinval message queue at about the same time. You could see the whole thread as an object lesson in why connection pooling is a good idea. But certainly it seems that sinval is the next bottleneck in terms of being able to scale Postgres up to very large numbers of backends. regards, tom lane
Jakub Ouhrabka <kuba@comgate.cz> writes: >>> Huh. One transaction truncating a dozen tables? That would match the >>> sinval trace all right ... > It should be 4 tables - the shown log looks like there were more truncates? Actually, counting up the entries, there are close to 2 dozen relations apparently being truncated in the trace you showed. But that might be only four tables at the user level, since each index on these tables would appear separately, and you might have a toast table plus index for each one too. If you want to dig down, the table OIDs are visible in the trace, in the messages with type -1: >> LOG: sending inval msg -1 0 30036 0 30700 3218341912 ^^^^^ ^^^^^ DBOID RELOID so you could look into pg_class to confirm what's what. > Yes, performance was the initial reason to use truncate instead of > delete many years ago. But today the truncated tables usualy contain > exactly one row - quick measurements now show that it's faster to issue > delete instead of truncate in this case. Okay, for a table of just a few entries I agree that DELETE is probably better. But don't forget you're going to need to have those tables vacuumed fairly regularly now, else they'll start to bloat. regards, tom lane
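The pg_class lookup Tom describes, using a few of the relation OIDs from the -1 messages in the trace (run it while connected to the database whose OID is 30036):

    SELECT oid, relname, relkind
    FROM   pg_class
    WHERE  oid IN (30700, 30702, 30698, 30679, 30677, 30675);
    -- relkind distinguishes the tables themselves ('r') from their
    -- indexes ('i') and toast tables ('t').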
> Okay, for a table of just a few entries I agree that DELETE is probably
> better.  But don't forget you're going to need to have those tables
> vacuumed fairly regularly now, else they'll start to bloat.

I think we'll go with DELETE for another reason as well: just after we
figured out the cause of the spikes, we started to investigate a
long-term issue we had with PostgreSQL - pg_dump of a big database was
blocking some of our applications. And yes, we replaced TRUNCATE with
DELETE and everything is running as expected.

Looking at the docs now, I see there is a new paragraph in the 8.3 docs
mentioning that TRUNCATE is not MVCC-safe, and also the blocking issue.
It's a pity that the warning wasn't there back in the 7.1 days :-)

Thanks,

Kuba

Tom Lane wrote:
> Jakub Ouhrabka <kuba@comgate.cz> writes:
> >>>> Huh.  One transaction truncating a dozen tables?  That would match the
> >>>> sinval trace all right ...
>
> >> It should be 4 tables - but the log shown suggests there were more
> >> truncates?
>
> Actually, counting up the entries, there are close to 2 dozen relations
> apparently being truncated in the trace you showed.  But that might be
> only four tables at the user level, since each index on these tables
> would appear separately, and you might have a toast table plus index
> for each one too.  If you want to dig down, the table OIDs are visible
> in the trace, in the messages with type -1:
>
> >>> LOG: sending inval msg -1 0 30036 0 30700 3218341912
>                               ^^^^^   ^^^^^
>                               DBOID   RELOID
>
> so you could look into pg_class to confirm what's what.
>
> >> Yes, performance was the initial reason to use TRUNCATE instead of
> >> DELETE many years ago. But today the truncated tables usually contain
> >> exactly one row - quick measurements now show that it's faster to
> >> issue DELETE instead of TRUNCATE in this case.
>
> Okay, for a table of just a few entries I agree that DELETE is probably
> better.  But don't forget you're going to need to have those tables
> vacuumed fairly regularly now, else they'll start to bloat.
>
> 			regards, tom lane
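A minimal sketch of the replacement pattern, with a made-up one-row table
named current_state, is simply a DELETE inside the transaction plus a
regular VACUUM (or autovacuum) so the table doesn't bloat:

    -- current_state is a hypothetical one-row table
    BEGIN;
    DELETE FROM current_state;
    INSERT INTO current_state (val) VALUES (1);
    COMMIT;

    -- run periodically (e.g. from cron) now that rows are deleted rather
    -- than truncated, so dead tuples get reclaimed
    VACUUM current_state;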
On Mon, 2008-01-07 at 19:54 -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Perhaps it would make sense to try to take the "fast path" in
> > SIDelExpiredDataEntries with only a shared lock rather than exclusive.
>
> I think the real problem here is that sinval catchup processing is well
> designed to create contention :-(.

Thinking some more about handling TRUNCATEs...

> Some ideas for improving matters:
>
> 1. Try to avoid having all the backends hit the queue at once.  Instead
> of SIGUSR1'ing everybody at the same time, maybe hit only the process
> with the oldest message pointer, and have him hit the next oldest after
> he's done reading the queue.
>
> 2. Try to take more than one message off the queue per SInvalLock cycle.
> (There is a tuning tradeoff here, since it would mean holding the lock
> for longer at a time.)
>
> 3. Try to avoid having every backend run SIDelExpiredDataEntries every
> time through ReceiveSharedInvalidMessages.  It's not critical to delete
> entries until the queue starts getting full --- maybe we could rejigger
> the logic so it only happens once when somebody notices the queue is
> getting full, or so that only the guy(s) who had nextMsgNum == minMsgNum
> do it, or something like that?

(2) is unnecessary if we can reduce the number of exclusive lockers so
that repeated access to the backend's messages is not contended.

(1) would do this, but seems like it would be complex. We can reduce the
possibility of multiple re-signals, though.

(3) seems like the easiest route, as long as we find a reasonable
algorithm for bringing the access rate down to an acceptable level.

I'm posting a patch for discussion to -patches now that will do this. It
seems straightforward enough to include in 8.3; that may raise a few
eyebrows, but read the patch first.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com
On Fri, 25 Jan 2008, Simon Riggs wrote:
>> 1. Try to avoid having all the backends hit the queue at once.  Instead
>> of SIGUSR1'ing everybody at the same time, maybe hit only the process
>> with the oldest message pointer, and have him hit the next oldest after
>> he's done reading the queue.

My feeling is that an "obvious" way to deal with this is to implement
some sort of "random early detect". That is, randomly SIGUSR1 processes
as entries are added to the queue. The signals should become more
frequent as the queue length increases, until we reach the current
cut-off of signalling everyone when the queue really is full. The hope
would be that it never actually gets that far.

Matthew
Added to TODO:

* Improve performance of shared invalidation queue for multiple CPUs

  http://archives.postgresql.org/pgsql-performance/2008-01/msg00023.php

---------------------------------------------------------------------------

Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Perhaps it would make sense to try to take the "fast path" in
> > SIDelExpiredDataEntries with only a shared lock rather than exclusive.
>
> I think the real problem here is that sinval catchup processing is well
> designed to create contention :-(.  Once we've decided that the message
> queue is getting too full, we SIGUSR1 all the backends at once (or as
> fast as the postmaster can do it anyway), then they all go off and try
> to touch the sinval queue.  Backends that haven't awoken even once
> since the last time will have to process the entire queue contents,
> and they're all trying to do that at the same time.  What's worse, they
> take and release the SInvalLock once for each message they take off the
> queue.  This isn't so horrid for one-core machines (since each process
> will monopolize the CPU for probably less than one timeslice while it's
> catching up) but it's pretty obvious where all the contention is coming
> from on an 8-core.
>
> Some ideas for improving matters:
>
> 1. Try to avoid having all the backends hit the queue at once.  Instead
> of SIGUSR1'ing everybody at the same time, maybe hit only the process
> with the oldest message pointer, and have him hit the next oldest after
> he's done reading the queue.
>
> 2. Try to take more than one message off the queue per SInvalLock cycle.
> (There is a tuning tradeoff here, since it would mean holding the lock
> for longer at a time.)
>
> 3. Try to avoid having every backend run SIDelExpiredDataEntries every
> time through ReceiveSharedInvalidMessages.  It's not critical to delete
> entries until the queue starts getting full --- maybe we could rejigger
> the logic so it only happens once when somebody notices the queue is
> getting full, or so that only the guy(s) who had nextMsgNum == minMsgNum
> do it, or something like that?
>
> 			regards, tom lane

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +