Re: Spinlock performance improvement proposal - Mailing list pgsql-hackers
From | Neil Padgett
---|---
Subject | Re: Spinlock performance improvement proposal
Date |
Msg-id | 3BB37322.A9B97FB4@redhat.com
In response to | Spinlock performance improvement proposal (Tom Lane <tgl@sss.pgh.pa.us>)
List | pgsql-hackers
Tom Lane wrote:
>
> Neil Padgett <npadgett@redhat.com> writes:
> > Well. Currently the runs are the typical pg_bench runs.
>
> With what parameters?  If you don't initialize the pg_bench database
> with "scale" proportional to the number of clients you intend to use,
> then you'll naturally get huge lock contention.  For example, if you
> use scale=1, there's only one "branch" in the database.  Since every
> transaction wants to update the branch's balance, every transaction
> has to write-lock that single row, and so everybody serializes on that
> one lock.  Under these conditions it's not surprising to see lots of
> lock waits and lots of useless runs of the deadlock detector ...

The results you saw with the large number of useless runs of the
deadlock detector had a scale factor of 2. With a scale factor of 2, the
performance fall-off began at about 100 clients. So, I reran the 512
client profiling run with a scale factor of 12. (2:100 as 10:500 -- so
12 might be an appropriate scale factor with some cushion?)

This does, of course, reduce the contention. However, the throughput is
still only about twice as much, which sounds good but is still a small
fraction of the throughput realized on the same machine with a small
number of clients. (This is the uniprocessor machine.)

The new profile looks like this (uniprocessor machine):

Flat profile:

Each sample counts as 1 samples.
  %   cumulative    self               self     total
 time    samples   samples    calls  T1/call  T1/call  name
 9.44   10753.00  10753.00                             pg_fsync
        (I'd attribute this to the slow disk in the machine -- scale 12
         yields a lot of tuples.)
 6.63   18303.01   7550.00                             s_lock_sleep
 6.56   25773.01   7470.00                             s_lock
 5.88   32473.01   6700.00                             heapgettup
 5.28   38487.02   6014.00                             HeapTupleSatisfiesSnapshot
 4.83   43995.02   5508.00                             hash_destroy
 2.77   47156.02   3161.00                             load_file
 1.90   49322.02   2166.00                             XLogInsert
 1.86   51436.02   2114.00                             _bt_compare
 1.82   53514.02   2078.00                             AllocSetAlloc
 1.72   55473.02   1959.00                             LockBuffer
 1.50   57180.02   1707.00                             init_ps_display
 1.40   58775.03   1595.00                             DirectFunctionCall9
 1.26   60211.03   1436.00                             hash_search
 1.14   61511.03   1300.00                             GetSnapshotData
 1.11   62780.03   1269.00                             SpinAcquire
 1.10   64028.03   1248.00                             LockAcquire
 1.04   70148.03   1190.00                             heap_fetch
 0.91   71182.03   1034.00                             _bt_orderkeys
 0.89   72201.03   1019.00                             LockRelease
 0.75   73058.03    857.00                             InitBufferPoolAccess
  .
  .
  .

I reran the benchmarks on the SMP machine with a scale of 12 instead of
2. The numbers still show a clear performance drop-off at approximately
100 clients, albeit not as sharp. (But still quite pronounced.) In terms
of raw performance, the numbers are comparable.

The scale factor certainly helped -- but it still seems that we might
have a problem here.

Thoughts?

Neil

--
Neil Padgett
Red Hat Canada Ltd.                       E-Mail: npadgett@redhat.com
2323 Yonge Street, Suite #300,
Toronto, ON  M4P 2C9
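For context on the two spinlock functions near the top of the profile, s_lock() and s_lock_sleep() make up PostgreSQL's spinlock acquire path: a hardware test-and-set loop that falls back to a short sleep when the lock stays busy. The following is a minimal, self-contained sketch of that general pattern, not the actual s_lock.c source; the helper names, the spin limit, and the sleep interval are illustrative assumptions only.

    #include <unistd.h>

    typedef volatile int slock_t;

    /* Hardware test-and-set via a GCC builtin: returns the previous lock
     * value, so a return of 0 means the lock was free and is now ours. */
    static int
    tas(slock_t *lock)
    {
        return __sync_lock_test_and_set(lock, 1);
    }

    /*
     * Spin until the lock is acquired.  Under light contention the loop
     * exits almost immediately; with hundreds of clients hammering the
     * same lock, most backends burn CPU in the busy-wait and then pile up
     * in the sleep, which is the kind of time the s_lock/s_lock_sleep
     * samples above represent.
     */
    static void
    spinlock_acquire(slock_t *lock)
    {
        int spins = 0;

        while (tas(lock))
        {
            if (++spins >= 100)     /* illustrative spin limit */
            {
                usleep(10000);      /* illustrative 10 ms backoff */
                spins = 0;
            }
        }
    }

    static void
    spinlock_release(slock_t *lock)
    {
        __sync_lock_release(lock);
    }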