Re: Patch: fix lock contention for HASHHDR.mutex - Mailing list pgsql-hackers
From: Aleksander Alekseev
Subject: Re: Patch: fix lock contention for HASHHDR.mutex
Date:
Msg-id: 20151222183953.771cb58b@fujitsu
In response to: Re: Patch: fix lock contention for HASHHDR.mutex (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Patch: fix lock contention for HASHHDR.mutex
List: pgsql-hackers
> > Actually, I'd like to improve all partitioned hashes instead of
> > improve only one case.
>
> Yeah.  I'm not sure that should be an LWLock rather than a spinlock,
> but we can benchmark it both ways.

I would like to share some preliminary results. I tested four implementations:

- no locks and no element stealing from other partitions;
- a single LWLock per partitioned table;
- a single spinlock per partitioned table;
- NUM_LOCK_PARTITIONS spinlocks per partitioned table.

Interestingly, the "Shared Buffer Lookup Table" (see buf_table.c) has 128 partitions. The constant NUM_BUFFER_PARTITIONS was increased from 16 to 128 in commit 3acc10c9:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3acc10c997f916f6a741d0b4876126b7b08e3892;hp=952872698d9443fdf9b808a1376017f00c91065a

Obviously, after splitting a freelist into NUM_LOCK_PARTITIONS partitions (and assuming that all necessary locking/unlocking is done on the calling side), a table can't have more than NUM_LOCK_PARTITIONS partitions, because that would cause race conditions. For this reason I had to define NUM_BUFFER_PARTITIONS as NUM_LOCK_PARTITIONS and compare the behaviour of PostgreSQL for different values of NUM_LOCK_PARTITIONS.
So here are the results:

Core i7, pgbench -j 8 -c 8 -T 30 pgbench
(3 tests, TPS excluding connections establishing)

NUM_LOCK_  | master   | no locks | lwlock   | spinlock | spinlock
PARTITIONS | (99ccb2) |          |          |          | array
-----------|----------|----------|----------|----------|----------
           |  295.4   |  297.4   |  299.4   |  285.6   |  302.7
 (1 << 4)  |  286.1   |  300.5   |  283.4   |  300.9   |  300.4
           |  300.0   |  300.0   |  302.1   |  300.7   |  300.3
-----------|----------|----------|----------|----------|----------
           |          |  296.7   |  299.9   |  298.8   |  298.3
 (1 << 5)  |   ----   |  301.9   |  302.2   |  305.7   |  306.3
           |          |  287.7   |  301.0   |  303.0   |  304.5
-----------|----------|----------|----------|----------|----------
           |          |  296.4   |  300.5   |  302.9   |  304.6
 (1 << 6)  |   ----   |  301.7   |  305.6   |  306.4   |  302.3
           |          |  299.6   |  304.5   |  306.6   |  300.4
-----------|----------|----------|----------|----------|----------
           |          |  295.9   |  298.7   |  295.3   |  305.0
 (1 << 7)  |   ----   |  299.5   |  300.5   |  299.0   |  310.2
           |          |  287.8   |  285.9   |  300.2   |  302.2

Core i7, pgbench -j 8 -c 8 -f big_table.sql -T 30 my_database
(3 tests, TPS excluding connections establishing)

NUM_LOCK_  | master   | no locks | lwlock   | spinlock | spinlock
PARTITIONS | (99ccb2) |          |          |          | array
-----------|----------|----------|----------|----------|----------
           |  505.1   |  521.3   |  511.1   |  524.4   |  501.6
 (1 << 4)  |  452.4   |  467.4   |  509.2   |  472.3   |  453.7
           |  435.2   |  462.4   |  445.8   |  467.9   |  467.0
-----------|----------|----------|----------|----------|----------
           |          |  514.8   |  476.3   |  507.9   |  510.6
 (1 << 5)  |   ----   |  457.5   |  491.2   |  464.6   |  431.7
           |          |  442.2   |  457.0   |  495.5   |  448.2
-----------|----------|----------|----------|----------|----------
           |          |  516.4   |  502.5   |  468.0   |  521.3
 (1 << 6)  |   ----   |  463.6   |  438.7   |  488.8   |  455.4
           |          |  434.2   |  468.1   |  484.7   |  433.5
-----------|----------|----------|----------|----------|----------
           |          |  513.6   |  459.4   |  519.6   |  510.3
 (1 << 7)  |   ----   |  470.1   |  454.6   |  445.5   |  415.9
           |          |  459.4   |  489.7   |  457.1   |  452.8

60-core server, pgbench -j 64 -c 64 -T 30 pgbench
(3 tests, TPS excluding connections establishing)

NUM_LOCK_  | master   | no locks | lwlock   | spinlock | spinlock
PARTITIONS | (99ccb2) |          |          |          | array
-----------|----------|----------|----------|----------|----------
           |  3156.2  |  3157.9  |  3542.0  |  3444.3  |  3472.4
 (1 << 4)  |  3268.5  |  3444.7  |  3485.7  |  3486.0  |  3500.5
           |  3251.2  |  3482.3  |  3398.7  |  3587.1  |  3557.7
-----------|----------|----------|----------|----------|----------
           |          |  3352.7  |  3556.0  |  3543.3  |  3526.8
 (1 << 5)  |   ----   |  3465.0  |  3475.2  |  3486.9  |  3528.4
           |          |  3410.0  |  3482.0  |  3493.7  |  3444.9
-----------|----------|----------|----------|----------|----------
           |          |  3437.8  |  3413.1  |  3445.8  |  3481.6
 (1 << 6)  |   ----   |  3470.1  |  3478.4  |  3538.5  |  3579.9
           |          |  3450.8  |  3431.1  |  3509.0  |  3512.5
-----------|----------|----------|----------|----------|----------
           |          |  3425.4  |  3534.6  |  3414.7  |  3517.1
 (1 << 7)  |   ----   |  3436.5  |  3430.0  |  3428.0  |  3536.4
           |          |  3455.6  |  3479.7  |  3573.4  |  3543.0

60-core server, pgbench -j 64 -c 64 -f big_table.sql -T 30 my_database
(3 tests, TPS excluding connections establishing)

NUM_LOCK_  | master   | no locks | lwlock   | spinlock | spinlock
PARTITIONS | (99ccb2) |          |          |          | array
-----------|----------|----------|----------|----------|----------
           |  661.1   |  4639.6  |  1435.2  |  445.9   |  1589.6
 (1 << 4)  |  642.9   |  4566.7  |  1410.3  |  457.1   |  1601.7
           |  643.9   |  4621.8  |  1404.8  |  489.0   |  1592.6
-----------|----------|----------|----------|----------|----------
           |          |  4721.9  |  1543.1  |  499.1   |  1596.9
 (1 << 5)  |   ----   |  4506.8  |  1513.0  |  528.3   |  1594.7
           |          |  4744.7  |  1540.3  |  524.0   |  1593.0
-----------|----------|----------|----------|----------|----------
           |          |  4649.1  |  1564.5  |  475.9   |  1580.1
 (1 << 6)  |   ----   |  4671.0  |  1560.5  |  485.6   |  1589.1
           |          |  4751.0  |  1557.4  |  505.1   |  1580.3
-----------|----------|----------|----------|----------|----------
           |          |  4657.7  |  1551.8  |  534.7   |  1585.1
 (1 << 7)  |   ----   |  4616.8  |  1546.8  |  495.8   |  1623.4
           |          |  4779.2  |  1538.5  |  537.4   |  1588.5

All four implementations (W.I.P. quality --- dirty code, no comments, etc.) are attached to this message. The schema of my_database and the big_table.sql file are attached to the first message of this thread.

The large spread of TPS on the Core i7 is due to the fact that it's actually my laptop, with other applications running besides PostgreSQL. Still, we see that all solutions are equally good on this CPU and there is no performance degradation.

Now regarding the 60-core server:

- one spinlock per hash table doesn't scale (I personally was expecting this);
- LWLocks and an array of spinlocks do scale on NUMA up to a certain point;
- the best results are shown by "no locks".

I believe the "no locks" implementation is the best one, since it is at least 3 times faster on NUMA than any other implementation. It is also simpler and doesn't have the stealing-from-other-freelists logic, which executes rarely and is therefore a likely source of bugs. Regarding the ~16 elements per freelist which in some corner cases could, but wouldn't, be used --- as I mentioned before, I believe it's not such a big problem, and it's a small price to pay for 3 times more TPS.

Regarding NUM_LOCK_PARTITIONS (and NUM_BUFFER_PARTITIONS) I have some doubts. For sure Robert had a good reason for committing 3acc10c9. Unfortunately I'm not familiar with the story behind this commit. What do you think?