[HACKERS] Two pass CheckDeadlock in contentent case - Mailing list pgsql-hackers
From | Sokolov Yura |
---|---|
Subject | [HACKERS] Two pass CheckDeadlock in contentent case |
Date | |
Msg-id | bac42052debbd66e8d5f786d8abe8db1@postgrespro.ru Whole thread Raw |
Responses |
Re: [HACKERS] Two pass CheckDeadlock in contentent case
Re: [HACKERS] Two pass CheckDeadlock in contentent case |
List | pgsql-hackers |
Good day, hackers. During hard workload sometimes process reaches deadlock timeout even if no real deadlock occurred. It is easily reproducible with pg_xact_advisory_lock on same value + some time consuming operation (update) and many clients. When backend reaches deadlock timeout, it calls CheckDeadlock, which locks all partitions of lock hash in exclusive mode to walk through graph and search for deadlock. If hundreds of backends reaches this timeout trying to acquire advisory lock on a same value, it leads to hard-stuck for many seconds, cause they all traverse same huge lock graph under exclusive lock. During this stuck there is no possibility to do any meaningful operations (no new transaction can begin). Attached patch makes CheckDeadlock to do two passes: - first pass uses LW_SHARED on partitions of lock hash. DeadLockCheck is called with flag "readonly", so it doesn't modify anything. - If there is possibility of "soft" or "hard" deadlock detected, ie if there is need to modify lock graph, then partitions relocked with LW_EXCLUSIVE, and DeadLockCheck is called again. It fixes hard-stuck, cause backends walk lock graph under shared lock, and found that there is no real deadlock. Test on 4 socket xeon machine: pgbench_zipf -s 10 -c 800 -j 100 -M prepared -T 450 -f ./ycsb_read_zipf.sql@50 -f ./ycsb_update_lock2_zipf.sql@50 -P 5 ycsb_read_zipf.sql: \set i random_zipfian(1, 100000 * :scale, 1.01) SELECT abalance FROM pgbench_accounts WHERE aid = :i; ycsb_update_lock2_zipf.sql: \set i random_zipfian(1, 100000 * :scale, 1.01) select lock_and_update( :i ); CREATE OR REPLACE FUNCTION public.lock_and_update(i integer) RETURNS void LANGUAGE sql AS $function$ SELECT pg_advisory_xact_lock( $1 ); UPDATE pgbench_accounts SET abalance = 1 WHERE aid = $1; $function$ Without attached patch: progress: 5.0 s, 45707.0 tps, lat 15.599 ms stddev 83.757 progress: 10.0 s, 51124.4 tps, lat 15.681 ms stddev 78.353 progress: 15.0 s, 52293.8 tps, lat 15.327 ms stddev 77.017 progress: 20.0 s, 51280.4 tps, lat 15.603 ms stddev 78.199 progress: 25.0 s, 47278.6 tps, lat 16.795 ms stddev 83.570 progress: 30.0 s, 41792.9 tps, lat 18.535 ms stddev 93.697 progress: 35.0 s, 12393.7 tps, lat 33.757 ms stddev 169.116 progress: 40.0 s, 0.0 tps, lat -nan ms stddev -nan progress: 45.0 s, 0.0 tps, lat -nan ms stddev -nan progress: 50.0 s, 1.2 tps, lat 2497.734 ms stddev 5393.166 progress: 55.0 s, 0.0 tps, lat -nan ms stddev -nan progress: 60.0 s, 27357.9 tps, lat 160.622 ms stddev 1866.625 progress: 65.0 s, 38770.8 tps, lat 20.829 ms stddev 104.453 progress: 70.0 s, 40553.2 tps, lat 19.809 ms stddev 99.741 (autovacuum led to trigger deadlock timeout, and therefore, to stuck) Patched: progress: 5.0 s, 56264.7 tps, lat 12.847 ms stddev 66.980 progress: 10.0 s, 55389.3 tps, lat 14.329 ms stddev 71.997 progress: 15.0 s, 50757.7 tps, lat 15.730 ms stddev 78.222 progress: 20.0 s, 50797.3 tps, lat 15.736 ms stddev 79.296 progress: 25.0 s, 48485.3 tps, lat 16.432 ms stddev 82.720 progress: 30.0 s, 45202.1 tps, lat 17.733 ms stddev 88.554 progress: 35.0 s, 40035.8 tps, lat 19.343 ms stddev 97.787 progress: 40.0 s, 14240.1 tps, lat 47.625 ms stddev 265.465 progress: 45.0 s, 30150.6 tps, lat 31.140 ms stddev 196.114 progress: 50.0 s, 38975.0 tps, lat 20.543 ms stddev 104.281 progress: 55.0 s, 40573.9 tps, lat 19.770 ms stddev 99.487 progress: 60.0 s, 38361.1 tps, lat 20.693 ms stddev 103.141 progress: 65.0 s, 39793.3 tps, lat 20.216 ms stddev 101.080 progress: 70.0 s, 38387.9 tps, lat 20.886 ms stddev 104.482 (autovacuum led to trigger deadlock timeout, but postgresql did not stuck) I believe, patch will positively affect other heavy workloads as well, although I have not collect benchmarks. `make check-world` passes with configured `--enable-tap-tests --enable-casserts` -- Sokolov Yura Postgres Professional: https://postgrespro.ru The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: