suboverflowed subtransactions concurrency performance optimize - Mailing list pgsql-hackers
From | Pengchengliu |
---|---|
Subject | suboverflowed subtransactions concurrency performance optimize |
Date | |
Msg-id | 003201d79d7b$189141f0$49b3c5d0$@tju.edu.cn Whole thread Raw |
Responses |
Re: suboverflowed subtransactions concurrency performance optimize
Re: suboverflowed subtransactions concurrency performance optimize |
List | pgsql-hackers |
Hi hackers, I wrote a patch to resolve the subtransactions concurrency performance problems when suboverflowed. When we use more than PGPROC_MAX_CACHED_SUBXIDS(64) subtransactions per transaction concurrency, it will lead to subtransactions performance problems. All backend will be stuck at acquiring lock SubtransSLRULock. The reproduce steps in PG master branch: 1, init a cluster, append postgresql.conf as below: max_connections = '2500' max_files_per_process = '2000' max_locks_per_transaction = '64' max_parallel_maintenance_workers = '8' max_parallel_workers = '60' max_parallel_workers_per_gather = '6' max_prepared_transactions = '15000' max_replication_slots = '10' max_wal_senders = '64' max_worker_processes = '250' shared_buffers = 8GB 2, create table and insert some records as below: CREATE UNLOGGED TABLE contend ( id integer, val integer NOT NULL ) WITH (fillfactor='50'); INSERT INTO contend (id, val) SELECT i, 0 FROM generate_series(1, 10000) AS i; VACUUM (ANALYZE) contend; 3, The script subtrans_128.sql in attachment. use pgbench with subtrans_128.sql as below. pgbench -d postgres -p 33800 -n -r -f subtrans_128.sql -c 500 -j 500 -T 3600 4, After for a while, we can get the stuck result. We can query pg_stat_activity. All backends wait event is SubtransSLRULock. We can use pert top and try find the root cause. The result of perf top as below: 66.20% postgres [.] pg_atomic_compare_exchange_u32_impl 29.30% postgres [.] pg_atomic_fetch_sub_u32_impl 1.46% postgres [.] pg_atomic_read_u32 1.34% postgres [.] TransactionIdIsCurrentTransactionId 0.75% postgres [.] SimpleLruReadPage_ReadOnly 0.14% postgres [.] LWLockAttemptLock 0.14% postgres [.] LWLockAcquire 0.12% postgres [.] pg_atomic_compare_exchange_u32 0.09% postgres [.] HeapTupleSatisfiesMVCC 0.06% postgres [.] heapgetpage 0.03% postgres [.] sentinel_ok 0.03% postgres [.] XidInMVCCSnapshot 0.03% postgres [.] slot_deform_heap_tuple 0.03% postgres [.] ExecInterpExpr 0.02% postgres [.] AllocSetCheck 0.02% postgres [.] HeapTupleSatisfiesVisibility 0.02% postgres [.] LWLockRelease 0.02% postgres [.] TransactionIdPrecedes 0.02% postgres [.] SubTransGetParent 0.01% postgres [.] heapgettup_pagemode 0.01% postgres [.] CheckForSerializableConflictOutNeeded After view the subtrans codes, it is easy to find that the global LWLock SubtransSLRULock is the bottleneck of subtrans concurrently. When a bakcend session assign more than PGPROC_MAX_CACHED_SUBXIDS(64) subtransactions, we will get a snapshot with suboverflowed. A suboverflowed snapshot does not contain all data required to determine visibility, so PostgreSQL will occasionally have to resort to pg_subtrans. These pages are cached in shared buffers, but you can see the overhead of looking them up in the high rank of SimpleLruReadPage_ReadOnly in the perf output. To resolve this performance problem, we think about a solution which cache SubtransSLRU to local cache. First we can query parent transaction id from SubtransSLRU, and copy the SLRU page to local cache page. After that if we need query parent transaction id again, we can query it from local cache directly. It will reduce the number of acquire and release LWLock SubtransSLRULock observably. From all snapshots, we can get the latest xmin. All transaction id which precedes this xmin, it muse has been commited/abortd. Their parent/top transaction has been written subtrans SLRU. Then we can cache the subtrans SLRU and copy it to local cache. Use the same produce steps above, with our patch we cannot get the stuck result. Note that append our GUC parameter in postgresql.conf. This optimize is off in default. local_cache_subtrans_pages=128 The patch is base on PG master branch 0d906b2c0b1f0d625ff63d9ace906556b1c66a68 Our project in https://github.com/ADBSQL/AntDB, Welcome to follow us, AntDB, AsiaInfo's PG-based distributed database product Thanks Pengcheng
Attachment
pgsql-hackers by date: