Home > mailing lists

suboverflowed subtransactions concurrency performance optimize - Mailing list pgsql-hackers

From	Pengchengliu
Subject	suboverflowed subtransactions concurrency performance optimize
Date	August 30, 2021 11:43:24
Msg-id	003201d79d7b$189141f0$49b3c5d0$@tju.edu.cn Whole thread Raw
Responses	Re: suboverflowed subtransactions concurrency performance optimize Re: suboverflowed subtransactions concurrency performance optimize
List	pgsql-hackers

Tree view

Hi hackers,
I wrote a patch to resolve the subtransactions concurrency performance
problems when suboverflowed.

When we use more than PGPROC_MAX_CACHED_SUBXIDS(64) subtransactions per
transaction concurrency, it will lead to subtransactions performance
problems. 
All backend will be stuck at acquiring lock SubtransSLRULock.

The reproduce steps in PG master branch:

1, init a cluster, append postgresql.conf as below: 

max_connections = '2500'
max_files_per_process = '2000'
max_locks_per_transaction = '64'
max_parallel_maintenance_workers = '8'
max_parallel_workers = '60'
max_parallel_workers_per_gather = '6'
max_prepared_transactions = '15000'
max_replication_slots = '10'
max_wal_senders = '64'
max_worker_processes = '250'
shared_buffers = 8GB

2, create table and insert some records as below:

CREATE UNLOGGED TABLE contend (
    id integer,
    val integer NOT NULL
)
WITH (fillfactor='50');
 
INSERT INTO contend (id, val)
SELECT i, 0
FROM generate_series(1, 10000) AS i;
 
VACUUM (ANALYZE) contend;

3, The script subtrans_128.sql in attachment. use pgbench with
subtrans_128.sql as below.
pgbench  -d postgres -p 33800 -n -r -f subtrans_128.sql  -c 500 -j 500 -T
3600


4, After for a while, we can get the stuck result. We can query
pg_stat_activity. All backends wait event is SubtransSLRULock.
   We can use pert top and try find the root cause. The result of perf top
as below:
66.20%  postgres            [.] pg_atomic_compare_exchange_u32_impl
  29.30%  postgres            [.] pg_atomic_fetch_sub_u32_impl
   1.46%  postgres            [.] pg_atomic_read_u32
   1.34%  postgres            [.] TransactionIdIsCurrentTransactionId
   0.75%  postgres            [.] SimpleLruReadPage_ReadOnly
   0.14%  postgres            [.] LWLockAttemptLock
   0.14%  postgres            [.] LWLockAcquire
   0.12%  postgres            [.] pg_atomic_compare_exchange_u32
   0.09%  postgres            [.] HeapTupleSatisfiesMVCC
   0.06%  postgres            [.] heapgetpage
   0.03%  postgres            [.] sentinel_ok
   0.03%  postgres            [.] XidInMVCCSnapshot
   0.03%  postgres            [.] slot_deform_heap_tuple
   0.03%  postgres            [.] ExecInterpExpr
   0.02%  postgres            [.] AllocSetCheck
   0.02%  postgres            [.] HeapTupleSatisfiesVisibility
   0.02%  postgres            [.] LWLockRelease
   0.02%  postgres            [.] TransactionIdPrecedes
   0.02%  postgres            [.] SubTransGetParent
   0.01%  postgres            [.] heapgettup_pagemode
   0.01%  postgres            [.] CheckForSerializableConflictOutNeeded


After view the subtrans codes, it is easy to find that the global LWLock
SubtransSLRULock is the bottleneck of subtrans concurrently.

When a bakcend session assign more than PGPROC_MAX_CACHED_SUBXIDS(64)
subtransactions, we will get a snapshot with suboverflowed.
A suboverflowed snapshot does not contain all data required to determine
visibility, so PostgreSQL will occasionally have to resort to pg_subtrans. 
These pages are cached in shared buffers, but you can see the overhead of
looking them up in the high rank of SimpleLruReadPage_ReadOnly in the perf
output.

To resolve this performance problem, we think about a solution which cache
SubtransSLRU to local cache. 
First we can query parent transaction id from SubtransSLRU, and copy the
SLRU page to local cache page.
After that if we need query parent transaction id again, we can query it
from local cache directly.
It will reduce the number of acquire and release LWLock SubtransSLRULock
observably.

From all snapshots, we can get the latest xmin. All transaction id which
precedes this xmin, it muse has been commited/abortd. 
Their parent/top transaction has been written subtrans SLRU. Then we can
cache the subtrans SLRU and copy it to local cache.

Use the same produce steps above, with our patch we cannot get the stuck
result.
Note that append our GUC parameter in postgresql.conf. This optimize is off
in default.
local_cache_subtrans_pages=128 

The patch is base on PG master branch
0d906b2c0b1f0d625ff63d9ace906556b1c66a68


Our project in  https://github.com/ADBSQL/AntDB, Welcome to follow us,
AntDB, AsiaInfo's PG-based distributed database product

Thanks
Pengcheng

Attachment

pgsql-hackers by date:

From: "Andrey V. Lepikhov"
Date: 30 August 2021, 11:36:38
Subject: Re: Defer selection of asynchronous subplans until the executor initialization stage

From: Greg Nancarrow
Date: 30 August 2021, 11:44:12
Subject: Re: Added schema level support for publication.

suboverflowed subtransactions concurrency performance optimize - Mailing list pgsql-hackers

Attachment

Previous

Next