Re: MultiXact\SLRU buffers configuration - Mailing list pgsql-hackers
From: Gilles Darold
Subject: Re: MultiXact\SLRU buffers configuration
Date:
Msg-id: 3319917a-679e-b07d-b194-473552b72082@darold.net
In response to: Re: MultiXact\SLRU buffers configuration (Gilles Darold <gilles@darold.net>)
Responses: Re: MultiXact\SLRU buffers configuration
List: pgsql-hackers
On 08/12/2020 at 18:52, Andrey Borodin wrote:
> Hi Gilles! Many thanks for your message!
>
>> On 8 Dec 2020, at 21:05, Gilles Darold <gilles@darold.net> wrote:
>> I know that this report is not really helpful
> Quite contrary - this benchmarks prove that controllable reproduction exists.
> I've rebased patches for PG11. Can you please benchmark them (without extending SLRU)?
>
> Best regards, Andrey Borodin.

Hi,
Running tests yesterday with the patches applied produced lots of failures, with the following error on INSERT and UPDATE statements:
ERROR: lock MultiXactOffsetControlLock is not held
After a patch review this morning I think I have found what's going wrong. In patch v6-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch, I think there is a missing reinitialisation of the lockmode variable to LW_NONE inside the retry loop, after the call to LWLockRelease() in src/backend/access/transam/multixact.c:1392, GetMultiXactIdMembers(). I've attached a new version of the patch for master that includes the fix; I'm using it with PG11 and everything works very well now.
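To make the failure mode easier to see outside the server code, here is a minimal standalone sketch of the retry pattern (hypothetical names and simplified lock tracking, not the actual multixact.c code): if the local variable recording which lock is held is not reset once the lock has been released before retrying, the next iteration believes a lock is still held, skips the acquire, and the eventual release fails with the "lock ... is not held" error.

    /*
     * Standalone sketch of the retry pattern (hypothetical names, not the
     * real GetMultiXactIdMembers() code).  The local "lockmode" tracker
     * must be reset after the release, otherwise the next iteration thinks
     * the lock is still held.
     */
    #include <stdio.h>
    #include <stdbool.h>

    typedef enum { MODE_NONE, MODE_SHARED } TrackedLockMode;

    static bool lock_held = false;              /* stand-in for the LWLock */

    static void sketch_acquire(void) { lock_held = true; }

    static void sketch_release(void)
    {
        if (!lock_held)
        {
            fprintf(stderr, "ERROR: lock is not held\n");  /* the error we hit */
            return;
        }
        lock_held = false;
    }

    int main(void)
    {
        TrackedLockMode lockmode = MODE_NONE;

        for (int attempt = 0; attempt < 2; attempt++)   /* simulated retry loop */
        {
            if (lockmode == MODE_NONE)
            {
                sketch_acquire();
                lockmode = MODE_SHARED;
            }

            /* ... look up the SLRU page, decide we must retry ... */

            sketch_release();
            lockmode = MODE_NONE;   /* the missing reinitialisation: drop this
                                     * line and the second iteration skips the
                                     * acquire, so the release above fails */
        }

        printf("retry loop completed without errors\n");
        return 0;
    }

Removing the reset line reproduces the error on the second pass of the loop, which mirrors the failure seen during the benchmark; with the reset in place both iterations behave the same way.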
I'm running more tests to see the performance impact of playing with multixact_offsets_slru_buffers, multixact_members_slru_buffers and multixact_local_cache_entries. I will report the results later today.
Sorry for the delay, I have done some further tests to try to reach the limit without bottlenecks on multixact or shared buffers. The tests were done on a Microsoft Azure machine with 2TB of RAM and 4 sockets of Intel Xeon Platinum 8280M (128 CPUs). PG configuration:
max_connections = 4096
shared_buffers = 64GB
max_prepared_transactions = 2048
work_mem = 256MB
maintenance_work_mem = 2GB
wal_level = minimal
synchronous_commit = off
commit_delay = 1000
commit_siblings = 10
checkpoint_timeout = 1h
max_wal_size = 32GB
checkpoint_completion_target = 0.9
I have tested several values for the different buffer variables, starting from:
multixact_offsets_slru_buffers = 64
multixact_members_slru_buffers = 128
multixact_local_cache_entries = 256
up to the values that gave the best performance in this test, i.e. that avoided waits on MultiXactOffsetControlLock or MultiXactMemberControlLock:
multixact_offsets_slru_buffers = 128
multixact_members_slru_buffers = 512
multixact_local_cache_entries = 1024
Also, shared_buffers was increased up to 256GB to avoid buffer_mapping contention.
Our latest and best test run reports the following wait events:
event_type | event | sum
------------+----------------------------+-----------
Client | ClientRead | 321690211
LWLock | buffer_content | 2970016
IPC | ProcArrayGroupUpdate | 2317388
LWLock | ProcArrayLock | 1445828
LWLock | WALWriteLock | 1187606
LWLock | SubtransControlLock | 972889
Lock | transactionid | 840560
Lock | relation | 587600
Activity | LogicalLauncherMain | 529599
Activity | AutoVacuumMain | 528097
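For reference, here is a rough sketch of how such per-event totals can be gathered; this is an assumption about the methodology, not necessarily how the numbers above were produced. The idea is to sample pg_stat_activity at a regular interval and sum the samples per wait event (an extension such as pg_wait_sampling would give equivalent output):

    -- Hypothetical sampling approach (assumed, not necessarily the one used here).
    CREATE TABLE IF NOT EXISTS wait_event_samples (
        sample_time timestamptz DEFAULT now(),
        event_type  text,
        event       text
    );

    -- Run periodically, e.g. from psql with \watch 1:
    INSERT INTO wait_event_samples (event_type, event)
    SELECT wait_event_type, wait_event
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL;

    -- Aggregate into the same shape as the table above:
    SELECT event_type, event, count(*) AS sum
    FROM wait_event_samples
    GROUP BY event_type, event
    ORDER BY sum DESC;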
At this stage I don't think we can get better performance by tuning these buffers, at least with PG11.
Regarding the performance gain from the patch for the shared lock in GetMultiXactIdMembers, unfortunately I cannot see a difference with or without it; this could be related to our particular benchmark. But the patch on multixact buffers should clearly be committed, as it is really helpful to be able to tune PostgreSQL when multixact bottlenecks are found.
Best regards,
--
Gilles Darold
LzLabs GmbH
https://www.lzlabs.com/