Re: [HACKERS] Proposal for CSN based snapshots - Mailing list pgsql-hackers

From Alexander Kuzmenkov
Subject Re: [HACKERS] Proposal for CSN based snapshots
Msg-id 8a855f33-2581-66bf-85f7-0b99239edbda@postgrespro.ru
In response to Re: [HACKERS] Proposal for CSN based snapshots  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Proposal for CSN based snapshots
List pgsql-hackers
Hi hackers,

Here is a new version of the patch with some improvements, rebased to 
117469006b.

Performance on pgbench tpcb with subtransactions is now slightly better 
than master. See the picture 'savepoints2'. This was achieved by 
removing unnecessary exclusive locking on CSNLogControlLock in 
SubTransSetParent. After that change, both versions are mostly waiting 
on XidGenLock in GetNewTransactionId.

Performance on default pgbench tpcb is also improved. At scale 500, csn 
is up to 30% faster than master; see the picture 'tpcb500'. These 
improvements are due to slight optimizations of GetSnapshotData and 
refreshing RecentGlobalXmin less often. At scale 1500, csn is slightly 
faster at up to 200 clients, but then degrades steadily: see the picture 
'tpcb1500'. Nevertheless, CSN-related code paths do not show up in perf 
profiles or LWLock wait statistics [1]. I think what we are seeing here 
is, again, that once some bottlenecks are removed, the rapid degradation 
of LWLocks under contention leads to a net drop in performance. With 
this in mind, I tried running the same benchmarks with the patch from 
Yura Sokolov [2], which should improve LWLock performance on NUMA 
machines. Indeed, with this patch csn outperforms master at every client 
count measured, as you can see in the picture 'tpcb1500'. This LWLock 
change affects csn a lot more than master, which also suggests that we 
are observing a superlinear degradation of LWLocks under increasing 
contention.

After this I plan to improve the comments, since many of them have 
become out of date, and work on logical replication.

[1] To collect LWLock wait statistics, I sample pg_stat_activity, and 
also use a bcc script by Andres Freund: 

https://www.postgresql.org/message-id/flat/20170622210845.d2hsbqv6rxu2tiye%40alap3.anarazel.de#20170622210845.d2hsbqv6rxu2tiye@alap3.anarazel.de

[2] 

https://www.postgresql.org/message-id/flat/2968c0be065baab8865c4c95de3f435c@postgrespro.ru#2968c0be065baab8865c4c95de3f435c@postgrespro.ru
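
For readers who want to reproduce the pg_stat_activity sampling mentioned 
in [1], here is a minimal sketch of one way it could be done. It is not 
the script used for the measurements above. The query text, the function 
names and the sample data are illustrative assumptions; the 'LWLock' 
wait_event_type value is the name used by PostgreSQL 10 and later. 
Executing the query against a live server (for example in a loop, once 
per second, through any client library) is left out so that the 
aggregation part stays self-contained.

```python
from collections import Counter

# Query to run repeatedly while the benchmark is active (assumed form).
# wait_event_type and wait_event are pg_stat_activity columns since 9.6;
# the 'LWLock' type name applies to PostgreSQL 10 and later.
SAMPLE_QUERY = """
SELECT wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
"""


def tally_waits(samples):
    """Count how often each LWLock wait event appears across all samples.

    `samples` is an iterable of samples; each sample is the list of
    wait_event strings returned by one execution of SAMPLE_QUERY.
    """
    counts = Counter()
    for sample in samples:
        counts.update(sample)
    return counts


def format_report(counts):
    """Render the counts as text lines, most frequent wait event first."""
    total = sum(counts.values())
    lines = []
    for event, n in counts.most_common():
        share = 100.0 * n / total
        lines.append(f"{event:<24} {n:>6} {share:5.1f}%")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up samples for illustration only, not measured data.
    demo = [
        ["XidGenLock", "XidGenLock", "ProcArrayLock"],
        ["XidGenLock"],
        [],
    ]
    print(format_report(tally_waits(demo)))
```

Each sample only records which backends happened to be waiting at that 
instant, so the resulting shares are a statistical estimate of where 
time is spent waiting, not an exact accounting.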

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

