From: Sergey Shinderuk
Subject: Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)
Msg-id: e960e889-f85c-4be8-819c-acd6ca299ce2@postgrespro.ru
In response to: Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 16.06.2025 17:41, Andres Freund wrote:
> TBH, I don't see a point in continuing with this thread without something that
> others can test.  I rather doubt that the right fix here is to just change the
> lock model over, but without a repro I can't evaluate that.


Hello,

I think I can reproduce the issue with pgbench on a multi-core server. I 
start a regular select-only test with 64 clients, and while it's running, 
I start a plpgsql loop that creates and drops temporary tables from a 
single psql session. I observe a ~25% drop in the tps reported by pgbench 
until I cancel the query in psql.


$ pgbench -n -S -c64 -j64 -T300 -P1

progress: 10.0 s, 1249724.7 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 11.0 s, 1248289.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 12.0 s, 1246001.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 13.0 s, 1247832.5 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 14.0 s, 1248205.8 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 15.0 s, 1247737.3 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 16.0 s, 1219444.3 tps, lat 0.052 ms stddev 0.039, 0 failed
progress: 17.0 s, 893943.4 tps, lat 0.071 ms stddev 0.159, 0 failed
progress: 18.0 s, 927861.3 tps, lat 0.069 ms stddev 0.150, 0 failed
progress: 19.0 s, 886317.1 tps, lat 0.072 ms stddev 0.163, 0 failed
progress: 20.0 s, 877200.1 tps, lat 0.073 ms stddev 0.164, 0 failed
progress: 21.0 s, 875424.4 tps, lat 0.073 ms stddev 0.163, 0 failed
progress: 22.0 s, 877693.0 tps, lat 0.073 ms stddev 0.165, 0 failed
progress: 23.0 s, 897202.8 tps, lat 0.071 ms stddev 0.158, 0 failed
progress: 24.0 s, 917853.4 tps, lat 0.070 ms stddev 0.153, 0 failed
progress: 25.0 s, 907865.1 tps, lat 0.070 ms stddev 0.154, 0 failed

Here I started the following loop in psql at around the 17-second mark, 
and tps dropped by ~25%:

do $$
begin
   for i in 1..1000000 loop
     create temp table tt1 (a bigserial primary key, b text);
     drop table tt1;
     commit;
   end loop;
end;
$$;

Now, if I simply remove the spinlock in SIGetDataEntries, I see a drop 
of just ~6% under concurrent DDL. I think this strongly suggests that 
the spinlock is the bottleneck.

Before that, I tried removing the `if (!hasMessages) return` optimization 
in SIGetDataEntries to stress the spinlock, and observed a ~35% drop in 
select-only tps with an empty sinval queue (no DDL running in the 
background). Then I also removed the spinlock in SIGetDataEntries, and the 
loss was just ~4%, which may be noise. I think this also suggests that the 
spinlock could be the bottleneck.
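
The fast path I removed for that stress test is the unlocked check at the 
top of SIGetDataEntries, which looks roughly like this:

    /* quick, unlocked test: anything pending for this backend? */
    if (!stateP->hasMessages)
        return 0;

With that check gone, every SIGetDataEntries call falls through to the 
maxMsgNum read sketched above even when the sinval queue is empty, so the 
select-only workload hits msgnumLock directly without needing any 
concurrent DDL.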

I'm running this on a 2-socket AMD EPYC 9654 96-core server with postgres 
and pgbench bound to distinct CPUs. PGDATA is placed on tmpfs, postgres 
runs with the default settings, the pgbench tables are of scale 1, and 
pgbench connects via loopback (127.0.0.1).

Does this sound convincing?

Best regards,

-- 
Sergey Shinderuk        https://postgrespro.com/



