On Fri, Sep 11, 2020 at 4:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/09/11 16:23, bttanakahbk wrote:
> >
> > pgbench:
> > initialization: pgbench -i -s 100
> > benchmarking : pgbench -j16 -c128 -T180 -r -n -f <script> -h <address> -U <user> -p <port> -d <db>
> > # VACUUMed and pg_prewarmed manually before running the benchmark
> > query:SELECT 1;
> >> pgss_lwlock_v2.patch  track_planning       TPS  decline rate  s_lock  CPU usage
> >> -                     OFF             810509.4  standard       0.17%  98.8% (sys 24.9%, user 73.9%)
> >> -                     ON              732823.1  -9.6%          1.94%  95.1% (sys 22.8%, user 72.3%)
> >> +                     OFF             371035.0  -49.4%         -      65.2% (sys 20.6%, user 44.6%)
> >> +                     ON              193965.2  -47.7%         -      41.8% (sys 12.1%, user 29.7%)
> > # "-" means that s_lock was not reported by perf.
>
> Ok, so my proposed patch degraded the performance in this case :(
> This means that replacing the spinlock with an lwlock in pgss is not
> a proper approach to the lock contention issue in pgss...
>
> Upthread, I proposed splitting the spinlock for each pgss entry into
> two to reduce the lock contention: one for the planner stats, and the
> other for the executor stats. Is it worth working on this approach as
> an alternative? Or does anyone have a better idea?
For now only calls and [min|max|mean|total]_time are split between
planning and execution, so we'd have to do the same for the rest of
the counters to be able to have two different spinlocks. That would
increase the size of the struct quite a lot, and we'd also have to
change the SRF output, which is already quite wide.