Re: Speed up transaction completion faster after many relations are accessed in a transaction - Mailing list pgsql-hackers

From David Rowley
Subject Re: Speed up transaction completion faster after many relations are accessed in a transaction
Date
Msg-id CAApHDvo91hN+8ZzpOP9p--2KjSbcwBeoEBcDLb1fYObvehJJuw@mail.gmail.com
In response to Re: Speed up transaction completion faster after many relations are accessed in a transaction  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: Speed up transaction completion faster after many relations are accessed in a transaction
List pgsql-hackers
Thanks for having a look at this.

On Wed, 1 Feb 2023 at 03:07, Amit Langote <amitlangote09@gmail.com> wrote:
> Maybe you're planning to do it once this patch is post the PoC phase
> (isn't it?), but it would be helpful to have commentary on all the new
> dlist fields.

I've added comments on the new fields.  Maybe we can say the patch is "wip".

> This seems to be replacing what is a cache with an upper limit on the
> number of cached locks with something that has no limit on how many
> per-owner locks are remembered.  I wonder whether we'd be doing
> additional work in some cases with the new no-limit implementation
> that wasn't being done before (where the owner's locks array is
> overflowed) or maybe not much because the new implementation of
> ResourceOwner{Remember|Forget}Lock() is simple push/delete of a dlist
> node from the owner's dlist?

It's a good question, but I don't really have a good test to answer
it.  We'd need to benchmark taking fewer than 16 locks, and on trying
that I find there's just too much variability in the performance
between runs to determine whether there's any slowdown.

$ cat 10_locks.sql
select count(pg_advisory_lock(x)) from generate_series(1,10) x;

$ pgbench -f 10_locks.sql@1000 -M prepared -T 10 -n postgres | grep -E "(tps)"
tps = 47809.306088 (without initial connection time)
tps = 66859.789072 (without initial connection time)
tps = 37885.924616 (without initial connection time)

On trying with more locks, I see there are good wins from the patched version.

$ cat 100_locks.sql
select count(pg_advisory_lock(x)) from generate_series(1,100) x;

$ cat 1k_locks.sql
select count(pg_advisory_lock(x)) from generate_series(1,1000) x;

$ cat 10k_locks.sql
select count(pg_advisory_lock(x)) from generate_series(1,10000) x;

Test 1:  Take 100 locks but periodically take 10k locks to bloat the
local lock table.

master:
$ pgbench -f 100_locks.sql@1000 -f 10k_locks.sql@1 -M prepared -T 10
-n postgres | grep -E "(tps|script)"
transaction type: multiple scripts
tps = 2726.197037 (without initial connection time)
SQL script 1: 100_locks.sql
 - 27219 transactions (99.9% of total, tps = 2722.496227)
SQL script 2: 10k_locks.sql
 - 37 transactions (0.1% of total, tps = 3.700810)

patched:
$ pgbench -f 100_locks.sql@1000 -f 10k_locks.sql@1 -M prepared -T 10
-n postgres | grep -E "(tps|script)"
transaction type: multiple scripts
tps = 34047.297822 (without initial connection time)
SQL script 1: 100_locks.sql
 - 340039 transactions (99.9% of total, tps = 34012.688879)
SQL script 2: 10k_locks.sql
 - 346 transactions (0.1% of total, tps = 34.608943)

patched without slab context:
$ pgbench -f 100_locks.sql@1000 -f 10k_locks.sql@1 -M prepared -T 10
-n postgres | grep -E "(tps|script)"
transaction type: multiple scripts
tps = 34851.770846 (without initial connection time)
SQL script 1: 100_locks.sql
 - 348097 transactions (99.9% of total, tps = 34818.662324)
SQL script 2: 10k_locks.sql
 - 331 transactions (0.1% of total, tps = 33.108522)

Test 2: Always take just 100 locks and don't bloat the local lock table.

master:
$ pgbench -f 100_locks.sql@1000 -M prepared -T 10 -n postgres | grep
-E "(tps|script)"
tps = 32682.491548 (without initial connection time)

patched:
$ pgbench -f 100_locks.sql@1000 -M prepared -T 10 -n postgres | grep
-E "(tps|script)"
tps = 35637.241815 (without initial connection time)

patched without slab context:
$ pgbench -f 100_locks.sql@1000 -M prepared -T 10 -n postgres | grep
-E "(tps|script)"
tps = 36192.185181 (without initial connection time)

The attached 0003 patch is an experiment to see if using a slab memory
context has any advantages for storing the LOCALLOCKOWNER structs.
There seems to be a small performance hit from doing this.

> The following comment is now obsolete:
>
> /*
>  * LockReassignCurrentOwner
>  *      Reassign all locks belonging to CurrentResourceOwner to belong
>  *      to its parent resource owner.
>  *
>  * If the caller knows what those locks are, it can pass them as an array.
>  * That speeds up the call significantly, when a lot of locks are held
>  * (e.g pg_dump with a large schema).  Otherwise, pass NULL for locallocks,
>  * and we'll traverse through our hash table to find them.
>  */

I've removed the obsolete part.

I've attached another set of patches. I do need to spend longer
looking at this. I'm mainly attaching these as CI seems to be
highlighting a problem that I'm unable to recreate locally and I
wanted to see if the attached fixes it.

David
