Re: Speed up transaction completion faster after many relations are accessed in a transaction - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Speed up transaction completion faster after many relations are accessed in a transaction
Date
Msg-id 7a342e6f-ddd7-4d14-a953-731255493164@iki.fi
Whole thread Raw
In response to Re: Speed up transaction completion faster after many relations are accessed in a transaction  (vignesh C <vignesh21@gmail.com>)
Responses Re: Speed up transaction completion faster after many relations are accessed in a transaction
List pgsql-hackers
Returning to an old thread that I just remembered:

On 09/01/2024 08:24, vignesh C wrote:
> On Thu, 9 Nov 2023 at 21:48, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> On 18/09/2023 07:08, David Rowley wrote:
>>> On Fri, 15 Sept 2023 at 22:37, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> A few quick comments:
>>
>> - It would be nice to add a test for the issue that you fixed in patch
>> v7, i.e. if you prepare a transaction while holding session-level locks.
>>
>> - GrantLockLocal() now calls MemoryContextAlloc(), which can fail if you
>> are out of memory. Is that handled gracefully or is the lock leaked?

The above are still valid comments..

> CFBot shows one of the test has aborted at [1] with:
> [20:54:28.535] Core was generated by `postgres: subscriber: logical
> replication apply worker for subscription 16397 '.
> [20:54:28.535] Program terminated with signal SIGABRT, Aborted.
> [20:54:28.535] #0  __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:50
> [20:54:28.535] Download failed: Invalid argument.  Continuing without
> source file ./signal/../sysdeps/unix/sysv/linux/raise.c.
> [20:54:28.627]
> [20:54:28.627] Thread 1 (Thread 0x7f0ea02d1a40 (LWP 50984)):
> [20:54:28.627] #0  __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:50
> ...
> ...
> [20:54:28.627] #2  0x00005618e989d62f in ExceptionalCondition
> (conditionName=conditionName@entry=0x5618e9b40f70
> "dlist_is_empty(&(MyProc->myProcLocks[i]))",
> fileName=fileName@entry=0x5618e9b40ec0
> "../src/backend/storage/lmgr/proc.c", lineNumber=lineNumber@entry=856)
> at ../src/backend/utils/error/assert.c:66
> [20:54:28.627] No locals.
> [20:54:28.627] #3  0x00005618e95e6847 in ProcKill (code=<optimized
> out>, arg=<optimized out>) at ../src/backend/storage/lmgr/proc.c:856
> [20:54:28.627]         i = <optimized out>
> [20:54:28.627]         proc = <optimized out>
> [20:54:28.627]         procgloballist = <optimized out>
> [20:54:28.627]         __func__ = "ProcKill"
> [20:54:28.627] #4  0x00005618e959ebcc in shmem_exit
> (code=code@entry=1) at ../src/backend/storage/ipc/ipc.c:276
> [20:54:28.627]         __func__ = "shmem_exit"
> [20:54:28.627] #5  0x00005618e959ecd0 in proc_exit_prepare
> (code=code@entry=1) at ../src/backend/storage/ipc/ipc.c:198
> [20:54:28.627]         __func__ = "proc_exit_prepare"
> [20:54:28.627] #6  0x00005618e959ee8e in proc_exit (code=code@entry=1)
> at ../src/backend/storage/ipc/ipc.c:111
> [20:54:28.627]         __func__ = "proc_exit"
> [20:54:28.627] #7  0x00005618e94aa54d in BackgroundWorkerMain () at
> ../src/backend/postmaster/bgworker.c:805
> [20:54:28.627]         local_sigjmp_buf = {{__jmpbuf =
> {94665009627112, -3865857745677845768, 0, 0, 140732736634980, 1,
> 3865354362587970296, 7379258256398875384}, __mask_was_saved = 1,
> __saved_mask = {__val = {18446744066192964099, 94665025527920,
> 94665025527920, 94665025527920, 0, 94665025528120, 8192, 1,
> 94664997686410, 94665009627040, 94664997622076, 94665025527920, 1, 0,
> 0, 140732736634980}}}}
> [20:54:28.627]         worker = 0x5618eb37c570
> [20:54:28.627]         entrypt = <optimized out>
> [20:54:28.627]         __func__ = "BackgroundWorkerMain"
> [20:54:28.627] #8  0x00005618e94b495c in do_start_bgworker
> (rw=rw@entry=0x5618eb3b73c8) at
> ../src/backend/postmaster/postmaster.c:5697
> [20:54:28.627]         worker_pid = <optimized out>
> [20:54:28.627]         __func__ = "do_start_bgworker"
> [20:54:28.627] #9  0x00005618e94b4c32 in maybe_start_bgworkers () at
> ../src/backend/postmaster/postmaster.c:5921
> [20:54:28.627]         rw = 0x5618eb3b73c8
> [20:54:28.627]         num_launched = 0
> [20:54:28.627]         now = 0
> [20:54:28.627]         iter = {cur = 0x5618eb3b79a8, next =
> 0x5618eb382a20, prev = 0x5618ea44a980 <BackgroundWorkerList>}
> [20:54:28.627] #10 0x00005618e94b574a in process_pm_pmsignal () at
> ../src/backend/postmaster/postmaster.c:5073
> [20:54:28.627]         __func__ = "process_pm_pmsignal"
> [20:54:28.627] #11 0x00005618e94b5f4a in ServerLoop () at
> ../src/backend/postmaster/postmaster.c:1760
> 
> [1] - https://cirrus-ci.com/task/5118173163290624?logs=cores#L51

I was able to reproduce and debug that. The patch made the assumption 
that when a process is about to exit, in ShutdownPostgres(), it can only 
have session-level locks left, after we have done 
AbortOutOfAnyTransaction(). That assumption did not hold, because if 
proc_exit() is called while we're waiting on a lock, the lock is not yet 
registered with the resource owner (or if it's a session lock, it's not 
yet added to the 'session_locks' list). Therefore, when 
ShutdownPostgres() called AbortOutOfAnyTransaction() and 
LockReleaseSession(USER_LOCKMETHOD), it would not clean up the lock 
objects for the lock we were waiting on. That's pretty straightforward 
to fix by also calling LockErrorCleanup() from ShutdownPostgres(). 
Alternatively, perhaps we should register the lock with the resource 
owner earlier.

Here's a rebased version of the patch. I also split it into a few parts 
for easier review:

v10-0001-Assert-that-all-fastpath-locks-are-released-befo.patch adds a 
few assertions, which are not probably good to have even without the 
rest of the patches.

v10-0002-use-linked-list-in-ResourceOwner-to-hold-LOCALLO.patch replaces 
the "cache" in ResourceOwnerData with the linked list.

v10-0003-stop-using-releaseMask-in-PostPrepare_Locks.patch contains the 
changes to PostPrepare_Locks(), so that it doesn't use the 'releaseMask' 
field anymore.

v10-0004-Remove-LockReleaseAll.patch removes LockReleaseAll, and 
replaces it with LockReleaseSession() calls.

v10-0005-Fix-assertion-failure-from-ProcKill-in-subscript.patch fixes 
the assertion failure.


I realized that after these patches, we only use PGPROC->myProcLocks 
lists in assertions and in PostPrepare_Locks(). PostPrepare_Locks() 
would be pretty easy to refactor to not need them either. That would 
shrink the PGPROC struct considerably.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Remove more leftovers of AIX support
Next
From: Peter Eisentraut
Date:
Subject: Re: commented out code