Re: Latches vs lwlock contention - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Latches vs lwlock contention
Date
Msg-id 4961142a-7958-4229-8329-8777b4b72690@iki.fi
In response to Re: Latches vs lwlock contention  (Alexander Lakhin <exclusion@gmail.com>)
List pgsql-hackers
On 27/03/2025 07:00, Alexander Lakhin wrote:
> I've discovered that the following script:
> export PGOPTIONS='-c lock_timeout=1s'
> createdb regression
> for i in {1..100}; do
> echo "ITERATION: $i"
> psql -c "CREATE TABLE t(i int);"
> cat << 'EOF' | psql &
> DO $$
> DECLARE
>      i int;
> BEGIN
>     FOR i IN 1 .. 5000000 LOOP
>      INSERT INTO t VALUES (1);
>    END LOOP;
> END;
> $$;
> EOF
> sleep 1
> psql -c "DROP TABLE t" &
> cat << 'EOF' | psql &
> COPY t FROM STDIN;
> 0
> \.
> EOF
> wait
> 
> psql -c "DROP TABLE t" || break;
> done
> 
> causes a segmentation fault on master (it fails on iterations 5, 4, 26 
> for me):
> ITERATION: 26
> CREATE TABLE
> ERROR:  canceling statement due to lock timeout
> ERROR:  canceling statement due to lock timeout
> invalid command \.
> ERROR:  syntax error at or near "0"
> LINE 1: 0
>          ^
> server closed the connection unexpectedly
> 
> Core was generated by `postgres: law regression [local] 
> idle                                         '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  GrantLockLocal (locallock=0x5a1d75c35ba8, owner=0x5a1d75c18630) at 
> lock.c:1805
> 1805            lockOwners[i].owner = owner;
> (gdb) bt
> #0  GrantLockLocal (locallock=0x5a1d75c35ba8, owner=0x5a1d75c18630) at 
> lock.c:1805
> #1  0x00005a1d51e93ee7 in GrantAwaitedLock () at lock.c:1887
> #2  0x00005a1d51ea1e58 in LockErrorCleanup () at proc.c:814
> #3  0x00005a1d51b9a1a7 in AbortTransaction () at xact.c:2853
> #4  0x00005a1d51b9abc6 in AbortCurrentTransactionInternal () at xact.c:3579
> #5  AbortCurrentTransaction () at xact.c:3457
> #6  0x00005a1d51eafeda in PostgresMain (dbname=<optimized out>, 
> username=0x5a1d75c139b8 "law") at postgres.c:4440
> 
> (gdb) p lockOwners
> $1 = (LOCALLOCKOWNER *) 0x0
> 
> git bisect led me to 3c0fd64fe.
> Could you please take a look?

Great, thanks for the repro! With that, I was able to capture the 
failure with 'rr' and understand what happens. Commit 3c0fd64fe removed 
"lockAwaited = NULL;" from LockErrorCleanup(). As a result, if the lock 
had been granted to us and LockErrorCleanup() was called twice, the 
second call would call GrantAwaitedLock() again, even though the lock 
had already been released and cleaned up.

I've pushed a fix to put that back.
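
To illustrate the failure mode, here is a minimal toy model in C. It is 
only a sketch, not the actual lock.c/proc.c code: ToyLocalLock, 
awaited_lock, the toy_* functions and simulate() are all hypothetical 
stand-ins, and instead of dereferencing the NULL array (which is what 
crashed in GrantLockLocal()) the toy just counts the stale call.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a LOCALLOCK; NOT the real PostgreSQL structure. */
typedef struct ToyLocalLock
{
    int *owners;    /* plays the role of lockOwners; NULL once released */
    int  nowners;
} ToyLocalLock;

/* Plays the role of the static lockAwaited pointer (assumption). */
static ToyLocalLock *awaited_lock = NULL;

/* Number of grant attempts made against an already-released lock. */
static int stale_grants = 0;

static void
toy_grant_awaited_lock(void)
{
    if (awaited_lock->owners == NULL)
    {
        /* The real code would write through a NULL pointer here. */
        stale_grants++;
        return;
    }
    awaited_lock->owners[awaited_lock->nowners++] = 1;
}

static void
toy_release(ToyLocalLock *lock)
{
    free(lock->owners);
    lock->owners = NULL;
    lock->nowners = 0;
}

/* reset_pointer = 1 models the cleanup with "lockAwaited = NULL;" kept. */
static void
toy_lock_error_cleanup(int reset_pointer)
{
    if (awaited_lock == NULL)
        return;                 /* nothing being awaited: no-op */

    toy_grant_awaited_lock();   /* the lock had been granted to us */

    if (reset_pointer)
        awaited_lock = NULL;
}

/* Runs cleanup, release, cleanup; returns the number of stale grants. */
static int
simulate(int reset_pointer)
{
    ToyLocalLock lock;

    lock.owners = calloc(4, sizeof(int));
    lock.nowners = 0;
    assert(lock.owners != NULL);

    stale_grants = 0;
    awaited_lock = &lock;

    toy_lock_error_cleanup(reset_pointer);  /* first call */
    toy_release(&lock);                     /* abort releases the lock */
    toy_lock_error_cleanup(reset_pointer);  /* second call */

    awaited_lock = NULL;
    return stale_grants;
}
```

In this model, simulate(0) reports one stale grant on the second cleanup 
call, while simulate(1) reports none, because the second call sees a 
NULL pointer and returns early.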

-- 
Heikki Linnakangas
Neon (https://neon.tech)


