On 27/03/2025 07:00, Alexander Lakhin wrote:
> I've discovered that the following script:
> export PGOPTIONS='-c lock_timeout=1s'
> createdb regression
> for i in {1..100}; do
> echo "ITERATION: $i"
> psql -c "CREATE TABLE t(i int);"
> cat << 'EOF' | psql &
> DO $$
> DECLARE
> i int;
> BEGIN
> FOR i IN 1 .. 5000000 LOOP
> INSERT INTO t VALUES (1);
> END LOOP;
> END;
> $$;
> EOF
> sleep 1
> psql -c "DROP TABLE t" &
> cat << 'EOF' | psql &
> COPY t FROM STDIN;
> 0
> \.
> EOF
> wait
>
> psql -c "DROP TABLE t" || break;
> done
>
> causes a segmentation fault on master (it fails on iterations 5, 4, 26
> for me):
> ITERATION: 26
> CREATE TABLE
> ERROR: canceling statement due to lock timeout
> ERROR: canceling statement due to lock timeout
> invalid command \.
> ERROR: syntax error at or near "0"
> LINE 1: 0
> ^
> server closed the connection unexpectedly
>
> Core was generated by `postgres: law regression [local] idle '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 GrantLockLocal (locallock=0x5a1d75c35ba8, owner=0x5a1d75c18630) at lock.c:1805
> 1805 lockOwners[i].owner = owner;
> (gdb) bt
> #0 GrantLockLocal (locallock=0x5a1d75c35ba8, owner=0x5a1d75c18630) at lock.c:1805
> #1 0x00005a1d51e93ee7 in GrantAwaitedLock () at lock.c:1887
> #2 0x00005a1d51ea1e58 in LockErrorCleanup () at proc.c:814
> #3 0x00005a1d51b9a1a7 in AbortTransaction () at xact.c:2853
> #4 0x00005a1d51b9abc6 in AbortCurrentTransactionInternal () at xact.c:3579
> #5 AbortCurrentTransaction () at xact.c:3457
> #6 0x00005a1d51eafeda in PostgresMain (dbname=<optimized out>,
> username=0x5a1d75c139b8 "law") at postgres.c:4440
>
> (gdb) p lockOwners
> $1 = (LOCALLOCKOWNER *) 0x0
>
> git bisect led me to 3c0fd64fe.
> Could you please take a look?

Great, thanks for the repro! With that, I was able to capture the
failure with 'rr' and understand what happens: Commit 3c0fd64fe removed
"lockAwaited = NULL;" from LockErrorCleanup(). Because of that, if the
lock had been granted to us, and if LockErrorCleanup() was called twice,
the second call would call GrantAwaitedLock() even if the lock was
already released and cleaned up.

I've pushed a fix to put that back.
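
To illustrate the double-call problem, here is a minimal standalone sketch in C.
It is not the PostgreSQL source: ModelLock, the model_* functions, and
simulate_double_cleanup() are hypothetical names, and the real
LockErrorCleanup() does considerably more. The only thing it models is that,
without resetting the awaited-lock pointer, a second cleanup call reaches the
grant path for a lock that has already been released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for LOCALLOCK: only tracks the state we care about. */
typedef struct ModelLock
{
    bool granted;   /* the lock was granted to us while we were waiting */
    bool released;  /* the lock has since been released and cleaned up */
} ModelLock;

/* Models the static lockAwaited pointer. */
static ModelLock *lock_awaited = NULL;

/* Counts grant attempts on an already-released lock (the crash case). */
static int stale_grants = 0;

/* Simplified model of GrantAwaitedLock(): assumes the lock is still live. */
static void
model_grant_awaited_lock(void)
{
    assert(lock_awaited != NULL);
    if (lock_awaited->released)
        stale_grants++;     /* the real code dereferenced lockOwners == NULL here */
}

/*
 * Simplified model of LockErrorCleanup().  reset_pointer == true models the
 * behaviour with "lockAwaited = NULL;" in place; false models it removed.
 */
static void
model_lock_error_cleanup(bool reset_pointer)
{
    if (lock_awaited == NULL)
        return;             /* nothing awaited: a repeated call is a no-op */

    if (lock_awaited->granted)
        model_grant_awaited_lock();

    if (reset_pointer)
        lock_awaited = NULL;
}

/* Runs cleanup twice around a release; returns the number of stale grants. */
int
simulate_double_cleanup(bool reset_pointer)
{
    ModelLock lock = { .granted = true, .released = false };

    lock_awaited = &lock;
    stale_grants = 0;

    model_lock_error_cleanup(reset_pointer);    /* first cleanup call */
    lock.released = true;                       /* lock released and cleaned up */
    model_lock_error_cleanup(reset_pointer);    /* second call, as from AbortTransaction() */

    lock_awaited = NULL;                        /* do not leave a dangling pointer */
    return stale_grants;
}
```

In this model, simulate_double_cleanup(false) reports one stale grant, while
simulate_double_cleanup(true) reports none, because the second call returns
early once the pointer has been reset.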
--
Heikki Linnakangas
Neon (https://neon.tech)