Re: adding wait_start column to pg_locks - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: adding wait_start column to pg_locks
Date
Msg-id 337d5311-6113-15e0-bd85-90b47b55f5ae@oss.nttdata.com
Whole thread Raw
In response to Re: adding wait_start column to pg_locks  (torikoshia <torikoshia@oss.nttdata.com>)
Responses Re: adding wait_start column to pg_locks
List pgsql-hackers

On 2021/02/09 23:31, torikoshia wrote:
> On 2021-02-09 22:54, Fujii Masao wrote:
>> On 2021/02/09 19:11, Fujii Masao wrote:
>>>
>>>
>>> On 2021/02/09 18:13, Fujii Masao wrote:
>>>>
>>>>
>>>> On 2021/02/09 17:48, torikoshia wrote:
>>>>> On 2021-02-05 18:49, Fujii Masao wrote:
>>>>>> On 2021/02/05 0:03, torikoshia wrote:
>>>>>>> On 2021-02-03 11:23, Fujii Masao wrote:
>>>>>>>>> 64-bit fetches are not atomic on some platforms. So spinlock is necessary when updating "waitStart" without
holdingthe partition lock? Also GetLockStatusData() needs spinlock when reading "waitStart"?
 
>>>>>>>>
>>>>>>>> Also it might be worth thinking to use 64-bit atomic operations like
>>>>>>>> pg_atomic_read_u64(), for that.
>>>>>>>
>>>>>>> Thanks for your suggestion and advice!
>>>>>>>
>>>>>>> In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64().
>>>>>>>
>>>>>>> waitStart is TimestampTz i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only supports
unsignedint, so I cast the type.
 
>>>>>>>
>>>>>>> I may be using these functions not correctly, so if something is wrong, I would appreciate any comments.
>>>>>>>
>>>>>>>
>>>>>>> About the documentation, since your suggestion seems better than v6, I used it as is.
>>>>>>
>>>>>> Thanks for updating the patch!
>>>>>>
>>>>>> +    if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
>>>>>> +        pg_atomic_write_u64(&MyProc->waitStart,
>>>>>> +                            pg_atomic_read_u64((pg_atomic_uint64 *) &now));
>>>>>>
>>>>>> pg_atomic_read_u64() is really necessary? I think that
>>>>>> "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.
>>>>>>
>>>>>> +        deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
>>>>>> +        pg_atomic_write_u64(&MyProc->waitStart,
>>>>>> +                    pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));
>>>>>>
>>>>>> Same as above.
>>>>>>
>>>>>> +        /*
>>>>>> +         * Record waitStart reusing the deadlock timeout timer.
>>>>>> +         *
>>>>>> +         * It would be ideal this can be synchronously done with updating
>>>>>> +         * lock information. Howerver, since it gives performance impacts
>>>>>> +         * to hold partitionLock longer time, we do it here asynchronously.
>>>>>> +         */
>>>>>>
>>>>>> IMO it's better to comment why we reuse the deadlock timeout timer.
>>>>>>
>>>>>>      proc->waitStatus = waitStatus;
>>>>>> +    pg_atomic_init_u64(&MyProc->waitStart, 0);
>>>>>>
>>>>>> pg_atomic_write_u64() should be used instead? Because waitStart can be
>>>>>> accessed concurrently there.
>>>>>>
>>>>>> I updated the patch and addressed the above review comments. Patch attached.
>>>>>> Barring any objection, I will commit this version.
>>>>>
>>>>> Thanks for modifying the patch!
>>>>> I agree with your comments.
>>>>>
>>>>> BTW, I ran pgbench several times before and after applying
>>>>> this patch.
>>>>>
>>>>> The environment is virtual machine(CentOS 8), so this is
>>>>> just for reference, but there were no significant difference
>>>>> in latency or tps(both are below 1%).
>>>>
>>>> Thanks for the test! I pushed the patch.
>>>
>>> But I reverted the patch because buildfarm members rorqual and
>>> prion don't like the patch. I'm trying to investigate the cause
>>> of this failures.
>>>
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10
>>
>> -    relation     | locktype |        mode
>> ------------------+----------+---------------------
>> - test_prepared_1 | relation | RowExclusiveLock
>> - test_prepared_1 | relation | AccessExclusiveLock
>> -(2 rows)
>> -
>> +ERROR:  invalid spinlock number: 0
>>
>> "rorqual" reported that the above error happened in the server built with
>> --disable-atomics --disable-spinlocks when reading pg_locks after
>> the transaction was prepared. The cause of this issue is that "waitStart"
>> atomic variable in the dummy proc created at the end of prepare transaction
>> was not initialized. I updated the patch so that pg_atomic_init_u64() is
>> called for the "waitStart" in the dummy proc for prepared transaction.
>> Patch attached. I confirmed that the patched server built with
>> --disable-atomics --disable-spinlocks passed all the regression tests.
> 
> Thanks for fixing the bug, I also tested v9.patch configured with
> --disable-atomics --disable-spinlocks on my environment and confirmed
> that all tests have passed.

Thanks for the test!

I found another bug in the patch. InitProcess() initializes "waitStart",
but previously InitAuxiliaryProcess() did not. This could cause "invalid
spinlock number" error when reading pg_locks in the standby server.
I fixed that. Attached is the updated version of the patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Next
From: Peter Geoghegan
Date:
Subject: Re: 64-bit XIDs in deleted nbtree pages