Re: adding wait_start column to pg_locks - Mailing list pgsql-hackers
From: torikoshia
Subject: Re: adding wait_start column to pg_locks
Msg-id: 067d5597e5de82fe47e5716991ab65c4@oss.nttdata.com
In response to: Re: adding wait_start column to pg_locks (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses: Re: adding wait_start column to pg_locks
List: pgsql-hackers
On 2021-02-09 22:54, Fujii Masao wrote:
> On 2021/02/09 19:11, Fujii Masao wrote:
>>
>> On 2021/02/09 18:13, Fujii Masao wrote:
>>>
>>> On 2021/02/09 17:48, torikoshia wrote:
>>>> On 2021-02-05 18:49, Fujii Masao wrote:
>>>>> On 2021/02/05 0:03, torikoshia wrote:
>>>>>> On 2021-02-03 11:23, Fujii Masao wrote:
>>>>>>>> 64-bit fetches are not atomic on some platforms. So spinlock is
>>>>>>>> necessary when updating "waitStart" without holding the
>>>>>>>> partition lock? Also GetLockStatusData() needs spinlock when
>>>>>>>> reading "waitStart"?
>>>>>>>
>>>>>>> Also it might be worth considering 64-bit atomic operations like
>>>>>>> pg_atomic_read_u64() for that.
>>>>>>
>>>>>> Thanks for your suggestion and advice!
>>>>>>
>>>>>> In the attached patch I used pg_atomic_read_u64() and
>>>>>> pg_atomic_write_u64().
>>>>>>
>>>>>> waitStart is TimestampTz, i.e., int64, but it seems
>>>>>> pg_atomic_read_xxx and pg_atomic_write_xxx only support unsigned
>>>>>> int, so I cast the type.
>>>>>>
>>>>>> I may be using these functions incorrectly, so if something is
>>>>>> wrong, I would appreciate any comments.
>>>>>>
>>>>>> About the documentation, since your suggestion seems better than
>>>>>> v6, I used it as is.
>>>>>
>>>>> Thanks for updating the patch!
>>>>>
>>>>> +		if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
>>>>> +			pg_atomic_write_u64(&MyProc->waitStart,
>>>>> +								pg_atomic_read_u64((pg_atomic_uint64 *) &now));
>>>>>
>>>>> Is pg_atomic_read_u64() really necessary here? I think that
>>>>> "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.
>>>>>
>>>>> +	deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
>>>>> +	pg_atomic_write_u64(&MyProc->waitStart,
>>>>> +						pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));
>>>>>
>>>>> Same as above.
>>>>>
>>>>> +	/*
>>>>> +	 * Record waitStart reusing the deadlock timeout timer.
>>>>> +	 *
>>>>> +	 * It would be ideal if this could be done synchronously with
>>>>> +	 * updating lock information. However, since holding partitionLock
>>>>> +	 * longer has a performance impact, we do it here asynchronously.
>>>>> +	 */
>>>>>
>>>>> IMO it's better to comment why we reuse the deadlock timeout timer.
>>>>>
>>>>> 	proc->waitStatus = waitStatus;
>>>>> +	pg_atomic_init_u64(&MyProc->waitStart, 0);
>>>>>
>>>>> Shouldn't pg_atomic_write_u64() be used instead, because waitStart
>>>>> can be accessed concurrently there?
>>>>>
>>>>> I updated the patch and addressed the above review comments. Patch
>>>>> attached. Barring any objection, I will commit this version.
>>>>
>>>> Thanks for modifying the patch!
>>>> I agree with your comments.
>>>>
>>>> BTW, I ran pgbench several times before and after applying this
>>>> patch.
>>>>
>>>> The environment is a virtual machine (CentOS 8), so this is just for
>>>> reference, but there was no significant difference in latency or tps
>>>> (both are below 1%).
>>>
>>> Thanks for the test! I pushed the patch.
>>
>> But I reverted the patch because buildfarm members rorqual and prion
>> don't like it. I'm trying to investigate the cause of these failures.
>>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10
>
> -     relation     | locktype |        mode
> ------------------+----------+---------------------
> - test_prepared_1 | relation | RowExclusiveLock
> - test_prepared_1 | relation | AccessExclusiveLock
> -(2 rows)
> -
> +ERROR:  invalid spinlock number: 0
>
> "rorqual" reported that the above error happened in a server built with
> --disable-atomics --disable-spinlocks when reading pg_locks after the
> transaction was prepared. The cause of this issue is that the
> "waitStart" atomic variable in the dummy proc created at the end of
> prepare transaction was not initialized.
> I updated the patch so that pg_atomic_init_u64() is called for the
> "waitStart" in the dummy proc for prepared transactions. Patch
> attached. I confirmed that the patched server built with
> --disable-atomics --disable-spinlocks passed all the regression tests.

Thanks for fixing the bug! I also tested v9.patch configured with
--disable-atomics --disable-spinlocks on my environment and confirmed
that all tests have passed.

> BTW, while investigating this issue, I found that pg_stat_wal_receiver
> could also cause this error even on the current master (without the
> patch). I will report that in a separate thread.
>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16
>
> "prion" reported the following error, but I'm not sure how the pg_locks
> changes caused it. I found that Heikki also reported at [1] that
> "prion" failed with the same error but was not sure how it happened.
> This makes me think for now that this issue is not directly related to
> the pg_locks changes.

Thanks! I was wondering how these errors were related to the commit.
Regards,

--
Atsushi Torikoshi

> -------------------------------------
> pg_dump: error: query failed: ERROR:  missing chunk number 0 for toast
> value 14444 in pg_toast_2619
> pg_dump: error: query was: SELECT
> a.attnum,
> a.attname,
> a.atttypmod,
> a.attstattarget,
> a.attstorage,
> t.typstorage,
> a.attnotnull,
> a.atthasdef,
> a.attisdropped,
> a.attlen,
> a.attalign,
> a.attislocal,
> pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,
> array_to_string(a.attoptions, ', ') AS attoptions,
> CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0
> END AS attcollation,
> pg_catalog.array_to_string(ARRAY(SELECT
> pg_catalog.quote_ident(option_name) || ' ' ||
> pg_catalog.quote_literal(option_value) FROM
> pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name),
> E', ') AS attfdwoptions,
> a.attidentity,
> CASE WHEN a.atthasmissing AND NOT a.attisdropped THEN a.attmissingval
> ELSE null END AS attmissingval,
> a.attgenerated
> FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_type t ON
> a.atttypid = t.oid
> WHERE a.attrelid = '35987'::pg_catalog.oid AND a.attnum >
> 0::pg_catalog.int2
> ORDER BY a.attnum
> pg_dumpall: error: pg_dump failed on database "regression", exiting
> waiting for server to shut down.... done
> server stopped
> pg_dumpall of post-upgrade database cluster failed
> -------------------------------------
>
> [1]
> https://www.postgresql.org/message-id/f03ea04a-9b77-e371-9ab9-182cb35db1f9@iki.fi
>
> Regards,