Re: adding wait_start column to pg_locks - Mailing list pgsql-hackers

From | Fujii Masao
---|---
Subject | Re: adding wait_start column to pg_locks
Date |
Msg-id | 64102638-7fc3-1941-12f1-e139e3463c03@oss.nttdata.com
In response to | Re: adding wait_start column to pg_locks (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses | Re: adding wait_start column to pg_locks
List | pgsql-hackers
On 2021/02/09 19:11, Fujii Masao wrote:
>
>
> On 2021/02/09 18:13, Fujii Masao wrote:
>>
>>
>> On 2021/02/09 17:48, torikoshia wrote:
>>> On 2021-02-05 18:49, Fujii Masao wrote:
>>>> On 2021/02/05 0:03, torikoshia wrote:
>>>>> On 2021-02-03 11:23, Fujii Masao wrote:
>>>>>>> 64-bit fetches are not atomic on some platforms. So spinlock is necessary when updating "waitStart" without holding the partition lock? Also GetLockStatusData() needs spinlock when reading "waitStart"?
>>>>>>
>>>>>> Also it might be worth thinking to use 64-bit atomic operations like
>>>>>> pg_atomic_read_u64(), for that.
>>>>>
>>>>> Thanks for your suggestion and advice!
>>>>>
>>>>> In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64().
>>>>>
>>>>> waitStart is TimestampTz, i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only support unsigned int, so I cast the type.
>>>>>
>>>>> I may be using these functions incorrectly, so if something is wrong, I would appreciate any comments.
>>>>>
>>>>> About the documentation, since your suggestion seems better than v6, I used it as is.
>>>>
>>>> Thanks for updating the patch!
>>>>
>>>> +    if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
>>>> +        pg_atomic_write_u64(&MyProc->waitStart,
>>>> +                            pg_atomic_read_u64((pg_atomic_uint64 *) &now));
>>>>
>>>> Is pg_atomic_read_u64() really necessary? I think that
>>>> "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.
>>>>
>>>> +    deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
>>>> +    pg_atomic_write_u64(&MyProc->waitStart,
>>>> +                        pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));
>>>>
>>>> Same as above.
>>>>
>>>> +    /*
>>>> +     * Record waitStart reusing the deadlock timeout timer.
>>>> +     *
>>>> +     * It would be ideal this can be synchronously done with updating
>>>> +     * lock information. Howerver, since it gives performance impacts
>>>> +     * to hold partitionLock longer time, we do it here asynchronously.
>>>> +     */
>>>>
>>>> IMO it's better to comment why we reuse the deadlock timeout timer.
>>>>
>>>>     proc->waitStatus = waitStatus;
>>>> +    pg_atomic_init_u64(&MyProc->waitStart, 0);
>>>>
>>>> pg_atomic_write_u64() should be used instead? Because waitStart can be
>>>> accessed concurrently there.
>>>>
>>>> I updated the patch and addressed the above review comments. Patch attached.
>>>> Barring any objection, I will commit this version.
>>>
>>> Thanks for modifying the patch!
>>> I agree with your comments.
>>>
>>> BTW, I ran pgbench several times before and after applying
>>> this patch.
>>>
>>> The environment is a virtual machine (CentOS 8), so this is
>>> just for reference, but there was no significant difference
>>> in latency or tps (both are below 1%).
>>
>> Thanks for the test! I pushed the patch.
>
> But I reverted the patch because buildfarm members rorqual and
> prion don't like the patch. I'm trying to investigate the cause
> of these failures.
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10

-     relation      | locktype |        mode
-------------------+----------+---------------------
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR: invalid spinlock number: 0

"rorqual" reported that the above error happened in the server built with
--disable-atomics --disable-spinlocks when reading pg_locks after the
transaction was prepared. The cause of this issue is that the "waitStart"
atomic variable in the dummy proc created at the end of prepare transaction
was not initialized.
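To make that failure mode concrete, here is a minimal sketch (not the actual patch hunk; the struct name DummyProcSketch and the function init_dummy_proc are invented for illustration) of the initialization rule involved: in a build configured with --disable-atomics --disable-spinlocks, a pg_atomic_uint64 is emulated with a semaphore-backed spinlock, so touching it before pg_atomic_init_u64() has run is what raises "invalid spinlock number: 0".

```c
#include "postgres.h"
#include "port/atomics.h"

/* Hypothetical stand-in for the dummy PGPROC set up for a prepared xact. */
typedef struct DummyProcSketch
{
	pg_atomic_uint64	waitStart;	/* lock wait start time, 0 = not waiting */
} DummyProcSketch;

static void
init_dummy_proc(DummyProcSketch *proc)
{
	/*
	 * Must run before any pg_atomic_read_u64()/pg_atomic_write_u64() on the
	 * field.  Under --disable-atomics --disable-spinlocks the emulation keeps
	 * a spinlock (mapped to a semaphore number) inside the variable, and an
	 * uninitialized value of 0 is reported as "invalid spinlock number: 0".
	 */
	pg_atomic_init_u64(&proc->waitStart, 0);
}
```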
I updated the patch so that pg_atomic_init_u64() is called for the
"waitStart" in the dummy proc for a prepared transaction. Patch attached.
I confirmed that the patched server built with --disable-atomics
--disable-spinlocks passed all the regression tests.

BTW, while investigating this issue, I found that pg_stat_wal_receiver
could also cause this error even in the current master (without the patch).
I will report that in a separate thread.

> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16

"prion" reported the following error. But I'm not sure how the changes to
pg_locks caused this error. I found that Heikki also reported at [1] that
"prion" failed with the same error but was not sure how it happened. This
makes me think for now that this issue is not directly related to the
pg_locks changes.

-------------------------------------
pg_dump: error: query failed: ERROR: missing chunk number 0 for toast value 14444 in pg_toast_2619
pg_dump: error: query was: SELECT a.attnum, a.attname, a.atttypmod, a.attstattarget, a.attstorage, t.typstorage, a.attnotnull, a.atthasdef, a.attisdropped, a.attlen, a.attalign, a.attislocal, pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname, array_to_string(a.attoptions, ', ') AS attoptions, CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0 END AS attcollation, pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name) || ' ' || pg_catalog.quote_literal(option_value) FROM pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name), E', ') AS attfdwoptions, a.attidentity, CASE WHEN a.atthasmissing AND NOT a.attisdropped THEN a.attmissingval ELSE null END AS attmissingval, a.attgenerated FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_type t ON a.atttypid = t.oid WHERE a.attrelid = '35987'::pg_catalog.oid AND a.attnum > 0::pg_catalog.int2 ORDER BY a.attnum
pg_dumpall: error: pg_dump failed on database "regression", exiting
waiting for server to shut down.... done
server stopped
pg_dumpall of post-upgrade database cluster failed
-------------------------------------

[1] https://www.postgresql.org/message-id/f03ea04a-9b77-e371-9ab9-182cb35db1f9@iki.fi

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
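For readers skimming the quoted review above, a hedged sketch of the simplified write pattern it converges on: TimestampTz is int64, so the timestamp can be passed to pg_atomic_write_u64() with a plain cast instead of being re-read through a cast pg_atomic_uint64 pointer. The function name record_wait_start is illustrative, not the committed code.

```c
#include "postgres.h"
#include "port/atomics.h"
#include "utils/timestamp.h"

/*
 * Illustrative only: record when the current backend started waiting for a
 * lock, without holding the lock-table partition lock.  The field is written
 * at most once per wait; 0 means "not waiting".
 */
static void
record_wait_start(pg_atomic_uint64 *waitStart)
{
	TimestampTz now = GetCurrentTimestamp();

	if (pg_atomic_read_u64(waitStart) == 0)
		pg_atomic_write_u64(waitStart, (uint64) now);
}
```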