Re: Patch: Global Unique Index - Mailing list pgsql-hackers

From David Zhang
Subject Re: Patch: Global Unique Index
Date
Msg-id 57cc769a-1969-7f2a-89e6-4aaefb9843fe@highgo.ca
Whole thread Raw
In response to Re: Patch: Global Unique Index  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Thanks a lot for all the comments.

On 2022-11-29 3:13 p.m., Tom Lane wrote:
> ... not to mention creating a high probability of deadlocks between
> concurrent insertions to different partitions.  If they each
> ex-lock their own partition's index before starting to look into
> other partitions' indexes, it seems like a certainty that such
> cases would fail.  The rule of thumb about locking multiple objects
> is that all comers had better do it in the same order, and this
> isn't doing that.
In the current POC patch, the deadlock is happening when backend-1 
inserts a value to index X(partition-1), and backend-2 try to insert a 
conflict value right after backend-1 released the buffer block lock but 
before start to check unique on index Y(partition-2). In this case, 
backend-1 holds ExclusiveLock on transaction-1 and waits for ShareLock 
on transaction-2 , while backend-2 holds ExclusiveLock on transaction-2 
and waits for ShareLock on transaction-1. Based on my debugging tests, 
this only happens when backend-1 and backend-2 want to insert a conflict 
value. If this is true, then is it ok to either `deadlock` error out or 
`duplicated value` error out since this is a conflict value? (hopefully 
end users can handle it in a similar way). I think the probability of 
such deadlock has two conditions: 1) users insert a conflict value and 
plus 2) the uniqueness checking happens in the right moment (see above).
> That specific issue could perhaps be fixed by having everybody
> examine all the indexes in the same order, inserting when you
> come to your own partition's index and otherwise just checking
> for conflicts.  But that still means serializing insertions
> across all the partitions.  And the fact that you need to lock
> all the partitions, or even just know what they all are,
Here is the main change for insertion cross-partition uniqueness check 
in `0004-support-global-unique-index-insert-and-update.patch`,
      result = _bt_doinsert(rel, itup, checkUnique, indexUnchanged, 
heapRel);

+    if (checkUnique != UNIQUE_CHECK_NO)
+        btinsert_check_unique_gi(itup, rel, heapRel, checkUnique);
+
      pfree(itup);

where, a cross-partition uniqueness check is added after the index tuple 
btree insertion on current partition. The idea is to make sure other 
backends can find out the ongoing index tuple just inserted (but before 
marked as visible yet), and the current partition uniqueness check can 
be skipped as it has already been checked. Based on this change, I think 
the insertion serialization can happen in two cases: 1) two insertions 
happen on the same buffer block (buffer lock waiting); 2) two ongoing 
insertions with duplicated values (transaction id waiting);





pgsql-hackers by date:

Previous
From: Paul Jungwirth
Date:
Subject: Think-o in foreign key comments
Next
From: Zheng Li
Date:
Subject: Re: Support logical replication of DDLs