Thread: Deadlock possibility in _bt_check_unique?
Hi,<br /> With the implementation of deferred unique constraints, we need to go back to the index second time to checkwhether the unique check is valid. Say a situation occurs like this<br />a) the first session doing the unique checkfinds out that there is a unique check required second time and just makes its entry and comes back<br /> b) the secondsession doing the unique check finds out that there is a unique check required second time and just makes its entryand comes back<br /><br />While they do the second check, first session will wait for the session to complete and viceversa. Won't this result in a deadlock? Isn't this a realistic scenario?<br /><br />Thanks,<br />Gokul.<br />
Gokulakannan Somasundaram wrote: > Hi, > With the implementation of deferred unique constraints, we need to go > back to the index second time to check whether the unique check is valid. > Say a situation occurs like this > a) the first session doing the unique check finds out that there is a unique > check required second time and just makes its entry and comes back > b) the second session doing the unique check finds out that there is a > unique check required second time and just makes its entry and comes back > > While they do the second check, first session will wait for the session to > complete and vice versa. Won't this result in a deadlock? Isn't this a > realistic scenario? Yes, that can happen. The deadlock detector will kick in and abort one of the sessions. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Can you also explain how are we avoiding duplicates in this scenario?<br /><br />a) Say there are three pages(P,Q, R) fullof duplicate tuples, that are deleted but not dead of id x(due to some long running transaction).<br />b) Now SessionA gets in and checks the duplicate tuples for their liveliness with the HeapTuple for id x with shared lock on allthe pages P, Q and R. Since all are deleted, it will get the message, that it need not come back to check again for uniquenessFinally it again starts from P to check for freespace to insert its tuple. Say it inserts the tuple at page Q.<br/> c) Now Session B(with same id x) starts after Session A, but it passes Q before the insertion of the tuple by SessionA. It will also get the response from _bt_check_unique, that it need not comeback for second time unique check. Nowit checks for freespace from P and it finds freespace at P. Then it will insert the new record at P itself.<br /><br />Sowe have two duplicate records, eventhough there is a unique constraint. Is this a possible scenario?<br /><br />Thanks,<br/>Gokul.<br /><br /><br />
Gokulakannan Somasundaram <gokul007@gmail.com> writes: > Can you also explain how are we avoiding duplicates in this scenario? > a) Say there are three pages(P,Q, R) full of duplicate tuples, that are > deleted but not dead of id x(due to some long running transaction). > b) Now Session A gets in and checks the duplicate tuples for their > liveliness with the HeapTuple for id x with shared lock on all the pages P, > Q and R. Since all are deleted, it will get the message, that it need not > come back to check again for uniqueness Finally it again starts from P to > check for freespace to insert its tuple. Say it inserts the tuple at page Q. > c) Now Session B(with same id x) starts after Session A, but it passes Q > before the insertion of the tuple by Session A. It will also get the > response from _bt_check_unique, that it need not comeback for second time > unique check. Now it checks for freespace from P and it finds freespace at > P. Then it will insert the new record at P itself. > So we have two duplicate records, eventhough there is a unique constraint. > Is this a possible scenario? Are you talking about exclusion constraints or btree uniqueness constraints? This doesn't seem to be a particularly accurate description of the implementation of either one. The way btree deals with this is explained in _bt_doinsert: * NOTE: obviously, _bt_check_unique can only detect keys that are already * in the index; so it cannot defend againstconcurrent insertions of the * same key. We protect against that by means of holding a write lock on * thetarget page. Any other would-be inserter of the same key must * acquire a write lock on the same target page, so onlyone would-be * inserter can be making the check at one time. Furthermore, once we are * past the check we holdwrite locks continuously until we have performed * our insertion, so no later inserter can fail to see our insertion. * (This requires some care in _bt_insertonpg.) regards, tom lane
Are you talking about exclusion constraints or btree uniqueness
constraints? This doesn't seem to be a particularly accurate
description of the implementation of either one. The way btree
deals with this is explained in _bt_doinsert:
Unique constraints
* NOTE: obviously, _bt_check_unique can only detect keys that are already
* in the index; so it cannot defend against concurrent insertions of the
* same key. We protect against that by means of holding a write lock on
* the target page. Any other would-be inserter of the same key must
* acquire a write lock on the same target page, so only one would-be
* inserter can be making the check at one time. Furthermore, once we are
* past the check we hold write locks continuously until we have performed
* our insertion, so no later inserter can fail to see our insertion.
* (This requires some care in _bt_insertonpg.)
This is fine, if the second session has to pass through the page, where the first session inserted the record. But as i said if the second session finds a free slot before hitting on the page where the first session inserted, then it will never hit the page with write lock. The comment says that,
" Furthermore, once we are
* past the check we hold write locks continuously until we have performed
* our insertion, so no later inserter can fail to see our insertion.
* (This requires some care in _bt_insertonpg.) "
But in the case, i suggested(page1, page2, page3 with non-dead duplicate tuples, which are deleted), the first session checks page1, finds no freespace, moves to page2, finds freespace and inserts. But when the second session checks page1, say some of the tuples become dead. Then it gets freespace there and inserts, but never sees the insertion of the first session. Maybe, i am missing something. Just thought of raising the flag..
Gokul.
" Furthermore, once we are
* past the check we hold write locks continuously until we have performed
* our insertion, so no later inserter can fail to see our insertion.
* (This requires some care in _bt_insertonpg.) "
But in the case, i suggested(page1, page2, page3 with non-dead duplicate tuples, which are deleted), the first session checks page1, finds no freespace, moves to page2, finds freespace and inserts. But when the second session checks page1, say some of the tuples become dead. Then it gets freespace there and inserts, but never sees the insertion of the first session. Maybe, i am missing something. Just thought of raising the flag..
Gokul.
Gokulakannan Somasundaram <gokul007@gmail.com> writes: > This is fine, if the second session has to pass through the page, where the > first session inserted the record. But as i said if the second session finds > a free slot before hitting on the page where the first session inserted, > then it will never hit the page with write lock. The comment says that, No, you don't understand how it works. All would-be inserters will hit the same target page to begin with, ie, the first one that the new key could legally be inserted on. The lock that protects against this problem is the lock on that page, regardless of which page the key actually ends up on. regards, tom lane
No, you don't understand how it works. All would-be inserters will hit
the same target page to begin with, ie, the first one that the new key
could legally be inserted on. The lock that protects against this
problem is the lock on that page, regardless of which page the key
actually ends up on.
Consider Time instances T1, T2, T3, T4
T1 : session 1 holds the write lock on page p1 and completes the unique check on p1, p2 and p3.
T2 : session 1 releases the lock on p1 (its waiting to acquire a ex lock on p2)
T3 : session 2 acquires write lock on p1 and completes the unique check on p1, p2 and p3. Here, i believe the Session 2
has a chance of getting the read lock before session 1 gets the write lock. Is it not possible?
T4 : session 1 gets the write lock on p2 and inserts and session 2 gets the write lock on p1 and inserts.
OK. I have stated my assumptions. Is my assumption at time instant T3 wrong/ never happen?
Thanks,
Gokul.
Gokulakannan Somasundaram <gokul007@gmail.com> writes: > Consider Time instances T1, T2, T3, T4 > T1 : session 1 holds the write lock on page p1 and completes the unique > check on p1, p2 and p3. > T2 : session 1 releases the lock on p1 (its waiting to acquire a ex lock on > p2) That's not what we do. See _bt_findinsertloc. regards, tom lane
<br /><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204,204, 204); padding-left: 1ex;"><div class="im"> > T2 : session 1 releases the lock on p1 (its waiting to acquirea ex lock on<br /> > p2)<br /><br /></div>That's not what we do. See _bt_findinsertloc.<br /><br /> regards, tom lane<br /></blockquote></div><br /> I am really confused. Please keep the cool and explainme, if i am wrong. I could see this code in _bt_findinsertloc. There is a _bt_relandgetbuf, which releases lock onp1 and tries to acquire a lock on p2. I wrote ex lock in the place of BT_WRITE.<br /><i><font size="2"><br style="font-family:arial,helvetica,sans-serif;" /></font></i><pre class="fragment"><i><font size="2"><span style="font-family:arial,helvetica,sans-serif;">00589 </span><span class="keywordflow" style="font-family: arial,helvetica,sans-serif;">for</span><spanstyle="font-family: arial,helvetica,sans-serif;"> (;;)</span><br style="font-family:arial,helvetica,sans-serif;" /> <a name="l00590" style="font-family: arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00590 {</span><br style="font-family: arial,helvetica,sans-serif;" /><a name="l00591"style="font-family: arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00591 </span><a class="code" href="http://doxygen.postgresql.org/block_8h.html#0be1c1ab88d7f8120e2cd2e8ac2697a1"style="font-family: arial,helvetica,sans-serif;">BlockNumber</a><spanstyle="font-family: arial,helvetica,sans-serif;"> rblkno = lpageop-></span><aclass="code" href="http://doxygen.postgresql.org/structBTPageOpaqueData.html#0e96302f6e2aa4203cef0e243362b692"style="font-family: arial,helvetica,sans-serif;">btpo_next</a><spanstyle="font-family: arial,helvetica,sans-serif;">;</span><br style="font-family:arial,helvetica,sans-serif;" /> <a name="l00592" style="font-family: arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00592</span><br style="font-family: arial,helvetica,sans-serif;" /><a name="l00593" style="font-family:arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00593 rbuf = </span><a class="code" href="http://doxygen.postgresql.org/nbtpage_8c.html#023261cd645fc5e8b4e8517fe9027bd6" style="font-family:arial,helvetica,sans-serif;">_bt_relandgetbuf</a><span style="font-family: arial,helvetica,sans-serif;">(rel,rbuf, rblkno, </span><a class="code" href="http://doxygen.postgresql.org/nbtree_8h.html#e494b1ec6ecbe7251dfc412a1ec53c1b"style="font-family: arial,helvetica,sans-serif;">BT_WRITE</a><spanstyle="font-family: arial,helvetica,sans-serif;">);</span><br style="font-family:arial,helvetica,sans-serif;" /> <a name="l00594" style="font-family: arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00594 page = </span><a class="code" href="http://doxygen.postgresql.org/bufmgr_8h.html#fb570c83a17839dabeb75dba7ea8e1a5"style="font-family: arial,helvetica,sans-serif;">BufferGetPage</a><spanstyle="font-family: arial,helvetica,sans-serif;">(rbuf);</span><br style="font-family:arial,helvetica,sans-serif;" /> <a name="l00595" style="font-family: arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00595 lpageop = (</span><a class="code" href="http://doxygen.postgresql.org/structBTPageOpaqueData.html"style="font-family: arial,helvetica,sans-serif;">BTPageOpaque</a><spanstyle="font-family: arial,helvetica,sans-serif;">) </span><a class="code"href="http://doxygen.postgresql.org/bufpage_8h.html#3be45495654ca1ff61f6ae45805e25f6" style="font-family: arial,helvetica,sans-serif;">PageGetSpecialPointer</a><spanstyle="font-family: arial,helvetica,sans-serif;">(page);</span><brstyle="font-family: arial,helvetica,sans-serif;" /> <a name="l00596" style="font-family: arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00596 </span><span class="keywordflow" style="font-family: arial,helvetica,sans-serif;">if</span><spanstyle="font-family: arial,helvetica,sans-serif;"> (!</span><a class="code" href="http://doxygen.postgresql.org/nbtree_8h.html#a8df35238449d00d7e14c3f1ccd3121c"style="font-family: arial,helvetica,sans-serif;">P_IGNORE</a><spanstyle="font-family: arial,helvetica,sans-serif;">(lpageop))</span><br style="font-family:arial,helvetica,sans-serif;" /> <a name="l00597" style="font-family: arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00597 </span><span class="keywordflow" style="font-family: arial,helvetica,sans-serif;">break</span><spanstyle="font-family: arial,helvetica,sans-serif;">;</span><br style="font-family:arial,helvetica,sans-serif;" /> <a name="l00598" style="font-family: arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00598 </span><span class="keywordflow" style="font-family: arial,helvetica,sans-serif;">if</span><spanstyle="font-family: arial,helvetica,sans-serif;"> (</span><a class="code" href="http://doxygen.postgresql.org/nbtree_8h.html#8b5e4857926b514c0f97592bf3344293"style="font-family: arial,helvetica,sans-serif;">P_RIGHTMOST</a><spanstyle="font-family: arial,helvetica,sans-serif;">(lpageop))</span><br style="font-family:arial,helvetica,sans-serif;" /> <a name="l00599" style="font-family: arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00599 </span><a class="code" href="http://doxygen.postgresql.org/elog_8h.html#850290250a1449fc513da20ae2ce5ec5"style="font-family: arial,helvetica,sans-serif;">elog</a><spanstyle="font-family: arial,helvetica,sans-serif;">(</span><a class="code" href="http://doxygen.postgresql.org/elog_8h.html#8fe83ac76edc595f6b98cd4a4127aed5"style="font-family: arial,helvetica,sans-serif;">ERROR</a><spanstyle="font-family: arial,helvetica,sans-serif;">, </span><span class="stringliteral"style="font-family: arial,helvetica,sans-serif;">"fell off the end of index \"%s\""</span><span style="font-family:arial,helvetica,sans-serif;">,</span><br style="font-family: arial,helvetica,sans-serif;" /> <a name="l00600" style="font-family: arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00600 </span><a class="code" href="http://doxygen.postgresql.org/rel_8h.html#5e9d450c92f70171110e22ffc678ed04"style="font-family: arial,helvetica,sans-serif;">RelationGetRelationName</a><spanstyle="font-family: arial,helvetica,sans-serif;">(rel));</span><brstyle="font-family: arial,helvetica,sans-serif;" /> <a name="l00601" style="font-family: arial,helvetica,sans-serif;"></a><span style="font-family: arial,helvetica,sans-serif;">00601 }</span></font></i><br /><br /><br /></pre>What is that, i am missing here?<br/><br />Gokul.<br />
Oh! yeah, i got it. Thanks....
On Wed, Mar 24, 2010 at 12:26 AM, Gokulakannan Somasundaram <gokul007@gmail.com> wrote:
I am really confused. Please keep the cool and explain me, if i am wrong. I could see this code in _bt_findinsertloc. There is a _bt_relandgetbuf, which releases lock on p1 and tries to acquire a lock on p2. I wrote ex lock in the place of BT_WRITE.> T2 : session 1 releases the lock on p1 (its waiting to acquire a ex lock onThat's not what we do. See _bt_findinsertloc.
> p2)
regards, tom lane00589 for (;;)What is that, i am missing here?
00590 {
00591 BlockNumber rblkno = lpageop->btpo_next;
00592
00593 rbuf = _bt_relandgetbuf(rel, rbuf, rblkno, BT_WRITE);
00594 page = BufferGetPage(rbuf);
00595 lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
00596 if (!P_IGNORE(lpageop))
00597 break;
00598 if (P_RIGHTMOST(lpageop))
00599 elog(ERROR, "fell off the end of index \"%s\"",
00600 RelationGetRelationName(rel));
00601 }
Gokul.
Gokulakannan Somasundaram <gokul007@gmail.com> writes: > I am really confused. Please keep the cool and explain me, if i am > wrong. I could see this code in _bt_findinsertloc. There is a > _bt_relandgetbuf, which releases lock on p1 and tries to acquire a lock on > p2. No, read it again. The only locks that get released inside that loop are ones on intermediate dead pages (rbuf is not buf). The lock on the original page is only released after the loop. regards, tom lane