Re: Hash Indexes - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Hash Indexes
Date
Msg-id CAA4eK1+JM+ffHgUfW8wf+Lgn2eJ1fGjyn6b_L5m0fiTEj2_6Pw@mail.gmail.com
In response to Re: Hash Indexes  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Hash Indexes  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Hash Indexes  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Sep 29, 2016 at 8:27 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Sep 29, 2016 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Sep 28, 2016 at 3:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> As I was looking at the old text regarding deadlock risk, I realized
>> what may be a serious problem.  Suppose process A is performing a scan
>> of some hash index.  While the scan is suspended, it attempts to take
>> a lock and is blocked by process B.  Process B, meanwhile, is running
>> VACUUM on that hash index.  Eventually, it will do
>> LockBufferForCleanup() on the hash bucket on which process A holds a
>> buffer pin, resulting in an undetected deadlock. In the current
>> coding, A would hold a heavyweight lock and B would attempt to acquire
>> a conflicting heavyweight lock, and the deadlock detector would kill
>> one of them.  This patch probably breaks that.  I notice that that's
>> the only place where we attempt to acquire a buffer cleanup lock
>> unconditionally; every place else, we acquire the lock conditionally,
>> so there's no deadlock risk.  Once we resolve this problem, the
>> paragraph about deadlock risk in this section should be revised to
>> explain whatever solution we come up with.
>>
>> By the way, since VACUUM must run in its own transaction, B can't be
>> holding arbitrary locks, but that doesn't seem quite sufficient to get
>> us out of the woods.  It will at least hold ShareUpdateExclusiveLock
>> on the relation being vacuumed, and process A could attempt to acquire
>> that same lock.
>>
>
> Right, I think there is a danger of deadlock in the above situation.
> It needs some more thought.
>
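
Incidentally, the reason the deadlock above goes undetected is that
buffer-pin waits are invisible to the lock manager: when B calls
LockBufferForCleanup() on the bucket buffer, it sleeps until all other
pins are released, and that wait never enters the heavyweight-lock
table that the deadlock detector examines.  Roughly (illustrative
only, not code from the patch):

    /* Process B (VACUUM): blocks until it is the sole pin holder of
     * the bucket buffer.  This wait is not a heavyweight lock, so the
     * deadlock detector cannot see the cycle A -> B -> A. */
    LockBufferForCleanup(buf);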

I think one way to avoid the risk of deadlock in the above scenario is
to take the cleanup lock conditionally: if we get the cleanup lock,
then we delete the items as the patch does now; otherwise, we just
mark the tuples as dead and ensure that we don't try to remove tuples
that are moved-by-split.  The question then is how these dead tuples
will be removed.  We need a separate mechanism to clear dead tuples
for hash indexes anyway, because during scans we already mark index
tuples as dead when the corresponding heap tuple is dead, and those
tuples are not removed later either.  In btree this is taken care of
by the kill_prior_tuple optimization.  So I think clearing of dead
tuples can be handled by a separate patch.
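
To make the first approach concrete, here is a minimal sketch in terms
of the existing bufmgr API.  The _hash_delete_items and
_hash_mark_items_dead helpers are hypothetical names for illustration,
not code from the patch:

    /* Caller already holds a pin on the bucket buffer. */
    if (ConditionalLockBufferForCleanup(buf))
    {
        /* No concurrent scan holds a pin: physical deletion is safe. */
        _hash_delete_items(rel, buf);           /* hypothetical helper */
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }
    else
    {
        /*
         * Some scan holds a pin; don't wait for it (that wait is the
         * undetected deadlock).  Take a plain exclusive lock and only
         * mark tuples LP_DEAD, skipping moved-by-split tuples;
         * physical removal is left to a later cleanup pass.
         */
        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
        _hash_mark_items_dead(rel, buf);        /* hypothetical helper */
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }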

I have also thought about using the page-scan-at-a-time idea that has
been discussed upthread [1], but I think we can't completely eliminate
the need to out-wait scans (the cleanup-lock requirement) for scans
that start while a split is in progress, or for non-MVCC scans, as
described in that e-mail [1].  We might be able to find some way to
solve the problem with this approach, but I think it will be somewhat
complicated and require much more work compared to the previous
approach.
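
For comparison, the scan side of page-at-a-time would look roughly
like the sketch below: copy everything needed from the page into
backend-local memory while the buffer lock is held, then drop both
lock and pin, so a suspended scan never holds a pin that VACUUM's
cleanup lock has to out-wait.  The matchTids/nMatches scan-opaque
fields here are assumptions for illustration (_hash_checkqual is the
existing qual-check routine):

    LockBuffer(buf, BUFFER_LOCK_SHARE);
    page = BufferGetPage(buf);
    maxoff = PageGetMaxOffsetNumber(page);
    for (off = FirstOffsetNumber; off <= maxoff;
         off = OffsetNumberNext(off))
    {
        IndexTuple itup = (IndexTuple)
            PageGetItem(page, PageGetItemId(page, off));

        /* Remember matching heap TIDs before dropping the pin. */
        if (_hash_checkqual(scan, itup))
            so->matchTids[so->nMatches++] = itup->t_tid;
    }
    UnlockReleaseBuffer(buf);   /* release lock and pin together */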

What is your preference among the above approaches to resolve this
problem?  Or let me know if you have a better idea to solve it.


[1] - https://www.postgresql.org/message-id/CAA4eK1Jj1UqneTXrywr%3DGg87vgmnMma87LuscN_r3hKaHd%3DL2g%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


