Thread: btree vacuum and suspended scans can deadlock

btree vacuum and suspended scans can deadlock

From
Amit Kapila
Date:
During vacuum, btree cleanup needs to acquire a cleanup lock on the
block, and non-MVCC scans, or scans on an unlogged index, retain the
pin on a buffer while scanning a block.  Now, if the scan is suspended
(using a cursor and fetching one row) after holding a pin on a buffer
in Session-1, and in Session-2 we initiate a vacuum, the vacuum will
wait for the suspended scan in Session-1 to finish (by this time the
vacuum holds ShareUpdateExclusiveLock on the table).  If the user in
Session-1 then tries to acquire a conflicting lock on the same table,
it will create a deadlock.

Simple test to reproduce deadlock:

create unlogged table ul_t1(c1 int);
create index idx_ul_t1 on ul_t1 (c1);
insert into ul_t1 values(1);
insert into ul_t1 values(2);
delete from ul_t1 where c1=1;

Session-1
set enable_seqscan = off;
begin transaction;
declare c1 cursor for select * from ul_t1 where c1=2;
fetch next from c1; -- everything is fine up to this point

Session-2
vacuum ul_t1;  -- This will wait for session-1 to complete the scan

Session-1
lock table ul_t1 in Access Exclusive Mode;  -- This will wait for
session-2 to complete the vacuum

If we agree that the above is a problematic case, some options to
solve it could be: (a) vacuum should not wait for a cleanup lock and
should instead just give up and start again, which I think is a bad
idea; (b) don't allow a lock of higher granularity to be taken after
the scan is suspended, though I am not sure that is feasible; (c)
document the above danger, which sounds acceptable on the grounds that
nobody has reported the problem till now.
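To make option (b) concrete, here is a minimal C sketch of the kind of
check it implies.  The lock mode names mirror PostgreSQL's ordering,
but the function and its signature are illustrative only, not actual
backend code:

```c
#include <assert.h>
#include <stdbool.h>

/* Lock modes in PostgreSQL's strength order (illustrative subset). */
typedef enum
{
    AccessShareLock = 1,        /* taken by SELECT / cursor scans */
    RowShareLock,
    RowExclusiveLock,
    ShareUpdateExclusiveLock,   /* taken by VACUUM */
    ShareLock,
    ShareRowExclusiveLock,
    ExclusiveLock,
    AccessExclusiveLock         /* LOCK TABLE ... ACCESS EXCLUSIVE MODE */
} LockMode;

/*
 * Hypothetical check for option (b): return true if the request should
 * be rejected because the transaction has a suspended scan on the table
 * and is asking for a lock stronger than the one it already holds.
 */
static bool
reject_lock_upgrade(bool has_suspended_scan, LockMode held, LockMode requested)
{
    return has_suspended_scan && requested > held;
}
```

In the repro above, Session-1 holds AccessShareLock for the cursor scan
and then asks for AccessExclusiveLock, so such a check would reject the
LOCK TABLE before it could wait behind the vacuum.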

I found this problem while analysing a similar case of hash index
interaction with vacuum, as is being discussed on that thread [1].  We
could think of solving it in the way proposed on that thread, but I
think that, as proposed, it is not safe for non-MVCC scans.

Thoughts?


[1] - https://www.postgresql.org/message-id/CAA4eK1%2BJM%2BffHgUfW8wf%2BLgn2eJ1fGjyn6b_L5m0fiTEj2_6Pw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: btree vacuum and suspended scans can deadlock

From
Robert Haas
Date:
On Thu, Oct 13, 2016 at 6:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> If we agree that the above is a problematic case, some options to
> solve it could be: (a) vacuum should not wait for a cleanup lock and
> should instead just give up and start again, which I think is a bad
> idea; (b) don't allow a lock of higher granularity to be taken after
> the scan is suspended, though I am not sure that is feasible; (c)
> document the above danger, which sounds acceptable on the grounds that
> nobody has reported the problem till now.

I don't think any of these sound particularly good.  There have been
some previous discussions of this topic.

https://wiki.postgresql.org/wiki/Todo#Locking - second and third items
https://www.postgresql.org/message-id/CA+TgmoZG01uGL2TV6KOjmax-53eT3J66nk1KSkuOsPysz=DQ2A@mail.gmail.com

https://www.postgresql.org/message-id/flat/CA%2BU5nMJde1%2Bh5bKm48%2B_4U%3D78%2Bw%2BRa4ipfJnAva6QVyDWv0VNQ%40mail.gmail.com
https://www.postgresql.org/message-id/1223.1298392493@sss.pgh.pa.us

I thought Tom had at some point suggested a way of detecting deadlocks
of this type, but I can't find any such email now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: btree vacuum and suspended scans can deadlock

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Oct 13, 2016 at 6:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> If we agree that the above is a problematic case, some options to
>> solve it could be: (a) vacuum should not wait for a cleanup lock and
>> instead just give up and start again, which I think is a bad idea;
>> (b) don't allow a lock of higher granularity to be taken after the
>> scan is suspended, though I am not sure that is feasible; (c)
>> document the above danger, which sounds acceptable on the grounds
>> that nobody has reported the problem till now.

> I don't think any of these sound particularly good.

Note that it's a mistake to imagine that this is specific to indexes;
the same type of deadlock risk exists just when considering heap cleanup.
We've ameliorated the heap case quite a bit by recognizing situations
where it's okay to skip a page and move on, but it's not gone.
Unfortunately indexes don't get to decide that deletion is optional.

I was about to suggest that maybe we didn't need cleanup locks in btree
indexes anymore now that SnapshotNow is gone, but I see that somebody
already altered nbtree/README to say this:

: Therefore, a scan using an MVCC snapshot which has no other confounding
: factors will not hold the pin after the page contents are read.  The
: current reasons for exceptions, where a pin is still needed, are if the
: index is not WAL-logged or if the scan is an index-only scan.

This is one of the saddest excuses for documentation I've ever seen,
because it doesn't explain WHY either of those conditions require a VACUUM
interlock, and certainly it's not immediately obvious why they should.
"git blame" pins the blame for this text on Kevin, so I'm going to throw
it up to him to explain himself.
        regards, tom lane



Re: btree vacuum and suspended scans can deadlock

From
Amit Kapila
Date:
On Fri, Oct 14, 2016 at 3:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Oct 13, 2016 at 6:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> If we agree that the above is a problematic case, some options to
>>> solve it could be: (a) vacuum should not wait for a cleanup lock and
>>> instead just give up and start again, which I think is a bad idea;
>>> (b) don't allow a lock of higher granularity to be taken after the
>>> scan is suspended, though I am not sure that is feasible; (c)
>>> document the above danger, which sounds acceptable on the grounds
>>> that nobody has reported the problem till now.
>
>> I don't think any of these sound particularly good.
>
> Note that it's a mistake to imagine that this is specific to indexes;
> the same type of deadlock risk exists just when considering heap cleanup.
> We've ameliorated the heap case quite a bit by recognizing situations
> where it's okay to skip a page and move on, but it's not gone.
> Unfortunately indexes don't get to decide that deletion is optional.
>
> I was about to suggest that maybe we didn't need cleanup locks in btree
> indexes anymore now that SnapshotNow is gone,
>

Wouldn't it still be a problem for SnapshotAny?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: btree vacuum and suspended scans can deadlock

From
Amit Kapila
Date:
On Fri, Oct 14, 2016 at 2:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 13, 2016 at 6:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> If we agree that the above is a problematic case, some options to
>> solve it could be: (a) vacuum should not wait for a cleanup lock and
>> instead just give up and start again, which I think is a bad idea;
>> (b) don't allow a lock of higher granularity to be taken after the
>> scan is suspended, though I am not sure that is feasible; (c)
>> document the above danger, which sounds acceptable on the grounds
>> that nobody has reported the problem till now.
>
> I don't think any of these sound particularly good.
>

Tom has suggested something similar to approach (b) in his mail [1],
basically rejecting some commands like Truncate, Reindex, etc. if the
current transaction is already holding the table open; if we want to
go that way, I think it might be better to reject any command that
requires a lock level higher than the one the current transaction
already has on the table.  Tom suggested a few more ways to resolve
such deadlocks in that thread, but I think we never implemented those.

Here, one point to consider is whether we really need to invent ways
to make hash indexes immune to this problem when it can occur for
other indexes, or even for the heap.  Even if we want to do something,
I think the solution has to be a common one.


[1] - https://www.postgresql.org/message-id/21534.1200956464@sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: btree vacuum and suspended scans can deadlock

From
Kevin Grittner
Date:
On Thu, Oct 13, 2016 at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I was about to suggest that maybe we didn't need cleanup locks in btree
> indexes anymore now that SnapshotNow is gone, but I see that somebody
> already altered nbtree/README to say this:
>
> : Therefore, a scan using an MVCC snapshot which has no other confounding
> : factors will not hold the pin after the page contents are read.  The
> : current reasons for exceptions, where a pin is still needed, are if the
> : index is not WAL-logged or if the scan is an index-only scan.
>
> This is one of the saddest excuses for documentation I've ever seen,
> because it doesn't explain WHY either of those conditions require a VACUUM
> interlock, and certainly it's not immediately obvious why they should.
> "git blame" pins the blame for this text on Kevin, so I'm going to throw
> it up to him to explain himself.

Going back to old posts to confirm the reasoning at the time, I
find this:

The reason unlogged tables are an issue is that when a pin is not
held for the index page, TIDs may be reused before we move to the
next page; LP_DEAD hinting (one of the last things done with the
old page before moving to the next page) would not work correctly
in such a case.  We work around that by storing the page LSN into
the scan position structure when the page contents are read, and
only doing hinting if that matches the current LSN for the page
when we are ready to do the hinting.  That won't work for an index
which is not WAL-logged, since the LSN is not set, so we hold pins
for those.
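The interlock described above can be sketched in a few lines of C.
This is modeled loosely on what nbtree does when setting LP_DEAD hints
(the function name and signature here are illustrative, not actual
backend code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's WAL location type. */
typedef uint64_t XLogRecPtr;

/*
 * Illustrative sketch: may we set LP_DEAD hints on a page?
 *
 * If the pin was held continuously, vacuum could not have recycled any
 * TIDs on the page, so hinting is always safe.  If the pin was dropped,
 * hinting is safe only if the page is provably unchanged since we read
 * it, which we detect by comparing the LSN remembered at read time with
 * the page's current LSN.  An unlogged index never advances its LSN,
 * so this check cannot detect changes there -- which is why such scans
 * must keep the pin instead.
 */
static bool
safe_to_set_lp_dead(bool pin_held, XLogRecPtr lsn_at_read, XLogRecPtr lsn_now)
{
    if (pin_held)
        return true;                /* pin prevented TID reuse all along */
    return lsn_at_read == lsn_now;  /* page unchanged since we read it */
}
```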

Visibility information for an index-only scan isn't checked while the
index page READ lock is held, so it appears that some work is needed
to change that before such scans can drop their pins.

Would you like me to add something to that effect into the README
now, or would you prefer to take it from here?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company