Re: reindex and copy - deadlock? - Mailing list pgsql-performance

From Tom Lane
Subject Re: reindex and copy - deadlock?
Date
Msg-id 10244.1088615217@sss.pgh.pa.us
Whole thread Raw
In response to Re: reindex and copy - deadlock?  (Litao Wu <litaowu@yahoo.com>)
Responses Re: reindex and copy - deadlock?
List pgsql-performance
Litao Wu <litaowu@yahoo.com> writes:
> Our PG version is 7.3.2.

Hmm.  On general principles you should be using 7.3.6, but I do not see
anything in the 7.3.* change logs that looks very likely to cure this.

> The copy process is always there. Besides copy
> process, there are many select processes wait also
> (it is understandable only when reindex,
> but how come selects wait when drop/create index?

DROP INDEX would lock out selects (it has no other way to be sure no
select is trying to *use* the index).  Once you're past that, selects
would work, but if you try something like
    begin; drop index; create index; commit;
then the drop's lock will be held till commit.

I'm not sure about whether COPY is related.  In your original post, the
COPY was waiting to acquire RowExclusiveLock on the table, so it hadn't
actually done anything yet and really couldn't be holding a buffer lock
AFAICS.

> But one thing is sure:
> reindex or create index is granted lock while
> others wait. If reindex/create index is not
> the perpetrator, how can PG grants it lock
> but not others, like COPY?

The point is that it's waiting for a lower-level lock (namely a buffer
LWLock).  There's no deadlock detection for LWLocks, because they're not
supposed to be used in ways that could cause a deadlock.

Assuming for the moment that indeed this is a deadlock, you could learn
something the next time it happens with some manual investigation.
You'll need to keep using the debug-enabled build.  When you next get a
lockup, proceed as follows:

1. Attach to the REINDEX or CREATE INDEX process and find out which
LWLock number it is blocked on.  (This is the lockid argument of
LWLockAcquire, 21335 in your trace of today.)

2. For *each* live backend process (including the REINDEX itself),
attach with gdb and look at the held-locks status of lwlock.c.
This would go something like

    gdb> p num_held_lwlocks
if greater than zero:
    gdb> x/10d held_lwlocks
(replace "10" by the value of num_held_lwlocks)

If you find a backend that is holding the lock number that REINDEX
wants, print out its call stack with "bt", and look in pg_locks to see
what lockmanager locks it is holding or waiting for.  If you do not find
one, then the deadlock theory is disproved, and we're back to square
one.

            regards, tom lane

pgsql-performance by date:

Previous
From: Litao Wu
Date:
Subject: Re: reindex and copy - deadlock?
Next
From: Litao Wu
Date:
Subject: Re: reindex and copy - deadlock?