Re: 'COPY ... FROM' inserts to btree, blocks on buffer writeout - Mailing list pgsql-hackers

From Tom Lane
Subject Re: 'COPY ... FROM' inserts to btree, blocks on buffer writeout
Date
Msg-id 11701.1104522753@sss.pgh.pa.us
Whole thread Raw
In response to 'COPY ... FROM' inserts to btree, blocks on buffer writeout  (Michael Wildpaner <mike@rainbow.studorg.tuwien.ac.at>)
Responses Re: 'COPY ... FROM' inserts to btree, blocks on buffer writeout
List pgsql-hackers
Michael Wildpaner <mike@rainbow.studorg.tuwien.ac.at> writes:
> two concurrent clients try to 'COPY ... FROM ...' to the same table,
> "feature_link".

> The second one (pid 17983) is waiting for an ExclusiveLock on the table's
> primary key index, "key__idpk__flink".

> The first one (pid 17980) is inserting into the index's btree, waiting
> for a buffer lock. This locking state seems to persist.

After staring at this for a little bit I have a theory.  The stack trace
for 17980 indicates that it is trying to flush a dirty buffer so that it
can take over the buffer for a new index page.  It's blocked trying to
acquire a shared buffer lock on that buffer, which means that someone
else must have an exclusive buffer lock, which the code is not expecting
because the buffer just came off the free list and therefore was not
pinned by anyone.

However ... FlushBuffer releases the BufMgrLock before trying to acquire
the per-buffer lock.  If 17980 lost its time slice during that interval,
it'd be possible for someone else to come in and re-pin the chosen
victim page and then lock the buffer before 17980 could.

Normally this wouldn't be any big problem, but if the someone else later
blocked on some lock held by 17980, you'd have a deadlock.  I think
that's exactly what we're seeing here.  The victim buffer page must be
the same one that 17983 is trying to split; so it's got exclusive lock
(and pin) on that page, and is now stuck waiting for the lock that would
give it the privilege to extend the index.

A possible fix for this is to reorder the operations in FlushBuffer
so that we share-lock the buffer before releasing BufMgrLock ... but
I'm not sure that doesn't introduce other deadlock risks.  It needs
some thought.

If this is the correct explanation, the bug has been around for a good
while; but it's got to be a very low-probability scenario.  Congrats
to Michael for taking the time to dig into it when he saw it.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: contrib regression on old versions
Next
From: Sibtay Abbas
Date:
Subject: exception handling in plpgsql