Home > mailing lists

Re: 'COPY ... FROM' inserts to btree, blocks on buffer writeout - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: 'COPY ... FROM' inserts to btree, blocks on buffer writeout
Date	December 31, 2004 22:54:07
Msg-id	11701.1104522753@sss.pgh.pa.us Whole thread Raw
In response to	'COPY ... FROM' inserts to btree, blocks on buffer writeout (Michael Wildpaner <mike@rainbow.studorg.tuwien.ac.at>)
Responses	Re: 'COPY ... FROM' inserts to btree, blocks on buffer writeout
List	pgsql-hackers

Tree view

Michael Wildpaner <mike@rainbow.studorg.tuwien.ac.at> writes:
> two concurrent clients try to 'COPY ... FROM ...' to the same table,
> "feature_link".

> The second one (pid 17983) is waiting for an ExclusiveLock on the table's
> primary key index, "key__idpk__flink".

> The first one (pid 17980) is inserting into the index's btree, waiting
> for a buffer lock. This locking state seems to persist.

After staring at this for a little bit I have a theory.  The stack trace
for 17980 indicates that it is trying to flush a dirty buffer so that it
can take over the buffer for a new index page.  It's blocked trying to
acquire a shared buffer lock on that buffer, which means that someone
else must have an exclusive buffer lock, which the code is not expecting
because the buffer just came off the free list and therefore was not
pinned by anyone.

However ... FlushBuffer releases the BufMgrLock before trying to acquire
the per-buffer lock.  If 17980 lost its time slice during that interval,
it'd be possible for someone else to come in and re-pin the chosen
victim page and then lock the buffer before 17980 could.

Normally this wouldn't be any big problem, but if the someone else later
blocked on some lock held by 17980, you'd have a deadlock.  I think
that's exactly what we're seeing here.  The victim buffer page must be
the same one that 17983 is trying to split; so it's got exclusive lock
(and pin) on that page, and is now stuck waiting for the lock that would
give it the privilege to extend the index.

A possible fix for this is to reorder the operations in FlushBuffer
so that we share-lock the buffer before releasing BufMgrLock ... but
I'm not sure that doesn't introduce other deadlock risks.  It needs
some thought.

If this is the correct explanation, the bug has been around for a good
while; but it's got to be a very low-probability scenario.  Congrats
to Michael for taking the time to dig into it when he saw it.
        regards, tom lane

pgsql-hackers by date:

From: Andrew Dunstan
Date: 31 December 2004, 22:53:49
Subject: Re: contrib regression on old versions

From: Sibtay Abbas
Date: 31 December 2004, 22:57:25
Subject: exception handling in plpgsql

Re: 'COPY ... FROM' inserts to btree, blocks on buffer writeout - Mailing list pgsql-hackers

Previous

Next