Re: Failure while inserting parent tuple to B-tree is not fun - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Failure while inserting parent tuple to B-tree is not fun
Date
Msg-id 20131022182546.GF7435@awork2.anarazel.de
In response to Failure while inserting parent tuple to B-tree is not fun  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Failure while inserting parent tuple to B-tree is not fun  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On 2013-10-22 19:55:09 +0300, Heikki Linnakangas wrote:
> Splitting a B-tree page is a two-stage process: First, the page is split,
> and then a downlink for the new right page is inserted into the parent
> (which might recurse to split the parent page, too). What happens if
> inserting the downlink fails for some reason? I tried that out, and it turns
> out that it's not nice.
> 
> I used this to cause a failure:
> 
> >--- a/src/backend/access/nbtree/nbtinsert.c
> >+++ b/src/backend/access/nbtree/nbtinsert.c
> >@@ -1669,6 +1669,8 @@ _bt_insert_parent(Relation rel,
> >             _bt_relbuf(rel, pbuf);
> >         }
> >
> >+        elog(ERROR, "fail!");
> >+
> >         /* get high key from left page == lowest key on new right page */
> >         ritem = (IndexTuple) PageGetItem(page,
> >                                          PageGetItemId(page, P_HIKEY));
> 
> postgres=# create table foo (i int4 primary key);
> CREATE TABLE
> postgres=# insert into foo select generate_series(1, 10000);
> ERROR:  fail!
> 
> That's not surprising. But when I removed that elog again and restarted the
> server, I still can't insert. The index is permanently broken:
> 
> postgres=# insert into foo select generate_series(1, 10000);
> ERROR:  failed to re-find parent key in index "foo_pkey" for split pages 4/5
> 
> In real life, you would get a failure like this e.g if you run out of memory
> or disk space while inserting the downlink to the parent. Although rare in
> practice, it's no fun if it happens.

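To make the quoted failure mode concrete, here is a toy C model of the
two-stage split. It is only a sketch under loud assumptions: ToyPage,
ToyParent and the toy_* functions are hypothetical names invented for
illustration and are not nbtree code, and the fail_stage2 flag merely stands
in for the injected elog(ERROR) (or an out-of-memory / out-of-disk error).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not PostgreSQL's nbtree structures. */
#define TOY_MAX_DOWNLINKS 8

typedef struct ToyPage {
    int blkno;      /* this page's block number */
    int rightlink;  /* block number of the right sibling, 0 = none */
} ToyPage;

typedef struct ToyParent {
    int    downlinks[TOY_MAX_DOWNLINKS]; /* child block numbers */
    size_t ndownlinks;
} ToyParent;

/* True if the parent holds a downlink pointing at child block blkno. */
static bool toy_has_downlink(const ToyParent *parent, int blkno)
{
    assert(parent != NULL);
    for (size_t i = 0; i < parent->ndownlinks; i++)
        if (parent->downlinks[i] == blkno)
            return true;
    return false;
}

/* Stage 2: add the downlink for the new right page to the parent. */
static bool toy_insert_downlink(ToyParent *parent, int right_blkno, bool fail)
{
    if (fail || parent->ndownlinks >= TOY_MAX_DOWNLINKS)
        return false;               /* simulated error: nothing inserted */
    parent->downlinks[parent->ndownlinks++] = right_blkno;
    return true;
}

/*
 * Two-stage split. Stage 1 links the left page to its new right sibling and
 * is not undone if stage 2 fails. Returns whether stage 2 completed.
 */
static bool toy_split(ToyParent *parent, ToyPage *left, int right_blkno,
                      bool fail_stage2)
{
    left->rightlink = right_blkno;  /* stage 1: the page is split */
    return toy_insert_downlink(parent, right_blkno, fail_stage2);
}
```

Calling toy_split() with fail_stage2 set leaves the left page pointing at a
right sibling that the parent knows nothing about, which is the state a later
insert trips over in this model.
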
Why doesn't the incomplete split mechanism prevent this? Because we do
not delay checkpoints on the primary, and a checkpoint happened just
before your elog(ERROR) above?

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


