Re: Error with index on unlogged table - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Error with index on unlogged table
Date
Msg-id 20150326175024.GJ451@alap3.anarazel.de
In response to Re: Error with index on unlogged table  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Error with index on unlogged table  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On 2015-03-26 15:13:41 +0100, Andres Freund wrote:
> On 2015-03-26 13:55:22 +0000, Thom Brown wrote:
> > I still, however, have a problem with the separate and original issue of:
> > 
> > # insert into utest (thing) values ('moomoo');
> > ERROR:  index "utest_pkey" contains unexpected zero page at block 0
> > HINT:  Please REINDEX it.
> > 
> > I don't see why the user should need to go re-indexing all unlogged tables
> > each time a standby is promoted.  The index should just be empty and ready
> > to use.
> 
> There's definitely something rather broken here. Investigating.

As far as I can see this has been broken at least since the introduction
of fast promotion. WAL replay will update the init fork in shared
memory, but it'll not be guaranteed to be flushed to disk when the reset
happens. d3586fc8a et al. then also made it possible to hit the issue
without fast promotion.

To hit the issue there must not have been a restartpoint (which requires
a checkpoint on the primary) since the creation of the unlogged table.

I think the problem here is that the *primary* makes no such
assumptions. Init forks are logged via stuff like

    smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE,
              (char *) metapage, true);
    if (XLogIsNeeded())
        log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
                    BTREE_METAPAGE, metapage, false);

    /*
     * An immediate sync is required even if we xlog'd the page, because the
     * write did not go through shared_buffers and therefore a concurrent
     * checkpoint may have moved the redo pointer past our xlog record.
     */
    smgrimmedsync(index->rd_smgr, INIT_FORKNUM);
 

i.e. the data is written out directly to disk, circumventing
shared_buffers. It's pretty bad that we don't do the same on the
standby. For master I think we should just add a bit to the XLOG_FPI
record saying the data should be forced out to disk. I'm less sure
what's to be done in the back branches. Flushing every HEAP_NEWPAGE
record isn't really an option.
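
To make the failure mode concrete, here is a toy model in C. It is not PostgreSQL code: ToyInitFork, the toy_* functions and the force_flush parameter are all made up for illustration, and the "disk" is just a byte array. It only sketches the sequence described above, assuming that replay writes the page image into a buffer, that the reset copies whatever is on disk, and that either a restartpoint or a forced flush at replay time gets the image to disk first.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16    /* toy value, unrelated to PostgreSQL's BLCKSZ */

/* One page of an init fork: its on-disk copy and its buffered copy. */
typedef struct ToyInitFork
{
    unsigned char on_disk[TOY_PAGE_SIZE];   /* what a file-level copy sees */
    unsigned char in_buffer[TOY_PAGE_SIZE]; /* what replay wrote in memory */
    bool          dirty;                    /* buffer not yet written out */
} ToyInitFork;

/* Start with an all-zero page everywhere, like a freshly extended file. */
static void
toy_init(ToyInitFork *fork)
{
    memset(fork, 0, sizeof(*fork));
}

/* Write the buffered page to "disk" if it is dirty. */
static void
toy_flush(ToyInitFork *fork)
{
    if (fork->dirty)
    {
        memcpy(fork->on_disk, fork->in_buffer, TOY_PAGE_SIZE);
        fork->dirty = false;
    }
}

/*
 * Replay a full-page image into the buffer. force_flush stands in for the
 * idea of a flag on the record asking for an immediate write-out.
 */
static void
toy_replay_fpi(ToyInitFork *fork, const unsigned char *image, bool force_flush)
{
    assert(fork != NULL && image != NULL);
    memcpy(fork->in_buffer, image, TOY_PAGE_SIZE);
    fork->dirty = true;
    if (force_flush)
        toy_flush(fork);
}

/* A restartpoint writes out all dirty buffers; here there is only one. */
static void
toy_restartpoint(ToyInitFork *fork)
{
    toy_flush(fork);
}

/* The reset copies the init fork's on-disk contents over the main fork. */
static void
toy_reset_unlogged(const ToyInitFork *fork, unsigned char *main_fork)
{
    memcpy(main_fork, fork->on_disk, TOY_PAGE_SIZE);
}

/* True if every byte of the page is zero. */
static bool
toy_page_is_zero(const unsigned char *page)
{
    for (int i = 0; i < TOY_PAGE_SIZE; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}
```

In this model, calling toy_replay_fpi() with force_flush = false and then toy_reset_unlogged() leaves the main fork as a zero page, which is the state the "unexpected zero page" error complains about. Running toy_restartpoint() in between, or passing force_flush = true, leaves the main fork with the replayed image instead.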

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


