Re: why do hash index builds use smgrextend() for new splitpoint pages - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: why do hash index builds use smgrextend() for new splitpoint pages
Msg-id: CAA4eK1KbPo8+XtJf1Cc6rtLfwHYribSvCW=WRwggmeG8SP3c3w@mail.gmail.com
In response to: Re: why do hash index builds use smgrextend() for new splitpoint pages (Melanie Plageman <melanieplageman@gmail.com>)
List: pgsql-hackers
On Sat, Feb 26, 2022 at 9:17 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Feb 25, 2022 at 11:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Feb 26, 2022 at 3:01 AM Melanie Plageman
> > <melanieplageman@gmail.com> wrote:
> > >
> > > Since _hash_alloc_buckets() WAL-logs the last page of the
> > > splitpoint, is it safe to skip the smgrimmedsync()? What if the last
> > > page of the splitpoint doesn't end up having any tuples added to it
> > > during the index build, the redo pointer is moved past the WAL for
> > > this page, and then there is a crash before this page makes it to
> > > permanent storage? Does it matter that this page is lost? If not,
> > > then why bother WAL-logging it?
> > >
> >
> > I think we don't care if the page is lost before we update the
> > meta-page in the caller, because we will try to reallocate in that
> > case. But we do care after the meta-page update (which records this
> > extension via the various masks), in which case we won't lose this
> > last page, because a sync request for it would have been registered
> > via smgrextend() before the meta-page update.
>
> and could it happen that during smgrextend() for the last page, a
> checkpoint starts and finishes between FileWrite() and
> register_dirty_segment(), then the index build finishes, and then a
> crash occurs before another checkpoint completes the pending fsync for
> that last page?
>

Yeah, this seems possible, and the problem then could be that the
index's idea of EOF and smgr's idea of EOF differ, which could lead to
trouble when we try to get a new page via _hash_getnewbuf(). If this
theory turns out to be true, then we can probably get an error either
because the disk is full or because the index requests a block that is
beyond EOF as determined by RelationGetNumberOfBlocksInFork() in
_hash_getnewbuf().
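
For reference, the check in _hash_getnewbuf() (hashpage.c) looks
roughly like the below (from memory, so please line it up with the
actual source). If the write of the last splitpoint page is lost,
RelationGetNumberOfBlocksInFork() would report fewer blocks than the
bucket mapping implies, so a later request for a page inside the lost
range would take the "noncontiguous page" path, while a request for
exactly nblocks would try to extend the relation and could fail with
disk full:

    BlockNumber nblocks = RelationGetNumberOfBlocksInFork(rel, forkNum);
    Buffer      buf;

    if (blkno == P_NEW)
        elog(ERROR, "hash AM does not use P_NEW");
    if (blkno > nblocks)
        elog(ERROR, "access to noncontiguous page in hash index \"%s\"",
             RelationGetRelationName(rel));

    /* smgr insists we use P_NEW to extend the relation */
    if (blkno == nblocks)
    {
        buf = ReadBufferExtended(rel, forkNum, P_NEW, RBM_NORMAL, NULL);
        if (BufferGetBlockNumber(buf) != blkno)
            elog(ERROR, "unexpected hash relation size: %u, should be %u",
                 BufferGetBlockNumber(buf), blkno);
    }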

Can we try to reproduce this scenario with the help of a debugger to
see if we are missing something?
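
Something like the following might work (untested sketch; it assumes a
build with debug symbols, since register_dirty_segment() is static in
md.c):

1. Attach gdb to the backend that will build the index and stop it in
   the splitpoint allocation path:

     gdb -p $BACKEND_PID
     (gdb) break _hash_alloc_buckets
     (gdb) continue

2. Run CREATE INDEX ... USING hash on a table with enough rows that
   _hash_alloc_buckets() is reached, then stop right after FileWrite()
   for the splitpoint's last page but before the fsync request is
   queued:

     (gdb) break register_dirty_segment
     (gdb) continue

3. While the backend is stopped there, run CHECKPOINT to completion in
   another session, so that this checkpoint finishes without ever
   seeing the fsync request for the new page.

4. Type "continue" in gdb, let the index build finish, and then
   simulate a crash (e.g. kill -9 the postmaster) before any further
   checkpoint runs.

Note that kill -9 alone probably won't throw away the page, since the
earlier FileWrite() will usually survive in the OS cache; to actually
lose it one would need something like a power-cycle or a
fault-injecting filesystem. But the debugger run should at least
confirm whether the window exists.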

-- 
With Regards,
Amit Kapila.


