Re: error context for vacuum to include block number - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: error context for vacuum to include block number
Date
Msg-id 20200327061601.GE20103@telsasoft.com
Whole thread Raw
In response to Re: error context for vacuum to include block number  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: error context for vacuum to include block number  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Mar 26, 2020 at 11:44:24PM -0500, Justin Pryzby wrote:
> On Fri, Mar 27, 2020 at 09:49:29AM +0530, Amit Kapila wrote:
> > On Fri, Mar 27, 2020 at 3:47 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > > Hm, I was just wondering what happens if an error happens *during*
> > > > update_vacuum_error_cbarg().  It seems like if we set
> > > > errcbarg->phase=VACUUM_INDEX before setting errcbarg->indname=indname, then an
> > > > error would cause a crash.
> > > >
> > 
> > Can't that be avoided if you check if cbarg->indname is non-null in
> > vacuum_error_callback as we are already doing for
> > VACUUM_ERRCB_PHASE_TRUNCATE?
> > 
> > > >  And if we pfree and set indname before phase, it'd
> > > > be a problem when going from an index phase to non-index phase.
> > 
> > How is it possible that we move to the non-index phase without
> > clearing indname as we always revert back the old phase information?
> 
> The crash scenario I'm trying to avoid would be like statement_timeout or other
> asynchronous event occurring between two non-atomic operations.
> 
> I said that there's an issue no matter what order we set indname/phase;
> If we wrote:
> |cbarg->indname = indname;
> |cbarg->phase = phase;
> ..and hit a timeout (or similar) between setting indname=NULL but before
> setting phase=VACUUM_INDEX, then we can crash due to null pointer.
> 
> But if we write:
> |cbarg->phase = phase;
> |if (cbarg->indname) {pfree(cbarg->indname);}
> |cbarg->indname = indname ? pstrdup(indname) : NULL;
> ..then we can still crash if we timeout between freeing cbarg->indname and
> setting it to null, due to acccessing a pfreed allocation.

If "phase" is updated before "indname", I'm able to induce a synthetic crash
like this:

+if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname==NULL) 
+{
+kill(getpid(), SIGINT);
+pg_sleep(1); // that's needed since signals are delivered asynchronously
+}

And another crash if we do this after pfree but before setting indname.

+if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname!=NULL)
+{
+kill(getpid(), SIGINT);
+pg_sleep(1);
+}

I'm not sure if those are possible outside of "induced" errors.  Maybe the
function is essentially atomic due to no CHECK_FOR_INTERRUPTS or similar?

-- 
Justin



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: allow online change primary_conninfo
Next
From: Amit Kapila
Date:
Subject: Re: error context for vacuum to include block number