Re: Missed check for too-many-children in bgworker spawning - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Missed check for too-many-children in bgworker spawning
Date
Msg-id 20191104204116.qugcb3ytkac3ugy3@alap3.anarazel.de
In response to Re: Missed check for too-many-children in bgworker spawning  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2019-11-04 14:58:20 -0500, Robert Haas wrote:
> On Mon, Nov 4, 2019 at 2:04 PM Andres Freund <andres@anarazel.de> wrote:
> > Is that really true? In the case where it started and failed we expect
> > the error queue to have been attached to, and there to be either an
> > error 'E' or a 'X' response (cf HandleParallelMessage()).  It doesn't
> > strike me as very complicated to keep track of whether any worker has
> > sent an 'E' or not, no?  I don't think we really need the
> 
> One of us is confused here, because I don't think that helps. Consider
> three background workers Alice, Bob, and Charlie. Alice fails to
> launch because fork() fails. Bob launches but then exits unexpectedly.
> Charlie has no difficulties and carries out his assigned duties.
> 
> Now, the system you are proposing will say that Charlie is OK but
> Alice and Bob are a problem. However, that's the way it already works.
> What Tom wants is to distinguish Alice from Bob, and your proposal is
> of no help at all with that problem, so far as I can see.

I don't see how what I'm saying treats Alice and Bob the same. What I'm
saying is that if a worker has been started and shut down, without
signalling an error via the error queue, and without an exit code that
causes postmaster to worry, then we can just ignore the worker for the
purpose of determining whether the query succeeded, without a meaningful
loss in reliability. And we can detect such cases easily; we already do,
we just have to remove an ereport(), and document things.


> > > We certainly can't ignore a worker that managed to start and
> > > then bombed out, because it might've already, for example, claimed a
> > > block from a Parallel Seq Scan and not yet sent back the corresponding
> > > tuples. We could ignore a worker that never started at all, due to
> > > EAGAIN or whatever else, but the original process that registered the
> > > worker has no way of finding this out.
> >
> > Sure, but in that case we'd have gotten either an error back from the
> > worker, or postmaster would have PANIC restarted everyone due to an
> > unhandled error in the worker, no?
> 
> An unhandled ERROR in the worker is not a PANIC. I think it's just an
> ERROR that ends up being fatal in effect, but even if it's actually
> promoted to FATAL, it's not a PANIC.

If it's an _exit without going through the PG machinery, it'll
eventually be PANIC, albeit with a slight delay. And once we're actually
executing the parallel query, we better have error reporting set up for
parallel queries.


> It is *generally* true that if a worker hits an ERROR, the error will
> be propagated back to the leader, but it is not an invariable rule.
> One pretty common way that it fails to happen - common in the sense
> that it comes up during development, not common on production systems
> I hope - is if a worker dies before reaching the call to
> pq_redirect_to_shm_mq(). Before that, there's no possibility of
> communicating anything. Granted, at that point we shouldn't yet have
> done any work that might mess up the query results.

Right.


> Similarly, once we reach that point, we are dependent on a certain amount of good
> behavior for things to work as expected; yeah, any code that calls
> proc_exit() is suppose to signal an ERROR or FATAL first, but what if
> it doesn't? Granted, in that case we'd probably fail to send an 'X'
> message, too, so the leader would still have a chance to realize
> something is wrong.

I mean, in that case so many more things are screwed up, I don't buy
that it's worth pessimizing ENOMEM handling for this. And if you're
really concerned, we could add a before_shmem_exit hook or such that makes
extra double sure that we've signalled something.


> I guess I agree with you to this extent: I made a policy decision that
> if a worker fails to show up, that's an ERROR. It
> would be possible to adopt the opposite policy, namely that if a
> worker doesn't show up, that's an "oh well." You'd have to be very
> certain that the worker wasn't going to show up later, though. For
> instance, suppose you check all of the shared memory queues used for
> returning tuples and find that every queue is either in a state where
> (1) nobody's ever attached to it or (2) somebody attached and then
> detached. This is not good enough, because it's possible that after
> you checked queue #1, and found it in the former state, someone
> attached and read a block, which caused queue #2 to enter the latter
> state before you got around to checking it. If you decide that it's OK
> to decide that we're done at this point, you'll never return the
> tuples that are pushed through queue #1.

That's why the code *already* waits for workers to attach, or for the
slot to be marked unused/invalid/reused. I don't see how that applies to
not explicitly erroring out when we know that the worker *failed* to
start:

void
WaitForParallelWorkersToFinish(ParallelContext *pcxt)
...


            /*
             * We didn't detect any living workers, but not all workers are
             * known to have exited cleanly.  Either not all workers have
             * launched yet, or maybe some of them failed to start or
             * terminated abnormally.
             */
            for (i = 0; i < pcxt->nworkers_launched; ++i)
            {
                pid_t        pid;
                shm_mq       *mq;

                /*
                 * If the worker is BGWH_NOT_YET_STARTED or BGWH_STARTED, we
                 * should just keep waiting.  If it is BGWH_STOPPED, then
                 * further investigation is needed.
                 */
                if (pcxt->worker[i].error_mqh == NULL ||
                    pcxt->worker[i].bgwhandle == NULL ||
                    GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle,
                                           &pid) != BGWH_STOPPED)
                    continue;

                /*
                 * Check whether the worker ended up stopped without ever
                 * attaching to the error queue.  If so, the postmaster was
                 * unable to fork the worker or it exited without initializing
                 * properly.  We must throw an error, since the caller may
                 * have been expecting the worker to do some work before
                 * exiting.
                 */
                mq = shm_mq_get_queue(pcxt->worker[i].error_mqh);
                if (shm_mq_get_sender(mq) == NULL)
                    ereport(ERROR,
                            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                             errmsg("parallel worker failed to initialize"),
                             errhint("More details may be available in the server log.")));

                /*
                 * The worker is stopped, but is attached to the error queue.
                 * Unless there's a bug somewhere, this will only happen when
                 * the worker writes messages and terminates after the
                 * CHECK_FOR_INTERRUPTS() near the top of this function and
                 * before the call to GetBackgroundWorkerPid().  In that case,
                 * our latch should have been set as well and the right things
                 * will happen on the next pass through the loop.
                 */
            }
        }



> But, assuming you nailed the door shut so that such problems could not
> occur, I think we could make a decision to ignore workers that failed
> before doing anything interesting. Whether that would be a good policy
> decision is pretty questionable in my mind. In addition to what I
> mentioned before, I think there's a serious danger that errors that
> users would have really wanted to know about - or developers would
> really want to have known about - would get ignored. You could have
> some horrible problem that's making your workers fail to launch, and
> the system would just carry on as if everything were fine, except with
> bad query plans. I realize that you and others might say "oh, well,
> monitor your logs, then," but I think there is certainly some value in
> an ordinary user being able to know that things didn't go well without
> having to look into the PostgreSQL log for errors. Now, maybe you
> think that's not enough value to justify having it work the way it
> does today, and I certainly respect that, but I don't view it that way
> myself.

Yea, this is somewhat of a pickle. I'm inclined to think that
unnecessarily ERRORing out queries is a cure worse than the disease:
causing unnecessary query failures just to ease debugging of some
not-all-that-likely errors is a rather severe measure.


I think I mentioned this to you on chat, but I think the in-core use of
bgworkers, at least as they currently are designed, is/was an
architecturally bad idea. There's numerous problems flowing from
that, with error handling being one big recurring theme.


Greetings,

Andres Freund


