Re: background workers, round three - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Re: background workers, round three
Msg-id CADyhKSXs-EvV-iovE839rOcQgJMDri=8uurzs4S0N0N-NyNqSg@mail.gmail.com
In response to Re: background workers, round three  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
2013/10/14 Robert Haas <robertmhaas@gmail.com>:
>> * ephemeral-precious-v1.patch
>> AtEOXact_BackgroundWorker() is located alongside the other AtEOXact_*
>> routines. Doesn't that make resource management complicated?
>> In the case where the main process goes into its error handler while a
>> worker process is still running normally, the worker may continue to
>> calculate something and put its results on the shared memory segment,
>> even though the main process has asked the postmaster to kill it.
>
> Since I wrote this patch set, I've been thinking a lot more about
> error recovery.  Obviously, one of the big problems as we think about
> parallel query is that you've now got multiple backends floating
> around, and if the transaction aborts (in any backend), the other
> backends don't automatically know that; they need some way to know
> that they, too, should abort processing.  There are a lot of details to
> get right here, and the time I've spent on it so far convinces me that
> the problem is anything but easy.
>
> Having said that, I'm not too concerned about the particular issue
> that you raise here.  The resources that need to be cleaned up during
> transaction abort are backend-private resources.  If, for example, the
> user backend detaches a dynamic shared memory segment that is being
> used for a parallel computation, they're not actually *destroying* the
> segment; they are just detaching it *from their address space*.  The
> last process to detach it will also destroy it.  So the ordering in
> which the various processes detach it doesn't matter much.
>
> One of the things I think is necessary is a set of on_dsm_detach
> callbacks that work pretty much the way that on_shmem_exit callbacks
> work today.  Just as we can't detach from the main shared memory
> segment without releasing locks and buffer pins and lwlocks and our
> PGXACT, we can't detach from a dynamic shared memory segment without
> performing any similar cleanup that is needed.  I'm currently working
> on a patch for that.
>
Hmm. That would probably allow cleaning up a smaller fraction of the
data structures constructed on the dynamic shared memory segment, if
we map / unmap the segment for each transaction.
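
Just to make sure I understand the shape of that API: something along
these lines, I suppose, with registration working like on_shmem_exit()?
(This is only a rough sketch; on_dsm_detach(), dsm_attach() and the
callback below are my assumption of how the not-yet-posted patch might
look.)

#include "postgres.h"
#include "storage/dsm.h"

/* Assumed callback shape, mirroring on_shmem_exit() callbacks. */
static void
my_parallel_cleanup(dsm_segment *seg, Datum arg)
{
    /*
     * Release backend-private references into the segment (pins, locks,
     * pointers into the mapping).  The segment itself is destroyed only
     * when the last attached process detaches.
     */
}

static void
attach_parallel_state(dsm_handle handle)
{
    dsm_segment *seg = dsm_attach(handle);

    /*
     * Assumed registration call: run my_parallel_cleanup() whenever this
     * backend detaches, whether explicitly or during abort / exit.
     */
    on_dsm_detach(seg, my_parallel_cleanup, (Datum) 0);
}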

>> All the ResourceOwnerRelease() callbacks are invoked prior to
>> AtEOXact_BackgroundWorker(), so it is hard to release resources
>> still in use by a background worker, because the worker keeps
>> running normally until it receives the termination signal, which
>> is sent later.
>> In addition, it makes the implementation complicated if we need to
>> design background workers to release resources if and when they
>> are terminated. I don't think it is a good coding style if we need
>> to release resources in different locations depending on context.
>
> Which specific resources are you concerned about?
>
Because of my own motivation, I was assuming smaller chunks allocated
on a static or dynamic shared memory segment, used for communication
between the main process and the worker processes.
When we move a chunk of data to a co-processor using an asynchronous
DMA transfer, the API requires the source buffer to be mlock()'ed to
avoid it being unintentionally swapped out during the DMA transfer.
On the other hand, the cost of the mlock() operation is not negligible,
so it may be a reasonable design to lock a shared memory segment once
at start-up time and then continue to use it, without unmapping it.
So, I wondered how to handle the situation where an extension tries to
manage a resource at a smaller granularity than the one managed by the
PostgreSQL core.
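
For illustration, the pattern I have in mind looks like this with plain
POSIX calls (the segment name and size below are made up, and a real
extension would of course go through the PostgreSQL shared memory APIs
instead):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define SEG_NAME "/my_dma_buffer"       /* hypothetical name */
#define SEG_SIZE (16 * 1024 * 1024)     /* hypothetical size */

static void *
setup_dma_buffer(void)
{
    int   fd;
    void *addr;

    fd = shm_open(SEG_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, SEG_SIZE) < 0)
    {
        perror("shm_open/ftruncate");
        exit(1);
    }

    addr = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
    {
        perror("mmap");
        exit(1);
    }

    /*
     * Pin the pages once, up front, so an asynchronous DMA transfer never
     * races against swap-out; the mlock() cost is paid only at startup.
     */
    if (mlock(addr, SEG_SIZE) < 0)
    {
        perror("mlock");
        exit(1);
    }

    close(fd);                  /* the mapping stays valid */
    return addr;
}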

>> So, I'd like to propose adding a new invocation point of
>> ResourceOwnerRelease() after all the AtEOXact_* jobs, with a
>> new label, something like RESOURCE_RELEASE_FINAL.
>>
>> In addition, AtEOXact_BackgroundWorker() does not synchronize with
>> the termination of the background worker processes being killed.
>> Of course it depends on the situation, but I think it would be a good
>> idea to wait for the worker processes to actually terminate, to ensure
>> that the resources to be released are handed back to the main process,
>> if the ResourceOwnerRelease() above is to do that job.
>
> Again, which resources are we talking about here?  I tend to think
> it's an essential property of the system that we *shouldn't* have to
> care about the order in which processes are terminated.  First, that
> will be difficult to control; if an ERROR or FATAL condition has
> occurred and we need to terminate, then there are real limits to what
> guarantees we can provide after that point.  Second, it's also
> *expensive*.  The point of parallelism is to make things faster; any
> steps we add that involve waiting for other processes to do things
> will eat away at the available gains.  For a query that'll run for an
> hour that hardly matters, but for short queries it's important to
> avoid unnecessary overhead.
>
Indeed, you are right. The error path has to terminate quickly.
Probably, if I implement it, the ResourceOwnerRelease() callback needs
some way to inform the still-running worker processes that the
transaction got aborted, so that they do not need to return their
calculation results.
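
For example, a very rough sketch of what I mean (all of the names below
are hypothetical): the abort path just sets a flag in the shared
segment, and the worker checks it between units of work, so the main
backend never has to wait.

#include <stdbool.h>

typedef struct ParallelTaskHeader
{
    volatile bool abort_requested;  /* set by the main backend on abort */
    /* ... result area, progress counters, etc. ... */
} ParallelTaskHeader;

/* Called from the main backend's resource-release / abort path. */
static void
signal_workers_abort(ParallelTaskHeader *hdr)
{
    hdr->abort_requested = true;    /* workers will notice and stop */
}

/*
 * Worker main loop: check the flag between units of work and simply
 * discard the partial result when an abort was requested.
 */
static void
worker_loop(ParallelTaskHeader *hdr)
{
    while (!hdr->abort_requested)
    {
        /* do one unit of calculation, write results into the segment */
    }
}

A real implementation would of course want a memory barrier, and a
latch to wake a sleeping worker rather than polling, but the point is
that nothing here requires the main backend to wait.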

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


