Re: [v9.3] Extra Daemons (Re: elegant and effective way for running jobs inside a database) - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [v9.3] Extra Daemons (Re: elegant and effective way for running jobs inside a database)
Msg-id 1348147099-sup-1200@alvh.no-ip.org
In response to Re: [v9.3] Extra Daemons (Re: elegant and effective way for running jobs inside a database)  (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: [v9.3] Extra Daemons (Re: elegant and effective way for running jobs inside a database)  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
Excerpts from Amit Kapila's message of Thu Sep 20 02:10:23 -0300 2012:


> Why can't worker tasks also be permanent, controlled through
> configuration?  What I mean to say is that if the user has a need for
> parallel operations, he can configure max_worker_tasks and that many
> worker tasks will get created.  Otherwise, without such a parameter,
> we might not be sure whether such daemons will be of use to database
> users who don't need any background ops.
>
> The dynamism will come into play when we need to allocate such daemons
> for a particular op (query), because an operation might need a certain
> number of worker tasks when no such task is available; at that time it
> needs to be decided whether to spawn a new task or to change the
> parallelism of the operation so that it can be executed with the
> available number of worker tasks.

Well, there is a difficulty here: the number of processes connected to
databases must be configured during postmaster start (because it
determines the size of certain shared memory structs).  So you cannot
just spawn more tasks if all max_worker_tasks are busy.
(This is a problem only for those workers that want to be connected as
backends.  Those that want libpq connections do not need this and are
easier to handle.)
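
To illustrate the constraint, here is a minimal standalone sketch in C.
It is not PostgreSQL code: SlotTable, slot_table_create, slot_acquire
and slot_release are hypothetical names, and plain heap memory stands in
for shared memory.  The only point is that a table sized once at startup
cannot hand out more slots than it was created with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared memory array sized once at startup. */
typedef struct SlotTable
{
    size_t  nslots;     /* fixed when the table is created */
    int    *in_use;     /* one flag per slot */
} SlotTable;

/* Create the table; this models the sizing done at postmaster start. */
SlotTable *
slot_table_create(size_t nslots)
{
    SlotTable *t = malloc(sizeof(SlotTable));

    if (t == NULL)
        return NULL;
    t->nslots = nslots;
    /* allocate at least one element so calloc(0, ...) is never relied on */
    t->in_use = calloc(nslots > 0 ? nslots : 1, sizeof(int));
    if (t->in_use == NULL)
    {
        free(t);
        return NULL;
    }
    return t;
}

/* Claim a free slot; returns its index, or -1 when every slot is busy.
 * There is deliberately no code path that grows the table. */
int
slot_acquire(SlotTable *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->nslots; i++)
    {
        if (!t->in_use[i])
        {
            t->in_use[i] = 1;
            return (int) i;
        }
    }
    return -1;
}

/* Give a slot back so that a later worker can reuse it. */
void
slot_release(SlotTable *t, int idx)
{
    assert(t != NULL);
    if (idx >= 0 && (size_t) idx < t->nslots)
        t->in_use[idx] = 0;
}
```

With a table created for two slots, a third slot_acquire call fails
until one of the first two slots is released.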

The design we're currently discussing actually does not require a new
GUC parameter at all.  This is why: since the workers must be registered
before postmaster start anyway (in the _PG_init function of a module
that's listed in shared_preload_libraries), we have to run a
registration function during postmaster start.  So postmaster can simply
count how many workers it needs and size those structs from there.
Workers that do not need a backend-like connection have no shmem sizing
requirement, so they do not matter for this.  Configuration is thus
simplified.
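
The counting idea can be sketched roughly as follows.  This is a
standalone illustration under my own assumptions, not the patch:
WorkerRegistry, registry_add, registry_backend_slots,
shmem_slots_needed and the MAX_REGISTERED cap are all hypothetical names
invented for this example, and none of them is a PostgreSQL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGISTERED 64       /* arbitrary cap for this sketch */

/* One registration request, as a module might make it at load time. */
typedef struct WorkerReg
{
    const char *name;
    bool        needs_backend_slot; /* false for libpq-style workers */
} WorkerReg;

/* Everything registered while the preloaded libraries were loading. */
typedef struct WorkerRegistry
{
    WorkerReg   entries[MAX_REGISTERED];
    size_t      count;
} WorkerRegistry;

/* Record a worker; returns false if the registry is already full. */
bool
registry_add(WorkerRegistry *r, const char *name, bool needs_backend_slot)
{
    assert(r != NULL);
    if (r->count >= MAX_REGISTERED)
        return false;
    r->entries[r->count].name = name;
    r->entries[r->count].needs_backend_slot = needs_backend_slot;
    r->count++;
    return true;
}

/* Count only the workers that want a backend-like connection. */
size_t
registry_backend_slots(const WorkerRegistry *r)
{
    size_t      n = 0;

    assert(r != NULL);
    for (size_t i = 0; i < r->count; i++)
    {
        if (r->entries[i].needs_backend_slot)
            n++;
    }
    return n;
}

/* Slots to reserve: regular connections plus the counted workers. */
size_t
shmem_slots_needed(size_t max_connections, const WorkerRegistry *r)
{
    return max_connections + registry_backend_slots(r);
}
```

For example, registering two backend-connected workers and one libpq
worker with 100 regular connections gives 102 slots; the libpq worker
adds nothing to the total.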

BTW I am working on this patch and I think I have a workable design in
place; I just couldn't get the code done before the start of this
commitfest.  (I have not yet handled the EXEC_BACKEND case, and I will
not even look into that until the basic Unix case is working.)

One thing I am not going to look into is how this new capability could
be used for parallel query.  I feel we have enough use cases without it
to develop a fairly powerful feature.  After that is done and proven
(and committed) we can look into how to use it to implement these
short-lived workers for stuff such as parallel query.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


