Thread: Creating a DSA area to provide work space for parallel execution

Creating a DSA area to provide work space for parallel execution

From
Thomas Munro
Date:
Hi hackers,

A couple of months ago I proposed dynamic shared areas[1].  DSA areas
are dynamically sized shared memory heaps that backends can use to
share data, building on top of the existing DSM infrastructure.

One target use case for DSA areas is to provide work space for
parallel query execution.  To that end, here is a patch to create a
DSA area for use by executor code.  The area is automatically attached
to the leader and all worker processes for the duration of the
parallel query, and is available as estate->es_query_area.

Backends already have access to shared memory through a single DSM
segment managed with a table-of-contents.  The TOC provides a way to
carve out some shared storage space for individual executor nodes and
look it up later by plan node ID.  That works for things like
ParallelHeapScanDescData whose size is known up front, but not so well
if you need something more like a heap in which to build shared data
structures.  Through estate->es_query_area, a parallel-aware executor
node can use and recycle arbitrary amounts of shared memory with an
allocate/free interface.
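For example, the interface is about as simple as it sounds (a sketch
only; the size is arbitrary and error handling is omitted):

  /* Carve out some shared work space from the per-query area. */
  dsa_pointer chunk = dsa_allocate(estate->es_query_area, 1024);

  /* dsa_pointer values are portable across backends; map one to a
     backend-local address before touching the memory. */
  char *mem = dsa_get_address(estate->es_query_area, chunk);
  memset(mem, 0, 1024);

  /* Unlike TOC-carved space, it can be given back for reuse. */
  dsa_free(estate->es_query_area, chunk);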

Motivating use cases include shared bitmaps and shared hash tables
(patches to follow).

Currently, this doesn't remove the need for the existing DSM
segment.  In order to share data structures in the DSA area, you need a
way to exchange pointers to find them, and the existing segment + TOC
mechanism is ideal for that.
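Concretely, a node might publish a dsa_pointer through a small
fixed-size chunk in the TOC, along these lines (the struct and key name
here are invented for illustration):

  /* Leader, while setting up the parallel context: */
  typedef struct MyNodeShared
  {
      dsa_pointer shared_data;        /* points into the DSA area */
  } MyNodeShared;

  MyNodeShared *hdr = shm_toc_allocate(pcxt->toc, sizeof(MyNodeShared));
  hdr->shared_data = dsa_allocate(estate->es_query_area, size);
  shm_toc_insert(pcxt->toc, MY_NODE_KEY, hdr);

  /* Worker, during startup: */
  MyNodeShared *hdr = shm_toc_lookup(toc, MY_NODE_KEY);
  void *data = dsa_get_address(estate->es_query_area, hdr->shared_data);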

One obvious problem is that this patch results in at least *two* DSM
segments being created for every parallel query execution: the main
segment used for parallel execution, and then the initial segment
managed by the DSA area.  One thought is that DSA areas are the more
general mechanism, so perhaps we should figure out how to store the
contents of the existing segment in it.  The TOC interface would need
a few tweaks to be able to live in memory allocated with dsa_allocate,
and then we'd need to share that address with other backends so that
they could find it (cf the current approach of finding the TOC at the
start of the segment).  I haven't prototyped that yet.  That'd involve
changing the wording "InitializeDSM" that appears in various places
including the FDW API, which has been putting me off...

This patch depends on dsa-v2.patch[1].

[1] https://www.postgresql.org/message-id/flat/CAEepm%3D1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Creating a DSA area to provide work space for parallel execution

From
Thomas Munro
Date:
On Wed, Oct 5, 2016 at 10:32 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> One obvious problem is that this patch results in at least *two* DSM
> segments being created for every parallel query execution: the main
> segment used for parallel execution, and then the initial segment
> managed by the DSA area.  One thought is that DSA areas are the more
> general mechanism, so perhaps we should figure out how to store the
> contents of the existing segment in it.  The TOC interface would need
> a few tweaks to be able to live in memory allocated with dsa_allocate,
> and then we'd need to share that address with other backends so that
> they could find it (cf the current approach of finding the TOC at the
> start of the segment).  I haven't prototyped that yet.  That'd involve
> changing the wording "InitializeDSM" that appears in various places
> including the FDW API, which has been putting me off...

... or we could allow DSA areas to be constructed inside existing
shmem, as in the attached patch which requires dsa_create_in_place,
from the patch at
https://www.postgresql.org/message-id/CAEepm%3D0-pbokaQdCXhtYn%3Dw64MmdJz4hYW7qcSU235ar276x7w%40mail.gmail.com
.
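The resulting usage pattern is roughly the following (sketched from the
attached patch; I'm assuming dsa_attach_in_place takes just the
address, so treat the signatures as provisional):

  /* Leader: create the area inside space carved out of the main segment. */
  area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE);
  shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space);
  pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE,
                                  LWTRANCHE_PARALLEL_EXEC_AREA,
                                  "parallel query memory area");

  /* Worker: find the same space via the TOC and attach to it. */
  area_space = shm_toc_lookup(toc, PARALLEL_KEY_AREA);
  area = dsa_attach_in_place(area_space);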

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Creating a DSA area to provide work space for parallel execution

From
Dilip Kumar
Date:
On Wed, Nov 23, 2016 at 5:42 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> ... or we could allow DSA areas to be constructed inside existing
> shmem, as in the attached patch which requires dsa_create_in_place,
> from the patch at
> https://www.postgresql.org/message-id/CAEepm%3D0-pbokaQdCXhtYn%3Dw64MmdJz4hYW7qcSU235ar276x7w%40mail.gmail.com
> .
It seems like there is a problem in this patch.

In the code below, pei->area is not yet allocated at this point, so
estate->es_query_area will always be NULL?

+ estate->es_query_area = pei->area;
+
  /* Create a parallel context. */
  pcxt = CreateParallelContext(ParallelQueryMain, nworkers);
  pei->pcxt = pcxt;
@@ -413,6 +423,10 @@ ExecInitParallelPlan(PlanState *planstate, EState
*estate, int nworkers)
  shm_toc_estimate_keys(&pcxt->estimator, 1);
  }

+ /* Estimate space for DSA area. */
+ shm_toc_estimate_chunk(&pcxt->estimator, PARALLEL_AREA_SIZE);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
  /* Everyone's had a chance to ask for space, so now create the DSM. */
  InitializeParallelDSM(pcxt);

@@ -466,6 +480,14 @@ ExecInitParallelPlan(PlanState *planstate, EState
*estate, int nworkers)
  pei->instrumentation = instrumentation;
  }

+ /* Create a DSA area that can be used by the leader and all workers. */
+ area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space);
+ pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE,
+                                 LWTRANCHE_PARALLEL_EXEC_AREA,
+                                 "parallel query memory area");


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Creating a DSA area to provide work space for parallel execution

From
Dilip Kumar
Date:
I have one more question.

In v1 we were calling dsa_detach in ExecParallelCleanup and in
ParallelQueryMain, but it's removed in v2.

Any specific reason?
Does this need to be used differently?

 ExecParallelCleanup(ParallelExecutorInfo *pei)
 {
+ if (pei->area != NULL)
+ {
+ dsa_detach(pei->area);
+ pei->area = NULL;
+ }

After this change, I am getting a DSM segment leak warning.

I am calling dsa_allocate and dsa_free.

On Thu, Nov 24, 2016 at 8:09 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, Nov 23, 2016 at 5:42 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> ... or we could allow DSA areas to be constructed inside existing
>> shmem, as in the attached patch which requires dsa_create_in_place,
>> from the patch at
>> https://www.postgresql.org/message-id/CAEepm%3D0-pbokaQdCXhtYn%3Dw64MmdJz4hYW7qcSU235ar276x7w%40mail.gmail.com
>> .
> It seems like there is a problem in this patch.
>
> In the code below, pei->area is not yet allocated at this point, so
> estate->es_query_area will always be NULL?
>
> + estate->es_query_area = pei->area;
> +
>   /* Create a parallel context. */
>   pcxt = CreateParallelContext(ParallelQueryMain, nworkers);
>   pei->pcxt = pcxt;
> @@ -413,6 +423,10 @@ ExecInitParallelPlan(PlanState *planstate, EState
> *estate, int nworkers)
>   shm_toc_estimate_keys(&pcxt->estimator, 1);
>   }
>
> + /* Estimate space for DSA area. */
> + shm_toc_estimate_chunk(&pcxt->estimator, PARALLEL_AREA_SIZE);
> + shm_toc_estimate_keys(&pcxt->estimator, 1);
> +
>   /* Everyone's had a chance to ask for space, so now create the DSM. */
>   InitializeParallelDSM(pcxt);
>
> @@ -466,6 +480,14 @@ ExecInitParallelPlan(PlanState *planstate, EState
> *estate, int nworkers)
>   pei->instrumentation = instrumentation;
>   }
>
> + /* Create a DSA area that can be used by the leader and all workers. */
> + area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE);
> + shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space);
> + pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE,
> + LWTRANCHE_PARALLEL_EXEC_AREA,
> +   "parallel query memory area");
>
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Creating a DSA area to provide work space for parallel execution

From
Thomas Munro
Date:
On Fri, Nov 25, 2016 at 4:32 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I have one more question.
>
> In v1 we were calling dsa_detach in ExecParallelCleanup and in
> ParallelQueryMain, but it's removed in v2.
>
> Any specific reason?
> Does this need to be used differently?
>
>  ExecParallelCleanup(ParallelExecutorInfo *pei)
>  {
> + if (pei->area != NULL)
> + {
> + dsa_detach(pei->area);
> + pei->area = NULL;
> + }
>
> After this change, I am getting a DSM segment leak warning.

Thanks!  I had some chicken-vs-egg problems dealing with cleanup of
DSM segments belonging to DSA areas created inside DSM segments.
Here's a new version to apply on top of dsa-v7.patch.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Creating a DSA area to provide work space for parallel execution

From
Thomas Munro
Date:
On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here's a new version to apply on top of dsa-v7.patch.

Here's a version to go with dsa-v8.patch.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Creating a DSA area to provide work space for parallel execution

From
Haribabu Kommi
Date:


On Thu, Dec 1, 2016 at 10:35 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Here's a new version to apply on top of dsa-v7.patch.
>
> Here's a version to go with dsa-v8.patch.

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia

Re: Creating a DSA area to provide work space for parallel execution

From
Robert Haas
Date:
On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Here's a new version to apply on top of dsa-v7.patch.
>
> Here's a version to go with dsa-v8.patch.

Thomas has spent a fair amount of time beating me up off-list about
the fact that we have no way to recycle LWLock tranche IDs, and I've
spent a fair amount of time trying to defend that decision, but it's
really a problem here.  This is all well and good the way it's written
provided that there's only one parallel context in existence at a
time, but should that number ever exceed 1, which it can, then the
tranche's array_base will be incorrect for LWLocks in one or the other
tranche.

That *almost* doesn't matter.  If you're not compiling with dtrace or
LOCK_DEBUG or LWLOCK_STATS, you'll be fine.  And if you are compiling
with one of those things, I believe the consequences will be no worse
than an occasional nonsensical lock ID.  It's halfway tempting to just
accept that as a known wart, but, uggh, that sucks.

It's a bit hard to come up with a better alternative.  We could set
aside a certain number of tranche IDs for parallel contexts, say 16.
Then as long as the same backend doesn't try to create more than 16
parallel contexts at the same time, we'll be fine.  And a backend
really shouldn't have more than 16 Gather nodes active at the same
time.  But it could.  In fact, even if the planner never created more
than one Gather node in the same plan (which it sometimes does), the
leader can have multiple parallel contexts active for different
queries at the same time.  It could fire off a Gather and then later
some part of that plan that's running in the master could call a
function that tries to execute some OTHER query which also happens to
involve a Gather and then while the master is executing its part of
THAT query the same thing could happen all over again, so any
arbitrary limit we install here can be exceeded in, admittedly,
extremely contrived scenarios.

Moreover, the LWLock tranche system itself is limited to a 16-bit
integer for tranche IDs, so there can't be more than 65536 of those
over the lifetime of the cluster no matter what.  Since
008608b9d51061b1f598c197477b3dc7be9c4a64, that limit is rather
pointless; as things stand today, we could change the tranche ID to a
32-bit integer without losing anything.  But changing that might eat
up bit space that somebody wants to use later to solve some other
problem, and anyway by itself it doesn't fix anything.  Also, it would
allow LWLockTrancheArray to get regrettably large.

Another alternative is to have the DSA system fix up the tranche base
address every time we enter a DSA function that might need to take a
lock.  The DSA code never returns with any relevant locks held, so the
base address can't get obsoleted by some other DSA while a lock using
the old base address is still held.  That's sort of ugly too, and it
adds a little overhead to fix a problem that will bite almost nobody,
but it's formally correct and seems fairly watertight. Basically each
DSA function that needs to take a lock would need to first do this:
  area->lwlock_tranche.array_base = &area->control->pools[0];
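Or, to keep the entry points tidy, wrap that in a trivial helper called
on the way into each locking function (sketch only; the helper name is
invented):

  static void
  dsa_refresh_tranche_base(dsa_area *area)
  {
      /* Another area with the same tranche ID may have repointed
         array_base since we last ran; restore it before locking. */
      area->lwlock_tranche.array_base = &area->control->pools[0];
  }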

Maybe that's not too bad... thoughts?

BTW, I just noticed that the dsa_area_control member called "lock" is
going to get a fairly random lock ID, based on the number of bytes by
which "lock" follows "pools".  Wouldn't it be better to put all of the
LWLocks - both the general locks and the per-pool locks - in one
array, probably with the general lock first, so that T_ID() will do
the right thing for such locks?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Robert Haas
Date:
On Mon, Dec 5, 2016 at 3:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> Here's a new version to apply on top of dsa-v7.patch.
>>
>> Here's a version to go with dsa-v8.patch.
>
> Thomas has spent a fair amount of time beating me up off-list about
> the fact that we have no way to recycle LWLock tranche IDs, and I've
> spent a fair amount of time trying to defend that decision, but it's
> really a problem here.  This is all well and good the way it's written
> provided that there's only one parallel context in existence at a
> time, but should that number ever exceed 1, which it can, then the
> tranche's array_base will be incorrect for LWLocks in one or the other
> tranche.
>
> That *almost* doesn't matter.  If you're not compiling with dtrace or
> LOCK_DEBUG or LWLOCK_STATS, you'll be fine.  And if you are compiling
> with one of those things, I believe the consequences will be no worse
> than an occasional nonsensical lock ID.  It's halfway tempting to just
> accept that as a known wart, but, uggh, that sucks.
>
> It's a bit hard to come up with a better alternative.

After thinking about this some more, I realized that the problem here
boils down to T_ID() not being smart enough to cope with multiple
instances of the same tranche at different base addresses.  We can
either make it more complicated so that it can do that, or (drum roll,
please!) get rid of it altogether.  I don't think making it more
complicated is very desirable, because I think that we end up
computing T_ID() for every lock acquisition and release whenever
--enable-dtrace is used, even if dtrace is not actually in use.  And
the usefulness of T_ID() for debugging is pretty marginal, with one
important exception, which is that currently T_ID() is used to
distinguish between individual LWLocks in the main tranche.  So here's
my proposal:

1. Give every LWLock in the main tranche a separate tranche ID.  This
has been previously proposed, so it's not a revolutionary concept.

2. Always identify LWLocks in pg_stat_activity only by tranche ID,
never offset within the tranche, not even for the main tranche.  This
results in pg_stat_activity saying "LWLock" rather than either
"LWLockNamed" or "LWLockTranche", which is a user-visible behavior
change but not AFAICS a very upsetting one.

3. Change the dtrace probes that currently pass both T_NAME() and
T_ID() to pass only T_NAME().  This is a minor loss of information for
dtrace, but in view of (1) it's not a very significant loss.

4. Change LOCK_DEBUG and LWLOCK_STATS output that identifies locks
using T_NAME() and T_ID() to instead identify them by T_NAME() and the
pointer address.  Since these are only developer debugging facilities
not intended for production builds, I think it's OK to expose the
pointer address, and it's arguably MORE useful to do so than to expose
an offset into an array with an unknown base address.

5. Remove T_ID() from the can't-happen elog() in LWLockRelease().

6. Remove T_ID() itself.  And then, since that's the only client of
array_base/array_stride, remove those too.  And then, since there's
nothing left in LWLockTranche except for the tranche name, get rid of
the whole structure, simplifying a whole bunch of code all over the
system.
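For extension code that sets up a tranche, the before-and-after looks
something like this (a sketch; the extension-side variable names are
invented):

  /* Before: a tranche was a descriptor carrying base/stride for T_ID(). */
  static LWLockTranche my_tranche;
  my_tranche.name = "my_extension";
  my_tranche.array_base = my_locks;
  my_tranche.array_stride = sizeof(LWLockPadded);
  LWLockRegisterTranche(my_tranche_id, &my_tranche);

  /* After: a tranche is just an ID-to-name mapping. */
  LWLockRegisterTranche(my_tranche_id, "my_extension");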

Patch implementing all of this attached.  There's further
simplification that could be done here -- with array_stride gone, we
could do more at compile time rather than run-time -- but I think that
can be left for another day.

The overall result of this is a considerable savings of code:

 doc/src/sgml/monitoring.sgml             |  52 ++++-----
 src/backend/access/transam/slru.c        |   6 +-
 src/backend/access/transam/xlog.c        |   9 +-
 src/backend/postmaster/pgstat.c          |  10 +-
 src/backend/replication/logical/origin.c |   8 +-
 src/backend/replication/slot.c           |   8 +-
 src/backend/storage/buffer/buf_init.c    |  16 +--
 src/backend/storage/ipc/procarray.c      |   9 +-
 src/backend/storage/lmgr/lwlock.c        | 175 ++++++++++---------------------
 src/backend/utils/mmgr/dsa.c             |  15 +--
 src/backend/utils/probes.d               |  16 +--
 src/include/access/slru.h                |   1 -
 src/include/pgstat.h                     |   3 +-
 src/include/storage/lwlock.h             |  45 ++------
 14 files changed, 112 insertions(+), 261 deletions(-)

It also noticeably reduces the number of bytes of machine code
generated for lwlock.c:

[rhaas pgsql]$ size src/backend/storage/lmgr/lwlock.o # echo master
__TEXT    __DATA    __OBJC    others    dec    hex
11815    3360    0    46430    61605    f0a5
[rhaas pgsql]$ size src/backend/storage/lmgr/lwlock.o # echo lwlock
__TEXT    __DATA    __OBJC    others    dec    hex
11199    3264    0    45487    59950    ea2e

That's better than a 5% reduction in the code size of a very hot
module just by removing something that almost nobody uses or cares
about.  That's with --enable-dtrace but without --enable-cassert.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Attachment

Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Robert Haas
Date:
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Thoughts?

Hearing no objections, I've gone ahead and committed this.  If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Andres Freund
Date:
On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > Thoughts?
> 
> Hearing no objections, I've gone ahead and committed this.  If that
> makes somebody really unhappy I can revert it, but I am betting that
> the real story is that nobody cares about preserving T_ID().

I don't care about T_ID, but I do care about breaking extensions using
lwlocks like for the 3rd release in a row or such.  This is getting a
bit ridiculous.



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Robert Haas
Date:
On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
>> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > Thoughts?
>>
>> Hearing no objections, I've gone ahead and committed this.  If that
>> makes somebody really unhappy I can revert it, but I am betting that
>> the real story is that nobody cares about preserving T_ID().
>
> I don't care about T_ID, but I do care about breaking extensions using
> lwlocks like for the 3rd release in a row or such.  This is getting a
> bit ridiculous.

Hmm, I hadn't thought about that.  :-)

I guess we could put back array_base/array_stride and just ignore
them, but that hardly seems better.  Then we're stuck with that wart
forever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Robert Haas
Date:
On Fri, Dec 16, 2016 at 12:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
>>> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> > Thoughts?
>>>
>>> Hearing no objections, I've gone ahead and committed this.  If that
>>> makes somebody really unhappy I can revert it, but I am betting that
>>> the real story is that nobody cares about preserving T_ID().
>>
>> I don't care about T_ID, but I do care about breaking extensions using
>> lwlocks like for the 3rd release in a row or such.  This is getting a
>> bit ridiculous.
>
> Hmm, I hadn't thought about that.  :-)

Err, that was supposed to be :-(  As in sad, not happy.

> I guess we could put back array_base/array_stride and just ignore
> them, but that hardly seems better.  Then we're stuck with that wart
> forever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > Thoughts?
> 
> Hearing no objections, I've gone ahead and committed this.  If that
> makes somebody really unhappy I can revert it, but I am betting that
> the real story is that nobody cares about preserving T_ID().

AFAICT the comment on LWLockRegisterTranche is confused; it talks about
an allocated object being passed, but there isn't any.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Andres Freund
Date:
On 2016-12-16 12:32:49 -0500, Robert Haas wrote:
> On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
> >> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> > Thoughts?
> >>
> >> Hearing no objections, I've gone ahead and committed this.  If that
> >> makes somebody really unhappy I can revert it, but I am betting that
> >> the real story is that nobody cares about preserving T_ID().
> >
> > I don't care about T_ID, but I do care about breaking extensions using
> > lwlocks like for the 3rd release in a row or such.  This is getting a
> > bit ridiculous.
> 
> Hmm, I hadn't thought about that.  :-)
> 
> I guess we could put back array_base/array_stride and just ignore
> them, but that hardly seems better.  Then we're stuck with that wart
> forever.

Yea, I don't think that's good either.  I'm all for evolving APIs when
necessary, but constantly breaking the same API release after release
seems indicative of needing to spend a bit more time on it in the first
round.  I've a few extensions (one of them citus) that work across
versions, and the ifdef-ery is significant.

Andres



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Andres Freund
Date:
On 2016-12-16 12:33:11 -0500, Robert Haas wrote:
> On Fri, Dec 16, 2016 at 12:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
> >> On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
> >>> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >>> > Thoughts?
> >>>
> >>> Hearing no objections, I've gone ahead and committed this.  If that
> >>> makes somebody really unhappy I can revert it, but I am betting that
> >>> the real story is that nobody cares about preserving T_ID().
> >>
> >> I don't care about T_ID, but I do care about breaking extensions using
> >> lwlocks like for the 3rd release in a row or such.  This is getting a
> >> bit ridiculous.
> >
> > Hmm, I hadn't thought about that.  :-)
> 
> Err, that was supposed to be :-(  As in sad, not happy.

Both work for me ;)



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Robert Haas
Date:
On Fri, Dec 16, 2016 at 12:37 PM, Andres Freund <andres@anarazel.de> wrote:
> Yea, I don't think that's good either.  I'm all for evolving APIs when
> necessary, but constantly breaking the same API release after release
> seems indicative of needing to spend a bit more time on it in the first
> round.

I am not sure the issue was time so much as the ability to foresee all
the problems we'd want to solve.  9.4 added tranches and converted
everything to LWLock * instead of LWLockId, but I think all of the old
APIs still worked.  At that point, we didn't have parallel query and
we weren't that close to having it, so I was loathe to do anything too
invasive.  9.5 removed LWLockAcquireWithVar() and added
LWLockReleaseClearVar(), but most of the API was still fine.  9.6
moved almost everything to tranches and removed RequestAddinLWLocks()
and LWLockAssign(), which was a big break for extensions -- but that
wasn't because of parallel query, but rather because we wanted to use
tranches to support the wait_event stuff and we also wanted to be able
to pad different tranches differently.  This latest change is inspired
by the fact that the 9.4-era changes to support parallel query weren't
quite smart enough to be able to cope with the possibility of multiple
tranches with the same tranche ID in a reasonable way.  That last one
is indeed an oversight but in January of 2014 it wasn't very clear
that we were going to have tranche-ified every LWLock in the system,
without which this change wouldn't be possible.  Quite a lot of work
by at least 3 or 4 different people went into that tranche-ification
effort.

I think it's quite surprising how fast the LWLock system has evolved
over the last few years.  When I first started working on PostgreSQL
in 2008, there was no LWLockAcquireOrWait, none of the Var stuff, the
padding was much less sophisticated, no tranches, no atomics, volatile
qualifiers all over the place...  and all of that has changed in the
last 5 years.  Pretty amazing, actually, IMHO.  If our LWLocks improve
as much between now and 2021 as they have between 2011 and now, it'll
be worth almost any amount of API breakage to get there.  I don't
personally have any plans or ideas that would involve breaking things
for extensions again any time soon, but I won't be very surprised if
somebody else comes up with one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Robert Haas
Date:
On Fri, Dec 16, 2016 at 12:36 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > Thoughts?
>>
>> Hearing no objections, I've gone ahead and committed this.  If that
>> makes somebody really unhappy I can revert it, but I am betting that
>> the real story is that nobody cares about preserving T_ID().
>
> AFAICT the comment on LWLockRegisterTranche is confused; it talks about
> an allocated object being passed, but there isn't any.

Oops.  Thanks, will fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Alvaro Herrera
Date:
Robert Haas wrote:

> I am not sure the issue was time so much as the ability to foresee all
> the problems we'd want to solve.

I think all that movement is okay.  It's not like we're breaking things
to no purpose.  The amount of effort that has to go into making
extensions compile with changed APIs is not *that* bad, surely; it's
pretty clear that we need to keep moving forward.  All the changes you
listed that required lwlock changes have clearly been worth the
breakage, IMO.

> I think it's quite surprising how fast the LWLock system has evolved
> over the last few years.  When I first started working on PostgreSQL
> in 2008, there was no LWLockAcquireOrWait, none of the Var stuff, the
> padding was much less sophisticated, no tranches, no atomics, volatile
> qualifiers all over the place...  and all of that has changed in the
> last 5 years.  Pretty amazing, actually, IMHO.

Yes, I agree.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Thomas Munro
Date:
On Sat, Dec 17, 2016 at 5:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Thoughts?
>
> Hearing no objections, I've gone ahead and committed this.  If that
> makes somebody really unhappy I can revert it, but I am betting that
> the real story is that nobody cares about preserving T_ID().

I suppose LWLock could include a uint16 member 'id' without making
LWLock any larger, since it would replace the padding between
'tranche' and 'state'.  But I think a better solution, if anyone
really wants these T_ID numbers from a dtrace script, would be to add
lock address to the existing lwlock probes, and then figure out a way
to add some new probes to report the base and stride in the right
places so you can do the bookkeeping in dtrace variables.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Robert Haas
Date:
On Sun, Dec 18, 2016 at 10:33 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sat, Dec 17, 2016 at 5:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Thoughts?
>>
>> Hearing no objections, I've gone ahead and committed this.  If that
>> makes somebody really unhappy I can revert it, but I am betting that
>> the real story is that nobody cares about preserving T_ID().
>
> I suppose LWLock could include a uint16 member 'id' without making
> LWLock any larger, since it would replace the padding between
> 'tranche' and 'state'.  But I think a better solution, if anyone
> really wants these T_ID numbers from a dtrace script, would be to add
> lock address to the existing lwlock probes, and then figure out a way
> to add some new probes to report the base and stride in the right
> places so you can do the book keeping in dtrace variables.

Interesting idea.  My bet is that nobody cares about dtrace very much.
probes.d was first added in 2006, and continued to gradually get new
probes (all from submissions by Robert Lor) until 2009.  Since then,
nothing has happened except for perfunctory maintenance by various
committers trying to solve other problems who had to maintain the work
that had already been done whether they cared about it or not.  (I
notice that the probes lwlock-acquire-or-wait and
lwlock-acquire-or-wait-fail have never been documented.)  I would be
tempted to rip the whole thing out as an attractive nuisance, except
that I bet somebody would complain about that...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Robert Haas
Date:
On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Here's a new version to apply on top of dsa-v7.patch.
>
> Here's a version to go with dsa-v8.patch.

All right, so I've committed this, but not before (1) renaming some
things that you added, (2) adding documentation for the new LWLock
tranche, (3) not creating the DSA in the "simulated DSM using
backend-private memory" case, and (4) some cosmetic tweaks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Thomas Munro
Date:
On Tue, Dec 20, 2016 at 11:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> Here's a new version to apply on top of dsa-v7.patch.
>>
>> Here's a version to go with dsa-v8.patch.
>
> All right, so I've committed this, but not before (1) renaming some
> things that you added, (2) adding documentation for the new LWLock
> tranche, (3) not creating the DSA in the "simulated DSM using
> backend-private memory" case, and (4) some cosmetic tweaks.

Thanks!  Hmm, so now in rare circumstances we can finish up running
ExecXXXInitializeDSM, but later have estate->es_query_dsa == NULL.
This is something to be careful about.
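In other words, parallel-aware nodes will need a guard along these
lines (a sketch; the state fields are invented):

  if (estate->es_query_dsa != NULL)
  {
      /* Real DSM: build the shared structure in the per-query area. */
      state->shared_chunk = dsa_allocate(estate->es_query_dsa, size);
  }
  else
  {
      /* DSM simulated in backend-private memory: no DSA area exists,
         so fall back to a private allocation. */
      state->private_chunk = palloc0(size);
  }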

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Robert Haas
Date:
On Mon, Dec 19, 2016 at 6:35 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Tue, Dec 20, 2016 at 11:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
>>> <thomas.munro@enterprisedb.com> wrote:
>>>> Here's a new version to apply on top of dsa-v7.patch.
>>>
>>> Here's a version to go with dsa-v8.patch.
>>
>> All right, so I've committed this, but not before (1) renaming some
>> things that you added, (2) adding documentation for the new LWLock
>> tranche, (3) not creating the DSA in the "simulated DSM using
>> backend-private memory" case, and (4) some cosmetic tweaks.
>
> Thanks!  Hmm, so now in rare circumstances we can finish up running
> ExecXXXInitializeDSM, but later have estate->es_query_dsa == NULL.
> This is something to be careful about.

Yes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

From
Kyotaro HORIGUCHI
Date:
At Mon, 19 Dec 2016 12:24:38 -0500, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoY7cHfGQRjYRixEsvBp63Ud9AgBuksc4oWS-Lu7tX5=TA@mail.gmail.com>
> Interesting idea.  My bet is that nobody cares about dtrace very much.

FWIW, I just had an inquiry about SystemTap for PostgreSQL, but he
probed using function names.  I mean, at least a few people care about
it, not nobody, but...

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center