Thread: Creating a DSA area to provide work space for parallel execution
Hi hackers,

A couple of months ago I proposed dynamic shared areas[1]. DSA areas are dynamically sized shared memory heaps that backends can use to share data, building on top of the existing DSM infrastructure.

One target use case for DSA areas is to provide work space for parallel query execution. To that end, here is a patch to create a DSA area for use by executor code. The area is automatically attached to the leader and all worker processes for the duration of the parallel query, and is available as estate->es_query_area.

Backends already have access to shared memory through a single DSM segment managed with a table-of-contents. The TOC provides a way to carve out some shared storage space for individual executor nodes and look it up later by plan node ID. That works for things like ParallelHeapScanDescData, whose size is known up front, but not so well if you need something more like a heap in which to build shared data structures. Through estate->es_query_area, a parallel-aware executor node can use and recycle arbitrary amounts of shared memory with an allocate/free interface. Motivating use cases include shared bitmaps and shared hash tables (patches to follow).

This doesn't mean you no longer need the existing DSM segment. In order to share data structures in the DSA area, you need a way to exchange pointers so other backends can find them, and the existing segment + TOC mechanism is ideal for that.

One obvious problem is that this patch results in at least *two* DSM segments being created for every parallel query execution: the main segment used for parallel execution, and then the initial segment managed by the DSA area. One thought is that DSA areas are the more general mechanism, so perhaps we should figure out how to store the contents of the existing segment in one. The TOC interface would need a few tweaks to be able to live in memory allocated with dsa_allocate, and then we'd need to share that address with other backends so that they could find it (cf the current approach of finding the TOC at the start of the segment). I haven't prototyped that yet. That'd involve changing the wording "InitializeDSM" that appears in various places including the FDW API, which has been putting me off...

This patch depends on dsa-v2.patch[1].

[1] https://www.postgresql.org/message-id/flat/CAEepm%3D1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
Attachment
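[To make the intended division of labour concrete, here is a minimal sketch, not taken from the patch: fixed-size handshake state still goes through the existing segment + TOC, while variably sized data comes from the per-query DSA area and is published as a relocatable dsa_pointer. MY_NODE_KEY, MyNodeShared, and my_node_initialize_dsm are invented names for illustration; the matching shm_toc_estimate_* calls are omitted for brevity.]

#include "postgres.h"
#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "storage/shm_toc.h"
#include "utils/dsa.h"

#define MY_NODE_KEY		UINT64CONST(0xF000000000000001) /* hypothetical key */

typedef struct MyNodeShared
{
	dsa_pointer table;			/* valid in any process attached to the area */
} MyNodeShared;

static void
my_node_initialize_dsm(EState *estate, ParallelContext *pcxt)
{
	dsa_area   *area = estate->es_query_area;
	MyNodeShared *shared;

	/* Carve fixed-size handshake space out of the main segment via the TOC. */
	shared = shm_toc_allocate(pcxt->toc, sizeof(MyNodeShared));

	/* Build the variably sized structure in the DSA area instead. */
	shared->table = dsa_allocate(area, 8192);

	/* Publish it so workers can shm_toc_lookup() and dsa_get_address() it. */
	shm_toc_insert(pcxt->toc, MY_NODE_KEY, shared);
}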
On Wed, Oct 5, 2016 at 10:32 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> One obvious problem is that this patch results in at least *two* DSM
> segments being created for every parallel query execution: the main
> segment used for parallel execution, and then the initial segment
> managed by the DSA area. One thought is that DSA areas are the more
> general mechanism, so perhaps we should figure out how to store
> contents of the existing segment in it. The TOC interface would need
> a few tweaks to be able to live in memory allocated with dsa_allocate,
> and then we'd need to share that address with other backends so that
> they could find it (cf the current approach of finding the TOC at the
> start of the segment). I haven't prototyped that yet. That'd involve
> changing the wording "InitializeDSM" that appears in various places
> including the FDW API, which has been putting me off...

... or we could allow DSA areas to be constructed inside existing shmem, as in the attached patch which requires dsa_create_in_place, from the patch at https://www.postgresql.org/message-id/CAEepm%3D0-pbokaQdCXhtYn%3Dw64MmdJz4hYW7qcSU235ar276x7w%40mail.gmail.com .

--
Thomas Munro
http://www.enterprisedb.com
Attachment
On Wed, Nov 23, 2016 at 5:42 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> ... or we could allow DSA areas to be constructed inside existing
> shmem, as in the attached patch which requires dsa_create_in_place,
> from the patch at
> https://www.postgresql.org/message-id/CAEepm%3D0-pbokaQdCXhtYn%3Dw64MmdJz4hYW7qcSU235ar276x7w%40mail.gmail.com

Seems like there is a problem in this patch.

In the code below, pei->area is not yet allocated at this point, so estate->es_query_area will always be NULL?

+	estate->es_query_area = pei->area;
+
 	/* Create a parallel context. */
 	pcxt = CreateParallelContext(ParallelQueryMain, nworkers);
 	pei->pcxt = pcxt;
@@ -413,6 +423,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
 		shm_toc_estimate_keys(&pcxt->estimator, 1);
 	}

+	/* Estimate space for DSA area. */
+	shm_toc_estimate_chunk(&pcxt->estimator, PARALLEL_AREA_SIZE);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+
 	/* Everyone's had a chance to ask for space, so now create the DSM. */
 	InitializeParallelDSM(pcxt);

@@ -466,6 +480,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
 		pei->instrumentation = instrumentation;
 	}

+	/* Create a DSA area that can be used by the leader and all workers. */
+	area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE);
+	shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space);
+	pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE,
+									LWTRANCHE_PARALLEL_EXEC_AREA,
+									"parallel query memory area");

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
I have one more question. In v1 we were calling dsa_detach in ExecParallelCleanup and in ParallelQueryMain, but it's removed in v2. Any specific reason? Does this need to be used differently?

ExecParallelCleanup(ParallelExecutorInfo *pei)
{
+	if (pei->area != NULL)
+	{
+		dsa_detach(pei->area);
+		pei->area = NULL;
+	}

After this change, I am getting a DSM segment leak warning. I am calling dsa_allocate and dsa_free.

On Thu, Nov 24, 2016 at 8:09 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, Nov 23, 2016 at 5:42 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> ... or we could allow DSA areas to be constructed inside existing
>> shmem, as in the attached patch which requires dsa_create_in_place,
>> from the patch at
>> https://www.postgresql.org/message-id/CAEepm%3D0-pbokaQdCXhtYn%3Dw64MmdJz4hYW7qcSU235ar276x7w%40mail.gmail.com
>
> Seems like there is a problem in this patch.
>
> In the code below, pei->area is not yet allocated at this point, so
> estate->es_query_area will always be NULL?
>
> +	estate->es_query_area = pei->area;
> +
>  	/* Create a parallel context. */
>  	pcxt = CreateParallelContext(ParallelQueryMain, nworkers);
>  	pei->pcxt = pcxt;
> @@ -413,6 +423,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
>  		shm_toc_estimate_keys(&pcxt->estimator, 1);
>  	}
>
> +	/* Estimate space for DSA area. */
> +	shm_toc_estimate_chunk(&pcxt->estimator, PARALLEL_AREA_SIZE);
> +	shm_toc_estimate_keys(&pcxt->estimator, 1);
> +
>  	/* Everyone's had a chance to ask for space, so now create the DSM. */
>  	InitializeParallelDSM(pcxt);
>
> @@ -466,6 +480,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
>  		pei->instrumentation = instrumentation;
>  	}
>
> +	/* Create a DSA area that can be used by the leader and all workers. */
> +	area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE);
> +	shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space);
> +	pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE,
> +									LWTRANCHE_PARALLEL_EXEC_AREA,
> +									"parallel query memory area");

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Nov 25, 2016 at 4:32 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I have one more question. In v1 we were calling dsa_detach in
> ExecParallelCleanup and in ParallelQueryMain, but it's removed in v2.
> Any specific reason? Does this need to be used differently?
>
> ExecParallelCleanup(ParallelExecutorInfo *pei)
> {
> +	if (pei->area != NULL)
> +	{
> +		dsa_detach(pei->area);
> +		pei->area = NULL;
> +	}
>
> After this change, I am getting a DSM segment leak warning.

Thanks! I had some chicken-vs-egg problems dealing with cleanup of DSM segments belonging to DSA areas created inside DSM segments. Here's a new version to apply on top of dsa-v7.patch.

--
Thomas Munro
http://www.enterprisedb.com
Attachment
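[Sketched from the description above rather than copied from the attachment, the cleanup ordering this implies looks like the following: detach from the DSA area before destroying the parallel context, since the area's initial segment lives inside memory owned by the context's DSM segment. Field names follow the patch excerpts quoted earlier.]

#include "postgres.h"
#include "access/parallel.h"
#include "executor/execParallel.h"
#include "utils/dsa.h"

void
ExecParallelCleanup(ParallelExecutorInfo *pei)
{
	if (pei->area != NULL)
	{
		dsa_detach(pei->area);	/* first: let the DSA release its segments */
		pei->area = NULL;
	}
	if (pei->pcxt != NULL)
	{
		DestroyParallelContext(pei->pcxt);	/* then tear down the main DSM */
		pei->pcxt = NULL;
	}
	pfree(pei);
}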
On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Here's a new version to apply on top of dsa-v7.patch.

Here's a version to go with dsa-v8.patch.

--
Thomas Munro
http://www.enterprisedb.com
Attachment
On Thu, Dec 1, 2016 at 10:35 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Here's a new version to apply on top of dsa-v7.patch.
>
> Here's a version to go with dsa-v8.patch.
Moved to next CF with "needs review" status.
Regards,
Hari Babu
Fujitsu Australia
On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Here's a new version to apply on top of dsa-v7.patch.
>
> Here's a version to go with dsa-v8.patch.

Thomas has spent a fair amount of time beating me up off-list about the fact that we have no way to recycle LWLock tranche IDs, and I've spent a fair amount of time trying to defend that decision, but it's really a problem here. This is all well and good the way it's written provided that there's only one parallel context in existence at a time, but should that number ever exceed 1, which it can, then the tranche's array_base will be incorrect for LWLocks in one or the other tranche.

That *almost* doesn't matter. If you're not compiling with dtrace or LOCK_DEBUG or LWLOCK_STATS, you'll be fine. And if you are compiling with one of those things, I believe the consequences will be no worse than an occasional nonsensical lock ID. It's halfway tempting to just accept that as a known wart, but, uggh, that sucks.

It's a bit hard to come up with a better alternative. We could set aside a certain number of tranche IDs for parallel contexts, say 16. Then as long as the same backend doesn't try to create more than 16 parallel contexts at the same time, we'll be fine. And a backend really shouldn't have more than 16 Gather nodes active at the same time. But it could. In fact, even if the planner never created more than one Gather node in the same plan (which it sometimes does), the leader can have multiple parallel contexts active for different queries at the same time. It could fire off a Gather and then later some part of that plan that's running in the master could call a function that tries to execute some OTHER query which also happens to involve a Gather, and then while the master is executing its part of THAT query the same thing could happen all over again, so any arbitrary limit we install here can be exceeded in, admittedly, extremely contrived scenarios.

Moreover, the LWLock tranche system itself is limited to a 16-bit integer for tranche IDs, so there can't be more than 65536 of those over the lifetime of the cluster no matter what. Since 008608b9d51061b1f598c197477b3dc7be9c4a64, that limit is rather pointless; as things stand today, we could change the tranche ID to a 32-bit integer without losing anything. But changing that might eat up bit space that somebody wants to use later to solve some other problem, and anyway by itself it doesn't fix anything. Also, it would allow LWLockTrancheArray to get regrettably large.

Another alternative is to have the DSA system fix up the tranche base address every time we enter a DSA function that might need to take a lock. The DSA code never returns with any relevant locks held, so the base address can't get obsoleted by some other DSA while a lock using the old base address is still held. That's sort of ugly too, and it adds a little overhead to fix a problem that will bite almost nobody, but it's formally correct and seems fairly watertight. Basically each DSA function that needs to take a lock would need to first do this:

	area->lwlock_tranche.array_base = &area->control->pools[0];

Maybe that's not too bad... thoughts?

BTW, I just noticed that the dsa_area_control member called "lock" is going to get a fairly random lock ID, based on the number of bytes by which "lock" follows "pools". Wouldn't it be better to put all of the LWLocks - both the general locks and the per-pool locks - in one array, probably with the general lock first, so that T_ID() will do the right thing for such locks?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
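[To make the array_base arithmetic concrete, here is a simplified paraphrase, written as a function rather than the macro form used in 9.6-era lwlock.c, of how T_ID() derives a lock's ID; it is a sketch for illustration, not copied source.]

static int
t_id(LWLockTranche *tranche, LWLock *lock)
{
	/*
	 * The lock's byte offset from the tranche's registered array start,
	 * divided by the registered stride, gives its index in the tranche.
	 */
	return (int) (((char *) lock - (char *) tranche->array_base) /
				  tranche->array_stride);
}

If array_base points at control->pools but "lock" is a separate member of dsa_area_control, this division yields whatever byte distance happens to separate the two members, which is why the ID comes out "fairly random" unless all of the LWLocks sit in one contiguous array.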
From: Robert Haas
On Mon, Dec 5, 2016 at 3:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Thomas has spent a fair amount of time beating me up off-list about
> the fact that we have no way to recycle LWLock tranche IDs, and I've
> spent a fair amount of time trying to defend that decision, but it's
> really a problem here. This is all well and good the way it's written
> provided that there's only one parallel context in existence at a
> time, but should that number ever exceed 1, which it can, then the
> tranche's array_base will be incorrect for LWLocks in one or the other
> tranche.
>
> That *almost* doesn't matter. If you're not compiling with dtrace or
> LOCK_DEBUG or LWLOCK_STATS, you'll be fine. And if you are compiling
> with one of those things, I believe the consequences will be no worse
> than an occasional nonsensical lock ID. It's halfway tempting to just
> accept that as a known wart, but, uggh, that sucks.
>
> It's a bit hard to come up with a better alternative.

After thinking about this some more, I realized that the problem here boils down to T_ID() not being smart enough to cope with multiple instances of the same tranche at different base addresses. We can either make it more complicated so that it can do that, or (drum roll, please!) get rid of it altogether.

I don't think making it more complicated is very desirable, because I think that we end up computing T_ID() for every lock acquisition and release whenever --enable-dtrace is used, even if dtrace is not actually in use. And the usefulness of T_ID() for debugging is pretty marginal, with one important exception, which is that currently T_ID() is used to distinguish between individual LWLocks in the main tranche. So here's my proposal:

1. Give every LWLock in the main tranche a separate tranche ID. This has been previously proposed, so it's not a revolutionary concept.

2. Always identify LWLocks in pg_stat_activity only by tranche ID, never offset within the tranche, not even for the main tranche. This results in pg_stat_activity saying "LWLock" rather than either "LWLockNamed" or "LWLockTranche", which is a user-visible behavior change but not AFAICS a very upsetting one.

3. Change the dtrace probes that currently pass both T_NAME() and T_ID() to pass only T_NAME(). This is a minor loss of information for dtrace, but in view of (1) it's not a very significant loss.

4. Change LOCK_DEBUG and LWLOCK_STATS output that identifies locks using T_NAME() and T_ID() to instead identify them by T_NAME() and the pointer address. Since these are only developer debugging facilities not intended for production builds, I think it's OK to expose the pointer address, and it's arguably MORE useful to do so than to expose an offset into an array with an unknown base address.

5. Remove T_ID() from the can't-happen elog() in LWLockRelease().

6. Remove T_ID() itself. And then, since that's the only client of array_base/array_stride, remove those too. And then, since there's nothing left in LWLockTranche except for the tranche name, get rid of the whole structure, simplifying a whole bunch of code all over the system.

Patch implementing all of this attached. There's further simplification that could be done here -- with array_stride gone, we could do more at compile time rather than run-time -- but I think that can be left for another day. The overall result of this is a considerable savings of code:

 doc/src/sgml/monitoring.sgml             |  52 ++++-----
 src/backend/access/transam/slru.c        |   6 +-
 src/backend/access/transam/xlog.c        |   9 +-
 src/backend/postmaster/pgstat.c          |  10 +-
 src/backend/replication/logical/origin.c |   8 +-
 src/backend/replication/slot.c           |   8 +-
 src/backend/storage/buffer/buf_init.c    |  16 +--
 src/backend/storage/ipc/procarray.c      |   9 +-
 src/backend/storage/lmgr/lwlock.c        | 175 ++++++++++---------------
 src/backend/utils/mmgr/dsa.c             |  15 +--
 src/backend/utils/probes.d               |  16 +--
 src/include/access/slru.h                |   1 -
 src/include/pgstat.h                     |   3 +-
 src/include/storage/lwlock.h             |  45 ++------
 14 files changed, 112 insertions(+), 261 deletions(-)

It also noticeably reduces the number of bytes of machine code generated for lwlock.c:

[rhaas pgsql]$ size src/backend/storage/lmgr/lwlock.o # echo master
__TEXT	__DATA	__OBJC	others	dec	hex
11815	3360	0	46430	61605	f0a5
[rhaas pgsql]$ size src/backend/storage/lmgr/lwlock.o # echo lwlock
__TEXT	__DATA	__OBJC	others	dec	hex
11199	3264	0	45487	59950	ea2e

That's better than a 5% reduction in the code size of a very hot module just by removing something that almost nobody uses or cares about. That's with --enable-dtrace but without --enable-cassert.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
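[For extension code, the practical upshot of proposal item 6 is that registering a tranche no longer involves an LWLockTranche struct. The following cross-version sketch is inferred from the description above rather than taken from the patch; "my_extension" and register_my_tranche are invented names, and the version cutoff is an assumption.]

#include "postgres.h"
#include "storage/lwlock.h"

#if PG_VERSION_NUM >= 100000
/* After the change: a tranche is just an ID and a display name. */
#define register_my_tranche(id, locks) \
	LWLockRegisterTranche((id), "my_extension")
#else
/* Before: describe the lock array so T_ID() can compute offsets. */
static LWLockTranche my_tranche;
#define register_my_tranche(id, locks) \
	do { \
		my_tranche.name = "my_extension"; \
		my_tranche.array_base = (locks); \
		my_tranche.array_stride = sizeof(LWLock); \
		LWLockRegisterTranche((id), &my_tranche); \
	} while (0)
#endif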
From: Robert Haas
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Thoughts?

Hearing no objections, I've gone ahead and committed this. If that makes somebody really unhappy I can revert it, but I am betting that the real story is that nobody cares about preserving T_ID().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Andres Freund
On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > Thoughts?
>
> Hearing no objections, I've gone ahead and committed this. If that
> makes somebody really unhappy I can revert it, but I am betting that
> the real story is that nobody cares about preserving T_ID().

I don't care about T_ID, but I do care about breaking extensions using lwlocks like for the 3rd release in a row or such. This is getting a bit ridiculous.
From: Robert Haas
On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
>> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > Thoughts?
>>
>> Hearing no objections, I've gone ahead and committed this. If that
>> makes somebody really unhappy I can revert it, but I am betting that
>> the real story is that nobody cares about preserving T_ID().
>
> I don't care about T_ID, but I do care about breaking extensions using
> lwlocks like for the 3rd release in a row or such. This is getting a
> bit ridiculous.

Hmm, I hadn't thought about that. :-)

I guess we could put back array_base/array_stride and just ignore them, but that hardly seems better. Then we're stuck with that wart forever.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Robert Haas
On Fri, Dec 16, 2016 at 12:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
>>> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> > Thoughts?
>>>
>>> Hearing no objections, I've gone ahead and committed this. If that
>>> makes somebody really unhappy I can revert it, but I am betting that
>>> the real story is that nobody cares about preserving T_ID().
>>
>> I don't care about T_ID, but I do care about breaking extensions using
>> lwlocks like for the 3rd release in a row or such. This is getting a
>> bit ridiculous.
>
> Hmm, I hadn't thought about that. :-)

Err, that was supposed to be :-( As in sad, not happy.

> I guess we could put back array_base/array_stride and just ignore
> them, but that hardly seems better. Then we're stuck with that wart
> forever.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Alvaro Herrera
Robert Haas wrote:
> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > Thoughts?
>
> Hearing no objections, I've gone ahead and committed this. If that
> makes somebody really unhappy I can revert it, but I am betting that
> the real story is that nobody cares about preserving T_ID().

AFAICT the comment on LWLockRegisterTranche is confused; it talks about an allocated object being passed, but there isn't any.

--
Álvaro Herrera		https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Andres Freund
On 2016-12-16 12:32:49 -0500, Robert Haas wrote:
> On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
> >> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> > Thoughts?
> >>
> >> Hearing no objections, I've gone ahead and committed this. If that
> >> makes somebody really unhappy I can revert it, but I am betting that
> >> the real story is that nobody cares about preserving T_ID().
> >
> > I don't care about T_ID, but I do care about breaking extensions using
> > lwlocks like for the 3rd release in a row or such. This is getting a
> > bit ridiculous.
>
> Hmm, I hadn't thought about that. :-)
>
> I guess we could put back array_base/array_stride and just ignore
> them, but that hardly seems better. Then we're stuck with that wart
> forever.

Yea, I don't think that's good either. I'm all for evolving APIs when necessary, but constantly breaking the same API release after release seems indicative of needing to spend a bit more time on it in the first round.

I've a few extensions (one of them citus) that work across versions, and the ifdef-ery is significant.

Andres
From: Andres Freund
On 2016-12-16 12:33:11 -0500, Robert Haas wrote:
> On Fri, Dec 16, 2016 at 12:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
> >> On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
> >>> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >>> > Thoughts?
> >>>
> >>> Hearing no objections, I've gone ahead and committed this. If that
> >>> makes somebody really unhappy I can revert it, but I am betting that
> >>> the real story is that nobody cares about preserving T_ID().
> >>
> >> I don't care about T_ID, but I do care about breaking extensions using
> >> lwlocks like for the 3rd release in a row or such. This is getting a
> >> bit ridiculous.
> >
> > Hmm, I hadn't thought about that. :-)
>
> Err, that was supposed to be :-( As in sad, not happy.

Both work for me ;)
From: Robert Haas
On Fri, Dec 16, 2016 at 12:37 PM, Andres Freund <andres@anarazel.de> wrote:
> Yea, I don't think that's good either. I'm all for evolving APIs when
> necessary, but constantly breaking the same API release after release
> seems indicative of needing to spend a bit more time on it in the first
> round.

I am not sure the issue was time so much as the ability to foresee all the problems we'd want to solve. 9.4 added tranches and converted everything to LWLock * instead of LWLockId, but I think all of the old APIs still worked. At that point, we didn't have parallel query and we weren't that close to having it, so I was loath to do anything too invasive. 9.5 removed LWLockAcquireWithVar() and added LWLockReleaseClearVar(), but most of the API was still fine. 9.6 moved almost everything to tranches and removed RequestAddinLWLocks() and LWLockAssign(), which was a big break for extensions -- but that wasn't because of parallel query, but rather because we wanted to use tranches to support the wait_event stuff and we also wanted to be able to pad different tranches differently. This latest change is inspired by the fact that the 9.4-era changes to support parallel query weren't quite smart enough to cope with the possibility of multiple tranches with the same tranche ID in a reasonable way. That last one is indeed an oversight, but in January of 2014 it wasn't very clear that we were going to have tranche-ified every LWLock in the system, without which this change wouldn't be possible. Quite a lot of work by at least 3 or 4 different people went into that tranche-ification effort.

I think it's quite surprising how fast the LWLock system has evolved over the last few years. When I first started working on PostgreSQL in 2008, there was no LWLockAcquireOrWait, none of the Var stuff, the padding was much less sophisticated, no tranches, no atomics, volatile qualifiers all over the place... and all of that has changed in the last 5 years. Pretty amazing, actually, IMHO. If our LWLocks improve as much between now and 2021 as they have between 2011 and now, it'll be worth almost any amount of API breakage to get there.

I don't personally have any plans or ideas that would involve breaking things for extensions again any time soon, but I won't be very surprised if somebody else comes up with one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Robert Haas
On Fri, Dec 16, 2016 at 12:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > Thoughts?
>>
>> Hearing no objections, I've gone ahead and committed this. If that
>> makes somebody really unhappy I can revert it, but I am betting that
>> the real story is that nobody cares about preserving T_ID().
>
> AFAICT the comment on LWLockRegisterTranche is confused; it talks about
> an allocated object being passed, but there isn't any.

Oops. Thanks, will fix.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Alvaro Herrera
Robert Haas wrote:
> I am not sure the issue was time so much as the ability to foresee all
> the problems we'd want to solve.

I think all that movement is okay. It's not like we're breaking things to no purpose. The amount of effort that has to go into making extensions compile with changed APIs is not *that* bad, surely; it's pretty clear that we need to keep moving forward. All the changes you listed that required lwlock changes have clearly been worth the breakage, IMO.

> I think it's quite surprising how fast the LWLock system has evolved
> over the last few years. When I first started working on PostgreSQL
> in 2008, there was no LWLockAcquireOrWait, none of the Var stuff, the
> padding was much less sophisticated, no tranches, no atomics, volatile
> qualifiers all over the place... and all of that has changed in the
> last 5 years. Pretty amazing, actually, IMHO.

Yes, I agree.

--
Álvaro Herrera		https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Thomas Munro
On Sat, Dec 17, 2016 at 5:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Thoughts?
>
> Hearing no objections, I've gone ahead and committed this. If that
> makes somebody really unhappy I can revert it, but I am betting that
> the real story is that nobody cares about preserving T_ID().

I suppose LWLock could include a uint16 member 'id' without making LWLock any larger, since it would replace the padding between 'tranche' and 'state'. But I think a better solution, if anyone really wants these T_ID numbers from a dtrace script, would be to add the lock address to the existing lwlock probes, and then figure out a way to add some new probes to report the base and stride in the right places so you can do the book-keeping in dtrace variables.

--
Thomas Munro
http://www.enterprisedb.com
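[A sketch of the layout idea above, with an assumed field set based on this discussion rather than on any patch: on a typical 64-bit build, a uint16 'id' would slot into the alignment padding between the uint16 tranche field and the 4-byte-aligned atomic, so sizeof(LWLock) would not change. This is an illustration, not a proposal.]

#include "postgres.h"
#include "port/atomics.h"
#include "storage/proclist_types.h"

typedef struct LWLock
{
	uint16		tranche;		/* tranche ID */
	uint16		id;				/* hypothetical: index within its tranche,
								 * occupying what is currently padding */
	pg_atomic_uint32 state;		/* state of exclusive/nonexclusive lockers */
	proclist_head waiters;		/* list of PGPROCs waiting for the lock */
} LWLock;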
From: Robert Haas
On Sun, Dec 18, 2016 at 10:33 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> I suppose LWLock could include a uint16 member 'id' without making
> LWLock any larger, since it would replace the padding between
> 'tranche' and 'state'. But I think a better solution, if anyone
> really wants these T_ID numbers from a dtrace script, would be to add
> the lock address to the existing lwlock probes, and then figure out a
> way to add some new probes to report the base and stride in the right
> places so you can do the book-keeping in dtrace variables.

Interesting idea. My bet is that nobody cares about dtrace very much. probes.d was first added in 2006, and continued to gradually get new probes (all from submissions by Robert Lor) until 2009. Since then, nothing has happened except for perfunctory maintenance by various committers trying to solve other problems, who had to maintain the work that had already been done whether they cared about it or not. (I notice that the probes lwlock-acquire-or-wait and lwlock-acquire-or-wait-fail have never been documented.) I would be tempted to rip the whole thing out as an attractive nuisance, except that I bet somebody would complain about that...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Robert Haas
On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Here's a new version to apply on top of dsa-v7.patch.
>
> Here's a version to go with dsa-v8.patch.

All right, so I've committed this, but not before (1) renaming some things that you added, (2) adding documentation for the new LWLock tranche, (3) not creating the DSA in the "simulated DSM using backend-private memory" case, and (4) some cosmetic tweaks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Thomas Munro
On Tue, Dec 20, 2016 at 11:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> Here's a new version to apply on top of dsa-v7.patch.
>>
>> Here's a version to go with dsa-v8.patch.
>
> All right, so I've committed this, but not before (1) renaming some
> things that you added, (2) adding documentation for the new LWLock
> tranche, (3) not creating the DSA in the "simulated DSM using
> backend-private memory" case, and (4) some cosmetic tweaks.

Thanks! Hmm, so now in rare circumstances we can finish up running ExecXXXInitializeDSM, but later have estate->es_query_dsa == NULL. This is something to be careful about.

--
Thomas Munro
http://www.enterprisedb.com
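[A sketch of the defensive pattern this caveat implies for parallel-aware node code; MyNodeWorkspace and my_node_setup_workspace are invented names, and the backend-local fallback shown is illustrative rather than from any patch. When parallel mode is simulated with backend-private memory, no DSA area is created, so such code cannot assume es_query_dsa is set.]

#include "postgres.h"
#include "nodes/execnodes.h"
#include "utils/dsa.h"

typedef struct MyNodeWorkspace
{
	dsa_pointer shared;			/* valid only when es_query_dsa is present */
	void	   *local;			/* backend-private fallback */
} MyNodeWorkspace;

static void
my_node_setup_workspace(MyNodeWorkspace *ws, EState *estate, Size size)
{
	if (estate->es_query_dsa != NULL)
		ws->shared = dsa_allocate(estate->es_query_dsa, size);
	else
		ws->local = palloc0(size);	/* no DSA area: use local memory */
}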
From: Robert Haas
On Mon, Dec 19, 2016 at 6:35 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Tue, Dec 20, 2016 at 11:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> All right, so I've committed this, but not before (1) renaming some
>> things that you added, (2) adding documentation for the new LWLock
>> tranche, (3) not creating the DSA in the "simulated DSM using
>> backend-private memory" case, and (4) some cosmetic tweaks.
>
> Thanks! Hmm, so now in rare circumstances we can finish up running
> ExecXXXInitializeDSM, but later have estate->es_query_dsa == NULL.
> This is something to be careful about.

Yes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Kyotaro HORIGUCHI
At Mon, 19 Dec 2016 12:24:38 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoY7cHfGQRjYRixEsvBp63Ud9AgBuksc4oWS-Lu7tX5=TA@mail.gmail.com>:
> Interesting idea. My bet is that nobody cares about dtrace very much.

FWIW, I recently had an inquiry about SystemTap for PostgreSQL, but the user was probing by function name. I mean, at least a few people care about it, not nobody, but...

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center