Thread: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
Add a new GUC and a reloption to enable inserts in parallel-mode. Commit 05c8482f7f added the implementation of parallel SELECT for "INSERT INTO ... SELECT ..." which may incur non-negligible overhead in the additional parallel-safety checks that it performs, even when, in the end, those checks determine that parallelism can't be used. This is normally only ever a problem in the case of when the target table has a large number of partitions. A new GUC option "enable_parallel_insert" is added, to allow insert in parallel-mode. The default is on. In addition to the GUC option, the user may want a mechanism to allow inserts in parallel-mode with finer granularity at table level. The new table option "parallel_insert_enabled" allows this. The default is true. Author: "Hou, Zhijie" Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/c8f78b616167bf8e24bc5dc69112c37755ed3058 Modified Files -------------- doc/src/sgml/config.sgml | 23 +++++++++++ doc/src/sgml/ref/alter_table.sgml | 3 +- doc/src/sgml/ref/create_table.sgml | 26 +++++++++++++ src/backend/access/common/reloptions.c | 25 +++++++++--- src/backend/optimizer/path/costsize.c | 2 + src/backend/optimizer/util/clauses.c | 34 ++++++++++++++-- src/backend/utils/misc/guc.c | 10 +++++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/bin/psql/tab-complete.c | 1 + src/include/optimizer/cost.h | 1 + src/include/utils/rel.h | 25 ++++++++++++ src/test/regress/expected/insert_parallel.out | 56 ++++++++++++++++++++++++++- src/test/regress/expected/sysviews.out | 3 +- src/test/regress/sql/insert_parallel.sql | 44 ++++++++++++++++++++- src/tools/pgindent/typedefs.list | 1 + 15 files changed, 240 insertions(+), 15 deletions(-)
On Wed, Mar 17, 2021 at 10:14 PM Amit Kapila <akapila@postgresql.org> wrote: > Add a new GUC and a reloption to enable inserts in parallel-mode. > > Commit 05c8482f7f added the implementation of parallel SELECT for > "INSERT INTO ... SELECT ..." which may incur non-negligible overhead in > the additional parallel-safety checks that it performs, even when, in the > end, those checks determine that parallelism can't be used. This is > normally only ever a problem in the case of when the target table has a > large number of partitions. > > A new GUC option "enable_parallel_insert" is added, to allow insert in > parallel-mode. The default is on. > > In addition to the GUC option, the user may want a mechanism to allow > inserts in parallel-mode with finer granularity at table level. The new > table option "parallel_insert_enabled" allows this. The default is true. I find this fairly ugly. If you can't make the cost of checking whether parallelism is safe low enough that you don't need a setting for this, then I think perhaps you shouldn't have the feature at all. In other words, I propose that you revert both this and 05c8482f7f and come back when you have a better design that doesn't introduce so much overhead. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > I find this fairly ugly. If you can't make the cost of checking > whether parallelism is safe low enough that you don't need a setting > for this, then I think perhaps you shouldn't have the feature at all. > In other words, I propose that you revert both this and 05c8482f7f and > come back when you have a better design that doesn't introduce so much > overhead. I'm +1 on that idea for a completely independent reason: 05c8482f7f is currently the easy winner for "scariest patch of the v14 cycle". I don't have any faith in it, and so I'm very concerned that it went in so late in the dev cycle. It'd be better to push some successor design early in the v15 cycle, when we'll have more time to catch problems. regards, tom lane
On Mon, Mar 22, 2021 at 8:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > I find this fairly ugly. If you can't make the cost of checking > > whether parallelism is safe low enough that you don't need a setting > > for this, then I think perhaps you shouldn't have the feature at all. > > In other words, I propose that you revert both this and 05c8482f7f and > > come back when you have a better design that doesn't introduce so much > > overhead. > > I'm +1 on that idea for a completely independent reason: 05c8482f7f > is currently the easy winner for "scariest patch of the v14 cycle". > I don't have any faith in it, and so I'm very concerned that it went > in so late in the dev cycle. It'd be better to push some successor > design early in the v15 cycle, when we'll have more time to catch > problems. > Okay, I can revert this work to avoid that risk but it would be great if you can share your thoughts on what alternative design do you have in mind, and or how it should be better implemented? Regarding performance overhead, it is for partitioned tables with a large number of partitions, and that too when the data to insert is not that much or there is parallel-unsafe clause on one of the partitions. Now, how else can we determine the parallel-safety without checking each of the partitions? We do have other partition-related gucs (enable_partition*) to avoid the partitions-related overhead so why is it so bad to have guc here (maybe the naming of guc/reloption is not good)? -- With Regards, Amit Kapila.
RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
"tsunakawa.takay@fujitsu.com"
Date:
From: Amit Kapila <amit.kapila16@gmail.com> > Okay, I can revert this work to avoid that risk but it would be great > if you can share your thoughts on what alternative design do you have > in mind, and or how it should be better implemented? Regarding > performance overhead, it is for partitioned tables with a large number > of partitions, and that too when the data to insert is not that much > or there is parallel-unsafe clause on one of the partitions. Now, how > else can we determine the parallel-safety without checking each of the > partitions? We do have other partition-related gucs > (enable_partition*) to avoid the partitions-related overhead so why is > it so bad to have guc here (maybe the naming of guc/reloption is not > good)? > I'd love to hear even rough ideas from Robert-san. I guess something like...: * Basically, we can assume or hope that all partitions have the same parallel safety as the root partitioned table. Thatis, it'd be very rare that child partitions have different indexes, constraints, and triggers. So, we can just checkthe root table during planning to decide if we want to run parallel processing. * When the executor opens a target child partition, it checks its parallel safety only once for it. If any target partitionturns out to be parallel unsafe, have the parallel leader do the insertion shomehow. TBH, I don't think the parameter is so ugly. At least, it's not worse than Oracle or SQL Server. Regards Takayuki Tsunakawa
Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
Andres Freund
Date:
Hi, On 2021-03-22 08:47:47 -0400, Robert Haas wrote: > I find this fairly ugly. If you can't make the cost of checking > whether parallelism is safe low enough that you don't need a setting > for this, then I think perhaps you shouldn't have the feature at all. > In other words, I propose that you revert both this and 05c8482f7f and > come back when you have a better design that doesn't introduce so much > overhead. I started out wondering whether some of the caching David Rowley has been working on to reduce the overhead of the result cache planning shows a path for how to make parts of this cheaper. https://postgr.es/m/CAApHDvpw5qG4vTb%2BhwhmAJRid6QF%2BR9vvtOhX2HN2Yy9%2Bw20sw%40mail.gmail.com But looking more, several of the checks just seem wrong to me. target_rel_index_max_parallel_hazard() deparses index expressions from scratch? With code like + index_rel = index_open(index_oid, lockmode); ... + index_oid_list = RelationGetIndexList(rel); + foreach(lc, index_oid_list) ... + ii_Expressions = RelationGetIndexExpressions(index_rel); ... + Assert(index_expr_item != NULL); + if (index_expr_item == NULL) /* shouldn't happen */ + { + elog(WARNING, "too few entries in indexprs list"); + context->max_hazard = PROPARALLEL_UNSAFE; + found_max_hazard = true; + break; + } Brrr. Shouldn't we have this in IndexOptInfo already? But also, why on earth is that WARNING branch a good idea? +static bool +target_rel_domain_max_parallel_hazard(Oid typid, max_parallel_hazard_context *context) ... + scan = systable_beginscan(con_rel, ConstraintTypidIndexId, true, + NULL, 1, key); + + while (HeapTupleIsValid((tup = systable_getnext(scan)))) There's plenty other code in the planner that needs to know about domains. This stuff is precisely why the typecache exists. The pattern to rebuild information that we already have cached elsewhere seems to repeat all over this patch. This seems not even close to committable. Greetings, Andres Freund
Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
Andres Freund
Date:
Hi, On 2021-03-23 09:01:13 +0530, Amit Kapila wrote: > Okay, I can revert this work to avoid that risk but it would be great > if you can share your thoughts on what alternative design do you have > in mind, and or how it should be better implemented? Regarding > performance overhead, it is for partitioned tables with a large number > of partitions, and that too when the data to insert is not that much > or there is parallel-unsafe clause on one of the partitions. Now, how > else can we determine the parallel-safety without checking each of the > partitions? You cache it. And actually use information that is already cached. There's a huge difference between doing expensive stuff once every now and then, and doing it over and over and over. That's why we have relcache, syscache, typecache etc. > We do have other partition-related gucs (enable_partition*) to avoid > the partitions-related overhead so why is it so bad to have guc here > (maybe the naming of guc/reloption is not good)? Most of those seem more about risk-reduction. And they don't have reloptions - which seems to be the only way this feature can realistically be used. The one halfway comparable GUC is enable_partitionwise_join - while it'd be great to not need it (or at least not default to off), it implies some legitimately computationally hard work that can't easily be cached. Greetings, Andres Freund
On Tue, Mar 23, 2021 at 3:13 PM Andres Freund <andres@anarazel.de> wrote: > You cache it. Yeah, exactly. I don't think it's super-easy to understand exactly how to make that work well for something like this. It would be easy enough to set a flag in the relcache whose value is computed the first time we need it and is then consulted every time after that, and you just invalidate it based on sinval messages. But, if you go with that design, you've got a big problem: now an insert has to lock all the tables in the partitioning hierarchy to decide whether it can run in parallel or not, and we do not want that. We want to be able to just lock the partitions we need, so really, we want to be able to test for parallel-safety without requiring a relation lock, or only requiring it on the partitioned table itself and not all the partitions. However, that introduces a race condition, because if you ever check the value of the flag without a lock on the relation, then you can't rely on sinval to blow away the cached state. I don't have a good solution to that problem in mind right now, because I haven't really devoted much time to thinking about it, but I think it might be possible to solve it with more thought. But if I had thought about it and had not come up with anything better than what you committed here, I would have committed nothing, and I think that's what you should have done, too. This patch is full of grotty hacks. Just to take one example, consider the code that forces a transaction ID assignment prior to the operation. You *could* have solved this problem by introducing new machinery to make it safe to assign an XID in parallel mode. Then, we'd have a fundamental new capability that we currently lack. Instead, you just force-assigned an XID before entering parallel mode. That's not building any new capability; that's just hacking around the lack of a capability to make something work, while ignoring the disadvantages of doing it that way, namely that sometimes an XID will be assigned for no purpose. Likewise, the XXX comment you added to max_parallel_hazard_walker claims that some of the code introduced there is to compensate for an unspecified bug in the rewriter. I'm a bit skeptical that the comment is correct, and there's no way to find out because the comment doesn't say what the bug supposedly is, but let's just say for the sake of argument that it's true. Well, you *could* have fixed the bug, but instead you hacked around it, and in a relatively expensive way that affects every query with a CTE in it whether it can benefit from this patch or not. That's not a responsible way of maintaining the core PostgreSQL code. I also agree with Andres's criticism of the code in target_rel_index_max_parallel_hazard entirely. It's completely unacceptable to be doing index_open() here. If you don't understand the design of the planner well enough to know why that's not OK, then you shouldn't be committing patches like this. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
Alvaro Herrera
Date:
On 2021-Mar-23, Robert Haas wrote: > Likewise, the XXX comment you added to max_parallel_hazard_walker > claims that some of the code introduced there is to compensate for an > unspecified bug in the rewriter. I'm a bit skeptical that the comment > is correct, and there's no way to find out because the comment doesn't > say what the bug supposedly is, but let's just say for the sake of > argument that it's true. Well, you *could* have fixed the bug, but > instead you hacked around it, and in a relatively expensive way that > affects every query with a CTE in it whether it can benefit from this > patch or not. That's not a responsible way of maintaining the core > PostgreSQL code. I think the CTE bug is this one: https://www.postgresql.org/message-id/flat/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com while I can't disagree with the overall conclusion that it seems safer to revert parallel INSERT/SELECT given the number of alleged problems, it is true that this bug exists, and has gone unfixed. -- Álvaro Herrera Valdivia, Chile
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2021-Mar-23, Robert Haas wrote: >> Likewise, the XXX comment you added to max_parallel_hazard_walker >> claims that some of the code introduced there is to compensate for an >> unspecified bug in the rewriter. > I think the CTE bug is this one: > https://www.postgresql.org/message-id/flat/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com > while I can't disagree with the overall conclusion that it seems safer > to revert parallel INSERT/SELECT given the number of alleged problems, > it is true that this bug exists, and has gone unfixed. Yeah, because it's not totally clear whether we want to fix it by disallowing the case, or by significantly changing the way the rewriter works. It'd be good if some of the folks on this thread weighed in on that choice. (Having said that, another complaint about this particular comment is that it doesn't mention how the planner should be changed once the rewriter is fixed. Is it sufficient to delete the stanza below the comment? Just looking at it, I wonder whether max_parallel_hazard_walker is now being invoked on portions of the Query tree that we'd not have to traverse at all given a rewriter fix.) regards, tom lane
On Wed, Mar 24, 2021 at 2:00 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2021-Mar-23, Robert Haas wrote: > > > Likewise, the XXX comment you added to max_parallel_hazard_walker > > claims that some of the code introduced there is to compensate for an > > unspecified bug in the rewriter. I'm a bit skeptical that the comment > > is correct, and there's no way to find out because the comment doesn't > > say what the bug supposedly is, but let's just say for the sake of > > argument that it's true. Well, you *could* have fixed the bug, but > > instead you hacked around it, and in a relatively expensive way that > > affects every query with a CTE in it whether it can benefit from this > > patch or not. That's not a responsible way of maintaining the core > > PostgreSQL code. > > I think the CTE bug is this one: > > https://www.postgresql.org/message-id/flat/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com > > while I can't disagree with the overall conclusion that it seems safer > to revert parallel INSERT/SELECT given the number of alleged problems, > Agreed. I'll revert the patch and respond to some of the points here with my thoughts. -- With Regards, Amit Kapila.
On Wed, Mar 24, 2021 at 2:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > On 2021-Mar-23, Robert Haas wrote: > >> Likewise, the XXX comment you added to max_parallel_hazard_walker > >> claims that some of the code introduced there is to compensate for an > >> unspecified bug in the rewriter. > > > I think the CTE bug is this one: > > https://www.postgresql.org/message-id/flat/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com > > while I can't disagree with the overall conclusion that it seems safer > > to revert parallel INSERT/SELECT given the number of alleged problems, > > it is true that this bug exists, and has gone unfixed. > > Yeah, because it's not totally clear whether we want to fix it by > disallowing the case, or by significantly changing the way the > rewriter works. It'd be good if some of the folks on this thread > weighed in on that choice. > > (Having said that, another complaint about this particular comment > is that it doesn't mention how the planner should be changed once > the rewriter is fixed. Is it sufficient to delete the stanza below > the comment? > Yes. > Just looking at it, I wonder whether > max_parallel_hazard_walker is now being invoked on portions of the > Query tree that we'd not have to traverse at all given a rewriter fix.) > I remember that I have debugged that part and information was available, so decided to add a check based on that. I just went with your analysis in the CTE bug discussion that it is not worth and OTOH here that information was available, so I thought that it might be a good compromise but still, I or someone else could have weighed in on that choice. -- With Regards, Amit Kapila.
RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
"tsunakawa.takay@fujitsu.com"
Date:
From: Robert Haas <robertmhaas@gmail.com> > Yeah, exactly. I don't think it's super-easy to understand exactly how > to make that work well for something like this. It would be easy > enough to set a flag in the relcache whose value is computed the first > time we need it and is then consulted every time after that, and you > just invalidate it based on sinval messages. But, if you go with that > design, you've got a big problem: now an insert has to lock all the > tables in the partitioning hierarchy to decide whether it can run in > parallel or not, and we do not want that. We want to be able to just > lock the partitions we need, so really, we want to be able to test for > parallel-safety without requiring a relation lock, or only requiring > it on the partitioned table itself and not all the partitions. > However, that introduces a race condition, because if you ever check > the value of the flag without a lock on the relation, then you can't > rely on sinval to blow away the cached state. I don't have a good > solution to that problem in mind right now, because I haven't really > devoted much time to thinking about it, but I think it might be > possible to solve it with more thought. One problem with caching the result is that the first access in each session has to experience the slow processing. Somesevere customers of our proprietary database, which is not based on Postgres, have requested to eliminate even the overheadassociated with the first access, and we have provided features for them. As for the data file, users can use pg_prewam. But what can we recommend users to do in this case? Maybe the logon trigger feature, which is ready for committerin PG 14, can be used to allow users to execute possible queries at session start (or establishing a connectionpool), but I feel it's inconvenient. > But if I had thought about it and had not come up with anything better > than what you committed here, I would have committed nothing, and I > think that's what you should have done, too. This patch is full of > grotty hacks. Just to take one example, consider the code that forces > a transaction ID assignment prior to the operation. You *could* have > solved this problem by introducing new machinery to make it safe to > assign an XID in parallel mode. Then, we'd have a fundamental new > capability that we currently lack. Instead, you just force-assigned an > XID before entering parallel mode. That's not building any new > capability; that's just hacking around the lack of a capability to > make something work, while ignoring the disadvantages of doing it that > way, namely that sometimes an XID will be assigned for no purpose. Regarding the picked xid assignment, I didn't think it's so grotty. Yes, in fact, I felt it's a bit unclean. But it's onlya single line of code. With a single line of code, we can provide great value to users. Why don't we go for it? Asdiscussed in the thread, the xid is wasted only when the source data is empty, which is impractical provided that the userwants to load much data probably for ETL. (I'm afraid "grotty" may be too strong a word considering the CoC statement "We encourage thoughtful, constructive discussionof the software and this community, their current state, and possible directions for development. The focus ofour discussions should be the code and related technology, community projects, and infrastructure.") > Likewise, the XXX comment you added to max_parallel_hazard_walker > claims that some of the code introduced there is to compensate for an > unspecified bug in the rewriter. I'm a bit skeptical that the comment > is correct, and there's no way to find out because the comment doesn't > say what the bug supposedly is, but let's just say for the sake of > argument that it's true. Well, you *could* have fixed the bug, but > instead you hacked around it, and in a relatively expensive way that > affects every query with a CTE in it whether it can benefit from this > patch or not. That's not a responsible way of maintaining the core > PostgreSQL code. It'd be too sad if we have to be bothered by an existing bug and give up an attractive feature. Adding more explanationin the comment is OK? Anyway, I think we can separate this issue. > I also agree with Andres's criticism of the code in > target_rel_index_max_parallel_hazard entirely. It's completely > unacceptable to be doing index_open() here. If you don't understand > the design of the planner well enough to know why that's not OK, then > you shouldn't be committing patches like this. This sounds like something to address. I have to learn... Regards Takayuki Tsunakawa
Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
Greg Nancarrow
Date:
On Wed, Mar 24, 2021 at 5:44 AM Andres Freund <andres@anarazel.de> wrote: > > > But looking more, several of the checks just seem wrong to me. > > target_rel_index_max_parallel_hazard() deparses index expressions from > scratch? With code like > > + index_rel = index_open(index_oid, lockmode); > ... > + index_oid_list = RelationGetIndexList(rel); > + foreach(lc, index_oid_list) > ... > + ii_Expressions = RelationGetIndexExpressions(index_rel); > ... > > + Assert(index_expr_item != NULL); > + if (index_expr_item == NULL) /* shouldn't happen */ > + { > + elog(WARNING, "too few entries in indexprs list"); > + context->max_hazard = PROPARALLEL_UNSAFE; > + found_max_hazard = true; > + break; > + } > > Brrr. > > Shouldn't we have this in IndexOptInfo already? The additional parallel-safety checks are (at least currently) invoked as part of max_parallel_hazard(), which is called early on in the planner, so I don't believe that IndexOptInfo/RelOptInfo has been built at this point. > But also, why on earth > is that WARNING branch a good idea? > I think there are about half a dozen other places in the Postgres code where the same check is done, in which case it calls elog(ERROR,...). Is it a better strategy to immediately exit the backend with an error in this case, like the other cases do? My take on it was that if this "should never happen" condition is ever encountered, let it back out of the parallel-safety checks and error-out in the place it normally (currently) would. > > +static bool > +target_rel_domain_max_parallel_hazard(Oid typid, max_parallel_hazard_context *context) > ... > + scan = systable_beginscan(con_rel, ConstraintTypidIndexId, true, > + NULL, 1, key); > + > + while (HeapTupleIsValid((tup = systable_getnext(scan)))) > > There's plenty other code in the planner that needs to know about > domains. This stuff is precisely why the typecache exists. > > OK, fair comment. Regards, Greg Nancarrow Fujitsu Australia
Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
Andres Freund
Date:
Hi, On 2021-03-24 14:42:44 +1100, Greg Nancarrow wrote: > On Wed, Mar 24, 2021 at 5:44 AM Andres Freund <andres@anarazel.de> wrote: > > Shouldn't we have this in IndexOptInfo already? > > The additional parallel-safety checks are (at least currently) invoked > as part of max_parallel_hazard(), which is called early on in the > planner, so I don't believe that IndexOptInfo/RelOptInfo has been > built at this point. Then that's something you need to redesign, not duplicate the effort. > > But also, why on earth > > is that WARNING branch a good idea? > I think there are about half a dozen other places in the Postgres code > where the same check is done, in which case it calls elog(ERROR,...). > Is it a better strategy to immediately exit the backend with an error > in this case, like the other cases do? Yes. > My take on it was that if this "should never happen" condition is ever > encountered, let it back out of the parallel-safety checks and > error-out in the place it normally (currently) would. What advantage does that have? You'll get a bunch of WARNINGs before the ERROR later in optimize, differences between assert-non/assert builds, more complicated code flow, longer untested code. Greetings, Andres Freund
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Mar 23, 2021 at 3:13 PM Andres Freund <andres@anarazel.de> wrote: >> You cache it. > Yeah, exactly. I don't think it's super-easy to understand exactly how > to make that work well for something like this. It would be easy > enough to set a flag in the relcache whose value is computed the first > time we need it and is then consulted every time after that, and you > just invalidate it based on sinval messages. But, if you go with that > design, you've got a big problem: now an insert has to lock all the > tables in the partitioning hierarchy to decide whether it can run in > parallel or not, and we do not want that. Possibly-crazy late-night idea ahead: IIUC, we need to know a global property of a partitioning hierarchy: is every trigger, CHECK constraint, etc that might be run by an INSERT parallel-safe? What we see here is that reverse-engineering that property every time we need to know it is just too expensive, even with use of our available caching methods. How about a declarative approach instead? That is, if a user would like parallelized inserts into a partitioned table, she must declare the table parallel-safe with some suitable annotation. Then, checking the property during DML is next door to free, and instead we have to think about whether and how to enforce that the marking is valid during DDL. I don't honestly see a real cheap way to enforce such a property. For instance, if someone does ALTER FUNCTION to remove a function's parallel-safe marking, we can't really run around and verify that the function is not used in any CHECK constraint. (Aside from the cost, there would be race conditions.) But maybe we don't have to enforce it exactly. It could be on the user's head that the marking is accurate. We could prevent any really bad misbehavior by having parallel workers error out if they see they've been asked to execute a non-parallel-safe function. Or there are probably other ways to slice it up. But I think some outside-the-box thinking might be helpful here. regards, tom lane
On Wed, Mar 24, 2021 at 8:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Mar 24, 2021 at 2:00 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On 2021-Mar-23, Robert Haas wrote: > > > > > Likewise, the XXX comment you added to max_parallel_hazard_walker > > > claims that some of the code introduced there is to compensate for an > > > unspecified bug in the rewriter. I'm a bit skeptical that the comment > > > is correct, and there's no way to find out because the comment doesn't > > > say what the bug supposedly is, but let's just say for the sake of > > > argument that it's true. Well, you *could* have fixed the bug, but > > > instead you hacked around it, and in a relatively expensive way that > > > affects every query with a CTE in it whether it can benefit from this > > > patch or not. That's not a responsible way of maintaining the core > > > PostgreSQL code. > > > > I think the CTE bug is this one: > > > > https://www.postgresql.org/message-id/flat/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com > > > > while I can't disagree with the overall conclusion that it seems safer > > to revert parallel INSERT/SELECT given the number of alleged problems, > > > > Agreed. I'll revert the patch and respond to some of the points here > with my thoughts. > BTW, I think the revert needs catversion bump as commit c5be48f092016b1caf597b2e21d588b56c88a23e has changed catalog. I hope that is fine but if not do let me know? -- With Regards, Amit Kapila.
Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
Andres Freund
Date:
Hi, On 2021-03-23 16:04:59 -0400, Robert Haas wrote: > It would be easy enough to set a flag in the relcache whose value is > computed the first time we need it and is then consulted every time > after that, and you just invalidate it based on sinval messages. But, > if you go with that design, you've got a big problem: now an insert > has to lock all the tables in the partitioning hierarchy to decide > whether it can run in parallel or not, and we do not want that. We > want to be able to just lock the partitions we need, so really, we > want to be able to test for parallel-safety without requiring a > relation lock, or only requiring it on the partitioned table itself > and not all the partitions. That seems like an argument for a pg_class attribute about parallel safety of the current relation *and* its children. It'd definitely mean recursing higher in the partition tree during DDL if the action on a child partition causes the safety to flip. > But if I had thought about it and had not come up with anything better > than what you committed here, I would have committed nothing, and I > think that's what you should have done, too. I agree with that. > Just to take one example, consider the code that forces a transaction > ID assignment prior to the operation. You *could* have solved this > problem by introducing new machinery to make it safe to assign an XID > in parallel mode. Then, we'd have a fundamental new capability that we > currently lack. Instead, you just force-assigned an XID before > entering parallel mode. That's not building any new capability; that's > just hacking around the lack of a capability to make something work, > while ignoring the disadvantages of doing it that way, namely that > sometimes an XID will be assigned for no purpose. Although this specific hack doesn't seem too terrible to me. If you execute a parallel insert the likelihood to end up not needing an xid is pretty low. Implementing it concurrently does seem like it'd end up needing another lwlock nested around xid assignment, or some more complicated scheme with already holding XidGenLock or retries. But maybe I'm missing an easy solution here. Greetings, Andres Freund
Amit Kapila <amit.kapila16@gmail.com> writes: > BTW, I think the revert needs catversion bump as commit > c5be48f092016b1caf597b2e21d588b56c88a23e has changed catalog. I hope > that is fine but if not do let me know? Yeah, just advance catversion another time. regards, tom lane
RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tom Lane <tgl@sss.pgh.pa.us> > Possibly-crazy late-night idea ahead: > > IIUC, we need to know a global property of a partitioning hierarchy: > is every trigger, CHECK constraint, etc that might be run by an INSERT > parallel-safe? What we see here is that reverse-engineering that > property every time we need to know it is just too expensive, even > with use of our available caching methods. > > How about a declarative approach instead? That is, if a user would > like parallelized inserts into a partitioned table, she must declare > the table parallel-safe with some suitable annotation. Then, checking > the property during DML is next door to free, and instead we have to think > about whether and how to enforce that the marking is valid during DDL. > > I don't honestly see a real cheap way to enforce such a property. > For instance, if someone does ALTER FUNCTION to remove a function's > parallel-safe marking, we can't really run around and verify that the > function is not used in any CHECK constraint. (Aside from the cost, > there would be race conditions.) I thought of a similar idea as below, which I was most reluctant to adopt. https://www.postgresql.org/message-id/TYAPR01MB29907AE025B60A1C2CA5F08DFEA70%40TYAPR01MB2990.jpnprd01.prod.outlook.com -------------------------------------------------- (3) Record the parallel safety in system catalog Add a column like relparallel in pg_class that indicates the parallel safety of the relation. planner just checks the valueinstead of doing heavy work for every SQL statement. That column's value is modified whenever a relation alterationis made that affects the parallel safety, such as adding a domain column and CHECK constraint. In case of a partitionedrelation, the parallel safety attributes of all its descendant relations are merged. For example, if a partitionbecomes parallel-unsafe, the ascendant partitioned tables also become parallel-unsafe. But... developing such code would be burdonsome and bug-prone? -------------------------------------------------- > But maybe we don't have to enforce it exactly. It could be on the > user's head that the marking is accurate. We could prevent any > really bad misbehavior by having parallel workers error out if they > see they've been asked to execute a non-parallel-safe function. I'm wondering if we can do so as I mentioned yesterday; the parallel worker delegates the work to the parallel leader whenthe target relation or related functions is not parallel-safe. Regards Takayuki Tsunakawa
On Tue, Mar 23, 2021 at 11:13 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > One problem with caching the result is that the first access in each session has to experience the slow processing. Somesevere customers of our proprietary database, which is not based on Postgres, have requested to eliminate even the overheadassociated with the first access, and we have provided features for them. As for the data file, users can use pg_prewam. But what can we recommend users to do in this case? Maybe the logon trigger feature, which is ready for committerin PG 14, can be used to allow users to execute possible queries at session start (or establishing a connectionpool), but I feel it's inconvenient. Well, I don't mind if somebody thinks up an even better solution. > Regarding the picked xid assignment, I didn't think it's so grotty. Yes, in fact, I felt it's a bit unclean. But it'sonly a single line of code. With a single line of code, we can provide great value to users. Why don't we go for it? As discussed in the thread, the xid is wasted only when the source data is empty, which is impractical provided thatthe user wants to load much data probably for ETL. The amount of code isn't the issue. I'd rather expend a little more code and solve the problem in a better way. > (I'm afraid "grotty" may be too strong a word considering the CoC statement "We encourage thoughtful, constructive discussionof the software and this community, their current state, and possible directions for development. The focus ofour discussions should be the code and related technology, community projects, and infrastructure.") I did not mean to give offense, but I also don't think grotty is a strong word. I consider it a pretty mild word. > > Likewise, the XXX comment you added to max_parallel_hazard_walker > > claims that some of the code introduced there is to compensate for an > > unspecified bug in the rewriter. I'm a bit skeptical that the comment > > is correct, and there's no way to find out because the comment doesn't > > say what the bug supposedly is, but let's just say for the sake of > > argument that it's true. Well, you *could* have fixed the bug, but > > instead you hacked around it, and in a relatively expensive way that > > affects every query with a CTE in it whether it can benefit from this > > patch or not. That's not a responsible way of maintaining the core > > PostgreSQL code. > > It'd be too sad if we have to be bothered by an existing bug and give up an attractive feature. Adding more explanationin the comment is OK? Anyway, I think we can separate this issue. I don't think I agree. These checks are adding a significant amount of overhead, and one of the problems with this whole thing is that it adds a lot of overhead. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Mar 24, 2021 at 12:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > How about a declarative approach instead? That is, if a user would > like parallelized inserts into a partitioned table, she must declare > the table parallel-safe with some suitable annotation. Then, checking > the property during DML is next door to free, and instead we have to think > about whether and how to enforce that the marking is valid during DDL. > > I don't honestly see a real cheap way to enforce such a property. > For instance, if someone does ALTER FUNCTION to remove a function's > parallel-safe marking, we can't really run around and verify that the > function is not used in any CHECK constraint. (Aside from the cost, > there would be race conditions.) > > But maybe we don't have to enforce it exactly. It could be on the > user's head that the marking is accurate. We could prevent any > really bad misbehavior by having parallel workers error out if they > see they've been asked to execute a non-parallel-safe function. > > Or there are probably other ways to slice it up. But I think some > outside-the-box thinking might be helpful here. It's a possibility. It's a bit different from my decision to mark functions as PARALLEL SAFE/RESTRICTED/UNSAFE because, if you wanted to deduce that without an explicit marking, you'd need to solve the halting problem, which I've heard is fairly difficult. In this case, though, the problem is solvable with a linear-time algorithm. It's possible that there is nothing better, but I'm not sure. One idea would be to try to cache some state in shared memory. That wouldn't work for an unbounded number of relations, at least not unless we used DSA, but you could have a hash table with a configurable number of slots and make the default big enough that it would bother few people in practice. There might be some other details about partitioned relations that would be useful to cache, too. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Mar 24, 2021 at 12:48 AM Andres Freund <andres@anarazel.de> wrote: > That seems like an argument for a pg_class attribute about parallel > safety of the current relation *and* its children. It'd definitely mean > recursing higher in the partition tree during DDL if the action on a > child partition causes the safety to flip. That could be an issue, both because locking parents is not very nice from a concurrency standpoint, and also because it introduces deadlock hazards. > Although this specific hack doesn't seem too terrible to me. If you > execute a parallel insert the likelihood to end up not needing an xid is > pretty low. Implementing it concurrently does seem like it'd end up > needing another lwlock nested around xid assignment, or some more > complicated scheme with already holding XidGenLock or retries. But maybe > I'm missing an easy solution here. I don't think you need to do anything that is known outside the group of processes involved in the parallel query. I think you just need to make sure that only one of them is trying to acquire an XID at a time, and that all the others find out about it. I haven't thought too hard about the timing: if one process acquires an XID for the transaction, is it OK if the others do an arbitrary amount of work before they realize that this has happened? Also, there's the problem that the leader has the whole transaction stack and the workers don't, so the recursive nature of XID acquisition is a problem. I suspect these are all pretty solvable problems; I just haven't put in the energy. But, it could also be that I'm missing something. I think we should be trying, though. Some of the limitations of parallel query are unavoidable: there's always going to be a certain amount of parallel-unsafe stuff out there, and we just have to not use parallelism in those cases. But, some things - and I think this is probably one of them - are just limitations of the current implementation, and we should be looking to fix those. If we just accept that the infrastructure limitations are what they are and skate around them to make a few more things work, we're going to get less and less real improvement from every new project, and any future infrastructure improvements that somebody does want to make are going to have to deal with all the odd special cases we've introduced to get those improvements. Now, just to be clear, I'm completely in favor of incremental improvements in cases where that can be done without too much ugliness. There have been some great examples of that with parallel query already, like the work David Rowley did to allow parallel sequential scans to allocate multiple blocks at a time, or, looking back further, his work on parallel aggregate. I'm not saying those projects were easy; I know they were hard. But, they could basically use the infrastructure that had been created by previous commits to new stuff in a way that was reasonably localized. But, when Thomas Munro worked on parallel hash join, he had to first invent a whole new way of allocating memory. Trying to do that project without inventing a new way to allocate memory would have been a really bad decision. It would have boxed us into all sorts of unpleasant corners where a lot of stuff didn't really work and finding any further improvements was hard. But, because he did invent that new way, it can now be used for lots of other things, and already has been. That infrastructure work made future projects *easier*. And one of the big problems that I have with this feature is that, as it seems to me right now, there wasn't any real new infrastructure built, even though doing this right really requires it. I think there is room for debate about whether the particular thing that I'm complaining about here is mandatory for this feature, although I am strongly of the position that should be tried. We might consider forcing an XID assignment to be acceptable for a bulk-insert case, but what happens when somebody wants to extend this to updates and deletes? There's going to be a strong temptation to just double down on the same design, and that's a very dangerous direction in my opinion. But, even if you don't think that this *particular* infrastructure improvement ought to be the job of this patch, I think this patch clearly needed to do more infrastructure improvement than it did. Your comments about the IndexOptInfo stuff are another example, and the rewriter bug is a third. -- Robert Haas EDB: http://www.enterprisedb.com
RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
"tsunakawa.takay@fujitsu.com"
Date:
From: Robert Haas <robertmhaas@gmail.com> > The amount of code isn't the issue. I'd rather expend a little more > code and solve the problem in a better way. Reading your reply to Andres-san, I feel sympathy about your attitude. Maybe we should outline the (rough) design first,discuss/guess its complexity, and ask for opinions on whether it's worth expending our effort for the complexity orchoose an easier hack, considering the assumed use case and ensuring future extensibility not to prevent smooth enhancements. With that said, I think the easy hack this time is good, because parallel INSERT/UPDATE/DELETE will only be used infrequently(compared to short OLTP transactions) for data migration and batch processing where the source data is ample. I don't think of a use case yet where we want to make parallel workers allocate an XID when only necessary. > I did not mean to give offense, but I also don't think grotty is a > strong word. I consider it a pretty mild word. Thank you for telling me the impression of the word. TBH, I knew the word for the first time, and looked it up in English-Japanesedictionary. It didn't give good impression, and grotty sounds somewhat similar to grotesque, so I got worried"Robert-san may be angry." Regards Takayuki Tsunakawa
On Wed, Mar 24, 2021 at 5:45 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Mar 24, 2021 at 12:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > How about a declarative approach instead? That is, if a user would > > like parallelized inserts into a partitioned table, she must declare > > the table parallel-safe with some suitable annotation. Then, checking > > the property during DML is next door to free, and instead we have to think > > about whether and how to enforce that the marking is valid during DDL. > > > > I don't honestly see a real cheap way to enforce such a property. > > For instance, if someone does ALTER FUNCTION to remove a function's > > parallel-safe marking, we can't really run around and verify that the > > function is not used in any CHECK constraint. (Aside from the cost, > > there would be race conditions.) > > > > But maybe we don't have to enforce it exactly. It could be on the > > user's head that the marking is accurate. We could prevent any > > really bad misbehavior by having parallel workers error out if they > > see they've been asked to execute a non-parallel-safe function. > > If we want to do something like this then we might want to provide a function is_dml_rel_parallel_safe(relid/relname) (a better name for a function could be used) where we can check all global properties of relation and report whether it is safe or not to perform dml and additionally we can report the unsafe property if it is unsafe? This will provide a way for users to either update the parallel-safe property of a relation or do something about the parallel-unsafe property of the relation. > > Or there are probably other ways to slice it up. But I think some > > outside-the-box thinking might be helpful here. > > It's a possibility. It's a bit different from my decision to mark > functions as PARALLEL SAFE/RESTRICTED/UNSAFE because, if you wanted to > deduce that without an explicit marking, you'd need to solve the > halting problem, which I've heard is fairly difficult. In this case, > though, the problem is solvable with a linear-time algorithm. It's > possible that there is nothing better, but I'm not sure. > > One idea would be to try to cache some state in shared memory. That > wouldn't work for an unbounded number of relations, at least not > unless we used DSA, but you could have a hash table with a > configurable number of slots and make the default big enough that it > would bother few people in practice. There might be some other details > about partitioned relations that would be useful to cache, too. > Wouldn't we need to invalidate the hash entries as soon as something parallel-unsafe is associated with them? If so, how is this better than setting a flag in relcache? -- With Regards, Amit Kapila.
On Thu, Mar 25, 2021 at 5:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > Wouldn't we need to invalidate the hash entries as soon as something > parallel-unsafe is associated with them? Yes. > If so, how is this better > than setting a flag in relcache? You can't legally access a flag in the relcache without taking a relation lock. If it's possible to avoid that requirement by doing this some other way, it would be a big win. I'm not sure whether it is or exactly what would be involved, but relying on the relcache/sinval interaction surely won't work. Also, it would address the concern Takayuki-san raised about having to recompute this in each session. I don't think it would be worth doing for that reason alone, but it could be a side benefit. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Mar 25, 2021 at 11:45 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 25, 2021 at 5:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Wouldn't we need to invalidate the hash entries as soon as something > > parallel-unsafe is associated with them? > > Yes. > > > If so, how is this better > > than setting a flag in relcache? > > You can't legally access a flag in the relcache without taking a > relation lock. > But won't some form of lock is required for each rel entry in the hash table as well for the same duration as is required for rel? Because otherwise, while we are processing the statement or other relations in the query, something parallel-unsafe could be attached to that corresponding rel entry in the hash table. And, I feel probably some concurrency bottleneck might happen because DDL/DML needs to access this table at the same time. -- With Regards, Amit Kapila.
On Wed, Mar 24, 2021 at 12:14 AM Andres Freund <andres@anarazel.de> wrote: > > On 2021-03-22 08:47:47 -0400, Robert Haas wrote: > > I find this fairly ugly. If you can't make the cost of checking > > whether parallelism is safe low enough that you don't need a setting > > for this, then I think perhaps you shouldn't have the feature at all. > > In other words, I propose that you revert both this and 05c8482f7f and > > come back when you have a better design that doesn't introduce so much > > overhead. > > I started out wondering whether some of the caching David Rowley has been > working on to reduce the overhead of the result cache planning shows a > path for how to make parts of this cheaper. > https://postgr.es/m/CAApHDvpw5qG4vTb%2BhwhmAJRid6QF%2BR9vvtOhX2HN2Yy9%2Bw20sw%40mail.gmail.com > > But looking more, several of the checks just seem wrong to me. > > target_rel_index_max_parallel_hazard() deparses index expressions from > scratch? With code like > > + index_rel = index_open(index_oid, lockmode); > ... > + index_oid_list = RelationGetIndexList(rel); > + foreach(lc, index_oid_list) > ... > + ii_Expressions = RelationGetIndexExpressions(index_rel); > ... > > + Assert(index_expr_item != NULL); > + if (index_expr_item == NULL) /* shouldn't happen */ > + { > + elog(WARNING, "too few entries in indexprs list"); > + context->max_hazard = PROPARALLEL_UNSAFE; > + found_max_hazard = true; > + break; > + } > > Brrr. > > Shouldn't we have this in IndexOptInfo already? > No, because I think we don't build IndexOptInfo for target tables (tables in which insert is performed). It is only built for tables to scan. However, I think here we could cache parallel-safety info in the index rel descriptor and can use it for next time. This will avoid checking expressions each time. Do you have something else in mind? -- With Regards, Amit Kapila.
RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
"tsunakawa.takay@fujitsu.com"
Date:
From: Robert Haas <robertmhaas@gmail.com> > On Wed, Mar 24, 2021 at 12:48 AM Andres Freund <andres@anarazel.de> > wrote: > > Although this specific hack doesn't seem too terrible to me. If you > > execute a parallel insert the likelihood to end up not needing an xid is > > pretty low. Implementing it concurrently does seem like it'd end up > > needing another lwlock nested around xid assignment, or some more > > complicated scheme with already holding XidGenLock or retries. But maybe > > I'm missing an easy solution here. > > I don't think you need to do anything that is known outside the group > of processes involved in the parallel query. I think you just need to > make sure that only one of them is trying to acquire an XID at a time, > and that all the others find out about it. I haven't thought too hard > about the timing: if one process acquires an XID for the transaction, > is it OK if the others do an arbitrary amount of work before they > realize that this has happened? Also, there's the problem that the > leader has the whole transaction stack and the workers don't, so the > recursive nature of XID acquisition is a problem. I suspect these are > all pretty solvable problems; I just haven't put in the energy. But, > it could also be that I'm missing something. It doesn't seem easy to make parallel workers allocate an XID and share it among the parallel processes. When the DML isrun inside a deeply nested subtransaction and the parent transactions have not allocated their XIDs yet, the worker needsto allocate the XIDs for its parents. That indeterminate number of XIDs must be stored in shared memory. The stackof TransactionState structures must also be passed. Also, TransactionIdIsCurrentTransactionId() uses an array ParallelCurrentXidswhere parallel workers receive sub-committed XIDs from the leader. This needs to be reconsidered. Before that, I don't see the need for parallel workers to allocate the XID. As the following Oracle manual says, parallelDML will be used in data analytics and OLTP batch jobs. There should be plenty of source data in those scenarios. When to Use Parallel DML https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-18B2AF09-C548-48DE-A794-86224111549F -------------------------------------------------- Several scenarios where parallel DML is used include: Refreshing Tables in a Data Warehouse System Creating Intermediate Summary Tables Using Scoring Tables Updating Historical Tables Running Batch Jobs -------------------------------------------------- I don't mean to say we want to use the easy hack as we want to be lazy. I'd like to know whether we *really* need the effort. And I want PostgreSQL to provide great competitive features as early as possible without messing up the design andcode. For what kind of realistic conceivable scenarios do we need the sophisticated XID assignment mechanism in parallel workers? Regards Takayuki Tsunakawa
On Wed, Mar 24, 2021 at 6:18 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Mar 24, 2021 at 12:48 AM Andres Freund <andres@anarazel.de> wrote: > > > Although this specific hack doesn't seem too terrible to me. If you > > execute a parallel insert the likelihood to end up not needing an xid is > > pretty low. Implementing it concurrently does seem like it'd end up > > needing another lwlock nested around xid assignment, or some more > > complicated scheme with already holding XidGenLock or retries. But maybe > > I'm missing an easy solution here. > > I don't think you need to do anything that is known outside the group > of processes involved in the parallel query. I think you just need to > make sure that only one of them is trying to acquire an XID at a time, > and that all the others find out about it. I haven't thought too hard > about the timing: if one process acquires an XID for the transaction, > is it OK if the others do an arbitrary amount of work before they > realize that this has happened? > A naive question about this scheme: What if the worker that acquires the XID writes some row and another worker reads that row before it gets to see the XID information? I think it won't treat such a row is written by its own transaction. Won't such a scheme lead to different behavior than serial inserts or where we have acquired XID before starting parallel-operation? -- With Regards, Amit Kapila.
On Mon, Mar 29, 2021 at 6:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > A naive question about this scheme: What if the worker that acquires > the XID writes some row and another worker reads that row before it > gets to see the XID information? I think it won't treat such a row is > written by its own transaction. Won't such a scheme lead to different > behavior than serial inserts or where we have acquired XID before > starting parallel-operation? Well, this is the sort of thing that somebody would need to analyze as part of implementing something like this, but off hand I don't quite see what the problem is. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Mar 25, 2021 at 11:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > But won't some form of lock is required for each rel entry in the hash > table as well for the same duration as is required for rel? Because > otherwise, while we are processing the statement or other relations in > the query, something parallel-unsafe could be attached to that > corresponding rel entry in the hash table. And, I feel probably some > concurrency bottleneck might happen because DDL/DML needs to access > this table at the same time. Hmm, yeah. I hadn't thought of that, but you're right: it's not good enough to determine that there's no problem at the time we start the query, because somebody could create a problem for a partition we haven't yet locked before we get around to locking it. This is something that really deserves a separate discussion rather than being buried on a thread on -committers with a bunch of other topics, but in my opinion we probably shouldn't choose to handle this by adding more locking. The need to lock everything all the time any time we do anything is part of what sucks in this whole area, and we don't want to just keep propagating that. I don't know whether it would be OK to just document that if you concurrently add parallel-unsafe stuff to a table which is a partition while parallel inserts are going on, you might cause the insert to error out, but I don't think it's a completely crazy idea. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Mar 30, 2021 at 12:59 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 25, 2021 at 11:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > But won't some form of lock is required for each rel entry in the hash > > table as well for the same duration as is required for rel? Because > > otherwise, while we are processing the statement or other relations in > > the query, something parallel-unsafe could be attached to that > > corresponding rel entry in the hash table. And, I feel probably some > > concurrency bottleneck might happen because DDL/DML needs to access > > this table at the same time. > > Hmm, yeah. I hadn't thought of that, but you're right: it's not good > enough to determine that there's no problem at the time we start the > query, because somebody could create a problem for a partition we > haven't yet locked before we get around to locking it. This is > something that really deserves a separate discussion rather than being > buried on a thread on -committers with a bunch of other topics, > Agreed, we will summarize all the discussion and ideas by either starting a new thread (s) or on the existing thread on hackers. Sometime back, I have started a thread [1] with an intention to just discuss this part of the puzzle but at that time it didn't get as much attention as was required but I am hoping this time the situation might improve. > but in > my opinion we probably shouldn't choose to handle this by adding more > locking. The need to lock everything all the time any time we do > anything is part of what sucks in this whole area, and we don't want > to just keep propagating that. I don't know whether it would be OK to > just document that if you concurrently add parallel-unsafe stuff to a > table which is a partition while parallel inserts are going on, you > might cause the insert to error out, but I don't think it's a > completely crazy idea. > Yeah, this idea has merit and to some extent, the same is true for what Tom, Andres, or Tsunakawa-San is proposing to keep a flag in pg_class in someway and or have a declarative syntax for parallel-safety for relations. Doing things on such lines might take the complexity of locking partitions and maybe users also don't mind if they occasionally get an error due to concurrently adding a parallel-unsafe clause on one of the relations. I guess we can mitigate the user's concerns to some degree by providing a function like is_dml_rel_parallel_safe(relid/relname) where she can check all global properties of relation and report whether it is safe or not to perform dml and additionally we can report the unsafe property if it is unsafe. [1] - https://www.postgresql.org/message-id/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com -- With Regards, Amit Kapila.
On Tue, Mar 30, 2021 at 12:43 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Mar 29, 2021 at 6:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > A naive question about this scheme: What if the worker that acquires > > the XID writes some row and another worker reads that row before it > > gets to see the XID information? I think it won't treat such a row is > > written by its own transaction. Won't such a scheme lead to different > > behavior than serial inserts or where we have acquired XID before > > starting parallel-operation? > > Well, this is the sort of thing that somebody would need to analyze as > part of implementing something like this, but off hand I don't quite > see what the problem is. > Say, the worker (w-1) that acquires the XID (501) deletes the tuple (CTID: 0, 2). Now, another worker (w-2) reads that tuple (CTID: 0, 2), I think it would consider that the tuple is still visible to its snapshot but if the w-2 knows that 501 is its own XID, it would have been considered it as (not-visible) deleted. Now, if I am not missing anything here, I think this can happen when multiple updates to the same row happen and new rows get added to the new page. Even, if there are no hazards with the new XID machinery, I am not sure if we want to consider this as a pre-requisite to enable Inserts for parallel-select work. We might want to focus on partitioning and other caching-related stuff to rework this implementation. We might want to consider this new XID sharing machinery for actual parallel Inserts/Updates/Deletes if we think it is important to enabling those parallel operations. -- With Regards, Amit Kapila.
On Fri, Mar 26, 2021 at 11:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Mar 24, 2021 at 12:14 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > Shouldn't we have this in IndexOptInfo already? > > > > No, because I think we don't build IndexOptInfo for target tables > (tables in which insert is performed). It is only built for tables to > scan. However, I think here we could cache parallel-safety info in the > index rel descriptor and can use it for next time. This will avoid > checking expressions each time. > On further investigation, it seems we can't rely on the cached information of parallel-safety in rel descriptor because we don't invalidate it if someone changes function property say from safe to unsafe or vice-versa. We need to think of some other idea here, simply caching rel descriptors won't work. I think we need to further discuss this topic on -hackers. -- With Regards, Amit Kapila.
RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
From
"tsunakawa.takay@fujitsu.com"
Date:
Hello Tom-san, Robert-san, Andres-san, others, I've just started the following thread. Parallel INSERT SELECT take 2 https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709%40TYAPR01MB2990.jpnprd01.prod.outlook.com I'm worried that this might not be the right time to ask this now because you might be busy with organizing the beta release,but we'd appreciate your comments and suggestions to set and go in the promising direction. Unfortunately, we couldn'tget attention from you for this difficult parallelism topic, and failed to commit it. We'd like to achieve parallelDML in PG 15. I'd be grateful if you could give us advice not to lose time and effort in the wrong direction. Regards Takayuki Tsunakawa
On Mon, Mar 29, 2021 at 11:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > Say, the worker (w-1) that acquires the XID (501) deletes the tuple > (CTID: 0, 2). Now, another worker (w-2) reads that tuple (CTID: 0, 2), > I think it would consider that the tuple is still visible to its > snapshot but if the w-2 knows that 501 is its own XID, it would have > been considered it as (not-visible) deleted. Not unless the command counter had been advanced, which shouldn't be happening anyway. I think it's probably fundamentally impossible to allow the command counter to advance during a parallel operation. The whole point of the command counter mechanism is that you finish one body of work, then bump the command counter so that all the changes become visible before you start doing the next set of things. That doesn't really make sense in a parallel context, and not just because of any difficulty of synchronizing the value. It's actually trying to make sure that we finish X first and only then do Y, which is opposed to what parallelism is all about. But I don't think there's anything more than incidental preventing us from assigning an XID in a parallel operation. Contrary to what you allege here, assigning an XID doesn't change the set of tuples we can see - nor do any inserts, updates, or deletes it performs afterward, until such time as the command counter is bumped. We would need to make sure that two different participants in the parallel operation don't both get an XID, because we don't want the transaction to end up with two different XIDs. We want the second one to instead discover the XID that the other one got. But that's not a definitional question, like with the command counter. It's "just" a programming problem. -- Robert Haas EDB: http://www.enterprisedb.com