Thread: pgsql: Clarify use of temporary tables within partition trees
Clarify use of temporary tables within partition trees Since their introduction, partition trees have been a bit lossy regarding temporary relations. Inheritance trees respect the following patterns: 1) a child relation can be temporary if the parent is permanent. 2) a child relation can be temporary if the parent is temporary. 3) a child relation cannot be permanent if the parent is temporary. 4) The use of temporary relations also imply that when both parent and child need to be from the same sessions. Partitions share many similar patterns with inheritance, however the handling of the partition bounds make the situation a bit tricky for case 1) as the partition code bases a lot of its lookup code upon PartitionDesc which does not really look after relpersistence. This causes for example a temporary partition created by session A to be visible by another session B, preventing this session B to create an extra partition which overlaps with the temporary one created by A with a non-intuitive error message. There could be use-cases where mixing permanent partitioned tables with temporary partitions make sense, but that would be a new feature. Partitions respect 2), 3) and 4) already. It is a bit depressing to see those error checks happening in MergeAttributes() whose purpose is different, but that's left as future refactoring work. Back-patch down to 10, which is where partitioning has been introduced, except that default partitions do not apply there. Documentation also includes limitations related to the use of temporary tables with partition trees. Reported-by: David Rowley Author: Amit Langote, Michael Paquier Reviewed-by: Ashutosh Bapat, Amit Langote, Michael Paquier Discussion: https://postgr.es/m/CAKJS1f94Ojk0og9GMkRHGt8wHTW=ijq5KzJKuoBoqWLwSVwGmw@mail.gmail.com Branch ------ REL_10_STABLE Details ------- https://git.postgresql.org/pg/commitdiff/5862174ec78a173c41710c5ef33feb993ae45cc7 Modified Files -------------- doc/src/sgml/ddl.sgml | 10 ++++++++++ src/backend/commands/tablecmds.c | 21 +++++++++++++++++++++ src/test/regress/expected/alter_table.out | 16 ++++++++++++++++ src/test/regress/expected/create_table.out | 10 ++++++++++ src/test/regress/expected/foreign_data.out | 12 ++++++++++++ src/test/regress/sql/alter_table.sql | 15 +++++++++++++++ src/test/regress/sql/create_table.sql | 9 +++++++++ src/test/regress/sql/foreign_data.sql | 11 +++++++++++ 8 files changed, 104 insertions(+)
On 20 June 2018 at 13:53, Michael Paquier <michael@paquier.xyz> wrote: > Clarify use of temporary tables within partition trees Hi, Thanks for committing this fix. I think slightly more should have been done. There's still some dead code in expand_partitioned_rtentry that I think should be removed. The code won't cost much performance wise, but it may mislead someone into thinking they can add some other condition there to skip partitions. The attached removes it. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Fri, Jun 29, 2018 at 5:13 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 20 June 2018 at 13:53, Michael Paquier <michael@paquier.xyz> wrote: >> Clarify use of temporary tables within partition trees > > Thanks for committing this fix. > > I think slightly more should have been done. There's still some dead > code in expand_partitioned_rtentry that I think should be removed. > > The code won't cost much performance wise, but it may mislead someone > into thinking they can add some other condition there to skip > partitions. > > The attached removes it. I'd rather keep an elog(ERROR) than completely remove the check. Also, for the record, I think the subject line of Michael's commit message was pretty unclear about what it was actually doing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jul 02, 2018 at 02:07:37PM -0400, Robert Haas wrote: > I'd rather keep an elog(ERROR) than completely remove the check. +1. > Also, for the record, I think the subject line of Michael's commit > message was pretty unclear about what it was actually doing. How would you formulate it? Perhaps the error message did not emphasize enough on the fast that it actually blocked a behavior, say "Block mix of temporary and permanent relations in partition trees" or such? -- Michael
Attachment
On 3 July 2018 at 10:16, Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Jul 02, 2018 at 02:07:37PM -0400, Robert Haas wrote: >> I'd rather keep an elog(ERROR) than completely remove the check. > > +1. Attached >> Also, for the record, I think the subject line of Michael's commit >> message was pretty unclear about what it was actually doing. > > How would you formulate it? Perhaps the error message did not emphasize > enough on the fast that it actually blocked a behavior, say "Block mix > of temporary and permanent relations in partition trees" or such? For me, reading the subject line of the commit I'd have expected a doc change, or improved/new code comments. This is really more "Disallow mixed temp/permanent partitioned hierarchies". "Clarify" does not really involve a change of behaviour. It's an explanation of what the behaviour is. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Mon, Jul 2, 2018 at 8:59 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: >> How would you formulate it? Perhaps the error message did not emphasize >> enough on the fast that it actually blocked a behavior, say "Block mix >> of temporary and permanent relations in partition trees" or such? Yes. > For me, reading the subject line of the commit I'd have expected a doc > change, or improved/new code comments. > > This is really more "Disallow mixed temp/permanent partitioned hierarchies". Yes. > "Clarify" does not really involve a change of behaviour. It's an > explanation of what the behaviour is. And yes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
(re-adding committers list) On 3 July 2018 at 20:31, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > 1. I still insist that it's better for the newly added code to be near the > top of the function body than in the middle, which brings me to... That will cause the AppendRelInfo not to be built for a partitioned table with no partitions. I'm not personally certain that doing that comes without consequence. Are you certain of that? If not 100%, then I think that's more of a task for PG12. I'm trying to propose removing dead code for PG11 and on. My current thoughts are that moving this test up a few lines is unlikely to improve performance in a real-world situation, so it does not seem worth the risk for a backpatch to me. > 2. While we're at fixing the code around here, I think we should think > about trying to get rid of the *non-dead* code that produces a structure > that isn't used anywhere, which I was under the impression, 0a480502b09 > [1] already did (cc'ing Ashutosh). To clarify, we still unnecessarily > create a "duplicate" RTE for partitioned tables in a partition tree > (non-leaf tables) in its role as a child. So, for the above query, there > end up being created 4 entries in the query's range table (2 for 'p' and 2 > for 'pd'). That makes sense for plain inheritance, because even the root > parent table in a plain inheritance tree is a regular table containing > data. That's not true for partition inheritance trees, where non-leaf > tables contain no data, so we don't create a plan to scan them (see > d3cc37f1d80 [2]), which in turn means we don't need to create the > redundant "duplicate" child RTEs for them either. > > See attached my delta patch to address both 1 and 2. I recently saw this too and wondered about it. I then wrote a patch to just have it create a single RangeTblEntry for each partitioned table. The tests all passed. I'd categorise this one the same as I have #1 above, i.e. not backpatch material. It seems like something useful to look into for v12 though. I assumed this was done for a reason and that I just didn't understand what that reason was. I don't recall any comments to explain the reason why we build two RangeTblEntrys for each partitioned table. In light of what Amit has highlighted, I'm still standing by the v3 patch assuming the typo is fixed. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jul 03, 2018 at 09:05:41PM +1200, David Rowley wrote: > > [...] > > I'd categorise this one the same as I have #1 above, i.e. not > backpatch material. It seems like something useful to look into for > v12 though. I assumed this was done for a reason and that I just > didn't understand what that reason was. I don't recall any comments to > explain the reason why we build two RangeTblEntrys for each > partitioned table. I agree. Please let's keep v11 stable, and discuss further more on future optimizations like the previous two items for v12, which has plenty of time to be broken. > In light of what Amit has highlighted, I'm still standing by the v3 > patch assuming the typo is fixed. Yeah. Actually I'd like to add a test as well to test the recursion call of expand_partitioned_rtentry. If you have an idea, please let me know or I'll figure out one by myself and add it probably in create_table.sql. -- Michael
Attachment
On 3 July 2018 at 21:15, Michael Paquier <michael@paquier.xyz> wrote: > Yeah. Actually I'd like to add a test as well to test the recursion > call of expand_partitioned_rtentry. If you have an idea, please let me > know or I'll figure out one by myself and add it probably in > create_table.sql. What specifically do you want to test? There are plenty of partitioned tests with sub-partitioned tables. Going by [1], there's no shortage of coverage. Of course, the dead code I'm proposing we remove is not covered. There's no way to cover it... it's dead. [1] https://coverage.postgresql.org/src/backend/optimizer/prep/prepunion.c.gcov.html -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2018/07/03 18:05, David Rowley wrote: > (re-adding committers list) > > On 3 July 2018 at 20:31, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> 1. I still insist that it's better for the newly added code to be near the >> top of the function body than in the middle, which brings me to... > > That will cause the AppendRelInfo not to be built for a partitioned > table with no partitions. I'm not personally certain that doing that > comes without consequence. Are you certain of that? If not 100%, > then I think that's more of a task for PG12. I'm trying to propose > removing dead code for PG11 and on. What to use the AppendRelInfo for if there is no partition? > My current thoughts are that moving this test up a few lines is > unlikely to improve performance in a real-world situation, so it does > not seem worth the risk for a backpatch to me. It's just that needless structures get created which we could avoid, and once I realized that, I found out about the 2 below. If you look at all of what expand_single_inheritance_child does, I'm a bit surprised you don't think we should try to avoid calling it if we don't need any of that processing. >> 2. While we're at fixing the code around here, I think we should think >> about trying to get rid of the *non-dead* code that produces a structure >> that isn't used anywhere, which I was under the impression, 0a480502b09 >> [1] already did (cc'ing Ashutosh). To clarify, we still unnecessarily >> create a "duplicate" RTE for partitioned tables in a partition tree >> (non-leaf tables) in its role as a child. So, for the above query, there >> end up being created 4 entries in the query's range table (2 for 'p' and 2 >> for 'pd'). That makes sense for plain inheritance, because even the root >> parent table in a plain inheritance tree is a regular table containing >> data. That's not true for partition inheritance trees, where non-leaf >> tables contain no data, so we don't create a plan to scan them (see >> d3cc37f1d80 [2]), which in turn means we don't need to create the >> redundant "duplicate" child RTEs for them either. >> >> See attached my delta patch to address both 1 and 2. > > I recently saw this too and wondered about it. I then wrote a patch to > just have it create a single RangeTblEntry for each partitioned table. > The tests all passed. Which is how I was thinking the commit 0a480502b09 had done it, but apparently not. That's a commit in PG11. None of what I'm suggesting is for PG 10. > I'd categorise this one the same as I have #1 above, i.e. not > backpatch material. It seems like something useful to look into for > v12 though. I assumed this was done for a reason and that I just > didn't understand what that reason was. I don't recall any comments to > explain the reason why we build two RangeTblEntrys for each > partitioned table. Maybe because that's what's done for the root parent in a plain inheritance hierarchy, which is always a plain table. In that case, one RTE is for its role as the parent (with rte->inh = true) and another is for its role as a child (with rte->inh = false). The former is processed as an append rel and the latter as a plain rel that will get assigned scan paths such as SeqScan, etc. For partitioned table parent(s), we need only the former because they can only be processed as append rels. That's why I'm proposing we could adjust the commit in PG 11 that introduced expand_partitioned_rtentry such that the duplicate child RTE and other objects are not created. Thanks, Amit
On Tue, Jul 03, 2018 at 09:30:20PM +1200, David Rowley wrote: > On 3 July 2018 at 21:15, Michael Paquier <michael@paquier.xyz> wrote: > > Yeah. Actually I'd like to add a test as well to test the recursion > > call of expand_partitioned_rtentry. If you have an idea, please let me > > know or I'll figure out one by myself and add it probably in > > create_table.sql. > > What specifically do you want to test? There are plenty of partitioned > tests with sub-partitioned tables. Going by [1], there's no shortage > of coverage. > > Of course, the dead code I'm proposing we remove is not covered. > There's no way to cover it... it's dead. Your patch removes this part: - /* - * If the partitioned table has no partitions or all the partitions are - * temporary tables from other backends, treat this as non-inheritance - * case. - */ - if (!has_child) - parentrte->inh = false; And adds this equivalent part: + /* + * If the partitioned table has no partitions, treat this as the + * non-inheritance case. + */ + if (partdesc->nparts == 0) + { + parentrte->inh = false; + return; + } As far as I can see from the coverage report, the former is not tested, and corresponds to the case of a partition leaf which is itself partitioned but has no partitions, and the new portion is equivalent to the part removed. That ought to be tested, particularly as Amit mentions that there could be improvements with moving it around in future versions. -- Michael
Attachment
On 3 July 2018 at 21:53, Michael Paquier <michael@paquier.xyz> wrote: > Your patch removes this part: > - /* > - * If the partitioned table has no partitions or all the partitions are > - * temporary tables from other backends, treat this as non-inheritance > - * case. > - */ > - if (!has_child) > - parentrte->inh = false; > > And adds this equivalent part: > + /* > + * If the partitioned table has no partitions, treat this as the > + * non-inheritance case. > + */ > + if (partdesc->nparts == 0) > + { > + parentrte->inh = false; > + return; > + } > > As far as I can see from the coverage report, the former is not tested, > and corresponds to the case of a partition leaf which is itself > partitioned but has no partitions, and the new portion is equivalent to > the part removed. That ought to be tested, particularly as Amit > mentions that there could be improvements with moving it around in > future versions. Oh okay. Yeah, you can hit that with a partitionless sub-partitioned table. I've added a test in the attached v4. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Tue, Jul 03, 2018 at 11:16:55PM +1200, David Rowley wrote: > Oh okay. Yeah, you can hit that with a partitionless sub-partitioned > table. Thanks for the patch and fixing the typo ;) +create table list_parted_tbl (a int,b int) partition by list (a); +create table list_parted_tbl1 partition of list_parted_tbl for values in(1) partition by list(b); +select * from list_parted_tbl; +explain (costs off) select * from list_parted_tbl; I am not sure if it is much interesting to keep around this table set for pg_upgrade, so I would drop it. Except for that, the result looks fine. I'll double-check and wrap it tomorrow on HEAD and REL_11_STABLE. The optimizations mentioned sound interesting, though I would recommend to not risk the stability of v11 at this point, so let's keep them for v12~. -- Michael
Attachment
On Tue, Jul 3, 2018 at 3:20 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > > Maybe because that's what's done for the root parent in a plain > inheritance hierarchy, which is always a plain table. In that case, one > RTE is for its role as the parent (with rte->inh = true) and another is > for its role as a child (with rte->inh = false). The former is processed > as an append rel and the latter as a plain rel that will get assigned scan > paths such as SeqScan, etc. Yes that's true. I remember we had some discussion about these two RTEs and that the one marked as child was extraneous, but I can not spot that in the mail thread. It's one of the things we did as part of partition-wise join and that thread is pretty long. It was probably kept without changing it because a. we wanted to get the bigger patch committed without breaking anything and this was a small thing which we couldn't decide whether was safe or not b. if it was safe not to create that entry, it should have been done in a commit which avoided creating scans for partitioned tables, but didn't > > For partitioned table parent(s), we need only the former because they can > only be processed as append rels. That's why I'm proposing we could > adjust the commit in PG 11 that introduced expand_partitioned_rtentry such > that the duplicate child RTE and other objects are not created. FWIW, I think this would be ok before beta, but not now. I see it as a PG12 item. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: > On Tue, Jul 3, 2018 at 3:20 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> Maybe because that's what's done for the root parent in a plain >> inheritance hierarchy, which is always a plain table. In that case, one >> RTE is for its role as the parent (with rte->inh = true) and another is >> for its role as a child (with rte->inh = false). The former is processed >> as an append rel and the latter as a plain rel that will get assigned scan >> paths such as SeqScan, etc. > Yes that's true. Yes, that's exactly why there are two RTEs for the parent table in normal inheritance cases. I concur with the idea that it shouldn't be necessary to create a child RTE for a partitioning parent table --- we should really only need the appendrel RTE plus RTEs for tables that will be scanned. However, it's not clear to me that this is a trivial change for multilevel partitioning cases. Do we need RTEs for the intermediate nonleaf levels? In the abstract, the planner and executor might not need them. But the code that deals with partitioning constraint management might expect them to exist. Another point is that executor-start-time privilege checking is driven off the RTE list, so we need an RTE for any table that requires priv checks, so we might need RTEs for intermediate levels just for that. Also, IIRC, the planner sets up the near-duplicate RTEs for inheritance cases so that only one of them is privilege-checked. Be careful that you don't end up with zero privilege checks on the partition root :-( regards, tom lane
On 2018/07/03 21:15, Ashutosh Bapat wrote: > On Tue, Jul 3, 2018 at 3:20 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> >> Maybe because that's what's done for the root parent in a plain >> inheritance hierarchy, which is always a plain table. In that case, one >> RTE is for its role as the parent (with rte->inh = true) and another is >> for its role as a child (with rte->inh = false). The former is processed >> as an append rel and the latter as a plain rel that will get assigned scan >> paths such as SeqScan, etc. > > Yes that's true. I remember we had some discussion about these two > RTEs and that the one marked as child was extraneous, but I can not > spot that in the mail thread. It's one of the things we did as part of > partition-wise join and that thread is pretty long. It was probably > kept without changing it because a. we wanted to get the bigger patch > committed without breaking anything and this was a small thing which > we couldn't decide whether was safe or not b. if it was safe not to > create that entry, it should have been done in a commit which avoided > creating scans for partitioned tables, but didn't About (b), maybe yes. Perhaps, I/we decided to put it off until we got around to writing a patch for making inheritance expansion step-wise for partitioned tables. We didn't get to that until 11dev branch opened up for development. The patch that I had proposed for step-wise expansion was such that the duplicate RTE for parents would not get created, but it wasn't committed. That was one of the things that was different from your patch for step-wise expansion which was committed. Thanks, Amit
On 2018/07/04 1:21, Tom Lane wrote: > Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: >> On Tue, Jul 3, 2018 at 3:20 PM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> Maybe because that's what's done for the root parent in a plain >>> inheritance hierarchy, which is always a plain table. In that case, one >>> RTE is for its role as the parent (with rte->inh = true) and another is >>> for its role as a child (with rte->inh = false). The former is processed >>> as an append rel and the latter as a plain rel that will get assigned scan >>> paths such as SeqScan, etc. > >> Yes that's true. > > Yes, that's exactly why there are two RTEs for the parent table in normal > inheritance cases. I concur with the idea that it shouldn't be necessary > to create a child RTE for a partitioning parent table --- we should really > only need the appendrel RTE plus RTEs for tables that will be scanned. > > However, it's not clear to me that this is a trivial change for multilevel > partitioning cases. Do we need RTEs for the intermediate nonleaf levels? > In the abstract, the planner and executor might not need them. But the > code that deals with partitioning constraint management might expect them > to exist. We do need RTEs for *all* parents (non-leaf tables) in a partition tree, each of which we need to process as an append rel (partition pruning is invoked separately for each non-leaf table). What we *don't* need for each of them is the duplicate RTE with inh = false, because we don't need to process them as plain rels. > Another point is that executor-start-time privilege checking is driven > off the RTE list, so we need an RTE for any table that requires priv > checks, so we might need RTEs for intermediate levels just for that. > > Also, IIRC, the planner sets up the near-duplicate RTEs for inheritance > cases so that only one of them is privilege-checked. Yeah, I see in prepunion.c that the child RTE's requiredPerms is set to 0, with the following comment justifying it: "Also, set requiredPerms to zero since all required permissions checks are done on the original RTE." > Be careful that > you don't end up with zero privilege checks on the partition root :-( The original RTE belongs to the partition root and it's already in the range table, so its privileges *are* checked. Thanks, Amit
On Tue, Jul 03, 2018 at 08:31:27PM +0900, Michael Paquier wrote: > I am not sure if it is much interesting to keep around this table set > for pg_upgrade, so I would drop it. Except for that, the result looks > fine. I'll double-check and wrap it tomorrow on HEAD and REL_11_STABLE. > The optimizations mentioned sound interesting, though I would recommend > to not risk the stability of v11 at this point, so let's keep them for > v12~. So at the end I have dropped the table from the test, and pushed the patch to HEAD and REL_11_STABLE. Thanks David for the patch, and others for the reviews. -- Michael
Attachment
On 4 July 2018 at 13:48, Michael Paquier <michael@paquier.xyz> wrote: > So at the end I have dropped the table from the test, and pushed the > patch to HEAD and REL_11_STABLE. Thanks David for the patch, and others > for the reviews. Thanks for pushing it. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services