Thread: reorganizing partitioning code (was: Re: [HACKERS] path toward fasterpartition pruning)
reorganizing partitioning code (was: Re: [HACKERS] path toward fasterpartition pruning)
From
Amit Langote
Date:
Re-posting my earlier email to start a new thread. On 2018/02/09 2:58, Alvaro Herrera wrote:> Robert Haas wrote: >> On Wed, Feb 7, 2018 at 3:42 AM, Ashutosh Bapat >> <ashutosh.bapat@enterprisedb.com> wrote: > >>> partition.c seems to have two kinds of functions 1. that build and >>> manage relcache, creates quals from bounds etc. which are metadata >>> management kind 2. partition bound comparison functions, and other >>> optimizer related functions. May be we should divide the file that >>> way. The first category code remains in catalog/ as it is today. The >>> second catagory functions move to optimizer/. >> >> It would be sensible to separate functions that build and manage data >> in the relcache from other functions. I think we should consider >> moving the existing functions of that type from partition.c to >> src/backend/utils/cache/partcache.c. > > FWIW I've been thinking that perhaps we need some other separation of > code better than statu quo. The current partition.c file includes stuff > for several modules and ISTM all these new patches are making more and > more of a mess. So +1 to the general idea of splitting things up. > Maybe partcache.c is not ambitious enough, but it seems a good first > step. Agree with the proposed reorganizing and adding a partcache.c, which I tried to do in the attached patch. * The new src/backend/utils/cache/partcache.c contains functions that initialize relcache's partitioning related fields. Various partition bound comparison and search functions (and then some) that work off of the cached information are moved. Also, since we cache partition qual, interface functions RelationGetPartitioQual(Relation) and get_partition_qual_relid(Oid) are moved too. * The new src/include/utils/partcache.h contains various struct definitions that are moved from backend/catalog/partition.c, include/catalog/partition.h, and include/utils/rel.h. Also, declarations of interface functions of partcache.c. Thoughts? Thanks, Amit
Attachment
Re: reorganizing partitioning code (was: Re: [HACKERS] path towardfaster partition pruning)
From
Amit Langote
Date:
On 2018/02/13 23:08, Ashutosh Bapat wrote: > On Tue, Feb 13, 2018 at 2:17 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> >> Agree with the proposed reorganizing and adding a partcache.c, which I >> tried to do in the attached patch. >> >> * The new src/backend/utils/cache/partcache.c contains functions that >> initialize relcache's partitioning related fields. Various partition >> bound comparison and search functions (and then some) that work off of the >> cached information are moved. > > Are you moving partition bound comparison functions to partcache.c? > They will also used by optimizer, so may be leave them out of > partcache.c? Yes, I moved the partition bound comparison and search functions, because I thought that they should live with the other code that manages the cached information. So, I moved not only the code that reads the catalogs and builds the partition key, partition (bound) descriptor, and partition qual (for individual partitions), but also the code that uses those structures. So, with the new arrangement, optimizer will include utils/partcache.h, instead of catalog/partition.h. Thanks, Amit
Re: reorganizing partitioning code (was: Re: [HACKERS] path towardfaster partition pruning)
From
Amit Langote
Date:
On 2018/02/14 10:00, Amit Langote wrote: > Agree with the proposed reorganizing and adding a partcache.c, which I > tried to do in the attached patch. > > * The new src/backend/utils/cache/partcache.c contains functions that > initialize relcache's partitioning related fields. Various partition > bound comparison and search functions (and then some) that work off of the > cached information are moved. Also, since we cache partition qual, > interface functions RelationGetPartitioQual(Relation) and > get_partition_qual_relid(Oid) are moved too. > > * The new src/include/utils/partcache.h contains various struct > definitions that are moved from backend/catalog/partition.c, > include/catalog/partition.h, and include/utils/rel.h. Also, declarations > of interface functions of partcache.c. Attached updated version, where I removed #include "catalog/partition.h" from even more places and also moved map_partition_varattnos() next to map_variable_attnos() in rewriteManip.c. Now, after applying the patch - Files #including partition.h File Line 0 src/backend/catalog/heap.c 44 #include "catalog/partition.h" 1 src/backend/catalog/partition.c 22 #include "catalog/partition.h" 2 src/backend/commands/indexcmds.c 26 #include "catalog/partition.h" 3 src/backend/commands/tablecmds.c 33 #include "catalog/partition.h" 4 src/backend/utils/cache/partcache.c 23 #include "catalog/partition.h" Files #including partcache.h File Line 0 src/backend/optimizer/path/joinrels.c 24 #include "utils/partcache.h" 1 src/backend/optimizer/prep/prepunion.c 51 #include "utils/partcache.h" 2 src/backend/optimizer/util/relnode.c 30 #include "utils/partcache.h" 3 src/backend/utils/cache/partcache.c 37 #include "utils/partcache.h" 4 src/backend/utils/cache/relcache.c 83 #include "utils/partcache.h" 5 src/include/executor/execPartition.h 19 #include "utils/partcache.h" 6 src/include/utils/rel.h 27 #include "utils/partcache.h" Thanks, Amit
Attachment
Re: reorganizing partitioning code (was: Re: [HACKERS] path towardfaster partition pruning)
From
Alvaro Herrera
Date:
This is looking attractive. Please don't #include postgres.h in partcache.h. That's per policy. Why do you need to #include parsenodes.h in partcache.h? I think rewriteManip.h can do with just relcache.h rather than rel.h (probably partition.h can do likewise) thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: reorganizing partitioning code (was: Re: [HACKERS] path towardfaster partition pruning)
From
Alvaro Herrera
Date:
BTW may I suggest that git config diff.algorithm=histogram results in better diffs? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018/02/15 5:30, Alvaro Herrera wrote: > BTW may I suggest that > > git config diff.algorithm=histogram > > results in better diffs? Aha! That's much better. Thanks, Amit
Thanks for the review. On 2018/02/15 5:25, Alvaro Herrera wrote: > This is looking attractive. > > Please don't #include postgres.h in partcache.h. That's per policy. Oops, fixed. > Why do you need to #include parsenodes.h in partcache.h? I thought it was needed because there was this: extern void check_new_partition_bound(char *relname, Relation parent, PartitionBoundSpec *spec); in partcache.h and PartitionBoundSpec is defined in parsenodes.h. Removing the #include turned out to be fine, that is, after I put it in partition.h instead. > I think rewriteManip.h can do with just relcache.h rather than rel.h > (probably partition.h can do likewise) Hmm. map_partition_varattnos() that I moved to rewriteManip.c wants to use RelationGetDescr(), but partition.h was fine. Attached updated patch. Thanks, Amit
Attachment
Added to CF here: https://commitfest.postgresql.org/17/1520/ Thanks, Amit
Hello. I'd like to make a humble comment. At Thu, 15 Feb 2018 19:31:47 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <8906c861-ea47-caee-c860-eff8d7e1dbc0@lab.ntt.co.jp> > Added to CF here: https://commitfest.postgresql.org/17/1520/ The reorganization adds/removes header files to/from .[ch] files. That can easily leave useless includes and they're hardly noticed. Following files touched in this patch have such useless includes after this patch. On the other hand, naive decision of this kind of cleanup can lead to curruption. [1] So, I don't insisit that the all of the following *should* amended, especially for rel.h. [1] https://www.postgresql.org/message-id/6748.1518711125@sss.pgh.pa.us ==== nodeModifyTable.c: > +#include "rewrite/rewriteManip.h" rewriteManip.h is changed to include rel.h so rel.h is no longer need to be included in the file. (It is also included in lmgr.h so it was needless since before this patch, though.) ==== hba.c: > +#include "catalog/objectaddress.h" objectaddress.h includes acl.h so acl.h is no longer useful. ==== joinrels.c: > +#include "utils/partcache.h" partcache.h includes lsyscache.h. ==== prepunion.c: > +#include "utils/partcache.h" partcache.h includes lsyscache.h and partcache.h is included in rel.h. So partcache.h and lsyscache.h are not required. ==== utility.c: > +#include "utils/rel.h" rel.h includes xlog.h (and xlog.h includes fd.h). The last two are removable. ==== partcache.c: parsenodes.h is included from some other files. rel.h is included from rewriteManip.h. partcache.h is included from rel.h As the result, parsenodes.h, rel.h, partcache.h are not required. ==== relcache.c: > +#include "utils/partcache.h" lsyscache.h is included by partcache.h. ==== rel.h: > +#include "utils/partcache.h" partcache.h includes fmgr.h and relcache.h so the last two are no longer useful. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Horiguchi-san, On 2018/02/16 14:07, Kyotaro HORIGUCHI wrote: > Hello. I'd like to make a humble comment. Thanks a lot for taking a look. > The reorganization adds/removes header files to/from .[ch] > files. That can easily leave useless includes and they're hardly > noticed. Following files touched in this patch have such useless > includes after this patch. Yes, I agree. > On the other hand, naive decision of this kind of cleanup can > lead to curruption. [1] So, I don't insisit that the all of the > following *should* amended, especially for rel.h. > > [1] https://www.postgresql.org/message-id/6748.1518711125@sss.pgh.pa.us I was initially trying limit this patch's #include churn so that it only touches .[ch] files that are somehow involved with partitioning, but ended up with a lot of files touched. To prevent confusion and concerns, I have updated the patch to keep the churn to minimum as best as I could. So, while the v3 patch looked like this: 24 files changed, 2373 insertions(+), 2312 deletions(-) create mode 100644 src/backend/utils/cache/partcache.c create mode 100644 src/include/utils/partcache.h v4 looks like this: 15 files changed, 2314 insertions(+), 2263 deletions(-) create mode 100644 src/backend/utils/cache/partcache.c create mode 100644 src/include/utils/partcache.h > ==== nodeModifyTable.c: >> +#include "rewrite/rewriteManip.h" > > rewriteManip.h is changed to include rel.h so rel.h is no longer > need to be included in the file. (It is also included in lmgr.h > so it was needless since before this patch, though.) On second thought, I decided to hold back on moving map_partition_varattnos() to rewriteManip.c, because its interface is unlike any other functions in that file, requiring us to include rel.h in rewriteManip.h. So, I removed this #include from nodeModifyTable.c. > ==== hba.c: >> +#include "catalog/objectaddress.h" > > objectaddress.h includes acl.h so acl.h is no longer useful. This and a few others were necessitated by removing partition.h from executor.h. For now, I'm putting partition.h back into executor.h, although it would be nice to remove it eventually. > ==== joinrels.c: >> +#include "utils/partcache.h" > > partcache.h includes lsyscache.h. Moved lsyscache.h from partcache.h to partcache.c. > ==== prepunion.c: >> +#include "utils/partcache.h" > > partcache.h includes lsyscache.h and partcache.h is included in > rel.h. So partcache.h and lsyscache.h are not required. Oops, why did I even include partcache.h in prepunion.c! Removed. > ==== utility.c: >> +#include "utils/rel.h" > > rel.h includes xlog.h (and xlog.h includes fd.h). The last two > are removable. I've reverted the change that necessitated this and so this one. > ==== partcache.c: > parsenodes.h is included from some other files. > rel.h is included from rewriteManip.h. > partcache.h is included from rel.h > As the result, parsenodes.h, rel.h, partcache.h are not required. Removed. > ==== relcache.c: >> +#include "utils/partcache.h" > > lsyscache.h is included by partcache.h. lsyscache.h was moved from partcache.h per above, so keeping here. > ==== rel.h: >> +#include "utils/partcache.h" > > partcache.h includes fmgr.h and relcache.h so the last two are > no longer useful. Removed both. Attached updated version. Thanks, Amit
Attachment
Hi Amit, On 2/16/18 3:36 AM, Amit Langote wrote: > > Attached updated version. This patch no longer applies and the conflicts do not appear to be trivial. I'm a bit confused about your comment in [1]: > I gave up on rebasing this patch yesterday as I couldn't finish it in > 5 minutes, but maybe I will try later this month. Gotta focus on > thefaster pruning stuff for now... How much later are we talking about? Marked Waiting on Author. -- -David david@pgmasters.net [1] https://www.postgresql.org/message-id/33098109-9ef1-9594-e7d5-0977a50f8cfa%40lab.ntt.co.jp
Hi David. On Fri, Mar 2, 2018 at 11:53 PM, David Steele <david@pgmasters.net> wrote: > Hi Amit, > > On 2/16/18 3:36 AM, Amit Langote wrote: >> >> Attached updated version. > > This patch no longer applies and the conflicts do not appear to be trivial. > > I'm a bit confused about your comment in [1]: > >> I gave up on rebasing this patch yesterday as I couldn't finish it in >> 5 minutes, but maybe I will try later this month. Gotta focus on >> thefaster pruning stuff for now... > > How much later are we talking about? Sorry about the confusing comment. It could be sometime later half of the next week. Thanks, Amit
Hi Amit, On 3/2/18 11:17 AM, Amit Langote wrote: > > On Fri, Mar 2, 2018 at 11:53 PM, David Steele <david@pgmasters.net> wrote: >> Hi Amit, >> >> On 2/16/18 3:36 AM, Amit Langote wrote: >>> >>> Attached updated version. >> >> This patch no longer applies and the conflicts do not appear to be trivial. >> >> I'm a bit confused about your comment in [1]: >> >>> I gave up on rebasing this patch yesterday as I couldn't finish it in >>> 5 minutes, but maybe I will try later this month. Gotta focus on >>> thefaster pruning stuff for now... >> >> How much later are we talking about? > > Sorry about the confusing comment. It could be sometime later half of > the next week. We are now three weeks into the CF with no new patch. Are you planning to update this patch? If not, I think it should be marked as Returned with Feedback and submitted to the next CF once it has been updated. Regards, -- -David david@pgmasters.net
David Steele wrote: > > Sorry about the confusing comment. It could be sometime later half of > > the next week. > > We are now three weeks into the CF with no new patch. > > Are you planning to update this patch? If not, I think it should be > marked as Returned with Feedback and submitted to the next CF once it > has been updated. This is no new development, only code movement. I think it would be worse to have three different branches of partitioning code, v10 "basic", v11 "powerful but not reorganized", v12 "reorganized". I'd rather have only v10 "basic" and v11+ "powerful". Let's keep this entry open till the last minute. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > David Steele wrote: >> Are you planning to update this patch? If not, I think it should be >> marked as Returned with Feedback and submitted to the next CF once it >> has been updated. > This is no new development, only code movement. I think it would be > worse to have three different branches of partitioning code, v10 > "basic", v11 "powerful but not reorganized", v12 "reorganized". I'd > rather have only v10 "basic" and v11+ "powerful". > Let's keep this entry open till the last minute. Nonetheless, it's March 21. David's point is that it's time to get a move on. regards, tom lane
On 2018/03/22 2:33, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> David Steele wrote: >>> Are you planning to update this patch? If not, I think it should be >>> marked as Returned with Feedback and submitted to the next CF once it >>> has been updated. > >> This is no new development, only code movement. I think it would be >> worse to have three different branches of partitioning code, v10 >> "basic", v11 "powerful but not reorganized", v12 "reorganized". I'd >> rather have only v10 "basic" and v11+ "powerful". > >> Let's keep this entry open till the last minute. > > Nonetheless, it's March 21. David's point is that it's time to get a > move on. I'm sorry it took me a while to reply on this thread. Due to quite a few changes to the partitioning-related code recently and also considering some pending patches which might touch the code moved around by this patch, I'd been putting off rebasing this patch. Although, I should have said that before without waiting until today to do so. Sorry. FWIW, I did manage to rebase it this morning and posting it here. Thanks, Amit
Attachment
On Wed, Mar 21, 2018 at 10:20 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Let's keep this entry open till the last minute. Ugh, I don't like keeping things open till the last minute all that much, especially if they're not being updated. But since this has been updated I guess it's somewhat moot. Are you going to pick this up RSN? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Wed, Mar 21, 2018 at 10:20 AM, Alvaro Herrera > <alvherre@alvh.no-ip.org> wrote: > > Let's keep this entry open till the last minute. > > Ugh, I don't like keeping things open till the last minute all that > much, especially if they're not being updated. But since this has > been updated I guess it's somewhat moot. > > Are you going to pick this up RSN? If during next week qualifies as RSN, then yes. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018/03/22 11:45, Amit Langote wrote: > FWIW, I did manage to rebase it this morning and posting it here. Rebased again. I started wondering if we should separate out stuff related to partition bounds. That is create a utils/partbound.h and put PartitionBoundInfo and related comparison and search functions into a utils/adt/partbound.c. I had started thinking about that when I looked at the code added by the patch submitted on the "advanced partition matching algorithm for partition-wise join" thread [1]. I haven't done anything about that though. Thanks, Amit [1] https://commitfest.postgresql.org/17/1553/
Attachment
On Wed, Mar 28, 2018 at 12:07 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2018/03/22 11:45, Amit Langote wrote: >> FWIW, I did manage to rebase it this morning and posting it here. > > Rebased again. > > I started wondering if we should separate out stuff related to partition > bounds. That is create a utils/partbound.h and put PartitionBoundInfo and > related comparison and search functions into a utils/adt/partbound.c. I > had started thinking about that when I looked at the code added by the > patch submitted on the "advanced partition matching algorithm for > partition-wise join" thread [1]. I haven't done anything about that though. adt = Abstract Data Type, which I think we've interpreted up until now to mean an SQL-visible data type, so that seems like an odd choice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company