Re: [PoC] Reducing planning time when tables have many partitions - Mailing list pgsql-hackers
From | Yuya Watari |
---|---|
Subject | Re: [PoC] Reducing planning time when tables have many partitions |
Date | |
Msg-id | CAJ2pMkZGB4Yx1dCYkU_YRJgj2rcC0s+PWknqrszmsRPHGLqCgg@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Reducing planning time when tables have many partitions (Yuya Watari <watari.yuya@gmail.com>) |
Responses |
Re: [PoC] Reducing planning time when tables have many partitions
|
List | pgsql-hackers |
Hello Alena, Andrei, and all, I am sorry for my late response. I found that the current patches do not apply to the master, so I have rebased those patches. I have attached v22. For this later discussion, I separated the rebasing and bug fixing that Alena did in v21 into separate commits, v22-0003 and v22-0004. I will merge these commits after the discussion. 1. v22-0003 (solved_conflict_with_self_join_removal.txt) Thank you for your rebase. Looking at your rebasing patch, I thought we could do this more simply. Your patch deletes (more precisely, sets to null) non-redundant members from the root->eq_sources list and re-adds them to the same list. However, this approach seems a little waste of memory. Instead, we can update EquivalenceClass->ec_source_indexes directly. Then, we can reuse the members in root->eq_sources and don't need to extend root->eq_sources. I did this in v22-0003. What do you think of this approach? The main concern with this idea is that it does not fix RangeTblEntry->eclass_source_indexes. The current code works fine even if we don't fix the index because get_ec_source_indexes() always does bms_intersect() for eclass_source_indexes and ec_source_indexes. If we guaranteed this behavior of doing bms_intersect, then simply modifying ec_source_indexes would be fine. Fortunately, such a guarantee is not so difficult. And your patch removes the following assertion code from the previous patch. May I ask why you removed this code? I think this assertion is helpful for sanity checks. Of course, I know that this kind of assertion will slow down regression tests or assert-enabled builds. So, we may have to discuss which assertions to keep and which to discard. ===== -#ifdef USE_ASSERT_CHECKING - /* verify the results look sane */ - i = -1; - while ((i = bms_next_member(rel_esis, i)) >= 0) - { - RestrictInfo *rinfo = list_nth_node(RestrictInfo, root->eq_sources, - i); - - Assert(bms_overlap(relids, rinfo->clause_relids)); - } -#endif ===== Finally, your patch changes the name of the following function. I understand the need for this change, but it has nothing to do with our patches, so we should not include it and discuss it in another thread. ===== -update_eclasses(EquivalenceClass *ec, int from, int to) +update_eclass(PlannerInfo *root, EquivalenceClass *ec, int from, int to) ===== 2. v22-0004 (bug_related_to_atomic_function.txt) Thank you for fixing the bug. As I wrote in the previous mail: On Wed, Nov 22, 2023 at 2:32 PM Yuya Watari <watari.yuya@gmail.com> wrote: > On Mon, Nov 20, 2023 at 1:45 PM Andrei Lepikhov > <a.lepikhov@postgrespro.ru> wrote: > > During the work on committing the SJE feature [1], Alexander Korotkov > > pointed out the silver lining in this work [2]: he proposed that we > > shouldn't remove RelOptInfo from simple_rel_array at all but replace it > > with an 'Alias', which will refer the kept relation. It can simplify > > further optimizations on removing redundant parts of the query. > > Thank you for sharing this information. I think the idea suggested by > Alexander Korotkov is also helpful for our patch. As mentioned above, > the indexes are in RangeTblEntry in the current implementation. > However, I think RangeTblEntry is not the best place to store them. An > 'alias' relids may help solve this and simplify fixing the above bug. > I will try this approach soon. I think that the best way to solve this issue is to move these indexes from RangeTblEntry to RelOptInfo. Since they are related to planning time, they should be in RelOptInfo. The reason why I put these indexes in RangeTblEntry is because some RelOptInfos can be null and we cannot store the indexes. This problem is similar to an issue regarding 'varno 0' Vars. I hope an alias RelOptInfo would help solve this issue. I have attached the current proof of concept I am considering as poc-alias-reloptinfo.txt. To test this patch, please follow the procedure below. 1. Apply all *.patch files, 2. Apply Alexander Korotkov's alias_relids.patch [1], and 3. Apply poc-alias-reloptinfo.txt, which is attached to this email. My patch creates a dummy (or an alias) RelOptInfo to store indexes if the corresponding RelOptInfo is null. The following is the core change in my patch. ===== @@ -627,9 +627,19 @@ add_eq_source(PlannerInfo *root, EquivalenceClass *ec, RestrictInfo *rinfo) i = -1; while ((i = bms_next_member(rinfo->clause_relids, i)) >= 0) { - RangeTblEntry *rte = root->simple_rte_array[i]; + RelOptInfo *rel = root->simple_rel_array[i]; - rte->eclass_source_indexes = bms_add_member(rte->eclass_source_indexes, + /* + * If the corresponding RelOptInfo does not exist, we create a 'dummy' + * RelOptInfo for storing EquivalenceClass indexes. + */ + if (rel == NULL) + { + rel = root->simple_rel_array[i] = makeNode(RelOptInfo); + rel->eclass_source_indexes = NULL; + rel->eclass_derive_indexes = NULL; + } + rel->eclass_source_indexes = bms_add_member(rel->eclass_source_indexes, source_idx); } ===== At this point, I'm not sure if this approach is correct. It seems to pass the regression tests, but we should doubt its correctness. I will continue to experiment with this idea. [1] https://www.postgresql.org/message-id/CAPpHfdseB13zJJPZuBORevRnZ0vcFyUaaJeSGfAysX7S5er%2BEQ%40mail.gmail.com -- Best regards, Yuya Watari
Attachment
pgsql-hackers by date: