Re: [PoC] Reducing planning time when tables have many partitions - Mailing list pgsql-hackers
From | Yuya Watari |
---|---|
Subject | Re: [PoC] Reducing planning time when tables have many partitions |
Date | |
Msg-id | CAJ2pMka+FO=XFW9FC01AUFZA5gXE1e_6=6sdpt7khD-7NSHrMA@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Reducing planning time when tables have many partitions (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
Hello David, Thank you very much for your continuous contributions to this patch series, and especially for providing these new patches despite the time constraints. On Fri, Apr 4, 2025 at 3:04 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 4 Apr 2025 at 00:34, David Rowley <dgrowleyml@gmail.com> wrote: > > I've attached 2 patches, which I think addresses most of this, aside > > from the last point. > > > > These do need more work. I've just attached what I have so far before > > I head off for the day. I am planning on running some performance > > tests tomorrow and doing a round on the comments. > > I've done some further work on this, mostly relating to the code > comments. I also removed the now-empty > dispose_eclass_member_iterator() function. I agree with this new approach. It significantly simplifies the overall architecture of the patch series while still maintaining excellent performance. Thank you again for your effort here. > A couple of things which I'm still uncertain of: > > 1. How to handle the ec_childmembers array in _outEquivalenceClass(). > There's no field to know the size of the array. Maybe I should add one > and then print out the non-empty lists. I'm also not certain about the best solution here. As you suggested, adding a field representing the array size to EquivalenceClass seems like a reasonable approach. > 2. When processing RELOPT_OTHER_JOINREL in add_child_eq_member(), I'm > adding the member to each List for all individual relid mentioned in > child_relids. This will result in the member going on multiple Lists > and cause the iterator to possibly return the member multiple times. > That might matter in a few places, e.g. > generate_join_implied_equalities_normal() keeps some scoring based on > the number of members. > > For #2, Yuya's Bitmapset approach didn't suffer from this issue as the > Bitmapsets would be unioned to get the non-duplicative members. I > wondered about doing list_append_unique() instead of lappend() in > generate_join_implied_equalities_normal(). Unsure. The only other > thing I can think of is to do something else with members for > RELOPT_OTHER_JOINREL and store them elsewhere. Another approach I have in mind is adding an iterator pointer to each EquivalenceMember to track the iterator that last returned each member. When the iterator is about to return a member, it would first check if that member's iterator pointer matches the current iterator. If it does, we know this member has already been returned, so we skip it. However, this approach does not work when iterators are called recursively and leads to increased complexity in the data structure. Your proposed solution using list_append_unique() instead of lappend() seems practical since the number of EquivalenceMembers handled in generate_join_implied_equalities_normal() is usually limited. > I also did some benchmarking using the attached script. I've attached > the results of running that on my AMD Zen2 machine. See the end of the > script for the CREATE TABLE statement for loading that into postgres. > > The results look pretty good. v37 came out slightly faster than v36, > either noise or because of dispose_eclass_member_iterator() removal. Thank you for running your benchmarks as well. Your results look promising, demonstrating both reduced planning time and lower memory consumption. I have also conducted benchmarks using queries A and B, which I have used previously and are in [1]. Here is a quick summary: * The new patch (v37) shows better performance improvements compared to previous versions (v35 and v36). * The performance gains are significant and worth committing. * Performance regressions are negligible or non-existent, even with a small number of partitions. * Memory usage in v37 is lower than v35 and almost identical to the master. Detailed results are as follows: The following tables and attached figures indicate that v37 achieves up to 415.4% and 280.3% speedups for queries A and B, respectively. These improvements are better than those seen in v35 and v36. Importantly, v37 does not appear to introduce any regressions. Its speedups exceeded 100% in all tested cases except for the one with two partitions in query A. Even in that case, the performance remained at 99.9% of the master, demonstrating that the regression is negligible. Moreover, Table 5 and the attached figure show v37 consumes no additional memory compared to the master. Table 1: Planning time for query A (ms) ------------------------------------------- n | Master | v35 | v36 | v37 ------------------------------------------- 1 | 0.274 | 0.273 | 0.274 | 0.270 2 | 0.285 | 0.288 | 0.286 | 0.286 4 | 0.381 | 0.378 | 0.368 | 0.372 8 | 0.477 | 0.468 | 0.471 | 0.471 16 | 0.698 | 0.671 | 0.667 | 0.650 32 | 1.251 | 1.190 | 1.169 | 1.149 64 | 2.848 | 2.550 | 2.463 | 2.444 128 | 6.051 | 4.692 | 4.669 | 4.588 256 | 16.812 | 10.851 | 10.784 | 10.742 384 | 30.985 | 16.640 | 16.354 | 16.243 512 | 50.548 | 23.174 | 22.981 | 22.940 640 | 72.046 | 28.725 | 28.679 | 28.296 768 | 102.668 | 34.975 | 34.759 | 34.280 896 | 150.563 | 46.764 | 46.313 | 46.006 1024 | 197.559 | 48.243 | 47.777 | 47.553 ------------------------------------------- Table 2: Speedup of query A (higher is better) --------------------------------- n | v35 | v36 | v37 --------------------------------- 1 | 100.6% | 100.2% | 101.5% 2 | 99.2% | 99.9% | 99.9% 4 | 100.6% | 103.3% | 102.3% 8 | 101.8% | 101.2% | 101.2% 16 | 104.0% | 104.6% | 107.4% 32 | 105.1% | 107.0% | 108.9% 64 | 111.7% | 115.6% | 116.5% 128 | 129.0% | 129.6% | 131.9% 256 | 154.9% | 155.9% | 156.5% 384 | 186.2% | 189.5% | 190.8% 512 | 218.1% | 220.0% | 220.4% 640 | 250.8% | 251.2% | 254.6% 768 | 293.5% | 295.4% | 299.5% 896 | 322.0% | 325.1% | 327.3% 1024 | 409.5% | 413.5% | 415.4% --------------------------------- Table 3: Planning time for query B (ms) ------------------------------------------ n | Master | v35 | v36 | v37 ------------------------------------------ 1 | 12.300 | 12.419 | 12.219 | 12.209 2 | 11.741 | 11.761 | 11.652 | 11.639 4 | 12.573 | 12.376 | 12.390 | 12.418 8 | 13.653 | 13.242 | 13.074 | 13.081 16 | 15.693 | 14.717 | 14.503 | 14.416 32 | 20.957 | 17.890 | 17.732 | 17.675 64 | 35.914 | 25.772 | 25.633 | 25.495 128 | 79.154 | 42.826 | 42.441 | 42.407 256 | 243.880 | 88.246 | 87.626 | 87.011 ------------------------------------------ Table 4: Speedup of query B (higher is better) -------------------------------- n | v35 | v36 | v37 -------------------------------- 1 | 99.0% | 100.7% | 100.7% 2 | 99.8% | 100.8% | 100.9% 4 | 101.6% | 101.5% | 101.2% 8 | 103.1% | 104.4% | 104.4% 16 | 106.6% | 108.2% | 108.9% 32 | 117.1% | 118.2% | 118.6% 64 | 139.4% | 140.1% | 140.9% 128 | 184.8% | 186.5% | 186.7% 256 | 276.4% | 278.3% | 280.3% -------------------------------- Table 5: Memory usage (MB) (n: number of partitions per table; PWJ: partition-wise join) ---------------------------------------------------------------- Query | n | PWJ | Master | v35 | v36 | v37 ---------------------------------------------------------------- A | 1024 | OFF | 48.138 | 49.606 | 48.341 | 48.341 A | 1024 | ON | 127.483 | 128.952 | 127.687 | 127.687 B | 256 | OFF | 92.507 | 96.882 | 92.632 | 92.632 B | 256 | ON | 5803.316 | 5807.691 | 5803.441 | 5803.441 ---------------------------------------------------------------- Again, I greatly appreciate your taking the time to significantly improve this patch. I'd also like to thank Tom once again for his valuable feedback, which greatly contributed to these improvements. [1] https://www.postgresql.org/message-id/CAJ2pMkYcKHFBD_OMUSVyhYSQU0-j9T6NZ0pL6pwbZsUCohWc7Q@mail.gmail.com -- Best regards, Yuya Watari
Attachment
pgsql-hackers by date: