Re: [PoC] Reducing planning time when tables have many partitions - Mailing list pgsql-hackers

From Yuya Watari
Subject Re: [PoC] Reducing planning time when tables have many partitions
Date
Msg-id CAJ2pMka+FO=XFW9FC01AUFZA5gXE1e_6=6sdpt7khD-7NSHrMA@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Reducing planning time when tables have many partitions  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Hello David,

Thank you very much for your continuous contributions to this patch
series, and especially for providing these new patches despite the
time constraints.

On Fri, Apr 4, 2025 at 3:04 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 4 Apr 2025 at 00:34, David Rowley <dgrowleyml@gmail.com> wrote:
> > I've attached 2 patches, which I think addresses most of this, aside
> > from the last point.
> >
> > These do need more work. I've just attached what I have so far before
> > I head off for the day. I am planning on running some performance
> > tests tomorrow and doing a round on the comments.
>
> I've done some further work on this, mostly relating to the code
> comments. I also removed the now-empty
> dispose_eclass_member_iterator() function.

I agree with this new approach. It significantly simplifies the
overall architecture of the patch series while still maintaining
excellent performance. Thank you again for your effort here.

> A couple of things which I'm still uncertain of:
>
> 1. How to handle the ec_childmembers array in _outEquivalenceClass().
> There's no field to know the size of the array. Maybe I should add one
> and then print out the non-empty lists.

I'm also not certain about the best solution here. As you suggested,
adding a field representing the array size to EquivalenceClass seems
like a reasonable approach.

> 2. When processing RELOPT_OTHER_JOINREL in add_child_eq_member(), I'm
> adding the member to each List for all individual relid mentioned in
> child_relids.  This will result in the member going on multiple Lists
> and cause the iterator to possibly return the member multiple times.
> That might matter in a few places, e.g.
> generate_join_implied_equalities_normal() keeps some scoring based on
> the number of members.
>
> For #2, Yuya's Bitmapset approach didn't suffer from this issue as the
> Bitmapsets would be unioned to get the non-duplicative members. I
> wondered about doing list_append_unique() instead of lappend() in
> generate_join_implied_equalities_normal(). Unsure. The only other
> thing I can think of is to do something else with members for
> RELOPT_OTHER_JOINREL and store them elsewhere.

Another approach I have in mind is adding an iterator pointer to each
EquivalenceMember to track the iterator that last returned each
member. When the iterator is about to return a member, it would first
check if that member's iterator pointer matches the current iterator.
If it does, we know this member has already been returned, so we skip
it. However, this approach does not work when iterators are called
recursively and leads to increased complexity in the data structure.
Your proposed solution using list_append_unique() instead of lappend()
seems practical since the number of EquivalenceMembers handled in
generate_join_implied_equalities_normal() is usually limited.

> I also did some benchmarking using the attached script. I've attached
> the results of running that on my AMD Zen2 machine. See the end of the
> script for the CREATE TABLE statement for loading that into postgres.
>
> The results look pretty good. v37 came out slightly faster than v36,
> either noise or because of dispose_eclass_member_iterator() removal.

Thank you for running your benchmarks as well. Your results look
promising, demonstrating both reduced planning time and lower memory
consumption.

I have also conducted benchmarks using queries A and B, which I have
used previously and are in [1]. Here is a quick summary:

* The new patch (v37) shows better performance improvements compared
to previous versions (v35 and v36).
* The performance gains are significant and worth committing.
* Performance regressions are negligible or non-existent, even with a
small number of partitions.
* Memory usage in v37 is lower than v35 and almost identical to the master.

Detailed results are as follows:

The following tables and attached figures indicate that v37 achieves
up to 415.4% and 280.3% speedups for queries A and B, respectively.
These improvements are better than those seen in v35 and v36.

Importantly, v37 does not appear to introduce any regressions. Its
speedups exceeded 100% in all tested cases except for the one with two
partitions in query A. Even in that case, the performance remained at
99.9% of the master, demonstrating that the regression is negligible.

Moreover, Table 5 and the attached figure show v37 consumes no
additional memory compared to the master.

Table 1: Planning time for query A (ms)
-------------------------------------------
    n |  Master |    v35 |    v36 |    v37
-------------------------------------------
    1 |   0.274 |  0.273 |  0.274 |  0.270
    2 |   0.285 |  0.288 |  0.286 |  0.286
    4 |   0.381 |  0.378 |  0.368 |  0.372
    8 |   0.477 |  0.468 |  0.471 |  0.471
   16 |   0.698 |  0.671 |  0.667 |  0.650
   32 |   1.251 |  1.190 |  1.169 |  1.149
   64 |   2.848 |  2.550 |  2.463 |  2.444
  128 |   6.051 |  4.692 |  4.669 |  4.588
  256 |  16.812 | 10.851 | 10.784 | 10.742
  384 |  30.985 | 16.640 | 16.354 | 16.243
  512 |  50.548 | 23.174 | 22.981 | 22.940
  640 |  72.046 | 28.725 | 28.679 | 28.296
  768 | 102.668 | 34.975 | 34.759 | 34.280
  896 | 150.563 | 46.764 | 46.313 | 46.006
 1024 | 197.559 | 48.243 | 47.777 | 47.553
-------------------------------------------

Table 2: Speedup of query A (higher is better)
---------------------------------
    n |    v35 |    v36 |    v37
---------------------------------
    1 | 100.6% | 100.2% | 101.5%
    2 |  99.2% |  99.9% |  99.9%
    4 | 100.6% | 103.3% | 102.3%
    8 | 101.8% | 101.2% | 101.2%
   16 | 104.0% | 104.6% | 107.4%
   32 | 105.1% | 107.0% | 108.9%
   64 | 111.7% | 115.6% | 116.5%
  128 | 129.0% | 129.6% | 131.9%
  256 | 154.9% | 155.9% | 156.5%
  384 | 186.2% | 189.5% | 190.8%
  512 | 218.1% | 220.0% | 220.4%
  640 | 250.8% | 251.2% | 254.6%
  768 | 293.5% | 295.4% | 299.5%
  896 | 322.0% | 325.1% | 327.3%
 1024 | 409.5% | 413.5% | 415.4%
---------------------------------

Table 3: Planning time for query B (ms)
------------------------------------------
   n |  Master |    v35 |    v36 |    v37
------------------------------------------
   1 |  12.300 | 12.419 | 12.219 | 12.209
   2 |  11.741 | 11.761 | 11.652 | 11.639
   4 |  12.573 | 12.376 | 12.390 | 12.418
   8 |  13.653 | 13.242 | 13.074 | 13.081
  16 |  15.693 | 14.717 | 14.503 | 14.416
  32 |  20.957 | 17.890 | 17.732 | 17.675
  64 |  35.914 | 25.772 | 25.633 | 25.495
 128 |  79.154 | 42.826 | 42.441 | 42.407
 256 | 243.880 | 88.246 | 87.626 | 87.011
------------------------------------------

Table 4: Speedup of query B (higher is better)
--------------------------------
   n |    v35 |    v36 |    v37
--------------------------------
   1 |  99.0% | 100.7% | 100.7%
   2 |  99.8% | 100.8% | 100.9%
   4 | 101.6% | 101.5% | 101.2%
   8 | 103.1% | 104.4% | 104.4%
  16 | 106.6% | 108.2% | 108.9%
  32 | 117.1% | 118.2% | 118.6%
  64 | 139.4% | 140.1% | 140.9%
 128 | 184.8% | 186.5% | 186.7%
 256 | 276.4% | 278.3% | 280.3%
--------------------------------

Table 5: Memory usage (MB)
(n: number of partitions per table; PWJ: partition-wise join)
----------------------------------------------------------------
 Query |    n | PWJ |   Master |      v35 |      v36 |      v37
----------------------------------------------------------------
     A | 1024 | OFF |   48.138 |   49.606 |   48.341 |   48.341
     A | 1024 |  ON |  127.483 |  128.952 |  127.687 |  127.687
     B |  256 | OFF |   92.507 |   96.882 |   92.632 |   92.632
     B |  256 |  ON | 5803.316 | 5807.691 | 5803.441 | 5803.441
----------------------------------------------------------------

Again, I greatly appreciate your taking the time to significantly
improve this patch. I'd also like to thank Tom once again for his
valuable feedback, which greatly contributed to these improvements.

[1] https://www.postgresql.org/message-id/CAJ2pMkYcKHFBD_OMUSVyhYSQU0-j9T6NZ0pL6pwbZsUCohWc7Q@mail.gmail.com

--
Best regards,
Yuya Watari

Attachment

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Making sslrootcert=system work on Windows psql
Next
From: Amit Langote
Date:
Subject: Re: [PoC] Reducing planning time when tables have many partitions