Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids) - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)
Date
Msg-id CAExHW5snUW7pD2RdtaBa1T_TqJYaY6W_YPVjWDrgSf33i-0uqA@mail.gmail.com
Whole thread Raw
In response to Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)
List pgsql-hackers
On Fri, Jul 28, 2023 at 3:16 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Hi Tom, Richard,
>
> On Mon, Jul 24, 2023 at 8:17 AM Richard Guo <guofenglinux@gmail.com> wrote:
> >
> > Thanks for pushing it!
>
> With this fix, I saw a noticeable increase in the memory consumption
> of planner. I was running experiments mentioned in [1] The reason is
> the Bitmapset created by bms_union() are not freed during planning and
> when there are thousands of child joins involved, bitmapsets takes up
> a large memory and there any a large number of bitmaps.
>
> Attached 0002 patch fixes the memory consumption by calculating
> appinfos only once and using them twice. The number look like below
>
> Number of tables joined | without patch | with patch |
> ------------------------------------------------------
>                       2 |      40.8 MiB |   40.3 MiB |
>                       3 |     151.6 MiB |  146.9 MiB |
>                       4 |     463.9 MiB |  445.5 MiB |
>                       5 |    1663.9 MiB | 1563.3 MiB |
>
> The memory consumption is prominent at higher number of joins as that
> exponentially increases the number of child joins.
>
> Attached setup.sql and queries.sql and patch 0001 were used to measure
> memory similar to [1].
>
> I don't think it's a problem with the patch itself. We should be able
> to use Bitmapset APIs similar to what patch is doing. But there's a
> problem with our Bitmapset implementation. It's not space efficient
> for thousands of partitions. A quick calculation reveals this.
>
> If the number of partitions is 1000, the matching partitions will
> usually be 1000, 2000, 3000 and so on. Thus the bitmapset represnting
> the relids will be {b 1000, 2000, 3000, ...}. To represent a single
> 1000th digit current Bitmapset implementation will allocate 1000/8 =
> 125 bytes of memory. A 5 way join will require 125 * 5 = 625 bytes of
> memory. This is even true for lower relid numbers since they will be
> 1000 bits away e.g. (1, 1001, 2001, 3001, ...). So 1000 such child
> joins require 625KiB memory. Doing this as many times as the number of
> joins we get 120 * 625 KiB = 75 MiB which is closer to the memory
> difference we see above.
>
> Even if we allocate a list to hold 5 integers it will not take 625
> bytes. I think we need to improve our Bitmapset representation to be
> efficient in such cases. Of course whatever representation we choose
> has to be space efficient for a small number of tables as well and
> should gel well with our planner logic. So I guess some kind of
> dynamic representation which changes the underlying layout based on
> the contents is required. I have looked up past hacker threads to see
> if this has been discussed previously.
>
> I don't think this is the thread to discuss it and also I am not
> planning to work on it in the immediate future. But I thought I would
> mention it where I found it.
>
> [1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com
>

Adding this small patch to the commitfest in case somebody finds it
worth fixing this specific memory consumption. With a new subject.



--
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Add resource intensiveness as a reason to not running tests by default
Next
From: Jelte Fennema
Date:
Subject: Re: Support prepared statement invalidation when result types change