Thread: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

Hi All,

Following up on [1] ...

A restrictlist is a list of RestrictInfo nodes, each representing one
clause applicable to a set of joins.  When partitionwise join is used
as a strategy, the restrictlists for a child join are obtained by
translating the restrictlists of the parent in
try_partitionwise_join(). That function is called once for every join
order. E.g. when computing join ABC, it will be called for planning
joins AB, BC, AC, (AB)C, A(BC) and B(AC). Every time it is called it
translates the given parent restrictlist. This means that a
RestrictInfo node applicable to a given set of child relations is
translated as many times as there are join orders in which those child
relations appear.
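
For reference, the translation in question is the
adjust_appendrel_attrs() call in try_partitionwise_join(), which looks
roughly like this (simplified; appinfos here describe the pair of child
relations being joined):

    /* Build the child join's restrictlist by translating the parent's. */
    child_restrictlist =
        (List *) adjust_appendrel_attrs(root,
                                        (Node *) parent_restrictlist,
                                        nappinfos, appinfos);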

For example, consider a query "select * from A, B, C where A.a = B.a
and B.a = C.a" where A, B and C are partitioned tables. A has
partitions A1, A2, ... An. B has partitions B1, B2, ... Bn and C has
partitions C1, C2, ... Cn. Partitions Ai, Bi and Ci are matching
partitions respectively for all i's. The join ABC is computed as
Append of (A1B1C1, A2B2C2, ... AnBnCn). The clause A.a = B.a is
translated to A1.a = B1.a thrice, when computing A1B1, A1(B1C1) and
B1(A1C1) respectively. Similarly other clauses are translated multiple
times. Some extra translations also happen in
reparameterize_path_by_child().

These translations consume memory which remains allocated till the
statement finishes. A RestrictInfo should be translated only once per
parent-child pair, thus avoiding the extra memory consumption.

There are two patches attached
0001 - to measure memory consumption during planning. This is the same
one as attached to [1].
0002 - WIP patch to avoid repeated translations of RestrictInfo.
The WIP patch avoids repeated translations by tracking the child for
which a RestrictInfo is translated and reusing the same translation
every time it is requested. In order to track the translations,
RestrictInfo gets two new members.
1. parent_rinfo - In a child's RestrictInfo this points to the
RestrictInfo applicable to the topmost parent in the partition
hierarchy. This is NULL in the topmost parent's RestrictInfo.
2. child_rinfos - In a parent's RestrictInfo, this is a list that
contains all the translated child RestrictInfos. In child
RestrictInfos this is NULL.
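
In struct RestrictInfo (pathnodes.h) terms, the additions look roughly
like the sketch below; the member names are those used by the WIP
patch, the comments are mine:

    /*
     * This clause's RestrictInfo in the topmost parent of the partition
     * hierarchy; NULL if this RestrictInfo itself belongs to the topmost
     * parent.
     */
    struct RestrictInfo *parent_rinfo;

    /*
     * In a parent's RestrictInfo, the list of translated child
     * RestrictInfos; NIL in child RestrictInfos.
     */
    List       *child_rinfos;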

Every translated RestrictInfo is stored in the top parent's
RestrictInfo child_rinfos. RestrictInfo::required_relids is used as a
key to search a given translation. I have intercepted
adjust_appendrel_attrs_mutator() to track translations as well as
avoid multiple translations. It first looks for an existing
translation when translating a RestrictInfo and creates a new one only
when one doesn't exist already.
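
The lookup done before translating can be sketched as a helper along
these lines (the function name and exact shape are illustrative, not
necessarily what the patch does):

    static RestrictInfo *
    find_child_rinfo(RestrictInfo *parent_rinfo, Relids child_required_relids)
    {
        ListCell   *lc;

        /* Scan the translations remembered in the topmost parent. */
        foreach(lc, parent_rinfo->child_rinfos)
        {
            RestrictInfo *child_rinfo = lfirst_node(RestrictInfo, lc);

            if (bms_equal(child_rinfo->required_relids, child_required_relids))
                return child_rinfo;
        }

        /* No translation yet; the caller creates one and records it. */
        return NULL;
    }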

Using this patch the memory consumption for the above query reduces as follows

Number of partitions: 1000

Number of tables | without patch  | with patch | % reduction |
being joined     |                |            |             |
--------------------------------------------------------------
               2 |      40.3 MiB  |   37.7 MiB |       6.43% |
               3 |     146.8 MiB  |  133.0 MiB |       9.42% |
               4 |     445.4 MiB  |  389.5 MiB |      12.57% |
               5 |    1563.2 MiB  | 1243.2 MiB |      20.47% |

The number of times a RestrictInfo needs to be translated increases
exponentially with the number of tables joined. Thus we see more
memory saved as the number of tables joined increases. When two tables
are joined there's only a single join planned so no extra translations
happen in try_partitionwise_join(). The memory saved in case of 2
joining tables comes from avoiding extra translations happening during
reparameterization of paths (in reparameterize_path_by_child()).

The attached patch is to show how much memory can be saved if we avoid
extra translation. But I want to discuss the following things about
the approach.

1. The patch uses RestrictInfo::required_relids as the key for
searching child RestrictInfos. I am not sure which of the two viz.
required_relids and clause_relids is a better key. required_relids
seems to be a subset of clause_relids and from the description it
looks like that's the set that decides the applicability of a clause
in a join. But clause_relids is obtained from all the Vars that appear
in the clause, so maybe that's the one that matters for the
translations. Can somebody guide me?

2. The patch adds two extra pointers per RestrictInfo. They will
remain unused when partitionwise join is not used. Right now, I do not
see any increase in memory consumed by planner because of those
pointers even in case of unpartitioned tables; maybe they are absorbed
in memory alignment. They may show up as extra memory in the future. I
am wondering whether we can instead save and track translations in
PlannerInfo as a hash table using <rinfo_serial, required_relids (or
whatever is the answer to above question) of parent and child
respectively> as key. That will just add one extra pointer in
PlannerInfo when partitionwise join is not used. Please let me know
your suggestions.
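
For illustration, the key of such a PlannerInfo-level hash table could
be something like this (names purely hypothetical):

    typedef struct ChildRinfoKey
    {
        int     rinfo_serial;       /* rinfo_serial of the parent RestrictInfo */
        Relids  required_relids;    /* required_relids (or clause_relids, per
                                     * question 1) of the child translation */
    } ChildRinfoKey;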

3. I have changed adjust_appendrel_attrs_mutator() to return a
translated RestrictInfo if it already exists. IOW, it won't always
return a deep copy of the given RestrictInfo as it does today. This can be
fixed by writing wrappers around adjust_appendrel_attrs() to translate
RestrictInfo specifically. But maybe we don't always need deep copies.
Are there any cases when we need translated deep copies of
RestrictInfo? Those cases will require fixing callers of
adjust_appendrel_attrs() instead of the mutator.

4. IIRC, when partitionwise join was implemented we had discussed
creating child RestrictInfos using a logic similar to
build_joinrel_restrictlist(). That might be another way to build a
RestrictInfo only once and use it multiple times. But we felt that it
was a much harder problem to solve since it's not known which
partitions from the joining partitioned tables will match and will be
joined until we enter try_partitionwise_join(), so the required
RestrictInfos may not be available in RelOptInfo::joininfo. Let me
know your thoughts on this.

Comments/suggestions welcome.

references
[1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachment

On Thu, Jul 27, 2023 at 10:06 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
0002 - WIP patch to avoid repeated translations of RestrictInfo.
The WIP patch avoids repeated translations by tracking the child for
which a RestrictInfo is translated and reusing the same translation
every time it is requested. In order to track the translations,
RestrictInfo gets two new members.
1. parent_rinfo - In a child's RestrictInfo this points to the
RestrictInfo applicable to the topmost parent in partition hierarchy.
This is NULL in the topmost parent's RestrictInfo
2. child_rinfos - In a parent's RestrictInfo, this is a list that
contains all the translated child RestrictInfos. In child
RestrictInfos this is NULL.

I haven't looked into the details but with 0002 patch I came across a
crash while planning the query below.

regression=# set enable_partitionwise_join to on;
SET
regression=# EXPLAIN (COSTS OFF)
SELECT * FROM prt1 t1, prt2 t2
WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0;
server closed the connection unexpectedly

Thanks
Richard
On Tue, Aug 8, 2023 at 4:08 PM Richard Guo <guofenglinux@gmail.com> wrote:
>
>
> I haven't looked into the details but with 0002 patch I came across a
> crash while planning the query below.
>
> regression=# set enable_partitionwise_join to on;
> SET
> regression=# EXPLAIN (COSTS OFF)
> SELECT * FROM prt1 t1, prt2 t2
> WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0;
> server closed the connection unexpectedly

Thanks for the report. PFA the patches with this issue fixed. There is
another issue seen when running partition_join.sql "ERROR:  index key
does not match expected index column". I will investigate that issue.

Right now I am looking at the 4th point in my first email in this
thread [1]. I am trying to figure out whether that approach would
work. If it doesn't, I need to figure out the best way to track the
translations, as well as answers to points 1, 2 and 3 there. Please
let me know your opinions if any.

--
Best Wishes,
Ashutosh Bapat

[1] https://www.postgresql.org/message-id/CAExHW5s=bCLMMq8n_bN6iU+Pjau0DS3z_6Dn6iLE69ESmsPMJQ@mail.gmail.com

Attachment
On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> 0001 - to measure memory consumption during planning. This is the same
> one as attached to [1].

I see you're recording the difference in the CurrentMemoryContext of
palloc'd memory before and after planning.  That won't really alert us
to problems if the planner briefly allocates, say, 12GB of memory
during, say, the join search and then quickly pfree's it again.  Unless
it's an oversized chunk, aset.c won't free() any memory until
MemoryContextReset(). Chunks just go onto a freelist for reuse later.
So at the end of planning, the context may still have that 12GB
malloc'd, yet your new EXPLAIN ANALYZE property might end up just
reporting a tiny fraction of that.

I wonder if it would be more useful to just have ExplainOneQuery()
create a new memory context and switch to that and just report the
context's mem_allocated at the end.
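
Something along these lines, I mean (untested sketch, variable names
made up):

    MemoryContext planner_ctx;
    MemoryContext oldcxt;
    Size        planner_mem;

    planner_ctx = AllocSetContextCreate(CurrentMemoryContext,
                                        "planner",
                                        ALLOCSET_DEFAULT_SIZES);
    oldcxt = MemoryContextSwitchTo(planner_ctx);

    plan = pg_plan_query(query, queryString, cursorOptions, params);

    MemoryContextSwitchTo(oldcxt);

    /* counts everything still malloc'd, including freelist chunks */
    planner_mem = MemoryContextMemAllocated(planner_ctx, true);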

It's also slightly annoying that these planner-related summary outputs
are linked to EXPLAIN ANALYZE. We could be showing them in EXPLAIN
without ANALYZE.  If we were to change that now, it might be a bit
annoying for the regression tests as we'd need to go and add SUMMARY
OFF in a load of places...

drowley@amd3990x:~/pg_src/src/test/regress/sql$ git grep -i "costs off" | wc -l
1592

hmm, that would cause a bit of churn... :-(

David



On Wed, Aug 9, 2023 at 10:12 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > 0001 - to measure memory consumption during planning. This is the same
> > one as attached to [1].
>

I have started a separate thread to discuss this patch. I am taking
this discussion to that thread.

--
Best Wishes,
Ashutosh Bapat



I spent some time on the 4th point below but also looked at the other
points. Here's what I have found so far.

On Thu, Jul 27, 2023 at 7:35 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> 1. The patch uses RestrictInfo::required_relids as the key for
> searching child RelOptInfos. I am not sure which of the two viz.
> required_relids and clause_relids is a better key. required_relids
> seems to be a subset of clause_relids and from the description it
> looks like that's the set that decides the applicability of a clause
> in a join. But clause_relids is obtained from all the Vars that appear
> in the clause, so may be that's the one that matters for the
> translations. Can somebody guide me?

I was wrong that required_relids is a subset of clause_relids. The
first can contain OJ relids which the other cannot. OJ relids do not
have any children, so they won't be translated. So clause_relids seems
to be a better key. I haven't made a decision yet.

>
> 2. The patch adds two extra pointers per RestrictInfo. They will
> remain unused when partitionwise join is not used. Right now, I do not
> see any increase in memory consumed by planner because of those
> pointers even in case of unpartitioned tables; maybe they are absorbed
> in memory alignment. They may show up as extra memory in the future. I
> am wondering whether we can instead save and track translations in
> PlannerInfo as a hash table using <rinfo_serial, required_relids (or
> whatever is the answer to above question) of parent and child
> respectively> as key. That will just add one extra pointer in
> PlannerInfo when partitionwise join is not used. Please let me know
> your suggestions.

I will go ahead with a pointer in PlannerInfo for now.

>
> 3. I have changed adjust_appendrel_attrs_mutator() to return a
> translated RestrictInfo if it already exists. IOW, it won't always
> return a deep copy of given RestrictInfo as it does today. This can be
> fixed by writing wrappers around adjust_appendrel_attrs() to translate
> RestrictInfo specifically. But maybe we don't always need deep copies.
> Are there any cases when we need translated deep copies of
> RestrictInfo? Those cases will require fixing callers of
> adjust_appendrel_attrs() instead of the mutator.

I think it's better to handle the tracking logic outside
adjust_appendrel_attrs. That will be some code churn but it will be
cleaner and won't affect anything other than partitionwise joins.

>
> 4. IIRC, when partitionwise join was implemented we had discussed
> creating child RestrictInfos using a logic similar to
> build_joinrel_restrictlist(). That might be another way to build
> RestrictInfo only once and use it multiple times. But we felt that it
> was much harder problem to solve since it's not known which partitions
> from joining partitioned tables will match and will be joined till we
> enter try_partitionwise_join(), so the required RestrictInfos may not
> be available in RelOptInfo::joininfo. Let me know your thoughts on
> this.

Here's a somewhat lengthy description of why I feel translations are
better than computing the restrictlist and joininfo for a child join
from the joining relations' joininfo.

Consider a query "select * from p, q, r where p.c1 = q.c1 and q.c1 =
r.c1 and p.c2 + q.c2 < r.c2 and p.c3 != q.c3 and q.c3 != r.c3". The
query has the following clauses:
1. p.c1 = q.c1
2. q.c1 = r.c1
3. p.c2 + q.c2 < r.c2
4. p.c3 != q.c3
5. q.c3 != r.c3

The first two clauses are added to the EC machinery and do not appear
in joininfo. They appear in the restrictlist when we construct clauses
from ECs. Let's ignore them for now.

Assume that each table p, q, r has partitions (p1, p2, ...), (q1, q2,
...) and (r1, r2, ...) respectively. Each triplet (pi, qi, ri) forms
the set of matching partitions from p, q, r respectively for all "i".
Consider the join p1q1r1. We will generate relations p1, q1, r1, p1q1,
p1r1, q1r1 and p1q1r1 while building the last join. Below is a
description of how these clauses would look in each of these relations
and the list they appear in when computing that join. Please notice
the numeric suffixes carefully.

p1.
joininfo: p1.c2 + q.c2 < r.c2, p1.c3 != q.c3
restrictlist: <>

q1
joininfo: p.c2 + q1.c2 < r.c2, p.c3 != q1.c3, q1.c3 != r.c3
restrictlist: <>

r1
joininfo: p.c2 + q.c2 < r1.c2, q.c3 != r1.c3
restrictlist: <>

p1q1
joininfo: p1.c2 + q1.c2 < r.c2, q1.c3 != r.c3
restrictlist: p1.c3 != q1.c3

q1r1
joininfo: p.c2 + q1.c2 < r1.c2, p.c3 != q1.c3
restrictlist: q1.c3 != r1.c3

p1r1
joininfo: p1.c2 + q.c2 < r1.c2, p1.c3 != q.c3, q.c3 != r1.c3
restrictlist: <>

p1q1r1
joininfo: <>
restrictlist for (p1q1)r1: p1.c2 + q1.c2 < r1.c2, q1.c3 != r1.c3
restrictlist for (p1r1)q1: p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3, q1.c3 != r1.c3
restrictlist for p1(q1r1): p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3

If we translate the clauses when building a join, e.g. translate
p1.c3 != q1.c3 when building p1q1 or p1q1r1, it would cause repeated
translations. So the translations need to be saved in the lower
relations when we detect matching partitions and then reused; that is
what I have done in the attached patches. But the problem is that the
same clause reaches its final translation through different
intermediate translations as the join search advances. E.g. the
evolution of p.c2 + q.c2 < r.c2 to p1.c2 + q1.c2 < r1.c2 has three
different intermediate translations at the second level of the join.
Each of these intermediate translations conflicts with the others and
none of them can be saved in any of the second-level joins as a
candidate for the last-stage translation. Extending the logic in the
patches would make them more complicated.

Another possibility is to avoid the same clause being translated
multiple times when building a join by using
RestrictInfo::rinfo_serial. But that alone won't help avoid repeated
translations caused by different join orders. E.g. we won't be able to
detect that p.c2 + q.c2 < r.c2 has already been translated to p1.c2 +
q1.c2 < r1.c2 when we computed (p1r1)q1 or p1(q1r1) or (p1q1)r1,
whichever was computed earlier. For that we need some tracking outside
the join relations themselves, like I did in my first patch.

Coming back to the problem of generating child restrictlist clauses
from equivalence classes, I think it's easier with some tweaks to pass
child relids down to the minions when dealing with child joins. It
seems to be working as is but I haven't tested it thoroughly.

Obtaining child clauses from parent clauses by translation and
tracking the translations is less complex and may be more efficient
too. I will post a patch on those lines soon.

--
Best Wishes,
Ashutosh Bapat

Attachment
Hi All,

On Fri, Aug 11, 2023 at 6:24 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Obtaining child clauses from parent clauses by translation and
> tracking the translations is less complex and may be more efficient
> too. I will post a patch on those lines soon.
>

PFA patch set to add infrastructure to track RestrictInfo translations
and reuse them.

PlannerInfo gets a new member "struct HTAB *child_rinfo_hash" which is
a hash table of hash tables keyed by RestrictInfo::rinfo_serial. Each
element in that hash table is itself a hash table of RestrictInfos
keyed by RestrictInfo::required_relids, as explained in my previous
email. When building child clauses, either a. when building child join
rels or b. when reparameterizing paths, we first access the first-level
hash table using the parent's RestrictInfo::rinfo_serial and then
search for the required translation using the child's required_relids,
obtained by translating the parent RestrictInfo's required_relids. If
the translation doesn't exist, we create one and add it to the hash
table.
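
To make the shape of the lookup concrete, it is roughly as below
(simplified sketch; the struct and function names are made up, while
child_rinfo_hash, rinfo_serial and required_relids are the real fields
involved):

    typedef struct RinfoSerialEntry
    {
        int     rinfo_serial;   /* key: shared by a parent clause and all
                                 * its translations */
        HTAB   *child_rinfos;   /* inner table, see ChildRinfoEntry */
    } RinfoSerialEntry;

    typedef struct ChildRinfoEntry
    {
        Relids        required_relids; /* key: the translation's required_relids */
        RestrictInfo *rinfo;           /* the translated RestrictInfo */
    } ChildRinfoEntry;

    static RestrictInfo *
    lookup_child_rinfo(PlannerInfo *root, RestrictInfo *parent_rinfo,
                       Relids child_required_relids)
    {
        RinfoSerialEntry *outer;
        ChildRinfoEntry *inner;
        bool        found;

        /* First level: keyed by the parent's rinfo_serial. */
        outer = (RinfoSerialEntry *) hash_search(root->child_rinfo_hash,
                                                 &parent_rinfo->rinfo_serial,
                                                 HASH_FIND, &found);
        if (!found)
            return NULL;

        /*
         * Second level: keyed by the child's required_relids; the Bitmapset
         * key is hashed/compared the same way as the join-rel hash table in
         * relnode.c does it.
         */
        inner = (ChildRinfoEntry *) hash_search(outer->child_rinfos,
                                                &child_required_relids,
                                                HASH_FIND, &found);
        return found ? inner->rinfo : NULL;
    }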

RestrictInfo::required_relids is the same for a RestrictInfo and its
commuted RestrictInfo. The order of operands is important for
IndexClauses. Hence we track the commuted RestrictInfo in a new field
RestrictInfo::comm_rinfo. RestrictInfo::is_commuted differentiates
between a RestrictInfo and its commuted version. This is explained in
a comment in the patch. This scheme has a minor benefit of saving
memory when the same RestrictInfo is commuted multiple times.
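
In struct terms that is roughly (member names as in the patch, comments
mine):

    struct RestrictInfo *comm_rinfo;    /* commuted version of this clause,
                                         * built once and then reused */
    bool        is_commuted;            /* is this the commuted version? */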

A hash table of hash tables is used instead of an array of hash tables
since a. not every rinfo_serial has a RestrictInfo associated with it,
b. not every RestrictInfo has translations, and c. the exact size of
such an array is not known till planning ends since we continue to
create new clauses as we create new RelOptInfos. Of course, an array
can be repalloc'ed, and unused slots in the array may not waste a lot
of memory. I am open to changing the hash table to an array, which may
be more efficient.

With this set of patches, the memory consumption stands as below:
Number of tables | without patch  | with patch | % reduction |
being joined     |                |            |             |
--------------------------------------------------------------
               2 |      40.8 MiB  |   37.4 MiB |       8.46% |
               3 |     151.6 MiB  |  135.0 MiB |      10.96% |
               4 |     464.0 MiB  |  403.6 MiB |      12.00% |
               5 |    1663.9 MiB  | 1329.1 MiB |      20.12% |

The patch set is thus
0001 - patch used to measure memory used during planning

0002 - Patch to free intermediate Relids computed by
adjust_child_relids_multilevel(). I didn't test memory consumption for
multi-level partitioning. But this is a clear improvement. In that
function we free the AppendInfos array, which is as many pointers long
as the number of relids. So it doesn't make sense not to free the
Relids, which can be at least {largest relid}/8 bytes long.

0003 - patch to save and reuse commuted RestrictInfo. This patch by
itself shows a small memory saving (3%) in the query below where the
same clause is commuted twice. The query does not contain any
partitioned tables.
create table t2 (a int primary key, b int, c int);
create index t2_a_b on t2(a, b);
select * from t2 where 10 = a
Memory used without patch: 13696 bytes
Memory used with patch: 13264 bytes

0004 - Patch which implements the hash table of hash tables described
above and also code to avoid repeated RestrictInfo list translations.

I will add this patchset to next commitfest.

--
Best Wishes,
Ashutosh Bapat

Attachment
On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> The patch set is thus
> 0001 - patch used to measure memory used during planning
>
> 0002 - Patch to free intermediate Relids computed by
> adjust_child_relids_multilevel(). I didn't test memory consumption for
> multi-level partitioning. But this is clear improvement. In that
> function we free the AppendInfos array which as many pointers long as
> the number of relids. So it doesn't make sense not to free the Relids
> which can be {largest relid}/8 bytes long at least.
>
> 0003 - patch to save and reuse commuted RestrictInfo. This patch by
> itself shows a small memory saving (3%) in the query below where the
> same clause is commuted twice. The query does not contain any
> partitioned tables.
> create table t2 (a int primary key, b int, c int);
> create index t2_a_b on t2(a, b);
> select * from t2 where 10 = a
> Memory used without patch: 13696 bytes
> Memory used with patch: 13264 bytes
>
> 0004 - Patch which implements the hash table of hash table described
> above and also code to avoid repeated RestrictInfo list translations.
>
> I will add this patchset to next commitfest.
>
> --
> Best Wishes,
> Ashutosh Bapat

PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is
the squashed version of the latest patch set at
https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1cSrjcmyYj8Pduuw@mail.gmail.com.

--
Best Wishes,
Ashutosh Bapat

Attachment

Hi! Thank you for your work on the subject, I think it's a really useful feature. I've reviewed your patch and I have a few questions.

First of all, have you thought about creating a gun parameter to display memory scheduling information? I agree that this is an important feature, but I think only for debugging.

Secondly, I noticed that for the child_rinfo_hash key you use a counter (int) and can it lead to collisions? Why didn't you generate a hash from childRestrictInfo for this? For example, something like how it is formed here [0].

[0] https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru

-- 
Regards,
Alena Rybakina
Postgres Professional
On 24.11.2023 13:20, Alena Rybakina wrote:

Hi! Thank you for your work on the subject, I think it's a really useful feature. I've reviewed your patch and I have a few questions.

First of all, have you thought about creating a gun parameter to display memory scheduling information? I agree that this is an important feature, but I think only for debugging.

Secondly, I noticed that for the child_rinfo_hash key you use a counter (int) and can it lead to collisions? Why didn't you generate a hash from childRestrictInfo for this? For example, something like how it is formed here [0].

[0] https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru

Sorry, my first question was not clear, I mean: have you thought about creating a guc parameter to display memory planning information?
-- 
Regards,
Alena Rybakina
On Fri, Nov 24, 2023 at 3:56 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
>
> On 24.11.2023 13:20, Alena Rybakina wrote:
>
> Hi! Thank you for your work on the subject, I think it's a really useful feature.
>
> I've reviewed your patch and I have a few questions.
>
> First of all, have you thought about creating a gun parameter to display memory scheduling information? I agree that
> this is an important feature, but I think only for debugging.

Not a GUC parameter but I have a proposal to use EXPLAIN for the same. [1]

>
> Secondly, I noticed that for the child_rinfo_hash key you use a counter (int) and can it lead to collisions? Why
> didn't you generate a hash from childRestrictInfo for this? For example, something like how it is formed here [0].

Not usually. But that's the only "key" we have to access a set of
semantically same RestrictInfos. Relids is another key to access the
exact RestrictInfo. A child RestrictInfo cannot be used since there
will be many child RestrictInfos. Similarly, the parent RestrictInfo
cannot be used since there will be multiple forms of the same
RestrictInfo.

[1] https://commitfest.postgresql.org/45/4492/

--
Best Wishes,
Ashutosh Bapat



On Tue, 31 Oct 2023 at 10:48, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > The patch set is thus
> > 0001 - patch used to measure memory used during planning
> >
> > 0002 - Patch to free intermediate Relids computed by
> > adjust_child_relids_multilevel(). I didn't test memory consumption for
> > multi-level partitioning. But this is clear improvement. In that
> > function we free the AppendInfos array which as many pointers long as
> > the number of relids. So it doesn't make sense not to free the Relids
> > which can be {largest relid}/8 bytes long at least.
> >
> > 0003 - patch to save and reuse commuted RestrictInfo. This patch by
> > itself shows a small memory saving (3%) in the query below where the
> > same clause is commuted twice. The query does not contain any
> > partitioned tables.
> > create table t2 (a int primary key, b int, c int);
> > create index t2_a_b on t2(a, b);
> > select * from t2 where 10 = a
> > Memory used without patch: 13696 bytes
> > Memory used with patch: 13264 bytes
> >
> > 0004 - Patch which implements the hash table of hash table described
> > above and also code to avoid repeated RestrictInfo list translations.
> >
> > I will add this patchset to next commitfest.
> >
> > --
> > Best Wishes,
> > Ashutosh Bapat
>
> PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is
> the squashed version of the latest patch set at
> https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1cSrjcmyYj8Pduuw@mail.gmail.com.

CFBot shows that the patch does not apply anymore as in [1]:
=== Applying patches on top of PostgreSQL commit ID
7014c9a4bba2d1b67d60687afb5b2091c1d07f73 ===
=== applying patch
./0001-Report-memory-used-for-planning-a-query-in--20231031.patch
...
patching file src/test/regress/expected/explain.out
Hunk #5 FAILED at 290.
Hunk #6 succeeded at 545 (offset 4 lines).
1 out of 6 hunks FAILED -- saving rejects to file
src/test/regress/expected/explain.out.rej
patching file src/tools/pgindent/typedefs.list
Hunk #1 succeeded at 1562 (offset 18 lines).

Please post an updated version for the same.

[1] - http://cfbot.cputube.org/patch_46_4564.log

Regards,
Vignesh



On Sat, Jan 27, 2024 at 8:26 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 31 Oct 2023 at 10:48, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > The patch set is thus
> > > 0001 - patch used to measure memory used during planning
> > >
> > > 0002 - Patch to free intermediate Relids computed by
> > > adjust_child_relids_multilevel(). I didn't test memory consumption for
> > > multi-level partitioning. But this is clear improvement. In that
> > > function we free the AppendInfos array which as many pointers long as
> > > the number of relids. So it doesn't make sense not to free the Relids
> > > which can be {largest relid}/8 bytes long at least.
> > >
> > > 0003 - patch to save and reuse commuted RestrictInfo. This patch by
> > > itself shows a small memory saving (3%) in the query below where the
> > > same clause is commuted twice. The query does not contain any
> > > partitioned tables.
> > > create table t2 (a int primary key, b int, c int);
> > > create index t2_a_b on t2(a, b);
> > > select * from t2 where 10 = a
> > > Memory used without patch: 13696 bytes
> > > Memory used with patch: 13264 bytes
> > >
> > > 0004 - Patch which implements the hash table of hash table described
> > > above and also code to avoid repeated RestrictInfo list translations.
> > >
> > > I will add this patchset to next commitfest.
> > >
> > > --
> > > Best Wishes,
> > > Ashutosh Bapat
> >
> > PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is
> > the squashed version of the latest patch set at
> > https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1cSrjcmyYj8Pduuw@mail.gmail.com.
>
> CFBot shows that the patch does not apply anymore as in [1]:
> === Applying patches on top of PostgreSQL commit ID
> 7014c9a4bba2d1b67d60687afb5b2091c1d07f73 ===
> === applying patch
> ./0001-Report-memory-used-for-planning-a-query-in--20231031.patch
> ...
> patching file src/test/regress/expected/explain.out
> Hunk #5 FAILED at 290.
> Hunk #6 succeeded at 545 (offset 4 lines).
> 1 out of 6 hunks FAILED -- saving rejects to file
> src/test/regress/expected/explain.out.rej
> patching file src/tools/pgindent/typedefs.list
> Hunk #1 succeeded at 1562 (offset 18 lines).
>
> Please post an updated version for the same.
>
> [1] - http://cfbot.cputube.org/patch_46_4564.log
>
> Regards,
> Vignesh

Thanks Vignesh for the notification. PFA rebased patches. 0001 in
earlier patch-set is now removed.

--
Best Wishes,
Ashutosh Bapat

Attachment
Hi,

After taking a look at the patch optimizing SpecialJoinInfo allocations,
I decided to take a quick look at this one too. I don't have any
specific comments on the code yet, but it seems quite a bit more complex
than the other patch ... it's introducing a HTAB into the optimizer,
surely that has costs too.

I started by doing the same test as with the other patch, comparing
master to the two patches (applied independently and both). And I got
about this (in MB):

  tables    master     sjinfo    rinfo      both
  -----------------------------------------------
       2        37         36       34        33
       3       138        129      122       113
       4       421        376      363       318
       5      1495       1254     1172       931

Unlike the SpecialJoinInfo patch, I haven't seen any reduction in
planning time for this one.

The reduction in allocated memory is nice. I wonder what's allocating
the remaining memory, and what we'd have to do to reduce it further.

However, this is a somewhat extreme example - it's joining 5 tables,
each with 1000 partitions, using a partition-wise join. It reduces the
amount of memory, but the planning time is still quite high (and
essentially the same as without the patch). So it's not like it'd make
them significantly more practical ... do we have any other ideas/plans
how to improve that?

AFAIK we don't expect this to improve "regular" cases with modest number
of tables / partitions, etc. But could it make them slower in some case?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 19/2/2024 06:05, Tomas Vondra wrote:
> However, this is a somewhat extreme example - it's joining 5 tables,
> each with 1000 partitions, using a partition-wise join. It reduces the
> amount of memory, but the planning time is still quite high (and
> essentially the same as without the patch). So it's not like it'd make
> them significantly more practical ... do we have any other ideas/plans
> how to improve that?
The planner principle of cleaning up all allocated structures after the 
optimization stage simplifies development and code. But, if we want to 
achieve horizontal scalability on many partitions, we should introduce 
per-partition memory context and reset it in between. GEQO already has a 
short-lived memory context, making designing extensions a bit more 
challenging but nothing too painful.

-- 
regards,
Andrei Lepikhov
Postgres Professional




On Mon, Feb 19, 2024 at 4:35 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> After taking a look at the patch optimizing SpecialJoinInfo allocations,
> I decided to take a quick look at this one too. I don't have any
> specific comments on the code yet, but it seems quite a bit more complex
> than the other patch ... it's introducing a HTAB into the optimizer,
> surely that has costs too.

Thanks for looking into this too.

>
> I started by doing the same test as with the other patch, comparing
> master to the two patches (applied independently and both). And I got
> about this (in MB):
>
>   tables    master     sjinfo    rinfo      both
>   -----------------------------------------------
>        2        37         36       34        33
>        3       138        129      122       113
>        4       421        376      363       318
>        5      1495       1254     1172       931
>
> Unlike the SpecialJoinInfo patch, I haven't seen any reduction in
> planning time for this one.

Yeah. That agreed with my observation as well.

>
> The reduction in allocated memory is nice. I wonder what's allocating
> the remaining memory, and we'd have to do to reduce it further.

Please see my reply on the SpecialJoinInfo thread. Other than the
current patches, we require a memory-efficient Relids implementation.
I have mentioned some ideas in the slides I shared in the other
thread, but haven't spent time experimenting with any of them.

>
> However, this is a somewhat extreme example - it's joining 5 tables,
> each with 1000 partitions, using a partition-wise join. It reduces the
> amount of memory, but the planning time is still quite high (and
> essentially the same as without the patch). So it's not like it'd make
> them significantly more practical ... do we have any other ideas/plans
> how to improve that?

Yuya has been working on reducing planning time [1]. His patches show
some significant improvements in that area based on my experiments.
But those patches are complex and still WIP.

>
> AFAIK we don't expect this to improve "regular" cases with modest number
> of tables / partitions, etc. But could it make them slower in some case?
>

AFAIR, my experiments did not show any degradation in regular cases
with a modest number of tables/partitions. The variation in planning
time was within the usual planning time variations.

[1]
https://www.postgresql.org/message-id/flat/CAExHW5uVZ3E5RT9cXHaxQ_DEK7tasaMN%3DD6rPHcao5gcXanY5w%40mail.gmail.com#112b3e104e0f9e39eb007abe075aae20

--
Best Wishes,
Ashutosh Bapat




On Mon, Feb 19, 2024 at 5:17 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Mon, Feb 19, 2024 at 4:35 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> After taking a look at the patch optimizing SpecialJoinInfo allocations,
> I decided to take a quick look at this one too. I don't have any
> specific comments on the code yet, but it seems quite a bit more complex
> than the other patch ... it's introducing a HTAB into the optimizer,
> surely that has costs too.

Thanks for looking into this too.

>
> I started by doing the same test as with the other patch, comparing
> master to the two patches (applied independently and both). And I got
> about this (in MB):
>
>   tables    master     sjinfo    rinfo      both
>   -----------------------------------------------
>        2        37         36       34        33
>        3       138        129      122       113
>        4       421        376      363       318
>        5      1495       1254     1172       931
>
> Unlike the SpecialJoinInfo patch, I haven't seen any reduction in
> planning time for this one.

Yeah. That agreed with my observation as well.

>
> The reduction in allocated memory is nice. I wonder what's allocating
> the remaining memory, and we'd have to do to reduce it further.

Please see reply to SpecialJoinInfo thread. Other that the current
patches, we require memory efficient Relids implementation. I have
shared some ideas in the slides I shared in the other thread, but
haven't spent time experimenting with any ideas there.

>
> However, this is a somewhat extreme example - it's joining 5 tables,
> each with 1000 partitions, using a partition-wise join. It reduces the
> amount of memory, but the planning time is still quite high (and
> essentially the same as without the patch). So it's not like it'd make
> them significantly more practical ... do we have any other ideas/plans
> how to improve that?

Yuya has been working on reducing planning time [1]. Has some
significant improvements in that area based on my experiments. But
those patches are complex and still WIP.

>
> AFAIK we don't expect this to improve "regular" cases with modest number
> of tables / partitions, etc. But could it make them slower in some case?
>

AFAIR, my experiments did not show any degradation in regular cases
with modest number of tables/partitions. The variation in planning
time was with the usual planning time variations.

Documenting some comments from today's patch review session:
1. Instead of a nested hash table, it might be better to use a flat hash table to save more memory.
2. The new comm_rinfo member in RestrictInfo may have problems when copying RestrictInfo or translating it. Instead, commuted versions may be tracked outside RestrictInfo.

Combining the above two, it feels like we need a single hash table with (commuted, rinfo_serial, relids) as key to store all the translations of a RestrictInfo.
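
As a sketch, the key of that single flat hash table would be something
like the following (names purely illustrative):

    typedef struct RinfoTranslationKey
    {
        bool    commuted;       /* commuted version of the clause? */
        int     rinfo_serial;   /* rinfo_serial of the parent RestrictInfo */
        Relids  relids;         /* relids identifying the child translation */
    } RinfoTranslationKey;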

--
Best Wishes,
Ashutosh Bapat
On Mon, Aug 19, 2024 at 6:43 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Sorry for the delay in my response.
>
> On Wed, May 29, 2024 at 9:34 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > Documenting some comments from todays' patch review session
>
> I forgot to mention back then that both of the suggestions below came
> from Tom Lane.
>
> > 1. Instead of a nested hash table, it might be better to use a flat hash table to save more memory.
>
> Done. It indeed saves memory without impacting planning time.
>
> > 2. new comm_rinfo member in RestrictInfo may have problems when copying RestrictInfo or translating it. Instead
> > commuted versions may be tracked outside RestrictInfo
>
> After commit 767c598954bbf72e0535f667e2e0667765604b2a,
> reparameterization of parent paths for child relations happens at the time
> of creating the plan. This reduces the number of child paths produced
> by reparameterization and also reduces the number of RestrictInfos
> that get translated during reparameterization. During
> reparameterization commuted parent RestrictInfos are required to be
> translated to child RestrictInfos. Before the commit, this led to
> translating the same commuted parent RestrictInfo multiple times.
> After the commit, only one path gets reparameterized for a given
> parent and child pair. Hence we do not produce multiple copies of the
> same commuted child RestrictInfo. Hence we don't need to keep track of
> commuted child RestrictInfos anymore. Removed that portion of code
> from the patches.
>
> I made detailed memory consumption measurements with this patch for
> the number of partitions varying from 0 (unpartitioned) to 1000 and
> for 2- to 5-way joins. They are available in the spreadsheet at [1]. The raw
> measurement data is in the first sheet named "pwj_mem_measurements raw
> numbers". The averages over multiple runs are in second sheet named
> "avg_numbers". Rest of the sheet represent the averages in more
> consumable manner. Please note that the averages make sense only for
> planning time since the memory consumption remains same with every
> run. Also note that EXPLAIN now reports planning memory consumption in
> kB. Any changes to memory consumption below 1kB are not reported and
> hence not noticed. Here are some observations.
>
> 1. When partitionwise join is not enabled, no changes to planner's
> memory consumption are observed. See sheet named "pwj disabled,
> planning memory".
>
> 2. When partitionwise join is enabled, up to 17% (206MB) of memory is
> saved by the patch in case of 5-way self-join with 1000 partitions.
> This is the maximum memory saving observed. The amount of memory saved
> increases with the number of joins and number of partitions. See sheet
> with name "pwj enabled, planning memory"
>
> 3. After commit 767c598954bbf72e0535f667e2e0667765604b2a, we do not
> translate a parent RestrictInfo multiple times for the same
> parent-child pair in the case of a *2-way partitionwise join*. But we
> still consume memory in saving the child RestrictInfo in the hash
> table. Hence in the case of a 2-way join we see increased memory
> consumption with the patch compared to master. The memory consumption
> increases by 13kB, 23kB, 76kB and 146kB for 10, 100, 500, 1000 partitions
> respectively. This increase is smaller compared to the overall memory
> saving. In order to avoid this memory increase, we will need to avoid
> using hash table for 2-way join. We will need to know whether there
> will be more than one partitionwise join before translating the
> RestrictInfos for the first partitionwise join. This is hard to
> achieve in all the cases since the decision to use partitionwise join
> happens at the time of creating paths for a given join relation, which
> itself is computed on the fly. We may choose some heuristics which
> take into account the number of partitioned tables in the query, their
> partition schemes, and the quals in the query to decide whether or not
> to track the translated child RestrictInfos. But that will make the
> code more complex; more importantly, the heuristics may not be able
> to keep up if we start using partitionwise join as an optimization
> strategy for more cases (e.g. asymmetric partitionwise join [2]). The
> attached patch looks like a good tradeoff to me. But other opinions
> might vary. Suggestions are welcome.
>
> 4. There is no noticeable change in the planning time. I ran the same
> experiment multiple times. The planning time variations from each
> experiment do not show any noticeable pattern suggesting increase or
> decrease in the planning time with the patch.
>
> A note about the code: I have added all the structures and functions
> dealing with the RestrictInfo hash table at the end of restrictinfo.c.
> I have not come across a C file in the PostgreSQL code base where
> private structures are defined in the middle of the file; usually they are
> defined at the beginning of the file. But I have chosen it that way
> here since it makes it easy to document the hash table and the
> functions at one place at the beginning of this code section. I am
> open to suggestions which make the documentation easy while placing
> the structures at the beginning of the file.
>
> [1] https://docs.google.com/spreadsheets/d/1CEjRWZ02vuR8fSwhYaNugewtX8f0kIm5pLoRA95f3s8/edit?usp=sharing
> [2] https://www.postgresql.org/message-id/flat/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ%40mail.gmail.com
>

Noticed that Tomas's email address has changed. Adding his new email address.

--
Best Wishes,
Ashutosh Bapat