Memory consumption during partitionwise join planning - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Memory consumption during partitionwise join planning |
Date | |
Msg-id | CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com Whole thread Raw |
Responses |
Re: Memory consumption during partitionwise join planning
|
List | pgsql-hackers |
Hi All, When we implemented partitionwise join we disabled it by default (by setting enable_partitionwise_join to false by default) since it consumed a lot of memory and took a lot of time [1]. We also set enable_partitionwise_aggregate to false by default since partitionwise aggregates require partitionwise join. I have come across at least a few cases in the field where users didn't know about these GUCs and suffered from bad performance when queries involved partitioned tables. Their queries improved a lot by just turning these GUCs ON. Given that the partitionwise operations improve performance of queries on partitioned tables, I think these GUCs need to be turned ON by default. Irrespective of whether the GUC is turned ON or OFF by default, we need to reduce the memory consumed by the partitionwise planning and the time those strategies take during planning. In this email I will discuss the planner memory consumption problem. Please find below the memory consumption measurements on master. Experiment ---------- Created two tables t1 and t1_parted respectively. t1 is an unpartitioned table whereas t1_parted is a partitioned table with 1000 partitions. Both of them are empty. Being empty does not affect planning time and memory consumption much, but it takes far less time to set up and then run EXPLAIN ANALYZE. I am using a self-join on the partition key to measure memory and planning time. Again it's something that makes setup easier yet good enough for measurements. Attached are two scripts. setup.sql should be run to create tables, partitions and helper functions. queries.sql executes and measures time and memory of 2-way, 3-way, 4-way and 5-way self joins respectively. It runs each query four times, once to warm up caches and then thrice for measurement. Below I report averages of three runs. The first attached patch is used to measure memory consumption and report it as part of EXPLAIN ANALYZE. It simply takes the difference between memory used in the CurrentMemoryContext before and after calling pg_plan_query() in ExplainOneQuery(). It does not account for the memory consumed outside the CurrentMemoryContext while planning the query e.g. caches. The memory consumed in those contexts does not necessarily contribute to planning of a given query. Hence they are ignored. I think the patch by itself makes a good improvement to EXPLAIN ANALYZE, so we may want to consider it to be accepted in the core code. Measurements ------------ In all the tables below, the first column shows the number of tables joined. Second column is for queries on unpartitioned table. Third column for queries on partitioned table with partitionwise join turned OFF. Fourth column is for queries on partitioned table with partitionwise join turned ON. Table 1: Planning time in ms. Number of | unpartitioned | partitioned with | partitioned | tables | | PWJ off | PWJ on | --------------------------------------------------------------- 2 | 0.10 | 464.39 | 501.75 | 3 | 0.17 | 2,714.28 | 3,035.07 | 4 | 0.34 | 8,856.82 | 10,416.31 | 5 | 0.84 | 22,519.02 | 29,769.19 | Table 2: Memory consumption Number of | unpartitioned | partitioned with | partitioned | tables | | PWJ off | PWJ on | --------------------------------------------------------------- 2 | 29.056 KiB | 19.520 MiB | 40.298 MiB | 3 | 79.088 KiB | 45.227 MiB | 146.877 MiB | 4 | 208.552 KiB | 83.540 MiB | 445.453 MiB | 5 | 561.592 KiB | 149.283 MiB | 1563.253 MiB | The time required to plan queries on partitioned table is very high compared to the time taken to plan queries on unpartitioned tables. In fact it's way higher than the factor of 1000 expected when there 1000 partitions. But the difference between planning time when partitionwise join is turned OFF vs ON is less compared to difference between unpartitioned vs partitioned case. The problem of reducing planning time for queries on partitioned tables is being worked on by Yuya and David in [2]. That work focuses on queries without partitionwise join. But it might fix the problem for partitionwise join as well. I am focusing on reducing the memory consumption when partitionwise join is enabled. As visible from table 2 when partitionwise join is *not* used, the memory consumed in planning is proportional to the number of partitions as expected. But when partitionwise join is enabled the memory consumption increases exponentially with the number of tables being joined. This is because the number of ways a join can be computed is exponentially proportional to the number of tables being joined. Table 3 has a breakup of memory consumed. The memory consumption is broken by the objects that consume memory during planning. The second attached patch is used to measure breakup by functionality . Here's a brief explanation of the rows in the table. 1. Restrictlist translations: Like other expressions the Restrictinfo lists of parent are translated to obtain Restrictinfo lists to be applied to child partitions (base as well as join). The first row shows the memory consumed by the translated RestrictInfos. We can't avoid these translations but closer examination reveals that a given RestrictInfo gets translated multiple times proportional to the join orders. These repeated translations can be avoided. I will start a separate thread to discuss this topic. 2. Paths: this is the memory consumed when creating child join paths and the Append paths in parent joins. It includes memory consumed by the paths as well as translated expressions. I don't think we can avoid creating these paths. But once the best paths are chosen for the lower level relations, the unused paths can be freed. I will start a separate thread to discuss this topic. 3. targetlist translation: child join relations' targetlists are created by translating parent relations' targetlist. This row shows the memory consumed by the translated targetlists. This translation can't be avoided. 4. child SpecialJoinInfo: This is memory consumed in child joins' SpecialJoinInfos translated from SpecialJoinInfo applicable to parent joins. The child SpecialJoinInfos are translated on the fly when computing child joins but are never freed. May be we can free them on the fly as well or even better save them somewhere and fetch as and when required. I will start a separate thread to discuss this topic. 5. Child join RelOptInfos: memory consumed by child join relations. This is unavoidable as we need the RelOptInfos representing the child joins. Table 3: Partitionwise join planning memory breakup Num joins | 2 | 3 | 4 | 5 | ------------------------------------------------------------------------ 1. translated | 1.8 MiB | 13.1 MiB | 58.0 MiB | 236.5 MiB | restrictlists | | | | | ------------------------------------------------------------------------ 2. creating child | 11.6 MiB | 59.4 MiB | 207.6 MiB | 768.2 MiB | join paths | | | | | ------------------------------------------------------------------------ 3. translated | 723.5 KiB | 3.3 MiB | 10.6 MiB | 28.5 MiB | targetlists | | | | | ------------------------------------------------------------------------ 4. child | 926.8 KiB | 9.0 MiB | 45.7 MiB | 245.5 MiB | SpecialJoinInfo | | | | | ------------------------------------------------------------------------ 5. Child join rels | 1.6 MiB | 7.9 MiB | 23.8 MiB | 67.5 MiB | ------------------------------------------------------------------------ Rows 1, 2 and 4 show most of the memory consumption. Those are also the ones where there are opportunities to save memory. As said above, we will discuss them in their respective email threads since each of these fixes are independent and require separate discussion. Here's memory consumption numbers after applying all the POC patches mentioned above. Number of tables | no | all POC | % | Absolute | joined | patches | patches | reduction | reduction | ---------------------------------------------------------------------- 2 | 40.3 MiB | 36.5 MiB | 9.43% | 3.8 MiB | 3 | 146.9 MiB | 120.2 MiB | 18.13% | 26.6 MiB | 4 | 445.5 MiB | 328.8 MiB | 26.19% | 116.7 MiB | 5 | 1563.3 MiB | 953.2 MiB | 39.03% | 610.1 MiB | Even if we are able to recover some memory as mentioned above, a lot of memory is still consumed by the translated objects (expressions, RestrictLists etc.). In case of partitioned tables, these translated object trees differ only in leaf Var nodes. Thus if we device a way to use the same Var node for parents as well as its children, we won't need to translate the expressions. But that's a much larger effort and probably something not even feasible. While subproblems and their solutions will be discussed in separate email threads, this thread is to discuss 1. generic memory consumption problems not covered above but seen by others 2. Is the memory consumption reduction after the POC patches good enough for us to make partitionwise operations turned ON by default? 3. Is there anything else that stops us from turning the partitionwise operations ON by default? [1] https://www.postgresql.org/message-id/CA+TgmoYggDp6k-HXNAgrykZh79w6nv2FevpYR_jeMbrORDaQrA@mail.gmail.com [2] https://www.postgresql.org/message-id/flat/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com -- Best Wishes, Ashutosh Bapat
Attachment
pgsql-hackers by date: