Thread: Excessive memory usage in multi-statement queries w/ partitioning
Hi,

a customer reported excessive memory usage and out-of-memory ERRORs
after introducing native partitioning in one of their databases.  We
could narrow it down to the overhead introduced by the partitioning
when issuing multiple statements in a single query.  I could reduce the
problem to the following recipe:

--8<---------------cut here---------------start------------->8---
#!/bin/bash

# create 100 partitions
psql -c 'create table t(c int primary key) partition by range(c)'
for i in {1..100}; do
    psql -e -c "create table t$i partition of t
                for values from ($(((i-1)*100))) to ($((i*100-1)))"
done

# artificially limit per-process memory by setting a resource limit for
# the postmaster to 256MB
prlimit -d$((256*1024*1024)) -p $POSTMASTER_PID
--8<---------------cut here---------------end--------------->8---

Now, updates to a partition are fine with 4000 update statements:

,----
| $ psql -c "$(yes update t2 set c=c where c=6 \; | head -n 4000)"
| UPDATE 0
`----

…but when doing it on the parent relation, even 100 statements are
enough to exceed the limit:

,----
| $ psql -c "$(yes update t set c=c where c=6 \; | head -n 100)"
| FEHLER:  Speicher aufgebraucht
| DETAIL:  Failed on request of size 200 in memory context "MessageContext".
`----

(The German log lines translate to: ERROR: out of memory.)

The memory context dump shows plausible values except for the
MessageContext:

TopMemoryContext: 124336 total in 8 blocks; 18456 free (11 chunks); 105880 used
[...]
MessageContext: 264241152 total in 42 blocks; 264 free (0 chunks); 264240888 used
[...]

Maybe some tactically placed pfrees, or avoiding putting redundant stuff
into MessageContext, could relax the situation?

regards,
Andreas
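For reference, the context dump quoted above is the report the backend
itself writes to the server log when an allocation request fails.  The
same report can also be produced on demand; below is a minimal sketch,
assuming a C function compiled against the server headers.  The wrapper
name is made up, but MemoryContextStats() and TopMemoryContext are the
real routine and context.

#include "postgres.h"
#include "utils/memutils.h"

/*
 * Hypothetical debugging wrapper: walk the memory context tree rooted
 * at TopMemoryContext and print per-context totals to stderr -- the
 * same report the backend emits automatically on allocation failure.
 */
void
dump_memory_contexts(void)
{
	MemoryContextStats(TopMemoryContext);
}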
On Thu, 23 May 2019 at 17:55, Andreas Seltenreich
<andreas.seltenreich@credativ.de> wrote:
> a customer reported excessive memory usage and out-of-memory ERRORs
> after introducing native partitioning in one of their databases.  We
> could narrow it down to the overhead introduced by the partitioning
> when issuing multiple statements in a single query.

"multiple statements in a single query" -- did you mean to write session
or maybe transaction there?

Which version?  I tried your test case with REL_11_STABLE and I see
nowhere near as much memory used in MessageContext.  After repeating the
query twice, I see:

MessageContext: 8388608 total in 11 blocks; 3776960 free (1 chunks); 4611648 used
Grand total: 8388608 bytes in 11 blocks; 3776960 free (1 chunks); 4611648 used

MessageContext: 8388608 total in 11 blocks; 3776960 free (1 chunks); 4611648 used
Grand total: 8388608 bytes in 11 blocks; 3776960 free (1 chunks); 4611648 used

which is quite a long way off the 252MB you're getting.  Perhaps I'm not
testing with the same version as you are.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hey David,

> "multiple statements in a single query" -- did you mean to write
> session or maybe transaction there?

Maybe the wording isn't perfect.  It is required that the queries are
sent as a single batch.  Try the exact bash script Andreas used for
updating the parent.

> Which version?

Tested including 11.2; initially found on 11.1.  Memory consumption
scales somewhat linearly with the number of existing partitions and the
number of ';'-delimited queries per batch.

regards
--
Julian Schauder
On Thu, 23 May 2019 at 21:19, Julian Schauder
<julian.schauder@credativ.de> wrote:
> > "multiple statements in a single query" -- did you mean to write
> > session or maybe transaction there?
>
> Maybe the wording isn't perfect.  It is required that the queries are
> sent as a single batch.  Try the exact bash script Andreas used for
> updating the parent.

Thanks for explaining.

> > Which version?
>
> Tested including 11.2; initially found on 11.1.  Memory consumption
> scales somewhat linearly with the number of existing partitions and
> the number of ';'-delimited queries per batch.

Yeah, unfortunately, if the batch contains 100 of those statements then
the planner is going to eat 100 times the memory, since it stores all
100 plans at once.

Since you're pruning all but 1 partition, the situation should be much
better for you when you can upgrade to v12.  Unfortunately, that's still
about 5 months away.

The best thing you can do for now is going to be either to reduce the
number of partitions, or to reduce the number of statements in the
batch... or to install more memory.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

On 2019/05/23 4:15, Andreas Seltenreich wrote:
> …but when doing it on the parent relation, even 100 statements are
> enough to exceed the limit:
>
> ,----
> | $ psql -c "$(yes update t set c=c where c=6 \; | head -n 100)"
> | FEHLER:  Speicher aufgebraucht
> | DETAIL:  Failed on request of size 200 in memory context "MessageContext".
> `----
>
> The memory context dump shows plausible values except for the
> MessageContext:
>
> TopMemoryContext: 124336 total in 8 blocks; 18456 free (11 chunks); 105880 used
> [...]
> MessageContext: 264241152 total in 42 blocks; 264 free (0 chunks); 264240888 used
> [...]

As David Rowley said, planning that query hundreds of times under a
single MessageContext is not something that will end well on 11.3,
because even a single instance takes up tons of memory that's only
released when MessageContext is reset.

> Maybe some tactically placed pfrees, or avoiding putting redundant
> stuff into MessageContext, could relax the situation?

I too have had similar thoughts on the matter.  If the planner had built
all its subsidiary data structures in its own private context (or tree
of contexts), which is reset once a plan for a given query is built and
passed on, then there wouldn't be an issue of all of that subsidiary
memory leaking into MessageContext.  However, the problem may really be
that we're subjecting the planner to use cases that it wasn't designed
to perform equally well under -- running it many times while handling
the same message.  It is worsened by the fact that the query in question
is something that ought to have been documented as not well supported by
the planner; David has posted a documentation patch for that [1].  PG 12
has alleviated the situation to a large degree, so you won't see the OOM
occurring for this query, but not for all queries unfortunately.

With that said, we may want to look into the planner sometimes hoarding
memory, especially when planning complex queries involving partitions.
AFAIK, one of the reasons partition-wise join and aggregate are turned
off by default is that their planning consumes a lot of CPU and memory,
partly because the planner doesn't actively release the memory of its
subsidiary structures, or maybe because of inferior ways in which
partitions and partitioning properties are represented in the planner.
Though if there's evidence that it's the latter, maybe we should fix
that before pondering any sophisticated planner memory management.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/CAKJS1f-2rx%2BE9mG3xrCVHupefMjAp1%2BtpczQa9SEOZWyU7fjEA%40mail.gmail.com
On 5/24/19 1:47 AM, Amit Langote wrote:
> On 2019/05/23 4:15, Andreas Seltenreich wrote:
>> …but when doing it on the parent relation, even 100 statements are
>> enough to exceed the limit:
>>
>> ,----
>> | $ psql -c "$(yes update t set c=c where c=6 \; | head -n 100)"
>> | FEHLER:  Speicher aufgebraucht
>> | DETAIL:  Failed on request of size 200 in memory context "MessageContext".
>> `----
>>
>> The memory context dump shows plausible values except for the
>> MessageContext:
>>
>> TopMemoryContext: 124336 total in 8 blocks; 18456 free (11 chunks); 105880 used
>> [...]
>> MessageContext: 264241152 total in 42 blocks; 264 free (0 chunks); 264240888 used
>> [...]
>
> As David Rowley said, planning that query hundreds of times under a
> single MessageContext is not something that will end well on 11.3,
> because even a single instance takes up tons of memory that's only
> released when MessageContext is reset.
>
>> Maybe some tactically placed pfrees, or avoiding putting redundant
>> stuff into MessageContext, could relax the situation?
>
> I too have had similar thoughts on the matter.  If the planner had
> built all its subsidiary data structures in its own private context
> (or tree of contexts), which is reset once a plan for a given query is
> built and passed on, then there wouldn't be an issue of all of that
> subsidiary memory leaking into MessageContext.  However, the problem
> may really be that we're subjecting the planner to use cases that it
> wasn't designed to perform equally well under -- running it many times
> while handling the same message.  It is worsened by the fact that the
> query in question is something that ought to have been documented as
> not well supported by the planner; David has posted a documentation
> patch for that [1].  PG 12 has alleviated the situation to a large
> degree, so you won't see the OOM occurring for this query, but not for
> all queries unfortunately.

I admittedly haven't followed this thread too closely, but if having 100
partitions causes out-of-memory on pg11, that sounds like a massive
regression to me.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
On Sat, 25 May 2019 at 00:18, Joe Conway <mail@joeconway.com> wrote:
> I admittedly haven't followed this thread too closely, but if having
> 100 partitions causes out-of-memory on pg11, that sounds like a
> massive regression to me.

For it to have regressed it would have had to once have been better, but
where was that mentioned?  The only thing I saw was non-partitioned
tables compared to partitioned tables, but you can't really say it's a
regression if you're comparing apples to oranges.

I think the only regression here is in the documents: bebc46931a1
removed the warning about too many partitions in a partitioned table at
the end of ddl.sgml.  As Amit mentions, we'd like to put something back
about that.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 5/24/19 9:33 AM, David Rowley wrote:
> On Sat, 25 May 2019 at 00:18, Joe Conway <mail@joeconway.com> wrote:
>> I admittedly haven't followed this thread too closely, but if having
>> 100 partitions causes out-of-memory on pg11, that sounds like a
>> massive regression to me.
>
> For it to have regressed it would have had to once have been better,
> but where was that mentioned?  The only thing I saw was
> non-partitioned tables compared to partitioned tables, but you can't
> really say it's a regression if you're comparing apples to oranges.

I have very successfully used multiple hundreds and even low thousands
of partitions without running out of memory under the older
inheritance-based "partitioning", and declarative partitioning is
supposed to be (and we have advertised it to be) better, not worse,
isn't it?

At least from my point of view, if 100 partitions is unusable due to
memory leaks, it is a regression.  Perhaps not *technically* a
regression, assuming it behaves this way in pg10 also, but I bet lots of
users would see it that way.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Joe Conway <mail@joeconway.com> writes:
> On 5/24/19 9:33 AM, David Rowley wrote:
>> For it to have regressed it would have had to once have been better,
>> but where was that mentioned?  The only thing I saw was
>> non-partitioned tables compared to partitioned tables, but you can't
>> really say it's a regression if you're comparing apples to oranges.

> I have very successfully used multiple hundreds and even low thousands
> of partitions without running out of memory under the older
> inheritance-based "partitioning", and declarative partitioning is
> supposed to be (and we have advertised it to be) better, not worse,
> isn't it?

Have you done the exact thing described in the test case?  I think
that's going to be quite unpleasantly memory-intensive in any version.

The real issue here is that we have designed around the assumption that
MessageContext will be used to parse and plan one single statement
before being reset.  The described usage breaks that assumption.  No
matter how memory-efficient any one statement is or isn't, if you throw
enough of them at the backend without giving it a chance to reset
MessageContext, it won't end well.

So my thought, if we want to do something about this, is not "find some
things we can pfree at the end of planning" but "find a way to use a
separate context for each statement in the query string".  Maybe
multi-query strings could be handled by setting up a child context of
MessageContext (after we've done the raw parsing there and determined
that indeed there are multiple queries), running parse analysis and
planning in that context, and resetting that context after each query.

			regards, tom lane
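To make the proposal concrete, here is a minimal sketch of that shape,
assuming PostgreSQL 11-era internals.  It is not the patch discussed
below; the helper name is made up, and execution of the plans is elided.

#include "postgres.h"
#include "nodes/parsenodes.h"
#include "tcop/tcopprot.h"
#include "utils/memutils.h"

/*
 * Hypothetical sketch: run parse analysis and planning for each raw
 * statement inside a child context of MessageContext, then free that
 * context once the statement is done, so per-statement planner memory
 * cannot pile up across the statements of a multi-query string.
 */
static void
plan_statements_in_private_contexts(List *parsetree_list,
									const char *query_string)
{
	ListCell   *lc;

	foreach(lc, parsetree_list)
	{
		RawStmt    *parsetree = lfirst_node(RawStmt, lc);
		MemoryContext stmt_cxt;
		MemoryContext oldcxt;
		List	   *querytree_list;
		List	   *plantree_list;

		/* a child of MessageContext dies no later than its parent */
		stmt_cxt = AllocSetContextCreate(MessageContext,
										 "per-statement context",
										 ALLOCSET_DEFAULT_SIZES);
		oldcxt = MemoryContextSwitchTo(stmt_cxt);

		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
												NULL, 0, NULL);
		plantree_list = pg_plan_queries(querytree_list,
										CURSOR_OPT_PARALLEL_OK, NULL);
		(void) plantree_list;	/* execution via a Portal elided here */

		MemoryContextSwitchTo(oldcxt);

		/* all parse/plan memory for this statement is freed at once */
		MemoryContextDelete(stmt_cxt);
	}
}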
On 5/24/19 10:28 AM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> On 5/24/19 9:33 AM, David Rowley wrote:
>>> For it to have regressed it would have had to once have been better,
>>> but where was that mentioned?  The only thing I saw was
>>> non-partitioned tables compared to partitioned tables, but you can't
>>> really say it's a regression if you're comparing apples to oranges.
>
>> I have very successfully used multiple hundreds and even low
>> thousands of partitions without running out of memory under the older
>> inheritance-based "partitioning", and declarative partitioning is
>> supposed to be (and we have advertised it to be) better, not worse,
>> isn't it?
>
> Have you done the exact thing described in the test case?  I think
> that's going to be quite unpleasantly memory-intensive in any version.

Ok, fair point.  Will test and report back.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
On 2019/05/24 23:28, Tom Lane wrote:
> So my thought, if we want to do something about this, is not "find
> some things we can pfree at the end of planning" but "find a way
> to use a separate context for each statement in the query string".
> Maybe multi-query strings could be handled by setting up a child
> context of MessageContext (after we've done the raw parsing there
> and determined that indeed there are multiple queries), running
> parse analysis and planning in that context, and resetting that
> context after each query.

Maybe like the attached?  I'm not sure if we need to likewise be
concerned about execute_sql_string() being handed multi-query strings.

Thanks,
Amit
Hi,

On 2019/05/24 21:18, Joe Conway wrote:
> On 5/24/19 1:47 AM, Amit Langote wrote:
>> I too have had similar thoughts on the matter.  If the planner had
>> built all its subsidiary data structures in its own private context
>> (or tree of contexts), which is reset once a plan for a given query
>> is built and passed on, then there wouldn't be an issue of all of
>> that subsidiary memory leaking into MessageContext.  However, the
>> problem may really be that we're subjecting the planner to use cases
>> that it wasn't designed to perform equally well under -- running it
>> many times while handling the same message.  It is worsened by the
>> fact that the query in question is something that ought to have been
>> documented as not well supported by the planner; David has posted a
>> documentation patch for that [1].  PG 12 has alleviated the situation
>> to a large degree, so you won't see the OOM occurring for this query,
>> but not for all queries unfortunately.
>
> I admittedly haven't followed this thread too closely, but if having
> 100 partitions causes out-of-memory on pg11, that sounds like a
> massive regression to me.

You won't run out of memory if you are running just one query per
message, but that's not the case in this discussion.  With multi-query
submissions like in this case, the memory taken up by parsing and
planning *all* the queries adds up in a single MessageContext, so it can
lead to OOM if there are enough queries to push MessageContext beyond
the limit.

The only point I was trying to make in what I wrote is that reaching OOM
of this sort is easier with partitioning, because of the age-old
behavior that planning UPDATE/DELETE queries on inherited tables (and so
partitioned tables) needs tons of memory that grows with the number of
child tables / partitions.  We fixed things in PG 12, at least for
partitioning, so that as long as a query needs to affect only a small
number of partitions of the total present, its planning will use only a
fixed amount of CPU and memory, so increasing the number of partitions
won't lead to explosive growth in memory used.  You might be able to
tell, however, that that effort had nothing to do with improving the
situation for multi-query submissions.

Thanks,
Amit
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> On 2019/05/24 23:28, Tom Lane wrote:
>> So my thought, if we want to do something about this, is not "find
>> some things we can pfree at the end of planning" but "find a way
>> to use a separate context for each statement in the query string".

> Maybe like the attached?  I'm not sure if we need to likewise be
> concerned about execute_sql_string() being handed multi-query strings.

Please add this to the upcoming CF so we don't forget about it.

(I don't think there's anything very new about this behavior, so I don't
feel that we should consider it an open item for v12 --- anyone think
differently?)

			regards, tom lane
On 2019/05/27 21:56, Tom Lane wrote:
> Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
>> Maybe like the attached?  I'm not sure if we need to likewise be
>> concerned about execute_sql_string() being handed multi-query
>> strings.
>
> Please add this to the upcoming CF so we don't forget about it.

Done; added to Performance for lack of a better topic for this.

https://commitfest.postgresql.org/23/2131/

> (I don't think there's anything very new about this behavior, so
> I don't feel that we should consider it an open item for v12 ---
> anyone think differently?)

Agree that there's nothing new about the behavior itself.  What may be
new, though, is people getting increasingly bitten by it if they query
tables containing large numbers of partitions, most of which need to be
scanned [1] -- provided they have use cases where a single client
request contains hundreds of such queries to begin with.

Thanks,
Amit

[1] AFAICT, that's the only class of queries where the planner needs to
keep a lot of stuff around whose memory cost increases with the number
of partitions.  I was thinking that planning complex queries involving
tons of indexes, joins, etc. also hoards tons of memory, but apparently
not, because the planner seems fairly good at cleaning up after itself
as it does the work.
On Tue, May 28, 2019 at 6:57 AM Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
> On 2019/05/27 21:56, Tom Lane wrote:
> > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> >> Maybe like the attached?  I'm not sure if we need to likewise be
> >> concerned about execute_sql_string() being handed multi-query
> >> strings.

The whole extension SQL script is passed to execute_sql_string(), so I
think that it's a good thing to have a similar workaround there.

About the patch:

-	 * Switch to appropriate context for constructing querytrees (again,
-	 * these must outlive the execution context).
+	 * Switch to appropriate context for constructing querytrees.
+	 * Memory allocated during this construction is released before
+	 * the generated plan is executed.

The comment should mention query and plan trees; everything else seems
OK to me.
Hi Julien,

Thanks for taking a look at this.

On Thu, Jul 4, 2019 at 6:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Tue, May 28, 2019 at 6:57 AM Amit Langote wrote:
> > >> Maybe like the attached?  I'm not sure if we need to likewise be
> > >> concerned about execute_sql_string() being handed multi-query
> > >> strings.
>
> The whole extension SQL script is passed to execute_sql_string(), so I
> think that it's a good thing to have a similar workaround there.

That makes sense, although it is perhaps much less likely for a memory
usage explosion to occur in execute_sql_string(), because the scripts
passed to it mostly contain utility statements and rarely anything whose
planning will explode in memory usage.

Anyway, I've added similar handling in execute_sql_string() for
consistency.

Now I wonder if we'll need to consider another path that calls
pg_plan_queries() on a possibly multi-statement query --
BuildCachedPlan()...

> About the patch:
>
> -	 * Switch to appropriate context for constructing querytrees (again,
> -	 * these must outlive the execution context).
> +	 * Switch to appropriate context for constructing querytrees.
> +	 * Memory allocated during this construction is released before
> +	 * the generated plan is executed.
>
> The comment should mention query and plan trees; everything else seems
> OK to me.

Okay, fixed.

Attached updated patch.  Thanks again.

Regards,
Amit
Hi Amit,

On Mon, Jul 8, 2019 at 10:52 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Jul 4, 2019 at 6:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > The whole extension SQL script is passed to execute_sql_string(), so
> > I think that it's a good thing to have a similar workaround there.
>
> That makes sense, although it is perhaps much less likely for a memory
> usage explosion to occur in execute_sql_string(), because the scripts
> passed to it mostly contain utility statements and rarely anything
> whose planning will explode in memory usage.
>
> Anyway, I've added similar handling in execute_sql_string() for
> consistency.

Thanks!

> Now I wonder if we'll need to consider another path that calls
> pg_plan_queries() on a possibly multi-statement query --
> BuildCachedPlan()...

I also thought about this when reviewing the patch, but AFAICS you can't
provide a multi-statement query to BuildCachedPlan() using prepared
statements, and I'm not sure that SPI is worth the trouble.

I'll mark this patch as ready for committer.
Amit Langote <amitlangote09@gmail.com> writes:
> [ parse-plan-memcxt_v2.patch ]

I got around to looking at this finally.  I'm not at all happy with the
fact that it's added a plantree copy step to the only execution path
through exec_simple_query.  That's a very significant overhead, in many
use-cases, to solve something that nobody had complained about for a
couple of decades before now.  I don't see the need for any added copy
step anyway.  The only reason you're doing it AFAICS is so you can
release the per-statement context a bit earlier, which is a completely
unnecessary optimization.  Just wait to release it till the bottom of
the loop.

Also, creating/deleting the sub-context is in itself an added overhead
that accomplishes exactly nothing in the typical case where there's not
multiple statements.  I thought the idea was to do that only if there
was more than one raw parsetree (or, maybe better, do it for all but the
last parsetree).

To show that this isn't an empty concern, I did a quick pgbench test.
Using a single-client select-only test ("pgbench -S -T 60" in an -s 10
database), I got these numbers in three trials with HEAD:

tps = 9593.818478 (excluding connections establishing)
tps = 9570.189163 (excluding connections establishing)
tps = 9596.579038 (excluding connections establishing)

and these numbers after applying the patch:

tps = 9411.918165 (excluding connections establishing)
tps = 9389.279079 (excluding connections establishing)
tps = 9409.350175 (excluding connections establishing)

That's about a 2% dropoff.  Now it's possible that that can be explained
away as random variations from a slightly different layout of critical
loops vs. cacheline boundaries ... but I don't believe it.

			regards, tom lane
On Tue, Jul 9, 2019 at 6:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Langote <amitlangote09@gmail.com> writes:
> > [ parse-plan-memcxt_v2.patch ]
>
> I got around to looking at this finally.

Thanks for the review.

> I'm not at all happy with the fact that it's added a plantree copy
> step to the only execution path through exec_simple_query.  That's a
> very significant overhead, in many use-cases, to solve something that
> nobody had complained about for a couple of decades before now.  I
> don't see the need for any added copy step anyway.  The only reason
> you're doing it AFAICS is so you can release the per-statement context
> a bit earlier, which is a completely unnecessary optimization.  Just
> wait to release it till the bottom of the loop.

Ah, that makes sense.  I've removed the copying of the plan tree and
also moved the temporary context deletion to the bottom of the loop.

> Also, creating/deleting the sub-context is in itself an added overhead
> that accomplishes exactly nothing in the typical case where there's
> not multiple statements.  I thought the idea was to do that only if
> there was more than one raw parsetree (or, maybe better, do it for all
> but the last parsetree).

That makes sense too.  I've made the creation/deletion of the child
context conditional on whether there is more than one query to plan.

> To show that this isn't an empty concern, I did a quick pgbench test.
> Using a single-client select-only test ("pgbench -S -T 60" in an -s 10
> database), I got these numbers in three trials with HEAD:
>
> tps = 9593.818478 (excluding connections establishing)
> tps = 9570.189163 (excluding connections establishing)
> tps = 9596.579038 (excluding connections establishing)
>
> and these numbers after applying the patch:
>
> tps = 9411.918165 (excluding connections establishing)
> tps = 9389.279079 (excluding connections establishing)
> tps = 9409.350175 (excluding connections establishing)
>
> That's about a 2% dropoff.

With the updated patch, here are the numbers on my machine (HEAD vs.
patched):

HEAD:

tps = 3586.233815 (excluding connections establishing)
tps = 3569.252542 (excluding connections establishing)
tps = 3559.027733 (excluding connections establishing)

Patched:

tps = 3586.988057 (excluding connections establishing)
tps = 3585.169589 (excluding connections establishing)
tps = 3526.437968 (excluding connections establishing)

A bit noisy, but not much degradation.

Attached updated patch.  Thanks again.

Regards,
Amit
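The conditional setup could look roughly like the sketch below.  This is
a hypothetical shape under the same assumptions as before, not the
actual patch hunk; the function name is made up, planning and execution
are elided, and note that single-argument lnext() is the pre-v13 List
API.

#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/memutils.h"

/*
 * Sketch of the revised per-statement context management: create a
 * child context only when another statement follows; the last (or
 * only) statement plans directly in MessageContext, which is reset
 * right after the message anyway.  Deletion waits until the bottom of
 * the loop, i.e. after execution, so no plan-tree copy is needed.
 */
static void
run_statements(List *parsetree_list)
{
	ListCell   *parsetree_item;

	foreach(parsetree_item, parsetree_list)
	{
		MemoryContext per_parsetree_context = NULL;
		MemoryContext oldcontext;

		if (lnext(parsetree_item) != NULL)
		{
			per_parsetree_context =
				AllocSetContextCreate(MessageContext,
									  "per-parsetree message context",
									  ALLOCSET_DEFAULT_SIZES);
			oldcontext = MemoryContextSwitchTo(per_parsetree_context);
		}
		else
			oldcontext = MemoryContextSwitchTo(MessageContext);

		/* ... parse analysis, rewriting, planning, execution ... */

		MemoryContextSwitchTo(oldcontext);
		if (per_parsetree_context)
			MemoryContextDelete(per_parsetree_context);
	}
}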
Hi,

At Wed, 10 Jul 2019 16:35:18 +0900, Amit Langote
<amitlangote09@gmail.com> wrote in
<CA+HiwqFCO4c8tdQmXcDNzyaD43A81caapYLJ6CEh8H3P0tPL4A@mail.gmail.com>:
> On Tue, Jul 9, 2019 at 6:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > [...] The only reason you're doing it AFAICS is so you can release
> > the per-statement context a bit earlier, which is a completely
> > unnecessary optimization.  Just wait to release it till the bottom
> > of the loop.
>
> Ah, that makes sense.  I've removed the copying of the plan tree and
> also moved the temporary context deletion to the bottom of the loop.

-	 * Switch to appropriate context for constructing querytrees (again,
-	 * these must outlive the execution context).
+	 * Switch to appropriate context for constructing query and plan trees
+	 * (again, these must outlive the execution context).  Normally, it's
+	 * MessageContext, but if there are more queries to plan, we use a
+	 * temporary child context that will be reset after executing this
+	 * query.  We avoid that overhead of setting up a separate context
+	 * for the common case of having just a single query.

Might be stupid, but I feel uneasy about "usually it must live in
MessageContext, but not necessarily if there is a succeeding query"...
*I* need more explanation of why it is safe to use that short-lived
context.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Amit Langote <amitlangote09@gmail.com> writes:
> Attached updated patch.  Thanks again.

Pushed with a bit of further cleanup --- most notably, the way you had
execute_sql_string(), it was still leaking any cruft ProcessUtility
might generate.  We can fix that by running ProcessUtility in the
per-statement context too.

I also dropped the optimization for a single/last statement in
execute_sql_string(), and simplified it to just always create and delete
a child context.  This was based on a couple of thoughts.  The norm in
this code path is that there's multiple statements, probably lots of
them, so that the percentage savings from getting rid of one context
creation is likely negligible.  Also, unlike exec_simple_query, we
*don't* know that the outer context is due to be cleared right
afterwards.  Since execute_sql_string() can run multiple times in one
extension command, in principle we could get bloat from not cleaning up
after the last command of each string.  Admittedly, it's not likely that
you'd have so many strings involved that that amounts to a lot, but
between following upgrade-script chains and cascaded module loads, there
could be more than a couple.  So it seems like the argument for saving a
context creation is much weaker here than in exec_simple_query.

I tried to improve the comments too.  I noticed that the bit about
"(again, these must outlive the execution context)" seemed to be a
dangling reference --- whatever previous comment it was referring to is
not to be found anymore.  So I made that self-contained.

			regards, tom lane
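For the extension-script path, a sketch of the shape described above
might look like the following.  This is not the committed code; the
function name and context name are made up, and the actual
analyze/plan/execute calls are elided to comments.

#include "postgres.h"
#include "utils/memutils.h"

/*
 * Sketch: in the extension-script path, always create and delete a
 * per-statement child context, and run utility commands inside it as
 * well, so anything ProcessUtility allocates is also freed once the
 * statement is done.
 */
static void
run_script_statement(MemoryContext parent)
{
	MemoryContext stmt_cxt;
	MemoryContext oldcontext;

	stmt_cxt = AllocSetContextCreate(parent,
									 "script statement context",
									 ALLOCSET_DEFAULT_SIZES);
	oldcontext = MemoryContextSwitchTo(stmt_cxt);

	/*
	 * ... pg_analyze_and_rewrite, pg_plan_queries, and (for utility
	 * statements) ProcessUtility would all run here, in stmt_cxt ...
	 */

	MemoryContextSwitchTo(oldcontext);
	MemoryContextDelete(stmt_cxt);
}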
On Thu, Jul 11, 2019 at 3:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Langote <amitlangote09@gmail.com> writes:
> > Attached updated patch.  Thanks again.
>
> Pushed with a bit of further cleanup

Thanks a lot.

> --- most notably, the way you had execute_sql_string(), it was still
> leaking any cruft ProcessUtility might generate.  We can fix that by
> running ProcessUtility in the per-statement context too.

Ah, I was thinking only about planning.

> I also dropped the optimization for a single/last statement in
> execute_sql_string(), and simplified it to just always create and
> delete a child context.  This was based on a couple of thoughts.  The
> norm in this code path is that there's multiple statements, probably
> lots of them, so that the percentage savings from getting rid of one
> context creation is likely negligible.  Also, unlike
> exec_simple_query, we *don't* know that the outer context is due to be
> cleared right afterwards.  Since execute_sql_string() can run multiple
> times in one extension command, in principle we could get bloat from
> not cleaning up after the last command of each string.  Admittedly,
> it's not likely that you'd have so many strings involved that that
> amounts to a lot, but between following upgrade-script chains and
> cascaded module loads, there could be more than a couple.  So it seems
> like the argument for saving a context creation is much weaker here
> than in exec_simple_query.

Agreed.

> I tried to improve the comments too.  I noticed that the bit about
> "(again, these must outlive the execution context)" seemed to be a
> dangling reference --- whatever previous comment it was referring to
> is not to be found anymore.  So I made that self-contained.

Thanks.

Regards,
Amit
Horiguchi-san,

Thanks for the comment.  My reply is a bit late now, but...

On Wed, Jul 10, 2019 at 5:39 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> At Wed, 10 Jul 2019 16:35:18 +0900, Amit Langote
> <amitlangote09@gmail.com> wrote:
> - * Switch to appropriate context for constructing querytrees (again,
> - * these must outlive the execution context).
> + * Switch to appropriate context for constructing query and plan trees
> + * (again, these must outlive the execution context).  Normally, it's
> + * MessageContext, but if there are more queries to plan, we use a
> + * temporary child context that will be reset after executing this
> + * query.  We avoid that overhead of setting up a separate context
> + * for the common case of having just a single query.
>
> Might be stupid, but I feel uneasy about "usually it must live in
> MessageContext, but not necessarily if there is a succeeding query"...
> *I* need more explanation of why it is safe to use that short-lived
> context.

So the problem we're trying to solve here is that the memory consumed
when parsing/planning individual statements piles up in a single context
(currently, MessageContext), which can lead to severe bloat, especially
if the planning of individual statements consumes huge amounts of
memory.  The solution is to use a sub-context of MessageContext for each
statement, which is reset when its execution is finished.

I think it's safe to use a shorter-lived context for each statement
because the memory of a given statement should not need to be referenced
after its execution finishes.  Do you see any problem with that
assumption?

Thanks,
Amit