Thread: Adding column "mem_usage" to view pg_prepared_statements
Hello,
I just implemented a small change that adds another column "mem_usage" to the system view "pg_prepared_statements". It returns the total memory usage of CachedPlanSource.context, CachedPlanSource.query_content and, if available, CachedPlanSource.gplan.context.
Looks like this:
IKOffice_Daume=# prepare test as select * from vw_report_salesinvoice where salesinvoice_id = $1;
PREPARE
IKOffice_Daume=# select * from pg_prepared_statements;
name | statement | prepare_time | parameter_types | from_sql | mem_usage
------+----------------------------------------------------------------------------------+------------------------------+-----------------+----------+-----------
test | prepare test as select * from vw_report_salesinvoice where salesinvoice_id = $1; | 2019-07-27 20:21:12.63093+02 | {integer} | t | 33580232
(1 row)
I did this in preparation for reducing the memory usage of prepared statements and believe that this gives client applications an option to investigate which prepared statements should be dropped. It also makes it possible to directly examine the results of further changes and their effectiveness in reducing the memory load of prepared statements.
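The computation described above can be sketched with a self-contained model. The `Mini*` types below are illustrative stand-ins, not the real MemoryContext or CachedPlanSource definitions from plancache.h; only the byte accounting relevant to a mem_usage column is modeled:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature model of a memory context; real MemoryContext
 * nodes carry far more state than a byte count. */
typedef struct MiniContext {
    size_t total_bytes;          /* space allocated in this context */
    struct MiniContext *parent;
} MiniContext;

/* Model of a CachedPlanSource; field names loosely follow plancache.h. */
typedef struct {
    MiniContext context;         /* long-lived storage for the source */
    MiniContext *query_context;  /* analyzed-and-rewritten query trees */
    MiniContext *gplan_context;  /* generic plan, NULL until built */
} MiniPlanSource;

/* Sum the contexts the proposed mem_usage column would report. */
static size_t
mini_mem_usage(const MiniPlanSource *ps)
{
    size_t total = ps->context.total_bytes;
    if (ps->query_context)
        total += ps->query_context->total_bytes;
    if (ps->gplan_context)       /* only present once a generic plan exists */
        total += ps->gplan_context->total_bytes;
    return total;
}
```

Note that in the real backend some of these contexts may be children of one another, in which case a flat sum like this would double count; that caveat comes up later in the thread.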
Is a patch welcome or is this feature not of interest?
Also I wonder why the "prepare test as" is part of the statement column. It isn't even part of the real statement that is prepared, as far as I would assume. I would prefer to have just the "select *…" in that column.
Kind regards,
Daniel Migowski
Hi,

On 2019-07-27 18:29:23 +0000, Daniel Migowski wrote:
> I just implemented a small change that adds another column "mem_usage"
> to the system view "pg_prepared_statements". It returns the memory
> usage total of CachedPlanSource.context,
> CachedPlanSource.query_content and if available
> CachedPlanSource.gplan.context.

FWIW, it's generally easier to comment if you actually provide the patch, even if it's just a POC, as that gives a better handle on how much additional complexity it introduces.

I think this could be a useful feature. I'm not so sure we want it tied to just cached statements however - perhaps we ought to generalize it a bit more.

Regarding the prepared-statement-specific considerations: I don't think we ought to explicitly reference CachedPlanSource.query_content and CachedPlanSource.gplan.context. In the case of actual prepared statements (rather than oneshot plans) CachedPlanSource.query_context IIRC should live under CachedPlanSource.context. I think there are no relevant cases where gplan.context isn't a child of CachedPlanSource.context either, but I'm not quite sure.

Then we ought to just include child contexts in the memory computation (cf. the logic in MemoryContextStatsInternal(), although you obviously wouldn't need all of that). That way, if the cached statement has child contexts, we're going to stay accurate.

> Also I wonder why the "prepare test as" is part of the statement
> column. It isn't even part of the real statement that is prepared as
> far as I would assume. Would prefer to just have the "select *..." in
> that column.

It's the statement that was executed. Note that you'll not see that in the case of protocol-level prepared statements. It will sometimes include relevant information, e.g. about the types specified as part of the prepare (as in PREPARE foo(int, float, ...) AS ...).

Greetings,

Andres Freund
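Including child contexts, as suggested here, amounts to a depth-first sum over the context tree, in the style of the walk in MemoryContextStatsInternal(). A self-contained sketch with a stand-in type (not the actual MemoryContextData definition, which links children the same way via firstchild/nextchild):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a memory context node. */
typedef struct Ctx {
    size_t self_bytes;       /* bytes allocated in this context alone */
    struct Ctx *firstchild;
    struct Ctx *nextchild;   /* next sibling */
} Ctx;

/* Total bytes of a context including all (recursive) children, the way
 * a child-aware mem_usage computation would count them. */
static size_t
ctx_total_bytes(const Ctx *c)
{
    size_t total = c->self_bytes;
    for (const Ctx *child = c->firstchild; child != NULL; child = child->nextchild)
        total += ctx_total_bytes(child);   /* depth-first over the tree */
    return total;
}
```

With this shape, contexts that are already children of CachedPlanSource.context are counted exactly once, avoiding the double counting a flat sum could produce.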
Hello Andres,

how do you want to generalize it? Are you thinking about a view solely for the display of the memory usage of different objects? Like functions or views (which also have a plan associated with them, when I think about it)? While that would be interesting, I still believe monitoring the mem usage of prepared statements is a bit more important than that of other objects, because of how they change the memory consumption of the server without using any DDL or configuration options, and I am not aware of other objects with the same properties, or are there some? And for the other volatile objects like tables and indexes and their contents PostgreSQL already has its information functions.

Regardless of that, here is the patch for now. I didn't want to fiddle too much with MemoryContexts yet, so it still doesn't recurse into child contexts, but I will change that when I try to build a more compact MemoryContext implementation and see how that works out.

Thanks for pointing out the relevant information in the statement column of the view.

Regards,
Daniel Migowski
Hello,

Will my patch be considered for 12.0? The calculation of the mem_usage value might be improved later on, but because the system catalog is changed I would love to add it before 12.0 becomes stable.

Regards,
Daniel Migowski
Daniel Migowski <dmigowski@ikoffice.de> writes:
> Will my patch be considered for 12.0? The calculation of the mem_usage
> value might be improved later on but because the system catalog is
> changed I would love to add it before 12.0 becomes stable.

v12 has been feature-frozen for months, and it's pretty hard to paint this as a bug fix, so I'd say the answer is "no".

regards, tom lane
Ok, I have just read about the commitfest thing. Is the patch OK for that? Or is there generally no love for a mem_usage column here? If there is, I really would add some memory monitoring to our app regarding this.
On Tue, Jul 30, 2019 at 10:01:09PM +0000, Daniel Migowski wrote:
> Hello,
>
> Will my patch be considered for 12.0? The calculation of the mem_usage
> value might be improved later on but because the system catalog is
> changed I would love to add it before 12.0 becomes stable.

Nope. Code freeze for PG12 was April 7, 2019. You're a bit late for that, unfortunately. We're in PG13 land now.

FWIW not sure what mail client you're using, but it seems to be breaking the threads for some reason, splitting it into two - see [1].

Also, please stop top-posting. It makes it way harder to follow the discussion.

[1] https://www.postgresql.org/message-id/flat/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4%40EXCHANGESERVER.ikoffice.de

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Daniel Migowski <dmigowski@ikoffice.de> writes:
> Ok, just have read about the commitfest thing. Is the patch OK for
> that? Or is there generally no love for a mem_usage column here? If it
> was, I really would add some memory monitoring in our app regarding
> this.

You should certainly put it into the next commitfest. We might or might not accept it, but if it's not listed in the CF we might not remember to even review it. (The CF app is really a to-do list for patches ...)

regards, tom lane
Am 31.07.2019 um 00:17 schrieb Tomas Vondra:
> FWIW not sure what mail client you're using, but it seems to be breaking
> the threads for some reason, splitting it into two - see [1].
>
> Also, please stop top-posting. It makes it way harder to follow the
> discussion.

I was using Outlook because it's my company's mail client, but I am switching to Thunderbird now for communication with the list, trying to be a good citizen.

Regards,
Daniel Migowski
Am 31.07.2019 um 00:29 schrieb Tom Lane:
> Daniel Migowski <dmigowski@ikoffice.de> writes:
>> Ok, just have read about the commitfest thing. Is the patch OK for
>> that? Or is there generally no love for a mem_usage column here?
> You should certainly put it into the next commitfest. We might or
> might not accept it, but if it's not listed in the CF we might
> not remember to even review it. (The CF app is really a to-do
> list for patches ...)

OK, added it there. Thanks for your patience :).

Regards,
Daniel Migowski
On Tue, Jul 30, 2019 at 10:01:09PM +0000, Daniel Migowski wrote:
> Hello,
>
> Will my patch be considered for 12.0? The calculation of the
> mem_usage value might be improved later on but because the system
> catalog is changed I would love to add it before 12.0 becomes
> stable.

Feature freeze was April 8, so no.
https://www.mail-archive.com/pgsql-hackers@lists.postgresql.org/msg37059.html

You're in plenty of time for 13, though!

Best,
David.

--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 31.07.2019 1:38, Daniel Migowski wrote:
> Am 31.07.2019 um 00:29 schrieb Tom Lane:
>> You should certainly put it into the next commitfest. We might or
>> might not accept it, but if it's not listed in the CF we might
>> not remember to even review it. (The CF app is really a to-do
>> list for patches ...)
>
> OK, added it there. Thanks for your patience :).

The patch does not apply to the most recent sources because an extra parameter was added to the CreateTemplateTupleDesc method. Please rebase it - the fix is trivial.

I think that including in pg_prepared_statements information about the memory used by a statement is very useful. A CachedPlanMemoryUsage function may be useful not only for this view; for example, it is also needed in my autoprepare patch.

I wonder if you would consider going further and not only reporting but also controlling the memory used by prepared statements? For example, implementing some LRU replacement discipline on top of the prepared statements cache which can evict rarely used prepared statements to avoid memory overflow. We have such a patch for PgPro-EE, but it limits only the number of prepared statements, not taking into account the amount of memory used by them. I think that a memory-based limit will be more accurate (although it adds more overhead).

If you want, I can be a reviewer of your patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
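The memory-based LRU discipline suggested here could be sketched as follows. This is a plain-C illustration with made-up types, not the PgPro-EE patch or any PostgreSQL API: a doubly linked LRU list is trimmed from the tail until a new statement fits under the memory limit.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a cached prepared statement. */
typedef struct Stmt {
    size_t mem_usage;          /* bytes attributed to this statement */
    struct Stmt *prev, *next;  /* doubly linked LRU list */
} Stmt;

typedef struct {
    Stmt *head;                /* most recently used */
    Stmt *tail;                /* least recently used, evicted first */
    size_t total_mem;
    size_t mem_limit;
} StmtCache;

/* Unlink a statement from the LRU list. */
static void
cache_unlink(StmtCache *c, Stmt *s)
{
    if (s->prev) s->prev->next = s->next; else c->head = s->next;
    if (s->next) s->next->prev = s->prev; else c->tail = s->prev;
    s->prev = s->next = NULL;
}

/* Evict least recently used statements until the new one fits, then
 * insert it at the head.  Returns the number of evictions (actually
 * freeing the evicted plans is left out of the model). */
static int
cache_insert(StmtCache *c, Stmt *s)
{
    int evicted = 0;
    while (c->tail && c->total_mem + s->mem_usage > c->mem_limit)
    {
        Stmt *victim = c->tail;
        cache_unlink(c, victim);
        c->total_mem -= victim->mem_usage;
        evicted++;
    }
    s->next = c->head;
    s->prev = NULL;
    if (c->head) c->head->prev = s; else c->tail = s;
    c->head = s;
    c->total_mem += s->mem_usage;
    return evicted;
}
```

A count-based limit is the same loop with a counter in place of total_mem; the memory-based variant needs an accurate per-statement mem_usage, which is exactly what the proposed column computes.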
Hi,

On 2019-07-28 06:20:40 +0000, Daniel Migowski wrote:
> how do you want to generalize it? Are you thinking about a view solely
> for the display of the memory usage of different objects?

I'm not quite sure. I'm just not sure that adding separate infrastructure for various objects is a sustainable approach. We'd likely want to have this for prepared statements, for cursors, for the current statement, for various caches, ...

I think an approach would be to add an 'owning_object' field to memory contexts, which has to point to a Node type if set. A table-returning reporting function could recursively walk through the memory contexts, starting at TopMemoryContext. Whenever it encounters a context with owning_object set, it prints a string version of nodeTag(owning_object). For some node types it knows about (e.g. PreparedStatement, Portal, perhaps some of the caches), it prints additional metadata specific to the type (so for a prepared statement it'd be something like 'prepared statement', '[name of prepared statement]'), and prints information about the associated context and all its children.

The general context information probably should be something like:
context_name, context_ident,
context_total_bytes, context_total_blocks, context_total_freespace, context_total_freechunks, context_total_used, context_total_children,
context_self_bytes, context_self_blocks, context_self_freespace, context_self_freechunks, context_self_used, context_self_children

It might make sense to have said function return a row for the contexts it encounters that do not have an owner set too (that way we'd e.g. get CacheMemoryContext handled), but then still recurse.

Arguably the proposed owning_object field would be a bit redundant with the already existing ident/MemoryContextSetIdentifier field, which e.g. already associates the query string with the contexts used for a prepared statement. But I'm not convinced that's going to be enough context in a lot of cases, because e.g. for prepared statements it could be interesting to have access to both the prepared statement name and the statement.

The reason I like something like this is that we wouldn't add new columns to a number of views, and we lack views to associate such information with for some objects. And it'd be disproportional to add all the information to numerous places anyway.

One counter-argument is that it'd be more expensive to get information specific to prepared statements (or other object types) that way. I'm not sure I buy that that's a problem - this isn't something that's likely going to be used at a high frequency. But if it becomes a problem, we can add a function that starts that process at a distinct memory context (e.g. a function that does this just for a single prepared statement, identified by name) - but I'd not start there.

> While being interesting I still believe monitoring the mem usage of
> prepared statements is a bit more important than that of other objects
> because of how they change memory consumption of the server without
> using any DDL or configuration options and I am not aware of other
> objects with the same properties, or are there some? And for the other
> volatile objects like tables and indexes and their contents PostgreSQL
> already has its information functions.

Plenty of other objects have that property. E.g. cursors. And for the catalog/relation/... caches it's even more pernicious - the client might have closed all its "handles", but we still use memory (and it's absolutely crucial for performance).

Greetings,

Andres Freund
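The walk Andres describes could look roughly like this self-contained model. The Ctx and Row types are stand-ins for MemoryContextData (with the hypothetical owning_object field reduced to a tag string) and the reporting function's result tuples; none of this is actual PostgreSQL code:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for MemoryContextData with the proposed owner field. */
typedef struct Ctx {
    const char *name;
    const char *owner_tag;   /* NULL unless owning_object is set; would be
                              * nodeTag(owning_object) in the real thing */
    size_t self_bytes;
    struct Ctx *firstchild;
    struct Ctx *nextchild;
} Ctx;

/* One result row of the hypothetical reporting function. */
typedef struct {
    const char *owner_tag;   /* may be NULL for unowned contexts */
    const char *context_name;
    size_t self_bytes;
    size_t total_bytes;      /* self plus all children */
} Row;

/* Depth-first walk starting at the TopMemoryContext stand-in: emit one
 * row per context (owned or not), still recursing below owned contexts.
 * Returns the subtree total; rows are appended via out/nrows. */
static size_t
report_walk(const Ctx *c, Row *out, int *nrows)
{
    Row *row = &out[(*nrows)++];  /* reserve the row, fill total later */
    size_t total = c->self_bytes;

    row->owner_tag = c->owner_tag;
    row->context_name = c->name;
    row->self_bytes = c->self_bytes;
    for (const Ctx *child = c->firstchild; child; child = child->nextchild)
        total += report_walk(child, out, nrows);
    row->total_bytes = total;
    return total;
}
```

This is where the context_self_* versus context_total_* split in the proposed column list comes from: self_bytes is what the row's own context holds, total_bytes folds in every descendant.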
Am 05.08.2019 um 18:30 schrieb Konstantin Knizhnik:
> The patch is not applied to the most recent source because extra
> parameter was added to CreateTemplateTupleDesc method.
> Please rebase it - the fix is trivial.

OK, will do.

> I think that including in pg_prepared_statements information about
> memory used this statement is very useful.
> CachedPlanMemoryUsage function may be useful not only for this view,
> but for example it is also need in my autoprepare patch.

I would love to use your work once it's done, and would also love to work together here. I am quite a novice in C though; it might take me some time to get things right.

> I wonder if you consider go further and not only report but control
> memory used by prepared statements?
> For example implement some LRU replacement discipline on top of
> prepared statements cache which can evict rarely used prepared
> statements to avoid memory overflow.

THIS! Having some kind of safety net here would finally make sure that my precious processes will not grow endlessly until all memory is eaten up, even with prepared statement count limits.

While working on this I noticed there are three things stored in a CachedPlanSource: the raw query tree (a relatively small thing), the query list (the analyzed-and-rewritten query tree), which takes up the most memory (at least here, maybe different with your use cases), and (often after the 6th call) the CachedPlan, which is more optimized than the query list and often needs less memory (half of the query list here).

The query list seems to take the most time to create here, because I hit the GEQO engine (up to 500ms for some queries), but it could be recreated easily. Creating the CachedPlan afterwards takes 60ms in some use cases. If we could just invalidate these from time to time, that would be great. Also, invalidating just the query list or the CachedPlan would not invalidate the whole prepared statement, which would break clients' expectations, but just make them a bit slower, adding much to the stability of the system. I would pay that price, because I don't use manually named prepared statements anyway and just autogenerate them as performance sugar without thinking about what really needs to be prepared. There is an option in the JDBC driver to use prepared statements automatically after you have used them a few times.

> We have such patch for PgPro-EE but it limits only number of prepared
> statement, not taken in account amount of memory used by them.
> I think that memory based limit will be more accurate (although it
> adds more overhead).

Limiting them by number is already done automatically here and would really not be of much value, but having a memory limit would be great. We could have a combined memory limit for your autoprepared statements as well as the manually prepared ones, so clients can know for sure the server processes won't eat up more than e.g. 800MB for prepared statements. And I would also like to have this value spread across all client processes, e.g. specifying max_prepared_statement_total_mem=5G for the server, and maybe max_prepared_statement_mem=1G for client processes. Of course we would have to implement cross-client-process invalidation here, and I don't know if communicating client processes are even intended.

Anyway, a memory limit won't really add that much more overhead. At least not more than having no prepared statements at all out of fear of server OOMs, or having just a small count of those statements. I was even thinking about a prepared statement reaper that checks pg_prepared_statements every few minutes to clean things up manually, but having this in the server would be of great value to me.

> If you want, I can be reviewer of your patch.

I'd love to have you as my reviewer.

Regards,
Daniel Migowski
Am 05.08.2019 um 19:16 schrieb Andres Freund:
> On 2019-07-28 06:20:40 +0000, Daniel Migowski wrote:
>> how do you want to generalize it? Are you thinking about a view solely
>> for the display of the memory usage of different objects?
> I'm not quite sure. I'm just not sure that adding separate
> infrastructure for various objects is a sustainable approach. We'd likely
> want to have this for prepared statements, for cursors, for the current
> statement, for various caches, ...
>
> I think an approach would be to add an 'owning_object' field to memory
> contexts, which has to point to a Node type if set. A table-returning
> reporting function could recursively walk through the memory contexts,
> starting at TopMemoryContext. Whenever it encounters a context with
> owning_object set, it prints a string version of nodeTag(owning_object).
> For some node types it knows about (e.g. PreparedStatement, Portal,
> perhaps some of the caches), it prints additional metadata specific to
> the type (so for prepared statement it'd be something like 'prepared
> statement', '[name of prepared statement]'), and prints information
> about the associated context and all its children.

I understand. So it would be something like the output of MemoryContextStatsInternal, but in table form with some extra columns. I would have loved this extra information already in MemoryContextStatsInternal btw., so it might be a good idea to upgrade it first to collect the information and wrap a table function over it afterwards.

> The general context information probably should be something like:
> context_name, context_ident,
> context_total_bytes, context_total_blocks, context_total_freespace,
> context_total_freechunks, context_total_used, context_total_children,
> context_self_bytes, context_self_blocks, context_self_freespace,
> context_self_freechunks, context_self_used, context_self_children
>
> It might make sense to have said function return a row for the contexts
> it encounters that do not have an owner set too (that way we'd e.g. get
> CacheMemoryContext handled), but then still recurse.

A nice way to learn about the internals of the server and to analyze the effects of memory-reducing enhancements.

> Arguably the proposed owning_object field would be a bit redundant with
> the already existing ident/MemoryContextSetIdentifier field, which
> e.g. already associates the query string with the contexts used for a
> prepared statement. But I'm not convinced that's going to be enough
> context in a lot of cases, because e.g. for prepared statements it could
> be interesting to have access to both the prepared statement name, and
> the statement.

The identifier seems to be more like a category at the moment, because it does not seem to hold any relevant information about the object in question. So a more specific name would be nice.

> The reason I like something like this is that we wouldn't add new
> columns to a number of views, and lack views to associate such
> information to for some objects. And it'd be disproportional to add all
> the information to numerous places anyway.

I understand your argumentation, but things like cursors and portals are rather short-lived, while prepared statements seem to be the place where memory really builds up.

> One counter-argument is that it'd be more expensive to get information
> specific to prepared statements (or other object types) that way. I'm
> not sure I buy that that's a problem - this isn't something that's
> likely going to be used at a high frequency. But if it becomes a
> problem, we can add a function that starts that process at a distinct
> memory context (e.g. a function that does this just for a single
> prepared statement, identified by name) - but I'd not start there.

I also see no problem here, and with Konstantin Knizhnik's autoprepare I wouldn't use this very often anyway, more just for monitoring purposes, where I don't care if my query is a bit more complex.

> Plenty other objects have that property. E.g. cursors. And for the
> catalog/relation/... caches it's even more pernicious - the client might
> have closed all its "handles", but we still use memory (and it's
> absolutely crucial for performance).

Maybe we can do both? Add a single column to pg_prepared_statements, and add another table for the output of MemoryContextStatsDetail? This has the advantage that the single real memory indicator useful for end users (answering: how much memory does my stuff take up?) is in pg_prepared_statements, with the more intrinsic information in a detail view.

Thinking about the latter, I am against such a table, at least in a form that gives information like context_total_freechunks, because it would just be useful for us developers. Why should any end user care how many chunks are still free in a MemoryContext, except when he is working on C-style extensions? It could just be a source of confusion for them.

Let's think about the goal this should have: the end user should be able to monitor the memory consumption of things he is in control of or that could affect system performance. Should such a table automatically aggregate some information? I think so. I would not add more than two memory columns to the view, just mem_used and mem_reserved. And even mem_used is questionable, because in his eyes only the memory he cannot use for other stuff because of object x is important to him (that was the reason I added just one column). He would even ask: WHY is there 50% more memory reserved than used, and how can I optimize it? (Which might lead to more curious PostgreSQL developers, so that's maybe a plus.)

Something that also clearly speaks FOR such a table and against my proposal is that if someone cares about memory, he most likely cares about ALL his memory, and in that case monitoring prepared statements would just be a small subset of the stuff to monitor.

Ok, I am defeated and will rewrite my patch if the next proposal finds approval: I would propose a table pg_mem_usage containing the columns object_class, name, detail, mem_usage (rename them if it fits the style of the other tables better). The name would be empty for some objects like the unnamed prepared statement; the query strings would go in the detail column. One could add a final "Other" row containing the memory no specific output line has accounted for. It could also contain lines for cursors and other stuff I am too much of a novice to think of here.

And last: a reason why we still need a child-parent relationship in this table (and distinct this_ and total_ mem functions) is that prepared statements start to use much more memory once the generic plan is stored in them after a few uses. As a user I always had the assumption that preparing a statement would already do all the required work to be fast, but a statement only becomes blazingly fast when the generic plan is available (and used), and it would be nice to see for which statements that plan has already been generated to consume its memory. I believe the reason for the delay is the fear of excessive memory usage. On the other hand: the generic plan had to be created for the first invocation of the prepared statement anyway, so why not store it immediately? It is a named statement for a reason, intended to be reused, even if just twice, and since memory does not seem to be treated as a scarce resource in this context, why not store it right away? That would also drop the need for a hierarchy here.

Any comments?

Regards,
Daniel Migowski
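The mem_used/mem_reserved split proposed above maps onto numbers an allocator already tracks per context; a minimal model, with illustrative names rather than PostgreSQL's actual API:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model: mem_reserved is everything the context has
 * obtained in blocks; mem_used subtracts the free space still sitting
 * unused inside those blocks. */
typedef struct {
    size_t total_bytes;   /* all blocks allocated for the context */
    size_t free_bytes;    /* unused space within those blocks */
} CtxStats;

static size_t
mem_reserved(const CtxStats *s)
{
    return s->total_bytes;
}

static size_t
mem_used(const CtxStats *s)
{
    return s->total_bytes - s->free_bytes;
}
```

The "50% more reserved than used" situation in the text is then simply free_bytes being half of mem_used: space the allocator keeps for future allocations in the same context rather than returning it to the OS.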
Hi,

On 2019-08-05 22:46:47 +0200, Daniel Migowski wrote:
> > Arguably the proposed owning_object field would be a bit redundant with
> > the already existing ident/MemoryContextSetIdentifier field, which
> > e.g. already associates the query string with the contexts used for a
> > prepared statement. But I'm not convinced that's going to be enough
> > context in a lot of cases, because e.g. for prepared statements it could
> > be interesting to have access to both the prepared statement name, and
> > the statement.

> The identifier seems to be more like a category at the moment, because it
> does not seem to hold any relevant information about the object in
> question. So a more specific name would be nice.

I think you might be thinking of the context's name, not ident? E.g. for
prepared statements the context name is:

    source_context = AllocSetContextCreate(CurrentMemoryContext,
                                           "CachedPlanSource",
                                           ALLOCSET_START_SMALL_SIZES);

which is obviously the same for every statement. But then there's

    MemoryContextSetIdentifier(source_context, plansource->query_string);

which obviously differs.

> > The reason I like something like this is that we wouldn't add new
> > columns to a number of views, and lack views to associate such
> > information to for some objects. And it'd be disproportional to add all
> > the information to numerous places anyway.

> I understand your argumentation, but things like cursors and portals are
> rather short-lived, while prepared statements seem to be the place where
> memory really builds up.

That's not necessarily true, especially given WITH HOLD cursors. Nor does
one only run out of memory in the context of long-lived objects.
> > > While being interesting, I still believe monitoring the mem usage of
> > > prepared statements is a bit more important than that of other
> > > objects, because of how they change the memory consumption of the
> > > server without using any DDL or configuration options, and I am not
> > > aware of other objects with the same properties, or are there some?
> > > And for the other volatile objects like tables and indexes and their
> > > contents PostgreSQL already has its information functions.

> > Plenty other objects have that property. E.g. cursors. And for the
> > catalog/relation/... caches it's even more pernicious - the client might
> > have closed all its "handles", but we still use memory (and it's
> > absolutely crucial for performance).

> Maybe we can do both? Add a single column to pg_prepared_statements, and
> add another table for the output of MemoryContextStatsDetail? This has
> the advantage that the single real memory indicator useful for end users
> (to the question: How much mem takes my sh*t up?) is in
> pg_prepared_statements, with some more intrinsic information in a detail
> view.

I don't see why we'd want to do both. It just makes pg_prepared_statements
considerably more expensive. And that's used by some applications / clients
in an automated manner.

> Thinking about the latter, I am against such a table, at least in the
> form where it gives information like context_total_freechunks, because it
> would just be useful for us developers.

Developers are also an audience for us. I mean we certainly can use this
information during development. But even for bug reports such information
would be useful.

> Why should any end user care how many chunks are still open in a
> MemoryContext, except when he is working on C-style extensions? It could
> just be a source of confusion for them.

Meh. As long as the crucial stuff is first, that's imo enough.
> Let's think about the goal this should have: the end user should be able
> to monitor the memory consumption of things he is in control of or that
> could affect system performance. Should such a table automatically
> aggregate some information? I think so. I would not add more than two
> memory columns to the view, just mem_used and mem_reserved. And even
> mem_used is questionable, because in his eyes only the memory he cannot
> use for other things because of object x is important to him (that was
> the reason I just added one column). He would even ask: WHY is there 50%
> more memory reserved than used, and how can I optimize it? (Might lead to
> more curious PostgreSQL developers maybe, so that's maybe a plus).

It's important because it influences how memory usage will grow.

> On the other hand: the generic plan had already been created for the
> first invocation of the prepared statement, so why not store it
> immediately? It is a named statement for a reason, namely that it is
> intended to be reused, even if only twice, and since memory does not seem
> to be treated as a scarce resource in this context, why not store it
> right away. That would also drop the need for a hierarchy here.

Well, we'll maybe never use it, so ...

Greetings,

Andres Freund
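The name/ident split Andres describes above suggests how a pg_mem_usage-style
view could aggregate per object class: every CachedPlanSource context shares
the fixed name "CachedPlanSource", while the identifier carries the
per-statement query string. A standalone sketch of that aggregation (the Ctx
struct and helper are hypothetical models, not PostgreSQL's actual API):

```c
#include <stddef.h>
#include <string.h>

/* Toy model of the name/ident split in memory contexts: the name is fixed
 * per allocator type, the ident is per instance (e.g. the query string).
 * Struct and helper are illustrative, not PostgreSQL's API. */
typedef struct Ctx {
    const char *name;   /* e.g. "CachedPlanSource", same for every statement */
    const char *ident;  /* per instance, e.g. the prepared query string */
    size_t size;        /* bytes held by this context alone */
    struct Ctx *firstchild;
    struct Ctx *nextchild;
} Ctx;

/* Depth-first total of every context carrying the given name, the kind of
 * per-object-class aggregation a pg_mem_usage-style view could expose. */
static size_t
ctx_total_by_name(const Ctx *ctx, const char *name)
{
    size_t total = (strcmp(ctx->name, name) == 0) ? ctx->size : 0;
    for (const Ctx *c = ctx->firstchild; c != NULL; c = c->nextchild)
        total += ctx_total_by_name(c, name);
    return total;
}
```

The per-instance ident would then populate the proposed detail column, while
the shared name maps naturally onto object_class.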
On 05.08.2019 22:35, Daniel Migowski wrote:
>> I think that including in pg_prepared_statements information about the
>> memory used by a statement is very useful. The CachedPlanMemoryUsage
>> function may be useful not only for this view; it is, for example, also
>> needed in my autoprepare patch.

> I would love to use your work once it's done, and would also love to
> work together here. I am quite novice in C though, I might take my time
> to get things right.

Right now I reused your implementation of the CachedPlanMemoryUsage
function :) Before, I took into account only the memory used by
plan->context, but not that of plan->query_context and
plan->gplan->context (although the query_context for the raw parse tree
seems to be much smaller).

>> I wonder if you would consider going further and not only reporting but
>> also controlling the memory used by prepared statements? For example,
>> implementing some LRU replacement discipline on top of the prepared
>> statement cache which can evict rarely used prepared statements to
>> avoid memory overflow.

> THIS! Having some kind of safety net here would finally make sure that
> my precious processes will not grow endlessly until all mem is eaten up,
> even with prep statement count limits.
>
> While working on this I noticed there are three things stored in a
> CachedPlanSource: the raw query tree (a relatively small thing), the
> query list (the analyzed-and-rewritten query tree), which takes up the
> most memory (at least here, maybe different with your use cases), and
> (often after the 6th call) the CachedPlan, which is more optimized than
> the query list and often needs less memory (half of the query list
> here).
>
> The query list seems to take the most time to create here, because I hit
> the GEQO engine, but it could be recreated easily (up to 500ms for some
> queries). Creating the CachedPlan afterwards takes 60ms in some use
> cases. IF we could just invalidate them from time to time, that would be
> great.
> Also, invalidating just the queries or the CachedPlan would not
> invalidate the whole prepared statement, which would break clients'
> expectations, but just make them a bit slower, adding much to the
> stability of the system. I would pay that price, because I don't use
> manually named prepared statements anyway and just autogenerate them as
> performance sugar without thinking about what really needs to be
> prepared. There is an option in the JDBC driver to use prepared
> statements automatically after you have used them a few times.

I have noticed that cached plans for implicitly prepared statements in
stored procedures are not shown in the pg_prepared_statements view. It may
not be a problem in your case (if you are accessing Postgres through JDBC
and not using prepared statements), but it can cause memory overflow in
applications actively using stored procedures, because unlike explicitly
created prepared statements, it is very difficult to estimate and control
the statements implicitly prepared by plpgsql. I am not sure what the best
solution would be in this case. Adding yet another view for implicitly
prepared statements? Or including them in the pg_prepared_statements view?

>> We have such a patch for PgPro-EE, but it limits only the number of
>> prepared statements, not taking into account the amount of memory used
>> by them. I think that a memory-based limit will be more accurate
>> (although it adds more overhead).

> Limiting them by number is already done automatically here and would
> really not be of much value, but having a mem limit would be great. We
> could have a combined memory limit for your autoprepared statements as
> well as the manually prepared ones, so clients can know for sure the
> server processes won't eat up more than e.g. 800MB for prepared
> statements. And also I would like to have this value spread across all
> client processes, e.g.
> specifying max_prepared_statement_total_mem=5G for the server, and maybe
> max_prepared_statement_mem=1G for client processes. Of course we would
> have to implement cross-client-process invalidation here, and I don't
> know if communicating client processes are even intended.
>
> Anyway, a memory limit won't really add that much more overhead. At
> least not more than having no prepared statements at all because of the
> fear of server OOMs, or having just a small count of those statements. I
> was even thinking about a prepared statement reaper that checks
> pg_prepared_statements every few minutes to clean things up manually,
> but having this in the server would be of great value to me.

Right now a memory context has no field containing the amount of currently
used memory. This is why the context->methods->stats implementation has to
traverse all blocks to calculate the size of the memory used by the
context. That may not be so fast for large contexts. But I do not expect
the contexts of prepared statements to be very large, although I have dealt
with customers who issued queries with a query text length larger than a
few megabytes. I am afraid to estimate the size of the plans for such
queries. This is the reason for my concern that calculating memory context
size may have a negative effect on performance. But it has to be done only
once, when the statement is prepared, so maybe it is not a problem at all.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
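The memory-based limit discussed above could work roughly like this: once
the summed mem_usage of all cached statements exceeds a budget, drop the
least recently used one until the total fits. A standalone sketch under
assumed names (Stmt, enforce_budget and the fields are hypothetical models,
not a PostgreSQL API):

```c
#include <stddef.h>

/* Toy model of a prepared-statement cache entry; mem_usage stands in for
 * what a CachedPlanMemoryUsage-style function would report. */
typedef struct Stmt {
    const char *name;
    size_t mem_usage;   /* bytes attributed to this statement's contexts */
    unsigned last_used; /* logical clock tick of the last execution */
    int alive;          /* still cached? */
} Stmt;

/* Evict least-recently-used statements until the total fits the budget;
 * returns the total memory still cached. */
static size_t
enforce_budget(Stmt *stmts, int n, size_t budget)
{
    for (;;) {
        size_t total = 0;
        int victim = -1;

        for (int i = 0; i < n; i++) {
            if (!stmts[i].alive)
                continue;
            total += stmts[i].mem_usage;
            if (victim < 0 || stmts[i].last_used < stmts[victim].last_used)
                victim = i;
        }
        if (total <= budget || victim < 0)
            return total;
        stmts[victim].alive = 0; /* drop the coldest statement */
    }
}
```

A real implementation would also have to decide whether eviction means a
full DEALLOCATE or, as Daniel suggests, just dropping the plans while
keeping the statement name valid for clients.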
Hello Daniel,

This patch no longer applies. Please submit an updated version. Also,
there's voluminous discussion that doesn't seem to have resulted in any
revision of the code. Please fix that too.

Thanks,

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services