Thread: Adding column "mem_usage" to view pg_prepared_statements

Adding column "mem_usage" to view pg_prepared_statements

From: Daniel Migowski

Hello,

 

I just implemented a small change that adds another column “mem_usage” to the system view “pg_prepared_statements”. It returns the total memory usage of CachedPlanSource.context, CachedPlanSource.query_content and, if available, CachedPlanSource.gplan.context.

 

Looks like this:

 

IKOffice_Daume=# prepare test as select * from vw_report_salesinvoice where salesinvoice_id = $1;
PREPARE
IKOffice_Daume=# select * from pg_prepared_statements;
 name |                                    statement                                     |         prepare_time         | parameter_types | from_sql | mem_usage
------+----------------------------------------------------------------------------------+------------------------------+-----------------+----------+-----------
 test | prepare test as select * from vw_report_salesinvoice where salesinvoice_id = $1; | 2019-07-27 20:21:12.63093+02 | {integer}       | t        |  33580232
(1 row)

 

I did this in preparation for reducing the memory usage of prepared statements and believe that this gives client applications an option to investigate which prepared statements should be dropped. It also makes it possible to directly examine the results of further changes and their effectiveness in reducing the memory load of prepared statements.

 

Is a patch welcome or is this feature not of interest?

 

Also I wonder why the “prepare test as” is part of the statement column. It isn't even part of the real statement that is prepared, as far as I can tell. I would prefer to have just the “select *…” in that column.

 

Kind regards,

Daniel Migowski

 

Re: Adding column "mem_usage" to view pg_prepared_statements

From: Andres Freund
Hi,

On 2019-07-27 18:29:23 +0000, Daniel Migowski wrote:
> I just implemented a small change that adds another column "mem_usage"
> to the system view "pg_prepared_statements". It returns the memory
> usage total of CachedPlanSource.context,
> CachedPlanSource.query_content and if available
> CachedPlanSource.gplan.context.

FWIW, it's generally easier to comment if you actually provide the
patch, even if it's just POC, as that gives a better handle on how much
additional complexity it introduces.

I think this could be a useful feature. I'm not so sure we want it tied
to just cached statements however - perhaps we ought to generalize it a
bit more.


Regarding the prepared statements specific considerations: I don't think
we ought to explicitly reference CachedPlanSource.query_content, and
CachedPlanSource.gplan.context.

In the case of actual prepared statements (rather than oneshot plans)
CachedPlanSource.query_context IIRC should live under
CachedPlanSource.context.  I think there are no relevant cases where
gplan.context isn't a child of CachedPlanSource.context either, but I'm
not quite sure.

Then we ought to just include child contexts in the memory computation
(cf. the logic in MemoryContextStatsInternal(), although you obviously
wouldn't need all of that). That way, if the cached statement has child
contexts, we're going to stay accurate.
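
The recursion being suggested can be sketched like this. This is a toy stand-in (all names here are invented, not PostgreSQL's actual MemoryContext), but the first-child/next-sibling walk is the same shape MemoryContextStatsInternal() uses:

```c
#include <stddef.h>

/* Toy stand-in for a memory context: just a byte count plus the usual
 * first-child/next-sibling links.  The real MemoryContextData tracks
 * much more state, but the recursion shape is the same. */
typedef struct DemoContext
{
    size_t              total_bytes;   /* space allocated in this context */
    struct DemoContext *firstchild;    /* head of the child context list */
    struct DemoContext *nextchild;     /* next sibling under the same parent */
} DemoContext;

/* Sum a context's own usage plus that of all of its descendants, so a
 * cached statement with child contexts is still accounted accurately. */
static size_t
demo_context_total(const DemoContext *ctx)
{
    size_t total = ctx->total_bytes;

    for (const DemoContext *child = ctx->firstchild;
         child != NULL;
         child = child->nextchild)
        total += demo_context_total(child);

    return total;
}
```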


> Also I wonder why the "prepare test as" is part of the statement
> column. I isn't even part of the real statement that is prepared as
> far as I would assume. Would prefer to just have the "select *..." in
> that column.

It's the statement that was executed. Note that you'll not see that in
the case of protocol level prepared statements.  It will sometimes
include relevant information, e.g. about the types specified as part of
the prepare (as in PREPARE foo(int, float, ...) AS ...).

Greetings,

Andres Freund



Re: Adding column "mem_usage" to view pg_prepared_statements

From: Daniel Migowski
Hello Andres,

how do you want to generalize it? Are you thinking about a view solely for the display of the memory usage of different
objects? Like functions or views (that also have a plan associated with them, when I think about it)? While being
interesting, I still believe monitoring the mem usage of prepared statements is a bit more important than that of other
objects, because of how they change the memory consumption of the server without using any DDL or configuration options,
and I am not aware of other objects with the same properties - or are there some? And for the other volatile objects like
tables and indexes and their contents, PostgreSQL already has its information functions.

Regardless of that, here is the patch for now. I didn't want to fiddle too much with MemoryContexts yet, so it still
doesn't recurse into child contexts, but I will change that when I try to build a more compact MemoryContext
implementation and see how that works out.

Thanks for pointing out the relevant information in the statement column of the view.

Regards,
Daniel Migowski


Attachment

AW: Adding column "mem_usage" to view pg_prepared_statements

From: Daniel Migowski
Hello,

Will my patch be considered for 12.0? The calculation of the mem_usage value might be improved later on, but because the
system catalog is changed I would love to add it before 12.0 becomes stable.

Regards,
Daniel Migowski




Re: AW: Adding column "mem_usage" to view pg_prepared_statements

From: Tom Lane
Daniel Migowski <dmigowski@ikoffice.de> writes:
> Will my patch be considered for 12.0? The calculation of the mem_usage value might be improved later on but because
> the system catalog is changed I would love to add it before 12.0 becomes stable.

v12 has been feature-frozen for months, and it's pretty hard to paint
this as a bug fix, so I'd say the answer is "no".

            regards, tom lane



AW: AW: Adding column "mem_usage" to view pg_prepared_statements

From: Daniel Migowski
Ok, I have just read about the commitfest thing. Is the patch OK for that? Or is there generally no love for a mem_usage
column here? If there is, I really would add some memory monitoring in our app regarding this.




Re: Adding column "mem_usage" to view pg_prepared_statements

From: Tomas Vondra
On Tue, Jul 30, 2019 at 10:01:09PM +0000, Daniel Migowski wrote:
>Hello,
>
>Will my patch be considered for 12.0? The calculation of the mem_usage
>value might be improved later on but because the system catalog is
>changed I would love to add it before 12.0 becomes stable.
>

Nope. The code freeze for PG12 was April 7, 2019. You're a bit late for
that, unfortunately. We're in PG13 land now.

FWIW not sure what mail client you're using, but it seems to be breaking
the threads for some reason, splitting it into two - see [1].

Also, please stop top-posting. It makes it way harder to follow the
discussion.

[1] https://www.postgresql.org/message-id/flat/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4%40EXCHANGESERVER.ikoffice.de


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: AW: AW: Adding column "mem_usage" to view pg_prepared_statements

From: Tom Lane
Daniel Migowski <dmigowski@ikoffice.de> writes:
> Ok, just have read about the commitfest thing. Is the patch OK for that? Or is there generally no love for a
> mem_usage column here? If it was, I really would add some memory monitoring in our app regarding this.

You should certainly put it into the next commitfest.  We might or
might not accept it, but if it's not listed in the CF we might
not remember to even review it.  (The CF app is really a to-do
list for patches ...)

            regards, tom lane



Re: Adding column "mem_usage" to view pg_prepared_statements

From: Daniel Migowski
On 31.07.2019 00:17, Tomas Vondra wrote:
> FWIW not sure what mail client you're using, but it seems to be breaking
> the threads for some reason, splitting it into two - see [1].
>
> Also, please stop top-posting. It makes it way harder to follow the
> discussion.

I was using Outlook because it's my company's mail client, but I am
switching to Thunderbird now for communication with the list, trying to
be a good citizen.

Regards,
Daniel Migowski




Re: Adding column "mem_usage" to view pg_prepared_statements

From: Daniel Migowski
On 31.07.2019 00:29, Tom Lane wrote:
> Daniel Migowski <dmigowski@ikoffice.de> writes:
>> Ok, just have read about the commitfest thing. Is the patch OK for that? Or is there generally no love for a
>> mem_usage column here? If it was, I really would add some memory monitoring in our app regarding this.
 
> You should certainly put it into the next commitfest.  We might or
> might not accept it, but if it's not listed in the CF we might
> not remember to even review it.  (The CF app is really a to-do
> list for patches ...)

OK, added it there. Thanks for your patience :).

Regards,
Daniel Migowski




Re: Adding column "mem_usage" to view pg_prepared_statements

From: David Fetter
On Tue, Jul 30, 2019 at 10:01:09PM +0000, Daniel Migowski wrote:
> Hello,
> 
> Will my patch be considered for 12.0? The calculation of the
> mem_usage value might be improved later on but because the system
> catalog is changed I would love to add it before 12.0 becomes
> stable. 

Feature freeze was April 8, so no.

https://www.mail-archive.com/pgsql-hackers@lists.postgresql.org/msg37059.html

You're in plenty of time for 13, though!

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Adding column "mem_usage" to view pg_prepared_statements

From: Konstantin Knizhnik

On 31.07.2019 1:38, Daniel Migowski wrote:
>
> On 31.07.2019 00:29, Tom Lane wrote:
>> Daniel Migowski <dmigowski@ikoffice.de> writes:
>>> Ok, just have read about the commitfest thing. Is the patch OK for 
>>> that? Or is there generally no love for a mem_usage column here? If 
>>> it was, I really would add some memory monitoring in our app 
>>> regarding this.
>> You should certainly put it into the next commitfest.  We might or
>> might not accept it, but if it's not listed in the CF we might
>> not remember to even review it.  (The CF app is really a to-do
>> list for patches ...)
>
> OK, added it there. Thanks for your patience :).
>
> Regards,
> Daniel Migowski
>

The patch does not apply to the most recent sources because an extra
parameter was added to the CreateTemplateTupleDesc method.
Please rebase it - the fix is trivial.

I think that including information about the memory used by a statement
in pg_prepared_statements is very useful.
The CachedPlanMemoryUsage function may be useful not only for this view,
but for example it is also needed in my autoprepare patch.

I wonder if you would consider going further and not only reporting but
also controlling the memory used by prepared statements?
For example, implementing some LRU replacement discipline on top of the
prepared statement cache which can
evict rarely used prepared statements to avoid memory overflow.
We have such a patch for PgPro-EE, but it limits only the number of
prepared statements, not taking into account the amount of memory used
by them.
I think that a memory-based limit would be more accurate (although it
adds more overhead).
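
For illustration, the eviction policy described above could look roughly like this in C. This is an invented sketch, not code from the PgPro-EE patch; it assumes the cache can be viewed as an array ordered from most to least recently used:

```c
#include <stddef.h>

/* Invented stand-in for a cache entry; mem_usage is the per-statement
 * total that the proposed view column would report. */
typedef struct DemoStmt
{
    const char *name;
    size_t      mem_usage;
    int         evicted;
} DemoStmt;

/* stmts[] is ordered most- to least-recently-used.  Evict from the cold
 * tail until the summed memory fits under mem_limit; returns how many
 * statements were evicted. */
static int
demo_evict_lru(DemoStmt *stmts, int n, size_t mem_limit)
{
    size_t total = 0;
    int    evicted = 0;

    for (int i = 0; i < n; i++)
        total += stmts[i].mem_usage;

    for (int i = n - 1; i >= 0 && total > mem_limit; i--)
    {
        total -= stmts[i].mem_usage;
        stmts[i].evicted = 1;
        evicted++;
    }
    return evicted;
}
```

A count-based limit is the same loop with `total` counting entries instead of bytes; the memory-based form needs per-statement usage numbers, which is exactly what the proposed mem_usage accounting would provide.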

If you want, I can be reviewer of your patch.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Adding column "mem_usage" to view pg_prepared_statements

From: Andres Freund
Hi,

On 2019-07-28 06:20:40 +0000, Daniel Migowski wrote:
> how do you want to generalize it? Are you thinking about a view solely
> for the display of the memory usage of different objects?

I'm not quite sure. I'm just not sure that adding separate
infrastructure for various objects is a sustainable approach. We'd likely
want to have this for prepared statements, for cursors, for the current
statement, for various caches, ...

I think an approach would be to add an 'owning_object' field to memory
contexts, which has to point to a Node type if set. A table returning reporting
function could recursively walk through the memory contexts, starting at
TopMemoryContext. Whenever it encounters a context with owning_object
set, it prints a string version of nodeTag(owning_object). For some node
types it knows about (e.g. PreparedStatement, Portal, perhaps some of
the caches), it prints additional metadata specific to the type (so for
prepared statement it'd be something like 'prepared statement', '[name
of prepared statement]'), and prints information about the associated
context and all its children.

The general context information probably should be something like:
context_name, context_ident,
context_total_bytes, context_total_blocks, context_total_freespace, context_total_freechunks, context_total_used, context_total_children,
context_self_bytes, context_self_blocks, context_self_freespace, context_self_freechunks, context_self_used, context_self_children

It might make sense to have said function return a row for the contexts
it encounters that do not have an owner set too (that way we'd e.g. get
CacheMemoryContext handled), but then still recurse.
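
As a rough illustration of that proposal (all types and names here are invented; the real implementation would use nodeTag() on the owning Node): each context optionally carries an owning object, and the reporting function walks depth-first from the top, emitting one row per context, labelling owned contexts with type-specific metadata and still recursing through unowned ones:

```c
#include <stddef.h>

/* Invented node types standing in for the objects a context could be
 * owned by (e.g. a prepared statement or a portal). */
typedef enum DemoTag { T_DemoPreparedStmt, T_DemoPortal } DemoTag;
typedef struct DemoNode { DemoTag tag; const char *name; } DemoNode;

typedef struct DemoCtx
{
    const char     *ctx_name;
    size_t          total_bytes;
    const DemoNode *owning_object;   /* the proposed field; NULL if unowned */
    struct DemoCtx *firstchild;
    struct DemoCtx *nextchild;
} DemoCtx;

/* One output row of the hypothetical table-returning function. */
typedef struct DemoRow
{
    const char *ctx_name;
    const char *owner_kind;          /* derived from the node tag; NULL if unowned */
    const char *owner_name;          /* type-specific metadata */
    size_t      total_bytes;
} DemoRow;

/* Append one row per context into rows[], depth-first; returns the new
 * row count.  Unowned contexts get a row too, and we still recurse. */
static int
demo_walk(const DemoCtx *ctx, DemoRow *rows, int n)
{
    rows[n].ctx_name = ctx->ctx_name;
    rows[n].total_bytes = ctx->total_bytes;
    if (ctx->owning_object)
    {
        rows[n].owner_kind = (ctx->owning_object->tag == T_DemoPreparedStmt)
            ? "prepared statement" : "portal";
        rows[n].owner_name = ctx->owning_object->name;
    }
    else
    {
        rows[n].owner_kind = NULL;
        rows[n].owner_name = NULL;
    }
    n++;

    for (const DemoCtx *c = ctx->firstchild; c != NULL; c = c->nextchild)
        n = demo_walk(c, rows, n);
    return n;
}
```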


Arguably the proposed owning_object field would be a bit redundant with
the already existing ident/MemoryContextSetIdentifier field, which
e.g. already associates the query string with the contexts used for a
prepared statement. But I'm not convinced that's going to be enough
context in a lot of cases, because e.g. for prepared statements it could
be interesting to have access to both the prepared statement name, and
the statement.

The reason I like something like this is that we wouldn't add new
columns to a number of views, and lack views to associate such
information to for some objects. And it'd be disproportional to add all
the information to numerous places anyway.

One counter-argument is that it'd be more expensive to get information
specific to prepared statements (or other object types) that way. I'm
not sure I buy that that's a problem - this isn't something that's
likely going to be used at a high frequency. But if it becomes a
problem, we can add a function that starts that process at a distinct
memory context (e.g. a function that does this just for a single
prepared statement, identified by name) - but I'd not start there.


> While being interesting I still believe monitoring the mem usage of
> prepared statements is a bit more important than that of other objects
> because of how they change memory consumption of the server without
> using any DDL or configuration options and I am not aware of other
> objects with the same properties, or are there some? And for the other
> volatile objects like tables and indexes and their contents PostgreSQL
> already has it's information functions.

Plenty other objects have that property. E.g. cursors. And for the
catalog/relation/... caches it's even more pernicious - the client might
have closed all its "handles", but we still use memory (and it's
absolutely crucial for performance).




Greetings,

Andres Freund



Re: Adding column "mem_usage" to view pg_prepared_statements

From: Daniel Migowski
On 05.08.2019 18:30, Konstantin Knizhnik wrote:
> On 31.07.2019 1:38, Daniel Migowski wrote:
>> On 31.07.2019 00:29, Tom Lane wrote:
>>> Daniel Migowski <dmigowski@ikoffice.de> writes:
>>>> Ok, just have read about the commitfest thing. Is the patch OK for 
>>>> that? Or is there generally no love for a mem_usage column here? If 
>>>> it was, I really would add some memory monitoring in our app 
>>>> regarding this.
>>> You should certainly put it into the next commitfest.  We might or
>>> might not accept it, but if it's not listed in the CF we might
>>> not remember to even review it.  (The CF app is really a to-do
>>> list for patches ...)
>> OK, added it there. Thanks for your patience :).
> The patch is not applied to the most recent source because extra 
> parameter was added to CreateTemplateTupleDesc method.
> Please rebase it - the fix is trivial.
OK, will do.
> I think that including in pg_prepared_statements information about 
> memory used this statement is very useful.
> CachedPlanMemoryUsage function may be useful not only for this view, 
> but for example it is also need in my autoprepare patch.
I would love to use your work once it's done, and would also love to work 
together here. I am quite a novice in C though; I might take my time to 
get things right.
> I wonder if you consider go further and not only report but control 
> memory used by prepared statements?
> For example implement some LRU replacement discipline on top of 
> prepared statements cache which can
> evict rarely used prepared statements to avoid memory overflow.

THIS! Having some kind of safety net here would finally make sure that 
my precious processes will not grow endlessly until all mem is eaten up, 
even with prep statement count limits.

While working on stuff I noticed there are three things stored in a 
CachedPlanSource: the raw query tree (a relatively small thing), the 
query list (the analyzed-and-rewritten query tree), which takes up the most 
memory (at least here, maybe different with your use cases), and (often 
after the 6th call) the CachedPlan, which is more optimized than the 
query list and often needs less memory (half of the query list here).

The query list seems to take the most time to create here, because I hit 
the GEQO engine, but it could be recreated easily (up to 500ms for 
some queries). Creating the CachedPlan afterwards takes 60ms in some 
use cases. If we could just invalidate them from time to time, that would 
be great.

Also, invalidating just the queries or the CachedPlan would not 
invalidate the whole prepared statement, which would break clients' 
expectations, but just make them a bit slower, adding much to the stability 
of the system. I would pay that price, because I don't use manually 
named prepared statements anyway and just autogenerate them as 
performance sugar without thinking about what really needs to be 
prepared. There is an option in the JDBC driver to use prepared 
statements automatically after you have used them a few times.

> We have such patch for PgPro-EE but it limits only number of prepared 
> statement, not taken in account amount of memory used by them.
> I think that memory based limit will be more accurate (although it 
> adds more overhead).

Limiting them by number is already done automatically here and would 
really not be of much value, but having a mem limit would be great. We 
could have a combined memory limit for your autoprepared statements as 
well as the manually prepared ones, so clients can know for sure the 
server processes won't eat up more than e.g. 800MB for prepared 
statements. And also I would like to have this value spread across all 
client processes, e.g. specifying max_prepared_statement_total_mem=5G 
for the server, and maybe max_prepared_statement_mem=1G for client 
processes. Of course we would have to implement cross-client-process 
invalidation here, and I don't know if communicating client processes 
are even intended.

Anyway, a memory limit won't really add that much more overhead. At 
least not more than having no prepared statements at all because of the 
fear of server OOMs, or having just a small count of those statements. I 
was even thinking about a prepared statement reaper that checks 
pg_prepared_statements every few minutes to clean things up manually, 
but having this in the server would be of great value to me.

> If you want, I can be reviewer of your patch.

I'd love to have you as my reviewer.

Regards,
Daniel Migowski




Re: Adding column "mem_usage" to view pg_prepared_statements

From: Daniel Migowski
On 05.08.2019 19:16, Andres Freund wrote:
> On 2019-07-28 06:20:40 +0000, Daniel Migowski wrote:
>> how do you want to generalize it? Are you thinking about a view solely
>> for the display of the memory usage of different objects?
> I'm not quite sure. I'm just not sure that adding separate
> infrastructure for various objects is a sutainable approach. We'd likely
> want to have this for prepared statements, for cursors, for the current
> statement, for various caches, ...
>
> I think an approach would be to add an 'owning_object' field to memory
> contexts, which has to point to a Node type if set. A table returning reporting
> function could recursively walk through the memory contexts, starting at
> TopMemoryContext. Whenever it encounters a context with owning_object
> set, it prints a string version of nodeTag(owning_object). For some node
> types it knows about (e.g. PreparedStatement, Portal, perhaps some of
> the caches), it prints additional metadata specific to the type (so for
> prepared statement it'd be something like 'prepared statement', '[name
> of prepared statement]'), and prints information about the associated
> context and all its children.
I understand. So it would be something like the output of 
MemoryContextStatsInternal, but in table form with some extra columns. I 
would have loved this extra information already in 
MemoryContextStatsInternal btw., so it might be a good idea to upgrade 
it first to find the information and wrap a table function over it 
afterwards.
> The general context information probably should be something like:
> context_name, context_ident,
> context_total_bytes, context_total_blocks, context_total_freespace, context_total_freechunks, context_total_used, context_total_children,
> context_self_bytes, context_self_blocks, context_self_freespace, context_self_freechunks, context_self_used, context_self_children
>
> It might make sense to have said function return a row for the contexts
> it encounters that do not have an owner set too (that way we'd e.g. get
> CacheMemoryContext handled), but then still recurse.
A nice way to learn about the internals of the server and to analyze the 
effects of memory reducing enhancements.
> Arguably the proposed owning_object field would be a bit redundant with
> the already existing ident/MemoryContextSetIdentifier field, which
> e.g. already associates the query string with the contexts used for a
> prepared statement. But I'm not convinced that's going to be enough
> context in a lot of cases, because e.g. for prepared statements it could
> be interesting to have access to both the prepared statement name, and
> the statement.
The identifier seems to be more like a category at the moment, because 
it does not seem to hold any relevant information about the object in 
question. So a more specific name would be nice.
> The reason I like something like this is that we wouldn't add new
> columns to a number of views, and lack views to associate such
> information to for some objects. And it'd be disproportional to add all
> the information to numerous places anyway.
I understand your argumentation, but things like Cursors and Portals are 
rather short-lived, while prepared statements seem to be the place where 
memory really builds up.
> One counter-argument is that it'd be more expensive to get information
> specific to prepared statements (or other object types) that way. I'm
> not sure I buy that that's a problem - this isn't something that's
> likely going to be used at a high frequency. But if it becomes a
> problem, we can add a function that starts that process at a distinct
> memory context (e.g. a function that does this just for a single
> prepared statement, identified by name) - but I'd not start there.
I also see no problem here, and with Konstantin Knizhnik's autoprepare I 
wouldn't use this very often anyway, more just for monitoring purposes, 
where I don't care if my query is a bit more complex.
>> While being interesting I still believe monitoring the mem usage of
>> prepared statements is a bit more important than that of other objects
>> because of how they change memory consumption of the server without
>> using any DDL or configuration options and I am not aware of other
>> objects with the same properties, or are there some? And for the other
>> volatile objects like tables and indexes and their contents PostgreSQL
>> already has it's information functions.
> Plenty other objects have that property. E.g. cursors. And for the
> catalog/relation/... caches it's even more pernicious - the client might
> have closed all its "handles", but we still use memory (and it's
> absolutely crucial for performance).

Maybe we can do both? Add a single column to pg_prepared_statements, and 
add another table for the output of MemoryContextStatsDetail? This has 
the advantage that the single real memory indicator useful for end users 
(to the question: How much mem takes my sh*t up?) is in 
pg_prepared_statements and some more intrinsic information in a detail 
view.

Thinking about the latter I am against such a table, at least in the 
form where it gives information like context_total_freechunks, because 
it would just be useful for us developers. Why should any end user care 
for how many chunks are still open in a MemoryContext, except when he is 
working on C-style extensions. Could just be a source of confusion for 
them.

Let's think about the goal this should have: the end user should be able 
to monitor the memory consumption of things he is in control of or that 
could affect system performance. Should such a table automatically 
aggregate some information? I think so. I would not add more than two 
memory columns to the view, just mem_used and mem_reserved. And even 
mem_used is questionable, because in his eyes only the memory he cannot 
use for other things because of object x is important to him (that was 
the reason I added just one column). He would even ask: WHY is there 50% 
more memory reserved than used, and how can I optimize it? (That might 
lead to more curious PostgreSQL developers, so maybe that's a plus.)

Something that also clearly speaks FOR such a table and against my 
proposal is that if someone cares about memory, he most likely cares 
about ALL his memory, and in that case monitoring prepared statements 
would just be a small subset of the things to monitor. OK, I am defeated 
and will rewrite my patch if the next proposal finds approval:

I would propose a table pg_mem_usage containing the columns 
object_class, name, detail, and mem_usage (renamed if that fits the 
style of the other tables better). The name would be empty for some 
objects, like the unnamed prepared statement; the query string would go 
in the detail column. One could add a final "Other" row containing the 
memory that no specific output line has accounted for. It could also 
contain lines for cursors and other things I am too novice to think of 
here.

And last: a reason why we still need a child-parent relationship in this 
table (and distinct this_ and total_ mem functions) is that prepared 
statements start to use much more memory once the generic plan is stored 
in them after a few uses. As a user I always assumed that preparing a 
statement would already do all the work required to make it fast, but a 
statement only becomes blazingly fast when the generic plan is available 
(and used), and it would be nice to see for which statements that plan 
has already been generated and is consuming memory. I believe the reason 
for delaying it is the fear of excessive memory usage.

On the other hand: the generic plan could already be created for the 
first invocation of the prepared statement, so why not store it 
immediately? It is a named statement precisely because it is intended to 
be reused, even if only twice, and since memory does not seem to be 
treated as a scarce resource in this context, why not keep the plan 
right away? That would also drop the need for a hierarchy here.

Any comments?

Regards,
Daniel Migowski




Re: Adding column "mem_usage" to view pg_prepared_statements

From
Andres Freund
Date:
Hi,

On 2019-08-05 22:46:47 +0200, Daniel Migowski wrote:
> > Arguably the proposed owning_object field would be a bit redundant with
> > the already existing ident/MemoryContextSetIdentifier field, which
> > e.g. already associates the query string with the contexts used for a
> > prepared statement. But I'm not convinced that's going to be enough
> > context in a lot of cases, because e.g. for prepared statements it could
> > be interesting to have access to both the prepared statement name, and
> > the statement.
> The identifier seems to be more like a category at the moment, because it
> does not seem to hold any relevant information about the object in question.
> So a more specific name would be nice.

I think you might be thinking of the context's name, not ident? E.g. for
prepared statements the context name is:

    source_context = AllocSetContextCreate(CurrentMemoryContext,
                                           "CachedPlanSource",
                                           ALLOCSET_START_SMALL_SIZES);

which is obviously the same for every statement. But then there's

    MemoryContextSetIdentifier(source_context, plansource->query_string);

which obviously differs.


> > The reason I like something like this is that we wouldn't add new
> > columns to a number of views, and lack views to associate such
> > information to for some objects. And it'd be disproportional to add all
> > the information to numerous places anyway.
> I understand your argumentation, but things like Cursors and Portals are
> rather short living while prepared statements seem to be the place where
> memory really builds up.

That's not necessarily true, especially given WITH HOLD cursors.  Nor
does one only run out of memory in the context of long-lived objects.


> > > While being interesting I still believe monitoring the mem usage of
> > > prepared statements is a bit more important than that of other objects
> > > because of how they change memory consumption of the server without
> > > using any DDL or configuration options and I am not aware of other
> > > objects with the same properties, or are there some? And for the other
> > > volatile objects like tables and indexes and their contents PostgreSQL
> > > already has its information functions.

> > Plenty other objects have that property. E.g. cursors. And for the
> > catalog/relation/... caches it's even more pernicious - the client might
> > have closed all its "handles", but we still use memory (and it's
> > absolutely crucial for performance).
> 
> Maybe we can do both? Add a single column to pg_prepared_statements, and add
> another table for the output of MemoryContextStatsDetail? This has the
> advantage that the single real memory indicator useful for end users (to the
> question: How much mem takes my sh*t up?) is in pg_prepared_statements and
> some more intrinsic information in a detail view.

I don't see why we'd want to do both. It just makes pg_prepared_statements
considerably more expensive. And that's used by some applications /
clients in an automated manner.


> Thinking about the latter I am against such a table, at least in the form
> where it gives information like context_total_freechunks, because it would
> just be useful for us developers.

Developers are also an audience for us. I mean, we certainly can use this
information during development. But even for bug reports such information
would be useful.


> Why should any end user care for how many
> chunks are still open in a MemoryContext, except when he is working on
> C-style extensions. Could just be a source of confusion for them.

Meh. As long as the crucial stuff is first, that's imo enough.


> Let's think about the goal this should have: The end user should be able to
> monitor the memory consumption of things he's in control of or could affect
> the system performance. Should such a table automatically aggregate some
> information? I think so. I would not add more than two memory columns to the
> view, just mem_used and mem_reserved. And even mem_used is questionable,
> because in his eyes only the memory he cannot use for other stuff because of
> object x is important for him (that was the reason I just added one column).
> He would even ask: WHY is there 50% more memory reserved than used, and how
> can I optimize it? (Would lead to more curious PostgreSQL developers maybe,
> so that's maybe a plus).

It's important because it influences how memory usage will grow.


> On the other hand: The Generic Plan had been created for the first
> invocation of the prepared statement, why not store it immediately. It is a
> named statement for a reason that it is intended to be reused, even when it
> is just twice, and since memory seems not to be seen as a scarce resource in
> this context why not store that immediately. Would drop the need for a
> hierarchy here also.

Well, we'll maybe never use it, so ...

Greetings,

Andres Freund



Re: Adding column "mem_usage" to view pg_prepared_statements

From
Konstantin Knizhnik
Date:

On 05.08.2019 22:35, Daniel Migowski wrote:
>> I think that including information in pg_prepared_statements about the 
>> memory used by a statement is very useful.
>> The CachedPlanMemoryUsage function may be useful not only for this view; 
>> for example, it is also needed in my autoprepare patch.
> I would love to use your work once it's done, and would also love to 
> work together here. I am quite a novice in C though; I might take my 
> time to get things right.

Right now I have reused your implementation of the CachedPlanMemoryUsage 
function :)
Previously I took into account only the memory used by plan->context, but 
not plan->query_context or plan->gplan->context (although the 
query_context for the raw parse tree seems to be much smaller).
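The total described here can be illustrated with a small standalone sketch. The structs and field names below are mock stand-ins for PostgreSQL's CachedPlanSource and MemoryContext, not the real definitions from plancache.h and memutils.h:

```c
#include <stddef.h>

/* Mock stand-ins for PostgreSQL's MemoryContext and CachedPlanSource;
 * the real types live in utils/memutils.h and utils/plancache.h. */
typedef struct MockContext
{
    size_t total_bytes;         /* total space the context has allocated */
} MockContext;

typedef struct MockPlanSource
{
    MockContext *context;       /* holds the raw parse tree */
    MockContext *query_context; /* holds the analyzed-and-rewritten query list */
    MockContext *gplan_context; /* holds the generic plan; NULL until built */
} MockPlanSource;

/* Sum the three contexts, skipping the generic plan if it does not exist yet. */
static size_t
cached_plan_memory_usage(const MockPlanSource *src)
{
    size_t total = src->context->total_bytes + src->query_context->total_bytes;

    if (src->gplan_context != NULL)
        total += src->gplan_context->total_bytes;
    return total;
}
```

The NULL check mirrors the point above: before the generic plan is built, only the raw tree and the rewritten query list contribute to the reported size.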


>> I wonder if you would consider going further and not only report but 
>> control the memory used by prepared statements?
>> For example, implement some LRU replacement discipline on top of the 
>> prepared statement cache which can
>> evict rarely used prepared statements to avoid memory overflow.
>
> THIS! Having some kind of safety net here would finally make sure that 
> my precious processes will not grow endlessly until all mem is eaten 
> up, even with prep statement count limits.
>
> While working on stuff I noticed there are three things stored in a 
> CachedPlanSource. The raw query tree (a relatively small thing), the 
> query list (analyzed-and-rewritten query tree) which takes up the most 
> memory (at least here, maybe different with your usecases), and (often 
> after the 6th call) the CachedPlan, which is more optimized than the 
> query list and often needs less memory (half of the query list here).
>
> The query list seems to take the most time to create here, because I 
> hit the GEQO engine, but it could be recreated easily (up to 
> 500ms for some queries). Creating the CachedPlan afterwards takes 60ms 
> in some usecases. IF we could just invalidate them from time to time, 
> that would be great.
>
> Also, invalidating just the queries or the CachedPlan would not 
> invalidate the whole prepared statement, which would break clients' 
> expectations, but just make them slower, adding much to the 
> stability of the system. I would pay that price, because I don't 
> use manually named prepared statements anyway and just autogenerate 
> them as performance sugar without thinking about what really needs to 
> be prepared. There is an option in the JDBC driver to use 
> prepared statements automatically after you have used them a few times.
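The LRU discipline discussed above could look roughly like the following standalone sketch: evict the least recently used entries until the cache fits a memory budget. Entry, enforce_budget, and the fixed-size array are hypothetical illustrations, not PostgreSQL's plan cache structures:

```c
#include <stddef.h>

/* Hypothetical sketch of an LRU discipline for a prepared-statement
 * cache: drop the least recently used entries until the total memory
 * fits the budget. This is illustrative only. */
#define MAX_ENTRIES 8

typedef struct Entry
{
    int valid;              /* slot holds a cached statement */
    size_t mem_usage;       /* as a CachedPlanMemoryUsage-like function would report */
    unsigned last_used;     /* logical clock; higher means more recent */
} Entry;

static size_t
total_usage(const Entry *cache)
{
    size_t total = 0;

    for (int i = 0; i < MAX_ENTRIES; i++)
        if (cache[i].valid)
            total += cache[i].mem_usage;
    return total;
}

/* Evict least-recently-used entries until the cache fits within budget. */
static void
enforce_budget(Entry *cache, size_t budget)
{
    while (total_usage(cache) > budget)
    {
        int victim = -1;

        for (int i = 0; i < MAX_ENTRIES; i++)
            if (cache[i].valid &&
                (victim < 0 || cache[i].last_used < cache[victim].last_used))
                victim = i;
        if (victim < 0)
            break;          /* nothing left to evict */
        cache[victim].valid = 0;
    }
}
```

A real implementation would of course have to hook into the plan cache's lookup and save paths and skip statements currently in use; this only shows the replacement policy itself.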

I have noticed that cached plans for implicitly prepared statements in 
stored procedures are not shown in the pg_prepared_statements view.
It may not be a problem in your case (if you are accessing Postgres 
through JDBC and not using prepared statements), but it can cause memory 
overflow in applications actively using stored procedures, because 
unlike explicitly created prepared statements, it is very difficult to 
estimate and control the statements implicitly prepared by plpgsql.

I am not sure what the best solution would be in this case. Adding yet 
another view for implicitly prepared statements? Or including them in 
the pg_prepared_statements view?
>
>> We have such a patch for PgPro-EE, but it limits only the number of 
>> prepared statements, not taking into account the amount of memory used by them.
>> I think that a memory-based limit will be more accurate (although it 
>> adds more overhead).
>
> Limiting them by number is already done automatically here and would 
> really not be of much value, but having a memory limit would be great. We 
> could have a combined memory limit for your autoprepared statements as 
> well as the manually prepared ones, so clients can know for sure the 
> server processes won't eat up more than e.g. 800MB for prepared 
> statements. I would also like to have this value spread across all 
> client processes, e.g. specifying max_prepared_statement_total_mem=5G 
> for the server, and maybe max_prepared_statement_mem=1G for client 
> processes. Of course we would have to implement cross-client-process 
> invalidation here, and I don't know if communicating client processes 
> are even intended.
>
> Anyway, a memory limit won't really add that much more overhead. At 
> least not more than having no prepared statements at all because of 
> the fear of server OOMs, or having just a small count of those 
> statements. I was even thinking about a prepared statement reaper that 
> checks pg_prepared_statements every few minutes to clean things up 
> manually, but having this in the server would be of great value to me.

Right now a memory context has no field containing the amount of currently 
used memory.
This is why the context->methods->stats implementation has to 
traverse all blocks to calculate the size of memory used by the context.
That may not be so fast for large contexts. But I do not expect that 
contexts of prepared statements will be very large, although
I have dealt with customers who issued queries with query texts larger 
than a few megabytes; I am afraid to estimate the plan size for such 
queries.
This is the reason for my concern that calculating the memory context size 
may have a negative effect on performance. But it has to be done only once, 
when the statement is prepared. So maybe it is not a problem at all.
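The cost of the stats callback can be illustrated with a toy model: an AllocSet-style context keeps its memory in a chain of blocks, so without a maintained running total, measuring it means walking every block. Block and context_total_space below are illustrative stand-ins, not PostgreSQL's AllocBlockData:

```c
#include <stddef.h>

/* Toy model of a memory context's block chain. In PostgreSQL the real
 * structure is AllocBlockData in aset.c; this is only an illustration. */
typedef struct Block
{
    size_t size;            /* bytes reserved by this block */
    struct Block *next;
} Block;

/* Computing the context's footprint is O(number of blocks), which is why
 * a per-context running total would make such a query cheaper. */
static size_t
context_total_space(const Block *blocks)
{
    size_t total = 0;

    for (const Block *b = blocks; b != NULL; b = b->next)
        total += b->size;
    return total;
}
```

Since the walk only happens once, at PREPARE time in the proposal above, the linear cost is amortized over every later execution of the statement.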

-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Adding column "mem_usage" to view pg_prepared_statements

From
Alvaro Herrera
Date:
Hello Daniel,

This patch no longer applies.  Please submit an updated version.  Also,
there's voluminous discussion that doesn't seem to have resulted in any
revision of the code.  Please fix that too.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services