Thread: Add tracking of backend memory allocated to pg_stat_activity

Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
Hi Hackers,

Attached is a patch to
Add tracking of backend memory allocated to pg_stat_activity
   
This new field displays the current bytes of memory allocated to the
backend process.  It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included
in the displayed value.  Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero.


--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com

Attachment

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Justin Pryzby
Date:
On Wed, Aug 31, 2022 at 12:03:06PM -0400, Reid Thompson wrote:
> Hi Hackers,
> 
> Attached is a patch to 
> Add tracking of backend memory allocated to pg_stat_activity

> +  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',

In the past, there was concern about making pg_stat_activity wider by
adding information that's less-essential than what's been there for
years.  This is only an int64, so it's not "wide", but I wonder if
there's another way to expose this information?  Like adding backends to
pg_get_backend_memory_contexts() , maybe with another view on top of
that ?

+                * shown allocated in pgstat_activity when the creator destroys the
                                                                                                          
 

pg_stat

> +         * Posix creation calls dsm_impl_posix_resize implying that resizing
> +         * occurs or may be added in the future. As implemented
> +         * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
> +         * whole new size as input, growing the allocation as needed * (only
> +         * truncate supports shrinking). We update by replacing the * old

wrapping caused extraneous stars

> +     * Do not allow backend_mem_allocated to go below zero. ereport if we
> +     * would have. There's no need for a lock around the read here asit's

as it's

> +        ereport(LOG, (errmsg("decrease reduces reported backend memory allocated below zero; setting reported to
0")));

errmsg() doesn't require the outside paranthesis since a couple years
ago.

> +    /*
> +     * Until header allocation is included in context->mem_allocated cast to
> +     * slab and decrement the headerSize

add a comma before "cast" ?

-- 
Justin



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Kyotaro Horiguchi
Date:
At Wed, 31 Aug 2022 12:05:55 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in 
> On Wed, Aug 31, 2022 at 12:03:06PM -0400, Reid Thompson wrote:
> > Hi Hackers,
> > 
> > Attached is a patch to 
> > Add tracking of backend memory allocated to pg_stat_activity
> 
> > +  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
> 
> In the past, there was concern about making pg_stat_activity wider by
> adding information that's less-essential than what's been there for
> years.  This is only an int64, so it's not "wide", but I wonder if
> there's another way to expose this information?  Like adding backends to

The view looks already too wide to me.  I don't want the numbers for
metrics are added to the view.

> pg_get_backend_memory_contexts() , maybe with another view on top of
> that ?

+1

> +                * shown allocated in pgstat_activity when the creator destroys the
                                                                                                            
 
> 
> pg_stat
> 
> > +         * Posix creation calls dsm_impl_posix_resize implying that resizing
> > +         * occurs or may be added in the future. As implemented
> > +         * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
> > +         * whole new size as input, growing the allocation as needed * (only
> > +         * truncate supports shrinking). We update by replacing the * old
> 
> wrapping caused extraneous stars
> 
> > +     * Do not allow backend_mem_allocated to go below zero. ereport if we
> > +     * would have. There's no need for a lock around the read here asit's
> 
> as it's
> 
> > +        ereport(LOG, (errmsg("decrease reduces reported backend memory allocated below zero; setting reported to
0")));
> 
> errmsg() doesn't require the outside paranthesis since a couple years
> ago.

+1

> > +    /*
> > +     * Until header allocation is included in context->mem_allocated cast to
> > +     * slab and decrement the headerSize
> 
> add a comma before "cast" ?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Kyotaro Horiguchi
Date:
At Wed, 31 Aug 2022 12:03:06 -0400, Reid Thompson <reid.thompson@crunchydata.com> wrote in 
> Attached is a patch to 
> Add tracking of backend memory allocated to pg_stat_activity

> @@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size)
>              return NULL;
>  
>          context->mem_allocated += blksize;
> +        pgstat_report_backend_mem_allocated_increase(blksize);

I'm not sure this is acceptable. The function adds a branch even when
the feature is turned off, which I think may cause a certain extent of
performance degradation. A past threads [1], [2] and [3] might be
informative.

[1] https://www.postgresql.org/message-id/1434311039.4369.39.camel%40jeff-desktop
[2] https://www.postgresql.org/message-id/72a656e0f71d0860161e0b3f67e4d771%40oss.nttdata.com
[3] https://www.postgresql.org/message-id/0271f440ac77f2a4180e0e56ebd944d1%40oss.nttdata.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Add tracking of backend memory allocated to pg_stat_activity

From
"Drouvot, Bertrand"
Date:
Hi,

On 9/1/22 3:28 AM, Kyotaro Horiguchi wrote:
> At Wed, 31 Aug 2022 12:05:55 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in
>> On Wed, Aug 31, 2022 at 12:03:06PM -0400, Reid Thompson wrote:
>>> Hi Hackers,
>>>
>>> Attached is a patch to
>>> Add tracking of backend memory allocated

Thanks for the patch.

+ 1 on the idea.

>>> +  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
>> In the past, there was concern about making pg_stat_activity wider by
>> adding information that's less-essential than what's been there for
>> years.  This is only an int64, so it's not "wide", but I wonder if
>> there's another way to expose this information?  Like adding backends to
> The view looks already too wide to me.  I don't want the numbers for
> metrics are added to the view.

+1 for a dedicated view.

While we are at it, what do you think about also recording the max 
memory allocated by a backend? (could be useful and would avoid sampling 
for which there is no guarantee to sample the max anyway).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Wed, 2022-08-31 at 12:05 -0500, Justin Pryzby wrote:
> > +  proargmodes =>
> > '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}'
> > ,
>
> In the past, there was concern about making pg_stat_activity wider by
> adding information that's less-essential than what's been there for
> years.  This is only an int64, so it's not "wide", but I wonder if
> there's another way to expose this information?  Like adding backends
> to
> pg_get_backend_memory_contexts() , maybe with another view on top of

I will take a look at pg_get_backend_memory_contexts. I will also look
at the other suggestions in the thread.

> +                * shown allocated in pgstat_activity when the
>
> pg_stat

Corrected,

> > replacing the * old
>
> wrapping caused extraneous stars

Corrected

> > here asit's
>
> as it's

Corrected

> errmsg() doesn't require the outside paranthesis since a couple years
> ago.

Corrected

> > > mem_allocated cast to
> add a comma before "cast" ?

Corrected

Patch with the corrections attached

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com


--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com

Attachment

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Thu, 2022-09-01 at 13:43 +0900, Kyotaro Horiguchi wrote:
>
> > @@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size)
> >               return NULL;
> >  
> >           context->mem_allocated += blksize;
> > +        pgstat_report_backend_mem_allocated_increase(blksi
> > ze);
>
> I'm not sure this is acceptable. The function adds a branch even when
> the feature is turned off, which I think may cause a certain extent
> of
> performance degradation. A past threads [1], [2] and [3] might be
> informative.

 Stated above is '...even when the feature is turned off...', I want to
 note that this feature/patch (for tracking memory allocated) doesn't
 have an 'on/off'. Tracking would always occur.

 I'm open to guidance on testing for performance degradation.  I did
 note some basic pgbench comparison numbers in the thread regarding
 limiting backend memory allocations.

> --
> Kyotaro Horiguchi
> NTT Open Source Software Center




Re: Add tracking of backend memory allocated to pg_stat_activity

From
Kyotaro Horiguchi
Date:
At Tue, 06 Sep 2022 17:10:49 -0400, Reid Thompson <reid.thompson@crunchydata.com> wrote in
> On Thu, 2022-09-01 at 13:43 +0900, Kyotaro Horiguchi wrote:
> >
> > > @@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size)
> > >               return NULL;
> > >  
> > >           context->mem_allocated += blksize;
> > > +        pgstat_report_backend_mem_allocated_increase(blksi
> > > ze);
> >
> > I'm not sure this is acceptable. The function adds a branch even when
> > the feature is turned off, which I think may cause a certain extent
> > of
> > performance degradation. A past threads [1], [2] and [3] might be
> > informative.
>
>  Stated above is '...even when the feature is turned off...', I want to
>  note that this feature/patch (for tracking memory allocated) doesn't
>  have an 'on/off'. Tracking would always occur.

In the patch, I see that
pgstat_report_backend_mem_allocated_increase() runs the following
code, which seems like to me to be a branch..

+    if (!beentry || !pgstat_track_activities)
+    {
+        /*
+         * Account for memory before pgstats is initialized. This will be
+         * migrated to pgstats on initialization.
+         */
+        backend_mem_allocated += allocation;
+
+        return;
+    }


>  I'm open to guidance on testing for performance degradation.  I did
>  note some basic pgbench comparison numbers in the thread regarding
>  limiting backend memory allocations.

Yeah.. That sounds good..

(I have a patch that is stuck at benchmarking on slight possible
degradation caused by a branch (or indirect call) on a hot path
similary to this one.  The test showed fluctuation that is not clearly
distinguishable between noise and degradation by running the target
functions in a busy loop..)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Kyotaro Horiguchi
Date:
At Wed, 07 Sep 2022 17:08:41 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> At Tue, 06 Sep 2022 17:10:49 -0400, Reid Thompson <reid.thompson@crunchydata.com> wrote in
> > On Thu, 2022-09-01 at 13:43 +0900, Kyotaro Horiguchi wrote:
> > >
> > > > @@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size)
> > > >               return NULL;
> > > >  
> > > >           context->mem_allocated += blksize;
> > > > +        pgstat_report_backend_mem_allocated_increase(blksi
> > > > ze);
> > >
> > > I'm not sure this is acceptable. The function adds a branch even when
> > > the feature is turned off, which I think may cause a certain extent
> > > of
> > > performance degradation. A past threads [1], [2] and [3] might be
> > > informative.
> >
> >  Stated above is '...even when the feature is turned off...', I want to
> >  note that this feature/patch (for tracking memory allocated) doesn't
> >  have an 'on/off'. Tracking would always occur.
>
> In the patch, I see that
> pgstat_report_backend_mem_allocated_increase() runs the following
> code, which seems like to me to be a branch..

Ah.. sorry.

> pgstat_report_backend_mem_allocated_increase() runs the following
- code, which seems like to me to be a branch..
+ code, which seems like to me to be a branch that can turn of the feature..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Stephen Frost
Date:
Greetings,

* Drouvot, Bertrand (bdrouvot@amazon.com) wrote:
> On 9/1/22 3:28 AM, Kyotaro Horiguchi wrote:
> >At Wed, 31 Aug 2022 12:05:55 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in
> >>On Wed, Aug 31, 2022 at 12:03:06PM -0400, Reid Thompson wrote:
> >>>Attached is a patch to
> >>>Add tracking of backend memory allocated
>
> Thanks for the patch.
>
> + 1 on the idea.

Glad folks are in support of the general idea.

> >>>+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
> >>In the past, there was concern about making pg_stat_activity wider by
> >>adding information that's less-essential than what's been there for
> >>years.  This is only an int64, so it's not "wide", but I wonder if
> >>there's another way to expose this information?  Like adding backends to
> >The view looks already too wide to me.  I don't want the numbers for
> >metrics are added to the view.
>
> +1 for a dedicated view.

A dedicated view with a single column in it hardly seems sensible.  I'd
also argue that this particular bit of information is extremely useful
and therefore worthy of being put directly into pg_stat_activity.  I
could see a dedicated view possibly *also* being added later if/when we
provide a more detailed break-down of how the memory is being used but
that's a whole other thing and I'm not even 100% sure we'll ever
actually get there, as you can already poke a backend and have it dump
out the memory context-level information on an as-needed basis.

> While we are at it, what do you think about also recording the max memory
> allocated by a backend? (could be useful and would avoid sampling for which
> there is no guarantee to sample the max anyway).

What would you do with that information..?  By itself, it doesn't strike
me as useful.  Perhaps it'd be interesting to grab the max required for
a particular query in pg_stat_statements or such but again, that's a
very different thing.

Thanks,

Stephen

Attachment

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Stephen Frost
Date:
Greetings,

* Kyotaro Horiguchi (horikyota.ntt@gmail.com) wrote:
> At Tue, 06 Sep 2022 17:10:49 -0400, Reid Thompson <reid.thompson@crunchydata.com> wrote in
> >  I'm open to guidance on testing for performance degradation.  I did
> >  note some basic pgbench comparison numbers in the thread regarding
> >  limiting backend memory allocations.
>
> Yeah.. That sounds good..
>
> (I have a patch that is stuck at benchmarking on slight possible
> degradation caused by a branch (or indirect call) on a hot path
> similary to this one.  The test showed fluctuation that is not clearly
> distinguishable between noise and degradation by running the target
> functions in a busy loop..)

Just to be clear- this path is (hopefully) not *super* hot as we're only
tracking actual allocations (that is- malloc() calls), this isn't
changing anything for palloc() calls that aren't also needing to do a
malloc(), and we already try to reduce the amount of malloc() calls
we're doing by allocating more and more each time we run out in a given
context.

While I'm generally supportive of doing some benchmarking around this, I
don't think the bar is as high as it would be if we were actually
changing the cost of routine palloc() or such calls.

Thanks,

Stephen

Attachment

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Justin Pryzby
Date:
On Fri, Sep 09, 2022 at 12:34:15PM -0400, Stephen Frost wrote:
> > While we are at it, what do you think about also recording the max memory
> > allocated by a backend? (could be useful and would avoid sampling for which
> > there is no guarantee to sample the max anyway).

FYI, that's already kind-of available from getrusage:

$ psql ts -c "SET log_executor_stats=on; SET client_min_messages=debug;
SELECT a, COUNT(1) FROM generate_series(1,999999) a GROUP BY 1;" |wc
LOG:  EXECUTOR STATISTICS
...
!       194568 kB max resident size

Note that max rss counts things allocated outside postgres (like linked
libraries).

> What would you do with that information..?  By itself, it doesn't strike
> me as useful.  Perhaps it'd be interesting to grab the max required for
> a particular query in pg_stat_statements or such but again, that's a
> very different thing.

log_executor_stats is at level "debug", so it's not great to enable it
for a single session, and painful to think about enabling it globally.
This would be a lot friendlier.

Storing the maxrss per backend somewhere would be useful (and avoid the
issue of "sampling" with top), after I agree that it ought to be exposed
to a view.  For example, it might help to determine whether (and which!)
backends are using large multiple of work_mem, and then whether that can
be increased.  If/when we had a "memory budget allocator", this would
help to determine how to set its GUCs, maybe to see "which backends are
using the work_mem that are precluding this other backend from using
efficient query plan".

I wonder if it's better to combine these two threads into one.  The 0001
patch of course can be considered independently from the 0002 patch, as
usual.  Right now, there's different parties on both threads ...

-- 
Justin



Re: Add tracking of backend memory allocated to pg_stat_activity

From
"Drouvot, Bertrand"
Date:

Hi,

On 9/9/22 7:08 PM, Justin Pryzby wrote:
On Fri, Sep 09, 2022 at 12:34:15PM -0400, Stephen Frost wrote:
While we are at it, what do you think about also recording the max memory
allocated by a backend? (could be useful and would avoid sampling for which
there is no guarantee to sample the max anyway).
What would you do with that information..?  By itself, it doesn't strike
me as useful.  Perhaps it'd be interesting to grab the max required for
a particular query in pg_stat_statements or such but again, that's a
very different thing.

Storing the maxrss per backend somewhere would be useful (and avoid the
issue of "sampling" with top), after I agree that it ought to be exposed
to a view.  For example, it might help to determine whether (and which!)
backends are using large multiple of work_mem, and then whether that can
be increased.  If/when we had a "memory budget allocator", this would
help to determine how to set its GUCs, maybe to see "which backends are
using the work_mem that are precluding this other backend from using
efficient query plan".

+1.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Stephen Frost
Date:
Greetings,

* Drouvot, Bertrand (bdrouvot@amazon.com) wrote:
> On 9/9/22 7:08 PM, Justin Pryzby wrote:
> >On Fri, Sep 09, 2022 at 12:34:15PM -0400, Stephen Frost wrote:
> >>>While we are at it, what do you think about also recording the max memory
> >>>allocated by a backend? (could be useful and would avoid sampling for which
> >>>there is no guarantee to sample the max anyway).
> >>What would you do with that information..?  By itself, it doesn't strike
> >>me as useful.  Perhaps it'd be interesting to grab the max required for
> >>a particular query in pg_stat_statements or such but again, that's a
> >>very different thing.
>
> >Storing the maxrss per backend somewhere would be useful (and avoid the
> >issue of "sampling" with top), after I agree that it ought to be exposed
> >to a view.  For example, it might help to determine whether (and which!)
> >backends are using large multiple of work_mem, and then whether that can
> >be increased.  If/when we had a "memory budget allocator", this would
> >help to determine how to set its GUCs, maybe to see "which backends are
> >using the work_mem that are precluding this other backend from using
> >efficient query plan".
>
> +1.

I still have a hard time seeing the value in tracking which backends are
using the most memory over the course of a backend's entire lifetime,
which would involve lots of different queries, some of which might use
many multiples of work_mem and others not.  Much more interesting would
be to track this as part of pg_stat_statements and associated with
queries.

Either way, this looks like an independent feature which someone who has
interest in could work on but generally doesn't impact what the feature
of this thread is about; a feature which has already shown merit in
finding a recently introduced memory leak and is the basis of another
feature being contemplated to help avoid OOM-killer introduced crashes.

Thanks,

Stephen

Attachment

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
patch rebased to current master

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com




Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Tue, 2022-10-25 at 14:51 -0400, Reid Thompson wrote:
> patch rebased to current master
>
actually attach the patch

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com

Attachment

Re: Add tracking of backend memory allocated to pg_stat_activity

From
"Drouvot, Bertrand"
Date:
Hi,

On 10/25/22 8:59 PM, Reid Thompson wrote:
> On Tue, 2022-10-25 at 14:51 -0400, Reid Thompson wrote:
>> patch rebased to current master
>>
> actually attach the patch
> 

It looks like the patch does not apply anymore since b1099eca8f.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Fri, 2022-11-04 at 11:06 +0100, Drouvot, Bertrand wrote:
> Hi,
>
> It looks like the patch does not apply anymore since b1099eca8f.
>
> Regards,
>

Thanks,

rebased to current master attached.

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com

Attachment

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Andres Freund
Date:
Hi,

On 2022-11-04 08:56:13 -0400, Reid Thompson wrote:
> From a8de5d29c0c6f10962181926a49ad4fec1e52bd1 Mon Sep 17 00:00:00 2001
> From: Reid Thompson <jreidthompson@nc.rr.com>
> Date: Thu, 11 Aug 2022 12:01:25 -0400
> Subject: [PATCH] Add tracking of backend memory allocated to pg_stat_activity
> 
> This new field displays the current bytes of memory allocated to the
> backend process. It is updated as memory for the process is
> malloc'd/free'd. Memory allocated to items on the freelist is included in
> the displayed value.  Dynamic shared memory allocations are included
> only in the value displayed for the backend that created them, they are
> not included in the value for backends that are attached to them to
> avoid double counting. On occasion, orphaned memory segments may be
> cleaned up on postmaster startup. This may result in decreasing the sum
> without a prior increment. We limit the floor of backend_mem_allocated
> to zero. Updated pg_stat_activity documentation for the new column.

I'm not convinced that counting DSM values this way is quite right. There are
a few uses of DSMs that track shared resources, with the biggest likely being
the stats for relations etc.  I suspect that tracking that via backend memory
usage will be quite confusing, because fairly random backends that had to grow
the shared state end up being blamed with the memory usage in perpituity - and
then suddenly that memory usage will vanish when that backend exits, despite
the memory continuing to exist.



> @@ -734,6 +747,7 @@ AllocSetAlloc(MemoryContext context, Size size)
>              return NULL;
>  
>          context->mem_allocated += blksize;
> +        pgstat_report_backend_allocated_bytes_increase(blksize);
>  
>          block->aset = set;
>          block->freeptr = block->endptr = ((char *) block) + blksize;
> @@ -944,6 +958,7 @@ AllocSetAlloc(MemoryContext context, Size size)
>              return NULL;
>  
>          context->mem_allocated += blksize;
> +        pgstat_report_backend_allocated_bytes_increase(blksize);
>  
>          block->aset = set;
>          block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
> @@ -1043,6 +1058,7 @@ AllocSetFree(void *pointer)
>              block->next->prev = block->prev;
>  
>          set->header.mem_allocated -= block->endptr - ((char *) block);
> +        pgstat_report_backend_allocated_bytes_decrease(block->endptr - ((char *) block));
>  
>  #ifdef CLOBBER_FREED_MEMORY
>          wipe_mem(block, block->freeptr - ((char *) block));

I suspect this will be noticable cost-wise. Even though these paths aren't the
hottest memory allocation paths, by nature of going down into malloc, adding
an external function call that then does a bunch of branching etc. seems
likely to add up to some.  See below for how I think we can deal with that...


> +
> +/* --------
> + * pgstat_report_backend_allocated_bytes_increase() -
> + *
> + * Called to report increase in memory allocated for this backend
> + * --------
> + */
> +void
> +pgstat_report_backend_allocated_bytes_increase(uint64 allocation)
> +{
> +    volatile PgBackendStatus *beentry = MyBEEntry;
> +
> +    if (!beentry || !pgstat_track_activities)
> +    {
> +        /*
> +         * Account for memory before pgstats is initialized. This will be
> +         * migrated to pgstats on initialization.
> +         */
> +        backend_allocated_bytes += allocation;
> +
> +        return;
> +    }
> +
> +    /*
> +     * Update my status entry, following the protocol of bumping
> +     * st_changecount before and after.  We use a volatile pointer here to
> +     * ensure the compiler doesn't try to get cute.
> +     */
> +    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
> +    beentry->backend_allocated_bytes += allocation;
> +    PGSTAT_END_WRITE_ACTIVITY(beentry);
> +}

This is quite a few branches, including write/read barriers.

It doesn't really make sense to use the PGSTAT_BEGIN_WRITE_ACTIVITY() pattern
here - you're just updating a single value, there's nothing to be gained by
it. The point of PGSTAT_BEGIN_*ACTIVITY() stuff is to allow readers to get a
consistent view of multiple values - but there aren't multiple values here!


To avoid the overhead of checking (!beentry || !pgstat_track_activities) and
the external function call, I think you'd be best off copying the trickery I
introduced for pgstat_report_wait_start(), in 225a22b19ed.

I.e. make pgstat_report_backend_allocated_bytes_increase() a static inline
function that unconditionally updates something like
*my_backend_allocated_memory.  To deal with the case of (!beentry ||
!pgstat_track_activities), that variable initially points to some backend
local state and is set to the shared state in pgstat_bestart().

This additionally has the nice benefit that you can track memory usage from
before pgstat_bestart(), it'll be in the local variable.


> +void
> +pgstat_report_backend_allocated_bytes_decrease(uint64 deallocation)
> +{
> +    volatile PgBackendStatus *beentry = MyBEEntry;
> +
> +    /*
> +     * Cases may occur where shared memory from a previous postmaster
> +     * invocation still exist. These are cleaned up at startup by
> +     * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to
> +     * zero in case no corresponding prior increase exists or decrease has
> +     * already been accounted for.
> +     */

I don't really follow - postmaster won't ever have a backend status array, so
how would they be tracked here?


Greetings,

Andres Freund



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Melanie Plageman
Date:
On Fri, Nov 04, 2022 at 08:56:13AM -0400, Reid Thompson wrote:
> From a8de5d29c0c6f10962181926a49ad4fec1e52bd1 Mon Sep 17 00:00:00 2001
> From: Reid Thompson <jreidthompson@nc.rr.com>
> Date: Thu, 11 Aug 2022 12:01:25 -0400
> Subject: [PATCH] Add tracking of backend memory allocated to pg_stat_activity
> 
> This new field displays the current bytes of memory allocated to the
> backend process. It is updated as memory for the process is
> malloc'd/free'd. Memory allocated to items on the freelist is included in
> the displayed value.  Dynamic shared memory allocations are included
> only in the value displayed for the backend that created them, they are
> not included in the value for backends that are attached to them to
> avoid double counting. On occasion, orphaned memory segments may be
> cleaned up on postmaster startup. This may result in decreasing the sum
> without a prior increment. We limit the floor of backend_mem_allocated
> to zero. Updated pg_stat_activity documentation for the new column.
> ---
>  doc/src/sgml/monitoring.sgml                |  12 +++
>  src/backend/catalog/system_views.sql        |   1 +
>  src/backend/storage/ipc/dsm_impl.c          |  81 +++++++++++++++
>  src/backend/utils/activity/backend_status.c | 105 ++++++++++++++++++++
>  src/backend/utils/adt/pgstatfuncs.c         |   4 +-
>  src/backend/utils/mmgr/aset.c               |  18 ++++
>  src/backend/utils/mmgr/generation.c         |  15 +++
>  src/backend/utils/mmgr/slab.c               |  21 ++++
>  src/include/catalog/pg_proc.dat             |   6 +-
>  src/include/utils/backend_status.h          |   7 +-
>  src/test/regress/expected/rules.out         |   9 +-
>  11 files changed, 270 insertions(+), 9 deletions(-)
> 
> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index e5d622d514..972805b85a 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -947,6 +947,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
>       </para></entry>
>      </row>
>  
> +    <row>
> +     <entry role="catalog_table_entry"><para role="column_definition">
> +      <structfield>backend_allocated_bytes</structfield> <type>bigint</type>
> +     </para>
> +     <para>
> +      The byte count of memory allocated to this backend. Dynamic shared memory
> +      allocations are included only in the value displayed for the backend that
> +      created them, they are not included in the value for backends that are
> +      attached to them to avoid double counting.
> +     </para></entry>
> +    </row>
> +

It doesn't seem like you need the backend_ prefix in the view since that
is implied by it being in pg_stat_activity.

For the wording on the description, I find "memory allocated to this
backend" a bit confusing. Perhaps you could reword it to make clear you
mean that the number represents the balance of allocations by this
backend. Something like:

    Memory currently allocated to this backend in bytes. This is the
    balance of bytes allocated and freed by this backend.

I would also link to the system administration function pg_size_pretty()
so users know how to easily convert the value.

If you end up removing shared memory as Andres suggests in [1], I would link
pg_shmem_allocations view here and point out that shared memory allocations can
be viewed there instead (and why).

You could instead add dynamic shared memory allocation to the
pg_shmem_allocations view as suggested as follow-on work by the commit which
introduced it, ed10f32e3.

> +/* --------
> + * pgstat_report_backend_allocated_bytes_increase() -
> + *
> + * Called to report increase in memory allocated for this backend
> + * --------
> + */

It seems like you could combine the
pgstat_report_backend_allocated_bytes_decrease/increase() by either using a
signed integer to represent the allocation/deallocation or passing in a
"direction" that is just a positive or negative multiplier enum.

Especially if you don't use the write barriers, I think you could
simplify the logic in the two functions.

If you do combine them, you might shorten the name to
pgstat_report_backend_allocation() or pgstat_report_allocation().

> +void
> +pgstat_report_backend_allocated_bytes_increase(uint64 allocation)
> +{
> +    volatile PgBackendStatus *beentry = MyBEEntry;
> +
> +    if (!beentry || !pgstat_track_activities)
> +    {
> +        /*
> +         * Account for memory before pgstats is initialized. This will be
> +         * migrated to pgstats on initialization.
> +         */
> +        backend_allocated_bytes += allocation;
> +
> +        return;
> +    }
> +
> +    /*
> +     * Update my status entry, following the protocol of bumping
> +     * st_changecount before and after.  We use a volatile pointer here to
> +     * ensure the compiler doesn't try to get cute.
> +     */
> +    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
> +    beentry->backend_allocated_bytes += allocation;
> +    PGSTAT_END_WRITE_ACTIVITY(beentry);
> +}
> +
> +/* --------
> + * pgstat_report_backend_allocated_bytes_decrease() -
> + *
> + * Called to report decrease in memory allocated for this backend
> + * --------
> + */
> +void
> +pgstat_report_backend_allocated_bytes_decrease(uint64 deallocation)
> +{
> +    volatile PgBackendStatus *beentry = MyBEEntry;
> +
> +    /*
> +     * Cases may occur where shared memory from a previous postmaster
> +     * invocation still exist. These are cleaned up at startup by
> +     * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to
> +     * zero in case no corresponding prior increase exists or decrease has
> +     * already been accounted for.
> +     */
> +
> +    if (!beentry || !pgstat_track_activities)
> +    {
> +        /*
> +         * Account for memory before pgstats is initialized. This will be
> +         * migrated to pgstats on initialization. Do not allow
> +         * backend_allocated_bytes to go below zero. If pgstats has not been
> +         * initialized, we are in startup and we set backend_allocated_bytes
> +         * to zero in cases where it would go negative and skip generating an
> +         * ereport.
> +         */
> +        if (deallocation > backend_allocated_bytes)
> +            backend_allocated_bytes = 0;
> +        else
> +            backend_allocated_bytes -= deallocation;
> +
> +        return;
> +    }
> +
> +    /*
> +     * Do not allow backend_allocated_bytes to go below zero. ereport if we
> +     * would have. There's no need for a lock around the read here as it's
> +     * being referenced from the same backend which means that there shouldn't
> +     * be concurrent writes. We want to generate an ereport in these cases.
> +     */
> +    if (deallocation > beentry->backend_allocated_bytes)
> +    {
> +        ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to
0"));
> +

I also think it would be nice to include the deallocation amount and
backend_allocated_bytes amount in the log message.
It also might be nice to start the message with something more clear
than "decrease".
For example, I would find this clear as a user:

    Backend [backend_type or pid] deallocated [deallocation number] bytes,
    [backend_allocated_bytes - deallocation number] more than this backend
    has reported allocating.

> diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
> index 96bffc0f2a..b6d135ad2f 100644
> --- a/src/backend/utils/adt/pgstatfuncs.c
> +++ b/src/backend/utils/adt/pgstatfuncs.c
> @@ -553,7 +553,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
>  Datum
>  pg_stat_get_activity(PG_FUNCTION_ARGS)
>  {
> -#define PG_STAT_GET_ACTIVITY_COLS    30
> +#define PG_STAT_GET_ACTIVITY_COLS    31
>      int            num_backends = pgstat_fetch_stat_numbackends();
>      int            curr_backend;
>      int            pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
> @@ -609,6 +609,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
>          else
>              nulls[16] = true;
>  
> +        values[30] = UInt64GetDatum(beentry->backend_allocated_bytes);

Though not the fault of this patch, it is becoming very difficult to
keep the columns straight in pg_stat_get_activity(). Perhaps you could
add a separate commit to add an enum for the columns so the function is
easier to understand.

> diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
> index b582b46e9f..75d87e8308 100644
> --- a/src/include/utils/backend_status.h
> +++ b/src/include/utils/backend_status.h
> @@ -169,6 +169,9 @@ typedef struct PgBackendStatus
>  
>      /* query identifier, optionally computed using post_parse_analyze_hook */
>      uint64        st_query_id;
> +
> +    /* Current memory allocated to this backend */
> +    uint64        backend_allocated_bytes;
>  } PgBackendStatus;

I don't think you need the backend_ prefix here since it is in
PgBackendStatus.

> @@ -313,7 +316,9 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
>  extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
>                                                         int buflen);
>  extern uint64 pgstat_get_my_query_id(void);
> -
> +extern void pgstat_report_backend_allocated_bytes_increase(uint64 allocation);
> +extern void pgstat_report_backend_allocated_bytes_decrease(uint64 deallocation);
> +extern uint64 pgstat_get_all_backend_memory_allocated(void);
>  
>  /* ----------
>   * Support functions for the SQL-callable functions to
> diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
> index 624d0e5aae..ba9f494806 100644
> --- a/src/test/regress/expected/rules.out
> +++ b/src/test/regress/expected/rules.out
> @@ -1753,10 +1753,11 @@ pg_stat_activity| SELECT s.datid,
>      s.state,
>      s.backend_xid,
>      s.backend_xmin,
> +    s.backend_allocated_bytes,
>      s.query_id,
>      s.query,
>      s.backend_type

Seems like it would be possible to add a functional test to stats.sql.

- Melanie

[1] https://www.postgresql.org/message-id/20221105024146.xxlbtsxh2niyz2fu%40awork3.anarazel.de



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
Hi Andres,
Thanks for looking at this and for the feedback. Responses inline below.

On Fri, 2022-11-04 at 19:41 -0700, Andres Freund wrote:
> Hi,
>
> On 2022-11-04 08:56:13 -0400, Reid Thompson wrote:
>
> I'm not convinced that counting DSM values this way is quite right.
> There are a few uses of DSMs that track shared resources, with the biggest
> likely being the stats for relations etc.  I suspect that tracking that via
> backend memory usage will be quite confusing, because fairly random backends that
> had to grow the shared state end up being blamed with the memory usage in
> perpituity - and then suddenly that memory usage will vanish when that backend exits,
> despite the memory continuing to exist.

Ok, I'll make an attempt to identify these allocations and manage them
elsewhere.

>
>
> > @@ -734,6 +747,7 @@ AllocSetAlloc(MemoryContext context, Size size)
> >                         return NULL;
> >  
> >                 context->mem_allocated += blksize;
> > +               pgstat_report_backend_allocated_bytes_increase(blksize);
> >  
> >                 block->aset = set;
> >                 block->freeptr = block->endptr = ((char *) block) + blksize;
> > @@ -944,6 +958,7 @@ AllocSetAlloc(MemoryContext context, Size size)
> >                         return NULL;
> >  
> >                 context->mem_allocated += blksize;
> > +               pgstat_report_backend_allocated_bytes_increase(blksize);
> >  
> >                 block->aset = set;
> >                 block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
> > @@ -1043,6 +1058,7 @@ AllocSetFree(void *pointer)
> >                         block->next->prev = block->prev;
> >  
> >                 set->header.mem_allocated -= block->endptr - ((char *) block);
> > +               pgstat_report_backend_allocated_bytes_decrease(block->endptr - ((char *) block));
> >  
> >  #ifdef CLOBBER_FREED_MEMORY
> >                 wipe_mem(block, block->freeptr - ((char *) block));
>
> I suspect this will be noticable cost-wise. Even though these paths aren't the
> hottest memory allocation paths, by nature of going down into malloc, adding
> an external function call that then does a bunch of branching etc. seems
> likely to add up to some.  See below for how I think we can deal with
> that...
>
>
> > +
> > +/* --------
> > + * pgstat_report_backend_allocated_bytes_increase() -
> > + *
> > + * Called to report increase in memory allocated for this backend
> > + * --------
> > + */
> > +void
> > +pgstat_report_backend_allocated_bytes_increase(uint64 allocation)
> > +{
> > +       volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > +       if (!beentry || !pgstat_track_activities)
> > +       {
> > +               /*
> > +                * Account for memory before pgstats is initialized. This will be
> > +                * migrated to pgstats on initialization.
> > +                */
> > +               backend_allocated_bytes += allocation;
> > +
> > +               return;
> > +       }
> > +
> > +       /*
> > +        * Update my status entry, following the protocol of bumping
> > +        * st_changecount before and after.  We use a volatile pointer here to
> > +        * ensure the compiler doesn't try to get cute.
> > +        */
> > +       PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
> > +       beentry->backend_allocated_bytes += allocation;
> > +       PGSTAT_END_WRITE_ACTIVITY(beentry);
> > +}
>
> This is quite a few branches, including write/read barriers.
>
> It doesn't really make sense to use the PGSTAT_BEGIN_WRITE_ACTIVITY() pattern
> here - you're just updating a single value, there's nothing to be gained by
> it. The point of PGSTAT_BEGIN_*ACTIVITY() stuff is to allow readers to get a
> consistent view of multiple values - but there aren't multiple values
> here!

I'll remove the barriers - initially I copied how prior functions were
coded as my template ala
pgstat_report_query_id, pgstat_report_xact_timestamp.

>
> To avoid the overhead of checking (!beentry || !pgstat_track_activities) and
> the external function call, I think you'd be best off copying the trickery I
> introduced for pgstat_report_wait_start(), in 225a22b19ed.
>
> I.e. make pgstat_report_backend_allocated_bytes_increase() a static inline
> function that unconditionally updates something like
> *my_backend_allocated_memory.  To deal with the case of (!beentry ||
> !pgstat_track_activities), that variable initially points to some backend
> local state and is set to the shared state in pgstat_bestart().
>
> This additionally has the nice benefit that you can track memory usage from
> before pgstat_bestart(), it'll be in the local variable.

OK, I think I can mimic the code you reference.

>
> > +void
> > +pgstat_report_backend_allocated_bytes_decrease(uint64
> > deallocation)
> > +{
> > +       volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > +       /*
> > +        * Cases may occur where shared memory from a previous postmaster
> > +        * invocation still exist. These are cleaned up at startup by
> > +        * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to
> > +        * zero in case no corresponding prior increase exists or decrease has
> > +        * already been accounted for.
> > +        */
>
> I don't really follow - postmaster won't ever have a backend status
> array, so how would they be tracked here?

On startup, a check is made for leftover dsm control segments in the
DataDir. It appears possible that in certain situations on startup we
may find and destroy stale segments and thus decrement the allocation
variable.

I based this off of:
/ipc/dsm.c

dsm_postmaster_startup:
 150 dsm_postmaster_startup(PGShmemHeader *shim)
 {
...snip...
 158     /*
                 
 159     ¦* If we're using the mmap implementations, clean up any leftovers.
                 
 160     ¦* Cleanup isn't needed on Windows, and happens earlier in startup for
                 
 161     ¦* POSIX and System V shared memory, via a direct call to
                 
 162     ¦* dsm_cleanup_using_control_segment.
                 
 163     ¦*/
...snip... }


dsm_cleanup_using_control_segment:
 206 /*
                 
 207  * Determine whether the control segment from the previous postmaster
                 
 208  * invocation still exists.  If so, remove the dynamic shared memory
                 
 209  * segments to which it refers, and then the control segment itself.
                 
 210  */
                 
 211 void
                 
 212 dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
                 
 213 {
 ...snip...
 270         /* Destroy the referenced segment. */
                 
 271         dsm_impl_op(DSM_OP_DESTROY, handle, 0, &junk_impl_private,
                 
 272                     &junk_mapped_address, &junk_mapped_size, LOG);
                 
 273     }
                 
 274
                 
 275     /* Destroy the old control segment, too. */
                 
 276     elog(DEBUG2,
                 
 277         ¦"cleaning up dynamic shared memory control segment with ID %u",
                 
 278         ¦old_control_handle);
                 
 279     dsm_impl_op(DSM_OP_DESTROY, old_control_handle, 0, &impl_private,
                 
 280                 &mapped_address, &mapped_size, LOG);
 281 }

>
> Greetings,
>
> Andres Freund

Thanks again,
Reid

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com





Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
Hi Melanie,
Thank you for looking at this and for the feedback. Responses inline
below.

On Mon, 2022-11-07 at 16:17 -0500, Melanie Plageman wrote:
> On Fri, Nov 04, 2022 at 08:56:13AM -0400, Reid Thompson wrote:
> > From a8de5d29c0c6f10962181926a49ad4fec1e52bd1 Mon Sep 17 00:00:00
> > 2001
> > From: Reid Thompson <jreidthompson@nc.rr.com>
> > Date: Thu, 11 Aug 2022 12:01:25 -0400
> > Subject: [PATCH] Add tracking of backend memory allocated to pg_stat_activity
> > +
>
> It doesn't seem like you need the backend_ prefix in the view since
> that is implied by it being in pg_stat_activity.

I will remove the prefix.

> For the wording on the description, I find "memory allocated to this
> backend" a bit confusing. Perhaps you could reword it to make clear
> you mean that the number represents the balance of allocations by this
> backend. Something like:
>
>         Memory currently allocated to this backend in bytes. This is the
>         balance of bytes allocated and freed by this backend.
> I would also link to the system administration function
> pg_size_pretty() so users know how to easily convert the value.

Thanks, I'll make these changes

> If you end up removing shared memory as Andres suggests in [1], I
> would link pg_shmem_allocations view here and point out that shared memory
> allocations can be viewed there instead (and why).
>
> You could instead add dynamic shared memory allocation to the
> pg_shmem_allocations view as suggested as follow-on work by the
> commit which introduced it, ed10f32e3.
>
> > +/* --------
> > + * pgstat_report_backend_allocated_bytes_increase() -
> > + *
> > + * Called to report increase in memory allocated for this backend
> > + * --------
> > + */
>
> It seems like you could combine the
> pgstat_report_backend_allocated_bytes_decrease/increase() by either
> using a signed integer to represent the allocation/deallocation or passing in
> a "direction" that is just a positive or negative multiplier enum.
>
> Especially if you don't use the write barriers, I think you could
> simplify the logic in the two functions.
>
> If you do combine them, you might shorten the name to
> pgstat_report_backend_allocation() or pgstat_report_allocation().

Agreed. This seems a cleaner, simpler way to go.  I'll add it to the
TODO list.

> > +       /*
> > +        * Do not allow backend_allocated_bytes to go below zero.
> > ereport if we
> > +        * would have. There's no need for a lock around the read
> > here as it's
> > +        * being referenced from the same backend which means that
> > there shouldn't
> > +        * be concurrent writes. We want to generate an ereport in
> > these cases.
> > +        */
> > +       if (deallocation > beentry->backend_allocated_bytes)
> > +       {
> > +               ereport(LOG, errmsg("decrease reduces reported
> > backend memory allocated below zero; setting reported to 0"));
> > +
>
> I also think it would be nice to include the deallocation amount and
> backend_allocated_bytes amount in the log message.
> It also might be nice to start the message with something more clear
> than "decrease".
> For example, I would find this clear as a user:
>
>         Backend [backend_type or pid] deallocated [deallocation number] bytes,
>         [backend_allocated_bytes - deallocation number] more than this backend
>         has reported allocating.

Sounds good, I'll implement these changes

> > diff --git a/src/backend/utils/adt/pgstatfuncs.c
> > b/src/backend/utils/adt/pgstatfuncs.c
> > index 96bffc0f2a..b6d135ad2f 100644
> > --- a/src/backend/utils/adt/pgstatfuncs.c
> > +++ b/src/backend/utils/adt/pgstatfuncs.c
> > @@ -553,7 +553,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
> >  Datum
> >  pg_stat_get_activity(PG_FUNCTION_ARGS)
> >  {
> > -#define PG_STAT_GET_ACTIVITY_COLS      30
> > +#define PG_STAT_GET_ACTIVITY_COLS      31
> >         int                     num_backends =
> >  
> > +               values[30] = UInt64GetDatum(beentry->backend_allocated_bytes);
>
> Though not the fault of this patch, it is becoming very difficult to
> keep the columns straight in pg_stat_get_activity(). Perhaps you
> could add a separate commit to add an enum for the columns so the function
> is easier to understand.
>
> > diff --git a/src/include/utils/backend_status.h
> > b/src/include/utils/backend_status.h
> > index b582b46e9f..75d87e8308 100644
> > --- a/src/include/utils/backend_status.h
> > +++ b/src/include/utils/backend_status.h
> > @@ -169,6 +169,9 @@ typedef struct PgBackendStatus
> >  
> >         /* query identifier, optionally computed using
> > post_parse_analyze_hook */
> >         uint64          st_query_id;
> > +
> > +       /* Current memory allocated to this backend */
> > +       uint64          backend_allocated_bytes;
> >  } PgBackendStatus;
>
> I don't think you need the backend_ prefix here since it is in
> PgBackendStatus.

Agreed again, I'll remove the prefix.

> > @@ -313,7 +316,9 @@ extern const char
> > *pgstat_get_backend_current_activity(int pid, bool checkUser);
> >  extern const char *pgstat_get_crashed_backend_activity(int pid,
> > char *buffer,
> >                                                                    
> >                                         int buflen);
> >  extern uint64 pgstat_get_my_query_id(void);
> > -
> > +extern void pgstat_report_backend_allocated_bytes_increase(uint64
> > allocation);
> > +extern void pgstat_report_backend_allocated_bytes_decrease(uint64
> > deallocation);
> > +extern uint64 pgstat_get_all_backend_memory_allocated(void);
> >  
> >  /* ----------
> >   * Support functions for the SQL-callable functions to
> > diff --git a/src/test/regress/expected/rules.out
> > b/src/test/regress/expected/rules.out
> > index 624d0e5aae..ba9f494806 100644
> > --- a/src/test/regress/expected/rules.out
> > +++ b/src/test/regress/expected/rules.out
> > @@ -1753,10 +1753,11 @@ pg_stat_activity| SELECT s.datid,
> >      s.state,
> >      s.backend_xid,
> >      s.backend_xmin,
> > +    s.backend_allocated_bytes,
> >      s.query_id,
> >      s.query,
> >      s.backend_type
>
> Seems like it would be possible to add a functional test to
> stats.sql.

I will look at adding this.


> - Melanie
>
> [1]
> https://www.postgresql.org/message-id/20221105024146.xxlbtsxh2niyz2fu%40awork3.anarazel.de

Thanks again,
Reid

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com




Re: Add tracking of backend memory allocated to pg_stat_activity

From
Andres Freund
Date:
Hi,

2022-11-09 08:54:54 -0500, Reid Thompson wrote:
> Thanks for looking at this and for the feedback. Responses inline below.
> > > +void
> > > +pgstat_report_backend_allocated_bytes_decrease(uint64
> > > deallocation)
> > > +{
> > > +       volatile PgBackendStatus *beentry = MyBEEntry;
> > > +
> > > +       /*
> > > +        * Cases may occur where shared memory from a previous postmaster
> > > +        * invocation still exist. These are cleaned up at startup by
> > > +        * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to
> > > +        * zero in case no corresponding prior increase exists or decrease has
> > > +        * already been accounted for.
> > > +        */
> > 
> > I don't really follow - postmaster won't ever have a backend status
> > array, so how would they be tracked here?
> 
> On startup, a check is made for leftover dsm control segments in the
> DataDir. It appears possible that in certain situations on startup we
> may find and destroy stale segments and thus decrement the allocation
> variable. 
> 
> I based this off of:
> /ipc/dsm.c
> 
> dsm_postmaster_startup:
>  150 dsm_postmaster_startup(PGShmemHeader *shim)
>  {
> ...
>  281 }

I don't think we should account for memory allocations done in postmaster in
this patch. They'll otherwise be counted again in each of the forked
backends. As this cleanup happens during postmaster startup, we'll have to
make sure accounting is reset during backend startup.

Greetings,

Andres Freund



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
Code rebased to current master.
Updated to incorporate additional recommendations from the the list
   - add units to variables in view
   - remove 'backend_' prefix from variables/functions
   - update documentation
   - add functional test for allocated_bytes
   - refactor allocation reporting to reduce number of functions and
     branches/reduce performance hit
   - zero allocated bytes after fork to avoid double counting postmaster allocations




--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com


Attachment

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Wed, 2022-11-09 at 09:23 -0500, Reid Thompson wrote:
> Hi Melanie,
> Thank you for looking at this and for the feedback. Responses inline
> below.
>
> On Mon, 2022-11-07 at 16:17 -0500, Melanie Plageman wrote:
> >
> > It doesn't seem like you need the backend_ prefix in the view since
> > that is implied by it being in pg_stat_activity.
>
> I will remove the prefix.

done

>
> > For the wording on the description, I find "memory allocated to
> > this
> > backend" a bit confusing. Perhaps you could reword it to make clear
> > you mean that the number represents the balance of allocations by
> > this
> > backend. Something like:
> >
> >         Memory currently allocated to this backend in bytes. This
> > is the
> >         balance of bytes allocated and freed by this backend.
> > I would also link to the system administration function
> > pg_size_pretty() so users know how to easily convert the value.
>
> Thanks, I'll make these changes

done

> > > +/* --------
> > > + * pgstat_report_backend_allocated_bytes_increase() -
> > > + *
> > > + * Called to report increase in memory allocated for this
> > > backend
> > > + * --------
> > > + */
> >
> > It seems like you could combine the
> > pgstat_report_backend_allocated_bytes_decrease/increase() by either
> > using a signed integer to represent the allocation/deallocation or
> > passing in
> > a "direction" that is just a positive or negative multiplier enum.
> >
> > Especially if you don't use the write barriers, I think you could
> > simplify the logic in the two functions.
> >
> > If you do combine them, you might shorten the name to
> > pgstat_report_backend_allocation() or pgstat_report_allocation().
>
> Agreed. This seems a cleaner, simpler way to go.  I'll add it to the
> TODO list.

done

>
> > > +       /*
> > > +        * Do not allow backend_allocated_bytes to go below zero.
> > > ereport if we
> > > +        * would have. There's no need for a lock around the read
> > > here as it's
> > > +        * being referenced from the same backend which means
> > > that
> > > there shouldn't
> > > +        * be concurrent writes. We want to generate an ereport
> > > in
> > > these cases.
> > > +        */
> > > +       if (deallocation > beentry->backend_allocated_bytes)
> > > +       {
> > > +               ereport(LOG, errmsg("decrease reduces reported
> > > backend memory allocated below zero; setting reported to 0"));
> > > +
> >
> > I also think it would be nice to include the deallocation amount
> > and
> > backend_allocated_bytes amount in the log message.
> > It also might be nice to start the message with something more
> > clear
> > than "decrease".
> > For example, I would find this clear as a user:
> >
> >         Backend [backend_type or pid] deallocated [deallocation
> > number] bytes,
> >         [backend_allocated_bytes - deallocation number] more than
> > this backend
> >         has reported allocating.
>
> Sounds good, I'll implement these changes

done

> > > diff --git a/src/include/utils/backend_status.h
> > > b/src/include/utils/backend_status.h
> > > index b582b46e9f..75d87e8308 100644
> > > --- a/src/include/utils/backend_status.h
> > > +++ b/src/include/utils/backend_status.h
> > > @@ -169,6 +169,9 @@ typedef struct PgBackendStatus
> > >  
> > >         /* query identifier, optionally computed using
> > > post_parse_analyze_hook */
> > >         uint64          st_query_id;
> > > +
> > > +       /* Current memory allocated to this backend */
> > > +       uint64          backend_allocated_bytes;
> > >  } PgBackendStatus;
> >
> > I don't think you need the backend_ prefix here since it is in
> > PgBackendStatus.
>
> Agreed again, I'll remove the prefix.

done

> > >  /* ----------
> > >   * Support functions for the SQL-callable functions to
> > > diff --git a/src/test/regress/expected/rules.out
> > > b/src/test/regress/expected/rules.out
> > > index 624d0e5aae..ba9f494806 100644
> > > --- a/src/test/regress/expected/rules.out
> > > +++ b/src/test/regress/expected/rules.out
> > > @@ -1753,10 +1753,11 @@ pg_stat_activity| SELECT s.datid,
> > >      s.state,
> > >      s.backend_xid,
> > >      s.backend_xmin,
> > > +    s.backend_allocated_bytes,
> > >      s.query_id,
> > >      s.query,
> > >      s.backend_type
> >
> > Seems like it would be possible to add a functional test to
> > stats.sql.
>
> I will look at adding this.

done

patch attached to
https://www.postgresql.org/message-id/06b4922193b80776a31e08a3809f2414b0d4bf90.camel%40crunchydata.com
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com




Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Wed, 2022-11-09 at 08:54 -0500, Reid Thompson wrote:
> Hi Andres,
> Thanks for looking at this and for the feedback. Responses inline
> below.
>
>> On Fri, 2022-11-04 at 19:41 -0700, Andres Freund wrote:
> > Hi,
> >
> On 2022-11-04 08:56:13 -0400, Reid Thompson wrote:
> >
> > I'm not convinced that counting DSM values this way is quite right.
> > There are a few uses of DSMs that track shared resources, with the biggest
> > likely being the stats for relations etc.  I suspect that tracking that via
> > backend memory usage will be quite confusing, because fairly random backends that
> > had to grow the shared state end up being blamed with the memory usage in
> > perpituity - and then suddenly that memory usage will vanish when that backend exits,
> > despite the memory continuing to exist.
>
> Ok, I'll make an attempt to identify these allocations and manage
> them elsewhere.

still TBD

> >
> >
> > > @@ -734,6 +747,7 @@ AllocSetAlloc(MemoryContext context, Size
> > > size)
> > >                         return NULL;
> > >  
> > >                 context->mem_allocated += blksize;
> > > +               pgstat_report_backend_allocated_bytes_increase(bl
> > > ksize);
> >
> > I suspect this will be noticable cost-wise. Even though these paths aren't the
> > hottest memory allocation paths, by nature of going down into malloc, adding
> > an external function call that then does a bunch of branching etc. seems
> > likely to add up to some.  See below for how I think we can deal with that...
> >
> > This is quite a few branches, including write/read barriers.
> >
> > It doesn't really make sense to use the
> > PGSTAT_BEGIN_WRITE_ACTIVITY() pattern
> > here - you're just updating a single value, there's nothing to be gained by
> > it. The point of PGSTAT_BEGIN_*ACTIVITY() stuff is to allow readers to get a
> > consistent view of multiple values - but there aren't multiple values
> > here!
>
> I'll remove the barriers - initially I copied how prior functions were

barriers removed

> >
> > To avoid the overhead of checking (!beentry || !pgstat_track_activities) and
> > the external function call, I think you'd be best off copying the trickery I
> > introduced for pgstat_report_wait_start(), in 225a22b19ed.
> >
> > I.e. make pgstat_report_backend_allocated_bytes_increase() a static inline
> > function that unconditionally updates something like
> > *my_backend_allocated_memory.  To deal with the case of (!beentry ||
> > !pgstat_track_activities), that variable initially points to some backend
> > local state and is set to the shared state in pgstat_bestart().
> >
> > This additionally has the nice benefit that you can track memory usage from
> > before pgstat_bestart(), it'll be in the local variable.
>
> OK, I think I can mimic the code you reference.

done

patch attached to
https://www.postgresql.org/message-id/06b4922193b80776a31e08a3809f2414b0d4bf90.camel%40crunchydata.com

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com




Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Sat, 2022-11-26 at 22:10 -0500, Reid Thompson wrote:
> Code rebased to current master.
> Updated to incorporate additional recommendations from the the list
>    - add units to variables in view
>    - remove 'backend_' prefix from variables/functions
>    - update documentation
>    - add functional test for allocated_bytes
>    - refactor allocation reporting to reduce number of functions and
>      branches/reduce performance hit
>    - zero allocated bytes after fork to avoid double counting
> postmaster allocations
>
>
>
>

attempt to remedy cfbot windows build issues

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com

Attachment

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Justin Pryzby
Date:
On Sun, Nov 27, 2022 at 12:32:19AM -0500, Reid Thompson wrote:
> attempt to remedy cfbot windows build issues

You can trigger those tests under your own/private repo by pushing a
branch to github.  See src/tools/ci/README

I suppose the cfbot page ought to point that out.

-- 
Justin



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Ted Yu
Date:


On Sat, Nov 26, 2022 at 9:32 PM Reid Thompson <reid.thompson@crunchydata.com> wrote:
On Sat, 2022-11-26 at 22:10 -0500, Reid Thompson wrote:
> Code rebased to current master.
> Updated to incorporate additional recommendations from the the list
>    - add units to variables in view
>    - remove 'backend_' prefix from variables/functions
>    - update documentation
>    - add functional test for allocated_bytes
>    - refactor allocation reporting to reduce number of functions and
>      branches/reduce performance hit
>    - zero allocated bytes after fork to avoid double counting
> postmaster allocations
>
>
>
>

attempt to remedy cfbot windows build issues


Hi,

+               if (request_size > *mapped_size)
+               {
+                       pgstat_report_allocated_bytes(*mapped_size, DECREASE);
+                       pgstat_report_allocated_bytes(request_size, INCREASE);

pgstat_report_allocated_bytes is called twice for this case. Can the two calls be combined into one (with request_size - *mapped_size, INCREASE) ?

Cheers

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Justin Pryzby
Date:
On Sun, Nov 27, 2022 at 12:32:19AM -0500, Reid Thompson wrote:
> @@ -32,6 +33,12 @@ typedef enum BackendState
>      STATE_DISABLED
>  } BackendState;
>  
> +/* Enum helper for reporting memory allocated bytes */
> +enum allocation_direction
> +{
> +    DECREASE = -1,
> +    INCREASE = 1,
> +};

BTW, these should have some kind of prefix, like PG_ALLOC_* to avoid
causing the same kind of problem for someone else that another header
caused for you by defining something somewhere called IGNORE (ignore
what, I don't know).  The other problem was probably due to a define,
though.  Maybe instead of an enum, the function should take a boolean.

I still wonder whether there needs to be a separate CF entry for the
0001 patch.  One issue is that there's two different lists of people
involved in the threads.

-- 
Justin



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Ted Yu
Date:


On Sun, Nov 27, 2022 at 7:41 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sun, Nov 27, 2022 at 12:32:19AM -0500, Reid Thompson wrote:
> @@ -32,6 +33,12 @@ typedef enum BackendState
>       STATE_DISABLED
>  } BackendState;

> +/* Enum helper for reporting memory allocated bytes */
> +enum allocation_direction
> +{
> +     DECREASE = -1,
> +     INCREASE = 1,
> +};

BTW, these should have some kind of prefix, like PG_ALLOC_* to avoid
causing the same kind of problem for someone else that another header
caused for you by defining something somewhere called IGNORE (ignore
what, I don't know).  The other problem was probably due to a define,
though.  Maybe instead of an enum, the function should take a boolean.

I still wonder whether there needs to be a separate CF entry for the
0001 patch.  One issue is that there's two different lists of people
involved in the threads.

--
Justin


I am a bit curious: why is the allocation_direction enum needed ?

pgstat_report_allocated_bytes() can be given the amount (either negative or positive) to adjust directly.

Cheers

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Andres Freund
Date:
On 2022-11-26 22:10:06 -0500, Reid Thompson wrote:
>    - zero allocated bytes after fork to avoid double counting postmaster allocations

I still don't understand this - postmaster shouldn't be counted at all. It
doesn't have a PgBackendStatus. There simply shouldn't be any tracked
allocations from it.

Greetings,

Andres Freund



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Mon, 2022-11-28 at 10:59 -0800, Andres Freund wrote:
> On 2022-11-26 22:10:06 -0500, Reid Thompson wrote:
> >    - zero allocated bytes after fork to avoid double counting
> > postmaster allocations
>
> I still don't understand this - postmaster shouldn't be counted at
> all. It
> doesn't have a PgBackendStatus. There simply shouldn't be any tracked
> allocations from it.
>
> Greetings,
>
> Andres Freund

Hi Andres,
I based this on the following.

It appears to me that Postmaster populates the local variable that
*my_allocated_bytes points to. That allocation is passed to forked
children, and if not zeroed out, will be double counted as part of
the child allocation. Is this invalid?

$ ps -ef|grep postgres
postgres    6389       1  0 Dec01 ?        00:00:17 /usr/sbin/pgbouncer -d /etc/pgbouncer/pgbouncer.ini
rthompso 2937799       1  0 09:45 ?        00:00:00 /tmp/postgres/install/pg-stats-memory/bin/postgres -D
/var/tmp/pg-stats-memory-p 5433 
rthompso 2937812 2937799  0 09:45 ?        00:00:00 postgres: checkpointer
rthompso 2937813 2937799  0 09:45 ?        00:00:00 postgres: background writer
rthompso 2937816 2937799  0 09:45 ?        00:00:00 postgres: walwriter
rthompso 2937817 2937799  0 09:45 ?        00:00:00 postgres: autovacuum launcher
rthompso 2937818 2937799  0 09:45 ?        00:00:00 postgres: logical replication launcher
rthompso 2938877 2636586  0 09:46 pts/4    00:00:00 /usr/lib/postgresql/12/bin/psql -h localhost -p 5433 postgres
rthompso 2938909 2937799  0 09:46 ?        00:00:00 postgres: rthompso postgres 127.0.0.1(44532) idle
rthompso 2942164 1987403  0 09:49 pts/3    00:00:00 grep postgres

Bracketing fork_process() calls with logging to print *my_allocated_bytes immediately prior and after fork_process...
To me, this indicates that the forked children for this invocation
(checkpointer, walwriter, autovac launcher, client backend, autovac worker, etc)
are inheriting 240672 bytes from postmaster.

$ ccat logfile
2022-12-02 09:45:05.871 EST [2937799] LOG:  starting PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
9.4.0-1ubuntu1~20.04.1)9.4.0, 64-bit 
2022-12-02 09:45:05.872 EST [2937799] LOG:  listening on IPv4 address "127.0.0.1", port 5433
2022-12-02 09:45:05.874 EST [2937799] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
parent StartChildProcess process. pid: 2937799 *my_allocated_bytes: 240672
parent StartChildProcess process. pid: 2937799 *my_allocated_bytes: 240672
parent StartChildProcess process. pid: 2937799 *my_allocated_bytes: 240672
child StartChildProcess process. pid: 2937812 *my_allocated_bytes: 240672
child StartChildProcess process. pid: 2937813 *my_allocated_bytes: 240672
child StartChildProcess process. pid: 2937814 *my_allocated_bytes: 240672
2022-12-02 09:45:05.884 EST [2937814] LOG:  database system was shut down at 2022-12-02 09:41:13 EST
parent StartChildProcess process. pid: 2937799 *my_allocated_bytes: 240672
parent StartAutoVacLauncher process. pid: 2937799 *my_allocated_bytes: 240672
child StartChildProcess process. pid: 2937816 *my_allocated_bytes: 240672
parent do_start_bgworker process. pid: 2937799 *my_allocated_bytes: 240672
child StartAutoVacLauncher process. pid: 2937817 *my_allocated_bytes: 240672
2022-12-02 09:45:05.889 EST [2937799] LOG:  database system is ready to accept connections
child do_start_bgworker process. pid: 2937818 *my_allocated_bytes: 240672
parent StartAutoVacWorker process. pid: 2937799 *my_allocated_bytes: 240672
child StartAutoVacWorker process. pid: 2938417 *my_allocated_bytes: 240672
parent BackendStartup process. pid: 2937799 *my_allocated_bytes: 240672
child BackendStartup process. pid: 2938909 *my_allocated_bytes: 240672
parent StartAutoVacWorker process. pid: 2937799 *my_allocated_bytes: 240672
child StartAutoVacWorker process. pid: 2938910 *my_allocated_bytes: 240672
parent StartAutoVacWorker process. pid: 2937799 *my_allocated_bytes: 240672
child StartAutoVacWorker process. pid: 2939340 *my_allocated_bytes: 240672
parent StartAutoVacWorker process. pid: 2937799 *my_allocated_bytes: 240672
child StartAutoVacWorker process. pid: 2939665 *my_allocated_bytes: 240672
parent StartAutoVacWorker process. pid: 2937799 *my_allocated_bytes: 240672
child StartAutoVacWorker process. pid: 2940038 *my_allocated_bytes: 240672
parent StartAutoVacWorker process. pid: 2937799 *my_allocated_bytes: 240672
child StartAutoVacWorker process. pid: 2940364 *my_allocated_bytes: 240672
parent StartAutoVacWorker process. pid: 2937799 *my_allocated_bytes: 240672
child StartAutoVacWorker process. pid: 2940698 *my_allocated_bytes: 240672
parent StartAutoVacWorker process. pid: 2937799 *my_allocated_bytes: 240672
child StartAutoVacWorker process. pid: 2941317 *my_allocated_bytes: 240672
parent StartAutoVacWorker process. pid: 2937799 *my_allocated_bytes: 240672
child StartAutoVacWorker process. pid: 2941825 *my_allocated_bytes: 240672


Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com




Re: Add tracking of backend memory allocated to pg_stat_activity

From
Andres Freund
Date:
Hi,

On 2022-12-02 11:09:30 -0500, Reid Thompson wrote:
> It appears to me that Postmaster populates the local variable that
> *my_allocated_bytes points to. That allocation is passed to forked
> children, and if not zeroed out, will be double counted as part of
> the child allocation. Is this invalid?

I don't think we should count allocations made before backend_status.c has
been initialized.

Greetings,

Andres Freund



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Fri, 2022-12-02 at 09:18 -0800, Andres Freund wrote:
> Hi,
>
> On 2022-12-02 11:09:30 -0500, Reid Thompson wrote:
> > It appears to me that Postmaster populates the local variable that
> > *my_allocated_bytes points to. That allocation is passed to forked
> > children, and if not zeroed out, will be double counted as part of
> > the child allocation. Is this invalid?
>
> I don't think we should count allocations made before
> backend_status.c has
> been initialized.
>
> Greetings,
>
> Andres Freund

Hi,
The intent was to capture and display all the memory allocated to the
backends, for admins/users/max_total_backend_memory/others to utilize.
Why should we ignore the allocations prior to backend_status.c?

Thanks,
Reid

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Andres Freund
Date:
Hi,

On 2022-12-06 08:47:55 -0500, Reid Thompson wrote:
> The intent was to capture and display all the memory allocated to the
> backends, for admins/users/max_total_backend_memory/others to utilize.

It's going to be far less accurate than that. For one, there's a lot of
operating system resources, like the page table, that are going to be
ignored. We're also not going to capture allocations done directly via
malloc(). There's also copy-on-write effects that we're ignoring.

If we just want to show an accurate picture of the current memory usage, we
need to go to operating system specific interfaces.


> Why should we ignore the allocations prior to backend_status.c?

It seems to add complexity without a real increase in accuracy to me. But I'm
not going to push harder on it than I already have.

Greetings,

Andres Freund



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Sun, 2022-11-27 at 09:40 -0600, Justin Pryzby wrote:
> > BTW, these should have some kind of prefix, like PG_ALLOC_* to
> > avoid causing the same kind of problem for someone else that
> > another header caused for you by defining something somewhere
> > called IGNORE (ignore what, I don't know).  The other problem was
> > probably due to a define, though.  Maybe instead of an enum, the
> > function should take a boolean.
> >

Patch updated to current master and includes above prefix
recommendation and combining of two function calls to one recommended
by Ted Yu.

> >
> > I still wonder whether there needs to be a separate CF entry for
> > the 0001 patch.  One issue is that there's two different lists of
> > people involved in the threads.
> >

I'm OK with containing the conversation to one thread if everyone else
is.  If there's no argument against, then patches after today will go
to the "Add the ability to limit the amount of memory that can be
allocated to backends" thread 
https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.camel@crunchydata.com

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com



Attachment

Re: Add tracking of backend memory allocated to pg_stat_activity

From
vignesh C
Date:
On Thu, 8 Dec 2022 at 19:44, Reid Thompson
<reid.thompson@crunchydata.com> wrote:
>
> On Sun, 2022-11-27 at 09:40 -0600, Justin Pryzby wrote:
> > > BTW, these should have some kind of prefix, like PG_ALLOC_* to
> > > avoid causing the same kind of problem for someone else that
> > > another header caused for you by defining something somewhere
> > > called IGNORE (ignore what, I don't know).  The other problem was
> > > probably due to a define, though.  Maybe instead of an enum, the
> > > function should take a boolean.
> > >
>
> Patch updated to current master and includes above prefix
> recommendation and combining of two function calls to one recommended
> by Ted Yu.
>
> > >
> > > I still wonder whether there needs to be a separate CF entry for
> > > the 0001 patch.  One issue is that there's two different lists of
> > > people involved in the threads.
> > >
>
> I'm OK with containing the conversation to one thread if everyone else
> is.  If there's no argument against, then patches after today will go
> to the "Add the ability to limit the amount of memory that can be
> allocated to backends" thread
> https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.camel@crunchydata.com

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
e351f85418313e97c203c73181757a007dfda6d0 ===
=== applying patch
./0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patch
patching file src/backend/utils/mmgr/slab.c
Hunk #1 succeeded at 69 (offset 16 lines).
Hunk #2 succeeded at 414 (offset 175 lines).
Hunk #3 succeeded at 436 with fuzz 2 (offset 176 lines).
Hunk #4 FAILED at 286.
Hunk #5 succeeded at 488 (offset 186 lines).
Hunk #6 FAILED at 381.
Hunk #7 FAILED at 554.
3 out of 7 hunks FAILED -- saving rejects to file
src/backend/utils/mmgr/slab.c.rej

[1] - http://cfbot.cputube.org/patch_41_3865.log

Regards,
Vignesh



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Tue, 2023-01-03 at 16:25 +0530, vignesh C wrote:
> ...
> The patch does not apply on top of HEAD as in [1], please post a
> rebased patch:
>...
> Regards,
> Vignesh

Per conversation in thread listed below, patches have been submitted to the "Add the ability to limit the amount of
memorythat can be allocated to backends" thread 
https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.camel@crunchydata.com

0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patch
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patch

On Thu, 8 Dec 2022 at 19:44, Reid Thompson
<reid(dot)thompson(at)crunchydata(dot)com> wrote:
>
> On Sun, 2022-11-27 at 09:40 -0600, Justin Pryzby wrote:
> > > ...
> > > I still wonder whether there needs to be a separate CF entry for
> > > the 0001 patch.  One issue is that there's two different lists of
> > > people involved in the threads.
> > >
>
> I'm OK with containing the conversation to one thread if everyone else
> is.  If there's no argument against, then patches after today will go
> to the "Add the ability to limit the amount of memory that can be
> allocated to backends" thread
> https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.camel@crunchydata.com

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com




Re: Add tracking of backend memory allocated to pg_stat_activity

From
Justin Pryzby
Date:
On Thu, Jan 05, 2023 at 01:58:33PM -0500, Reid Thompson wrote:
> On Tue, 2023-01-03 at 16:25 +0530, vignesh C wrote:
> > ...
> > The patch does not apply on top of HEAD as in [1], please post a
> > rebased patch:
> >... 
> > Regards,
> > Vignesh
> 
> Per conversation in thread listed below, patches have been submitted to the "Add the ability to limit the amount of
memorythat can be allocated to backends" thread
 
> https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.camel@crunchydata.com

I suggest to close the associated CF entry.

(Also, the people who participated in this thread may want to be
included in the other thread going forward.)

> 0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patch
> 0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patch
> 
> On Thu, 8 Dec 2022 at 19:44, Reid Thompson
> <reid(dot)thompson(at)crunchydata(dot)com> wrote:
> >
> > On Sun, 2022-11-27 at 09:40 -0600, Justin Pryzby wrote:
> > > > ...
> > > > I still wonder whether there needs to be a separate CF entry for
> > > > the 0001 patch.  One issue is that there's two different lists of
> > > > people involved in the threads.
> > > >
> >
> > I'm OK with containing the conversation to one thread if everyone else
> > is.  If there's no argument against, then patches after today will go
> > to the "Add the ability to limit the amount of memory that can be
> > allocated to backends" thread
> > https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.camel@crunchydata.com



Re: Add tracking of backend memory allocated to pg_stat_activity

From
Reid Thompson
Date:
On Thu, 2023-01-05 at 14:13 -0600, Justin Pryzby wrote:
>
> I suggest to close the associated CF entry.

Closed with status of Withdrawn. If that is invalid, please advise.

> (Also, the people who participated in this thread may want to be
> included in the other thread going forward.)

Explicitly adding previously missed participants to Cc: - that conversation/patches are being consolidated to
the thread  "Add the ability to limit the amount of memory that can be allocated to backends"
 https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.camel@crunchydata.com



Thanks,
Reid

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com




Re: Add tracking of backend memory allocated to pg_stat_activity

From
John Morris
Date:
Here is an updated version of the earlier work.

This version:
   1) Tracks memory as requested by the backend.
   2) Includes allocations made during program startup.
   3) Optimizes the "fast path" to only update two local variables.
   4) Places a cluster wide limit on total memory allocated.

The cluster wide limit is useful for multi-hosting. One greedy cluster doesn't starve
the other clusters of memory.

Note there isn't a good way to track actual memory used by a cluster.
Ideally, we like to get the working set size of each memory segment along with
the size of the associated kernel data structures.
Gathering that info in a portable way is a "can of worms".
Instead, we're managing memory as requested by the application.
While not identical, the two approaches are strongly correlated.

 The memory model used is
   1) Each process is assumed to use a certain amount of memory
       simply by existing.
   2) All pg memory allocations are counted, including those before
       the process is fully initialized.
    3) Each process maintains its own local counters. These are the "truth".
     4) Periodically,
        -  local counters are added into the global, shared memory counters.
         - pgstats is updated
         - total memory is checked.

For efficiency, the global total is an approximation, not a precise number.
It can be off by as much as 1 MB per process. Memory limiting
doesn't need precision, just a consistent and reasonable approximation.

Repeating the earlier benchmark test, there is no measurable loss of performance.


Attachment

Re: Add tracking of backend memory allocated to pg_stat_activity

From
Ted Yu
Date:


On Thu, Aug 31, 2023 at 9:19 AM John Morris <john.morris@crunchydata.com> wrote:
Here is an updated version of the earlier work.

This version:
   1) Tracks memory as requested by the backend.
   2) Includes allocations made during program startup.
   3) Optimizes the "fast path" to only update two local variables.
   4) Places a cluster wide limit on total memory allocated.

The cluster wide limit is useful for multi-hosting. One greedy cluster doesn't starve
the other clusters of memory.

Note there isn't a good way to track actual memory used by a cluster.
Ideally, we like to get the working set size of each memory segment along with
the size of the associated kernel data structures.
Gathering that info in a portable way is a "can of worms".
Instead, we're managing memory as requested by the application.
While not identical, the two approaches are strongly correlated.

 The memory model used is
   1) Each process is assumed to use a certain amount of memory
       simply by existing.
   2) All pg memory allocations are counted, including those before
       the process is fully initialized.
    3) Each process maintains its own local counters. These are the "truth".
     4) Periodically,
        -  local counters are added into the global, shared memory counters.
         - pgstats is updated
         - total memory is checked.

For efficiency, the global total is an approximation, not a precise number.
It can be off by as much as 1 MB per process. Memory limiting
doesn't need precision, just a consistent and reasonable approximation.

Repeating the earlier benchmark test, there is no measurable loss of performance.

Hi,
In `InitProcGlobal`:

+            elog(WARNING, "proc init: max_total=%d  result=%d\n", max_total_bkend_mem, result);

Is the above log for debugging purposes ? Maybe the log level should be changed.

+                                               errmsg("max_total_backend_memory %dMB - shared_memory_size %dMB is <= 100MB",

The `<=` in the error message is inconsistent with the `max_total_bkend_mem < result + 100` check.
Please modify one of them.

For update_global_allocation :

+       Assert((int64)pg_atomic_read_u64(&ProcGlobal->total_bkend_mem_bytes) >= 0);
+       Assert((int64)pg_atomic_read_u64(&ProcGlobal->global_dsm_allocation) >= 0);

Should the assertions be done earlier in the function?

For reserve_backend_memory:

+               return true;
+
+       /* CASE: the new allocation is within bounds. Take the fast path. */
+       else if (my_memory.allocated_bytes + size <= allocation_upper_bound)

The `else` can be omitted (the preceding if block returns).

For `atomic_add_within_bounds_i64`

+               newval = oldval + add;
+
+               /* check if we are out of bounds */
+               if (newval < lower_bound || newval > upper_bound || addition_overflow(oldval, add))

Since the summation is stored in `newval`, you can pass newval to `addition_overflow` so that `addition_overflow` doesn't add them again.

There are debug statements, such as:

+       debug("done\n");

you can drop them in the next patch.

Thanks