Re: Add tracking of backend memory allocated to pg_stat_activity - Mailing list pgsql-hackers

From Ted Yu
Subject Re: Add tracking of backend memory allocated to pg_stat_activity
Msg-id CALte62zBxQ4wfS07ZAdOGy0pZVYxqcHvRb_zWA+7ZyCT9TrD=A@mail.gmail.com
In response to Re: Add tracking of backend memory allocated to pg_stat_activity  (John Morris <john.morris@crunchydata.com>)
List pgsql-hackers


On Thu, Aug 31, 2023 at 9:19 AM John Morris <john.morris@crunchydata.com> wrote:
Here is an updated version of the earlier work.

This version:
   1) Tracks memory as requested by the backend.
   2) Includes allocations made during program startup.
   3) Optimizes the "fast path" to only update two local variables.
   4) Places a cluster-wide limit on total memory allocated.

The cluster-wide limit is useful for multi-hosting: one greedy cluster can't starve
the other clusters of memory.

Note there isn't a good way to track the actual memory used by a cluster.
Ideally, we'd like to get the working set size of each memory segment along with
the size of the associated kernel data structures, but gathering that information
in a portable way is a can of worms. Instead, we manage memory as requested by
the application. While not identical, the two measures are strongly correlated.

The memory model used is:
   1) Each process is assumed to use a certain amount of memory
      simply by existing.
   2) All pg memory allocations are counted, including those made
      before the process is fully initialized.
   3) Each process maintains its own local counters. These are the "truth".
   4) Periodically,
      - local counters are added into the global, shared memory counters,
      - pgstats is updated,
      - total memory is checked.
For efficiency, the global total is an approximation, not a precise number.
It can be off by as much as 1 MB per process. Memory limiting
doesn't need precision, just a consistent and reasonable approximation.
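The local-counter scheme above can be sketched as follows. This is a minimal illustration, not the patch's actual code: the names (`account_alloc`, `FLUSH_THRESHOLD`, the counter variables) are hypothetical, and the ~1 MB flush threshold is taken from the "off by as much as 1 MB per process" remark.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sketch of the local/global counter model. */
#define FLUSH_THRESHOLD (1024 * 1024)   /* flush to shared memory every ~1 MB */

static _Atomic int64_t global_bkend_mem_bytes;  /* shared-memory total (approximate) */
static int64_t local_allocated;                 /* per-process counter: the "truth" */
static int64_t local_unreported;                /* delta not yet flushed to shared memory */

/* Fast path: update two local variables; touch shared memory only
 * when the unreported delta crosses the threshold. */
static void
account_alloc(int64_t size)
{
    local_allocated += size;
    local_unreported += size;
    if (local_unreported >= FLUSH_THRESHOLD || local_unreported <= -FLUSH_THRESHOLD)
    {
        atomic_fetch_add(&global_bkend_mem_bytes, local_unreported);
        local_unreported = 0;
    }
}
```

Because each process withholds up to the threshold before flushing, the global total lags the true sum by at most the threshold per process, which is exactly the stated approximation bound.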

Repeating the earlier benchmark test, there is no measurable loss of performance.

Hi,
In `InitProcGlobal`:

+            elog(WARNING, "proc init: max_total=%d  result=%d\n", max_total_bkend_mem, result);

Is the above log left over from debugging? If so, the log level should probably be changed.

+                                               errmsg("max_total_backend_memory %dMB - shared_memory_size %dMB is <= 100MB",

The `<=` in the error message is inconsistent with the `max_total_bkend_mem < result + 100` check.
Please modify one of them.

For `update_global_allocation`:

+       Assert((int64)pg_atomic_read_u64(&ProcGlobal->total_bkend_mem_bytes) >= 0);
+       Assert((int64)pg_atomic_read_u64(&ProcGlobal->global_dsm_allocation) >= 0);

Should the assertions be done earlier in the function?

For reserve_backend_memory:

+               return true;
+
+       /* CASE: the new allocation is within bounds. Take the fast path. */
+       else if (my_memory.allocated_bytes + size <= allocation_upper_bound)

The `else` can be omitted, since the preceding `if` block returns.

For `atomic_add_within_bounds_i64`:

+               newval = oldval + add;
+
+               /* check if we are out of bounds */
+               if (newval < lower_bound || newval > upper_bound || addition_overflow(oldval, add))

Since the sum is already stored in `newval`, you could pass `newval` to `addition_overflow` so that it doesn't recompute the addition.
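One way to avoid the duplicate addition entirely, as a hedged sketch rather than the patch's code, is to let `__builtin_add_overflow` (GCC/Clang) produce the sum and the overflow flag in one step inside the CAS loop. The function name and bounds semantics here are modeled on the quoted hunk but are assumptions:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically add 'add' to *target, but only if the result stays within
 * [lower_bound, upper_bound] and does not overflow. Returns true on
 * success, false if the addition was rejected. */
static bool
atomic_add_within_bounds_i64(_Atomic int64_t *target, int64_t add,
                             int64_t lower_bound, int64_t upper_bound)
{
    int64_t oldval = atomic_load(target);

    for (;;)
    {
        int64_t newval;

        /* Computes the sum and reports signed overflow in one step,
         * so no separate addition_overflow(oldval, add) recomputation
         * is needed. */
        if (__builtin_add_overflow(oldval, add, &newval) ||
            newval < lower_bound || newval > upper_bound)
            return false;

        /* On failure, oldval is reloaded with the current value. */
        if (atomic_compare_exchange_weak(target, &oldval, newval))
            return true;
    }
}
```

Note that checking `newval < oldval` on a plain signed addition would not be a safe substitute, since signed overflow is undefined behavior in C; the builtin sidesteps that.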

There are leftover debug statements, such as:

+       debug("done\n");

These can be dropped in the next patch.

Thanks
