Thread: Add the ability to limit the amount of memory that can be allocated to backends.
Add the ability to limit the amount of memory that can be allocated to backends.
Hi Hackers,

Add the ability to limit the amount of memory that can be allocated to
backends.

This builds on the work that adds backend memory allocated to
pg_stat_activity
https://www.postgresql.org/message-id/67bb5c15c0489cb499723b0340f16e10c22485ec.camel%40crunchydata.com
Both patches are attached.

Add GUC variable max_total_backend_memory.

Specifies a limit to the amount of memory (MB) that may be allocated to
backends in total (i.e. this is not a per-user or per-backend limit). If
unset, or set to 0, it is disabled. It is intended as a resource to help
avoid the OOM killer. A backend request that would push the total over the
limit will be denied with an out of memory error, causing that backend's
current query/transaction to fail. Due to the dynamic nature of memory
allocations, this limit is not exact. If within 1.5MB of the limit and two
backends request 1MB each at the same time, both may be allocated,
exceeding the limit. Further requests will not be allocated until the
total drops below the limit. Keep this in mind when setting this value to
avoid the OOM killer.

Currently, this limit does not affect auxiliary backend processes. The
list of non-affected backend processes is open for discussion as to what
should/should not be included.

Backend memory allocations are displayed in the pg_stat_activity view.

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com
Attachment
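For illustration, here is how the proposed setting might look in a
configuration file; the values are arbitrary, only the GUC name and the
default MB units come from the proposal above:

    # postgresql.conf -- sketch of the proposed GUC
    max_total_backend_memory = 2048    # cap total backend allocations at ~2GB
    #max_total_backend_memory = 0      # 0 (the default) disables the limit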
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Wed, Aug 31, 2022 at 12:50:19PM -0400, Reid Thompson wrote:
> Hi Hackers,
>
> Add the ability to limit the amount of memory that can be allocated to
> backends.
>
> This builds on the work that adds backend memory allocated to
> pg_stat_activity
> https://www.postgresql.org/message-id/67bb5c15c0489cb499723b0340f16e10c22485ec.camel%40crunchydata.com
> Both patches are attached.

You should name the patches with different prefixes, like 001,002,003.
Otherwise, cfbot may try to apply them in the wrong order. git
format-patch is the usual tool for that.

> +        Specifies a limit to the amount of memory (MB) that may be allocated to

MB are just the default unit, right? The user should be allowed to write
max_total_backend_memory='2GB'.

> +        backends in total (i.e. this is not a per user or per backend limit).
> +        If unset, or set to 0 it is disabled. A backend request that would push
> +        the total over the limit will be denied with an out of memory error
> +        causing that backends current query/transaction to fail. Due to the dynamic

backend's

> +        nature of memory allocations, this limit is not exact. If within 1.5MB of
> +        the limit and two backends request 1MB each at the same time both may be
> +        allocated exceeding the limit. Further requests will not be allocated until

allocated, and exceed the limit

> +bool
> +exceeds_max_total_bkend_mem(uint64 allocation_request)
> +{
> +    bool        result = false;
> +
> +    if (MyAuxProcType != NotAnAuxProcess)
> +        return result;

The double negative is confusing, so could use a comment.

> +    /* Convert max_total_bkend_mem to bytes for comparison */
> +    if (max_total_bkend_mem &&
> +        pgstat_get_all_backend_memory_allocated() +
> +        allocation_request > (uint64)max_total_bkend_mem * 1024 * 1024)
> +    {
> +        /*
> +         * Explicitely identify the OOM being a result of this
> +         * configuration parameter vs a system failure to allocate OOM.
> +         */
> +        elog(WARNING,
> +             "request will exceed postgresql.conf defined max_total_backend_memory limit (%lu > %lu)",
> +             pgstat_get_all_backend_memory_allocated() +
> +             allocation_request, (uint64)max_total_bkend_mem * 1024 * 1024);

I think it should be ereport() rather than elog(), which is
internal-only, and not-translated.

> +        {"max_total_backend_memory", PGC_SIGHUP, RESOURCES_MEM,
> +            gettext_noop("Restrict total backend memory allocations to this max."),
> +            gettext_noop("0 turns this feature off."),
> +            GUC_UNIT_MB
> +        },
> +        &max_total_bkend_mem,
> +        0, 0, INT_MAX,
> +        NULL, NULL, NULL

I think this needs a maximum like INT_MAX/1024/1024.

> +uint64
> +pgstat_get_all_backend_memory_allocated(void)
> +{
...
> +    for (i = 1; i <= NumBackendStatSlots; i++)
> +    {

It's looping over every backend for each allocation. Do you know if
there's any performance impact of that?

I think it may be necessary to track the current allocation size in
shared memory (with atomic increments?). Maybe decrements would need to
be exactly accounted for, or otherwise Assert() that the value is not
negative. I don't know how expensive it'd be to have conditionals for
each decrement, but maybe the value would only be decremented at
strategic times, like at transaction commit or backend shutdown.

--
Justin
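(A comment along these lines would address the double negative; this is a
sketch, not text from the patch:)

    /* Only client backends are subject to the limit; auxiliary
     * processes are exempt, so bail out early for anything else. */
    if (MyAuxProcType != NotAnAuxProcess)
        return false;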
Re: Add the ability to limit the amount of memory that can be allocated to backends.
At Wed, 31 Aug 2022 12:50:19 -0400, Reid Thompson <reid.thompson@crunchydata.com> wrote in
> Hi Hackers,
>
> Add the ability to limit the amount of memory that can be allocated to
> backends.

The patch seems to limit both memory-context allocations and DSM
allocations that happen in a specific process by the same budget. In the
first place, I don't think it's sensible to cap the amount of DSM
allocations by a per-process budget.

DSM is used by the pgstats subsystem. There can be cases where pgstat
complains about denial of a DSM allocation after the budget has been
exhausted by memory-context allocations, or every command complains about
denial of a memory-context allocation once the per-process budget has
been exhausted by DSM allocations. That doesn't seem reasonable.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi,

On 8/31/22 6:50 PM, Reid Thompson wrote:
> Hi Hackers,
>
> Add the ability to limit the amount of memory that can be allocated to
> backends.

Thanks for the patch. +1 on the idea.

> Specifies a limit to the amount of memory (MB) that may be allocated to
> backends in total (i.e. this is not a per user or per backend limit).
> If unset, or set to 0 it is disabled. It is intended as a resource to
> help avoid the OOM killer. A backend request that would push the total
> over the limit will be denied with an out of memory error causing that
> backends current query/transaction to fail.

I'm not sure we are choosing the right victims here (aka the ones that
are doing the request that will push the total over the limit).

Imagine an extreme case where a single backend consumes, say, 99% of the
limit: shouldn't it be the one to be "punished"? (and somehow forced to
give the memory back). The problem that I see with the current approach
is that a "bad" backend could impact all the others and continue to do
so.

What about punishing, say, the highest consumer? What do you think? (just
speaking about the general idea here, not about the implementation)

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Thu, 1 Sept 2022 at 04:52, Reid Thompson
<reid.thompson@crunchydata.com> wrote:
> Add the ability to limit the amount of memory that can be allocated to
> backends.

Are you aware that relcache entries are stored in backend local memory
and that once we've added a relcache entry for a relation that we have no
current code which attempts to reduce the memory consumption used by
cache entries when there's memory pressure?

It seems to me that if we had this feature as you propose that a backend
could hit the limit and stay there just from the memory requirements of
the relation cache after some number of tables have been accessed from
the given backend. It's not hard to imagine a situation where the
palloc() would start to fail during parse, which might make it quite
infuriating for anyone trying to do something like:

SET max_total_backend_memory TO 0;

or

ALTER SYSTEM SET max_total_backend_memory TO 0;

I think a better solution to this problem would be to have "memory
grants", where we configure some amount of "pool" memory that backends
are allowed to use for queries. The planner would have to add the
expected number of work_mem that the given query is expected to use and
before that query starts, the executor would have to "checkout" that
amount of memory from the pool and return it when finished. If there is
not enough memory in the pool then the query would have to wait until
enough memory is available. This creates a deadlocking hazard that the
deadlock detector would need to be made aware of.

I know Thomas Munro has mentioned this "memory grant" or "memory pool"
feature to me previously and I think he even has some work in progress
code for it. It's a very tricky problem, however, as aside from the
deadlocking issue, it requires working out how much memory a given plan
will use concurrently. That's not as simple as counting the nodes that
use work_mem and summing those up.

There is some discussion about the feature in [1]. I was unable to find
what Thomas mentioned on the list about this. I've included him here in
case he has any extra information to share.

David

[1] https://www.postgresql.org/message-id/flat/20220713222342.GE18011%40telsasoft.com#b4f526aa8f2c893567c1ecf069f9e6c7
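To make the "memory grant" idea concrete, a rough sketch of the
checkout/return flow being described; every grant-related name here is
hypothetical, since no such API exists in core:

    /* Hypothetical executor-side flow for a memory grant scheme */
    Size    grant = planned_work_mem_bytes(plannedstmt);  /* hypothetical
                                                           * planner estimate */

    MemoryGrantAcquire(grant);   /* blocks until the shared pool can cover
                                  * it; the deadlock detector would need to
                                  * treat this as a waitable resource */
    standard_ExecutorRun(queryDesc, direction, count, execute_once);
    MemoryGrantRelease(grant);   /* return the grant to the shared pool */

standard_ExecutorRun is real; the grant functions and the estimate are
not.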
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Wed, 2022-08-31 at 12:34 -0500, Justin Pryzby wrote:
> You should name the patches with different prefixes, like
> 001,002,003 Otherwise, cfbot may try to apply them in the wrong
> order.
> git format-patch is the usual tool for that.

Thanks for the pointer. My experience with git in the past has been
minimal and basic.

> > +        Specifies a limit to the amount of memory (MB) that may be
> > allocated to
>
> MB are just the default unit, right ?
> The user should be allowed to write max_total_backend_memory='2GB'

Correct. Default units are MB. Other unit types are converted to MB.

> > +        causing that backends current query/transaction to fail.
>
> backend's
>
> > +        allocated exceeding the limit. Further requests will not
>
> allocated, and exceed the limit
>
> > +    if (MyAuxProcType != NotAnAuxProcess)
>
> The double negative is confusing, so could use a comment.
>
> > +        elog(WARNING,
>
> I think it should be ereport() rather than elog(), which is
> internal-only, and not-translated.

Corrected/added the above items. Attached patches with the corrections.

> > +        0, 0, INT_MAX,
> > +        NULL, NULL, NULL
>
> I think this needs a maximum like INT_MAX/1024/1024

Is this noting that we'd set a ceiling of 2048MB?

> > +    for (i = 1; i <= NumBackendStatSlots; i++)
> > +    {
>
> It's looping over every backend for each allocation.
> Do you know if there's any performance impact of that ?

I'm not very familiar with how to test performance impact; I'm open to
suggestions. I have performed the below pgbench tests and noted the basic
tps differences in the table.

Test 1: branch master
  CFLAGS="-I/usr/include/python3.8/ " /home/rthompso/src/git/postgres/configure --silent --prefix=/home/rthompso/src/git/postgres/install/master --with-openssl --with-tcl --with-tclconfig=/usr/lib/tcl8.6 --with-perl --with-libxml --with-libxslt --with-python --with-gssapi --with-systemd --with-ldap --enable-nls
  make -s -j12 && make -s install
  initdb, default postgresql.conf settings
  init pgbench: pgbench -U rthompso -p 5433 -h localhost -i -s 50 testpgbench
  10 iterations: for ctr in {1..10}; do { time pgbench -p 5433 -h localhost -c 10 -j 10 -t 50000 testpgbench; } 2>&1 | tee -a pgstatsResultsNoLimitSet; done

Test 2: branch pg-stat-activity-backend-memory-allocated
  CFLAGS="-I/usr/include/python3.8/ " /home/rthompso/src/git/postgres/configure --silent --prefix=/home/rthompso/src/git/postgres/install/pg-stats-memory/ --with-openssl --with-tcl --with-tclconfig=/usr/lib/tcl8.6 --with-perl --with-libxml --with-libxslt --with-python --with-gssapi --with-systemd --with-ldap --enable-nls
  make -s -j12 && make -s install
  initdb, default postgresql.conf settings
  init pgbench: pgbench -U rthompso -p 5433 -h localhost -i -s 50 testpgbench
  10 iterations: for ctr in {1..10}; do { time pgbench -p 5433 -h localhost -c 10 -j 10 -t 50000 testpgbench; } 2>&1 | tee -a pgstatsResultsPg-stats-memory; done

Test 3: branch dev-max-memory
  CFLAGS="-I/usr/include/python3.8/ " /home/rthompso/src/git/postgres/configure --silent --prefix=/home/rthompso/src/git/postgres/install/dev-max-memory/ --with-openssl --with-tcl --with-tclconfig=/usr/lib/tcl8.6 --with-perl --with-libxml --with-libxslt --with-python --with-gssapi --with-systemd --with-ldap --enable-nls
  make -s -j12 && make -s install
  initdb, default postgresql.conf settings
  init pgbench: pgbench -U rthompso -p 5433 -h localhost -i -s 50 testpgbench
  10 iterations: for ctr in {1..10}; do { time pgbench -p 5433 -h localhost -c 10 -j 10 -t 50000 testpgbench; } 2>&1 | tee -a pgstatsResultsDev-max-memory; done

Test 4: branch dev-max-memory
  CFLAGS="-I/usr/include/python3.8/ " /home/rthompso/src/git/postgres/configure --silent --prefix=/home/rthompso/src/git/postgres/install/dev-max-memory/ --with-openssl --with-tcl --with-tclconfig=/usr/lib/tcl8.6 --with-perl --with-libxml --with-libxslt --with-python --with-gssapi --with-systemd --with-ldap --enable-nls
  make -s -j12 && make -s install
  initdb, non-default postgresql.conf setting: max_total_backend_memory = 100MB
  init pgbench: pgbench -U rthompso -p 5433 -h localhost -i -s 50 testpgbench
  10 iterations: for ctr in {1..10}; do { time pgbench -p 5433 -h localhost -c 10 -j 10 -t 50000 testpgbench; } 2>&1 | tee -a pgstatsResultsDev-max-memory100MB; done

Laptop: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz, 8 cores/16
threads, 32GB RAM, SSD drive

Averages from the 10 runs and tps difference over the 10 runs:

|------------------+------------------+------------------------+-------------------+------------------+-------------------+---------------+------------------|
| Test Run         | Master           | Track Memory Allocated | Diff from Master  | Max Mem off      | Diff from Master  | Max Mem 100MB | Diff from Master |
| Set 1            | Test 1           | Test 2                 |                   | Test 3           |                   | Test 4        |                  |
| latency average  | 2.43390909090909 | 2.44327272727273       |                   | 2.44381818181818 |                   | 2.6843        |                  |
| tps inc conn est | 3398.99291372727 | 3385.40984336364       | -13.583070363637  | 3385.08184309091 | -13.9110706363631 | 3729.5363413  | 330.54342757273  |
| tps exc conn est | 3399.12185727273 | 3385.52527490909       | -13.5965823636366 | 3385.22100872727 | -13.9008485454547 | 3729.7097607  | 330.58790342727  |
|------------------+------------------+------------------------+-------------------+------------------+-------------------+---------------+------------------|
| Set 2            |                  |                        |                   |                  |                   |               |                  |
| latency average  | 2.691            | 2.6895                 | 2                 | 2.69             | 3                 | 2.6827        | 4                |
| tps inc conn est | 3719.56          | 3721.7587106           | 2.1987106         | 3720.3           | .74               | 3730.86       | 11.30            |
| tps exc conn est | 3719.71          | 3721.9268465           | 2.2168465         | 3720.47          | .76               | 3731.02       | 11.31            |
|------------------+------------------+------------------------+-------------------+------------------+-------------------+---------------+------------------|

> I think it may be necessary to track the current allocation size in
> shared memory (with atomic increments?). Maybe decrements would need to
> be exactly accounted for, or otherwise Assert() that the value is not
> negative. I don't know how expensive it'd be to have conditionals for
> each decrement, but maybe the value would only be decremented at
> strategic times, like at transaction commit or backend shutdown.

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Thu, 2022-09-01 at 11:48 +0900, Kyotaro Horiguchi wrote:
> The patch seems to limit both memory-context allocations and DSM
> allocations that happen in a specific process by the same budget. In
> the first place, I don't think it's sensible to cap the amount of DSM
> allocations by a per-process budget.
>
> DSM is used by the pgstats subsystem. There can be cases where pgstat
> complains about denial of a DSM allocation after the budget has been
> exhausted by memory-context allocations, or every command complains
> about denial of a memory-context allocation once the per-process
> budget has been exhausted by DSM allocations. That doesn't seem
> reasonable.

It's intended as a mechanism for administrators to limit total postgresql
memory consumption to avoid the OOM killer causing a crash and restart,
or to ensure that resources are available for other processes on shared
hosts, etc. It limits all types of allocations in order to accomplish
this. Our documentation will note this, so that administrators that have
the need to set it are aware that it can affect all non-auxiliary
processes and what the effect is.
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Fri, 2022-09-02 at 09:30 +0200, Drouvot, Bertrand wrote:
> Hi,
>
> I'm not sure we are choosing the right victims here (aka the ones
> that are doing the request that will push the total over the limit).
>
> Imagine an extreme case where a single backend consumes, say, 99% of
> the limit: shouldn't it be the one to be "punished"? (and somehow
> forced to give the memory back).
>
> The problem that I see with the current approach is that a "bad"
> backend could impact all the others and continue to do so.
>
> What about punishing, say, the highest consumer? What do you think?
> (just speaking about the general idea here, not about the
> implementation)

Initially, we believe that punishing the detector is reasonable if we can
help administrators avoid the OOM killer/resource starvation. But we can
and should expand on this idea. Another thought is, rather than just
failing the query/transaction, to have the affected backend do a clean
exit, freeing all its resources.

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Greetings,

* David Rowley (dgrowleyml@gmail.com) wrote:
> On Thu, 1 Sept 2022 at 04:52, Reid Thompson
> <reid.thompson@crunchydata.com> wrote:
> > Add the ability to limit the amount of memory that can be allocated to
> > backends.
>
> Are you aware that relcache entries are stored in backend local memory
> and that once we've added a relcache entry for a relation that we have
> no current code which attempts to reduce the memory consumption used
> by cache entries when there's memory pressure?

Short answer to this is yes, and that's an issue, but it isn't this
patch's problem to deal with- that's an issue that the relcache system
needs to be changed to address.

> It seems to me that if we had this feature as you propose that a
> backend could hit the limit and stay there just from the memory
> requirements of the relation cache after some number of tables have
> been accessed from the given backend. It's not hard to imagine a
> situation where the palloc() would start to fail during parse, which
> might make it quite infuriating for anyone trying to do something
> like:

Agreed that this could happen but I don't imagine it to be super likely-
and even if it does, this is probably a better position to be in as the
backend could then be disconnected from and would then go away and its
memory free'd, unlike the current OOM-killer situation where we crash and
go through recovery. We should note this in the documentation though,
sure, so that administrators understand how this can occur and can take
action to address it.

> I think a better solution to this problem would be to have "memory
> grants", where we configure some amount of "pool" memory that backends
> are allowed to use for queries. The planner would have to add the
> expected number of work_mem that the given query is expected to use
> and before that query starts, the executor would have to "checkout"
> that amount of memory from the pool and return it when finished. If
> there is not enough memory in the pool then the query would have to
> wait until enough memory is available. This creates a deadlocking
> hazard that the deadlock detector would need to be made aware of.

Sure, that also sounds great and a query acceptance system would be
wonderful. If someone is working on that with an expectation of it
landing before v16, great. Otherwise, I don't see it as relevant to the
question about if we should include this feature or not, and I'm not even
sure that we'd refuse this feature even if we already had an acceptance
system as a stop-gap should we guess wrong and not realize it until it's
too late.

Thanks,

Stephen
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Sat, Sep 03, 2022 at 11:40:03PM -0400, Reid Thompson wrote:
> > > +        0, 0, INT_MAX,
> > > +        NULL, NULL, NULL
> > I think this needs a maximum like INT_MAX/1024/1024
>
> Is this noting that we'd set a ceiling of 2048MB?

The reason is that you're later multiplying it by 1024*1024, so you need
to limit it to avoid overflowing. Compare with
min_dynamic_shared_memory, Log_RotationSize, maintenance_work_mem,
autovacuum_work_mem.

typo: Explicitely

+             errmsg("request will exceed postgresql.conf defined max_total_backend_memory limit (%lu > %lu)",

I wouldn't mention postgresql.conf - it could be in
postgresql.auto.conf, or an include file, or a -c parameter.
Suggest: allocation would exceed max_total_backend_memory limit...

+            ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0"));

Suggest: deallocation would decrease backend memory below zero;

+        {"max_total_backend_memory", PGC_SIGHUP, RESOURCES_MEM,

Should this be PGC_SU_BACKEND to allow a superuser to set a higher
limit (or no limit)?

There's compilation warning under mingw cross compile due to
sizeof(long). See d914eb347 and other recent commits which I guess is
the current way to handle this.
http://cfbot.cputube.org/reid-thompson.html

For performance test, you'd want to check what happens with a large
number of max_connections (and maybe a large number of clients). TPS
isn't the only thing that matters. For example, a utility command might
sometimes do a lot of allocations (or deallocations), or a
"parameterized nested loop" may loop over many outer tuples and reset
for each. There's also a lot of places that reset to a "per-tuple"
context. I started looking at its performance, but nothing to show yet.

Would you keep people copied on your replies ("reply all")? Otherwise I
(at least) may miss them. I think that's what's typical on these lists
(and the list tool is smart enough not to send duplicates to people who
are direct recipients).

--
Justin
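To make the overflow point concrete, the GUC entry quoted upthread could
be adjusted roughly like this; a sketch folding in Justin's suggestions,
not the posted patch:

    {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
        gettext_noop("Restrict total backend memory allocations to this max."),
        gettext_noop("0 turns this feature off."),
        GUC_UNIT_MB
    },
    &max_total_bkend_mem,
    0, 0, INT_MAX / 1024 / 1024,    /* keep limit * 1024 * 1024 in range */
    NULL, NULL, NULL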
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Fri, 2022-09-09 at 12:14 -0500, Justin Pryzby wrote:
> On Sat, Sep 03, 2022 at 11:40:03PM -0400, Reid Thompson wrote:
> > > > + 0, 0, INT_MAX,
> > > > + NULL, NULL, NULL
> > > I think this needs a maximum like INT_MAX/1024/1024
> >
> > Is this noting that we'd set a ceiling of 2048MB?
>
> The reason is that you're later multiplying it by 1024*1024, so you
> need
> to limit it to avoid overflowing. Compare with
> min_dynamic_shared_memory, Log_RotationSize, maintenance_work_mem,
> autovacuum_work_mem.
What I originally attempted to implement is:
GUC "max_total_backend_memory" max value as INT_MAX = 2147483647 MB
(2251799812636672 bytes). And the other variables and comparisons as
bytes represented as uint64 to avoid overflow.
Is this invalid?
> typo: Explicitely
corrected
> + errmsg("request will exceed postgresql.conf
> defined max_total_backend_memory limit (%lu > %lu)",
>
> I wouldn't mention postgresql.conf - it could be in
> postgresql.auto.conf, or an include file, or a -c parameter.
> Suggest: allocation would exceed max_total_backend_memory limit...
>
updated
>
> + ereport(LOG, errmsg("decrease reduces reported
> backend memory allocated below zero; setting reported to 0"));
>
> Suggest: deallocation would decrease backend memory below zero;
updated
> + {"max_total_backend_memory", PGC_SIGHUP,
> RESOURCES_MEM,
>
>
>
> Should this be PGC_SU_BACKEND to allow a superuser to set a higher
> limit (or no limit)?
Sounds good to me. I'll update to that.
Would PGC_SUSET be too open?
> There's compilation warning under mingw cross compile due to
> sizeof(long). See d914eb347 and other recent commits which I guess
> is
> the current way to handle this.
> http://cfbot.cputube.org/reid-thompson.html
updated %lu to %llu and changed cast from uint64 to
unsigned long long in the ereport call
> For performance test, you'd want to check what happens with a large
> number of max_connections (and maybe a large number of clients). TPS
> isn't the only thing that matters. For example, a utility command
> might
> sometimes do a lot of allocations (or deallocations), or a
> "parameterized nested loop" may loop over over many outer tuples and
> reset for each. There's also a lot of places that reset to a
> "per-tuple" context. I started looking at its performance, but
> nothing
> to show yet.
Thanks
> Would you keep people copied on your replies ("reply all") ?
> Otherwise
> I (at least) may miss them. I think that's what's typical on these
> lists (and the list tool is smart enough not to send duplicates to
> people who are direct recipients).
Ok - will do, thanks.
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com

Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Thu, 2022-09-15 at 12:07 +0400, Ibrar Ahmed wrote:
> The patch does not apply; please rebase the patch.
>
> patching file src/backend/utils/misc/guc.c
> Hunk #1 FAILED at 3664.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/backend/utils/misc/guc.c.rej
>
> patching file src/backend/utils/misc/postgresql.conf.sample

rebased patches attached.

Thanks,
Reid
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hello Reid,
could you rebase the patch again? It doesn't apply currently (http://cfbot.cputube.org/patch_40_3867.log). Thanks!
You mention that you want to prevent the compiler from getting cute.
I don't think these comments are exactly helpful in their current state; it's probably fine to just omit them.
I don't understand the purpose of the result variable in exceeds_max_total_bkend_mem. What purpose does it serve?
I really like the simplicity of the suggestion here to prevent OOM.
Regards
Arne
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi Arne,

On Mon, 2022-10-24 at 15:27 +0000, Arne Roland wrote:
> Hello Reid,
>
> could you rebase the patch again? It doesn't apply currently
> (http://cfbot.cputube.org/patch_40_3867.log). Thanks!

rebased patches attached.

> You mention that you want to prevent the compiler from getting cute.
> I don't think these comments are exactly helpful in their current
> state; it's probably fine to just omit them.

I attempted to follow previous convention when adding code, and these
comments have been consistently applied throughout backend_status.c where
a volatile pointer is being used.

> I don't understand the purpose of the result variable in
> exceeds_max_total_bkend_mem. What purpose does it serve?
>
> I really like the simplicity of the suggestion here to prevent OOM.

If max_total_backend_memory is configured, exceeds_max_total_bkend_mem()
will return true if an allocation request would push total backend memory
allocated over the configured value. exceeds_max_total_bkend_mem() is
implemented in the various allocators along the lines of

    ...snip...
    /* Do not exceed maximum allowed memory allocation */
    if (exceeds_max_total_bkend_mem('new request size'))
        return NULL;
    ...snip...

That is: do not allocate the memory requested, return NULL instead. PG
already had code in place to handle NULL returns from allocation
requests. The allocation code in aset.c, slab.c, generation.c, and
dsm_impl.c utilizes exceeds_max_total_bkend_mem().

max_total_backend_memory (integer)

    Specifies a limit to the amount of memory (MB) that may be allocated
    to backends in total (i.e. this is not a per user or per backend
    limit). If unset, or set to 0, it is disabled. A backend request that
    would push the total over the limit will be denied with an out of
    memory error causing that backend's current query/transaction to
    fail. Due to the dynamic nature of memory allocations, this limit is
    not exact. If within 1.5MB of the limit and two backends request 1MB
    each at the same time, both may be allocated, and exceed the limit.
    Further requests will not be allocated until dropping below the
    limit. Keep this in mind when setting this value. This limit does not
    affect auxiliary backend processes. Backend memory allocations
    (backend_mem_allocated) are displayed in the pg_stat_activity view.

> I intend to play around with a lot of backends, once I get a rebased
> patch.
>
> Regards
> Arne
Attachment
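As a slightly fuller sketch of the call-site shape described above (the
surrounding lines are paraphrased from the aset.c block-allocation path,
not quoted from the patch):

    /* where AllocSetAlloc is about to ask the OS for a new block */

    /* Do not exceed maximum allowed memory allocation */
    if (exceeds_max_total_bkend_mem(blksize))
        return NULL;            /* existing callers turn a NULL result
                                 * into an out-of-memory error */

    block = (AllocBlock) malloc(blksize);
    if (block == NULL)
        return NULL;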
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Tue, 2022-10-25 at 11:49 -0400, Reid Thompson wrote:
> Hi Arne,
>
> On Mon, 2022-10-24 at 15:27 +0000, Arne Roland wrote:
> > Hello Reid,
> >
> > could you rebase the patch again? It doesn't apply currently
> > (http://cfbot.cputube.org/patch_40_3867.log). Thanks!
>
> rebased patches attached.

Rebased to current. Added a couple changes per conversation with D
Christensen (include units in the field name; group the field with the
backend_xid and backend_xmin fields in the pg_stat_activity view, rather
than between query_id and query).

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Thu, 2022-11-03 at 11:48 -0400, Reid Thompson wrote:
> On Tue, 2022-10-25 at 11:49 -0400, Reid Thompson wrote:
>
> Rebased to current. Added a couple changes per conversation with D
> Christensen (include units in the field name; group the field with the
> backend_xid and backend_xmin fields in the pg_stat_activity view,
> rather than between query_id and query).

rebased/patched to current master && current
pg-stat-activity-backend-memory-allocated

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi,

On 2022-11-26 22:22:15 -0500, Reid Thompson wrote:
> rebased/patched to current master && current
> pg-stat-activity-backend-memory-allocated

This version fails to build with msvc, and builds with warnings on other
platforms. https://cirrus-ci.com/build/5410696721072128

msvc:

[20:26:51.286] c:\cirrus\src\include\utils/backend_status.h(40): error C2059: syntax error: 'constant'

mingw cross:

[20:26:26.358] from /usr/share/mingw-w64/include/winsock2.h:23,
[20:26:26.358] from ../../src/include/port/win32_port.h:60,
[20:26:26.358] from ../../src/include/port.h:24,
[20:26:26.358] from ../../src/include/c.h:1306,
[20:26:26.358] from ../../src/include/postgres.h:47,
[20:26:26.358] from controldata_utils.c:18:
[20:26:26.358] ../../src/include/utils/backend_status.h:40:2: error: expected identifier before numeric constant
[20:26:26.358]    40 |  IGNORE,
[20:26:26.358]       |  ^~~~~~
[20:26:26.358] In file included from ../../src/include/postgres.h:48,
[20:26:26.358] from controldata_utils.c:18:
[20:26:26.358] ../../src/include/utils/backend_status.h: In function 'pgstat_report_allocated_bytes':
[20:26:26.358] ../../src/include/utils/backend_status.h:365:12: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'uint64' {aka 'long long unsigned int'} [-Werror=format=]
[20:26:26.358]   365 |    errmsg("Backend %d deallocated %ld bytes, exceeding the %ld bytes it is currently reporting allocated. Setting reported to 0.",
[20:26:26.358]       |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[20:26:26.358]   366 |    MyProcPid, allocated_bytes, *my_allocated_bytes));
[20:26:26.358]       |               ~~~~~~~~~~~~~~~
[20:26:26.358]       |               |
[20:26:26.358]       |               uint64 {aka long long unsigned int}

Due to windows having long be 32bit, you need to use %lld. Our custom to
deal with that is to cast the argument to errmsg as long long unsigned
and use %llu.

Btw, given that the argument is uint64, it doesn't seem correct to use
%ld, that's signed. Not that it's going to matter, but ...

Greetings,

Andres Freund
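Concretely, the convention described would turn the warning into
something like this (a sketch of the fix, not the actual follow-up
patch):

    ereport(LOG,
            errmsg("backend %d deallocated %llu bytes, exceeding the %llu bytes it is currently reporting allocated",
                   MyProcPid,
                   (unsigned long long) allocated_bytes,
                   (unsigned long long) *my_allocated_bytes));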
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Tue, 2022-12-06 at 10:32 -0800, Andres Freund wrote:
> Hi,
>
> On 2022-11-26 22:22:15 -0500, Reid Thompson wrote:
> > rebased/patched to current master && current pg-stat-activity-
> > backend-memory-allocated
>
> This version fails to build with msvc, and builds with warnings on
> other platforms.
> https://cirrus-ci.com/build/5410696721072128
>
> Andres Freund

updated patches

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Fri, 9 Dec 2022 at 20:41, Reid Thompson <reid.thompson@crunchydata.com> wrote:
>
> On Tue, 2022-12-06 at 10:32 -0800, Andres Freund wrote:
> > Hi,
> >
> > On 2022-11-26 22:22:15 -0500, Reid Thompson wrote:
> > > rebased/patched to current master && current pg-stat-activity-
> > > backend-memory-allocated
> >
> > This version fails to build with msvc, and builds with warnings on
> > other platforms.
> > https://cirrus-ci.com/build/5410696721072128
>
> updated patches

The patch does not apply on top of HEAD as in [1], please post a rebased
patch:

=== Applying patches on top of PostgreSQL commit ID 92957ed98c5c565362ce665266132a7f08f6b0c0 ===
=== applying patch ./0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patch
...
patching file src/backend/utils/mmgr/slab.c
Hunk #1 succeeded at 69 (offset 16 lines).
Hunk #2 succeeded at 414 (offset 175 lines).
Hunk #3 succeeded at 436 with fuzz 2 (offset 176 lines).
Hunk #4 FAILED at 286.
Hunk #5 succeeded at 488 (offset 186 lines).
Hunk #6 FAILED at 381.
Hunk #7 FAILED at 554.
3 out of 7 hunks FAILED -- saving rejects to file src/backend/utils/mmgr/slab.c.rej

[1] - http://cfbot.cputube.org/patch_41_3867.log

Regards,
Vignesh
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Tue, 2023-01-03 at 16:22 +0530, vignesh C wrote:
> ....
> The patch does not apply on top of HEAD as in [1], please post a
> rebased patch:
> ...
> Regards,
> Vignesh

Attached is rebased patch, with some updates related to committed
changes.

Thanks,
Reid

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thompson@crunchydata.com
www.crunchydata.com
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi,

On 2023-01-05 13:44:20 -0500, Reid Thompson wrote:
> From 0a6b152e0559a250dddd33bd7d43eb0959432e0d Mon Sep 17 00:00:00 2001
> From: Reid Thompson <jreidthompson@nc.rr.com>
> Date: Thu, 11 Aug 2022 12:01:25 -0400
> Subject: [PATCH 1/2] Add tracking of backend memory allocated to
>  pg_stat_activity
>
> This new field displays the current bytes of memory allocated to the
> backend process. It is updated as memory for the process is
> malloc'd/free'd. Memory allocated to items on the freelist is included in
> the displayed value.

It doesn't actually malloc/free. It tracks palloc/pfree.

> Dynamic shared memory allocations are included only in the value displayed
> for the backend that created them, they are not included in the value for
> backends that are attached to them to avoid double counting.

As mentioned before, I don't think accounting DSM this way makes sense.

> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -407,6 +407,9 @@ StartAutoVacLauncher(void)
>
>  #ifndef EXEC_BACKEND
>          case 0:
> +            /* Zero allocated bytes to avoid double counting parent allocation */
> +            pgstat_zero_my_allocated_bytes();
> +
>              /* in postmaster child ... */
>              InitPostmasterChild();

> @@ -1485,6 +1488,9 @@ StartAutoVacWorker(void)
>
>  #ifndef EXEC_BACKEND
>          case 0:
> +            /* Zero allocated bytes to avoid double counting parent allocation */
> +            pgstat_zero_my_allocated_bytes();
> +
>              /* in postmaster child ... */
>              InitPostmasterChild();
>
> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
> index eac3450774..24278e5c18 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -4102,6 +4102,9 @@ BackendStartup(Port *port)
>      {
>          free(bn);
>
> +        /* Zero allocated bytes to avoid double counting parent allocation */
> +        pgstat_zero_my_allocated_bytes();
> +
>          /* Detangle from postmaster */
>          InitPostmasterChild();

It doesn't at all seem right to call pgstat_zero_my_allocated_bytes()
here, before even InitPostmasterChild() is called. Nor does it seem right
to add the call to so many places.

Note that this is before we even delete postmaster's memory, see e.g.:

    /*
     * If the PostmasterContext is still around, recycle the space; we don't
     * need it anymore after InitPostgres completes.  Note this does not trash
     * *MyProcPort, because ConnCreate() allocated that space with malloc()
     * ... else we'd need to copy the Port data first.  Also, subsidiary data
     * such as the username isn't lost either; see ProcessStartupPacket().
     */
    if (PostmasterContext)
    {
        MemoryContextDelete(PostmasterContext);
        PostmasterContext = NULL;
    }

calling pgstat_zero_my_allocated_bytes() before we do this will lead to
undercounting memory usage, afaict.

> +/* Enum helper for reporting memory allocated bytes */
> +enum allocation_direction
> +{
> +    PG_ALLOC_DECREASE = -1,
> +    PG_ALLOC_IGNORE,
> +    PG_ALLOC_INCREASE,
> +};

What's the point of this?

> +/* ----------
> + * pgstat_report_allocated_bytes() -
> + *
> + *  Called to report change in memory allocated for this backend.
> + *
> + * my_allocated_bytes initially points to local memory, making it safe to call
> + * this before pgstats has been initialized. allocation_direction is a
> + * positive/negative multiplier enum defined above.
> + * ----------
> + */
> +static inline void
> +pgstat_report_allocated_bytes(int64 allocated_bytes, int allocation_direction)

I don't think this should take allocation_direction as a parameter, I'd
make it two different functions.

> +{
> +    uint64        temp;
> +
> +    /*
> +     * Avoid *my_allocated_bytes unsigned integer overflow on
> +     * PG_ALLOC_DECREASE
> +     */
> +    if (allocation_direction == PG_ALLOC_DECREASE &&
> +        pg_sub_u64_overflow(*my_allocated_bytes, allocated_bytes, &temp))
> +    {
> +        *my_allocated_bytes = 0;
> +        ereport(LOG,
> +                errmsg("Backend %d deallocated %lld bytes, exceeding the %llu bytes it is currently reporting allocated. Setting reported to 0.",
> +                       MyProcPid, (long long) allocated_bytes,
> +                       (unsigned long long) *my_allocated_bytes));

We certainly shouldn't have an ereport in here. This stuff really needs
to be cheap.

> +    }
> +    else
> +        *my_allocated_bytes += (allocated_bytes) * allocation_direction;

Superfluous parens?

> +/* ----------
> + * pgstat_get_all_memory_allocated() -
> + *
> + *    Return a uint64 representing the current shared memory allocated to all
> + *    backends.  This looks directly at the BackendStatusArray, and so will
> + *    provide current information regardless of the age of our transaction's
> + *    snapshot of the status array.
> + *    In the future we will likely utilize additional values - perhaps limit
> + *    backend allocation by user/role, etc.
> + * ----------
> + */
> +uint64
> +pgstat_get_all_backend_memory_allocated(void)
> +{
> +    PgBackendStatus *beentry;
> +    int            i;
> +    uint64        all_memory_allocated = 0;
> +
> +    beentry = BackendStatusArray;
> +
> +    /*
> +     * We probably shouldn't get here before shared memory has been set up,
> +     * but be safe.
> +     */
> +    if (beentry == NULL || BackendActivityBuffer == NULL)
> +        return 0;
> +
> +    /*
> +     * We include AUX procs in all backend memory calculation
> +     */
> +    for (i = 1; i <= NumBackendStatSlots; i++)
> +    {
> +        /*
> +         * We use a volatile pointer here to ensure the compiler doesn't try
> +         * to get cute.
> +         */
> +        volatile PgBackendStatus *vbeentry = beentry;
> +        bool        found;
> +        uint64        allocated_bytes = 0;
> +
> +        for (;;)
> +        {
> +            int            before_changecount;
> +            int            after_changecount;
> +
> +            pgstat_begin_read_activity(vbeentry, before_changecount);
> +
> +            /*
> +             * Ignore invalid entries, which may contain invalid data.
> +             * See pgstat_beshutdown_hook()
> +             */
> +            if (vbeentry->st_procpid > 0)
> +                allocated_bytes = vbeentry->allocated_bytes;
> +
> +            pgstat_end_read_activity(vbeentry, after_changecount);
> +
> +            if ((found = pgstat_read_activity_complete(before_changecount,
> +                                                       after_changecount)))
> +                break;
> +
> +            /* Make sure we can break out of loop if stuck... */
> +            CHECK_FOR_INTERRUPTS();
> +        }
> +
> +        if (found)
> +            all_memory_allocated += allocated_bytes;
> +
> +        beentry++;
> +    }
> +
> +    return all_memory_allocated;
> +}
> +
> +/*
> + * Determine if allocation request will exceed max backend memory allowed.
> + * Do not apply to auxiliary processes.
> + */
> +bool
> +exceeds_max_total_bkend_mem(uint64 allocation_request)
> +{
> +    bool        result = false;
> +
> +    /* Exclude auxiliary processes from the check */
> +    if (MyAuxProcType != NotAnAuxProcess)
> +        return result;
> +
> +    /* Convert max_total_bkend_mem to bytes for comparison */
> +    if (max_total_bkend_mem &&
> +        pgstat_get_all_backend_memory_allocated() +
> +        allocation_request > (uint64) max_total_bkend_mem * 1024 * 1024)
> +    {
> +        /*
> +         * Explicitly identify the OOM being a result of this configuration
> +         * parameter vs a system failure to allocate OOM.
> +         */
> +        ereport(WARNING,
> +                errmsg("allocation would exceed max_total_memory limit (%llu > %llu)",
> +                       (unsigned long long) pgstat_get_all_backend_memory_allocated() +
> +                       allocation_request, (unsigned long long) max_total_bkend_mem * 1024 * 1024));
> +
> +        result = true;
> +    }

I think it's completely unfeasible to execute something as expensive as
pgstat_get_all_backend_memory_allocated() on every allocation. Like,
seriously, no.

And we absolutely definitely shouldn't just add CHECK_FOR_INTERRUPT()
calls into the middle of allocator code.

Greetings,

Andres Freund
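For what the suggested two-function split might look like, a minimal
sketch (the names are hypothetical, and the ereport is dropped per the
comment above):

    static inline void
    pgstat_report_allocated_bytes_increase(uint64 allocated_bytes)
    {
        *my_allocated_bytes += allocated_bytes;
    }

    static inline void
    pgstat_report_allocated_bytes_decrease(uint64 allocated_bytes)
    {
        uint64      temp;

        /* clamp at zero instead of wrapping the unsigned counter */
        if (pg_sub_u64_overflow(*my_allocated_bytes, allocated_bytes, &temp))
            *my_allocated_bytes = 0;
        else
            *my_allocated_bytes = temp;
    }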
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Mon, 2023-01-09 at 18:31 -0800, Andres Freund wrote:
> Hi,
>
> On 2023-01-05 13:44:20 -0500, Reid Thompson wrote:
> > This new field displays the current bytes of memory allocated to the
> > backend process. It is updated as memory for the process is
> > malloc'd/free'd. Memory allocated to items on the freelist is included in
> > the displayed value.
>
> It doesn't actually malloc/free. It tracks palloc/pfree.

I will update the message.

> > Dynamic shared memory allocations are included only in the value displayed
> > for the backend that created them, they are not included in the value for
> > backends that are attached to them to avoid double counting.
>
> As mentioned before, I don't think accounting DSM this way makes sense.

Understood. Previously you noted 'There are a few uses of DSMs that track
shared resources, with the biggest likely being the stats for relations
etc'. I'd like to come up with a solution to address this; identifying
the long term allocations to shared state and accounting for them such
that they don't get 'lost' when the allocating backend exits. Any
guidance or direction would be appreciated.

> It doesn't at all seem right to call pgstat_zero_my_allocated_bytes()
> here, before even InitPostmasterChild() is called. Nor does it seem
> right to add the call to so many places.
>
> calling pgstat_zero_my_allocated_bytes() before we do this will lead to
> undercounting memory usage, afaict.

OK - I'll trace back through these and see if I can better locate and
reduce the number of invocations.

> > +/* Enum helper for reporting memory allocated bytes */
> > +enum allocation_direction
> > +{
> > +    PG_ALLOC_DECREASE = -1,
> > +    PG_ALLOC_IGNORE,
> > +    PG_ALLOC_INCREASE,
> > +};
>
> What's the point of this?
>
> > +static inline void
> > +pgstat_report_allocated_bytes(int64 allocated_bytes, int allocation_direction)
>
> I don't think this should take allocation_direction as a parameter, I'd
> make it two different functions.

Originally it was two functions; a suggestion was made in the thread to
maybe consolidate them to a single function with a direction indicator,
hence the above. I'm fine with converting it back to separate functions.

> We certainly shouldn't have an ereport in here. This stuff really needs
> to be cheap.

I will remove the ereport.

> > +        *my_allocated_bytes += (allocated_bytes) * allocation_direction;
>
> Superfluous parens?

I will remove these.

> I think it's completely unfeasible to execute something as expensive as
> pgstat_get_all_backend_memory_allocated() on every allocation. Like,
> seriously, no.

Ok. Do we check every nth allocation/try to implement a scheme of
checking more often as we get closer to the declared
max_total_bkend_mem?

> And we absolutely definitely shouldn't just add CHECK_FOR_INTERRUPT()
> calls into the middle of allocator code.

I'm open to guidance/suggestions/pointers to remedying these.

> Greetings,
>
> Andres Freund

Thanks,
Reid
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi,

On 2023-01-13 09:15:10 -0500, Reid Thompson wrote:
> On Mon, 2023-01-09 at 18:31 -0800, Andres Freund wrote:
> > > Dynamic shared memory allocations are included only in the value displayed
> > > for the backend that created them, they are not included in the value for
> > > backends that are attached to them to avoid double counting.
> >
> > As mentioned before, I don't think accounting DSM this way makes sense.
>
> Understood. Previously you noted 'There are a few uses of DSMs that track
> shared resources, with the biggest likely being the stats for relations
> etc'. I'd like to come up with a solution to address this; identifying
> the long term allocations to shared state and accounting for them such
> that they don't get 'lost' when the allocating backend exits. Any
> guidance or direction would be appreciated.

Tracking it as backend memory usage doesn't seem helpful to me,
particularly because some of it is for server wide data tracking
(pgstats, some caches). But that doesn't mean you couldn't track and
report it separately.

> > I think it's completely unfeasible to execute something as expensive as
> > pgstat_get_all_backend_memory_allocated() on every allocation. Like,
> > seriously, no.
>
> Ok. Do we check every nth allocation/try to implement a scheme of
> checking more often as we get closer to the declared
> max_total_bkend_mem?

I think it's just not acceptable to do O(connections) work as part of
something as critical as memory allocation. Even if amortized, imo.

What you could do is to have a single, imprecise, shared counter for the
total memory allocation, and have a backend-local "allowance". When the
allowance is used up, refill it from the shared counter (a single atomic
op).

But honestly, I think we first need to have the accounting for a while
before it makes sense to go for the memory limiting patch. And I doubt a
single GUC will suffice to make this usable.

> > And we absolutely definitely shouldn't just add CHECK_FOR_INTERRUPT()
> > calls into the middle of allocator code.
>
> I'm open to guidance/suggestions/pointers to remedying these.

Well, just don't have the CHECK_FOR_INTERRUPT(). Nor the O(N) operation.

You also can't do the ereport(WARNING) there, that itself allocates
memory, and could lead to recursion in some edge cases.

Greetings,

Andres Freund
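A minimal sketch of that scheme using PostgreSQL's atomics API; the names
and the 1MB refill size are illustrative assumptions, not anything from a
posted patch:

    #include "port/atomics.h"

    #define ALLOWANCE_BYTES ((uint64) 1024 * 1024)    /* refill granularity */

    static pg_atomic_uint64 *total_bkend_reserved;    /* lives in shared memory */
    static uint64 local_allowance = 0;                /* backend-local, lock-free */

    /* Returns false if the request would push the total over limit_bytes. */
    static bool
    reserve_backend_memory(uint64 request, uint64 limit_bytes)
    {
        uint64      chunk;

        if (request <= local_allowance)
        {
            local_allowance -= request;     /* fast path: no shared access */
            return true;
        }

        /* take a new chunk from the shared counter with one atomic op */
        chunk = Max(request, ALLOWANCE_BYTES);
        if (pg_atomic_add_fetch_u64(total_bkend_reserved, chunk) > limit_bytes)
        {
            pg_atomic_sub_fetch_u64(total_bkend_reserved, chunk);   /* undo */
            return false;
        }
        local_allowance += chunk - request;
        return true;
    }

Deallocations could accumulate into the local allowance and be returned
to the shared counter only past a threshold, keeping the shared counter
imprecise but cheap to maintain.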
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Fri, 6 Jan 2023 at 00:19, Reid Thompson <reid.thompson@crunchydata.com> wrote:
>
> On Tue, 2023-01-03 at 16:22 +0530, vignesh C wrote:
> > ....
> > The patch does not apply on top of HEAD as in [1], please post a
> > rebased patch:
> > ...
> > Regards,
> > Vignesh
>
> Attached is rebased patch, with some updates related to committed changes.

The patch does not apply on top of HEAD as in [1], please post a rebased
patch:

=== Applying patches on top of PostgreSQL commit ID 48880840f18cb75fcaecc77b5e7816b92c27157b ===
=== applying patch ./0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patch
....
patching file src/test/regress/expected/rules.out
Hunk #2 FAILED at 1875.
Hunk #4 FAILED at 2090.
2 out of 4 hunks FAILED -- saving rejects to file src/test/regress/expected/rules.out.rej

[1] - http://cfbot.cputube.org/patch_41_3867.log

Regards,
Vignesh
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Thu, 2023-01-19 at 16:50 +0530, vignesh C wrote:
>
> The patch does not apply on top of HEAD as in [1], please post a
> rebased patch:
>
> Regards,
> Vignesh

rebased patch attached

Thanks,
Reid
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi,

On 2023-01-23 10:48:38 -0500, Reid Thompson wrote:
> On Thu, 2023-01-19 at 16:50 +0530, vignesh C wrote:
> >
> > The patch does not apply on top of HEAD as in [1], please post a
> > rebased patch:
> >
> > Regards,
> > Vignesh
>
> rebased patch attached

I think it's basically still waiting on author, until the O(N) cost is
gone from the overflow limit check.

Greetings,

Andres Freund
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Mon, 2023-01-23 at 12:31 -0800, Andres Freund wrote: > Hi, > > I think it's basically still waiting on author, until the O(N) cost is gone > from the overflow limit check. > > Greetings, > > Andres Freund Yes, just a rebase. There is still work to be done per the discussion earlier in the thread. I do want to follow up and note re palloc/pfree vs malloc/free that the tracking code (0001-Add-tracking-...) is not tracking palloc/pfree but is explicitly tracking malloc/free. Not every palloc/pfree call executes the tracking code, only those where the path followed includes malloc() or free(). Routine palloc() calls fulfilled from the context's freelist/emptyblocks/freeblock/etc and pfree() calls not invoking free() avoid the tracking code. Thanks, Reid
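For illustration, that distinction looks roughly like this (a simplified sketch, not the patch's actual allocator code; freelist_pop and pgstat_report_allocated_bytes_increase are hypothetical names):

#include "postgres.h"

extern void *freelist_pop(MemoryContext context, Size size);	/* hypothetical */
extern void pgstat_report_allocated_bytes_increase(Size size);	/* hypothetical */

static void *
context_alloc(MemoryContext context, Size size)
{
	void	   *chunk;

	/* served from the context's freelist: the tracking code never runs */
	chunk = freelist_pop(context, size);
	if (chunk != NULL)
		return chunk;

	/* only a path that reaches a real malloc() executes the tracking call */
	chunk = malloc(size);
	if (chunk != NULL)
		pgstat_report_allocated_bytes_increase(size);
	return chunk;
}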
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Regarding the shared counter noted here, > What you could do is to have a single, imprecise, shared counter for the total > memory allocation, and have a backend-local "allowance". When the allowance is > used up, refill it from the shared counter (a single atomic op). Is there a preferred or suggested location to put variables like this? Perhaps a current variable to use as a reference? Thanks, Reid
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, On 2023-01-26 15:27:20 -0500, Reid Thompson wrote: > Yes, just a rebase. There is still work to be done per earlier in the > thread. The tests recently started to fail: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3867 > I do want to follow up and note re palloc/pfree vs malloc/free that the > tracking code (0001-Add-tracking-...) is not tracking palloc/pfree but is > explicitely tracking malloc/free. Not every palloc/pfree call executes the > tracking code, only those where the path followed includes malloc() or > free(). Routine palloc() calls fulfilled from the context's > freelist/emptyblocks/freeblock/etc and pfree() calls not invoking free() > avoid the tracking code. Sure, but we create a lot of memory contexts, so that's not a whole lot of comfort. I marked this as waiting on author. Greetings, Andres Freund
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Mon, 2023-02-13 at 16:26 -0800, Andres Freund wrote: > Hi, > > The tests recently started to fail: > > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3867 > > I marked this as waiting on author. > > Greetings, > > Andres Freund Patch has been rebased to master. The memory limiting portion (patch 0002-*) has been refactored to utilize a shared counter for total memory allocation along with backend-local allowances that are initialized at process startup and refilled from the central counter upon being used up. Freed memory is accumulated and returned to the shared counter upon meeting a threshold and/or upon process exit. At this point I have arbitrarily picked 1MB as the initial allowance and return threshold. Thanks, Reid
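Under the same hypothetical names as the earlier sketch, the free side of such a scheme might look like the following. Again, this only illustrates the description above and is not the patch code; the 1MB threshold mirrors the arbitrary value mentioned.

#include "postgres.h"
#include "port/atomics.h"
#include "storage/ipc.h"

#define ALLOWANCE_RETURN_THRESHOLD ((uint64) 1024 * 1024)	/* the arbitrary 1MB */

extern pg_atomic_uint64 *total_bkend_mem_available;	/* shared counter */
extern uint64 local_allowance;			/* unused backend-local allowance */

static uint64 freed_accumulator = 0;

static void
release_backend_memory(uint64 nbytes)
{
	freed_accumulator += nbytes;

	/* return freed bytes in batches: one atomic op per threshold crossing */
	if (freed_accumulator >= ALLOWANCE_RETURN_THRESHOLD)
	{
		pg_atomic_fetch_add_u64(total_bkend_mem_available, freed_accumulator);
		freed_accumulator = 0;
	}
}

/* registered via on_proc_exit(): flush whatever remains at backend exit */
static void
return_allowance_at_exit(int code, Datum arg)
{
	pg_atomic_fetch_add_u64(total_bkend_mem_available,
							freed_accumulator + local_allowance);
}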
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On 2023-03-02 14:41:26 -0500, reid.thompson@crunchydata.com wrote: > Patch has been rebased to master. Quite a few prior review comments seem to not have been addressed. There's not much point in posting new versions without that. I think there's zero chance 0002 can make it into 16. If 0001 is cleaned up, I can see a path.
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Updated patches attached.

====================================================================
pg-stat-activity-backend-memory-allocated
====================================================================

DSM allocations created by a process and not destroyed prior to its exit are considered long-lived and are tracked in global_dsm_allocated_bytes.

Created 2 new system views (see below):

pg_stat_global_memory_allocation view: displays datid, shared_memory_size, shared_memory_size_in_huge_pages, global_dsm_allocated_bytes. shared_memory_size and shared_memory_size_in_huge_pages display the calculated read-only values for these GUCs.

pg_stat_memory_allocation view: migrated allocated_bytes out of the pg_stat_activity view into this view. pg_stat_memory_allocation also contains a breakdown of allocation by allocator type (aset, dsm, generation, slab). The view displays datid, pid, allocated_bytes, aset_allocated_bytes, dsm_allocated_bytes, generation_allocated_bytes, slab_allocated_bytes by process.

Reduced calls to initialize allocation counters by moving the initialization call into InitPostmasterChild.

postgres=# select * from pg_stat_global_memory_allocation;
 datid | shared_memory_size | shared_memory_size_in_huge_pages | global_dsm_allocated_bytes
-------+--------------------+----------------------------------+----------------------------
     5 | 192MB              |                               96 |                    1048576
(1 row)

postgres=# select * from pg_stat_memory_allocation;
 datid |  pid   | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
       | 981842 |          771512 |               771512 |                   0 |                          0 |                    0
       | 981843 |          736696 |               736696 |                   0 |                          0 |                    0
     5 | 981913 |         4274792 |              4274792 |                   0 |                          0 |                    0
       | 981838 |          107216 |               107216 |                   0 |                          0 |                    0
       | 981837 |          123600 |               123600 |                   0 |                          0 |                    0
       | 981841 |          107216 |               107216 |                   0 |                          0 |                    0
(6 rows)

postgres=# select ps.datid, ps.pid, state, application_name, backend_type, pa.*
           from pg_stat_activity ps join pg_stat_memory_allocation pa on (pa.pid = ps.pid)
           order by dsm_allocated_bytes, pa.pid;
 datid |  pid   | state  | application_name |         backend_type         | datid |  pid   | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+--------+------------------+------------------------------+-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
       | 981837 |        |                  | checkpointer                 |       | 981837 |          123600 |               123600 |                   0 |                          0 |                    0
       | 981838 |        |                  | background writer            |       | 981838 |          107216 |               107216 |                   0 |                          0 |                    0
       | 981841 |        |                  | walwriter                    |       | 981841 |          107216 |               107216 |                   0 |                          0 |                    0
       | 981842 |        |                  | autovacuum launcher          |       | 981842 |          771512 |               771512 |                   0 |                          0 |                    0
       | 981843 |        |                  | logical replication launcher |       | 981843 |          736696 |               736696 |                   0 |                          0 |                    0
     5 | 981913 | active | psql             | client backend               |     5 | 981913 |         5390864 |              5382824 |                   0 |                       8040 |                    0
(6 rows)

====================================================================
dev-max-memory
====================================================================

Include shared_memory_size in max_total_backend_memory calculations: max_total_backend_memory is reduced by shared_memory_size at startup. The local allowance is refilled, when consumed, from the global max_total_bkend_mem_bytes_available.

The pg_stat_global_memory_allocation view adds columns max_total_backend_memory_bytes and max_total_bkend_mem_bytes_available. max_total_backend_memory_bytes displays a byte representation of max_total_backend_memory. max_total_bkend_mem_bytes_available tracks the balance of max_total_backend_memory_bytes available to backend processes.

postgres=# select * from pg_stat_global_memory_allocation;
 datid | shared_memory_size | shared_memory_size_in_huge_pages | max_total_backend_memory_bytes | max_total_bkend_mem_bytes_available | global_dsm_allocated_bytes
-------+--------------------+----------------------------------+--------------------------------+-------------------------------------+----------------------------
     5 | 192MB              |                               96 |                     2147483648 |                          1874633712 |                    5242880
(1 row)

postgres=# select * from pg_stat_memory_allocation;
 datid |  pid   | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
       | 534528 |          812472 |               812472 |                   0 |                          0 |                    0
       | 534529 |          736696 |               736696 |                   0 |                          0 |                    0
     5 | 556271 |         4458088 |              4458088 |                   0 |                          0 |                    0
     5 | 534942 |         1298680 |              1298680 |                   0 |                          0 |                    0
     5 | 709283 |         7985464 |              7985464 |                   0 |                          0 |                    0
     5 | 718693 |         8809240 |              8612504 |              196736 |                          0 |                    0
     5 | 752113 |        25803192 |             25803192 |                   0 |                          0 |                    0
     5 | 659886 |         9042232 |              9042232 |                   0 |                          0 |                    0
       | 534525 |         2491088 |              2491088 |                   0 |                          0 |                    0
       | 534524 |         4465360 |              4465360 |                   0 |                          0 |                    0
       | 534527 |          107216 |               107216 |                   0 |                          0 |                    0
(11 rows)

postgres=# select ps.datid, ps.pid, state, application_name, backend_type, pa.*
           from pg_stat_activity ps join pg_stat_memory_allocation pa on (pa.pid = ps.pid)
           order by dsm_allocated_bytes, pa.pid;
 datid |  pid   | state  | application_name |         backend_type         | datid |  pid   | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+--------+------------------+------------------------------+-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
       | 534524 |        |                  | checkpointer                 |       | 534524 |         4465360 |              4465360 |                   0 |                          0 |                    0
       | 534525 |        |                  | background writer            |       | 534525 |         2491088 |              2491088 |                   0 |                          0 |                    0
       | 534527 |        |                  | walwriter                    |       | 534527 |          107216 |               107216 |                   0 |                          0 |                    0
       | 534528 |        |                  | autovacuum launcher          |       | 534528 |          812472 |               812472 |                   0 |                          0 |                    0
       | 534529 |        |                  | logical replication launcher |       | 534529 |          736696 |               736696 |                   0 |                          0 |                    0
     5 | 534942 | idle   | psql             | client backend               |     5 | 534942 |         1298680 |              1298680 |                   0 |                          0 |                    0
     5 | 556271 | active | psql             | client backend               |     5 | 556271 |         4866576 |              4858536 |                   0 |                       8040 |                    0
     5 | 659886 | active |                  | autovacuum worker            |     5 | 659886 |         8993080 |              8993080 |                   0 |                          0 |                    0
     5 | 709283 | active |                  | autovacuum worker            |     5 | 709283 |         7928120 |              7928120 |                   0 |                          0 |                    0
     5 | 752113 | active |                  | autovacuum worker            |     5 | 752113 |        27935608 |             27935608 |                   0 |                          0 |                    0
     5 | 718693 | active | psql             | client backend               |     5 | 718693 |         8669976 |              8473240 |              196736 |                          0 |                    0
(11 rows)
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Updated patches attached. Rebased to current master. Added additional columns to pg_stat_global_memory_allocation to summarize backend allocations by type. Updated documentation. Corrected some issues noted in review by John Morris. Added EXEC_BACKEND-related code for the dev-max-memory branch.
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Wed, 2023-04-19 at 23:28 +0000, Arne Roland wrote: > Thank you! I just tried our benchmark and got a performance degradation of around 28 %, which is way better than the last patch. > > The simple query select * from generate_series(0, 10000000) shows roughly 18.9 % degradation on my test server. > > By raising initial_allocation_allowance and allocation_allowance_refill_qty I can get it to 16 % degradation. So most of the degradation seems to be independent from raising the allowance. > > I think we probably should investigate this further. > > Regards > Arne Hi Arne, Thanks for the feedback. I'm planning to look at this. Is your benchmark something that I could utilize? I.e., is it a set of scripts or a standard test from somewhere that I can duplicate? Thanks, Reid
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Wed, 2023-05-17 at 23:07 -0400, reid.thompson@crunchydata.com wrote: > Thanks for the feedback. > > I'm planning to look at this. > > Is your benchmark something that I could utilize? I.e., is it a set of > scripts or a standard test from somewhere that I can duplicate? > > Thanks, > Reid Hi Arne, Followup to the above. I experimented on my system regarding "The simple query select * from generate_series(0, 10000000) shows roughly 18.9 % degradation on my test server." My laptop: 32GB ram; 11th Gen Intel(R) Core(TM) i7-11850H, 8 cores/16 threads @ 2.50GHz (Max Turbo Frequency 4.80 GHz; Cache 24 MB); SSD -> Model: KXG60ZNV1T02 NVMe KIOXIA 1024GB (nvme). I updated to latest master and rebased my patch branches. I wrote a script to check out, build, install, init, and start up master, patch 1, and patch 1+2 (as master, pg-stats-memory, dev-max-memory, and dev-max-memory-unset), configured with ../../configure --silent --prefix=/home/rthompso/src/git/postgres/install/${dir} --with-openssl --with-tcl --with-tclconfig=/usr/lib/tcl8.6 --with-perl --with-libxml --with-libxslt --with-python --with-gssapi --with-systemd --with-ldap --enable-nls where $dir is one of master, pg-stats-memory, dev-max-memory, dev-max-memory-unset. The only change made to the default postgresql.conf was to have the script add to the dev-max-memory instance the line "max_total_backend_memory = 2048" before startup. I did find one change in patch 2 that I pushed back into patch 1; this should only impact the pg-stats-memory instance. My .psqlrc turns timing on. I created a script where I can pass two instances to be compared. It invokes psql -At -d postgres $connstr -P pager=off -c 'select * from generate_series(0, 10000000)' 100 times on each of the 2 instances and calculates the AVG time and SD for the 100 runs. It then uses the AVG from each instance to calculate the percentage difference. Depending on the instance, my results differ from master by anywhere from negligible to ~5.9%. Comparing master to itself had up to a ~2% variation. See below.
In each table below, the two middle columns are the average runtime for each instance over its 100 runs, and the last column is the percentage difference between those averages.

------------------------
12 runs comparing dev-max-memory 2048 VS master
Shows ~3.1% to 5.9% difference

 run | dev-max-memory 2048 |  master | % difference
-----+---------------------+---------+--------------
   1 |             1307.14 | 1240.74 |      5.21218
   2 |             1315.99 | 1245.64 |      5.4926
   3 |             1317.39 | 1265.33 |      4.03141
   4 |             1313.52 | 1256.69 |      4.42221
   5 |             1329.98 | 1253.75 |      5.90077
   6 |             1314.47 | 1245.6  |      5.38032
   7 |             1309.7  | 1258.55 |      3.98326
   8 |             1322.16 | 1248.94 |      5.69562
   9 |             1320.15 | 1261.41 |      4.55074
  10 |             1345.22 | 1280.96 |      4.8938
  11 |             1296.03 | 1257.06 |      3.05277
  12 |             1319.5  | 1252.34 |      5.22272

----------------------------
12 runs comparing dev-max-memory-unset VS master
Shows ~1.3% to 5.4% difference

 run | dev-max-memory unset |  master | % difference
-----+----------------------+---------+--------------
   1 |              1300.93 | 1235.12 |      5.18996
   2 |              1293.57 | 1263.93 |      2.31789
   3 |              1303.05 | 1258.11 |      3.50935
   4 |              1302.14 | 1256.51 |      3.56672
   5 |              1299.22 | 1282.74 |      1.27655
   6 |              1334.06 | 1263.77 |      5.41144
   7 |              1319.92 | 1262.35 |      4.45887
   8 |              1318.01 | 1257.16 |      4.7259
   9 |              1316.88 | 1257.63 |      4.60282
  10 |              1320.33 | 1282.12 |      2.93646
  11 |              1306.91 | 1246.12 |      4.76218
  12 |              1320.65 | 1258.78 |      4.79718

-------------------------------
12 runs comparing pg-stat-activity-backend-memory-allocated VS master
Shows ~-0.2% to 2.6% difference

 run | pg-stat-activity-backend-memory-allocated |  master | % difference
-----+--------------------------------------------+---------+--------------
   1 |                                    1252.65 | 1245.36 |     0.583665
   2 |                                    1294.75 | 1277.55 |     1.33732
   3 |                                    1264.11 | 1257.57 |     0.518702
   4 |                                    1267.44 | 1251.31 |     1.28079
   5 |                                    1270.05 | 1250.1  |     1.58324
   6 |                                    1298.92 | 1265.04 |     2.64279
   7 |                                    1280.99 | 1263.51 |     1.37394
   8 |                                    1273.23 | 1275.53 |    -0.18048
   9 |                                    1261.2  | 1263.04 |    -0.145786
  10 |                                    1289.73 | 1289.02 |     0.0550654
  11 |                                    1287.57 | 1279.42 |     0.634985
  12 |                                    1272.01 | 1259.22 |     1.01058

----------------------------------
I also did 12 runs master VS master
Shows ~-2.2% to 1.3% difference

 run |  master |  master | % difference
-----+---------+---------+--------------
   1 | 1239.6  | 1263.73 |    -1.92783
   2 | 1253.82 | 1252.5  |     0.105334
   3 | 1256.05 | 1258.97 |    -0.232205
   4 | 1264.8  | 1248.94 |     1.26186
   5 | 1265.08 | 1275.43 |    -0.814797
   6 | 1260.95 | 1288.81 |    -2.1853
   7 | 1260.46 | 1252.86 |     0.604778
   8 | 1253.49 | 1255.25 |    -0.140309
   9 | 1277.5  | 1267.42 |     0.792166
  10 | 1266.2  | 1283.12 |    -1.32741
  11 | 1245.78 | 1246.78 |    -0.0802388
  12 | 1255.15 | 1276.73 |    -1.70466
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Mon, 2023-05-22 at 08:42 -0400, reid.thompson@crunchydata.com wrote: > On Wed, 2023-05-17 at 23:07 -0400, reid.thompson@crunchydata.com wrote: > > Thanks for the feedback. > > > > I'm planning to look at this. > > > > Is your benchmark something that I could utilize? I.e., is it a set of > > scripts or a standard test from somewhere that I can duplicate? > > > > Thanks, > > Reid > > Attach patches updated to master. Pulled from patch 2 back to patch 1 a change that was also pertinent to patch 1.
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Mon, 2023-05-22 at 08:42 -0400, reid.thompson@crunchydata.com wrote: More followup to the above. > > I experimented on my system regarding > "The simple query select * from generate_series(0, 10000000) shows roughly 18.9 % degradation on my test server." > > My laptop: > 32GB ram > 11th Gen Intel(R) Core(TM) i7-11850H 8 cores/16 threads @ 2.50GHz (Max Turbo Frequency 4.80 GHz; Cache 24 MB) > SSD -> Model: KXG60ZNV1T02 NVMe KIOXIA 1024GB (nvme)

Hi, I ran through a few more tests on my system, varying initial_allocation_allowance and allocation_allowance_refill_qty from the current 1MB to 2, 4, 6, 8, and 10MB. I also realized that in my last tests/email I had posted percent difference rather than percent change (percent change = (test - master) / master * 100; percent difference = |test - master| / (the mean of the two) * 100). It turns out that for numbers this close together they're essentially the same, but I'm providing both for this set of tests. Ten runs for each comparison. Compared dev-max-memory set, dev-max-memory unset, master, and pg-stat-activity-backend-memory-allocated against master at each allocation value. Again, the test invokes psql -At -d postgres $connstr -P pager=off -c 'select * from generate_series(0, 10000000)' 100 times on each of the 2 instances and calculates the AVG time and SD for the 100 runs. It then uses the AVG from each instance to calculate the percentage difference/change. These tests contain one code change not yet pushed to pgsql-hackers: in AllocSetReset(), do not enter pgstat_report_allocated_bytes_decrease if no memory has been freed. Will format and post some pgbench test results in a separate email.

Percent difference (in each table below, rows 1-10 are the ten runs at each allowance size; "sum" and "avg" are the column total and total/10):

Results: difference-dev-max-memory-set VS master

  run |       1MB |       2MB |       4MB |       6MB |       8MB |      10MB
 -----+-----------+-----------+-----------+-----------+-----------+-----------
    1 |    4.2263 |   3.03961 | 0.0585808 |   2.92451 |   3.34694 |   2.67771
    2 |   3.55709 |   3.92339 |   2.29144 |    3.2156 |   2.06153 |   2.86217
    3 |   2.04389 |   2.91866 |   3.73463 |   2.86161 |   3.60992 |   3.07293
    4 |    3.1306 |   3.64773 |   2.38063 |   1.84845 |   4.87375 |   4.16953
    5 |   3.12556 |   3.34537 |   2.99052 |   2.60538 |   2.14825 |   1.95454
    6 |   2.20615 |   2.12861 |   2.85282 |   2.43336 |   2.31389 |   3.21563
    7 |    1.9954 |   3.61371 |   3.35543 |   3.49821 |   3.41526 |   8.25753
    8 |   2.46845 |   2.57784 |   3.13067 |   3.67681 |   2.89139 |    3.6067
    9 |   3.60092 |   2.16164 |    3.9976 |    2.6144 |   4.27892 |   2.68998
   10 |   2.55454 |   2.39073 |   3.09631 |   3.24292 |    1.9107 |   1.76182
  sum |   28.9089 |  29.74729 | 27.888631 |  28.92125 |  30.85055 |  34.26854
  avg |   2.89089 |  2.974729 | 2.7888631 |  2.892125 |  3.085055 |  3.426854
Results: difference-dev-max-memory-unset VS master

  run |       1MB |       2MB |       4MB |       6MB |       8MB |      10MB
 -----+-----------+-----------+-----------+-----------+-----------+-----------
    1 |   3.96616 |   3.05528 |  0.563267 |   1.12075 |   3.52398 |   3.25641
    2 |   3.11387 |   3.12499 |    1.1133 |   4.86997 |   2.11481 |   1.11668
    3 |   3.14506 |   2.06193 |   3.36034 |   2.80644 |   2.37822 |   3.07669
    4 |   2.81052 |   3.18499 |   2.70705 |   2.27847 |   2.78506 |   3.02919
    5 |    2.9765 |   3.44165 |   2.62039 |   4.61596 |   2.27937 |   3.89676
    6 |     3.201 |   1.35838 |   2.40578 |   3.95695 |   2.25983 |   4.17585
    7 |   5.35191 |   3.96434 |   4.32891 |   3.62715 |   2.17503 |  0.620856
    8 |   3.44241 |    2.9754 |   3.03765 |   1.48104 |   1.53958 |   3.14598
    9 |   10.1155 |   4.21062 |   1.64416 |   1.51458 |   2.92131 |   2.95603
   10 |   3.11011 |   4.31318 |   2.01991 |   4.71192 |   2.37039 |   4.25241
  sum |  41.23304 |  31.69076 | 23.800757 |  30.98323 |  24.34758 | 29.526856
  avg |  4.123304 |  3.169076 | 2.3800757 |  3.098323 |  2.434758 | 2.9526856

Results: difference-master VS master

  run |       1MB |       2MB |       4MB |       6MB |       8MB |      10MB
 -----+-----------+-----------+-----------+-----------+-----------+-----------
    1 | 0.0734782 | 0.0955457 | 0.0521627 |   2.32643 |  0.286493 |   1.26977
    2 |  0.547862 |   1.19087 |  0.276915 |  0.334332 |  0.260545 |  0.108956
    3 | 0.0714666 |  0.931605 |  0.753996 |  0.457174 |  0.215904 |   1.43979
    4 |  0.269737 |  0.848613 |  0.222909 |  0.315927 |  0.290408 |  0.248591
    5 |   1.04231 |  0.367444 |  0.699571 |   0.29266 |  0.844548 |  0.273776
    6 | 0.0584984 |   0.15094 | 0.0721539 |  0.594991 |   1.80223 |  0.500557
    7 |  0.355129 |   1.19517 |  0.201835 |    1.2351 |  0.266004 |   0.80893
    8 | 0.0811794 |   1.16184 |   1.01913 |  0.149087 |  0.402931 |  0.125788
    9 |  0.950973 |  0.154471 |   0.42623 |  0.874816 |  0.157934 |  0.225433
   10 |  0.501783 |  0.308357 |  0.279147 |  0.122458 |  0.538141 |  0.865846
  sum |  3.952417 |  6.404856 |   4.00405 |  6.702975 |  5.065138 |  5.867437
  avg | 0.3952417 | 0.6404856 |  0.400405 | 0.6702975 | 0.5065138 | 0.5867437

Results: difference-pg-stat-activity-backend-memory-allocated VS master

  run |       1MB |       2MB |       4MB |       6MB |       8MB |      10MB
 -----+-----------+-----------+-----------+-----------+-----------+-----------
    1 |   2.04788 |   0.50705 |  0.504772 |  0.136316 |  0.590087 |   1.33931
    2 |   1.21173 |    0.3309 |  0.482685 |   1.67956 |  0.175478 |  0.969286
    3 | 0.0680972 |  0.295211 |  0.867547 |   1.12959 |  0.193756 |  0.714178
    4 |   0.91525 |   1.42408 |   1.49059 |  0.641652 |   1.34265 |  0.378394
    5 |   2.46448 |   2.67081 |   0.63824 |  0.650301 |  0.481858 |   1.65711
    6 |   1.31021 | 0.0548831 |   1.23217 |   2.11691 |   0.31629 |   3.85858
    7 |   1.61458 |   0.46042 |  0.724742 |  0.172952 |   1.33157 |  0.556898
    8 |   1.65063 |   0.59815 |   1.42473 |  0.725576 |  0.229639 |  0.875489
    9 |   1.78567 |   1.45652 |    0.6317 |   1.99146 |  0.999521 |   1.85291
   10 |  0.391318 |   1.13216 |  0.138291 |  0.531084 |  0.680197 |   1.63162
  sum | 13.459845 |  8.930184 |  8.135467 |  9.775401 |  6.341046 |  13.83377
  avg | 1.3459845 | 0.8930184 | 0.8135467 | 0.9775401 | 0.6341046 | 1.3833775

Percent change:

Results: change-dev-max-memory-set VS master

  run |       1MB |       2MB |       4MB |       6MB |       8MB |      10MB
 -----+-----------+-----------+-----------+-----------+-----------+-----------
    1 |   4.13884 |   2.99411 | 0.0585636 |   2.88237 |   3.29185 |   2.64233
    2 |   3.49493 |   3.84791 |   2.26549 |   3.16472 |    2.0405 |   2.82179
    3 |   2.02322 |   2.87668 |   3.66617 |   2.82124 |   3.54592 |   3.02643
    4 |   3.08235 |    3.5824 |   2.35263 |   1.83153 |   4.75781 |   4.08438
    5 |   3.07746 |   3.29033 |   2.94646 |   2.57188 |   2.12542 |   1.93562
    6 |   2.18208 |   2.10619 |    2.8127 |   2.40411 |   2.28743 |   3.16474
    7 |   1.97569 |   3.54957 |   3.30007 |   3.43808 |   3.35792 |   7.93011
    8 |   2.43836 |   2.54504 |   3.08242 |   3.61044 |   2.85019 |   3.54281
    9 |   3.53724 |   2.13852 |   3.91926 |   2.58067 |   4.18929 |   2.65428
   10 |   2.52233 |   2.36249 |    3.0491 |   3.19118 |   1.89262 |   1.74644
  sum |   28.4725 |  29.29324 | 27.452864 |  28.49622 |  30.33895 |  33.54893
  avg |   2.84725 |  2.929324 | 2.7452864 |  2.849622 |  3.033895 |  3.354893

Results: change-dev-max-memory-unset VS master

  run |       1MB |       2MB |       4MB |       6MB |       8MB |      10MB
 -----+-----------+-----------+-----------+-----------+-----------+-----------
    1 |   3.88903 |   3.00931 |  0.564858 |   1.11451 |   3.46296 |   3.20424
    2 |   3.06613 |   3.07691 |   1.10714 |   4.75421 |   2.09268 |   1.11048
    3 |   3.09637 |   2.04089 |   3.30482 |    2.7676 |   2.35028 |   3.03008
    4 |   2.77157 |   3.13506 |    2.6709 |    2.2528 |   2.74681 |     2.984
    5 |   2.93285 |   3.38343 |    2.5865 |   4.51183 |   2.25368 |   3.82229
    6 |   3.15057 |   1.34921 |   2.37719 |   3.88018 |   2.23458 |   4.09044
    7 |   5.21243 |   3.88728 |   4.23719 |   3.56254 |   2.15163 |   0.62279
    8 |   3.38416 |   2.93178 |   2.99221 |   1.47015 |   1.52782 |   3.09726
    9 |   10.6543 |    4.1238 |   1.63075 |    1.5032 |   2.87926 |   2.91298
   10 |   3.06248 |   4.22213 |   1.99972 |   4.60347 |   2.34263 |   4.16388
  sum |  41.21989 |   31.1598 | 23.471278 |  30.42049 |  24.04233 |  29.03844
  avg |  4.121989 |   3.11598 | 2.3471278 |  3.042049 |  2.404233 |  2.903844

Results: change-master VS master

  run |       1MB |       2MB |       4MB |       6MB |       8MB |      10MB
 -----+-----------+-----------+-----------+-----------+-----------+-----------
    1 | 0.0734512 |    0.0955 | 0.0521763 |   2.35381 |  0.286904 |   1.27789
    2 |  0.549367 |   1.18382 |  0.276532 |  0.333774 |  0.260206 |  0.108897
    3 | 0.0714411 |  0.927286 |  0.751164 |  0.456132 |  0.216137 |    1.4295
    4 |  0.269374 |  0.845028 |  0.222661 |  0.315429 |   0.29083 |    0.2489
    5 |    1.0369 |  0.368121 |  0.702026 |  0.292232 |  0.840997 |  0.273402
    6 | 0.0584813 |  0.151054 |   0.07218 |  0.596766 |   1.78613 |  0.499307
    7 |  0.355761 |   1.18807 |  0.201631 |   1.22752 |  0.265651 |  0.805671
    8 | 0.0812124 |   1.16863 |   1.02435 |  0.149198 |  0.402121 |  0.125709
    9 |  0.955516 |  0.154351 |  0.425324 |  0.871006 |  0.158059 |  0.225179
   10 |  0.500527 |  0.307882 |  0.278758 |  0.122533 |  0.539593 |  0.862113
  sum |  3.952031 |  6.389742 |  4.006802 |    6.7184 |  5.046628 |  5.856568
  avg | 0.3952031 | 0.6389742 | 0.4006802 |   0.67184 | 0.5046628 | 0.5856568

Results: change-pg-stat-activity-backend-memory-allocated VS master

  run |       1MB |       2MB |       4MB |       6MB |       8MB |      10MB
 -----+-----------+-----------+-----------+-----------+-----------+-----------
    1 |   2.02713 |  0.505768 |  0.506049 |  0.136223 |  0.591833 |    1.3304
    2 |   1.20444 |  0.331448 |  0.481523 |   1.66557 |  0.175325 |  0.974006
    3 |  0.068074 |  0.294776 |    0.8638 |   1.12325 |  0.193568 |  0.711637
    4 |   0.91108 |   1.41401 |   1.47956 |    0.6396 |   1.33369 |  0.377679
    5 |   2.43448 |   2.63562 |  0.636209 |  0.648194 |    0.4807 |   1.64349
    6 |   1.30168 |  0.054868 |   1.22463 |   2.09474 |  0.316791 |   3.93449
    7 |   1.60165 |  0.461483 |  0.722126 |  0.173102 |   1.32277 |  0.555352
    8 |   1.63712 |  0.599944 |   1.41466 |  0.722953 |  0.229375 |  0.871673
    9 |   1.76986 |   1.44599 |  0.629711 |   1.97183 |   0.99455 |    1.8359
   10 |  0.392085 |   1.12579 |  0.138195 |  0.532498 |  0.677892 |   1.61841
  sum | 13.347599 |  8.869697 |  8.096463 |   9.70796 |  6.316494 | 13.853037
  avg | 1.3347599 | 0.8869697 | 0.8096463 |  0.970796 | 0.6316494 |  1.385303
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote: > Attach patches updated to master. > Pulled from patch 2 back to patch 1 a change that was also pertinent to patch 1. +1 to the idea; I have doubts about the implementation. I have a question. I see the feature triggers an ERROR on exceeding the memory limit. The superior PG_CATCH() section will handle the error. As I see it, many such sections use memory allocations. What if some routine, like CopyErrorData(), exceeds the limit too? In this case, we could repeat the error until the top PG_CATCH(). Is this correct behaviour? Maybe exceeds_max_total_bkend_mem() should check for recursion and allow error handlers to slightly exceed this hard limit? Also, the patch needs to be rebased. -- regards, Andrey Lepikhov Postgres Professional
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On 29/9/2023 09:52, Andrei Lepikhov wrote: > On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote: >> Attach patches updated to master. >> Pulled from patch 2 back to patch 1 a change that was also pertinent >> to patch 1. > +1 to the idea; I have doubts about the implementation. > > I have a question. I see the feature triggers an ERROR on exceeding > the memory limit. The superior PG_CATCH() section will handle the error. > As I see it, many such sections use memory allocations. What if some > routine, like CopyErrorData(), exceeds the limit too? In this case, > we could repeat the error until the top PG_CATCH(). Is this correct > behaviour? Maybe exceeds_max_total_bkend_mem() should check for > recursion and allow error handlers to slightly exceed this hard limit? By the patch in attachment I try to show which sort of problems I'm worrying about. In some PG_CATCH() sections we do CopyErrorData() (allocating some memory) before aborting the transaction. So, the allocation error can move us out of this section before aborting. We await a soft ERROR message but will face harder consequences. -- regards, Andrey Lepikhov Postgres Professional
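For readers following along, the hazardous pattern looks roughly like this (a hypothetical demo in the spirit of the attached patch, not the patch itself; do_something_that_allocates is a stand-in):

#include "postgres.h"
#include "utils/memutils.h"

extern void do_something_that_allocates(void);	/* hypothetical */

static void
demo_catch_section(void)
{
	MemoryContext oldcontext = CurrentMemoryContext;

	PG_TRY();
	{
		do_something_that_allocates();	/* may hit the memory limit */
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		MemoryContextSwitchTo(oldcontext);

		/*
		 * CopyErrorData() itself allocates.  With the backend already at
		 * the limit, this can raise the out-of-memory ERROR again and
		 * jump out of this handler before the first error is dealt with.
		 */
		edata = CopyErrorData();
		FlushErrorState();

		/* ... inspect edata, then recover or PG_RE_THROW() ... */
		FreeErrorData(edata);
	}
	PG_END_TRY();
}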
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Greetings, * Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote: > On 29/9/2023 09:52, Andrei Lepikhov wrote: > > On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote: > > > Attach patches updated to master. > > > Pulled from patch 2 back to patch 1 a change that was also pertinent > > > to patch 1. > > +1 to the idea; I have doubts about the implementation. > > > > I have a question. I see the feature triggers an ERROR on exceeding > > the memory limit. The superior PG_CATCH() section will handle the error. > > As I see it, many such sections use memory allocations. What if some > > routine, like CopyErrorData(), exceeds the limit too? In this case, > > we could repeat the error until the top PG_CATCH(). Is this correct > > behaviour? Maybe exceeds_max_total_bkend_mem() should check for > > recursion and allow error handlers to slightly exceed this hard limit? > By the patch in attachment I try to show which sort of problems I'm worrying > about. In some PG_CATCH() sections we do CopyErrorData() (allocating some > memory) before aborting the transaction. So, the allocation error can move > us out of this section before aborting. We await a soft ERROR message but > will face harder consequences. While it's an interesting idea to consider making exceptions to the limit, and perhaps we'll do that (or have some kind of 'reserve' for such cases), this isn't really any different than today, is it? We might have a malloc() failure in the main path, end up in PG_CATCH() and then try to do a CopyErrorData() and have another malloc() failure. If we can rearrange the code to make this less likely to happen, by doing a bit more work to free() resources used in the main path before trying to do new allocations, then, sure, let's go ahead and do that, but that's independent from this effort. Thanks! Stephen
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On 19/10/2023 02:00, Stephen Frost wrote: > Greetings, > > * Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote: >> On 29/9/2023 09:52, Andrei Lepikhov wrote: >>> On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote: >>>> Attach patches updated to master. >>>> Pulled from patch 2 back to patch 1 a change that was also pertinent >>>> to patch 1. >>> +1 to the idea; I have doubts about the implementation. >>> >>> I have a question. I see the feature triggers an ERROR on exceeding >>> the memory limit. The superior PG_CATCH() section will handle the error. >>> As I see it, many such sections use memory allocations. What if some >>> routine, like CopyErrorData(), exceeds the limit too? In this case, >>> we could repeat the error until the top PG_CATCH(). Is this correct >>> behaviour? Maybe exceeds_max_total_bkend_mem() should check for >>> recursion and allow error handlers to slightly exceed this hard limit? > >> By the patch in attachment I try to show which sort of problems I'm worrying >> about. In some PG_CATCH() sections we do CopyErrorData() (allocating some >> memory) before aborting the transaction. So, the allocation error can move >> us out of this section before aborting. We await a soft ERROR message but >> will face harder consequences. > > While it's an interesting idea to consider making exceptions to the > limit, and perhaps we'll do that (or have some kind of 'reserve' for > such cases), this isn't really any different than today, is it? We > might have a malloc() failure in the main path, end up in PG_CATCH() and > then try to do a CopyErrorData() and have another malloc() failure. > > If we can rearrange the code to make this less likely to happen, by > doing a bit more work to free() resources used in the main path before > trying to do new allocations, then, sure, let's go ahead and do that, > but that's independent from this effort. I agree that rearranging efforts can be made independently. The code in the letter above was shown just as a demo of the case I'm worried about. IMO, the thing that should be implemented here is a recursion level for the memory limit. If, while processing the error, we fall into recursion with this limit, we should ignore it. I imagine custom extensions that use PG_CATCH() and allocate some data there. At least we can raise the level of error to FATAL. -- regards, Andrey Lepikhov Postgres Professional
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Greetings, * Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote: > On 19/10/2023 02:00, Stephen Frost wrote: > > * Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote: > > > On 29/9/2023 09:52, Andrei Lepikhov wrote: > > > > On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote: > > > > > Attach patches updated to master. > > > > > Pulled from patch 2 back to patch 1 a change that was also pertinent > > > > > to patch 1. > > > > +1 to the idea; I have doubts about the implementation. > > > > > > > > I have a question. I see the feature triggers an ERROR on exceeding > > > > the memory limit. The superior PG_CATCH() section will handle the error. > > > > As I see it, many such sections use memory allocations. What if some > > > > routine, like CopyErrorData(), exceeds the limit too? In this case, > > > > we could repeat the error until the top PG_CATCH(). Is this correct > > > > behaviour? Maybe exceeds_max_total_bkend_mem() should check for > > > > recursion and allow error handlers to slightly exceed this hard limit? > > > > > By the patch in attachment I try to show which sort of problems I'm worrying > > > about. In some PG_CATCH() sections we do CopyErrorData() (allocating some > > > memory) before aborting the transaction. So, the allocation error can move > > > us out of this section before aborting. We await a soft ERROR message but > > > will face harder consequences. > > > > While it's an interesting idea to consider making exceptions to the > > limit, and perhaps we'll do that (or have some kind of 'reserve' for > > such cases), this isn't really any different than today, is it? We > > might have a malloc() failure in the main path, end up in PG_CATCH() and > > then try to do a CopyErrorData() and have another malloc() failure. > > > > If we can rearrange the code to make this less likely to happen, by > > doing a bit more work to free() resources used in the main path before > > trying to do new allocations, then, sure, let's go ahead and do that, > > but that's independent from this effort. > > I agree that rearranging efforts can be made independently. The code in the > letter above was shown just as a demo of the case I'm worried about. > IMO, the thing that should be implemented here is a recursion level for the > memory limit. If, while processing the error, we fall into recursion with this > limit, we should ignore it. > I imagine custom extensions that use PG_CATCH() and allocate some data > there. At least we can raise the level of error to FATAL. Ignoring such would defeat much of the point of this effort- which is to get to a position where we can say with some confidence that we're not going to go over some limit that the user has set and therefore not allow ourselves to end up getting OOM killed. These are all the same issues that already exist today on systems which don't allow overcommit too, there isn't anything new here in regards to these risks, so I'm not really keen to complicate this to deal with issues that are already there. Perhaps once we've got the basics in place then we could consider reserving some space for handling such cases.. but I don't think it'll actually be very clean and what if we have an allocation that goes beyond what that reserved space is anyway? Then we're in the same spot again where we have the choice of either failing the allocation in a less elegant way than we might like to handle that error, or risk getting outright kill'd by the kernel. Of those choices, sure seems like failing the allocation is the better way to go.
Thanks, Stephen
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, On 2023-10-19 18:06:10 -0400, Stephen Frost wrote: > Ignoring such would defeat much of the point of this effort- which is to > get to a position where we can say with some confidence that we're not > going to go over some limit that the user has set and therefore not > allow ourselves to end up getting OOM killed. I think that is a good medium to long term goal. I do however think that we'd be better off merging the visibility of memory allocations soon-ish and implement the limiting later. There's a lot of hairy details to get right for the latter, and even just having visibility will be a huge improvement. I think even patch 1 is doing too much at once. I doubt the DSM stuff is quite right. I'm unconvinced it's a good idea to split the different types of memory contexts out. That just exposes too much implementation detail stuff without a good reason. I think the overhead even just the tracking implies right now is likely too high and needs to be optimized. It should be a single math operation, not tracking things in multiple fields. I don't think pg_sub_u64_overflow() should be in the path either, that suddenly adds conditional branches. You really ought to look at the difference in assembly for the hot functions. Greetings, Andres Freund
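To illustrate the hot-path concern, compare two possible shapes for the decrement side of the accounting (the field and function names here are hypothetical; pg_sub_u64_overflow() is the real helper being referred to above):

#include "postgres.h"
#include "common/int.h"

static uint64 aset_allocated_bytes;	/* one of several per-type fields */
static int64 allocated_bytes;		/* a single field, signed so debug builds can check it */

/* per-type and overflow-checked: a subtract plus a conditional branch per call */
static inline void
account_free_checked(uint64 nbytes)
{
	uint64		result;

	if (pg_sub_u64_overflow(aset_allocated_bytes, nbytes, &result))
		result = 0;				/* clamp on underflow */
	aset_allocated_bytes = result;
}

/* a single unconditional math operation on one field */
static inline void
account_free_plain(int64 nbytes)
{
	allocated_bytes -= nbytes;
	Assert(allocated_bytes >= 0);	/* compiled out of production builds */
}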
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2023-10-19 18:06:10 -0400, Stephen Frost wrote: > > Ignoring such would defeat much of the point of this effort- which is to > > get to a position where we can say with some confidence that we're not > > going to go over some limit that the user has set and therefore not > > allow ourselves to end up getting OOM killed. > > I think that is a good medium to long term goal. I do however think that we'd > be better off merging the visibility of memory allocations soon-ish and > implement the limiting later. There's a lot of hairy details to get right for > the latter, and even just having visibility will be a huge improvement. I agree that having the visibility will be a great improvement and perhaps could go in separately, but I don't know that I agree that the limits are going to be that much of an issue. In any case, there's been work ongoing on this and that'll be posted soon. I was just trying to address the general comment raised in this sub-thread here. > I think even patch 1 is doing too much at once. I doubt the DSM stuff is > quite right. Getting DSM right has certainly been tricky, along with other things, but we've been working towards, and continue to work towards, getting everything to line up nicely between memory context allocations of various types and the amounts which are being seen as malloc'd/free'd. There's been parts of this also reworked to allow us to see per-backend reservations as well as total reserved and to get those numbers able to be matched up inside of a given transaction using the statistics system. > I'm unconvinced it's a good idea to split the different types of memory > contexts out. That just exposes too much implementation detail stuff without a > good reason. DSM needs to be independent anyway ... as for the others, perhaps we could combine them, though that's pretty easily done later and for now it's been useful to see them split out as we've been working on the patch. > I think the overhead even just the tracking implies right now is likely too > high and needs to be optimized. It should be a single math operation, not > tracking things in multiple fields. I don't think pg_sub_u64_overflow() should > be in the path either, that suddenly adds conditional branches. You really > ought to look at the difference in assembly for the hot functions. This has been improved in the most recent work and we'll have that posted soon, probably best to hold off from larger review of this right now- as mentioned, I was just trying to address the specific question in this sub-thread since a new patch is coming soon. Thanks, Stephen
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On 20/10/2023 05:06, Stephen Frost wrote: > Greetings, > > * Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote: >> On 19/10/2023 02:00, Stephen Frost wrote: >>> * Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote: >>>> On 29/9/2023 09:52, Andrei Lepikhov wrote: >>>>> On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote: >>>>>> Attach patches updated to master. >>>>>> Pulled from patch 2 back to patch 1 a change that was also pertinent >>>>>> to patch 1. >>>>> +1 to the idea; I have doubts about the implementation. >>>>> >>>>> I have a question. I see the feature triggers an ERROR on exceeding >>>>> the memory limit. The superior PG_CATCH() section will handle the error. >>>>> As I see it, many such sections use memory allocations. What if some >>>>> routine, like CopyErrorData(), exceeds the limit too? In this case, >>>>> we could repeat the error until the top PG_CATCH(). Is this correct >>>>> behaviour? Maybe exceeds_max_total_bkend_mem() should check for >>>>> recursion and allow error handlers to slightly exceed this hard limit? >>> >>>> By the patch in attachment I try to show which sort of problems I'm worrying >>>> about. In some PG_CATCH() sections we do CopyErrorData() (allocating some >>>> memory) before aborting the transaction. So, the allocation error can move >>>> us out of this section before aborting. We await a soft ERROR message but >>>> will face harder consequences. >>> >>> While it's an interesting idea to consider making exceptions to the >>> limit, and perhaps we'll do that (or have some kind of 'reserve' for >>> such cases), this isn't really any different than today, is it? We >>> might have a malloc() failure in the main path, end up in PG_CATCH() and >>> then try to do a CopyErrorData() and have another malloc() failure. >>> >>> If we can rearrange the code to make this less likely to happen, by >>> doing a bit more work to free() resources used in the main path before >>> trying to do new allocations, then, sure, let's go ahead and do that, >>> but that's independent from this effort. >> >> I agree that rearranging efforts can be made independently. The code in the >> letter above was shown just as a demo of the case I'm worried about. >> IMO, the thing that should be implemented here is a recursion level for the >> memory limit. If, while processing the error, we fall into recursion with this >> limit, we should ignore it. >> I imagine custom extensions that use PG_CATCH() and allocate some data >> there. At least we can raise the level of error to FATAL. > > Ignoring such would defeat much of the point of this effort- which is to > get to a position where we can say with some confidence that we're not > going to go over some limit that the user has set and therefore not > allow ourselves to end up getting OOM killed. These are all the same > issues that already exist today on systems which don't allow overcommit > too, there isn't anything new here in regards to these risks, so I'm not > really keen to complicate this to deal with issues that are already > there. > > Perhaps once we've got the basics in place then we could consider > reserving some space for handling such cases.. but I don't think it'll > actually be very clean and what if we have an allocation that goes > beyond what that reserved space is anyway? Then we're in the same spot > again where we have the choice of either failing the allocation in a > less elegant way than we might like to handle that error, or risk > getting outright kill'd by the kernel.
Of those choices, sure seems > like failing the allocation is the better way to go. I've got your point. The only issue I worry about is the uncertainty and clutter that can be created by this feature. In the worst case, when we have a complex error stack (including the extension's CATCH sections, exceptions in stored procedures, etc.), the backend will throw the memory limit error repeatedly. Of course, one failed backend looks better than a surprisingly killed postmaster, but the mix of different error reports and details looks terrible and challenging to debug in the case of trouble. So, may we throw a FATAL error if we reach this limit while handling an exception? -- regards, Andrey Lepikhov Postgres Professional
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Greetings, * Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote: > The only issue I worry about is the uncertainty and clutter that can be > created by this feature. In the worst case, when we have a complex error > stack (including the extension's CATCH sections, exceptions in stored > procedures, etc.), the backend will throw the memory limit error repeatedly. I'm not seeing what additional uncertainty or clutter there is- this is, again, exactly the same as what happens today on a system with overcommit disabled and I don't feel like we get a lot of complaints about this today. > Of course, one failed backend looks better than a surprisingly killed > postmaster, but the mix of different error reports and details looks > terrible and challenging to debug in the case of trouble. So, may we throw a > FATAL error if we reach this limit while handling an exception? I don't see why we'd do that when we can do better- we just fail whatever the ongoing query or transaction is and allow further requests on the same connection. We already support exactly that and it works really rather well and I don't see why we'd throw that away because there's a different way to get an OOM error. If you want to make the argument that we should throw FATAL on OOM when handling an exception, that's something you could argue independently of this effort already today, but I don't think you'll get agreement that it's an improvement. Thanks, Stephen
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On 20/10/2023 19:39, Stephen Frost wrote: Greetings, > * Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote: >> The only issue I worry about is the uncertainty and clutter that can be >> created by this feature. In the worst case, when we have a complex error >> stack (including the extension's CATCH sections, exceptions in stored >> procedures, etc.), the backend will throw the memory limit error repeatedly. > > I'm not seeing what additional uncertainty or clutter there is- this is, > again, exactly the same as what happens today on a system with > overcommit disabled and I don't feel like we get a lot of complaints > about this today. Maybe I missed something or see this feature from an alternate point of view (as an extension developer), but overcommit is more useful so far: it kills a process. It means that after restart, the backend/background worker will have an initial internal state. With this limit enabled, we need to remember that each function call can cause an error, and we have to remember it using static PG_CATCH sections where we must rearrange local variables to the initial (?) state. So, it complicates development. Of course, this limit is a good feature, but from my point of view, it would be better to kill a memory-consuming backend instead of throwing an error. At least for now, we don't have a technique to repeat query planning with chances to build a more effective plan. -- regards, Andrei Lepikhov Postgres Professional
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, On 2023-10-24 09:39:42 +0700, Andrei Lepikhov wrote: > On 20/10/2023 19:39, Stephen Frost wrote: > Greetings, > > * Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote: > > > The only issue I worry about is the uncertainty and clutter that can be > > > created by this feature. In the worst case, when we have a complex error > > > stack (including the extension's CATCH sections, exceptions in stored > > > procedures, etc.), the backend will throw the memory limit error repeatedly. > > > > I'm not seeing what additional uncertainty or clutter there is- this is, > > again, exactly the same as what happens today on a system with > > overcommit disabled and I don't feel like we get a lot of complaints > > about this today. > > Maybe I missed something or see this feature from an alternate point of view > (as an extension developer), but overcommit is more useful so far: it kills > a process. In the case of postgres it doesn't just kill one postgres process, it leads to *all* connections being terminated. > It means that after restart, the backend/background worker will have an > initial internal state. With this limit enabled, we need to remember that > each function call can cause an error, and we have to remember it using > static PG_CATCH sections where we must rearrange local variables to the > initial (?) state. So, it complicates development. You need to be aware of errors being thrown regardless of this feature, as out-of-memory errors can be encountered today already. There also are many other kinds of errors that can be thrown. Greetings, Andres Freund
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Here is an updated patch for tracking Postgres memory usage.
In this new patch, Postgres “reserves” memory, first by updating process-private counters, and then eventually by updating global counters. If the new GUC variable “max_total_memory” is set, reservations exceeding the limit are turned down and treated as though the kernel had reported an out of memory error.
Postgres memory reservations come from multiple sources.
- Malloc calls made by the Postgres memory allocators.
- Static shared memory created by the postmaster at server startup.
- Dynamic shared memory created by the backends.
- A fixed amount (1MB) of “initial” memory reserved whenever a process starts up.
Each process also maintains an accurate count of its actual memory allocations. The process-private variable “my_memory” holds the total allocations for that process. Since there can be no contention, each process updates its own counters very efficiently.
Pgstat now includes global memory counters. These shared memory counters represent the sum of all reservations made by all Postgres processes. For efficiency, these global counters are only updated when new reservations exceed a threshold, currently 1 MB for each process. Consequently, the global reservation counters are approximate totals which may differ from the actual allocation totals by up to 1 MB per process.
The max_total_memory limit is checked whenever the global counters are updated. There is no special error handling if a memory allocation exceeds the global limit. That allocation returns NULL for malloc-style allocations or ENOMEM for shared memory allocations. Postgres has existing mechanisms for dealing with out of memory conditions.
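To make the scheme concrete, here is a minimal sketch of the reservation path described above. All of the names (reserve_memory, global_reserved, RESERVATION_CHUNK) are illustrative rather than the patch's actual identifiers, and the limit is assumed to be kept in bytes:

#include "postgres.h"
#include "port/atomics.h"

#define RESERVATION_CHUNK (1024 * 1024)    /* 1MB local allowance */

static uint64 my_memory = 0;     /* exact, process-private total */
static uint64 unreported = 0;    /* growth not yet pushed to shared memory */

extern pg_atomic_uint64 *global_reserved;    /* sum over all processes */
extern uint64 max_total_memory_bytes;        /* 0 = disabled */

/* Returns false if the reservation would exceed the global limit. */
static bool
reserve_memory(uint64 size)
{
    my_memory += size;
    unreported += size;

    /* Touch the shared counter only when the local threshold is crossed. */
    if (unreported >= RESERVATION_CHUNK)
    {
        uint64 newtotal = pg_atomic_add_fetch_u64(global_reserved, unreported);

        if (max_total_memory_bytes != 0 && newtotal > max_total_memory_bytes)
        {
            /* Roll back and fail as if malloc() had returned NULL. */
            pg_atomic_sub_fetch_u64(global_reserved, unreported);
            my_memory -= size;
            unreported -= size;
            return false;
        }
        unreported = 0;
    }
    return true;
}

Because the shared counter is touched at most once per 1MB of growth, the common allocation path stays entirely process-local, at the cost of the up-to-1MB-per-process slack in the global totals described above.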
For sanity checking, pgstat now includes the pg_backend_memory_allocation view showing memory allocations made by the backend process. This view includes a scan of the top memory context, so it compares memory allocations reported through pgstat with actual allocations. The two should match.
Two other views were created as well. pg_stat_global_memory_tracking shows how much server memory has been reserved overall and how much memory remains to be reserved. pg_stat_memory_reservation shows the memory reserved by each server process. Both of these views use pgstat’s “snapshot” mechanism to ensure consistent values within a transaction.
Performance-wise, there was no measurable impact with either pgbench or a simple “SELECT * from series” query.
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, On 2023-10-31 17:11:26 +0000, John Morris wrote: > Postgres memory reservations come from multiple sources. > > * Malloc calls made by the Postgres memory allocators. > * Static shared memory created by the postmaster at server startup, > * Dynamic shared memory created by the backends. > * A fixed amount (1Mb) of “initial” memory reserved whenever a process starts up. > > Each process also maintains an accurate count of its actual memory > allocations. The process-private variable “my_memory” holds the total > allocations for that process. Since there can be no contention, each process > updates its own counters very efficiently. I think this will introduce measurable overhead in low concurrency cases and very substantial overhead / contention when there is a decent amount of concurrency. This makes all memory allocations > 1MB contend on a single atomic. Massive amounts of energy have been spent writing multi-threaded allocators that have far less contention than this - the current state is to never contend on shared resources on any reasonably common path. This gives away one of the few major advantages our process model has. The patch doesn't just introduce contention when limiting is enabled - it introduces it even when memory usage is just tracked. It makes absolutely no sense to have a single contended atomic in that case - just have a per-backend variable in shared memory that's updated. It's *WAY* cheaper to compute the overall memory usage during querying than to keep a running total in shared memory. > Pgstat now includes global memory counters. These shared memory counters > represent the sum of all reservations made by all Postgres processes. For > efficiency, these global counters are only updated when new reservations > exceed a threshold, currently 1 Mb for each process. Consequently, the > global reservation counters are approximate totals which may differ from the > actual allocation totals by up to 1 Mb per process. I see that you added them to the "cumulative" stats system - that doesn't immediately make sense to me - what you're tracking here isn't an accumulating counter, it's something showing the current state, right? > The max_total_memory limit is checked whenever the global counters are > updated. There is no special error handling if a memory allocation exceeds > the global limit. That allocation returns a NULL for malloc style > allocations or an ENOMEM for shared memory allocations. Postgres has > existing mechanisms for dealing with out of memory conditions. I still think it's extremely unwise to do tracking of memory and limiting of memory in one patch. You should work towards an acceptable patch that just tracks memory usage in an as simple and low overhead way as possible. Then we later can build on that. > For sanity checking, pgstat now includes the pg_backend_memory_allocation > view showing memory allocations made by the backend process. This view > includes a scan of the top memory context, so it compares memory allocations > reported through pgstat with actual allocations. The two should match. Can't you just do this using the existing pg_backend_memory_contexts view? > Performance-wise, there was no measurable impact with either pgbench or a > simple “SELECT * from series” query. That seems unsurprising - allocations aren't a major part of the work there, you'd have to regress by a lot to see memory allocator changes show a significant performance decrease. 
> diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out > index 7a6f36a6a9..6c813ec465 100644 > --- a/src/test/regress/expected/opr_sanity.out > +++ b/src/test/regress/expected/opr_sanity.out > @@ -468,9 +468,11 @@ WHERE proallargtypes IS NOT NULL AND > ARRAY(SELECT proallargtypes[i] > FROM generate_series(1, array_length(proallargtypes, 1)) g(i) > WHERE proargmodes IS NULL OR proargmodes[i] IN ('i', 'b', 'v')); > - oid | proname | proargtypes | proallargtypes | proargmodes > ------+---------+-------------+----------------+------------- > -(0 rows) > + oid | proname | proargtypes | proallargtypes | proargmodes > +------+----------------------------------+-------------+---------------------------+------------------- > + 9890 | pg_stat_get_memory_reservation | | {23,23,20,20,20,20,20,20} | {i,o,o,o,o,o,o,o} > + 9891 | pg_get_backend_memory_allocation | | {23,23,20,20,20,20,20} | {i,o,o,o,o,o,o} > +(2 rows) This indicates that your pg_proc entries are broken; they need to be fixed rather than allowed here. Greetings, Andres Freund
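For contrast, a rough sketch of the per-backend alternative Andres describes above: each backend updates only its own slot in shared memory, and the total is computed only when someone asks for it. All names here are hypothetical:

#include "postgres.h"
#include "port/atomics.h"

typedef struct BackendMemSlot
{
    pg_atomic_uint64 allocated_bytes;
} BackendMemSlot;

extern BackendMemSlot *backend_mem_slots;    /* one entry per backend */
extern int NumBackendSlots;

/* Allocation path: no contention beyond this backend's own slot. */
static inline void
report_allocation(int my_slot, int64 delta)
{
    /* Unsigned wraparound makes this work for negative deltas too. */
    pg_atomic_add_fetch_u64(&backend_mem_slots[my_slot].allocated_bytes,
                            (uint64) delta);
}

/* Summation happens only on demand, e.g. from a monitoring view. */
static uint64
total_backend_memory(void)
{
    uint64 total = 0;

    for (int i = 0; i < NumBackendSlots; i++)
        total += pg_atomic_read_u64(&backend_mem_slots[i].allocated_bytes);
    return total;
}

Enforcing a hard global limit on top of this scheme would require summing the slots on (at least some) allocations, which is exactly the trade-off debated in the messages that follow.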
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2023-10-31 17:11:26 +0000, John Morris wrote: > > Postgres memory reservations come from multiple sources. > > > > * Malloc calls made by the Postgres memory allocators. > > * Static shared memory created by the postmaster at server startup, > > * Dynamic shared memory created by the backends. > > * A fixed amount (1Mb) of “initial” memory reserved whenever a process starts up. > > > > Each process also maintains an accurate count of its actual memory > > allocations. The process-private variable “my_memory” holds the total > > allocations for that process. Since there can be no contention, each process > > updates its own counters very efficiently. > > I think this will introduce measurable overhead in low concurrency cases and > very substantial overhead / contention when there is a decent amount of > concurrency. This makes all memory allocations > 1MB contend on a single > atomic. Massive amount of energy have been spent writing multi-threaded > allocators that have far less contention than this - the current state is to > never contend on shared resources on any reasonably common path. This gives > away one of the few major advantages our process model has away. We could certainly adjust the size of each reservation to reduce the frequency of having to hit the atomic. Specific suggestions about how to benchmark and see the regression that's being worried about here would be great. Certainly my hope has generally been that when we do a larger allocation, it's because we're about to go do a bunch of work, meaning that hopefully the time spent updating the atomic is minor overall. > The patch doesn't just introduce contention when limiting is enabled - it > introduces it even when memory usage is just tracked. It makes absolutely no > sense to have a single contended atomic in that case - just have a per-backend > variable in shared memory that's updated. It's *WAY* cheaper to compute the > overall memory usage during querying than to keep a running total in shared > memory. Agreed that we should avoid the contention when limiting isn't being used, certainly easy to do so, and had actually intended to but that seems to have gotten lost along the way. Will fix. Other than that change inside update_global_reservation though, the code for reporting per-backend memory usage and querying it does work as you're outlining above inside the stats system. That said- I just want to confirm that you would agree that querying the amount of memory used by every backend, to add it all up to enforce an overall limit, surely isn't something we're going to want to do during an allocation and that having a global atomic for that is better, right? > > Pgstat now includes global memory counters. These shared memory counters > > represent the sum of all reservations made by all Postgres processes. For > > efficiency, these global counters are only updated when new reservations > > exceed a threshold, currently 1 Mb for each process. Consequently, the > > global reservation counters are approximate totals which may differ from the > > actual allocation totals by up to 1 Mb per process. > > I see that you added them to the "cumulative" stats system - that doesn't > immediately makes sense to me - what you're tracking here isn't an > accumulating counter, it's something showing the current state, right? Yes, this is current state, not an accumulation. > > The max_total_memory limit is checked whenever the global counters are > > updated. 
There is no special error handling if a memory allocation exceeds > > the global limit. That allocation returns a NULL for malloc style > > allocations or an ENOMEM for shared memory allocations. Postgres has > > existing mechanisms for dealing with out of memory conditions. > > I still think it's extremely unwise to do tracking of memory and limiting of > memory in one patch. You should work towards and acceptable patch that just > tracks memory usage in an as simple and low overhead way as possible. Then we > later can build on that. Frankly, while tracking is interesting, the limiting is the feature that's needed more urgently imv. We could possibly split it up but there's an awful lot of the same code that would need to be changed and that seems less than ideal. Still, we'll look into this. > > For sanity checking, pgstat now includes the pg_backend_memory_allocation > > view showing memory allocations made by the backend process. This view > > includes a scan of the top memory context, so it compares memory allocations > > reported through pgstat with actual allocations. The two should match. > > Can't you just do this using the existing pg_backend_memory_contexts view? Not and get a number that you can compare to the local backend number due to the query itself happening and performing allocations and creating new contexts. We wanted to be able to show that we are accounting correctly and exactly matching to what the memory context system is tracking. > > - oid | proname | proargtypes | proallargtypes | proargmodes > > ------+---------+-------------+----------------+------------- > > -(0 rows) > > + oid | proname | proargtypes | proallargtypes | proargmodes > > +------+----------------------------------+-------------+---------------------------+------------------- > > + 9890 | pg_stat_get_memory_reservation | | {23,23,20,20,20,20,20,20} | {i,o,o,o,o,o,o,o} > > + 9891 | pg_get_backend_memory_allocation | | {23,23,20,20,20,20,20} | {i,o,o,o,o,o,o} > > +(2 rows) > > This indicates that your pg_proc entries are broken, they need to fixed rather > than allowed here. Agreed, will fix. Thanks! Stephen
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, On 2023-11-06 13:02:50 -0500, Stephen Frost wrote: > > > The max_total_memory limit is checked whenever the global counters are > > > updated. There is no special error handling if a memory allocation exceeds > > > the global limit. That allocation returns a NULL for malloc style > > > allocations or an ENOMEM for shared memory allocations. Postgres has > > > existing mechanisms for dealing with out of memory conditions. > > > > I still think it's extremely unwise to do tracking of memory and limiting of > > memory in one patch. You should work towards and acceptable patch that just > > tracks memory usage in an as simple and low overhead way as possible. Then we > > later can build on that. > > Frankly, while tracking is interesting, the limiting is the feature > that's needed more urgently imv. I agree that we need limiting, but that the tracking needs to be very robust for that to be usable. > We could possibly split it up but there's an awful lot of the same code that > would need to be changed and that seems less than ideal. Still, we'll look > into this. Shrug. IMO keeping them together just makes it very likely that neither goes in. > > > For sanity checking, pgstat now includes the pg_backend_memory_allocation > > > view showing memory allocations made by the backend process. This view > > > includes a scan of the top memory context, so it compares memory allocations > > > reported through pgstat with actual allocations. The two should match. > > > > Can't you just do this using the existing pg_backend_memory_contexts view? > > Not and get a number that you can compare to the local backend number > due to the query itself happening and performing allocations and > creating new contexts. We wanted to be able to show that we are > accounting correctly and exactly matching to what the memory context > system is tracking. I think creating a separate view for this will be confusing for users, without really much to show for it. Excluding the current query would be useful for other cases as well; why don't we provide a way to do that with pg_backend_memory_contexts? Greetings, Andres Freund
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2023-11-06 13:02:50 -0500, Stephen Frost wrote: > > > > The max_total_memory limit is checked whenever the global counters are > > > > updated. There is no special error handling if a memory allocation exceeds > > > > the global limit. That allocation returns a NULL for malloc style > > > > allocations or an ENOMEM for shared memory allocations. Postgres has > > > > existing mechanisms for dealing with out of memory conditions. > > > > > > I still think it's extremely unwise to do tracking of memory and limiting of > > > memory in one patch. You should work towards and acceptable patch that just > > > tracks memory usage in an as simple and low overhead way as possible. Then we > > > later can build on that. > > > > Frankly, while tracking is interesting, the limiting is the feature > > that's needed more urgently imv. > > I agree that we need limiting, but that the tracking needs to be very robust > for that to be usable. Is there an issue with the tracking in the patch that you saw? That's certainly an area that we've tried hard to get right and to match up to numbers from the rest of the system, such as the memory context system. > > We could possibly split it up but there's an awful lot of the same code that > > would need to be changed and that seems less than ideal. Still, we'll look > > into this. > > Shrug. IMO keeping them together just makes it very likely that neither goes > in. I'm happy to hear your support for the limiting part of this- that's encouraging. > > > > For sanity checking, pgstat now includes the pg_backend_memory_allocation > > > > view showing memory allocations made by the backend process. This view > > > > includes a scan of the top memory context, so it compares memory allocations > > > > reported through pgstat with actual allocations. The two should match. > > > > > > Can't you just do this using the existing pg_backend_memory_contexts view? > > > > Not and get a number that you can compare to the local backend number > > due to the query itself happening and performing allocations and > > creating new contexts. We wanted to be able to show that we are > > accounting correctly and exactly matching to what the memory context > > system is tracking. > > I think creating a separate view for this will be confusing for users, without > really much to show for. Excluding the current query would be useful for other > cases as well, why don't we provide a way to do that with > pg_backend_memory_contexts? Both of these feel very much like power-user views, so I'm not terribly concerned about users getting confused. That said, we could possibly drop this as a view and just have the functions which are then used in the regression tests to catch things should the numbers start to diverge. Having a way to get the memory contexts which don't include the currently running query might be interesting too but is rather independent of what this patch is trying to do. The only reason we collected up the memory-context info is as a cross-check to the tracking that we're doing and while the existing memory-context view is just fine for a lot of other things, it doesn't work for that specific need. Thanks, Stephen
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, On 2023-11-07 15:55:48 -0500, Stephen Frost wrote: > * Andres Freund (andres@anarazel.de) wrote: > > On 2023-11-06 13:02:50 -0500, Stephen Frost wrote: > > > > > The max_total_memory limit is checked whenever the global counters are > > > > > updated. There is no special error handling if a memory allocation exceeds > > > > > the global limit. That allocation returns a NULL for malloc style > > > > > allocations or an ENOMEM for shared memory allocations. Postgres has > > > > > existing mechanisms for dealing with out of memory conditions. > > > > > > > > I still think it's extremely unwise to do tracking of memory and limiting of > > > > memory in one patch. You should work towards and acceptable patch that just > > > > tracks memory usage in an as simple and low overhead way as possible. Then we > > > > later can build on that. > > > > > > Frankly, while tracking is interesting, the limiting is the feature > > > that's needed more urgently imv. > > > > I agree that we need limiting, but that the tracking needs to be very robust > > for that to be usable. > > Is there an issue with the tracking in the patch that you saw? That's > certainly an area that we've tried hard to get right and to match up to > numbers from the rest of the system, such as the memory context system. There's some details I am pretty sure aren't right - the DSM tracking piece seems bogus to me. But beyond that: I don't know. There's enough other stuff in the patch that it's hard to focus on that aspect. That's why I'd like to merge a patch doing just that, so we actually can collect numbers. If any of the developers of the patch had focused on polishing that part instead of focusing on the limiting, it'd have been ready to be merged a while ago, maybe even in 16. I think the limiting piece is unlikely to be ready for 17. Greetings, Andres Freund
Re: Add the ability to limit the amount of memory that can be allocated to backends.
hi. +static void checkAllocations(); should be "static void checkAllocations(void);" ? In PgStatShared_Memtrack there is a lock, but it seems to be neither initialized nor used. Can you expand on it? So in the view pg_stat_global_memory_tracking, the column "total_memory_reserved" is a point-in-time value, the total memory the whole server has reserved/malloc'd? Will it change every time you call it? The function pg_stat_get_global_memory_tracking has provolatile => 's'; should it be a VOLATILE function? pg_stat_get_memory_reservation, pg_stat_get_global_memory_tracking should be proretset => 'f'. +{ oid => '9891', + descr => 'statistics: memory utilized by current backend', + proname => 'pg_get_backend_memory_allocation', prorows => '1', proisstrict => 'f', + proretset => 't', provolatile => 's', proparallel => 'r', You declared +void pgstat_backend_memory_reservation_cb(void); but there seems to be no definition. Isn't this part unnecessary, since you already declared these in src/include/catalog/pg_proc.dat? +/* SQL Callable functions */ +extern Datum pg_stat_get_memory_reservation(PG_FUNCTION_ARGS); +extern Datum pg_get_backend_memory_allocation(PG_FUNCTION_ARGS); +extern Datum pg_stat_get_global_memory_tracking(PG_FUNCTION_ARGS); The last sentence is just a plain link, with no explanation. Is something missing? <para> + Reports how much memory remains available to the server. If a + backend process attempts to allocate more memory than remains, + the process will fail with an out of memory error, resulting in + cancellation of the process's active query/transaction. + If memory is not being limited (ie. max_total_memory is zero or not set), + this column returns NULL. + <xref linkend="guc-max-total-memory"/>. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>static_shared_memory</structfield> <type>bigint</type> + </para> + <para> + Reports how much static shared memory (non-DSM shared memory) is being used by + the server. Static shared memory is configured by the postmaster at + at server startup. + <xref linkend="guc-max-total-memory"/>. + </para></entry> + </row>
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hello! Earlier in this thread, pgbench results were published where, with a strong memory limit of 100MB, a significant decrease in TPS of about 10% was observed [1]. Using a dedicated server with 12GB RAM and the methodology described in [3], I performed five series of measurements for the patches from [2]. The series were as follows: 1) unpatched 16th version at REL_16_BETA1 (e0b82fc8e83), as close to [2] in time. 2) patched REL_16_BETA1 at e0b82fc8e83 with the max_total_backend_memory GUC unset (with default value = 0). 3) patched REL_16_BETA1 with max_total_backend_memory = 16GB 4) the same with max_total_backend_memory = 8GB 5) and again with max_total_backend_memory = 200MB Measurements with max_total_backend_memory = 100MB were not carried out; with a 100MB limit the server gave an error on startup: FATAL: configured max_total_backend_memory 100MB is <= shared_memory_size 143MB So I used 200MB to keep all other GUCs the same. Pgbench gave the following results: 1) and 2) almost the same: ~6350 TPS. See the orange and green distributions on the attached graph.png, respectively. 3) and 4) identical to each other (~6315 TPS) and a bit slower than 1) and 2) by ~0.6%. See the blue and yellow distributions, respectively. 5) is slightly slower (~6285 TPS) than 3) and 4) by another 0.5% (grey distribution). The standard error in all series was ~0.2%. The raw data is in raw_data.txt. With the best wishes, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company [1] https://www.postgresql.org/message-id/3178e9a1b7acbcf023fafed68ca48d76afc07907.camel%40crunchydata.com [2] https://www.postgresql.org/message-id/4edafedc0f8acb12a2979088ac1317bd7dd42145.camel%40crunchydata.com [3] https://www.postgresql.org/message-id/1d3a7d8f-cb7c-4468-a578-d8a1194ea2de%40postgrespro.ru
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On 12/26/23 11:49, Anton A. Melnikov wrote: > Hello! > > Earlier in this thread, the pgbench results were published, where with a > strong memory limit of 100MB > a significant, about 10%, decrease in TPS was observed [1]. > > Using dedicated server with 12GB RAM and methodology described in [3], i > performed five series > of measurements for the patches from the [2]. Can you share some info about the hardware? For example the CPU model, number of cores, and so on. 12GB RAM is not quite huge, so presumably it was a small machine. > The series were like this: > 1) unpatched 16th version at the REL_16_BETA1 (e0b82fc8e83) as close to > [2] in time. > 2) patched REL_16_BETA1 at e0b82fc8e83 with undefined > max_total_backend_memory GUC (with default value = 0). > 3) patched REL_16_BETA1 with max_total_backend_memory = 16GB > 4) the same with max_total_backend_memory = 8GB > 5) and again with max_total_backend_memory = 200MB > OK > Measurements with max_total_backend_memory = 100MB were not be carried out, > with limit 100MB the server gave an error on startup: > FATAL: configured max_total_backend_memory 100MB is <= > shared_memory_size 143MB > So i used 200MB to retain all other GUCs the same. > I'm not very familiar with the patch yet, but this seems a bit strange. Why should shared_buffers be included in this limit? > Pgbench gave the following results: > 1) and 2) almost the same: ~6350 TPS. See orange and green > distributions on the attached graph.png respectively. > 3) and 4) identical to each other (~6315 TPS) and a bit slower than 1) > and 2) by ~0,6%. > See blue and yellow distributions respectively. > 5) is slightly slower (~6285 TPS) than 3) and 4) by another 0,5%. (grey > distribution) > The standard error in all series was ~0.2%. There is a raw data in the > raw_data.txt. > I think 6350 is a pretty terrible number, especially for scale 8, which is maybe 150MB of data. I think that's a pretty clear sign the system was hitting some other bottleneck, which can easily mask regressions in the memory allocation code. AFAICS the pgbench runs were regular r/w benchmarks, so I'd bet it was hitting I/O, and possibly even subject to some random effects at that level. I think what would be interesting are runs with pgbench -M prepared -S -c $N -j $N i.e. read-only tests (to not hit I/O), and $N being sufficiently large to maybe also show some concurrency/locking bottlenecks, etc. I may do some benchmarks if I happen to find a bit of time, but maybe you could collect such numbers too? The other benchmark that might be interesting is more OLAP, with low concurrency but backends allocating a lot of memory. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, I wanted to take a look at the patch, and I noticed it's broken since 3d51cb5197 renamed a couple pgstat functions in August. I plan to maybe do some benchmarks etc. preferably on current master, so here's a version fixing that minor bitrot. As for the patch, I only skimmed through the thread so far, to get some idea of what the approach and goals are, etc. I didn't look at the code yet, so can't comment on that. However, at pgconf.eu a couple weeks ago I had quite a few discussions about how such a "backend memory limit" could/should work in principle, and I've been thinking about ways to implement this. So let me share some thoughts about how this patch aligns with that ... (FWIW it's not my intent to hijack or derail this patch in any way, but there's a couple things I think we should do differently.) I'm 100% on board with having a memory limit "above" work_mem. It's really annoying that we have no way to restrict the amount of memory a backend can allocate for complex queries, etc. But I find it a bit strange that we aim to introduce a "global" memory limit for all backends combined first. I'm not against having that too, but it's not the feature I usually wish to have. I need some protection against runaway backends that happen to allocate a lot of memory. Similarly, I'd like to be able to have different limits depending on what the backend does - a backend doing OLAP may naturally need more memory, while a backend doing OLTP may have a much tighter limit. But with a single global limit none of this is possible. It may help reduce the risk of unexpected OOM issues (not 100%, but useful), but it can't limit the impact to the one backend - if memory starts running out, it will affect all other backends a bit randomly (depending on the order in which the backends happen to allocate memory). And it does not consider what workloads the backends execute. Let me propose a slightly different architecture that I imagined while thinking about this. It's not radically different from what the patch does, but it focuses on the local accounting first. I believe it's possible to extend this to enforce the global limit too. FWIW I haven't tried implementing this - I don't want to "hijack" this thread and do my own thing. I can take a stab at a PoC if needed. Firstly, I'm not quite happy with how all the memory contexts have to do their own version of the accounting and memory checking. I think we should move that into a new abstraction which I call "memory pool". It's very close to "memory context" but it only deals with allocating blocks, not the chunks requested by palloc() etc. So when someone does palloc(), that may be AllocSetAlloc(). And instead of doing malloc() that would do MemoryPoolAlloc(blksize), and then that would do all the accounting and checks, and then do malloc(). This may sound like an unnecessary indirection, but the idea is that a single memory pool would back many memory contexts (perhaps all for a given backend). In principle we might even establish separate memory pools for different parts of the memory context hierarchy, but I'm not sure we need that. I can imagine the pool could also cache blocks for cases when we create and destroy contexts very often, but glibc should already do that for us, I believe. For me, the accounting at the memory context level is the primary goal. I wonder if we considered this context/pool split while working on the accounting for hash aggregate, but I think we were too attached to doing all of it in the memory context hierarchy. 
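A bare-bones sketch of that pool indirection might look like the following; MemoryPoolAlloc() is the name used above, while the struct layout and everything else is hypothetical:

#include "postgres.h"

typedef struct MemoryPool
{
    uint64 allocated;    /* bytes currently malloc'd via this pool */
    uint64 limit;        /* 0 = unlimited */
} MemoryPool;

static MemoryPool backend_pool;    /* a single pool backing all contexts */

/* Called by AllocSetAlloc() and friends instead of raw malloc(). */
static void *
MemoryPoolAlloc(MemoryPool *pool, size_t blksize)
{
    void *block;

    if (pool->limit != 0 && pool->allocated + blksize > pool->limit)
        return NULL;    /* behaves exactly like a failed malloc() */

    block = malloc(blksize);
    if (block != NULL)
        pool->allocated += blksize;
    return block;
}

static void
MemoryPoolFree(MemoryPool *pool, void *block, size_t blksize)
{
    pool->allocated -= blksize;
    free(block);
}

palloc()-level chunks would still be carved out of these blocks exactly as today; only the block-level malloc()/free() calls move behind the pool.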
Of course, this memory pool is per backend, and so would be the memory accounting and limit enforced by it. But I can imagine extending this to do a global limit similar to what the current patch does - using a counter in shared memory, or something. I haven't reviewed what the overhead is or how it handles cases when a backend terminates in some unexpected way. But whatever the current patch does, a memory pool could do too. Secondly, I think there's an annoying issue with the accounting at the block level - it makes it problematic to use low limit values. We double the block size, so we may quickly end up with a block size of a couple MBs, which means the accounting granularity gets very coarse. I think it'd be useful to introduce a "backpressure" between the memory pool and the memory context, depending on how close we are to the limit. For example if the limit is 128MB and the backend allocated 16MB so far, we're pretty far away from the limit. So if the backend requests an 8MB block, that's fine and the memory pool should malloc() that. But if we already allocated 100MB, maybe we should be careful and not allow 8MB blocks - the memory pool should be allowed to override this and return just a 1MB block. Sure, this would have to be optional, and not all places can accept a smaller block than requested (when the chunk would not fit into the smaller block). It would require a suitable memory pool API and more work in the memory contexts, but it seems pretty useful. Certainly not something for v1. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
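Continuing the hypothetical MemoryPool sketch above, the backpressure idea could look roughly like this; callers that cannot accept a smaller block would pass min_size == req_size, and the /8 heuristic is arbitrary, just to show the shape of such an API:

static size_t
MemoryPoolChooseBlockSize(MemoryPool *pool, size_t req_size, size_t min_size)
{
    uint64 remaining;

    if (pool->limit == 0)
        return req_size;    /* no limit, no backpressure */

    remaining = (pool->allocated < pool->limit) ?
        pool->limit - pool->allocated : 0;

    /* Far from the limit: grant the request as-is. */
    if (pool->allocated < pool->limit / 2 || req_size <= remaining / 8)
        return req_size;

    /* Close to the limit: fall back toward the minimum usable size,
     * so the accounting granularity stays fine-grained near the edge. */
    return Max(min_size, Min(req_size, (size_t) (remaining / 8)));
}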
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi! Thanks for your interest and reply! On 26.12.2023 20:28, Tomas Vondra wrote: > Can you share some info about the hardware? For example the CPU model, > number of cores, and so on. 12GB RAM is not quite huge, so presumably it > was a small machine. > It is an HP ProLiant 2-socket server: 2x6 cores Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 2x12GB RAM, RAID from SSD drives. Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux One CPU was disabled and some tweaks were made, as Andres advised, to avoid NUMA and other side effects. The full set of configuration commands for the server was like this: numactl --cpunodebind=0 --membind=0 --physcpubind=1,3,5,7,9,11 bash sudo cpupower frequency-set -g performance sudo cpupower idle-set -D0 echo 3059000 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq (Turbo Boost and hyperthreading were disabled in BIOS.) > I think what would be interesting are runs with > > pgbench -M prepared -S -c $N -j $N > > i.e. read-only tests (to not hit I/O), and $N being sufficiently large > to maybe also show some concurrency/locking bottlenecks, etc. > > I may do some benchmarks if I happen to find a bit of time, but maybe > you could collect such numbers too? Firstly, I repeated the same -c and -j values in read-only mode as you advised. As one can see in read-only.png, the absolute TPS value increased significantly, by about 13 times. The patched version with limit 200MB was slightly slower than with limit 0, by ~2%. The standard error in all series was ~0.5%. Since the deviation increased in comparison with the r/w test, the difference between the unpatched version and the patched ones with limits 0, 8GB and 16GB is not significant. The raw data is in raw_data-read-only.txt. > I think 6350 is a pretty terrible number, especially for scale 8, which > is maybe 150MB of data. I think that's a pretty clear sign the system > was hitting some other bottleneck, which can easily mask regressions in > the memory allocation code. AFAICS the pgbench runs were regular r/w > benchmarks, so I'd bet it was hitting I/O, and possibly even subject to > some random effects at that level. > To avoid a possible I/O bottleneck I followed these steps: - gave all 24G of memory to CPU 0 rather than 12G as in [1]; - created a ramdisk of 12G size; - disabled swap like this: numactl --cpunodebind=0 --physcpubind=1,3,5,7,9,11 bash sudo swapoff -a sudo mkdir /mnt/ramdisk sudo mount -t tmpfs -o rw,size=12G tmpfs /mnt/ramdisk The inst dir, data dir and log file were all on the ramdisk. Pgbench in r/w mode gave the following results: - the unpatched version and the patched ones with limits 0 and 16GB were almost the same: ~7470+-0.2% TPS (orange, green and blue distributions on RW-ramdisk.png, respectively); - the patched version with limit 8GB was slightly slower than the three above (yellow distribution); - the patched version with limit 200MB was slower than the first three by a measurable ~0.4% (~7440 TPS) (black distribution). The standard error in all series was ~0.2%. The raw data is in raw_data-rw-ramdisk.txt. For the sake of completeness I'm going to repeat the read-only measurements with the ramdisk, and perform some tests with increased -c and -j values as you advised, to find the possible point where concurrency/blocking bottlenecks start to play a role. And do this, of course, for the latest version of the patch. Thanks for rebasing it! In general, I don't observe from this patch any considerable performance degradation of several percent or even 10%, as was mentioned in [2]. 
With the best regards, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company [1] https://www.postgresql.org/message-id/1d3a7d8f-cb7c-4468-a578-d8a1194ea2de%40postgrespro.ru [2] https://www.postgresql.org/message-id/3178e9a1b7acbcf023fafed68ca48d76afc07907.camel%40crunchydata.com
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, I took a closer look at this patch over the last couple days, and I did a bunch of benchmarks to see how expensive the accounting is. The good news is that I haven't observed any overhead - I did two simple tests, that I think would be particularly sensitive to this: 1) regular "pgbench -S" with scale 10, so tiny and not hitting I/O 2) "select count(*) from generate_series(1,N)" where N is either 10k or 1M, which should be enough to cause a fair number of allocations And I tested this on three branches - master (no patches applied), patched (but limit=0, so just accounting) and patched-limit (with limit set to 4.5GB, so high enough not to be hit). Attached are script and raw results for both benchmarks from two machines, i5 (small, 4C) and xeon (bigger, 16C/32C), and a PDF showing the results as candlestick charts (with 1 and 2 sigma intervals). AFAICS there's no measurable difference between master and patched builds. Which is good, clearly the local allowance makes the overhead negligible (at least in these benchmarks). Now, for the patch itself - I already said some of the stuff about a month ago [1], but I'll repeat some of it for the sake of completeness before I get to comments about the code etc. Firstly, I agree with the goal of having a way to account for memory used by the backends, and also ability to enforce some sort of limit. It's difficult to track the memory at the OS level (interpreting RSS values is not trivial), and work_mem is not sufficient to enforce a backend-level limit, not even talking about a global limit. But as I said earlier, it seems quite strange to start by introducing some sort of global limit, combining memory for all backends. I do understand that the intent is to have such global limit in order to prevent issues with the OOM killer and/or not to interfere with other stuff running on the same machine. And while I'm not saying we should not have such limit, every time I wished to have a memory limit it was a backend-level one. Ideally a workmem-like limit that would "adjust" the work_mem values used by the optimizer (but that's not what this patch aims to do), or at least a backstop in case something goes wrong (say, a memory leak, OLTP application issuing complex queries, etc.). The accounting and infrastructure introduced by the patch seems to be suitable for both types of limits (global and per-backend), but then it goes for the global limit first. It may seem simpler to implement, but in practice there's a bunch of problems mentioned earlier, but ignored. For example, I really don't believe it's OK to just report the error in the first backend that happens to hit the limit. Bertrand mentioned that [2], asking about a situation when a runaway process allocates 99% of memory. The response to that was > Initially, we believe that punishing the detector is reasonable if we > can help administrators avoid the OOM killer/resource starvation. But > we can and should expand on this idea. which I find rather unsatisfactory. Belief is not an argument, and it relies on the assumption that this helps the administrator to avoid the OOM killer etc. ISTM it can easily mean the administrator can't even connect to the database, run a query (to inspect the new system views), etc. because any of that would hit the memory limit. That seems more like a built-in DoS facility ... If this was up to me, I'd probably start with the per-backend limit. But that's my personal opinion. Now, some comments about the code etc. 
Stephen was promising a new patch version in October [6], but that didn't happen yet, so I looked at the patches I rebased in December. 0001 ---- 1) I don't see why we should separate memory by context type - without knowing which exact backends are using the memory, this seems pretty irrelevant/useless. Either it's private or shared memory, I don't think the accounting should break this into more counters. Also, while I'm not aware of anyone proposing a new memory context type, I doubt we'd want to add more and more counters if that happens. suggestion: only two counters, for local and shared memory If we absolutely want to have per-type counters, we should not mix the private memory and shared memory ones. 2) I have my doubts about the tracking of shared memory. It seems weird to count it only towards the backend that allocated it, and transfer it to "global" on process exit. Surely we should know which shared memory is meant to be long-lived from the start, no? For example, it's not uncommon that an extension allocates a chunk of shared memory to store some sort of global state (e.g. BDR/pglogical does that, I'm sure other extensions do that). But the process that allocates the shared memory keeps running. AFAICS this would be tracked as backend DSM memory, which I'm not sure is what we want ... And I'm not alone in thinking this should work differently - [3], [4]. 3) We limit the floor of allocation counters to zero. This seems really weird. Why would it be necessary? I mean, how could we get a negative amount of memory? Seems like it might easily mask issues with incorrect accounting, or something. 4) pg_stat_memory_allocation Maybe pg_stat_memory_usage would be a better name ... 5) The comments/docs repeatedly talk about "dynamic nature" and that it makes the values not exact: > Due to the dynamic nature of memory allocations the allocated bytes > values may not be exact but should be sufficient for the intended > purposes. I don't understand what "dynamic nature" refers to, and why it would make the values not exact. Presumably this refers to the "allowance" but how is that "dynamic nature"? This definitely needs to be explained better, with some basic estimate of how accurate the values are expected to be. 6) The SGML docs keep recommending to use pg_size_pretty(). I find that unnecessary; no other docs reporting values in bytes do that. 7) I see no reason to include shared_memory_size_mb in the view, and same for shared_memory_size_in_huge_pages. It has little to do with "allocated" memory, IMO. And a value in "MB" goes directly against the suggestion to use pg_size_pretty(). suggestion: drop this from the view 8) In a lot of places we do context->mem_allocated += blksize; pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET); Maybe we should integrate the two types of accounting, wrap them into a single function call, or something (see the sketch after this message)? This makes it very easy to forget to update one of those places. AFAIC the patch tries to minimize the number of updates of the new shared counters, but with the allowance that should not be an issue I think. 9) This seems wrong. Why would it be OK to ever overflow? Isn't that a sign of incorrect accounting? 
/* Avoid allocated_bytes unsigned integer overflow on decrease */ if (pg_sub_u64_overflow(*my_allocated_bytes, proc_allocated_bytes, &temp)) { /* On overflow, set allocated bytes and allocator type bytes to zero */ *my_allocated_bytes = 0; *my_aset_allocated_bytes = 0; *my_dsm_allocated_bytes = 0; *my_generation_allocated_bytes = 0; *my_slab_allocated_bytes = 0; } 10) How could any of these values be negative? It's all capped to 0 and also stored in uint64. Seems pretty useless. +SELECT + datid > 0, pg_size_bytes(shared_memory_size) >= 0, shared_memory_size_in_huge_pages >= -1, global_dsm_allocated_bytes >= 0 +FROM + pg_stat_global_memory_allocation; + ?column? | ?column? | ?column? | ?column? +----------+----------+----------+---------- + t | t | t | t +(1 row) + +-- ensure that pg_stat_memory_allocation view exists +SELECT + pid > 0, allocated_bytes >= 0, aset_allocated_bytes >= 0, dsm_allocated_bytes >= 0, generation_allocated_bytes >= 0, slab_allocated_bytes >= 0 0003 ---- 1) The commit message says: > Further requests will not be allocated until dropping below the limit. > Keep this in mind when setting this value. The SGML docs have a similar "keep this in mind" suggestion in relation to the 1MB local allowance. I find this pretty useless, as it doesn't really say what to do with the information / what it means. I mean, should I set the value higher, or what am I supposed to do? This needs to be understandable for an average user reading the SGML docs, who is unlikely to know the implementation details. 2) > This limit does not affect auxiliary backend processes. This seems pretty unfortunate, because in the cases where I actually saw the OOM killer intervene, it was often because of auxiliary processes allocating a lot of memory (say, autovacuum with maintenance_work_mem set very high, etc.). In a way, not including these auxiliary processes in the limit seems to go against the idea of preventing the OOM killer. 3) I don't think the patch ever explains what the "local allowance" is, how the refill works, why it's done this way, etc. I do think I understand that now, but I had to go through the thread. That's not really how it should be. There should be an explanation of how this works somewhere (comment in mcxt.c? separate README?). 4) > doc/src/sgml/monitoring.sgml Not sure why this removes the part about DSM being included only in the backend that created it. 5) > total_bkend_mem_bytes_available The "bkend" name is strange (to save 2 characters), and the fact that it combines local and shared memory seems confusing too. 6) + /* + * Local counter to manage shared memory allocations. At backend startup, set to + * initial_allocation_allowance via pgstat_init_allocated_bytes(). Decrease as + * memory is malloc'd. When exhausted, atomically refill if available from + * ProcGlobal->max_total_bkend_mem via exceeds_max_total_bkend_mem(). + */ +uint64 allocation_allowance = 0; But this allowance is for shared memory too, and shared memory is not allocated using malloc. 7) There are a lot of new global variables. Maybe it'd be better to group them into a struct, or something? 8) Why does pgstat_set_allocated_bytes_storage need the new return? 9) Unnecessary changes in src/backend/utils/hash/dynahash.c (whitespace, new comment that seems not very useful) 10) Shouldn't we have a new malloc wrapper that does the check? That way we wouldn't need to have the call in every memory-context code path that calls malloc. 11) I'm not sure about the ereport() calls after hitting the limit. 
Andres thinks it might lead to recursion [4], but Stephen [5] seems to think this does not really make the situation worse. I'm not sure about that, though - I agree the ENOMEM can already happen, but maybe having a limit (which clearly needs to be stricter than the limit used by the kernel for OOM) would be more likely to be hit? 12) In general, I agree with Andres [4] that we'd be better off focusing on the accounting part, seeing how it works in practice, and then adding some ability to limit memory after a while. regards [1] https://www.postgresql.org/message-id/4fb99fb7-8a6a-2828-dd77-e2f1d75c7dd0%40enterprisedb.com [2] https://www.postgresql.org/message-id/3b9b90c6-f4ae-a7df-6519-847ea9d5fe1e%40amazon.com [3] https://www.postgresql.org/message-id/20230110023118.qqbjbtyecofh3uvd%40awork3.anarazel.de [4] https://www.postgresql.org/message-id/20230113180411.rdqbrivz5ano2uat%40awork3.anarazel.de [5] https://www.postgresql.org/message-id/ZTArWsctGn5fEVPR%40tamriel.snowman.net [6] https://www.postgresql.org/message-id/ZTGycSYuFrsixv6q%40tamriel.snowman.net -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
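Regarding item 8 in the 0001 comments above, a minimal sketch of the kind of wrapper meant there. The wrapper name is made up; pgstat_report_allocated_bytes_increase() and PG_ALLOC_ASET are taken from the patch as quoted:

#include "postgres.h"
#include "utils/memutils.h"

/*
 * Keep the context-level and pgstat-level counters in lockstep, so a
 * caller cannot update one and forget the other.
 */
static inline void
context_account_alloc(MemoryContext context, Size blksize, int alloc_type)
{
    context->mem_allocated += blksize;
    pgstat_report_allocated_bytes_increase(blksize, alloc_type);
}

AllocSetAlloc() would then call context_account_alloc(context, blksize, PG_ALLOC_ASET) instead of the two separate statements, and the generation/slab allocators would pass their own type tags.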
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, > I took a closer look at this patch over the last couple days, and I did > a bunch of benchmarks to see how expensive the accounting is. The good > news is that I haven't observed any overhead - I did two simple tests, > that I think would be particularly sensitive to this: > > [...] Just wanted to let you know that v20231226 doesn't apply. The patch needs love from somebody interested in it. Best regards, Aleksander Alekseev (wearing a co-CFM hat)
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On 12.03.2024 16:30, Aleksander Alekseev wrote: > Just wanted to let you know that v20231226 doesn't apply. The patch > needs love from somebody interested in it. Thanks for pointing this out! Here is a version updated for the current master. With the best regards, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On 13.03.2024 10:41, Anton A. Melnikov wrote: > Here is a version updated for the current master. > During the patch update I mistakenly added double counting of deallocated blocks. That's why the tests in the patch tester failed. Fixed it and squashed fix 0002 with 0001. Here is the fixed version. With the best wishes! -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hello Anton, 14.03.2024 23:36, Anton A. Melnikov wrote: > On 13.03.2024 10:41, Anton A. Melnikov wrote: > >> Here is a version updated for the current master. >> > > During patch updating i mistakenly added double counting of deallocatated blocks. > That's why the tests in the patch tester failed. > Fixed it and squashed fix 0002 with 0001. > Here is fixed version. Please try the following with the patches applied: echo "shared_buffers = '1MB' max_total_backend_memory = '10MB'" > /tmp/extra.config CPPFLAGS="-Og" ./configure --enable-tap-tests --enable-debug --enable-cassert ... TEMP_CONFIG=/tmp/extra.config make check It fails for me as follows: ... # postmaster did not respond within 60 seconds, examine ".../src/test/regress/log/postmaster.log" for the reason ... src/test/regress/log/postmaster.log contains: ... TRAP: failed Assert("ret != NULL"), File: "mcxt.c", Line: 1327, PID: 4109270 TRAP: failed Assert("ret != NULL"), File: "mcxt.c", Line: 1327, PID: 4109271 postgres: autovacuum launcher (ExceptionalCondition+0x69)[0x55ce441fcc6e] postgres: autovacuum launcher (palloc0+0x0)[0x55ce4422eb67] postgres: logical replication launcher (ExceptionalCondition+0x69)[0x55ce441fcc6e] postgres: autovacuum launcher (InitDeadLockChecking+0xa6)[0x55ce4408a6f0] postgres: logical replication launcher (palloc0+0x0)[0x55ce4422eb67] postgres: logical replication launcher (InitDeadLockChecking+0x45)[0x55ce4408a68f] postgres: autovacuum launcher (InitProcess+0x600)[0x55ce4409c6f2] postgres: logical replication launcher (InitProcess+0x600)[0x55ce4409c6f2] postgres: autovacuum launcher (+0x44b4e2)[0x55ce43ff24e2] ... grep TRAP src/test/regress/log/postmaster.log | wc -l 445 Best regards, Alexander
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Reviving this thread, because I am thinking about something related -- please ignore the "On Fri, Dec 27, 2024" date; it seems to be an artifact of me re-sending the message from the list archive. The original message was from January 28, 2024.

On Fri, Dec 27, 2024 at 11:02 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Firstly, I agree with the goal of having a way to account for memory
> used by the backends, and also ability to enforce some sort of limit.
> It's difficult to track the memory at the OS level (interpreting RSS
> values is not trivial), and work_mem is not sufficient to enforce a
> backend-level limit, not even talking about a global limit.
>
> But as I said earlier, it seems quite strange to start by introducing
> some sort of global limit, combining memory for all backends. I do
> understand that the intent is to have such global limit in order to
> prevent issues with the OOM killer and/or not to interfere with other
> stuff running on the same machine. And while I'm not saying we should
> not have such limit, every time I wished to have a memory limit it was a
> backend-level one. Ideally a workmem-like limit that would "adjust" the
> work_mem values used by the optimizer (but that's not what this patch
> aims to do), or at least a backstop in case something goes wrong (say, a
> memory leak, OLTP application issuing complex queries, etc.).

I think what Tomas suggests is the right strategy. I am thinking of something like:

1. Say we have a backend_work_mem limit. Then the total amount of memory available on the entire system, for all queries, as work_mem, would be backend_work_mem * max_connections.

2. We use this backend_work_mem to "adjust" work_mem values used by the executor. (I don't care about the optimizer right now -- the optimizer just does its best to predict what will happen at runtime.)

At runtime, every node that uses work_mem currently checks its memory usage against the session work_mem (and possibly hash_mem_multiplier) GUC(s). Instead, now, every node would check against its node-specific "adjusted" work_mem. If it exceeds this limit, it spills, using existing logic.

In other words -- existing logic spills based on comparison to the global work_mem GUC. Let's make it spill, instead, based on comparison to an operator-local "PlanState.work_mem" field.

And then let's set that "PlanState.work_mem" field based on a new "backend_work_mem" GUC. Then no queries will run OOM (at least, not due to work_mem -- we'll address other memory usages separately), so they won't need to be canceled. Instead, they'll spill.

This strategy solves the ongoing problem of how to set work_mem, if some queries have lots of operators and others don't -- now we just set backend_work_mem, as a limit on the entire query's total work_mem. And a bit of integration with the optimizer will allow us to distribute the total backend_work_mem to individual execution nodes, with the goal of minimizing spilling, without exceeding the backend_work_mem limit.

Anyway, I revived this thread to see if there's interest in this sort of strategy --

Thanks, James
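A minimal self-contained sketch of that executor-side change (PlanState.work_mem and backend_work_mem are the names proposed above; the toy types and numbers here are assumptions, not actual patch code):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct PlanState
{
    long        work_mem_kb;    /* assumed new per-node budget, in kB */
} PlanState;

/* Existing behavior, roughly: spill when usage passes the global GUC. */
static long work_mem_guc_kb = 4096;     /* work_mem = 4MB */

static bool
must_spill_old(size_t used)
{
    return used > (size_t) work_mem_guc_kb * 1024;
}

/* Proposed behavior: spill when usage passes the node's own budget,
 * which the planner derived from a backend_work_mem total. */
static bool
must_spill_new(const PlanState *node, size_t used)
{
    return used > (size_t) node->work_mem_kb * 1024;
}

int
main(void)
{
    PlanState   hj = {.work_mem_kb = 2048};     /* this node got 2MB */
    size_t      used = 3UL * 1024 * 1024;       /* 3MB in its hash table */

    printf("old check spills: %d\n", must_spill_old(used));         /* 0 */
    printf("new check spills: %d\n", must_spill_new(&hj, used));    /* 1 */
    return 0;
}

The point is only that the spill decision becomes local to the node, so the planner is free to hand different budgets to different nodes while keeping their sum under the backend-wide total.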
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On 12/27/24 20:14, James Hunter wrote:
> [...]
>
> I think what Tomas suggests is the right strategy. I am thinking of
> something like:
>
> 1. Say we have a backend_work_mem limit. Then the total amount of
> memory available on the entire system, for all queries, as work_mem,
> would be backend_work_mem * max_connections.
>
> 2. We use this backend_work_mem to "adjust" work_mem values used by
> the executor. (I don't care about the optimizer right now -- the
> optimizer just does its best to predict what will happen at runtime.)
>
> At runtime, every node that uses work_mem currently checks its memory
> usage against the session work_mem (and possibly hash_mem_multiplier)
> GUC(s). Instead, now, every node would check against its node-specific
> "adjusted" work_mem. If it exceeds this limit, it spills, using
> existing logic.
>
> [...]
>
> Anyway, I revived this thread to see if there's interest in this sort
> of strategy --

I think there's still interest in having a memory limit of this kind, but it's not very clear to me what you mean by "adjusted work_mem". Can you explain how that would actually work in practice?
Whenever I've been thinking about this in the past, it wasn't clear to me how we would know when to start adjusting work_mem, because we don't know which nodes will actually use work_mem concurrently. We certainly don't know that during planning - e.g. because we don't actually know what nodes will be "above" the operation we're planning. And AFAIK we don't have any concept of which parts of the plan may be active at the same time ...

Let's say we have limits work_mem=4MB and query_mem=8MB (to keep it simple). And we have a query plan with 3 separate hash joins, each needing 4MB. I believe these things can happen:

1) The hash joins are in distant parts of the query plan, and will not actually run concurrently - we'll run each hash join to completion before starting the next one. So, no problem with query_mem.

2) The hash joins are active at the same time, initialized one by one. But we don't know that, so we'll init HJ #1, it'll get 4MB of memory. Then we'll init HJ #2, it'll also get 4MB of memory. And now we want to init HJ #3, but we've already exhausted query_mem - the memory is already in use and we can't reclaim it.

How would the work_mem get adjusted in these cases? Ideally, we'd not adjust work_mem in (1), because it'd force hash joins to spill data to disk, affecting performance when there's enough memory to just run the query plan (but maybe that's a reasonable price for the memory limit?). While in (2) we'd need to start adjusting the memory from the beginning, but I don't think we know that so early.

Can you explain / walk me through the proposal for these cases?

regards

--
Tomas Vondra
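To put rough numbers on the dilemma, a back-of-the-envelope sketch of the two obvious policies (grant on demand vs. divide up front), using the figures above:

#include <stdio.h>

int
main(void)
{
    int query_mem_mb = 8;       /* query_mem */
    int want_mb = 4;            /* work_mem each hash join wants */
    int njoins = 3;

    /* Grant on demand: HJ #1 and #2 get 4MB each; HJ #3 finds nothing
     * left. Fine for case (1), breaks case (2). */
    printf("on demand: HJ3 gets %dMB of the %dMB it wants\n",
           query_mem_mb - 2 * want_mb, want_mb);

    /* Divide up front: each join gets 8/3 MB (integer MB here).
     * Survives case (2), but forces all three joins to spill in
     * case (1) even though they would never have run concurrently. */
    printf("up-front split: each HJ gets %dMB of %dMB\n",
           query_mem_mb / njoins, want_mb);
    return 0;
}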
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On 12/28/24 13:36, Anton A. Melnikov wrote:
> Hi!
>
> On 28.12.2024 04:48, Tomas Vondra wrote:
>> On 12/27/24 20:14, James Hunter wrote:
>>> [...]
>>>
>>> I think what Tomas suggests is the right strategy.
>
> I'm also interested in this topic. And agreed that it's best to move
> from the limit for a separate backend to the global one. In more detail,
> let me suggest the following steps or parts:
> 1) realize memory limitation for a separate backend independent from the
> work_mem GUC;
> 2) add a workmem-like limit that would "adjust" the work_mem values used
> by the optimizer, as Tomas suggested;
> 3) add a global limit for all backends.
>
> As for p.1 there is a patch that was originally suggested by my colleague
> Maxim Orlov <orlovmg@gmail.com> and which I modified for the current
> master. This patch introduces a single max_backend_memory GUC that
> specifies the maximum amount of memory that can be allocated to a
> backend. Zero value means no limit. If the allocated memory size is
> exceeded, a standard "out of memory" error will be issued.
> Also the patch introduces the
> pg_get_backend_memory_contexts_total_bytes() function, which allows one
> to know how many bytes have already been allocated to the process in
> contexts. And in the build with asserts it adds the
> pg_get_backend_memory_allocation_stats() function that allows one to get
> additional information about memory allocations for debug purposes.

Not sure a simple memory limit like in the patch (which just adds memory accounting + OOM after hitting the limit) can work as anything but the last safety measure. It seems to me the limit would have to be set to a value that's much higher than any backend would realistically need.

The first challenge I envision is that without any feedback (either to the planner or executor), it may break queries quite easily. It just takes the planner to add one more sort / hash join / materialize (which it can do arbitrarily, as it has no concept of the memory limit), and now the query can't run.
And secondly, there are allocations that we don't restrict by work_mem, but this memory limit would include them, ofc. The main example that I can think of is hash join, where we (currently) don't account for the BufFile arrays, and that can already lead to annoying OOM issues, see e.g. [1] [2] and [3] (I'm sure there are more threads about the issue).

It's wrong that we don't account for the BufFile arrays, so they're not included in work_mem (or considered in some other way). And maybe we should finally improve that (not the fault of this patch, ofc). But it's hard, because as the amount of data grows, we have to add more batches - and at some point that starts adding more memory than we save (a rough calculation follows at the end of this message). Ultimately, we end up breaking work_mem one way or the other - either we add more batches, or allow the hash table to exceed work_mem.

That's a preexisting issue, of course. But wouldn't this simple limit make the situation worse? The query would likely complete OK (otherwise we'd get many more reports about OOM), but with the new limit it would probably fail with OOM. Maybe that's correct, and the response to that is "Don't set the limit with such queries," although it's hard to say in advance and it can happen randomly? Not sure. What bothers me a bit is that users would likely try to reduce work_mem, but that's the wrong thing to do in this case - it just increases the number of batches, and thus makes the situation worse.

[1] https://www.postgresql.org/message-id/20190504003414.bulcbnge3rhwhcsh%40development
[2] https://www.postgresql.org/message-id/20230228190643.1e368315%40karst
[3] https://www.postgresql.org/message-id/bc138e9f-c89e-9147-5395-61d51a757b3b%40gusw.net

>>> This strategy solves the ongoing problem of how to set work_mem, if
>>> some queries have lots of operators and others don't -- now we just
>>> set backend_work_mem, as a limit on the entire query's total work_mem.
>>> And a bit of integration with the optimizer will allow us to
>>> distribute the total backend_work_mem to individual execution nodes,
>>> with the goal of minimizing spilling, without exceeding the
>>> backend_work_mem limit.
>
> As for p.2 maybe one can set a maximum number of parallel sort or
> hash table operations before writing to disk instead of an absolute
> value in the work_mem GUC? E.g. introduce a max_query_operations GUC
> or a variable in such a way that the old work_mem will be equal
> to max_backend_memory divided by max_query_operations.
>
> What do you think about such an approach?

It's not clear to me how you want to calculate the number of parallel operations that might use work_mem. Can you elaborate?

regards

--
Tomas Vondra
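The back-of-the-envelope sketch referenced above; the ~8kB buffer per BufFile and one file per batch per side are assumptions based on the threads linked in [1]-[3]:

#include <stdio.h>

int
main(void)
{
    long work_mem = 4L * 1024 * 1024;   /* 4MB budget for the hash table */
    long buffile = 8192;                /* assumed ~8kB buffer per BufFile */

    for (long nbatch = 32; nbatch <= 16384; nbatch *= 2)
    {
        /* inner and outer side each keep one BufFile per batch */
        long overhead = 2 * nbatch * buffile;

        printf("nbatch = %5ld: BufFile overhead = %6ld kB vs work_mem = %ld kB\n",
               nbatch, overhead / 1024, work_mem / 1024);
    }

    /* Around nbatch = 256 the unaccounted overhead alone reaches 4MB, so
     * doubling the batch count "saves" hash-table memory while spending
     * more in total - the blow-up described above. */
    return 0;
}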
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Sat, 28 Dec 2024 15:57:44 +0100 Tomas Vondra <tomas@vondra.me> wrote:
>
> Not sure a simple memory limit like in the patch (which just adds
> memory accounting + OOM after hitting the limit) can work as anything
> but the last safety measure. It seems to me the limit would have to
> be set to a value that's much higher than any backend would
> realistically need.
>
> [...]

Hi James!

While I don't have a detailed design in mind, I'd like to add a strong +1 on the general idea that work_mem is hard to use effectively because queries can vary so widely in how many nodes might need work memory. I'd almost like to have two limits:

First, a hard per-connection limit which could be set very high - we can track total memory usage across contexts inside of palloc and pfree (and maybe this could also be exposed in pg_stat_activity for easy visibility into a snapshot of memory use across all backends). If palloc detects that an allocation would take the total over the hard limit, then you just fail the palloc with an OOM. This protects postgres from a memory leak in a single backend OOM'ing the whole system and restarting the whole database; failing a single connection is better than failing all of them.

Second, a soft per-connection "total_connection_work_mem_target" which could be set lower. The planner can just look at the total number of nodes that it expects to allocate work memory, divide the target by this count, and then set the work_mem for that query. There should be a reasonable floor (minimum) for work_mem - maybe the value of work_mem itself becomes this floor, and the new target doesn't do anything besides increasing runtime work_mem.

Maybe we could even do a "total_instance_work_mem_target" where it's divided by the average number of active connections or something.

In practice this target idea doesn't guarantee anything about total work memory used, but it's the tool I'd most like as an admin. And the per-connection hard limit is the tool I'd like to have for protecting myself against memory leaks or individual connections going bonkers and killing all my connections for an instance restart.

-Jeremy
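A minimal sketch of the soft-target arithmetic described above (all names are hypothetical; the work_mem GUC itself serves as the floor):

#include <stdio.h>

/* Split a per-connection work-mem target across the nodes the planner
 * expects to use work memory, never dropping below the floor. */
static int
query_work_mem_kb(int n_workmem_nodes, int target_kb, int floor_kb)
{
    int per_node;

    if (n_workmem_nodes <= 0)
        return target_kb;

    per_node = target_kb / n_workmem_nodes;
    return (per_node > floor_kb) ? per_node : floor_kb;
}

int
main(void)
{
    /* total_connection_work_mem_target = 64MB, work_mem floor = 4MB */
    printf("2 nodes:  %d kB each\n", query_work_mem_kb(2, 65536, 4096));
    printf("8 nodes:  %d kB each\n", query_work_mem_kb(8, 65536, 4096));
    printf("64 nodes: %d kB each (floor wins)\n",
           query_work_mem_kb(64, 65536, 4096));
    return 0;
}

As Jeremy notes, once the floor wins the target no longer bounds total work memory, so this is a tuning aid rather than a guarantee; the hard per-connection limit remains the actual backstop.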
Re: Add the ability to limit the amount of memory that can be allocated to backends.