Thread: PostgreSQL's vacuumdb fails to allocate memory for non-root users
FreeBSD 5.4-RELEASE
PostgreSQL 8.0.3

I noticed that the nightly cron consisting of a vacuumdb was failing due
to "unable to allocate memory". I do have maintenance_work_mem set at
512MB, and the /boot/loader.conf file sets the max data size to 1GB
(verified by limit). The odd thing is that if I run the command (either
vacuumdb from the command line or VACUUM VERBOSE ANALYZE from a psql
session) as the Unix user root (and any psql superuser) the vacuum runs
fine. It is when the Unix user is non-root (e.g. su -l pgsql -c
"vacuumdb -a -z") that this memory error occurs. All users use the
"default" class for login.conf purposes, which has not been modified
from its installed settings. Any ideas on how to a) troubleshoot this or
b) fix this (if it is something obvious that I just cannot see)?

Thanks,
Sven
Sven Willenberger <sven@dmv.com> writes:

> FreeBSD 5.4-RELEASE
> PostgreSQL 8.0.3
>
> I noticed that the nightly cron consisting of a vacuumdb was failing
> due to "unable to allocate memory". I do have maintenance_work_mem set
> at 512MB, and the /boot/loader.conf file sets the max data size to 1GB
> (verified by limit). The odd thing is that if I run the command (either
> vacuumdb from the command line or VACUUM VERBOSE ANALYZE from a psql
> session) as the Unix user root (and any psql superuser) the vacuum runs
> fine. It is when the Unix user is non-root (e.g. su -l pgsql -c
> "vacuumdb -a -z") that this memory error occurs. All users use the
> "default" class for login.conf purposes, which has not been modified
> from its installed settings. Any ideas on how to a) troubleshoot this
> or b) fix this (if it is something obvious that I just cannot see)?

Is the out-of-memory condition occurring on the server or client side?
Is there anything in the Postgres logs?

You might put a 'ulimit -a' command in your cron script to make sure
your memory limit settings are propagating correctly...

-Doug
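[Editor's note: Doug's suggestion can be sketched as a small cron wrapper.
This is a hypothetical script, not from the thread; the log path and the
commented-out vacuumdb invocation are assumptions.]

```shell
#!/bin/sh
# Hypothetical wrapper for the nightly cron job: record the limits the
# job actually runs under before invoking vacuumdb, so they can later be
# compared with what an interactive shell reports. Log path is assumed.
LOG=/tmp/vacuum-limits.log
{
    date
    ulimit -a
} >> "$LOG" 2>&1
# su -l pgsql -c "vacuumdb -a -z" >> "$LOG" 2>&1
```

Pointing cron at this wrapper instead of vacuumdb directly leaves a record
of the effective limits for every run.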
On Wed, 2005-06-29 at 09:43 -0400, Douglas McNaught wrote:
> Sven Willenberger <sven@dmv.com> writes:
>
> > [ ... ]
>
> Is the out-of-memory condition occurring on the server or client side?
> Is there anything in the Postgres logs?

In this case they are one and the same machine; i.e., whether invoked
from the command line as vacuumdb or invoked from psql (connecting to
localhost) as "vacuum analyze;", the memory error occurs. The logfile
reveals:

ERROR:  out of memory
DETAIL:  Failed on request of size 536870910.

> You might put a 'ulimit -a' command in your cron script to make sure
> your memory limit settings are propagating correctly...

I created a cron job that consisted of just that command (ulimit -a) and
the output revealed nothing abnormal (i.e., max data segment still 1G,
etc.). This occurs outside of cron also; it was just the failing cron
job that brought it to my attention. Again, if I log in as myself and
try to run the command vacuumdb -a -z it fails; if I su to root and
repeat, it works fine. I am trying to narrow this down to a PostgreSQL
issue vs. a FreeBSD issue.

Sven
Sven Willenberger <sven@dmv.com> writes:

> Again, if I log in as myself and try to run the command vacuumdb -a -z
> it fails; if I su to root and repeat, it works fine. I am trying to
> narrow this down to a PostgreSQL issue vs. a FreeBSD issue.

That's really weird, because in both cases the actual VACUUM work
happens in the backend, which should be running as 'postgres'. I don't
see how the client user could possibly make a difference.

-Doug
Sven Willenberger <sven@dmv.com> writes:

> ERROR:  out of memory
> DETAIL:  Failed on request of size 536870910.

That's a server-side failure ...

> Again, if I log in as myself and try to run the command vacuumdb -a -z
> it fails; if I su to root and repeat, it works fine. I am trying to
> narrow this down to a PostgreSQL issue vs. a FreeBSD issue.

That's fairly hard to believe, assuming that you are talking to the same
server in both cases (I have seen trouble reports that turned out to be
because the complainant was accidentally using two different
servers...). The ulimit the backend is running under couldn't change
depending on who the client is su'd to.

Is it possible that you've got per-user configuration settings that
affect this, like a different maintenance_work_mem value for the root
user?

			regards, tom lane
On Wed, 2005-06-29 at 11:21 -0400, Tom Lane wrote:
> That's fairly hard to believe, assuming that you are talking to the
> same server in both cases [ ... ] Is it possible that you've got
> per-user configuration settings that affect this, like a different
> maintenance_work_mem value for the root user?

I have done some more tests and tried to keep the results of vacuumdb
distinct from connecting to the backend (psql -U pgsql ...) and running
VACUUM ANALYZE. Apparently hopping back and forth between both methods
interfered with my original interpretation of what appeared to be
happening. Anyway, here is what I see:

First, the psql connection version: psql, then VACUUM ANALYZE => works
fine whether the current Unix user is root or a plain user (I ran this a
couple of times via new psql connections to verify).

Then quit psql and move to the command-line vacuumdb => whether running
as su -l pgsql -c "vacuumdb -a -z" (or specifying a dbname instead of
all databases) or directly as a user, the out-of-memory error occurs.

If I then connect via psql to the backend and try to run VACUUM ANALYZE,
I receive an out-of-memory error. This last connection to psql after a
failed vacuumdb was confounding my earlier interpretation that the error
depended on the Unix user.

top shows:

  PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME  WCPU   CPU COMMAND
 6754 pgsql      4    0  602M 88688K sbwait 0   0:03 0.00% 0.00% postgres

until I disconnect the psql session. I can then psql again, and the same
error happens (out of memory), and top shows the same again. At this
point I am not sure whether it is a memory issue in vacuumdb, in vacuum
itself, or in the FreeBSD memory management system. Again, if enough
time passes (or some other events occur) since I last tried vacuumdb,
then running VACUUM [VERBOSE] [ANALYZE] via a psql connection works
fine.

Sven
On Jun 29, 2005, at 9:01 AM, Sven Willenberger wrote:

> Unix user root (and any psql superuser) the vacuum runs fine. It is
> when the Unix user is non-root (e.g. su -l pgsql -c "vacuumdb -a -z")
> that this memory error occurs. All users use the "default" class for
> login.conf purposes which has not been modified from its installed
> settings. Any ideas on how to a) troubleshoot this or b) fix this (if
> it is something obvious that I just cannot see).

This doesn't make sense: the actual command is executed by the backend
postgres server, so the uid of the client program doesn't make a bit of
difference.

You need to see exactly who is generating that error. It certainly is
not the Pg backend.
On Wed, 2005-06-29 at 14:59 -0400, Vivek Khera wrote:
> This doesn't make sense: the actual command is executed by the backend
> postgres server, so the uid of the client program doesn't make a bit
> of difference.
>
> You need to see exactly who is generating that error. It certainly is
> not the Pg backend.

The idea that the issue is tied to a certain "login" user has been
negated by further testing (the illusion that it was user-based arose
from the order in which I ran the tests while trying to find out what
was going on) -- at this point it does seem tied to invoking vacuumdb.
As a point of clarification: when maxdsiz and dfldsiz are set, those
values are per process, not per user, correct? Something I have noticed:
when the memory error occurs during the psql session (after a failed
vacuumdb attempt), the memory stays at 600+MB in top (under SIZE) until
the psql session is closed -- that may just be the way top reports it,
though.

Sven
On Wed, 2005-06-29 at 16:40 -0400, Charles Swiger wrote:
> Double-check your system limits via "ulimit -a" or "ulimit -aH". By
> default, FreeBSD will probably restrict the maximum data size of the
> process to 512MB, which may be what you are running into. You can
> rebuild the kernel to permit a larger data size, or else tweak
> /boot/loader.conf:
>
> echo 'kern.maxdsiz="1024M"' >> /boot/loader.conf

:> ulimit -a
cpu time               (seconds, -t)  unlimited
file size           (512-blocks, -f)  unlimited
data seg size           (kbytes, -d)  1048576
stack size              (kbytes, -s)  65536
core file size      (512-blocks, -c)  unlimited
max memory size         (kbytes, -m)  unlimited
locked memory           (kbytes, -l)  unlimited
max user processes              (-u)  5547
open files                      (-n)  11095
virtual mem size        (kbytes, -v)  unlimited
sbsize                   (bytes, -b)  unlimited

:> cat /boot/loader.conf
kern.maxdsiz="1073741824"
kern.dfldsiz="891289600"

And if I don't run vacuumdb at all, but rather connect to the backend
via psql and run VACUUM, it works okay with full memory allocation.
Still testing RAM to see if the issue is physical.

Sven
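[Editor's note: the loader.conf values above are in bytes; a quick
conversion confirms they match the limits the poster intended. The byte
counts are copied from the message; the MB interpretation is a sanity
check, not part of the thread.]

```shell
# Convert the kern.maxdsiz / kern.dfldsiz byte counts from loader.conf
# to megabytes, confirming a 1024 MB hard and 850 MB default data
# segment -- consistent with the 1048576 kB "data seg size" in ulimit -a.
maxdsiz=1073741824
dfldsiz=891289600
echo "maxdsiz: $((maxdsiz / 1024 / 1024)) MB"   # prints: maxdsiz: 1024 MB
echo "dfldsiz: $((dfldsiz / 1024 / 1024)) MB"   # prints: dfldsiz: 850 MB
```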
On Wed, 2005-06-29 at 16:58 -0400, Sven Willenberger wrote:
> :> cat /boot/loader.conf
> kern.maxdsiz="1073741824"
> kern.dfldsiz="891289600"
>
> and if I don't run vacuumdb at all, but rather connect to the backend
> via psql and run vacuum, it works ok with full memory allocation.

I have found the answer/problem. On a hunch I increased maxdsiz to 1.5G
in the loader.conf file and rebooted. I ran vacuumdb and watched top as
the process proceeded. What I saw was SIZE sitting at 603MB (which was
512MB plus another 91MB, which corresponded nicely to the value of RES
for the process). A bit into the process I saw SIZE jump to 1115MB --
i.e., another 512MB of RAM was requested and this time allocated. At one
point SIZE dropped back to 603MB and then went back up to 1115MB.

I suspect the same type of issue was occurring in a regular VACUUM from
the psql client connecting to the backend, for some reason not as
frequently. I am gathering that maintenance_work_mem is either not being
recognized as having already been allocated, so another malloc is made,
or the process thinks the memory was released and tries to grab a chunk
of memory again. This would correspond to the situation where I saw SIZE
stuck at 603MB after a failed memory allocation (when maxdsiz was only
1G). I am not sure whether I will run into the situation where yet
another 512MB request is made (when 1115MB already appears in SIZE), but
if so, the same problem will arise. I will keep an eye on it ...

Sven
Sven Willenberger <sven@dmv.com> writes:

> I have found the answer/problem. On a hunch I increased maxdsiz to 1.5G
> in the loader.conf file and rebooted. I ran vacuumdb and watched top as
> the process proceeded. What I saw was SIZE sitting at 603MB (which was
> 512MB plus another 91MB, which corresponded nicely to the value of RES
> for the process). A bit into the process I saw SIZE jump to 1115 --
> i.e., another 512MB of RAM was requested and this time allocated. At
> one point SIZE dropped back to 603 and then back up to 1115. I suspect
> the same type of issue was occurring in regular vacuum from the psql
> client connecting to the backend, for some reason not as frequently. I
> am gathering that maintenance_work_mem is either not being recognized
> as having already been allocated and another malloc is made, or the
> process is thinking the memory was released and tried to grab a chunk
> of memory again.

Hmm. It's probably a fragmentation issue. VACUUM will allocate a
maintenance_work_mem-sized chunk during command startup, but that's
likely not all that gets allocated, and if any stuff allocated after it
is not freed at the same time, the process size won't go back down.
Which wouldn't be a killer in itself, but unless the next iteration is
able to fit that array in the same space, you'd see the above behavior.

BTW, do you have any evidence that it's actually useful to set
maintenance_work_mem that high for VACUUM? A quick-and-dirty solution
would be to bound the dead-tuples array size at something more sane...

			regards, tom lane
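[Editor's note: a back-of-the-envelope sketch of the dead-tuples array
Tom mentions. The 6-byte per-tuple pointer size is an assumption about
lazy vacuum's bookkeeping, not stated in the thread.]

```shell
# Rough capacity of the dead-tuple array: how many dead rows a 512 MB
# maintenance_work_mem could track, assuming one 6-byte tuple pointer
# per dead row (the 6-byte figure is an assumption).
mem_bytes=$((512 * 1024 * 1024))
tid_bytes=6
echo "$((mem_bytes / tid_bytes)) dead tuples trackable"
```

Notably, the failed request of size 536870910 in the earlier log is
exactly 6 x 89478485 bytes, which would be consistent with this single
up-front array allocation.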
Tom Lane presumably uttered the following on 06/29/05 19:12:
> Hmm. It's probably a fragmentation issue. VACUUM will allocate a
> maintenance_work_mem-sized chunk during command startup, but that's
> likely not all that gets allocated, and if any stuff allocated after
> it is not freed at the same time, the process size won't go back down.
> Which wouldn't be a killer in itself, but unless the next iteration is
> able to fit that array in the same space, you'd see the above behavior.

So maintenance_work_mem is not a measure of the maximum that can be
allocated by a maintenance procedure, but rather an increment of memory
that is requested by a maintenance process (which currently are vacuum
and index creation, no?), if my reading of the above is correct.

> BTW, do you have any evidence that it's actually useful to set
> maintenance_work_mem that high for VACUUM? A quick-and-dirty solution
> would be to bound the dead-tuples array size at something more sane...

I was under the assumption that on systems with RAM to spare, it was
beneficial to set maintenance_work_mem high to make those processes more
efficient. Again, my thinking was that the value you set for that
variable determined a *max* allocation by any given maintenance process,
not a memory-allocation request size. If, as my tests would indicate,
the process can request and receive more memory than specified in
maintenance_work_mem, then to play it safe I imagine I could drop that
value to 256MB or so.

Sven
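[Editor's note: an alternative to dropping the value globally is to
override it per run. This is an untested sketch; the 262144 kB (256 MB)
value, the script path, and the database name "mydb" are placeholders.
PostgreSQL 8.0 takes this setting as an integer number of kilobytes.]

```shell
# Write a small SQL script that the nightly cron job can feed to psql,
# overriding maintenance_work_mem for that session only instead of
# changing postgresql.conf. Path and value are assumptions.
cat > /tmp/nightly_vacuum.sql <<'SQL'
SET maintenance_work_mem = 262144;  -- 256 MB
VACUUM ANALYZE;
SQL
# The cron entry would then run something like:
#   psql -U pgsql -d mydb -f /tmp/nightly_vacuum.sql
```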
On Jun 29, 2005, at 4:12 PM, Sven Willenberger wrote:
[ ... ]
> Something I have noticed, when the memory error occurs during the psql
> session (after a failed vacuumdb attempt) the memory stays at 600+MB
> in top (under size) until the psql session is closed -- that may just
> be the way top reports it though.

Double-check your system limits via "ulimit -a" or "ulimit -aH". By
default, FreeBSD will probably restrict the maximum data size of the
process to 512MB, which may be what you are running into. You can
rebuild the kernel to permit a larger data size, or else tweak
/boot/loader.conf:

echo 'kern.maxdsiz="1024M"' >> /boot/loader.conf

--
-Chuck