Thread: VACUUM ANALYZE out of memory
Hi,
I am encountering problems when trying to run VACUUM FULL ANALYZE on a particular table in my database; namely, the process crashes out with the following problem:
INFO: vacuuming "pg_catalog.pg_largeobject"
ERROR: out of memory
DETAIL: Failed on request of size 536870912.
INFO: vacuuming "pg_catalog.pg_largeobject"
ERROR: out of memory
DETAIL: Failed on request of size 32.
Granted, our largeobject table is a bit large:
INFO: analyzing "pg_catalog.pg_largeobject"
INFO: "pg_largeobject": scanned 3000 of 116049431 pages, containing 18883 live rows and 409 dead rows; 3000 rows in sample, 730453802 estimated total rows
...but I trust that VACUUM ANALYZE doesn't try to read the entire table into memory at once. :-) The machine was set up with 1.2 GB shared memory and 1 GB maintenance memory, so I would have expected this to be sufficient for the task (we will eventually set this up on a 64-bit machine with 16 GB memory, but at the moment we are restricted to 32 bit).
This is currently running on PostgreSQL 8.3beta2, but since I haven't seen this problem reported before, I guess it will also be a problem in earlier versions. Have we run into a bug/limitation of the Postgres VACUUM, or is this something we might be able to solve by reconfiguring the server/database, or by downgrading the DBMS version?
I shall be trying to run a simple VACUUM later this evening, in order to see whether that manages to complete. Unfortunately, due to the time it takes to load data, it's not really practicable to shift servers at the moment.
A little background on the application: we are building a raster database to be used for storing weather and water data. The raster data (2D matrices of floating points) are stored using large objects and indexed using a values table (with multiple dimensions: time, parameter, altitudes, etc.). This is a technique I've worked with successfully in the past, though in that case using an Informix DBMS. My current employer is a strong proponent of Open Software, which has led to our implementation of the current system on a PostgreSQL DBMS (we will also be releasing our system as GPL in the near future).
The test instance we are working on now is about 1 TB; we expect to increase that by a factor of at least 5 within the first year of operation, so we'd really like to ensure that we can get VACUUM working (although the data is mostly going to be static on this installation, we will have others that won't be).
Anyone with some insights into VACUUM FULL ANALYZE who can weigh in on what is going wrong?
Regards,
Michael Akinde
----
Database Architect, met.no
Michael Akinde wrote:
> Hi,
> I am encountering problems when trying to run VACUUM FULL ANALYZE on a particular table in my database; namely, the process crashes out with the following problem:
> INFO: vacuuming "pg_catalog.pg_largeobject"
> ERROR: out of memory
> DETAIL: Failed on request of size 536870912.
> INFO: vacuuming "pg_catalog.pg_largeobject"
> ERROR: out of memory
> DETAIL: Failed on request of size 32.
> Granted, our largeobject table is a bit large:
> INFO: analyzing "pg_catalog.pg_largeobject"
> INFO: "pg_largeobject": scanned 3000 of 116049431 pages, containing 18883 live rows and 409 dead rows; 3000 rows in sample, 730453802 estimated total rows
> ...but I trust that VACUUM ANALYZE doesn't try to read the entire table into memory at once. :-) The machine was set up with 1.2 GB shared memory and 1 GB maintenance memory, so I would have expected this to be sufficient for the task (we will eventually set this up on a 64-bit machine with 16 GB memory, but at the moment we are restricted to 32 bit).
> This is currently running on PostgreSQL 8.3beta2, but since I haven't seen this problem reported before, I guess it will also be a problem in earlier versions. Have we run into a bug/limitation of the Postgres VACUUM, or is this something we might be able to solve by reconfiguring the server/database, or by downgrading the DBMS version?
This seems simply a problem of setting maintenance_work_mem too high (ie higher than what your OS can support - maybe an ulimit/process limit is in effect?). Try reducing maintenance_work_mem to say 128MB and retry.
If you promise postgresql that it can get 1GB it will happily try to use it ...
Stefan
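For example, this can be tried for a single session straight from psql before touching postgresql.conf (a sketch only; the 128MB value is just the suggestion above, not a tuned recommendation):
-- SET only affects the current session, so no restart is needed
SHOW maintenance_work_mem;
SET maintenance_work_mem = '128MB';
VACUUM FULL ANALYZE pg_catalog.pg_largeobject;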
On Tue, 2007-12-11 at 10:59 +0100, Michael Akinde wrote:
> I am encountering problems when trying to run VACUUM FULL ANALYZE on a particular table in my database; namely, the process crashes out with the following problem:
Probably just as well, since a VACUUM FULL on an 800GB table is going to take a rather long time, so you are saved from discovering just how excessively long it will run for. But it seems like a bug. This happens consistently, I take it?
Can you run ANALYZE and then VACUUM VERBOSE, both on just pg_largeobject, please? It will be useful to know whether they succeed.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
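In case it is useful, the exact commands would simply be (run from psql as a superuser, against that one table only):
ANALYZE pg_catalog.pg_largeobject;
VACUUM VERBOSE pg_catalog.pg_largeobject;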
Thanks for the rapid responses.
Stefan Kaltenbrunner wrote:
> This seems simply a problem of setting maintenance_work_mem too high (ie higher than what your OS can support - maybe an ulimit/process limit is in effect?). Try reducing maintenance_work_mem to say 128MB and retry.
I set up the system together with one of our Linux sysops, so I think the settings should be OK. kernel.shmmax is set to 1.2 GB, but I'll get him to recheck whether there are any other limits he has forgotten to increase.
> If you promise postgresql that it can get 1GB it will happily try to use it ...
The way the process was running, it seems to have basically just continually allocated memory until (presumably) it broke through the slightly less than 1.2 GB shared memory allocation we had provided for PostgreSQL (at least the postgres process was still running by the time its resident size had reached 1.1 GB).
Incidentally, in the first error of the two I posted, the shared memory setting was significantly lower (24 MB, I believe). I'll try with 128 MB before I leave in the evening, though (assuming the other tests I'm running complete by then).
Simon Riggs wrote:
> Probably just as well, since a VACUUM FULL on an 800GB table is going to take a rather long time, so you are saved from discovering just how excessively long it will run for. But it seems like a bug. This happens consistently, I take it?
I suspect so, though it has only happened a couple of times yet (as it does take a while before it hits that 1.1 GB roof). But part of the reason for running the VACUUM FULL was of course to find out how long it would take. Reliability is always a priority for us, so I like to know what (useful) tools we have available and stress the system as much as possible... :-)
> Can you run ANALYZE and then VACUUM VERBOSE, both on just pg_largeobject, please? It will be useful to know whether they succeed.
I ran just ANALYZE on the entire database yesterday, and that worked without any problems. I am currently running a VACUUM VERBOSE on the database. It isn't done yet, but it is running with a steady (low) resource usage.
Regards,
Michael A.
Michael Akinde wrote:
> Thanks for the rapid responses.
> Stefan Kaltenbrunner wrote:
>> This seems simply a problem of setting maintenance_work_mem too high (ie higher than what your OS can support - maybe an ulimit/process limit is in effect?). Try reducing maintenance_work_mem to say 128MB and retry. If you promise postgresql that it can get 1GB it will happily try to use it ...
> I set up the system together with one of our Linux sysops, so I think the settings should be OK. kernel.shmmax is set to 1.2 GB, but I'll get him to recheck whether there are any other limits he has forgotten to increase.
> The way the process was running, it seems to have basically just continually allocated memory until (presumably) it broke through the slightly less than 1.2 GB shared memory allocation we had provided for PostgreSQL (at least the postgres process was still running by the time its resident size had reached 1.1 GB).
> Incidentally, in the first error of the two I posted, the shared memory setting was significantly lower (24 MB, I believe). I'll try with 128 MB before I leave in the evening, though (assuming the other tests I'm running complete by then).
This is most likely not at all related to your shared memory settings, but to your setting of maintenance_work_mem, which is the amount of memory a single backend(!) can use for maintenance operations (which VACUUM is, for example). Notice that your first error refers to an allocation of about 500MB, which your ulimit/kernel process limit simply might not be able to give a single process.
And for very large tables VACUUM FULL is generally not a good idea at all - either look into regular normal vacuum scheduling or, if you need to recover from a bloated database, use a command that forces a rewrite of the table (like CLUSTER), which will be heaps faster but will also require about twice the amount of disk space.
Stefan
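As a rough sketch of the CLUSTER route for an ordinary user table (the table and index names below are invented, and whether this is even permitted on a system catalog like pg_largeobject is a separate question):
-- rewrites the whole table in index order, dropping dead space;
-- takes an exclusive lock and needs roughly twice the table's disk space while it runs
CLUSTER rasterdata_pkey ON rasterdata;
ANALYZE rasterdata;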
Michael Akinde wrote:
> Thanks for the rapid responses.
> Stefan Kaltenbrunner wrote:
>> This seems simply a problem of setting maintenance_work_mem too high (ie higher than what your OS can support - maybe an ulimit/process limit is in effect?). Try reducing maintenance_work_mem to say 128MB and retry. If you promise postgresql that it can get 1GB it will happily try to use it ...
> I set up the system together with one of our Linux sysops, so I think the settings should be OK. kernel.shmmax is set to 1.2 GB, but I'll get him to recheck whether there are any other limits he has forgotten to increase.
You are confusing shared memory (shared_buffers and kernel.shmmax) with local memory (work_mem and maintenance_work_mem). The error you got is about the latter kind.
--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"La soledad es compañía"
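To see the two kinds of settings side by side from psql (just a quick sketch; the comments only restate what has been said in this thread):
-- shared memory: allocated once at postmaster start, limited by kernel.shmmax
SHOW shared_buffers;
-- local (per-backend) memory: allocated on top of that, e.g. by VACUUM
SHOW maintenance_work_mem;
SHOW work_mem;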
On Tue, Dec 11, 2007 at 12:30:43PM +0100, Michael Akinde wrote:
> The way the process was running, it seems to have basically just continually allocated memory until (presumably) it broke through the slightly less than 1.2 GB shared memory allocation we had provided for PostgreSQL (at least the postgres process was still running by the time its resident size had reached 1.1 GB).
I think you're slightly confused. The VACUUM isn't going to use much of the shared memory anyway. Shared memory is mostly just disk buffers and is all allocated at startup. The memory being allocated by VACUUM is the maintenance_work_mem *in addition* to any shared memory. Also, depending on what's happening, it may be allocating maintenance_work_mem more than once.
> Incidentally, in the first error of the two I posted, the shared memory setting was significantly lower (24 MB, I believe). I'll try with 128 MB before I leave in the evening, though (assuming the other tests I'm running complete by then).
What you want is a reasonable shared mem, maybe 0.5GB, and a smaller maintenance_work_mem, since the latter is probably what's killing you.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy
Stefan Kaltenbrunner wrote:
> Michael Akinde wrote:
>> Incidentally, in the first error of the two I posted, the shared memory setting was significantly lower (24 MB, I believe). I'll try with 128 MB before I leave in the evening, though (assuming the other tests I'm running complete by then).
> This is most likely not at all related to your shared memory settings, but to your setting of maintenance_work_mem, which is the amount of memory a single backend(!) can use for maintenance operations (which VACUUM is, for example).
> Notice that your first error refers to an allocation of about 500MB, which your ulimit/kernel process limit simply might not be able to give a single process.
Yes - in the first case, the maintenance_work_mem was at default (so I wasn't surprised to see it fail to allocate half a gigabyte). In the second case, though, maintenance_work_mem was set at 1024 MB (where it then gives the slightly odd error "Failed on request of size 32").
The server has 4 GB RAM available, so even if it was trying to use 1.2 GB shared memory + 1 GB for maintenance_work_mem all at once, it still seems odd that the process would fail. As far as I can tell (running ulimit -a), the limits look pretty OK to me.
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
max nice (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
max rt priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Being unable to run VACUUM FULL isn't a problem for the current configuration of our application (as it will mostly be large amounts of static data), but we're likely to have an application working with the database next year where we'd move around 100 GB through the database on a daily basis. At least based on the documentation of the various commands, I would expect that one would want to perform VACUUM FULL every once in a while.
Again, thanks for the feedback.
Regards,
Michael Akinde
Database Architect, met.no
On Tue, Dec 11, 2007 at 03:18:54PM +0100, Michael Akinde wrote:
> The server has 4 GB RAM available, so even if it was trying to use 1.2 GB shared memory + 1 GB for maintenance_work_mem all at once, it still seems odd that the process would fail. As far as I can tell (running ulimit -a), the limits look pretty OK to me.
IIRC you said you're on a 32-bit architecture? Which means any single process only has 4GB of address space. Take off 1GB for the kernel, 1GB shared memory, 1GB maintenance_work_mem and a collection of libraries, stack space and general memory fragmentation, and I can absolutely believe you've run into the limit of *address* space.
On a 64-bit machine it doesn't matter so much, but on a 32-bit machine using 1GB for shared memory severely cuts the amount of auxiliary memory the server can use. Unless you've shown a measurable difference between 256MB and 1GB shared memory, I'd say you're better off using the smaller amount so you can have a higher maintenance_work_mem. VACUUM doesn't benefit much from lots of shared buffers, but it does benefit from maintenance_work_mem.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy
Martijn van Oosterhout wrote:
> IIRC you said you're on a 32-bit architecture? Which means any single process only has 4GB of address space. Take off 1GB for the kernel, 1GB shared memory, 1GB maintenance_work_mem and a collection of libraries, stack space and general memory fragmentation, and I can absolutely believe you've run into the limit of *address* space.
Should have been 64-bit, but a foul-up means it is running in 32-bit at the moment.
> On a 64-bit machine it doesn't matter so much, but on a 32-bit machine using 1GB for shared memory severely cuts the amount of auxiliary memory the server can use. Unless you've shown a measurable difference between 256MB and 1GB shared memory, I'd say you're better off using the smaller amount so you can have a higher maintenance_work_mem.
We're still in the process of testing and tuning (which takes its sweet time), so at the moment I cannot tell what benefits we get from the different settings in practice. But I'll try to set shared_buffers down to 128-256 MB and maintenance_work_mem to 512-1024 MB when I next have a time slot where I can run the server into the ground.
However, the problem also occurred with the shared_buffers limit set at 24 MB and maintenance_work_mem at its default setting (16 MB?), so I would be rather surprised if the problem did not repeat itself.
Regards,
Michael Akinde
Database Architect, met.no
[Synopsis: VACUUM FULL ANALYZE goes out of memory on a very large pg_catalog.pg_largeobject table.]
Simon Riggs wrote:
> Can you run ANALYZE and then VACUUM VERBOSE, both on just pg_largeobject, please? It will be useful to know whether they succeed.
ANALYZE:
INFO: analyzing "pg_catalog.pg_largeobject"
INFO: "pg_largeobject": scanned 3000 of 116049431 pages, containing 18883 live rows and 409 dead rows; 3000 rows in sample, 730453802 estimated total rows
VACUUM VERBOSE:
INFO: vacuuming "pg_catalog.pg_largeobject"
INFO: scanned index "pg_largeobject_loid_pn_index" to remove 106756133 row versions
DETAIL: CPU 38.88s/303.43u sec elapsed 2574.24 sec.
INFO: "pg_largeobject": removed 106756133 row versions in 13190323 pages
DETAIL: CPU 259.42s/113.20u sec elapsed 14017.17 sec.
INFO: index "pg_largeobject_loid_pn_index" now contains 706303560 row versions in 2674471 pages
DETAIL: 103960219 index row versions were removed.
356977 index pages have been deleted, 77870 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO: "pg_largeobject": found 17489832 removable, 706303560 nonremovable row versions in 116049431 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 36000670 unused item pointers.
64493445 pages contain useful free space.
0 pages are entirely empty.
CPU 1605.42s/1107.48u sec elapsed 133032.02 sec.
WARNING: relation "pg_catalog.pg_largeobject" contains more than "max_fsm_pages" pages with useful free space
HINT: Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".
VACUUM
(This took some 36+ hours. It will be interesting to see what happens when we add another 20 years' worth of data to the 13 years already in the DB.)
ANALYZE:
INFO: analyzing "pg_catalog.pg_largeobject"
INFO: "pg_largeobject": scanned 3000 of 116049431 pages, containing 17830 live rows and 0 dead rows; 3000 rows in sample, 689720452 estimated total rows
I will lower the shared_buffers and maintenance_work_mem settings as suggested in earlier posts before leaving for home this evening, and then let it run a VACUUM FULL ANALYZE. I remain dubious, though - as mentioned, the first test I did had quite low settings for these, and we still had the memory crash. No reason not to try it, though.
Over Christmas, we will be moving this over to a 64-bit kernel and 16 GB, so after that we'll be able to test the database with > 1GB maintenance memory as well.
Regards,
Michael A.
Database Architect, met.no
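For what it's worth, the free space map setting can be checked against the numbers above (a sketch; max_fsm_pages itself can only be raised in postgresql.conf and takes effect after a restart, and each tracked page costs a few bytes of shared memory):
-- the VACUUM VERBOSE above reported 64493445 pages with useful free space,
-- so max_fsm_pages would need to be at least that large for the FSM to cover them all
SHOW max_fsm_pages;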
As suggested, I tested a VACUUM FULL ANALYZE with 128MB shared_buffers and 512 MB reserved for maintenance_work_mem (on a 32-bit machine with 4 GB RAM). That ought to leave more than enough space for other processes in the system. Again, the system fails on the VACUUM with the following error (identical to the error we had when maintenance_work_mem was very low):
INFO: vacuuming "pg_catalog.pg_largeobject"
ERROR: out of memory
DETAIL: Failed on request of size 536870912
I've now also tested a 64-bit setup with 16 GB RAM, with 2 GB maintenance_work_mem; this time on PostgreSQL 8.2.5.
INFO: vacuuming "pg_catalog.pg_largeobject"
ERROR: invalid memory alloc request size 1073741824
It strikes me as somewhat worrying that VACUUM FULL ANALYZE has so much trouble with a large table. Granted - 730 million rows is a good deal - but it's really not that much for a large database. I'd expect an operation on such a table to take time, of course, but not to consistently crash out of memory.
Any suggestions as to what we can otherwise try to isolate the problem?
Regards,
Michael Akinde
Database Architect, met.no
Michael Akinde wrote:
> [Synopsis: VACUUM FULL ANALYZE goes out of memory on a very large pg_catalog.pg_largeobject table.]
> Simon Riggs wrote:
>> Can you run ANALYZE and then VACUUM VERBOSE, both on just pg_largeobject, please? It will be useful to know whether they succeed.
> ANALYZE:
> INFO: analyzing "pg_catalog.pg_largeobject"
> INFO: "pg_largeobject": scanned 3000 of 116049431 pages, containing 18883 live rows and 409 dead rows; 3000 rows in sample, 730453802 estimated total rows
> VACUUM VERBOSE:
> INFO: vacuuming "pg_catalog.pg_largeobject"
> INFO: scanned index "pg_largeobject_loid_pn_index" to remove 106756133 row versions
> DETAIL: CPU 38.88s/303.43u sec elapsed 2574.24 sec.
> INFO: "pg_largeobject": removed 106756133 row versions in 13190323 pages
> DETAIL: CPU 259.42s/113.20u sec elapsed 14017.17 sec.
> INFO: index "pg_largeobject_loid_pn_index" now contains 706303560 row versions in 2674471 pages
> DETAIL: 103960219 index row versions were removed.
> 356977 index pages have been deleted, 77870 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.02 sec.
> INFO: "pg_largeobject": found 17489832 removable, 706303560 nonremovable row versions in 116049431 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 36000670 unused item pointers.
> 64493445 pages contain useful free space.
> 0 pages are entirely empty.
> CPU 1605.42s/1107.48u sec elapsed 133032.02 sec.
> WARNING: relation "pg_catalog.pg_largeobject" contains more than "max_fsm_pages" pages with useful free space
> HINT: Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".
> VACUUM
> (This took some 36+ hours. It will be interesting to see what happens when we add another 20 years' worth of data to the 13 years already in the DB.)
> ANALYZE:
> INFO: analyzing "pg_catalog.pg_largeobject"
> INFO: "pg_largeobject": scanned 3000 of 116049431 pages, containing 17830 live rows and 0 dead rows; 3000 rows in sample, 689720452 estimated total rows
> I will lower the shared_buffers and maintenance_work_mem settings as suggested in earlier posts before leaving for home this evening, and then let it run a VACUUM FULL ANALYZE. I remain dubious, though - as mentioned, the first test I did had quite low settings for these, and we still had the memory crash. No reason not to try it, though.
> Over Christmas, we will be moving this over to a 64-bit kernel and 16 GB, so after that we'll be able to test the database with > 1GB maintenance memory as well.
> Regards,
> Michael A.
> Database Architect, met.no
On Jan 7, 2008 2:40 PM, Michael Akinde <michael.akinde@met.no> wrote:
> As suggested, I tested a VACUUM FULL ANALYZE with 128MB shared_buffers and 512 MB reserved for maintenance_work_mem (on a 32-bit machine with 4 GB RAM).
My apologies if my question seems redundant and something you have already discussed with list members, but why do you need to do a VACUUM FULL? Have you not vacuumed for a while? Or is there some special requirement which requires very aggressive space reclaim? VACUUM FULL is also known to cause some index bloat at times as well. Most systems I know run regular vacuums and have never required a vacuum full.
--
Usama Munir Dar http://www.linkedin.com/in/usamadar
Consultant Architect
Cell: +92 321 5020666
Skype: usamadar
Hi,
The system we are building is intended to be utilized in a number of different applications, so the testing we are doing is primarily directed at stressing the system by running it through its paces and uncovering any weaknesses. I prefer to find as many problems as possible now, rather than in production. ;-)
For the current application set I'm testing, I expect we won't need to do much VACUUMing, as it will be a fairly static dataset only used for querying (once all the data is loaded). I know that we will be running some databases with some pretty rapid throughput (100 GB/day), but if VACUUM will do (as I expect), then we'll probably just stick to that. I don't have time to do any testing on that until next month, though.
I do find it odd, however, that pgsql recommends using a VACUUM FULL (as a result of running the VACUUM). Especially if, as it seems, VACUUM FULL doesn't work for tables beyond a certain size. Assuming we have not set up something completely wrongly, this seems like a bug.
If this is the wrong mailing list to be posting this, then please let me know.
Regards,
Michael Akinde
Database Architect, Met.no
Usama Dar wrote:
> On Jan 7, 2008 2:40 PM, Michael Akinde <michael.akinde@met.no> wrote:
>> As suggested, I tested a VACUUM FULL ANALYZE with 128MB shared_buffers and 512 MB reserved for maintenance_work_mem (on a 32-bit machine with 4 GB RAM).
> My apologies if my question seems redundant and something you have already discussed with list members, but why do you need to do a VACUUM FULL? Have you not vacuumed for a while? Or is there some special requirement which requires very aggressive space reclaim? VACUUM FULL is also known to cause some index bloat at times as well. Most systems I know run regular vacuums and have never required a vacuum full.
> --
> Usama Munir Dar http://www.linkedin.com/in/usamadar
> Consultant Architect
> Cell: +92 321 5020666
> Skype: usamadar
On Mon, Jan 07, 2008 at 10:40:23AM +0100, Michael Akinde wrote:
> As suggested, I tested a VACUUM FULL ANALYZE with 128MB shared_buffers and 512 MB reserved for maintenance_work_mem (on a 32-bit machine with 4 GB RAM). That ought to leave more than enough space for other processes in the system. Again, the system fails on the VACUUM with the following error (identical to the error we had when maintenance_work_mem was very low):
> INFO: vacuuming "pg_catalog.pg_largeobject"
> ERROR: out of memory
> DETAIL: Failed on request of size 536870912
Something is using up the memory on the machine, or (I'll bet this is more likely) your user (postgres? Whatever's running the postmaster) has a ulimit on its ability to allocate memory on the machine.
> It strikes me as somewhat worrying that VACUUM FULL ANALYZE has so much trouble with a large table. Granted - 730 million rows is a good deal -
No, it's not really that big. I've never seen a problem like this. If it were the 8.3 beta, I'd be worried; but I'm inclined to suggest you look at the OS settings first given your setup.
Note that you should almost never use VACUUM FULL unless you've really messed things up. I understand from the thread that you're just testing things out right now. But VACUUM FULL is not something you should _ever_ need in production, if you've set things up correctly.
A
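One way to check the limits of the process actually serving your session, rather than those of a login shell (a sketch; assumes Linux with a kernel new enough to expose /proc/<pid>/limits):
-- from psql: find the PID of this connection's backend,
-- then read /proc/<that pid>/limits in a shell on the server
SELECT pg_backend_pid();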
On Mon, 07 Jan 2008 10:57:53 -0500, Andrew Sullivan wrote:
> Note that you should almost never use VACUUM FULL unless you've really messed things up. I understand from the thread that you're just testing things out right now. But VACUUM FULL is not something you should _ever_ need in production, if you've set things up correctly.
Then why does it exist? Is it a historical leftover? If it is only needed for emergencies, should it not have a different name? Just curious..
Holger
On Mon, 07 Jan 2008 17:33:53 +0100, "Holger Hoffstaette" <holger@wizards.de> wrote:
> On Mon, 07 Jan 2008 10:57:53 -0500, Andrew Sullivan wrote:
>> Note that you should almost never use VACUUM FULL unless you've really messed things up. I understand from the thread that you're just testing things out right now. But VACUUM FULL is not something you should _ever_ need in production, if you've set things up correctly.
> Then why does it exist? Is it a historical leftover? If it is only needed for emergencies, should it not have a different name? Just curious..
There are times when it is required, usually when people don't configure normal vacuum/autovacuum correctly.
Sincerely,
Joshua D. Drake
--
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'
Michael Akinde <michael.akinde@met.no> writes:
> As suggested, I tested a VACUUM FULL ANALYZE with 128MB shared_buffers and 512 MB reserved for maintenance_work_mem (on a 32-bit machine with 4 GB RAM). That ought to leave more than enough space for other processes in the system. Again, the system fails on the VACUUM with the following error (identical to the error we had when maintenance_work_mem was very low):
> INFO: vacuuming "pg_catalog.pg_largeobject"
> ERROR: out of memory
> DETAIL: Failed on request of size 536870912
Are you sure this is a VACUUM FULL, and not a plain VACUUM? I suspect that it's the latter, and the reason it's failing is that you are running the postmaster under a ulimit that is less than 512MB (or at least not enough more to allow an allocation of that size).
regards, tom lane
Tom Lane wrote:
> Michael Akinde <michael.akinde@met.no> writes:
>> INFO: vacuuming "pg_catalog.pg_largeobject"
>> ERROR: out of memory
>> DETAIL: Failed on request of size 536870912
> Are you sure this is a VACUUM FULL, and not a plain VACUUM?
Very sure. Ran a VACUUM FULL again yesterday (the prior query was a VACUUM FULL ANALYZE) and received essentially the same error, simply with a different failure size.
INFO: vacuuming "pg_catalog.pg_largeobject"
ERROR: invalid memory alloc request size 1073741824
No changes were made to the system since the previous iteration. VACUUM ran OK on the 8.3beta2 instance I tested with before Christmas (the current setup is 8.2.5).
> I suspect that it's the latter, and the reason it's failing is that you are running the postmaster under a ulimit that is less than 512MB (or at least not enough more to allow an allocation of that size).
We went over this somewhat prior to Christmas. Here's how it's currently set up.
$> ulimit -a
core file size (blocks, -c) 100000000
data seg size (kbytes, -d) unlimited
max nice (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
max rt priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Anything we should try to change?
Andrew Sullivan wrote:
> Something is using up the memory on the machine, or (I'll bet this is more
> likely) your user (postgres? Whatever's running the postmaster) has a
> ulimit on its ability to allocate memory on the machine.
If one looks at the system resources while the VACUUM FULL is going on, it's pretty obvious that it's a postgres process going on a memory allocation rampage that eats up all the resources.
> No, it's not really that big. I've never seen a problem like this. If it
> were the 8.3 beta, I'd be worried; but I'm inclined to suggest you look at
> the OS settings first given your set up.
I have the same problem with the 8.3beta, but won't be using it anyway until it's been out for a while.
> Note that you should almost never use VACUUM FULL unless you've really
> messed things up. I understand from the thread that you're just testing
> things out right now. But VACUUM FULL is not something you should _ever_
> need in production, if you've set things up correctly.
That's good to hear. I'm not particularly worried about this with respect to my own system. So far, we have found Postgres amazingly robust in every other issue that we have deliberately (or unwittingly) provoked. More reason to be puzzled about this problem, though.
Holger Hoffstaette wrote:
> Then why does it exist? Is it a historical leftover? If it is
> only needed for emergency, should it not have a different name?
Or in this case: if VACUUM FULL is never required (except in very special circumstances), it might be a good idea not to have VACUUM recommend running it (cf. the VACUUM I ran before New Year on a similarly sized table).
INFO: vacuuming "pg_catalog.pg_largeobject"
INFO: scanned index "pg_largeobject_loid_pn_index" to remove 106756133 row versions
DETAIL: CPU 38.88s/303.43u sec elapsed 2574.24 sec.
INFO: "pg_largeobject": removed 106756133 row versions in 13190323 pages
DETAIL: CPU 259.42s/113.20u sec elapsed 14017.17 sec.
INFO: index "pg_largeobject_loid_pn_index" now contains 706303560 row versions in 2674471 pages
DETAIL: 103960219 index row versions were removed.
356977 index pages have been deleted, 77870 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO: "pg_largeobject": found 17489832 removable, 706303560 nonremovable row versions in 116049431 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 36000670 unused item pointers.
64493445 pages contain useful free space.
0 pages are entirely empty.
CPU 1605.42s/1107.48u sec elapsed 133032.02 sec.
WARNING: relation "pg_catalog.pg_largeobject" contains more than "max_fsm_pages" pages with useful free space
HINT: Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".
Anyway, thanks for the responses.
I do have the test setup available for hopefully some weeks, so if there is anyone interested in digging further into the matter, we do have the possibility to run further test attempts for a while (it takes about a week to load all the data, so once we take it back down, it may be a while before we set it up again).
Regards,
Michael Akinde
Database Architect, met.no
On Tue, Jan 08, 2008 at 09:50:07AM +0100, Michael Akinde wrote:
> stack size (kbytes, -s) 8192
Perhaps this is the issue? (I don't know.) Also, this _is_ for the postgres user, right? That's the relevant one: the one that's actually running the back-end process.
Also, are you sure there's nothing else in the way? I don't remember what OS you're using. On AIX, for instance, there's some _other_ dopey setting that allows you to control user resource consumption as well, and it means that ulimit's answers are not the full story. (I learned this through painful experience, and confess it's one of the many reasons I think AIX should be pronounced as one word, rather than three letters.)
> Andrew Sullivan wrote:
>> Something is using up the memory on the machine, or (I'll bet this is more likely) your user (postgres? Whatever's running the postmaster) has a ulimit on its ability to allocate memory on the machine.
> If one looks at the system resources while the VACUUM FULL is going on, it's pretty obvious that it's a postgres process going on a memory allocation rampage that eats up all the resources.
Of course VACUUM FULL is eating up as much memory as it can: it's moving a lot of data around. But is it in fact exhausting memory on the machine? There are only two possibilities: either there's something else that is preventing that allocation, or else you've run into a case so unusual that nobody else has ever seen it. The data you're talking about isn't that big: I've run similar-sized databases on my laptop without pain.
> Or in this case: if VACUUM FULL is never required (except in very special circumstances), it might be a good idea not to have VACUUM recommend running it (cf. the VACUUM I ran before New Year on a similarly sized table).
The suggestion you see there, though, is in fact one of the cases where you might in fact want to run it. That is,
> WARNING: relation "pg_catalog.pg_largeobject" contains more than "max_fsm_pages" pages with useful free space
> HINT: Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".
what it is saying is that a regular vacuum can no longer recover all the dead pages in the table, and if you want that space back and marked usable on your disk, you have to run VACUUM FULL (or, in fact, CLUSTER, or else dump and reload the table. But one of these).
Note that I said that, if you have things configured _correctly_, you shouldn't have to run VACUUM FULL except in unusual circumstances. That doesn't mean "never". The problem here is an historical one: you have a "hangover" from previous missed maintenance or sub-optimal vacuum scheduling. In those cases, you may want to perform VACUUM FULL, provided you understand the potential side effects (like possibly slower inserts initially, and some possible index bloat).
A
Michael Akinde <michael.akinde@met.no> writes:
> We went over this somewhat prior to Christmas. Here's how it's currently set up.
> $> ulimit -a
> core file size (blocks, -c) 100000000
> ...
What you're showing us is the conditions that prevail in your interactive session. That doesn't necessarily have a lot to do with the ulimits that init-scripts run under ...
regards, tom lane
Tom Lane wrote:
> Michael Akinde <michael.akinde@met.no> writes:
>> $> ulimit -a
>> core file size (blocks, -c) 100000000
>> ...
> What you're showing us is the conditions that prevail in your interactive session. That doesn't necessarily have a lot to do with the ulimits that init-scripts run under ...
Those are the ulimits of the db_admin account (i.e., the user that set up and runs the DB processes). Is Postgres limited by other settings?
Regards,
Michael A.
Database Architect, Met.no
On Tue, Jan 08, 2008 at 05:27:16PM +0100, Michael Akinde wrote:
> Those are the ulimits of the db_admin account (i.e., the user that set up and runs the DB processes). Is Postgres limited by other settings?
Are you sure? On one system I used many years ago, /bin/sh wasn't what I thought it was, and so the ulimit that I got when logged in was not what the postmaster was starting under. Took me many days to figure out what was up.
A
Andrew Sullivan <ajs@crankycanuck.ca> writes:
> On Tue, Jan 08, 2008 at 05:27:16PM +0100, Michael Akinde wrote:
>> Those are the ulimits of the db_admin account (i.e., the user that set up and runs the DB processes). Is Postgres limited by other settings?
> Are you sure? On one system I used many years ago, /bin/sh wasn't what I thought it was, and so the ulimit that I got when logged in was not what the postmaster was starting under. Took me many days to figure out what was up.
The only thing I find convincing is to insert "ulimit -a >someplace" into the script that starts the postmaster, adjacent to where it does so, and then reboot. There are too many systems on which daemons are launched under settings different from what interactive shells use (a policy that's often a good one, too).
regards, tom lane
On Tue, Jan 08, 2008 at 12:33:34PM -0500, Tom Lane wrote:
> Andrew Sullivan <ajs@crankycanuck.ca> writes:
>> On one system I used many years ago, /bin/sh wasn't what I thought it was, and so the ulimit that I got when logged in was not what the postmaster was starting under. Took me many days to figure out what was up.
> The only thing I find convincing is to insert "ulimit -a >someplace" into the script that starts the postmaster, adjacent to where it does so, and then reboot. There are too many systems on which daemons are launched under settings different from what interactive shells use (a policy that's often a good one, too).
What about a stored procedure in a language that allows you to do system(3) calls?
Sam
Sam Mason <sam@samason.me.uk> writes:
> On Tue, Jan 08, 2008 at 12:33:34PM -0500, Tom Lane wrote:
>> The only thing I find convincing is to insert "ulimit -a >someplace" into the script that starts the postmaster,
> What about a stored procedure in a language that allows you to do system(3) calls?
Yeah, that would work, if you have any untrusted languages installed.
regards, tom lane
On Tue, Jan 08, 2008 at 05:53:28PM +0000, Sam Mason wrote:
> What about a stored procedure in a language that allows you to do system(3) calls?
PL/bash? (I think there is something like this). But surely the ulimit before start is much easier!
A
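For instance, if an untrusted language such as PL/PerlU happens to be installed, something along these lines ought to report the limits the backend itself runs under (a rough, untested sketch; the function name is made up and creating it requires superuser rights):
-- shells out from inside the backend, so it reports the backend's own limits
CREATE OR REPLACE FUNCTION backend_ulimits() RETURNS text AS $$
return `ulimit -a`; # backticks run via /bin/sh
$$ LANGUAGE plperlu;
SELECT backend_ulimits();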
Thanks for the explanation of the ulimits; I can see how that could turn out to be a problem in some cases.
Following Tom's suggestion, here is the startup script I used:
#!/bin/sh
ulimit -a > $PGHOST/server.ulimit
pg_ctl start -l $PGHOST/server.log
The ulimits seem to be the same, though:
$> cat server.ulimit
core file size (blocks, -c) 100000000
data seg size (kbytes, -d) unlimited
max nice (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
max rt priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Regards,
Michael A.
Tom Lane wrote:
> Andrew Sullivan <ajs@crankycanuck.ca> writes:
>> On Tue, Jan 08, 2008 at 05:27:16PM +0100, Michael Akinde wrote:
>>> Those are the ulimits of the db_admin account (i.e., the user that set up and runs the DB processes). Is Postgres limited by other settings?
>> On one system I used many years ago, /bin/sh wasn't what I thought it was, and so the ulimit that I got when logged in was not what the postmaster was starting under. Took me many days to figure out what was up.
> The only thing I find convincing is to insert "ulimit -a >someplace" into the script that starts the postmaster, adjacent to where it does so, and then reboot. There are too many systems on which daemons are launched under settings different from what interactive shells use (a policy that's often a good one, too).
> regards, tom lane
Andrew Sullivan <ajs@crankycanuck.ca> writes:On Tue, Jan 08, 2008 at 05:27:16PM +0100, Michael Akinde wrote:Those are the ulimits of the db_admin account (i.e., the user that set up and runs the DB processes). Is Postgres limited by other settings?On one system I used many years ago, /bin/sh wasn't what I thought it was, and so the ulimit that I got when logged in was not what the postmaster was starting under. Took me many days to figure out what was up.The only thing I find convincing is to insert "ulimit -a >someplace" into the script that starts the postmaster, adjacent to where it does so, and then reboot. There are too many systems on which daemons are launched under settings different from what interactive shells use (a policy that's often a good one, too). regards, tom lane