Thread: Postgres using more memory than it should
Hi. I have a problem on one of our production servers. A fairly
complicated query is running, and the backend process is using 30 GB of
RAM. The machine only has 32 GB, and is understandably swapping like
crazy. My colleague is creating swap files as quickly as the process can
use them up.

The work_mem setting on this machine is 1000MB, and it is running
Postgres 8.3.0. Here is an excerpt from top:

top - 15:54:17 up 57 days,  6:49,  3 users,  load average: 20.17, 21.29, 16.31
Tasks: 250 total,   2 running, 248 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.1%us,  2.5%sy,  0.0%ni, 15.2%id, 78.7%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:  32961364k total, 32898588k used,    62776k free,    22440k buffers
Swap:  8096344k total,  8096344k used,        0k free,  6056472k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27192 postgres  18   0 30.6g  22g 1984 R   31 71.7  32:20.09 postgres: flymine production-release-15.0 192.168.128.84(33736) INSERT
  650 root      10  -5     0    0    0 S    5  0.0  13:56.10 [kswapd2]
 5513 postgres  15   0  130m  19m  364 S    4  0.1 1067:04   postgres: stats collector process
  957 root      10  -5     0    0    0 D    1  0.0   1:39.13 [md2_raid1]
  649 root      10  -5     0    0    0 D    1  0.0  14:14.95 [kswapd1]
28599 root      15   0     0    0    0 D    1  0.0   0:01.25 [pdflush]
  648 root      10  -5     0    0    0 S    0  0.0  15:10.68 [kswapd0]
 2585 root      10  -5     0    0    0 D    0  0.0  67:15.89 [kjournald]

The query being run is an INSERT INTO table SELECT ..., where the SELECT
is fairly complex. Any ideas why this is going so badly, and what I can
do to solve it?

Matthew

--
First law of computing: Anything can go wro
sig: Segmentation fault.  core dumped.
In response to Matthew Wakeling <matthew@flymine.org>:

> Hi. I have a problem on one of our production servers. A fairly
> complicated query is running, and the backend process is using 30 GB of
> RAM. The machine only has 32GB, and is understandably swapping like crazy.
> My colleague is creating swap files as quickly as it can use them up.
>
> The work_mem setting on this machine is 1000MB, running Postgres 8.3.0.

If your query is dealing with a lot of data, it could easily use 1 GB per
sort operation. If there are a lot of sorts (more than 32), you'll end up
with this problem. 1 GB is probably too much memory to allocate for
work_mem.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
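Bill's "more than 32" figure falls straight out of the numbers in the original post. A quick sketch of the arithmetic (the 32 GB and 1000 MB values come from the thread; the rest is just division):

```python
# Back-of-the-envelope check: how many work_mem-sized sort buffers fit in RAM?
# Figures from the thread: 32 GB total RAM, work_mem = 1000 MB.
total_ram_mb = 32 * 1024   # 32 GB machine
work_mem_mb = 1000         # work_mem setting from postgresql.conf

# Each sort (or hash) node may claim up to work_mem on its own, so roughly
# this many concurrent sorts can exhaust physical memory:
sorts_to_exhaust_ram = total_ram_mb // work_mem_mb
print(sorts_to_exhaust_ram)  # 32, matching "more than 32" above
```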
>> Hi. I have a problem on one of our production servers. A fairly
>> complicated query is running, and the backend process is using 30 GB of
>> RAM. The machine only has 32GB, and is understandably swapping like crazy.
>> My colleague is creating swap files as quickly as it can use them up.
>>
>> The work_mem setting on this machine is 1000MB, running Postgres 8.3.0.

Are you aware that this is a per-session / per-sort setting? That means
that if you have 10 sessions, each of them running a query with 2 sort
steps in the plan, they may occupy up to 20 GB of RAM (if each sort uses
the whole 1 GB).

regards
Tomas
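Tomas's worst case multiplies out as follows. This is just his hypothetical scenario (10 sessions, 2 sorts each) expressed as arithmetic, with the 1 GB work_mem figure from the thread:

```python
# Worst-case memory under work_mem: it is a per-sort, per-session limit,
# not a global cap, so concurrent sorts multiply.
work_mem_gb = 1        # ~1000 MB setting from the thread
sessions = 10          # hypothetical concurrent sessions
sorts_per_query = 2    # hypothetical sort steps per plan

worst_case_gb = work_mem_gb * sessions * sorts_per_query
print(worst_case_gb)  # 20 (GB), as stated above
```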
On Wed, 3 Dec 2008, tv@fuzzy.cz wrote:
>> Hi. I have a problem on one of our production servers. A fairly
>> complicated query is running, and the backend process is using 30 GB of
>> RAM. The machine only has 32GB, and is understandably swapping like crazy.
>> My colleague is creating swap files as quickly as it can use them up.
>>
>> The work_mem setting on this machine is 1000MB, running Postgres 8.3.0.
>
> Are you aware that this is a per-session / per-sort settings? That means,
> if you have 10 sessions, each of them running query with 2 sort steps in
> the plan, it may occupy up to 20 GB of RAM (if both sorts use the whole
> 1GB of RAM).

Quite aware, thanks.

Having sent the process a SIGINT and inspected the logs, I now have a
query to EXPLAIN. Looking at it, there is one single sort and ten hash
operations, which would equate to 10 GB, not 30 GB. What is more
worrying is that now that the query has been stopped, the backend
process is still hanging onto the RAM.

Matthew

--
Failure is not an option. It comes bundled with your Microsoft product.
                                        -- Ferenc Mantfeld
On Wed, Dec 3, 2008 at 9:34 AM, Matthew Wakeling <matthew@flymine.org> wrote:
> Quite aware, thanks.
>
> Having sent the process a SIGINT and inspected the logs, I now have a query
> to explain. Looking at it, there is one single sort, and ten hash
> operations, which would equate to 10GB, not 30GB. What is more worrying is
> that now that the query has been stopped, the backend process is still
> hanging onto the RAM.

What's your setting for shared_buffers? That's likely what the backend
is holding onto.

Also, you should REALLY update to 8.3.5, as there are some nasty bugs
fixed since 8.3.0 that you don't want to run into. Who knows, you might
be being bitten by one right now. Unlike other bits of software floating
around, pgsql updates are bug fix / security fix only, with no major
code changes allowed, since those go into the next release, which is
usually ~1 year later anyway.
On Wed, 3 Dec 2008, Scott Marlowe wrote:
>> Having sent the process a SIGINT and inspected the logs, I now have a query
>> to explain. Looking at it, there is one single sort, and ten hash
>> operations, which would equate to 10GB, not 30GB. What is more worrying is
>> that now that the query has been stopped, the backend process is still
>> hanging onto the RAM.
>
> What's your setting for shared_buffers? That's likely what the backend
> is holding onto.

Shared buffers are set at 500MB, which is what all the other backends
are holding onto. It's just the one backend that is using 30 GB. At the
moment, it is being swapped out, but the system seems responsive. We'll
restart the whole lot some time in the middle of the night when no one
minds.

> Also, you should REALLY update to 8.3.5, as there are some nasty bugs
> fixed since 8.3.0 that you don't want to run into. Who knows, you might
> be being bitten by one right now. Unlike other bits of software floating
> around, pgsql updates are bug fix / security fix only, with no major
> code changes allowed, since those go into the next release, which is
> usually ~1 year later anyway.

It's possible, although I didn't see any relevant memory leaks in the
release notes. This is one of the only machines we have that has not
been upgraded, and it is on our schedule. Because it is running a
slightly old version of Red Hat Fedora, upgrading involves more
horribleness than our sysadmin is willing to do on the fly with the
server up.

Matthew

--
The email of the species is more deadly than the mail.
On Wed, Dec 03, 2008 at 04:01:48PM +0000, Matthew Wakeling wrote:
> The work_mem setting on this machine is 1000MB, running Postgres 8.3.0.

Check the bug report from 2008-11-28, by Grzegorz Jaskiewicz:

  query failed, not enough memory on 8.3.5
  http://archives.postgresql.org/pgsql-bugs/2008-11/msg00180.php

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On Wed, Dec 3, 2008 at 9:59 AM, Matthew Wakeling <matthew@flymine.org> wrote:
> It's possible, although I didn't see any relevant memory leaks in the
> release notes. This is one of the only machines we have that has not been
> upgraded, and it is on our schedule. Because it is running a slightly old
> version of RedHat Fedora, upgrading involves more horribleness than our
> sysadmin is willing to do on the fly with the server up.

That makes absolutely no sense. If it's an in-house built rpm, you just
create a new one with the same .spec file; if it was built from source,
it's a simple ./configure --youroptionshere ; make ; make install. You
need a new sysadmin.
On Wed, 3 Dec 2008, hubert depesz lubaczewski wrote:
> Check the bug report from 2008-11-28, by Grzegorz Jaskiewicz:
>
>   query failed, not enough memory on 8.3.5
>   http://archives.postgresql.org/pgsql-bugs/2008-11/msg00180.php

Thanks, that does explain everything. So work_mem is not a hard limit on
the amount of memory used per hash. Once the planner has committed to
using a hash (and it can only use estimates to work out whether it will
fit in work_mem), the execution will blindly go ahead and try to fit
absolutely everything into an in-memory hash, even if it doesn't fit in
work_mem. I agree it would be nice to fix this, but I'm not sure how at
the moment.

Matthew

--
To most people, solutions mean finding the answers. But to chemists,
solutions are things that are still all mixed up.
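The failure mode Matthew describes can be sketched in miniature: the planner picks an in-memory hash based on an *estimated* input size, but execution then loads every *actual* row regardless of the limit. This is an illustrative toy simulation, not PostgreSQL internals; all names and numbers are made up:

```python
# Toy simulation of "work_mem is not a hard limit for hashes":
# the planner commits to a strategy using a row-count ESTIMATE, then the
# executor hashes every ACTUAL row, blowing past the limit if the
# estimate was low. (Hypothetical model, not real PostgreSQL code.)

WORK_MEM_ROWS = 1_000  # pretend work_mem fits this many rows

def plan(estimated_rows):
    """Planner: choose an in-memory hash only if the estimate fits."""
    return "hash" if estimated_rows <= WORK_MEM_ROWS else "sort-merge"

def execute(strategy, actual_rows):
    """Executor: an in-memory hash must hold ALL actual rows, estimate or not."""
    if strategy == "hash":
        table = {i: None for i in range(actual_rows)}  # never spills to disk
        return len(table)
    return min(actual_rows, WORK_MEM_ROWS)  # a sort spills; memory stays bounded

strategy = plan(estimated_rows=500)           # the estimate says it fits...
used = execute(strategy, actual_rows=50_000)  # ...reality is 100x larger
print(strategy, used)  # hash 50000 -- far beyond the 1000-row "limit"
```

A sort in the same situation would have spilled to disk once it hit the limit, which is why only the hash nodes ran away with memory here.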
On Thu, 4 Dec 2008, Matthew Wakeling wrote:
>> http://archives.postgresql.org/pgsql-bugs/2008-11/msg00180.php
>
> Thanks, that does explain everything.

Oh right, yes. It explains everything *except* the fact that the backend
is still holding onto all the RAM after the query is finished. Could the
fact that we SIGINTed it in the middle of the query explain that at all?

Matthew

--
I'd try being a pessimist, but it probably wouldn't work anyway.