Thread: Performance tuning on RedHat Enterprise Linux 3
Executive summary: We just did a cutover from a RedHat 8.0 box to a RedHat Enterprise Linux 3 box and we're seeing a lot more swapping on the new box than we ever did on the old box ... this is killing performance ...

Background:

Old Box:
    RedHat 8.0
    2GB Memory
    Dual PIII 600MHz
    Postgres 7.3.4
    SHMMAX = 1073741824 (1 GB)
    shared_buffers = 65536 (roughly 0.5 GB)
    max_fsm_relations = 1000
    max_fsm_pages = 1000000
    vacuum_mem = 131072
    Roughly 25 - 30 connections open (mostly idle) at any given time (connection pools)

New Box:
    RedHat Enterprise Linux ES 3
    2GB Memory
    Dual P4 Xeon 2.7 GHz
    Postgres 7.3.4
    SHMMAX = 1610612736 (1.5 GB)
    shared_buffers = 131072 (roughly 1GB)
    max_fsm_relations = 10000
    max_fsm_pages = 10000000
    sort_mem = 4096
    vacuum_mem = 262144
    Roughly 25 - 30 connections open (mostly idle) at any given time (connection pools)

Both boxes are dedicated DB servers ... With the new configuration, we were seeing swap rates of 1000-5000 KB/s (according to vmstat) ... with the old configuration, we never saw any swapping ... I turned the shared_buffers and sort_mem down on the new box to match the settings of the old box and found that it reduced the swapping significantly (roughly 0-2000 KB/s) but didn't eliminate it completely ... when looking at 'top' on the new box, the postgres processes all seem to be indicating different amounts of memory usage ... under the periods of heavy swapping, one or more of the postgres processes would be way up there (between 500MB and 1000MB, which would easily explain the swapping) ... the question is: why aren't all of the processes sharing the same pool of shared memory, since I thought that's what I'm doing when adjusting the shared_buffers property?

Here's an example of my 'top' (not under heavy load) demonstrating the different memory usage by each postgres process ... I unfortunately don't have the same snapshot data from the old configuration, but I seem to recall that all of the postgres processes had very similar memory footprints:

  PID USER     PRI  NI  SIZE   RSS SHARE STAT %CPU %MEM  TIME CPU COMMAND
16966 postgres  15   0  107M  107M  105M S     0.0  5.3  0:39   1 postmaster
20198 postgres  15   0 40448   39M 37752 S     0.2  1.9  0:07   0 postmaster
18801 postgres  15   0 21932   21M 19616 S     0.0  1.0  0:01   0 postmaster
19210 postgres  16   0 21276   20M 19008 S     0.0  1.0  0:01   0 postmaster
19507 postgres  15   0 15504   14M 13580 S     0.0  0.7  0:00   3 postmaster
20308 postgres  15   0 12832   12M 11248 S     0.0  0.6  0:00   3 postmaster
20456 postgres  15   0 12500   12M 10920 S     0.0  0.6  0:00   1 postmaster
20403 postgres  15   0 11572   11M  9928 S     0.2  0.5  0:00   1 postmaster
20251 postgres  15   0 10796   10M  9260 S     0.0  0.5  0:00   0 postmaster
20398 postgres  15   0 10792   10M  9256 S     0.0  0.5  0:00   2 postmaster
20306 postgres  21   0  9100  8808  7796 S     0.0  0.4  0:00   1 postmaster
20425 postgres  16   0  9100  8808  7796 S     0.0  0.4  0:00   0 postmaster
20360 postgres  15   0  9096  8804  7792 S     0.0  0.4  0:00   3 postmaster
20383 postgres  21   0  9096  8804  7792 S     0.0  0.4  0:00   0 postmaster
20434 postgres  21   0  9096  8804  7792 S     0.0  0.4  0:00   1 postmaster
20305 postgres  15   0  9108  8796  7804 S     0.0  0.4  0:00   2 postmaster

Can anyone think of a reason as to why I'm seeing such heavy swapping? According to Bruce Momjian's performance tuning guide, he recommends roughly half the amount of physical RAM for the shared_buffers ... I tried turning UP the shared_buffers even higher (to 180,000 I believe; roughly 1.5 GB) and that seemed to make the problem even worse ...

Thanks in advance,
Dave
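For reference, shared_buffers is counted in 8 kB disk blocks, so the settings above translate into bytes roughly as follows (a quick back-of-the-envelope sketch, assuming the default 8 kB block size):

    # Back-of-the-envelope sketch: shared_buffers in bytes, assuming the
    # default 8 kB block size (BLCKSZ).
    BLOCK_SIZE = 8 * 1024  # bytes per shared buffer

    for label, buffers in [("old box", 65536), ("new box", 131072)]:
        gib = buffers * BLOCK_SIZE / 2**30
        print(f"{label}: {buffers} buffers * 8 kB = {gib:.1f} GiB")

    # old box: 65536 buffers * 8 kB = 0.5 GiB
    # new box: 131072 buffers * 8 kB = 1.0 GiB
    #
    # On a 2 GB machine, dedicating 1 GiB to shared buffers leaves little
    # room for the backends, sort_mem, and the kernel's own page cache,
    # which fits the swapping described above.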
On Mon, Dec 06, 2004 at 09:08:02AM -0500, David Esposito wrote:
> shared_buffers = 131072 (roughly 1GB)
> max_fsm_relations = 10000
> max_fsm_pages = 10000000
> sort_mem = 4096
> vacuum_mem = 262144
> Roughly 25 - 30 connections open (mostly idle) at any given time
> (connection pools)

I'd suggest reducing shared_buffers to maybe a few thousand; there's really no point reserving so much memory that way, it's just a waste.

Secondly, up your sort_mem a bit to reflect how big your sorts are likely to be.

How's your effective_cache_size?

--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
"David Esposito" <pgsql-general@esposito.newnetco.com> writes: > New Box: > shared_buffers = 131072 (roughly 1GB) This setting is an order of magnitude too large. There is hardly any evidence that it's worth setting shared_buffers much above 10000. regards, tom lane
On Mon, Dec 06, 2004 at 09:08:02AM -0500, David Esposito wrote:
> According to Bruce Momjian's performance tuning guide, he recommends roughly
> half the amount of physical RAM for the shared_buffers ...

Does he? The guide I've seen from him AFAIR states that you should allocate around 10% of physical RAM to shared_buffers. And this advice goes against common Postgres folklore. Maybe it's a document that needs to be updated.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"There is evil in the world. There are dark, awful things. Occasionally, we
get a glimpse of them. But there are dark corners; horrors almost impossible
to imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972)
Thanks for the replies guys ... The copy of Bruce's book I was reading is at:

    http://www.postgresql.org/docs/aw_pgsql_book/hw_performance/node8.html

and I was mistaken, it recommends 25% of physical memory be allocated to the shared cache ... Is there a better resource (even a commercial publication) that I should've been looking through? Bruce's book is a little too high-level and obviously leaves out some of the detail about the fact that there is a practical maximum ...

I will crank my shared_buffers down ... But how do I know what my sort_mem setting should be? Are there statistics tables that indicate cache hits/misses like in Oracle?

Lastly, about the effective_cache_size ... If I cut down my shared_buffers to 10,000 like Tom recommended, and I assume that the OS itself and some overhead for the sort_mem and vacuum_mem take up about 512MB total, should I set the effective_cache_size to assume that the remaining 1.5 GB of physical memory is being allocated for the file cache by the kernel?

Thanks,
Dave

> -----Original Message-----
> From: Martijn van Oosterhout [mailto:kleptog@svana.org]
> Sent: Monday, December 06, 2004 10:39 AM
> To: David Esposito
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Performance tuning on RedHat Enterprise Linux 3
>
> I'd suggest reducing shared_buffers to maybe a few thousand, there's
> really no point reserving so much memory that way, it's just a waste.
>
> Secondly, up your sort_mem a bit to reflect how big your sorts are
> likely to be.
>
> How's your effective_cache_size?
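The effective_cache_size arithmetic in that last question works out roughly like this (a sketch only; effective_cache_size is measured in 8 kB pages, and the 512 MB overhead number is just the assumption made in the question above):

    # Sketch of the effective_cache_size arithmetic; the 512 MB overhead
    # figure is the assumption from the question above, not a measurement.
    PAGE_SIZE = 8 * 1024                   # bytes per page
    total_ram = 2 * 2**30                  # 2 GB of physical memory
    overhead = 512 * 2**20                 # OS + sort_mem + vacuum_mem (assumed)
    shared_buffers = 10000 * PAGE_SIZE     # ~80 MB, per Tom's suggestion

    kernel_cache = total_ram - overhead - shared_buffers
    print(kernel_cache // PAGE_SIZE)       # 186608 pages, i.e. roughly 1.4 GB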
<snip>
> ... under the periods of heavy swapping, one or more of the postgres
> processes would be way up there (between 500MB and 1000MB (which would
> easily explain the swapping)) ... the question is: why aren't all of the
> processes sharing the same pool of shared memory since I thought that's what
> I'm doing when adjusting the shared_buffers property?
<snip>

I seem to remember hearing that the memory limit on certain operations, such as sorts, is not "enforced" (may the hackers correct me if I am wrong); rather, the planner estimates how much a sort might take by looking at the statistics for a table.

If the statistics are wrong, however, the sort doesn't actually stay within sort memory, and so the process consumes a very large amount of memory, much more than the sort_mem configuration parameter should allow it to.

If the other suggestions given (to reduce the shared buffers) don't fix it, I suggest running ANALYZE on all your tables and seeing if the erratic memory usage goes away. If that doesn't help, then try to figure out what query is causing the high memory usage, and run EXPLAIN ANALYZE on just that query to see if it is returning drastically more rows than the planner thinks it will.

Paul Tillotson
On Mon, 2004-12-06 at 19:37 -0500, Paul Tillotson wrote:
> I seem to remember hearing that the memory limit on certain operations,
> such as sorts, is not "enforced" (may the hackers correct me if I am
> wrong); rather, the planner estimates how much a sort might take by
> looking at the statistics for a table.
>
> If the statistics are wrong, however, the sort doesn't actually stay
> within sort memory, and so the process consumes a very large amount of
> memory, much more than the sort_mem configuration parameter should allow
> it to.

AFAIK this is not the case. sort_mem defines the in-memory buffer used _per_ sort operation. The problem you may be referring to is that multiple concurrent sort operations (possibly within a single backend) will each consume up to sort_mem, so the aggregate memory usage for sort operations may be significantly higher than sort_mem.

-Neil
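To put rough numbers on that aggregate effect (hypothetical figures, using the sort_mem setting from the original post):

    # Hypothetical illustration: sort_mem caps each sort operation, not
    # the total, so the aggregate across backends can be much larger.
    sort_mem_kb = 4096      # 4 MB per sort, from the original post
    backends = 30           # connections from the pools described above
    sorts_per_query = 3     # assumed: e.g. a merge join plus an ORDER BY

    worst_case_mb = backends * sorts_per_query * sort_mem_kb / 1024
    print(worst_case_mb)    # 360 MB of sort space alone, on top of shared_buffers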
On Tue, Dec 07, 2004 at 12:02:13PM +1100, Neil Conway wrote:
> On Mon, 2004-12-06 at 19:37 -0500, Paul Tillotson wrote:
> > I seem to remember hearing that the memory limit on certain operations,
> > such as sorts, is not "enforced" (may the hackers correct me if I am
> > wrong); rather, the planner estimates how much a sort might take by
> > looking at the statistics for a table.
>
> AFAIK this is not the case.

AFAIK this is indeed the case with hashed aggregation, which uses the sort_mem (work_mem) parameter to control its operation, but for which it is not a hard limit.

I concur however that multiple concurrent sorts may consume more memory than the limit specified for one sort. (Just last week I saw a server running with sort_mem set to 800 MB ... no wonder the server went belly up every day at 3.00am, exactly when a lot of reports were being generated.)

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Accept the honors and applause and you will lose your freedom"
On Mon, 2004-12-06 at 22:19 -0300, Alvaro Herrera wrote:
> AFAIK this is indeed the case with hashed aggregation, which uses the
> sort_mem (work_mem) parameter to control its operation, but for which it
> is not a hard limit.

Hmmm -- I knew we didn't implement disk-spilling for hashed aggregation, but I thought we had _some_ sane means to avoid consuming a lot of memory if we got the plan completely wrong. AFAICS you are right, and this is not the case :-(

We definitely ought to fix this.

-Neil
Alvaro Herrera wrote:
> On Tue, Dec 07, 2004 at 12:02:13PM +1100, Neil Conway wrote:
> > On Mon, 2004-12-06 at 19:37 -0500, Paul Tillotson wrote:
> > > I seem to remember hearing that the memory limit on certain operations,
> > > such as sorts, is not "enforced" (may the hackers correct me if I am
> > > wrong); rather, the planner estimates how much a sort might take by
> > > looking at the statistics for a table.
> >
> > AFAIK this is not the case.
>
> AFAIK this is indeed the case with hashed aggregation, which uses the
> sort_mem (work_mem) parameter to control its operation, but for which it
> is not a hard limit.
>
> I concur however that multiple concurrent sorts may consume more memory
> than the limit specified for one sort. (Just last week I saw a server
> running with sort_mem set to 800 MB ... no wonder the server went belly
> up every day at 3.00am, exactly when a lot of reports were being
> generated)

Does postgres actually do multiple concurrent sorts within a single backend? I didn't think it would ever do this, since each backend has only a single thread.

David says that he sees a particular process start to consume very large amounts of memory, and from my understanding of postgres, this must be one single query taking a lot of memory, not "multiple concurrent sorts."

Paul Tillotson
Neil Conway <neilc@samurai.com> writes:
> On Mon, 2004-12-06 at 22:19 -0300, Alvaro Herrera wrote:
>> AFAIK this is indeed the case with hashed aggregation, which uses the
>> sort_mem (work_mem) parameter to control its operation, but for which it
>> is not a hard limit.

> Hmmm -- I knew we didn't implement disk-spilling for hashed aggregation,
> but I thought we had _some_ sane means to avoid consuming a lot of
> memory if we got the plan completely wrong.

The *sort* code is fairly good about respecting sort_mem. The *hash* code is not so good.

> We definitely ought to fix this.

Bear in mind that the price of honoring sort_mem carefully is considerably far from zero. (Or, if you know how to do it cheaply, let's see it ...)

The issue with the hash code is that it sets size parameters on the basis of the estimated input row count; the memory usage error factor is basically inversely proportional to the error in the planner's row estimate. The seriously bad cases I've seen reported were directly due to horribly-out-of-date planner table size estimates. A large part of the rationale for applying that last-minute 8.0 change in relpages/reltuples handling was to try to suppress the worst cases in hashtable size estimation.

			regards, tom lane
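A made-up illustration of how far off the hashtable sizing can get when the row estimate is stale (the per-entry figure is invented; the row counts echo a case reported later in this thread):

    # Made-up illustration: hashed aggregation sizes its table from the
    # planner's row estimate, so the overrun scales with how badly the
    # estimate undershoots the actual row count.  The 64-byte per-entry
    # figure is invented for the example.
    bytes_per_entry = 64
    estimated_rows = 1500
    actual_rows = 1400000          # e.g. statistics never updated after a big load

    planned = estimated_rows * bytes_per_entry   # ~96 kB the planner expected
    actual = actual_rows * bytes_per_entry       # ~85 MB actually consumed
    print(round(actual / planned))               # overrun factor of roughly 933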
Paul Tillotson <pntil@shentel.net> writes:
> Does postgres actually do multiple concurrent sorts within a single
> backend?

Certainly. Consider for example a merge join with each input being sorted by an explicit sort step. DISTINCT, ORDER BY, UNION, and related operators require their own sort steps in the current implementation. It's not difficult to invent queries that require arbitrarily large numbers of sort steps.

			regards, tom lane
On Mon, 2004-12-06 at 23:55 -0500, Tom Lane wrote:
> Bear in mind that the price of honoring sort_mem carefully is
> considerably far from zero.

I'll do some thinking about disk-based spilling for hashed aggregation for 8.1.

> The issue with the hash code is that it sets size parameters on the
> basis of the estimated input row count; the memory usage error factor
> is basically inversely proportional to the error in the planner's row
> estimate.

Right. But I don't think it's acceptable to consume an arbitrary amount of memory to process a query, even if we only do that when the planner makes a mistake (regrettably, planner mistakes occur with some regularity).

As a quick hack, what about throwing away the constructed hash table and switching from hashing to sorting if we exceed sort_mem by a significant factor? (say, 200%) We might also want to print a warning message to the logs.

This assumes that aggregation-by-sorting can be used in a superset of the cases where aggregation-by-hashing can be used, and that the semantics of both nodes are the same; I believe both conditions hold. And of course, performance will likely suck -- but (a) since the planner has guessed wrong, performance is probably going to suck anyway, and (b) it is better than running the machine OOM.

-Neil
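A toy, self-contained sketch of the control flow being suggested (a simple count-per-key aggregation in Python, not PostgreSQL executor code; the planned_groups input stands in for the planner's estimate):

    # Toy sketch of the proposed fallback: aggregate with a hash table,
    # but if it grows past a multiple of the planned size, abandon it and
    # redo the aggregation by sorting.  Illustrative only -- not backend code.
    from itertools import groupby

    def aggregate_counts(rows, planned_groups, overrun_factor=3):
        counts = {}
        for key in rows:
            counts[key] = counts.get(key, 0) + 1
            if len(counts) > overrun_factor * planned_groups:
                # The "planner" underestimated the number of groups; fall
                # back to grouping via a sort (in the real proposal, the
                # sort-based path degrades gracefully instead of eating RAM).
                return {k: sum(1 for _ in g) for k, g in groupby(sorted(rows))}
        return counts

    # The planner expected ~3 groups but the data has 1000 distinct keys,
    # so the hashed path bails out and the sorted path finishes the job.
    print(len(aggregate_counts(range(1000), planned_groups=3)))   # 1000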
Neil Conway <neilc@samurai.com> writes:
> As a quick hack, what about throwing away the constructed hash table and
> switching from hashing to sorting if we exceed sort_mem by a significant
> factor? (say, 200%) We might also want to print a warning message to the
> logs.

If I thought that a 200% error in memory usage were cause for a Chinese fire drill, then I'd say "yeah, let's do that". The problem is that the place where performance actually goes into the toilet is normally an order of magnitude or two above the nominal sort_mem setting (for obvious reasons: admins can't afford to push the envelope on sort_mem because of the various unpredictable multiples that may apply). So switching to a hugely more expensive implementation as soon as we exceed some arbitrary limit is likely to be a net loss, not a win.

If you can think of a spill methodology that has a gentle degradation curve, then I'm all for that. But I doubt there are any quick-hack improvements to be had here.

			regards, tom lane
There are many reports of kernel problems with memory allocation (too aggressive) and swap issues with RHEL 3.0 on both RAID and non-RAID systems. I hope folks have worked through all those issues before blaming postgresql.

Tom Lane wrote:
> If I thought that a 200% error in memory usage were cause for a Chinese
> fire drill, then I'd say "yeah, let's do that". The problem is that the
> place where performance actually goes into the toilet is normally an
> order of magnitude or two above the nominal sort_mem setting (for
> obvious reasons: admins can't afford to push the envelope on sort_mem
> because of the various unpredictable multiples that may apply). So
> switching to a hugely more expensive implementation as soon as we exceed
> some arbitrary limit is likely to be a net loss, not a win.
>
> If you can think of a spill methodology that has a gentle degradation
> curve, then I'm all for that. But I doubt there are any quick-hack
> improvements to be had here.
>
>			regards, tom lane

--
P. J. "Josh" Rovero                    Sonalysts, Inc.
Email: rovero@sonalysts.com            www.sonalysts.com
215 Parkway North                      Work: (860)326-3671 or 442-4355
Waterford CT 06385
On Tue, 07 Dec 2004 07:50:44 -0500, P.J. Josh Rovero <rovero@sonalysts.com> wrote:
> There are many reports of kernel problems with memory allocation
> (too aggressive) and swap issues with RHEL 3.0 on both RAID
> and non-RAID systems. I hope folks have worked through all
> those issues before blaming postgresql.

We have seen kswapd go crazy (near 100% CPU) on several RHEL 3 boxes. Upgrading to kernel 2.4.21-4 fixed this.

Tony Wasson
But isn't the problem when the planner screws up, and not the sort_mem setting?

There was my case where the 7.4 planner estimated 1500 distinct rows when there were actually 1391110. On 7.3.4 it used about 4.4MB. Whereas 7.4 definitely used more than 400MB for the same query (I had to kill postgresql - didn't wait for it to use more). That's a lot more than 200%.

Maybe 3x sort_mem is too low, but at least by default keep it below server RAM/number of backends or something like that. Even if the planner has improved a lot, if cases like that still occur from time to time it'll be a lot better for stability/availability if there's a limit.

Doubt if I still have the same data to test on 8.0.

Link.

At 12:35 AM 12/7/2004 -0500, Tom Lane wrote:
>Neil Conway <neilc@samurai.com> writes:
> > As a quick hack, what about throwing away the constructed hash table and
> > switching from hashing to sorting if we exceed sort_mem by a significant
> > factor? (say, 200%) We might also want to print a warning message to the
> > logs.
>
>If I thought that a 200% error in memory usage were cause for a Chinese
>fire drill, then I'd say "yeah, let's do that". The problem is that the
>place where performance actually goes into the toilet is normally an
>order of magnitude or two above the nominal sort_mem setting (for
>obvious reasons: admins can't afford to push the envelope on sort_mem
>because of the various unpredictable multiples that may apply). So
>switching to a hugely more expensive implementation as soon as we exceed
>some arbitrary limit is likely to be a net loss, not a win.
>
>If you can think of a spill methodology that has a gentle degradation
>curve, then I'm all for that. But I doubt there are any quick-hack
>improvements to be had here.
>
>			regards, tom lane
Tom Lane wrote:
> Paul Tillotson <pntil@shentel.net> writes:
>> Does postgres actually do multiple concurrent sorts within a single
>> backend?
>
> Certainly. Consider for example a merge join with each input being
> sorted by an explicit sort step. DISTINCT, ORDER BY, UNION, and
> related operators require their own sort steps in the current
> implementation. It's not difficult to invent queries that require
> arbitrarily large numbers of sort steps.

Tom, in Bruce's document on performance tuning, the page titled "Multiple CPUs" states:

"POSTGRESQL uses a multi-process model, meaning each database connection has its own Unix process...POSTGRESQL does not use multi-threading to allow a single process to use multiple CPUs."

I took this to mean that PostgreSQL was not multi-threaded at all, and that each connection was serviced by a single, non-threaded process. Have I interpreted this incorrectly? Are you saying that the backend process actually is multi-threaded? In the example you cite, multiple sorts could be accomplished serially in a non-threaded process.

--
Guy Rouillier
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Paul Tillotson <pntil@shentel.net> writes:
> > Does postgres actually do multiple concurrent sorts within a single
> > backend?
>
> Certainly. Consider for example a merge join with each input being
> sorted by an explicit sort step. DISTINCT, ORDER BY, UNION, and related
> operators require their own sort steps in the current implementation.
> It's not difficult to invent queries that require arbitrarily large
> numbers of sort steps.

I think there's a bit of misunderstanding here. He's talking about two sorts actually being executed in parallel. I don't think Postgres actually does that even if there are multiple sorts in the plan. Postgres isn't threaded (either manually or via OS threads), and Postgres's sort isn't incremental and doesn't return any tuples to the outer nodes until it's completely finished sorting (it's not bubble sort or selection sort :).

However, a sort step still takes up memory after it's finished executing, because it has to store the ordered tuples. So a merge join joining two sorted tables needs to do the sort on one and then keep around the tuples, and then do the sort on the second and keep around the tuples for that one too.

I think the actual sort algorithm used can consume up to 3x the space of just the sorted tuples. But I'm not really sure on that, nor am I sure whether that space is reclaimed once the actual execution is done.

--
greg
>>> Does postgres actually do multiple concurrent sorts within a single
>>> backend?
>>
>> Certainly. Consider for example a merge join with each input being
>> sorted by an explicit sort step. DISTINCT, ORDER BY, UNION, and
>> related operators require their own sort steps in the current
>> implementation. It's not difficult to invent queries that require
>> arbitrarily large numbers of sort steps.
>
> Tom, in Bruce's document on performance tuning, the page titled
> "Multiple CPUs" states:
>
> "POSTGRESQL uses a multi-process model, meaning each database connection
> has its own Unix process...POSTGRESQL does not use multi-threading to
> allow a single process to use multiple CPUs."
>
> I took this to mean that PostgreSQL was not multi-threaded at all, and
> that each connection was serviced by a single, non-threaded process.
> Have I interpreted this incorrectly? Are you saying that the backend
> process actually is multi-threaded? In the example you cite, multiple
> sorts could be accomplished serially in a non-threaded process.

Guy,

You understand correctly. Each process is only running one query at once, but in terms of memory usage, several sorts are executing in parallel.

For example, a merge join requires that both the left and right tables be sorted; as the join is being executed, both the left and right tables are being sorted. (Why doesn't it sort one and then the other? It would be a waste of memory to require that one of the [sorted] tables be kept in memory or written completely to the disk and then fetched later. Instead, it just sorts them both as it goes along.)

However, this does mean that the amount of per-process memory being used for sorting will not vary with the "workload" of the database or number of people running that query (as each process only runs the query once). The amount of per-process memory used will vary with the complexity of the query and the plan chosen by the planner.

Paul Tillotson