Thread: Monitoring buffercache...
Hi All;

I've installed pg_buffercache and I want to use it to help define the optimal shared_buffers size.

Currently I run this every 15 min via cron:

insert into buffercache_stats select now(), isdirty, count(*) as buffers,
(count(*) * 8192) as memory from pg_buffercache group by 1,2;

and here's its explain plan:

explain insert into buffercache_stats select now(), isdirty, count(*) as
buffers, (count(*) * 8192) as memory from pg_buffercache group by 1,2;
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Subquery Scan "*SELECT*"  (cost=65.00..65.23 rows=2 width=25)
   ->  HashAggregate  (cost=65.00..65.12 rows=2 width=1)
         ->  Function Scan on pg_buffercache_pages p  (cost=0.00..55.00 rows=1000 width=1)
(3 rows)

Then once a day I will pull a report from the buffercache_stats table. The buffercache_stats table is our own creation:

\d buffercache_stats
            Table "public.buffercache_stats"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 snap_timestamp | timestamp without time zone |
 isdirty        | boolean                     |
 buffers        | integer                     |
 memory         | integer                     |

Here's my issue: the server that we'll eventually roll this out to is extremely busy, and the every-15-minutes query above has the potential to have a huge impact on performance.

Does anyone have any suggestions for a better approach, or maybe a way to improve the performance of the above query?

Thanks in advance...
On Mon, 2008-11-24 at 11:43 -0700, Kevin Kempter wrote:
> Hi All;
>
> I've installed pg_buffercache and I want to use it to help define the
> optimal shared_buffers size.
>
> Currently I run this each 15min via cron:
> insert into buffercache_stats select now(), isdirty, count(*) as buffers,
> (count(*) * 8192) as memory from pg_buffercache group by 1,2;
[snip]
> Here's my issue, the server that we'll eventually roll this out to is
> extremely busy and the every 15min query above has the potential to have
> a huge impact on performance.

I wouldn't routinely run pg_buffercache on a busy database. Plus, I don't think that pg_buffercache will answer this question for you. It will tell you what's currently in the buffer pool and the clean/dirty status, but that's not the first place I'd look. What you really need is to figure out the hit ratio on the buffer pool and go from there.

> Does anyone have any suggestions per a better approach or maybe a way to
> improve the performance for the above query ?
You should be able to use the blocks-hit vs. blocks-read data in the pg_stat_database view (for the overall database), and drill down into pg_statio_user_tables/pg_statio_all_tables for more detailed data if you want.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.
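Brad's suggestion can be sketched as follows (a hedged example: the view and counter columns — blks_hit, blks_read — are from the standard pg_stat_database view; the nullif guard and round() formatting are just illustrative choices):

```sql
-- Overall buffer-cache hit ratio per database, from the cumulative
-- counters in pg_stat_database.  blks_hit counts reads satisfied from
-- shared_buffers; blks_read counts reads that had to go to the OS.
SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2)
           AS hit_ratio_pct
FROM pg_stat_database
ORDER BY blks_read DESC;
```

Since these counters are cumulative since the last stats reset, a snapshot-and-diff between two points in time gives a more meaningful ratio than the lifetime numbers.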
On Mon, Nov 24, 2008 at 11:43 AM, Kevin Kempter <kevin@consistentstate.com> wrote:
> Hi All;
>
> I've installed pg_buffercache and I want to use it to help define the
> optimal shared_buffers size.
>
> Currently I run this each 15min via cron:
> insert into buffercache_stats select now(), isdirty, count(*) as buffers,
> (count(*) * 8192) as memory from pg_buffercache group by 1,2;
[snip]
> Does anyone have any suggestions per a better approach or maybe a way to
> improve the performance for the above query ?

I wouldn't worry about running it every 15 minutes unless it's on a REALLY slow machine.

I just ran it in a loop over and over on my 8 core opteron server and it ran the load factor up by almost exactly 1.0. Under our normal daily load, it sits at 1.9 to 2.5, and it climbed to 2.9 under the new load of running that query over and over. So, it doesn't seem to be blocking or anything.
On Mon, 2008-11-24 at 12:46 -0700, Scott Marlowe wrote:
> I wouldn't worry about running it every 15 minutes unless it's on a
> REALLY slow machine.
>
> I just ran it in a loop over and over on my 8 core opteron server and
> it ran the load factor up by almost exactly 1.0. Under our normal
> daily load, it sits at 1.9 to 2.5, and it climbed to 2.9 under the new
> load of running that query over and over. So, it doesn't seem to be
> blocking or anything.

The internal docs for pg_buffercache_pages.c state:

"To get a consistent picture of the buffer state, we must lock all
partitions of the buffer map. Needless to say, this is horrible
for concurrency. Must grab locks in increasing order to avoid
possible deadlocks."

I'd be concerned about running that routinely.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.
On Mon, Nov 24, 2008 at 12:52 PM, Brad Nicholson <bnichols@ca.afilias.info> wrote:
>> I just ran it in a loop over and over on my 8 core opteron server and
>> it ran the load factor up by almost exactly 1.0. Under our normal
>> daily load, it sits at 1.9 to 2.5, and it climbed to 2.9 under the new
>> load of running that query over and over. So, it doesn't seem to be
>> blocking or anything.
>
> The internal docs for pg_buffercache_pages.c state:
>
> "To get a consistent picture of the buffer state, we must lock all
> partitions of the buffer map. Needless to say, this is horrible
> for concurrency. Must grab locks in increasing order to avoid
> possible deadlocks."

Well, the pg hackers tend to take a paranoid view (it's a good thing TM) on things like this. My guess is that the period of time for which pg_buffercache holds locks on the buffer map is short enough that it isn't a real big deal on a fast enough server. On mine, it certainly had no real negative effects for the 5 minutes or so it was running in a loop. None I could see, and we run hundreds of queries per second on our system.

Of course, for certain other types of loads it could be a much bigger issue. But for our load, on our machine, it was virtually unnoticeable.
On Mon, 24 Nov 2008, Kevin Kempter wrote:

> Currently I run this each 15min via cron:
> insert into buffercache_stats select now(), isdirty, count(*) as buffers,
> (count(*) * 8192) as memory from pg_buffercache group by 1,2;

This query isn't going to save the information you need to figure out if shared_buffers is working effectively for you. You'll need the usage count information (if you're on 8.3) and a notion of which tables it's caching large amounts of data from to do that. What's going to happen with the above is that you'll watch shared_buffers grow to fill whatever size you've allocated it, and then the only useful information you'll be saving is what percentage of that happens to be dirty.

If it happens that the working set of everything you touch is smaller than shared_buffers, you'll find that out, but you don't need this query to figure that out--just look at the amount of shared memory the postgres processes are using with ipcs or top and you can find where that peaks.

I've got some queries that I find more useful, along with a general suggested methodology for figuring out if you've sized the buffers correctly, in my "Inside the PostgreSQL Buffer Cache" presentation at http://www.westnet.com/~gsmith/content/postgresql

> Does anyone have any suggestions per a better approach or maybe a way to
> improve the performance for the above query ?

It's possible to obtain this data in a rather messy but faster way by not taking all those locks. Someone even submitted a patch to do just that:
http://archives.postgresql.org/pgsql-general/2008-02/msg00453.php
I wouldn't recommend doing that, and it's really the only way to make this run faster.

It's nice to grab a snapshot of the buffer cache every now and then just to see what tends to accumulate high usage counts and such. I predict that trying to collect it all the time will leave you overwhelmed with data that's hard to analyze and act on. I'd suggest a snapshot per hour, spread across one normal day every week, would be more than enough data to figure out how your system is behaving. If you want something worth saving every 15 minutes, you should save a snapshot of the data in pg_statio_user_tables.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
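The kind of queries Greg describes can be sketched like this (hedged: assumes PostgreSQL 8.3's pg_buffercache, which exposes usagecount, plus the standard pg_statio_user_tables view; the "statio_stats" snapshot table is a hypothetical creation along the lines of the original buffercache_stats table):

```sql
-- Occasional snapshot: usage-count distribution of the buffer cache
-- (8.3's pg_buffercache adds the usagecount column):
SELECT usagecount, isdirty, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount, isdirty
ORDER BY usagecount, isdirty;

-- Which relations dominate the cache (join against pg_class; relfilenode
-- matching only works for relations in the current database):
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;

-- The every-15-minutes job, saving per-table I/O counters instead of
-- buffer-cache contents (statio_stats is a hypothetical table mirroring
-- these columns plus a timestamp):
INSERT INTO statio_stats
SELECT now(), relname, heap_blks_read, heap_blks_hit,
       idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables;
```

Because pg_statio_user_tables is read from the statistics collector rather than shared memory, snapshotting it avoids the buffer-map partition locks entirely.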
On Mon, 24 Nov 2008, Scott Marlowe wrote:

> My guess is that the period of time for which pg_buffercache takes locks
> on the buffer map are short enough that it isn't a real big deal on a
> fast enough server.

As the server involved gets faster, the amount of time the locks are typically held for drops.

As your shared_buffers allocation increases, that amount of time goes up.

So how painful the overhead is depends on how fast your CPU is relative to how much memory is in it. Since faster systems tend to have more RAM in them, too, it's hard to say whether the impact will be noticeable.

Also, noting that the average case isn't impacted much isn't the concern here. The problem is how much holding all the partition locks will increase worst-case latency.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Mon, Nov 24, 2008 at 9:40 PM, Greg Smith <gsmith@gregsmith.com> wrote:
> As the server involved gets faster, the amount of time the locks are
> typically held for drops.
>
> As your shared_buffers allocation increases, that amount of time goes up.
>
> So how painful the overhead is depends on how fast your CPU is relative to
> how much memory is in it. Since faster systems tend to have more RAM in
> them, too, it's hard to say whether the impact will be noticable.
>
> Also, noting that the average case isn't impacted much isn't the concern
> here. The problem is how much having all partition locks held will
> increase worst-case latency.

True. I was just looking to see how it impacted my servers. Just FYI, it's an 8 core opteron 2.1GHz with 32 Gig of 667MHz DDR2 RAM. It runs on a fast RAID-10 set (12 15k drives under an Areca 1680, but I don't know if that matters that much here). It can pretty easily handle about 400 or so active transactions and still be responsive, but noticeably slower. At anything under about 50 or so transactions it's still quite fast.

It's configured to have 8 Gig of the 32 Gig allocated as shared buffers, and a buffercache query takes about 1 second to run. Is the shared_mem locked all that time? And is it only locked against writes?
Scott Marlowe wrote:
> Well, the pg hackers tend to take a paranoid view (it's a good thing
> TM) on things like this. My guess is that the period of time for
> which pg_buffercache takes locks on the buffer map are short enough
> that it isn't a real big deal on a fast enough server.
[snip]
> Of course, for certain other types of loads it could be a much bigger
> issue. But for our load, on our machine, it was virtually
> unnoticeable.

Yeah, I wouldn't worry about accessing it every 15 minutes! I put the comment there to make it clear that (like pg_locks) selecting from it *very frequently* could affect performance.

Cheers
Mark