Thread: Finding bottleneck
Hello,

we recently upgraded our dual Xeon Dell to a brand new Sun v40z with 4
Opterons, 16 GB of memory and a MegaRAID with enough disks. OS is Debian
Sarge amd64, PostgreSQL is 8.0.3. The database is around 80 GB and our
website performs about 600 selects and several updates/inserts a second.

The v40z performs somewhat better than our old Dell, but mostly due to the
increased amount of memory. The problem is... there seems to be plenty of
free CPU available and almost no IO-wait, but CPU-bound queries seem to
linger for some reason. The problem appears very clearly during
checkpointing. Queries accumulate, and when checkpointing is over there can
be something like 400 queries running but over 50% of the CPU is just
idling.

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff    cache   si   so    bi    bo   in    cs us sy id wa
 3  1      0 494008 159492 14107180    0    0   919  3164 3176 13031 29 12 52  8
 5  3      0 477508 159508 14118452    0    0  1071  4479 3474 13298 27 13 47 13
 0  0      0 463604 159532 14128832    0    0   922  2903 3352 12627 29 11 52  8
 3  1      0 442260 159616 14141668    0    0  1208  3153 3357 13163 28 12 52  9

An example of a lingering query (there's usually several of these or similar):

SELECT u.uid, u.nick, u.name, u.showname, i.status, i.stamp, i.image_id,
       i.info, i.t_width, i.t_height
FROM users u
  INNER JOIN image i ON i.uid = u.uid
  INNER JOIN user_online uo ON u.uid = uo.uid
WHERE u.city_id = 5
  AND i.status = 'd'
  AND u.status = 'a'
ORDER BY city_id, upper(u.nick)
LIMIT (40 + 1) OFFSET 320

The tables involved contain no more than 4 million rows. They are constantly
accessed and should fit nicely into cache, but the database is just slow for
some unknown reason. Any ideas?

----------------->8 Relevant rows from postgresql.conf 8<-----------------

shared_buffers = 15000          # min 16, at least max_connections*2, 8KB each
work_mem = 1536                 # min 64, size in KB
maintenance_work_mem = 32768    # min 1024, size in KB

max_fsm_pages = 1000000         # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 5000        # min 100, ~50 bytes each

vacuum_cost_delay = 15          # 0-1000 milliseconds
vacuum_cost_limit = 120         # 0-10000 credits

bgwriter_percent = 2            # 0-100% of dirty buffers in each round

fsync = true                    # turns forced synchronization on or off
                                # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 128               # min 4, 8KB each
commit_delay = 80000            # range 0-100000, in microseconds
commit_siblings = 10            # range 1-1000
checkpoint_segments = 200       # in logfile segments, min 1, 16MB each
checkpoint_timeout = 1800       # range 30-3600, in seconds

effective_cache_size = 1000000  # typically 8KB each
random_page_cost = 1.8          # units are one sequential page fetch cost
default_statistics_target = 150 # range 1-1000

stats_start_collector = true
stats_command_string = true

Kari Lavikka - tuner@bdb.fi - (050) 380 3808
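As a quick way to see what the piled-up backends are doing during a
checkpoint, the statistics views can be queried directly. This is only a
sketch; it relies on stats_command_string = true, which the config above
already sets (idle backends report '<IDLE>' as their current_query in 8.0):

  -- list the queries that are currently running
  SELECT procpid, usename, current_query
  FROM pg_stat_activity
  WHERE current_query NOT LIKE '<IDLE>%';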
Hi,

On Thu, 28 Jul 2005, Kari Lavikka wrote:

> ----------------->8 Relevant rows from postgresql.conf 8<-----------------
>
> shared_buffers = 15000          # min 16, at least max_connections*2, 8KB each
> work_mem = 1536                 # min 64, size in KB

As an aside, I'd increase work_mem -- but it doesn't sound like that is
your problem.

> maintenance_work_mem = 32768    # min 1024, size in KB
>
> max_fsm_pages = 1000000         # min max_fsm_relations*16, 6 bytes each
> max_fsm_relations = 5000        # min 100, ~50 bytes each
>
> vacuum_cost_delay = 15          # 0-1000 milliseconds
> vacuum_cost_limit = 120         # 0-10000 credits
>
> bgwriter_percent = 2            # 0-100% of dirty buffers in each round
>
> fsync = true                    # turns forced synchronization on or off
>                                 # fsync, fdatasync, open_sync, or open_datasync
> wal_buffers = 128               # min 4, 8KB each

Some benchmarking results out today suggest that wal_buffers = 1024 or
even 2048 could greatly assist you.

> commit_delay = 80000            # range 0-100000, in microseconds
> commit_siblings = 10            # range 1-1000

This may explain the fact that you've got backed-up queries and idle
CPU -- I'm not certain, though. What does disabling commit_delay do to
your situation?

Gavin
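For reference, commit_delay and commit_siblings can be changed per session,
so this particular suggestion is easy to try without touching
postgresql.conf (wal_buffers, by contrast, needs a config edit and a server
restart). A minimal sketch:

  -- disable the commit delay for the current session and compare timings
  SET commit_delay = 0;
  SHOW commit_delay;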
> effective_cache_size = 1000000  # typically 8KB each

I have this setting on PostgreSQL 7.4.8 on FreeBSD with 4 GB RAM:

effective_cache_size = 27462

So even though your machine runs Debian and you have four times as much RAM
as mine, your effective_cache_size is 36 times larger. You could try
lowering this setting.

regards
Claus
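For reference, both values are expressed in 8 KB pages, so they correspond
to very different amounts of assumed OS cache. This is my arithmetic, not
from the thread:

  SELECT (1000000::bigint * 8192) / (1024 * 1024 * 1024.0) AS gb_on_v40z,     -- about 7.6 GB
         (27462::bigint   * 8192) / (1024 * 1024 * 1024.0) AS gb_on_freebsd;  -- about 0.2 GB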
Kari Lavikka wrote:
> shared_buffers = 15000

You can play around with this one, but in my experience it doesn't make
much difference anymore (it used to).

> work_mem = 1536                 # min 64, size in KB

This seems low. Are you sure you are not getting sorts swapped to disk?

> fsync = true                    # turns forced synchronization on or off

Does turning this off make a difference? This would help narrow down where
the problem is.

> commit_delay = 80000            # range 0-100000, in microseconds

Hm! How did you arrive at this number? Try setting it to zero and comparing.

> stats_start_collector = true
> stats_command_string = true

With a high query load you may want to consider turning this off. On win32
I've had some problems with the stats collector under high load conditions.
Not on unix, but it's something to look at. Just turn off stats for a while
and see if it helps.

Good luck! Your hardware should be more than adequate.

Merlin
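One low-risk way to test the work_mem question is to raise it for a single
session and re-time one of the sorting queries from the first post; the
value below is only an example (work_mem is in KB), not a recommendation:

  SET work_mem = 8192;   -- 8 MB for this session only
  EXPLAIN ANALYZE
  SELECT u.uid, u.nick, u.name, u.showname, i.status, i.stamp, i.image_id,
         i.info, i.t_width, i.t_height
  FROM users u
    INNER JOIN image i ON i.uid = u.uid
    INNER JOIN user_online uo ON u.uid = uo.uid
  WHERE u.city_id = 5 AND i.status = 'd' AND u.status = 'a'
  ORDER BY city_id, upper(u.nick)
  LIMIT (40 + 1) OFFSET 320;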
On 7/28/05 2:21 AM, "Kari Lavikka" <tuner@bdb.fi> wrote:

There's a new profiling tool called oprofile:

  http://oprofile.sourceforge.net/download/

that can be run without instrumenting the binaries beforehand. To actually
find out what the code is doing during these stalls, oprofile can show you
in which routines the CPU is spending time when you start/stop the
profiling.

As an alternative to the "guess -> change parameters -> repeat" approach,
this is the most direct way to find the exact nature of the problem.

- Luke
Hi!

Oprofile looks quite interesting. I'm not very familiar with PostgreSQL
internals, but here's some report output:

CPU: AMD64 processors, speed 2190.23 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000

samples   %        symbol name
13513390  16.0074  AtEOXact_CatCache
 4492257   5.3213  StrategyGetBuffer
 2279285   2.6999  AllocSetAlloc
 2121509   2.5130  LWLockAcquire
 2023574   2.3970  hash_seq_search
 1971358   2.3352  nocachegetattr
 1837168   2.1762  GetSnapshotData
 1793693   2.1247  SearchCatCache
 1777385   2.1054  hash_search
 1460804   1.7304  ExecMakeFunctionResultNoSets
 1360930   1.6121  _bt_compare
 1344604   1.5928  yyparse
 1318407   1.5617  LWLockRelease
 1290814   1.5290  FunctionCall2
 1137544   1.3475  ExecEvalVar
 1102236   1.3057  hash_any
  912677   1.0811  OpernameGetCandidates
  877993   1.0400  ReadBufferInternal
  783908   0.9286  TransactionIdPrecedes
  772886   0.9155  MemoryContextAllocZeroAligned
  679768   0.8052  StrategyBufferLookup
  609339   0.7218  equal
  600584   0.7114  PGSemaphoreLock

By the way, I tried to strace lingering queries under different loads. When
the number of concurrent queries increases, lseek and read syscalls stay
within quite constant limits, but the number of semop calls quadruples. Are
there some buffer locking issues?

Kari Lavikka - tuner@bdb.fi - (050) 380 3808
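The semop growth does point at lock contention of some kind. Heavyweight
locks (tables, rows) can at least be checked from SQL, although
LWLock/buffer contention of the kind StrategyGetBuffer and LWLockAcquire
hint at never shows up in this view, so treat a clean result as only a
partial answer:

  -- any ungranted (waiting) heavyweight lock requests?
  SELECT * FROM pg_locks WHERE NOT granted;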
Kari Lavikka <tuner@bdb.fi> writes:
> samples   %        symbol name
> 13513390  16.0074  AtEOXact_CatCache

That seems quite odd --- I'm not used to seeing that function at the top
of a profile. What is the workload being profiled, exactly?

			regards, tom lane
> Kari Lavikka <tuner@bdb.fi> writes:
> > samples   %        symbol name
> > 13513390  16.0074  AtEOXact_CatCache
>
> That seems quite odd --- I'm not used to seeing that function at the top
> of a profile. What is the workload being profiled, exactly?

He is running a commit_delay of 80000. Could that be playing a role?

Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: >> Kari Lavikka <tuner@bdb.fi> writes: >>> samples % symbol name >>> 13513390 16.0074 AtEOXact_CatCache >> >> That seems quite odd --- I'm not used to seeing that function at the top >> of a profile. What is the workload being profiled, exactly? > He is running a commit_delay of 80000. Could that be playing a role? It wouldn't cause AtEOXact_CatCache to suddenly get expensive. (I have little or no faith in the value of nonzero commit_delay, though.) regards, tom lane
Actually I modified postgresql.conf a bit and there isn't a commit_delay any
more. That didn't make a noticeable difference, though.

The workload is generated by a website with about 1000 dynamic page views a
second. Finland's biggest site among youths, btw.

Anyway, there are about 70 tables and here are some of the most important:

             relname              |  reltuples
----------------------------------+-------------
 comment                          | 1.00723e+08
 comment_archive                  | 9.12764e+07
 channel_comment                  | 6.93912e+06
 image                            | 5.80314e+06
 admin_event                      | 5.1936e+06
 user_channel                     | 3.36877e+06
 users                            |      325929
 channel                          |      252267

Queries to the "comment" table are mostly IO-bound but are performing quite
well. Here's an example:

(SELECT u.nick, c.comment, c.private, c.admin, c.visible, c.parsable,
        c.uid_sender, to_char(c.stamp, 'DD.MM.YY HH24:MI') AS stamp,
        c.comment_id
 FROM comment c
   INNER JOIN users u ON u.uid = c.uid_sender
 WHERE u.status = 'a'
   AND c.image_id = 15500900
   AND c.uid_target = 780345
 ORDER BY uid_target DESC, image_id DESC, c.comment_id DESC)
LIMIT 36

And explain analyze:

 Limit  (cost=0.00..6.81 rows=1 width=103) (actual time=0.263..17.522 rows=12 loops=1)
   ->  Nested Loop  (cost=0.00..6.81 rows=1 width=103) (actual time=0.261..17.509 rows=12 loops=1)
         ->  Index Scan Backward using comment_uid_target_image_id_comment_id_20050527 on "comment" c  (cost=0.00..3.39 rows=1 width=92) (actual time=0.129..16.213 rows=12 loops=1)
               Index Cond: ((uid_target = 780345) AND (image_id = 15500900))
         ->  Index Scan using users_pkey on users u  (cost=0.00..3.40 rows=1 width=15) (actual time=0.084..0.085 rows=1 loops=12)
               Index Cond: (u.uid = "outer".uid_sender)
               Filter: (status = 'a'::bpchar)
 Total runtime: 17.653 ms

We are having performance problems with some smaller tables and very simple
queries. For example:

SELECT u.uid, u.nick, extract(epoch from uc.stamp) AS stamp
FROM user_channel uc
  INNER JOIN users u USING (uid)
WHERE channel_id = 281321
  AND u.status = 'a'
ORDER BY uc.channel_id, upper(uc.nick)

And explain analyze:

 Nested Loop  (cost=0.00..200.85 rows=35 width=48) (actual time=0.414..38.128 rows=656 loops=1)
   ->  Index Scan using user_channel_channel_id_nick on user_channel uc  (cost=0.00..40.18 rows=47 width=27) (actual time=0.090..0.866 rows=667 loops=1)
         Index Cond: (channel_id = 281321)
   ->  Index Scan using users_pkey on users u  (cost=0.00..3.40 rows=1 width=25) (actual time=0.048..0.051 rows=1 loops=667)
         Index Cond: ("outer".uid = u.uid)
         Filter: (status = 'a'::bpchar)
 Total runtime: 38.753 ms

Under heavy load these queries tend to take several minutes to execute,
although there's plenty of free CPU available. There aren't any blocking
locks in pg_locks.

Kari Lavikka - tuner@bdb.fi - (050) 380 3808
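For anyone wanting to reproduce a listing like the one above: the reltuples
figures are the planner's estimates kept in pg_class (relpages is added here
as a rough on-disk size indicator, one page being 8 KB; ANALYZE keeps both
reasonably current). A sketch:

  SELECT relname, reltuples, relpages
  FROM pg_class
  WHERE relkind = 'r'
  ORDER BY reltuples DESC
  LIMIT 10;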
Kari Lavikka <tuner@bdb.fi> writes:
> We are having performance problems with some smaller tables and very simple
> queries. For example:

> SELECT u.uid, u.nick, extract(epoch from uc.stamp) AS stamp
> FROM user_channel uc INNER JOIN users u USING (uid)
> WHERE channel_id = 281321 AND u.status = 'a'
> ORDER BY uc.channel_id, upper(uc.nick)

> And explain analyze:

>  Nested Loop  (cost=0.00..200.85 rows=35 width=48) (actual time=0.414..38.128 rows=656 loops=1)
>    ->  Index Scan using user_channel_channel_id_nick on user_channel uc  (cost=0.00..40.18 rows=47 width=27) (actual time=0.090..0.866 rows=667 loops=1)
>          Index Cond: (channel_id = 281321)
>    ->  Index Scan using users_pkey on users u  (cost=0.00..3.40 rows=1 width=25) (actual time=0.048..0.051 rows=1 loops=667)
>          Index Cond: ("outer".uid = u.uid)
>          Filter: (status = 'a'::bpchar)
>  Total runtime: 38.753 ms

> Under heavy load these queries tend to take several minutes to execute,
> although there's plenty of free CPU available.

What that sounds like to me is a machine with inadequate disk I/O bandwidth.
Your earlier comment that checkpoint drives the machine into the ground
fits right into that theory, too. You said there is "almost no IO-wait"
but are you sure you are measuring that correctly?

Something else just struck me from your first post:

> Queries accumulate, and when checkpointing is over there can be
> something like 400 queries running but over 50% of the CPU is just
> idling.

400 queries? Are you launching 400 separate backends to do that?
Some sort of connection pooling seems like a good idea, if you don't
have it in place already. If the system's effective behavior in the
face of heavy load is to start even more concurrent backends, that
could easily drive things into the ground.

			regards, tom lane
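For reference, a rough way to count how many backends are connected and how
many are actually doing work at a given moment (a sketch against 8.0's
pg_stat_activity; idle backends report '<IDLE>' as their current_query):

  SELECT count(*) AS backends,
         sum(CASE WHEN current_query NOT LIKE '<IDLE>%' THEN 1 ELSE 0 END) AS active
  FROM pg_stat_activity;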
On Mon, 8 Aug 2005, Tom Lane wrote:

> What that sounds like to me is a machine with inadequate disk I/O bandwidth.
> Your earlier comment that checkpoint drives the machine into the ground
> fits right into that theory, too. You said there is "almost no IO-wait"
> but are you sure you are measuring that correctly?

Currently there's some iowait caused by "fragmentation" of the comment
table. Periodic clustering helps a lot.

The disk configuration looks something like this:

  sda: data (10 spindles, raid10)
  sdb: xlog & clog (2 spindles, raid1)
  sdc: os and other stuff

Usually iostat (2 second interval) says:

avg-cpu:  %user   %nice    %sys %iowait   %idle
          32.38    0.00   12.88   11.62   43.12

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             202.00      1720.00         0.00       3440          0
sdb             152.50         4.00      2724.00          8       5448
sdc               0.00         0.00         0.00          0          0

And during checkpoint:

avg-cpu:  %user   %nice    %sys %iowait   %idle
          31.25    0.00   14.75   54.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda            3225.50      1562.00     35144.00       3124      70288
sdb             104.50        10.00      2348.00         20       4696
sdc               0.00         0.00         0.00          0          0

I don't think insufficient disk IO should cause those lingering queries,
because the dataset is rather small and it's continuously accessed. It
should fit into cache and stay there(?)

> 400 queries? Are you launching 400 separate backends to do that?

Well, yes. That's the common problem with php and persistent connections.

> Some sort of connection pooling seems like a good idea, if you don't
> have it in place already.

pgpool, for example? I'm planning to give it a try.

Kari Lavikka - tuner@bdb.fi - (050) 380 3808
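One hedged way to test the "should fit into cache" assumption from inside
the database: the statio views show how often PostgreSQL had to go outside
its own shared buffers. This only covers shared_buffers, not the OS page
cache, so a low hit percentage can still be satisfied from RAM, but it gives
a rough picture of where block reads are coming from:

  SELECT relname, heap_blks_read, heap_blks_hit,
         round(100.0 * heap_blks_hit
               / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS hit_pct
  FROM pg_statio_user_tables
  ORDER BY heap_blks_read DESC
  LIMIT 10;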
Kari Lavikka <tuner@bdb.fi> writes:
> The disk configuration looks something like this:
>   sda: data (10 spindles, raid10)
>   sdb: xlog & clog (2 spindles, raid1)
>   sdc: os and other stuff

That's definitely wrong. Put clog on the data disk. The entire point of
giving xlog its own spindle is that you don't ever want the disk heads
moving off the current xlog file.

I'm not sure how much this is hurting you, given that clog is relatively
low volume, but if you're going to go to the trouble of putting xlog on a
separate spindle then it should be a completely dedicated spindle.

			regards, tom lane
I think I have a solution for you. You have posted that you presently have
these RAID volumes and behaviors:

  sda: data (10 spindles, raid10)
  sdb: xlog & clog (2 spindles, raid1)
  sdc: os and other stuff

Usually iostat (2 second interval) says:

avg-cpu:  %user   %nice    %sys %iowait   %idle
          32.38    0.00   12.88   11.62   43.12

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             202.00      1720.00         0.00       3440          0
sdb             152.50         4.00      2724.00          8       5448
sdc               0.00         0.00         0.00          0          0

And during checkpoint:

avg-cpu:  %user   %nice    %sys %iowait   %idle
          31.25    0.00   14.75   54.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda            3225.50      1562.00     35144.00       3124      70288
sdb             104.50        10.00      2348.00         20       4696
sdc               0.00         0.00         0.00          0          0

During checkpoints sda is becoming saturated, essentially halting all other
DB activity involving sda. A lesser version of the problem is probably
occurring every time multiple entities on sda are being accessed
simultaneously, particularly simultaneous writes.

My proposed solution:

  - Put comment and its index on its own dedicated RAID volume.
  - Put comment_archive and its index on its own dedicated RAID volume.
  - Put the rest of the tables currently part of "data" on their own
    dedicated RAID volume.
  - Put the rest of the indexes to the tables currently part of "data" on
    their own dedicated RAID volume.
  - Put xlog on its own dedicated RAID volume.

The general idea here is to put any tables or indexes that tend to require
simultaneous access, particularly write access, on different spindles. Like
all things, there's a point of diminishing returns that is dependent on the
HW used and the DB load.

If you must wring every last bit of IO out of the HD subsystem, a more exact
set of spindle assignments can be made by analyzing your queries and then
1) making sure writes that tend to be simultaneous go to different spindles,
then (if you still need better IO) 2) making sure reads that tend to be
simultaneous go to different spindles. At some point, your controller will
become the bottleneck. At some point beyond that, the IO channels on the
mainboard will become the bottleneck.

My suggestion should get you to within 80-90% of optimal if I've understood
the implications of your posts correctly.

The other suggestion I'd make is to bump your RAM from 16GB to 32GB as soon
as you can afford it and then tune your PostgreSQL parameters to make best
use of it. The more RAM resident your DB, the better.

Hope this helps,
Ron Peacetree
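Short of rebuilding the arrays, the split described above can be
approximated with tablespaces, which 8.0 supports. A rough sketch with
made-up names and paths (note that each ALTER ... SET TABLESPACE rewrites
the relation and holds an exclusive lock while doing so, so it has to be
scheduled for a quiet period):

  -- move the hottest table and its main index onto a separate volume
  CREATE TABLESPACE comment_space LOCATION '/mnt/spare_array/pg_comment';
  ALTER TABLE "comment" SET TABLESPACE comment_space;
  ALTER INDEX comment_uid_target_image_id_comment_id_20050527
    SET TABLESPACE comment_space;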
On Mon, 8 Aug 2005, Tom Lane wrote:

> What that sounds like to me is a machine with inadequate disk I/O bandwidth.
> Your earlier comment that checkpoint drives the machine into the ground
> fits right into that theory, too. You said there is "almost no IO-wait"
> but are you sure you are measuring that correctly?

Reducing checkpoint_timeout to 600 seconds had a positive effect. The
previous value was 1800 seconds. We have a spare disk array from the old
server and I'm planning to use it as a tablespace for the comment table
(the 100M+ rows one) as Ron suggested.

>> Queries accumulate, and when checkpointing is over there can be
>> something like 400 queries running but over 50% of the CPU is just
>> idling.
>
> 400 queries? Are you launching 400 separate backends to do that?
> Some sort of connection pooling seems like a good idea, if you don't
> have it in place already. If the system's effective behavior in the
> face of heavy load is to start even more concurrent backends, that
> could easily drive things into the ground.

Ok, I implemented connection pooling using pgpool and it increased
performance a lot! We are now delivering about 1500 dynamic pages a second
without problems. Each of the eight single-cpu webservers is running a
pgpool instance with 20 connections.

However, those configuration changes didn't have a significant effect on
the oprofile results. AtEOXact_CatCache consumes even more cycles. This
isn't a problem right now, but it may be in the future...

CPU: AMD64 processors, speed 2190.23 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000

samples   %        symbol name
1147870   21.1602  AtEOXact_CatCache
 187466    3.4558  hash_seq_search
 174357    3.2142  AllocSetAlloc
 170896    3.1504  nocachegetattr
 131724    2.4282  ExecMakeFunctionResultNoSets
 125292    2.3097  SearchCatCache
 117264    2.1617  StrategyDirtyBufferList
 105741    1.9493  hash_search
  98245    1.8111  FunctionCall2
  97878    1.8043  yyparse
  90932    1.6763  LWLockAcquire
  83555    1.5403  LWLockRelease
  81045    1.4940  _bt_compare
 ... and so on ...

----->8 Significant rows from current postgresql.conf 8<-----

max_connections = 768           # unnecessarily large with connection pooling
shared_buffers = 15000
work_mem = 2048
maintenance_work_mem = 32768
max_fsm_pages = 1000000
max_fsm_relations = 5000
bgwriter_percent = 2
fsync = true
wal_buffers = 512
checkpoint_segments = 200       # less would probably be enough with 600sec timeout
checkpoint_timeout = 600
effective_cache_size = 500000
random_page_cost = 1.5
default_statistics_target = 150
stats_start_collector = true
stats_command_string = true

Kari Lavikka - tuner@bdb.fi - (050) 380 3808
Kari Lavikka <tuner@bdb.fi> writes:
> However, those configuration changes didn't have a significant effect on
> the oprofile results. AtEOXact_CatCache consumes even more cycles.

I believe I've fixed that for 8.1.

			regards, tom lane
> Kari Lavikka <tuner@bdb.fi> writes:
> > However, those configuration changes didn't have a significant effect on
> > the oprofile results. AtEOXact_CatCache consumes even more cycles.
>
> I believe I've fixed that for 8.1.

Relative to 8.0, I am seeing a dramatic, almost miraculous reduction in CPU
load in 8.1devel. This is for ISAM-style access patterns over the parse/bind
interface (IOW one record at a time, 90% read, 10% write).

Relative to commercial dedicated ISAM storage engines, pg holds up very well
except in cpu load, but 8.1 is a huge step towards addressing that.

So far, except for one minor (and completely understandable) issue with
bitmap scans, 8.1 has been a stellar performer. Also great is the expansion
of the pg_locks view (which I didn't see mentioned in Bruce's TODO list,
just FYI).

Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > Relative to 8.0, I am seeing a dramatic, almost miraculous reduction in > CPU load times in 8.1devel. This is for ISAM style access patterns over > the parse/bind interface. (IOW one record at a time, 90% read, 10% > write). > Relative to commercial dedicated ISAM storage engines, pg holds up very > well except in cpu load, but 8.1 is a huge step towards addressing that. Cool --- we've done a fair amount of work on squeezing out internal inefficiencies during this devel cycle, but it's always hard to predict just how much anyone will notice in the real world. Care to do some oprofile or gprof profiles to see where it's still bad? regards, tom lane
> Cool --- we've done a fair amount of work on squeezing out internal
> inefficiencies during this devel cycle, but it's always hard to predict
> just how much anyone will notice in the real world.
>
> Care to do some oprofile or gprof profiles to see where it's still bad?

Since the release of 8.0 we are a strictly Windows shop :). I tried building
pg with the -pg flag and got errors in some of the satellite libraries. I
think this is solvable, though, and at some point I'll spend more time on it.

Anyway, just so you know the numbers I'm seeing, I've run several benchmarks
of various programs that access pg via our ISAM bridge. The results are as
consistent as they are good.

These tests are on the same box, using the same .conf, on the same freshly
loaded data. The disk doesn't play a major role in these tests. All data
access is through the ExecPrepared libpq C interface. The benchmark is run
from a separate box on a LAN.

Bill of Materials traversal (~62k records):

               ISAM*     pg 8.0    pg 8.1 devel    delta 8.0->8.1
running time   63 sec    90 sec    71 sec          21%
cpu load       17%       45%       32%             29%
loadsecs**     10.71     40.5      22.72           44%
recs/sec       984       688       873
recs/loadsec   5882      1530      2728

*ISAM is an anonymous commercial ISAM library in an optimized server
architecture (pg smokes the non-optimized flat file version).
**Loadsecs being seconds of CPU at 100% load.

IOW the cpu load drop is around 44%. Amazing!

Merlin
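For readers puzzling over the derived "loadsecs" row (my arithmetic, not
from the post): it is simply running time multiplied by cpu load, which
reproduces the figures in the table:

  SELECT 63 * 0.17 AS isam_loadsecs,   -- 10.71
         90 * 0.45 AS pg80_loadsecs,   -- 40.50
         71 * 0.32 AS pg81_loadsecs;   -- 22.72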
> Bill of Materials traversal (~62k records):
>
>                ISAM*     pg 8.0    pg 8.1 devel    delta 8.0->8.1
> running time   63 sec    90 sec    71 sec          21%
> cpu load       17%       45%       32%             29%
> loadsecs**     10.71     40.5      22.72           44%
> recs/sec       984       688       873
> recs/loadsec   5882      1530      2728
>
> *ISAM is an anonymous commercial ISAM library in an optimized server
> architecture (pg smokes the non-optimized flat file version).
> **Loadsecs being seconds of CPU at 100% load.

One thing that might interest you is that the penalty in 8.1 for
stats_command_string = true in this type of access pattern is very high: I
was experimenting to see if the new cpu efficiency gave me enough of a
budget to start using it. This more than doubled the cpu load to around 70%,
with a runtime of 82 seconds. This is actually worse than 8.0 :(.

This *might* be a somewhat win32-specific issue. I've had issues with the
stats collector before. Anyway, the feature is a frill so it's not a big
deal.

Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > One thing that might interest you is that the penalty in 8.1 for > stats_command_string=true in this type of access pattern is very high: I > was experimenting to see if the new cpu efficiency gave me enough of a > budget to start using this. This more than doubled the cpu load to > around 70% with a runtime of 82 seconds. This is actually worse than > 8.0 :(. That seems quite peculiar; AFAICS the pgstat code shouldn't be any slower than before. At first I thought it might be because we'd increased PGSTAT_ACTIVITY_SIZE, but actually that happened before 8.0 release, so it shouldn't be a factor in this comparison. Can anyone else confirm a larger penalty for stats_command_string in HEAD than in 8.0? A self-contained test case would be nice too. regards, tom lane
> That seems quite peculiar; AFAICS the pgstat code shouldn't be any slower
> than before. At first I thought it might be because we'd increased
> PGSTAT_ACTIVITY_SIZE, but actually that happened before the 8.0 release, so
> it shouldn't be a factor in this comparison.

Just FYI, the last time I looked at stats was in the 8.0 beta period.

> Can anyone else confirm a larger penalty for stats_command_string in HEAD
> than in 8.0? A self-contained test case would be nice too.

Looking into it.

Merlin