Thread: Fwd: FreeBSD, PostgreSQL, semwait and sbwait!
---------- Forwarded Message ---------- Subject: FreeBSD, PostgreSQL, semwait and sbwait! Date: March 23, 2004 12:02 pm From: "Jason Coene" <jcoene@gotfrag.com> To: <freebsd-performance@freebsd.org> Hello all, We're having a substantial problem with our FreeBSD 5.2 database server running PostgreSQL - it's getting a lot of traffic (figure about 3,000 queries per second), but queries are slow, and it's seemingly waiting on other things than CPU time. The database server is a dual P4-2.8 w/ HT enabled (kernel finds 4 processors), 2GB RAM, 4 disk Serial ATA on 3ware RAID, gigabit Ethernet connection to web servers. It's running FreeBSD 5.2 and PostgreSQL 7.4.1. The server is taking a while to respond to both connections, and then queries (between 1-3 seconds, on a query that should execute in 0.05 or less). The CPU usage for the server never goes above 30% (70% idle), and the CPU time that's in use is nearly always split equal between user and system. The system is using Doing a "top", this is what we see: last pid: 51833; load averages: 13.72, 11.74, 10.01 up 0+01:55:45 15:00:03 116 processes: 1 running, 99 sleeping, 16 lock CPU states: 14.6% user, 0.0% nice, 23.7% system, 0.2% interrupt, 61.5% idle Mem: 91M Active, 1043M Inact, 160M Wired, 52K Cache, 112M Buf, 644M Free Swap: 4096M Total, 4096M Free 20354 pgsql 131 0 80728K 5352K select 0 0:24 1.71% 1.71% postgres 36415 pgsql 4 0 81656K 67468K sbwait 2 0:00 3.23% 0.59% postgres 36442 pgsql 128 0 82360K 15868K select 2 0:00 1.75% 0.24% postgres 36447 pgsql -4 0 82544K 10616K semwai 0 0:00 2.05% 0.20% postgres 36461 pgsql -4 0 81612K 6844K semwai 2 0:00 2.05% 0.20% postgres 36368 pgsql 4 0 82416K 20780K sbwait 3 0:00 0.50% 0.15% postgres 36459 pgsql -4 0 81840K 7816K semwai 0 0:00 1.54% 0.15% postgres 36469 pgsql -4 0 81840K 7964K semwai 2 0:00 1.54% 0.15% postgres 36466 pgsql 129 0 81840K 7976K *Giant 2 0:00 1.54% 0.15% postgres 36479 pgsql -4 0 81528K 6648K semwai 0 0:00 3.00% 0.15% postgres 36457 pgsql -4 0 81840K 8040K semwai 1 0:00 1.03% 0.10% postgres 36450 pgsql 129 0 82352K 8188K *Giant 2 0:00 1.03% 0.10% postgres 36472 pgsql -4 0 81824K 7416K semwai 2 0:00 1.03% 0.10% postgres 36478 pgsql 131 0 81840K 7936K select 0 0:00 2.00% 0.10% postgres 36454 pgsql 4 0 82416K 16300K sbwait 3 0:00 0.51% 0.05% postgres 36414 pgsql 4 0 82416K 15872K sbwait 2 0:00 0.27% 0.05% postgres Our kernel is GENERIC plus: maxusers 512 options SYSVSHM options SHMMAXPGS=262144 options SHMSEG=512 options SHMMNI=512 options SYSVSEM options SEMMNI=512 options SEMMNS=1024 options SEMMNU=512 options SEMMAP=512 options NMBCLUSTERS=32768 Interesting bits from postgresql.conf: max_connections = 512 shared_buffers = 8192 sort_mem = 16384 vacuum_mem = 8192 fsync = false It seems that queries are executing fine once they start, but it's taking a while for them to get going, while the postgres process sits in semwait, sbwait or select. This problem doesn't happen when there's little load on the server, it's only when we open it for public consumption that it exhibits these problems. Anyone have this type of problem before? Am I missing something? Thanks, Jason _______________________________________________ freebsd-performance@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-performance To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org" ------------------------------------------------------- -- Darcy Buskermolen Wavefire Technologies Corp. ph: 250.717.0200 fx: 250.763.1759 http://www.wavefire.com
Darcy Buskermolen <darcy@wavefire.com> writes: > The database server is a dual P4-2.8 w/ HT enabled (kernel finds 4 > processors), 2GB RAM, 4 disk Serial ATA on 3ware RAID, gigabit Ethernet > connection to web servers. It's running FreeBSD 5.2 and PostgreSQL 7.4.1. Hm. What happens if you turn off the hyperthreading? We have seen a number of reports recently that suggest that our spinlocking code behaves inefficiently on hyperthreaded machines. This hasn't got to the point where we have any substantiated evidence, mind you, but maybe you can help provide some. Also it might be interesting to put one of these into the outer loop in s_lock(): __asm__ __volatile__( " rep; nop \n" : : : "memory"); (This suggestion is a quick-and-dirty backport of a change that's already in CVS tip.) regards, tom lane
Darcy, I suggest getting this person over here instead. They have a *lot* to learn about tuning PostgreSQL. -- -Josh Berkus Aglio Database Solutions San Francisco
Tom, > Hm. What happens if you turn off the hyperthreading? Forget hyperthreading. Look at their postgresql.conf settings. 8mb shared mem, 16mb sort mem per connection for 512 connections, default effective_cache_size. -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > Forget hyperthreading. Look at their postgresql.conf settings. 8mb shared > mem, 16mb sort mem per connection for 512 connections, default > effective_cache_size. They could well be going into swap hell due to the oversized sort_mem, but that didn't quite seem to explain the reported behavior. I'd want to see vmstat or similar output to confirm whether the disks are busy, though. Amazing how many people forget that a database is normally I/O-bound rather than CPU-bound. regards, tom lane
Josh Berkus wrote: > Forget hyperthreading. Look at their postgresql.conf settings. 8mb shared > >mem, 16mb sort mem per connection for 512 connections, default >effective_cache_size. > > > Umm...its 64Mb shared buffers isn't it ? However agree completely with general thrust of message.... particularly the 16Mb of sort mem / connection - a very bad idea unless you are running a data warehouse box for only a few users (not 512 of them...) regards Mark
Darcy Buskermolen wrote: >---------- Forwarded Message ---------- > >Subject: FreeBSD, PostgreSQL, semwait and sbwait! >Date: March 23, 2004 12:02 pm >From: "Jason Coene" <jcoene@gotfrag.com> >To: <freebsd-performance@freebsd.org> > >Hello all, > >We're having a substantial problem with our FreeBSD 5.2 database server >running PostgreSQL - it's getting a lot of traffic (figure about 3,000 >queries per second), but queries are slow, and it's seemingly waiting on >other things than CPU time > > Could this be a 5.2 performance issue ? In spite of certain areas where the 5.x series performance is known to be much better than 4.x (e.g networking), this may not be manifested in practice for a complete application. (e.g. I am still running 4.9 as it outperformed 5.1 vastly for a ~100 database sessions running queries - note that I have not tried out 5.2, so am happy to be corrected on this) regards Mark
Hello, >> We're having a substantial problem with our FreeBSD 5.2 database >> server >> running PostgreSQL - it's getting a lot of traffic (figure about 3,000 >> queries per second), but queries are slow, and it's seemingly waiting >> on >> other things than CPU time >> > Could this be a 5.2 performance issue ? > > In spite of certain areas where the 5.x series performance is known to > be much better than 4.x (e.g networking), this may not be manifested > in practice for a complete application. > (e.g. I am still running 4.9 as it outperformed 5.1 vastly for a ~100 > database sessions running queries - note that I have not tried out > 5.2, so am happy to be corrected on this) I found the same problem. I use OpenBSD 3.3, On Pentium 2,4 GHz with 1 Gb RAM, RAID 10. With PostgreSQL 7.4.1 with 32 Kb bock's size (to match ffs and raid block's size) With pg_autovacuum daemon from Pg 7.5. I run a web indexer. sd0 raid-1 with system pg-log and indexer-log sd1 raid-10 with pg-data and indexer-data The sd1 disk achives between 10 and 40 Mb/s on normal operation. When I get semwait in top, system waits ;-) Not much disk activity. Not much log in pg or indexer. Just wait.... What can I do ? > sudo top -s1 -S -I load averages: 4.45, 4.45, 3.86 11:25:52 97 processes: 1 running, 96 idle CPU states: 2.3% user, 0.0% nice, 3.8% system, 0.8% interrupt, 93.1% idle Memory: Real: 473M/803M act/tot Free: 201M Swap: 0K/3953M used/tot PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND 2143 postgres -5 0 4008K 37M sleep biowai 1:02 1.81% postgres 28662 postgres 14 0 4060K 37M sleep semwai 0:59 1.17% postgres 25794 postgres 14 0 4072K 37M sleep semwai 1:30 0.93% postgres 23271 postgres -5 0 4060K 37M sleep biowai 1:13 0.29% postgres 14619 root 28 0 276K 844K run - 0:01 0.00% top > vmstat -w1 sd0 sd1 r b w avm fre flt re pi po fr sr sd0 sd1 in sy cs us sy id 0 4 0 527412 36288 1850 0 0 0 0 0 26 72 368 8190 588 0 4 96 0 4 0 527420 36288 1856 0 0 0 0 0 0 86 356 8653 620 2 2 97 0 4 0 527432 36280 1853 0 0 0 0 0 0 54 321 8318 458 1 3 96 0 4 0 527436 36248 1864 0 0 0 0 0 0 77 358 8417 539 1 2 97 0 4 0 522828 40932 2133 0 0 0 0 0 7 70 412 15665 724 2 3 95 0 4 0 522896 40872 1891 0 0 0 0 0 15 72 340 9656 727 3 5 92 0 4 0 522900 40872 1841 0 0 0 0 0 0 69 322 8308 536 1 2 98 0 4 0 522920 40860 1846 0 0 0 0 0 1 69 327 8023 520 2 2 97 0 4 0 522944 40848 1849 0 0 0 0 0 4 76 336 8035 567 1 2 97 0 4 0 522960 40848 1843 0 0 0 0 0 0 77 331 14669 587 3 2 95 0 4 0 522976 40836 1848 0 0 0 0 0 4 81 339 8384 581 1 2 97 0 4 0 522980 40836 1841 0 0 0 0 0 3 65 320 8068 502 1 4 95 0 4 0 523000 40824 1848 0 0 0 0 0 14 74 341 8226 564 3 2 95 0 4 0 523020 40812 1844 0 0 0 0 0 0 67 317 7606 530 2 1 97 1 4 0 523052 40796 1661 0 0 0 0 0 0 68 315 11603 493 2 2 97 1 4 0 523056 40800 233 0 0 0 0 0 12 87 341 12550 609 2 2 96 0 4 0 523076 40788 1845 0 0 0 0 0 0 82 334 12457 626 2 2 96 0 4 0 523100 40776 1851 0 0 0 0 0 0 91 345 10914 623 2 3 95 0 4 0 523120 40764 1845 0 0 0 0 0 0 92 343 19213 596 1 5 95 0 4 0 523136 40752 1845 0 0 0 0 0 0 97 349 8659 605 2 2 96 0 4 0 523144 40748 4501 0 0 0 0 0 32 78 385 15632 934 25 12 64 0 4 0 523168 40728 1853 0 0 0 0 0 3 74 335 3965 531 0 2 98 > ps -Upostgresql -Ostart | grep -v idle PID STARTED TT STAT TIME COMMAND 8267 10:53AM ?? Is 0:00.28 /usr/local/bin/pg_autovacuum -D -L /var/pgsql/autovacuum 23271 10:54AM ?? I 1:13.56 postmaster: dps dps 127.0.0.1 SELECT (postgres) 28662 10:55AM ?? I 0:59.98 postmaster: dps dps 127.0.0.1 SELECT (postgres) 25794 10:56AM ?? D 1:30.48 postmaster: dps dps 127.0.0.1 SELECT (postgres) 2143 11:02AM ?? D 1:02.06 postmaster: dps dps 127.0.0.1 DELETE (postgres) 25904 10:52AM C0- I 0:00.07 /usr/local/bin/postmaster -D /var/pgsql (postgres) 10908 10:52AM C0- I 0:05.96 postmaster: stats collector process (postgres) 7045 10:52AM C0- I 0:05.19 postmaster: stats buffer process (postgres) > grep -v -E '^#' /var/pgsql/postgresql.conf tcpip_socket = true max_connections = 100 shared_buffers = 1024 # 32KB max_fsm_pages = 1000000 # min max_fsm_relations*16, 6 bytes each max_fsm_relations = 200 # min 100, ~50 bytes each wal_buffers = 32 # min 4, 8KB each checkpoint_segments = 16 # in logfile segments, min 1, 16MB each commit_delay = 100 # range 0-100000, in microseconds effective_cache_size = 4096 # 32KB each random_page_cost = 3 default_statistics_target = 200 # range 1-1000 client_min_messages = notice # Values, in order of decreasing detail: log_min_messages = log # Values, in order of decreasing detail: log_min_duration_statement = 20000 # Log all statements whose log_timestamp = true stats_start_collector = true stats_command_string = true stats_block_level = true stats_row_level = true stats_reset_on_server_start = true lc_messages = 'C' # locale for system error message strings lc_monetary = 'C' # locale for monetary formatting lc_numeric = 'C' # locale for number formatting lc_time = 'C' # locale for time formatting explain_pretty_print = true > sysctl -a | grep seminfo kern.seminfo.semmni = 256 kern.seminfo.semmns = 2048 kern.seminfo.semmnu = 30 kern.seminfo.semmsl = 60 kern.seminfo.semopm = 100 kern.seminfo.semume = 10 kern.seminfo.semusz = 100 kern.seminfo.semvmx = 32767 kern.seminfo.semaem = 16384 > systat vmstat 7 users Load 3.48 3.64 3.56 Fri Apr 30 14:42:18 2004 memory totals (in KB) PAGING SWAPPING Interrupts real virtual free in out in out 361 total Active 514768 527436 36280 ops 100 clock All 992496 1005164 4071736 pages 128 rtc 45 fxp0 Proc:r d s w Csw Trp Sys Int Sof Flt 6 forks 88 twe0 4 26 580 1848 8395 361 249 1856 6 fkppw fksvm 3.0% Sys 1.1% User 0.0% Nice 95.9% Idle pwait | | | | | | | | | | | relck => rlkok noram Namei Sys-cache Proc-cache No-cache 80 ndcpy Calls hits % hits % miss % 54 fltcp 812 806 99 5 1 1 0 208 zfod 95 cow Discs cd0 sd0 sd1 sd2 fd0 128 fmin seeks 6 82 170 ftarg xfers 6 82 60208 itarg Kbyte 47 2554 226 wired sec 1.0 pdfre > tail -f /var/pgsql/log 2004-04-30 11:35:03 LOG: recycled transaction log file "000000C8000000CA" 2004-04-30 11:35:03 LOG: recycled transaction log file "000000C8000000CB" 2004-04-30 11:35:03 LOG: recycled transaction log file "000000C8000000CC" 2004-04-30 11:35:03 LOG: recycled transaction log file "000000C8000000BF" 2004-04-30 11:35:03 LOG: recycled transaction log file "000000C8000000C0" 2004-04-30 11:35:03 LOG: recycled transaction log file "000000C8000000C1" 2004-04-30 11:35:03 LOG: recycled transaction log file "000000C8000000C2" 2004-04-30 11:36:46 LOG: duration: 28284.360 ms statement: SELECT rec_id,url FROM url WHERE status > 300 AND status<>304 AND (referrer='28615' OR referrer='0') AND bad_since_time<1083317778 2004-04-30 11:36:46 LOG: duration: 24918.201 ms statement: SELECT rec_id,url FROM url WHERE status > 300 AND status<>304 AND (referrer='122879' OR referrer='0') AND bad_since_time<1083317781 2004-04-30 11:36:46 LOG: duration: 21173.427 ms statement: SELECT rec_id,url FROM url WHERE status > 300 AND status<>304 AND (referrer='586182' OR referrer='0') AND bad_since_time<1083317785 From PhpPgAdmin: Table url: Info Row Performance Sequential Index Rows Scan Read Scan Fetch Insert Udate Delete 1 414173 85711 10963854 20431 8707 594 I/O Performance Heap Index TOAST TOAST Index Disk Cache % Disk Cache % Disk Cache % Disk Cache % 3298907 7790769 (70%) 200782 1274898 (86%) 0 0 (0%) 0 0 (0%) Index Row Performance Index Scan Read Fetch url_bad_since_time 0 0 0 url_crc 2924 131566 131566 url_hops 0 0 0 url_last_mod_time 0 0 0 url_next_index_time 5 5120 5120 url_pkey 9187 8980 8980 url_referrer 4431 10753641 10753641 url_seed 0 0 0 url_serverid 0 0 0 url_siteid 0 0 0 url_status 0 0 0 url_url 69164 64547 64547 Index I/O Performance Index Disk Cache % url_bad_since_time 7169 80280 (92%) url_crc 9106 19200 (68%) url_hops 9071 109864 (92%) url_last_mod_time 5836 27887 (83%) url_next_index_time 12004 109887 (90%) url_pkey 7501 52825 (88%) url_referrer 58765 97634 (62%) url_seed 30293 88712 (75%) url_serverid 8647 110078 (93%) url_siteid 8888 109864 (93%) url_status 7448 111250 (94%) url_url 36054 357417 (91%) Cordialement, Jean-Gérard Pailloncy
I am wondering if your wait is caused by contention between pg_autovacuum and the DELETE that is running. Your large Pg blocksize (32K) *may* be contributing to any possible contention as well. Maybe try disabling pg_autovacuum to see if there is any change in behaviour. Also going through my head is '32 Kb bock's size (to match ffs and raid block's size)' - does that mean you have raid strip size = 32K? maybe try 128K (I know it sounds like a bad thing, but generally raid stripes of 128K->256K are better than 32K->64K) regards Mark Pailloncy Jean-Gérard wrote: > Hello, > >> > I found the same problem. > > I use OpenBSD 3.3, > On Pentium 2,4 GHz with 1 Gb RAM, RAID 10. > With PostgreSQL 7.4.1 with 32 Kb bock's size (to match ffs and raid > block's size) > With pg_autovacuum daemon from Pg 7.5. > > I run a web indexer. > sd0 raid-1 with system pg-log and indexer-log > sd1 raid-10 with pg-data and indexer-data > The sd1 disk achives between 10 and 40 Mb/s on normal operation. > > When I get semwait in top, system waits ;-) > Not much disk activity. > Not much log in pg or indexer. > Just wait.... >