Thread: Extreme high load averages
The only time I have ever seen load averages of 30 or more under OpenBSD is when one of my scripts goes wild. However, I am now seeing these load averages under PostgreSQL 7.3.2 after a migration to it from MySQL.

MySQL statistics:
  Uptime: 1055352  Threads: 178  Questions: 75161710  Slow queries: 46
  Opens: 1084  Flush tables: 1  Open tables: 206
  Queries per second avg: 71.220

The above are statistics from the older generation of scripts that used MySQL, to give an idea of what's going on. That generation of scripts handled referential integrity itself, since foreign key constraints are not enforced under that system. However, the system handled 250 concurrent users without a single problem, while under Postgres, with new scripts using functions, referential integrity, transactions, and lighter code, the system starts to buckle at fewer than 70 users.

What I would like to know is: why? The kernel has been compiled to handle the number of concurrent connections, and while the server may not be the best, it should be able to handle the requests: PIII 1GHz, 1GB SDRAM, 2 IDE 20GB drives.

I have changed settings to take advantage of the memory, so the following settings are of interest:

  shared_buffers = 16384
  wal_buffers = 256
  sort_mem = 16384
  vacuum_mem = 32768

Statistics gathering has now been disabled, and logging is done through syslog. I do not expect those settings to cripple system performance, however.

The scripts are heavy SELECTs with a fair dose of UPDATEs and INSERTs. To get an idea of what these scripts do, you can look at Ethereal Realms (http://www.ethereal-realms.org), which is running the PostgreSQL script variants, or consider that this is a chat site.

Does anyone have ideas? Is the use of connection pooling considered bad? Should flush be run more than once a day? I have no intention of going back to MySQL, and would like to make this new solution work.

Martin Foster
Creator/Designer Ethereal Realms
martin@ethereal-realms.org
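[Editor's note: for a sense of scale, the settings above translate into real memory roughly as follows. A back-of-the-envelope sketch, assuming PostgreSQL 7.3's units: shared_buffers and wal_buffers are counted in 8 kB blocks, sort_mem and vacuum_mem in kB.]

```python
# Back-of-the-envelope memory math for the settings quoted above.
# Assumes the default 8 kB block size; shared_buffers and wal_buffers
# are counted in blocks, sort_mem and vacuum_mem in kB (PostgreSQL 7.3).
BLOCK_KB = 8

shared_buffers = 16384   # blocks
wal_buffers = 256        # blocks
sort_mem = 16384         # kB, and this is *per sort*, not per connection
vacuum_mem = 32768       # kB

shared_mb = shared_buffers * BLOCK_KB / 1024   # 128.0 MB of shared memory
wal_mb = wal_buffers * BLOCK_KB / 1024         # 2.0 MB of WAL buffers
sort_mb = sort_mem / 1024                      # 16.0 MB per in-memory sort

print(shared_mb, wal_mb, sort_mb)
```

On a 1 GB machine, the 128 MB of shared buffers is defensible, but the 16 MB-per-sort figure is what the replies below zero in on.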
Richard Huxton wrote:
> On Sunday 06 Jul 2003 5:54 am, Martin Foster wrote:
>
>> The only time that I have ever seen load averages of 30 or more under
>> OpenBSD is when one of my scripts goes wild. However, I can say that
>> I am also seeing these load averages under PostgreSQL 7.3.2 after a
>> migration to it from MySQL.
>
> [snip]
>
>> However, the system handled 250 concurrent users without a single
>> problem, while under Postgres with new scripts using functions,
>> referential integrity, transactions and lighter code, the system
>> starts to buckle at fewer than 70 users.
>
> [snip]
>
>> PIII 1GHz, 1GB SDRAM, 2 IDE 20GB drives.
>>
>> I have changed settings to take advantage of the memory. So the
>> following settings are of interest:
>>   shared_buffers = 16384
>>   wal_buffers = 256
>>   sort_mem = 16384
>>   vacuum_mem = 32768
>
> You do know that sort_mem is in kB per sort (not per connection, but
> per sort being done by a connection)? That's 16MB per sort you've
> allowed in main memory, or for 70 concurrent sorts up to 1.1GB of
> memory allocated to sorting. You're not going into swap by any chance?
>
> Might want to try halving shared_buffers too and see what happens.
>
> I don't know the *BSDs myself, but do you have the equivalent of
> iostat/vmstat output you could get for us? Also a snapshot of "top"
> output? People are going to want to see:
> - overall memory usage (free/buffers/cache/swap)
> - memory usage per process
> - disk activity (blocks in/out)
> From that lot, someone will be able to point towards the issue, I'm sure.

Actually, no, I did not know that, which is probably why it was set as high as it is. The PostgreSQL Hardware Performance Tuning page seems to imply that you should calculate the value based on available RAM:

http://www.postgresql.org/docs/aw_pgsql_book/hw_performance/node8.html

I dropped that value and will see if that helps. The thing is, the system always indicated plenty of memory available. Even at a load level of 30, free memory was still roughly 170MB.

Tomorrow will be a good gauge of whether the changes actually help matters. If they do not, I will include vmstat, iostat, and top output as requested.

Thanks!

Martin Foster
Creator/Designer Ethereal Realms
martin@ethereal-realms.org
Martin Foster <martin@ethereal-realms.org> writes:
>> The only time that I have ever seen load averages of 30 or more under
>> OpenBSD is when one of my scripts goes wild.

Note also that "high load average" is not per se an indication that anything is wrong. In Postgres, if you have thirty queries waiting for disk I/O, that's thirty processes --- so if that's the average state, then the kernel will report a load average of thirty. While I'm no MySQL expert, I believe that the equivalent condition in MySQL would be thirty threads blocked for I/O within one process. Depending on how your kernel is written, that might show as a load average of one ... but the difference is completely illusory, because what counts is the number of disk I/Os in flight, and that's the same.

You didn't say whether you were seeing any real performance problems, like slow queries or performance dropping when query load rises, but that is the aspect to concentrate on.

I concur with the nearby recommendations to drop your resource settings. The thing you have to keep in mind about Postgres is that it likes to have a lot of physical RAM available for kernel disk buffers (disk cache). In a correctly tuned system that's been up for any length of time, "free memory" should be nearly nada, and the amount of RAM used for disk buffers should be sizable (50% or more of RAM would be good IMHO).

regards, tom lane
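[Editor's note: the load-average semantics Tom describes can be watched directly from a script. A minimal sketch using only the standard library; `os.getloadavg()` is a POSIX call, so this is Unix-only.]

```python
import os

# The load average approximates the number of processes that are runnable
# or waiting on disk I/O. With one PostgreSQL backend per client, thirty
# backends blocked on I/O alone will push the 1-minute figure toward 30,
# even while the CPU sits nearly idle.
one_min, five_min, fifteen_min = os.getloadavg()
print(f"load averages: {one_min:.2f} {five_min:.2f} {fifteen_min:.2f}")
```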
On Sunday 06 Jul 2003 5:54 am, Martin Foster wrote:
> The only time that I have ever seen load averages of 30 or more under
> OpenBSD is when one of my scripts goes wild. However, I can say that
> I am also seeing these load averages under PostgreSQL 7.3.2 after a
> migration to it from MySQL.

[snip]

> However, the system handled 250 concurrent users without a single
> problem, while under Postgres with new scripts using functions,
> referential integrity, transactions and lighter code, the system
> starts to buckle at fewer than 70 users.

[snip]

> PIII 1GHz, 1GB SDRAM, 2 IDE 20GB drives.
>
> I have changed settings to take advantage of the memory. So the
> following settings are of interest:
>   shared_buffers = 16384
>   wal_buffers = 256
>   sort_mem = 16384
>   vacuum_mem = 32768

You do know that sort_mem is in kB per sort (not per connection, but per sort being done by a connection)? That's 16MB per sort you've allowed in main memory, or for 70 concurrent sorts up to 1.1GB of memory allocated to sorting. You're not going into swap by any chance?

Might want to try halving shared_buffers too and see what happens.

I don't know the *BSDs myself, but do you have the equivalent of iostat/vmstat output you could get for us? Also a snapshot of "top" output? People are going to want to see:
- overall memory usage (free/buffers/cache/swap)
- memory usage per process
- disk activity (blocks in/out)

From that lot, someone will be able to point towards the issue, I'm sure.

--
Richard Huxton
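[Editor's note: Richard's arithmetic, spelled out. The 70-connection figure comes from Martin's report; since a single complex query can run more than one sort, this is if anything a lower bound.]

```python
# Worst case if all 70 reported connections each run one full-size sort.
sort_mem_kb = 16384   # sort_mem setting, in kB per sort
connections = 70

worst_case_mb = sort_mem_kb * connections / 1024
print(worst_case_mb)  # 1120.0 MB -- more than the machine's 1 GB of RAM
```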
Tom Lane wrote:
> Martin Foster <martin@ethereal-realms.org> writes:
>
>>> The only time that I have ever seen load averages of 30 or more under
>>> OpenBSD is when one of my scripts goes wild.
>
> Note also that "high load average" is not per se an indication that
> anything is wrong. In Postgres, if you have thirty queries waiting
> for disk I/O, that's thirty processes --- so if that's the average
> state then the kernel will report a load average of thirty. While
> I'm no MySQL expert, I believe that the equivalent condition in MySQL
> would be thirty threads blocked for I/O within one process. Depending
> on how your kernel is written, that might show as a load average of
> one ... but the difference is completely illusory, because what counts
> is the number of disk I/Os in flight, and that's the same.
>
> You didn't say whether you were seeing any real performance problems,
> like slow queries or performance dropping when query load rises, but
> that is the aspect to concentrate on.
>
> I concur with the nearby recommendations to drop your resource settings.
> The thing you have to keep in mind about Postgres is that it likes to
> have a lot of physical RAM available for kernel disk buffers (disk
> cache). In a correctly tuned system that's been up for any length of
> time, "free memory" should be nearly nada, and the amount of RAM used
> for disk buffers should be sizable (50% or more of RAM would be good
> IMHO).
>
> regards, tom lane

Under a circumstance where we had 250 concurrent users, MySQL would report a load average of 0.5, sometimes 0.8, depending on the tasks being performed. That translated to wait times averaging less than a second, and about 4 seconds under a heavy-resource script. That system had less RAM, however.

This new system, when showing a load average of 30, produced wait times averaging 12 seconds, and about 30 seconds for the heavy-resource script. The web server itself showed a load average of 0.5, showing that it was not heavy client interaction slowing things down.

So there is a very noticeable loss of performance when the system skyrockets like that. All of the load as indicated by top is at user level, and swap is not even touched. This may help show why I was slightly concerned.

Martin Foster
Creator/Designer Ethereal Realms
martin@ethereal-realms.org
Richard Huxton wrote:
> I don't know the *BSDs myself, but do you have the equivalent of
> iostat/vmstat output you could get for us? Also a snapshot of "top"
> output? People are going to want to see:
> - overall memory usage (free/buffers/cache/swap)
> - memory usage per process
> - disk activity (blocks in/out)

I changed a bit of the scripting code to cut down on the weight of a query being run. That was the only thing in the entire system that would cause scripts to run at high processor times for extended lengths. With the corrections, postgres processes now average closer to < 1% than before. This has not stopped the system from hitting high load averages, however.

Attached is an example of the site running with 160 users and very slow response rates (30 seconds for some scripts). According to top and ps, nothing is taking up all that processing time. The processor seems to be purposely sitting there twiddling its thumbs, which leads me to believe that perhaps the nice levels have to be changed on the server itself? And perhaps the file system buffer should be increased to cache files in memory instead of always fetching/writing them?

Any more ideas?
Martin Foster
Creator/Designer Ethereal Realms
martin@ethereal-realms.org

--- top ---
load averages:  5.00,  4.72,  3.75    21:45:56
134 processes: 6 running, 128 idle
CPU states: 91.7% user,  0.0% nice,  6.6% system,  1.6% interrupt,  0.2% idle
Memory: Real: 279M/390M act/tot  Free: 617M  Swap: 0K/2048M used/tot

  PID USERNAME PRI NICE  SIZE   RES STATE WAIT    TIME    CPU COMMAND
23235 postgres  64    0   12M   95M run   -       0:02  9.42% postgres
 5299 postgres  64    0 3872K   57M run   -       0:01  5.86% postgres
 8933 postgres  64    0 3408K   55M run   -       0:01  5.47% postgres
16398 postgres   2    0 3776K   17M sleep netio   0:02  0.05% postgres
14007 named      2    0 2528K 2572K sleep select  0:06  0.00% named
 3684 postgres   2    0 2120K 4812K sleep select  0:07  0.00% postgres
23518 postgres   2    0 3664K   36M sleep netio   0:03  0.00% postgres
  571 postgres   2    0 3776K   51M sleep netio   0:03  0.00% postgres
11159 postgres   2    0 3664K   35M sleep netio   0:03  0.00% postgres
19184 postgres   2    0 3776K   16M sleep netio   0:03  0.00% postgres
28931 postgres   2    0 3712K   16M sleep netio   0:02  0.00% postgres
17523 postgres   2    0 3712K   14M sleep netio   0:02  0.00% postgres
 8272 postgres   2    0 3712K   14M sleep netio   0:02  0.00% postgres
12034 postgres   2    0 3712K   14M sleep netio   0:02  0.00% postgres
30825 postgres   2    0 3776K   17M sleep netio   0:02  0.00% postgres
29173 postgres   2    0 3712K   15M sleep netio   0:02  0.00% postgres
 9472 postgres   2    0 3664K   34M sleep netio   0:02  0.00% postgres
11542 postgres   2    0 3776K   16M sleep netio   0:02  0.00% postgres

--- vmstat ---
 procs   memory         page                     disks    faults      cpu
 r b w   avm    fre    flt re pi po fr sr wd0 wd1  in  sy  cs us sy id
 1 0 0 275352 642800  8173  0  0  0  0  0  18  21 534 853 243 50  5 45

--- iostat ---
      tty            wd0             wd1            cpu
 tin tout  KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
   0   74 13.00  18 0.23  10.08  21 0.21  50  0  4  1 45

--- pstat -s ---
Device      512-blocks     Used    Avail Capacity  Priority
swap_device    4194288        0  4194288     0%    0

--- dmesg ---
OpenBSD 3.3-stable (compile) #2: Sat Jul 5 15:17:30 MDT 2003
    martin@carme:/usr/src/sys/arch/i386/compile
cpu0: Intel Pentium III (Coppermine) ("GenuineIntel" 686-class) 1 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SYS,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SIMD
real mem  = 1073250304 (1048096K)
avail mem = 992940032 (969668K)
using 4278 buffers containing 53764096 bytes (52504K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(a1) BIOS, date 07/20/01, BIOS32 rev. 0 @ 0xf0b20
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
pcibios0 at bios0: rev. 2.1 @ 0xf0000/0x1382
pcibios0: PCI IRQ Routing Table rev. 1.0 @ 0xf12d0/176 (9 entries)
pcibios0: PCI Interrupt Router at 000:04:0 ("VIA VT82C586 PCI-ISA" rev 0x00)
pcibios0: PCI bus #1 is the last bus
bios0: ROM list: 0xc0000/0x8000 0xc8000/0x1800
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "VIA VT82C691 Host-PCI" rev 0xc4
ppb0 at pci0 dev 1 function 0 "VIA VT82C598 PCI-AGP" rev 0x00
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "NVidia/SGS-Thomson Velocity128" rev 0x22
wsdisplay0 at vga1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
pcib0 at pci0 dev 4 function 0 "VIA VT82C686 PCI-ISA" rev 0x40
pciide0 at pci0 dev 4 function 1 "VIA VT82C571 IDE" rev 0x06: ATA100, channel 0 configured to compatibility, channel 1 configured to compatibility
wd0 at pciide0 channel 0 drive 0: <FUJITSU MPG3204AT E>
wd0: 16-sector PIO, LBA, 19546MB, 16383 cyl, 16 head, 63 sec, 40031712 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
wd1 at pciide0 channel 1 drive 0: <FUJITSU MPF3204AT>
wd1: 16-sector PIO, LBA, 19546MB, 16383 cyl, 16 head, 63 sec, 40031712 sectors
atapiscsi0 at pciide0 channel 1 drive 1
scsibus0 at atapiscsi0: 2 targets
scsibus0 targ 0 lun 0: <LG, CD-ROM CRD-8522B, 2.03> SCSI0 5/cdrom removable not configured
wd1(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 4
atapiscsi0(pciide0:1:1): using PIO mode 4, DMA mode 2
"VIA VT82C686 SMBus" rev 0x40 at pci0 dev 4 function 4 not configured
fxp0 at pci0 dev 10 function 0 "Intel 82557" rev 0x0c: irq 11, address 00:02:b3:8c:4f:db
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
sysbeep0 at pcppi0
npx0 at isa0 port 0xf0/16: using exception 16
biomask c000 netmask c800 ttymask c802
pctr: 686-class user-level performance counters enabled
mtrr: Pentium Pro MTRR support
dkcsum: wd0 matched BIOS disk 80
dkcsum: wd1 matched BIOS disk 81
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302
On Sun, 6 Jul 2003, Martin Foster wrote:

> The processor seems to be purposely sitting there twiddling its thumbs.
> Which leads me to believe that perhaps the nice levels have to be
> changed on the server itself?

It could also be all the usual things that affect performance. Are your queries using indexes where they should? Do you vacuum analyze after you have updated/inserted a lot of data?

It could be that some of your queries are not as efficient as they should be, like doing a sequential scan over a table instead of an index scan. That translates into more I/O and slower response times, especially when you have more connections fighting for the available I/O.

--
/Dennis
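[Editor's note: a rough illustration of Dennis's point. This is a toy analogy in Python, not PostgreSQL internals: a sequential scan must examine every row, while a B-tree index behaves like a binary search over sorted keys, so the cost gap widens dramatically as the table grows.]

```python
import bisect
import math

rows = list(range(300_000))   # stand-in for Martin's ~300K-row table
target = 299_999

# "Sequential scan": compare against every row until the match is found.
comparisons = 0
for value in rows:
    comparisons += 1
    if value == target:
        break

# "Index scan": binary search over a sorted key, like a B-tree lookup.
index_steps = math.ceil(math.log2(len(rows)))   # about 19 probes
position = bisect.bisect_left(rows, target)

print(comparisons)                       # 300000 comparisons for the seq scan
print(position == target, index_steps)   # True 19 -- ~19 probes via the index
```

Each comparison in the worst case means another page touched, which is why the sequential scan shows up as I/O contention once many connections run it at once.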
Dennis Björklund wrote:
> On Sun, 6 Jul 2003, Martin Foster wrote:
>
>> The processor seems to be purposely sitting there twiddling its thumbs.
>> Which leads me to believe that perhaps the nice levels have to be
>> changed on the server itself?
>
> It could also be all the usual things that affect performance. Are your
> queries using indexes where they should? Do you vacuum analyze after
> you have updated/inserted a lot of data?
>
> It could be that some of your queries are not as efficient as they
> should be, like doing a sequential scan over a table instead of an
> index scan. That translates into more I/O and slower response times,
> especially when you have more connections fighting for the available
> I/O.

I actually gained a bit more respect for PostgreSQL tonight. It turns out that one of my scripts was not committing changes after maintenance was conducted, meaning that rows which would normally be removed after offline archiving were in fact still around.

Normally, at any given point in time this table would grow by 50K rows during a day, be archived that night, and then lose the rows that were no longer needed. This process is what allowed MySQL to maintain any stability, as the size of this table can balloon significantly.

PostgreSQL, with tweaking, was handling a table with nearly 300K rows. That size alone would have dragged the MySQL system down to a near grind, and since most of those rows were not needed, one can imagine that queries were needlessly processing rows that should have been outright ignored. This probably explains why row-numbering-based searches greatly accelerated the overall process.

By fixing the script and doing the appropriate full vacuum and re-index, the system is behaving much more like it should, even if the process may seem a bit odd to some. The reason for removing rows on a daily basis is the perishable nature of the information: since this is a chat site, posts over a day old are rarely needed for any reason, which is why they are archived into dumps in case we really need to retrieve the information. This also gives us the added bonus of smaller backups and a smaller database.

Martin Foster
Creator/Designer Ethereal Realms
martin@ethereal-realms.org
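[Editor's note: the failure mode Martin hit, a maintenance script that deletes rows but never commits, so the "removed" rows silently accumulate, is easy to reproduce in any transactional database. A minimal sketch with Python's built-in sqlite3 standing in for PostgreSQL; the table and file names are made up for the illustration.]

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "posts.db")

# Seed a "posts" table with a day's worth of perishable rows.
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO posts (body) VALUES (?)", [("old post",)] * 1000)
conn.commit()
conn.close()

# Nightly maintenance: archive, then delete the old rows ... but never commit.
maint = sqlite3.connect(path)
maint.execute("DELETE FROM posts")
# maint.commit()   # <-- the missing line in the broken script
maint.close()      # closing without commit rolls the DELETE back

# The next day's queries still wade through every "deleted" row.
check = sqlite3.connect(path)
count = check.execute("SELECT count(*) FROM posts").fetchone()[0]
check.close()
print(count)  # 1000 -- nothing was actually removed
```

Repeat that nightly and the table grows without bound, exactly the 50K-rows-a-day-to-300K pattern described above.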
> I actually gained a bit more respect for PostgreSQL tonight. It turns
> out that one of my scripts was not committing changes after
> maintenance was conducted, meaning that rows which would normally be
> removed after offline archiving were in fact still around.
>
> Normally, at any given point in time this table would grow by 50K rows
> during a day, be archived that night, and then lose the rows that were
> no longer needed. This process is what allowed MySQL to maintain any
> stability, as the size of this table can balloon significantly.
>
> PostgreSQL, with tweaking, was handling a table with nearly 300K rows.
> That size alone would have dragged the MySQL system down to a near
> grind, and since most of those rows were not needed, one can imagine
> that queries were needlessly processing rows that should have been
> outright ignored.

Having used MySQL once upon a time and run into its problems once you have more than 1M rows in a table, it took me a while when first using PostgreSQL to trust that PostgreSQL can reliably handle millions or billions of rows without crapping out randomly and corrupting itself.

Had you let this table keep growing, you'd have run out of disk space long before you hit anything close to a stability, reliability, or performance problem with PostgreSQL. I have one table in particular that has about 1.9B rows in it right now, and it conservatively takes about 0.04ms for non-complex queries to run against it. In MySQL land, I wouldn't dare let something grow that big... which would have been a huge problem, because the table mentioned above isn't logging data or something I can routinely purge.

It's a strange feeling at first to not have to design your application around size or tuple limitations of the database any more. :)

I'm glad you're enjoying PostgreSQL. -sc

--
Sean Chittenden