Thread: Tuning New Server (slow function)
We just purchased a new Dell PowerEdge 2800 (dual xeon, 8GB RAM, raid 4, RHEL, postgres 8.1) and ported our old database over to it (single cpu, 2GB RAM, no raid, postgres 7.4). Our apps perform great on it, however some queries are super slow. One function in particular, which used to take 15-30 minutes on the old server, has been running now for over 12 hours: BEGIN TRUNCATE stock.datacount; FOR rec IN SELECT itemID, item, hexValue FROM stock.activeitem LOOP histdate := (SELECT updatedate FROM stock.historical s WHERE s.itemID=rec.itemID ORDER BY updatedate DESC LIMIT 1); IF histdate IS NOT NULL THEN funddate := (SELECT updatedate FROM stock.funddata s WHERE s.itemID=rec.itemID); techdate := (SELECT updatedate FROM stock.techsignals s WHERE s.itemID=rec.itemID); IF (histdate <> funddate) OR (histdate <> techdate) OR (funddate IS NULL) OR (techdate IS NULL) THEN counter := counter + 1; outrec.itemID := rec.itemID; outrec.item := rec.item; outrec.hexvalue := rec.hexvalue; RETURN NEXT outrec; END IF; END IF; END LOOP; INSERT INTO stock.datacount (itemcount) VALUES (counter); COPY stock.datacount TO ''/tmp/datacount''; RETURN; END; note: stock.activeitem contains about 75000 rows "top" shows: CPU states: cpu user nice system irq softirq iowait idle total 5.8% 0.6% 31.2% 0.0% 0.0% 0.5% 61.6% Mem: 8152592k av, 8143012k used, 9580k free, 0k shrd, 179888k buff 6342296k actv, 1206340k in_d, 137916k in_c Swap: 8385760k av, 259780k used, 8125980k free 7668624k cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 17027 postgres 25 0 566M 561M 560M R 24.9 7.0 924:34 1 postmaster I've likely set some parameter(s) to the wrong values, but I don't know which one(s). Here are my relevant postgresql.conf settings: shared_buffers = 70000 work_mem = 9192 maintenance_work_mem = 131072 max_fsm_pages = 70000 fsync = off (temporarily, will be turned back on) checkpoint_segments = 64 checkpoint_timeout = 1800 effective_cache_size = 70000 [root@new-server root]# cat /proc/sys/kernel/shmmax 660000000 We want to put this into production soon, but this is a showstopper. Can anyone help me out with this? Thanks Ron St.Pierre
On Wed, Jun 21, 2006 at 02:27:41PM -0700, Ron St-Pierre wrote: > We just purchased a new Dell PowerEdge 2800 (dual xeon, 8GB RAM, raid 4, > RHEL, postgres 8.1) and ported our old database over to it (single cpu, RAID *4*? If you do any kind of updating at all, you're likely to be real unhappy with that... > 2GB RAM, no raid, postgres 7.4). Our apps perform great on it, however > some queries are super slow. One function in particular, which used to > take 15-30 minutes on the old server, has been running now for over 12 > hours: > BEGIN > TRUNCATE stock.datacount; > FOR rec IN SELECT itemID, item, hexValue FROM stock.activeitem LOOP > histdate := (SELECT updatedate FROM stock.historical s WHERE > s.itemID=rec.itemID ORDER BY updatedate DESC LIMIT 1); > IF histdate IS NOT NULL THEN > funddate := (SELECT updatedate FROM stock.funddata s WHERE > s.itemID=rec.itemID); > techdate := (SELECT updatedate FROM stock.techsignals s > WHERE s.itemID=rec.itemID); > IF (histdate <> funddate) OR (histdate <> techdate) OR > (funddate IS NULL) OR (techdate IS NULL) THEN > counter := counter + 1; > outrec.itemID := rec.itemID; > outrec.item := rec.item; > outrec.hexvalue := rec.hexvalue; > RETURN NEXT outrec; > END IF; > END IF; > END LOOP; > INSERT INTO stock.datacount (itemcount) VALUES (counter); > COPY stock.datacount TO ''/tmp/datacount''; > RETURN; > END; > > note: stock.activeitem contains about 75000 rows Getting EXPLAIN ANALYZE from the queries would be good. Adding debug output via NOTICE to see how long each step is taking would be a good idea, too. Of course, even better would be to do away with the cursor... > "top" shows: > CPU states: cpu user nice system irq softirq iowait idle > total 5.8% 0.6% 31.2% 0.0% 0.0% 0.5% 61.6% > Mem: 8152592k av, 8143012k used, 9580k free, 0k shrd, 179888k > buff The high system % (if I'm reading this correctly) makes me wonder if this is some kind of locking issue. > 6342296k actv, 1206340k in_d, 137916k in_c > Swap: 8385760k av, 259780k used, 8125980k free 7668624k > cached > > PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND > 17027 postgres 25 0 566M 561M 560M R 24.9 7.0 924:34 1 > postmaster > > I've likely set some parameter(s) to the wrong values, but I don't know > which one(s). Here are my relevant postgresql.conf settings: > shared_buffers = 70000 > work_mem = 9192 > maintenance_work_mem = 131072 > max_fsm_pages = 70000 > fsync = off (temporarily, will be turned back on) > checkpoint_segments = 64 > checkpoint_timeout = 1800 > effective_cache_size = 70000 > > [root@new-server root]# cat /proc/sys/kernel/shmmax > 660000000 > > We want to put this into production soon, but this is a showstopper. Can > anyone help me out with this? > > > Thanks > > Ron St.Pierre > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: > On Wed, Jun 21, 2006 at 02:27:41PM -0700, Ron St-Pierre wrote: > >> We just purchased a new Dell PowerEdge 2800 (dual xeon, 8GB RAM, raid 4, >> RHEL, postgres 8.1) and ported our old database over to it (single cpu, >> > > RAID *4*? > oops, raid 5 (but we are getting good io throughput...) > If you do any kind of updating at all, you're likely to be real unhappy > with that... > > >> 2GB RAM, no raid, postgres 7.4). Our apps perform great on it, however >> some queries are super slow. One function in particular, which used to >> take 15-30 minutes on the old server, has been running now for over 12 >> hours: >> BEGIN >> TRUNCATE stock.datacount; >> FOR rec IN SELECT itemID, item, hexValue FROM stock.activeitem LOOP >> histdate := (SELECT updatedate FROM stock.historical s WHERE >> s.itemID=rec.itemID ORDER BY updatedate DESC LIMIT 1); >> IF histdate IS NOT NULL THEN >> funddate := (SELECT updatedate FROM stock.funddata s WHERE >> s.itemID=rec.itemID); >> techdate := (SELECT updatedate FROM stock.techsignals s >> WHERE s.itemID=rec.itemID); >> IF (histdate <> funddate) OR (histdate <> techdate) OR >> (funddate IS NULL) OR (techdate IS NULL) THEN >> counter := counter + 1; >> outrec.itemID := rec.itemID; >> outrec.item := rec.item; >> outrec.hexvalue := rec.hexvalue; >> RETURN NEXT outrec; >> END IF; >> END IF; >> END LOOP; >> INSERT INTO stock.datacount (itemcount) VALUES (counter); >> COPY stock.datacount TO ''/tmp/datacount''; >> RETURN; >> END; >> >> note: stock.activeitem contains about 75000 rows >> > > Getting EXPLAIN ANALYZE from the queries would be good. Adding debug > output via NOTICE to see how long each step is taking would be a good > idea, too. > > I set client_min_messages = debug2, log_min_messages = debug2 and log_statement = 'all' and am running the query with EXPLAIN ANALYZE. I don't know how long it will take until something useful returns, but I will let it run for a while. > Of course, even better would be to do away with the cursor... > > How would I rewrite it to do away with the cursor? >> "top" shows: >> CPU states: cpu user nice system irq softirq iowait idle >> total 5.8% 0.6% 31.2% 0.0% 0.0% 0.5% 61.6% >> Mem: 8152592k av, 8143012k used, 9580k free, 0k shrd, 179888k >> buff >> > > The high system % (if I'm reading this correctly) makes me wonder if > this is some kind of locking issue. > > But it's the only postgres process running. >> 6342296k actv, 1206340k in_d, 137916k in_c >> Swap: 8385760k av, 259780k used, 8125980k free 7668624k >> cached >> >> PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND >> 17027 postgres 25 0 566M 561M 560M R 24.9 7.0 924:34 1 >> postmaster >> >> I've likely set some parameter(s) to the wrong values, but I don't know >> which one(s). Here are my relevant postgresql.conf settings: >> shared_buffers = 70000 >> work_mem = 9192 >> maintenance_work_mem = 131072 >> max_fsm_pages = 70000 >> fsync = off (temporarily, will be turned back on) >> checkpoint_segments = 64 >> checkpoint_timeout = 1800 >> effective_cache_size = 70000 >> >> [root@new-server root]# cat /proc/sys/kernel/shmmax >> 660000000 >> >> We want to put this into production soon, but this is a showstopper. Can >> anyone help me out with this? >> >> >> Thanks >> >> Ron St.Pierre >> >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 3: Have you checked our extensive FAQ? >> >> http://www.postgresql.org/docs/faq >> >> > >
On Jun 21, 2006, at 5:53 PM, Ron St-Pierre wrote: > Jim C. Nasby wrote: >> On Wed, Jun 21, 2006 at 02:27:41PM -0700, Ron St-Pierre wrote: >> >>> We just purchased a new Dell PowerEdge 2800 (dual xeon, 8GB RAM, >>> raid 4, RHEL, postgres 8.1) and ported our old database over to >>> it (single cpu, >> >> RAID *4*? >> > oops, raid 5 (but we are getting good io throughput...) Just remember that unless you have a really good battery-backed controller, writes to RAID5 pretty much suck. >>> BEGIN >>> TRUNCATE stock.datacount; >>> FOR rec IN SELECT itemID, item, hexValue FROM >>> stock.activeitem LOOP >>> histdate := (SELECT updatedate FROM stock.historical s >>> WHERE s.itemID=rec.itemID ORDER BY updatedate DESC LIMIT 1); >>> IF histdate IS NOT NULL THEN >>> funddate := (SELECT updatedate FROM stock.funddata s >>> WHERE s.itemID=rec.itemID); >>> techdate := (SELECT updatedate FROM >>> stock.techsignals s WHERE s.itemID=rec.itemID); >>> IF (histdate <> funddate) OR (histdate <> techdate) >>> OR (funddate IS NULL) OR (techdate IS NULL) THEN >>> counter := counter + 1; >>> outrec.itemID := rec.itemID; >>> outrec.item := rec.item; >>> outrec.hexvalue := rec.hexvalue; >>> RETURN NEXT outrec; >>> END IF; >>> END IF; >>> END LOOP; >>> INSERT INTO stock.datacount (itemcount) VALUES (counter); >>> COPY stock.datacount TO ''/tmp/datacount''; >>> RETURN; >>> END; > How would I rewrite it to do away with the cursor? Something like... SELECT ... FROM (SELECT a...., f.updatedate AS funddate, t.updatedate AS techdate, max(updatedate) hist_date FROM activeitem a JOIN historical h USING itemid GROUP BY a...., f.updatedate, t.updatedate) AS a LEFT JOIN funddate f USING itemid LEFT JOIN techsignals USING itemid WHERE f.updatedate <> hist_date OR t.updatedate <> hist_date OR f.updatedate IS NULL OR t.updatedate IS NULL ; BTW, there's some trick that would let you include the NULL tests with the other tests in the WHERE, but I can't remember it off the top of my head... >>> "top" shows: >>> CPU states: cpu user nice system irq softirq >>> iowait idle >>> total 5.8% 0.6% 31.2% 0.0% 0.0% 0.5% >>> 61.6% >>> Mem: 8152592k av, 8143012k used, 9580k free, 0k shrd, >>> 179888k buff >>> >> >> The high system % (if I'm reading this correctly) makes me wonder if >> this is some kind of locking issue. >> >> > But it's the only postgres process running. Sure, but PostgreSQL still acquires internal locks. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461