Thread: Tuning New Server (slow function)

Tuning New Server (slow function)

From
Ron St-Pierre
Date:
We just purchased a new Dell PowerEdge 2800 (dual xeon, 8GB RAM, raid 4,
RHEL, postgres 8.1) and ported our old database over to it (single cpu,
2GB RAM, no raid, postgres 7.4). Our apps perform great on it, however
some queries are super slow. One function in particular, which used to
take 15-30 minutes on the old server, has been running now for over 12
hours:
  BEGIN
      TRUNCATE stock.datacount;
      FOR rec IN SELECT itemID, item, hexValue FROM stock.activeitem LOOP
          histdate := (SELECT updatedate FROM stock.historical s WHERE
s.itemID=rec.itemID ORDER BY updatedate DESC LIMIT 1);
          IF histdate IS NOT NULL THEN
              funddate := (SELECT updatedate FROM stock.funddata s WHERE
s.itemID=rec.itemID);
              techdate := (SELECT updatedate FROM stock.techsignals s
WHERE s.itemID=rec.itemID);
              IF (histdate <> funddate) OR (histdate <> techdate) OR
(funddate IS NULL) OR (techdate IS NULL) THEN
                  counter := counter + 1;
                  outrec.itemID := rec.itemID;
                  outrec.item := rec.item;
                  outrec.hexvalue := rec.hexvalue;
                  RETURN NEXT outrec;
              END IF;
          END IF;
      END LOOP;
      INSERT INTO stock.datacount (itemcount) VALUES (counter);
      COPY stock.datacount TO ''/tmp/datacount'';
      RETURN;
  END;

note: stock.activeitem contains about 75000 rows


"top" shows:
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
         total    5.8%    0.6%   31.2%   0.0%     0.0%    0.5%   61.6%
Mem:  8152592k av, 8143012k used,    9580k free,       0k shrd,  179888k
buff
                 6342296k actv, 1206340k in_d,  137916k in_c
Swap: 8385760k av,  259780k used, 8125980k free                 7668624k
cached

PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
17027 postgres  25   0  566M 561M  560M R    24.9  7.0 924:34   1
postmaster

I've likely set some parameter(s) to the wrong values, but I don't know
which one(s). Here are my relevant postgresql.conf settings:
shared_buffers = 70000
work_mem = 9192
maintenance_work_mem = 131072
max_fsm_pages = 70000
fsync = off       (temporarily, will be turned back on)
checkpoint_segments = 64
checkpoint_timeout = 1800
effective_cache_size = 70000

[root@new-server root]# cat /proc/sys/kernel/shmmax
660000000

We want to put this into production soon, but this is a showstopper. Can
anyone help me out with this?


Thanks

Ron St.Pierre


Re: Tuning New Server (slow function)

From
"Jim C. Nasby"
Date:
On Wed, Jun 21, 2006 at 02:27:41PM -0700, Ron St-Pierre wrote:
> We just purchased a new Dell PowerEdge 2800 (dual xeon, 8GB RAM, raid 4,
> RHEL, postgres 8.1) and ported our old database over to it (single cpu,

RAID *4*?

If you do any kind of updating at all, you're likely to be real unhappy
with that...

> 2GB RAM, no raid, postgres 7.4). Our apps perform great on it, however
> some queries are super slow. One function in particular, which used to
> take 15-30 minutes on the old server, has been running now for over 12
> hours:
>  BEGIN
>      TRUNCATE stock.datacount;
>      FOR rec IN SELECT itemID, item, hexValue FROM stock.activeitem LOOP
>          histdate := (SELECT updatedate FROM stock.historical s WHERE
> s.itemID=rec.itemID ORDER BY updatedate DESC LIMIT 1);
>          IF histdate IS NOT NULL THEN
>              funddate := (SELECT updatedate FROM stock.funddata s WHERE
> s.itemID=rec.itemID);
>              techdate := (SELECT updatedate FROM stock.techsignals s
> WHERE s.itemID=rec.itemID);
>              IF (histdate <> funddate) OR (histdate <> techdate) OR
> (funddate IS NULL) OR (techdate IS NULL) THEN
>                  counter := counter + 1;
>                  outrec.itemID := rec.itemID;
>                  outrec.item := rec.item;
>                  outrec.hexvalue := rec.hexvalue;
>                  RETURN NEXT outrec;
>              END IF;
>          END IF;
>      END LOOP;
>      INSERT INTO stock.datacount (itemcount) VALUES (counter);
>      COPY stock.datacount TO ''/tmp/datacount'';
>      RETURN;
>  END;
>
> note: stock.activeitem contains about 75000 rows

Getting EXPLAIN ANALYZE from the queries would be good. Adding debug
output via NOTICE to see how long each step is taking would be a good
idea, too.

Of course, even better would be to do away with the cursor...

> "top" shows:
> CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
>         total    5.8%    0.6%   31.2%   0.0%     0.0%    0.5%   61.6%
> Mem:  8152592k av, 8143012k used,    9580k free,       0k shrd,  179888k
> buff

The high system % (if I'm reading this correctly) makes me wonder if
this is some kind of locking issue.

>                 6342296k actv, 1206340k in_d,  137916k in_c
> Swap: 8385760k av,  259780k used, 8125980k free                 7668624k
> cached
>
> PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
> 17027 postgres  25   0  566M 561M  560M R    24.9  7.0 924:34   1
> postmaster
>
> I've likely set some parameter(s) to the wrong values, but I don't know
> which one(s). Here are my relevant postgresql.conf settings:
> shared_buffers = 70000
> work_mem = 9192
> maintenance_work_mem = 131072
> max_fsm_pages = 70000
> fsync = off       (temporarily, will be turned back on)
> checkpoint_segments = 64
> checkpoint_timeout = 1800
> effective_cache_size = 70000
>
> [root@new-server root]# cat /proc/sys/kernel/shmmax
> 660000000
>
> We want to put this into production soon, but this is a showstopper. Can
> anyone help me out with this?
>
>
> Thanks
>
> Ron St.Pierre
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Tuning New Server (slow function)

From
Ron St-Pierre
Date:
Jim C. Nasby wrote:
> On Wed, Jun 21, 2006 at 02:27:41PM -0700, Ron St-Pierre wrote:
>
>> We just purchased a new Dell PowerEdge 2800 (dual xeon, 8GB RAM, raid 4,
>> RHEL, postgres 8.1) and ported our old database over to it (single cpu,
>>
>
> RAID *4*?
>
oops, raid 5 (but we are getting good io throughput...)
> If you do any kind of updating at all, you're likely to be real unhappy
> with that...
>
>
>> 2GB RAM, no raid, postgres 7.4). Our apps perform great on it, however
>> some queries are super slow. One function in particular, which used to
>> take 15-30 minutes on the old server, has been running now for over 12
>> hours:
>>  BEGIN
>>      TRUNCATE stock.datacount;
>>      FOR rec IN SELECT itemID, item, hexValue FROM stock.activeitem LOOP
>>          histdate := (SELECT updatedate FROM stock.historical s WHERE
>> s.itemID=rec.itemID ORDER BY updatedate DESC LIMIT 1);
>>          IF histdate IS NOT NULL THEN
>>              funddate := (SELECT updatedate FROM stock.funddata s WHERE
>> s.itemID=rec.itemID);
>>              techdate := (SELECT updatedate FROM stock.techsignals s
>> WHERE s.itemID=rec.itemID);
>>              IF (histdate <> funddate) OR (histdate <> techdate) OR
>> (funddate IS NULL) OR (techdate IS NULL) THEN
>>                  counter := counter + 1;
>>                  outrec.itemID := rec.itemID;
>>                  outrec.item := rec.item;
>>                  outrec.hexvalue := rec.hexvalue;
>>                  RETURN NEXT outrec;
>>              END IF;
>>          END IF;
>>      END LOOP;
>>      INSERT INTO stock.datacount (itemcount) VALUES (counter);
>>      COPY stock.datacount TO ''/tmp/datacount'';
>>      RETURN;
>>  END;
>>
>> note: stock.activeitem contains about 75000 rows
>>
>
> Getting EXPLAIN ANALYZE from the queries would be good. Adding debug
> output via NOTICE to see how long each step is taking would be a good
> idea, too.
>
>
I set client_min_messages = debug2, log_min_messages = debug2 and
log_statement = 'all' and am running the query with EXPLAIN ANALYZE. I
don't know how long it will take until something useful returns, but I
will let it run for a while.
> Of course, even better would be to do away with the cursor...
>
>
How would I rewrite it to do away with the cursor?
>> "top" shows:
>> CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
>>         total    5.8%    0.6%   31.2%   0.0%     0.0%    0.5%   61.6%
>> Mem:  8152592k av, 8143012k used,    9580k free,       0k shrd,  179888k
>> buff
>>
>
> The high system % (if I'm reading this correctly) makes me wonder if
> this is some kind of locking issue.
>
>
But it's the only postgres process running.
>>                 6342296k actv, 1206340k in_d,  137916k in_c
>> Swap: 8385760k av,  259780k used, 8125980k free                 7668624k
>> cached
>>
>> PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>> 17027 postgres  25   0  566M 561M  560M R    24.9  7.0 924:34   1
>> postmaster
>>
>> I've likely set some parameter(s) to the wrong values, but I don't know
>> which one(s). Here are my relevant postgresql.conf settings:
>> shared_buffers = 70000
>> work_mem = 9192
>> maintenance_work_mem = 131072
>> max_fsm_pages = 70000
>> fsync = off       (temporarily, will be turned back on)
>> checkpoint_segments = 64
>> checkpoint_timeout = 1800
>> effective_cache_size = 70000
>>
>> [root@new-server root]# cat /proc/sys/kernel/shmmax
>> 660000000
>>
>> We want to put this into production soon, but this is a showstopper. Can
>> anyone help me out with this?
>>
>>
>> Thanks
>>
>> Ron St.Pierre
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 3: Have you checked our extensive FAQ?
>>
>>               http://www.postgresql.org/docs/faq
>>
>>
>
>


Re: Tuning New Server (slow function)

From
Jim Nasby
Date:
On Jun 21, 2006, at 5:53 PM, Ron St-Pierre wrote:
> Jim C. Nasby wrote:
>> On Wed, Jun 21, 2006 at 02:27:41PM -0700, Ron St-Pierre wrote:
>>
>>> We just purchased a new Dell PowerEdge 2800 (dual xeon, 8GB RAM,
>>> raid 4, RHEL, postgres 8.1) and ported our old database over to
>>> it (single cpu,
>>
>> RAID *4*?
>>
> oops, raid 5 (but we are getting good io throughput...)

Just remember that unless you have a really good battery-backed
controller, writes to RAID5 pretty much suck.

>>>  BEGIN
>>>      TRUNCATE stock.datacount;
>>>      FOR rec IN SELECT itemID, item, hexValue FROM
>>> stock.activeitem LOOP
>>>          histdate := (SELECT updatedate FROM stock.historical s
>>> WHERE s.itemID=rec.itemID ORDER BY updatedate DESC LIMIT 1);
>>>          IF histdate IS NOT NULL THEN
>>>              funddate := (SELECT updatedate FROM stock.funddata s
>>> WHERE s.itemID=rec.itemID);
>>>              techdate := (SELECT updatedate FROM
>>> stock.techsignals s WHERE s.itemID=rec.itemID);
>>>              IF (histdate <> funddate) OR (histdate <> techdate)
>>> OR (funddate IS NULL) OR (techdate IS NULL) THEN
>>>                  counter := counter + 1;
>>>                  outrec.itemID := rec.itemID;
>>>                  outrec.item := rec.item;
>>>                  outrec.hexvalue := rec.hexvalue;
>>>                  RETURN NEXT outrec;
>>>              END IF;
>>>          END IF;
>>>      END LOOP;
>>>      INSERT INTO stock.datacount (itemcount) VALUES (counter);
>>>      COPY stock.datacount TO ''/tmp/datacount'';
>>>      RETURN;
>>>  END;
> How would I rewrite it to do away with the cursor?

Something like...

SELECT ...
    FROM (SELECT a...., f.updatedate AS funddate, t.updatedate AS
techdate, max(updatedate) hist_date
                FROM activeitem a
                    JOIN historical h USING itemid
                GROUP BY a...., f.updatedate, t.updatedate) AS a
        LEFT JOIN funddate f USING itemid
        LEFT JOIN techsignals USING itemid
    WHERE f.updatedate <> hist_date OR t.updatedate <> hist_date OR
f.updatedate IS NULL OR t.updatedate IS NULL
;

BTW, there's some trick that would let you include the NULL tests
with the other tests in the WHERE, but I can't remember it off the
top of my head...

>>> "top" shows:
>>> CPU states:  cpu    user    nice  system    irq  softirq
>>> iowait    idle
>>>         total    5.8%    0.6%   31.2%   0.0%     0.0%    0.5%
>>> 61.6%
>>> Mem:  8152592k av, 8143012k used,    9580k free,       0k shrd,
>>> 179888k buff
>>>
>>
>> The high system % (if I'm reading this correctly) makes me wonder if
>> this is some kind of locking issue.
>>
>>
> But it's the only postgres process running.

Sure, but PostgreSQL still acquires internal locks.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461