Re: pg 8.1.3, AIX, huge box, painfully slow. - Mailing list pgsql-performance

From Gavin Hamill
Subject Re: pg 8.1.3, AIX, huge box, painfully slow.
Date
Msg-id 20060407204526.197afc19.gdh@laterooms.com
Whole thread Raw
In response to Re: pg 8.1.3, AIX, huge box, painfully slow.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg 8.1.3, AIX, huge box, painfully slow.
List pgsql-performance
On Fri, 07 Apr 2006 14:41:39 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Gavin Hamill <gdh@laterooms.com> writes:
> > OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650
> > (8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot
> > up and our website is next to unusable. The IBM is not swapping (not
> > with 16GB of RAM!), disk i/o is low, but there must be something
> > critically wrong for this monster to be performing so badly..
>
> Have you vacuumed/analyzed since reloading your data?

Absolutely - a VACUUM FULL was the first thing I did, and have VACUUM ANALYZE VERBOSE'd a couple of times since. I have
plentyof overhead to keep the entire free space map in RAM. 

> Compare some
> EXPLAIN ANALYZE outputs for identical queries on the two machines,
> that usually helps figure out what's wrong.

If only :)

Since 90% of the db work is the 'hotelsearch' function (which is 350 lines-worth that I'm not permitted to share :(( ),
anEXPLAIN ANALYZE reveals practically nothing: 

##### jalapeno (IBM)
laterooms=# EXPLAIN ANALYZE select * from hotelsearch(12.48333::numeric, 41.90000::numeric, 5::int4,
'2006-04-13'::date,5::int4, NULL::int4, 1::int4, NULL::int4, NULL::int4, TRUE::bool, FALSE::bool, FALSE::bool, 1::int2,
'GBP'::text,'Roma'::text, 7::int4, NULL::int4, NULL::int4) limit 500; 
                                                          QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..6.25 rows=500 width=1587) (actual time=2922.282..2922.908 rows=255 loops=1)
   ->  Function Scan on hotelsearch  (cost=0.00..12.50 rows=1000 width=1587) (actual time=2922.277..2922.494 rows=255
loops=1)
 Total runtime: 2923.296 ms
(3 rows)

##### cayenne (xeon)
laterooms=# EXPLAIN ANALYZE select * from hotelsearch(12.48333::numeric, 41.90000::numeric, 5::int4,
'2006-04-13'::date,5::int4, NULL::int4, 1::int4, NULL::int4, NULL::int4, TRUE::bool, FALSE::bool, FALSE::bool, 1::int2,
'GBP'::text,'Roma'::text, 7::int4, NULL::int4, NULL::int4) limit 500; 
                                                          QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..6.25 rows=500 width=1587) (actual time=1929.483..1930.103 rows=255 loops=1)
   ->  Function Scan on hotelsearch  (cost=0.00..12.50 rows=1000 width=1587) (actual time=1929.479..1929.693 rows=255
loops=1)
 Total runtime: 1930.506 ms
(3 rows)


The 'LIMIT 500' is a red herring since the function body will get all data, so reducing the LIMIT in the call to
hotelsearchdoesn't reduce the amount of work being done. 

The killer in it all is tail'ing the postgres log (which I have set only to log queries at 1000ms or up) is things will
bereturning at 1000-2000ms.. then suddenly shoot up to 8000ms.. and if I try a few of those 8000ms queries on the xeon
box,they exec in ~1500ms.. and if I try them again a few moments later on the ibm, they'll also exec in maybe ~2500ms. 

This is one hell of a moving target and I can't help but think I'm just missing something that's right in front of my
nose,too close to see.  

Cheers,
Gavin.


pgsql-performance by date:

Previous
From: Chris Mair
Date:
Subject: Re: bad performance on Solaris 10
Next
From: Gavin Hamill
Date:
Subject: Re: pg 8.1.3, AIX, huge box, painfully slow.