Thread: poor performance of db?

poor performance of db?

From
"SpaceBallOne"
Date:
Hello everyone,

First time poster to the mailing list here.
 
We have been running pgsql for about a year now at a pretty basic level (I guess) as a backend for custom web (intranet) application software. Our database so far is a "huge" (note sarcasm) 10 MB, containing about 6 or so principal tables.
 
The 'test' screen we've been using loads a 600 KB HTML document which is basically a summary of our client's orders. It originally took 11.5 seconds to load in Internet Explorer (10.99 of those seconds were pretty much taken up by postgres processes on a FreeBSD server).
 
I then rewrote the page to use a single select query to call all the information needed by PHP to draw the screen. That managed to shave it down to 3.5 seconds... but so far this is as fast as I can get the page to load. I have tried vacuuming and creating indexes, but to no avail (increasing shared memory buffers is yet to be done).
 
Now here's the funny bit ...

Every time I tested an idea to speed it up, I got exactly the same loading time on an Athlon 1800+, 256 MB RAM, 20 GB PATA computer as on a Dual Opteron 246, 1 GB RAM, 70 GB WD Raptor SATA server. Why a dual Opteron machine can't perform any faster than a lowly 1800+ Athlon in numerous tests is completely beyond me ... increased memory and RAID 0 disk configurations so far have not resulted in any significant performance gain on the Opteron server.
 
Do these facts sound right? If postgres is meant to be a 200 GB, industrial-strength database, should it really be taking this long to pull 600 KB worth of info from a 10 MB database? And why is there no performance difference between two vastly differently spec'd computers? Am I missing some vital postgresql.conf setting?

Any advice welcome.

Thanks,
Dave
 

Re: poor performance of db?

From
Andrei Reinus
Date:
SpaceBallOne wrote:

> Hello everyone,
>
> First time poster to the mailing list here.
>
> We have been running pgsql for about a year now at a pretty basic
> level (I guess) as a backend for custom
> web (intranet) application software. Our database so far is a "huge"
> (note sarcasm) 10 MB, containing about 6 or so principal tables.
>
> Our 'test' screen we've been using loads a 600kb HTML document which
> is basically a summary of our client's orders. It took originally 11.5
> seconds to load in internet explorer (all 10.99 seconds were pretty
> much taken up by postgres processes on a freebsd server).
>
> I then re-wrote the page to use a single select query to call all the
> information needed by PHP to draw the screen. That managed to shave it
> down to 3.5 seconds... but this so far is as fast as I can get the
> page to load. Have tried vacuuming and creating indexes but to no
> avail. (increasing shared mem buffers yet to be done)
>
> Now here's the funny bit ...
>
> Every time I tested an idea to speed it up, I got exactly the same
> loading time on an Athlon 1800+, 256Mb RAM, 20Gb PATA computer as
> compared to a Dual Opteron 246, 1Gb RAM, 70Gb WD Raptor SATA server.
> Now, why a dual opteron machine can't perform any faster than a lowly
> 1800+ athlon in numerous tests is completely beyond me .. increased
> memory and RAID 0 disc configurations so far have not resulted in any
> significant performance gain in the opteron server.
>
> Do these facts sound right? If postgres is meant to be a 200Gb
> industrial strength database, should it really be taking this long
> pulling 600kb worth of info from a 10Mb database? And why no
> performance difference between two vastly different hardware spec'd
> computers??? Am I missing some vital postgres.conf setting??
>
> Any advice welcome.
>
> Thanks,
> Dave
> space_ball_one@hotmail.com <mailto:space_ball_one@hotmail.com>
>

Could you give us a bit more info.
What you are trying to do. EXPLAIN ANALYZE would be great.
In my experience first problem with the first db app is no indexes used
in joining.
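
For example, something along these lines (the table and column names here
are only placeholders, not your actual schema):

    EXPLAIN ANALYZE
    SELECT *
    FROM orders o
    JOIN customers c ON c.customer_no = o.customer_no;

    -- if the plan shows a sequential scan on the join column,
    -- an index there is usually the first thing to try:
    CREATE INDEX orders_customer_no_idx ON orders (customer_no);
    ANALYZE orders;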

--
-- Andrei Reinus



Re: poor performance of db?

From
Chris Mair
Date:
> I then re-wrote the page to use a single select query to call all the
> information needed by PHP to draw the screen. That managed to shave it
> down to 3.5 seconds... but this so far is as fast as I can get the
> page to load. Have tried vacuuming and creating indexes but to no
> avail. (increasing shared mem buffers yet to be done)

If you call this select statement directly from psql instead of through
the PHP thing, does timing change?

(just to make sure, time is actually spent in the query and not
somewhere else)

PS: use \timing in psql to see timing information
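
Something like this, for example (just a sketch; run your actual query in
place of the placeholder):

    \timing
    SELECT * FROM some_table;   -- placeholder: substitute your real query
    -- psql then prints a "Time: ... ms" line after each result,
    -- which you can compare against the total page load time.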

Bye, Chris.



Re: poor performance of db?

From
"SpaceBallOne"
Date:
Thanks for the replies guys,

Chris -
Very cool feature, \timing - didn't know about that one. It appears to be
taking the following times to pull up the page:
web browser: 1.15 sec
postgres: 1.52 sec
other: 0.83 sec

Andrei:
The query looks like the following:

explain analyse SELECT

job.*,
customer.*,
ubd.suburb, location.*,
street.street,
location.designation_no,
a1.initials as surveyor,
a2.initials as draftor,
prices.*,
plans.*

FROM

job,
customer,
login a1,
login a2,
prices,
location,
ubd,
street,
plans

WHERE

(
a1.code = job.surveyor_no AND
a2.code = job.draftor_no AND
job.customer_no = customer.customer_no AND
job.location_no = location.location_no AND
location.suburb_no = ubd.suburb_id AND
location.street_no = street.street_no AND
job.price_id = prices.pricelist_id AND
job.price_revision = prices.revision AND
location.plan_no = plans.number AND
location.plan_type = plans.plantype AND

( (job.jobbookflag <> 'flagged') AND ( job.status = 'normal' ) ))

ORDER BY job_no DESC;




 QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=566.31..567.06 rows=298 width=2626) (actual time=1378.38..1380.08 rows=353 loops=1)
   Sort Key: job.job_no
   ->  Hash Join  (cost=232.59..554.06 rows=298 width=2626) (actual time=124.96..1374.12 rows=353 loops=1)
         Hash Cond: ("outer".suburb_no = "inner".suburb_id)
         ->  Hash Join  (cost=221.45..519.06 rows=288 width=2606) (actual time=118.60..1187.87 rows=353 loops=1)
               Hash Cond: ("outer".street_no = "inner".street_no)
               ->  Hash Join  (cost=204.79..496.64 rows=287 width=2587) (actual time=108.16..997.57 rows=353 loops=1)
                     Hash Cond: ("outer".surveyor_no = "inner".code)
                     ->  Hash Join  (cost=203.21..490.05 rows=287 width=2573) (actual time=106.89..823.47 rows=353 loops=1)
                           Hash Cond: ("outer".customer_no = "inner".customer_no)
                           ->  Hash Join  (cost=159.12..440.93 rows=287 width=2291) (actual time=92.16..654.51 rows=353 loops=1)
                                 Hash Cond: ("outer".draftor_no = "inner".code)
                                 ->  Hash Join  (cost=157.55..434.33 rows=287 width=2277) (actual time=90.96..507.34 rows=353 loops=1)
                                       Hash Cond: ("outer".price_id = "inner".pricelist_id)
                                       Join Filter: ("outer".price_revision = "inner".revision)
                                       ->  Hash Join  (cost=142.95..401.01 rows=336 width=2150) (actual time=82.57..377.87 rows=353 loops=1)
                                             Hash Cond: ("outer".plan_no = "inner".number)
                                             Join Filter: ("outer".plan_type = "inner".plantype)
                                             ->  Hash Join  (cost=25.66..272.20 rows=418 width=2110) (actual time=14.58..198.50 rows=353 loops=1)
                                                   Hash Cond: ("outer".location_no = "inner".location_no)
                                                   ->  Seq Scan on job  (cost=0.00..238.18 rows=418 width=2029) (actual time=0.31..95.21 rows=353 loops=1)
                                                         Filter: ((jobbookflag <> 'flagged'::character varying) AND (status = 'normal'::character varying))
                                                   ->  Hash  (cost=23.53..23.53 rows=853 width=81) (actual time=13.91..13.91 rows=0 loops=1)
                                                         ->  Seq Scan on "location"  (cost=0.00..23.53 rows=853 width=81) (actual time=0.03..8.92 rows=853 loops=1)
                                             ->  Hash  (cost=103.43..103.43 rows=5543 width=40) (actual time=67.55..67.55 rows=0 loops=1)
                                                   ->  Seq Scan on plans  (cost=0.00..103.43 rows=5543 width=40) (actual time=0.01..36.89 rows=5544 loops=1)
                                       ->  Hash  (cost=13.68..13.68 rows=368 width=127) (actual time=7.98..7.98 rows=0 loops=1)
                                             ->  Seq Scan on prices  (cost=0.00..13.68 rows=368 width=127) (actual time=0.03..5.83 rows=368 loops=1)
                                 ->  Hash  (cost=1.46..1.46 rows=46 width=14) (actual time=0.57..0.57 rows=0 loops=1)
                                       ->  Seq Scan on login a2  (cost=0.00..1.46 rows=46 width=14) (actual time=0.02..0.31 rows=46 loops=1)
                           ->  Hash  (cost=42.07..42.07 rows=807 width=282) (actual time=14.24..14.24 rows=0 loops=1)
                                 ->  Seq Scan on customer  (cost=0.00..42.07 rows=807 width=282) (actual time=0.03..9.03 rows=807 loops=1)
                     ->  Hash  (cost=1.46..1.46 rows=46 width=14) (actual time=0.57..0.57 rows=0 loops=1)
                           ->  Seq Scan on login a1  (cost=0.00..1.46 rows=46 width=14) (actual time=0.02..0.31 rows=46 loops=1)
               ->  Hash  (cost=14.53..14.53 rows=853 width=19) (actual time=9.79..9.79 rows=0 loops=1)
                     ->  Seq Scan on street  (cost=0.00..14.53 rows=853 width=19) (actual time=0.01..5.12 rows=853 loops=1)
         ->  Hash  (cost=9.91..9.91 rows=491 width=20) (actual time=5.73..5.73 rows=0 loops=1)
               ->  Seq Scan on ubd  (cost=0.00..9.91 rows=491 width=20) (actual time=0.02..2.98 rows=491 loops=1)
 Total runtime: 1383.99 msec
(39 rows)

Time: 1445.80 ms



I tried setting up 10-15 indexes yesterday, but couldn't see that they were
doing anything. I have since deleted them (on the premise that I didn't have
a clue what I was doing).

I'm not actually running any keys in this database... would that be a simpler
way of running my queries? I only learnt postgres / unix from scratch a year
ago, so my db setup and queries are probably pretty messy    :)

Thanks,
Dave
space_ball_one@hotmail.com





----- Original Message -----
From: "Andrei Reinus" <andrei.reinus@uptime.ee>
To: "SpaceBallOne" <space_ball_one@hotmail.com>
Cc: <pgsql-performance@postgresql.org>
Sent: Monday, January 24, 2005 5:22 PM
Subject: Re: [PERFORM] poor performance of db?


> SpaceBallOne wrote:
>
>> Hello everyone,
>>
>> First time poster to the mailing list here.
>>
>> We have been running pgsql for about a year now at a pretty basic
>> level (I guess) as a backend for custom
>> web (intranet) application software. Our database so far is a "huge"
>> (note sarcasm) 10 MB, containing about 6 or so principal tables.
>>
>> Our 'test' screen we've been using loads a 600kb HTML document which
>> is basically a summary of our client's orders. It took originally 11.5
>> seconds to load in internet explorer (all 10.99 seconds were pretty
>> much taken up by postgres processes on a freebsd server).
>>
>> I then re-wrote the page to use a single select query to call all the
>> information needed by PHP to draw the screen. That managed to shave it
>> down to 3.5 seconds... but this so far is as fast as I can get the
>> page to load. Have tried vacuuming and creating indexes but to no
>> avail. (increasing shared mem buffers yet to be done)
>>
>> Now here's the funny bit ...
>>
>> Every time I tested an idea to speed it up, I got exactly the same
>> loading time on an Athlon 1800+, 256Mb RAM, 20Gb PATA computer as
>> compared to a Dual Opteron 246, 1Gb RAM, 70Gb WD Raptor SATA server.
>> Now, why a dual opteron machine can't perform any faster than a lowly
>> 1800+ athlon in numerous tests is completely beyond me .. increased
>> memory and RAID 0 disc configurations so far have not resulted in any
>> significant performance gain in the opteron server.
>>
>> Do these facts sound right? If postgres is meant to be a 200Gb
>> industrial strength database, should it really be taking this long
>> pulling 600kb worth of info from a 10Mb database? And why no
>> performance difference between two vastly different hardware spec'd
>> computers??? Am I missing some vital postgres.conf setting??
>>
>> Any advice welcome.
>>
>> Thanks,
>> Dave
>> space_ball_one@hotmail.com <mailto:space_ball_one@hotmail.com>
>>
>
> Could you give us a bit more info.
> What you are trying to do. EXPLAIN ANALYZE would be great.
> In my experience first problem with the first db app is no indexes used
> in joining.
>
> --
> -- Andrei Reinus
>
>



Re: poor performance of db?

From
"SpaceBallOne"
Date:
Thanks for the reply John,

There are approximately 800 rows total in our job table (which stays
approximately the same because 'completed' jobs get moved to a 'job_archive'
table). The other jobs not shown by the specific query could be on backorder
status, temporarily deleted status, etc.

You are correct in assuming the _id and _no (short for 'number') fields are
unique - this was one of the first pages I built when I started learning
postgres, so, not knowing how to set up primary and foreign keys at the time,
I did it that way ... it is normalised to a point (probably rather sloppy,
but it's a juggling act between learning on the fly, what I'd like to have,
and the time constraints of being the only I.T. guy in the company!)...

I think I will definitely focus on converting my database and PHP pages to
using proper primary keys in postgres - especially if they automatically
index themselves. I didn't do a VACUUM ANALYZE on the indexes I created, so
that may explain why they didn't seem to do much.

Thanks,
Dave
space_ball_one@hotmail.com



----- Original Message -----
From: "John Arbash Meinel" <john@arbash-meinel.com>
To: "SpaceBallOne" <space_ball_one@hotmail.com>
Sent: Tuesday, January 25, 2005 9:56 AM
Subject: Re: [PERFORM] poor performance of db?

SpaceBallOne wrote:

>
>
> I tried setting up 10-15 indexes yesterday, but couldn't see they were
> doing anything. I have since deleted them (on the premise that I
> didn't have a clue what I was doing).

Did you VACUUM ANALYZE after you created the indexes? It really depends
on how many rows you need vs how many rows are in the table. If you are
trying to show everything in the tables, then it won't help.

I can tell that your query is returning 353 rows.  How many rows total
do you have? I think the rule is that indexes help when you need < 10%
of your data.

From what I can see, it looks like all of the *_no columns, and *_id
columns (which are basically your keys), would be helped by having an
index on them.

>
> I'm not actually running any keys in this database... would that be a
> simpler way of running my queries? I only learnt postgres / unix from
> scratch a year ago so my db setup and queries is probably pretty
> messy    :)
>
I would think that you want a "primary key" on every table, and that would
be your column for references. This way you get referential integrity,
*and* it automatically creates an index.

For instance, the job table could be:

create table job (
    id serial primary key,
    surveyor_id integer references surveyor(id),
    draftor_id integer references draftor(id),
    ...
);

Then your other tables would also need an id field. I can't say much
more without looking deeper, but from the looks of it, all of your "_no"
and "_id" references should probably be referencing a primary key on the
other table. Personally, I always name it "id" and "_id", but if "_no"
means something to you, then you certainly could keep it.

If these entries are not unique, then probably your database isn't
properly normalized.
John
=:->

> Thanks,
> Dave
> space_ball_one@hotmail.com
>


Re: poor performance of db?

From
John Arbash Meinel
Date:
SpaceBallOne wrote:

> Thanks for the reply John,
>
> There are approximately 800 rows total in our job table (which stays
> approximately the same because 'completed' jobs get moved to a
> 'job_archive' table).The other jobs not shown by the specific query
> could be on backorder status, temporary deleted status, etc etc.
>
> You are correct in assuming the _id and _no (stands for 'number')
> fields are unique - this was one of the first pages I built when I
> started learning postgres, so not knowing how to set up primary and
> foriegn keys at the time, I did it that way ... it is normalised to a
> point (probably rather sloppy, but its a juggling act between learning
> on the fly, what I'd like to have, and time constraints of being the
> only I.T. guy in the company!)...
>
> I think I will definitely focus on converting my database and php
> pages to using proper primary keys in postgres - especially if they
> automatically index themselves. I didn't do a vacuum analyse on them
> so that may explain why they didn't seem to do much.


You probably can add them now if you don't want to do a lot of redesign.
ALTER TABLE job ADD PRIMARY KEY (id);

If they are not unique this will cause problems, but as they should be
unique, I think it will work.
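
A quick way to check that first (job_no here is just my guess at the column
you would use as the key):

    SELECT job_no, count(*)
    FROM job
    GROUP BY job_no
    HAVING count(*) > 1;

If that returns no rows, the values are unique and adding the primary key
(or a UNIQUE constraint) should go through without complaint.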

I'm not sure how much help indexes will be if you only have 800 rows,
and your queries use 300+ of them.

You might need to re-think the query/table design.

You might try doing nested queries, or explicit joins, rather than one
big query with a WHERE clause.

Meaning do stuff like:

SELECT ...
  FROM job
  JOIN customer ON job.customer_no = customer.customer_no
  JOIN location ON job.location_no = location.location_no
...
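
Spelled out against the tables in your query it would look roughly like this
(untested, and the select list is abbreviated):

    SELECT job.*, customer.*, location.*, prices.*, plans.*,
           ubd.suburb, street.street,
           a1.initials AS surveyor, a2.initials AS draftor
    FROM job
      JOIN customer ON customer.customer_no = job.customer_no
      JOIN location ON location.location_no = job.location_no
      JOIN ubd      ON ubd.suburb_id = location.suburb_no
      JOIN street   ON street.street_no = location.street_no
      JOIN plans    ON plans.number = location.plan_no
                   AND plans.plantype = location.plan_type
      JOIN prices   ON prices.pricelist_id = job.price_id
                   AND prices.revision = job.price_revision
      JOIN login a1 ON a1.code = job.surveyor_no
      JOIN login a2 ON a2.code = job.draftor_no
    WHERE job.jobbookflag <> 'flagged'
      AND job.status = 'normal'
    ORDER BY job_no DESC;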

I also see that the planner seems to mis-estimate the number of rows in
some cases. Like here:

>                ->  Hash  (cost=14.53..14.53 rows=853 width=19) (actual time=9.79..9.79 rows=0 loops=1)
>                      ->  Seq Scan on street  (cost=0.00..14.53 rows=853 width=19) (actual time=0.01..5.12 rows=853 loops=1)
>          ->  Hash  (cost=9.91..9.91 rows=491 width=20) (actual time=5.73..5.73 rows=0 loops=1)
>                ->  Seq Scan on ubd  (cost=0.00..9.91 rows=491 width=20) (actual time=0.02..2.98 rows=491 loops=1)

Where it thinks the hash will return all of the rows from the sequential
scan, when in reality it returns none.

I think problems with the planner fall into 3 categories.

   1. You didn't VACUUM ANALYZE.
   2. You did, but the planner doesn't keep sufficient statistics (ALTER
      TABLE job ALTER COLUMN no SET STATISTICS <a number>; see the
      example below).
   3. Your join needs cross-column statistics, which postgres doesn't
      support (yet).
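
For (2), the full sequence would be something like this (the column name and
the statistics target are only examples):

    ALTER TABLE job ALTER COLUMN customer_no SET STATISTICS 100;
    ANALYZE job;    -- re-gather statistics so the planner sees them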

If you only have 800 rows, I don't think you have to worry about
statistics, so that leaves things at 1 or 3. If you did do 1, then I
don't know what to tell you.

John
=:->

PS> I'm not a guru at this stuff, so some of what I say may be wrong.
But hopefully I point you in the right direction.


>
> Thanks,
> Dave
> space_ball_one@hotmail.com
>



Re: poor performance of db?

From
andrew@pillette.com
Date:

I'm also an autodidact on DB design, although it's been well more than a year
now. If you are planning to clean up the design, I strongly suggest getting a
visual tool. Google for something like "database design tool". Some are
extremely expensive (e.g. ERwin, which I think has been renamed since being
bought out). There's a very cheap shareware one that I won't mention by name
because it crashed my machine consistently. Right now I'm using "Case Studio",
which has a rather eccentric UI (no one enforced consistency of UI across
modules, which is rather ironic in a design tool) but is capable and
user-extensible. ERwin's manual also had the best explanation of
denormalization I've read, short and to the point.

The ability to make schema revisions quickly lets me concentrate on
*better-written queries* and *improved table definitions* without having to
overcome inertia.


Re: poor performance of db?

From
PFC
Date:

> Every time I tested an idea to speed it up, I got exactly the same
> loading time on an Athlon 1800+, 256Mb RAM, 20Gb PATA computer as
> compared to a Dual Opteron 246, 1Gb RAM, 70Gb WD Raptor SATA server.
> Now, why a dual opteron machine can't perform any faster than a lowly
> 1800+ athlon in numerous tests is completely beyond me ... increased
> memory and RAID 0 disc configurations so far have not resulted in any
> significant performance gain in the opteron server.

    How many rows does the query return?

    Maybe a lot of time is spent, hidden in the PHP libraries, converting the
rows returned by postgres into PHP objects.

    You should try this:

    EXPLAIN ANALYZE SELECT your query
    -> time is T1

    CREATE TABLE cache AS SELECT your query
    EXPLAIN ANALYZE SELECT * FROM cache
    -> time is T2 (probably very small)

    Now in your PHP script, replace "SELECT your query" with "SELECT * FROM
cache". How much does the final page load time change? This will tell you the
time spent in the postgres engine rather than in data transmission and PHP
processing. It will tell you what you can gain by optimizing the query.
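
    Put together as a psql session it would be something like this (the "..."
stands for the full query posted earlier in the thread):

    \timing
    EXPLAIN ANALYZE SELECT ... ;          -- time T1: the query itself

    CREATE TABLE cache AS SELECT ... ;    -- materialise the result once
    EXPLAIN ANALYZE SELECT * FROM cache;  -- time T2: just reading the rows back

    Then point the PHP page at SELECT * FROM cache and compare the page load
time; whatever is left over is data transmission and PHP overhead rather than
time in the postgres engine.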