Thread: poor performance of db?
Hello everyone,
First time poster to the mailing list here.
We have been running pgsql for about a year now at a pretty basic level (I guess) as a backend for custom web (intranet) application software. Our database so far is a "huge" (note sarcasm) 10 Mb, consisting of about 6 or so principal tables.
The 'test' screen we've been using loads a 600kb HTML document which is basically a summary of our clients' orders. It originally took 11.5 seconds to load in Internet Explorer (pretty much all of it, 10.99 seconds, taken up by postgres processes on a FreeBSD server).
I then re-wrote the page to use a single select query to pull all the information needed by PHP to draw the screen. That managed to shave it down to 3.5 seconds... but so far that is as fast as I can get the page to load. I have tried vacuuming and creating indexes, but to no avail (increasing shared memory buffers is yet to be done).
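(For context, by "vacuuming and creating indexes" I mean statements roughly like the ones below - the index names and column choices are only illustrative, not the exact indexes I tried:)

    -- illustrative sketch only, not the exact statements used
    CREATE INDEX job_customer_no_idx ON job (customer_no);
    CREATE INDEX job_location_no_idx ON job (location_no);
    VACUUM;   -- plain VACUUM, without ANALYZE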
Now here's the funny bit ...
Every time I tested an idea to speed it up, I got exactly the same loading time on an Athlon 1800+, 256Mb RAM, 20Gb PATA computer as on a Dual Opteron 246, 1Gb RAM, 70Gb WD Raptor SATA server. Now, why a dual Opteron machine can't perform any faster than a lowly Athlon 1800+ in numerous tests is completely beyond me ... increased memory and RAID 0 disc configurations so far have not resulted in any significant performance gain in the Opteron server.
Do these facts sound right? If postgres is meant to be a 200Gb industrial-strength database, should it really be taking this long to pull 600kb worth of info from a 10Mb database? And why no performance difference between two vastly differently spec'd computers??? Am I missing some vital postgresql.conf setting??
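(By "vital postgresql.conf setting" I mean things like the memory parameters below, which I have barely touched so far - the names are the PostgreSQL 7.x-era ones and the numbers are purely illustrative, not a recommendation:)

    # postgresql.conf - illustrative values only
    shared_buffers = 64            # number of 8kB shared buffers
    sort_mem = 1024                # kB of memory per sort/hash operation
    effective_cache_size = 1000    # 8kB pages the OS cache is assumed to hold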
Any advice welcome.
Thanks,
Dave
SpaceBallOne wrote:
> We have been running pgsql for about a year now at a pretty basic
> level (I guess) as a backend for custom web (intranet) application
> software. Our database so far is a "huge" (note sarcasm) 10 Mb,
> consisting of about 6 or so principal tables.
> [snip]
> Do these facts sound right? If postgres is meant to be a 200Gb
> industrial-strength database, should it really be taking this long
> to pull 600kb worth of info from a 10Mb database? And why no
> performance difference between two vastly differently spec'd
> computers??? Am I missing some vital postgresql.conf setting??

Could you give us a bit more info on what you are trying to do? The output of EXPLAIN ANALYZE would be great. In my experience, the first problem with a first db app is that no indexes are used in the joins.

--
Andrei Reinus
> I then re-wrote the page to use a single select query to pull all the
> information needed by PHP to draw the screen. That managed to shave it
> down to 3.5 seconds... but so far that is as fast as I can get the
> page to load. I have tried vacuuming and creating indexes, but to no
> avail (increasing shared memory buffers is yet to be done).

If you call this select statement directly from psql instead of through the PHP thing, does the timing change? (Just to make sure the time is actually spent in the query and not somewhere else.)

PS: use \timing in psql to see timing information.

Bye, Chris.
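PPS: a quick sketch of what I mean (database name and output are illustrative):

    $ psql yourdb
    yourdb=# \timing
    Timing is on.
    yourdb=# SELECT ... ;    -- paste the page's query here
    Time: ...                -- time spent in the query alone, as reported by psql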
Thanks for the replies guys,

Chris - very cool feature, \timing - didn't know about that one. The page appears to be taking the following times to pull up:

web browser: 1.15 sec
postgres: 1.52 sec
other: 0.83 sec

Andrei - the query looks like the following:

explain analyse
SELECT job.*, customer.*, ubd.suburb, location.*, street.street,
       location.designation_no, a1.initials as surveyor,
       a2.initials as draftor, prices.*, plans.*
FROM job, login a1, login a2, prices, location, ubd, plans
WHERE ( a1.code = job.surveyor_no
    AND a2.code = job.draftor_no
    AND job.customer_no = customer.customer_no
    AND job.location_no = location.location_no
    AND location.suburb_no = ubd.suburb_id
    AND location.street_no = street.street_no
    AND job.customer_no = customer.customer_no
    AND job.price_id = prices.pricelist_id
    AND job.price_revision = prices.revision
    AND location.plan_no = plans.number
    AND location.plan_type = plans.plantype
    AND ( (job.jobbookflag <> 'flagged') AND ( job.status = 'normal' ) ))
ORDER BY job_no DESC;

                                   QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=566.31..567.06 rows=298 width=2626) (actual time=1378.38..1380.08 rows=353 loops=1)
   Sort Key: job.job_no
   ->  Hash Join  (cost=232.59..554.06 rows=298 width=2626) (actual time=124.96..1374.12 rows=353 loops=1)
         Hash Cond: ("outer".suburb_no = "inner".suburb_id)
         ->  Hash Join  (cost=221.45..519.06 rows=288 width=2606) (actual time=118.60..1187.87 rows=353 loops=1)
               Hash Cond: ("outer".street_no = "inner".street_no)
               ->  Hash Join  (cost=204.79..496.64 rows=287 width=2587) (actual time=108.16..997.57 rows=353 loops=1)
                     Hash Cond: ("outer".surveyor_no = "inner".code)
                     ->  Hash Join  (cost=203.21..490.05 rows=287 width=2573) (actual time=106.89..823.47 rows=353 loops=1)
                           Hash Cond: ("outer".customer_no = "inner".customer_no)
                           ->  Hash Join  (cost=159.12..440.93 rows=287 width=2291) (actual time=92.16..654.51 rows=353 loops=1)
                                 Hash Cond: ("outer".draftor_no = "inner".code)
                                 ->  Hash Join  (cost=157.55..434.33 rows=287 width=2277) (actual time=90.96..507.34 rows=353 loops=1)
                                       Hash Cond: ("outer".price_id = "inner".pricelist_id)
                                       Join Filter: ("outer".price_revision = "inner".revision)
                                       ->  Hash Join  (cost=142.95..401.01 rows=336 width=2150) (actual time=82.57..377.87 rows=353 loops=1)
                                             Hash Cond: ("outer".plan_no = "inner".number)
                                             Join Filter: ("outer".plan_type = "inner".plantype)
                                             ->  Hash Join  (cost=25.66..272.20 rows=418 width=2110) (actual time=14.58..198.50 rows=353 loops=1)
                                                   Hash Cond: ("outer".location_no = "inner".location_no)
                                                   ->  Seq Scan on job  (cost=0.00..238.18 rows=418 width=2029) (actual time=0.31..95.21 rows=353 loops=1)
                                                         Filter: ((jobbookflag <> 'flagged'::character varying) AND (status = 'normal'::character varying))
                                                   ->  Hash  (cost=23.53..23.53 rows=853 width=81) (actual time=13.91..13.91 rows=0 loops=1)
                                                         ->  Seq Scan on "location"  (cost=0.00..23.53 rows=853 width=81) (actual time=0.03..8.92 rows=853 loops=1)
                                             ->  Hash  (cost=103.43..103.43 rows=5543 width=40) (actual time=67.55..67.55 rows=0 loops=1)
                                                   ->  Seq Scan on plans  (cost=0.00..103.43 rows=5543 width=40) (actual time=0.01..36.89 rows=5544 loops=1)
                                       ->  Hash  (cost=13.68..13.68 rows=368 width=127) (actual time=7.98..7.98 rows=0 loops=1)
                                             ->  Seq Scan on prices  (cost=0.00..13.68 rows=368 width=127) (actual time=0.03..5.83 rows=368 loops=1)
                                 ->  Hash  (cost=1.46..1.46 rows=46 width=14) (actual time=0.57..0.57 rows=0 loops=1)
                                       ->  Seq Scan on login a2  (cost=0.00..1.46 rows=46 width=14) (actual time=0.02..0.31 rows=46 loops=1)
                           ->  Hash  (cost=42.07..42.07 rows=807 width=282) (actual time=14.24..14.24 rows=0 loops=1)
                                 ->  Seq Scan on customer  (cost=0.00..42.07 rows=807 width=282) (actual time=0.03..9.03 rows=807 loops=1)
                     ->  Hash  (cost=1.46..1.46 rows=46 width=14) (actual time=0.57..0.57 rows=0 loops=1)
                           ->  Seq Scan on login a1  (cost=0.00..1.46 rows=46 width=14) (actual time=0.02..0.31 rows=46 loops=1)
               ->  Hash  (cost=14.53..14.53 rows=853 width=19) (actual time=9.79..9.79 rows=0 loops=1)
                     ->  Seq Scan on street  (cost=0.00..14.53 rows=853 width=19) (actual time=0.01..5.12 rows=853 loops=1)
         ->  Hash  (cost=9.91..9.91 rows=491 width=20) (actual time=5.73..5.73 rows=0 loops=1)
               ->  Seq Scan on ubd  (cost=0.00..9.91 rows=491 width=20) (actual time=0.02..2.98 rows=491 loops=1)
 Total runtime: 1383.99 msec
(39 rows)

Time: 1445.80 ms

I tried setting up 10-15 indexes yesterday, but couldn't see that they were doing anything. I have since deleted them (on the premise that I didn't have a clue what I was doing).

I'm not actually running any keys in this database... would that be a simpler way of running my queries? I only learnt postgres / unix from scratch a year ago, so my db setup and queries are probably pretty messy :)

Thanks,
Dave
space_ball_one@hotmail.com

----- Original Message -----
From: "Andrei Reinus" <andrei.reinus@uptime.ee>
To: "SpaceBallOne" <space_ball_one@hotmail.com>
Cc: <pgsql-performance@postgresql.org>
Sent: Monday, January 24, 2005 5:22 PM
Subject: Re: [PERFORM] poor performance of db?

> [snip]
>
> Could you give us a bit more info on what you are trying to do? The
> output of EXPLAIN ANALYZE would be great. In my experience, the first
> problem with a first db app is that no indexes are used in the joins.
>
> --
> Andrei Reinus
Thanks for the reply John,

There are approximately 800 rows total in our job table (which stays approximately the same because 'completed' jobs get moved to a 'job_archive' table). The other jobs not shown by the specific query could be on backorder status, temporary deleted status, etc etc.

You are correct in assuming the _id and _no (stands for 'number') fields are unique - this was one of the first pages I built when I started learning postgres, so, not knowing how to set up primary and foreign keys at the time, I did it that way ... it is normalised to a point (probably rather sloppy, but it's a juggling act between learning on the fly, what I'd like to have, and the time constraints of being the only I.T. guy in the company!)...

I think I will definitely focus on converting my database and php pages to using proper primary keys in postgres - especially if they automatically index themselves. I didn't do a vacuum analyse on the indexes, so that may explain why they didn't seem to do much.

Thanks,
Dave
space_ball_one@hotmail.com

----- Original Message -----
From: "John Arbash Meinel" <john@arbash-meinel.com>
To: "SpaceBallOne" <space_ball_one@hotmail.com>
Sent: Tuesday, January 25, 2005 9:56 AM
Subject: Re: [PERFORM] poor performance of db?

SpaceBallOne wrote:
>
> I tried setting up 10-15 indexes yesterday, but couldn't see they were
> doing anything. I have since deleted them (on the premise that I
> didn't have a clue what I was doing).

Did you VACUUM ANALYZE after you created the indexes?

It really depends on how many rows you need vs how many rows are in the table. If you are trying to show everything in the tables, then it won't help. I can tell that your query is returning 353 rows. How many rows total do you have? I think the rule is that indexes help when you need < 10% of your data.

From what I can see, it looks like all of the *_no and *_id columns (which are basically your keys) would be helped by having an index on them.

> I'm not actually running any keys in this database... would that be a
> simpler way of running my queries? I only learnt postgres / unix from
> scratch a year ago so my db setup and queries is probably pretty
> messy :)

I would think that you would want a "primary key" on every table, and this would be your column for references. This way you get referential integrity, *and* it automatically creates an index. For instance, the job table could be:

    create table job (
        id serial primary key,
        surveyor_id integer references surveyor(id),
        draftor_id integer references draftor(id),
        ...
    );

Then your other tables would also need an id field.

I can't say much more without looking deeper, but from the looks of it, all of your "_no" and "_id" references should probably be referencing a primary key on the other table. Personally, I always name it "id" and "_id", but if "_no" means something to you, then you certainly could keep it. If these entries are not unique, then probably your database isn't properly normalized.

John
=:->
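(To make John's fragment above concrete: the tables it references would each need their own primary key. A minimal, purely hypothetical sketch - the surveyor/draftor table and column names are John's illustrative names, not the real schema:)

    -- hypothetical sketch: each referenced table gets its own primary key,
    -- which PostgreSQL backs with a unique index automatically
    create table surveyor (
        id serial primary key,    -- serial gives an auto-incrementing integer
        initials varchar(10)      -- hypothetical column
    );

    create table draftor (
        id serial primary key,
        initials varchar(10)
    );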
SpaceBallOne wrote:
> [snip]
>
> I think I will definitely focus on converting my database and php
> pages to using proper primary keys in postgres - especially if they
> automatically index themselves. I didn't do a vacuum analyse on the
> indexes, so that may explain why they didn't seem to do much.

You can probably add them now if you don't want to do a lot of redesign:

    ALTER TABLE job ADD PRIMARY KEY (id);

If they are not unique this will cause problems, but as they should be unique, I think it will work.

I'm not sure how much help indexes will be if you only have 800 rows and your queries use 300+ of them. You might need to re-think the query/table design.

You might try doing nested queries, or explicit joins, rather than one big query with a WHERE clause. Meaning do stuff like:

    SELECT ... FROM
        (job JOIN customer ON job.customer_no = customer.customer_no) as jc
        JOIN location ON jc.location_no = location.location_no
        ...

I also see that the planner seems to mis-estimate the number of rows in some cases. Like here:

>         ->  Hash  (cost=14.53..14.53 rows=853 width=19) (actual time=9.79..9.79 rows=0 loops=1)
>               ->  Seq Scan on street  (cost=0.00..14.53 rows=853 width=19) (actual time=0.01..5.12 rows=853 loops=1)
>         ->  Hash  (cost=9.91..9.91 rows=491 width=20) (actual time=5.73..5.73 rows=0 loops=1)
>               ->  Seq Scan on ubd  (cost=0.00..9.91 rows=491 width=20) (actual time=0.02..2.98 rows=491 loops=1)

where it thinks the hash will return all of the rows from the sequential scan, when in reality it returns none.

I think problems with the planner fall into 3 categories:
1. You didn't VACUUM ANALYZE.
2. You did, but the planner doesn't keep sufficient statistics (ALTER TABLE job ALTER COLUMN no SET STATISTICS <a number>).
3. Your join needs cross-column statistics, which postgres doesn't support (yet).

If you only have 800 rows, I don't think you have to worry about statistics, so that leaves things at 1 or 3. If you did do 1, then I don't know what to tell you.

John
=:->

PS: I'm not a guru at this stuff, so some of what I say may be wrong. But hopefully I point you in the right direction.
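(As an illustration of the explicit-JOIN style John describes, the big query posted earlier could be rewritten roughly as below. This is only a sketch derived from the WHERE clause in that post - untested, with the column list abbreviated:)

    -- rough sketch of the same query using explicit JOIN syntax; untested
    SELECT job.*, customer.*, ubd.suburb, location.*, street.street,
           a1.initials AS surveyor, a2.initials AS draftor, prices.*, plans.*
    FROM job
        JOIN customer ON job.customer_no    = customer.customer_no
        JOIN location ON job.location_no    = location.location_no
        JOIN ubd      ON location.suburb_no = ubd.suburb_id
        JOIN street   ON location.street_no = street.street_no
        JOIN plans    ON location.plan_no   = plans.number
                     AND location.plan_type = plans.plantype
        JOIN prices   ON job.price_id       = prices.pricelist_id
                     AND job.price_revision = prices.revision
        JOIN login a1 ON a1.code = job.surveyor_no
        JOIN login a2 ON a2.code = job.draftor_no
    WHERE job.jobbookflag <> 'flagged'
      AND job.status = 'normal'
    ORDER BY job.job_no DESC;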
I'm also an autodidact on DB design, although it's well more than a year now. If you are planning to clean up the design, I strongly suggest getting a visual tool. Google for something like "database design tool". Some are extremely expensive (e.g. ERwin, which I think has been renamed, having been bought out). There's a very cheap shareware one that I won't mention by name because it crashed my machine consistently. Right now I'm using "Case Studio", which has some very eccentric UI (no one enforced consistency of UI across modules, which is rather ironic in a design tool) but is capable and user-extensible. ERwin's manual also had the best explanation of denormalization I've read, short and to the point.

The ability to make schema revisions quickly lets me concentrate on *better-written queries* and *improved table definition* without having to overcome inertia.
> Every time I tested an idea to speed it up, I got exactly the same
> loading time on an Athlon 1800+, 256Mb RAM, 20Gb PATA computer as
> compared to a Dual Opteron 246, 1Gb RAM, 70Gb WD Raptor SATA server.
> [snip]

How many rows does the query return? Maybe a lot of time is spent, hidden in the PHP libraries, converting the rows returned by psql into PHP objects.

You should try this:

EXPLAIN ANALYZE SELECT your query        -> time is T1
CREATE TABLE cache AS SELECT your query
EXPLAIN ANALYZE SELECT * FROM cache      -> time is T2 (probably very small)

Now, in your PHP script, replace "SELECT your query" with "SELECT * FROM cache". How much does the final page time change? This will tell you the time spent in the postgres engine, not in data transmission and PHPing. It will tell you what you can gain by optimizing the query.
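(A concrete version of that test as a psql session might look like this - the query text is elided and the throwaway "cache" table name is just a placeholder:)

    -- with \timing on in psql:
    EXPLAIN ANALYZE SELECT ... ;             -- your query; note the time (T1)
    CREATE TABLE cache AS SELECT ... ;       -- materialise the result once
    EXPLAIN ANALYZE SELECT * FROM cache;     -- note the time (T2, probably very small)
    DROP TABLE cache;                        -- clean up when done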