Re: poor performance of db? - Mailing list pgsql-performance

From John Arbash Meinel
Subject Re: poor performance of db?
Date
Msg-id 41F5B9D0.2010803@arbash-meinel.com
Whole thread Raw
In response to Re: poor performance of db?  ("SpaceBallOne" <space_ball_one@hotmail.com>)
List pgsql-performance
SpaceBallOne wrote:

> Thanks for the reply John,
>
> There are approximately 800 rows total in our job table (which stays
> approximately the same because 'completed' jobs get moved to a
> 'job_archive' table).The other jobs not shown by the specific query
> could be on backorder status, temporary deleted status, etc etc.
>
> You are correct in assuming the _id and _no (stands for 'number')
> fields are unique - this was one of the first pages I built when I
> started learning postgres, so not knowing how to set up primary and
> foriegn keys at the time, I did it that way ... it is normalised to a
> point (probably rather sloppy, but its a juggling act between learning
> on the fly, what I'd like to have, and time constraints of being the
> only I.T. guy in the company!)...
>
> I think I will definitely focus on converting my database and php
> pages to using proper primary keys in postgres - especially if they
> automatically index themselves. I didn't do a vacuum analyse on them
> so that may explain why they didn't seem to do much.


You probably can add them now if you don't want to do a lot of redesign.
ALTER TABLE job ADD PRIMARY KEY (id);

If they are not unique this will cause problems, but as they should be
unique, I think it will work.

I'm not sure how much help indexes will be if you only have 800 rows,
and your queries use 300+ of them.

You might need re-think the query/table design.

You might try doing nested queries, or explicit joins, rather than one
big query with a WHERE clause.

Meaning do stuff like:

SELECT
  (job JOIN customer ON job.customer_no = customer.customer_no) as jc
  JOIN location on jc.location_no = location.location_no
...

I also see that the planner seems to mis-estimate the number of rows in
some cases. Like here:

>               ->  Hash  (cost=14.53..14.53 rows=853 width=19) (actual
> time=9.79..9.79 rows=0 loops=1)
>                     ->  Seq Scan on street  (cost=0.00..14.53 rows=853
> width=19) (actual time=0.01..5.12 rows=853 loops=1)
>         ->  Hash  (cost=9.91..9.91 rows=491 width=20) (actual
> time=5.73..5.73 rows=0 loops=1)
>               ->  Seq Scan on ubd  (cost=0.00..9.91 rows=491 width=20)
> (actual time=0.02..2.98 rows=491

Where it thinks the hash will return all of the rows from the sequential
scan, when in reality it returns none.

I think problems with the planner fall into 3 categories.

   1. You didn't VACUUM ANALYZE.
   2. You did, but the planner doesn't keep sufficient statistics (ALTER
      TABLE job ALTER COLUMN no SET STATISTICS <a number>)
   3. You're join needs cross column statistics, which postgres doesn't
      support (yet).

If you only have 800 rows, I don't think you have to worry about
statistics, so that leaves things at 1 or 3. If you did do 1, then I
don't know what to tell you.

John
=:->

PS> I'm not a guru at this stuff, so some of what I say may be wrong.
But hopefully I point you in the right direction.


>
> Thanks,
> Dave
> space_ball_one@hotmail.com
>


Attachment

pgsql-performance by date:

Previous
From: "SpaceBallOne"
Date:
Subject: Re: poor performance of db?
Next
From: "Jim C. Nasby"
Date:
Subject: Re: PostgreSQL clustering VS MySQL clustering