Thread: Why would this slow the query down so much?

Why would this slow the query down so much?

From
Stuart Grimshaw
Date:
I have 3 tables that I am trying to join together: 

------------------------------------------ Table "caturljoin"Attribute |  Type   |
-----------+---------+category  | integer |url       | integer |
Index: caturljoin_url       caturljoin_cat
          Table "stories" Attribute  |          Type          |
-------------+------------------------+urn         | integer                |headline    | character varying
|author     | character varying      |source      | integer                |story       | text                   |added
     | date                   |description | character varying      |displayall  | smallint               |fullurl
|character varying(255) |publish     | smallint               |error       | integer                |sourceurl   |
charactervarying(255) |sourcename  | character varying(100) |rank        | smallint               |
 
Indices: stories_added,        stories_source,        stories_unique_story,        stories_urn_key
          Table "urllist" Attribute   |          Type          |
--------------+------------------------+urn          | integer                |url          | character varying(255)
|friendlyname| character varying(30)  |homepage     | character varying(255) |method       | smallint
|script      | character varying(20)  |params       | character varying(500) |collect      | smallint
|section     | smallint               |index_script | character varying      |regexp       | character varying(100)
|baseurl     | character varying(75)  |
 
Index: urllist_urn
------------------------------------------

With the following SQL:

------------------------------------------
SELECT a.category, b.headline, b.added, c.friendlyname
FROM caturljoin as a       INNER JOIN stories as b ON (a.url = b.source)       INNER JOIN urllist as c ON (a.url =
d.urn)
WHERE a.category = 93 ORDER BY b.added DESC LIMIT 1;
------------------------------------------

The results of explain for the above are:

------------------------------------------
psql:scratch.sql:5: NOTICE:  QUERY PLAN:
Limit  (cost=1587.30..1587.30 rows=1 width=44) ->  Sort  (cost=1587.30..1587.30 rows=1 width=44)       ->  Merge Join
(cost=249.89..1587.29rows=1 width=44)             ->  Sort  (cost=249.89..249.89 rows=409 width=28)
-> Nested Loop  (cost=0.00..232.15 rows=409 width=28)                         ->  Index Scan using caturljoin_cat on
caturljoin
 
a  (cost=0.00..5.09 rows=7 width=8)                         ->  Index Scan using stories_source on stories b  
(cost=0.00..34.41 rows=29 width=20)             ->  Index Scan using urllist_urn on urllist c  
(cost=0.00..1323.69 rows=505 width=16)
EXPLAIN
------------------------------------------

and as you might be able to guess the query takes an age to complete.

If I remove the table urllist from the query, I get a much better response:

------------------------------------------
psql:scratch.sql:4: NOTICE:  QUERY PLAN:
Limit  (cost=0.00..207.74 rows=1 width=28) ->  Nested Loop  (cost=0.00..84945.18 rows=409 width=28)       ->  Index
ScanBackward using stories_added on stories b  
 
(cost=0.00..2310.04 rows=16149 width=20)       ->  Index Scan using caturljoin_url on caturljoin a  
(cost=0.00..5.10 rows=1 width=8)
EXPLAIN
------------------------------------------

Currently the tables contain the following rows of data:

------------------------------------------
caturljoin: 653 rows
urllist: 505 rows
stories: 21554 rows
------------------------------------------

Can anyone tell me why the inclusion of urllist would slow it down so much, 
and what can I do to improve the speed of the query?


Re: Why would this slow the query down so much?

From
Tom Lane
Date:
Stuart Grimshaw <nospam@smgsystems.co.uk> writes:
> SELECT a.category, b.headline, b.added, c.friendlyname
> FROM caturljoin as a
>         INNER JOIN stories as b ON (a.url = b.source)
>         INNER JOIN urllist as c ON (a.url = d.urn)
> WHERE a.category = 93 ORDER BY b.added DESC LIMIT 1;

(I assume "d.urn" is a typo for "c.urn"...)

The query plan you show looks pretty reasonable if the planner's row
count estimates are in the right ballpark.  How many caturljoin rows
have category = 93?  How many stories rows will match each caturljoin
row?  How many urllist rows ditto?
        regards, tom lane


Re: Why would this slow the query down so much?

From
"Josh Berkus"
Date:
Stuart,

> ------------------------------------------
> SELECT a.category, b.headline, b.added, c.friendlyname
> FROM caturljoin as a
>         INNER JOIN stories as b ON (a.url = b.source)
>         INNER JOIN urllist as c ON (a.url = d.urn)
> WHERE a.category = 93 ORDER BY b.added DESC LIMIT 1;
> ------------------------------------------

Hmmm?  Where did table "d" come from?  If this is really your query,
that's your problem.

-Josh

______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: Why would this slow the query down so much?

From
Stuart Grimshaw
Date:
On Monday 15 October 2001 16:12 pm, Tom Lane wrote:
> Stuart Grimshaw <nospam@smgsystems.co.uk> writes:
> > SELECT a.category, b.headline, b.added, c.friendlyname
> > FROM caturljoin as a
> >         INNER JOIN stories as b ON (a.url = b.source)
> >         INNER JOIN urllist as c ON (a.url = d.urn)
> > WHERE a.category = 93 ORDER BY b.added DESC LIMIT 1;
>
> (I assume "d.urn" is a typo for "c.urn"...)
>
> The query plan you show looks pretty reasonable if the planner's row
> count estimates are in the right ballpark.  How many caturljoin rows
> have category = 93?  How many stories rows will match each caturljoin
> row?  How many urllist rows ditto?

There are 194 rows in caturljoin where url = 93, 29806 rows in stories will 
match those 194 rows and only 1 row in urllist will match.

-- 

| Stuart Grimshaw <stuart@footballnet.com>
| Chief Operations Officer
| Football Networks Ltd
|-
| t:07976 625221
| f:0870 7060260


Re: Why would this slow the query down so much?

From
Masaru Sugawara
Date:
On Tue, 16 Oct 2001 17:58:32 +0100
Stuart Grimshaw wrote:

> On Monday 15 October 2001 16:12 pm, Tom Lane wrote:
> > Stuart Grimshaw <nospam@smgsystems.co.uk> writes:
> > > SELECT a.category, b.headline, b.added, c.friendlyname
> > > FROM caturljoin as a
> > >         INNER JOIN stories as b ON (a.url = b.source)
> > >         INNER JOIN urllist as c ON (a.url = d.urn)
> > > WHERE a.category = 93 ORDER BY b.added DESC LIMIT 1;
> >
> > (I assume "d.urn" is a typo for "c.urn"...)
> >
> > The query plan you show looks pretty reasonable if the planner's row
> > count estimates are in the right ballpark.  How many caturljoin rows
> > have category = 93?  How many stories rows will match each caturljoin
> > row?  How many urllist rows ditto?
> 
> There are 194 rows in caturljoin where url = 93, 29806 rows in stories will 
> match those 194 rows and only 1 row in urllist will match.
> 
If it's convenient, would you try to delete some indices of the "stories" table?  the total number of sorts on theQUERY
PLANmight decrease.  However, this trial may be a vain effort. I can't expect the result of the QUERY PLAN.  :-)The
indices:"stories_source",             "stories_unique_story",             and "stories_urn_key"
 


Regards,
Masaru Sugawara