Thread: performance help

performance help

From

Ernie

Date:

26 July 2000, 16:01:09

Hello all,
In writing a perl/DBI based application for our customers, we noticed some very
intersting behavoir.  Against 2 tables, running a select, when you we do a
WHERE clause with a lot of items in it ... the query is fast.  When we do it
with just one or two items, its hugely slower!    Enough so that postgres goes to
100Megs of memory to do the query, and the box begins to swap out!  Here are
the details:

Pentium III 550Mhz
128Mb Ram
20Meg IDE disk

Postgres 7.0.0 (Should upgrade to 7.0.2?)
Installed and compiled by me, but no compiler tricks

Here is the SQL thats fast:

cw=# SELECT distinct n.news_id, headline, link, to_char(created, 'mm-dd-yyyy
hh24:mi'),
cw-# created FROM news_article_summary n, news_cat nc WHERE n.news_id =
nc.news_id AND
cw-# created > CURRENT_TIMESTAMP-30 AND nc.code_id
cw-#
in(14,227,326,321,327,4301,5179,5100,585,5175,567,5185,568,5381,5109,554,5621,5
462,
cw(# 597,5324,5117,569,5142,570,5327,571,5167,5481,5145) ORDER BY created desc;

Here is the SQL that's slow:
cw=# SELECT distinct n.news_id, headline, link,
cw-#  to_char(created, 'mm-dd-yyyy hh24:mi'),
cw-# created FROM news_article_summary n, news_cat nc
cw-# WHERE n.news_id = nc.news_id AND created > CURRENT_TIMESTAMP-30 AND
cw-# nc.code_id in(4261,4182) ORDER BY created desc;

Here's an explain on the above query:

NOTICE:  QUERY PLAN:

Unique    (cost=60322.14..60559.66 rows=1900 width=48)
  ->  Sort  (cost=60322.14..60322.14 rows=19001 width=48)
    ->  Nested Loop  (cost=0.00..58651.80 rows=19001 width=48)
          ->  Seq Scan on news_article_summary n  (cost=0.00..416.14
rows=1898 width=36)
          ->  Index Scan using news_cat_news_id on news_cat nc
(cost=0.00..30.53 rows=10 width=12)

EXPLAIN
(I ran this, but I'm not really sure what any of it means!  Is there an
explanation somewhere).

Here is news_article_summary and news_cat

cw=# \d news_article_summary
    Table "news_article_summary"
    Attribute      |    Type      | Modifier
------------------+---------------+----------
 news_id      | integer      | not null
 headline      | varchar(255)  |
 brief          | varchar(4000) |
 top_of_news      | varchar(1)      |
 urgent       | varchar(1)      |
 created      | timestamp      |
 link          | varchar(255)  |
 region_id      | integer      |
 ftfl          | varchar(1)      |
 service_mark      | varchar(1)      |
 syndication_file | varchar(12)   |
 ufs_file      | varchar(16)   |
Indices: news_article_summary_news_id,
     x1
Constraint: (news_id NOTNULL)

cw=# \d news_cat
      Table "news_cat"
 Attribute |     Type      | Modifier
-----------+--------------+----------
 news_id   | numeric(7,0) | not null
 code_id   | numeric(7,0) | not null
Indices: news_cat_code_id,
     news_cat_news_id


Let me know if more info would be helpfull... I'd appreciate any pointers
anyone could give me, I'm new to this RDBMS stuff.  Oh,here are my postmaster
options

PGOPTS="-B 256 -N 64 -S 2048"

Thanks!

-ernie

Re: performance help

From

Tom Lane

Date:

26 July 2000, 17:42:16

Ernie <ernie.cline@ipgdirect.com> writes:
> Here's an explain on the above query:

Um, *which* query was that for?  And what's the EXPLAIN output for
the other query?

            regards, tom lane

Re: performance help

From

Ernie

Date:

28 July 2000, 10:42:59

Hey,
Sorry I guess I should have been more specific.  I should note that these
query's are NOT trying to get the same data ...

This query is very fast.

cw=# SELECT distinct n.news_id, headline, link, to_char(created,
'mm-dd-yyyyhh24:mi'),
cw-# created FROM news_article_summary n, news_cat nc WHERE n.news_id =
nc.news_id AND
cw-# created > CURRENT_TIMESTAMP-30 AND nc.code_id
cw-#
in(14,227,326,321,327,4301,5179,5100,585,5175,567,5185,568,5381,5109,554,5621,5
462,
cw(# 597,5324,5117,569,5142,570,5327,571,5167,5481,5145) ORDER BY created desc;

Here is its EXPLAIN:
NOTICE:  QUERY PLAN:

Unique    (cost=60322.14..60559.66 rows=1900 width=48)
  ->  Sort  (cost=60322.14..60322.14 rows=19001 width=48)
    ->  Nested Loop  (cost=0.00..58651.80 rows=19001 width=48)
          ->  Seq Scan on news_article_summary n  (cost=0.00..416.14
rows=1898 width=36)
          ->  Index Scan using news_cat_news_id on news_cat nc
(cost=0.00..30.53 rows=10 width=12)

And here is the query I'm having problems with (slow)
cw=# SELECT distinct n.news_id, headline, link,
cw-# to_char(created, 'mm-dd-yyyy hh24:mi'),
cw-# created FROM news_article_summary n, news_cat nc
cw-# WHERE n.news_id = nc.news_id AND created > CURRENT_TIMESTAMP-30 AND
cw-# nc.code_id in(4261,4182) ORDER BY created desc;

And here is the explain output for that query:
NOTICE:  QUERY PLAN:

Unique    (cost=35162.58..35181.27 rows=150 width=48)
  ->  Sort  (cost=35162.58..35162.58 rows=1496 width=48)
    ->  Nested Loop  (cost=0.00..35083.71 rows=1496 width=48)
          ->  Index Scan using news_cat_code_id, news_cat_code_id on
news_cat nc  (cost=0.00..55.31 rows=79 width=12)
          ->  Seq Scan on news_article_summary n  (cost=0.00..416.14
rows=1898 width=36)

Here are my table definitions:
cw=# \d news_article_summary
Table "news_article_summary"
Attribute | Type | Modifier
------------------+---------------+----------
news_id | integer | not null
headline | varchar(255) |
brief | varchar(4000) |
top_of_news | varchar(1) |
urgent | varchar(1) |
created | timestamp |
link | varchar(255) |
region_id | integer |
ftfl | varchar(1) |
service_mark | varchar(1) |
syndication_file | varchar(12) |
ufs_file | varchar(16) |
Indices: news_article_summary_news_id, x1
Constraint: (news_id NOTNULL)


cw=# \d news_cat
Table "news_cat"
Attribute | Type | Modifier
-----------+--------------+----------
news_id | numeric(7,0) | not null
code_id | numeric(7,0) | not null
Indices: news_cat_code_id,
news_cat_news_id

PGOPTS="-B 256 -N 64 -S 2048"

The problem is is that the 'slow' query causes Postgres to use all available
memory until it gets the query output.    The other strange thing is that when I
executed the 'slow' query, it was relatively fast again returning data in about
10 seconds ... but 2 days ago when I ran it, it took like 130 seconds ... Any
help would be appreciated!

-ernie


On Wed, 26 Jul 2000 17:42:09 -0400, Tom Lane said:

> Ernie <ernie.cline@ipgdirect.com> writes:
>  > Here's an explain on the above query:
>
>  Um, *which* query was that for?  And what's the EXPLAIN output for
>  the other query?
>
>              regards, tom lane
>

Re: performance help

From

Tom Lane

Date:

28 July 2000, 16:44:54

Ernie <ernie.cline@ipgdirect.com> writes:
> This query is very fast.
>
> cw=# SELECT distinct n.news_id, headline, link, to_char(created,
> 'mm-dd-yyyyhh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc WHERE n.news_id =
> nc.news_id AND
> cw-# created > CURRENT_TIMESTAMP-30 AND nc.code_id
> cw-#
> in(14,227,326,321,327,4301,5179,5100,585,5175,567,5185,568,5381,5109,554,5621,5
> 462,
> cw(# 597,5324,5117,569,5142,570,5327,571,5167,5481,5145) ORDER BY created desc;
>
> Unique    (cost=60322.14..60559.66 rows=1900 width=48)
>   ->  Sort  (cost=60322.14..60322.14 rows=19001 width=48)
>     ->  Nested Loop  (cost=0.00..58651.80 rows=19001 width=48)
>           ->  Seq Scan on news_article_summary n  (cost=0.00..416.14
> rows=1898 width=36)
>           ->  Index Scan using news_cat_news_id on news_cat nc
> (cost=0.00..30.53 rows=10 width=12)
>
> And here is the query I'm having problems with (slow)
> cw=# SELECT distinct n.news_id, headline, link,
> cw-# to_char(created, 'mm-dd-yyyy hh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc
> cw-# WHERE n.news_id = nc.news_id AND created > CURRENT_TIMESTAMP-30 AND
> cw-# nc.code_id in(4261,4182) ORDER BY created desc;
>
> Unique    (cost=35162.58..35181.27 rows=150 width=48)
>   ->  Sort  (cost=35162.58..35162.58 rows=1496 width=48)
>     ->  Nested Loop  (cost=0.00..35083.71 rows=1496 width=48)
>           ->  Index Scan using news_cat_code_id, news_cat_code_id on
> news_cat nc  (cost=0.00..55.31 rows=79 width=12)
>           ->  Seq Scan on news_article_summary n  (cost=0.00..416.14
> rows=1898 width=36)

The difference evidently is which table is scanned as the outside of the
nested loop.  The first plan says "scan news_article_summary
sequentially, and for each row that passes the WHERE clauses that
mention only that table, probe into news_cat_news_id for the row(s)
that match by news_id; then check the remaining WHERE clauses on the
combined row(s)."  The second plan says "probe into news_cat_news_id
for the row(s) that have the requested code_id values, and for each
one scan news_article_summary sequentially to locate matching rows."

If there are a lot of matches for nc.code_id in(4261,4182) then the
second plan would scan news_article_summary many times, which'd account
for it being slow.

Are news_id and code_id unique columns?  I am guessing from context
that at least news_id might be.  The planner does not seem to know
that, judging from its row-count guesses.  Perhaps all you need is
a VACUUM ANALYZE so that the planner becomes aware the column is
unique.

            regards, tom lane