Thread: performance help
Hello all, In writing a perl/DBI based application for our customers, we noticed some very intersting behavoir. Against 2 tables, running a select, when you we do a WHERE clause with a lot of items in it ... the query is fast. When we do it with just one or two items, its hugely slower! Enough so that postgres goes to 100Megs of memory to do the query, and the box begins to swap out! Here are the details: Pentium III 550Mhz 128Mb Ram 20Meg IDE disk Postgres 7.0.0 (Should upgrade to 7.0.2?) Installed and compiled by me, but no compiler tricks Here is the SQL thats fast: cw=# SELECT distinct n.news_id, headline, link, to_char(created, 'mm-dd-yyyy hh24:mi'), cw-# created FROM news_article_summary n, news_cat nc WHERE n.news_id = nc.news_id AND cw-# created > CURRENT_TIMESTAMP-30 AND nc.code_id cw-# in(14,227,326,321,327,4301,5179,5100,585,5175,567,5185,568,5381,5109,554,5621,5 462, cw(# 597,5324,5117,569,5142,570,5327,571,5167,5481,5145) ORDER BY created desc; Here is the SQL that's slow: cw=# SELECT distinct n.news_id, headline, link, cw-# to_char(created, 'mm-dd-yyyy hh24:mi'), cw-# created FROM news_article_summary n, news_cat nc cw-# WHERE n.news_id = nc.news_id AND created > CURRENT_TIMESTAMP-30 AND cw-# nc.code_id in(4261,4182) ORDER BY created desc; Here's an explain on the above query: NOTICE: QUERY PLAN: Unique (cost=60322.14..60559.66 rows=1900 width=48) -> Sort (cost=60322.14..60322.14 rows=19001 width=48) -> Nested Loop (cost=0.00..58651.80 rows=19001 width=48) -> Seq Scan on news_article_summary n (cost=0.00..416.14 rows=1898 width=36) -> Index Scan using news_cat_news_id on news_cat nc (cost=0.00..30.53 rows=10 width=12) EXPLAIN (I ran this, but I'm not really sure what any of it means! Is there an explanation somewhere). Here is news_article_summary and news_cat cw=# \d news_article_summary Table "news_article_summary" Attribute | Type | Modifier ------------------+---------------+---------- news_id | integer | not null headline | varchar(255) | brief | varchar(4000) | top_of_news | varchar(1) | urgent | varchar(1) | created | timestamp | link | varchar(255) | region_id | integer | ftfl | varchar(1) | service_mark | varchar(1) | syndication_file | varchar(12) | ufs_file | varchar(16) | Indices: news_article_summary_news_id, x1 Constraint: (news_id NOTNULL) cw=# \d news_cat Table "news_cat" Attribute | Type | Modifier -----------+--------------+---------- news_id | numeric(7,0) | not null code_id | numeric(7,0) | not null Indices: news_cat_code_id, news_cat_news_id Let me know if more info would be helpfull... I'd appreciate any pointers anyone could give me, I'm new to this RDBMS stuff. Oh,here are my postmaster options PGOPTS="-B 256 -N 64 -S 2048" Thanks! -ernie
Ernie <ernie.cline@ipgdirect.com> writes: > Here's an explain on the above query: Um, *which* query was that for? And what's the EXPLAIN output for the other query? regards, tom lane
Hey, Sorry I guess I should have been more specific. I should note that these query's are NOT trying to get the same data ... This query is very fast. cw=# SELECT distinct n.news_id, headline, link, to_char(created, 'mm-dd-yyyyhh24:mi'), cw-# created FROM news_article_summary n, news_cat nc WHERE n.news_id = nc.news_id AND cw-# created > CURRENT_TIMESTAMP-30 AND nc.code_id cw-# in(14,227,326,321,327,4301,5179,5100,585,5175,567,5185,568,5381,5109,554,5621,5 462, cw(# 597,5324,5117,569,5142,570,5327,571,5167,5481,5145) ORDER BY created desc; Here is its EXPLAIN: NOTICE: QUERY PLAN: Unique (cost=60322.14..60559.66 rows=1900 width=48) -> Sort (cost=60322.14..60322.14 rows=19001 width=48) -> Nested Loop (cost=0.00..58651.80 rows=19001 width=48) -> Seq Scan on news_article_summary n (cost=0.00..416.14 rows=1898 width=36) -> Index Scan using news_cat_news_id on news_cat nc (cost=0.00..30.53 rows=10 width=12) And here is the query I'm having problems with (slow) cw=# SELECT distinct n.news_id, headline, link, cw-# to_char(created, 'mm-dd-yyyy hh24:mi'), cw-# created FROM news_article_summary n, news_cat nc cw-# WHERE n.news_id = nc.news_id AND created > CURRENT_TIMESTAMP-30 AND cw-# nc.code_id in(4261,4182) ORDER BY created desc; And here is the explain output for that query: NOTICE: QUERY PLAN: Unique (cost=35162.58..35181.27 rows=150 width=48) -> Sort (cost=35162.58..35162.58 rows=1496 width=48) -> Nested Loop (cost=0.00..35083.71 rows=1496 width=48) -> Index Scan using news_cat_code_id, news_cat_code_id on news_cat nc (cost=0.00..55.31 rows=79 width=12) -> Seq Scan on news_article_summary n (cost=0.00..416.14 rows=1898 width=36) Here are my table definitions: cw=# \d news_article_summary Table "news_article_summary" Attribute | Type | Modifier ------------------+---------------+---------- news_id | integer | not null headline | varchar(255) | brief | varchar(4000) | top_of_news | varchar(1) | urgent | varchar(1) | created | timestamp | link | varchar(255) | region_id | integer | ftfl | varchar(1) | service_mark | varchar(1) | syndication_file | varchar(12) | ufs_file | varchar(16) | Indices: news_article_summary_news_id, x1 Constraint: (news_id NOTNULL) cw=# \d news_cat Table "news_cat" Attribute | Type | Modifier -----------+--------------+---------- news_id | numeric(7,0) | not null code_id | numeric(7,0) | not null Indices: news_cat_code_id, news_cat_news_id PGOPTS="-B 256 -N 64 -S 2048" The problem is is that the 'slow' query causes Postgres to use all available memory until it gets the query output. The other strange thing is that when I executed the 'slow' query, it was relatively fast again returning data in about 10 seconds ... but 2 days ago when I ran it, it took like 130 seconds ... Any help would be appreciated! -ernie On Wed, 26 Jul 2000 17:42:09 -0400, Tom Lane said: > Ernie <ernie.cline@ipgdirect.com> writes: > > Here's an explain on the above query: > > Um, *which* query was that for? And what's the EXPLAIN output for > the other query? > > regards, tom lane >
Ernie <ernie.cline@ipgdirect.com> writes: > This query is very fast. > > cw=# SELECT distinct n.news_id, headline, link, to_char(created, > 'mm-dd-yyyyhh24:mi'), > cw-# created FROM news_article_summary n, news_cat nc WHERE n.news_id = > nc.news_id AND > cw-# created > CURRENT_TIMESTAMP-30 AND nc.code_id > cw-# > in(14,227,326,321,327,4301,5179,5100,585,5175,567,5185,568,5381,5109,554,5621,5 > 462, > cw(# 597,5324,5117,569,5142,570,5327,571,5167,5481,5145) ORDER BY created desc; > > Unique (cost=60322.14..60559.66 rows=1900 width=48) > -> Sort (cost=60322.14..60322.14 rows=19001 width=48) > -> Nested Loop (cost=0.00..58651.80 rows=19001 width=48) > -> Seq Scan on news_article_summary n (cost=0.00..416.14 > rows=1898 width=36) > -> Index Scan using news_cat_news_id on news_cat nc > (cost=0.00..30.53 rows=10 width=12) > > And here is the query I'm having problems with (slow) > cw=# SELECT distinct n.news_id, headline, link, > cw-# to_char(created, 'mm-dd-yyyy hh24:mi'), > cw-# created FROM news_article_summary n, news_cat nc > cw-# WHERE n.news_id = nc.news_id AND created > CURRENT_TIMESTAMP-30 AND > cw-# nc.code_id in(4261,4182) ORDER BY created desc; > > Unique (cost=35162.58..35181.27 rows=150 width=48) > -> Sort (cost=35162.58..35162.58 rows=1496 width=48) > -> Nested Loop (cost=0.00..35083.71 rows=1496 width=48) > -> Index Scan using news_cat_code_id, news_cat_code_id on > news_cat nc (cost=0.00..55.31 rows=79 width=12) > -> Seq Scan on news_article_summary n (cost=0.00..416.14 > rows=1898 width=36) The difference evidently is which table is scanned as the outside of the nested loop. The first plan says "scan news_article_summary sequentially, and for each row that passes the WHERE clauses that mention only that table, probe into news_cat_news_id for the row(s) that match by news_id; then check the remaining WHERE clauses on the combined row(s)." The second plan says "probe into news_cat_news_id for the row(s) that have the requested code_id values, and for each one scan news_article_summary sequentially to locate matching rows." If there are a lot of matches for nc.code_id in(4261,4182) then the second plan would scan news_article_summary many times, which'd account for it being slow. Are news_id and code_id unique columns? I am guessing from context that at least news_id might be. The planner does not seem to know that, judging from its row-count guesses. Perhaps all you need is a VACUUM ANALYZE so that the planner becomes aware the column is unique. regards, tom lane