sub-query optimization - Mailing list pgsql-sql
From | Brad Hilton |
---|---|
Subject | sub-query optimization |
Date | |
Msg-id | 1045246939.29966.22.camel@aragorn.vpop.net Whole thread Raw |
Responses |
Re: sub-query optimization
Re: sub-query optimization Re: sub-query optimization Re: sub-query optimization Re: sub-query optimization |
List | pgsql-sql |
Hello, I am hoping someone can help explain why modifying the following query can effect such a huge change in speed. The query is: select * from articles where exists ( select 1 from article_categories where article_categories.article_id= articles.id and article_categories.category_id is null ) The original query was much more complex, but I have trimmed it down to highlight the problem. The query above also manifests the problem. OK, the above query (with 100,000 records in the articles table) takes 1292 msec (see output below). If I modify the query slightly: -------- select 1 from article_categories --> select 1 from articles, article_categories --------- the query takes 98 msec. Now, normally, I would just leave the query at that and enjoy my newfound speed. But, adding that extra table to the inner query only helps when the inner query matches few or no records. In my sample dataset, there are no records where category_id is null. However, if I modify the inner query such that it matches many records, adding that extra table to the inner query has the opposite effect. It kills the speed. I am wondering if there is a way to write a query that performs effeciently, regardless of the number of records the inner query matches. Can anyone offer any help? Thanks! -Brad ============================================ TABLE STRUCTURE: --------------- create table articles ( id serial primary key ); create index articles_publish_time_key on articles (publish_time); create table categories (id serial primary key ); create table article_categories ( article_id int not null references articles, category_id int not null references categories,primary key(article_id, category_id) ); create index article_categories_article_id_key on article_categories (article_id); ---------------------------------------- EXPLAIN ANALYZE OUTPUT -------------------------- (FAST QUERY)Result (cost=0.00..4217.00 rows=100000 width=187) (actual time=98.00..98.00 rows=0 loops=1) One-Time Filter: $0 InitPlan -> Nested Loop (cost=0.00..2984.03 rows=1 width=8)(actual time=97.98..97.98 rows=0 loops=1) -> Seq Scan on article_categories (cost=0.00..2981.00 rows=1 width=4) (actual time=97.98..97.98 rows=0 loops=1) Filter: (category_id IS NULL) -> IndexScan using articles_pkey on articles (cost=0.00..3.01 rows=1 width=4) (never executed) Index Cond: ("outer".article_id = articles.id) -> SeqScan on articles (cost=0.00..4217.00 rows=100000 width=187) (never executed)Total runtime: 98.24 msec (10 rows) ------------------------------------------------ SLOW QUERY: -------------Seq Scan on articles (cost=0.00..306827.16 rows=50000 width=187) (actual time=1292.48..1292.48 rows=0 loops=1) Filter: (subplan) SubPlan -> Index Scan using article_categories_article_id_keyon article_categories (cost=0.00..3.03 rows=1 width=0) (actual time=0.01..0.01 rows=0 loops=100000) Index Cond: (article_id = $0) Filter: (category_id IS NULL)Total runtime:1292.68 msec (7 rows)