LIMIT causes SEQSCAN in subselect - Mailing list pgsql-performance
From | Mike Rylander |
---|---|
Subject | LIMIT causes SEQSCAN in subselect |
Date | |
Msg-id | b918cf3d04121010405e6c963c@mail.gmail.com Whole thread Raw |
Responses |
Re: LIMIT causes SEQSCAN in subselect
|
List | pgsql-performance |
First off, WOO-HOO! The lists are back and I can finally get my PG fix!!! Now, on to the business at hand... I have four query plans below, the first two help explain my question, and the last two are about a speculative alternative. The first query use a subselects that are generated from a middleware layer and are then joined to create the final result set. In order to keep total execution time down, the middleware imposes a LIMIT clause on each. I'm using the fact that Postgres can elevate a subselect-join to a simple join when there are no aggregates involved and I think I remember there has been some work recently on elevating subselects that contain a LIMIT, so I went back and ran the plans without the LIMITs to see what would happen. Well, the limit killed the subselect elevation. I'm not too worried about the relative execution times since it's very fast, but more curious about the plan that was chosen. It seems that the planner knows that the results from subselect 'b' will contain just one row due to the fact that the index it is scanning is unique. Would it not make sense to discard the LIMIT clause on that subselect? That would result in the third plan, which has better performance than the generated query, and is guaranteed to return the same results since the index in use is unique. Also, wouldn't it make sense for subselect 'a' to be elevated sans LIMIT just to see if there is a unique index it might be able to use? I realize this is a rather specialized case and not really great form. But because PG can, in some cases, elevate subselects, writing middleware to join results becomes pretty easy. Just a matter of defining result sets independently, and creating a simple wrapper to join them. In any case, I will probably end up just detecting the subselect condition in the middleware and drop the limit when there are some WHERE clauses on the inner query. I just thought I'd bring up a possible optimization for the future, and was curious what the gurus might have to say! -- Version info and queries in question. oils4=# select version(); version --------------------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 8.0.0beta4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.3.4 20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6) (1 row) -- query 1: the query generated by middleware oils4=# EXPLAIN ANALYZE select a.record, b.control from (select * from biblio.record where id = 100000 limit 1000) b, (select * from biblio.metarecord_field_entry limit 1000) a where a.source = b.id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------ Hash Join (cost=3.68..44.49 rows=5 width=40) (actual time=2.066..2.066 rows=0 loops=1) Hash Cond: ("outer".source = "inner".id) -> Subquery Scan a (cost=0.00..35.75 rows=1000 width=16) (actual time=0.005..1.295 rows=1000 loops=1) -> Limit (cost=0.00..25.75 rows=1000 width=87) (actual time=0.003..0.641 rows=1000 loops=1) -> Seq Scan on metarecord_field_entry (cost=0.00..43379.75 rows=1684575 width=87) (actual time=0.003..0.435 rows=1000 loops=1) -> Hash (cost=3.68..3.68 rows=1 width=40) (actual time=0.039..0.039 rows=0 loops=1) -> Subquery Scan b (cost=0.00..3.68 rows=1 width=40) (actual time=0.031..0.033 rows=1 loops=1) -> Limit (cost=0.00..3.67 rows=1 width=1070) (actual time=0.029..0.030 rows=1 loops=1) -> Index Scan using biblio_record_pkey on record (cost=0.00..3.67 rows=1 width=1070) (actual time=0.027..0.028 rows=1 loops=1) Index Cond: (id = 100000) Total runtime: 2.171 ms (11 rows) -- query 2: the fast query, no limit allows elevation of subselects oils4=# EXPLAIN ANALYZE select a.record, b.control from (select * from biblio.record where id = 100000) b, (select * from biblio.metarecord_field_entry) a where a.source = b.id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..19.95 rows=9 width=22) (actual time=0.043..0.055 rows=7 loops=1) -> Index Scan using biblio_record_pkey on record (cost=0.00..3.67 rows=1 width=22) (actual time=0.025..0.026 rows=1 loops=1) Index Cond: (id = 100000) -> Index Scan using metarecord_field_entry_source_idx on metarecord_field_entry (cost=0.00..16.19 rows=9 width=16) (actual time=0.011..0.018 rows=7 loops=1) Index Cond: (source = 100000) Total runtime: 0.101 ms (6 rows) -- query 3: if we were to drop the limit, since we're using a unique index oils4=# EXPLAIN ANALYZE select a.record, b.control from (select * from biblio.record where id = 100000) b, (select * from biblio.metarecord_field_entry limit 1000) a where a.source = b.id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..41.97 rows=5 width=22) (actual time=1.169..1.169 rows=0 loops=1) -> Index Scan using biblio_record_pkey on record (cost=0.00..3.67 rows=1 width=22) (actual time=0.036..0.038 rows=1 loops=1) Index Cond: (id = 100000) -> Subquery Scan a (cost=0.00..38.25 rows=5 width=16) (actual time=1.126..1.126 rows=0 loops=1) Filter: (source = 100000) -> Limit (cost=0.00..25.75 rows=1000 width=87) (actual time=0.005..0.673 rows=1000 loops=1) -> Seq Scan on metarecord_field_entry (cost=0.00..43379.75 rows=1684575 width=87) (actual time=0.004..0.424 rows=1000 loops=1) Total runtime: 1.243 ms (8 rows) -- query 4: what I would like the seqscan in query 3 to become... oils4=# EXPLAIN ANALYZE select * from biblio.metarecord_field_entry where source = 100000 limit 1000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..16.19 rows=9 width=87) (actual time=0.026..0.035 rows=7 loops=1) -> Index Scan using metarecord_field_entry_source_idx on metarecord_field_entry (cost=0.00..16.19 rows=9 width=87) (actual time=0.025..0.032 rows=7 loops=1) Index Cond: (source = 100000) Total runtime: 0.069 ms (4 rows) -- Mike Rylander mrylander@gmail.com GPLS -- PINES Development Database Developer http://open-ils.org
pgsql-performance by date: