Thread: LIMIT causes SEQSCAN in subselect

From: Mike Rylander
First off, WOO-HOO!  The lists are back and I can finally get my PG
fix!!!  Now, on to the business at hand...

I have four query plans below: the first two help explain my question,
and the last two are about a speculative alternative.  The first query
uses subselects that are generated by a middleware layer and are
then joined to create the final result set.  In order to keep total
execution time down, the middleware imposes a LIMIT clause on each
subselect.

I'm using the fact that Postgres can elevate a subselect-join to a
simple join when there are no aggregates involved, and I seem to
remember there has been some recent work on elevating subselects
that contain a LIMIT, so I went back and ran the plans without the
LIMITs to see what would happen.  Well, the LIMIT killed the subselect
elevation.  I'm not too worried about the relative execution times,
since both are very fast; I'm more curious about the plan that was chosen.

It seems that the planner knows that the results from subselect 'b'
will contain just one row due to the fact that the index it is
scanning is unique.  Would it not make sense to discard the LIMIT
clause on that subselect?  That would result in the third plan, which
has better performance than the generated query, and is guaranteed to
return the same results since the index in use is unique.  Also,
wouldn't it make sense for subselect 'a' to be elevated sans LIMIT
just to see if there is a unique index it might be able to use?

I realize this is a rather specialized case and not really great form.
But because PG can, in some cases, elevate subselects, writing
middleware to join results becomes pretty easy: it's just a matter of
defining result sets independently and creating a simple wrapper to
join them.

In any case, I will probably end up just detecting the subselect
condition in the middleware and dropping the LIMIT when there are
WHERE clauses on the inner query.  I just thought I'd bring up a
possible optimization for the future, and I was curious what the gurus
might have to say!



--  Version info and queries in question.

oils4=# select version();

version

---------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.0.0beta4 on x86_64-unknown-linux-gnu, compiled by GCC
gcc (GCC) 3.3.4 20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2,
pie-8.7.6)
(1 row)


-- query 1:  the query generated by middleware

oils4=# EXPLAIN ANALYZE select a.record, b.control from (select * from
biblio.record where id = 100000 limit 1000) b, (select * from
biblio.metarecord_field_entry limit 1000) a where a.source = b.id;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3.68..44.49 rows=5 width=40) (actual
time=2.066..2.066 rows=0 loops=1)
   Hash Cond: ("outer".source = "inner".id)
   ->  Subquery Scan a  (cost=0.00..35.75 rows=1000 width=16) (actual
time=0.005..1.295 rows=1000 loops=1)
         ->  Limit  (cost=0.00..25.75 rows=1000 width=87) (actual
time=0.003..0.641 rows=1000 loops=1)
               ->  Seq Scan on metarecord_field_entry
(cost=0.00..43379.75 rows=1684575 width=87) (actual time=0.003..0.435
rows=1000 loops=1)
   ->  Hash  (cost=3.68..3.68 rows=1 width=40) (actual
time=0.039..0.039 rows=0 loops=1)
         ->  Subquery Scan b  (cost=0.00..3.68 rows=1 width=40)
(actual time=0.031..0.033 rows=1 loops=1)
               ->  Limit  (cost=0.00..3.67 rows=1 width=1070) (actual
time=0.029..0.030 rows=1 loops=1)
                     ->  Index Scan using biblio_record_pkey on record
 (cost=0.00..3.67 rows=1 width=1070) (actual time=0.027..0.028 rows=1
loops=1)
                           Index Cond: (id = 100000)
 Total runtime: 2.171 ms
(11 rows)


-- query 2: the fast query; with no LIMIT, the subselects are elevated

oils4=# EXPLAIN ANALYZE select a.record, b.control from (select * from
biblio.record where id = 100000) b, (select * from
biblio.metarecord_field_entry) a where a.source = b.id;

     QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..19.95 rows=9 width=22) (actual
time=0.043..0.055 rows=7 loops=1)
   ->  Index Scan using biblio_record_pkey on record  (cost=0.00..3.67
rows=1 width=22) (actual time=0.025..0.026 rows=1 loops=1)
         Index Cond: (id = 100000)
   ->  Index Scan using metarecord_field_entry_source_idx on
metarecord_field_entry  (cost=0.00..16.19 rows=9 width=16) (actual
time=0.011..0.018 rows=7 loops=1)
         Index Cond: (source = 100000)
 Total runtime: 0.101 ms
(6 rows)



-- query 3:  if we were to drop the limit, since we're using a unique index

oils4=# EXPLAIN ANALYZE select a.record, b.control from (select * from
biblio.record where id = 100000) b, (select * from
biblio.metarecord_field_entry limit 1000) a where a.source = b.id;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..41.97 rows=5 width=22) (actual
time=1.169..1.169 rows=0 loops=1)
   ->  Index Scan using biblio_record_pkey on record  (cost=0.00..3.67
rows=1 width=22) (actual time=0.036..0.038 rows=1 loops=1)
         Index Cond: (id = 100000)
   ->  Subquery Scan a  (cost=0.00..38.25 rows=5 width=16) (actual
time=1.126..1.126 rows=0 loops=1)
         Filter: (source = 100000)
         ->  Limit  (cost=0.00..25.75 rows=1000 width=87) (actual
time=0.005..0.673 rows=1000 loops=1)
               ->  Seq Scan on metarecord_field_entry
(cost=0.00..43379.75 rows=1684575 width=87) (actual time=0.004..0.424
rows=1000 loops=1)
 Total runtime: 1.243 ms
(8 rows)



-- query 4:  what I would like the seqscan in query 3 to become...

oils4=# EXPLAIN ANALYZE select * from biblio.metarecord_field_entry
where source = 100000 limit 1000;

     QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..16.19 rows=9 width=87) (actual time=0.026..0.035
rows=7 loops=1)
   ->  Index Scan using metarecord_field_entry_source_idx on
metarecord_field_entry  (cost=0.00..16.19 rows=9 width=87) (actual
time=0.025..0.032 rows=7 loops=1)
         Index Cond: (source = 100000)
 Total runtime: 0.069 ms
(4 rows)

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: LIMIT causes SEQSCAN in subselect

From: Josh Berkus
Mike,

> I'm using the fact that Postgres can elevate a subselect-join to a
> simple join when there are no aggregates involved and I think I
> remember there has been some work recently on elevating subselects
> that contain a LIMIT, so I went back and ran the plans without the
> LIMITs to see what would happen.  Well, the limit killed the subselect
> elevation.

Actually, this makes sense.   A LIMIT requires the data to be ordered first,
and then cut based on the order; it prevents collapsing the subselect into
the main query.   Some sort of materializing is necessary, even in cases like
yours where the limit is inherently meaningless because you've neglected to
use an ORDER BY.
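
A minimal sketch of the difference (the "id" ordering column is an
assumption, since the full schema of metarecord_field_entry isn't shown
in the plans above):

```sql
-- Without ORDER BY, which 1000 rows survive the LIMIT is arbitrary,
-- but the subselect must still be materialized and then cut:
SELECT * FROM biblio.metarecord_field_entry LIMIT 1000;

-- With ORDER BY, the LIMIT is well-defined: the first 1000 rows by id.
SELECT * FROM biblio.metarecord_field_entry ORDER BY id LIMIT 1000;
```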

The fact that the estimator knows that the LIMIT is pointless because there
are fewer rows in the subselect than the LIMIT would return is not something we
want to count on; sometimes the estimator has inaccurate information.  The
UNIQUE index makes this more certain, except that I'm not sure that the
planner distinguishes between actual UNIQUE indexes and columns which are
estimated unique (per the pg_stats).   And I think you can see in your case
that there's quite a difference between a column we're CERTAIN is unique,
versus a column we THINK is unique.

> I realize this is a rather specialized case and not really great form.

Exactly.   You've grasped the main issue: that this has not been optimized
because it's bizarre and not very sensible query writing.   Someday we'll get
around to optimizing the really weird queries, but there's still a lot of
work to be done on the common ones (like count(*) ...).

Keep in mind that the only reason we support LIMIT inside subqueries in the
first place is as a workaround for slow aggregates, and as a way to do RANK.
It's certainly not SQL-standard.
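
Two of those idioms, sketched on 8.0 (the scores table is hypothetical,
named here only for illustration):

```sql
-- Fast MIN() workaround: an ORDER BY ... LIMIT 1 subselect can walk an
-- index where the plain aggregate would seqscan the whole table:
SELECT (SELECT id FROM biblio.record ORDER BY id LIMIT 1) AS min_id;

-- Poor man's RANK: a top-N list via LIMIT in a subselect, against a
-- hypothetical scores(player, points) table:
SELECT s.player, s.points
  FROM (SELECT player, points FROM scores
         ORDER BY points DESC LIMIT 10) s;
```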

> Just a matter of
> defining result sets independently, and creating a simple wrapper to
> join them.

Well, if you think so, you know where to submit patches ...

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: LIMIT causes SEQSCAN in subselect

From: Mike Rylander
On Fri, 10 Dec 2004 21:40:18 -0800, Josh Berkus <josh@agliodbs.com> wrote:
> Mike,
> The fact that the estimator knows that the LIMIT is pointless because there
> are fewer rows in the subselect than the LIMIT would return is not something we
> want to count on; sometimes the estimator has inaccurate information.  The
> UNIQUE index makes this more certain, except that I'm not sure that the
> planner distinguishes between actual UNIQUE indexes and columns which are
> estimated unique (per the pg_stats).   And I think you can see in your case
> that there's quite a difference between a column we're CERTAIN is unique,
> versus a column we THINK is unique.

Absolutely.  At first I was going to ask whether using the stats to
discard the LIMIT would be possible, but since the stats are only
guidelines I dropped that idea.  The stats are just so tempting!

>
> > I realize this is a rather specialized case and not really great form.
>
> Exactly.   You've grasped the main issue: that this has not been optimized
> because it's bizarre and not very sensible query writing.   Someday we'll get
> around to optimizing the really weird queries, but there's still a lot of
> work to be done on the common ones (like count(*) ...).

Absolutely.  And if I can help out with the common cases to gain some
Karmic currency, I will. ;)  After thinking about it some more, though,
I don't think those queries were really all that wacky.  The problem with
the example is that the generated query is very simple; real-world
queries used in the subselects would be much more complex, and row
estimation would be untrustworthy without a UNIQUE index.

>
> Keep in mind that the only reason we support LIMIT inside subqueries in the
> first place is a workaround to slow aggregates, and a way to do RANK.  It's
> certainly not SQL-standard.
>

No it's not, but then nobody ever accused the authors of the SQL spec
of being omniscient...  I can't think of another way to get, say, a
'top 10' list from a subselect, or to use a paging iterator (LIMIT ..
OFFSET ..) as the seed for an outer query.  Well, other than an SRF, of
course.
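
A sketch of the paging-iterator case (page 3 at 50 rows per page; the
"id" ordering column on metarecord_field_entry is an assumption):

```sql
-- The inner LIMIT .. OFFSET page seeds the outer join:
SELECT a.record, b.control
  FROM (SELECT * FROM biblio.metarecord_field_entry
         ORDER BY id LIMIT 50 OFFSET 100) a
  JOIN biblio.record b ON (a.source = b.id);
```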

> > Just a matter of
> > defining result sets independently, and creating a simple wrapper to
> > join them.
>
> Well, if you think so, you know where to submit patches ...
>

Well, I do, but I was talking about it being 'easy' in the middleware.
Just let PG handle optimizing the subselects.

For example, say you have a pile of predefined SELECTs that don't know
they are related and are used for simple lookups.  You tell the SQL
generator thingy that it should use two of those, queries A and B,
that they are related on x, and that you want to see the 'id' from A
and the 'value' from B.  Instead of having to preplan every possible
combination of JOINs, the SQL generator tosses the predefined queries
into subselects and joins them in the outer query, rather than having to
rip them apart and calculate the join syntax.  And yes, I know that a
view will take care of most of that for me... :)
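
Roughly, with two hypothetical predefined lookups (all names here are
made up for illustration):

```sql
-- query A, predefined:  SELECT id, x FROM lookup_a WHERE ...
-- query B, predefined:  SELECT value, x FROM lookup_b WHERE ...
-- The generator just wraps them and joins on x:
SELECT a.id, b.value
  FROM (SELECT id, x FROM lookup_a) a,
       (SELECT value, x FROM lookup_b) b
 WHERE a.x = b.x;
```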

Thanks for all your comments.  Pretty much what I expected, but I
thought I'd raise a use case.  I'll just have to give the query
builder more smarts.

> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>


--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: LIMIT causes SEQSCAN in subselect

From: Tom Lane
Josh Berkus <josh@agliodbs.com> writes:
> The fact that the estimator knows that the LIMIT is pointless because there
> are fewer rows in the subselect than the LIMIT would return is not something we
> want to count on; sometimes the estimator has inaccurate information.

However, when the estimator is basing that estimate on the existence of
a unique index for the column, the estimate could be trusted.  There are
a couple of reasons that we don't perform that optimization at present,
though:

1. If the finished query plan doesn't actually *use* the index in
question, then dropping the index would not directly invalidate the
query plan, but nonetheless the query would be broken.  You could
subsequently get silently-wrong answers.

2. For the particular point at hand, there's an implementation problem,
which is that decisions about whether to flatten subqueries are taken
before we do any rowcount estimation.  So even if we discarded the LIMIT
clause once we realized it was redundant, it'd be too late to get the
optimal overall plan.

Point #1 is something I would like to fix whenever we get around to
implementing proper invalidation of cached plans.  There would need to
be a way to list "indirect" as well as direct dependencies of a plan.

            regards, tom lane

Re: LIMIT causes SEQSCAN in subselect

From: Pierre-Frédéric Caillaud
> The fact that the estimator knows that the LIMIT is pointless because there
> are fewer rows in the subselect than the LIMIT would return is not something we
> want to count on; sometimes the estimator has inaccurate information.  The
> UNIQUE index makes this more certain, except that I'm not sure that the
> planner distinguishes between actual UNIQUE indexes and columns which are
> estimated unique (per the pg_stats).   And I think you can see in your case
> that there's quite a difference between a column we're CERTAIN is unique,
> versus a column we THINK is unique.

    I think a UNIQUE constraint can permit several 'different' NULL values...
better to say "UNIQUE NOT NULL"?
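
To illustrate (standard PostgreSQL behavior: NULLs never compare equal,
so a plain UNIQUE constraint accepts any number of them):

```sql
CREATE TABLE t (id int UNIQUE);
INSERT INTO t VALUES (NULL);
INSERT INTO t VALUES (NULL);   -- succeeds: NULL is not equal to NULL
-- UNIQUE NOT NULL (or a primary key) guarantees at most one row per value:
CREATE TABLE t2 (id int UNIQUE NOT NULL);
```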