Thread: Odd problem with planner choosing seq scan

Odd problem with planner choosing seq scan

From: Colin McGuigan
I have two tables, staff (530 rows) and location (2.5 million rows).  I
run a query that joins the two together, like so:

SELECT s.ProprietorId, l.LocationId, s.RoleId
    FROM Location l
    INNER JOIN (
        SELECT *
        FROM Staff
    ) s ON l.ProprietorId = s.ProprietorId
    WHERE s.UserId = 123456
    AND s.LocationId IS NULL

Ignore the fact that it's a subquery -- the query plan is the same if
it's a straight JOIN, and I'm going to use the subquery to demonstrate
something interesting.

Anyway, this takes ~45 seconds to run and returns 525 rows (about one
per record in the Staff table; the 5 records that are not for that user
are excluded).  The EXPLAIN is:

Nested Loop  (cost=243.50..34315.32 rows=10286 width=12)
  ->  Subquery Scan s  (cost=0.00..21.93 rows=1 width=8)
        Filter: ((userid = 123456) AND (locationid IS NULL))
        ->  Limit  (cost=0.00..15.30 rows=530 width=102)
              ->  Seq Scan on staff  (cost=0.00..15.30 rows=530 width=102)
  ->  Bitmap Heap Scan on "location" l  (cost=243.50..34133.68 rows=12777 width=8)
        Recheck Cond: (s.proprietorid = l.proprietorid)
        ->  Bitmap Index Scan on idx_location_proprietorid_locationid  (cost=0.00..240.30 rows=12777 width=0)
              Index Cond: (s.proprietorid = l.proprietorid)

The EXPLAIN ANALYZE is:

Hash Join  (cost=23.16..129297.25 rows=2022281 width=12) (actual time=62.315..48632.406 rows=525 loops=1)
  Hash Cond: (l.proprietorid = staff.proprietorid)
  ->  Seq Scan on "location" l  (cost=0.00..101337.11 rows=2057111 width=8) (actual time=0.056..44504.431 rows=2057111 loops=1)
  ->  Hash  (cost=16.63..16.63 rows=523 width=8) (actual time=46.411..46.411 rows=525 loops=1)
        ->  Seq Scan on staff  (cost=0.00..16.63 rows=523 width=8) (actual time=0.022..45.428 rows=525 loops=1)
              Filter: ((userid = 123456) AND (locationid IS NULL))
Total runtime: 48676.282 ms

Now, the interesting thing is, if I add "LIMIT 5000" into that inner
subquery on the staff table, it no longer seq scans location, and the
whole thing runs in less than a second.

SELECT s.ProprietorId, l.LocationId, s.RoleId
    FROM Location l
    INNER JOIN (
        SELECT *
        FROM Staff
        LIMIT 5000     -- Only change; remember, this table
                       -- only has 530 rows
    ) s ON l.ProprietorId = s.ProprietorId
    WHERE s.UserId = 123456
    AND s.LocationId IS NULL

EXPLAIN:

Nested Loop  (cost=243.50..34315.32 rows=10286 width=12)
  ->  Subquery Scan s  (cost=0.00..21.93 rows=1 width=8)
        Filter: ((userid = 123456) AND (locationid IS NULL))
        ->  Limit  (cost=0.00..15.30 rows=530 width=102)
              ->  Seq Scan on staff  (cost=0.00..15.30 rows=530 width=102)
  ->  Bitmap Heap Scan on "location" l  (cost=243.50..34133.68 rows=12777 width=8)
        Recheck Cond: (s.proprietorid = l.proprietorid)
        ->  Bitmap Index Scan on idx_location_proprietorid_locationid  (cost=0.00..240.30 rows=12777 width=0)
              Index Cond: (s.proprietorid = l.proprietorid)

EXPLAIN ANALYZE:

Nested Loop  (cost=243.50..34315.32 rows=10286 width=12) (actual time=74.097..569.372 rows=525 loops=1)
  ->  Subquery Scan s  (cost=0.00..21.93 rows=1 width=8) (actual time=16.452..21.092 rows=525 loops=1)
        Filter: ((userid = 123456) AND (locationid IS NULL))
        ->  Limit  (cost=0.00..15.30 rows=530 width=102) (actual time=16.434..19.128 rows=530 loops=1)
              ->  Seq Scan on staff  (cost=0.00..15.30 rows=530 width=102) (actual time=16.429..17.545 rows=530 loops=1)
  ->  Bitmap Heap Scan on "location" l  (cost=243.50..34133.68 rows=12777 width=8) (actual time=1.027..1.029 rows=1 loops=525)
        Recheck Cond: (s.proprietorid = l.proprietorid)
        ->  Bitmap Index Scan on idx_location_proprietorid_locationid  (cost=0.00..240.30 rows=12777 width=0) (actual time=0.151..0.151 rows=1 loops=525)
              Index Cond: (s.proprietorid = l.proprietorid)
Total runtime: 570.868 ms

This confuses me.  As far as I can tell, the EXPLAIN output is the same
regardless of whether the LIMIT 5000 is there or not.  However, I don't
know a) why the EXPLAIN ANALYZE plan is different in the first case,
where there is no LIMIT 5000, or b) why adding a LIMIT 5000 to a
subquery would change anything when the underlying table has only 530
rows in it.  Furthermore, I can repeat this experiment over and over, so
I know that it's not caching.  Removing the LIMIT 5000 returns
performance to > 45 seconds.

I've ANALYZEd both tables, so I'm relatively certain statistics are up
to date.  This is test data, so there are no ongoing
inserts/updates/deletes -- only selects.
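
(As a sanity check on the statistics, here's one way to see what the
planner thinks the table sizes are -- reltuples and relpages in
pg_class are refreshed by ANALYZE:)

SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('staff', 'location');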

I'd really prefer this query to run in < 1 second rather than > 45, but
I'd like to do that without resorting to hacks like adding pointless
LIMIT clauses.

Any help would be much appreciated.

--Colin McGuigan


Re: Odd problem with planner choosing seq scan

From: Tom Lane
Colin McGuigan <cmcguigan@earthcomber.com> writes:
>   ->  Subquery Scan s  (cost=0.00..21.93 rows=1 width=8)
>         Filter: ((userid = 123456) AND (locationid IS NULL))
>         ->  Limit  (cost=0.00..15.30 rows=530 width=102)
>               ->  Seq Scan on staff  (cost=0.00..15.30 rows=530 width=102)

There does seem to be a bug here, but not the one you think: the rows=1
estimate for the subquery node seems a bit silly given that it knows
there are 530 rows in the underlying query.  I'm not sure how bright the
code is about finding stats for variables emitted by a subquery, but
even with totally default estimates it should not come up with a
selectivity of 1/500 for the filter.  Unfortunately, fixing that is
likely to bias it further away from the plan you want ...

> Furthermore, I can repeat this experiment over and over, so I know that
> its not caching.

You mean it *is* caching.

> I'd really prefer this query run in < 1 second rather than > 45, but I'd
> really like to do that without having hacks like adding in pointless
> LIMIT clauses.

The right way to do it is to adjust the planner cost parameters.
The standard values of those are set on the assumption of
tables-much-bigger-than-memory, a situation in which the planner's
preferred plan probably would be the best.  What you are testing here
is most likely a situation in which the whole of both tables fits in
RAM.  If that pretty much describes your production situation too,
then you should decrease seq_page_cost and random_page_cost.  I find
setting them both to 0.1 produces estimates that are more nearly in
line with true costs for all-in-RAM situations.

(Pre-8.2, there's no seq_page_cost, so instead set random_page_cost
to 1 and inflate all the cpu_xxx cost constants by 10.)
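
For example, to try this out in a single session before touching
postgresql.conf (8.2 parameter names shown; 0.1 is just a suggested
starting point, not gospel):

SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;

EXPLAIN ANALYZE
SELECT s.ProprietorId, l.LocationId, s.RoleId
    FROM Location l
    INNER JOIN Staff s ON l.ProprietorId = s.ProprietorId
    WHERE s.UserId = 123456
    AND s.LocationId IS NULL;

-- if the plan looks right, make the settings permanent in postgresql.conf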

            regards, tom lane

Re: Odd problem with planner choosing seq scan

From: Colin McGuigan
Tom Lane wrote:
> The right way to do it is to adjust the planner cost parameters.
> The standard values of those are set on the assumption of
> tables-much-bigger-than-memory, a situation in which the planner's
> preferred plan probably would be the best.  What you are testing here
> is most likely a situation in which the whole of both tables fits in
> RAM.  If that pretty much describes your production situation too,
> then you should decrease seq_page_cost and random_page_cost.  I find
> setting them both to 0.1 produces estimates that are more nearly in
> line with true costs for all-in-RAM situations.
>
I know I can do it by adjusting cost parameters, but I was really
curious as to why adding a "LIMIT 5000" onto a SELECT from a table with
only 530 rows in it would affect matters at all.  The plan the planner
uses when the LIMIT 5000 is present is the one I want, without adjusting
any cost parameters.  It doesn't seem to matter what the limit is -- LIMIT
99999 also produces the desired plan, whereas no LIMIT produces the
undesirable plan.

--Colin McGuigan

Re: Odd problem with planner choosing seq scan

From: Tom Lane
Colin McGuigan <cmcguigan@earthcomber.com> writes:
> I know I can do it by adjusting cost parameters, but I was really
> curious as to why adding a "LIMIT 5000" onto a SELECT from a table with
> only 530 rows in it would affect matters at all.

The LIMIT prevents the sub-select from being flattened into the main
query.  In the current code this has a side-effect of preventing any
statistical information from being used to estimate the selectivity
of the filter conditions --- so you get a default rowcount estimate
that's way too small, and that changes the shape of the join plan.
It's giving you the "right" answer for entirely the wrong reason.
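
(For completeness: any LIMIT or OFFSET in a sub-select blocks the
pull-up, so the usual no-op fence is "OFFSET 0" -- same effect as your
LIMIT 5000, and just as much a hack:)

SELECT s.ProprietorId, l.LocationId, s.RoleId
    FROM Location l
    INNER JOIN (
        SELECT *
        FROM Staff
        OFFSET 0   -- no-op, but prevents the sub-select from
                   -- being flattened into the main query
    ) s ON l.ProprietorId = s.ProprietorId
    WHERE s.UserId = 123456
    AND s.LocationId IS NULL;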

            regards, tom lane