Re: Is this a planner bug? - Mailing list pgsql-general

From Tom Lane
Subject Re: Is this a planner bug?
Date
Msg-id 28959.1398180038@sss.pgh.pa.us
Whole thread Raw
In response to Re: Is this a planner bug?  (Torsten Förtsch <torsten.foertsch@gmx.net>)
List pgsql-general
=?UTF-8?B?VG9yc3RlbiBGw7ZydHNjaA==?= <torsten.foertsch@gmx.net> writes:
> Using the index:

> Limit  (cost=0.57..2.95 rows=1 width=0)
>        (actual time=0.095..0.095 rows=1 loops=1)
>    ->  Index Scan ... (cost=0.57..14857285.83 rows=6240539 width=0)
>                       (actual time=0.095..0.095 rows=1 loops=1)
>          Index Cond:...
>          Filter: ...
>          Rows Removed by Filter: 4
>  Total runtime: 0.147 ms


> seq scan:

> Limit  (cost=0.00..1.12 rows=1 width=0)
>        (actual time=0.943..0.944 rows=1 loops=1)
>    ->  Seq Scan ...  (cost=0.00..6967622.77 rows=6240580 width=0)
>                      (actual time=0.940..0.940 rows=1 loops=1)
>          Filter: ...
>          Rows Removed by Filter: 215
>  Total runtime: 0.997 ms

So the real question here is whether 215 rows skipped to find the first
matching row is more or less than the statistical expectation.

You said the table has 80M rows, so the planner evidently thinks the
filter has selectivity 6240580/80M or about 1/13, so it would have been
expecting the scan to find a match after about 13 rows on average.  Having
to scan 215 rows is thus considerably more than it was guessing.  If we
had statistics that would allow a better guess at where the first matching
row is, then indeed this would be a planner bug --- but it's not a bug,
it's just a limitation of the available statistical data.

What's more interesting is that the index scan only had to skip 4 rows
to get a match.  Is it getting unduly lucky rather than unduly unlucky?

There have been some discussions of intentionally penalizing the estimates
for small-LIMIT plans, so that we assume worse-than-random placement of
the first few matching rows.  That would kick up the estimate for this
seqscan all right, but I'm unsure that it wouldn't kick up the estimate
for the indexscan just as much.

            regards, tom lane


pgsql-general by date:

Previous
From: Torsten Förtsch
Date:
Subject: Re: Is this a planner bug?
Next
From: Adrian Klaver
Date:
Subject: Re: could not rename temporary statistics file "/run/shm/pgstat.tmp" to "/run/shm/pgstat.stat": No such file or directory