Thread: SQL select query becomes slow when using limit (with no offset)

SQL select query becomes slow when using limit (with no offset)

From
Kees van Dieren
Date:
Hi folks,

We have problems with performance of a simple SQL statement.

If we add a LIMIT 50, the query is about 6 times slower than without a limit (query returns 2 rows).

I have read this discussion: http://archives.postgresql.org/pgsql-performance/2008-09/msg00005.php but there seems to be no solution in it.

I tried the things at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server but changing settings doesn't have a significant effect.

The DDL statements (create tables, indices) are attached.

The events_events table contains 375K rows; the events_event_types table contains 71 rows.

The query:
select events_events.id FROM events_events
left join events_event_types on events_events.eventType_id=events_event_types.id
where events_event_types.severity=70
and events_events.cleared='f'
order by events_events.dateTime DESC

It takes 155ms to run this query (returning 2 rows)

After adding LIMIT 10, it takes 950 ms to run.

Query plan: without limit:
"Sort  (cost=20169.62..20409.50 rows=95952 width=16)"
"  Sort Key: events_events.datetime"
"  ->  Hash Join  (cost=2.09..12229.58 rows=95952 width=16)"
"        Hash Cond: (events_events.eventtype_id = events_event_types.id)"
"        ->  Seq Scan on events_events  (cost=0.00..9918.65 rows=359820 width=24)"
"              Filter: (NOT cleared)"
"        ->  Hash  (cost=1.89..1.89 rows=16 width=8)"
"              ->  Seq Scan on events_event_types  (cost=0.00..1.89 rows=16 width=8)"
"                    Filter: (severity = 70)"

Query plan: with limit:
"Limit  (cost=0.00..12.50 rows=10 width=16)"
"  ->  Nested Loop  (cost=0.00..119932.21 rows=95952 width=16)"
"        ->  Index Scan Backward using events_events_datetime_ind on events_events  (cost=0.00..18242.28 rows=359820 width=24)"
"              Filter: (NOT cleared)"
"        ->  Index Scan using events_event_types_pkey on events_event_types  (cost=0.00..0.27 rows=1 width=8)"
"              Index Cond: (events_event_types.id = events_events.eventtype_id)"
"              Filter: (events_event_types.severity = 70)"

So Postgres seems to handle a query with a LIMIT differently internally. I tried setting default_statistics_target to 10, 100, and 200, but saw no significant differences.
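
For reference, a higher default_statistics_target only takes effect for columns that are analyzed afterwards, so a minimal per-session test sketch (200 being one of the values tried) is:

SET default_statistics_target = 200;  -- illustrative value
ANALYZE events_events;
ANALYZE events_event_types;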

This problem appears on both Postgres 8.3 and 8.4.

Any suggestions?

Thanks in advance!

Best regards,

Kees van Dieren

--
Squins | IT, Honestly
Oranjestraat 23
2983 HL Ridderkerk
The Netherlands
Phone: +31 (0)180 414520
Mobile: +31 (0)6 30413841
www.squins.com
Chamber of commerce Rotterdam: 22048547

Re: SQL select query becomes slow when using limit (with no offset)

From
Greg Stark
Date:
On Fri, Jul 31, 2009 at 1:11 PM, Kees van Dieren<keesvandieren@gmail.com> wrote:
> It takes 155ms to run this query (returning 2 rows)
>
> Query plan: without limit:
> "Sort  (cost=20169.62..20409.50 rows=95952 width=16)"

Could you send the results of EXPLAIN ANALYZE for both queries?
Evidently the planner is expecting a lot more rows than the 2 rows
you're expecting but it's not clear where it's gone wrong.


--
greg
http://mit.edu/~gsstark/resume.pdf

> The query:
> select events_events.id FROM events_events
> left join events_event_types on events_events.eventType_id=
> events_event_types.id
> where events_event_types.severity=70
> and events_events.cleared='f'
> order by events_events.dateTime DESC

    The main problem seems to be the lack of a suitable index...

- Try creating an index on events_events( eventType_id, cleared )
- Or the other way around: events_events( cleared, eventType_id )

    (depends on your other queries)

    Please try both and report EXPLAIN ANALYZE; both are sketched below.
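
A sketch of both (the index names are illustrative):

CREATE INDEX events_events_type_cleared ON events_events (eventtype_id, cleared);
-- or, depending on the other queries:
CREATE INDEX events_events_cleared_type ON events_events (cleared, eventtype_id);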

Re: SQL select query becomes slow when using limit (with no offset)

From
Kees van Dieren
Date:
Hi Folks,

Thanks for your response.

I have added the following index (suggested by other post):

CREATE INDEX events_events_cleared_eventtype
  ON events_events
  USING btree
  (eventtype_id, cleared)
  WHERE cleared = false;

Also with columns in reversed order.

No changes in response time noticed.

An index on the cleared column is already there (the indices are in the SQL file attached to the initial post). eventtype_id has a foreign key constraint, which I believe adds an index automatically?
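
For completeness, the indexes that are actually present on the table can be double-checked with a plain catalog query:

select indexname, indexdef from pg_indexes where tablename = 'events_events';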

The explain analyze results for both queries:
explain analyze select events_events.id FROM events_events
left join events_event_types on events_events.eventType_id=events_event_types.id
where events_event_types.severity=70
and not events_events.cleared
order by events_events.dateTime DESC LIMIT 100
>>>
"Limit  (cost=0.00..125.03 rows=100 width=16) (actual time=0.046..3897.094 rows=77 loops=1)"
"  ->  Nested Loop  (cost=0.00..120361.40 rows=96269 width=16) (actual time=0.042..3896.881 rows=77 loops=1)"
"        ->  Index Scan Backward using events_events_datetime_ind on events_events  (cost=0.00..18335.76 rows=361008 width=24) (actual time=0.025..720.345 rows=360637 loops=1)"
"              Filter: (NOT cleared)"
"        ->  Index Scan using events_event_types_pkey on events_event_types  (cost=0.00..0.27 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=360637)"
"              Index Cond: (events_event_types.id = events_events.eventtype_id)"
"              Filter: (events_event_types.severity = 70)"
"Total runtime: 3897.268 ms"

explain analyze select events_events.id FROM events_events
left join events_event_types on events_events.eventType_id=events_event_types.id
where events_event_types.severity=70
and not events_events.cleared
order by events_events.dateTime DESC
>>>
"Sort  (cost=20255.18..20495.85 rows=96269 width=16) (actual time=1084.842..1084.951 rows=77 loops=1)"
"  Sort Key: events_events.datetime"
"  Sort Method:  quicksort  Memory: 20kB"
"  ->  Hash Join  (cost=2.09..12286.62 rows=96269 width=16) (actual time=1080.789..1084.696 rows=77 loops=1)"
"        Hash Cond: (events_events.eventtype_id = events_event_types.id)"
"        ->  Seq Scan on events_events  (cost=0.00..9968.06 rows=361008 width=24) (actual time=0.010..542.946 rows=360637 loops=1)"
"              Filter: (NOT cleared)"
"        ->  Hash  (cost=1.89..1.89 rows=16 width=8) (actual time=0.077..0.077 rows=16 loops=1)"
"              ->  Seq Scan on events_event_types  (cost=0.00..1.89 rows=16 width=8) (actual time=0.010..0.046 rows=16 loops=1)"
"                    Filter: (severity = 70)"
"Total runtime: 1085.145 ms"

Any suggestions?

Thanks in advance!

Best regards,

Kees van Dieren



2009/7/31 Greg Stark <gsstark@mit.edu>
On Fri, Jul 31, 2009 at 1:11 PM, Kees van Dieren<keesvandieren@gmail.com> wrote:
> It takes 155ms to run this query (returning 2 rows)
>
> Query plan: without limit:
> "Sort  (cost=20169.62..20409.50 rows=95952 width=16)"

Could you send the results of EXPLAIN ANALYZE for both queries?
Evidently the planner is expecting a lot more rows than the 2 rows
you're expecting but it's not clear where it's gone wrong.


--
greg
http://mit.edu/~gsstark/resume.pdf



--
Squins | IT, Honestly
Oranjestraat 23
2983 HL Ridderkerk
The Netherlands
Phone: +31 (0)180 414520
Mobile: +31 (0)6 30413841
www.squins.com
Chamber of commerce Rotterdam: 22048547

Re: SQL select query becomes slow when using limit (with no offset)

From
Russell Smith
Date:
Kees van Dieren wrote:
> Hi Folks,
>
> Thanks for your response.
>
> I have added the following index (suggested by other post):
>
> CREATE INDEX events_events_cleared_eventtype
>   ON events_events
>   USING btree
>   (eventtype_id, cleared)
>   WHERE cleared = false;
>
> Also with columns in reversed order.
>
> No changes in response time noticed.
>
> Index on cleared column already is there (indices are in sql file
> attached to initial post.). eventtype_id has a foreign key constraint,
> which adds an index automatically I believe?
>
> The explain analyze results for both queries:
> explain analyze select events_events.id FROM events_events
> left join events_event_types on events_events.eventType_id=events_event_types.id
> where events_event_types.severity=70
> and not events_events.cleared
> order by events_events.dateTime DESC LIMIT 100
> >>>
> "Limit  (cost=0.00..125.03 rows=100 width=16) (actual
> time=0.046..3897.094 rows=77 loops=1)"
> "  ->  Nested Loop  (cost=0.00..120361.40 rows=96269 width=16) (actual
> time=0.042..3896.881 rows=77 loops=1)"
> "        ->  Index Scan Backward using events_events_datetime_ind on
> events_events  (cost=0.00..18335.76 rows=361008 width=24) (actual
> time=0.025..720.345 rows=360637 loops=1)"
> "              Filter: (NOT cleared)"
> "        ->  Index Scan using events_event_types_pkey on
> events_event_types  (cost=0.00..0.27 rows=1 width=8) (actual
> time=0.003..0.003 rows=0 loops=360637)"
> "              Index Cond: (events_event_types.id
> <http://events_event_types.id> = events_events.eventtype_id)"
> "              Filter: (events_event_types.severity = 70)"
> "Total runtime: 3897.268 ms"
>
The plan here is guessing that we will find the 100 rows we want pretty
quickly by scanning the dateTime index, as we aren't expecting to have
to look through many rows to find 100 that match the criteria.  With no
cross-column statistics it's more a guess than a good calculation.  So
the guess is bad and we end up scanning 360k rows from the index before
we find what we want.  My skills are not up to giving specific advice
on how to avert this problem.  Maybe somebody else can help there.
> explain analyze select events_events.id FROM events_events
> left join events_event_types on events_events.eventType_id=events_event_types.id
> where events_event_types.severity=70
> and not events_events.cleared
> order by events_events.dateTime DESC
> >>>
> "Sort  (cost=20255.18..20495.85 rows=96269 width=16) (actual
> time=1084.842..1084.951 rows=77 loops=1)"
> "  Sort Key: events_events.datetime"
> "  Sort Method:  quicksort  Memory: 20kB"
> "  ->  Hash Join  (cost=2.09..12286.62 rows=96269 width=16) (actual
> time=1080.789..1084.696 rows=77 loops=1)"
> "        Hash Cond: (events_events.eventtype_id =
> events_event_types.id <http://events_event_types.id>)"
> "        ->  Seq Scan on events_events  (cost=0.00..9968.06
> rows=361008 width=24) (actual time=0.010..542.946 rows=360637 loops=1)"
> "              Filter: (NOT cleared)"
> "        ->  Hash  (cost=1.89..1.89 rows=16 width=8) (actual
> time=0.077..0.077 rows=16 loops=1)"
> "              ->  Seq Scan on events_event_types  (cost=0.00..1.89
> rows=16 width=8) (actual time=0.010..0.046 rows=16 loops=1)"
> "                    Filter: (severity = 70)"
> "Total runtime: 1085.145 ms"
>
> Any suggestions?
This plan is faster as you avoid the index scan.  The planner is
preferring to do a table scan to find what it needs.  This is much faster
than the 360k random I/O index lookups.  You can force this type of plan
with a subquery and the OFFSET 0 trick, but I'm not sure it's the best
solution.

eg

explain analyze SELECT * FROM
    (SELECT events_events.id FROM events_events
         LEFT JOIN events_event_types ON events_events.eventType_id=events_event_types.id
         WHERE events_event_types.severity=70
           AND NOT events_events.cleared
         ORDER BY events_events.dateTime DESC OFFSET 0) AS a LIMIT 100

Regards

Russell

Re: SQL select query becomes slow when using limit (with no offset)

From
Kees van Dieren
Date:
Thanks for your response.

I think your analysis is correct. When there are more than 100 rows that match this query, LIMIT 100 is fast.

However, we often have fewer than a hundred rows, so this is not sufficient for us.

This suggestion (the 'OFFSET 0' trick) did not show a difference in response time (it runs in 947ms).

One thing that helps is limiting the set by adding this to the WHERE clause:
and events_events.dateTime > '2009-07-24'.
(query now runs in approx 500ms)
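
For clarity, the bounded variant of the query looks like this (the date literal is just the example above):

select events_events.id FROM events_events
left join events_event_types on events_events.eventType_id=events_event_types.id
where events_event_types.severity=70
and not events_events.cleared
and events_events.dateTime > '2009-07-24'
order by events_events.dateTime DESC LIMIT 100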

The workaround we implemented is query caching in our application (Java, with the JPA / Hibernate second-level query cache). This actually solves the problem for us, but I'd prefer to get this query to perform well in Postgres itself. I'd think that the PostgreSQL query planner should be smarter in handling LIMIT clauses.

Would it get attention if I submit this to http://www.postgresql.org/support/submitbug ? (in fact it is not really a bug, but an improvement request).

Best regards,

Kees


2009/8/5 Russell Smith <mr-russ@pws.com.au>
Kees van Dieren wrote:
> Hi Folks,
>
> Thanks for your response.
>
> I have added the following index (suggested by other post):
>
> CREATE INDEX events_events_cleared_eventtype
>   ON events_events
>   USING btree
>   (eventtype_id, cleared)
>   WHERE cleared = false;
>
> Also with columns in reversed order.
>
> No changes in response time noticed.
>
> Index on cleared column already is there (indices are in sql file
> attached to initial post.). eventtype_id has a foreign key constraint,
> which adds an index automatically I believe?
>
> The explain analyze results for both queries:
> explain analyze select events_events.id FROM events_events
> left join events_event_types on events_events.eventType_id=events_event_types.id
> where events_event_types.severity=70
> and not events_events.cleared
> order by events_events.dateTime DESC LIMIT 100
> >>>
> "Limit  (cost=0.00..125.03 rows=100 width=16) (actual
> time=0.046..3897.094 rows=77 loops=1)"
> "  ->  Nested Loop  (cost=0.00..120361.40 rows=96269 width=16) (actual
> time=0.042..3896.881 rows=77 loops=1)"
> "        ->  Index Scan Backward using events_events_datetime_ind on
> events_events  (cost=0.00..18335.76 rows=361008 width=24) (actual
> time=0.025..720.345 rows=360637 loops=1)"
> "              Filter: (NOT cleared)"
> "        ->  Index Scan using events_event_types_pkey on
> events_event_types  (cost=0.00..0.27 rows=1 width=8) (actual
> time=0.003..0.003 rows=0 loops=360637)"
> "              Index Cond: (events_event_types.id
> <http://events_event_types.id> = events_events.eventtype_id)"
> "              Filter: (events_event_types.severity = 70)"
> "Total runtime: 3897.268 ms"
>
The plan here is guessing that we will find the 100 rows we want pretty
quickly by scanning the dateTime index, as we aren't expecting to have
to look through many rows to find 100 that match the criteria.  With no
cross-column statistics it's more a guess than a good calculation.  So
the guess is bad and we end up scanning 360k rows from the index before
we find what we want.  My skills are not up to giving specific advice
on how to avert this problem.  Maybe somebody else can help there.
> explain analyze select events_events.id FROM events_events
> left join events_event_types on events_events.eventType_id=events_event_types.id
> where events_event_types.severity=70
> and not events_events.cleared
> order by events_events.dateTime DESC
> >>>
> "Sort  (cost=20255.18..20495.85 rows=96269 width=16) (actual
> time=1084.842..1084.951 rows=77 loops=1)"
> "  Sort Key: events_events.datetime"
> "  Sort Method:  quicksort  Memory: 20kB"
> "  ->  Hash Join  (cost=2.09..12286.62 rows=96269 width=16) (actual
> time=1080.789..1084.696 rows=77 loops=1)"
> "        Hash Cond: (events_events.eventtype_id =
> events_event_types.id <http://events_event_types.id>)"
> "        ->  Seq Scan on events_events  (cost=0.00..9968.06
> rows=361008 width=24) (actual time=0.010..542.946 rows=360637 loops=1)"
> "              Filter: (NOT cleared)"
> "        ->  Hash  (cost=1.89..1.89 rows=16 width=8) (actual
> time=0.077..0.077 rows=16 loops=1)"
> "              ->  Seq Scan on events_event_types  (cost=0.00..1.89
> rows=16 width=8) (actual time=0.010..0.046 rows=16 loops=1)"
> "                    Filter: (severity = 70)"
> "Total runtime: 1085.145 ms"
>
> Any suggestions?
This plan is faster as you avoid the index scan.  The planner is
preferring to do a table scan to find what it needs.  This is much faster
than the 360k random I/O index lookups.  You can force this type of plan
with a subquery and the OFFSET 0 trick, but I'm not sure it's the best
solution.

eg

explain analyze SELECT * FROM
   (SELECT events_events.id FROM events_events
        LEFT JOIN events_event_types ON events_events.eventType_id=events_event_types.id
        WHERE events_event_types.severity=70
          AND NOT events_events.cleared
        ORDER BY events_events.dateTime DESC OFFSET 0) AS a LIMIT 100

Regards

Russell



--
Squins | IT, Honestly
Oranjestraat 23
2983 HL Ridderkerk
The Netherlands
Phone: +31 (0)180 414520
Mobile: +31 (0)6 30413841
www.squins.com
Chamber of commerce Rotterdam: 22048547

Re: SQL select query becomes slow when using limit (with no offset)

From
Robert Haas
Date:
On Fri, Aug 7, 2009 at 4:00 AM, Kees van Dieren<keesvandieren@gmail.com> wrote:
> Would it get attention if I submit this to
> http://www.postgresql.org/support/submitbug ? (in fact it is not really a
> bug, but an improvement request).

I think that many of the people who read that mailing list also read
this one, including, critically, Tom Lane, and you're not the first
person to run into a problem caused by lack of cross-column
statistics.  I don't think you're going to get very far by submitting
this as a bug.  There are already several people interested in this
problem, but as most of us don't get paid to hack on PostgreSQL, it's
a question of finding enough round tuits; this is not an easy thing to
fix.

If you are sufficiently bothered by this problem that you are willing
to pay someone to fix it for you, there are several companies with
whom you can contract to get this feature developed and committed for
the next release of PostgreSQL.

...Robert

Re: SQL select query becomes slow when using limit (with no offset)

From
Scott Carey
Date:
On 8/7/09 5:53 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

> On Fri, Aug 7, 2009 at 4:00 AM, Kees van Dieren<keesvandieren@gmail.com>
> wrote:
>> Would it get attention if I submit this to
>> http://www.postgresql.org/support/submitbug ? (in fact it is not really a
>> bug, but an improvement request).
>
> I think that many of the people who read that mailing list also read
> this one, including, critically, Tom Lane, and you're not the first
> person to run into a problem caused by lack of cross-column
> statistics.

Critically, it should be understood that this general problem is not just
born from lack of cross-column statistics.

It is also one of using the statistical expected value to calculate cost
without consideration of the variance.  Often, the cost of a plan varies
widely and nonlinearly with a small change in the expected value of the stats
used to estimate cost.

The problem is acute with LIMIT and various other boundary conditions where
there is a steep 'cost cliff' for certain plan types.  When LIMIT is
applied, the planner changes its estimates, but does not take into account
the _greatly_ increased uncertainty of those estimates.

Imagine a case where the planner's estimate is 100% correct, and on average
one side of a join will have 2 tuples.  The planner chooses nested loops to
do that join.  But the distribution of the number of tuples at this node is
skewed, so although the expected value is 2, values of 10 and 0 are both
common.  When values of 10 occur, the execution time goes up significantly.
An alternate plan might be slower for the case where the actual values in
execution equal the expected values, but faster in the average case!
The average cost of a plan is NOT the cost of the query with average
statistics, due to variance, nonlinearity, and skew.  And even if they were
equal, it might not be preferable to choose the plan that is best on average
over the one that is best at the 90th percentile.
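
A toy illustration with made-up numbers: suppose a node returns 0 rows with
probability 0.8 and 10 rows with probability 0.2.  The expected row count is
0.8*0 + 0.2*10 = 2, but if the cost of the node above it grows roughly
quadratically in that count -- say cost(n) = n^2 -- then costing at the
expected value gives cost(2) = 4, while the expected cost is
0.8*cost(0) + 0.2*cost(10) = 20, five times higher.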

Getting back to the case above with the nestloop -- if the planner estimated
the cost of the nestloop join versus other joins with some idea of the
variance in the estimations it could favor more 'stable' execution plans.

So in short, cross-column statistics don't have to be gathered and used to
make this problem less acute.  The planner just has to be more aware of
variance and the things that lead to it, such as column correlation.
Thus with a LIMIT applied, the expected value for the number of tuples
scanned before a match will shrink, but the uncertainty of this estimate
grows significantly as well, so the right plan to choose is one that hedges
against the uncertainty, not the one that assumes the expected value is
correct.
Gathering and using cross column statistics will change the expected value
for some plans, but more importantly will allow the uncertainty to be
reduced.  Better stats go hand in hand with the uncertainty analysis because
one can never have all cross column statistics, across all tables and all
join-function spaces, analyzed.  Stats gathering can never be complete or
without flaws.  The planner can never be perfect.




> I don't think you're going to get very far by submitting
> this as a bug.  There are already several people interested in this
> problem, but as most of us don't get paid to hack on PostgreSQL, it's
> a question of finding enough round tuits; this is not an easy thing to
> fix.
>
> If you are sufficiently bothered by this problem that you are willing
> to pay someone to fix it for you, there are several companies with
> whom you can contract to get this feature developed and committed for
> the next release of PostgreSQL.
>
> ...Robert


Re: SQL select query becomes slow when using limit (with no offset)

From
Robert Haas
Date:
On Fri, Aug 7, 2009 at 5:09 PM, Scott Carey<scott@richrelevance.com> wrote:
> On 8/7/09 5:53 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
>
>> On Fri, Aug 7, 2009 at 4:00 AM, Kees van Dieren<keesvandieren@gmail.com>
>> wrote:
>>> Would it get attention if I submit this to
>>> http://www.postgresql.org/support/submitbug ? (in fact it is not really a
>>> bug, but an improvement request).
>>
>> I think that many of the people who read that mailing list also read
>> this one, including, critically, Tom Lane, and you're not the first
>> person to run into a problem caused by lack of cross-column
>> statistics.
>
> Critically, it should be understood that this general problem is not just
> born from lack of cross-column statistics.
>
> It is also one of using the statistical expected value to calculate cost
> without consideration of the variance.  Often, the cost of a plan varies
> widely and nonlinearly with a small change in the expected value of the stats
> used to estimate cost.
>
> The problem is acute with LIMIT and various other boundary conditions where
> there is a steep 'cost cliff' for certain plan types.  When LIMIT is
> applied, the planner changes its estimates, but does not take into account
> the _greatly_ increased uncertainty of those estimates.
>
> Imagine a case where the planner's estimate is 100% correct, and on average
> one side of a join will have 2 tuples.  The planner chooses nested loops to
> do that join.  But the distribution of the number of tuples at this node is
> skewed, so although the expected value is 2, values of 10 and 0 are both
> common.  When values of 10 occur, the execution time goes up significantly.
> An alternate plan might be slower for the case where the actual values in
> execution equal the expected values, but faster in the average case!
> The average cost of a plan is NOT the cost of the query with average
> statistics, due to variance, nonlinearity, and skew.  And even if they were
> equal, it might not be preferable to choose the plan that is best on average
> over the one that is best at the 90th percentile.
>
> Getting back to the case above with the nestloop -- if the planner estimated
> the cost of the nestloop join versus other joins with some idea of the
> variance in the estimations it could favor more 'stable' execution plans.
>
> So in short, cross-column statistics don't have to be gathered and used to
> make this problem less acute.  The planner just has to be more aware of
> variance and the things that lead to it, such as column correlation.
> Thus with a LIMIT applied, the expected value for the number of tuples
> scanned before a match will shrink, but the uncertainty of this estimate
> grows significantly as well, so the right plan to choose is one that hedges
> against the uncertainty, not the one that assumes the expected value is
> correct.

This is a good analysis.  I think I proposed some kind of idea about
tracking uncertainty in the planner a while back, but I never got very
far with it.  The problem is, first, figuring out how to estimate the
uncertainty, and second, figuring out what to do with the result once
you've estimated it.  The concept of hedging against uncertainty is
the right one, I think, but it's not obvious how to fit that into the
cost-comparison algorithm that the planner uses.  A sticking point
here too is that the comparison of path costs is already a hot spot;
making it more complex will likely have a noticeable negative impact
on query planning time.  For queries against huge databases that may
not matter much, but for OLTP queries it certainly does.

There are a couple of methods that have been proposed to deal with
this in the past.  The most obvious one that's been talked about a
couple of times is switching a nested loop to a hash join if the
number of iterations exceeds some bound, which would require some
executor support.

Empirically, it seems to me that the planner generally follows a
pretty consistent pattern.  If the inner relation is tiny or the
number of loops is estimated to be very small, it uses a nested loop.
When the inner rel is a bit bigger, or the number of iterations is
nontrivial, it switches to a hash join.  When the inner relation gets
too big to fit in work_mem, or just big enough that hashing it looks
too slow, it switches to a nested loop with inner index-scan or,
especially if a useful sort is available, a merge join.

Just handling better the case where we pick a straight nested loop
rather than a hash join would help a lot of people.  Some basic
conservatism about the number of outer rows would be useful here (in
particular, we should probably assume that there will be at least 2
when costing a nest loop, unless the outer side is known unique), and
it's also worth thinking about the fact that a hash join won't build
the table unless there is at least 1 outer row, which I don't think
the current costing algorithm takes into account.  Or maybe we should
estimate the amount by which the nested loop figures to beat out the
hash join and only accept the nested loop plan if the win exceeds
some threshold percentage.
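
For the case in this thread, a cheap way to see what the alternative plan
would cost is simply to disable nested loops for one test run (diagnosis
only, not a fix); enable_nestloop is a standard planner setting:

SET enable_nestloop = off;
explain analyze select events_events.id FROM events_events
left join events_event_types on events_events.eventType_id=events_event_types.id
where events_event_types.severity=70
and not events_events.cleared
order by events_events.dateTime DESC LIMIT 100;
RESET enable_nestloop;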

...Robert

Re: SQL select query becomes slow when using limit (with no offset)

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> Just handling better the case where we pick a straight nested loop
> rather than a hash join would help a lot of people.  Some basic
> conservatism about the number of outer rows would be useful here (in
> particular, we should probably assume that there will be at least 2
> when costing a nest loop, unless the outer side is known unique),
> and it's also worth thinking about the fact that a hash join won't
> build the table unless there is at least 1 outer row, which I don't
> think the current costing algorithm takes into account.  Or maybe we
> should estimate the amount by which the nested loop figures to beat
> out the hash join and only accept the nested loop plan if the win
> exceeds some threshold percentage.

But in our environment the most common cause of a sub-optimal planning
choice is over-estimating the cost of a nested loop.  We've never been
able to get good plans overall without dropping random_page_cost to
twice the seq_page_cost or less -- in highly cached situations we
routinely drop both to 0.1.  Creating further bias against nested loop
plans just means we'll have to push the numbers further from what they
purportedly represent.  It seems to me a significant unsolved problem
is properly accounting for caching effects.
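
Concretely, that adjustment is just the two standard page-cost settings (the
0.1 values are the ones mentioned above for a highly cached setup, not a
general recommendation):

SET random_page_cost = 0.1;  -- or set permanently in postgresql.conf
SET seq_page_cost = 0.1;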

That said, I have no really clear idea of how best to solve that
problem.  The only ideas which recur when facing these issues are:

(1)  The planner refuses to deal with fractional estimates of how
many rows will be returned in a loop -- it treats 0.01 as 1 on the
basis that you can't read a fractional row, rather than as a 1% chance
that you will read a row and need to do the related work.  I have
always thought that changing this might allow more accurate estimates;
perhaps I should hack a version which behaves that way and test it as
a "proof of concept."  Note that this is diametrically opposed to your
suggestion that we always assume at least two rows in the absence of a
unique index.

(2)  Somehow use effective_cache_size in combination with some sort of
current activity metrics to dynamically adjust random access costs.
(I know, that one's total hand-waving, but it seems to have some
possibility of better modeling reality than what we currently do.)

-Kevin

Re: SQL select query becomes slow when using limit (with no offset)

From
Robert Haas
Date:
On Mon, Aug 10, 2009 at 11:19 AM, Kevin
Grittner<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> Just handling better the case where we pick a straight nested loop
>> rather than a hash join would help a lot of people.  Some basic
>> conservatism about the number of outer rows would be useful here (in
>> particular, we should probably assume that there will be at least 2
>> when costing a nest loop, unless the outer side is known unique),
>> and it's also worth thinking about the fact that a hash join won't
>> build the table unless there is at least 1 outer row, which I don't
>> think the current costing algorithm takes into account.  Or maybe we
>> should estimate the amount by which the nested loop figures to beat
>> out the hash join and only accept the nested loop plan if the win
>> exceeds some threshold percentage.
>
> But in our environment the most common cause of a sub-optimal planning
> choice is over-estimating the cost of a nested loop.  We've never been
> able to get good plans overall without dropping random_page_cost to
> twice the seq_page_cost or less -- in highly cached situations we
> routinely drop both to 0.1.

Even if our statistics were perfect, you'd still need to do this if
your database is mostly cached.  That's routine tuning.

> Creating further bias against nested loop
> plans just means we'll have to push the numbers further from what they
> purportedly represent.

Not at all.  You'd have to set them closer to their real values, i.e.
the cost of reading a page from cache.

> It seems to me a significant unsolved problem
> is properly accounting for caching effects.

Definitely true.

> That said, I have no really clear idea of how best to solve that
> problem.  The only ideas which recur when facing these issues are:
>
> (1)  The planner refuses to deal with fractional estimates of how
> many rows will be returned in a loop -- it treats 0.01 as 1 on the
> basis that you can't read a fractional row, rather than as a 1% chance
> that you will read a row and need to do the related work.  I have
> always thought that changing this might allow more accurate estimates;
> perhaps I should hack a version which behaves that way and test it as
> a "proof of concept."  Note that this is diametrically opposed to your
> suggestion that we always assume at least two rows in the absence of a
> unique index.

You're right. The problem is that you can get hosed in both
directions.  My queries are always referencing a column that they have
a foreign key towards, so this never happens to me.  But it does
happen to other people.

> (2)  Somehow use effective_cache_size in combination with some sort of
> current activity metrics to dynamically adjust random access costs.
> (I know, that one's total hand-waving, but it seems to have some
> possibility of better modeling reality than what we currently do.)

Yeah, I gave a lightning talk on this at PGcon, but I haven't had time
to do anything with it.  There are a couple of problems.  One is that
you have to have a source for your current activity metrics.  Since a
lot of the pages of interest will be in the OS buffer pool rather than
PG shared buffers, there's no easy way to handle this, and you also
have to keep in mind that plans can be cached and reused, so you need
the estimates not to change too fast or you'll have horrible plan
stability problems.

The other is that right now, the page costs are constant.  Making them
per-relation will mean that they require syscache lookups.  I'm not
sure whether that's expensive enough to impact planning time, and if
so what to do about it.

...Robert

Re: SQL select query becomes slow when using limit (with no offset)

From
Devin Ben-Hur
Date:
Robert Haas wrote:
> On Mon, Aug 10, 2009 at 11:19 AM, Kevin Grittner<Kevin.Grittner@wicourts.gov> wrote:
>> (2)  Somehow use effective_cache_size in combination with some sort of
>> current activity metrics to dynamically adjust random access costs.
>> (I know, that one's total hand-waving, but it seems to have some
>> possibility of better modeling reality than what we currently do.)

I was disappointed when I learned that effective_cache_size doesn't get
used more generally to predict the likelihood of a buffer fetch requiring
physical I/O.

> Yeah, I gave a lightning talk on this at PGcon, but I haven't had time
> to do anything with it.  There are a couple of problems.  One is that
> you have to have a source for your current activity metrics.  Since a
> lot of the pages of interest will be in the OS buffer pool rather than
> PG shared buffers, there's no easy way to handle this

While there are portability concerns, mmap + mincore works across BSD,
Linux, and Solaris, and returns a vector indicating which file pages are
resident in the OS buffer cache.  So it's certainly possible that on
supported systems, an activity monitor can have direct knowledge of OS
caching effectiveness on a per-relation/index basis.

--
-Devin