Thread: Yet another slow nested loop

Yet another slow nested loop

From
Alexander Staubo
Date:
Here's the query:

select photos.*
from photos
inner join event_participations on
  event_participations.user_id = photos.creator_id and
  event_participations.attend = true
inner join event_instances on
  event_instances.id = event_participations.event_instance_id
where (
  (event_instances.venue_id = 1290) and
  (photos.taken_at > (event_instances.time + interval '-3600 seconds')) and
  (photos.taken_at < (event_instances.time + interval '25200 seconds'))
)
order by taken_at desc
limit 20

It occasionally takes four minutes to run:


             QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..10997.65 rows=20 width=116) (actual
time=262614.474..262614.474 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..5729774.95 rows=10420 width=116)
(actual time=262614.470..262614.470 rows=0 loops=1)
         Join Filter: ((photos.taken_at > (event_instances."time" +
'-01:00:00'::interval)) AND (photos.taken_at < (event_instances."time"
+ '07:00:00'::interval)))
         ->  Nested Loop  (cost=0.00..2055574.35 rows=11869630
width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)
               ->  Index Scan Backward using photos_taken_at on photos
 (cost=0.00..40924.34 rows=544171 width=116) (actual
time=14.997..1357.724 rows=544171 loops=1)
               ->  Index Scan using event_participations_user_id_index
on event_participations  (cost=0.00..2.95 rows=60 width=8) (actual
time=0.007..0.159 rows=26 loops=544171)
                     Index Cond: (event_participations.user_id =
photos.creator_id)
                     Filter: event_participations.attend
         ->  Index Scan using event_instances_pkey on event_instances
(cost=0.00..0.29 rows=1 width=12) (actual time=0.008..0.008 rows=0
loops=14013998)
               Index Cond: (event_instances.id =
event_participations.event_instance_id)
               Filter: (event_instances.venue_id = 1290)
 Total runtime: 262614.585 ms

With enable_nestloop to false, it takes about 1 second to run.

Database is freshly analyzed and vacuumed. Default statistics target
is 100. I have tried increasing the stats on
event_participations.user_id, event_participations.event_instance_id
and photos.taken_at to 1000, but no improvement.

This is PostgreSQL 8.3.3.

A.

Re: Yet another slow nested loop

From
"Dave Dutcher"
Date:
> -----Original Message-----
> From: Alexander Staubo
>
>    ->  Nested Loop  (cost=0.00..5729774.95 rows=10420 width=116)
> (actual time=262614.470..262614.470 rows=0 loops=1)
>          Join Filter: ((photos.taken_at > (event_instances."time" +
> '-01:00:00'::interval)) AND (photos.taken_at < (event_instances."time"
> + '07:00:00'::interval)))
>          ->  Nested Loop  (cost=0.00..2055574.35 rows=11869630
> width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)


Do you have any of the other enable_* options set to false?  What do you
have random_page_cost set to?  I ask because I'm surprised to see postgres
choose to loop when it knows it will have to loop 11 million times.

Dave



Re: Yet another slow nested loop

From
Alexander Staubo
Date:
On Tue, Jun 16, 2009 at 3:56 PM, Dave Dutcher<dave@tridecap.com> wrote:
>> -----Original Message-----
>> From: Alexander Staubo
>>
>>    ->  Nested Loop  (cost=0.00..5729774.95 rows=10420 width=116)
>> (actual time=262614.470..262614.470 rows=0 loops=1)
>>          Join Filter: ((photos.taken_at > (event_instances."time" +
>> '-01:00:00'::interval)) AND (photos.taken_at < (event_instances."time"
>> + '07:00:00'::interval)))
>>          ->  Nested Loop  (cost=0.00..2055574.35 rows=11869630
>> width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)
>
>
> Do you have any of the other enable_* options set to false?

No.

> What do you
> have random_page_cost set to?  I ask because I'm surprised to see postgres
> choose to loop when it knows it will have to loop 11 million times.

The default, ie. 4.0.

A.

Re: Yet another slow nested loop

From
Tom Lane
Date:
Alexander Staubo <alex@bengler.no> writes:
> Here's the query:
> select photos.*
> from photos
> inner join event_participations on
>   event_participations.user_id = photos.creator_id and
>   event_participations.attend = true
> inner join event_instances on
>   event_instances.id = event_participations.event_instance_id
> where (
>   (event_instances.venue_id = 1290) and
>   (photos.taken_at > (event_instances.time + interval '-3600 seconds')) and
>   (photos.taken_at < (event_instances.time + interval '25200 seconds'))
> )
> order by taken_at desc
> limit 20

> It occasionally takes four minutes to run:

Actually the easiest way to fix that is to get rid of the LIMIT.
(Maybe use a cursor instead, and fetch only twenty rows.)  LIMIT
magnifies the risks from any estimation error, and you've got a lot
of that here ...

            regards, tom lane

Re: Yet another slow nested loop

From
Alexander Staubo
Date:
On Tue, Jun 16, 2009 at 4:36 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Actually the easiest way to fix that is to get rid of the LIMIT.
> (Maybe use a cursor instead, and fetch only twenty rows.)  LIMIT
> magnifies the risks from any estimation error, and you've got a lot
> of that here ...

There's no cursor support in ActiveRecord, the ORM library we use, and
I'm not going to write it. Anyway, I would prefer not to gloss over
the underlying problem with something that requires a "TODO" next to
it. What can be done to fix the underlying problem? Nothing?

A.

Re: Yet another slow nested loop

From
Robert Haas
Date:
On Tue, Jun 16, 2009 at 11:16 AM, Alexander Staubo<alex@bengler.no> wrote:
> On Tue, Jun 16, 2009 at 4:36 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> Actually the easiest way to fix that is to get rid of the LIMIT.
>> (Maybe use a cursor instead, and fetch only twenty rows.)  LIMIT
>> magnifies the risks from any estimation error, and you've got a lot
>> of that here ...
>
> There's no cursor support in ActiveRecord, the ORM library we use, and
> I'm not going to write it. Anyway, I would prefer not to gloss over
> the underlying problem with something that requires a "TODO" next to
> it. What can be done to fix the underlying problem? Nothing?

Basically, we need a system that can accurately estimate multi-column
selectivity, or else some kind of planner hints.

http://archives.postgresql.org/pgsql-performance/2009-06/msg00023.php
http://archives.postgresql.org/pgsql-performance/2009-06/msg00119.php

(with apologies for linking to my own posts, but you can go back and
read the whole thread if you're interested)

...Robert