Re: Slow query with backwards index scan - Mailing list pgsql-performance

From Nis Jørgensen
Subject Re: Slow query with backwards index scan
Date
Msg-id f8l3oc$6lp$1@sea.gmane.org
Whole thread Raw
In response to Re: Slow query with backwards index scan  (Tilmann Singer <tils-pgsql@tils.net>)
Responses Re: Slow query with backwards index scan
List pgsql-performance
Tilmann Singer skrev:

> But the subselect is not fast for the user with many relationships and
> matched rows at the beginning of the sorted large_table:
>
> testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
>  WHERE user_id IN (SELECT contact_id FROM relationships WHERE
>  user_id=55555)
>  ORDER BY created_at DESC LIMIT 10;
>                                                                                        QUERY PLAN
                                    
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=6963.94..6963.96 rows=10 width=621) (actual time=53187.349..53187.424 rows=10 loops=1)
>    ->  Sort  (cost=6963.94..6966.96 rows=1211 width=621) (actual time=53187.341..53187.360 rows=10 loops=1)
>          Sort Key: lt.created_at
>          ->  Nested Loop  (cost=39.52..6901.92 rows=1211 width=621) (actual time=201.728..52673.800 rows=69018
loops=1)
>                ->  HashAggregate  (cost=39.52..39.53 rows=1 width=4) (actual time=178.777..178.966 rows=40 loops=1)
>                      ->  Bitmap Heap Scan on relationships  (cost=4.33..39.49 rows=10 width=4) (actual
time=47.049..178.560rows=40 loops=1) 
>                            Recheck Cond: (user_id = 55555)
>                            ->  Bitmap Index Scan on relationships_user_id_contact_id_index  (cost=0.00..4.33 rows=10
width=0)(actual time=28.721..28.721 rows=40 loops=1) 
>                                  Index Cond: (user_id = 55555)
>                ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..6834.04 rows=2268
width=621)(actual time=21.994..1301.375 rows=1725 loops=40) 
>                      Index Cond: (lt.user_id = relationships.contact_id)
>  Total runtime: 53188.584 ms
>
>
>
> Using a join now the query for mat

> Any ideas?

It seems to me the subselect plan would benefit quite a bit from not
returning all rows, but only the 10 latest for each user. I believe the
problem is similar to what is discussed for UNIONs here:


http://groups.google.dk/group/pgsql.general/browse_thread/thread/4f74d7faa8a5a608/367f5052b1bbf1c8?lnk=st&q=postgresql+limit++union&rnum=1&hl=en#367f5052b1bbf1c8

which got implemented here:


http://groups.google.dk/group/pgsql.committers/browse_thread/thread/b1ac3c3026db096c/9b3e5bd2d612f565?lnk=st&q=postgresql+limit++union&rnum=26&hl=en#9b3e5bd2d612f565

It seems to me the planner in this case would actually need to push the
limit into the nested loop, since we want the plan to take advantage of
the index (using a backwards index scan). I am ready to be corrected though.

You could try this (quite hackish) attempt at forcing the query planner
to use this plan:

SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE
 user_id=55555) AND created_at in (SELECT created_at FROM large_table
lt2 WHERE lt2.user_id = lt.user_id ORDER BY created_at DESC limit 10)
 ORDER BY created_at DESC LIMIT 10;

If that doesn't work, you might have reached the point where you need to
use some kind of record-keeping system to keep track of which records to
look at.

/Nis

pgsql-performance by date:

Previous
From: "Steven Flatt"
Date:
Subject: Re: Vacuum looping?
Next
From: Tilmann Singer
Date:
Subject: Re: Slow query with backwards index scan