Re: Slow query with backwards index scan - Mailing list pgsql-performance

From Tilmann Singer
Subject Re: Slow query with backwards index scan
Date
Msg-id 20070728125236.GD19392@tils.net
Whole thread Raw
In response to Re: Slow query with backwards index scan  (Nis Jørgensen <nis@superlativ.dk>)
List pgsql-performance
* Nis Jørgensen <nis@superlativ.dk> [20070727 20:31]:
> How does the "obvious" UNION query do - ie:
>
> SELECT * FROM (
> SELECT * FROM large_table lt
> WHERE lt.user_id = 12345
>
> UNION
>
> SELECT * FROM large_table lt
> WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
> ) q
>
> ORDER BY created_at DESC LIMIT 10;

Great for the user with little data:

testdb=# EXPLAIN ANALYZE SELECT * FROM (
SELECT * FROM large_table lt
WHERE lt.user_id = 12345
UNION
SELECT * FROM large_table lt
WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
) q
ORDER BY created_at DESC LIMIT 10;
                                                                                             QUERY PLAN
                                  

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=14673.77..14673.80 rows=10 width=3140) (actual time=133.877..133.946 rows=10 loops=1)
   ->  Sort  (cost=14673.77..14682.22 rows=3378 width=3140) (actual time=133.870..133.894 rows=10 loops=1)
         Sort Key: q.created_at
         ->  Unique  (cost=14315.34..14442.01 rows=3378 width=622) (actual time=133.344..133.705 rows=38 loops=1)
               ->  Sort  (cost=14315.34..14323.78 rows=3378 width=622) (actual time=133.337..133.432 rows=38 loops=1)
                     Sort Key: id, user_id, plaze_id, device, started_at, updated_at, status, "type", duration,
permission,created_at, mac_address, subnet, msc 
                     ->  Append  (cost=0.00..14117.35 rows=3378 width=622) (actual time=39.144..133.143 rows=38
loops=1)
                           ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..7243.59
rows=2158width=622) (actual time=39.138..109.831 rows=34 loops=1) 
                                 Index Cond: (user_id = 12345)
                           ->  Nested Loop  (cost=42.78..6839.98 rows=1220 width=622) (actual time=14.859..23.112
rows=4loops=1) 
                                 ->  HashAggregate  (cost=42.78..42.79 rows=1 width=4) (actual time=8.092..8.095 rows=1
loops=1)
                                       ->  Bitmap Heap Scan on relationships  (cost=4.34..42.75 rows=11 width=4)
(actualtime=8.067..8.070 rows=1 loops=1) 
                                             Recheck Cond: (user_id = 12345)
                                             ->  Bitmap Index Scan on relationships_user_id_index  (cost=0.00..4.34
rows=11width=0) (actual time=8.057..8.057 rows=1 loops=1) 
                                                   Index Cond: (user_id = 12345)
                                 ->  Index Scan using large_user_id_started_at_index on large_table lt
(cost=0.00..6768.48rows=2297 width=622) (actual time=6.751..14.970 rows=4 loops=1) 
                                       Index Cond: (lt.user_id = relationships.contact_id)
 Total runtime: 134.220 ms


Not so great for the user with many early matches:

testdb=# EXPLAIN ANALYZE SELECT * FROM (
SELECT * FROM large_table lt
WHERE lt.user_id = 55555
UNION
SELECT * FROM large_table lt
WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=55555)
) q
ORDER BY created_at DESC LIMIT 10;
                                                                                               QUERY PLAN
                                  

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=14673.77..14673.80 rows=10 width=3140) (actual time=3326.304..3326.367 rows=10 loops=1)
   ->  Sort  (cost=14673.77..14682.22 rows=3378 width=3140) (actual time=3326.297..3326.318 rows=10 loops=1)
         Sort Key: q.created_at
         ->  Unique  (cost=14315.34..14442.01 rows=3378 width=622) (actual time=2413.070..3019.385 rows=69757 loops=1)
               ->  Sort  (cost=14315.34..14323.78 rows=3378 width=622) (actual time=2413.062..2590.354 rows=69757
loops=1)
                     Sort Key: id, user_id, plaze_id, device, started_at, updated_at, status, "type", duration,
permission,created_at, mac_address, subnet, msc 
                     ->  Append  (cost=0.00..14117.35 rows=3378 width=622) (actual time=0.067..1911.626 rows=69757
loops=1)
                           ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..7243.59
rows=2158width=622) (actual time=0.062..3.440 rows=739 loops=1) 
                                 Index Cond: (user_id = 55555)
                           ->  Nested Loop  (cost=42.78..6839.98 rows=1220 width=622) (actual time=0.451..1557.901
rows=69018loops=1) 
                                 ->  HashAggregate  (cost=42.78..42.79 rows=1 width=4) (actual time=0.404..0.580
rows=40loops=1) 
                                       ->  Bitmap Heap Scan on relationships  (cost=4.34..42.75 rows=11 width=4)
(actualtime=0.075..0.241 rows=40 loops=1) 
                                             Recheck Cond: (user_id = 55555)
                                             ->  Bitmap Index Scan on relationships_user_id_index  (cost=0.00..4.34
rows=11width=0) (actual time=0.053..0.053 rows=40 loops=1) 
                                                   Index Cond: (user_id = 55555)
                                 ->  Index Scan using large_user_id_started_at_index on large_table lt
(cost=0.00..6768.48rows=2297 width=622) (actual time=0.048..28.033 rows=1725 loops=40) 
                                       Index Cond: (lt.user_id = relationships.contact_id)
 Total runtime: 3327.744 ms


> How about
>
> SELECT * FROM large_table lt
> WHERE lt.user_id = 12345 OR user_id IN (SELECT contact_id FROM
> relationships WHERE user_id=12345)
> ORDER BY created_at DESC LIMIT 10;

Not good for the one with few matches:

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
WHERE lt.user_id = 12345 OR user_id IN (SELECT contact_id FROM
relationships WHERE user_id=12345)
ORDER BY created_at DESC LIMIT 10;
                                                                                     QUERY PLAN
                                  

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=42.78..47.05 rows=10 width=622) (actual time=38360.090..62008.336 rows=10 loops=1)
   ->  Index Scan Backward using large_created_at_index on large_table lt  (cost=42.78..935924.84 rows=2192498
width=622)(actual time=38360.084..62008.269 rows=10 loops=1) 
         Filter: ((user_id = 12345) OR (hashed subplan))
         SubPlan
           ->  Bitmap Heap Scan on relationships  (cost=4.34..42.75 rows=11 width=4) (actual time=0.031..0.034 rows=1
loops=1)
                 Recheck Cond: (user_id = 12345)
                 ->  Bitmap Index Scan on relationships_user_id_index  (cost=0.00..4.34 rows=11 width=0) (actual
time=0.020..0.020rows=1 loops=1) 
                       Index Cond: (user_id = 12345)
 Total runtime: 62008.500 ms


Good for the one with many early matches:

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
WHERE lt.user_id = 55555 OR user_id IN (SELECT contact_id FROM
relationships WHERE user_id=55555)
ORDER BY created_at DESC LIMIT 10;
                                                                                 QUERY PLAN
                                  

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=42.78..47.05 rows=10 width=622) (actual time=0.473..0.572 rows=10 loops=1)
   ->  Index Scan Backward using large_created_at_index on large_table lt  (cost=42.78..935924.84 rows=2192498
width=622)(actual time=0.467..0.512 rows=10 loops=1) 
         Filter: ((user_id = 55555) OR (hashed subplan))
         SubPlan
           ->  Bitmap Heap Scan on relationships  (cost=4.34..42.75 rows=11 width=4) (actual time=0.070..0.264 rows=40
loops=1)
                 Recheck Cond: (user_id = 55555)
                 ->  Bitmap Index Scan on relationships_user_id_index  (cost=0.00..4.34 rows=11 width=0) (actual
time=0.047..0.047rows=40 loops=1) 
                       Index Cond: (user_id = 55555)
 Total runtime: 0.710 ms


> I am missing a unique constraint on (user_id, contact_id) - otherwise
> the subselect is not equivalent to the join.
>
> It might be good to know whether contact_id = user_id is possible -
> since this would rule out the possibility of a row satisfying both
> branches of the union.

Thanks for the hint - this is a rails project with the validations
starting out in the application only, so sometimes we forget to also
check the data integrity in the database. There were in fact a couple
of duplicate user_id/contact_id pairs and a couple of rows where
user_id=contact_id although those shouldn't be allowed.

I added a unique index on (user_id, contact_id) and a check constraint
to prevent (user_id=contact_id), vacuumed, and ran the 4 above query
variants again - but it did not result in changed plans or substantial
differences in execution time.

> Probably you also should have foreign key constraints on
> relationships.user_id and relationships.contact_id. These are unlikely
> to affect performance though, in my experience.

They are there, I just removed them from the schema output before
posting, also assuming that they are not relevant for join
performance.


Til

pgsql-performance by date:

Previous
From: "Steven Flatt"
Date:
Subject: Vacuum looping?
Next
From: Tom Lane
Date:
Subject: Re: Vacuum looping?