Re: Slow query with backwards index scan - Mailing list pgsql-performance
From | Tilmann Singer |
---|---|
Subject | Re: Slow query with backwards index scan |
Date | |
Msg-id | 20070728205450.GI19392@tils.net Whole thread Raw |
In response to | Re: Slow query with backwards index scan (Craig James <craig_james@emolecules.com>) |
Responses |
Re: Slow query with backwards index scan
|
List | pgsql-performance |
* Craig James <craig_james@emolecules.com> [20070728 22:00]: > >>SELECT * FROM ( > >> (SELECT * FROM large_table lt > >> WHERE lt.user_id = 12345 > >> ORDER BY created_at DESC LIMIT 10) AS q1 > >> UNION > >> (SELECT * FROM large_table lt > >> WHERE user_id IN (SELECT contact_id FROM relationships WHERE > >> user_id=12345) > >> ORDER BY created_at DESC LIMIT 10) AS q2 > >>ORDER BY created_at DESC LIMIT 10; > > > >It's not possible to use ORDER BY or LIMIT within unioned queries. > > > >http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION > > If I'm reading this documentation correctly, it *is* possible, as long as > they're inside of a sub-select, as in this case. I completely overlooked that obvious note in the documentation, sorry. I tried it only with the aliases which fooled me into thinking that doesn't work at all: testdb=# (select 1 limit 1) as q1 union (select 2) as q2; ERROR: syntax error at or near "as" LINE 1: (select 1 limit 1) as q1 union (select 2) as q2; ^ but this works: testdb=# (select 1 limit 1) union (select 2); ?column? ---------- 1 2 Great - that works! What I didn't realize in the original post is that the problem actually seems to be how to retrieve the rows from large_table for the correlated relationships in an efficient way - the second of the two queries that could be UNIONed. Using a subselect is efficient for the user with few relationships and matched rows at the end of the sorted large_table: testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345) ORDER BY created_at DESC LIMIT 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=6963.94..6963.96 rows=10 width=621) (actual time=94.598..94.629 rows=4 loops=1) -> Sort (cost=6963.94..6966.96 rows=1211 width=621) (actual time=94.592..94.602 rows=4 loops=1) Sort Key: lt.created_at -> Nested Loop (cost=39.52..6901.92 rows=1211 width=621) (actual time=85.670..94.547 rows=4 loops=1) -> HashAggregate (cost=39.52..39.53 rows=1 width=4) (actual time=23.549..23.552 rows=1 loops=1) -> Bitmap Heap Scan on relationships (cost=4.33..39.49 rows=10 width=4) (actual time=23.526..23.530rows=1 loops=1) Recheck Cond: (user_id = 12345) -> Bitmap Index Scan on relationships_user_id_contact_id_index (cost=0.00..4.33 rows=10 width=0)(actual time=0.027..0.027 rows=1 loops=1) Index Cond: (user_id = 12345) -> Index Scan using large_user_id_started_at_index on large_table lt (cost=0.00..6834.04 rows=2268 width=621)(actual time=62.108..70.952 rows=4 loops=1) Index Cond: (lt.user_id = relationships.contact_id) Total runtime: 94.875 ms But the subselect is not fast for the user with many relationships and matched rows at the beginning of the sorted large_table: testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=55555) ORDER BY created_at DESC LIMIT 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=6963.94..6963.96 rows=10 width=621) (actual time=53187.349..53187.424 rows=10 loops=1) -> Sort (cost=6963.94..6966.96 rows=1211 width=621) (actual time=53187.341..53187.360 rows=10 loops=1) Sort Key: lt.created_at -> Nested Loop (cost=39.52..6901.92 rows=1211 width=621) (actual time=201.728..52673.800 rows=69018 loops=1) -> HashAggregate (cost=39.52..39.53 rows=1 width=4) (actual time=178.777..178.966 rows=40 loops=1) -> Bitmap Heap Scan on relationships (cost=4.33..39.49 rows=10 width=4) (actual time=47.049..178.560rows=40 loops=1) Recheck Cond: (user_id = 55555) -> Bitmap Index Scan on relationships_user_id_contact_id_index (cost=0.00..4.33 rows=10 width=0)(actual time=28.721..28.721 rows=40 loops=1) Index Cond: (user_id = 55555) -> Index Scan using large_user_id_started_at_index on large_table lt (cost=0.00..6834.04 rows=2268 width=621)(actual time=21.994..1301.375 rows=1725 loops=40) Index Cond: (lt.user_id = relationships.contact_id) Total runtime: 53188.584 ms Using a join now the query for matches for the user with little data is slow: testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt JOIN relationships r ON lt.user_id=r.contact_id WHERE r.user_id=12345 ORDER BY lt.created_at DESC LIMIT 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..24116.65 rows=10 width=645) (actual time=100348.436..145552.633 rows=4 loops=1) -> Nested Loop (cost=0.00..28751864.52 rows=11922 width=645) (actual time=100348.429..145552.602 rows=4 loops=1) -> Index Scan Backward using large_created_at_index on large_table lt (cost=0.00..824833.09 rows=4384343 width=621)(actual time=28.961..82448.167 rows=4384064 loops=1) -> Index Scan using relationships_user_id_contact_id_index on relationships r (cost=0.00..6.36 rows=1 width=24)(actual time=0.009..0.009 rows=0 loops=4384064) Index Cond: ((r.user_id = 12345) AND (lt.user_id = r.contact_id)) Total runtime: 145552.809 ms And for the user with much data it's fast: testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt JOIN relationships r ON lt.user_id=r.contact_id WHERE r.user_id=55555 ORDER BY lt.created_at DESC LIMIT 10; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..24116.65 rows=10 width=645) (actual time=0.068..0.428 rows=10 loops=1) -> Nested Loop (cost=0.00..28751864.52 rows=11922 width=645) (actual time=0.063..0.376 rows=10 loops=1) -> Index Scan Backward using large_created_at_index on large_table lt (cost=0.00..824833.09 rows=4384343 width=621)(actual time=0.028..0.064 rows=13 loops=1) -> Index Scan using relationships_user_id_contact_id_index on relationships r (cost=0.00..6.36 rows=1 width=24)(actual time=0.010..0.013 rows=1 loops=13) Index Cond: ((r.user_id = 55555) AND (lt.user_id = r.contact_id)) Total runtime: 0.609 ms Any ideas? tia, Til
pgsql-performance by date: