Index Selection: ORDER BY vs. PRIMARY KEY - Mailing list pgsql-performance
From | Thomas F. O'Connell |
---|---|
Subject | Index Selection: ORDER BY vs. PRIMARY KEY |
Date | |
Msg-id | 5DDEAB6E-36F9-422C-A44F-56E190B9FF55@sitening.com Whole thread Raw |
Responses |
Re: Index Selection: ORDER BY vs. PRIMARY KEY
|
List | pgsql-performance |
I have a query that looks roughly like this (I've removed irrelevant SELECT clause material and obfuscated names, trying to keep them consistent where altered in EXPLAIN output): SELECT u.emma_member_id, h.action_ts FROM user as u, history as h WHERE u.user_id = h.user_id AND h.action_id = '$constant_data' ORDER BY h.action_ts DESC LIMIT 100 OFFSET 0 The user table has ~25,000 rows. The history table has ~750,000 rows. Currently, there is an index on history.action_ts and a separate one on history.action_id. There's also a PRIMARY KEY on user.user_id. If I run the query as such, I get a plan like this: QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ -------------------------------------------------------- Limit (cost=0.00..2196.30 rows=100 width=925) (actual time=947.208..3178.775 rows=3 loops=1) -> Nested Loop (cost=0.00..83898.65 rows=3820 width=925) (actual time=947.201..3178.759 rows=3 loops=1) -> Index Scan Backward using h_action_ts_idx on history h (cost=0.00..60823.53 rows=3820 width=480) (actual time=946.730..3177.953 rows=3 loops=1) Filter: (action_id = $constant_data::bigint) -> Index Scan using user_pkey on user u (cost=0.00..6.01 rows=1 width=445) (actual time=0.156..0.161 rows=1 loops=3) Index Cond: (u.user_id = "outer".user_id) Total runtime: 3179.143 ms (7 rows) If I drop the index on the timestamp field, I get a plan like this: QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------------------------------- Limit (cost=17041.41..17041.66 rows=100 width=925) (actual time=201.725..201.735 rows=3 loops=1) -> Sort (cost=17041.41..17050.96 rows=3820 width=925) (actual time=201.719..201.722 rows=3 loops=1) Sort Key: h.action_ts -> Merge Join (cost=13488.15..16814.13 rows=3820 width=925) (actual time=7.306..201.666 rows=3 loops=1) Merge Cond: ("outer".user_id = "inner".user_id) -> Index Scan using user_pkey on user u (cost=0.00..3134.82 rows=26802 width=445) (actual time=0.204..151.351 rows=24220 loops=1) -> Sort (cost=13488.15..13497.70 rows=3820 width=480) (actual time=0.226..0.234 rows=3 loops=1) Sort Key: h.user_id -> Index Scan using h_action_id_idx on history h (cost=0.00..13260.87 rows=3820 width=480) (actual time=0.184..0.195 rows=3 loops=1) Index Cond: (action_id = $constant_data::bigint) Total runtime: 202.089 ms (11 rows) Clearly, if the index on the timestamp field is there, postgres wants to use it for the ORDER BY, even though the performance is worse. How is this preference made internally? If both indexes exist, will postgres always prefer the index on an ordered column? If I need the index on the timestamp field for other queries, is my best bet just to increase sort_mem for this query? Here's my version string: PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC 2.95.4 -- Thomas F. O'Connell Co-Founder, Information Architect Sitening, LLC Strategic Open Source: Open Your i™ http://www.sitening.com/ 110 30th Avenue North, Suite 6 Nashville, TN 37203-6320 615-469-5150 615-469-5151 (fax)
pgsql-performance by date: