Thread: ORDER BY ... LIMIT and JOIN
Hello, I'm trying to optimize the follow query which returns the top users ordered by ranking. I'll show you my schema and "explain analyze" for each case. So, i'm asking two things: 1) Why "ranking" index is not used in the second query when sorting. 2) Am i missing some obvious optimization like a missing index? :) Schemas: # \d ranking Table "public.ranking" Column | Type | Modifiers -----------+-----------------------+----------- ranking | bigint | score | double precision | username | character varying(20) | not null variation | bigint | Indexes: "ranking_tmp_pkey1" PRIMARY KEY, btree (username) "idxrank_6057" btree (ranking) CLUSTER # \d user Table "public.user" Column | Type | Modifiers ------------+-----------------------+--------------------------------------------------- id | integer | not null default nextval('user_id_seq'::regclass) username | character varying(20) | not null about | text | name | character varying(50) | photo | text | country_id | integer | Indexes: "user_pkey" PRIMARY KEY, btree (username) "country_ranking_user_idx" btree (country_id) Explain: # explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON ("ranking"."username" = "user"."username") WHERE "user"."country_id" = 1 ORDER BY "ranking"."ranking" ASC LIMIT 100; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=13.03..13.04 rows=1 width=180) (actual time=965.229..965.302 rows=100 loops=1) -> Sort (cost=13.03..13.04 rows=1 width=180) (actual time=965.227..965.256 rows=100 loops=1) Sort Key: ranking.ranking Sort Method: top-N heapsort Memory: 56kB -> Nested Loop (cost=0.00..13.02 rows=1 width=180) (actual time=0.049..900.847 rows=57309 loops=1) -> Index Scan using country_ranking_user_idx on "user" (cost=0.00..6.49 rows=1 width=145) (actual time=0.023..57.633 rows=57309 loops=1) Index Cond: (country_id = 1) -> Index Scan using ranking_tmp_pkey1 on ranking (cost=0.00..6.52 rows=1 width=35) (actual time=0.013..0.013 rows=1 loops=57309) Index Cond: ((ranking.username)::text = ("user".username)::text) Total runtime: 965.412 ms (10 rows) # explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON ("ranking"."username" = "user"."username") ORDER BY "ranking"."ranking" ASC LIMIT 100; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..137.02 rows=100 width=180) (actual time=0.056..1.973 rows=100 loops=1) -> Nested Loop (cost=0.00..3081316.65 rows=2248753 width=180) (actual time=0.055..1.921 rows=100 loops=1) -> Index Scan using idxrank_6057 on ranking (cost=0.00..70735.73 rows=2248753 width=35) (actual time=0.021..0.076 rows=100 loops=1) -> Index Scan using user_pkey on "user" (cost=0.00..1.33 rows=1 width=145) (actual time=0.016..0.017 rows=1 loops=100) Index Cond: (("user".username)::text = (ranking.username)::text) Total runtime: 2.043 ms (6 rows) Thanks! Fz
On Saturday 08 August 2009 08:02:47 Fizu wrote: > -> Index Scan using country_ranking_user_idx on "user" > (cost=0.00..6.49 rows=1 width=145) (actual time=0.023..57.633 > rows=57309 loops=1) > Index Cond: (country_id = 1) The planner is expecting one user with country_id = 1, but instead there are 57309. Have you analyzed recently? Maybe increasing the statistics target will help. /Michael
On Sat, Aug 8, 2009 at 2:09 PM, Michael Andreen<harv@ruin.nu> wrote: > The planner is expecting one user with country_id = 1, but instead there are > 57309. Have you analyzed recently? Maybe increasing the statistics target will > help. > > /Michael Just after analyze user and ranking it still taking so long to order by an indexed field. # explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON ("ranking"."username" = "user"."username") WHERE "user"."country_id" = 5 ORDER BY "ranking"."ranking" ASC LIMIT 100; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=15340.13..15340.38 rows=100 width=178) (actual time=4955.795..4955.865 rows=100 loops=1) -> Sort (cost=15340.13..15343.69 rows=1425 width=178) (actual time=4955.794..4955.820 rows=100 loops=1) Sort Key: ranking.ranking Sort Method: top-N heapsort Memory: 56kB -> Nested Loop (cost=0.00..15285.67 rows=1425 width=178) (actual time=20.951..4952.337 rows=1972 loops=1) -> Index Scan using country_ranking_user_idx on "user" (cost=0.00..4807.25 rows=1710 width=143) (actual time=20.923..4898.931 rows=1972 loops=1) Index Cond: (country_id = 5) -> Index Scan using ranking_tmp_pkey on ranking (cost=0.00..6.12 rows=1 width=35) (actual time=0.024..0.025 rows=1 loops=1972) Index Cond: ((ranking.username)::text = ("user".username)::text) Total runtime: 4955.974 ms (10 rows) # explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON ("ranking"."username" = "user"."username") ORDER BY "ranking"."ranking" ASC LIMIT 100; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..136.78 rows=100 width=178) (actual time=0.058..1.870 rows=100 loops=1) -> Nested Loop (cost=0.00..3116910.51 rows=2278849 width=178) (actual time=0.056..1.818 rows=100 loops=1) -> Index Scan using idxrank_6224 on ranking (cost=0.00..71682.17 rows=2278849 width=35) (actual time=0.022..0.065 rows=100 loops=1) -> Index Scan using user_pkey on "user" (cost=0.00..1.32 rows=1 width=143) (actual time=0.015..0.016 rows=1 loops=100) Index Cond: (("user".username)::text = (ranking.username)::text) Total runtime: 1.946 ms (6 rows) Thank you! M
On Sunday 09 August 2009 21:26:08 Fizu wrote: > -> Index Scan using country_ranking_user_idx on "user" > (cost=0.00..4807.25 rows=1710 width=143) (actual > time=20.923..4898.931 rows=1972 loops=1) > Index Cond: (country_id = 5) The statistics looks good now, but almost all the time is still spent on fetching users with country_id = 5. The actual ordering is only a tiny part of the full cost. Why it takes time probably depends on your hardware in relation to database size. I guess the database doesn't fit in ram? What settings have you changed? Clustering users on country_ranking_user_idx would probably help for this specific case, but if it is a good idea depends on what other queries need to be fast. If the table or indexes are bloated then clustering on any index or doing reindex might do it. /Michael
On Sun, Aug 9, 2009 at 3:26 PM, Fizu<Fizu@advancedsl.com.ar> wrote: > -> Index Scan using country_ranking_user_idx on "user" > (cost=0.00..4807.25 rows=1710 width=143) (actual > time=20.923..4898.931 rows=1972 loops=1) > Index Cond: (country_id = 5) An index scan that picks up 1972 rows is taking 5 seconds? I think there must be something wrong with this index. Is it possible that since you apparently weren't analyzing this database, that maybe you didn't vacuum it either? If so, you should probably do a VACUUM FULL on your database and then a database-wide REINDEX, but at a minimum you should try reindexing this particular index. ...Robert