Thread: indexes across joins not used for count
I'm running a query to do a count with two joins in it. I've added indexes to the tables for the join columns, but the explain of the query doesn't seem to be using the indexes: Table 1: invites (id:int) Table 2: sms_requests (id:int, invoker_id:int, invoker_type:string, sms_message_id:int) Indexes: "sms_requests_pkey" PRIMARY KEY, btree (id) "index_sms_requests_on_invoker_id_and_invoker_type" btree (invoker_id, invoker_type) "index_sms_requests_on_sms_message_id" btree (sms_message_id) Table 3: sms_messages (id:int, sent_at:timestamp) Indexes: "sms_messages_pkey" PRIMARY KEY, btree (id) "index_sms_messages_on_sent_at_partial" btree (sent_at) WHERE sent_at IS NULL "index_sms_messages_on_sent_at" btree (sent_at) Query: SELECT COUNT(*) FROM "invites" INNER JOIN "sms_requests" ON "sms_requests"."invoker_id" = "invites"."id" AND "sms_requests"."invoker_type" = 'Invite' INNER JOIN "sms_messages" ON "sms_messages"."id" = "sms_requests"."sms_message_id" WHERE "sms_messages"."sent_at" IS NOT NULL Explain: Aggregate (cost=165914.42..165914.43 rows=1 width=0) -> Hash Join (cost=92326.82..163534.87 rows=951821 width=0) Hash Cond: (sms_requests.sms_message_id = sms_messages.id) -> Hash Join (cost=32692.53..83674.38 rows=951821 width=4) Hash Cond: (invites.id = sms_requests.invoker_id) -> Seq Scan on invites (cost=0.00..27525.48 rows=1238948 width=4) -> Hash (cost=20794.76..20794.76 rows=951821 width=8) -> Seq Scan on sms_requests (cost=0.00..20794.76 rows=951821 width=8) Filter: ((invoker_type)::text = 'Invite'::text) -> Hash (cost=48180.24..48180.24 rows=916324 width=4) -> Seq Scan on sms_messages (cost=0.00..48180.24 rows=916324 width=4) Filter: (sent_at IS NOT NULL) This is pretty slow, ~5000ms on my development machine. I would have expected it to be able to make use of the indexes I've created. Any ideas on what I can do to make this perform better? Jeremy
On Tue, 2012-10-16 at 16:08 +1300, Jeremy Wells wrote: > I'm running a query to do a count with two joins in it. I've added > indexes to the tables for the join columns, but the explain of the query > doesn't seem to be using the indexes: Can you post the output of EXPLAIN ANALYZE? Did you do an ANALYZE of the tables already? You can often force an index scan by doing: SET enable_seqscan=false; So also try setting that, and run EXPLAIN ANALYZE again, and see if it uses the indexes, and if so, if it's faster. Regards, Jeff Davis
Jeff Davis wrote:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=129556.14..129556.14 rows=1 width=0) (actual time=21530.146..21530.147 rows=1 loops=1)
-> Hash Join (cost=81261.64..128882.20 rows=1347872 width=0) (actual time=6961.957..20437.245 rows=1340639 loops=1)
Hash Cond: (sms_requests.sms_message_id = sms_messages.id)
-> Hash Join (cost=22033.10..62240.37 rows=1347872 width=4) (actual time=3618.615..11410.322 rows=1340639 loops=1)
Hash Cond: (invites.id = sms_requests.invoker_id)
-> Seq Scan on invites (cost=0.00..33216.78 rows=1683927 width=4) (actual time=0.009..1983.440 rows=1683927 loops=1)
-> Hash (cost=17315.55..17315.55 rows=1347872 width=8) (actual time=3617.847..3617.847 rows=1347872 loops=1)
Buckets: 262144 Batches: 1 Memory Usage: 52652kB
-> Seq Scan on sms_requests (cost=0.00..17315.55 rows=1347872 width=8) (actual time=0.027..1651.786 rows=1347872 loops=1)
Filter: ((invoker_type)::text = 'Invite'::text)
-> Hash (cost=55408.40..55408.40 rows=1091467 width=4) (actual time=3342.029..3342.029 rows=1107628 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 38941kB
-> Seq Scan on sms_messages (cost=0.00..55408.40 rows=1091467 width=4) (actual time=0.009..1737.765 rows=1107628 loops=1)
Filter: (sent_at IS NOT NULL)
Total runtime: 21530.247 ms
SET
=> EXPLAIN ANALYZE SELECT COUNT(*) FROM "invites" INNER JOIN "sms_requests" ON "sms_requests"."invoker_id" = "invites"."id" AND "sms_requests"."invoker_type" = 'Invite' INNER JOIN "sms_messages" ON "sms_messages"."id" = "sms_requests"."sms_message_id" WHERE "sms_messages"."sent_at" IS NOT NULL;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=198896.67..198896.67 rows=1 width=0) (actual time=21923.378..21923.378 rows=1 loops=1)
-> Hash Join (cost=90165.47..198222.73 rows=1347872 width=0) (actual time=5465.116..20552.881 rows=1340639 loops=1)
Hash Cond: (sms_requests.invoker_id = invites.id)
-> Merge Join (cost=7.29..101325.20 rows=1347872 width=4) (actual time=0.035..9630.580 rows=1347872 loops=1)
Merge Cond: (sms_messages.id = sms_requests.sms_message_id)
-> Index Scan using sms_messages_pkey on sms_messages (cost=0.00..69875.42 rows=1091467 width=4) (actual time=0.011..3009.455 rows=1107628 loops=1)
Filter: (sent_at IS NOT NULL)
-> Index Scan using index_sms_requests_on_sms_message_id on sms_requests (cost=0.00..26190.25 rows=1347872 width=8) (actual time=0.014..2257.271 rows=1347872 loops=1)
Filter: ((invoker_type)::text = 'Invite'::text)
-> Hash (cost=84264.43..84264.43 rows=1683927 width=4) (actual time=5462.709..5462.709 rows=1683927 loops=1)
Buckets: 262144 Batches: 1 Memory Usage: 59201kB
-> Index Scan using invites_pkey on invites (cost=0.00..84264.43 rows=1683927 width=4) (actual time=0.159..2763.626 rows=1683927 loops=1)
Total runtime: 21923.504 ms
EXPLAIN ANALYZE SELECT COUNT(*) FROM "invites" INNER JOIN "sms_requests" ON "sms_requests"."invoker_id" = "invites"."id" AND "sms_requests"."invoker_type" = 'Invite' INNER JOIN "sms_messages" ON "sms_messages"."id" = "sms_requests"."sms_message_id" WHERE "sms_messages"."sent_at" IS NOT NULL;Can you post the output of EXPLAIN ANALYZE? Did you do an ANALYZE of the tables already?
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=129556.14..129556.14 rows=1 width=0) (actual time=21530.146..21530.147 rows=1 loops=1)
-> Hash Join (cost=81261.64..128882.20 rows=1347872 width=0) (actual time=6961.957..20437.245 rows=1340639 loops=1)
Hash Cond: (sms_requests.sms_message_id = sms_messages.id)
-> Hash Join (cost=22033.10..62240.37 rows=1347872 width=4) (actual time=3618.615..11410.322 rows=1340639 loops=1)
Hash Cond: (invites.id = sms_requests.invoker_id)
-> Seq Scan on invites (cost=0.00..33216.78 rows=1683927 width=4) (actual time=0.009..1983.440 rows=1683927 loops=1)
-> Hash (cost=17315.55..17315.55 rows=1347872 width=8) (actual time=3617.847..3617.847 rows=1347872 loops=1)
Buckets: 262144 Batches: 1 Memory Usage: 52652kB
-> Seq Scan on sms_requests (cost=0.00..17315.55 rows=1347872 width=8) (actual time=0.027..1651.786 rows=1347872 loops=1)
Filter: ((invoker_type)::text = 'Invite'::text)
-> Hash (cost=55408.40..55408.40 rows=1091467 width=4) (actual time=3342.029..3342.029 rows=1107628 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 38941kB
-> Seq Scan on sms_messages (cost=0.00..55408.40 rows=1091467 width=4) (actual time=0.009..1737.765 rows=1107628 loops=1)
Filter: (sent_at IS NOT NULL)
Total runtime: 21530.247 ms
=> SET enable_seqscan=false;You can often force an index scan by doing: SET enable_seqscan=false; So also try setting that, and run EXPLAIN ANALYZE again, and see if it uses the indexes, and if so, if it's faster.
SET
=> EXPLAIN ANALYZE SELECT COUNT(*) FROM "invites" INNER JOIN "sms_requests" ON "sms_requests"."invoker_id" = "invites"."id" AND "sms_requests"."invoker_type" = 'Invite' INNER JOIN "sms_messages" ON "sms_messages"."id" = "sms_requests"."sms_message_id" WHERE "sms_messages"."sent_at" IS NOT NULL;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=198896.67..198896.67 rows=1 width=0) (actual time=21923.378..21923.378 rows=1 loops=1)
-> Hash Join (cost=90165.47..198222.73 rows=1347872 width=0) (actual time=5465.116..20552.881 rows=1340639 loops=1)
Hash Cond: (sms_requests.invoker_id = invites.id)
-> Merge Join (cost=7.29..101325.20 rows=1347872 width=4) (actual time=0.035..9630.580 rows=1347872 loops=1)
Merge Cond: (sms_messages.id = sms_requests.sms_message_id)
-> Index Scan using sms_messages_pkey on sms_messages (cost=0.00..69875.42 rows=1091467 width=4) (actual time=0.011..3009.455 rows=1107628 loops=1)
Filter: (sent_at IS NOT NULL)
-> Index Scan using index_sms_requests_on_sms_message_id on sms_requests (cost=0.00..26190.25 rows=1347872 width=8) (actual time=0.014..2257.271 rows=1347872 loops=1)
Filter: ((invoker_type)::text = 'Invite'::text)
-> Hash (cost=84264.43..84264.43 rows=1683927 width=4) (actual time=5462.709..5462.709 rows=1683927 loops=1)
Buckets: 262144 Batches: 1 Memory Usage: 59201kB
-> Index Scan using invites_pkey on invites (cost=0.00..84264.43 rows=1683927 width=4) (actual time=0.159..2763.626 rows=1683927 loops=1)
Total runtime: 21923.504 ms
On Sun, 2012-10-21 at 08:37 +1300, Jeremy Wells wrote: > Jeff Davis wrote: > > Can you post the output of EXPLAIN ANALYZE? Did you do an ANALYZE of the > > tables already? > EXPLAIN ANALYZE SELECT COUNT(*) FROM "invites" INNER JOIN > "sms_requests" ON "sms_requests"."invoker_id" = "invites"."id" AND > "sms_requests"."invoker_type" = 'Invite' INNER JOIN "sms_messages" ON > "sms_messages"."id" = "sms_requests"."sms_message_id" WHERE > "sms_messages"."sent_at" IS NOT NULL; > Interesting... it looks like the two plans take about the same amount of time, so the planner is not making a mistake. I tried making three tables about the same size as yours, and then doing a three-way join, and on my machine it took closer to a second. So there's clearly something else going on with your data. Maybe the data distribution is skewed (some values having many matches in another table, others having none)? Regards, Jeff Davis
Jeff Davis wrote: > > I tried making three tables about the same size as yours, and then doing > a three-way join, and on my machine it took closer to a second. So > there's clearly something else going on with your data. Maybe the data > distribution is skewed (some values having many matches in another > table, others having none)? > Every row in sms_requests will match a row in sms_messages (but maybe the same row as sms_requests n<-->1 sms_messages). Each row in invites will have either 0 or 1 row in sms_requests. Given that the number of sms_messages with sent_at IS NULL will remain pretty constant will the query continue to run at about the same speed or will it slow significantly as each table grows in size? Although the time of the count is slow it's not giving me problems yet. Can I speed it up? Would it make sense to denormalize the sent_at time into a boolean on the sms_requests table? Thanks for your help so far, Jeremy
Jeremy Wells <jemmyw@gmail.com> writes: > Given that the number of sms_messages with sent_at IS NULL will remain > pretty constant will the query continue to run at about the same speed > or will it slow significantly as each table grows in size? At some point the IS NULL condition will become selective enough that it's worth using an index on sent_at to pull out just those rows. The planner evidently thinks you are not there yet. (Or, perhaps, you are using a version too old to be able to use an index for IS NULL? You didn't mention your PG version, which is a fundamental mistake in any performance-related question.) My overall take on this is that it's folly to imagine that indexes are a magic bullet for full-table joins: if the query has to read most/all of the table anyway, an index doesn't help much. The plans you've shown look perfectly sane from here. If you were using 9.2, and the tables were relatively static, index-only scans might make sense. But they're no magic bullet either. regards, tom lane
Tom Lane wrote: > (Or, perhaps, you > are using a version too old to be able to use an index for IS NULL? > You didn't mention your PG version, which is a fundamental mistake in > any performance-related question.) I'm using 9.1