bug in query planning? - Mailing list pgsql-general
From | Steven D.Arnold |
---|---|
Subject | bug in query planning? |
Date | |
Msg-id | 971FAF7E-3422-11D8-B5C3-000A95BA4396@neosynapse.net Whole thread Raw |
Responses |
Re: bug in query planning?
|
List | pgsql-general |
I have a query which does not use column indexes that it should use. I have discovered some interesting behaviors of Postgres which may indicate a bug in the database's query planning. Take a look at the query below. There is a btree index on both m.account_id and a.account_id. Query (1) does not use the index on the messages table, instead opting for a full table scan, thus killing performance. The messages table can contain potentially hundreds of thousands or millions of rows. Even at 50,000, it's murder. Query (2) below is the same query, but we reverse the order of the tables. It's obviously not quite the same query semantically, even though in my case it should always produce the same result. It is interesting to note that it uses the indexes tho. Finally, query (3) below uses traditional joining (non-ANSI). Indexes are correctly used in that query. The suggestion is that Postgres does not correctly analyze queries using ANSI joins. Indexes are occasionally skipped when they should be used. This seems like a bug in Postgres. I'm using version 7.3.4 of Postgres. Thanks in advance for any comments... steve Query (1) ========= defender=# explain analyze defender-# select count(message_id) defender-# from messages m defender-# left join accounts a defender-# on m.account_id::bigint = a.account_id::bigint defender-# where a.email = 'stevena@neosynapse.net'; QUERY PLAN ------------------------------------------------------------------------ -------------------------------------------------------------------- Aggregate (cost=20461.10..20461.10 rows=1 width=47) (actual time=1420.09..1420.09 rows=1 loops=1) -> Hash Join (cost=30.77..20334.38 rows=50687 width=47) (actual time=0.51..1319.69 rows=51419 loops=1) Hash Cond: ("outer".account_id = "inner".account_id) Filter: ("inner".email = 'stevena@neosynapse.net'::text) -> Seq Scan on messages m (cost=0.00..19289.87 rows=50687 width=16) (actual time=0.06..703.89 rows=52541 loops=1) -> Hash (cost=30.76..30.76 rows=3 width=31) (actual time=0.40..0.40 rows=0 loops=1) -> Index Scan using accounts_pkey on accounts a (cost=0.00..30.76 rows=3 width=31) (actual time=0.17..0.38 rows=3 loops=1) Total runtime: 1420.25 msec (8 rows) Query (2) ========= defender=# explain analyze defender-# select count(message_id) defender-# from accounts a defender-# left join messages m defender-# on a.account_id::bigint = m.account_id::bigint defender-# where a.email = 'stevena@neosynapse.net'; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ -------- Aggregate (cost=6806.54..6806.54 rows=1 width=24) (actual time=792.14..792.14 rows=1 loops=1) -> Nested Loop (cost=0.00..6764.30 rows=16896 width=24) (actual time=0.38..718.12 rows=51419 loops=1) -> Index Scan using accounts_email on accounts a (cost=0.00..8.98 rows=1 width=8) (actual time=0.22..0.25 rows=1 loops=1) Index Cond: (email = 'stevena@neosynapse.net'::text) -> Index Scan using messages_account_id on messages m (cost=0.00..6544.13 rows=16896 width=16) (actual time=0.15..593.15 rows=51419 loops=1) Index Cond: ("outer".account_id = m.account_id) Total runtime: 792.33 msec (7 rows) Query (3) ========= defender=# explain analyze defender-# select count(message_id) defender-# from messages m, accounts a defender-# where m.account_id::bigint = a.account_id::bigint defender-# and a.email = 'stevena@neosynapse.net'; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ -------- Aggregate (cost=6806.54..6806.54 rows=1 width=24) (actual time=782.30..782.30 rows=1 loops=1) -> Nested Loop (cost=0.00..6764.30 rows=16896 width=24) (actual time=0.33..708.52 rows=51422 loops=1) -> Index Scan using accounts_email on accounts a (cost=0.00..8.98 rows=1 width=8) (actual time=0.15..0.18 rows=1 loops=1) Index Cond: (email = 'stevena@neosynapse.net'::text) -> Index Scan using messages_account_id on messages m (cost=0.00..6544.13 rows=16896 width=16) (actual time=0.15..578.23 rows=51422 loops=1) Index Cond: (m.account_id = "outer".account_id) Total runtime: 782.46 msec (7 rows)
pgsql-general by date: