bug in query planning? - Mailing list pgsql-general

From Steven D.Arnold
Subject bug in query planning?
Date
Msg-id 971FAF7E-3422-11D8-B5C3-000A95BA4396@neosynapse.net
Whole thread Raw
Responses Re: bug in query planning?
List pgsql-general
I have a query which does not use column indexes that it should use.  I
have discovered some interesting behaviors of Postgres which may
indicate a bug in the database's query planning.

Take a look at the query below.  There is a btree index on both
m.account_id and a.account_id.  Query (1) does not use the index on the
messages table, instead opting for a full table scan, thus killing
performance.  The messages table can contain potentially hundreds of
thousands or millions of rows.  Even at 50,000, it's murder.

Query (2) below is the same query, but we reverse the order of the
tables.  It's obviously not quite the same query semantically, even
though in my case it should always produce the same result.  It is
interesting to note that it uses the indexes tho.

Finally, query (3) below uses traditional joining (non-ANSI).  Indexes
are correctly used in that query.  The suggestion is that Postgres does
not correctly analyze queries using ANSI joins.  Indexes are
occasionally skipped when they should be used.  This seems like a bug
in Postgres.  I'm using version 7.3.4 of Postgres.

Thanks in advance for any comments...
steve


Query (1)
=========
defender=# explain analyze
defender-# select    count(message_id)
defender-# from      messages m
defender-# left join accounts a
defender-# on        m.account_id::bigint = a.account_id::bigint
defender-# where     a.email = 'stevena@neosynapse.net';
                                                                  QUERY
PLAN
------------------------------------------------------------------------
--------------------------------------------------------------------
  Aggregate  (cost=20461.10..20461.10 rows=1 width=47) (actual
time=1420.09..1420.09 rows=1 loops=1)
    ->  Hash Join  (cost=30.77..20334.38 rows=50687 width=47) (actual
time=0.51..1319.69 rows=51419 loops=1)
          Hash Cond: ("outer".account_id = "inner".account_id)
          Filter: ("inner".email = 'stevena@neosynapse.net'::text)
          ->  Seq Scan on messages m  (cost=0.00..19289.87 rows=50687
width=16) (actual time=0.06..703.89 rows=52541 loops=1)
          ->  Hash  (cost=30.76..30.76 rows=3 width=31) (actual
time=0.40..0.40 rows=0 loops=1)
                ->  Index Scan using accounts_pkey on accounts a
(cost=0.00..30.76 rows=3 width=31) (actual time=0.17..0.38 rows=3
loops=1)
  Total runtime: 1420.25 msec
(8 rows)


Query (2)
=========
defender=# explain analyze
defender-# select    count(message_id)
defender-# from      accounts a
defender-# left join messages m
defender-# on        a.account_id::bigint = m.account_id::bigint
defender-# where     a.email = 'stevena@neosynapse.net';

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
--------
  Aggregate  (cost=6806.54..6806.54 rows=1 width=24) (actual
time=792.14..792.14 rows=1 loops=1)
    ->  Nested Loop  (cost=0.00..6764.30 rows=16896 width=24) (actual
time=0.38..718.12 rows=51419 loops=1)
          ->  Index Scan using accounts_email on accounts a
(cost=0.00..8.98 rows=1 width=8) (actual time=0.22..0.25 rows=1
loops=1)
                Index Cond: (email = 'stevena@neosynapse.net'::text)
          ->  Index Scan using messages_account_id on messages m
(cost=0.00..6544.13 rows=16896 width=16) (actual time=0.15..593.15
rows=51419 loops=1)
                Index Cond: ("outer".account_id = m.account_id)
  Total runtime: 792.33 msec
(7 rows)


Query (3)
=========
defender=# explain analyze
defender-# select    count(message_id)
defender-# from      messages m, accounts a
defender-# where     m.account_id::bigint = a.account_id::bigint
defender-# and       a.email = 'stevena@neosynapse.net';

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
--------
  Aggregate  (cost=6806.54..6806.54 rows=1 width=24) (actual
time=782.30..782.30 rows=1 loops=1)
    ->  Nested Loop  (cost=0.00..6764.30 rows=16896 width=24) (actual
time=0.33..708.52 rows=51422 loops=1)
          ->  Index Scan using accounts_email on accounts a
(cost=0.00..8.98 rows=1 width=8) (actual time=0.15..0.18 rows=1
loops=1)
                Index Cond: (email = 'stevena@neosynapse.net'::text)
          ->  Index Scan using messages_account_id on messages m
(cost=0.00..6544.13 rows=16896 width=16) (actual time=0.15..578.23
rows=51422 loops=1)
                Index Cond: (m.account_id = "outer".account_id)
  Total runtime: 782.46 msec
(7 rows)


pgsql-general by date:

Previous
From: Paul Ganainm
Date:
Subject: Re: Firebird and PostgreSQL at the DB Corral.
Next
From: Tom Lane
Date:
Subject: Re: bug in query planning?