improving a badly optimized query - Mailing list pgsql-general

From Brandon Craig Rhodes
Subject improving a badly optimized query
Date
Msg-id w6lm3pe8jh.fsf@guinness.ts.gatech.edu
Whole thread Raw
Responses Re: improving a badly optimized query  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
A query has surprised me by the amount of time it takes, and it seems
that PostgreSQL is performing insufficient optimization.  To make the
example simple, consider the following database:

        CREATE TABLE role_keys (
                role SERIAL PRIMARY KEY
        );
        CREATE TABLE role_person (
                role INTEGER UNIQUE NOT NULL REFERENCES role_keys,
                person INTEGER NOT NULL
        );
        CREATE INDEX role_person_index ON role_person (person);
        CREATE VIEW roles AS SELECT
                role_keys.role, person
                FROM role_keys NATURAL LEFT JOIN role_person;

Having populated these tables, I attempted the following query:

        SELECT * FROM roles WHERE person = 28389;

It turns out that this query - equivalent to query (a) shown below -
takes more than ten times the amount of time required by query (b),
despite being guaranteed to give exactly the same result!

(a) (slow)
        SELECT * FROM role_keys NATURAL LEFT JOIN role_person
                WHERE person = 28389;

(b) (fast)
        SELECT * FROM role_keys NATURAL JOIN role_person
                WHERE person = 28389;

Apparently PostgreSQL does not realize that the rows created for
unmatched role_keys rows by the LEFT JOIN are guaranteed to be thrown
out by the WHERE clause (their `person' fields will be null).  Because
of this it reads through the entire role_keys table:

(a) (when run with EXPLAIN)
        Merge Join  (cost=0.00..3990.83 rows=67524 width=12)
        ->  Index Scan using role_keys_pkey on role_keys
                (cost=0.00..1280.67 rows=67524 width=4)
        ->  Index Scan using role_person_role_key on role_person
                (cost=0.00..1359.68 rows=67525 width=8)

(b) (when run with EXPLAIN)
        Nested Loop  (cost=0.00..6.91 rows=1 width=12)
        ->  Index Scan using role_person_index on role_person
                (cost=0.00..3.02 rows=1 width=8)
        ->  Index Scan using role_keys_pkey on role_keys
                (cost=0.00..3.01 rows=1 width=4)

It is not obvious to me where in PostgreSQL's optimization routine to
insert the intelligence to reduce this from a `LEFT JOIN' to a `JOIN'.
Has anyone else had to deal with this case?

The VIEW itself must be a LEFT JOIN because I need all roles to appear
when I query the view; but I will frequently need to do queries like
the above, and would like to avoid either (a) having to create a
separate view for each combination of fields on which I might search,
or (b) querying using the raw database tables since I would like the
actualy design hidden from my business logic.

Thanks for any ideas,
--
Brandon Craig Rhodes                         http://www.rhodesmill.org/brandon
Georgia Tech                                            brandon@oit.gatech.edu

pgsql-general by date:

Previous
From: Medi Montaseri
Date:
Subject: Re: help in starting up / shutting down postgres as another
Next
From: Medi Montaseri
Date:
Subject: Re: stability of pg library usage