Thread: Question about a query plan

Question about a query plan

From
Bill Moseley
Date:
PostgreSQL 7.4.8 on i486-pc-linux-gnu, Debian
Reading: http://www.postgresql.org/docs/8.0/static/performance-tips.html

I'm just starting to look at query plans, and I'm not understanding a
few things.  I don't have that many questions, but I'm including my
examples below, so it's a bit long.


First table is "class" (as in a class taught at a school) and has an
indexed column "class_time" as timestamp(0) with time zone.


First question is why the planner is not using an index scan when I
use "now()" or CURRENT_TIMESTAMP?


    EXPLAIN ANALYZE select id from class where class_time > now();
                                                    QUERY PLAN
    ----------------------------------------------------------------------------------------------------------
     Seq Scan on "class"  (cost=0.00..655.62 rows=414 width=4) (actual time=213.327..288.407 rows=28 loops=1)
       Filter: (class_time > now())

    EXPLAIN ANALYZE select id from class where class_time > now()::timestamp(0) with time zone;
                                                  QUERY PLAN
    ------------------------------------------------------------------------------------------------------
     Seq Scan on "class"  (cost=0.00..658.72 rows=414 width=4) (actual time=2.065..5.251 rows=28 loops=1)
       Filter: (class_time > (now())::timestamp(0) with time zone)


At first I thought the planner was related to the ration of rows the
planner was expecting to return to the total number of rows.  But
using < or > uses a a scan.  But if I do "class_time = now()" then it
uses an Index Scan.


But, if I specify the timestamp then it always uses an Index Scan:

    select now()::timestamp(0) with time zone;
              now
    ------------------------
     2005-09-16 06:44:10-07

     EXPLAIN ANALYZE select id from class where class_time > '2005-09-16 06:44:10-07';
                                                                  QUERY PLAN
                  

--------------------------------------------------------------------------------------------------------------------------------------
     Index Scan using class_class_time_index on "class"  (cost=0.00..191.17 rows=50 width=4) (actual
time=66.072..66.248rows=28 loops=1) 
       Index Cond: (class_time > '2005-09-16 06:44:10-07'::timestamp with time zone)



Ok now on to the second question.  I have two other related tables.
First, I have a table "person" which you can guess what it holds.  And
a link table instructors (a class can have more than one instructor):

     \d instructors
      Table "public.instructors"
     Column |  Type   | Modifiers
    --------+---------+-----------
     person | integer | not null
     class  | integer | not null
    Indexes:
        "instructors_pkey" primary key, btree (person, "class")
        "instructors_class_index" btree ("class")
        "instructors_person_index" btree (person)
    Foreign-key constraints:
        "$2" FOREIGN KEY ("class") REFERENCES "class"(id)
        "$1" FOREIGN KEY (person) REFERENCES person(id)



I want to find out who is teaching classes in the future:


EXPLAIN ANALYZE select person, class from instructors
                where instructors.class in
                    (select class.id from class where class_time > now());
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Hash IN Join  (cost=656.65..687.64 rows=437 width=8) (actual time=31.741..33.443 rows=29 loops=1)
   Hash Cond: ("outer"."class" = "inner".id)
   ->  Seq Scan on instructors  (cost=0.00..20.08 rows=1308 width=8) (actual time=0.057..1.433 rows=1308 loops=1)
   ->  Hash  (cost=655.62..655.62 rows=414 width=4) (actual time=30.963..30.963 rows=0 loops=1)
         ->  Seq Scan on "class"  (cost=0.00..655.62 rows=414 width=4) (actual time=18.716..30.892 rows=28 loops=1)
               Filter: (class_time > now())

Perhaps I'm reading that incorrectly, but the sub-select is returning
28 rows of "class.id".  Then why is it doing a Seq Scan on instructors
instead of an index scan?  If I innumerate all 28 classes I get an
Index Scan.



Finally, not really a question, but my goal is to show a count of classes taught by each in
instructor.  Perhaps there's a better query?

EXPLAIN select person, first_name, count(class)
        from instructors, person
        where instructors.class in
            (select id from class where class_time > now() )
            AND person.id = instructors.person
            group by person, first_name;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 HashAggregate  (cost=734.06..735.15 rows=437 width=17)
   ->  Merge Join  (cost=706.81..730.78 rows=437 width=17)
         Merge Cond: ("outer".id = "inner".person)
         ->  Index Scan using person_pkey on person  (cost=0.00..1703.82 rows=12246 width=13)
         ->  Sort  (cost=706.81..707.90 rows=437 width=8)
               Sort Key: instructors.person
               ->  Hash IN Join  (cost=656.65..687.64 rows=437 width=8)
                     Hash Cond: ("outer"."class" = "inner".id)
                     ->  Seq Scan on instructors  (cost=0.00..20.08 rows=1308 width=8)
                     ->  Hash  (cost=655.62..655.62 rows=414 width=4)
                           ->  Seq Scan on "class"  (cost=0.00..655.62 rows=414 width=4)
                                 Filter: (class_time > now())



--
Bill Moseley
moseley@hank.org


Re: Question about a query plan

From
Peter Eisentraut
Date:
Am Freitag, 16. September 2005 16:18 schrieb Bill Moseley:
> First question is why the planner is not using an index scan when I
> use "now()" or CURRENT_TIMESTAMP?
>
>
>     EXPLAIN ANALYZE select id from class where class_time > now();
>                                                     QUERY PLAN
>
> ---------------------------------------------------------------------------
>------------------------------- Seq Scan on "class"  (cost=0.00..655.62
> rows=414 width=4) (actual time=213.327..288.407 rows=28 loops=1) Filter:
> (class_time > now())

The planner thinks your query will return 414 rows, so it thinks the
sequential scan is faster.  In reality, your query only retuns 28 rows, so
you need to create better statistics, either by running ANALYZE or VACUUM (or
both) or tweaking the statistics parameters of the columns.

> Perhaps I'm reading that incorrectly, but the sub-select is returning
> 28 rows of "class.id".  Then why is it doing a Seq Scan on instructors
> instead of an index scan?  If I innumerate all 28 classes I get an
> Index Scan.

Again, bad statistics.  It thinks that the scan on instructors is going to
return 1308 rows, which is probably not true.

> Finally, not really a question, but my goal is to show a count of classes
> taught by each in instructor.  Perhaps there's a better query?

You could probably rewrite it as a join, but as you could see, the planner
rewrote it as a join internally anyway.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Question about a query plan

From
Thomas O'Connell
Date:
On Sep 16, 2005, at 9:18 AM, Bill Moseley wrote:

> First question is why the planner is not using an index scan when I
> use "now()" or CURRENT_TIMESTAMP?

It also used to be the case (pre-8.0; I couldn't find in the release
notes whether this was an 8.0 or 8.1 fix) that now() and
CURRENT_TIMESTAMP were not indexable, I think because of mutability.

For older versions of postgres, it's recommended that you determine
the time in the client and use constant data in your query.

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)

Re: Question about a query plan

From
Bill Moseley
Date:
On Fri, Sep 16, 2005 at 04:45:57PM +0200, Peter Eisentraut wrote:
> The planner thinks your query will return 414 rows, so it thinks the
> sequential scan is faster.  In reality, your query only retuns 28 rows, so
> you need to create better statistics, either by running ANALYZE or VACUUM (or
> both) or tweaking the statistics parameters of the columns.

I did wonder about the planner stats, so I had run ANALYZE on the
database with no change.  I just now ran VACUUM (and VACUUM ANALYZE),
and again see no change.  Perhaps my table is just too small for this
test.


--
Bill Moseley
moseley@hank.org


Re: Question about a query plan

From
Bill Moseley
Date:
On Fri, Sep 16, 2005 at 10:02:28AM -0500, Thomas O'Connell wrote:
>
> On Sep 16, 2005, at 9:18 AM, Bill Moseley wrote:
>
> >First question is why the planner is not using an index scan when I
> >use "now()" or CURRENT_TIMESTAMP?
>
> It also used to be the case (pre-8.0; I couldn't find in the release
> notes whether this was an 8.0 or 8.1 fix) that now() and
> CURRENT_TIMESTAMP were not indexable, I think because of mutability.
>
> For older versions of postgres, it's recommended that you determine
> the time in the client and use constant data in your query.

Interesting.  I have a few VIEWs that include now(), but I guess I
could adjust and pass in the date from the client.

Thanks for the tip.

--
Bill Moseley
moseley@hank.org


Re: Question about a query plan

From
Martijn van Oosterhout
Date:
On Fri, Sep 16, 2005 at 10:18:19AM -0700, Bill Moseley wrote:
> On Fri, Sep 16, 2005 at 10:02:28AM -0500, Thomas O'Connell wrote:
> > For older versions of postgres, it's recommended that you determine
> > the time in the client and use constant data in your query.
>
> Interesting.  I have a few VIEWs that include now(), but I guess I
> could adjust and pass in the date from the client.

I sometimes use the constant 'now' instead, since it obviously isn't
mutable. It's a bit tricky because sometimes it expands when you define
the statement. You have to do 'now'::text usually and even then you
should check...
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Question about a query plan

From
Tom Lane
Date:
"Thomas O'Connell" <tfo@sitening.com> writes:
> On Sep 16, 2005, at 9:18 AM, Bill Moseley wrote:
>> First question is why the planner is not using an index scan when I
>> use "now()" or CURRENT_TIMESTAMP?

> It also used to be the case (pre-8.0; I couldn't find in the release
> notes whether this was an 8.0 or 8.1 fix) that now() and
> CURRENT_TIMESTAMP were not indexable, I think because of mutability.

8.0 fix.  It's not that they're "not indexable" per se, it's that
pre-8.0 planners punted when it came to estimating what fraction of rows
would meet a condition like "timestampcol > now()" --- and the default
estimate for such things doesn't favor an indexscan.  The 8.0 change is
to go ahead and run the function and see what value it's returning now
(pardon the pun) and then compare that to the column's statistical
histogram to derive a rows estimate.  It's entirely likely that we'll
get ragged on eventually because of cases where this procedure generates
bad estimates ... but for the moment it seems like a win.

            regards, tom lane

Re: Question about a query plan

From
Bill Moseley
Date:
I'm still trying to understand EXPLAIN ANALYZE output.

ws2=> select count(*) from person_role;
 count
-------
   123
(1 row)

ws2=> select count(*) from person;
 count
-------
 11033
(1 row)

ws2=> EXPLAIN ANALYZE select id, first_name, last_name from person, person_role where id = 94 and person_role.person =
person.idand (person_role.role = 2); 
                                                        QUERY PLAN
   

---------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..8.28 rows=1 width=23) (actual time=0.198..0.237 rows=1 loops=1)
   ->  Index Scan using person_pkey on person  (cost=0.00..5.44 rows=1 width=23) (actual time=0.054..0.056 rows=1
loops=1)
         Index Cond: (id = 94)
   ->  Seq Scan on person_role  (cost=0.00..2.83 rows=1 width=4) (actual time=0.130..0.165 rows=1 loops=1)
         Filter: ((role = 2) AND (person = 94))
 Total runtime: 0.379 ms
(6 rows)


Why does it say "Seq Scan" on person_role?  The query has both the
"person" and "role" to use as a primary key -- which is indexed.
Indeed, "rows=1" so it looks like an index fetch.

Perhaps, I'm reading that incorrectly?


ws2=> \d person_role;
  Table "public.person_role"
 Column |  Type   | Modifiers
--------+---------+-----------
 person | integer | not null
 role   | integer | not null
Indexes:
    "person_role_pkey" primary key, btree (person, role)
Foreign-key constraints:
    "$2" FOREIGN KEY (role) REFERENCES role(id) ON DELETE RESTRICT
    "$1" FOREIGN KEY (person) REFERENCES person(id) ON DELETE CASCADE

Thanks,



--
Bill Moseley
moseley@hank.org


Re: Question about a query plan

From
Stephan Szabo
Date:
On Tue, 20 Sep 2005, Bill Moseley wrote:

> ws2=> select count(*) from person_role;
>  count
> -------
>    123
> (1 row)
>
> ws2=> select count(*) from person;
>  count
> -------
>  11033
> (1 row)
>
> ws2=> EXPLAIN ANALYZE select id, first_name, last_name from person, person_role where id = 94 and person_role.person
=person.id and (person_role.role = 2); 
>                                                         QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..8.28 rows=1 width=23) (actual time=0.198..0.237 rows=1 loops=1)
>    ->  Index Scan using person_pkey on person  (cost=0.00..5.44 rows=1 width=23) (actual time=0.054..0.056 rows=1
loops=1)
>          Index Cond: (id = 94)
>    ->  Seq Scan on person_role  (cost=0.00..2.83 rows=1 width=4) (actual time=0.130..0.165 rows=1 loops=1)
>          Filter: ((role = 2) AND (person = 94))
>  Total runtime: 0.379 ms
> (6 rows)
>
>
> Why does it say "Seq Scan" on person_role?  The query has both the
> "person" and "role" to use as a primary key -- which is indexed.
> Indeed, "rows=1" so it looks like an index fetch.

IIRC, that's how many rows met the filter.

My guess is that unless there's dead space, 123 rows of person_role should
fit in 1 page, so it's probably deciding that using the index would
involve more disk access than not.

Re: Question about a query plan

From
Tom Lane
Date:
Bill Moseley <moseley@hank.org> writes:
> ws2=> select count(*) from person_role;
>  count
> -------
>    123
> (1 row)
> ...
>    ->  Seq Scan on person_role  (cost=0.00..2.83 rows=1 width=4) (actual time=0.130..0.165 rows=1 loops=1)
>          Filter: ((role = 2) AND (person = 94))

> Why does it say "Seq Scan" on person_role?

Probably because it doesn't consider that table big enough to warrant
using an index.

            regards, tom lane