Thread: What is the difference between these?

What is the difference between these?

From

Matt Nuzum

Date:

05 November 2004, 21:10:02

To me, these three queries seem identical... why doesn't the first one
(simplest to understand and write) go the same speed as the third one?

I'll I'm trying to do is get statistics for one day (in this case,
today) summarized.  Table has ~25M rows.  I'm using postgres 7.3.? on
rh linux 7.3 (note that i think the difference between the first two
might just be related to the data being in memory for the second
query).


 EXPLAIN ANALYZE
 select count(distinct sessionid) from usage_access where atime >
date_trunc('day', now());
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=933439.69..933439.69 rows=1 width=4) (actual
time=580350.43..580350.43 rows=1 loops=1)
   ->  Seq Scan on usage_access  (cost=0.00..912400.11 rows=8415831
width=4) (actual time=580164.48..580342.21 rows=2964 loops=1)
         Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 580350.65 msec
(4 rows)


 EXPLAIN ANALYZE
 select count(distinct sessionid) from (select * from usage_access
where atime > date_trunc('day', now())) as temp;
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=933439.69..933439.69 rows=1 width=4) (actual
time=348012.85..348012.85 rows=1 loops=1)
   ->  Seq Scan on usage_access  (cost=0.00..912400.11 rows=8415831
width=4) (actual time=347960.53..348004.68 rows=2964 loops=1)
         Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 348013.10 msec
(4 rows)


 EXPLAIN ANALYZE
 select count(distinct sessionid) from usage_access where atime
between date_trunc('day', now()) and date_trunc('day', now()) + '1
day'::interval;
                                                                    QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=89324.98..89324.98 rows=1 width=4) (actual
time=27.84..27.84 rows=1 loops=1)
   ->  Index Scan using usage_access_atime on usage_access
(cost=0.00..89009.39 rows=126237 width=4) (actual time=0.51..20.37
rows=2964 loops=1)
         Index Cond: ((atime >= date_trunc('day'::text, now())) AND
(atime <= (date_trunc('day'::text, now()) + '1 day'::interval)))
 Total runtime: 28.11 msec
(4 rows)

--
Matthew Nuzum        | Makers of "Elite Content Management System"
www.followers.net        | View samples of Elite CMS in action
matt@followers.net        | http://www.followers.net/portfolio/

Re: What is the difference between these?

From

"Matthew T. O'Connor"

Date:

05 November 2004, 22:44:41

Matt Nuzum wrote:

>To me, these three queries seem identical... why doesn't the first one
>(simplest to understand and write) go the same speed as the third one?
>
>
If you look at the explain output, you will notice that only the 3rd
query is using an Index Scan, where as the 1st and 2nd are doing a
sequential scan over the entire table of 25M rows.  My guess is that the
problem is related to outdated statistics on the atime column.  If you
notice the 1st and 2nd queries estimate 8.4M rows returned at which
point a seq scan is the right choice, but the 3rd query using the
between statement only estimates 127k rows which make the Index a better
option.  All of these queries only return 2964 rows so it looks like
your stats are out of date.  Try running an analyze command right before
doing any of these queries and see what happens.

>I'll I'm trying to do is get statistics for one day (in this case,
>today) summarized.  Table has ~25M rows.  I'm using postgres 7.3.? on
>rh linux 7.3 (note that i think the difference between the first two
>might just be related to the data being in memory for the second
>query).
>
>
> EXPLAIN ANALYZE
> select count(distinct sessionid) from usage_access where atime >
>date_trunc('day', now());
>                                                            QUERY PLAN

>----------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=933439.69..933439.69 rows=1 width=4) (actual
>time=580350.43..580350.43 rows=1 loops=1)
>   ->  Seq Scan on usage_access  (cost=0.00..912400.11 rows=8415831
>width=4) (actual time=580164.48..580342.21 rows=2964 loops=1)
>         Filter: (atime > date_trunc('day'::text, now()))
> Total runtime: 580350.65 msec
>(4 rows)
>
>
> EXPLAIN ANALYZE
> select count(distinct sessionid) from (select * from usage_access
>where atime > date_trunc('day', now())) as temp;
>                                                            QUERY PLAN

>----------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=933439.69..933439.69 rows=1 width=4) (actual
>time=348012.85..348012.85 rows=1 loops=1)
>   ->  Seq Scan on usage_access  (cost=0.00..912400.11 rows=8415831
>width=4) (actual time=347960.53..348004.68 rows=2964 loops=1)
>         Filter: (atime > date_trunc('day'::text, now()))
> Total runtime: 348013.10 msec
>(4 rows)
>
>
> EXPLAIN ANALYZE
> select count(distinct sessionid) from usage_access where atime
>between date_trunc('day', now()) and date_trunc('day', now()) + '1
>day'::interval;
>                                                                    QUERY PLAN

>--------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=89324.98..89324.98 rows=1 width=4) (actual
>time=27.84..27.84 rows=1 loops=1)
>   ->  Index Scan using usage_access_atime on usage_access
>(cost=0.00..89009.39 rows=126237 width=4) (actual time=0.51..20.37
>rows=2964 loops=1)
>         Index Cond: ((atime >= date_trunc('day'::text, now())) AND
>(atime <= (date_trunc('day'::text, now()) + '1 day'::interval)))
> Total runtime: 28.11 msec
>(4 rows)
>
>
>

Re: What is the difference between these?

From

Tom Lane

Date:

05 November 2004, 23:33:17

Matt Nuzum <matt.followers@gmail.com> writes:
> To me, these three queries seem identical... why doesn't the first one
> (simplest to understand and write) go the same speed as the third one?

This is the standard problem that the planner has to guess about the
selectivity of inequalities involving non-constants (like now()).
The guesses are set up so that a one-sided inequality will use a
seqscan while a range constraint will use an indexscan.

See the pgsql-performance archives for other ways of persuading it
that an indexscan is a good idea.

            regards, tom lane