Thread: On using "date 'XXX' + interval 'XXX'" vs "date 'XXX'"
Hello-

In rewriting some queries I noticed a huge performance penalty when using a
"date + interval" summation in the SELECT statement, versus a single simple
"date".  It is almost as though the "date + interval" is being calculated
for each row...  Much easier to demonstrate than it is to explain (note the
runtimes of the queries, 72 ms versus 482 ms):

===============================================
main_v0_8=# explain analyze select count(*) from sample_tracker where
initdate >= date '2002-02-01';
NOTICE:  QUERY PLAN:

Aggregate  (cost=607.24..607.24 rows=1 width=0) (actual time=72.08..72.08 rows=1 loops=1)
  ->  Seq Scan on sample_tracker  (cost=0.00..595.74 rows=4600 width=0) (actual time=0.04..63.62 rows=4266 loops=1)
Total runtime: 72.20 msec

EXPLAIN
===============================================
main_v0_8=# explain analyze select count(*) from sample_tracker where
initdate >= date '2002-01-01' + interval '1 month';
NOTICE:  QUERY PLAN:

Aggregate  (cost=738.23..738.23 rows=1 width=0) (actual time=482.49..482.49 rows=1 loops=1)
  ->  Seq Scan on sample_tracker  (cost=0.00..723.98 rows=5700 width=0) (actual time=0.13..470.94 rows=4266 loops=1)
Total runtime: 482.62 msec

EXPLAIN
===============================================
main_v0_8=# explain analyze select date '2002-01-01' + interval '1 month';
NOTICE:  QUERY PLAN:

Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.04..0.05 rows=1 loops=1)
Total runtime: 0.09 msec
===============================================
main_v0_8=# select version();
                          version
-------------------------------------------------------------
 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96

Seems like this could be something ripe for optimization...

-- 
Jon Lapham
Extracta Moléculas Naturais, Rio de Janeiro, Brasil
email: lapham@extracta.com.br      web: http://www.extracta.com.br/
Jon Lapham <lapham@extracta.com.br> writes:
> In rewriting some queries I noticed a huge performance penalty when
> using a "date + interval" summation in the SELECT statement, versus a
> single simple "date".  It is almost as though the "date + interval" is
> being calculated for each row...

Try coercing the sum result back to a date.  It's a little easier to see
what's happening in current sources:

regression=# create table sample_tracker (initdate date primary key);
regression=# set enable_seqscan TO 0;
SET
regression=# explain select count(*) from sample_tracker where
regression-# initdate >= date '2002-02-01';
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=46.33..46.33 rows=1 width=0)
   ->  Index Scan using sample_tracker_pkey on sample_tracker  (cost=0.00..45.50 rows=333 width=0)
         Index Cond: (initdate >= '2002-02-01'::date)
(3 rows)

regression=# explain select count(*) from sample_tracker where
regression-# initdate >= date '2002-01-01' + interval '1 month';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Aggregate  (cost=100000025.83..100000025.83 rows=1 width=0)
   ->  Seq Scan on sample_tracker  (cost=100000000.00..100000025.00 rows=333 width=0)
         Filter: ("timestamp"(initdate) >= '2002-02-01 00:00:00'::timestamp without time zone)
(3 rows)

Note the Filter line in the second plan: since date + interval yields a
timestamp, initdate is coerced to timestamp for every row, and the index
on the date column can't be used.  Writing

	date(date '2002-01-01' + interval '1 month')

gets me back to the first plan.

			regards, tom lane
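The coercion Tom describes can be spelled either function-style or with a
cast; a minimal sketch (assuming PostgreSQL's timestamp-to-date cast, which
both spellings rely on):

```sql
-- Coerce the date + interval sum (a timestamp) back to a date, so the
-- comparison stays date-vs-date and the index on initdate is usable:
SELECT count(*)
FROM sample_tracker
WHERE initdate >= date(date '2002-01-01' + interval '1 month');

-- Equivalent cast spelling:
SELECT count(*)
FROM sample_tracker
WHERE initdate >= (date '2002-01-01' + interval '1 month')::date;
```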
Tom Lane wrote:
> Try coercing the sum result back to a date.

Yup, that does the trick.  As usual, thanks Tom.

-Jon

PS: Is this optimizable?  (ie: have the coercion be implicit in
situations like this)  It sure seems like it is a good candidate...
> > Try coercing the sum result back to a date.
...
> PS: Is this optimizable?  (ie: have the coercion be implicit in
> situations like this)  It sure seems like it is a good candidate...

Not at the parser or optimizer level.  You *could* have had hours,
minutes, or seconds in that interval value you specified, in which case
you would be truncating data to get back to a date.  We'd need more
infrastructure to somehow know when an optimization like that is safe.

Or, we could split the INTERVAL type into the (ugh) bunch-o-types
envisioned by the SQL standard.  YEAR, MONTH, YEAR TO MONTH, and DAY
intervals could be converted directly to dates rather than timestamps.
This would allow the optimizer to know what the output range would be,
whereas now the range info is just used for input and output (and is
usually a don't-care internally, since the other fields are zeros).

- Thomas
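To make the truncation point concrete, a minimal illustration (the values
shown follow from PostgreSQL's date + interval arithmetic; the cast in the
last line does exactly what an implicit coercion would have to do silently):

```sql
-- date + interval yields a timestamp, because an interval may carry a
-- time-of-day component that a date cannot represent:
SELECT date '2002-01-01' + interval '1 month';           -- 2002-02-01 00:00:00
SELECT date '2002-01-01' + interval '1 month 3 hours';   -- 2002-02-01 03:00:00

-- Casting back to date discards the hours; doing this implicitly would
-- silently change the meaning of the second expression:
SELECT (date '2002-01-01' + interval '1 month 3 hours')::date;   -- 2002-02-01
```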