Thread: Index scan vs. Seq scan on timestamps

Index scan vs. Seq scan on timestamps

From
Per Jensen
Date:
List,

PG version is 7.4.2


I log apache hits to a postgres server. The table layout is as follows:

apachelog=# \d accesslog
             Table "public.accesslog"
   Column  |           Type           | Modifiers
----------+--------------------------+-----------
  id       | integer                  | not null
  ip       | character varying(15)    |
  ident    | character varying(200)   |
  auth     | character varying(200)   |
  time     | timestamp with time zone |
  request  | character varying(200)   |
  code     | integer                  |
  bytes    | integer                  |
  referrer | character varying(200)   |
  agent    | character varying(200)   |
Indexes:
     "accesslog_pkey" primary key, btree (id)
     "accesslog_time_idx" btree ("time")

Number of rows: approx: 530.000

If I want to count hits 30 days back this query uses an efficient
index scan:

select count(*)
from accesslog
where time  between (current_timestamp - INTERVAL '30 d') and
current_timestamp;

Aggregate  (cost=8294.40..8294.40 rows=1 width=0)
    ->  Index Scan using accesslog_time_idx on accesslog
(cost=0.00..8287.97 rows=2569 width=0)
          Index Cond: (("time" >= (('now'::text)::timestamp(6) with time
zone - '30 days'::interval)) AND ("time" <= ('now'::text)::timestamp(6)
with time zone))
(3 rows)

while this query uses a less efficient seq scan:

select count(*)
from accesslog
where time  between (timeofday()::timestamp - INTERVAL '30 d') and
timeofday()::timestamp;

Aggregate  (cost=34966.56..34966.56 rows=1 width=0)
    ->  Seq Scan on accesslog  (cost=0.00..34823.86 rows=57077 width=0)
          Filter: (("time" >= (((timeofday())::timestamp without time
zone - '30 days'::interval))::timestamp with time zone) AND ("time" <=
((timeofday())::timestamp without time zone)::timestamp with time zone))
(3 rows)

Why does PG not use the index on the time column in the second select,
timeofday() has been cast to a timestamp after all.

Any insight much appreciated.

Regards
Per
--
----------------------------------------------------------------------
Per Jensen        http://www.net-es.dk/~pj         Linux rules!
----------------------------------------------------------------------

Re: Index scan vs. Seq scan on timestamps

From
Andrew - Supernews
Date:
On 2004-12-06, Per Jensen <per@net-es.dk> wrote:
> Why does PG not use the index on the time column in the second select,
> timeofday() has been cast to a timestamp after all.

"timestamp" is "timestamp without time zone" (not the most useful type in
the world). Your column is of type "timestamp with time zone" (correct).
The relationship between the two is not trivial and the lack of an index
scan therefore expected. Try casting to "timestamp with time zone" instead.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

Re: Index scan vs. Seq scan on timestamps

From
Per Jensen
Date:
Andrew - Supernews wrote:
> On 2004-12-06, Per Jensen <per@net-es.dk> wrote:
>
>>Why does PG not use the index on the time column in the second select,
>>timeofday() has been cast to a timestamp after all.
>
>
> "timestamp" is "timestamp without time zone" (not the most useful type in
> the world). Your column is of type "timestamp with time zone" (correct).
> The relationship between the two is not trivial and the lack of an index
> scan therefore expected. Try casting to "timestamp with time zone" instead.
>

Andrew,

thanks for your fast reply.

explain
select count(*)
from accesslog
where time  between (timeofday()::timestamptz - INTERVAL '30 d') and
timeofday()::timestamptz;

gives

  Aggregate  (cost=32398.12..32398.12 rows=1 width=0)
    ->  Seq Scan on accesslog  (cost=0.00..32255.42 rows=57077 width=0)
          Filter: (("time" >= ((timeofday())::timestamp with time zone -
'30 days'::interval)) AND ("time" <= (timeofday())::timestamp with time
zone))

Still a seq scan

/Per

Re: Index scan vs. Seq scan on timestamps

From
Stephan Szabo
Date:
On Mon, 6 Dec 2004, Per Jensen wrote:

> select count(*)
> from accesslog
> where time  between (timeofday()::timestamp - INTERVAL '30 d') and
> timeofday()::timestamp;

Besides the type issue, timeofday() is volatile and thus is not allowed to
be turned into a constant in order to do an index scan because it's
allowed to return different values for every row of the input.


Re: Index scan vs. Seq scan on timestamps

From
Stephen Frost
Date:
* Stephan Szabo (sszabo@megazone.bigpanda.com) wrote:
> On Mon, 6 Dec 2004, Per Jensen wrote:
> > select count(*)
> > from accesslog
> > where time  between (timeofday()::timestamp - INTERVAL '30 d') and
> > timeofday()::timestamp;
>
> Besides the type issue, timeofday() is volatile and thus is not allowed to
> be turned into a constant in order to do an index scan because it's
> allowed to return different values for every row of the input.

Is there a way to say "just take the value of this function at the start
of the transaction and then have it be constant" in a query?

    Stephen

Attachment

Re: Index scan vs. Seq scan on timestamps

From
Andrew - Supernews
Date:
On 2004-12-07, Stephen Frost <sfrost@snowman.net> wrote:
> Is there a way to say "just take the value of this function at the start
> of the transaction and then have it be constant" in a query?

Why not use CURRENT_TIMESTAMP, etc., which do exactly that?

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

Re: Index scan vs. Seq scan on timestamps

From
Stephan Szabo
Date:
On Mon, 6 Dec 2004, Stephen Frost wrote:

> * Stephan Szabo (sszabo@megazone.bigpanda.com) wrote:
> > On Mon, 6 Dec 2004, Per Jensen wrote:
> > > select count(*)
> > > from accesslog
> > > where time  between (timeofday()::timestamp - INTERVAL '30 d') and
> > > timeofday()::timestamp;
> >
> > Besides the type issue, timeofday() is volatile and thus is not allowed to
> > be turned into a constant in order to do an index scan because it's
> > allowed to return different values for every row of the input.
>
> Is there a way to say "just take the value of this function at the start
> of the transaction and then have it be constant" in a query?

I can't think of a general one unless you make some kind of session
variable functions where the get was stable.  In this particular case
now() or CURRENT_TIMESTAMP is a stable at transaction start time value.

Currently you can fake the system out by using a scalar subselect or
writing a wrapper function that lies about volatility, but I don't believe
that those are considered guaranteed to keep working forever.


Re: Index scan vs. Seq scan on timestamps

From
pj@net-es.dk (Per Jensen)
Date:
Den Tue, Dec 07, 2004 at 03:13:04AM -0000 eller der omkring skrev Andrew - Supernews:
> On 2004-12-07, Stephen Frost <sfrost@snowman.net> wrote:
> > Is there a way to say "just take the value of this function at the start
> > of the transaction and then have it be constant" in a query?
>
> Why not use CURRENT_TIMESTAMP, etc., which do exactly that?



Oops,

CURRENT_TIMESTAMP *does* advance from transaction to transaction. As you can
see from my previous mail, I thought it was fixed to the time of session
start.

CURRENT_TIMESTAMP is fine for me then and I will not use
timeofday()::timestamptz

/Per


Re: Index scan vs. Seq scan on timestamps

From
pj@net-es.dk (Per Jensen)
Date:
Den Tue, Dec 07, 2004 at 03:13:04AM -0000 eller der omkring skrev Andrew - Supernews:
> On 2004-12-07, Stephen Frost <sfrost@snowman.net> wrote:
> > Is there a way to say "just take the value of this function at the start
> > of the transaction and then have it be constant" in a query?
>
> Why not use CURRENT_TIMESTAMP, etc., which do exactly that?
>

Because when using transactions, CURRENT_TIMESTAMP does not advance, but is fixed
to time of session start

/Per

Re: Index scan vs. Seq scan on timestamps

From
Michael Fuhr
Date:
On Tue, Dec 07, 2004 at 09:25:20AM +0100, Per Jensen wrote:
> Den Tue, Dec 07, 2004 at 03:13:04AM -0000 eller der omkring skrev Andrew - Supernews:
> > On 2004-12-07, Stephen Frost <sfrost@snowman.net> wrote:
> > > Is there a way to say "just take the value of this function at the start
> > > of the transaction and then have it be constant" in a query?
> >
> > Why not use CURRENT_TIMESTAMP, etc., which do exactly that?
>
> Because when using transactions, CURRENT_TIMESTAMP does not advance, but is fixed
> to time of session start

CURRENT_TIMESTAMP is fixed to the time of transaction start, not
session start; this is documented and observable behavior.  Can you
demonstrate otherwise?  If so, on what version of PostgreSQL?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/