Thread: Index scan vs. Seq scan on timestamps
List, PG version is 7.4.2 I log apache hits to a postgres server. The table layout is as follows: apachelog=# \d accesslog Table "public.accesslog" Column | Type | Modifiers ----------+--------------------------+----------- id | integer | not null ip | character varying(15) | ident | character varying(200) | auth | character varying(200) | time | timestamp with time zone | request | character varying(200) | code | integer | bytes | integer | referrer | character varying(200) | agent | character varying(200) | Indexes: "accesslog_pkey" primary key, btree (id) "accesslog_time_idx" btree ("time") Number of rows: approx: 530.000 If I want to count hits 30 days back this query uses an efficient index scan: select count(*) from accesslog where time between (current_timestamp - INTERVAL '30 d') and current_timestamp; Aggregate (cost=8294.40..8294.40 rows=1 width=0) -> Index Scan using accesslog_time_idx on accesslog (cost=0.00..8287.97 rows=2569 width=0) Index Cond: (("time" >= (('now'::text)::timestamp(6) with time zone - '30 days'::interval)) AND ("time" <= ('now'::text)::timestamp(6) with time zone)) (3 rows) while this query uses a less efficient seq scan: select count(*) from accesslog where time between (timeofday()::timestamp - INTERVAL '30 d') and timeofday()::timestamp; Aggregate (cost=34966.56..34966.56 rows=1 width=0) -> Seq Scan on accesslog (cost=0.00..34823.86 rows=57077 width=0) Filter: (("time" >= (((timeofday())::timestamp without time zone - '30 days'::interval))::timestamp with time zone) AND ("time" <= ((timeofday())::timestamp without time zone)::timestamp with time zone)) (3 rows) Why does PG not use the index on the time column in the second select, timeofday() has been cast to a timestamp after all. Any insight much appreciated. Regards Per -- ---------------------------------------------------------------------- Per Jensen http://www.net-es.dk/~pj Linux rules! ----------------------------------------------------------------------
On 2004-12-06, Per Jensen <per@net-es.dk> wrote: > Why does PG not use the index on the time column in the second select, > timeofday() has been cast to a timestamp after all. "timestamp" is "timestamp without time zone" (not the most useful type in the world). Your column is of type "timestamp with time zone" (correct). The relationship between the two is not trivial and the lack of an index scan therefore expected. Try casting to "timestamp with time zone" instead. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
Andrew - Supernews wrote: > On 2004-12-06, Per Jensen <per@net-es.dk> wrote: > >>Why does PG not use the index on the time column in the second select, >>timeofday() has been cast to a timestamp after all. > > > "timestamp" is "timestamp without time zone" (not the most useful type in > the world). Your column is of type "timestamp with time zone" (correct). > The relationship between the two is not trivial and the lack of an index > scan therefore expected. Try casting to "timestamp with time zone" instead. > Andrew, thanks for your fast reply. explain select count(*) from accesslog where time between (timeofday()::timestamptz - INTERVAL '30 d') and timeofday()::timestamptz; gives Aggregate (cost=32398.12..32398.12 rows=1 width=0) -> Seq Scan on accesslog (cost=0.00..32255.42 rows=57077 width=0) Filter: (("time" >= ((timeofday())::timestamp with time zone - '30 days'::interval)) AND ("time" <= (timeofday())::timestamp with time zone)) Still a seq scan /Per
On Mon, 6 Dec 2004, Per Jensen wrote: > select count(*) > from accesslog > where time between (timeofday()::timestamp - INTERVAL '30 d') and > timeofday()::timestamp; Besides the type issue, timeofday() is volatile and thus is not allowed to be turned into a constant in order to do an index scan because it's allowed to return different values for every row of the input.
* Stephan Szabo (sszabo@megazone.bigpanda.com) wrote: > On Mon, 6 Dec 2004, Per Jensen wrote: > > select count(*) > > from accesslog > > where time between (timeofday()::timestamp - INTERVAL '30 d') and > > timeofday()::timestamp; > > Besides the type issue, timeofday() is volatile and thus is not allowed to > be turned into a constant in order to do an index scan because it's > allowed to return different values for every row of the input. Is there a way to say "just take the value of this function at the start of the transaction and then have it be constant" in a query? Stephen
Attachment
On 2004-12-07, Stephen Frost <sfrost@snowman.net> wrote: > Is there a way to say "just take the value of this function at the start > of the transaction and then have it be constant" in a query? Why not use CURRENT_TIMESTAMP, etc., which do exactly that? -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
On Mon, 6 Dec 2004, Stephen Frost wrote: > * Stephan Szabo (sszabo@megazone.bigpanda.com) wrote: > > On Mon, 6 Dec 2004, Per Jensen wrote: > > > select count(*) > > > from accesslog > > > where time between (timeofday()::timestamp - INTERVAL '30 d') and > > > timeofday()::timestamp; > > > > Besides the type issue, timeofday() is volatile and thus is not allowed to > > be turned into a constant in order to do an index scan because it's > > allowed to return different values for every row of the input. > > Is there a way to say "just take the value of this function at the start > of the transaction and then have it be constant" in a query? I can't think of a general one unless you make some kind of session variable functions where the get was stable. In this particular case now() or CURRENT_TIMESTAMP is a stable at transaction start time value. Currently you can fake the system out by using a scalar subselect or writing a wrapper function that lies about volatility, but I don't believe that those are considered guaranteed to keep working forever.
Den Tue, Dec 07, 2004 at 03:13:04AM -0000 eller der omkring skrev Andrew - Supernews: > On 2004-12-07, Stephen Frost <sfrost@snowman.net> wrote: > > Is there a way to say "just take the value of this function at the start > > of the transaction and then have it be constant" in a query? > > Why not use CURRENT_TIMESTAMP, etc., which do exactly that? Oops, CURRENT_TIMESTAMP *does* advance from transaction to transaction. As you can see from my previous mail, I thought it was fixed to the time of session start. CURRENT_TIMESTAMP is fine for me then and I will not use timeofday()::timestamptz /Per
Den Tue, Dec 07, 2004 at 03:13:04AM -0000 eller der omkring skrev Andrew - Supernews: > On 2004-12-07, Stephen Frost <sfrost@snowman.net> wrote: > > Is there a way to say "just take the value of this function at the start > > of the transaction and then have it be constant" in a query? > > Why not use CURRENT_TIMESTAMP, etc., which do exactly that? > Because when using transactions, CURRENT_TIMESTAMP does not advance, but is fixed to time of session start /Per
On Tue, Dec 07, 2004 at 09:25:20AM +0100, Per Jensen wrote: > Den Tue, Dec 07, 2004 at 03:13:04AM -0000 eller der omkring skrev Andrew - Supernews: > > On 2004-12-07, Stephen Frost <sfrost@snowman.net> wrote: > > > Is there a way to say "just take the value of this function at the start > > > of the transaction and then have it be constant" in a query? > > > > Why not use CURRENT_TIMESTAMP, etc., which do exactly that? > > Because when using transactions, CURRENT_TIMESTAMP does not advance, but is fixed > to time of session start CURRENT_TIMESTAMP is fixed to the time of transaction start, not session start; this is documented and observable behavior. Can you demonstrate otherwise? If so, on what version of PostgreSQL? -- Michael Fuhr http://www.fuhr.org/~mfuhr/