Re: Timestamp-based indexing - Mailing list pgsql-performance

From Tom Lane
Subject Re: Timestamp-based indexing
Date
Msg-id 28247.1090857551@sss.pgh.pa.us
Whole thread Raw
In response to Re: Timestamp-based indexing  ("Matthew T. O'Connor" <matthew@zeut.net>)
Responses Re: Timestamp-based indexing  (Litao Wu <litaowu@yahoo.com>)
List pgsql-performance
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> VACUUM FULL ANALYZE every 3 hours seems a little severe.

If rows are only deleted once a day, that's a complete waste of time,
indeed.

I'd suggest running a plain VACUUM just after the deletion pass is done.
ANALYZEs are a different matter and possibly need to be done every
few hours, seeing that your maximum timestamp value is constantly
changing.

>> monitor=# set enable_seqscan = false;
>> SET
>> monitor=# explain analyze select * from "eventtable" where timestamp >
>> CURRENT_TIMESTAMP - INTERVAL '10 minutes';
>> QUERY PLAN
>>
-------------------------------------------------------------------------------------------------------------------------------------

>>
>> Seq Scan on "eventtable"  (cost=100000000.00..100019009.97 rows=136444
>> width=155) (actual time=9909.847..9932.438 rows=1763 loops=1)
>> Filter: (("timestamp")::timestamp with time zone >
>> (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
>> Total runtime: 9934.353 ms

You've got some datatype confusion, too.  CURRENT_TIMESTAMP yields
timestamp with time zone, and since you made the timestamp column
timestamp without time zone, you've got a cross-type comparison which is
not indexable (at least not in 7.4).  My opinion is that you chose the
wrong type for the column.  Values that represent specific real-world
instants should always be timestamp with time zone, so that they mean
the same thing if you look at them in a different time zone.

Another issue here is that because CURRENT_TIMESTAMP - INTERVAL '10
minutes' isn't a constant, the planner isn't able to make use of the
statistics gathered by ANALYZE anyway.  That's why the rowcount estimate
has nothing to do with reality.  Unless you force the decision with
"set enable_seqscan", the planner will never pick an indexscan with this
rowcount estimate.  The standard advice for getting around this is to
hide the nonconstant calculation inside a function that's deliberately
mislabeled immutable.  For example,

create function ago(interval) returns timestamp with time zone as
'select now() - $1' language sql strict immutable;

select * from "eventtable" where timestamp > ago('10 minutes');

The planner folds the "ago('10 minutes')" to a constant, checks the
statistics, and should do the right thing.  Note however that this
technique may break if you put a call to ago() inside a function
or prepared statement --- it's only safe in interactive queries,
where you don't care that the value is reduced to a constant during
planning instead of during execution.

            regards, tom lane

pgsql-performance by date:

Previous
From: "Harmon S. Nine"
Date:
Subject: Re: Timestamp-based indexing
Next
From: Stephan Szabo
Date:
Subject: Re: Timestamp-based indexing