Home > mailing lists

Re: Timestamp-based indexing - Mailing list pgsql-performance

From	Harmon S. Nine
Subject	Re: Timestamp-based indexing
Date	July 26, 2004 15:46:52
Msg-id	41052762.50801@netarx.com Whole thread Raw
In response to	Re: Timestamp-based indexing ("Matthew T. O'Connor" <matthew@zeut.net>)
List	pgsql-performance

Tree view

We were getting a little desperate, so we engaged in overkill to rule
out lack-of-analyze as a cause for the slow queries.

Thanks for your advice :)

-- Harmon

Matthew T. O'Connor wrote:

> VACUUM FULL ANALYZE every 3 hours seems a little severe.  You will
> probably be be served just as well by VACUUM ANALYZE.  But you
> probably don't need the VACUUM part most of the time.   You might try
> doing an ANALYZE on the specific tables you are having issues with.
> Since ANALYZE should be much quicker and not have the performance
> impact of a VACUUM, you could do it every hour, or even every 15 minutes.
>
> Good luck...
>
> Harmon S. Nine wrote:
>
>> Hello --
>>
>> To increase query (i.e. select) performance, we're trying to get
>> postgres to use an index based on a timestamp column in a given table.
>>
>> Event-based data is put into this table several times a minute, with
>> the timestamp indicating when a particular row was placed in the table.
>>
>> The table is purged daily, retaining only the rows that are less than
>> 7 days old.  That is, any row within the table is less than 1 week
>> old (+ 1 day, since the purge is daily).
>>
>> A typical number of rows in the table is around 400,000.
>>
>> A "VACUUM FULL ANALYZE"  is performed every 3 hours.
>>
>>
>> The problem:
>> We often query the table to extract those rows that are, say, 10
>> minutes old or less.
>>
>> Given there are 10080 minutes per week, the planner could, properly
>> configured, estimate the number of rows returned by such a query to be:
>>
>> 10 min/ 10080 min  *  400,000 = 0.001 * 400,000 = 400.
>>
>> Making an index scan, with the timestamp field the index, far faster
>> then a sequential scan.
>>
>>
>> However, we can't get the planner to do an timestamp-based index scan.
>>
>> Anyone know what to do?
>>
>>
>> Here's the table specs:
>>
>> monitor=# \d "eventtable"
>>                                        Table "public.eventtable"
>>  Column   |            Type             |
>> Modifiers
>> -----------+-----------------------------+--------------------------------------------------------------
>>
>> timestamp | timestamp without time zone | not null default
>> ('now'::text)::timestamp(6) with time zone
>> key       | bigint                      | not null default
>> nextval('public."eventtable_key_seq"'::text)
>> propagate | boolean                     |
>> facility  | character(10)               |
>> priority  | character(10)               |
>> host      | character varying(128)      | not null
>> message   | text                        | not null
>> Indexes:
>>    "eventtable_pkey" primary key, btree ("timestamp", "key")
>>    "eventtable_host" btree (host)
>>    "eventtable_timestamp" btree ("timestamp")
>>
>>
>> Here's a query (with "explain analyze"):
>>
>> monitor=# explain analyze select * from "eventtable" where timestamp
>> > CURRENT_TIMESTAMP - INTERVAL '10 minutes';
>>                                                         QUERY PLAN
>>
----------------------------------------------------------------------------------------------------------------------------

>>
>> Seq Scan on "eventtable"  (cost=0.00..19009.97 rows=136444 width=155)
>> (actual time=11071.073..11432.522 rows=821 loops=1)
>>   Filter: (("timestamp")::timestamp with time zone >
>> (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
>> Total runtime: 11433.384 ms
>> (3 rows)
>>
>>
>> Here's something strange.  We try to disable sequential scans, but to
>> no avail.  The estimated cost skyrockets, though:
>>
>> monitor=# set enable_seqscan = false;
>> SET
>> monitor=# explain analyze select * from "eventtable" where timestamp
>> > CURRENT_TIMESTAMP - INTERVAL '10 minutes';
>>                                                             QUERY PLAN
>>
-------------------------------------------------------------------------------------------------------------------------------------

>>
>> Seq Scan on "eventtable"  (cost=100000000.00..100019009.97
>> rows=136444 width=155) (actual time=9909.847..9932.438 rows=1763
>> loops=1)
>>   Filter: (("timestamp")::timestamp with time zone >
>> (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
>> Total runtime: 9934.353 ms
>> (3 rows)
>>
>> monitor=# set enable_seqscan = true;
>> SET
>> monitor=#
>>
>>
>>
>> Any help is greatly appreciated :)
>>
>> -- Harmon
>>
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 3: if posting/reading through Usenet, please send an appropriate
>>      subscribe-nomail command to majordomo@postgresql.org so that your
>>      message can get through to the mailing list cleanly
>>
>

pgsql-performance by date:

From: "Harmon S. Nine"
Date: 26 July 2004, 15:43:06
Subject: Re: Timestamp-based indexing

From: Tom Lane
Date: 26 July 2004, 15:59:59
Subject: Re: Timestamp-based indexing

Re: Timestamp-based indexing - Mailing list pgsql-performance

Previous

Next