Optimizing suggestions - Mailing list pgsql-general
From:      David Ford
Subject:   Optimizing suggestions
Date:
Msg-id:    3D13CC95.7010109@blue-labs.org
Responses: Re: Optimizing suggestions
List:      pgsql-general
OK, I have a table which is currently a bit of a conglomerate. I intend to break it down into multiple tables, as better befits an RDBMS, but here is the scoop so far. I have a table on which I need to count up statistics based on the timestamp of each entry.

bmilter=> \d history
                        Table "history"
     Column      |           Type           |   Modifiers
-----------------+--------------------------+---------------
 bmilter_host_id | inet                     | not null
 timestamp       | timestamp with time zone | default now()
 size            | bigint                   | not null
 content_type    | smallint                 | default '0'
 message_id      | character varying        | not null
Indexes: history_idx

The index is a btree on the timestamp column.

What I need to do is get a count of the number of rows per day for the last N days; typically 7, 14, a month, etc. At present this takes just under one second per query, ~680 ms, and each query returns the count for one day. So for a week of data, this comes out to 7 queries and about 4 seconds' worth of time. An example query for two days ago is:

bmilter=> select count(*) from history
          where timestamp::date = now()::date - '2 days'::interval;
 count
-------
  3513

I am currently accumulating about 4K rows per day, but I expect this to nearly double.

I'm looking for enlightenment: suggestions on how to improve this, etc. As stated above, I'd like to fetch a count of rows per day for the last N days. The EXPLAIN output is here:

bmilter=> explain analyze select count(*) from history
          where timestamp::date = now()::date - '2 days'::interval;
NOTICE:  QUERY PLAN:

Aggregate  (cost=690.66..690.66 rows=1 width=0)
           (actual time=674.95..674.95 rows=1 loops=1)
  ->  Seq Scan on history  (cost=0.00..690.46 rows=83 width=0)
                           (actual time=219.03..663.33 rows=3513 loops=1)
Total runtime: 675.11 msec

In the near future I'm going to split this table up into many sub-tables and introduce several more columns, e.g.:

create table hx_id (
    serial          int8,
    bmilter_host_id inet NOT NULL,
    timestamp       timestamp DEFAULT 'now'::timestamp primary key
);

create table hx_stats (
    serial       int8,
    size         int8 DEFAULT '0',
    content_type int2 DEFAULT '0',
    message_id   character varying
);

create table hx_recipients (
    serial     int8,
    recipients character varying
);

create table hx_spam_tagged (
    serial  int8,
    spam_id int8 references spam_ids (id) on update cascade
);

I believe I've laid it out now, so hit me with it if you care to. What can I do to improve the speed with which the queries are done? I don't have the fastest of machines, so I do need to tune as best I can. Normal postmaster tuning aside (I've already increased the shared buffers, etc.), would touching anything in the optimizer section help? I'm needing improvement on my concepts, I'm sure.

Thank you,
David
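P.S. While writing this up, one idea occurred to me; an untested sketch, so treat it as such. The N per-day queries could collapse into a single GROUP BY over the whole range, so the table is scanned once instead of once per day:

select timestamp::date as day, count(*)
  from history
 where timestamp >= now()::date - '7 days'::interval
 group by timestamp::date
 order by day;

Even if that still seq-scans, it should be one ~680 ms scan instead of seven.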
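Another sketch, under the assumption that the seq scan happens because the btree can't be used for the timestamp::date = ... expression but can be used for plain range comparisons: rewrite the per-day predicate as a half-open range so the planner can consider history_idx:

select count(*)
  from history
 where timestamp >= now()::date - '2 days'::interval
   and timestamp <  now()::date - '1 day'::interval;

EXPLAIN ANALYZE should show whether that turns the Seq Scan into an Index Scan.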
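And if day-granular counts are all I ever need, maybe a small rollup table filled by a nightly cron job would sidestep the big scan entirely (history_daily is a hypothetical name, not part of my schema):

create table history_daily (
    day       date primary key,
    msg_count int8 not null
);

-- run once a night to roll up yesterday's rows
insert into history_daily
select timestamp::date, count(*)
  from history
 where timestamp >= current_date - '1 day'::interval
   and timestamp <  current_date
 group by timestamp::date;

Fetching the last N days would then touch only N pre-aggregated rows.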