Optimizing suggestions - Mailing list pgsql-general

From David Ford
Subject Optimizing suggestions
Date
Msg-id 3D13CC95.7010109@blue-labs.org
Responses Re: Optimizing suggestions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
OK, I have a table which is currently a bit of a conglomerate.  I intend
to break it down into multiple tables, as better suits an RDBMS, but
here is the scoop so far.  I have a table on which I need to compute
statistics based on the timestamp of each entry.

bmilter=> \d history ;
                      Table "history"
     Column      |           Type           |   Modifiers
-----------------+--------------------------+---------------
 bmilter_host_id | inet                     | not null
 timestamp       | timestamp with time zone | default now()
 size            | bigint                   | not null
 content_type    | smallint                 | default '0'
 message_id      | character varying        | not null
Indexes: history_idx

The index is a btree on the timestamp column.

What I need to do is get a count of the rows for each day over the last
N days: typically 7, 14, a month, etc.  At present each query takes
just under one second, ~680 ms, and returns the count for a single day,
so a week of data comes out to 7 queries and about 4 seconds.  An
example query for two days ago is:

bmilter=> select count(*) from history where timestamp::date =
now()::date - '2 days'::interval;
 count
-------
  3513

I am currently accumulating about 4K rows per day, and I expect that to
nearly double.  I'm looking for enlightenment: suggestions on how to
improve this, etc.  As stated above, I'd like to fetch a count of rows
per day for the last N days.
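
One idea would be to collapse the N queries into a single pass with a
GROUP BY on the cast; a sketch (untested) of what I mean:

      select timestamp::date as day, count(*)
        from history
       where timestamp >= now()::date - '7 days'::interval
       group by timestamp::date
       order by day;

That would return one row per day instead of one round trip per day.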

The EXPLAIN ANALYZE output is:

bmilter=> explain analyze select count(*) from history where
timestamp::date = now()::date - '2 days'::interval;
NOTICE:  QUERY PLAN:

Aggregate  (cost=690.66..690.66 rows=1 width=0) (actual
time=674.95..674.95 rows=1 loops=1)
  ->  Seq Scan on history  (cost=0.00..690.46 rows=83 width=0) (actual
time=219.03..663.33 rows=3513 loops=1)
Total runtime: 675.11 msec
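
If I'm reading the plan right, the ::date cast on the column is what
keeps the planner from using history_idx, hence the Seq Scan over the
whole table.  Rewriting the filter as a range on the raw timestamp
column should let the btree do the work; a sketch (untested):

      select count(*)
        from history
       where timestamp >= now()::date - '2 days'::interval
         and timestamp <  now()::date - '1 day'::interval;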

In the near future I'm going to split this table up into several
sub-tables and introduce several more columns, e.g.:

      create table hx_id (
         serial            int8,
         bmilter_host_id   inet NOT NULL,
         -- now(), not 'now'::timestamp, so the default isn't frozen at CREATE time
         timestamp         timestamp DEFAULT now() primary key
      );

      create table hx_stats (
         serial            int8,
         size              int8 DEFAULT '0',
         content_type      int2 DEFAULT '0',
         message_id        character varying
      );

      create table hx_recipients (
         serial            int8,
         recipients        character varying
      );

      create table hx_spam_tagged (
         serial            int8,
         spam_id           int8 references spam_ids (id) on update cascade
      );
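
The serial columns are meant to tie the sub-tables together, populated
from one shared sequence.  Roughly like this (untested; the sequence
name and the values are made up):

      create sequence hx_seq;

      -- one logical message spread across the sub-tables
      insert into hx_id (serial, bmilter_host_id)
           values (nextval('hx_seq'), '10.1.2.3');
      insert into hx_stats (serial, size, content_type, message_id)
           values (currval('hx_seq'), 4096, 0, '<hypothetical@blue-labs.org>');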


I believe I've laid it out now, so hit me with it if you care to.  What
can I do to improve the speed of these queries?  I don't have the
fastest of machines, so I need to tune as best I can.  Normal
postmaster tuning aside, I've already increased the shared buffers
etc., but would touching any of the optimizer settings help?  I'm sure
my concepts need improvement.
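
One more idea I've toyed with is precomputing counts for completed days
into a small rollup table, refreshed nightly, so the per-day query never
has to touch history at all.  A sketch (untested; the table name is made
up):

      create table history_daily (
         day               date primary key,
         n_rows            int8 NOT NULL
      );

      -- run once a night, for the day that just ended
      insert into history_daily
      select timestamp::date, count(*)
        from history
       where timestamp >= current_date - 1
         and timestamp <  current_date
       group by timestamp::date;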

Thank you,
David


