Re: Optimizing suggestions - Mailing list pgsql-general

From David Ford
Subject Re: Optimizing suggestions
Date
Msg-id 3D15FE17.7040103@blue-labs.org
In response to Optimizing suggestions  (David Ford <david+cert@blue-labs.org>)
Responses Re: Optimizing suggestions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general

Tom Lane wrote:

>David Ford <david+cert@blue-labs.org> writes:
>
>>As stated above, I'd like to fetch a count of rows
>>per day for the last N days.
>
>GROUP BY seems like the obvious answer...
>

So it does,

bmilter=> explain analyze select count(*) from history group by timestamp::date;
NOTICE:  QUERY PLAN:

Aggregate  (cost=1784.68..1878.46 rows=1876 width=8) (actual time=515.60..661.22 rows=8 loops=1)
  ->  Group  (cost=1784.68..1831.57 rows=18755 width=8) (actual time=510.75..623.29 rows=19310 loops=1)
        ->  Sort  (cost=1784.68..1784.68 rows=18755 width=8) (actual time=510.73..535.28 rows=19310 loops=1)
              ->  Seq Scan on history  (cost=0.00..453.55 rows=18755 width=8) (actual time=0.05..354.04 rows=19310 loops=1)
Total runtime: 667.70 msec

EXPLAIN

and..

bmilter=> explain analyze select count(*) from history where timestamp::date > (now()::date - '7 days'::interval) group by timestamp::date ;
NOTICE:  QUERY PLAN:

Aggregate  (cost=1175.93..1207.19 rows=625 width=8) (actual time=1233.34..1349.87 rows=7 loops=1)
  ->  Group  (cost=1175.93..1191.56 rows=6252 width=8) (actual time=1204.19..1313.24 rows=18693 loops=1)
        ->  Sort  (cost=1175.93..1175.93 rows=6252 width=8) (actual time=1204.17..1227.90 rows=18693 loops=1)
              ->  Seq Scan on history  (cost=0.00..781.76 rows=6252 width=8) (actual time=8.52..1049.05 rows=18693 loops=1)
Total runtime: 1356.16 msec

EXPLAIN

bmilter=> select count(*) from history where timestamp::date > (now()::date - '7 days'::interval) group by timestamp::date ;
 count
-------
  3743
  3414
  3513
  3077
  2642
  1553
   751
(7 rows)


Now.. :)  How can I further improve on this?  Would it be faster if I
used two separate 'date' and 'time' columns?
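
An alternative to splitting the column (a sketch only, not tested against
this schema; the index name is hypothetical, and older PostgreSQL releases
accept only the function-call form date(...) in a functional index, not the
::date cast syntax) would be a functional index on the date part, so the
filter can match an indexed expression instead of forcing the Seq Scan shown
in the plans above:

    -- Hypothetical functional index on the date part of the timestamp:
    CREATE INDEX history_date_idx ON history (date(timestamp));

    -- Write the filter with the same expression so the planner can match it:
    SELECT count(*)
    FROM history
    WHERE date(timestamp) > current_date - 7
    GROUP BY date(timestamp);

After creating the index, a VACUUM ANALYZE and a fresh EXPLAIN ANALYZE would
show whether the planner actually chooses an index scan here.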

>>In the near future I'm going to split this table up into many sub tables
>
>Why?  Aren't you just going to have a need for lots of joins, if you
>do that?
>
>            regards, tom lane
>

No, actually most of this data will be drawn into focused reporting for
graphing metrics.  Only when I drill down to a specific (small) set of
message serial numbers, will I be doing a join of all this data.

In my naivety, which approach gives better performance?  Most of the time
I will be searching for a count of statistics on a single column.

Thank you for your assistance, knowledge is appreciated,
David

p.s. I am assuming count(*) is always faster than count(column), because
it does a generic row count rather than evaluating a specific column
inside each row?
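
(For what it's worth, the two are not interchangeable: count(*) counts rows,
while count(column) counts only the rows where that column is non-NULL, so it
must also inspect the column's value.  With a hypothetical table t whose
column col holds the values 1, NULL, 3:

    SELECT count(*), count(col) FROM t;
    -- count(*) returns 3; count(col) returns 2, skipping the NULL.

Any speed difference follows from that extra NULL check, not from a different
counting mechanism.)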


