Home > mailing lists

indexing for distinct search in timestamp based table - Mailing list pgsql-performance

From	Rainer Mager
Subject	indexing for distinct search in timestamp based table
Date	August 28, 2008 05:14:33
Msg-id	000c01c908e4$f64e92c0$e2ebb840$@com Whole thread
Responses	Re: indexing for distinct search in timestamp based table Re: indexing for distinct search in timestamp based table Re: indexing for distinct search in timestamp based table
List	pgsql-performance

Tree view

I'm looking for some help in speeding up searches. My table is pretty simple
(see below), but somewhat large, and continuously growing. Currently it has
about 50 million rows.

The table is (I know I have excessive indexes, I'm trying to get the
appropriate ones and drop the extras):
                                          Table "public.ad_log"
    Column    |            Type             |
Modifiers
--------------+-----------------------------+-------------------------------
-----------------------------
 ad_log_id    | integer                     | not null default
nextval('ad_log_ad_log_id_seq'::regclass)
 channel_name | text                        | not null
 player_name  | text                        | not null
 ad_name      | text                        | not null
 start_time   | timestamp without time zone | not null
 end_time     | timestamp without time zone | not null
Indexes:
    "ad_log_pkey" PRIMARY KEY, btree (ad_log_id)
    "ad_log_channel_name_key" UNIQUE, btree (channel_name, player_name,
ad_name, start_time, end_time)
    "ad_log_ad_and_start" btree (ad_name, start_time)
    "ad_log_ad_name" btree (ad_name)
    "ad_log_all" btree (channel_name, player_name, start_time, ad_name)
    "ad_log_channel_name" btree (channel_name)
    "ad_log_end_time" btree (end_time)
    "ad_log_player_and_start" btree (player_name, start_time)
    "ad_log_player_name" btree (player_name)
    "ad_log_start_time" btree (start_time)



The query I'm trying to speed up is below. In it the <field> tag can be one
of channel_name, player_name, or ad_name. I'm actually trying to return the
distinct values and I found GROUP BY to be slightly faster than using
DISTINCT. Also, any of those fields may be unspecified in the WHERE clauses
in which case we use '%', but it seems Postgres optimizes that pretty well.

SELECT <field> FROM ad_log
    WHERE channel_name LIKE :channel_name
    AND player_name LIKE :player_name
    AND ad_name LIKE :ad_name
    AND start_time BETWEEN :start_date AND (date(:end_date) + 1)
    GROUP BY <field> ORDER BY <field>


A typical query is:

explain analyze SELECT channel_name FROM ad_log WHERE channel_name LIKE '%'
AND ad_name LIKE '%' AND start_time BETWEEN '2008-07-01' AND
(date('2008-07-28') + 1) GROUP BY channel_name ORDER BY channel_name;

with the result being:

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-------
 Sort  (cost=1163169.02..1163169.03 rows=5 width=10) (actual
time=75460.187..75460.192 rows=15 loops=1)
   Sort Key: channel_name
   Sort Method:  quicksort  Memory: 17kB
   ->  HashAggregate  (cost=1163168.91..1163168.96 rows=5 width=10) (actual
time=75460.107..75460.114 rows=15 loops=1)
         ->  Bitmap Heap Scan on ad_log  (cost=285064.30..1129582.84
rows=13434427 width=10) (actual time=8506.250..65771.597 rows=13701296
loops=1)
               Recheck Cond: ((start_time >= '2008-07-01
00:00:00'::timestamp without time zone) AND (start_time <=
'2008-07-29'::date))
               Filter: ((channel_name ~~ '%'::text) AND (ad_name ~~
'%'::text))
               ->  Bitmap Index Scan on ad_log_start_time
(cost=0.00..281705.70 rows=13434427 width=0) (actual time=8488.443..8488.443
rows=13701296 loops=1)
                     Index Cond: ((start_time >= '2008-07-01
00:00:00'::timestamp without time zone) AND (start_time <=
'2008-07-29'::date))
 Total runtime: 75460.361 ms


It seems to me there should be some way to create an index to speed this up,
but the various ones I've tried so far haven't helped. Any suggestions would
be greatly appreciated.

pgsql-performance by date:

From: david@lang.hm
Date: 28 August 2008, 03:23:02
Subject: Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception

From: "H. Hall"
Date: 28 August 2008, 06:46:20
Subject: Re: indexing for distinct search in timestamp based table

indexing for distinct search in timestamp based table - Mailing list pgsql-performance

Previous

Next