Thread: distinct values without seq scan

distinct values without seq scan

From
Bret Hughes
Date:
I have a table indexed on a char(35) field and want a query to return a
distinct list of the values in this column.  Is there some syntactical
magic I can do to get these values without a sequential scan?  I assume,
( here we go again ) that these values are in the index somewhere and I
seem to recall a select DISTINCT using an index in DB2 way back when but
I may be mistaken.

Searching the archives I found an email that indicated this was possible
in 7.4 and infact is the reason I upgraded from 7.2x.

The table in question has about 700K rows and grows daily so seq scans
hurt more and more all the time.

Tips appreciated.

elevating=# \d logrecords                                 Table "public.logrecords"     Column      |     Type      |
                    
 
Modifiers                         
------------------+---------------+-----------------------------------------------------------serial           |
integer      | not null default
 
nextval('"logrecords_serial_seq"'::text)city             | smallint      | not nullbuilding         | smallint      |
notnulldisplay          | integer       | not nulladvertiser       | character(35) | not nullpagename         |
character(65)| not nulllog_date         | date          | not nullexhibition_count | integer       | not null
 
Indexes:   "logrecords_pkey" primary key, btree (serial)   "logrecords_advertiser" btree (advertiser)
"logrecords_building"btree (building)   "logrecords_city" btree (city)   "logrecords_date" btree (log_date)
 
Triggers:   "RI_ConstraintTrigger_1709151" AFTER INSERT OR UPDATE ON logrecords
FROM cities NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE
PROCEDURE "RI_FKey_check_ins"('logrec_ref_cities', 'logrecords',
'cities', 'UNSPECIFIED', 'city', 'num')   "RI_ConstraintTrigger_1709152" AFTER INSERT OR UPDATE ON logrecords
FROM buildings NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE
PROCEDURE "RI_FKey_check_ins"('logrecords_ref_buildings', 'logrecords',
'buildings', 'UNSPECIFIED', 'building', 'num')   "RI_ConstraintTrigger_1709153" AFTER INSERT OR UPDATE ON logrecords
FROM displays NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE
PROCEDURE "RI_FKey_check_ins"('logrecords_ref_display', 'logrecords',
'displays', 'UNSPECIFIED', 'display', 'num')



elevating=# explain select distinct advertiser from logrecords;                                  QUERY
PLAN                                   
--------------------------------------------------------------------------------Unique  (cost=136440.68..139858.43
rows=61width=39)  ->  Sort  (cost=136440.68..138149.55 rows=683550 width=39)        Sort Key: advertiser        ->  Seq
Scanon logrecords  (cost=0.00..20785.50 rows=683550
 
width=39)
(4 rows)

Bret





Re: distinct values without seq scan

From
Josh Berkus
Date:
Bret,

> I have a table indexed on a char(35) field and want a query to return a
> distinct list of the values in this column.  Is there some syntactical
> magic I can do to get these values without a sequential scan?  I assume,
> ( here we go again ) that these values are in the index somewhere and I
> seem to recall a select DISTINCT using an index in DB2 way back when but
> I may be mistaken.
>
> Searching the archives I found an email that indicated this was possible
> in 7.4 and infact is the reason I upgraded from 7.2x.

Well, leaving aside that you have a data normalization issue (these values 
should really be in a reference list with an FK), to take advantage of the 
7.4 optimization, try:

SELECT advertiser FROM logrecords GROUP BY advertiser ORDER BY advertiser;

-- 
Josh Berkus
Aglio Database Solutions
San Francisco