Thread: Index on a function and SELECT DISTINCT
If I have this table, function and index in Postgres 7.3.6 ... """ CREATE TABLE news_stories ( id serial primary key NOT NULL, pub_date timestamp with time zone NOT NULL, ... ) CREATE OR REPLACE FUNCTION get_year_trunc(timestamp with time zone) returns timestamp with time zone AS 'SELECT date_trunc(\'year\',$1);' LANGUAGE 'SQL' IMMUTABLE; CREATE INDEX news_stories_pub_date_year_trunc ON news_stories( get_year_trunc(pub_date) ); """ ...why does this query not use the index? db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM news_stories; QUERY PLAN --------------------------------------------------------------------------------- Unique (cost=59597.31..61311.13 rows=3768 width=8) -> Sort (cost=59597.31..60454.22 rows=342764 width=8) Sort Key: date_trunc('year'::text, pub_date) -> Seq Scan on news_stories (cost=0.00..23390.55 rows=342764 width=8) (4 rows) The query is noticably slow (2 seconds) on a database with 150,000+ records. How can I speed it up? Thanks, Adrian
On Fri, 14 Jan 2005 12:32:12 -0600 Adrian Holovaty <postgresql@holovaty.com> wrote: > If I have this table, function and index in Postgres 7.3.6 ... > > """ > CREATE TABLE news_stories ( > id serial primary key NOT NULL, > pub_date timestamp with time zone NOT NULL, > ... > ) > CREATE OR REPLACE FUNCTION get_year_trunc(timestamp with time zone) > returns timestamp with time zone AS 'SELECT date_trunc(\'year\',$1);' > LANGUAGE 'SQL' IMMUTABLE; > CREATE INDEX news_stories_pub_date_year_trunc ON > news_stories( get_year_trunc(pub_date) ); > """ > > ...why does this query not use the index? > > db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM > news_stories; > QUERY PLAN > --------------------------------------------------------------------- > ------------ > Unique (cost=59597.31..61311.13 rows=3768 width=8) > -> Sort (cost=59597.31..60454.22 rows=342764 width=8) > Sort Key: date_trunc('year'::text, pub_date) > -> Seq Scan on news_stories (cost=0.00..23390.55 > rows=342764 > width=8) > (4 rows) > > The query is noticably slow (2 seconds) on a database with 150,000+ > records. How can I speed it up? It's doing a sequence scan because you're not limiting the query in the FROM clause. No point in using an index when you're asking for the entire table. :) --------------------------------- Frank Wiles <frank@wiles.org> http://www.wiles.org ---------------------------------
Frank Wiles wrote: > Adrian Holovaty <postgresql@holovaty.com> wrote: > > If I have this table, function and index in Postgres 7.3.6 ... > > > > """ > > CREATE TABLE news_stories ( > > id serial primary key NOT NULL, > > pub_date timestamp with time zone NOT NULL, > > ... > > ) > > CREATE OR REPLACE FUNCTION get_year_trunc(timestamp with time zone) > > returns timestamp with time zone AS 'SELECT date_trunc(\'year\',$1);' > > LANGUAGE 'SQL' IMMUTABLE; > > CREATE INDEX news_stories_pub_date_year_trunc ON > > news_stories( get_year_trunc(pub_date) ); > > """ > > > > ...why does this query not use the index? > > > > db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM > > news_stories; > > QUERY PLAN > > --------------------------------------------------------------------- > > ------------ > > Unique (cost=59597.31..61311.13 rows=3768 width=8) > > -> Sort (cost=59597.31..60454.22 rows=342764 width=8) > > Sort Key: date_trunc('year'::text, pub_date) > > -> Seq Scan on news_stories (cost=0.00..23390.55 > > rows=342764 > > width=8) > > (4 rows) > > > > The query is noticably slow (2 seconds) on a database with 150,000+ > > records. How can I speed it up? > > It's doing a sequence scan because you're not limiting the query in > the FROM clause. No point in using an index when you're asking for > the entire table. :) Ah, that makes sense. So is there a way to optimize SELECT DISTINCT queries that have no WHERE clause? Adrian
Try : EXPLAIN SELECT get_year_trunc(pub_date) as foo FROM ... GROUP BY foo Apart from that, you could use a materialized view... >> > db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM > Ah, that makes sense. So is there a way to optimize SELECT DISTINCT > queries > that have no WHERE clause? > > Adrian > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >