Thread: Index on a function and SELECT DISTINCT

Index on a function and SELECT DISTINCT

From
Adrian Holovaty
Date:
If I have this table, function and index in Postgres 7.3.6 ...

"""
CREATE TABLE news_stories (
    id serial primary key NOT NULL,
    pub_date timestamp with time zone NOT NULL,
    ...
)
CREATE OR REPLACE FUNCTION get_year_trunc(timestamp with time zone) returns
timestamp with time zone AS 'SELECT date_trunc(\'year\',$1);' LANGUAGE 'SQL'
IMMUTABLE;
CREATE INDEX news_stories_pub_date_year_trunc ON
news_stories( get_year_trunc(pub_date) );
"""

...why does this query not use the index?

db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM news_stories;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Unique  (cost=59597.31..61311.13 rows=3768 width=8)
   ->  Sort  (cost=59597.31..60454.22 rows=342764 width=8)
         Sort Key: date_trunc('year'::text, pub_date)
         ->  Seq Scan on news_stories  (cost=0.00..23390.55 rows=342764
width=8)
(4 rows)

The query is noticably slow (2 seconds) on a database with 150,000+ records.
How can I speed it up?

Thanks,
Adrian

Re: Index on a function and SELECT DISTINCT

From
Frank Wiles
Date:
On Fri, 14 Jan 2005 12:32:12 -0600
Adrian Holovaty <postgresql@holovaty.com> wrote:

> If I have this table, function and index in Postgres 7.3.6 ...
>
> """
> CREATE TABLE news_stories (
>     id serial primary key NOT NULL,
>     pub_date timestamp with time zone NOT NULL,
>     ...
> )
> CREATE OR REPLACE FUNCTION get_year_trunc(timestamp with time zone)
> returns timestamp with time zone AS 'SELECT date_trunc(\'year\',$1);'
> LANGUAGE 'SQL' IMMUTABLE;
> CREATE INDEX news_stories_pub_date_year_trunc ON
> news_stories( get_year_trunc(pub_date) );
> """
>
> ...why does this query not use the index?
>
> db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM
> news_stories;
>                                    QUERY PLAN
> ---------------------------------------------------------------------
> ------------
>  Unique  (cost=59597.31..61311.13 rows=3768 width=8)
>    ->  Sort  (cost=59597.31..60454.22 rows=342764 width=8)
>          Sort Key: date_trunc('year'::text, pub_date)
>          ->  Seq Scan on news_stories  (cost=0.00..23390.55
>          rows=342764
> width=8)
> (4 rows)
>
> The query is noticably slow (2 seconds) on a database with 150,000+
> records. How can I speed it up?

  It's doing a sequence scan because you're not limiting the query in
  the FROM clause.  No point in using an index when you're asking for
  the entire table. :)

 ---------------------------------
   Frank Wiles <frank@wiles.org>
   http://www.wiles.org
 ---------------------------------


Re: Index on a function and SELECT DISTINCT

From
Adrian Holovaty
Date:
Frank Wiles wrote:
> Adrian Holovaty <postgresql@holovaty.com> wrote:
> > If I have this table, function and index in Postgres 7.3.6 ...
> >
> > """
> > CREATE TABLE news_stories (
> >     id serial primary key NOT NULL,
> >     pub_date timestamp with time zone NOT NULL,
> >     ...
> > )
> > CREATE OR REPLACE FUNCTION get_year_trunc(timestamp with time zone)
> > returns timestamp with time zone AS 'SELECT date_trunc(\'year\',$1);'
> > LANGUAGE 'SQL' IMMUTABLE;
> > CREATE INDEX news_stories_pub_date_year_trunc ON
> > news_stories( get_year_trunc(pub_date) );
> > """
> >
> > ...why does this query not use the index?
> >
> > db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM
> > news_stories;
> >                                    QUERY PLAN
> > ---------------------------------------------------------------------
> > ------------
> >  Unique  (cost=59597.31..61311.13 rows=3768 width=8)
> >    ->  Sort  (cost=59597.31..60454.22 rows=342764 width=8)
> >          Sort Key: date_trunc('year'::text, pub_date)
> >          ->  Seq Scan on news_stories  (cost=0.00..23390.55
> >          rows=342764
> > width=8)
> > (4 rows)
> >
> > The query is noticably slow (2 seconds) on a database with 150,000+
> > records. How can I speed it up?
>
>   It's doing a sequence scan because you're not limiting the query in
>   the FROM clause.  No point in using an index when you're asking for
>   the entire table. :)

Ah, that makes sense. So is there a way to optimize SELECT DISTINCT queries
that have no WHERE clause?

Adrian

Re: Index on a function and SELECT DISTINCT

From
PFC
Date:

    Try :

EXPLAIN SELECT get_year_trunc(pub_date) as foo FROM ... GROUP BY foo

    Apart from that, you could use a materialized view...

>> > db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM

> Ah, that makes sense. So is there a way to optimize SELECT DISTINCT
> queries
> that have no WHERE clause?
>
> Adrian
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>