Thread: Create index on the year of a date column

Create index on the year of a date column

From
"Nick Barr"
Date:
Hi,

I am trying to create an index on the year of a date field, to speed up
some queries. Table structure is as follows

------------------------------------------------------------------------
----
CREATE SEQUENCE "sm_item_id_seq" start 1 increment 1 maxvalue
9223372036854775807 minvalue 1 cache 1;
CREATE TABLE "sm_item" (
    "item_id" int4 DEFAULT nextval('"sm_item_id_seq"'::text) NOT
NULL,
    "item_created_date_start" date,
    CONSTRAINT "sm_item_pkey" PRIMARY KEY ("item_id")
) WITHOUT OIDS;
------------------------------------------------------------------------
----

And I have tried the following to create the actual index

------------------------------------------------------------------------
----
create index sm_item_cdates_idx ON sm_item (extract(year from
item_created_date_start));
------------------------------------------------------------------------
----

The response I get from psql is

------------------------------------------------------------------------
----
sm_live=# create index sm_item_cdates_idx ON sm_item (extract(year from
item_created_date_start));
ERROR:  parser: parse error at or near "(" at character 52
------------------------------------------------------------------------
----

This relates to the brackets surrounding the "year from
item_created_date_start" bit.

Am I doing anything blatantly wrong? Can I actually use the extract
function for an index? Would I still get a speed improvement if I were
to just index the whole of the field, rather than just the year?

An example query that I have been running is:

------------------------------------------------------------------------
----SELECT item_id, item_created_date_start FROM sm_item WHERE
extract(year FROM item_created_date_start) = 1685;
------------------------------------------------------------------------
----

Which of course has been using a seq scan, as there is absolutely no
index on this column as yet.

Kind Regards,

Nick Barr
WebBased Ltd.

This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.






Re: Create index on the year of a date column

From
Bruno Wolff III
Date:
On Thu, Jun 05, 2003 at 17:38:21 +0100,
  Nick Barr <nick.barr@webbased.co.uk> wrote:
> Hi,
>
> I am trying to create an index on the year of a date field, to speed up
> some queries. Table structure is as follows

Wouldn't an index on the plain date work as well? You would have to
make queries for years be range queries instead of equal, but the index
could be used for other things (like ordering by date).

Re: Create index on the year of a date column

From
Ron Johnson
Date:
On Thu, 2003-06-05 at 11:38, Nick Barr wrote:
> Hi,
>
> I am trying to create an index on the year of a date field, to speed up
> some queries. Table structure is as follows

If, for example, you want all records where item_created_start_date
in calendar year 2001, why not create the index on i_c_s_d and say
  WHERE item_created_start_date between '2001-01-01' and '2001-12-31'


--
+-----------------------------------------------------------+
| Ron Johnson, Jr.     Home: ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson |
|                                                           |
| Regarding war zones: "There's nothing sacrosanct about a  |
| hotel with a bunch of journalists in it."                 |
|     Marine Lt. Gen. Bernard E. Trainor (Retired)          |
+-----------------------------------------------------------+


Re: Create index on the year of a date column

From
"Nick Barr"
Date:
Hi,

Thanks guys for all the responses, they are really helpful. To
summarise:

1. Use an index on the entire date column and use ranges to make use of
the index.
2. Create a dummy function that wraps the extract function call, and use
this as the index.
3. Create another column, for the year, filled in automatically using a
trigger and index that.
4. Wait until 7.4 which will allow me to do this.

Brilliant. I suspect I will probably choose the first one for now, and
see how that fares. I suspect, as per normal with Postgres, it will
absolutely fly ;-). However, if performance becomes more of an issue,
which it probably wont, then 7.4 will be out by then so I will use that.

Thanks again for your time and responses. Much appreciated

Kind Regards,

Nick Barr
WebBased Ltd.

This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.



> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 06 June 2003 06:29
> To: Nick Barr
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Create index on the year of a date column
>
> "Nick Barr" <nick.barr@webbased.co.uk> writes:
> > SELECT item_id, item_created_date_start FROM sm_item WHERE
> > extract(year FROM item_created_date_start) = 1685;
>
> As of 7.4 you will actually be able to build an index on an expression
> like that:
>
> regression=# CREATE TABLE "sm_item" ("item_created_date_start" date);
> CREATE TABLE
> regression=# create index sm_item_cdates_idx ON sm_item ((extract(year
> from item_created_date_start)));
> CREATE INDEX
> regression=# explain SELECT * FROM sm_item WHERE
> regression-# extract(year FROM item_created_date_start) = 1685;
>                                                         QUERY PLAN
>
------------------------------------------------------------------------
--
> ------------------------------------------------
>  Index Scan using sm_item_cdates_idx on sm_item  (cost=0.00..17.09
rows=5
> width=4)
>    Index Cond: (date_part('year'::text,
> (item_created_date_start)::timestamp without time zone) = 1685::double
> precision)
> (2 rows)
>
> But in current releases the best bet is a range inquiry using a plain
> index:
>
> regression=# create index fooi on sm_item (item_created_date_start);
> CREATE INDEX
> regression=# explain SELECT * FROM sm_item WHERE
> regression-# item_created_date_start BETWEEN '1685-01-01' AND
'1685-12-
> 31';
>                                                      QUERY PLAN
>
------------------------------------------------------------------------
--
> -------------------------------------------
>  Index Scan using fooi on sm_item  (cost=0.00..17.08 rows=5 width=4)
>    Index Cond: ((item_created_date_start >= '1685-01-01'::date) AND
> (item_created_date_start <= '1685-12-31'::date))
> (2 rows)
>
>             regards, tom lane



Possible bug in CREATE INDEX? Was: Re: Create index on the year of a date column

From
"Mattias Kregert"
Date:
> I am trying to create an index on the year of a date field, to speed up
> some queries. Table structure is as follows
[snip]
> ------------------------------------------------------------------------
> And I have tried the following to create the actual index
> ------------------------------------------------------------------------
> create index sm_item_cdates_idx ON sm_item (extract(year from
> item_created_date_start));
> ------------------------------------------------------------------------
> The response I get from psql is
> ------------------------------------------------------------------------
> sm_live=# create index sm_item_cdates_idx ON sm_item (extract(year from
> item_created_date_start));
> ERROR:  parser: parse error at or near "(" at character 52
> ------------------------------------------------------------------------

I think this is a bug. Some functions/builtins can not be used in a functional index. date_part() does not work
either...

However, if you wrap it in another function it works like it should:
CREATE FUNCTION get_year (date) RETURNS double precision AS '
    SELECT extract(year from $1)' LANGUAGE SQL IMMUTABLE;
CREATE INDEX sm_item_cdates_idx ON sm_item (get_year(item_created_date_start));

But I think it would be faster to simply put an index on the date, not the year part. A direct
index should be faster than a functional index.


> Am I doing anything blatantly wrong? Can I actually use the extract
> function for an index? Would I still get a speed improvement if I were
> to just index the whole of the field, rather than just the year?
> An example query that I have been running is:
> ------------------------------------------------------------------------
> SELECT item_id, item_created_date_start FROM sm_item WHERE
> extract(year FROM item_created_date_start) = 1685;
> ------------------------------------------------------------------------
> Nick Barr

I am not sure the index code is intelligent enought to realize that the index can be used if you do the "extract()"
thing,but it will work if you do it like this: 
CREATE INDEX sm_item_cdates_idx ON sm_item (item_created_date_start);
SELECT item_id, item_created_date_start FROM sm_item
  WHERE (item_created_date_start >= '1685-01-01' AND item_created_date_start <= '1685-12-31');

/Mattias


Re: Create index on the year of a date column

From
Oleg Bartunov
Date:
What if you create sql-function without arguments and use it to create
functional index ?

    Oleg
On Thu, 5 Jun 2003, Nick Barr wrote:

> Hi,
>
> I am trying to create an index on the year of a date field, to speed up
> some queries. Table structure is as follows
>
> ------------------------------------------------------------------------
> ----
> CREATE SEQUENCE "sm_item_id_seq" start 1 increment 1 maxvalue
> 9223372036854775807 minvalue 1 cache 1;
> CREATE TABLE "sm_item" (
>     "item_id" int4 DEFAULT nextval('"sm_item_id_seq"'::text) NOT
> NULL,
>     "item_created_date_start" date,
>     CONSTRAINT "sm_item_pkey" PRIMARY KEY ("item_id")
> ) WITHOUT OIDS;
> ------------------------------------------------------------------------
> ----
>
> And I have tried the following to create the actual index
>
> ------------------------------------------------------------------------
> ----
> create index sm_item_cdates_idx ON sm_item (extract(year from
> item_created_date_start));
> ------------------------------------------------------------------------
> ----
>
> The response I get from psql is
>
> ------------------------------------------------------------------------
> ----
> sm_live=# create index sm_item_cdates_idx ON sm_item (extract(year from
> item_created_date_start));
> ERROR:  parser: parse error at or near "(" at character 52
> ------------------------------------------------------------------------
> ----
>
> This relates to the brackets surrounding the "year from
> item_created_date_start" bit.
>
> Am I doing anything blatantly wrong? Can I actually use the extract
> function for an index? Would I still get a speed improvement if I were
> to just index the whole of the field, rather than just the year?
>
> An example query that I have been running is:
>
> ------------------------------------------------------------------------
> ----SELECT item_id, item_created_date_start FROM sm_item WHERE
> extract(year FROM item_created_date_start) = 1685;
> ------------------------------------------------------------------------
> ----
>
> Which of course has been using a seq scan, as there is absolutely no
> index on this column as yet.
>
> Kind Regards,
>
> Nick Barr
> WebBased Ltd.
>
> This email and any attachments are confidential to the intended
> recipient and may also be privileged. If you are not the intended
> recipient please delete it from your system and notify the sender. You
> should not copy it or use it for any purpose nor disclose or distribute
> its contents to any other person.
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Create index on the year of a date column

From
Tom Lane
Date:
"Nick Barr" <nick.barr@webbased.co.uk> writes:
> SELECT item_id, item_created_date_start FROM sm_item WHERE
> extract(year FROM item_created_date_start) = 1685;

As of 7.4 you will actually be able to build an index on an expression
like that:

regression=# CREATE TABLE "sm_item" ("item_created_date_start" date);
CREATE TABLE
regression=# create index sm_item_cdates_idx ON sm_item ((extract(year from item_created_date_start)));
CREATE INDEX
regression=# explain SELECT * FROM sm_item WHERE
regression-# extract(year FROM item_created_date_start) = 1685;
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Index Scan using sm_item_cdates_idx on sm_item  (cost=0.00..17.09 rows=5 width=4)
   Index Cond: (date_part('year'::text, (item_created_date_start)::timestamp without time zone) = 1685::double
precision)
(2 rows)

But in current releases the best bet is a range inquiry using a plain
index:

regression=# create index fooi on sm_item (item_created_date_start);
CREATE INDEX
regression=# explain SELECT * FROM sm_item WHERE
regression-# item_created_date_start BETWEEN '1685-01-01' AND '1685-12-31';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using fooi on sm_item  (cost=0.00..17.08 rows=5 width=4)
   Index Cond: ((item_created_date_start >= '1685-01-01'::date) AND (item_created_date_start <= '1685-12-31'::date))
(2 rows)

            regards, tom lane

Re: Create index on the year of a date column

From
"Vincent Hikida"
Date:
> regression=# explain SELECT * FROM sm_item WHERE
> regression-# item_created_date_start BETWEEN '1685-01-01' AND
'1685-12-31';
>                                                      QUERY PLAN

Hmm, if date includes the time, shouldn't the query be:

SELECT *
     FROM sm_item
  WHERE item_created_date_start >= '1685-01-01'
         AND item_created_date_start < '1686-01-01'



Vincent Hikida,
Member of Technical Staff - Urbana Software, Inc.
"A Personalized Learning Experience"

www.UrbanaSoft.com



Re: Possible bug in CREATE INDEX? Was: Re: Create index

From
Stephan Szabo
Date:
On Fri, 6 Jun 2003, Mattias Kregert wrote:

> > I am trying to create an index on the year of a date field, to speed up
> > some queries. Table structure is as follows
> [snip]
> > ------------------------------------------------------------------------
> > And I have tried the following to create the actual index
> > ------------------------------------------------------------------------
> > create index sm_item_cdates_idx ON sm_item (extract(year from
> > item_created_date_start));
> > ------------------------------------------------------------------------
> > The response I get from psql is
> > ------------------------------------------------------------------------
> > sm_live=# create index sm_item_cdates_idx ON sm_item (extract(year from
> > item_created_date_start));
> > ERROR:  parser: parse error at or near "(" at character 52
> > ------------------------------------------------------------------------
>
> I think this is a bug. Some functions/builtins can not be used in a

It's not, just a limitation that hadn't been taken out yet (until 7.4
basically).  The syntax for 7.3 says something to the effect
of func_name(column [, ...]) so all the arguments must be columns in the
table.  This means that date_part and substring and other functions which
need constants to be useful or things that aren't strictly speaking
functions really can't be used without wrapping.

> functional index. date_part() does not work either...

Sure it does, if you've got another column that has the timestamp unit
string in it, not that that's really useful as a meaningful index. ;)


Re: Create index on the year of a date column

From
Stephan Szabo
Date:
On Thu, 5 Jun 2003, Nick Barr wrote:

> CREATE SEQUENCE "sm_item_id_seq" start 1 increment 1 maxvalue
> 9223372036854775807 minvalue 1 cache 1;
> CREATE TABLE "sm_item" (
>     "item_id" int4 DEFAULT nextval('"sm_item_id_seq"'::text) NOT
> NULL,
>     "item_created_date_start" date,
>     CONSTRAINT "sm_item_pkey" PRIMARY KEY ("item_id")
> ) WITHOUT OIDS;
> ------------------------------------------------------------------------
> ----
>
> And I have tried the following to create the actual index
>
> ------------------------------------------------------------------------
> ----
> create index sm_item_cdates_idx ON sm_item (extract(year from
> item_created_date_start));
> ------------------------------------------------------------------------

In 7.4, I believe you'll be allowed to say
sm_item((extract(year from item_created_date_start)))

For now, you'd need to make an immutable function that does the extract
year from $1 and use that in the index and query, although I agree with
the other response that it might just be better to index the whole value
and use range queries instead.