Thread: Create index on the year of a date column
Hi, I am trying to create an index on the year of a date field, to speed up some queries. Table structure is as follows ------------------------------------------------------------------------ ---- CREATE SEQUENCE "sm_item_id_seq" start 1 increment 1 maxvalue 9223372036854775807 minvalue 1 cache 1; CREATE TABLE "sm_item" ( "item_id" int4 DEFAULT nextval('"sm_item_id_seq"'::text) NOT NULL, "item_created_date_start" date, CONSTRAINT "sm_item_pkey" PRIMARY KEY ("item_id") ) WITHOUT OIDS; ------------------------------------------------------------------------ ---- And I have tried the following to create the actual index ------------------------------------------------------------------------ ---- create index sm_item_cdates_idx ON sm_item (extract(year from item_created_date_start)); ------------------------------------------------------------------------ ---- The response I get from psql is ------------------------------------------------------------------------ ---- sm_live=# create index sm_item_cdates_idx ON sm_item (extract(year from item_created_date_start)); ERROR: parser: parse error at or near "(" at character 52 ------------------------------------------------------------------------ ---- This relates to the brackets surrounding the "year from item_created_date_start" bit. Am I doing anything blatantly wrong? Can I actually use the extract function for an index? Would I still get a speed improvement if I were to just index the whole of the field, rather than just the year? An example query that I have been running is: ------------------------------------------------------------------------ ----SELECT item_id, item_created_date_start FROM sm_item WHERE extract(year FROM item_created_date_start) = 1685; ------------------------------------------------------------------------ ---- Which of course has been using a seq scan, as there is absolutely no index on this column as yet. Kind Regards, Nick Barr WebBased Ltd. This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person.
On Thu, Jun 05, 2003 at 17:38:21 +0100, Nick Barr <nick.barr@webbased.co.uk> wrote: > Hi, > > I am trying to create an index on the year of a date field, to speed up > some queries. Table structure is as follows Wouldn't an index on the plain date work as well? You would have to make queries for years be range queries instead of equal, but the index could be used for other things (like ordering by date).
On Thu, 2003-06-05 at 11:38, Nick Barr wrote: > Hi, > > I am trying to create an index on the year of a date field, to speed up > some queries. Table structure is as follows If, for example, you want all records where item_created_start_date in calendar year 2001, why not create the index on i_c_s_d and say WHERE item_created_start_date between '2001-01-01' and '2001-12-31' -- +-----------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | Regarding war zones: "There's nothing sacrosanct about a | | hotel with a bunch of journalists in it." | | Marine Lt. Gen. Bernard E. Trainor (Retired) | +-----------------------------------------------------------+
Hi, Thanks guys for all the responses, they are really helpful. To summarise: 1. Use an index on the entire date column and use ranges to make use of the index. 2. Create a dummy function that wraps the extract function call, and use this as the index. 3. Create another column, for the year, filled in automatically using a trigger and index that. 4. Wait until 7.4 which will allow me to do this. Brilliant. I suspect I will probably choose the first one for now, and see how that fares. I suspect, as per normal with Postgres, it will absolutely fly ;-). However, if performance becomes more of an issue, which it probably wont, then 7.4 will be out by then so I will use that. Thanks again for your time and responses. Much appreciated Kind Regards, Nick Barr WebBased Ltd. This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: 06 June 2003 06:29 > To: Nick Barr > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Create index on the year of a date column > > "Nick Barr" <nick.barr@webbased.co.uk> writes: > > SELECT item_id, item_created_date_start FROM sm_item WHERE > > extract(year FROM item_created_date_start) = 1685; > > As of 7.4 you will actually be able to build an index on an expression > like that: > > regression=# CREATE TABLE "sm_item" ("item_created_date_start" date); > CREATE TABLE > regression=# create index sm_item_cdates_idx ON sm_item ((extract(year > from item_created_date_start))); > CREATE INDEX > regression=# explain SELECT * FROM sm_item WHERE > regression-# extract(year FROM item_created_date_start) = 1685; > QUERY PLAN > ------------------------------------------------------------------------ -- > ------------------------------------------------ > Index Scan using sm_item_cdates_idx on sm_item (cost=0.00..17.09 rows=5 > width=4) > Index Cond: (date_part('year'::text, > (item_created_date_start)::timestamp without time zone) = 1685::double > precision) > (2 rows) > > But in current releases the best bet is a range inquiry using a plain > index: > > regression=# create index fooi on sm_item (item_created_date_start); > CREATE INDEX > regression=# explain SELECT * FROM sm_item WHERE > regression-# item_created_date_start BETWEEN '1685-01-01' AND '1685-12- > 31'; > QUERY PLAN > ------------------------------------------------------------------------ -- > ------------------------------------------- > Index Scan using fooi on sm_item (cost=0.00..17.08 rows=5 width=4) > Index Cond: ((item_created_date_start >= '1685-01-01'::date) AND > (item_created_date_start <= '1685-12-31'::date)) > (2 rows) > > regards, tom lane
Possible bug in CREATE INDEX? Was: Re: Create index on the year of a date column
From
"Mattias Kregert"
Date:
> I am trying to create an index on the year of a date field, to speed up > some queries. Table structure is as follows [snip] > ------------------------------------------------------------------------ > And I have tried the following to create the actual index > ------------------------------------------------------------------------ > create index sm_item_cdates_idx ON sm_item (extract(year from > item_created_date_start)); > ------------------------------------------------------------------------ > The response I get from psql is > ------------------------------------------------------------------------ > sm_live=# create index sm_item_cdates_idx ON sm_item (extract(year from > item_created_date_start)); > ERROR: parser: parse error at or near "(" at character 52 > ------------------------------------------------------------------------ I think this is a bug. Some functions/builtins can not be used in a functional index. date_part() does not work either... However, if you wrap it in another function it works like it should: CREATE FUNCTION get_year (date) RETURNS double precision AS ' SELECT extract(year from $1)' LANGUAGE SQL IMMUTABLE; CREATE INDEX sm_item_cdates_idx ON sm_item (get_year(item_created_date_start)); But I think it would be faster to simply put an index on the date, not the year part. A direct index should be faster than a functional index. > Am I doing anything blatantly wrong? Can I actually use the extract > function for an index? Would I still get a speed improvement if I were > to just index the whole of the field, rather than just the year? > An example query that I have been running is: > ------------------------------------------------------------------------ > SELECT item_id, item_created_date_start FROM sm_item WHERE > extract(year FROM item_created_date_start) = 1685; > ------------------------------------------------------------------------ > Nick Barr I am not sure the index code is intelligent enought to realize that the index can be used if you do the "extract()" thing,but it will work if you do it like this: CREATE INDEX sm_item_cdates_idx ON sm_item (item_created_date_start); SELECT item_id, item_created_date_start FROM sm_item WHERE (item_created_date_start >= '1685-01-01' AND item_created_date_start <= '1685-12-31'); /Mattias
What if you create sql-function without arguments and use it to create functional index ? Oleg On Thu, 5 Jun 2003, Nick Barr wrote: > Hi, > > I am trying to create an index on the year of a date field, to speed up > some queries. Table structure is as follows > > ------------------------------------------------------------------------ > ---- > CREATE SEQUENCE "sm_item_id_seq" start 1 increment 1 maxvalue > 9223372036854775807 minvalue 1 cache 1; > CREATE TABLE "sm_item" ( > "item_id" int4 DEFAULT nextval('"sm_item_id_seq"'::text) NOT > NULL, > "item_created_date_start" date, > CONSTRAINT "sm_item_pkey" PRIMARY KEY ("item_id") > ) WITHOUT OIDS; > ------------------------------------------------------------------------ > ---- > > And I have tried the following to create the actual index > > ------------------------------------------------------------------------ > ---- > create index sm_item_cdates_idx ON sm_item (extract(year from > item_created_date_start)); > ------------------------------------------------------------------------ > ---- > > The response I get from psql is > > ------------------------------------------------------------------------ > ---- > sm_live=# create index sm_item_cdates_idx ON sm_item (extract(year from > item_created_date_start)); > ERROR: parser: parse error at or near "(" at character 52 > ------------------------------------------------------------------------ > ---- > > This relates to the brackets surrounding the "year from > item_created_date_start" bit. > > Am I doing anything blatantly wrong? Can I actually use the extract > function for an index? Would I still get a speed improvement if I were > to just index the whole of the field, rather than just the year? > > An example query that I have been running is: > > ------------------------------------------------------------------------ > ----SELECT item_id, item_created_date_start FROM sm_item WHERE > extract(year FROM item_created_date_start) = 1685; > ------------------------------------------------------------------------ > ---- > > Which of course has been using a seq scan, as there is absolutely no > index on this column as yet. > > Kind Regards, > > Nick Barr > WebBased Ltd. > > This email and any attachments are confidential to the intended > recipient and may also be privileged. If you are not the intended > recipient please delete it from your system and notify the sender. You > should not copy it or use it for any purpose nor disclose or distribute > its contents to any other person. > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
"Nick Barr" <nick.barr@webbased.co.uk> writes: > SELECT item_id, item_created_date_start FROM sm_item WHERE > extract(year FROM item_created_date_start) = 1685; As of 7.4 you will actually be able to build an index on an expression like that: regression=# CREATE TABLE "sm_item" ("item_created_date_start" date); CREATE TABLE regression=# create index sm_item_cdates_idx ON sm_item ((extract(year from item_created_date_start))); CREATE INDEX regression=# explain SELECT * FROM sm_item WHERE regression-# extract(year FROM item_created_date_start) = 1685; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Index Scan using sm_item_cdates_idx on sm_item (cost=0.00..17.09 rows=5 width=4) Index Cond: (date_part('year'::text, (item_created_date_start)::timestamp without time zone) = 1685::double precision) (2 rows) But in current releases the best bet is a range inquiry using a plain index: regression=# create index fooi on sm_item (item_created_date_start); CREATE INDEX regression=# explain SELECT * FROM sm_item WHERE regression-# item_created_date_start BETWEEN '1685-01-01' AND '1685-12-31'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Index Scan using fooi on sm_item (cost=0.00..17.08 rows=5 width=4) Index Cond: ((item_created_date_start >= '1685-01-01'::date) AND (item_created_date_start <= '1685-12-31'::date)) (2 rows) regards, tom lane
> regression=# explain SELECT * FROM sm_item WHERE > regression-# item_created_date_start BETWEEN '1685-01-01' AND '1685-12-31'; > QUERY PLAN Hmm, if date includes the time, shouldn't the query be: SELECT * FROM sm_item WHERE item_created_date_start >= '1685-01-01' AND item_created_date_start < '1686-01-01' Vincent Hikida, Member of Technical Staff - Urbana Software, Inc. "A Personalized Learning Experience" www.UrbanaSoft.com
On Fri, 6 Jun 2003, Mattias Kregert wrote: > > I am trying to create an index on the year of a date field, to speed up > > some queries. Table structure is as follows > [snip] > > ------------------------------------------------------------------------ > > And I have tried the following to create the actual index > > ------------------------------------------------------------------------ > > create index sm_item_cdates_idx ON sm_item (extract(year from > > item_created_date_start)); > > ------------------------------------------------------------------------ > > The response I get from psql is > > ------------------------------------------------------------------------ > > sm_live=# create index sm_item_cdates_idx ON sm_item (extract(year from > > item_created_date_start)); > > ERROR: parser: parse error at or near "(" at character 52 > > ------------------------------------------------------------------------ > > I think this is a bug. Some functions/builtins can not be used in a It's not, just a limitation that hadn't been taken out yet (until 7.4 basically). The syntax for 7.3 says something to the effect of func_name(column [, ...]) so all the arguments must be columns in the table. This means that date_part and substring and other functions which need constants to be useful or things that aren't strictly speaking functions really can't be used without wrapping. > functional index. date_part() does not work either... Sure it does, if you've got another column that has the timestamp unit string in it, not that that's really useful as a meaningful index. ;)
On Thu, 5 Jun 2003, Nick Barr wrote: > CREATE SEQUENCE "sm_item_id_seq" start 1 increment 1 maxvalue > 9223372036854775807 minvalue 1 cache 1; > CREATE TABLE "sm_item" ( > "item_id" int4 DEFAULT nextval('"sm_item_id_seq"'::text) NOT > NULL, > "item_created_date_start" date, > CONSTRAINT "sm_item_pkey" PRIMARY KEY ("item_id") > ) WITHOUT OIDS; > ------------------------------------------------------------------------ > ---- > > And I have tried the following to create the actual index > > ------------------------------------------------------------------------ > ---- > create index sm_item_cdates_idx ON sm_item (extract(year from > item_created_date_start)); > ------------------------------------------------------------------------ In 7.4, I believe you'll be allowed to say sm_item((extract(year from item_created_date_start))) For now, you'd need to make an immutable function that does the extract year from $1 and use that in the index and query, although I agree with the other response that it might just be better to index the whole value and use range queries instead.