Thread: index on partitioned table

index on partitioned table

From
"Wojtek"
Date:
Dear Postgres Community,

I'm running postgres 8.3

I have a table, partitioned by month

-- Table: datadump

-- DROP TABLE datadump;

CREATE TABLE datadump
(
  sys_timestamp timestamp without time zone,
  sys_device_id integer,
  usefields integer,
  timedate timestamp without time zone,
  digitalthermometer1 integer,
  digitalthermometer2 integer,
  digitalthermometer3 integer,
  digitalthermometer4 integer,
  digitalthermometer5 integer,
  digitalthermometer6 integer,
  tco0 integer,
  tco1 integer,
  tco2 integer,
  tco3 integer
)
WITH (
  OIDS=FALSE
)
TABLESPACE disk_d;
ALTER TABLE datadump OWNER TO postgres;
GRANT ALL ON TABLE datadump TO postgres;

partitioned by timedate, example:

CREATE TABLE data_dmp_part_201036
(
{inherits from master table}
 CONSTRAINT data_dmp_part_201036_timedate_check CHECK (timedate >= '2010-09-06 00:00:00'::timestamp without time zone
ANDtimedate < '2010-09-13 00:00:00'::timestamp without time zone) 
)
INHERITS (datadump)
WITH (
  OIDS=FALSE
);
ALTER TABLE data_dmp_part_201036 OWNER TO postgres;

partitions are will typically have from 200k to 300k rows, i have 52 partitions per year and I'm keeping around 4-5
yearsof history. However, they will query last 3-4 months most often. 

my first, pretty obvious choice, was to create index on partitions on timedate:
CREATE INDEX data_dmp_part_201036_idx
  ON data_dmp_part_201036
  USING btree
  (timedate);


Most of my queries will have where conditions on timedate and sys_device_id, but a lot of them will have additional
clause:where usefields is not null. Some of the queries will be limited on timedate only. 

I'm trying to figure out the best indexing strategy for this table. If a query will have condition on sys_device_id
and/orusefields is not null, postgres won't use my index.  
I've experimented turning on and off enable_seqscan and creating different indexes and so far btree index on
(usefields,sys_device_id, timedate) turn out to be the best.  
If I create btree index only on (usefields, timedate) or (sys_device_id, timedate), planner will go for seqscan. If I
turnoff seqscan, postgres will use index but performance will be worse than seqscan. 


My question finally: is btree index on (usefields, sys_device_id, timedate) really the best choice? I'm yet to examine
optionsof creating separate indexes for timedate, usefields and sys_device_id. Possibly I should try using GiST or GIN? 

Any advice, please?

Regards,
foo



Re: index on partitioned table

From
Nikolas Everett
Date:


2010/2/5 Wojtek <foo@twine.pl>
partitions are will typically have from 200k to 300k rows, i have 52 partitions per year and I'm keeping around 4-5 years of history. However, they will query last 3-4 months most often.
Do you mean 12 partitions a year or weekly partitions?
 
Most of my queries will have where conditions on timedate and sys_device_id, but a lot of them will have additional clause: where usefields is not null. Some of the queries will be limited on timedate only.

I'm trying to figure out the best indexing strategy for this table. If a query will have condition on sys_device_id and/or usefields is not null, postgres won't use my index.
I've experimented turning on and off enable_seqscan and creating different indexes and so far btree index on (usefields, sys_device_id, timedate) turn out to be the best.
If I create btree index only on (usefields, timedate) or (sys_device_id, timedate), planner will go for seqscan. If I turn off seqscan, postgres will use index but performance will be worse than seqscan.


My question finally: is btree index on (usefields, sys_device_id, timedate) really the best choice? I'm yet to examine options of creating separate indexes for timedate, usefields and sys_device_id. Possibly I should try using GiST or GIN?

I'd start with no indexes and then add indexes as your queries start to take too long.  I'd start with single column indexes.  PostgreSQL is perfectly capable of bitmap anding the indexes if it has to.  Multicolumn indexes are the last place I'd go.

I'm not sure you'll need an index on timedate.  It depends on the length of the timedate segments you'll be querying.  If they are typically a month long then you shouldn't have an index on it at all.  Even if they are a week long its probably not worth it.

My guess is that an index sys_device_id will be selective enough for most of what you need.  What does PostgreSQL tell you about the statistics of that column?
 
Regards,
foo

Regards,
bar 

Re: index on partitioned table

From
Robert Haas
Date:
2010/2/5 Wojtek <foo@twine.pl>:
> Most of my queries will have where conditions on timedate and sys_device_id, but a lot of them will have additional
clause:where usefields is not null. Some of the queries will be limited on timedate only. 

What about a partial index on (timedate) WHERE usefields IS NOT NULL;
or maybe on (timedate, sys_device_id) WHERE usefields IS NOT NULL?

...Robert