Thread: Partitioned tables - planner wont use indexes

Partitioned tables - planner wont use indexes

From
kevin kempter
Date:
Hi List;

Sorry if this is a dupe, my first post never showed up...

I'm having some performance issues with a partitioned table. We have a
VERY large table that we've partitioned by day.

Currently we have 17 partitions - each partition table contains >
700million rows.
One of the things we need to query is the min date from the master
table - we may explore alternatives for this particular query, however
even if we fix this query I think we have a fundamental issue with the
use of indexes (actuallt the non-use) by the planner.

Below is a sample of the DDL used to create our tables and an explain
showing that the planner wants to do a sequential scan on each
partition. We do have "constraint_elimination = on" set in the
postgresql.conf file.

I tried removing the index from the part_master table and got the same
result

Likewise the costs associated with the seq scans seem to be way off
(yes I've run analyze on the master and all partition tables) - I ran
the actual SQL statement below and killed it after about 15min.

Thanks in advance for any help, advice, etc...




Tables:

------------------------------------------
-- Master Table
------------------------------------------
CREATE TABLE part_master (
    filename character varying(100),
    logdate date,
    ... -- about 50 more columns go here
    loghour date,
    url character varying(500),
    customer character varying(500)
);
CREATE INDEX master_logdate ON part_master USING btree (logdate);

------------------------------------------
-- Partitions:
------------------------------------------

------------------------------------------
-- part_20080319
------------------------------------------
CREATE TABLE part_20080319 (CONSTRAINT part_20080319_logdate_check
    CHECK ((logdate = '2008-03-19'::date))
)
INHERITS (part_master);


CREATE INDEX idx_part_20080319_customer ON part_20080319 USING btree
(customer);
CREATE INDEX idx_part_20080319_logdate ON part_20080319 USING btree
(logdate);
CREATE INDEX idx_part_20080319_loghour ON part_20080319 USING btree
(loghour);


------------------------------------------
-- part_20080320
------------------------------------------
CREATE TABLE part_20080320 (CONSTRAINT part_20080320_logdate_check
    CHECK ((logdate = '2008-03-20'::date))
)
INHERITS (part_master);


CREATE INDEX idx_part_20080320_customer ON part_20080320 USING btree
(customer);
CREATE INDEX idx_part_20080320_logdate ON part_20080320 USING btree
(logdate);
CREATE INDEX idx_part_20080320_loghour ON part_20080320 USING btree
(loghour);


-- And so on, thru part_20080404



------------------------------------------
-- explain plan
------------------------------------------

myDB=# explain SELECT min(logdate) FROM part_master;
                                                       QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=117070810.10..117070810.11 rows=1 width=4)
   ->  Append  (cost=0.00..114866502.48 rows=881723048 width=4)
         ->  Seq Scan on part_master  (cost=0.00..85596244.18
rows=679385718 width=4)
         ->  Seq Scan on part_20080319 part  (cost=0.00..212860.86
rows=1674986 width=4)
         ->  Seq Scan on part_20080320 part  (cost=0.00..1753802.51
rows=13782951 width=4)
         ->  Seq Scan on part_20080321 part  (cost=0.00..2061636.83
rows=15881283 width=4)
         ->  Seq Scan on part_20080322 part  (cost=0.00..1965144.71
rows=14936971 width=4)
         ->  Seq Scan on part_20080323 part  (cost=0.00..1614413.18
rows=12345618 width=4)
         ->  Seq Scan on part_20080324 part  (cost=0.00..1926520.22
rows=14741022 width=4)
         ->  Seq Scan on part_20080325 part  (cost=0.00..2356704.22
rows=18477622 width=4)
         ->  Seq Scan on part_20080326 part  (cost=0.00..1889267.71
rows=14512171 width=4)
         ->  Seq Scan on part_20080327 part  (cost=0.00..1622100.34
rows=12445034 width=4)
         ->  Seq Scan on part_20080328 part  (cost=0.00..1711779.49
rows=12885749 width=4)
         ->  Seq Scan on part_20080329 part  (cost=0.00..1568192.94
rows=11958394 width=4)
         ->  Seq Scan on part_20080330 part  (cost=0.00..1521204.64
rows=11676564 width=4)
         ->  Seq Scan on part_20080331 part  (cost=0.00..1587138.77
rows=12180377 width=4)
         ->  Seq Scan on part_20080401 part  (cost=0.00..2324352.82
rows=18211382 width=4)
         ->  Seq Scan on part_20080402 part  (cost=0.00..2891295.04
rows=6693804 width=4)
         ->  Seq Scan on part_20080403 part  (cost=0.00..1707327.48
rows=5748348 width=4)
         ->  Seq Scan on part_20080404 part  (cost=0.00..556516.54
rows=4185054 width=4)
(20 rows)

Re: Partitioned tables - planner wont use indexes

From
paul rivers
Date:
kevin kempter wrote:
> Hi List;
>
> Sorry if this is a dupe, my first post never showed up...
>
> I'm having some performance issues with a partitioned table. We have a
> VERY large table that we've partitioned by day.
>

Unfortunately, that is the defined behavior in this case. From 5.9.6 of
the manual:

"Constraint exclusion only works when the query's WHERE clause contains
constants."

[Where the constants are of course your partitioning column(s)]


The best way around this depends mostly on what you're up to. You can
get the min tablename from the catalogs, or you can keep a table of
active partitions that your script which drops off old partitions and
generates new ones can keep updated on the oldest/newest partition
dates. Or some number of other solutions, whatever you find cleanest for
your purposes.

Paul



Re: Partitioned tables - planner wont use indexes

From
Gregory Stark
Date:
"kevin kempter" <kevin@kevinkempterllc.com> writes:

> that the planner wants to do a sequential scan on each  partition. We do have
> "constraint_elimination = on" set in the  postgresql.conf file.

"constraint_exclusion" btw.


> myDB=# explain SELECT min(logdate) FROM part_master;

Er, yeah. Unfortunately this is just not a kind of query our planner knows how
to optimize when dealing with a partitioned table... yet. There are several
different pieces missing to make this work. There's some hope some of them
might show up for 8.4 but no guarantees.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

Re: Partitioned tables - planner wont use indexes

From
Bricklen Anderson
Date:
kevin kempter wrote:
> One of the things we need to query is the min date from the master table
> - we may explore alternatives for this particular query, however even if
> we fix this query I think we have a fundamental issue with the use of
> indexes (actuallt the non-use) by the planner.

We had a similar requirement, so I've been using a function that loops
over the child tables, and queries for the min date from each. If all
you need is the date, you can try a function call. Here is a modified
version of what I've been using:

CREATE OR REPLACE function get_min_date() RETURNS DATE as $_$
DECLARE
     x RECORD;
     min_date DATE;
     min_date_tmp DATE;
     qry TEXT;
BEGIN
     /* can also test MIN() aggregate, rather than ORDER BY/LIMIT */
     FOR x IN EXECUTE 'select tablename from pg_tables where tablename
like ''part_20%''' loop
         qry := 'SELECT logdate FROM '||x.tablename||' ORDER BY logdate
LIMIT 1';
         EXECUTE qry INTO min_date_tmp;
         IF (min_date IS NULL OR (min_date_tmp IS NOT NULL AND
min_date_tmp<min_date)) THEN
             min_date := min_date_tmp;
         END IF;
     END LOOP;
     RETURN min_date;
END;
$_$ language plpgsql immutable;