Hi!
I'm just wondering, I've got a table that is partitioned into monthly
tables:
media_downloads -> media_downloads_YYYYMM
I\- id (primary key)
\- created_on (timestamp criteria for the monthly table split)
There are constraints upon the created_on column, all needed insert
instead rules are defined too.
One additional hardship is that id are not monotone against created_on,
id1 < id2 does not imply created_on1 <= created_on2 :(
The table contains basically almost 100M rows, and the number is
growing. (the table will be about a 12GB pg_dump.)
All relevant indexes (primary key id, index on created_on) are defined
too.
The good thing is, queries like all rows in the last 7 days work
reasonable fast, the optimizer just checks the 1-2 last month tables.
Using postgres 8.1.4-0ubuntu1, I've got to implement the following
queries in a reasonable fast way:
-- sequential reading of rows
SELECT * FROM media_downloads WHERE id > 1000000 ORDER BY id LIMIT 100;
Against the same monolithic table with about 16.5M rows, I'm getting a
cost of 20.6 pages. (Index scan)
Against the partitioned tables, I'm getting a cost of 5406822 pages.
Now I understand, that without any additional conditions, postgresql
needs to do the query for all subtables first, but explain against the
subtables show costs of 4-5 pages.
events=# explain select * from media_downloads where id >90000000 order
by id limit 100;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5406822.39..5406822.64 rows=100 width=1764)
-> Sort (cost=5406822.39..5413639.50 rows=2726843 width=1764)
Sort Key: public.media_downloads.id
-> Result (cost=0.00..115960.71 rows=2726843 width=1764)
-> Append (cost=0.00..115960.71 rows=2726843
width=1764)
-> Seq Scan on media_downloads (cost=0.00..10.50
rows=13 width=1764)
Filter: (id > 90000000)
-> Index Scan using media_downloads_200510_pkey on
media_downloads_200510 media_downloads (cost=0.00..3.75 rows=14
width=243)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200511_pkey on
media_downloads_200511 media_downloads (cost=0.00..72.19 rows=172
width=239)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200512_pkey on
media_downloads_200512 media_downloads (cost=0.00..603.64 rows=172
width=240)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200601_pkey on
media_downloads_200601 media_downloads (cost=0.00..19.33 rows=232
width=239)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200602_pkey on
media_downloads_200602 media_downloads (cost=0.00..56.82 rows=316
width=240)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200603_pkey on
media_downloads_200603 media_downloads (cost=0.00..18.88 rows=270
width=243)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200604_pkey on
media_downloads_200604 media_downloads (cost=0.00..1194.16 rows=939
width=298)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200605_pkey on
media_downloads_200605 media_downloads (cost=0.00..79.28 rows=672
width=326)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200606_pkey on
media_downloads_200606 media_downloads (cost=0.00..75.26 rows=1190
width=314)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200607_pkey on
media_downloads_200607 media_downloads (cost=0.00..55.29 rows=1238
width=319)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200608_pkey on
media_downloads_200608 media_downloads (cost=0.00..73.95 rows=1305
width=319)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200609_pkey on
media_downloads_200609 media_downloads (cost=0.00..144.10 rows=1575
width=324)
Index Cond: (id > 90000000)
-> Index Scan using media_downloads_200610_pkey on
media_downloads_200610 media_downloads (cost=0.00..113532.57
rows=2718709 width=337)
Index Cond: (id > 90000000)
-> Seq Scan on media_downloads_200611
media_downloads (cost=0.00..10.50 rows=13 width=1764)
Filter: (id > 90000000)
-> Seq Scan on media_downloads_200612
media_downloads (cost=0.00..10.50 rows=13 width=1764)
Filter: (id > 90000000)
(37 rows)
events=# explain select * from media_downloads_200610 where id
>90000000 order by id limit 100;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..4.18 rows=100 width=337)
-> Index Scan using media_downloads_200610_pkey on
media_downloads_200610 (cost=0.00..113582.70 rows=2719904 width=337)
Index Cond: (id > 90000000)
(3 rows)
Interestingly, if one reformulates the query like that:
SELECT * FROM media_downloads WHERE id > 90000000 AND id < 90001000
ORDER BY id LIMIT 100;
results in a reasonable cost of 161.5 pages.
Now the above query is basically acceptable, as one iterate all rows
this way, but now I need to know max(id) to know when to stop my loop:
events=# explain select max(id) from media_downloads;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------
Aggregate (cost=3676914.56..3676914.58 rows=1 width=4)
-> Append (cost=0.00..3444211.85 rows=93081085 width=4)
-> Seq Scan on media_downloads (cost=0.00..10.40 rows=40
width=4)
-> Seq Scan on media_downloads_200510 media_downloads
(cost=0.00..5615.84 rows=139884 width=4)
-> Seq Scan on media_downloads_200511 media_downloads
(cost=0.00..67446.56 rows=1724356 width=4)
-> Seq Scan on media_downloads_200512 media_downloads
(cost=0.00..66727.02 rows=1718302 width=4)
-> Seq Scan on media_downloads_200601 media_downloads
(cost=0.00..88799.91 rows=2321991 width=4)
-> Seq Scan on media_downloads_200602 media_downloads
(cost=0.00..121525.71 rows=3159571 width=4)
-> Seq Scan on media_downloads_200603 media_downloads
(cost=0.00..104205.40 rows=2701240 width=4)
-> Seq Scan on media_downloads_200604 media_downloads
(cost=0.00..342511.42 rows=9391242 width=4)
-> Seq Scan on media_downloads_200605 media_downloads
(cost=0.00..245167.39 rows=6724039 width=4)
-> Seq Scan on media_downloads_200606 media_downloads
(cost=0.00..430186.99 rows=11901499 width=4)
-> Seq Scan on media_downloads_200607 media_downloads
(cost=0.00..451313.72 rows=12380172 width=4)
-> Seq Scan on media_downloads_200608 media_downloads
(cost=0.00..474743.72 rows=13048372 width=4)
-> Seq Scan on media_downloads_200609 media_downloads
(cost=0.00..619711.52 rows=15754452 width=4)
-> Seq Scan on media_downloads_200610 media_downloads
(cost=0.00..426225.45 rows=12115845 width=4)
-> Seq Scan on media_downloads_200611 media_downloads
(cost=0.00..10.40 rows=40 width=4)
-> Seq Scan on media_downloads_200612 media_downloads
(cost=0.00..10.40 rows=40 width=4)
(18 rows)
events=# explain select max(id) from media_downloads_200610;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.04..0.05 rows=1 width=0)
InitPlan
-> Limit (cost=0.00..0.04 rows=1 width=4)
-> Index Scan Backward using media_downloads_200610_pkey on
media_downloads_200610 (cost=0.00..475660.29 rows=12115845 width=4)
Filter: (id IS NOT NULL)
(5 rows)
For me as a human, it's obvious, that max(media_downloads) ==
max(media_downloads_200612..media_downloads_200510).
Any ideas how to make the optimizer handle partitioned tables more
sensible?
Andreas