Problem with update on partitioned table - Mailing list pgsql-general
From | Alex Solovey |
---|---|
Subject | Problem with update on partitioned table |
Date | |
Msg-id | 47E7FDEF.2010405@gmail.com Whole thread Raw |
Responses |
Re: Problem with update on partitioned table
|
List | pgsql-general |
Hello, We have pretty big production database (running PostgreSQL 8.3.1) with many partitioned tables. In most cases, they work well (since 8.2.1 at least) -- constraint exclusion is able to select correct partitions. However, there is an exception: queries on partitioned tables using PostgreSQL 'UPDATE Foo ... FROM Bar' syntax extension. Here is a simple test case: ------------------ CREATE TABLE bar ( bar_id INT NOT NULL PRIMARY KEY ); INSERT INTO bar VALUES ( 1 ), ( 2 ), ( 3 ); CREATE TABLE foo ( part INT NOT NULL ,foo_data INT ,bar_id INT NOT NULL REFERENCES bar( bar_id ) ); CREATE TABLE foo_1 ( CHECK ( part = 1 ) ) INHERITS ( foo ); INSERT INTO foo_1 ( part, bar_id ) VALUES ( 1, 1 ), ( 1, 3 ); CREATE TABLE foo_2 ( CHECK ( part = 2 ) ) INHERITS ( foo ); INSERT INTO foo_2 ( part, bar_id ) VALUES ( 2, 2 ), ( 2, 3 ); CREATE TABLE foo_3 ( CHECK ( part = 3 ) ) INHERITS ( foo ); INSERT INTO foo_3 ( part, bar_id ) VALUES ( 3, 1 ), ( 3, 2 ); ------------------ As you can see, table "Foo" is partitioned by column "part". If only "Foo" is referenced in update, query plan is fine: => EXPLAIN UPDATE foo SET foo_data = 10 WHERE part = 2; QUERY PLAN ------------------------------------------------------------------ Append (cost=0.00..68.50 rows=20 width=14) -> Seq Scan on foo (cost=0.00..34.25 rows=10 width=14) Filter: (part = 2) -> Seq Scan on foo_2 foo (cost=0.00..34.25 rows=10 width=14) Filter: (part = 2) (5 rows) However, for this query it is far from being optimal: => EXPLAIN UPDATE foo SET foo_data = 10 FROM bar WHERE part = 2 AND foo.bar_id = bar.bar_id; QUERY PLAN -------------------------------------------------------------------------------- Append (cost=0.00..nan rows=22 width=14) -> Nested Loop (cost=0.00..73.05 rows=10 width=14) -> Seq Scan on foo (cost=0.00..34.25 rows=10 width=14) Filter: (part = 2) -> Index Scan using bar_pkey on bar (cost=0.00..3.87 rows=1 width=4) Index Cond: (bar.bar_id = public.foo.bar_id) -> Merge Join (cost=nan..nan rows=1 width=8) Merge Cond: (public.foo.bar_id = bar.bar_id) -> Sort (cost=0.02..0.03 rows=1 width=0) Sort Key: public.foo.bar_id -> Result (cost=0.00..0.01 rows=1 width=0) One-Time Filter: false -> Sort (cost=168.75..174.75 rows=2400 width=4) Sort Key: bar.bar_id -> Seq Scan on bar (cost=0.00..34.00 rows=2400 width=4) -> Nested Loop (cost=0.00..73.05 rows=10 width=14) -> Seq Scan on foo_2 foo (cost=0.00..34.25 rows=10 width=14) Filter: (part = 2) -> Index Scan using bar_pkey on bar (cost=0.00..3.87 rows=1 width=4) Index Cond: (bar.bar_id = public.foo.bar_id) -> Merge Join (cost=nan..nan rows=1 width=8) Merge Cond: (public.foo.bar_id = bar.bar_id) -> Sort (cost=0.02..0.03 rows=1 width=0) Sort Key: public.foo.bar_id -> Result (cost=0.00..0.01 rows=1 width=0) One-Time Filter: false -> Sort (cost=168.75..174.75 rows=2400 width=4) Sort Key: bar.bar_id -> Seq Scan on bar (cost=0.00..34.00 rows=2400 width=4) (29 rows) Is there any way to avoid this anomaly? UPDATE ... FROM is very convenient if you have to update rows depending on conditions involving multiple tables. In addition, with partitioned tables, standard-conforming UPDATE foo ... WHERE pk IN (SELECT pk FROM foo, bar...) is even worse because query planner cannot choose correct partitions without nested select results and so it does a complete scan of all partitions instead. Alex
pgsql-general by date: