Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT - Mailing list pgsql-performance

From John Papandriopoulos
Subject Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT
Date
Msg-id idbo6g$92r$1@news.hub.org
Whole thread Raw
Responses Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT
List pgsql-performance
Hi everyone,

I've been trialling different inheritance schemes for partitioning to a large number of tables.  I am looking at ~1e9
records,totaling ~200GB. 

I've found that a k-ary table inheritance tree works quite well to reduce the O(n) CHECK constraint overhead [1] in the
queryplanner when enabling partition constraint exclusion. 

I've played with binary (k=2) trees, and have found that query planning time is shorter for shallow trees where k>>2.
(Itappears that "more work" spent checking CHECK constraints is faster than to recur down the inheritance tree.  Is
thisbecause fewer table locks are involved?) 

A given tree structure (e.g. k=16) has a good query-plan time for SELECT queries in my case.  The query-plan times,
however,for UPDATE and DELETE are unfortunately quite quite bad.  (I was surprised that query-planning time was not
similaracross all three queries?) 

My machine swaps wildly when PostgreSQL plans an UPDATE or DELETE.  It does not swap for the SELECT query planning at
all. There is no noticeable memory growth by the postgres process for the SELECT plans.  There is huge memory usage
growthwhen running a query-plan for UPDATE or DELETE.  The difference is something like going from 50MB to over 10GB of
theprocess' virtual memory. 

I'm trialling PostgreSQL on a MacBook Pro having 8GB physical RAM.


Here's an example, where the DDL for the inheritance tree [2] is generated by a Python script [3].

1. Query planning time for a SELECT query

> $ echo "explain select * from ptest where id = 34324234; \q" | time -p psql ptest
>                                  QUERY PLAN
> -------------------------------------------------------------------------------
> Result  (cost=0.00..160.00 rows=48 width=4)
>   ->  Append  (cost=0.00..160.00 rows=48 width=4)
>         ->  Seq Scan on ptest  (cost=0.00..40.00 rows=12 width=4)
>               Filter: (id = 34324234)
>         ->  Seq Scan on ptest_0 ptest  (cost=0.00..40.00 rows=12 width=4)
>               Filter: (id = 34324234)
>         ->  Seq Scan on ptest_0_4 ptest  (cost=0.00..40.00 rows=12 width=4)
>               Filter: (id = 34324234)
>         ->  Seq Scan on ptest_0_4_1 ptest  (cost=0.00..40.00 rows=12 width=4)
>               Filter: (id = 34324234)
> (10 rows)
>
> real         0.99
> user         0.00
> sys          0.00
> $

2. Query planning time for a DELETE query

> $ echo "explain delete from ptest where id = 34324234; \q" | time -p psql ptest
>                               QUERY PLAN
> -------------------------------------------------------------------------
> Delete  (cost=0.00..160.00 rows=48 width=6)
>   ->  Seq Scan on ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0_4 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0_4_1 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
> (9 rows)
>
> real       317.14
> user         0.00
> sys          0.00
> $

3. Query planning time for an UPDATE query

> $ echo "explain update ptest set id = 34324235 where id = 34324234;
> \q" | time -p psql ptest
>                               QUERY PLAN
> -------------------------------------------------------------------------
> Update  (cost=0.00..160.00 rows=48 width=6)
>   ->  Seq Scan on ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0_4 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0_4_1 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
> (9 rows)
>
> real       331.72
> user         0.00
> sys          0.00
> $


Query planning on the leaf nodes works properly for all query-types:

> $ echo "explain delete from ptest_0_4_1 where id = 34324234; \q" | time -p psql ptest
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Delete  (cost=0.00..40.00 rows=12 width=6)
>    ->  Seq Scan on ptest_0_4_1  (cost=0.00..40.00 rows=12 width=6)
>          Filter: (id = 34324234)
> (3 rows)
>
> real         0.01
> user         0.00
> sys          0.00
>
> $ echo "explain update ptest_0_4_1 set id = 34324235 where id = 34324234; \q" | time -p psql ptest
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Update  (cost=0.00..40.00 rows=12 width=6)
>    ->  Seq Scan on ptest_0_4_1  (cost=0.00..40.00 rows=12 width=6)
>          Filter: (id = 34324234)
> (3 rows)
>
> real         0.01
> user         0.00
> sys          0.00
> $


With SELECT constraint exclusion working, I can define plpgsql functions to UPDATE or DELETE the leaf tables directly,
butusing such an interface isn't terribly elegant. 

I therefore tried writing the plpgsql functions for UPDATE and DELETE anyway, with the idea of linking to a TRIGGER on
theparent ptest table.  This didn't work as expected either, unless I polluted my application's SQL queries with the
"ONLY"keyword to make sure the trigger fired [4]. 


Is the query-planning times and memory use as demonstrated above normal?  I am hoping this is just a defect in the
query-plannerthat we might be able to fix so that PostgreSQL can manage my large data set with more ease. 

Any advice appreciated,

John


[1] http://wiki.postgresql.org/wiki/Table_partitioning#SELECT.2C_UPDATE.2C_DELETE
[2] http://jpap.org/files/partition-test.txt
[3] http://jpap.org/files/partition-test.py
[4] http://archives.postgresql.org/pgsql-hackers/2008-11/msg01883.php


pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: problem with from_collapse_limit and joined views
Next
From: Tom Lane
Date:
Subject: Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT