Re: Add SPLIT PARTITION/MERGE PARTITIONS commands - Mailing list pgsql-hackers

From Dmitry Koval
Subject Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Date
Msg-id 5fc93b25-768a-dc84-b14c-183873476f1e@postgrespro.ru
Whole thread Raw
In response to Re: Add SPLIT PARTITION/MERGE PARTITIONS commands  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
List pgsql-hackers
I'm sorry, I couldn't answer earlier...

1.
 > partbounds.c is adding 500+ LOC about checking if proposed partitions
 > meet the requirements (don't overlap, etc).  But a lot of those
 > checks must already happen, no?  Can you re-use/refactor the existing
 > checks ?

I a bit reduced the number of lines in partbounds.c and added comments.
Unfortunately, it is very difficult to re-use existing checks for other 
partitioned tables operations, because mostly part of PostgreSQL 
commands works with a single partition.
So for SPLIT/MERGE commands were created new checks for several partitions.

2.
 > Also, postgres already supports concurrent DDL (CREATE+ATTACH and
 > DETACH CONCURRENTLY).  Is it possible to leverage that ?
 > (Mostly to reduce the patch size, but also because maybe some cases
 > could be concurrent?).

Probably "ATTACH CONCURRENTLY" is not supported?
A few words about "DETACH CONCURRENTLY".
"DETACH CONCURRENTLY" can works because this command not move rows 
during detach partition (and so no reason to block detached partition).
"DETACH CONCURRENTLY" do not changes data, but changes partition 
description (partition is marked as "inhdetachpending = true" etc.).

For SPLIT and MERGE the situation is completely different - these 
commands transfer rows between sections.
Therefore partitions must be LOCKED EXCLUSIVELY during rows transfer.
Probably we can use concurrently partitions not participating in SPLIT 
and MERGE.
But now PostgreSQL has no possibilities to forbid using a part of 
partitions of a partitioned table (until the end of data transfer by 
SPLIT/MERGE commands).
Simple locking is not quite suitable here.
I see only one variant of SPLIT/MERGE CONCURRENTLY implementation that 
can be realized now:

* ShareUpdateExclusiveLock on partitioned table;
* AccessExclusiveLock on partition(s) which will be deleted and will be 
created during SPLIT/MEGRE command;
* transferring data between locked sections; operations with non-blocked 
partitions are allowed;
* sessions which want to use partition(s) which will be deleted, waits 
on locks;
* finally we release AccessExclusiveLock on partition(s) which will be 
deleted and delete them;
* waiting sessions will get errors "relation ... does not exist" (we can 
transform it to "relation structure was changed ... please try again"?).

It doesn't look pretty.
Therefore for the SPLIT/MERGE command the partitioned table is locked 
with AccessExclusiveLock.

3.
 > An UPDATE on a partitioned table will move tuples from one partition
 > to another.  Is there a way to re-use that?

This could be realized using methods that are called from 
ExecCrossPartitionUpdate().
But using these methods is more expensive than the current 
implementation of the SPLIT/MERGE commands.
SPLIT/MERGE commands uses "bulk insert" and there is low overhead for 
finding a partition to insert data: for MERGE is not need to search 
partition; for SPLIT need to use simple search from several partitions 
(listed in the SPLIT command).
Below is a test example.

a. Transferring data from the table "test2" to partitions "partition1" 
and "partition2" using the current implementation of tuple routing in 
PostgreSQL:

CREATE TABLE test (a int, b char(10)) PARTITION BY RANGE (a);
CREATE TABLE partition1 PARTITION OF test FOR VALUES FROM (10) TO (20);
CREATE TABLE partition2 PARTITION OF test FOR VALUES FROM (20) TO (30);
CREATE TABLE test2 (a int, b char(10));
INSERT INTO test2 (a, b) SELECT 11, 'a' FROM generate_series(1, 1000000);
INSERT INTO test2 (a, b) SELECT 22, 'b' FROM generate_series(1, 1000000);
INSERT INTO test(a, b) SELECT a, b FROM test2;
DROP TABLE test2;
DROP TABLE test;

Three attempts (the results are little different), the best result:

INSERT 0 2000000
Time: 4467,814 ms (00:04,468)

b. Transferring data from the partition "partition0" to partitions 
"partition 1" and "partition2" using SPLIT command:

CREATE TABLE test (a int, b char(10)) PARTITION BY RANGE (a);
CREATE TABLE partition0 PARTITION OF test FOR VALUES FROM (0) TO (30);
INSERT INTO test (a, b) SELECT 11, 'a' FROM generate_series(1, 1000000);
INSERT INTO test (a, b) SELECT 22, 'b' FROM generate_series(1, 1000000);
ALTER TABLE test SPLIT PARTITION partition0 INTO
   (PARTITION partition0 FOR VALUES FROM (0) TO (10),
    PARTITION partition1 FOR VALUES FROM (10) TO (20),
    PARTITION partition2 FOR VALUES FROM (20) TO (30));
DROP TABLE test;

Three attempts (the results are little different), the best result:

ALTER TABLE
Time: 3840,127 ms (00:03,840)

So the current implementation of tuple routing is ~16% slower than the 
SPLIT command.
That's quite a lot.


With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com

Attachment

pgsql-hackers by date:

Previous
From: Nikita Malakhov
Date:
Subject: Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Next
From: Tom Lane
Date:
Subject: Re: Preventing indirection for IndexPageGetOpaque for known-size page special areas