Thread: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi, hackers! There are not many commands in PostgreSQL for working with partitioned tables. This is an obstacle to their widespread use. Adding SPLIT PARTITION/MERGE PARTITIONS operations can make easier to use partitioned tables in PostgreSQL. (This is especially important when migrating projects from ORACLE DBMS.) SPLIT PARTITION/MERGE PARTITIONS commands are supported for range partitioning (BY RANGE) and for list partitioning (BY LIST). For hash partitioning (BY HASH) these operations are not supported. ================= 1 SPLIT PARTITION ================= Command for split a single partition. 1.1 Syntax ---------- ALTER TABLE <name> SPLIT PARTITION <partition_name> INTO (PARTITION <partition_name1> { FOR VALUES <partition_bound_spec> | DEFAULT }, [ ... ] PARTITION <partition_nameN> { FOR VALUES <partition_bound_spec> | DEFAULT }) <partition_bound_spec>: IN ( <partition_bound_expr> [, ...] ) | FROM ( { <partition_bound_expr> | MINVALUE | MAXVALUE } [, ...] ) TO ( { <partition_bound_expr> | MINVALUE | MAXVALUE } [, ...] ) 1.2 Rules --------- 1.2.1 The <partition_name> partition should be split into two (or more) partitions. 1.2.2 New partitions should have different names (with existing partitions too). 1.2.3 Bounds of new partitions should not overlap with new and existing partitions. 1.2.4 In case split partition is DEFAULT partition, one of new partitions should be DEFAULT. 1.2.5 In case new partitions or existing partitions contains DEFAULT partition, new partitions <partition_name1>...<partition_nameN> can have any bounds inside split partition bound (can be spaces between partitions bounds). 1.2.6 In case partitioned table does not have DEFAULT partition, DEFAULT partition can be defined as one of new partition. 1.2.7 In case new partitions not contains DEFAULT partition and partitioned table does not have DEFAULT partition the following should be true: sum bounds of new partitions <partition_name1>...<partition_nameN> should be equal to bound of split partition <partition_name>. 1.2.8 One of the new partitions <partition_name1>-<partition_nameN> can have the same name as split partition <partition_name> (this is suitable in case splitting a DEFAULT partition: we split it, but after splitting we have a partition with the same name). 1.2.9 Only simple (non-partitioned) partitions can be split. 1.3 Examples ------------ 1.3.1 Example for range partitioning (BY RANGE): CREATE TABLE sales_range (salesman_id INT, salesman_name VARCHAR(30), sales_amount INT, sales_date DATE) PARTITION BY RANGE (sales_date); CREATE TABLE sales_jan2022 PARTITION OF sales_range FOR VALUES FROM ('2022-01-01') TO ('2022-02-01'); CREATE TABLE sales_feb_mar_apr2022 PARTITION OF sales_range FOR VALUES FROM ('2022-02-01') TO ('2022-05-01'); CREATE TABLE sales_others PARTITION OF sales_range DEFAULT; ALTER TABLE sales_range SPLIT PARTITION sales_feb_mar_apr2022 INTO (PARTITION sales_feb2022 FOR VALUES FROM ('2022-02-01') TO ('2022-03-01'), PARTITION sales_mar2022 FOR VALUES FROM ('2022-03-01') TO ('2022-04-01'), PARTITION sales_apr2022 FOR VALUES FROM ('2022-04-01') TO ('2022-05-01')); 1.3.2 Example for list partitioning (BY LIST): CREATE TABLE sales_list (salesman_id INT GENERATED ALWAYS AS IDENTITY, salesman_name VARCHAR(30), sales_state VARCHAR(20), sales_amount INT, sales_date DATE) PARTITION BY LIST (sales_state); CREATE TABLE sales_nord PARTITION OF sales_list FOR VALUES IN ('Murmansk', 'St. Petersburg', 'Ukhta'); CREATE TABLE sales_all PARTITION OF sales_list FOR VALUES IN ('Moscow', 'Voronezh', 'Smolensk', 'Bryansk', 'Magadan', 'Kazan', 'Khabarovsk', 'Volgograd', 'Vladivostok'); CREATE TABLE sales_others PARTITION OF sales_list DEFAULT; ALTER TABLE sales_list SPLIT PARTITION sales_all INTO (PARTITION sales_west FOR VALUES IN ('Voronezh', 'Smolensk', 'Bryansk'), PARTITION sales_east FOR VALUES IN ('Magadan', 'Khabarovsk', 'Vladivostok'), PARTITION sales_central FOR VALUES IN ('Moscow', 'Kazan', 'Volgograd')); 1.4 ToDo: --------- 1.4.1 Possibility to specify tablespace for each of the new partitions (currently new partitions are created in the same tablespace as split partition). 1.4.2 Possibility to use CONCURRENTLY mode that allows (during the SPLIT operation) not blocking partitions that are not splitting. ================== 2 MERGE PARTITIONS ================== Command for merge several partitions into one partition. 2.1 Syntax ---------- ALTER TABLE <name> MERGE PARTITIONS (<partition_name1>, <partition_name2>[, ...]) INTO <new_partition_name>; 2.2 Rules --------- 2.2.1 The number of partitions that are merged into the new partition <new_partition_name> should be at least two. 2.2.2 If DEFAULT partition is not in the list of partitions <partition_name1>, <partition_name2>[, ...]: * for range partitioning (BY RANGE) is necessary that the ranges of the partitions <partition_name1>, <partition_name2>[, ...] can be merged into one range without spaces and overlaps (otherwise an error will be generated). The combined range will be the range for the partition <new_partition_name>. * for list partitioning (BY LIST) the values lists of all partitions <partition_name1>, <partition_name2>[, ...] are combined and form a list of values of partition <new_partition_name>. If DEFAULT partition is in the list of partitions <partition_name1>, <partition_name2>[, ...]: * the partition <new_partition_name> will be the DEFAULT partition; * for both partitioning types (BY RANGE, BY LIST) the ranges and lists of values of the merged partitions can be any. 2.2.3 The new partition <new_partition_name> can have the same name as one of the merged partitions. 2.2.4 Only simple (non-partitioned) partitions can be merged. 2.3 Examples ------------ 2.3.1 Example for range partitioning (BY RANGE): CREATE TABLE sales_range (salesman_id INT, salesman_name VARCHAR(30), sales_amount INT, sales_date DATE) PARTITION BY RANGE (sales_date); CREATE TABLE sales_jan2022 PARTITION OF sales_range FOR VALUES FROM ('2022-01-01') TO ('2022-02-01'); CREATE TABLE sales_feb2022 PARTITION OF sales_range FOR VALUES FROM ('2022-02-01') TO ('2022-03-01'); CREATE TABLE sales_mar2022 PARTITION OF sales_range FOR VALUES FROM ('2022-03-01') TO ('2022-04-01'); CREATE TABLE sales_apr2022 PARTITION OF sales_range FOR VALUES FROM ('2022-04-01') TO ('2022-05-01'); CREATE TABLE sales_others PARTITION OF sales_range DEFAULT; ALTER TABLE sales_range MERGE PARTITIONS (sales_feb2022, sales_mar2022, sales_apr2022) INTO sales_feb_mar_apr2022; 2.3.2 Example for list partitioning (BY LIST): CREATE TABLE sales_list (salesman_id INT GENERATED ALWAYS AS IDENTITY, salesman_name VARCHAR(30), sales_state VARCHAR(20), sales_amount INT, sales_date DATE) PARTITION BY LIST (sales_state); CREATE TABLE sales_nord PARTITION OF sales_list FOR VALUES IN ('Murmansk', 'St. Petersburg', 'Ukhta'); CREATE TABLE sales_west PARTITION OF sales_list FOR VALUES IN ('Voronezh', 'Smolensk', 'Bryansk'); CREATE TABLE sales_east PARTITION OF sales_list FOR VALUES IN ('Magadan', 'Khabarovsk', 'Vladivostok'); CREATE TABLE sales_central PARTITION OF sales_list FOR VALUES IN ('Moscow', 'Kazan', 'Volgograd'); CREATE TABLE sales_others PARTITION OF sales_list DEFAULT; ALTER TABLE sales_list MERGE PARTITIONS (sales_west, sales_east, sales_central) INTO sales_all; 2.4 ToDo: --------- 2.4.1 Possibility to specify tablespace for the new partition (currently new partition is created in the same tablespace as partitioned table). 2.4.2 Possibility to use CONCURRENTLY mode that allows (during the MERGE operation) not blocking partitions that are not merging. 2.4.3 New syntax for ALTER TABLE ... MERGE PARTITIONS command for range partitioning (BY RANGE): ALTER TABLE <name> MERGE PARTITIONS <partition_name1> TO <partition_name2> INTO <new_partition_name>; This command can merge all partitions between <partition_name1> and <partition_name2> into new partition <new_partition_name>. This can be useful for this example cases: need to merge all one-month partitions into a year partition or need to merge all one-day partitions into a month partition. Your opinions are very much welcome! -- With best regards, Dmitry Koval.
Attachment
On Tue, 31 May 2022 at 11:33, Dmitry Koval <d.koval@postgrespro.ru> wrote: > > Hi, hackers! > > There are not many commands in PostgreSQL for working with partitioned > tables. This is an obstacle to their widespread use. > Adding SPLIT PARTITION/MERGE PARTITIONS operations can make easier to > use partitioned tables in PostgreSQL. That is quite a nice and useful feature to have. > (This is especially important when migrating projects from ORACLE DBMS.) > > SPLIT PARTITION/MERGE PARTITIONS commands are supported for range > partitioning (BY RANGE) and for list partitioning (BY LIST). > For hash partitioning (BY HASH) these operations are not supported. Just out of curiosity, why is SPLIT / MERGE support not included for HASH partitions? Because sibling partitions can have a different modulus, you should be able to e.g. split a partition with (modulus, remainder) of (3, 1) into two partitions with (mod, rem) of (6, 1) and (6, 4) respectively, with the reverse being true for merge operations, right? Kind regards, Matthias van de Meent
On Tue, 2022-05-31 at 12:32 +0300, Dmitry Koval wrote: > There are not many commands in PostgreSQL for working with partitioned > tables. This is an obstacle to their widespread use. > Adding SPLIT PARTITION/MERGE PARTITIONS operations can make easier to > use partitioned tables in PostgreSQL. > (This is especially important when migrating projects from ORACLE DBMS.) > > SPLIT PARTITION/MERGE PARTITIONS commands are supported for range > partitioning (BY RANGE) and for list partitioning (BY LIST). > For hash partitioning (BY HASH) these operations are not supported. +1 on the general idea. At least, it will makes these operations simpler, but probably also less invasive (no need to detach the affected partitions). I didn't read the patch, but what lock level does that place on the partitioned table? Anything more than ACCESS SHARE? Yours, Laurenz Albe
>Just out of curiosity, why is SPLIT / MERGE support not included for >HASH partitions? Because sibling partitions can have a different >modulus, you should be able to e.g. split a partition with (modulus, >remainder) of (3, 1) into two partitions with (mod, rem) of (6, 1) and >(6, 4) respectively, with the reverse being true for merge operations, >right? You are right, SPLIT/MERGE operations can be added for HASH-partitioning in the future. But HASH-partitioning is rarer than RANGE- and LIST-partitioning and I decided to skip it in the first step. Maybe community will say that SPLIT/MERGE commands are not needed... (At first step I would like to make sure that it is no true) P.S. I attached patch with 1-line warning fix (for cfbot). -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
> I didn't read the patch, but what lock level does that place on the > partitioned table? Anything more than ACCESS SHARE? Current patch locks a partitioned table with ACCESS EXCLUSIVE lock. Unfortunately only this lock guarantees that other session can not work with partitions that are splitting or merging. I want add CONCURRENTLY mode in future. With this mode partitioned table during SPLIT/MERGE operation will be locked with SHARE UPDATE EXCLUSIVE (as ATTACH/DETACH PARTITION commands in CONCURRENTLY mode). But in this case queries from other sessions that want to work with partitions that are splitting/merging at this time should receive an error (like "Partition data is moving. Repeat the operation later") because old partitions will be deleted at the end of SPLIT/MERGE operation. I hope exists a better solution, but I don't know it now... -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
On Tue, May 31, 2022 at 12:43 PM Dmitry Koval <d.koval@postgrespro.ru> wrote:
>Just out of curiosity, why is SPLIT / MERGE support not included for
>HASH partitions? Because sibling partitions can have a different
>modulus, you should be able to e.g. split a partition with (modulus,
>remainder) of (3, 1) into two partitions with (mod, rem) of (6, 1) and
>(6, 4) respectively, with the reverse being true for merge operations,
>right?
You are right, SPLIT/MERGE operations can be added for HASH-partitioning
in the future. But HASH-partitioning is rarer than RANGE- and
LIST-partitioning and I decided to skip it in the first step.
Maybe community will say that SPLIT/MERGE commands are not needed... (At
first step I would like to make sure that it is no true)
P.S. I attached patch with 1-line warning fix (for cfbot).
--
With best regards,
Dmitry Koval
Postgres Professional: http://postgrespro.com
Hi,
For attachPartTable, the parameter wqueue is missing from comment.
The parameters of CloneRowTriggersToPartition are called parent and partition. I think it is better to name the parameters to attachPartTable in a similar manner.
For struct SplitPartContext, SplitPartitionContext would be better name.
+ /* Store partition contect into list. */
contect -> context
Cheers
On Tue, May 31, 2022 at 1:43 PM Zhihong Yu <zyu@yugabyte.com> wrote:
On Tue, May 31, 2022 at 12:43 PM Dmitry Koval <d.koval@postgrespro.ru> wrote:>Just out of curiosity, why is SPLIT / MERGE support not included for
>HASH partitions? Because sibling partitions can have a different
>modulus, you should be able to e.g. split a partition with (modulus,
>remainder) of (3, 1) into two partitions with (mod, rem) of (6, 1) and
>(6, 4) respectively, with the reverse being true for merge operations,
>right?
You are right, SPLIT/MERGE operations can be added for HASH-partitioning
in the future. But HASH-partitioning is rarer than RANGE- and
LIST-partitioning and I decided to skip it in the first step.
Maybe community will say that SPLIT/MERGE commands are not needed... (At
first step I would like to make sure that it is no true)
P.S. I attached patch with 1-line warning fix (for cfbot).
--
With best regards,
Dmitry Koval
Postgres Professional: http://postgrespro.comHi,For attachPartTable, the parameter wqueue is missing from comment.The parameters of CloneRowTriggersToPartition are called parent and partition. I think it is better to name the parameters to attachPartTable in a similar manner.For struct SplitPartContext, SplitPartitionContext would be better name.+ /* Store partition contect into list. */contect -> contextCheers
Hi,
For transformPartitionCmdForMerge(), nested loop is used to detect duplicate names.
If the number of partitions in partcmd->partlist, we should utilize map to speed up the check.
For check_parent_values_in_new_partitions():
+ if (!find_value_in_new_partitions(&key->partsupfunc[0],
+ key->partcollation, parts, nparts, datum, false))
+ found = false;
+ key->partcollation, parts, nparts, datum, false))
+ found = false;
It seems we can break out of the loop when found is false.
Cheers
Hi, 1) > For attachPartTable, the parameter wqueue is missing from comment. > The parameters of CloneRowTriggersToPartition are called parent and partition. > I think it is better to name the parameters to attachPartTable in a similar manner. > > For struct SplitPartContext, SplitPartitionContext would be better name. > > + /* Store partition contect into list. */ > contect -> context Thanks, changed. 2) > For transformPartitionCmdForMerge(), nested loop is used to detect duplicate names. > If the number of partitions in partcmd->partlist, we should utilize map to speed up the check. I'm not sure what we should utilize map in this case because chance that number of merging partitions exceed dozens is low. Is there a function example that uses a map for such a small number of elements? 3) > For check_parent_values_in_new_partitions(): > > + if (!find_value_in_new_partitions(&key->partsupfunc[0], > + key->partcollation, parts, nparts, datum, false)) > + found = false; > > It seems we can break out of the loop when found is false. We have implicit "break" in "for" construction: + for (i = 0; i < boundinfo->ndatums && found; i++) I'll change it to explicit "break;" to avoid confusion. Attached patch with the changes described above. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Wed, Jun 1, 2022 at 11:58 AM Dmitry Koval <d.koval@postgrespro.ru> wrote:
Hi,
1)
> For attachPartTable, the parameter wqueue is missing from comment.
> The parameters of CloneRowTriggersToPartition are called parent and partition.
> I think it is better to name the parameters to attachPartTable in a similar manner.
>
> For struct SplitPartContext, SplitPartitionContext would be better name.
>
> + /* Store partition contect into list. */
> contect -> context
Thanks, changed.
2)
> For transformPartitionCmdForMerge(), nested loop is used to detect duplicate names.
> If the number of partitions in partcmd->partlist, we should utilize map to speed up the check.
I'm not sure what we should utilize map in this case because chance that
number of merging partitions exceed dozens is low.
Is there a function example that uses a map for such a small number of
elements?
3)
> For check_parent_values_in_new_partitions():
>
> + if (!find_value_in_new_partitions(&key->partsupfunc[0],
> + key->partcollation, parts, nparts, datum, false))
> + found = false;
>
> It seems we can break out of the loop when found is false.
We have implicit "break" in "for" construction:
+ for (i = 0; i < boundinfo->ndatums && found; i++)
I'll change it to explicit "break;" to avoid confusion.
Attached patch with the changes described above.
--
With best regards,
Dmitry Koval
Postgres Professional: http://postgrespro.com
Hi,
Thanks for your response.
w.r.t. #2, I think using nested loop is fine for now.
If, when this feature is merged, some user comes up with long merge list, we can revisit this topic.
Cheers
Hi! Patch stop applying due to changes in upstream. Here is a rebased version. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Wed, Jul 13, 2022 at 11:28 AM Dmitry Koval <d.koval@postgrespro.ru> wrote:
Hi!
Patch stop applying due to changes in upstream.
Here is a rebased version.
--
With best regards,
Dmitry Koval
Postgres Professional: http://postgrespro.com
Hi,
+attachPartTable(List **wqueue, Relation rel, Relation partition, PartitionBoundSpec *bound)
I checked naming of existing methods, such as AttachPartitionEnsureIndexes.
I think it would be better if the above method is named attachPartitionTable.
+ if (!defaultPartCtx && OidIsValid(defaultPartOid))
+ {
+ pc = createSplitPartitionContext(table_open(defaultPartOid, AccessExclusiveLock));
+ {
+ pc = createSplitPartitionContext(table_open(defaultPartOid, AccessExclusiveLock));
Since the value of pc would be passed to defaultPartCtx, there is no need to assign to pc above. You can assign directly to defaultPartCtx.
+ /* Drop splitted partition. */
splitted -> split
+ /* Rename new partition if it is need. */
need -> needed.
Cheers
Thanks you! I've fixed all things mentioned. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Wed, Jul 13, 2022 at 1:05 PM Dmitry Koval <d.koval@postgrespro.ru> wrote:
Thanks you!
I've fixed all things mentioned.
--
With best regards,
Dmitry Koval
Postgres Professional: http://postgrespro.com
Hi,
Toward the end of ATExecSplitPartition():
+ table_close(newPartRel, NoLock);
Why is NoLock passed (instead of AccessExclusiveLock) ?
Cheers
> + /* Unlock new partition. */ > + table_close(newPartRel, NoLock); > > Why is NoLock passed (instead of AccessExclusiveLock) ? Thanks! You're right, I replaced the comment with "Keep the lock until commit.". -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
This is not a review, but I think the isolation tests should be expanded. At least, include the case of serializable transactions being involved. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Pensar que el espectro que vemos es ilusorio no lo despoja de espanto, sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)
> This is not a review, but I think the isolation tests should be > expanded. At least, include the case of serializable transactions being > involved. Thanks! I will expand the tests for the next commitfest. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Hi! Patch stop applying due to changes in upstream. Here is a rebased version. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
> I will expand the tests for the next commitfest. Hi! Combinations of isolation modes (READ COMMITTED/REPEATABLE READ/SERIALIZABLE) were added to test src/test/isolation/specs/partition-split-merge.spec -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
Hi! Patch stop applying due to changes in upstream. Here is a rebased version. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Wed, Sep 07, 2022 at 08:03:09PM +0300, Dmitry Koval wrote: > Hi! > > Patch stop applying due to changes in upstream. > Here is a rebased version. This crashes on freebsd with -DRELCACHE_FORCE_RELEASE https://cirrus-ci.com/task/6565371623768064 https://cirrus-ci.com/task/6145355992530944 Note that that's a modified cirrus script from my CI improvements branch which also does some extra/different things. -- Justin
Thanks a lot Justin! After compilation PostgreSQL+patch with macros RELCACHE_FORCE_RELEASE, COPY_PARSE_PLAN_TREES, WRITE_READ_PARSE_PLAN_TREES, RAW_EXPRESSION_COVERAGE_TEST, RANDOMIZE_ALLOCATED_MEMORY, I saw a problem on Windows 10, MSVC2019. (I hope this problem was the same as on Cirrus CI). Attached patch with fix. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Thu, Sep 08, 2022 at 02:35:24PM +0300, Dmitry Koval wrote: > Thanks a lot Justin! > > After compilation PostgreSQL+patch with macros > RELCACHE_FORCE_RELEASE, > RANDOMIZE_ALLOCATED_MEMORY, > I saw a problem on Windows 10, MSVC2019. Yes, it passes tests on my CI improvements branch. https://github.com/justinpryzby/postgres/runs/8248668269 Thanks to Alexander Pyhalov for reminding me about RELCACHE_FORCE_RELEASE last year ;) On Tue, May 31, 2022 at 12:32:43PM +0300, Dmitry Koval wrote: > This can be useful for this example cases: > need to merge all one-day partitions > into a month partition. +1, we would use this (at least the MERGE half). I wonder if it's possible to reduce the size of this patch (I'm starting to try to absorb it). Is there a way to refactor/reuse existing code to reduce its footprint ? partbounds.c is adding 500+ LOC about checking if proposed partitions meet the requirements (don't overlap, etc). But a lot of those checks must already happen, no? Can you re-use/refactor the existing checks ? An UPDATE on a partitioned table will move tuples from one partition to another. Is there a way to re-use that ? Also, postgres already supports concurrent DDL (CREATE+ATTACH and DETACH CONCURRENTLY). Is it possible to leverage that ? (Mostly to reduce the patch size, but also because maybe some cases could be concurrent?). If the patch were split into separate parts for MERGE and SPLIT, would the first patch be significantly smaller than the existing patch (hopefully half as big) ? That would help to review it, even if both halves were ultimately squished together. (An easy way to do this is to open up all the files in separate editor instances, trim out the parts that aren't needed for the first patch, save the files but don't quit the editors, test compilation and regression tests, then git commit --amend -a. Then in each editor, "undo" all the trimmed changes, save, and git commit -a). Would it save much code if "default" partitions weren't handled in the first patch ? -- Justin
On 2022-Sep-08, Justin Pryzby wrote: > If the patch were split into separate parts for MERGE and SPLIT, would > the first patch be significantly smaller than the existing patch > (hopefully half as big) ? That would help to review it, even if both > halves were ultimately squished together. (An easy way to do this is to > open up all the files in separate editor instances, trim out the parts > that aren't needed for the first patch, save the files but don't quit > the editors, test compilation and regression tests, then git commit > --amend -a. Then in each editor, "undo" all the trimmed changes, save, > and git commit -a). An easier (IMO) way to do that is to use "git gui" or even "git add -p", which allow you to selectively add changed lines/hunks to the index. You add a few, commit, then add the rest, commit again. With "git add -p" you can even edit individual hunks in an editor in case you have a mix of both wanted and unwanted in a single hunk (after "s"plitting, of course), which turns out to be easier than it sounds. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "El sudor es la mejor cura para un pensamiento enfermo" (Bardia)
Thanks for your advice, Justin and Alvaro! I'll try to reduce the size of this patch and split it into separate parts (for MERGE and SPLIT). -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Hi! Two separate parts for MERGE and SPLIT partitions (without refactoring; it will be later) -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Tue, May 31, 2022 at 5:33 AM Dmitry Koval <d.koval@postgrespro.ru> wrote: > There are not many commands in PostgreSQL for working with partitioned > tables. This is an obstacle to their widespread use. > Adding SPLIT PARTITION/MERGE PARTITIONS operations can make easier to > use partitioned tables in PostgreSQL. > (This is especially important when migrating projects from ORACLE DBMS.) > > SPLIT PARTITION/MERGE PARTITIONS commands are supported for range > partitioning (BY RANGE) and for list partitioning (BY LIST). > For hash partitioning (BY HASH) these operations are not supported. This may be a good idea, but I would like to point out one disadvantage of this approach. If you know that a certain partition is not changing, and you would like to split it, you can create two or more new standalone tables and populate them from the original partition using INSERT .. SELECT. Then you can BEGIN a transaction, DETACH the existing partitions, and ATTACH the replacement ones. By doing this, you take an ACCESS EXCLUSIVE lock on the partitioned table only for a brief period. The same kind of idea can be used to merge partitions. It seems hard to do something comparable with built-in DDL for SPLIT PARTITION and MERGE PARTITION. You could start by taking e.g. SHARE lock on the existing partition(s) and then wait until the end to take ACCESS EXCLUSIVE lock on the partitions, but we typically avoid such coding patterns, because the lock upgrade might deadlock and then a lot of work would be wasted. So most likely with the approach you propose here you will end up acquiring ACCESS EXCLUSIVE lock at the beginning of the operation and then shuffle a lot of data around while still holding it, which is pretty painful. Because of this problem, I find it hard to believe that these commands would get much use, except perhaps on small tables or in non-production environments, unless people just didn't know about the alternatives. That's not to say that something like this has no value. As a convenience feature, it's fine. It's just hard for me to see it as any more than that. -- Robert Haas EDB: http://www.enterprisedb.com
Thanks for comments and advice! I thought about this problem and discussed about it with colleagues. Unfortunately, I don't know of a good general solution. 19.09.2022 22:56, Robert Haas пишет: > If you know that a certain partition is not changing, and you would > like to split it, you can create two or more new standalone tables and > populate them from the original partition using INSERT .. SELECT. Then > you can BEGIN a transaction, DETACH the existing partitions, and > ATTACH the replacement ones. By doing this, you take an ACCESS > EXCLUSIVE lock on the partitioned table only for a brief period. The > same kind of idea can be used to merge partitions. But for specific situation like this (certain partition is not changing) we can add CONCURRENTLY modifier. Our DDL query can be like ALTER TABLE...SPLIT PARTITION [CONCURRENTLY]; With CONCURRENTLY modifier we can lock partitioned table in ShareUpdateExclusiveLock mode and split partition - in AccessExclusiveLock mode. So we don't lock partitioned table in AccessExclusiveLock mode and can modify other partitions during SPLIT operation (except split partition). If smb try to modify split partition, he will receive error "relation does not exist" at end of operation (because split partition will be drop). -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
On Mon, Sep 19, 2022 at 4:42 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: > Thanks for comments and advice! > I thought about this problem and discussed about it with colleagues. > Unfortunately, I don't know of a good general solution. Yeah, me neither. > But for specific situation like this (certain partition is not changing) > we can add CONCURRENTLY modifier. > Our DDL query can be like > > ALTER TABLE...SPLIT PARTITION [CONCURRENTLY]; > > With CONCURRENTLY modifier we can lock partitioned table in > ShareUpdateExclusiveLock mode and split partition - in > AccessExclusiveLock mode. So we don't lock partitioned table in > AccessExclusiveLock mode and can modify other partitions during SPLIT > operation (except split partition). > If smb try to modify split partition, he will receive error "relation > does not exist" at end of operation (because split partition will be drop). I think that a built-in DDL command can't really assume that the user won't modify anything. You'd have to take a ShareLock. But you might be able to have a CONCURRENTLY variant of the command that does the same kind of multi-transaction thing as, e.g., CREATE INDEX CONCURRENTLY. You would probably have to be quite careful about race conditions (e.g. you commit the first transaction and before you start the second one, someone drops or detaches the partition you were planning to merge or split). Might take some thought, but feels possibly doable. I've never been excited enough about this kind of thing to want to put a lot of energy into engineering it, because doing it "manually" feels so much nicer to me, and doubly so given that we now have ATTACH CONCURRENTLY and DETACH CONCURRENTLY, but it does seem like a thing some people would probably use and value. -- Robert Haas EDB: http://www.enterprisedb.com
Hi! Fixed couple warnings (for cfbot). -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Tue, Oct 11, 2022 at 9:22 AM Dmitry Koval <d.koval@postgrespro.ru> wrote:
Hi!
Fixed couple warnings (for cfbot).
--
With best regards,
Dmitry Koval
Postgres Professional: http://postgrespro.com
Hi,
For v12-0001-PGPRO-ALTER-TABLE-MERGE-PARTITIONS-command.patch:
+ /* One new partition can have the same name as merged partition. */
+ isSameName = true;
I think there should be a check before assigning true to isSameName - if isSameName is true, that means there are two partitions with this same name.
Cheers
On Tue, Oct 11, 2022 at 9:58 AM Zhihong Yu <zyu@yugabyte.com> wrote:
On Tue, Oct 11, 2022 at 9:22 AM Dmitry Koval <d.koval@postgrespro.ru> wrote:Hi!
Fixed couple warnings (for cfbot).
--
With best regards,
Dmitry Koval
Postgres Professional: http://postgrespro.comHi,For v12-0001-PGPRO-ALTER-TABLE-MERGE-PARTITIONS-command.patch:+ if (equal(name, cmd->name))
+ /* One new partition can have the same name as merged partition. */+ isSameName = true;I think there should be a check before assigning true to isSameName - if isSameName is true, that means there are two partitions with this same name.Cheers
Pardon - I see that transformPartitionCmdForMerge() compares the partition names.
Maybe you can add a comment in ATExecMergePartitions referring to transformPartitionCmdForMerge() so that people can more easily understand the logic.
Hi! >Maybe you can add a comment in ATExecMergePartitions referring to >transformPartitionCmdForMerge() so that people can more easily >understand the logic. Thanks, comment added. Patch stop applying due to changes in upstream. Here is a fixed version. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
I'm sorry, I couldn't answer earlier... 1. > partbounds.c is adding 500+ LOC about checking if proposed partitions > meet the requirements (don't overlap, etc). But a lot of those > checks must already happen, no? Can you re-use/refactor the existing > checks ? I a bit reduced the number of lines in partbounds.c and added comments. Unfortunately, it is very difficult to re-use existing checks for other partitioned tables operations, because mostly part of PostgreSQL commands works with a single partition. So for SPLIT/MERGE commands were created new checks for several partitions. 2. > Also, postgres already supports concurrent DDL (CREATE+ATTACH and > DETACH CONCURRENTLY). Is it possible to leverage that ? > (Mostly to reduce the patch size, but also because maybe some cases > could be concurrent?). Probably "ATTACH CONCURRENTLY" is not supported? A few words about "DETACH CONCURRENTLY". "DETACH CONCURRENTLY" can works because this command not move rows during detach partition (and so no reason to block detached partition). "DETACH CONCURRENTLY" do not changes data, but changes partition description (partition is marked as "inhdetachpending = true" etc.). For SPLIT and MERGE the situation is completely different - these commands transfer rows between sections. Therefore partitions must be LOCKED EXCLUSIVELY during rows transfer. Probably we can use concurrently partitions not participating in SPLIT and MERGE. But now PostgreSQL has no possibilities to forbid using a part of partitions of a partitioned table (until the end of data transfer by SPLIT/MERGE commands). Simple locking is not quite suitable here. I see only one variant of SPLIT/MERGE CONCURRENTLY implementation that can be realized now: * ShareUpdateExclusiveLock on partitioned table; * AccessExclusiveLock on partition(s) which will be deleted and will be created during SPLIT/MEGRE command; * transferring data between locked sections; operations with non-blocked partitions are allowed; * sessions which want to use partition(s) which will be deleted, waits on locks; * finally we release AccessExclusiveLock on partition(s) which will be deleted and delete them; * waiting sessions will get errors "relation ... does not exist" (we can transform it to "relation structure was changed ... please try again"?). It doesn't look pretty. Therefore for the SPLIT/MERGE command the partitioned table is locked with AccessExclusiveLock. 3. > An UPDATE on a partitioned table will move tuples from one partition > to another. Is there a way to re-use that? This could be realized using methods that are called from ExecCrossPartitionUpdate(). But using these methods is more expensive than the current implementation of the SPLIT/MERGE commands. SPLIT/MERGE commands uses "bulk insert" and there is low overhead for finding a partition to insert data: for MERGE is not need to search partition; for SPLIT need to use simple search from several partitions (listed in the SPLIT command). Below is a test example. a. Transferring data from the table "test2" to partitions "partition1" and "partition2" using the current implementation of tuple routing in PostgreSQL: CREATE TABLE test (a int, b char(10)) PARTITION BY RANGE (a); CREATE TABLE partition1 PARTITION OF test FOR VALUES FROM (10) TO (20); CREATE TABLE partition2 PARTITION OF test FOR VALUES FROM (20) TO (30); CREATE TABLE test2 (a int, b char(10)); INSERT INTO test2 (a, b) SELECT 11, 'a' FROM generate_series(1, 1000000); INSERT INTO test2 (a, b) SELECT 22, 'b' FROM generate_series(1, 1000000); INSERT INTO test(a, b) SELECT a, b FROM test2; DROP TABLE test2; DROP TABLE test; Three attempts (the results are little different), the best result: INSERT 0 2000000 Time: 4467,814 ms (00:04,468) b. Transferring data from the partition "partition0" to partitions "partition 1" and "partition2" using SPLIT command: CREATE TABLE test (a int, b char(10)) PARTITION BY RANGE (a); CREATE TABLE partition0 PARTITION OF test FOR VALUES FROM (0) TO (30); INSERT INTO test (a, b) SELECT 11, 'a' FROM generate_series(1, 1000000); INSERT INTO test (a, b) SELECT 22, 'b' FROM generate_series(1, 1000000); ALTER TABLE test SPLIT PARTITION partition0 INTO (PARTITION partition0 FOR VALUES FROM (0) TO (10), PARTITION partition1 FOR VALUES FROM (10) TO (20), PARTITION partition2 FOR VALUES FROM (20) TO (30)); DROP TABLE test; Three attempts (the results are little different), the best result: ALTER TABLE Time: 3840,127 ms (00:03,840) So the current implementation of tuple routing is ~16% slower than the SPLIT command. That's quite a lot. With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, failed Feature is clearly missing with partition handling in PostgreSQL, so, this patch is very welcome (as are futur steps) Code presents good, comments are explicit Patch v14 apply nicely on 4f46f870fa56fa73d6678273f1bd059fdd93d5e6 Compilation ok with meson compile LCOV after meson test shows good new code coverage. Documentation is missing in v14.
Hi! > Documentation: tested, failed Added documentation (as separate commit). -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
Hi, Patch v15-0001-ALTER-TABLE-MERGE-PARTITIONS-command.patch Apply nicely. One warning on meson compile (configure -Dssl=openssl -Dldap=enabled -Dauto_features=enabled -DPG_TEST_EXTRA='ssl,ldap,kerberos'-Dbsd_auth=disabled -Dbonjour=disabled -Dpam=disabled -Dpltcl=disabled -Dsystemd=disabled-Dzstd=disabled -Db_coverage=true) ../../src/pgmergesplit/src/test/modules/test_ddl_deparse/test_ddl_deparse.c: In function ‘get_altertable_subcmdinfo’: ../../src/pgmergesplit/src/test/modules/test_ddl_deparse/test_ddl_deparse.c:112:17: warning: enumeration value ‘AT_MergePartitions’not handled in switch [-Wswitch] 112 | switch (subcmd->subtype) | ^~~~~~ Should be the same with 0002... meson test perfect, patch coverage is very good. Patch v15-0002-ALTER-TABLE-SPLIT-PARTITION-command.patch Doesn't apply on 326a33a289c7ba2dbf45f17e610b7be98dc11f67 Patch v15-0003-Documentation-for-ALTER-TABLE-SPLIT-PARTITION-ME.patch Apply with one warning 1 line add space error (translate from french "warning: 1 ligne a ajouté des erreurs d'espace"). v15-0003-Documentation-for-ALTER-TABLE-SPLIT-PARTITION-ME.patch:54: trailing whitespace. One of the new partitions <replaceable class="parameter">partition_name1</replaceable>, Comment are ok for me. A non native english speaker. Perhaps you could add some remarks in ddl.html and alter-ddl.html Stéphane The new status of this patch is: Waiting on Author
Thank you! Corrected version in attachment. Strange that cfbot didn't show this warning ... -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, failed Hi, Just a minor warning with documentation patch git apply ../v16-0003-Documentation-for-ALTER-TABLE-SPLIT-PARTITION-ME.patch ../v16-0003-Documentation-for-ALTER-TABLE-SPLIT-PARTITION-ME.patch:54: trailing whitespace. One of the new partitions <replaceable class="parameter">partition_name1</replaceable>, warning: 1 ligne a ajouté des erreurs d'espace. (perhaps due to my Ubuntu 22.04.2 french install) Everything else is ok. Thanks a lot for your work Stéphane
Thanks! I missed the trailing whitespace. Corrected. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
This patch no longer applies to master, please submit a rebased version to the thread. I've marked the CF entry as waiting for author in the meantime. -- Daniel Gustafsson
Thanks, Daniel! > This patch no longer applies to master, please submit a rebased > version to the thread. Here is a rebased version. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation: not tested Only documentation patch applied on 4e465aac36ce9a9533c68dbdc83e67579880e628 Checked with v18 The new status of this patch is: Waiting on Author
Thank you, Stephane! Rebased version attached to email. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed It is just a rebase I check with make and meson run manual split and merge on list and range partition Doc fits The new status of this patch is: Ready for Committer
Rebased version attached to email. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
Hello! Added commit v21-0004-SPLIT-PARTITION-optimization.patch. Three already existing commits did not change (v21-0001-ALTER-TABLE-MERGE-PARTITIONS-command.patch, v21-0002-ALTER-TABLE-SPLIT-PARTITION-command.patch, v21-0003-Documentation-for-ALTER-TABLE-SPLIT-PARTITION-ME.patch). The new commit is an optimization for the SPLIT PARTITION command. Description of optimization: 1) optimization is used for the SPLIT PARTITION command for tables with BY RANGE partitioning in case the partitioning key has a b-tree index; 2) the point of optimization is that, if after dividing of the old partition, all its records according to the range conditions must be inserted into ONE new partition, then instead of transferring data (from the old partition to new partition), the old partition will be renamed. Example. Suppose we have a BY RANGE-partitioned table "test" (indexed by partitioning key) with a single partition "test_default", which we want to split into two partitions ("test_1" and "test_default"), and all records should be moved to the "test_1" partition. When executing the script below, the "test_default" partition will be renamed to "test_1". ---- CREATE TABLE test(d date, v text) PARTITION BY RANGE (d); CREATE TABLE test_default PARTITION OF test DEFAULT; CREATE INDEX idx_test_d ON test USING btree (d); INSERT INTO test (d, v) SELECT d, 'value_' || md5(random()::text) FROM generate_series('2024-01-01', '2024-01-25', interval '10 seconds') AS d; -- Oid of table 'test_default': SELECT 'test_default'::regclass::oid AS previous_partition_oid; ALTER TABLE test SPLIT PARTITION test_default INTO (PARTITION test_1 FOR VALUES FROM ('2024-01-01') TO ('2024-02-01'), PARTITION test_default DEFAULT); -- Oid of table 'test_1' (should be the same as "previous_partition_oid"): SELECT 'test_1'::regclass::oid AS current_partition_oid; DROP TABLE test CASCADE; -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Mon, 4 Dec 2023 at 13:22, Dmitry Koval <d.koval@postgrespro.ru> wrote: > > Hello! > > Added commit v21-0004-SPLIT-PARTITION-optimization.patch. CFBot shows that the patch does not apply anymore as in [1]: === Applying patches on top of PostgreSQL commit ID 8ba6fdf905d0f5aef70ced4504c6ad297bfe08ea === === applying patch ./v21-0001-ALTER-TABLE-MERGE-PARTITIONS-command.patch patching file src/backend/commands/tablecmds.c ... Hunk #7 FAILED at 18735. Hunk #8 succeeded at 20608 (offset 315 lines). 1 out of 8 hunks FAILED -- saving rejects to file src/backend/commands/tablecmds.c.rej patching file src/backend/parser/gram.y Please post an updated version for the same. [1] - http://cfbot.cputube.org/patch_46_3659.log Regards, Vignesh
On 2024-Jan-26, vignesh C wrote: > Please post an updated version for the same. Here's a rebase. I only fixed the conflicts, didn't review. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Attachment
On 2024-Jan-26, Alvaro Herrera wrote: > On 2024-Jan-26, vignesh C wrote: > > > Please post an updated version for the same. > > Here's a rebase. I only fixed the conflicts, didn't review. Hmm, but I got the attached regression.diffs with it. I didn't investigate further, but it looks like the recent changes to replication identity for partitioned tables has broken the regression tests. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "This is what I like so much about PostgreSQL. Most of the surprises are of the "oh wow! That's cool" Not the "oh shit!" kind. :)" Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php
Attachment
git format-patch -4 HEAD -v 23 ============================= Thanks! I excluded regression test "Test: split partition witch identity column" from script src/test/regress/sql/partition_split.sql because after commit [1] partitions cannot contain identity columns and queries CREATE TABLE salesmans2_5(salesman_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, salesman_name VARCHAR(30)); ALTER TABLE salesmans ATTACH PARTITION salesmans2_5 FOR VALUES FROM (2) TO (5); returns ERROR: table "salesmans2_5" being attached contains an identity column "salesman_id" DETAIL: The new partition may not contain an identity column. [1] https://github.com/postgres/postgres/commit/699586315704a8268808e3bdba4cb5924a038c49 -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
I thought it's wrong to exclude the IDENTITY-column test, so I fixed the test and return it back. Changes in attachment (commit v24-0002-ALTER-TABLE-SPLIT-PARTITION-command.patch). -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
> On 26 Jan 2024, at 23:36, Dmitry Koval <d.koval@postgrespro.ru> wrote: > > <v24-0001-ALTER-TABLE-MERGE-PARTITIONS-command.patch><v24-0002-ALTER-TABLE-SPLIT-PARTITION-command.patch><v24-0003-Documentation-for-ALTER-TABLE-SPLIT-PARTITION-ME.patch><v24-0004-SPLIT-PARTITION-optimization.patch> The CF entry was in Ready for Committer state no so long ago. Stephane, you might want to review recent version after it was rebased on current HEAD. CFbot's test passed successfully. Thanks! Best regards, Andrey Borodin.
Hi! Rebased version attached to email. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: not tested Spec compliant: not tested Documentation: not tested Hi, I have failing tap test after patches apply: ok 201 + partition_merge 2635 ms not ok 202 + partition_split 5719 ms @@ -805,6 +805,7 @@ (PARTITION salesmans2_3 FOR VALUES FROM (2) TO (3), PARTITION salesmans3_4 FOR VALUES FROM (3) TO (4), PARTITION salesmans4_5 FOR VALUES FROM (4) TO (5)); +ERROR: no owned sequence found INSERT INTO salesmans (salesman_name) VALUES ('May'); INSERT INTO salesmans (salesman_name) VALUES ('Ford'); SELECT * FROM salesmans1_2; @@ -814,23 +815,17 @@ (1 row) SELECT * FROM salesmans2_3; - salesman_id | salesman_name --------------+--------------- - 2 | Ivanov -(1 row) - +ERROR: relation "salesmans2_3" does not exist +LINE 1: SELECT * FROM salesmans2_3; The new status of this patch is: Waiting on Author
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: not tested Spec compliant: not tested Documentation: not tested Sorry, tests passed when applying all patches. I planned to check without optimisation first. The new status of this patch is: Needs review
Hi! > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed Thanks for info! I was unable to reproduce the problem and I wanted to ask for clarification. But your message was ahead of my question. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
On Tue, Sep 20, 2022 at 3:21 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Sep 19, 2022 at 4:42 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: > > Thanks for comments and advice! > > I thought about this problem and discussed about it with colleagues. > > Unfortunately, I don't know of a good general solution. > > Yeah, me neither. > > > But for specific situation like this (certain partition is not changing) > > we can add CONCURRENTLY modifier. > > Our DDL query can be like > > > > ALTER TABLE...SPLIT PARTITION [CONCURRENTLY]; > > > > With CONCURRENTLY modifier we can lock partitioned table in > > ShareUpdateExclusiveLock mode and split partition - in > > AccessExclusiveLock mode. So we don't lock partitioned table in > > AccessExclusiveLock mode and can modify other partitions during SPLIT > > operation (except split partition). > > If smb try to modify split partition, he will receive error "relation > > does not exist" at end of operation (because split partition will be drop). > > I think that a built-in DDL command can't really assume that the user > won't modify anything. You'd have to take a ShareLock. > > But you might be able to have a CONCURRENTLY variant of the command > that does the same kind of multi-transaction thing as, e.g., CREATE > INDEX CONCURRENTLY. You would probably have to be quite careful about > race conditions (e.g. you commit the first transaction and before you > start the second one, someone drops or detaches the partition you were > planning to merge or split). Might take some thought, but feels > possibly doable. I've never been excited enough about this kind of > thing to want to put a lot of energy into engineering it, because > doing it "manually" feels so much nicer to me, and doubly so given > that we now have ATTACH CONCURRENTLY and DETACH CONCURRENTLY, but it > does seem like a thing some people would probably use and value. +1 Currently people are using external tools to implement this kind of task. However, having this functionality in core would be great. Implementing concurrent merge/split seems quite a difficult task, which needs careful design. It might be too hard to carry around the syntax altogether. So, I think having basic syntax in-core is a good step forward. But I think we need a clear notice in the documentation about the concurrency to avoid wrong user expectations. ------ Regards, Alexander Korotkov
On Tue, Mar 19, 2024 at 4:43 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: > Thanks for info! > I was unable to reproduce the problem and I wanted to ask for > clarification. But your message was ahead of my question. I've revised the patchset. I mostly did some refactoring, code improvements and wrote new comments. If I apply just the first two patches, I get the same error as [1]. This error happens when createPartitionTable() tries to copy the identity of another partition. I've fixed that by skipping a copy of the identity of another partition (remove CREATE_TABLE_LIKE_IDENTITY from TableLikeClause.options). BTW, the same error happened to me when I manually ran CREATE TABLE ... (LIKE ... INCLUDING IDENTITY) for a partition of the table with identity. So, this probably deserves a separate fix, but I think not directly related to this patch. I have one question. When merging partitions you're creating a merged partition like the parent table. But when splitting a partition you're creating new partitions like the partition being split. What motivates this difference? Links. 1. https://www.postgresql.org/message-id/171085360143.2046436.7217841141682511557.pgcf%40coridan.postgresql.org ------ Regards, Alexander Korotkov
Attachment
Hi! > I've fixed that by skipping a copy of the identity of another > partition (remove CREATE_TABLE_LIKE_IDENTITY from > TableLikeClause.options). Thanks for correction! Probably I should have looked at the code more closely after commit [1]. I'm also very glad that situation [2] was reproduced. > When merging partitions you're creating a merged partition like the > parent table. But when splitting a partition you're creating new > partitions like the partition being split. What motivates this > difference? When splitting a partition, I planned to set parameters for each of the new partitions (for example, tablespace parameter). It would make sense if we want to transfer part of the data of splitting partition to a slower (archive) storage device. Right now I haven't seen any interest in this functionality, so it hasn't been implemented yet. But I think this will be needed in the future. Special thanks for the hint that new structures should be added to the list src\tools\pgindent\typedefs.list. Links. [1] https://github.com/postgres/postgres/commit/699586315704a8268808e3bdba4cb5924a038c49 [2] https://www.postgresql.org/message-id/171085360143.2046436.7217841141682511557.pgcf%40coridan.postgresql.org -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Hi, Dmitry! Thank you for your feedback! On Wed, Mar 27, 2024 at 10:18 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: > > I've fixed that by skipping a copy of the identity of another > > partition (remove CREATE_TABLE_LIKE_IDENTITY from > > TableLikeClause.options). > > Thanks for correction! > Probably I should have looked at the code more closely after commit [1]. > I'm also very glad that situation [2] was reproduced. > > > When merging partitions you're creating a merged partition like the > > parent table. But when splitting a partition you're creating new > > partitions like the partition being split. What motivates this > > difference? > > When splitting a partition, I planned to set parameters for each of the > new partitions (for example, tablespace parameter). > It would make sense if we want to transfer part of the data of splitting > partition to a slower (archive) storage device. > Right now I haven't seen any interest in this functionality, so it > hasn't been implemented yet. But I think this will be needed in the future. OK, I've changed the code to use the parent table as a template for new partitions in split case. So, now it's the same in both split and merge cases. I also added a special note into docs about ACCESS EXCLUSIVE lock, because I believe that's a significant limitation for usage of this functionality. I think 0001, 0002 and 0003 could be considered for pg17. I will continue reviewing them. 0004 might require more work. I didn't rebase it for now. I suggest we can rebase it later and consider for pg18. ------ Regards, Alexander Korotkov
Attachment
Hi, Alexander! Thank you very much for your work on refactoring the commits! Yesterday I received an email from adjkldd@126.com <winterloo@126.com> with a proposal for optimization (MERGE PARTITION command) for cases where the target partition has a name identical to one of the merging partition names. I think this optimization is worth considering. A simplified version of the optimization is attached to this letter (difference is 10-15 lines). All changes made in one commit (v28-0001-ALTER-TABLE-MERGE-PARTITIONS-command.patch) and in one function (ATExecMergePartitions). In your opinion, should we added this optimization now or should it be left for later? -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
Hi! Patch stop applying due to changes in upstream. Here is a rebased version. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
Hi! On Sun, Mar 31, 2024 at 5:12 AM Dmitry Koval <d.koval@postgrespro.ru> wrote: > Patch stop applying due to changes in upstream. > Here is a rebased version. I've revised the patchset. Now there are two self-contained patches coming with the documentation. Also, now each command has a paragraph in the "Data definition" chapter. Also documentation and tests contain geographical partitioning with all Russian cities. I think that might create a country-centric feeling for the reader. I've edited that to make cities spread around the world to reflect the international spirit. Hope you're OK with this. Now, both merge and split commands make new partitions using the parent table as the template. And some other edits to comments, commit messages, documentation etc. I think this patch is well-reviewed and also has quite straightforward implementation. The major limitation of holding ACCESS EXCLUSIVE LOCK on the parent table is well-documented. I'm going to push this if no objections. ------ Regards, Alexander Korotkov
Attachment
Hi! > I've revised the patchset. Thanks for the corrections (especially ddl.sgml). Could you also look at a small optimization for the MERGE PARTITIONS command (in a separate file v31-0003-Additional-patch-for-ALTER-TABLE-.-MERGE-PARTITI.patch, I wrote about it in an email 2024-03-31 00:56:50)? Files v31-0001-*.patch, v31-0002-*.patch are the same as v30-0001-*.patch, v30-0002-*.patch (after rebasing because patch stopped applying due to changes in upstream). -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed All three patches applied nivcely. Code fits standart, comments are relevant. The new status of this patch is: Ready for Committer
Hi, Dmitry! On Fri, Apr 5, 2024 at 4:00 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: > > I've revised the patchset. > > Thanks for the corrections (especially ddl.sgml). > Could you also look at a small optimization for the MERGE PARTITIONS > command (in a separate file > v31-0003-Additional-patch-for-ALTER-TABLE-.-MERGE-PARTITI.patch, I wrote > about it in an email 2024-03-31 00:56:50)? > > Files v31-0001-*.patch, v31-0002-*.patch are the same as > v30-0001-*.patch, v30-0002-*.patch (after rebasing because patch stopped > applying due to changes in upstream). I've pushed 0001 and 0002. I didn't push 0003 for the following reasons. 1) This doesn't keep functionality equivalent to 0001. With 0003, the merged partition will inherit indexes, constraints, and so on from the one of merging partitions. 2) This is not necessarily an optimization. Without 0003 indexes on the merged partition are created after moving the rows in attachPartitionTable(). With 0003 we merge data into the existing partition which saves its indexes. That might cause a significant performance loss because mass inserts into indexes may be much slower than building indexes from scratch. I think both aspects need to be carefully considered. Even if we accept them, this needs to be documented. I think now it's too late for both of these. So, this should wait for v18. ------ Regards, Alexander Korotkov
Hi, Alexander! > I didn't push 0003 for the following reasons. .... Thanks for clarifying. You are right, these are serious reasons. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Hi Alexander and Dmitry, 07.04.2024 01:22, Alexander Korotkov wrote: > I've pushed 0001 and 0002. I didn't push 0003 for the following reasons. Please try the following (erroneous) query: CREATE TABLE t1(i int, t text) PARTITION BY LIST (t); CREATE TABLE t1pa PARTITION OF t1 FOR VALUES IN ('A'); CREATE TABLE t2 (i int, t text) PARTITION BY RANGE (t); ALTER TABLE t2 SPLIT PARTITION t1pa INTO (PARTITION t2a FOR VALUES FROM ('A') TO ('B'), PARTITION t2b FOR VALUES FROM ('B') TO ('C')); that triggers an assertion failure: TRAP: failed Assert("datums != NIL"), File: "partbounds.c", Line: 3434, PID: 1841459 or a segfault (in a non-assert build): Program terminated with signal SIGSEGV, Segmentation fault. #0 pg_detoast_datum_packed (datum=0x0) at fmgr.c:1866 1866 if (VARATT_IS_COMPRESSED(datum) || VARATT_IS_EXTERNAL(datum)) (gdb) bt #0 pg_detoast_datum_packed (datum=0x0) at fmgr.c:1866 #1 0x000055f38c5d5e3f in bttextcmp (...) at varlena.c:1834 #2 0x000055f38c6030dd in FunctionCall2Coll (...) at fmgr.c:1161 #3 0x000055f38c417c83 in partition_rbound_cmp (...) at partbounds.c:3525 #4 check_partition_bounds_for_split_range (...) at partbounds.c:5221 #5 check_partitions_for_split (...) at partbounds.c:5688 #6 0x000055f38c256c49 in transformPartitionCmdForSplit (...) at parse_utilcmd.c:3451 #7 transformAlterTableStmt (...) at parse_utilcmd.c:3810 #8 0x000055f38c2bdf9c in ATParseTransformCmd (...) at tablecmds.c:5650 ... Best regards, Alexander
Hi, Alexander! On Sun, Apr 7, 2024 at 10:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > 07.04.2024 01:22, Alexander Korotkov wrote: > > I've pushed 0001 and 0002. I didn't push 0003 for the following reasons. > > Please try the following (erroneous) query: > CREATE TABLE t1(i int, t text) PARTITION BY LIST (t); > CREATE TABLE t1pa PARTITION OF t1 FOR VALUES IN ('A'); > > CREATE TABLE t2 (i int, t text) PARTITION BY RANGE (t); > ALTER TABLE t2 SPLIT PARTITION t1pa INTO > (PARTITION t2a FOR VALUES FROM ('A') TO ('B'), > PARTITION t2b FOR VALUES FROM ('B') TO ('C')); > > that triggers an assertion failure: > TRAP: failed Assert("datums != NIL"), File: "partbounds.c", Line: 3434, PID: 1841459 > > or a segfault (in a non-assert build): > Program terminated with signal SIGSEGV, Segmentation fault. > > #0 pg_detoast_datum_packed (datum=0x0) at fmgr.c:1866 > 1866 if (VARATT_IS_COMPRESSED(datum) || VARATT_IS_EXTERNAL(datum)) > (gdb) bt > #0 pg_detoast_datum_packed (datum=0x0) at fmgr.c:1866 > #1 0x000055f38c5d5e3f in bttextcmp (...) at varlena.c:1834 > #2 0x000055f38c6030dd in FunctionCall2Coll (...) at fmgr.c:1161 > #3 0x000055f38c417c83 in partition_rbound_cmp (...) at partbounds.c:3525 > #4 check_partition_bounds_for_split_range (...) at partbounds.c:5221 > #5 check_partitions_for_split (...) at partbounds.c:5688 > #6 0x000055f38c256c49 in transformPartitionCmdForSplit (...) at parse_utilcmd.c:3451 > #7 transformAlterTableStmt (...) at parse_utilcmd.c:3810 > #8 0x000055f38c2bdf9c in ATParseTransformCmd (...) at tablecmds.c:5650 Thank you for spotting this. This seems like a missing check. I'm going to get a closer look at this tomorrow. ------ Regards, Alexander Korotkov
08.04.2024 01:15, Alexander Korotkov wrote: > Thank you for spotting this. This seems like a missing check. I'm > going to get a closer look at this tomorrow. > Thanks! There is also an anomaly with the MERGE command: CREATE TABLE t1 (i int, a int, b int, c int) PARTITION BY RANGE (a, b); CREATE TABLE t1p1 PARTITION OF t1 FOR VALUES FROM (1, 1) TO (1, 2); CREATE TABLE t2 (i int, t text) PARTITION BY RANGE (t); CREATE TABLE t2pa PARTITION OF t2 FOR VALUES FROM ('A') TO ('C'); CREATE TABLE t3 (i int, t text); ALTER TABLE t2 MERGE PARTITIONS (t1p1, t2pa, t3) INTO t2p; leads to: ERROR: partition bound for relation "t3" is null WARNING: problem in alloc set PortalContext: detected write past chunk end in block 0x55f1ef42f820, chunk 0x55f1ef42ff40 WARNING: problem in alloc set PortalContext: detected write past chunk end in block 0x55f1ef42f820, chunk 0x55f1ef42ff40 (I'm also not sure that the error message is clear enough (can't we say "relation X is not a partition of relation Y" in this context, as in MarkInheritDetached(), for example?).) Whilst with ALTER TABLE t2 MERGE PARTITIONS (t1p1, t2pa) INTO t2p; I get: Program terminated with signal SIGSEGV, Segmentation fault. #0 pg_detoast_datum_packed (datum=0x1) at fmgr.c:1866 1866 if (VARATT_IS_COMPRESSED(datum) || VARATT_IS_EXTERNAL(datum)) (gdb) bt #0 pg_detoast_datum_packed (datum=0x1) at fmgr.c:1866 #1 0x000055d77d00fde2 in bttextcmp (...) at ../../../../src/include/postgres.h:314 #2 0x000055d77d03fa27 in FunctionCall2Coll (...) at fmgr.c:1161 #3 0x000055d77ce1572f in partition_rbound_cmp (...) at partbounds.c:3525 #4 0x000055d77ce157b9 in qsort_partition_rbound_cmp (...) at partbounds.c:3816 #5 0x000055d77d0982ef in qsort_arg (...) at ../../src/include/lib/sort_template.h:316 #6 0x000055d77ce1d109 in calculate_partition_bound_for_merge (...) at partbounds.c:5786 #7 0x000055d77cc24b2b in transformPartitionCmdForMerge (...) at parse_utilcmd.c:3524 #8 0x000055d77cc2b555 in transformAlterTableStmt (...) at parse_utilcmd.c:3812 #9 0x000055d77ccab17c in ATParseTransformCmd (...) at tablecmds.c:5650 #10 0x000055d77ccafd09 in ATExecCmd (...) at tablecmds.c:5589 ... Best regards, Alexander
Alexander Lakhin, thanks for the problems you found! Unfortunately I can't watch them immediately (event [1]). I will try to start solving them in 12-14 hours. [1] https://pgconf.ru/2024 -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Hi all,
I went through the MERGE/SPLIT partition codes today, thanks for the works. I found some grammar errors:
i. in error messages(Users can see this grammar errors, not friendly).
ii. in codes comments
Alexander Korotkov <aekorotkov@gmail.com> 于2024年4月7日周日 06:23写道:
Hi, Dmitry!
On Fri, Apr 5, 2024 at 4:00 PM Dmitry Koval <d.koval@postgrespro.ru> wrote:
> > I've revised the patchset.
>
> Thanks for the corrections (especially ddl.sgml).
> Could you also look at a small optimization for the MERGE PARTITIONS
> command (in a separate file
> v31-0003-Additional-patch-for-ALTER-TABLE-.-MERGE-PARTITI.patch, I wrote
> about it in an email 2024-03-31 00:56:50)?
>
> Files v31-0001-*.patch, v31-0002-*.patch are the same as
> v30-0001-*.patch, v30-0002-*.patch (after rebasing because patch stopped
> applying due to changes in upstream).
I've pushed 0001 and 0002. I didn't push 0003 for the following reasons.
1) This doesn't keep functionality equivalent to 0001. With 0003, the
merged partition will inherit indexes, constraints, and so on from the
one of merging partitions.
2) This is not necessarily an optimization. Without 0003 indexes on
the merged partition are created after moving the rows in
attachPartitionTable(). With 0003 we merge data into the existing
partition which saves its indexes. That might cause a significant
performance loss because mass inserts into indexes may be much slower
than building indexes from scratch.
I think both aspects need to be carefully considered. Even if we
accept them, this needs to be documented. I think now it's too late
for both of these. So, this should wait for v18.
------
Regards,
Alexander Korotkov
Tender Wang
OpenPie: https://en.openpie.com/Attachment
Hi Tender Wang, 08.04.2024 13:43, Tender Wang wrote: > Hi all, > I went through the MERGE/SPLIT partition codes today, thanks for the works. I found some grammar errors: > i. in error messages(Users can see this grammar errors, not friendly). > ii. in codes comments > On a quick glance, I saw also: NULL-value partitionde splited temparary And a trailing whitespace at: the quarter partition back to monthly partitions: warning: 1 line adds whitespace errors. I'm also confused by "administrators" here: https://www.postgresql.org/docs/devel/ddl-partitioning.html (We can find on the same page, for instance: ... whereas table inheritance allows data to be divided in a manner of the user's choosing. It seems to me, that "users" should work for merging partitions as well.) Though the documentation addition requires more than just a quick glance, of course. Best regards, Alexander
Hi! Attached fix for the problems found by Alexander Lakhin. About grammar errors. Unfortunately, I don't know English well. Therefore, I plan (in the coming days) to show the text to specialists who perform technical translation of documentation. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Mon, Apr 8, 2024 at 11:43 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: > Attached fix for the problems found by Alexander Lakhin. > > About grammar errors. > Unfortunately, I don't know English well. > Therefore, I plan (in the coming days) to show the text to specialists > who perform technical translation of documentation. Thank you. I've pushed this fix with minor corrections from me. ------ Regards, Alexander Korotkov
Hello Alexander and Dmitry, 10.04.2024 02:03, Alexander Korotkov wrote: > On Mon, Apr 8, 2024 at 11:43 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: >> Attached fix for the problems found by Alexander Lakhin. >> >> About grammar errors. >> Unfortunately, I don't know English well. >> Therefore, I plan (in the coming days) to show the text to specialists >> who perform technical translation of documentation. > Thank you. I've pushed this fix with minor corrections from me. Thank you for fixing that defect! Please look at an error message emitted for foreign tables: CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE FOREIGN TABLE ftp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1) SERVER loopback OPTIONS (table_name 'lt_0_1'); CREATE FOREIGN TABLE ftp_1_2 PARTITION OF t FOR VALUES FROM (1) TO (2) SERVER loopback OPTIONS (table_name 'lt_1_2'); ALTER TABLE t MERGE PARTITIONS (ftp_0_1, ftp_1_2) INTO ftp_0_2; ERROR: "ftp_0_1" is not a table Shouldn't it be more correct/precise? Best regards, Alexander
10.04.2024 12:00, Alexander Lakhin wrote: > Hello Alexander and Dmitry, > > 10.04.2024 02:03, Alexander Korotkov wrote: >> Thank you. I've pushed this fix with minor corrections from me. > Please look at another anomaly with MERGE. CREATE TEMP TABLE t (i int) PARTITION BY RANGE (i); CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2); fails with ERROR: cannot create a permanent relation as partition of temporary relation "t" But CREATE TEMP TABLE t (i int) PARTITION BY RANGE (i); CREATE TEMP TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TEMP TABLE tp_1_2 PARTITION OF t FOR VALUES FROM (1) TO (2); ALTER TABLE t MERGE PARTITIONS (tp_0_1, tp_1_2) INTO tp_0_2; succeeds and we get: regression=# \d+ t* Partitioned table "pg_temp_1.t" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- i | integer | | | | plain | | | Partition key: RANGE (i) Partitions: tp_0_2 FOR VALUES FROM (0) TO (2) Table "public.tp_0_2" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- i | integer | | | | plain | | | Partition of: t FOR VALUES FROM (0) TO (2) Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 2)) Best regards, Alexander
Hi! Alexander Korotkov, thanks for the commit of previous fix. Alexander Lakhin, thanks for the problem you found. There are two corrections attached to the letter: 1) v1-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch - fix for the problem [1]. 2) v1-0002-Fixes-for-english-text.patch - fixes for English text (comments, error messages etc.). Links: [1] https://www.postgresql.org/message-id/dbc8b96c-3cf0-d1ee-860d-0e491da20485%40gmail.com -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Thu, Apr 11, 2024 at 1:22 AM Dmitry Koval <d.koval@postgrespro.ru> wrote:
2) v1-0002-Fixes-for-english-text.patch - fixes for English text
(comments, error messages etc.).
FWIW, I also proposed a patch earlier that fixes error messages and
comments in the split partition code at
https://www.postgresql.org/message-id/flat/CAMbWs49DDsknxyoycBqiE72VxzL_sYHF6zqL8dSeNehKPJhkKg%40mail.gmail.com
Thanks
Richard
comments in the split partition code at
https://www.postgresql.org/message-id/flat/CAMbWs49DDsknxyoycBqiE72VxzL_sYHF6zqL8dSeNehKPJhkKg%40mail.gmail.com
Thanks
Richard
Hi! > FWIW, I also proposed a patch earlier that fixes error messages and > comments in the split partition code Sorry, I thought all the fixes you suggested were already included in v1-0002-Fixes-for-english-text.patch (but they are not). Added missing lines to v2-0002-Fixes-for-english-text.patch. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
Hi Dmitry,
11.04.2024 11:59, Dmitry Koval wrote:
11.04.2024 11:59, Dmitry Koval wrote:
FWIW, I also proposed a patch earlier that fixes error messages and
comments in the split partition code
Sorry, I thought all the fixes you suggested were already included in v1-0002-Fixes-for-english-text.patch (but they are not).
Added missing lines to v2-0002-Fixes-for-english-text.patch.
It seems to me that v2-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch
is not complete either.
Take a look, please:
CREATE TABLE t (i int) PARTITION BY RANGE (i);
SET search_path = pg_temp, public;
CREATE TABLE tp_0_1 PARTITION OF t
FOR VALUES FROM (0) TO (1);
-- fails with:
ERROR: cannot create a temporary relation as partition of permanent relation "t"
But:
CREATE TABLE t (i int) PARTITION BY RANGE (i);
CREATE TABLE tp_0_1 PARTITION OF t
FOR VALUES FROM (0) TO (1);
CREATE TABLE tp_1_2 PARTITION OF t
FOR VALUES FROM (1) TO (2);
INSERT INTO t VALUES(0), (1);
SELECT * FROM t;
-- the expected result is:
i
---
0
1
(2 rows)
SET search_path = pg_temp, public;
ALTER TABLE t
MERGE PARTITIONS (tp_0_1, tp_1_2) INTO tp_0_2;
-- succeeds, and
\c -
SELECT * FROM t;
-- gives:
i
---
(0 rows)
Please also ask your tech writers to check contents of src/test/sql/*, if
possible (perhaps, they'll fix "salesmans" and improve grammar).
Best regards,
Alexander
Hi! 1. Alexander Lakhin sent a question about index name after MERGE (partition name is the same as one of the merged partitions): ----start of quote---- I'm also confused by an index name after MERGE: CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TABLE tp_1_2 PARTITION OF t FOR VALUES FROM (1) TO (2); CREATE INDEX tidx ON t(i); ALTER TABLE t MERGE PARTITIONS (tp_1_2, tp_0_1) INTO tp_1_2; \d+ t* Table "public.tp_1_2" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- i | integer | | | | plain | | | Partition of: t FOR VALUES FROM (0) TO (2) Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 2)) Indexes: "merge-16385-3A14B2-tmp_i_idx" btree (i) Is the name "merge-16385-3A14B2-tmp_i_idx" valid or it's something temporary? ----end of quote---- Fix for this case added to file v3-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch. ---- 2. >It seems to me that v2-0001-Fix-for-SPLIT-MERGE-partitions-of- >temporary-table.patch is not complete either. Added correction (and test), see v3-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
Hi, Dmitry! On Thu, Apr 11, 2024 at 4:27 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: > 1. > Alexander Lakhin sent a question about index name after MERGE (partition > name is the same as one of the merged partitions): > > ----start of quote---- > I'm also confused by an index name after MERGE: > CREATE TABLE t (i int) PARTITION BY RANGE (i); > > CREATE TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1); > CREATE TABLE tp_1_2 PARTITION OF t FOR VALUES FROM (1) TO (2); > > CREATE INDEX tidx ON t(i); > ALTER TABLE t MERGE PARTITIONS (tp_1_2, tp_0_1) INTO tp_1_2; > \d+ t* > > Table "public.tp_1_2" > Column | Type | Collation | Nullable | Default | Storage | > Compression | Stats target | Description > --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- > i | integer | | | | plain | > | | > Partition of: t FOR VALUES FROM (0) TO (2) > Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 2)) > Indexes: > "merge-16385-3A14B2-tmp_i_idx" btree (i) > > Is the name "merge-16385-3A14B2-tmp_i_idx" valid or it's something > temporary? > ----end of quote---- > > Fix for this case added to file > v3-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch. > > ---- > > 2. > >It seems to me that v2-0001-Fix-for-SPLIT-MERGE-partitions-of- > >temporary-table.patch is not complete either. > > Added correction (and test), see > v3-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch. Thank you, I'll review this later today. ------ Regards, Alexander Korotkov
11.04.2024 16:27, Dmitry Koval wrote: > > Added correction (and test), see v3-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch. > Thank you for the correction, but may be an attempt to merge into implicit pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does? Please look also at another anomaly with schemas: CREATE SCHEMA s1; CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2); ALTER TABLE t SPLIT PARTITION tp_0_2 INTO (PARTITION s1.tp0 FOR VALUES FROM (0) TO (1), PARTITION s1.tp1 FOR VALUES FROM (1) TO (2)); results in: \d+ s1.* Did not find any relation named "s1.*" \d+ tp* Table "public.tp0" ... Table "public.tp1" ... Best regards, Alexander
On Thu, Apr 11, 2024 at 8:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > 11.04.2024 16:27, Dmitry Koval wrote: > > > > Added correction (and test), see v3-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch. > > > > Thank you for the correction, but may be an attempt to merge into implicit > pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does? > > Please look also at another anomaly with schemas: > CREATE SCHEMA s1; > CREATE TABLE t (i int) PARTITION BY RANGE (i); > CREATE TABLE tp_0_2 PARTITION OF t > FOR VALUES FROM (0) TO (2); > ALTER TABLE t SPLIT PARTITION tp_0_2 INTO > (PARTITION s1.tp0 FOR VALUES FROM (0) TO (1), PARTITION s1.tp1 FOR VALUES FROM (1) TO (2)); > results in: > \d+ s1.* > Did not find any relation named "s1.*" > \d+ tp* > Table "public.tp0" > ... > Table "public.tp1" +1 I think we shouldn't unconditionally copy schema name and relpersistence from the parent table. Instead we should throw the error on a mismatch like CREATE TABLE ... PARTITION OF ... does. I'm working on revising this fix. ------ Regards, Alexander Korotkov
On Thu, Apr 11, 2024 at 9:54 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > I think we shouldn't unconditionally copy schema name and > relpersistence from the parent table. Instead we should throw the > error on a mismatch like CREATE TABLE ... PARTITION OF ... does. I'm > working on revising this fix. We definitely shouldn't copy the schema name from the parent table. It should be possible to schema-qualify the new partition names, and if you don't, then the search_path should determine where they get placed. But I am inclined to think that relpersistence should be copied. It's weird that you split an unlogged partition and you get logged partitions. One of the things I dislike about this type of feature -- not this implementation specifically, but just this kind of idea in general -- is that the syntax mentions a whole bunch of tables but in a way where you can't set their properties. Persistence, reloptions, whatever. There's just no place to mention any of that stuff - and if you wanted to create a place, you'd have to invent special syntax for each separate thing. That's why I think it's good that the normal way of creating a partition is CREATE TABLE .. PARTITION OF. Because that way, we know that the full power of the CREATE TABLE statement is always available, and you can set anything that you could set for a table that is not a partition. Of course, that is not to say that some people won't like to have a feature of this sort. I expect they will. The approach does have some drawbacks, though. -- Robert Haas EDB: http://www.enterprisedb.com
Hi! Attached is a patch with corrections based on comments in previous letters (I think these corrections are not final). I'll be very grateful for feedbacks and bug reports. 11.04.2024 20:00, Alexander Lakhin wrote: > may be an attempt to merge into implicit > pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does? Corrected. Result is: \d+ s1.* Table "s1.tp0" ... Table "s1.tp1" ... \d+ tp* Did not find any relation named "tp*". 12.04.2024 4:53, Alexander Korotkov wrote: > I think we shouldn't unconditionally copy schema name and > relpersistence from the parent table. Instead we should throw the > error on a mismatch like CREATE TABLE ... PARTITION OF ... does. 12.04.2024 5:20, Robert Haas wrote: > We definitely shouldn't copy the schema name from the parent table. Fixed. 12.04.2024 5:20, Robert Haas wrote: > One of the things I dislike about this type of feature -- not this > implementation specifically, but just this kind of idea in general -- > is that the syntax mentions a whole bunch of tables but in a way where > you can't set their properties. Persistence, reloptions, whatever. In next releases I want to allow specifying options (probably, first of all, specifying tablespace of the partitions). But before that, I would like to get a users reaction - what options they really need? -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
Hi Dmitry, 12.04.2024 16:04, Dmitry Koval wrote: > Hi! > > Attached is a patch with corrections based on comments in previous letters (I think these corrections are not final). > I'll be very grateful for feedbacks and bug reports. > > 11.04.2024 20:00, Alexander Lakhin wrote: > > may be an attempt to merge into implicit > > pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does? > > Corrected. Result is: Thank you! Still now we're able to create a partition in the pg_temp schema explicitly. Please try: ALTER TABLE t MERGE PARTITIONS (tp_0_1, tp_1_2) INTO pg_temp.tp_0_2; in the scenario [1] and you'll get the same empty table. [1] https://www.postgresql.org/message-id/fdaa003e-919c-cbc9-4f0c-e4546e96bd65%40gmail.com Best regards, Alexander
Thanks, Alexander! > Still now we're able to create a partition in the pg_temp schema > explicitly. Attached patches with fix. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
Hi, Dmitry! On Fri, Apr 12, 2024 at 10:59 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: > > Thanks, Alexander! > > > Still now we're able to create a partition in the pg_temp schema > > explicitly. > > Attached patches with fix. Please, find a my version of this fix attached. I think we need to check relpersistence in a similar way ATTACH PARTITION or CREATE TABLE ... PARTITION OF do. I'm going to polish this a little bit more. ------ Regards, Alexander Korotkov
Attachment
On Sat, Apr 13, 2024 at 6:05 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > Please, find a my version of this fix attached. I think we need to > check relpersistence in a similar way ATTACH PARTITION or CREATE TABLE > ... PARTITION OF do. I'm going to polish this a little bit more. + errmsg("\"%s\" is not an ordinary table", This is not a phrasing that we use in any other error message. We always just say "is not a table". + * Open the new partition and acquire exclusive lock on it. This will A minor nitpick is that this should probably say access exclusive rather than exclusive. But the bigger thing that confuses me here is that if we just created the partition, surely we must *already* hold AccessExclusiveLoc on it. No? -- Robert Haas EDB: http://www.enterprisedb.com
Hello Robert, 15.04.2024 17:30, Robert Haas wrote: > On Sat, Apr 13, 2024 at 6:05 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: >> Please, find a my version of this fix attached. I think we need to >> check relpersistence in a similar way ATTACH PARTITION or CREATE TABLE >> ... PARTITION OF do. I'm going to polish this a little bit more. > + errmsg("\"%s\" is not an ordinary table", > > This is not a phrasing that we use in any other error message. We > always just say "is not a table". Initially I was confused by that message, because of: CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE FOREIGN TABLE ftp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1) SERVER loopback OPTIONS (table_name 'lt_0_1'); CREATE FOREIGN TABLE ftp_1_2 PARTITION OF t FOR VALUES FROM (1) TO (2) SERVER loopback OPTIONS (table_name 'lt_1_2'); ALTER TABLE t MERGE PARTITIONS (ftp_0_1, ftp_1_2) INTO ftp_0_2; ERROR: "ftp_0_1" is not a table (Isn't a foreign table a table?) And also: CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TABLE t2 (i int) PARTITION BY RANGE (i); ALTER TABLE t MERGE PARTITIONS (tp_0_1, t2) INTO tpn; ERROR: "t2" is not a table (Isn't a partitioned table a table?) And in fact, an ordinary table is not suitable for MERGE anyway: CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TABLE t2 (i int); ALTER TABLE t MERGE PARTITIONS (tp_0_1, t2) INTO tpn; ERROR: "t2" is not a partition So I don't think that "an ordinary table" is a good (unambiguous) term either. Best regards, Alexander
Hi! > Please, find a my version of this fix attached. Is it possible to make a small addition to the file v6-0001 ... .patch (see attachment)? Most important: 1) Line 19: + mergePartName = makeRangeVar(cmd->name->schemaname, tmpRelName, -1); (temporary table should use the same schema as the partition); 2) Lines 116-123: +RESET search_path; + +-- Can't merge persistent partitions into a temporary partition +ALTER TABLE t MERGE PARTITIONS (tp_0_1, tp_1_2) INTO pg_temp.tp_0_2; + +SET search_path = pg_temp, public; (Alexandr Lakhin's test for using of pg_temp schema explicitly). The rest of the changes in v6_afterfix.diff are not very important and can be ignored. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Mon, Apr 15, 2024 at 11:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: > Initially I was confused by that message, because of: > CREATE TABLE t (i int) PARTITION BY RANGE (i); > CREATE FOREIGN TABLE ftp_0_1 PARTITION OF t > FOR VALUES FROM (0) TO (1) > SERVER loopback OPTIONS (table_name 'lt_0_1'); > CREATE FOREIGN TABLE ftp_1_2 PARTITION OF t > FOR VALUES FROM (1) TO (2) > SERVER loopback OPTIONS (table_name 'lt_1_2'); > ALTER TABLE t MERGE PARTITIONS (ftp_0_1, ftp_1_2) INTO ftp_0_2; > ERROR: "ftp_0_1" is not a table > (Isn't a foreign table a table?) I agree that this can be confusing, but a patch that is about adding SPLIT and MERGE PARTITION operations cannot decide to also invent a new error message phraseology and use it only in one place. We need to maintain consistency across the whole code base. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, Dmitry! On Mon, Apr 15, 2024 at 6:26 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: > > Hi! > > > Please, find a my version of this fix attached. > > Is it possible to make a small addition to the file v6-0001 ... .patch > (see attachment)? > > Most important: > 1) Line 19: > > + mergePartName = makeRangeVar(cmd->name->schemaname, tmpRelName, -1); > > (temporary table should use the same schema as the partition); > > 2) Lines 116-123: > > +RESET search_path; > + > +-- Can't merge persistent partitions into a temporary partition > +ALTER TABLE t MERGE PARTITIONS (tp_0_1, tp_1_2) INTO pg_temp.tp_0_2; > + > +SET search_path = pg_temp, public; > > (Alexandr Lakhin's test for using of pg_temp schema explicitly). > > > The rest of the changes in v6_afterfix.diff are not very important and > can be ignored. Thank you. I've integrated your changes. The revised patchset is attached. 1) I've split the fix for the CommandCounterIncrement() issue and the fix for relation persistence issue into a separate patch. 2) I've validated that the lock on the new partition is held in createPartitionTable() after ProcessUtility() as pointed out by Robert. So, no need to place the lock again. 3) Added fix for problematic error message as a separate patch [1]. 4) Added rename "salemans" => "salesmen" for tests as a separate patch. I think these fixes are reaching committable shape, but I'd like someone to check it before I push. Links. 1. https://postgr.es/m/20240408.152402.1485994009160660141.horikyota.ntt%40gmail.com ------ Regards, Alexander Korotkov
Attachment
Hi Alexander, 18.04.2024 13:35, Alexander Korotkov wrote: > > The revised patchset is attached. > 1) I've split the fix for the CommandCounterIncrement() issue and the > fix for relation persistence issue into a separate patch. > 2) I've validated that the lock on the new partition is held in > createPartitionTable() after ProcessUtility() as pointed out by > Robert. So, no need to place the lock again. > 3) Added fix for problematic error message as a separate patch [1]. > 4) Added rename "salemans" => "salesmen" for tests as a separate patch. > > I think these fixes are reaching committable shape, but I'd like > someone to check it before I push. I think the feature implementation should also provide tab completion for SPLIT/MERGE. (ALTER TABLE t S<Tab> fills in only SET now.) Also, the following MERGE operation: CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i); CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2); ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0; leaves a strange constraint: \d+ t* Table "public.tp_0" ... Not-null constraints: "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i" Best regards, Alexander
Alexander Lakhin <exclusion@gmail.com> writes: > Hi Alexander, > > 18.04.2024 13:35, Alexander Korotkov wrote: >> >> The revised patchset is attached. >> 1) I've split the fix for the CommandCounterIncrement() issue and the >> fix for relation persistence issue into a separate patch. >> 2) I've validated that the lock on the new partition is held in >> createPartitionTable() after ProcessUtility() as pointed out by >> Robert. So, no need to place the lock again. >> 3) Added fix for problematic error message as a separate patch [1]. >> 4) Added rename "salemans" => "salesmen" for tests as a separate patch. >> >> I think these fixes are reaching committable shape, but I'd like >> someone to check it before I push. > > I think the feature implementation should also provide tab completion for > SPLIT/MERGE. > (ALTER TABLE t S<Tab> > fills in only SET now.) Here's a patch for that. One thing I noticed while testing it was that the tab completeion for partitions (Query_for_partition_of_table) shows all the schemas in the DB, even ones that don't contain any partitions of the table being altered. - ilmari From 26db03b7a7675aa7dbff1f18ee084296caa1e181 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org> Date: Thu, 18 Apr 2024 17:47:22 +0100 Subject: [PATCH] =?UTF-8?q?Add=20tab=20completion=20for=20ALTER=20TABLE=20?= =?UTF-8?q?=E2=80=A6=20SPLIT|MERGE=20PARTITION(S)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- src/bin/psql/tab-complete.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 6fee3160f0..97cd5d9f62 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -2353,6 +2353,7 @@ psql_completion(const char *text, int start, int end) "OWNER TO", "SET", "VALIDATE CONSTRAINT", "REPLICA IDENTITY", "ATTACH PARTITION", "DETACH PARTITION", "FORCE ROW LEVEL SECURITY", + "SPLIT PARTITION", "MERGE PARTITIONS (", "OF", "NOT OF"); /* ALTER TABLE xxx ADD */ else if (Matches("ALTER", "TABLE", MatchAny, "ADD")) @@ -2609,10 +2610,10 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH("FROM (", "IN (", "WITH ("); /* - * If we have ALTER TABLE <foo> DETACH PARTITION, provide a list of + * If we have ALTER TABLE <foo> DETACH|SPLIT PARTITION, provide a list of * partitions of <foo>. */ - else if (Matches("ALTER", "TABLE", MatchAny, "DETACH", "PARTITION")) + else if (Matches("ALTER", "TABLE", MatchAny, "DETACH|SPLIT", "PARTITION")) { set_completion_reference(prev3_wd); COMPLETE_WITH_SCHEMA_QUERY(Query_for_partition_of_table); @@ -2620,6 +2621,19 @@ psql_completion(const char *text, int start, int end) else if (Matches("ALTER", "TABLE", MatchAny, "DETACH", "PARTITION", MatchAny)) COMPLETE_WITH("CONCURRENTLY", "FINALIZE"); + /* ALTER TABLE <name> SPLIT PARTITION <name> */ + else if (Matches("ALTER", "TABLE", MatchAny, "SPLIT", "PARTITION", MatchAny)) + COMPLETE_WITH("INTO ( PARTITION"); + + /* ALTER TABLE <name> MERGE PARTITIONS ( */ + else if (Matches("ALTER", "TABLE", MatchAny, "MERGE", "PARTITIONS", "(")) + { + set_completion_reference(prev4_wd); + COMPLETE_WITH_SCHEMA_QUERY(Query_for_partition_of_table); + } + else if (Matches("ALTER", "TABLE", MatchAny, "MERGE", "PARTITIONS", "(*)")) + COMPLETE_WITH("INTO"); + /* ALTER TABLE <name> OF */ else if (Matches("ALTER", "TABLE", MatchAny, "OF")) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_composite_datatypes); -- 2.39.2
On 2024-Apr-18, Alexander Lakhin wrote: > I think the feature implementation should also provide tab completion > for SPLIT/MERGE. I don't think that we should be imposing on feature authors or committers the task of filling in tab-completion for whatever features they contribute. I mean, if they want to add that, cool; but if not, somebody else can do that, too. It's not a critical piece. Now, if we're talking about whether a patch to add tab-completion to a feature post feature-freeze is acceptable, I think it absolutely is (even though you could claim that it's a new psql feature). But for sure we shouldn't mandate that a feature be reverted just because it lacks tab-completion -- such lack is not an open-item against the feature in that sense. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "That sort of implies that there are Emacs keystrokes which aren't obscure. I've been using it daily for 2 years now and have yet to discover any key sequence which makes any sense." (Paul Thomas)
On Thu, Apr 18, 2024 at 6:35 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > The revised patchset is attached. > 1) I've split the fix for the CommandCounterIncrement() issue and the > fix for relation persistence issue into a separate patch. > 2) I've validated that the lock on the new partition is held in > createPartitionTable() after ProcessUtility() as pointed out by > Robert. So, no need to place the lock again. > 3) Added fix for problematic error message as a separate patch [1]. > 4) Added rename "salemans" => "salesmen" for tests as a separate patch. > > I think these fixes are reaching committable shape, but I'd like > someone to check it before I push. Reviewing 0001: - Seems mostly fine. I think the comment /* Unlock and drop merged partitions. */ is wrong. I think it should say something like /* Drop the current partitions before adding the new one. */ because (a) it doesn't unlock anything, and there's another comment saying that and (b) we now know that the drop vs. add order matters. Reviewing 0002: - Commit message typos: behavious, corresponsing - Given the change to the header comment of createPartitionTable, it's rather surprising to me that this patch doesn't touch the documentation. Isn't that a big change in semantics? - My previous review comment was really about the code comment, I believe, rather than the use of AccessExclusiveLock. NoLock is probably fine, but if it were me I'd be tempted to write AccessExclusiveLock and just make the comment say something like /* We should already have the lock, but do it this way just to be certain */. But what you have is probably fine, too. Mostly, I want to clarify the intent of my previous comment. - Do we, or can we, have a test that if you split a partition that's not in the search path, the resulting partitions end up in your creation namespace? And similarly for merge? And maybe also that schema-qualification works properly? I haven't exhaustively verified the patch, but these are some things I noticed when scrolling through it. Reviewing 0003: - Are you sure this can't dereference datum when datum is NULL, in either the upper or lower half? It sure looks strange to have code that looks like it can make datum a null pointer, and then an unconditional deference just after. - In general I think the wording changes are improvements. I'm slightly suspicious that there might be an even better way to word it, but I can't think of it right at this very moment. - I'm kind of unhappy (but not totally unhappy) with the semantics. Suppose I have a partition that allows values from 0 to 1000, but actually only contains values that are either between 0 and 99 or between 901 and 1000. If I try to to split the partition into one that allows 0..100 and a second that allows 900..1000, it will fail. Maybe that's good, because that means that if a failure is going to happen, it will happen right at the beginning, rather than maybe after doing a lot of work. But on the other hand, it also kind of stinks, because it feels like I'm being told I can't do something that I know is perfectly fine. Reviewing 0004: - Obviously this is quite trivial and there's no real problem with it, but if we're changing it anyway, how about a gender-neutral term (salesperson/salespeople)? -- Robert Haas EDB: http://www.enterprisedb.com
Here are some additional fixes to docs.
Attachment
Hi! 18.04.2024 19:00, Alexander Lakhin wrote: > leaves a strange constraint: > \d+ t* > Table "public.tp_0" > ... > Not-null constraints: > "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i" Thanks! Attached fix (with test) for this case. The patch should be applied after patches v6-0001- ... .patch ... v6-0004- ... .patch -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
18.04.2024 20:49, Alvaro Herrera wrote: > On 2024-Apr-18, Alexander Lakhin wrote: > >> I think the feature implementation should also provide tab completion >> for SPLIT/MERGE. > I don't think that we should be imposing on feature authors or > committers the task of filling in tab-completion for whatever features > they contribute. I mean, if they want to add that, cool; but if not, > somebody else can do that, too. It's not a critical piece. I agree, I just wanted to note the lack of the current implementation. But now, thanks to Dagfinn, we have the tab completion too. I have also a question regarding "ALTER TABLE ... SET ACCESS METHOD". The current documentation says: When applied to a partitioned table, there is no data to rewrite, but partitions created afterwards will default to the given access method unless overridden by a USING clause. But MERGE/SPLIT behave differently (if one can assume that MERGE/SPLIT create new partitions under the hood): CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler; CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i); ALTER TABLE t SET ACCESS METHOD heap2; CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2); \d t+ Partitioned table "public.t" ... Access method: heap2 Table "public.tp_0" ... Access method: heap2 Table "public.tp_1" ... Access method: heap2 ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0; Partitioned table "public.t" ... Access method: heap2 Table "public.tp_0" ... Access method: heap Shouldn't it be changed, what do you think? Best regards, Alexander
On Thu, Apr 11, 2024 at 10:20:53PM -0400, Robert Haas wrote: > On Thu, Apr 11, 2024 at 9:54 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > I think we shouldn't unconditionally copy schema name and > > relpersistence from the parent table. Instead we should throw the > > error on a mismatch like CREATE TABLE ... PARTITION OF ... does. I'm > > working on revising this fix. > > We definitely shouldn't copy the schema name from the parent table. It > should be possible to schema-qualify the new partition names, and if > you don't, then the search_path should determine where they get > placed. +1. Alexander Lakhin reported an issue with schemas and SPLIT, and I noticed an issue with schemas with MERGE. The issue I hit is occurs when MERGE'ing into a partition with the same name, and it's fixed like so: --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -21526,8 +21526,7 @@ ATExecMergePartitions(List **wqueue, AlteredTableInfo *tab, Relation rel, { /* Create partition table with generated temporary name. */ sprintf(tmpRelName, "merge-%u-%X-tmp", RelationGetRelid(rel), MyProcPid); - mergePartName = makeRangeVar(get_namespace_name(RelationGetNamespace(rel)), - tmpRelName, -1); + mergePartName = makeRangeVar(mergePartName->schemaname, tmpRelName, -1); } createPartitionTable(mergePartName, makeRangeVar(get_namespace_name(RelationGetNamespace(rel)), > One of the things I dislike about this type of feature -- not this > implementation specifically, but just this kind of idea in general -- > is that the syntax mentions a whole bunch of tables but in a way where > you can't set their properties. Persistence, reloptions, whatever. > There's just no place to mention any of that stuff - and if you wanted > to create a place, you'd have to invent special syntax for each > separate thing. That's why I think it's good that the normal way of > creating a partition is CREATE TABLE .. PARTITION OF. Because that > way, we know that the full power of the CREATE TABLE statement is > always available, and you can set anything that you could set for a > table that is not a partition. Right. The current feature is useful and will probably work for 90% of people's partitioned tables. Currently, CREATE TABLE .. PARTITION OF does not create stats objects on the child table, but MERGE PARTITIONS does, which seems strange. Maybe stats should not be included on the new child ? Note that stats on parent table are not analagous to indexes - partitioned indexes do nothing other than cause indexes to be created on any new/attached partitions. But stats objects on the parent 1) cause extended stats to be collected and computed across the whole partition heirarchy, and 2) do not cause stats to be computed for the individual partitions. Partitions can have different column definitions, for example null constraints, FKs, defaults. And currently, if you MERGE partitions, those will all be lost (or rather, replaced by whatever LIKE parent gives). I think that's totally fine - anyone using different defaults on child tables could either not use MERGE PARTITIONS, or fix up the defaults afterwards. There's not much confusion that the details of the differences between individual partitions will be lost when the individual partitions are merged and no longer exist. But I think it'd be useful to document how the new partitions will be constructed. -- Justin
On Fri, Apr 19, 2024 at 2:26 AM Dmitry Koval <d.koval@postgrespro.ru> wrote: > > Hi! > > 18.04.2024 19:00, Alexander Lakhin wrote: > > leaves a strange constraint: > > \d+ t* > > Table "public.tp_0" > > ... > > Not-null constraints: > > "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i" > > Thanks! > Attached fix (with test) for this case. > The patch should be applied after patches > v6-0001- ... .patch ... v6-0004- ... .patch I've incorporated this fix with 0001 patch. Also added to the patchset 005 – tab completion by Dagfinn [1] 006 – draft fix for table AM issue spotted by Alexander Lakhin [2] 007 – doc review by Justin [3] I'm continuing work on this. Links 1. https://www.postgresql.org/message-id/87plumiox2.fsf%40wibble.ilmari.org 2. https://www.postgresql.org/message-id/84ada05b-be5c-473e-6d1c-ebe5dd21b190%40gmail.com 3. https://www.postgresql.org/message-id/ZiGH0xc1lxJ71ZfB%40pryzbyj2023 ------ Regards, Alexander Korotkov
Attachment
- v7-0001-Change-the-way-ATExecMergePartitions-handles-the-.patch
- v7-0002-Verify-persistence-of-new-partitions-during-MERGE.patch
- v7-0005-Add-tab-completion-for-ALTER-TABLE-SPLIT-MERGE-PA.patch
- v7-0003-Fix-error-message-in-check_partition_bounds_for_s.patch
- v7-0004-Rename-tables-in-tests-of-partition-MERGE-SPLIT-o.patch
- v7-0006-ALTER-TABLE-.-MERGE-PARTITIONS-.-make-inherit-par.patch
- v7-0007-doc-review-for-ALTER-TABLE-.-SPLIT-MERGE-PARTITIO.patch
Hi! On Fri, Apr 19, 2024 at 4:29 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Fri, Apr 19, 2024 at 2:26 AM Dmitry Koval <d.koval@postgrespro.ru> wrote: > > 18.04.2024 19:00, Alexander Lakhin wrote: > > > leaves a strange constraint: > > > \d+ t* > > > Table "public.tp_0" > > > ... > > > Not-null constraints: > > > "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i" > > > > Thanks! > > Attached fix (with test) for this case. > > The patch should be applied after patches > > v6-0001- ... .patch ... v6-0004- ... .patch > > I've incorporated this fix with 0001 patch. > > Also added to the patchset > 005 – tab completion by Dagfinn [1] > 006 – draft fix for table AM issue spotted by Alexander Lakhin [2] > 007 – doc review by Justin [3] > > I'm continuing work on this. > > Links > 1. https://www.postgresql.org/message-id/87plumiox2.fsf%40wibble.ilmari.org > 2. https://www.postgresql.org/message-id/84ada05b-be5c-473e-6d1c-ebe5dd21b190%40gmail.com > 3. https://www.postgresql.org/message-id/ZiGH0xc1lxJ71ZfB%40pryzbyj2023 0001 The way we handle name collisions during MERGE PARTITIONS operation is reworked by integration of patch [3]. This makes note about commit in [2] not relevant. 0002 The persistence of the new partition is copied as suggested in [1]. But the checks are in-place, because search_path could influence new table persistence. Per review [2], commit message typos are fixed, documentation is revised, revised tests to cover schema-qualification, usage of search_path. 0003 Making code more clear that we're not going to dereference the NULL datum per note in [2]. 0004 Gender-neutral terms are used per suggestions in [2]. 0005 Commit message revised 0006 Revise documentation mentioning we're going to copy the parent's table AM. Regression tests are added. Commit message revised. 0007 Commit message revised Links 1. https://www.postgresql.org/message-id/CA%2BTgmoYcjL%2Bw2BQzku5iNXKR5fyxJMSP3avQta8xngioTX7D7A%40mail.gmail.com 2. https://www.postgresql.org/message-id/CA%2BTgmoY_4r6BeeSCTim04nAiCmmXg-1pG1toxQovZOP2qaFJ0A%40mail.gmail.com 3. https://www.postgresql.org/message-id/f8b5cbf5-965e-4e5b-b506-33bbf41b0d50%40postgrespro.ru ------ Regards, Alexander Korotkov
Attachment
- v8-0003-Fix-error-message-in-check_partition_bounds_for_s.patch
- v8-0004-Rename-tables-in-tests-of-partition-MERGE-SPLIT-o.patch
- v8-0002-Make-new-partitions-with-parent-s-persistence-dur.patch
- v8-0005-Add-tab-completion-for-partition-MERGE-SPLIT-oper.patch
- v8-0001-Change-the-way-ATExecMergePartitions-handles-the-.patch
- v8-0006-Inherit-parent-s-AM-for-partition-MERGE-SPLIT-ope.patch
- v8-0007-Grammar-fixes-for-documentation-of-partition-MERG.patch
On Mon, Apr 22, 2024 at 01:31:48PM +0300, Alexander Korotkov wrote: > Hi! > > On Fri, Apr 19, 2024 at 4:29 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Fri, Apr 19, 2024 at 2:26 AM Dmitry Koval <d.koval@postgrespro.ru> wrote: > > > 18.04.2024 19:00, Alexander Lakhin wrote: > > > > leaves a strange constraint: > > > > \d+ t* > > > > Table "public.tp_0" > > > > ... > > > > Not-null constraints: > > > > "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i" > > > > > > Thanks! > > > Attached fix (with test) for this case. > > > The patch should be applied after patches > > > v6-0001- ... .patch ... v6-0004- ... .patch > > > > I've incorporated this fix with 0001 patch. > > > > Also added to the patchset > > 005 – tab completion by Dagfinn [1] > > 006 – draft fix for table AM issue spotted by Alexander Lakhin [2] > > 007 – doc review by Justin [3] > > > > I'm continuing work on this. > > > > Links > > 1. https://www.postgresql.org/message-id/87plumiox2.fsf%40wibble.ilmari.org > > 2. https://www.postgresql.org/message-id/84ada05b-be5c-473e-6d1c-ebe5dd21b190%40gmail.com > > 3. https://www.postgresql.org/message-id/ZiGH0xc1lxJ71ZfB%40pryzbyj2023 > > 0001 > The way we handle name collisions during MERGE PARTITIONS operation is > reworked by integration of patch [3]. This makes note about commit in > [2] not relevant. This patch also/already fixes the schema issue I reported. Thanks. If you wanted to include a test case for that: begin; CREATE SCHEMA s; CREATE SCHEMA t; CREATE TABLE p(i int) PARTITION BY RANGE(i); CREATE TABLE s.c1 PARTITION OF p FOR VALUES FROM (1)TO(2); CREATE TABLE s.c2 PARTITION OF p FOR VALUES FROM (2)TO(3); ALTER TABLE p MERGE PARTITIONS (s.c1, s.c2) INTO s.c1; -- misbehaves if merging into the same name as an existing partition \d+ p ... Partitions: c1 FOR VALUES FROM (1) TO (3) > 0002 > The persistence of the new partition is copied as suggested in [1]. > But the checks are in-place, because search_path could influence new > table persistence. Per review [2], commit message typos are fixed, > documentation is revised, revised tests to cover schema-qualification, > usage of search_path. Subject: [PATCH v8 2/7] Make new partitions with parent's persistence during MERGE/SPLIT operations This patch adds documentation saying: + Any indexes, constraints and user-defined row-level triggers that exist + in the parent table are cloned on new partitions [...] Which is good to say, and addresses part of my message [0] [0] ZiJW1g2nbQs9ekwK@pryzbyj2023 But it doesn't have anything to do with "creating new partitions with parent's persistence". Maybe there was a merge conflict and the docs ended up in the wrong patch ? Also, defaults, storage options, compression are also copied. As will be anything else from LIKE. And since anything added in the future will also be copied, maybe it's better to just say that the tables will be created the same way as "LIKE .. INCLUDING ALL EXCLUDING ..", or similar. Otherwise, the next person who adds a new option for LIKE would have to remember to update this paragraph... Also, extended stats objects are currently cloned to new child tables. But I suggested in [0] that they probably shouldn't be. > 007 – doc review by Justin [3] I suggest to drop this patch for now. I'll send some more minor fixes to docs and code comments once the other patches are settled. -- Justin
Hi, Hackers!
On Thu, 25 Apr 2024 at 00:26, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Apr 22, 2024 at 01:31:48PM +0300, Alexander Korotkov wrote:
> Hi!
>
> On Fri, Apr 19, 2024 at 4:29 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Fri, Apr 19, 2024 at 2:26 AM Dmitry Koval <d.koval@postgrespro.ru> wrote:
> > > 18.04.2024 19:00, Alexander Lakhin wrote:
> > > > leaves a strange constraint:
> > > > \d+ t*
> > > > Table "public.tp_0"
> > > > ...
> > > > Not-null constraints:
> > > > "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i"
> > >
> > > Thanks!
> > > Attached fix (with test) for this case.
> > > The patch should be applied after patches
> > > v6-0001- ... .patch ... v6-0004- ... .patch
> >
> > I've incorporated this fix with 0001 patch.
> >
> > Also added to the patchset
> > 005 – tab completion by Dagfinn [1]
> > 006 – draft fix for table AM issue spotted by Alexander Lakhin [2]
> > 007 – doc review by Justin [3]
> >
> > I'm continuing work on this.
> >
> > Links
> > 1. https://www.postgresql.org/message-id/87plumiox2.fsf%40wibble.ilmari.org
> > 2. https://www.postgresql.org/message-id/84ada05b-be5c-473e-6d1c-ebe5dd21b190%40gmail.com
> > 3. https://www.postgresql.org/message-id/ZiGH0xc1lxJ71ZfB%40pryzbyj2023
>
> 0001
> The way we handle name collisions during MERGE PARTITIONS operation is
> reworked by integration of patch [3]. This makes note about commit in
> [2] not relevant.
This patch also/already fixes the schema issue I reported. Thanks.
If you wanted to include a test case for that:
begin;
CREATE SCHEMA s;
CREATE SCHEMA t;
CREATE TABLE p(i int) PARTITION BY RANGE(i);
CREATE TABLE s.c1 PARTITION OF p FOR VALUES FROM (1)TO(2);
CREATE TABLE s.c2 PARTITION OF p FOR VALUES FROM (2)TO(3);
ALTER TABLE p MERGE PARTITIONS (s.c1, s.c2) INTO s.c1; -- misbehaves if merging into the same name as an existing partition
\d+ p
...
Partitions: c1 FOR VALUES FROM (1) TO (3)
> 0002
> The persistence of the new partition is copied as suggested in [1].
> But the checks are in-place, because search_path could influence new
> table persistence. Per review [2], commit message typos are fixed,
> documentation is revised, revised tests to cover schema-qualification,
> usage of search_path.
Subject: [PATCH v8 2/7] Make new partitions with parent's persistence during MERGE/SPLIT operations
This patch adds documentation saying:
+ Any indexes, constraints and user-defined row-level triggers that exist
+ in the parent table are cloned on new partitions [...]
Which is good to say, and addresses part of my message [0]
[0] ZiJW1g2nbQs9ekwK@pryzbyj2023
But it doesn't have anything to do with "creating new partitions with
parent's persistence". Maybe there was a merge conflict and the docs
ended up in the wrong patch ?
Also, defaults, storage options, compression are also copied. As will
be anything else from LIKE. And since anything added in the future will
also be copied, maybe it's better to just say that the tables will be
created the same way as "LIKE .. INCLUDING ALL EXCLUDING ..", or
similar. Otherwise, the next person who adds a new option for LIKE
would have to remember to update this paragraph...
Also, extended stats objects are currently cloned to new child tables.
But I suggested in [0] that they probably shouldn't be.
> 007 – doc review by Justin [3]
I suggest to drop this patch for now. I'll send some more minor fixes to
docs and code comments once the other patches are settled.
I've looked at the patchset:
0001 Look good.
0002 Also right with docs modification proposed by Justin.
0003:
Looks like unused code
5268 datum = cmpval ? list_nth(spec->lowerdatums, abs(cmpval) - 1) : NULL;
overridden by
5278 datum = list_nth(spec->upperdatums, abs(cmpval) - 1);
and
5290 datum = list_nth(spec->upperdatums, abs(cmpval) - 1);
Otherwise - good.
0004:
I suggest also getting rid of thee-noun compound words like: salesperson_name. Maybe salesperson -> clerk? Or maybe use the same terms like in pgbench: branches, tellers, accounts, balance.
0005: Good
0006: Patch is right
In comments:
+ New partitions will have the same table access method,
+ same column names and types as the partitioned table to which they belong.
+ same column names and types as the partitioned table to which they belong.
(I'd suggest to remove second "same")
Tests are passed. I suppose that it's better to add similar tests for SPLIT/MERGE PARTITION(S) to those covering ATTACH/DETACH PARTITION (e.g.: subscription/t/013_partition.pl and regression tests)
Overall, great work! Thanks!
Regards,
Pavel Borisov,
Supabase.
Hi, Pavel. Thank you for the review. On Fri, Apr 26, 2024 at 4:33 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > I've looked at the patchset: > > 0001 Look good. > 0002 Also right with docs modification proposed by Justin. Modified as proposed by Justin. The documentation for the way new partitions are created is now in separate patch. > 0003: > Looks like unused code > 5268 datum = cmpval ? list_nth(spec->lowerdatums, abs(cmpval) - 1) : NULL; > overridden by > 5278 datum = list_nth(spec->upperdatums, abs(cmpval) - 1); > and > 5290 datum = list_nth(spec->upperdatums, abs(cmpval) - 1); > > Otherwise - good. Fixed, thanks. > 0004: > I suggest also getting rid of thee-noun compound words like: salesperson_name. Maybe salesperson -> clerk? Or maybe usethe same terms like in pgbench: branches, tellers, accounts, balance. Thank you, but I'd like to prefer keeping these modifications simple. It's just regression tests, we don't need to have perfect naming here. My intention is to fix just obvious errors. > 0005: Good > 0006: Patch is right > In comments: > + New partitions will have the same table access method, > + same column names and types as the partitioned table to which they belong. > (I'd suggest to remove second "same") Documentation is modified per proposal by Justin. Thus double "same" is already gone. > Tests are passed. I suppose that it's better to add similar tests for SPLIT/MERGE PARTITION(S) to those covering ATTACH/DETACHPARTITION (e.g.: subscription/t/013_partition.pl and regression tests) The revised patchset is attached. I'm going to push it if there are no objections. Thank you for your suggestions about adding tests similar to subscription/t/013_partition.pl. I will work on this after pushing this patchset. ------ Regards, Alexander Korotkov Supabase
Attachment
- v9-0001-Change-the-way-ATExecMergePartitions-handles-the-.patch
- v9-0003-Make-new-partitions-with-parent-s-persistence-dur.patch
- v9-0004-Fix-error-message-in-check_partition_bounds_for_s.patch
- v9-0002-Document-the-way-parition-MERGE-SPLIT-operations-.patch
- v9-0005-Rename-tables-in-tests-of-partition-MERGE-SPLIT-o.patch
- v9-0007-Inherit-parent-s-AM-for-partition-MERGE-SPLIT-ope.patch
- v9-0006-Add-tab-completion-for-partition-MERGE-SPLIT-oper.patch
Hi Justin, Thank you for your review. Please check v9 of the patchset [1]. On Wed, Apr 24, 2024 at 11:26 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > This patch also/already fixes the schema issue I reported. Thanks. > > If you wanted to include a test case for that: > > begin; > CREATE SCHEMA s; > CREATE SCHEMA t; > CREATE TABLE p(i int) PARTITION BY RANGE(i); > CREATE TABLE s.c1 PARTITION OF p FOR VALUES FROM (1)TO(2); > CREATE TABLE s.c2 PARTITION OF p FOR VALUES FROM (2)TO(3); > ALTER TABLE p MERGE PARTITIONS (s.c1, s.c2) INTO s.c1; -- misbehaves if merging into the same name as an existing partition > \d+ p > ... > Partitions: c1 FOR VALUES FROM (1) TO (3) There is already a test which checks merging into the same name as an existing partition. And there are tests with schema-qualified names. I'm not yet convinced we need a test with both these properties together. > > 0002 > > The persistence of the new partition is copied as suggested in [1]. > > But the checks are in-place, because search_path could influence new > > table persistence. Per review [2], commit message typos are fixed, > > documentation is revised, revised tests to cover schema-qualification, > > usage of search_path. > > Subject: [PATCH v8 2/7] Make new partitions with parent's persistence during MERGE/SPLIT operations > > This patch adds documentation saying: > + Any indexes, constraints and user-defined row-level triggers that exist > + in the parent table are cloned on new partitions [...] > > Which is good to say, and addresses part of my message [0] > [0] ZiJW1g2nbQs9ekwK@pryzbyj2023 > > But it doesn't have anything to do with "creating new partitions with > parent's persistence". Maybe there was a merge conflict and the docs > ended up in the wrong patch ? Makes sense. Extracted this into a separate patch in v10. > Also, defaults, storage options, compression are also copied. As will > be anything else from LIKE. And since anything added in the future will > also be copied, maybe it's better to just say that the tables will be > created the same way as "LIKE .. INCLUDING ALL EXCLUDING ..", or > similar. Otherwise, the next person who adds a new option for LIKE > would have to remember to update this paragraph... Reworded that way. Thank you. > Also, extended stats objects are currently cloned to new child tables. > But I suggested in [0] that they probably shouldn't be. I will explore this. Do we copy extended stats when we do CREATE TABLE ... PARTITION OF? I think we need to do the same here. > > 007 – doc review by Justin [3] > > I suggest to drop this patch for now. I'll send some more minor fixes to > docs and code comments once the other patches are settled. Your edits are welcome. Dropped this for now. And waiting for the next revision from you. Links. 1. https://www.postgresql.org/message-id/CAPpHfduYuYECrqpHMgcOsNr%2B4j3uJK%2BJPUJ_zDBn-tqjjh3p1Q%40mail.gmail.com ------ Regards, Alexander Korotkov Supabase
Hello, 28.04.2024 03:59, Alexander Korotkov wrote: > The revised patchset is attached. I'm going to push it if there are > no objections. I have one additional question regarding security, if you don't mind: What permissions should a user have to perform split/merge? When we deal with mixed ownership, say, bob is an owner of a partitioned table, but not an owner of a partition, should we allow him to perform merge with that partition? Consider the following script: CREATE ROLE alice; GRANT CREATE ON SCHEMA public TO alice; SET SESSION AUTHORIZATION alice; CREATE TABLE t (i int PRIMARY KEY, t text, u text) PARTITION BY RANGE (i); CREATE TABLE tp_00 PARTITION OF t FOR VALUES FROM (0) TO (10); CREATE TABLE tp_10 PARTITION OF t FOR VALUES FROM (10) TO (20); CREATE POLICY p1 ON tp_00 USING (u = current_user); ALTER TABLE tp_00 ENABLE ROW LEVEL SECURITY; INSERT INTO t(i, t, u) VALUES (0, 'info for bob', 'bob'); INSERT INTO t(i, t, u) VALUES (1, 'info for alice', 'alice'); RESET SESSION AUTHORIZATION; CREATE ROLE bob; GRANT CREATE ON SCHEMA public TO bob; ALTER TABLE t OWNER TO bob; GRANT SELECT ON TABLE tp_00 TO bob; SET SESSION AUTHORIZATION bob; SELECT * FROM tp_00; --- here bob can see his info only \d Schema | Name | Type | Owner --------+-------+-------------------+------- public | t | partitioned table | bob public | tp_00 | table | alice public | tp_10 | table | alice -- but then bob can do: ALTER TABLE t MERGE PARTITIONS (tp_00, tp_10) INTO tp_00; -- (yes, he also can detach the partition tp_00, but then he couldn't -- re-attach nor read it) \d Schema | Name | Type | Owner --------+-------+-------------------+------- public | t | partitioned table | bob public | tp_00 | table | bob Thus bob effectively have captured the partition with the data. What do you think, does this create a new security risk? Best regards, Alexander
On Sun, Apr 28, 2024 at 2:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > 28.04.2024 03:59, Alexander Korotkov wrote: > > The revised patchset is attached. I'm going to push it if there are > > no objections. > > I have one additional question regarding security, if you don't mind: > What permissions should a user have to perform split/merge? > > When we deal with mixed ownership, say, bob is an owner of a > partitioned table, but not an owner of a partition, should we > allow him to perform merge with that partition? > Consider the following script: > CREATE ROLE alice; > GRANT CREATE ON SCHEMA public TO alice; > > SET SESSION AUTHORIZATION alice; > CREATE TABLE t (i int PRIMARY KEY, t text, u text) PARTITION BY RANGE (i); > CREATE TABLE tp_00 PARTITION OF t FOR VALUES FROM (0) TO (10); > CREATE TABLE tp_10 PARTITION OF t FOR VALUES FROM (10) TO (20); > > CREATE POLICY p1 ON tp_00 USING (u = current_user); > ALTER TABLE tp_00 ENABLE ROW LEVEL SECURITY; > > INSERT INTO t(i, t, u) VALUES (0, 'info for bob', 'bob'); > INSERT INTO t(i, t, u) VALUES (1, 'info for alice', 'alice'); > RESET SESSION AUTHORIZATION; > > CREATE ROLE bob; > GRANT CREATE ON SCHEMA public TO bob; > ALTER TABLE t OWNER TO bob; > GRANT SELECT ON TABLE tp_00 TO bob; > > SET SESSION AUTHORIZATION bob; > SELECT * FROM tp_00; > --- here bob can see his info only > \d > Schema | Name | Type | Owner > --------+-------+-------------------+------- > public | t | partitioned table | bob > public | tp_00 | table | alice > public | tp_10 | table | alice > > -- but then bob can do: > ALTER TABLE t MERGE PARTITIONS (tp_00, tp_10) INTO tp_00; > -- (yes, he also can detach the partition tp_00, but then he couldn't > -- re-attach nor read it) > > \d > Schema | Name | Type | Owner > --------+-------+-------------------+------- > public | t | partitioned table | bob > public | tp_00 | table | bob > > Thus bob effectively have captured the partition with the data. > > What do you think, does this create a new security risk? Alexander, thank you for discovering this. I believe that the one who merges partitions should have permissions for all the partitions merged. I'll recheck this and provide the patch. ------ Regards, Alexander Korotkov
On Sun, Apr 28, 2024 at 04:04:54AM +0300, Alexander Korotkov wrote: > Hi Justin, > > Thank you for your review. Please check v9 of the patchset [1]. > > On Wed, Apr 24, 2024 at 11:26 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > This patch also/already fixes the schema issue I reported. Thanks. > > > > If you wanted to include a test case for that: > > > > begin; > > CREATE SCHEMA s; > > CREATE SCHEMA t; > > CREATE TABLE p(i int) PARTITION BY RANGE(i); > > CREATE TABLE s.c1 PARTITION OF p FOR VALUES FROM (1)TO(2); > > CREATE TABLE s.c2 PARTITION OF p FOR VALUES FROM (2)TO(3); > > ALTER TABLE p MERGE PARTITIONS (s.c1, s.c2) INTO s.c1; -- misbehaves if merging into the same name as an existing partition > > \d+ p > > ... > > Partitions: c1 FOR VALUES FROM (1) TO (3) > > There is already a test which checks merging into the same name as an > existing partition. And there are tests with schema-qualified names. > I'm not yet convinced we need a test with both these properties > together. I mentioned that the combination of schemas and merge-into-same-name is what currently doesn't work right. > > Also, extended stats objects are currently cloned to new child tables. > > But I suggested in [0] that they probably shouldn't be. > > I will explore this. Do we copy extended stats when we do CREATE > TABLE ... PARTITION OF? I think we need to do the same here. Right, they're not copied because an extended stats objs on the parent does something different than putting stats objects on each child. I've convinced myself that it's wrong to copy the parent's stats obj. If someone wants stats objects on each child, they'll have to handle them specially after MERGE/SPLIT, just as they would for per-child defaults/constraints/etc. On Sun, Apr 28, 2024 at 04:04:54AM +0300, Alexander Korotkov wrote: > On Wed, Apr 24, 2024 at 11:26 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > This patch adds documentation saying: > > + Any indexes, constraints and user-defined row-level triggers that exist > > + in the parent table are cloned on new partitions [...] > > > > Which is good to say, and addresses part of my message [0] > > [0] ZiJW1g2nbQs9ekwK@pryzbyj2023 > > Makes sense. Extracted this into a separate patch in v10. I adjusted the language some and fixed a typo in the commit message. s/parition/partition/ -- Justin
Attachment
On Sunday, April 28, 2024, Alexander Lakhin <exclusion@gmail.com> wrote:
When we deal with mixed ownership, say, bob is an owner of a
partitioned table, but not an owner of a partition, should we
allow him to perform merge with that partition?
IIUC Merge causes the source tables to be dropped, their data having been effectively moved into the new partition. bob must not be allowed to drop Alice’s tables. Only an owner may do that. So if we do allow bob to build a new partition using his select access, the tables he selected from would have to remain behind if he is not an owner of them.
David J.
On Sunday, April 28, 2024, Alexander Lakhin <exclusion@gmail.com> wrote:
When we deal with mixed ownership, say, bob is an owner of a
partitioned table, but not an owner of a partition, should we
allow him to perform merge with that partition?
Attaching via alter table requires the user to own both the partitioned table and the table being acted upon. Merge needs to behave similarly.
The fact that we let the superuser break the requirement of common ownership is unfortunate but I guess understandable. But given the existing behavior of attach merge should likewise fail if it find the user doesn’t own the partitions being merged. The fact that the user can select from those tables can be acted upon manually if desired; these administrative commands should all ensure common ownership and fail if that precondition is not met.
David J.
On Sun, Apr 28, 2024 at 08:18:42AM -0500, Justin Pryzby wrote: > > I will explore this. Do we copy extended stats when we do CREATE > > TABLE ... PARTITION OF? I think we need to do the same here. > > Right, they're not copied because an extended stats objs on the parent > does something different than putting stats objects on each child. > I've convinced myself that it's wrong to copy the parent's stats obj. > If someone wants stats objects on each child, they'll have to handle > them specially after MERGE/SPLIT, just as they would for per-child > defaults/constraints/etc. I dug up this thread, in which the idea of copying extended stats from parent to child was considered some 6 years ago, but never implemented; for consistency, MERGE/SPLIT shouldn't copy extended stats, either. https://www.postgresql.org/message-id/20180305195750.aecbpihhcvuskzba%40alvherre.pgsql -- Justin
Hi Dmitry, 19.04.2024 02:26, Dmitry Koval wrote: > > 18.04.2024 19:00, Alexander Lakhin wrote: >> leaves a strange constraint: >> \d+ t* >> Table "public.tp_0" >> ... >> Not-null constraints: >> "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i" > > Thanks! > Attached fix (with test) for this case. > The patch should be applied after patches > v6-0001- ... .patch ... v6-0004- ... .patch I still wonder, why that constraint (now with a less questionable name) is created during MERGE? That is, before MERGE, two partitions have only PRIMARY KEY indexes, with no not-null constraint, and you can manually remove the constraint after MERGE, so maybe it's not necessary... Best regards, Alexander
Hi! 1. 29.04.2024 21:00, Alexander Lakhin wrote: > I still wonder, why that constraint (now with a less questionable name) is > created during MERGE? The SPLIT/MERGE PARTITION(S) commands for creating partitions reuse the existing code of CREATE TABLE .. LIKE ... command. A new partition was created with the name "merge-16385-26BCB0-tmp" (since there was an old partition with the same name). The constraint "merge-16385-26BCB0-tmp_i_not_null" was created too together with the partition. Subsequently, the table was renamed, but the constraint was not. Now a new partition is immediately created with the correct name (the old partition is renamed). 2. Just in case, I am attaching a small fix v9_fix.diff for situation [1]. [1] https://www.postgresql.org/message-id/0520c72e-8d97-245e-53f9-173beca2ab2e%40gmail.com -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
30.04.2024 03:10, Dmitry Koval wrote: > Hi! > > 1. > 29.04.2024 21:00, Alexander Lakhin wrote: >> I still wonder, why that constraint (now with a less questionable name) is >> created during MERGE? > > The SPLIT/MERGE PARTITION(S) commands for creating partitions reuse the existing code of CREATE TABLE .. LIKE ... > command. A new partition was created with the name "merge-16385-26BCB0-tmp" (since there was an old partition with the > same name). The constraint "merge-16385-26BCB0-tmp_i_not_null" was created too together with the partition. > Subsequently, the table was renamed, but the constraint was not. > Now a new partition is immediately created with the correct name (the old partition is renamed). Maybe I'm doing something wrong, but the following script: CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i); CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2); CREATE TABLE t2 (LIKE t INCLUDING ALL); CREATE TABLE tp2 (LIKE tp_0 INCLUDING ALL); creates tables t2, tp2 without not-null constraints. But after ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0; I see: \d+ tp_0 ... Indexes: "tp_0_pkey" PRIMARY KEY, btree (i) Not-null constraints: "tp_0_i_not_null" NOT NULL "i" Best regards, Alexander
On Thu, Apr 11, 2024 at 08:00:00PM +0300, Alexander Lakhin wrote: > 11.04.2024 16:27, Dmitry Koval wrote: > > > > Added correction (and test), see v3-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch. > > Thank you for the correction, but may be an attempt to merge into implicit > pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does? > > Please look also at another anomaly with schemas: > CREATE SCHEMA s1; > CREATE TABLE t (i int) PARTITION BY RANGE (i); > CREATE TABLE tp_0_2 PARTITION OF t > FOR VALUES FROM (0) TO (2); > ALTER TABLE t SPLIT PARTITION tp_0_2 INTO > (PARTITION s1.tp0 FOR VALUES FROM (0) TO (1), PARTITION s1.tp1 FOR VALUES FROM (1) TO (2)); > results in: > \d+ s1.* > Did not find any relation named "s1.*" > \d+ tp* > Table "public.tp0" Hi, Is this issue already fixed ? I wasn't able to reproduce it. Maybe it only happened with earlier patch versions applied ? Thanks, -- Justin
Hi! 30.04.2024 6:00, Alexander Lakhin пишет: > Maybe I'm doing something wrong, but the following script: > CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i); > CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1); > CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2); > > CREATE TABLE t2 (LIKE t INCLUDING ALL); > CREATE TABLE tp2 (LIKE tp_0 INCLUDING ALL); > creates tables t2, tp2 without not-null constraints. To create partitions is used the "CREATE TABLE ... LIKE ..." command with the "EXCLUDING INDEXES" modifier (to speed up the insertion of values). CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE(i); CREATE TABLE t2 (LIKE t INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY); \d+ t2; ... Not-null constraints: "t2_i_not_null" NOT NULL "i" Access method: heap [1] https://github.com/postgres/postgres/blob/d12b4ba1bd3eedd862064cf1dad5ff107c5cba90/src/backend/commands/tablecmds.c#L21215 -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Hi! 30.04.2024 23:15, Justin Pryzby пишет: > Is this issue already fixed ? > I wasn't able to reproduce it. Maybe it only happened with earlier > patch versions applied ? I think this was fixed in commit [1]. [1] https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91 -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
On Wed, May 01, 2024 at 10:51:24PM +0300, Dmitry Koval wrote: > Hi! > > 30.04.2024 23:15, Justin Pryzby пишет: > > Is this issue already fixed ? > > I wasn't able to reproduce it. Maybe it only happened with earlier > > patch versions applied ? > > I think this was fixed in commit [1]. > > [1] https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91 I tried to reproduce it at fcf80c5d5f~, but couldn't. I don't see how that patch would fix it anyway. I'm hoping Alexander can confirm what happened. The other remaining issues I'm aware of are for EXCLUDING STATISTICS and refusing to ALTER if the owners don't match. Note that the error that led to "EXCLUDING IDENTITY" is being discused over here: https://www.postgresql.org/message-id/3b8a9dc1-bbc7-0ef5-6863-c432afac7d59@gmail.com It's possible that once that's addressed, the exclusion should be removed here, too. -- Justin
On Fri, May 3, 2024 at 4:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Wed, May 01, 2024 at 10:51:24PM +0300, Dmitry Koval wrote: > > 30.04.2024 23:15, Justin Pryzby пишет: > > > Is this issue already fixed ? > > > I wasn't able to reproduce it. Maybe it only happened with earlier > > > patch versions applied ? > > > > I think this was fixed in commit [1]. > > > > [1] https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91 > > I tried to reproduce it at fcf80c5d5f~, but couldn't. > I don't see how that patch would fix it anyway. > I'm hoping Alexander can confirm what happened. This problem is only relevant for an old version of fix [1], which overrides schemas for new partitions. That version was never committed. > The other remaining issues I'm aware of are for EXCLUDING STATISTICS and > refusing to ALTER if the owners don't match. These two are in my list. I'm planning to work on them in the next few days. > Note that the error that led to "EXCLUDING IDENTITY" is being discused > over here: > https://www.postgresql.org/message-id/3b8a9dc1-bbc7-0ef5-6863-c432afac7d59@gmail.com > > It's possible that once that's addressed, the exclusion should be > removed here, too. +1 Links. 1. https://www.postgresql.org/message-id/edfbd846-dcc1-42d1-ac26-715691b687d3%40postgrespro.ru ------ Regards, Alexander Korotkov Supabase
On Fri, May 3, 2024 at 4:32 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Fri, May 3, 2024 at 4:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Wed, May 01, 2024 at 10:51:24PM +0300, Dmitry Koval wrote: > > > 30.04.2024 23:15, Justin Pryzby пишет: > > > > Is this issue already fixed ? > > > > I wasn't able to reproduce it. Maybe it only happened with earlier > > > > patch versions applied ? > > > > > > I think this was fixed in commit [1]. > > > > > > [1] https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91 > > > > I tried to reproduce it at fcf80c5d5f~, but couldn't. > > I don't see how that patch would fix it anyway. > > I'm hoping Alexander can confirm what happened. > > This problem is only relevant for an old version of fix [1], which > overrides schemas for new partitions. That version was never > committed. Here are the patches. 0001 Adds permission checks on the partitions before doing MERGE/SPLIT 0002 Skips copying extended statistics while creating new partitions in MERGE/SPLIT 0001 looks quite simple and trivial for me. I'm going to push it if no objections. For 0002 I'd like to hear some feedback on wordings used in docs and comments. ------ Regards, Alexander Korotkov Supabase
Attachment
On Wed, May 1, 2024 at 12:14 AM Dmitry Koval <d.koval@postgrespro.ru> wrote:
> 30.04.2024 6:00, Alexander Lakhin пишет:
> > Maybe I'm doing something wrong, but the following script:
> > CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i);
> > CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1);
> > CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2);
> >
> > CREATE TABLE t2 (LIKE t INCLUDING ALL);
> > CREATE TABLE tp2 (LIKE tp_0 INCLUDING ALL);
> > creates tables t2, tp2 without not-null constraints.
>
> To create partitions is used the "CREATE TABLE ... LIKE ..." command
> with the "EXCLUDING INDEXES" modifier (to speed up the insertion of values).
>
> CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE(i);
> CREATE TABLE t2 (LIKE t INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY);
> \d+ t2;
> ...
> Not-null constraints:
> "t2_i_not_null" NOT NULL "i"
> Access method: heap
I've explored this a little bit more.
If the parent table has explicit not null constraint than results of MERGE/SPLIT look the same as result of CREATE TABLE ... PARTITION OF. In every case there is explicit not null constraint in all the cases.
# CREATE TABLE t (i int not null, PRIMARY KEY(i)) PARTITION BY RANGE(i);
> 30.04.2024 6:00, Alexander Lakhin пишет:
> > Maybe I'm doing something wrong, but the following script:
> > CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i);
> > CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1);
> > CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2);
> >
> > CREATE TABLE t2 (LIKE t INCLUDING ALL);
> > CREATE TABLE tp2 (LIKE tp_0 INCLUDING ALL);
> > creates tables t2, tp2 without not-null constraints.
>
> To create partitions is used the "CREATE TABLE ... LIKE ..." command
> with the "EXCLUDING INDEXES" modifier (to speed up the insertion of values).
>
> CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE(i);
> CREATE TABLE t2 (LIKE t INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY);
> \d+ t2;
> ...
> Not-null constraints:
> "t2_i_not_null" NOT NULL "i"
> Access method: heap
I've explored this a little bit more.
If the parent table has explicit not null constraint than results of MERGE/SPLIT look the same as result of CREATE TABLE ... PARTITION OF. In every case there is explicit not null constraint in all the cases.
# CREATE TABLE t (i int not null, PRIMARY KEY(i)) PARTITION BY RANGE(i);
# \d+ t
Partitioned table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition key: RANGE (i)
Indexes:
"t_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"t_i_not_null" NOT NULL "i"
Partitioned table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition key: RANGE (i)
Indexes:
"t_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"t_i_not_null" NOT NULL "i"
Number of partitions: 0
# CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
# \d+ tp_0_2
Table "public.tp_0_2"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition of: t FOR VALUES FROM (0) TO (2)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 2))
Indexes:
"tp_0_2_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"t_i_not_null" NOT NULL "i" (inherited)
Access method: heap
# ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
# (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
# PARTITION tp_1_2 FOR VALUES FROM (1) TO (2))
# \d+ tp_0_1
Table "public.tp_0_1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition of: t FOR VALUES FROM (0) TO (1)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 1))
Indexes:
"tp_0_1_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"t_i_not_null" NOT NULL "i" (inherited)
Access method: heap
However, if not null constraint is implicit and derived from primary key, the situation is different. The partition created by CREATE TABLE ... PARTITION OF doesn't have explicit not null constraint just like the parent. But the partition created by MERGE/SPLIT has explicit not null contraint.
# CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
# \d+ tp_0_2
Table "public.tp_0_2"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition of: t FOR VALUES FROM (0) TO (2)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 2))
Indexes:
"tp_0_2_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"t_i_not_null" NOT NULL "i" (inherited)
Access method: heap
# ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
# (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
# PARTITION tp_1_2 FOR VALUES FROM (1) TO (2))
# \d+ tp_0_1
Table "public.tp_0_1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition of: t FOR VALUES FROM (0) TO (1)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 1))
Indexes:
"tp_0_1_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"t_i_not_null" NOT NULL "i" (inherited)
Access method: heap
However, if not null constraint is implicit and derived from primary key, the situation is different. The partition created by CREATE TABLE ... PARTITION OF doesn't have explicit not null constraint just like the parent. But the partition created by MERGE/SPLIT has explicit not null contraint.
# CREATE TABLE t (i int not null, PRIMARY KEY(i)) PARTITION BY RANGE(i);
------
Regards,
Alexander Korotkov
Supabase
# \d+ t
Partitioned table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition key: RANGE (i)
Indexes:
"t_pkey" PRIMARY KEY, btree (i)
Number of partitions: 0
Partitioned table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition key: RANGE (i)
Indexes:
"t_pkey" PRIMARY KEY, btree (i)
Number of partitions: 0
# CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
# \d+ tp_0_2
Table "public.tp_0_2"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition of: t FOR VALUES FROM (0) TO (2)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 2))
Indexes:
"tp_0_2_pkey" PRIMARY KEY, btree (i)
Access method: heap
# \d+ tp_0_2
Table "public.tp_0_2"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition of: t FOR VALUES FROM (0) TO (2)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 2))
Indexes:
"tp_0_2_pkey" PRIMARY KEY, btree (i)
Access method: heap
# ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
# (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
# PARTITION tp_1_2 FOR VALUES FROM (1) TO (2))
# (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
# PARTITION tp_1_2 FOR VALUES FROM (1) TO (2))
# \d+ tp_0_1
Table "public.tp_0_1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition of: t FOR VALUES FROM (0) TO (1)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 1))
Indexes:
"tp_0_1_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"tp_0_1_i_not_null" NOT NULL "i"
Access method: heap
Table "public.tp_0_1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Partition of: t FOR VALUES FROM (0) TO (1)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 1))
Indexes:
"tp_0_1_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"tp_0_1_i_not_null" NOT NULL "i"
Access method: heap
I think this is related to the fact that we create indexes later. The same applies to CREATE TABLE ... LIKE. If we create indexes immediately, not explicit not null contraints are created. Not if we do without indexes, we have an explicit not null constraint.
# CREATE TABLE t2 (LIKE t INCLUDING ALL);
# \d+ t2
Table "public.t2"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Not-null constraints:
"t2_i_not_null" NOT NULL "i"
Access method: heap
Table "public.t2"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Not-null constraints:
"t2_i_not_null" NOT NULL "i"
Access method: heap
# CREATE TABLE t3 (LIKE t INCLUDING ALL EXCLUDING IDENTITY);
# \d+ t3
Table "public.t3"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Indexes:
"t3_pkey" PRIMARY KEY, btree (i)
Access method: heap
Table "public.t3"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
i | integer | | not null | | plain | | |
Indexes:
"t3_pkey" PRIMARY KEY, btree (i)
Access method: heap
I think this is feasible to avoid. However, it's minor and we exactly documented how we create new partitions. So, I think it works "as documented" and we don't have to fix this for v17.
Regards,
Alexander Korotkov
Supabase
On Wed, May 08, 2024 at 09:00:10PM +0300, Alexander Korotkov wrote: > On Fri, May 3, 2024 at 4:32 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Fri, May 3, 2024 at 4:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > On Wed, May 01, 2024 at 10:51:24PM +0300, Dmitry Koval wrote: > > > > 30.04.2024 23:15, Justin Pryzby пишет: > > > > > Is this issue already fixed ? > > > > > I wasn't able to reproduce it. Maybe it only happened with earlier > > > > > patch versions applied ? > > > > > > > > I think this was fixed in commit [1]. > > > > > > > > [1] https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91 > > > > > > I tried to reproduce it at fcf80c5d5f~, but couldn't. > > > I don't see how that patch would fix it anyway. > > > I'm hoping Alexander can confirm what happened. > > > > This problem is only relevant for an old version of fix [1], which > > overrides schemas for new partitions. That version was never > > committed. > > Here are the patches. > 0002 Skips copying extended statistics while creating new partitions in MERGE/SPLIT > > For 0002 I'd like to hear some feedback on wordings used in docs and comments. commit message: Currenlty => Currently partiions => partitios copying => by copying > However, parent's table extended statistics already covers all its > children. => That's the wrong explanation. It's not that "stats on the parent table cover its children". It's that there are two types of stats: stats for the "table hierarchy" and stats for the individual table. That's true for single-column stats as well as for extended stats. In both cases, that's indicated by the inh flag in the code and in the catalog. The right explanation is that extended stats on partitioned tables are not similar to indexes. Indexes on parent table are nothing other than a mechanism to create indexes on the child tables. That's not true for stats. See also my prior messages ZiJW1g2nbQs9ekwK@pryzbyj2023 Zi5Msg74C61DjJKW@pryzbyj2023 I think EXCLUDE IDENTITY can/should now also be removed - see 509199587. I'm not able to reproduce that problem anyway, even before that... -- Justin
On Thu, May 9, 2024 at 12:37 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Wed, May 08, 2024 at 09:00:10PM +0300, Alexander Korotkov wrote: > > On Fri, May 3, 2024 at 4:32 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > > On Fri, May 3, 2024 at 4:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Wed, May 01, 2024 at 10:51:24PM +0300, Dmitry Koval wrote: > > > > > 30.04.2024 23:15, Justin Pryzby пишет: > > > > > > Is this issue already fixed ? > > > > > > I wasn't able to reproduce it. Maybe it only happened with earlier > > > > > > patch versions applied ? > > > > > > > > > > I think this was fixed in commit [1]. > > > > > > > > > > [1] https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91 > > > > > > > > I tried to reproduce it at fcf80c5d5f~, but couldn't. > > > > I don't see how that patch would fix it anyway. > > > > I'm hoping Alexander can confirm what happened. > > > > > > This problem is only relevant for an old version of fix [1], which > > > overrides schemas for new partitions. That version was never > > > committed. > > > > Here are the patches. > > 0002 Skips copying extended statistics while creating new partitions in MERGE/SPLIT > > > > For 0002 I'd like to hear some feedback on wordings used in docs and comments. > > commit message: > > Currenlty => Currently > partiions => partitios > copying => by copying Thank you! > > > However, parent's table extended statistics already covers all its > > children. > > => That's the wrong explanation. It's not that "stats on the parent > table cover its children". It's that there are two types of stats: > stats for the "table hierarchy" and stats for the individual table. > That's true for single-column stats as well as for extended stats. > In both cases, that's indicated by the inh flag in the code and in the > catalog. > > The right explanation is that extended stats on partitioned tables are > not similar to indexes. Indexes on parent table are nothing other than > a mechanism to create indexes on the child tables. That's not true for > stats. > > See also my prior messages > ZiJW1g2nbQs9ekwK@pryzbyj2023 > Zi5Msg74C61DjJKW@pryzbyj2023 Yes, I understand that parents pg_statistic entry with stainherit == true includes statistics for the children. I tried to express this by word "covers". But you're right, this is the wrong explanation. Can I, please, ask you to revise the patch? > I think EXCLUDE IDENTITY can/should now also be removed - see 509199587. > I'm not able to reproduce that problem anyway, even before that... I will check this. ------ Regards, Alexander Korotkov Supabase
Hello Dmitry and Alexander, Please look at one more anomaly with temporary tables: CREATE TEMP TABLE t (a int) PARTITION BY RANGE (a); CREATE TEMP TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1) ; CREATE TEMP TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2); ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0; -- succeeds, but: ALTER TABLE t SPLIT PARTITION tp_0 INTO (PARTITION tp_0 FOR VALUES FROM (0) TO (1), PARTITION tp_1 FOR VALUES FROM (1) TO (2)); -- fails with: ERROR: relation "tp_0" already exists Though the same SPLIT succeeds with non-temporary tables... Best regards, Alexander
Hi! 11.05.2024 12:00, Alexander Lakhin wrote: > Please look at one more anomaly with temporary tables: Thank you, Alexander! The problem affects the SPLIT PARTITION command. CREATE TEMP TABLE t (a int) PARTITION BY RANGE (a); CREATE TEMP TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (2) ; -- ERROR: relation "tp_0" already exists ALTER TABLE t SPLIT PARTITION tp_0 INTO (PARTITION tp_0 FOR VALUES FROM (0) TO (1), PARTITION tp_1 FOR VALUES FROM (1) TO (2)); I'll try to fix it soon. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Hi! Attached draft version of fix for [1]. [1] https://www.postgresql.org/message-id/86b4f1e3-0b5d-315c-9225-19860d64d685%40gmail.com -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
Commit 3ca43dbbb67f which adds the permission checks seems to cause conflicts in the pg_upgrade tests: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2024-05-13%2008%3A36%3A37 There is an issue with dropping and creating roles which seems to stem from this commit: CREATE ROLE regress_partition_merge_alice; +ERROR: role "regress_partition_merge_alice" already exists -- Daniel Gustafsson
Hi! 13.05.2024 11:45, Daniel Gustafsson пишет: > Commit 3ca43dbbb67f which adds the permission checks seems to cause conflicts > in the pg_upgrade tests Thanks! It will probably be enough to rename the roles: regress_partition_merge_alice -> regress_partition_split_alice regress_partition_merge_bob -> regress_partition_split_bob (changes in attachment) -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Mon, May 13, 2024 at 12:45 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: > 13.05.2024 11:45, Daniel Gustafsson пишет: > > Commit 3ca43dbbb67f which adds the permission checks seems to cause conflicts > > in the pg_upgrade tests > > Thanks! > > It will probably be enough to rename the roles: > > regress_partition_merge_alice -> regress_partition_split_alice > regress_partition_merge_bob -> regress_partition_split_bob Thanks to Danial for spotting this. Thanks to Dmitry for the proposed fix. The actual problem appears to be a bit more complex. Additionally to the role names, the lack of permissions on schemas lead to creation of tables in public schema and potential conflict there. Fixed in 2a679ae94e. ------ Regards, Alexander Korotkov
On Thu, May 09, 2024 at 12:51:32AM +0300, Alexander Korotkov wrote: > > > However, parent's table extended statistics already covers all its > > > children. > > > > => That's the wrong explanation. It's not that "stats on the parent > > table cover its children". It's that there are two types of stats: > > stats for the "table hierarchy" and stats for the individual table. > > That's true for single-column stats as well as for extended stats. > > In both cases, that's indicated by the inh flag in the code and in the > > catalog. > > > > The right explanation is that extended stats on partitioned tables are > > not similar to indexes. Indexes on parent table are nothing other than > > a mechanism to create indexes on the child tables. That's not true for > > stats. > > > > See also my prior messages > > ZiJW1g2nbQs9ekwK@pryzbyj2023 > > Zi5Msg74C61DjJKW@pryzbyj2023 > > Yes, I understand that parents pg_statistic entry with stainherit == > true includes statistics for the children. I tried to express this by > word "covers". But you're right, this is the wrong explanation. > > Can I, please, ask you to revise the patch? I tried to make this clear but it'd be nice if someone (Tomas/Alvaro?) would check that this says what's wanted. -- Justin
Attachment
On Tue, May 14, 2024 at 5:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Thu, May 09, 2024 at 12:51:32AM +0300, Alexander Korotkov wrote: > > > > However, parent's table extended statistics already covers all its > > > > children. > > > > > > => That's the wrong explanation. It's not that "stats on the parent > > > table cover its children". It's that there are two types of stats: > > > stats for the "table hierarchy" and stats for the individual table. > > > That's true for single-column stats as well as for extended stats. > > > In both cases, that's indicated by the inh flag in the code and in the > > > catalog. > > > > > > The right explanation is that extended stats on partitioned tables are > > > not similar to indexes. Indexes on parent table are nothing other than > > > a mechanism to create indexes on the child tables. That's not true for > > > stats. > > > > > > See also my prior messages > > > ZiJW1g2nbQs9ekwK@pryzbyj2023 > > > Zi5Msg74C61DjJKW@pryzbyj2023 > > > > Yes, I understand that parents pg_statistic entry with stainherit == > > true includes statistics for the children. I tried to express this by > > word "covers". But you're right, this is the wrong explanation. > > > > Can I, please, ask you to revise the patch? > > I tried to make this clear but it'd be nice if someone (Tomas/Alvaro?) > would check that this says what's wanted. Thank you! I've assembled the patches with the pending fixes. 0001 – The patch by Dmitry Koval for fixing detection of name collision in SPLIT partition operation. Also, I found that name collision detection doesn't work well for MERGE partitions. I've added fix for that to this patch as well. 0002 -– Patch for skipping copy of extended statistics. I would appreciate more feedback about wording, but I'd like to get a correct behavior into the source tree sooner. If the docs and/or comments need further improvements, we can fix that later. I'm going to push both if no objections. Links. 1. https://www.postgresql.org/message-id/147426d9-b793-4571-a5e5-7438affeeb5a%40postgrespro.ru ------ Regards, Alexander Korotkov Supabase
Attachment
Hi, Alexander:
On Fri, 17 May 2024 at 14:05, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Tue, May 14, 2024 at 5:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Thu, May 09, 2024 at 12:51:32AM +0300, Alexander Korotkov wrote:
> > > > However, parent's table extended statistics already covers all its
> > > > children.
> > >
> > > => That's the wrong explanation. It's not that "stats on the parent
> > > table cover its children". It's that there are two types of stats:
> > > stats for the "table hierarchy" and stats for the individual table.
> > > That's true for single-column stats as well as for extended stats.
> > > In both cases, that's indicated by the inh flag in the code and in the
> > > catalog.
> > >
> > > The right explanation is that extended stats on partitioned tables are
> > > not similar to indexes. Indexes on parent table are nothing other than
> > > a mechanism to create indexes on the child tables. That's not true for
> > > stats.
> > >
> > > See also my prior messages
> > > ZiJW1g2nbQs9ekwK@pryzbyj2023
> > > Zi5Msg74C61DjJKW@pryzbyj2023
> >
> > Yes, I understand that parents pg_statistic entry with stainherit ==
> > true includes statistics for the children. I tried to express this by
> > word "covers". But you're right, this is the wrong explanation.
> >
> > Can I, please, ask you to revise the patch?
>
> I tried to make this clear but it'd be nice if someone (Tomas/Alvaro?)
> would check that this says what's wanted.
Thank you!
I've assembled the patches with the pending fixes.
0001 – The patch by Dmitry Koval for fixing detection of name
collision in SPLIT partition operation. Also, I found that name
collision detection doesn't work well for MERGE partitions. I've
added fix for that to this patch as well.
0002 -– Patch for skipping copy of extended statistics. I would
appreciate more feedback about wording, but I'd like to get a correct
behavior into the source tree sooner. If the docs and/or comments
need further improvements, we can fix that later.
I'm going to push both if no objections.
Thank you for working on this patch set!
Some minor things:
0001:
partition_split.sql
157 +-- Check that detection, that the new partition has the same name as one of158 +-- the merged partitions, works correctly for temporary partitions
Test for split with comment for merge. Maybe better something like:
"Split partition of a temporary table when one of the partitions after split has the same name as the partition being split"
0002:
analgous -> analogous (maybe better using "like" instead of "analogous to")
heirarchy -> hierarchy
alter_table.sgml:
Maybe in documentation it's better not to provide reasoning, just state how it works:
for consistency with <command>CREATE TABLE PARTITION OF</command> -> similar to <command>CREATE TABLE PARTITION OF</command>
Regards,
Pavel Borisov
Hi, Pavel! On Fri, May 17, 2024 at 2:02 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > On Fri, 17 May 2024 at 14:05, Alexander Korotkov <aekorotkov@gmail.com> wrote: >> >> On Tue, May 14, 2024 at 5:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote: >> > On Thu, May 09, 2024 at 12:51:32AM +0300, Alexander Korotkov wrote: >> > > > > However, parent's table extended statistics already covers all its >> > > > > children. >> > > > >> > > > => That's the wrong explanation. It's not that "stats on the parent >> > > > table cover its children". It's that there are two types of stats: >> > > > stats for the "table hierarchy" and stats for the individual table. >> > > > That's true for single-column stats as well as for extended stats. >> > > > In both cases, that's indicated by the inh flag in the code and in the >> > > > catalog. >> > > > >> > > > The right explanation is that extended stats on partitioned tables are >> > > > not similar to indexes. Indexes on parent table are nothing other than >> > > > a mechanism to create indexes on the child tables. That's not true for >> > > > stats. >> > > > >> > > > See also my prior messages >> > > > ZiJW1g2nbQs9ekwK@pryzbyj2023 >> > > > Zi5Msg74C61DjJKW@pryzbyj2023 >> > > >> > > Yes, I understand that parents pg_statistic entry with stainherit == >> > > true includes statistics for the children. I tried to express this by >> > > word "covers". But you're right, this is the wrong explanation. >> > > >> > > Can I, please, ask you to revise the patch? >> > >> > I tried to make this clear but it'd be nice if someone (Tomas/Alvaro?) >> > would check that this says what's wanted. >> >> Thank you! >> >> I've assembled the patches with the pending fixes. >> 0001 – The patch by Dmitry Koval for fixing detection of name >> collision in SPLIT partition operation. Also, I found that name >> collision detection doesn't work well for MERGE partitions. I've >> added fix for that to this patch as well. >> 0002 -– Patch for skipping copy of extended statistics. I would >> appreciate more feedback about wording, but I'd like to get a correct >> behavior into the source tree sooner. If the docs and/or comments >> need further improvements, we can fix that later. >> >> I'm going to push both if no objections. > > Thank you for working on this patch set! > > Some minor things: > 0001: > partition_split.sql > 157 +-- Check that detection, that the new partition has the same name as one of > 158 +-- the merged partitions, works correctly for temporary partitions > Test for split with comment for merge. Maybe better something like: > "Split partition of a temporary table when one of the partitions after split has the same name as the partition being split" Thank you, fixed as proposed. > 0002: > analgous -> analogous (maybe better using "like" instead of "analogous to") > heirarchy -> hierarchy Changed "are not analgous to" to "don't behave like". > alter_table.sgml: > Maybe in documentation it's better not to provide reasoning, just state how it works: > for consistency with <command>CREATE TABLE PARTITION OF</command> -> similar to <command>CREATE TABLE PARTITION OF</command> I'd like to keep this. This is the question, which should naturally arise when you read: "Why this is not just INCLUDING ALL?" ------ Regards, Alexander Korotkov Supabase
Attachment
The partition_split test has unstable results, as shown at [1]. I suggest adding "ORDER BY conname" to the two queries shown to fail there. Better look at other queries in the test for possible similar problems, too. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jackdaw&dt=2024-05-24%2015%3A58%3A17
Hello, 24.05.2024 22:29, Tom Lane wrote: > The partition_split test has unstable results, as shown at [1]. > I suggest adding "ORDER BY conname" to the two queries shown > to fail there. Better look at other queries in the test for > possible similar problems, too. Yes, I've just reproduced it on an aarch64 device as follows: echo "autovacuum_naptime = 1 autovacuum_vacuum_threshold = 1 autovacuum_analyze_threshold = 1 " > ~/temp.config TEMP_CONFIG=~/temp.config TESTS="$(printf 'partition_split %.0s' `seq 100`)" make -s check-tests ... ok 80 - partition_split 749 ms not ok 81 - partition_split 728 ms ok 82 - partition_split 732 ms $ cat src/test/regress/regression.diffs diff -U3 .../src/test/regress/expected/partition_split.out .../src/test/regress/results/partition_split.out --- .../src/test/regress/expected/partition_split.out 2024-05-15 17:15:57.171999830 +0000 +++ .../src/test/regress/results/partition_split.out 2024-05-24 19:28:37.329999749 +0000 @@ -625,8 +625,8 @@ SELECT pg_get_constraintdef(oid), conname, conkey FROM pg_constraint WHERE conrelid = 'sales_feb_mar_apr2022'::regclass::oid; pg_get_constraintdef | conname | conkey ---------------------------------------------------------------------+---------------------------------+-------- - CHECK ((sales_amount > 1)) | sales_range_sales_amount_check | {2} FOREIGN KEY (salesperson_id) REFERENCES salespeople(salesperson_id) | sales_range_salesperson_id_fkey | {1} + CHECK ((sales_amount > 1)) | sales_range_sales_amount_check | {2} (2 rows) ALTER TABLE sales_range SPLIT PARTITION sales_feb_mar_apr2022 INTO Best regards, Alexander
On Fri, May 24, 2024 at 11:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > 24.05.2024 22:29, Tom Lane wrote: > > The partition_split test has unstable results, as shown at [1]. > > I suggest adding "ORDER BY conname" to the two queries shown > > to fail there. Better look at other queries in the test for > > possible similar problems, too. > > Yes, I've just reproduced it on an aarch64 device as follows: > echo "autovacuum_naptime = 1 > autovacuum_vacuum_threshold = 1 > autovacuum_analyze_threshold = 1 > " > ~/temp.config > TEMP_CONFIG=~/temp.config TESTS="$(printf 'partition_split %.0s' `seq 100`)" make -s check-tests > ... > ok 80 - partition_split 749 ms > not ok 81 - partition_split 728 ms > ok 82 - partition_split 732 ms > > $ cat src/test/regress/regression.diffs > diff -U3 .../src/test/regress/expected/partition_split.out .../src/test/regress/results/partition_split.out > --- .../src/test/regress/expected/partition_split.out 2024-05-15 17:15:57.171999830 +0000 > +++ .../src/test/regress/results/partition_split.out 2024-05-24 19:28:37.329999749 +0000 > @@ -625,8 +625,8 @@ > SELECT pg_get_constraintdef(oid), conname, conkey FROM pg_constraint WHERE conrelid = > 'sales_feb_mar_apr2022'::regclass::oid; > pg_get_constraintdef | conname | conkey > ---------------------------------------------------------------------+---------------------------------+-------- > - CHECK ((sales_amount > 1)) | sales_range_sales_amount_check | {2} > FOREIGN KEY (salesperson_id) REFERENCES salespeople(salesperson_id) | sales_range_salesperson_id_fkey | {1} > + CHECK ((sales_amount > 1)) | sales_range_sales_amount_check | {2} > (2 rows) > > ALTER TABLE sales_range SPLIT PARTITION sales_feb_mar_apr2022 INTO Tom, Alexander, thank you for spotting this. I'm going to care about it later today. ------ Regards, Alexander Korotkov Supabase
On Fri, May 03, 2024 at 04:32:25PM +0300, Alexander Korotkov wrote: > On Fri, May 3, 2024 at 4:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > Note that the error that led to "EXCLUDING IDENTITY" is being discused > > over here: > > https://www.postgresql.org/message-id/3b8a9dc1-bbc7-0ef5-6863-c432afac7d59@gmail.com > > > > It's possible that once that's addressed, the exclusion should be > > removed here, too. > > +1 Can EXCLUDING IDENTITY be removed now ? I wasn't able to find why it was needed - at one point, I think there was a test case that threw an error, but now when I remove the EXCLUDE, nothing goes wrong. -- Justin
On Sat, May 25, 2024 at 8:53 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Fri, May 03, 2024 at 04:32:25PM +0300, Alexander Korotkov wrote: > > On Fri, May 3, 2024 at 4:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > Note that the error that led to "EXCLUDING IDENTITY" is being discused > > > over here: > > > https://www.postgresql.org/message-id/3b8a9dc1-bbc7-0ef5-6863-c432afac7d59@gmail.com > > > > > > It's possible that once that's addressed, the exclusion should be > > > removed here, too. > > > > +1 > > Can EXCLUDING IDENTITY be removed now ? > > I wasn't able to find why it was needed - at one point, I think there > was a test case that threw an error, but now when I remove the EXCLUDE, > nothing goes wrong. Yes, it was broken before [1][2], but now it seems to work. At the same time, I'm not sure if we need to remove the EXCLUDE now. IDENTITY is anyway successfully created when the new partition gets attached. Links. 1. https://www.postgresql.org/message-id/171085360143.2046436.7217841141682511557.pgcf@coridan.postgresql.org 2. https://www.postgresql.org/message-id/flat/ZiGH0xc1lxJ71ZfB%40pryzbyj2023#297b6aef85cb089abb38e9b1a9a7ffff ------ Regards, Alexander Korotkov Supabase
On Sat, May 25, 2024 at 3:53 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Fri, May 24, 2024 at 11:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > > > 24.05.2024 22:29, Tom Lane wrote: > > > The partition_split test has unstable results, as shown at [1]. > > > I suggest adding "ORDER BY conname" to the two queries shown > > > to fail there. Better look at other queries in the test for > > > possible similar problems, too. > > > > Yes, I've just reproduced it on an aarch64 device as follows: > > echo "autovacuum_naptime = 1 > > autovacuum_vacuum_threshold = 1 > > autovacuum_analyze_threshold = 1 > > " > ~/temp.config > > TEMP_CONFIG=~/temp.config TESTS="$(printf 'partition_split %.0s' `seq 100`)" make -s check-tests > > ... > > ok 80 - partition_split 749 ms > > not ok 81 - partition_split 728 ms > > ok 82 - partition_split 732 ms > > > > $ cat src/test/regress/regression.diffs > > diff -U3 .../src/test/regress/expected/partition_split.out .../src/test/regress/results/partition_split.out > > --- .../src/test/regress/expected/partition_split.out 2024-05-15 17:15:57.171999830 +0000 > > +++ .../src/test/regress/results/partition_split.out 2024-05-24 19:28:37.329999749 +0000 > > @@ -625,8 +625,8 @@ > > SELECT pg_get_constraintdef(oid), conname, conkey FROM pg_constraint WHERE conrelid = > > 'sales_feb_mar_apr2022'::regclass::oid; > > pg_get_constraintdef | conname | conkey > > ---------------------------------------------------------------------+---------------------------------+-------- > > - CHECK ((sales_amount > 1)) | sales_range_sales_amount_check | {2} > > FOREIGN KEY (salesperson_id) REFERENCES salespeople(salesperson_id) | sales_range_salesperson_id_fkey | {1} > > + CHECK ((sales_amount > 1)) | sales_range_sales_amount_check | {2} > > (2 rows) > > > > ALTER TABLE sales_range SPLIT PARTITION sales_feb_mar_apr2022 INTO > > Tom, Alexander, thank you for spotting this. > I'm going to care about it later today. ORDER BY is added in d53a4286d7 in these queries altogether with other catalog queries with potentially unstable result. ------ Regards, Alexander Korotkov Supabase
On Sun, Apr 07, 2024 at 01:22:51AM +0300, Alexander Korotkov wrote: > I've pushed 0001 and 0002 The partition MERGE (1adf16b8f) and SPLIT (87c21bb94) v17 patches introduced createPartitionTable() with this code: createStmt->relation = newPartName; ... wrapper->utilityStmt = (Node *) createStmt; ... ProcessUtility(wrapper, ... newRel = table_openrv(newPartName, NoLock); This breaks from the CVE-2014-0062 (commit 5f17304) principle of not repeating name lookups. The attached demo uses this defect to make one partition have two parents.
Attachment
On Thu, Aug 8, 2024 at 8:14 PM Noah Misch <noah@leadboat.com> wrote: > On Sun, Apr 07, 2024 at 01:22:51AM +0300, Alexander Korotkov wrote: > > I've pushed 0001 and 0002 > > The partition MERGE (1adf16b8f) and SPLIT (87c21bb94) v17 patches introduced > createPartitionTable() with this code: > > createStmt->relation = newPartName; > ... > wrapper->utilityStmt = (Node *) createStmt; > ... > ProcessUtility(wrapper, > ... > newRel = table_openrv(newPartName, NoLock); > > This breaks from the CVE-2014-0062 (commit 5f17304) principle of not repeating > name lookups. The attached demo uses this defect to make one partition have > two parents. Thank you for a valuable report. I will dig into and fix that. ------ Regards, Alexander Korotkov Supabase
> This breaks from the CVE-2014-0062 (commit 5f17304) principle of not repeating > name lookups. The attached demo uses this defect to make one partition have > two parents. Thank you very much for information (especially for the demo)! I'm not sure that we can get the identifier of the newly created partition from the ProcessUtility() function... Maybe it would be enough to check that the new partition is located in the namespace in which we created it (see attachment)? -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Attachment
On Fri, Aug 9, 2024 at 10:18 AM Dmitry Koval <d.koval@postgrespro.ru> wrote: > > This breaks from the CVE-2014-0062 (commit 5f17304) principle of not repeating > > name lookups. The attached demo uses this defect to make one partition have > > two parents. > > Thank you very much for information (especially for the demo)! > > I'm not sure that we can get the identifier of the newly created > partition from the ProcessUtility() function... > Maybe it would be enough to check that the new partition is located in > the namespace in which we created it (see attachment)? The new partition doesn't necessarily get created in the same namespace as parent partition. I think it would be better to somehow open partition by its oid. It would be quite unfortunate to replicate significant part of ProcessUtilitySlow(). So, the question is how to get the oid of newly created relation from ProcessUtility(). I don't like to change the signature of ProcessUtility() especially as a part of backpatch. So, I tried to fit this into existing parameters. Probably QueryCompletion struct fits this purpose best from the existing parameters. Attached draft patch implements returning oid of newly created relation as part of QueryCompletion. Thoughts? ------ Regards, Alexander Korotkov Supabase
Attachment
> Probably > QueryCompletion struct fits this purpose best from the existing > parameters. Attached draft patch implements returning oid of newly > created relation as part of QueryCompletion. Thoughts? I agree, returning the oid of the newly created relation is the best way to solve the problem. (Excuse me, I won't have access to a laptop for the next week - and won't be able to look at the source code). -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Hi, Alexander!
On Mon, 19 Aug 2024 at 02:24, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sat, Aug 10, 2024 at 6:57 PM Dmitry Koval <d.koval@postgrespro.ru> wrote:
> > Probably
> > QueryCompletion struct fits this purpose best from the existing
> > parameters. Attached draft patch implements returning oid of newly
> > created relation as part of QueryCompletion. Thoughts?
>
> I agree, returning the oid of the newly created relation is the best way
> to solve the problem.
> (Excuse me, I won't have access to a laptop for the next week - and
> won't be able to look at the source code).
Thank you for your feedback. Although, I decided QueryCompletion is
not a good place for this new field. It looks more appropriate to
place it to TableLikeClause, which already contains one relation oid
inside. The revised patch is attached.
I've looked at the patch v2. Remembering the OID of a relation newly created with LIKE in TableLikeClause seems good to me.
Check-world passes sucessfully.
Shouldn't we also modify the TableLikeClause node in gram.y accordingly?
For the comments:
Put the Oid -> Store the OID
so caller might use it -> for the caller to use it.
(Maybe also caller -> table create function)
Regards,
Pavel Borisov
Supabase
Hi, Alexander!
On Wed, 21 Aug 2024 at 15:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Pavel!
On Wed, Aug 21, 2024 at 1:48 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Mon, 19 Aug 2024 at 02:24, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> On Sat, Aug 10, 2024 at 6:57 PM Dmitry Koval <d.koval@postgrespro.ru> wrote:
>> > > Probably
>> > > QueryCompletion struct fits this purpose best from the existing
>> > > parameters. Attached draft patch implements returning oid of newly
>> > > created relation as part of QueryCompletion. Thoughts?
>> >
>> > I agree, returning the oid of the newly created relation is the best way
>> > to solve the problem.
>> > (Excuse me, I won't have access to a laptop for the next week - and
>> > won't be able to look at the source code).
>>
>> Thank you for your feedback. Although, I decided QueryCompletion is
>> not a good place for this new field. It looks more appropriate to
>> place it to TableLikeClause, which already contains one relation oid
>> inside. The revised patch is attached.
>
>
> I've looked at the patch v2. Remembering the OID of a relation newly created with LIKE in TableLikeClause seems good to me.
> Check-world passes sucessfully.
Thank you.
> Shouldn't we also modify the TableLikeClause node in gram.y accordingly?
On the one hand, makeNode() uses palloc0() and initializes all fields
with zero anyway. On the other hand, there is already assignment of
relationOid. So, yes I'll add assignment of newRelationOid for the
sake of uniformity.
> For the comments:
> Put the Oid -> Store the OID
> so caller might use it -> for the caller to use it.
Accepted.
> (Maybe also caller -> table create function)
I'll prefer to leave it "caller" as more generic term, which could
also fit potential future usages.
The revised patch is attached. I'm going to push it if no objections.
Looked at v3
All good except the patch has "Oid" and "OID" in two comments. I suppose "OID" is preferred elsewhere in the PG comments.
Regards,
Pavel.
Hi, In response to some concerns raised about this fix on the pgsql-release list today, I spent some time investigating this patch. Unfortunately, I think there are too many problems here to be reasonably fixed before release, and I think all of SPLIT/MERGE PARTITION needs to be reverted. I focused my investigation on createPartitionTable(), which is a helper for both SPLIT PARTITION and MERGE PARTITION, and it works by consing up a CREATE TABLE AS statement and then feeding that back through ProcessUtility. I think it's bad design to use such a high-level facility here; it is unlike what we do elsewhere in tablecmds.c and opens us up to a variety of problems. The first thing that I discovered is that this patch does not fix all of the repeated name lookup problems. There is still this: tlc->relation = makeRangeVar(get_namespace_name(RelationGetNamespace(modelRel)), RelationGetRelationName(modelRel), -1); And also this: createStmt->tablespacename = get_tablespace_name(modelRel->rd_rel->reltablespace); In both cases, we do a reverse lookup on an OID to get a name which the CREATE TABLE code will later turn back into an OID. If we don't get the same value, that's at least a bug and probably a security vulnerability, and there is no way to be certain that we will get the same value. The only remedy is to not repeat the lookup in the first place. Then I got to looking at this: tlc->options = CREATE_TABLE_LIKE_ALL & ~(CREATE_TABLE_LIKE_INDEXES | CREATE_TABLE_LIKE_IDENTITY | CREATE_TABLE_LIKE_STATISTICS); It's not obvious at first glance that there is a critical problem here, but there are reasons to be nervous. We're deploying a lot of machinery here to copy a lot of stuff and, while that's efficient from a coding perspective, it means that stuff you might not expect can just kind of happen. For instance: robert.haas=# \d+ List of relations Schema | Name | Type | Owner | Persistence | Access method | Size | Description --------+------+-------------------+-------------+-------------+---------------+------------+------------- public | foo | partitioned table | robert.haas | permanent | | 0 bytes | public | foo1 | table | robert.haas | permanent | heap | 8192 bytes | public | foo2 | table | bob | permanent | heap | 8192 bytes | (3 rows) robert.haas=# alter table foo split partition foo2 into (partition foo3 for values from (10) to (15), partition foo4 for values from (15) to (20)); ALTER TABLE robert.haas=# \d+ List of relations Schema | Name | Type | Owner | Persistence | Access method | Size | Description --------+------+-------------------+-------------+-------------+---------------+------------+------------- public | foo | partitioned table | robert.haas | permanent | | 0 bytes | public | foo1 | table | robert.haas | permanent | heap | 8192 bytes | public | foo3 | table | robert.haas | permanent | heap | 8192 bytes | public | foo4 | table | robert.haas | permanent | heap | 8192 bytes | (4 rows) I've split a partition owned by bob into two partitions owned by robert.haas. That's rather surprising. It doesn't work to split a partition that I don't own (and thus gain access to it) but if the superuser splits a non-superuser's partition, the superuser ends upowning the new partitions. I don't know if that's a vulnerability or just unexpected. However, then I found this, which I'm pretty well certain is a vulnerability: robert.haas=# set role bob; SET robert.haas=> create table foo (a int, b text) partition by range (a); CREATE TABLE robert.haas=> create table foo1 partition of foo for values from (0) to (10); CREATE TABLE robert.haas=> create table foo2 partition of foo for values from (10) to (20); CREATE TABLE robert.haas=> insert into foo values (11, 'carrots'), (16, 'pineapple'); INSERT 0 2 robert.haas=> create or replace function run_me(integer) returns integer as $$begin raise notice 'you are running me as %', current_user; return $1; end$$ language plpgsql immutable; CREATE FUNCTION robert.haas=> create index on foo (run_me(a)); NOTICE: you are running me as bob NOTICE: you are running me as bob CREATE INDEX robert.haas=> reset role; RESET robert.haas=# alter table foo split partition foo2 into (partition foo3 for values from (10) to (15), partition foo4 for values from (15) to (20)); NOTICE: you are running me as robert.haas NOTICE: you are running me as robert.haas ALTER TABLE I think it is very unlikely that the problems mentioned above are the only ones. They're just what I found in an hour or two of testing. Even if they were, we're probably too close to release to be rushing out last minute fixes to multiple unanticipated security problems. But because of the design that was chosen here, I think there is probably more stuff here that is not right, some of which is security relevant and some of which is just a question of whether we're really getting the behavior that we want. And I don't think we can fix all that without either a very large number of grotty hacks similar to the one installed by 04158e7fa37c2dda9c3421ca922d02807b86df19, or a complete redesign of the feature. I believe the latter is probably a wiser course of action. ...Robert
Hi! On Thu, Aug 22, 2024 at 7:33 PM Robert Haas <robertmhaas@gmail.com> wrote: > In response to some concerns raised about this fix on the > pgsql-release list today, I spent some time investigating this patch. > Unfortunately, I think there are too many problems here to be > reasonably fixed before release, and I think all of SPLIT/MERGE > PARTITION needs to be reverted. > > I focused my investigation on createPartitionTable(), which is a > helper for both SPLIT PARTITION and MERGE PARTITION, and it works by > consing up a CREATE TABLE AS statement and then feeding that back > through > ProcessUtility. I think it's bad design to use such a high-level > facility here; it is unlike what we do elsewhere in tablecmds.c and > opens us up to a variety of problems. The first thing that I > discovered is that this patch does not fix all of the repeated name > lookup problems. There is still this: > > tlc->relation = > makeRangeVar(get_namespace_name(RelationGetNamespace(modelRel)), > RelationGetRelationName(modelRel), -1); > > And also this: > > createStmt->tablespacename = > get_tablespace_name(modelRel->rd_rel->reltablespace); > > In both cases, we do a reverse lookup on an OID to get a name which > the CREATE TABLE code will later turn back into an OID. If we don't > get the same value, that's at least a bug and probably a security > vulnerability, and there is no way to be certain that we will get the > same value. The only remedy is to not repeat the lookup in the first > place. > > Then I got to looking at this: > > tlc->options = CREATE_TABLE_LIKE_ALL & > ~(CREATE_TABLE_LIKE_INDEXES | CREATE_TABLE_LIKE_IDENTITY | > CREATE_TABLE_LIKE_STATISTICS); > > It's not obvious at first glance that there is a critical problem > here, but there are reasons to be nervous. We're deploying a lot of > machinery here to copy a lot of stuff and, while that's efficient from > a coding perspective, it means that stuff you might not expect can > just kind of happen. For instance: > > robert.haas=# \d+ > List of relations > Schema | Name | Type | Owner | Persistence | > Access method | Size | Description > --------+------+-------------------+-------------+-------------+---------------+------------+------------- > public | foo | partitioned table | robert.haas | permanent | > | 0 bytes | > public | foo1 | table | robert.haas | permanent | heap > | 8192 bytes | > public | foo2 | table | bob | permanent | heap > | 8192 bytes | > (3 rows) > robert.haas=# alter table foo split partition foo2 into (partition > foo3 for values from (10) to (15), partition foo4 for values from (15) > to (20)); > ALTER TABLE > robert.haas=# \d+ > List of relations > Schema | Name | Type | Owner | Persistence | > Access method | Size | Description > --------+------+-------------------+-------------+-------------+---------------+------------+------------- > public | foo | partitioned table | robert.haas | permanent | > | 0 bytes | > public | foo1 | table | robert.haas | permanent | heap > | 8192 bytes | > public | foo3 | table | robert.haas | permanent | heap > | 8192 bytes | > public | foo4 | table | robert.haas | permanent | heap > | 8192 bytes | > (4 rows) > > I've split a partition owned by bob into two partitions owned by > robert.haas. That's rather surprising. It doesn't work to split a > partition that I don't own (and thus gain access to it) but if the > superuser splits a non-superuser's partition, the superuser ends > upowning the new partitions. I don't know if that's a vulnerability or > just unexpected. However, then I found this, which I'm pretty well > certain is a vulnerability: > > robert.haas=# set role bob; > SET > robert.haas=> create table foo (a int, b text) partition by range (a); > CREATE TABLE > robert.haas=> create table foo1 partition of foo for values from (0) to (10); > CREATE TABLE > robert.haas=> create table foo2 partition of foo for values from (10) to (20); > CREATE TABLE > robert.haas=> insert into foo values (11, 'carrots'), (16, 'pineapple'); > INSERT 0 2 > robert.haas=> create or replace function run_me(integer) returns > integer as $$begin raise notice 'you are running me as %', > current_user; return $1; end$$ language plpgsql immutable; > CREATE FUNCTION > robert.haas=> create index on foo (run_me(a)); > NOTICE: you are running me as bob > NOTICE: you are running me as bob > CREATE INDEX > robert.haas=> reset role; > RESET > robert.haas=# alter table foo split partition foo2 into (partition > foo3 for values from (10) to (15), partition foo4 for values from (15) > to (20)); > NOTICE: you are running me as robert.haas > NOTICE: you are running me as robert.haas > ALTER TABLE > > I think it is very unlikely that the problems mentioned above are the > only ones. They're just what I found in an hour or two of testing. > Even if they were, we're probably too close to release to be rushing > out last minute fixes to multiple unanticipated security problems. But > because of the design that was chosen here, I think there is probably > more stuff here that is not right, some of which is security relevant > and some of which is just a question of whether we're really getting > the behavior that we want. And I don't think we can fix all that > without either a very large number of grotty hacks similar to the one > installed by 04158e7fa37c2dda9c3421ca922d02807b86df19, or a complete > redesign of the feature. I believe the latter is probably a wiser > course of action. Thank you for your feedback. Yes, it seems that there is not enough time to even carefully analyze all the issues in these features. The rule of thumb I can get from this experience is "think multiple times before accessing something already opened by its name". I'm going to revert these features during next couple days. ------ Regards, Alexander Korotkov Supabase
On Thu, Aug 22, 2024 at 12:43 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > Thank you for your feedback. Yes, it seems that there is not enough > time to even carefully analyze all the issues in these features. The > rule of thumb I can get from this experience is "think multiple times > before accessing something already opened by its name". I'm going to > revert these features during next couple days. Thanks, and sorry about that. I would say even "think multiple times" is possibly not strong enough -- it might almost be "just don't ever do it". Even if (in some particular case) the invalidation mechanism seems to protect you from getting wrong answers, there are often holes in that, specifically around search_path = foo, bar and you're operating on an object in schema bar and an identically-named object is created in schema foo at just the wrong time. Sometimes there are problems even when search_path is not involved, but when it is, there are more. Here, aside from the name lookup issues, there are also problems with expression evaluation: we can't split partitions without reindexing rows that those partitions contain, and it is critical to think through which is going to do the evaluation and make sure it's properly sandboxed. I think we might need SECURITY_RESTRICTED_OPERATION here. Another thing I want to highlight if you do have another go at this patch is that it's really critical to think about where every single property of the newly-created tables comes from. The original patch didn't consider relpersistence or tableam, and here I just discovered that owner is also an issue that probably needs more consideration, but it goes way beyond that. For example, I was surprised to discover that if I put per-partition constraints or triggers on a partition and then split it, they were not duplicated to the new partitions. Now, maybe that's actually the behavior we want -- I'm not 100% positive -- but it sure wasn't what I was expecting. If we did duplicate them when splitting, then what's supposed to happen when merging occurs? That is not at all obvious, at least to me, but it needs careful thought. ACLs and rules and default values and foreign keys (both outbond and inbound) all need to be considered too, along with 27 other things that I'm sure I'm not thinking about right now. Some of this behavior should probably be explicitly documented, but all of it should be considered carefully enough before commit to avoid surprises later. I say that both from a security point of view and also just from a user experience point of view. Even if things aren't insecure, they can still be annoying, but it's not uncommon in cases like this for annoying things to turn out to also be insecure. Finally, if you do revisit this, I believe it would be a good idea to think a bit harder about how data is moved around. My impression (and please correct me if I am mistaken) is that currently, any split or merge operation rewrites all the data in the source partition(s). If a large partition is being split nearly equally, I think that has a good chance of being optimal, but I think that might be the only case. If we're merging partitions, wouldn't it be better to adjust the constraints on the first partition -- or perhaps the largest partition if we want to be clever -- and insert the data from all of the others into it? Maybe that would even have syntax that puts the user in control of which partition survives, e.g. ALTER TABLE tab1 MERGE PARTITION part1 WITH part2, part3, .... That would also make it really obvious to the user what all of the properties of part1 will be after the merge: they will be exactly the same as they were before the merge, except that the partition constraint will have been adjusted. You basically dodge everything in the previous paragraph in one shot, and it seems like it would also be faster. Splitting there's no similar get-out-of-jail free card, at least not that I can see. Even if you add syntax that splits a partition by using INSERT/DELETE to move some rows to a newly-created partition, you still have to make at least one new partition. But possibly that syntax is worth having anyway, because it would be a lot quicker in the case of a highly asymmetric split. On the other hand, maybe even splits are much more likely and we don't really need it. I don't know. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Aug 22, 2024 at 8:25 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Aug 22, 2024 at 12:43 PM Alexander Korotkov > <aekorotkov@gmail.com> wrote: > > Thank you for your feedback. Yes, it seems that there is not enough > > time to even carefully analyze all the issues in these features. The > > rule of thumb I can get from this experience is "think multiple times > > before accessing something already opened by its name". I'm going to > > revert these features during next couple days. > > Thanks, and sorry about that. I would say even "think multiple times" > is possibly not strong enough -- it might almost be "just don't ever > do it". Even if (in some particular case) the invalidation mechanism > seems to protect you from getting wrong answers, there are often holes > in that, specifically around search_path = foo, bar and you're > operating on an object in schema bar and an identically-named object > is created in schema foo at just the wrong time. Sometimes there are > problems even when search_path is not involved, but when it is, there > are more. > > Here, aside from the name lookup issues, there are also problems with > expression evaluation: we can't split partitions without reindexing > rows that those partitions contain, and it is critical to think > through which is going to do the evaluation and make sure it's > properly sandboxed. I think we might need > SECURITY_RESTRICTED_OPERATION here. > > Another thing I want to highlight if you do have another go at this > patch is that it's really critical to think about where every single > property of the newly-created tables comes from. The original patch > didn't consider relpersistence or tableam, and here I just discovered > that owner is also an issue that probably needs more consideration, > but it goes way beyond that. For example, I was surprised to discover > that if I put per-partition constraints or triggers on a partition and > then split it, they were not duplicated to the new partitions. Now, > maybe that's actually the behavior we want -- I'm not 100% positive -- > but it sure wasn't what I was expecting. If we did duplicate them when > splitting, then what's supposed to happen when merging occurs? That is > not at all obvious, at least to me, but it needs careful thought. ACLs > and rules and default values and foreign keys (both outbond and > inbound) all need to be considered too, along with 27 other things > that I'm sure I'm not thinking about right now. Some of this behavior > should probably be explicitly documented, but all of it should be > considered carefully enough before commit to avoid surprises later. I > say that both from a security point of view and also just from a user > experience point of view. Even if things aren't insecure, they can > still be annoying, but it's not uncommon in cases like this for > annoying things to turn out to also be insecure. > > Finally, if you do revisit this, I believe it would be a good idea to > think a bit harder about how data is moved around. My impression (and > please correct me if I am mistaken) is that currently, any split or > merge operation rewrites all the data in the source partition(s). If a > large partition is being split nearly equally, I think that has a good > chance of being optimal, but I think that might be the only case. If > we're merging partitions, wouldn't it be better to adjust the > constraints on the first partition -- or perhaps the largest partition > if we want to be clever -- and insert the data from all of the others > into it? Maybe that would even have syntax that puts the user in > control of which partition survives, e.g. ALTER TABLE tab1 MERGE > PARTITION part1 WITH part2, part3, .... That would also make it really > obvious to the user what all of the properties of part1 will be after > the merge: they will be exactly the same as they were before the > merge, except that the partition constraint will have been adjusted. > You basically dodge everything in the previous paragraph in one shot, > and it seems like it would also be faster. Splitting there's no > similar get-out-of-jail free card, at least not that I can see. Even > if you add syntax that splits a partition by using INSERT/DELETE to > move some rows to a newly-created partition, you still have to make at > least one new partition. But possibly that syntax is worth having > anyway, because it would be a lot quicker in the case of a highly > asymmetric split. On the other hand, maybe even splits are much more > likely and we don't really need it. I don't know. Thank you for so valuable feedback! When I have another go over this patch I will ensure this is addressed. ------ Regards, Alexander Korotkov Supabase
On Tue, Aug 27, 2024 at 2:24 PM Dmitry Koval <d.koval@postgrespro.ru> wrote: > They contains changes from reverted commits 1adf16b8fb, 87c21bb941, and > subsequent fixes and improvements including df64c81ca9, c99ef1811a, > 9dfcac8e15, 885742b9f8, 842c9b2705, fcf80c5d5f, 96c7381c4c, f4fc7cb54b, > 60ae37a8bc, 259c96fa8f, 449cdcd486, 3ca43dbbb6, 2a679ae94e, 3a82c689fd, > fbd4321fd5, d53a4286d7, c086896625, 4e5d6c4091. > I didn't include fix 04158e7fa3 into patches because Robert Haas > objected to its use. To be clear, I'm not against 04158e7fa3. I just don't think it fixes everything. > 1. Function createPartitionTable() should be rewritten using partitioned > table OID (not name) and without using ProcessUtility(). Agree. > 2. Should it be considered an error when we split a partition owned by > another user and get partitions that owned by our user? > (I think this is not a problem. Perhaps disallow merging other users' > partitions would be too strict a restriction.) > > 3. About the functional index "create index on foo (run_me(a));". > (Should we disallow merging of another user's partitions when > partitioned table has functional indexes? SECURITY_RESTRICTED_OPERATION?) > > 4. Need to decide what is correct in case there are per-partition > constraints or triggers on a split partition. They not duplicated to the > new partitions now. (But might be in this case we should have an error > or warning?) I think we want to avoid giving errors or warnings. For all of these cases, and others, we need to consider what the expected behavior is, and have test cases and documentation as appropriate. But we shouldn't think of it as "let's make it fail if the user does something that's not safe" but rather "let's figure out how to make it safe." > 5. "If we're merging partitions, wouldn't it be better to adjust the > constraints on the first partition - or perhaps the largest partition if > we want to be clever -- and insert the data from all of the others into > it? Maybe that would even have syntax that puts the user in control of > which partition survives, e.g. ALTER TABLE tab1 MERGE PARTITION part1 > WITH part2, part3, .... That would also make it really obvious to the > user what all of the properties of part1 will be after the merge: they > will be exactly the same as they were before the merge, except that the > partition constraint will have been adjusted." > (Similar optimization was proposed in [3] but was rejected [4]). Interesting. Maybe it would be a good idea to set up some test cases to see which approach is better in different cases. Like try moving data from foo1 to foo2 with DELETE..INSERT vs. creating a new table with CTAS from foo1 UNION ALL foo2 and then indexing it. I think Alexander has a good point there, but I think my point is good too so I'm not sure which way wins. -- Robert Haas EDB: http://www.enterprisedb.com
Hi! On Fri, Aug 30, 2024 at 11:43 AM Dmitry Koval <d.koval@postgrespro.ru> wrote: > I plan to prepare fixes for issues from email [1] as separate commits > (for better code readability). Attachment in this email is a variant of > fix for the issue: > > > 1. Function createPartitionTable() should be rewritten using > > partitioned table OID (not name) and without using ProcessUtility(). > > Patch "Refactor createPartitionTable to remove ProcessUtility call" > contains code changes + test (see file > v33-0003-Refactor-createPartitionTable-to-remove-ProcessU.patch). > > But I'm not sure that refactoring createPartitionTable is the best > solution. PostgreSQL code has issue CVE-2014-0062 (commit 5f17304) - see > relation_openrv() call in expandTableLikeClause() function [2] (opening > relation by name after we got relation Oid). > Example for reproduce relation_openrv() call: > > CREATE TABLE t (b bigint, i int DEFAULT 100); > CREATE TABLE t1 (LIKE t_bigint INCLUDING ALL); > > Commit 04158e7fa3 [3] (by Alexander Korotkov) might be a good fix for > this issue. But if we keep commit 04158e7fa3, do we need to refactor the > createPartitionTable function (for removing ProcessUtility)? > Perhaps the existing code > 1) v33-0002-Implement-ALTER-TABLE-.-SPLIT-PARTITION-.-comman.patch > 2) v33-0003-Refactor-createPartitionTable-to-remove-ProcessU.patch + > with patch 04158e7fa3 will look better. > > > I would be very grateful for comments and suggestions. Thank you for continuing your work on the subject. The patches currently doesn't apply cleanly. Please, rebase. I think getting away from expandTableLikeClause() is the right direction to resolve the security problems. That looks great, it finally not as complex as I thought. I think the code requires some polishing: you need to revise the comments given its not part of LIKE clause handling anymore. I see fixes for the issues mentioned in [1] and [2] are still not implemented. Do you plan to do this in this release cycle? Links. 1. https://www.postgresql.org/message-id/CA%2BTgmoY0%3DbT_xBP8csR%3DMFE%3DFxGE2n2-me2-31jBOgEcLvW7ug%40mail.gmail.com 2. https://www.postgresql.org/message-id/859476bf-3cb0-455e-b093-b8ab5ef17f0e%40postgrespro.ru ------ Regards, Alexander Korotkov Supabase