Thread: [HACKERS] Adding support for Default partition in partitioning
postgres=# CREATE TABLE list_partitioned (
a int
) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT);
CREATE TABLE
postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4,5);
CREATE TABLE
postgres=# insert into list_partitioned values (9);
INSERT 0 1
postgres=# select * from part_default;
a
---
9
(1 row)
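
The session above illustrates the routing rule the patch proposes: a value that matches no explicitly listed partition (here 9, with only 4 and 5 listed) lands in the default partition. A rough sketch of that rule in C follows. It is not the patch's code; the DemoListPartition type and the demo_route_value() function are hypothetical names invented for illustration, and NULL handling is ignored.

```c
#include <stddef.h>

/* Hypothetical stand-in for one list partition: the values it accepts. */
typedef struct DemoListPartition
{
    const int  *values;
    size_t      nvalues;
} DemoListPartition;

/*
 * Return the index of the partition whose list contains 'value'.
 * If no list matches, fall back to 'default_index', which is -1
 * when the table has no default partition (the insert would fail).
 */
int
demo_route_value(const DemoListPartition *parts, size_t nparts,
                 int default_index, int value)
{
    size_t      i;
    size_t      j;

    for (i = 0; i < nparts; i++)
    {
        /* The default partition has no explicit value list to search. */
        if (default_index >= 0 && i == (size_t) default_index)
            continue;

        for (j = 0; j < parts[i].nvalues; j++)
        {
            if (parts[i].values[j] == value)
                return (int) i;
        }
    }

    return default_index;
}
```

With partition 0 as the default and partition 1 listing (4, 5), routing 9 returns 0 and routing 4 returns 1. Without a default (default_index of -1), routing 9 returns -1, which corresponds to the failed write David Fetter describes later in the thread.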
https://www.postgresql.org/
Kindly give your suggestions.
Attachment
On Wed, Mar 1, 2017 at 6:29 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> 3. Handling adding a new partition to a partitioned table
> with default partition.
> This will require moving tuples from existing default partition to
> newly created partition if they satisfy its partition bound.

Considering that this patch was submitted at the last minute and isn't
even complete, I can't see this getting into v10. But that doesn't
mean we can't talk about it. I'm curious to hear other opinions on
whether we should have this feature. On the point mentioned above, I
don't think adding a partition should move tuples, necessarily; seems
like it would be good enough - maybe better - for it to fail if there
are any that would need to be moved.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 03, 2017 at 08:10:52AM +0530, Robert Haas wrote:
> On Wed, Mar 1, 2017 at 6:29 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> > 3. Handling adding a new partition to a partitioned table
> > with default partition.
> > This will require moving tuples from existing default partition to
> > newly created partition if they satisfy its partition bound.
>
> Considering that this patch was submitted at the last minute and isn't
> even complete, I can't see this getting into v10. But that doesn't
> mean we can't talk about it. I'm curious to hear other opinions on
> whether we should have this feature. On the point mentioned above, I
> don't think adding a partition should move tuples, necessarily; seems
> like it would be good enough - maybe better - for it to fail if there
> are any that would need to be moved.

I see this as a bug fix.

The current state of declarative partitions is such that you need way
too much foresight in order to use them. Missed adding a partition?
Writes fail and can't be made to succeed. This is not a failure mode
we should be forcing on people, especially as it's a massive
regression from the extant inheritance-based partitioning.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/7/17 10:30 AM, Keith Fiske wrote:
> I'm all for this feature and had suggested it back in the original

FWIW, I was working with a system just today that has an overflow partition.

> thread to add partitioning to 10. I agree that adding a new partition
> should not move any data out of the default. It's easy enough to set up

+1

> a monitor to watch for data existing in the default. Perhaps also adding
> a column to pg_partitioned_table that contains the oid of the default
> partition so it's easier to identify from a system catalog perspective
> and make that monitoring easier. I don't even see a need for it to fail

I agree that there should be a way to identify the default partition.

> either and not quite sure how that would even work? If they can't add a
> necessary child due to data being in the default, how can they ever get
> it out?

Yeah, was wondering that as well...
--
Jim Nasby, Chief Data Architect, OpenSCG
http://OpenSCG.com
>I don't even see a need for it to fail either and not quite sure how that would even work? If they can't add a necessary child due to data being in the
>default, how can they ever get it out? Just leave it to the user to keep an eye on the default and fix it as necessary.

Won't it incur the overhead of scanning the default partition for matching rows each time a select happens on any matching partition?
This extra scan would be required for as long as rows satisfying the newly added partition are left in the default partition.
On Thu, Mar 2, 2017 at 9:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 1, 2017 at 6:29 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> > 3. Handling adding a new partition to a partitioned table
> > with default partition.
> > This will require moving tuples from existing default partition to
> > newly created partition if they satisfy its partition bound.
>
> Considering that this patch was submitted at the last minute and isn't
> even complete, I can't see this getting into v10. But that doesn't
> mean we can't talk about it. I'm curious to hear other opinions on
> whether we should have this feature. On the point mentioned above, I
> don't think adding a partition should move tuples, necessarily; seems
> like it would be good enough - maybe better - for it to fail if there
> are any that would need to be moved.

I'm all for this feature and had suggested it back in the original
thread to add partitioning to 10. I agree that adding a new partition
should not move any data out of the default. It's easy enough to set up
a monitor to watch for data existing in the default. Perhaps also adding
a column to pg_partitioned_table that contains the oid of the default
partition so it's easier to identify from a system catalog perspective
and make that monitoring easier.

I don't even see a need for it to fail either and not quite sure how
that would even work? If they can't add a necessary child due to data
being in the default, how can they ever get it out? Just leave it to
the user to keep an eye on the default and fix it as necessary. This is
what I do in pg_partman.
--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com
On 3/2/17 21:40, Robert Haas wrote:
> On the point mentioned above, I
> don't think adding a partition should move tuples, necessarily; seems
> like it would be good enough - maybe better - for it to fail if there
> are any that would need to be moved.

ISTM that the use cases of various combinations of adding a default
partition, adding another partition after it, removing the default
partition, clearing out the default partition in order to add more
nondefault partitions, and so on, need to be more clearly spelled out
for each partitioning type. We also need to consider that pg_dump and
pg_upgrade need to be able to reproduce all those states. Seems to be a
bit of work still ...

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Mar 10, 2017 at 2:17 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 3/2/17 21:40, Robert Haas wrote:
>> On the point mentioned above, I
>> don't think adding a partition should move tuples, necessarily; seems
>> like it would be good enough - maybe better - for it to fail if there
>> are any that would need to be moved.
>
> ISTM that the use cases of various combinations of adding a default
> partition, adding another partition after it, removing the default
> partition, clearing out the default partition in order to add more
> nondefault partitions, and so on, need to be more clearly spelled out
> for each partitioning type. We also need to consider that pg_dump and
> pg_upgrade need to be able to reproduce all those states. Seems to be a
> bit of work still ...

This patch is only targeting list partitioning. It is not entirely
clear that the concept makes sense for range partitioning; you can
already define a partition from the end of the last partition up to
infinity, or from minus-infinity up to the starting point of the first
partition. The only thing a default range partition would do is let
you have a single partition cover all of the ranges that would
otherwise be unassigned, which might not even be something we want.

I don't know how complete the patch is, but the specification seems
clear enough. If you have partitions for 1, 3, and 5, you get
partition constraints of (a = 1), (a = 3), and (a = 5). If you add a
default partition, you get a constraint of (a != 1 and a != 3 and a !=
5). If you then add a partition for 7, the default partition's
constraint becomes (a != 1 and a != 3 and a != 5 and a != 7). The
partition must be revalidated at that point for conflicting rows,
which we can either try to move to the new partition, or just error
out if there are any, depending on what we decide we want to do. I
don't think any of that requires any special handling for either
pg_dump or pg_upgrade; it all just falls out of getting the
partitioning constraints correct and consistently enforcing them, just
as for any other partition.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
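
To make the specification above concrete, here is a small C sketch of the two checks it implies: whether the default partition's constraint accepts a value, and how many existing default rows would conflict once a new partition is added. This is only an illustration under simplifying assumptions (plain int keys, no NULLs); demo_default_accepts() and demo_count_conflicts() are hypothetical names and do not appear in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * The default partition's implied constraint is
 * (a != v1 and a != v2 and ...) over every value listed by the
 * other partitions. Return true if 'value' satisfies it.
 */
bool
demo_default_accepts(const int *listed, size_t nlisted, int value)
{
    size_t      i;

    for (i = 0; i < nlisted; i++)
    {
        if (listed[i] == value)
            return false;
    }
    return true;
}

/*
 * Revalidation step: count rows already stored in the default
 * partition that match the values of a partition being added.
 * A nonzero result means rows must be moved or the command
 * must error out, depending on the behavior chosen.
 */
size_t
demo_count_conflicts(const int *default_rows, size_t nrows,
                     const int *new_values, size_t nnew)
{
    size_t      i;
    size_t      conflicts = 0;

    for (i = 0; i < nrows; i++)
    {
        if (!demo_default_accepts(new_values, nnew, default_rows[i]))
            conflicts++;
    }
    return conflicts;
}
```

With partitions for 1, 3, and 5, the default accepts 7 but rejects 3. If the default already holds the rows 7, 9, 7 and a partition for 7 is added, two rows conflict, which is the case the thread debates between moving tuples and raising an error.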
Hello,

Please find attached a rebased patch with support for pg_dump. I am working on the patch
to handle adding a new partition after a default partition by throwing an error if
conflicting rows exist in the default partition and adding the partition successfully otherwise.
Will post an updated patch by tomorrow.

Thank you,
Rahila Syed

On Mon, Mar 13, 2017 at 5:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 10, 2017 at 2:17 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > On 3/2/17 21:40, Robert Haas wrote:
> >> On the point mentioned above, I
> >> don't think adding a partition should move tuples, necessarily; seems
> >> like it would be good enough - maybe better - for it to fail if there
> >> are any that would need to be moved.
> >
> > ISTM that the use cases of various combinations of adding a default
> > partition, adding another partition after it, removing the default
> > partition, clearing out the default partition in order to add more
> > nondefault partitions, and so on, need to be more clearly spelled out
> > for each partitioning type. We also need to consider that pg_dump and
> > pg_upgrade need to be able to reproduce all those states. Seems to be a
> > bit of work still ...
>
> This patch is only targeting list partitioning. It is not entirely
> clear that the concept makes sense for range partitioning; you can
> already define a partition from the end of the last partition up to
> infinity, or from minus-infinity up to the starting point of the first
> partition. The only thing a default range partition would do is let
> you have a single partition cover all of the ranges that would
> otherwise be unassigned, which might not even be something we want.
>
> I don't know how complete the patch is, but the specification seems
> clear enough. If you have partitions for 1, 3, and 5, you get
> partition constraints of (a = 1), (a = 3), and (a = 5). If you add a
> default partition, you get a constraint of (a != 1 and a != 3 and a !=
> 5). If you then add a partition for 7, the default partition's
> constraint becomes (a != 1 and a != 3 and a != 5 and a != 7). The
> partition must be revalidated at that point for conflicting rows,
> which we can either try to move to the new partition, or just error
> out if there are any, depending on what we decide we want to do. I
> don't think any of that requires any special handling for either
> pg_dump or pg_upgrade; it all just falls out of getting the
> partitioning constraints correct and consistently enforcing them, just
> as for any other partition.
I picked this for review and noticed that the patch does not compile
cleanly in my environment:

partition.c: In function ‘RelationBuildPartitionDesc’:
partition.c:269:6: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
Const *val = lfirst(c);
^
partition.c: In function ‘generate_partition_qual’:
partition.c:1590:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
PartitionBoundSpec *spec = (PartitionBoundSpec *)bound;
^
partition.c: In function ‘get_partition_for_tuple’:
partition.c:1820:5: error: array subscript has type ‘char’ [-Werror=char-subscripts]
result = parent->indexes[partdesc->boundinfo->def_index];
^
partition.c:1824:16: error: assignment makes pointer from integer without a cast [-Werror]
*failed_at = RelationGetRelid(parent->reldesc);
^
cc1: all warnings being treated as errors

Apart from this, I was reading the patch; here are a few more comments:

1) Variable initialization is happening in two places.

@@ -166,6 +170,8 @@ RelationBuildPartitionDesc(Relation rel)
/* List partitioning specific */
PartitionListValue **all_values = NULL;
bool found_null = false;
+ bool found_def = false;
+ int def_index = -1;
int null_index = -1;
/* Range partitioning specific */
@@ -239,6 +245,8 @@ RelationBuildPartitionDesc(Relation rel)
i = 0;
found_null = false;
null_index = -1;
+ found_def = false;
+ def_index = -1;
foreach(cell, boundspecs)
{
ListCell *c;
@@ -249,6 +257,15 @@ RelationBuildPartitionDesc(Relation rel)

2)

@@ -1558,6 +1586,19 @@ generate_partition_qual(Relation rel)
bound = stringToNode(TextDatumGetCString(boundDatum));
ReleaseSysCache(tuple);
+ /* Return if it is a default list partition */
+ PartitionBoundSpec *spec = (PartitionBoundSpec *)bound;
+ ListCell *cell;
+ foreach(cell, spec->listdatums)

More comment on the above hunk is needed?

Rather than adding the check for default here, I think this should be handled inside
get_qual_for_list().

3) Code is not aligned with existing code:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 6316688..ebb7db7 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2594,6 +2594,7 @@ partbound_datum:
Sconst { $$ = makeStringConst($1, @1); }
| NumericOnly { $$ = makeAConst($1, @1); }
| NULL_P { $$ = makeNullAConst(@1); }
+ | DEFAULT { $$ = (Node *)makeDefElem("DEFAULT", NULL, @1); }
;

4) Unnecessary hunk:

@@ -2601,7 +2602,6 @@ partbound_datum_list:
| partbound_datum_list ',' partbound_datum
{ $$ = lappend($1, $3); }
;
-

Note: these are just initial review comments; I am yet to do the detailed
review and the testing for the patch.

Thanks.

On Mon, Mar 20, 2017 at 9:27 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Hello,
>
> Please find attached a rebased patch with support for pg_dump. I am working on the patch
> to handle adding a new partition after a default partition by throwing an error if
> conflicting rows exist in default partition and adding the partition successfully otherwise.
> Will post an updated patch by tomorrow.
>
> Thank you,
> Rahila Syed

--
Rushabh Lathia
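
The first two compiler errors reported in the review come from GCC's -Wdeclaration-after-statement check, which rejects a variable declared after an executable statement in the same block. The usual remedy is to declare the variable at the top of the block and assign it later. The sketch below shows that pattern in isolation; DemoBoundSpec and demo_sum_datums() are hypothetical stand-ins, not PostgreSQL types or functions, and this is not the fix that was applied to the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bound specification with a value list. */
typedef struct DemoBoundSpec
{
    const int  *listdatums;
    size_t      ndatums;
} DemoBoundSpec;

int
demo_sum_datums(const void *bound)
{
    /* C90 style: every declaration precedes the first statement. */
    const DemoBoundSpec *spec;
    size_t      i;
    int         total = 0;

    if (bound == NULL)
        return 0;

    /*
     * Declaring "const DemoBoundSpec *spec = bound;" at this point,
     * after the if statement, is what the warning rejects. A plain
     * assignment to the variable declared above is accepted.
     */
    spec = (const DemoBoundSpec *) bound;

    for (i = 0; i < spec->ndatums; i++)
        total += spec->listdatums[i];

    return total;
}
```

Calling demo_sum_datums() on a spec holding 4 and 5 returns 9, and passing NULL returns 0. The same file compiles cleanly with -Werror=declaration-after-statement, whereas moving the declaration of spec below the NULL check would not.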
Hello Rushabh,

Thank you for reviewing.
I have addressed all your comments in the attached patch. The attached patch currently throws an
error if a new partition is added after the default partition.

On Tue, Mar 21, 2017 at 11:36 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
>Rather than adding the check for default here, I think this should be handled inside
>get_qual_for_list().

I have moved the check inside get_qual_for_partbound(), as I needed to do some operations
before calling get_qual_for_list() for default partitions.

Thank you,
Rahila Syed
I applied the patch and was trying to perform some testing, but its
ending up with server crash with the test shared by you in your starting mail:
postgres=# CREATE TABLE list_partitioned (
postgres(# a int
postgres(# ) PARTITION BY LIST (a);
CREATE TABLE
postgres=#
postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT);
CREATE TABLE
postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4,5);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Apart from this, few more explanation in the patch is needed to explain the
changes for the DEFAULT partition. Like I am not quite sure what exactly the
latest version of patch supports, like does that support the tuple row movement,
or adding new partition will be allowed having partition table having DEFAULT
partition, which is quite difficult to understand through the code changes.
Another part which is missing in the patch is test coverage. Please add
proper test coverage, which explains what is supported and what is not.
Thanks,
On Fri, Mar 24, 2017 at 3:25 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hello Rushabh,
Thank you for reviewing.
Have addressed all your comments in the attached patch. The attached patch currently throws an
error if a new partition is added after default partition.
>Rather than adding check for default here, I think this should be handled inside
>get_qual_for_list().
Have moved the check inside get_qual_for_partbound() as needed to do some operations
before calling get_qual_for_list() for default partitions.
Thank you,
Rahila Syed
On Tue, Mar 21, 2017 at 11:36 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
I picked this for review and noticed that the patch is not getting
cleanly compiled on my environment.
partition.c: In function ‘RelationBuildPartitionDesc’:
partition.c:269:6: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
Const *val = lfirst(c);
^
partition.c: In function ‘generate_partition_qual’:
partition.c:1590:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
PartitionBoundSpec *spec = (PartitionBoundSpec *)bound;
^
partition.c: In function ‘get_partition_for_tuple’:
partition.c:1820:5: error: array subscript has type ‘char’ [-Werror=char-subscripts]
result = parent->indexes[partdesc->boundinfo->def_index];
^
partition.c:1824:16: error: assignment makes pointer from integer without a cast [-Werror]
*failed_at = RelationGetRelid(parent->reldesc);
^
cc1: all warnings being treated as errors
Apart from this, I was reading the patch; here are a few more comments:
1) Variable initialization is happening at two places.
@@ -166,6 +170,8 @@ RelationBuildPartitionDesc(Relation rel)
/* List partitioning specific */
PartitionListValue **all_values = NULL;
bool found_null = false;
+ bool found_def = false;
+ int def_index = -1;
int null_index = -1;
/* Range partitioning specific */
@@ -239,6 +245,8 @@ RelationBuildPartitionDesc(Relation rel)
i = 0;
found_null = false;
null_index = -1;
+ found_def = false;
+ def_index = -1;
foreach(cell, boundspecs)
{
ListCell *c;
@@ -249,6 +257,15 @@ RelationBuildPartitionDesc(Relation rel)
2)
@@ -1558,6 +1586,19 @@ generate_partition_qual(Relation rel)
bound = stringToNode(TextDatumGetCString(boundDatum));
ReleaseSysCache(tuple);
+ /* Return if it is a default list partition */
+ PartitionBoundSpec *spec = (PartitionBoundSpec *)bound;
+ ListCell *cell;
+ foreach(cell, spec->listdatums)
More comment on above hunk is needed?
Rather than adding check for default here, I think this should be handled inside
get_qual_for_list().
3) Code is not aligned with existing
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 6316688..ebb7db7 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2594,6 +2594,7 @@ partbound_datum:
Sconst { $$ = makeStringConst($1, @1); }
| NumericOnly { $$ = makeAConst($1, @1); }
| NULL_P { $$ = makeNullAConst(@1); }
+ | DEFAULT { $$ = (Node *)makeDefElem("DEFAULT", NULL, @1); }
;
4) Unnecessary hunk:
@@ -2601,7 +2602,6 @@ partbound_datum_list:
| partbound_datum_list ',' partbound_datum
{ $$ = lappend($1, $3); }
;
-
Note: this is just an initial set of review comments; I am yet to do the detailed review
and the testing for the patch.
Thanks.
On Mon, Mar 20, 2017 at 9:27 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hello,
Please find attached a rebased patch with support for pg_dump. I am working on the patch
to handle adding a new partition after a default partition by throwing an error if
conflicting rows exist in default partition and adding the partition successfully otherwise.
Will post an updated patch by tomorrow.
Thank you,
Rahila Syed
On Mon, Mar 13, 2017 at 5:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 10, 2017 at 2:17 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 3/2/17 21:40, Robert Haas wrote:
>> On the point mentioned above, I
>> don't think adding a partition should move tuples, necessarily; seems
>> like it would be good enough - maybe better - for it to fail if there
>> are any that would need to be moved.
>
> ISTM that the use cases of various combinations of adding a default
> partition, adding another partition after it, removing the default
> partition, clearing out the default partition in order to add more
> nondefault partitions, and so on, need to be more clearly spelled out
> for each partitioning type. We also need to consider that pg_dump and
> pg_upgrade need to be able to reproduce all those states. Seems to be a
> bit of work still ...
This patch is only targeting list partitioning. It is not entirely
clear that the concept makes sense for range partitioning; you can
already define a partition from the end of the last partition up to
infinity, or from minus-infinity up to the starting point of the first
partition. The only thing a default range partition would do is let
you have a single partition cover all of the ranges that would
otherwise be unassigned, which might not even be something we want.
I don't know how complete the patch is, but the specification seems
clear enough. If you have partitions for 1, 3, and 5, you get
partition constraints of (a = 1), (a = 3), and (a = 5). If you add a
default partition, you get a constraint of (a != 1 and a != 3 and a !=
5). If you then add a partition for 7, the default partition's
constraint becomes (a != 1 and a != 3 and a != 5 and a != 7). The
partition must be revalidated at that point for conflicting rows,
which we can either try to move to the new partition, or just error
out if there are any, depending on what we decide we want to do. I
don't think any of that requires any special handling for either
pg_dump or pg_upgrade; it all just falls out of getting the
partitioning constraints correct and consistently enforcing them, just
as for any other partition.
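The specification above can be illustrated with a small standalone C sketch. This is not code from the patch: `default_accepts` and `count_conflicts` are hypothetical names, and partition bounds are modeled as plain `int` arrays rather than PostgreSQL's bound-spec nodes. It only shows the two rules described: the default partition's constraint is the negation of every listed value, and adding a partition requires checking the default partition for rows that now conflict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not PostgreSQL code: the default partition accepts a
 * value only if no other partition lists it, i.e. the constraint is
 * (a != v1 AND a != v2 AND ...).
 */
bool default_accepts(const int *listed, size_t nlisted, int a)
{
    size_t i;

    assert(listed != NULL || nlisted == 0);
    for (i = 0; i < nlisted; i++)
    {
        if (a == listed[i])
            return false;
    }
    return true;
}

/*
 * Count rows stored in the default partition that would violate its
 * constraint once new_value gets a partition of its own. A nonzero result
 * means the rows must be moved or the command must error out.
 */
size_t count_conflicts(const int *default_rows, size_t nrows, int new_value)
{
    size_t i;
    size_t conflicts = 0;

    for (i = 0; i < nrows; i++)
    {
        if (default_rows[i] == new_value)
            conflicts++;
    }
    return conflicts;
}
```

With listed values 1, 3, and 5, the value 7 is accepted by the default partition. Once a partition for 7 is requested, every default-partition row with a = 7 is a conflict, which is the point at which the thread's two options (move the rows, or raise an error) apply.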
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Rushabh Lathia
On 3/29/17 8:13 AM, Rahila Syed wrote:
> Thanks for reporting. I have identified the problem and have a fix.
> Currently working on allowing
> adding a partition after default partition if the default partition does
> not have any conflicting rows.
> Will update the patch with both of these.
The CF has been extended until April 7 but time is still growing
short. Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12)
or this submission will be marked "Returned with Feedback".
Thanks,
--
-David
david@pgmasters.net
On 3/31/17 10:45 AM, David Steele wrote:
> On 3/29/17 8:13 AM, Rahila Syed wrote:
>
>> Thanks for reporting. I have identified the problem and have a fix.
>> Currently working on allowing
>> adding a partition after default partition if the default partition does
>> not have any conflicting rows.
>> Will update the patch with both of these.
>
> The CF has been extended but until April 7 but time is still growing
> short. Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12)
> or this submission will be marked "Returned with Feedback".
This submission has been marked "Returned with Feedback". Please feel
free to resubmit to a future commitfest.
Regards,
--
-David
david@pgmasters.net
Attachment
Hello,
Please find attached an updated patch.
Following has been accomplished in this update:
1. A new partition can be added after default partition if there are no conflicting rows in default partition.
2. Solved the crash reported earlier.
Thank you,
Rahila Syed
keith@keith=# CREATE TABLE list_partitioned (
keith(# a int
keith(# ) PARTITION BY LIST (a);
CREATE TABLE
Time: 4.933 ms
keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT);
CREATE TABLE
Time: 3.492 ms
keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4,5);
ERROR: unrecognized node type: 216
Time: 0.979 ms
On 2017/04/05 6:22, Keith Fiske wrote:
> On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote:
>> Please find attached an updated patch.
>> Following has been accomplished in this update:
>>
>> 1. A new partition can be added after default partition if there are no
>> conflicting rows in default partition.
>> 2. Solved the crash reported earlier.
>
> Installed and compiled against commit
> 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
> -0400) without any issue
>
> However, running your original example, I'm getting this error
>
> keith@keith=# CREATE TABLE list_partitioned (
> keith(# a int
> keith(# ) PARTITION BY LIST (a);
> CREATE TABLE
> Time: 4.933 ms
> keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
> VALUES IN (DEFAULT);
> CREATE TABLE
> Time: 3.492 ms
> keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES
> IN (4,5);
> ERROR: unrecognized node type: 216
It seems like the new ExecPrepareCheck should be used in the place of
ExecPrepareExpr in the code added in check_new_partition_bound().
> Also, I'm still of the opinion that denying future partitions of values in
> the default would be a cause of confusion. In order to move the data out of
> the default and into a proper child it would require first removing that
> data from the default, storing it elsewhere, creating the child, then
> moving it back. If it's only a small amount of data it may not be a big
> deal, but if it's a large amount, that could cause quite a lot of
> contention if done in a single transaction. Either that or the user would
> have to deal with data existing in the table, disappearing, then
> reappearing again.
>
> This also makes it harder to migrate an existing table easily. Essentially
> no child tables for a large, existing data set could ever be created
> without going through one of the two situations above.
I thought of the following possible way to allow future partitions when
the default partition exists which might contain rows that belong to the
newly created partition (unfortunately, nothing that we could implement at
this point for v10):
Suppose you want to add a new partition which will accept key=3 rows.
1. If no default partition exists, we're done; no key=3 rows would have
been accepted by any of the table's existing partitions, so no need to
move any rows
2. Default partition exists which might contain key=3 rows, which we'll
need to move. If we do this in the same transaction, as you say, it
might result in unnecessary unavailability of table's data. So, it's
better to delegate that responsibility to a background process. The
current transaction will only add the new key=3 partition, so any key=3
rows will be routed to the new partition from this point on. But we
haven't updated the default partition's constraint yet to say that it
no longer contains key=3 rows (constraint that the planner consumes),
so it will continue to be scanned for queries that request key=3 rows
(there should be some metadata to indicate that the default partition's
constraint is invalid), along with the new partition.
3. A background process receives a "work item" requesting it to move all
key=3 rows from the default partition heap to the new partition's heap.
The movement occurs without causing the table to become unavailable;
once all rows have been moved, we momentarily lock the table to update
the default partition's constraint to mark it valid, so that it will
no longer be accessed by queries that want to see key=3 rows.
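The three steps above can be sketched as a small state model in C. Everything here is hypothetical and exists only to make the proposal concrete: `DefaultPartState`, `add_partition_deferred`, `must_scan_default`, and `move_pending_rows` are not PostgreSQL names, rows are reduced to their key values, and locking is not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposal; none of these names exist in PostgreSQL. */
typedef struct DefaultPartState
{
    int    *rows;             /* key values stored in the default partition */
    size_t  nrows;
    bool    constraint_valid; /* false while rows for pending_key may remain */
    int     pending_key;      /* key of the partition added most recently */
} DefaultPartState;

/* Step 2: the DDL only registers the new partition and marks the default
 * partition's constraint as not yet valid. No rows are moved here. */
void add_partition_deferred(DefaultPartState *def, int key)
{
    def->pending_key = key;
    def->constraint_valid = false;
}

/* Planner side: for a key that has its own partition, the default partition
 * must still be scanned while rows for that key may remain in it. */
bool must_scan_default(const DefaultPartState *def, int key)
{
    return !def->constraint_valid && key == def->pending_key;
}

/* Step 3: the background work item moves matching rows into the new
 * partition, then marks the constraint valid. Returns the number moved. */
size_t move_pending_rows(DefaultPartState *def, int *new_part, size_t new_cap)
{
    size_t i;
    size_t kept = 0;
    size_t moved = 0;

    if (def->constraint_valid)
        return 0;               /* nothing pending */

    for (i = 0; i < def->nrows; i++)
    {
        if (def->rows[i] == def->pending_key)
        {
            assert(moved < new_cap);
            new_part[moved++] = def->rows[i];
        }
        else
            def->rows[kept++] = def->rows[i];
    }
    def->nrows = kept;
    def->constraint_valid = true;
    return moved;
}
```

The sketch makes the cost of the design visible: between `add_partition_deferred` and the end of `move_pending_rows`, every query for the pending key has to consult two partitions.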
Regarding 2, there is a question of whether it should not be possible for
the row movement to occur in the same transaction. Somebody may want that
to happen because they chose to run the command during a maintenance
window, when the table's becoming unavailable is not an issue. In that
case, we need to think of the interface more carefully.
Regarding 3, I think the new autovacuum work items infrastructure added by
the following commit looks very promising:
* BRIN auto-summarization *
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7526e10224f0792201e99631567bbe44492bbde4
> However, thinking through this, I'm not sure I can see a solution without
> the global index support. If this restriction is removed, there's still an
> issue of data duplication after the necessary child table is added. So
> guess it's a matter of deciding which user experience is better for the
> moment?
I'm not sure about the fate of this particular patch for v10, but until we
implement a solution to move rows and design appropriate interface for the
same, we could error out if moving rows is required at all, like the patch
does.
Could you briefly elaborate why you think the lack of global index support
would be a problem in this regard?
I agree that some design is required here to implement a solution for
redistribution of rows; not only in the context of supporting the notion
of default partitions, but also to allow the feature to split/merge range
(only?) partitions. I'd like to work on the latter for v11 for which I
would like to post a proposal soon; if anyone would like to collaborate
(ideas, code, review), I look forward to it. (sorry for hijacking this thread.)
Thanks,
Amit
>However, running your original example, I'm getting this error
Attachment
Hello Amit,
>Could you briefly elaborate why you think the lack global index support
>would be a problem in this regard?
I think following can happen if we allow rows satisfying the new partition to lie around in the
default partition until background process moves it.
Consider a scenario where partition key is a primary key and the data in the default partition is
not yet moved into the newly added partition. If now, new data is added into the new partition
which also exists(same key) in default partition there will be data duplication. If now
we scan the partitioned table for that key(from both the default and new partition as we
have not moved the rows) it will fetch the both rows.
Unless we have global indexes for partitioned tables, there is chance of data duplication between
child table added after default partition and the default partition.
>Scanning it and moving rows to the newly added partition while holding an
>AccessExclusiveLock on the parent will block any and all of the concurrent
>activity on it until the row-movement is finished.
Can you explain why this will require AccessExclusiveLock on parent and
not just the default partition and newly added partition?
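The duplication scenario described in this message can be shown with a minimal C model. It assumes, as the thread does, that uniqueness is enforced per partition only (no global index). The names `local_unique_ok` and `scan_both` are hypothetical and rows are reduced to their key values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a partition-local unique check sees only the rows of
 * the partition being inserted into. */
bool local_unique_ok(const int *part_rows, size_t nrows, int key)
{
    size_t i;

    assert(part_rows != NULL || nrows == 0);
    for (i = 0; i < nrows; i++)
    {
        if (part_rows[i] == key)
            return false;
    }
    return true;
}

/* Number of rows a scan of both the default and the new partition returns
 * for the given key while the old rows have not been moved yet. */
size_t scan_both(const int *default_rows, size_t ndef,
                 const int *new_rows, size_t nnew, int key)
{
    size_t i;
    size_t hits = 0;

    for (i = 0; i < ndef; i++)
        if (default_rows[i] == key)
            hits++;
    for (i = 0; i < nnew; i++)
        if (new_rows[i] == key)
            hits++;
    return hits;
}
```

If the default partition still holds key 3 and the new partition is empty, `local_unique_ok` on the new partition accepts another key 3, and a subsequent `scan_both` for key 3 returns two rows even though the key is nominally a primary key.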
On 2017/04/05 14:41, Rushabh Lathia wrote:
> I agree about the future plan about the row movement, how that is I am
> not quite sure at this stage.
>
> I was thinking that CREATE new partition is the DDL command, so even
> if row-movement works with holding the lock on the new partition table,
> that should be fine. I am not quire sure, why row movement should be
> happen in the back-ground process.
I think to improve the availability of access to the partitioned table.
Consider that the default partition may have gotten pretty large.
Scanning it and moving rows to the newly added partition while holding an
AccessExclusiveLock on the parent will block any and all of the concurrent
activity on it until the row-movement is finished. One may be prepared to
pay this cost, for which there should definitely be an option to perform
the row-movement in the same transaction (also possibly the default behavior).
Thanks,
Amit
Hi Rahila,
On 2017/04/05 18:57, Rahila Syed wrote:
> Hello Amit,
>
>> Could you briefly elaborate why you think the lack global index support
>> would be a problem in this regard?
> I think following can happen if we allow rows satisfying the new partition
> to lie around in the
> default partition until background process moves it.
> Consider a scenario where partition key is a primary key and the data in
> the default partition is
> not yet moved into the newly added partition. If now, new data is added
> into the new partition
> which also exists(same key) in default partition there will be data
> duplication. If now
> we scan the partitioned table for that key(from both the default and new
> partition as we
> have not moved the rows) it will fetch the both rows.
> Unless we have global indexes for partitioned tables, there is chance of
> data duplication between
> child table added after default partition and the default partition.
Ah, okay. I think I wrote that question before even reading the next
sentence in Keith's message ("there's still an issue of data duplication
after the necessary child table is added.")
Maybe we can disallow background row movement if such global constraint
exists.
>> Scanning it and moving rows to the newly added partition while holding an
>> AccessExclusiveLock on the parent will block any and all of the concurrent
>> activity on it until the row-movement is finished.
> Can you explain why this will require AccessExclusiveLock on parent and
> not just the default partition and newly added partition?
Because we take an AccessExclusiveLock on the parent table when
adding/removing a partition in general. We do that because concurrent
accessors of the parent table rely on its partition descriptor not
changing under them.
Thanks,
Amit
On Wed, Apr 5, 2017 at 5:57 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>> Could you briefly elaborate why you think the lack global index support
>> would be a problem in this regard?
> I think following can happen if we allow rows satisfying the new partition
> to lie around in the default partition until background process moves it.
> Consider a scenario where partition key is a primary key and the data in the
> default partition is not yet moved into the newly added partition. If now,
> new data is added into the new partition which also exists (same key) in
> default partition there will be data duplication. If now we scan the
> partitioned table for that key (from both the default and new partition as
> we have not moved the rows) it will fetch the both rows.
> Unless we have global indexes for partitioned tables, there is chance of
> data duplication between child table added after default partition and the
> default partition.

Yes, I think it would be completely crazy to try to migrate the data
in the background:

- The migration might never complete because of a UNIQUE or CHECK
constraint on the partition to which rows are being migrated.

- Even if the migration eventually succeeded, such a design abandons
all hope of making INSERT .. ON CONFLICT DO NOTHING work sensibly
while the migration is in progress, unless the new partition has no
UNIQUE constraints.

- Partition-wise join and partition-wise aggregate would need to have
special case handling for the case of an unfinished migration, as
would any user code that accesses partitions directly.

- More generally, I think users expect that when a DDL command
finishes execution, it's done all of the work that there is to do (or
at the very least, that any remaining work has no user-visible
consequences, which would not be the case here).

IMV, the question of whether we have efficient ways to move data
around between partitions is somewhat separate from the question of
whether partitions can be defined in a certain way in the first place.
The problems that Keith refers to upthread already exist for
subpartitioning; you've got to detach the old partition, create a new
one, and then reinsert the data. And for partitioning an
unpartitioned table: create a replacement table, insert all the data,
substitute it for the original table. The fact that we have these
limitations is not good, but we're not going to rip out partitioning
entirely because we don't have clever ways of migrating the data in
those cases, and the proposed behavior here is not any worse.

Also, waiting for those problems to get fixed might be waiting for
Godot. I'm not really all that sanguine about our chances of coming
up with a really nice way of handling these cases. In a design
based on table inheritance, you can leave it murky where certain data
is supposed to end up and migrate it on-line and you might get away
with that, but a major point of having declarative partitioning at all
is to remove that sort of murkiness. It's probably not that hard to
come up with a command that locks the parent and moves data around via
full table scans, but I'm not sure how far that really gets us; you
could do the same thing easily enough with a sequence of commands
generated via a script. And being able to do this in a general way
without a full table lock looks pretty hard - it doesn't seem
fundamentally different from trying to perform a table-rewriting
operation like CLUSTER without a full table lock, which we also don't
support. The executor is not built to cope with any aspect of the
table definition shifting under it, and that includes the set of child
tables which are partitions of the table mentioned in the query. Maybe
the executor can be taught to survive such definitional changes at
least in limited cases, but that's a much different project than
allowing default partitions.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
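The "sequence of commands generated via a script" mentioned above might look roughly like the following. This is only a sketch under stated assumptions: it uses the default-partition syntax that was eventually released in PostgreSQL 11 (`PARTITION OF ... DEFAULT`, `ATTACH PARTITION ... DEFAULT`) rather than the `FOR VALUES IN (DEFAULT)` form used by the patch in this thread. The table names and the key value 9 are borrowed from the first message for illustration, and `part_9` is a hypothetical name.

```sql
-- Setup, same shape as the example in the first message of the thread.
DROP TABLE IF EXISTS list_partitioned;
CREATE TABLE list_partitioned (a int) PARTITION BY LIST (a);
CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4, 5);
CREATE TABLE part_default PARTITION OF list_partitioned DEFAULT;

-- No partition accepts 9, so tuple routing sends it to part_default.
INSERT INTO list_partitioned VALUES (9);

-- Move the rows for key 9 into a dedicated partition in one transaction.
BEGIN;

-- Detach the default so the new partition can be created even though
-- part_default still holds rows with a = 9.
ALTER TABLE list_partitioned DETACH PARTITION part_default;

-- part_9 is a hypothetical name for the new partition.
CREATE TABLE part_9 PARTITION OF list_partitioned FOR VALUES IN (9);

-- Reinsert through the parent so tuple routing places the rows in part_9,
-- then remove them from the (currently standalone) former default.
INSERT INTO list_partitioned SELECT * FROM part_default WHERE a = 9;
DELETE FROM part_default WHERE a = 9;

-- Re-attach the table as the default partition.
ALTER TABLE list_partitioned ATTACH PARTITION part_default DEFAULT;

COMMIT;
```

The DDL statements take strong locks on the parent that are held until COMMIT, so this is the "locks the parent and moves data around" variant, not an online migration.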
On 2017/04/06 0:19, Robert Haas wrote:
> On Wed, Apr 5, 2017 at 5:57 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>> Could you briefly elaborate why you think the lack global index support
>>> would be a problem in this regard?
>> I think following can happen if we allow rows satisfying the new partition
>> to lie around in the default partition until background process moves it.
>> Consider a scenario where partition key is a primary key and the data in the
>> default partition is not yet moved into the newly added partition. If now,
>> new data is added into the new partition which also exists (same key) in
>> default partition there will be data duplication. If now we scan the
>> partitioned table for that key (from both the default and new partition as
>> we have not moved the rows) it will fetch the both rows.
>> Unless we have global indexes for partitioned tables, there is chance of
>> data duplication between child table added after default partition and the
>> default partition.
>
> Yes, I think it would be completely crazy to try to migrate the data
> in the background:
>
> - The migration might never complete because of a UNIQUE or CHECK
> constraint on the partition to which rows are being migrated.
>
> - Even if the migration eventually succeeded, such a design abandons
> all hope of making INSERT .. ON CONFLICT DO NOTHING work sensibly
> while the migration is in progress, unless the new partition has no
> UNIQUE constraints.
>
> - Partition-wise join and partition-wise aggregate would need to have
> special case handling for the case of an unfinished migration, as
> would any user code that accesses partitions directly.
>
> - More generally, I think users expect that when a DDL command
> finishes execution, it's done all of the work that there is to do (or
> at the very least, that any remaining work has no user-visible
> consequences, which would not be the case here).

OK, I realize the background migration was a poorly thought out idea. And
a *first* version that will handle the row-movement should be doing that
as part of the same command anyway.

> IMV, the question of whether we have efficient ways to move data
> around between partitions is somewhat separate from the question of
> whether partitions can be defined in a certain way in the first place.
> The problems that Keith refers to upthread already exist for
> subpartitioning; you've got to detach the old partition, create a new
> one, and then reinsert the data. And for partitioning an
> unpartitioned table: create a replacement table, insert all the data,
> substitute it for the original table. The fact that we have these
> limitations is not good, but we're not going to rip out partitioning
> entirely because we don't have clever ways of migrating the data in
> those cases, and the proposed behavior here is not any worse.
>
> Also, waiting for those problems to get fixed might be waiting for
> Godot. I'm not really all that sanguine about our chances of coming
> up with a really nice way of handling these cases. In a design
> based on table inheritance, you can leave it murky where certain data
> is supposed to end up and migrate it on-line and you might get away
> with that, but a major point of having declarative partitioning at all
> is to remove that sort of murkiness. It's probably not that hard to
> come up with a command that locks the parent and moves data around via
> full table scans, but I'm not sure how far that really gets us; you
> could do the same thing easily enough with a sequence of commands
> generated via a script. And being able to do this in a general way
> without a full table lock looks pretty hard - it doesn't seem
> fundamentally different from trying to perform a table-rewriting
> operation like CLUSTER without a full table lock, which we also don't
> support. The executor is not built to cope with any aspect of the
> table definition shifting under it, and that includes the set of child
> tables which are partitions of the table mentioned in the query. Maybe
> the executor can be taught to survive such definitional changes at
> least in limited cases, but that's a much different project than
> allowing default partitions.

Agreed.

Thanks,
Amit
On Wed, Apr 5, 2017 at 2:51 PM, Keith Fiske <keith@omniti.com> wrote:
> Only issue I see with this, and I'm not sure if it is an issue, is what
> happens to that default constraint clause when 1000s of partitions start
> getting added? From what I gather the default's constraint is built based
> off the cumulative opposite of all other child constraints. I don't
> understand the code well enough to see what it's actually doing, but if
> there are no gaps, is the method used smart enough to aggregate all the
> child constraints to make a simpler constraint that is simply outside the
> current min/max boundaries? If so, for serial/time range partitioning this
> should typically work out fine since there are rarely gaps. This actually
> seems more of an issue for list partitioning where each child is a distinct
> value or range of values that are completely arbitrary. Won't that check
> and re-evaluation of the default's constraint just get worse and worse as
> more children are added? Is there really even a need for the default to
> have an opposite constraint like this? Not sure on how the planner works
> with partitioning now, but wouldn't it be better to first check all
> non-default children for a match the same as it does now without a default
> and, failing that, then route to the default if one is declared? The
> default should accept any data then so I don't see the need for the
> constraint unless it's required for the current implementation. If that's
> the case, could that be changed?
>
> Keith

Actually, thinking on this more, I realized this does again come back to
the lack of a global index. Without the constraint, data could be put
directly into the default that could technically conflict with the
partition scheme elsewhere. Perhaps, instead of the constraint, inserts
directly to the default could be prevented on the user level. Writing to
valid children directly certainly has its place, but I've been thinking
about it, and I can't see any reason why one would ever want to write
directly to the default. Its use case seems to be around being a sort of
temporary storage until that data can be moved to a valid location. Would
still need to allow removal of data, though.

Not sure if that's even a workable solution. Just trying to think of ways
around the current limitations and still allow this feature.
On Wed, Apr 5, 2017 at 11:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 5, 2017 at 5:57 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>Could you briefly elaborate why you think the lack global index support
>>would be a problem in this regard?
> I think following can happen if we allow rows satisfying the new partition
> to lie around in the
> default partition until background process moves it.
> Consider a scenario where partition key is a primary key and the data in the
> default partition is
> not yet moved into the newly added partition. If now, new data is added into
> the new partition
> which also exists(same key) in default partition there will be data
> duplication. If now
> we scan the partitioned table for that key(from both the default and new
> partition as we
> have not moved the rows) it will fetch the both rows.
> Unless we have global indexes for partitioned tables, there is chance of
> data duplication between
> child table added after default partition and the default partition.
Yes, I think it would be completely crazy to try to migrate the data
in the background:
- The migration might never complete because of a UNIQUE or CHECK
constraint on the partition to which rows are being migrated.
- Even if the migration eventually succeeded, such a design abandons
all hope of making INSERT .. ON CONFLICT DO NOTHING work sensibly
while the migration is in progress, unless the new partition has no
UNIQUE constraints.
- Partition-wise join and partition-wise aggregate would need to have
special case handling for the case of an unfinished migration, as
would any user code that accesses partitions directly.
- More generally, I think users expect that when a DDL command
finishes execution, it's done all of the work that there is to do (or
at the very least, that any remaining work has no user-visible
consequences, which would not be the case here).
IMV, the question of whether we have efficient ways to move data
around between partitions is somewhat separate from the question of
whether partitions can be defined in a certain way in the first place.
The problems that Keith refers to upthread already exist for
subpartitioning; you've got to detach the old partition, create a new
one, and then reinsert the data. And for partitioning an
unpartitioned table: create a replacement table, insert all the data,
substitute it for the original table. The fact that we have these
limitations is not good, but we're not going to rip out partitioning
entirely because we don't have clever ways of migrating the data in
those cases, and the proposed behavior here is not any worse.
Also, waiting for those problems to get fixed might be waiting for
Godot. I'm not really all that sanguine about our chances of coming
up with a really nice way of handling these cases. In a design
based on table inheritance, you can leave it murky where certain data
is supposed to end up and migrate it on-line and you might get away
with that, but a major point of having declarative partitioning at all
is to remove that sort of murkiness. It's probably not that hard to
come up with a command that locks the parent and moves data around via
full table scans, but I'm not sure how far that really gets us; you
could do the same thing easily enough with a sequence of commands
generated via a script. And being able to do this in a general way
without a full table lock looks pretty hard - it doesn't seem
fundamentally different from trying to perform a table-rewriting
operation like CLUSTER without a full table lock, which we also don't
support. The executor is not built to cope with any aspect of the
table definition shifting under it, and that includes the set of child
tables which are partitions of the table mentioned in the query. Maybe
the executor can be taught to survive such definitional changes at
least in limited cases, but that's a much different project than
allowing default partitions.

Confirmed that v5 patch works with examples given in the original post but segfaulted when I tried the examples I used in my blog post (taken from the documentation at the time I wrote it).
https://www.keithf4.com/postgresql-10-built-in-partitioning/
keith@keith=# drop table cities;
DROP TABLE
Time: 6.055 ms
keith@keith=# CREATE TABLE cities (
city_id bigserial not null,
name text not null,
population int
) PARTITION BY LIST (initcap(name));
CREATE TABLE
Time: 7.130 ms
keith@keith=# CREATE TABLE cities_west
PARTITION OF cities (
CONSTRAINT city_id_nonzero CHECK (city_id != 0)
) FOR VALUES IN ('Los Angeles', 'San Francisco');
CREATE TABLE
Time: 6.690 ms
keith@keith=# CREATE TABLE cities_default
keith-# PARTITION OF cities FOR VALUES IN (DEFAULT);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
Failed.
Time: 387.887 ms
After reading responses, I think I'd be fine with how Rahila implemented this with disallowing the child until the data is removed from the default if this would allow it to be included in v10. As was mentioned, there just doesn't seem to be a way to easily handle the data conflicts cleanly at this time, but I think the value of the default to be able to catch accidental data vs returning an error is worth it. It also at least gives a slightly easier migration path vs having to migrate to a completely new table. Any chance this could be adapted for range partitioning as well? I'd be happy to create some pgtap tests with pg_partman for this then to make sure it works.

Only issue I see with this, and I'm not sure if it is an issue, is what happens to that default constraint clause when 1000s of partitions start getting added? From what I gather the default's constraint is built based off the cumulative opposite of all other child constraints. I don't understand the code well enough to see what it's actually doing, but if there are no gaps, is the method used smart enough to aggregate all the child constraints to make a simpler constraint that is simply outside the current min/max boundaries? If so, for serial/time range partitioning this should typically work out fine since there are rarely gaps. This actually seems more of an issue for list partitioning where each child is a distinct value or range of values that are completely arbitrary. Won't that check and re-evaluation of the default's constraint just get worse and worse as more children are added? Is there really even a need for the default to have an opposite constraint like this? Not sure on how the planner works with partitioning now, but wouldn't it be better to first check all non-default children for a match the same as it does now without a default and, failing that, then route to the default if one is declared? The default should accept any data then so I don't see the need for the constraint unless it's required for the current implementation. If that's the case, could that be changed?

Keith
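The behavior accepted above, where a new child is disallowed until conflicting rows are removed from the default, can be sketched as follows. The sketch assumes the syntax released in PostgreSQL 11 (`PARTITION OF ... DEFAULT`), not the `FOR VALUES IN (DEFAULT)` form of the patch discussed here, and `part_9` is a hypothetical partition name. The comments describe what is expected on a released server, not output captured from the patch.

```sql
DROP TABLE IF EXISTS list_partitioned;
CREATE TABLE list_partitioned (a int) PARTITION BY LIST (a);
CREATE TABLE part_default PARTITION OF list_partitioned DEFAULT;

-- With no other partition defined, the row lands in the default.
INSERT INTO list_partitioned VALUES (9);

-- Expected to be rejected: part_default still holds a row with a = 9,
-- which would violate the default's updated partition constraint.
CREATE TABLE part_9 PARTITION OF list_partitioned FOR VALUES IN (9);

-- Once the conflicting rows are removed from the default (or moved
-- elsewhere first), the same statement is expected to succeed.
DELETE FROM part_default WHERE a = 9;
CREATE TABLE part_9 PARTITION OF list_partitioned FOR VALUES IN (9);
```

Run outside a transaction block, the first `CREATE TABLE part_9` fails on its own and the statements after it still execute.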
Attachment
On 2017/04/06 13:08, Keith Fiske wrote:
> On Wed, Apr 5, 2017 at 2:51 PM, Keith Fiske wrote:
>> Only issue I see with this, and I'm not sure if it is an issue, is what
>> happens to that default constraint clause when 1000s of partitions start
>> getting added? From what I gather the default's constraint is built based
>> off the cumulative opposite of all other child constraints. I don't
>> understand the code well enough to see what it's actually doing, but if
>> there are no gaps, is the method used smart enough to aggregate all the
>> child constraints to make a simpler constraint that is simply outside the
>> current min/max boundaries? If so, for serial/time range partitioning this
>> should typically work out fine since there are rarely gaps. This actually
>> seems more of an issue for list partitioning where each child is a distinct
>> value or range of values that are completely arbitrary. Won't that check
>> and re-evaluation of the default's constraint just get worse and worse as
>> more children are added? Is there really even a need for the default to
>> have an opposite constraint like this? Not sure on how the planner works
>> with partitioning now, but wouldn't it be better to first check all
>> non-default children for a match the same as it does now without a default
>> and, failing that, then route to the default if one is declared? The
>> default should accept any data then so I don't see the need for the
>> constraint unless it's required for the current implementation. If that's
>> the case, could that be changed?
Unless I misread your last sentence, I think there might be some
confusion. Currently, the partition constraint (think of these as you
would of user-defined check constraints) is needed for two reasons: 1. to
prevent direct insertion of rows into the default partition for which a
non-default partition exists; no two partitions should ever have duplicate
rows. 2. so that planner can use the constraint to determine if the
default partition needs to be scanned for a query using constraint
exclusion; no need, for example, to scan the default partition if the
query requests only key=3 rows and a partition for the same exists (no
other partition should have key=3 rows by definition, not even the
default). As things stand today, planner needs to look at every partition
individually for using constraint exclusion to possibly exclude it, *even*
with declarative partitioning and that would include the default partition.
> Actually, thinking on this more, I realized this does again come back to
> the lack of a global index. Without the constraint, data could be put
> directly into the default that could technically conflict with the
> partition scheme elsewhere. Perhaps, instead of the constraint, inserts
> directly to the default could be prevented on the user level. Writing to
> valid children directly certainly has its place, but been thinking about
> it, and I can't see any reason why one would ever want to write directly to
> the default. It's use case seems to be around being a sort of temporary
> storage until that data can be moved to a valid location. Would still need
> to allow removal of data, though.
As mentioned above, the default partition will not allow directly
inserting a row whose key maps to some existing (non-default) partition.
As far as tuple-routing is concerned, it will choose the default partition
only if no other partition is found for the key. Tuple-routing doesn't
use the partition constraints directly per se, like one of the two things
mentioned above do. One could say that tuple-routing assigns the incoming
rows to partitions such that their individual partition constraints are
not violated.
Finally, we don't yet offer global guarantees for constraints like unique.
The only guarantee that's in place is that no two partitions can contain
the same partition key.
Thanks,
Amit
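The two roles of the default's partition constraint described above (rejecting direct inserts that belong elsewhere, and letting the planner skip the default) can be tried out with a sketch like this. It assumes the PostgreSQL 11 release syntax rather than the patch's `FOR VALUES IN (DEFAULT)`, and the comments state expectations, not captured output.

```sql
DROP TABLE IF EXISTS list_partitioned;
CREATE TABLE list_partitioned (a int) PARTITION BY LIST (a);
CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4, 5);
CREATE TABLE part_default PARTITION OF list_partitioned DEFAULT;

-- Tuple routing through the parent: the default is chosen only when no
-- other partition accepts the key.
INSERT INTO list_partitioned VALUES (4), (9);
SELECT tableoid::regclass AS partition, a
FROM list_partitioned
ORDER BY a;

-- Reason 1: a direct insert into the default with a key that maps to
-- part_1 is expected to be rejected by the partition constraint.
INSERT INTO part_default VALUES (4);

-- Reason 2: for a query on a = 4 the planner can use the constraint to
-- leave part_default out of the plan.
EXPLAIN (COSTS OFF) SELECT * FROM list_partitioned WHERE a = 4;
```

Selecting `tableoid::regclass` is a convenient way to see which partition each row was routed to.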
Hello,
Thanks a lot for testing and reporting this. Please find attached an updated patch with the fix. The patch also contains a fix regarding operator used at the time of creating expression as default partition constraint. This was notified offlist by Amit Langote.
Thank you,
Rahila Syed
On Thu, Apr 6, 2017 at 7:30 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Hello,
> Thanks a lot for testing and reporting this. Please find attached an
> updated patch with the fix. The patch also contains a fix regarding
> operator used at the time of creating expression as default partition
> constraint. This was notified offlist by Amit Langote.
> Thank you,
> Rahila Syed

Could probably use some more extensive testing, but all examples I had on my previously mentioned blog post are now working.

Keith
Hi Rahila,
With your latest patch:
Consider a case when a table is partitioned on a boolean key.
Even when there are existing separate partitions for 'true' and
'false', still default partition can be created.
I think this should not be allowed.
Consider following case:
postgres=# CREATE TABLE list_partitioned (
a bool
) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN ('false');
CREATE TABLE
postgres=# CREATE TABLE part_2 PARTITION OF list_partitioned FOR VALUES IN ('true');
CREATE TABLE
postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT);
CREATE TABLE
The creation of table part_default should have failed instead.
Thanks,
Jeevan Ladhe
On Thu, Apr 6, 2017 at 9:37 PM, Keith Fiske <keith@omniti.com> wrote:
> On Thu, Apr 6, 2017 at 7:30 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>> Hello,
>>
>> Thanks a lot for testing and reporting this. Please find attached an updated
>> patch with the fix. The patch also contains a fix regarding operator used at
>> the time of creating expression as default partition constraint. This was
>> notified offlist by Amit Langote.
>>
>> Thank you,
>> Rahila Syed
>
> Could probably use some more extensive testing, but all examples I had on my
> previously mentioned blog post are now working.
>
> Keith
On Mon, Apr 10, 2017 at 8:12 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi Rahila,
>
> With your latest patch:
>
> Consider a case when a table is partitioned on a boolean key.
> Even when there are existing separate partitions for 'true' and
> 'false', still default partition can be created.
>
> I think this should not be allowed.

Well, boolean columns can have "NULL" values which will go into
default partition if no NULL partition exists. So, probably we should
add check for NULL partition there.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Tue, Apr 11, 2017 at 9:41 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have checked for NULLs too, and the default partition can be created even
> when there are partitions for each TRUE, FALSE and NULL.
>
> Consider the example below:
>
> postgres=# CREATE TABLE list_partitioned (
> a bool
> ) PARTITION BY LIST (a);
> CREATE TABLE
> postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN ('false');
> CREATE TABLE
> postgres=# CREATE TABLE part_2 PARTITION OF list_partitioned FOR VALUES IN ('true');
> CREATE TABLE
> postgres=# CREATE TABLE part_3 PARTITION OF list_partitioned FOR VALUES IN (null);
> CREATE TABLE
> postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT);
> CREATE TABLE

In my opinion, that's absolutely fine, and it would be very strange to
try to prevent it. The partitioning method shouldn't have specific
knowledge of the properties of individual data types.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Apr 6, 2017 at 1:17 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> I like the idea about having DEFAULT partition for the range partition. With
> the way partition is designed it can have holes into range partition. I think
> DEFAULT for the range partition is a good idea, generally when the range
> having holes. When range is serial then of course DEFAULT partition doen't
> much sense.

Yes, I like that idea, too. I think the DEFAULT partition should be
allowed to be created for either range or list partitioning regardless
of whether we think there are any holes, but if you create a DEFAULT
partition when there are no holes, you just won't be able to put any
data into it. It's silly, but it's not worth the code that it would
take to try to prevent it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
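
To make the "holes" point concrete, here is a minimal C sketch of range routing with a default fallback. It assumes integer keys and half-open ranges [lower, upper). The RangeBound type and route_range_key() are hypothetical names for illustration, not anything in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NO_PARTITION (-1)

/* Half-open range [lower, upper) owned by one partition. */
typedef struct RangeBound {
    int lower;
    int upper;
    int part_index;
} RangeBound;

/*
 * Route a key to the partition whose range contains it. A key that falls
 * into a hole between ranges, or outside all of them, goes to
 * default_index (which may itself be NO_PARTITION).
 */
int
route_range_key(const RangeBound *bounds, size_t nbounds,
                int default_index, int key)
{
    assert(bounds != NULL || nbounds == 0);

    for (size_t i = 0; i < nbounds; i++) {
        if (key >= bounds[i].lower && key < bounds[i].upper)
            return bounds[i].part_index;
    }
    return default_index;
}
```

With ranges [0, 10) and [20, 30), key 15 lands in the default partition. If the ranges were contiguous and covered every possible key, the final return would simply never be reached, which is the "you just won't be able to put any data into it" case.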
On Thu, Apr 6, 2017 at 7:30 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Thanks a lot for testing and reporting this. Please find attached an updated
> patch with the fix. The patch also contains a fix
> regarding operator used at the time of creating expression as default
> partition constraint. This was notified offlist by Amit Langote.

I think that the syntax for this patch should probably be revised.
Right now the proposal is for:

CREATE TABLE .. PARTITION OF ... FOR VALUES IN (DEFAULT);

But that's not a good idea for several reasons. For one thing, you
can also write FOR VALUES IN (DEFAULT, 5) or which isn't sensible.
For another thing, this kind of syntax won't generalize to range
partitioning, which we've talked about making this feature support.
Maybe something like:

CREATE TABLE .. PARTITION OF .. DEFAULT;

This patch makes the assumption throughout that any DefElem represents
the word DEFAULT, which is true in the patch as written but doesn't
seem very future-proof. I think the "def" in "DefElem" stands for
"definition" or "define" or something like that, so this is actually
pretty confusing. Maybe we should introduce a dedicated node type to
represent a default-specification in the parser grammar. If not, then
let's at least encapsulate the test a little better, e.g. by adding
isDefaultPartitionBound() which tests not only IsA(..., DefElem) but
also whether the name is DEFAULT as expected. BTW, we typically use
lower-case internally, so if we stick with this representation it
should really be "default" not "DEFAULT".

Useless hunk:

+ bool has_def;   /* Is there a default partition? Currently false
+                  * for a range partitioned table */
+ int  def_index; /* Index of the default list partition. -1 for
+                  * range partitioned tables */

Why abbreviate "default" to def here? Seems pointless.
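
A rough sketch of the encapsulated test suggested above, written against toy stand-ins for the node types so that it compiles on its own. The Node and DefElem definitions here are simplified assumptions, not PostgreSQL's real structs, and is_default_partition_bound() is a hypothetical name. It checks both the node tag and the lower-case name, as the review asks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for PostgreSQL's node machinery; not the real definitions. */
typedef enum NodeTag { T_Const, T_DefElem } NodeTag;

typedef struct Node {
    NodeTag type;
} Node;

typedef struct DefElem {
    Node node;              /* node.type is T_DefElem */
    const char *defname;
} DefElem;

/*
 * Return 1 only if the bound is a DefElem whose name is exactly "default".
 * Testing the tag alone would accept any DefElem, which is the
 * future-proofing concern raised in the review.
 */
int
is_default_partition_bound(const Node *bound)
{
    if (bound == NULL || bound->type != T_DefElem)
        return 0;

    const DefElem *def = (const DefElem *) bound;

    assert(def->node.type == T_DefElem);
    return def->defname != NULL && strcmp(def->defname, "default") == 0;
}
```

Callers would then write is_default_partition_bound(value) instead of repeating IsA(value, DefElem) at every site, so a later switch to a dedicated node type touches one function only.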

+ if (found_def)
+ {
+     if (mapping[def_index] == -1)
+         mapping[def_index] = next_index++;
+ }

Consider &&

@@ -717,7 +754,6 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound)
 }
 }
 }
-
 break;
 }

+ * default partiton for rows satisfying the new partition

Spelling.

+ * constraint. If found dont allow addition of a new partition.

Missing apostrophe.

+ defrel = heap_open(defid, AccessShareLock);
+ tupdesc = CreateTupleDescCopy(RelationGetDescr(defrel));
+
+ /* Build expression execution states for partition check quals */
+ partqualstate = ExecPrepareCheck(partConstraint,
+                                  estate);
+
+ econtext = GetPerTupleExprContext(estate);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());

Definitely not safe against concurrency, since AccessShareLock won't
exclude somebody else's update. In fact, it won't even cover somebody
else's already-in-flight transaction.

+ errmsg("new default partition constraint is violated by some row")));

Normally in such cases we try to give more detail using
ExecBuildSlotValueDescription.

+ bool is_def = true;

This variable starts out true and is never set to any value other than
true. Just get rid of it and, in the one place where it is currently
used, write "true". That's shorter and clearer.

+ inhoids = find_inheritance_children(RelationGetRelid(parent), NoLock);

If it's actually safe to do this with no lock, there ought to be a
comment with a very compelling explanation of why it's safe.

+ boundspec = (Node *) stringToNode(TextDatumGetCString(datum));
+ bspec = (PartitionBoundSpec *)boundspec;

There's not really a reason to cast the result of stringToNode() to
Node * and then turn around and cast it to PartitionBoundSpec *. Just
cast it directly to whatever it needs to be. And use the new castNode
macro.
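
For readers unfamiliar with the idea behind a checked cast such as castNode, here is a simplified, self-contained model of it. The types and the CAST_NODE / cast_node_impl names below are hypothetical stand-ins for illustration. The real macro lives in the PostgreSQL headers and may differ in detail. The sketch only shows the technique: verify the node tag in assert-enabled builds, then cast once to the target type.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node tags and types; assumptions for illustration only. */
typedef enum NodeTag { T_PartitionBoundSpec, T_DefElem } NodeTag;

typedef struct Node {
    NodeTag type;
} Node;

typedef struct PartitionBoundSpec {
    Node node;          /* node.type is T_PartitionBoundSpec */
    char strategy;
} PartitionBoundSpec;

/* Check the tag (when assertions are enabled) and hand the pointer back. */
void *
cast_node_impl(NodeTag expected, void *ptr)
{
    assert(ptr == NULL || ((Node *) ptr)->type == expected);
    return ptr;
}

/* One cast, with the expected tag derived from the type name. */
#define CAST_NODE(type_, ptr) ((type_ *) cast_node_impl(T_##type_, (ptr)))
```

Compared with the double cast quoted above, a single CAST_NODE(PartitionBoundSpec, raw) both documents the expected type and catches a mismatched node in debug builds.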

+ foreach(cell1, bspec->listdatums)
+ {
+     Node *value = lfirst(cell1);
+     if (IsA(value, DefElem))
+     {
+         def_elem = true;
+         *defid = inhrelid;
+     }
+ }
+ if (def_elem)
+ {
+     ReleaseSysCache(tuple);
+     continue;
+ }
+ foreach(cell3, bspec->listdatums)
+ {
+     Node *value = lfirst(cell3);
+     boundspecs = lappend(boundspecs, value);
+ }
+ ReleaseSysCache(tuple);
+ }
+ foreach(cell4, spec->listdatums)
+ {
+     Node *value = lfirst(cell4);
+     boundspecs = lappend(boundspecs, value);
+ }

cell1, cell2, cell3, and cell4 are not very clear variable names.
Between that and the lack of comments, this is not easy to understand.
It's sort of spaghetti logic, too. The if (def_elem) test continues
early, but if the point is that the loop using cell3 shouldn't execute
in that case, why not just put if (!def_elem) { foreach(cell3, ...) {
... } } instead of reiterating the ReleaseSysCache in two places?

+ /* Collect bound spec nodes in a list. This is done if the partition is
+  * a default partition. In case of default partition, constraint is formed
+  * by performing <> operation over the partition constraints of the
+  * existing partitions.
+  */

I doubt that handles NULLs properly.

+ inhoids = find_inheritance_children(RelationGetRelid(parent), NoLock);

Again, no lock? Really?

The logic which follows looks largely cut-and-pasted, which makes me
think you need to do some refactoring here to make it more clear
what's going on, so that you have the relevant logic in just one
place. It seems wrong anyway to shove all of this logic specific to
the default case into get_qual_from_partbound() when the logic for the
non-default case is inside get_qual_for_list. Where there were 2
lines of code before you've now got something like 30.

+ if(get_negator(operoid) == InvalidOid)
+     elog(ERROR, "no negator found for partition operator %u",
+          operoid);

I really doubt that's OK. elog() shouldn't be reachable, but this
will be reachable if the partitioning operator does not have a
negator. And there's the NULL-handling issue I mentioned above, too.

+ if (partdesc->boundinfo->has_def && key->strategy
+     == PARTITION_STRATEGY_LIST)
+     result = parent->indexes[partdesc->boundinfo->def_index];

Testing for PARTITION_STRATEGY_LIST here seems unnecessary. If
has_def (or has_default, as it probably should be) isn't allowed for
range partitions, then it's redundant; if it is allowed, then that
case should be handled too. Also, at this point we've already set
*failed_at and *failed_slot; presumably you'd want to make this check
before you get to that point.

I suspect there are quite a few more problems here in addition to the
ones mentioned above, but I don't think it makes sense to spend too
much time searching for them until some of this basic stuff is cleaned
up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
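
The NULL concern raised in the review can be illustrated with SQL's three-valued logic. The sketch below is a toy model, not code from the patch, and the Tri and NullableInt types are hypothetical. It evaluates a default-partition constraint built purely from <> comparisons and shows that a NULL key yields "unknown" rather than true or false, so a constraint of that shape presumably needs an explicit NULL test alongside the comparisons.

```c
#include <assert.h>
#include <stddef.h>

/* SQL truth values. */
typedef enum Tri { TRI_FALSE, TRI_TRUE, TRI_UNKNOWN } Tri;

/* A nullable integer partition key. */
typedef struct NullableInt {
    int is_null;
    int value;
} NullableInt;

/* SQL AND: false dominates, then unknown, else true. */
Tri
tri_and(Tri a, Tri b)
{
    if (a == TRI_FALSE || b == TRI_FALSE)
        return TRI_FALSE;
    if (a == TRI_UNKNOWN || b == TRI_UNKNOWN)
        return TRI_UNKNOWN;
    return TRI_TRUE;
}

/* SQL semantics of "key <> v": unknown whenever key is NULL. */
Tri
sql_not_equal(NullableInt key, int v)
{
    if (key.is_null)
        return TRI_UNKNOWN;
    return key.value != v ? TRI_TRUE : TRI_FALSE;
}

/* key <> v1 AND key <> v2 AND ... over values claimed by other partitions. */
Tri
default_constraint(NullableInt key, const int *values, size_t nvalues)
{
    assert(values != NULL || nvalues == 0);

    Tri result = TRI_TRUE;

    for (size_t i = 0; i < nvalues; i++)
        result = tri_and(result, sql_not_equal(key, values[i]));
    return result;
}
```

For other partitions holding 4 and 5, key 9 evaluates to true and key 4 to false, but a NULL key evaluates to unknown. Whether unknown should count as "belongs in the default partition" depends on whether some partition already accepts NULL, which the <> conjunction alone cannot express.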
Thank you for reviewing.
>But that's not a good idea for several reasons. For one thing, you
>can also write FOR VALUES IN (DEFAULT, 5) or which isn't sensible.
>For another thing, this kind of syntax won't generalize to range
>partitioning, which we've talked about making this feature support.
>Maybe something like:
>CREATE TABLE .. PARTITION OF .. DEFAULT;
Following can also be considered as it specifies more clearly that the
partition holds default values.
CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>Maybe we should introduce a dedicated node type to
>represent a default-specification in the parser grammar. If not, then
>let's at least encapsulate the test a little better, e.g. by adding
>isDefaultPartitionBound() which tests not only IsA(..., DefElem) but
>also whether the name is DEFAULT as expected. BTW, we typically use
>lower-case internally, so if we stick with this representation it
>should really be "default" not "DEFAULT".
>Why abbreviate "default" to def here? Seems pointless.
>Consider &&
>+ * default partiton for rows satisfying the new partition
>Spelling.
>Missing apostrophe
>Definitely not safe against concurrency, since AccessShareLock won't
>exclude somebody else's update. In fact, it won't even cover somebody
>else's already-in-flight transaction
>Normally in such cases we try to give more detail using
>ExecBuildSlotValueDescription.
>This variable starts out true and is never set to any value other than
>true. Just get rid of it and, in the one place where it is currently
>used, write "true". That's shorter and clearer.
>There's not really a reason to cast the result of stringToNode() to
>Node * and then turn around and cast it to PartitionBoundSpec *. Just
>cast it directly to whatever it needs to be. And use the new castNode
>macro
>early, but if the point is that the loop using cell3 shouldn't execute
>in that case, why not just put if (!def_elem) { foreach(cell3, ...) {
>... } } instead of reiterating the ReleaseSysCache in two places?
Attachment
On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Following can also be considered as it specifies more clearly that the
> partition holds default values.
>
> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;

Yes, that could be done. But I don't think it's correct to say that
the partition holds default values. Let's back up and ask what the
word "default" means. The relevant definition (according to Google or
whoever they stole it from) is:

a preselected option adopted by a computer program or other mechanism
when no alternative is specified by the user or programmer.

So, a default *value* is the value that is used when no alternative is
specified by the user or programmer. We have that concept, but it's
not what we're talking about here: that's configured by applying the
DEFAULT property to a column. Here, we're talking about the default
*partition*, or in other words the *partition* that is used when no
alternative is specified by the user or programmer. So, that's why I
proposed the syntax I did. The partition doesn't contain default
values; it is itself a default.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>> Following can also be considered as it specifies more clearly that the
>> partition holds default values.
>>
>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>
> Yes, that could be done. But I don't think it's correct to say that
> the partition holds default values. Let's back up and ask what the
> word "default" means. The relevant definition (according to Google or
> whoever they stole it from) is:
>
> a preselected option adopted by a computer program or other mechanism
> when no alternative is specified by the user or programmer.
>
> So, a default *value* is the value that is used when no alternative is
> specified by the user or programmer. We have that concept, but it's
> not what we're talking about here: that's configured by applying the
> DEFAULT property to a column. Here, we're talking about the default
> *partition*, or in other words the *partition* that is used when no
> alternative is specified by the user or programmer. So, that's why I
> proposed the syntax I did. The partition doesn't contain default
> values; it is itself a default.
Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Apr 24, 2017 at 8:14 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>> Following can also be considered as it specifies more clearly that the
>>> partition holds default values.
>>>
>>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>>
>> The partition doesn't contain default values; it is itself a default.
>
> Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.

I suspect it could be done as of now, but I'm a little worried that it
might create grammar conflicts in the future as we extend the syntax
further. If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
word DEFAULT appears in the same position where we'd normally have FOR
VALUES, and so the parser will definitely be able to figure out what's
going on. When it gets to that position, it will see FOR or it will
see DEFAULT, and all is clear. OTOH, if we use CREATE TABLE ...
DEFAULT PARTITION OF ..., then we have action at a distance: whether
or not the word DEFAULT is present before PARTITION affects which
tokens are legal after the parent table name. bison isn't always very
smart about that kind of thing. No particular dangers come to mind at
the moment, but it makes me nervous anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
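
The "it will see FOR or it will see DEFAULT" argument amounts to a decision made with one token of lookahead. A minimal sketch follows, assuming the input has already been split into upper-case tokens and that only the part after "PARTITION OF <parent>" is passed in. classify_bound() and BoundKind are hypothetical names, and this is not how the bison grammar is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum BoundKind {
    BOUND_DEFAULT,
    BOUND_FOR_VALUES,
    BOUND_SYNTAX_ERROR
} BoundKind;

/*
 * Classify the tokens that follow "PARTITION OF <parent>". The first
 * token alone decides which production applies: DEFAULT or FOR.
 */
BoundKind
classify_bound(const char *const *tokens, size_t ntokens)
{
    assert(tokens != NULL || ntokens == 0);

    if (ntokens == 0)
        return BOUND_SYNTAX_ERROR;

    if (strcmp(tokens[0], "DEFAULT") == 0)
        return ntokens == 1 ? BOUND_DEFAULT : BOUND_SYNTAX_ERROR;

    if (strcmp(tokens[0], "FOR") == 0 &&
        ntokens >= 2 && strcmp(tokens[1], "VALUES") == 0)
        return BOUND_FOR_VALUES;

    return BOUND_SYNTAX_ERROR;
}
```

Because DEFAULT and FOR VALUES compete for the same position, "DEFAULT FOR VALUES ..." is rejected by the grammar itself and needs no separate check after parsing.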
On 2017/04/25 5:16, Robert Haas wrote:
> On Mon, Apr 24, 2017 at 8:14 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>>> Following can also be considered as it specifies more clearly that the
>>>> partition holds default values.
>>>>
>>>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>>>
>>> The partition doesn't contain default values; it is itself a default.
>>
>> Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.
>
> I suspect it could be done as of now, but I'm a little worried that it
> might create grammar conflicts in the future as we extend the syntax
> further. If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
> word DEFAULT appears in the same position where we'd normally have FOR
> VALUES, and so the parser will definitely be able to figure out what's
> going on. When it gets to that position, it will see FOR or it will
> see DEFAULT, and all is clear. OTOH, if we use CREATE TABLE ...
> DEFAULT PARTITION OF ..., then we have action at a distance: whether
> or not the word DEFAULT is present before PARTITION affects which
> tokens are legal after the parent table name. bison isn't always very
> smart about that kind of thing. No particular dangers come to mind at
> the moment, but it makes me nervous anyway.

+1 to CREATE TABLE .. PARTITION OF .. DEFAULT

Thanks,
Amit
On Tue, Apr 25, 2017 at 1:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 24, 2017 at 8:14 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>>> Following can also be considered as it specifies more clearly that the
>>>> partition holds default values.
>>>>
>>>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>>>
>>> The partition doesn't contain default values; it is itself a default.
>>
>> Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.
>
> I suspect it could be done as of now, but I'm a little worried that it
> might create grammar conflicts in the future as we extend the syntax
> further. If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
> word DEFAULT appears in the same position where we'd normally have FOR
> VALUES, and so the parser will definitely be able to figure out what's
> going on. When it gets to that position, it will see FOR or it will
> see DEFAULT, and all is clear. OTOH, if we use CREATE TABLE ...
> DEFAULT PARTITION OF ..., then we have action at a distance: whether
> or not the word DEFAULT is present before PARTITION affects which
> tokens are legal after the parent table name.

As long as we handle this at the transformation stage, it shouldn't be
a problem. The grammar would be something like
CREATE TABLE ... optDefault PARTITION OF ...

If user specifies DEFAULT PARTITION OF t1 FOR VALUES ..., parser will
allow that but in transformation stage, we will detect it and throw an
error "DEFAULT partitions can not contains partition bound clause" or
something like that. Also, documentation would say that DEFAULT and
partition bound specification are not allowed together.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 2017/04/25 14:20, Ashutosh Bapat wrote:
> On Tue, Apr 25, 2017 at 1:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Apr 24, 2017 at 8:14 AM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>>>> Following can also be considered as it specifies more clearly that the
>>>>> partition holds default values.
>>>>>
>>>>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>>>>
>>>> The partition doesn't contain default values; it is itself a default.
>>>
>>> Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.
>>
>> I suspect it could be done as of now, but I'm a little worried that it
>> might create grammar conflicts in the future as we extend the syntax
>> further. If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>> word DEFAULT appears in the same position where we'd normally have FOR
>> VALUES, and so the parser will definitely be able to figure out what's
>> going on. When it gets to that position, it will see FOR or it will
>> see DEFAULT, and all is clear. OTOH, if we use CREATE TABLE ...
>> DEFAULT PARTITION OF ..., then we have action at a distance: whether
>> or not the word DEFAULT is present before PARTITION affects which
>> tokens are legal after the parent table name.
>
> As long as we handle this at the transformation stage, it shouldn't be
> a problem. The grammar would be something like
> CREATE TABLE ... optDefault PARTITION OF ...
>
> If user specifies DEFAULT PARTITION OF t1 FOR VALUES ..., parser will
> allow that but in transformation stage, we will detect it and throw an
> error "DEFAULT partitions can not contains partition bound clause" or
> something like that. Also, documentation would say that DEFAULT and
> partition bound specification are not allowed together.

FWIW, one point to like about PARTITION OF .. DEFAULT is that it
wouldn't need us to do the things you mention we could do. A point
against it may be that it might read backwards to some users, but then
DEFAULT PARTITION OF has all those possibilities of error-causing user
input.

Thanks,
Amit
On Tue, Apr 25, 2017 at 1:20 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>> I suspect it could be done as of now, but I'm a little worried that it
>> might create grammar conflicts in the future as we extend the syntax
>> further. If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>> word DEFAULT appears in the same position where we'd normally have FOR
>> VALUES, and so the parser will definitely be able to figure out what's
>> going on. When it gets to that position, it will see FOR or it will
>> see DEFAULT, and all is clear. OTOH, if we use CREATE TABLE ...
>> DEFAULT PARTITION OF ..., then we have action at a distance: whether
>> or not the word DEFAULT is present before PARTITION affects which
>> tokens are legal after the parent table name.
>
> As long as we handle this at the transformation stage, it shouldn't be
> a problem. The grammar would be something like
> CREATE TABLE ... optDefault PARTITION OF ...
>
> If user specifies DEFAULT PARTITION OF t1 FOR VALUES ..., parser will
> allow that but in transformation stage, we will detect it and throw an
> error "DEFAULT partitions can not contains partition bound clause" or
> something like that. Also, documentation would say that DEFAULT and
> partition bound specification are not allowed together.

That's not what I'm concerned about. I'm concerned about future
syntax additions resulting in difficult-to-resolve grammar conflicts.
For an example of what I mean, consider this example:

http://postgr.es/m/9253.1295031520@sss.pgh.pa.us

The whole thread is worth a read. In brief, I wanted to add syntax
like LOCK VIEW xyz, and it wasn't possible to do that without breaking
backward compatibility. In a nutshell, the problem with making that
syntax work was that LOCK VIEW NOWAIT would then potentially mean
either lock a table called VIEW with the NOWAIT option, or else it
might mean lock a view called NOWAIT. If the NOWAIT key word were not
allowed at the end or if the TABLE keyword were mandatory, then it
would be possible to make it work, but because we already decided both
to make the TABLE keyword optional and allow an optional NOWAIT
keyword at the end, the syntax couldn't be further extended in the way
that I wanted to extend it without confusing the parser. The problem
was basically unfixable without breaking backward compatibility, and
we gave up. I don't want to make the same mistake with the default
partition syntax that we made with the LOCK TABLE syntax.

Aside from unfixable grammar conflicts, there's another way that this
kind of syntax can become problematic, which is when you end up with
multiple optional keywords in the same part of the syntax. For an
example of that, see
http://postgr.es/m/603c8f070905231747j2e099c23hef8eafbf26682e5f@mail.gmail.com
- that discusses the problems with EXPLAIN; we later ran into the same
problem with VACUUM. Users can't remember whether they are supposed
to type VACUUM FULL VERBOSE or VACUUM VERBOSE FULL and trying to
support both creates parser problems and tends to involve adding too
many keywords, so we switched to a new and more extensible syntax for
future options.

Now, you may think that that's never going to happen in this case.
What optional keyword other than DEFAULT could we possibly want to add
just before PARTITION OF? TBH, I don't know. I can't think of
anything else we might want to put in that position right now. But
considering that it's been less than six months since the original
syntax was committed and we've already thought of ONE thing we might
want to put there, it seems hard to rule out the possibility that we
might eventually think of more, and then we will have exactly the same
kind of problem that we've had in the past with other commands. Let's
head the problem off at the pass and pick a syntax which isn't
vulnerable to that sort of issue.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 25, 2017 at 11:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Apr 25, 2017 at 1:20 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>>> I suspect it could be done as of now, but I'm a little worried that it
>>> might create grammar conflicts in the future as we extend the syntax
>>> further. If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>>> word DEFAULT appears in the same position where we'd normally have FOR
>>> VALUES, and so the parser will definitely be able to figure out what's
>>> going on. When it gets to that position, it will see FOR or it will
>>> see DEFAULT, and all is clear. OTOH, if we use CREATE TABLE ...
>>> DEFAULT PARTITION OF ..., then we have action at a distance: whether
>>> or not the word DEFAULT is present before PARTITION affects which
>>> tokens are legal after the parent table name.
>>
>> As long as we handle this at the transformation stage, it shouldn't be
>> a problem. The grammar would be something like
>> CREATE TABLE ... optDefault PARTITION OF ...
>>
>> If user specifies DEFAULT PARTITION OF t1 FOR VALUES ..., parser will
>> allow that but in transformation stage, we will detect it and throw an
>> error "DEFAULT partitions can not contains partition bound clause" or
>> something like that. Also, documentation would say that DEFAULT and
>> partition bound specification are not allowed together.
>
> That's not what I'm concerned about. I'm concerned about future
> syntax additions resulting in difficult-to-resolve grammar conflicts.
> For an example what of what I mean, consider this example:
>
> http://postgr.es/m/9253.1295031520@sss.pgh.pa.us
>
> The whole thread is worth a read. In brief, I wanted to add syntax
> like LOCK VIEW xyz, and it wasn't possible to do that without breaking
> backward compatibility. In a nutshell, the problem with making that
> syntax work was that LOCK VIEW NOWAIT would then potentially mean
> either lock a table called VIEW with the NOWAIT option, or else it
> might mean lock a view called NOWAIT. If the NOWAIT key word were not
> allowed at the end or if the TABLE keyword were mandatory, then it
> would be possible to make it work, but because we already decided both
> to make the TABLE keyword optional and allow an optional NOWAIT
> keyword at the end, the syntax couldn't be further extended in the way
> that I wanted to extend it without confusing the parser. The problem
> was basically unfixable without breaking backward compatibility, and
> we gave up. I don't want to make the same mistake with the default
> partition syntax that we made with the LOCK TABLE syntax.
>
> Aside from unfixable grammar conflicts, there's another way that this
> kind of syntax can become problematic, which is when you end up with
> multiple optional keywords in the same part of the syntax. For an
> example of that, see
> http://postgr.es/m/603c8f070905231747j2e099c23hef8eafbf26682e5f@mail.gmail.com
> - that discusses the problems with EXPLAIN; we later ran into the same
> problem with VACUUM. Users can't remember whether they are supposed
> to type VACUUM FULL VERBOSE or VACUUM VERBOSE FULL and trying to
> support both creates parser problems and tends to involve adding too
> many keywords, so we switched to a new and more extensible syntax for
> future options.
>

Thanks for taking out time for detailed explanation.

> Now, you may think that that's never going to happen in this case.
> What optional keyword other than DEFAULT could we possibly want to add
> just before PARTITION OF?

Since the grammar before PARTITION OF is shared with CREATE TABLE ()
there is high chance that we will have an optional keyword unrelated
to partitioning there. I take back my proposal for that syntax.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
exclude it from the list of partition bounds to form the partition constraint.
This can't be accomplished by using has_default flag.
isDefaultPartitionBound() is written to accomplish that.
>I suspect it could be done as of now, but I'm a little worried that it
>might create grammar conflicts in the future as we extend the syntax
>further. If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>word DEFAULT appears in the same position where we'd normally have FOR
>VALUES, and so the parser will definitely be able to figure out what's
>going on. When it gets to that position, it will see FOR or it will
>see DEFAULT, and all is clear. OTOH, if we use CREATE TABLE ...
>DEFAULT PARTITION OF ..., then we have action at a distance: whether
>or not the word DEFAULT is present before PARTITION affects which
>tokens are legal after the parent table name. bison isn't always very
>smart about that kind of thing. No particular dangers come to mind at
>the moment, but it makes me nervous anyway.
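+1 for CREATE TABLE..PARTITION OF...DEFAULT syntax.
I think substituting DEFAULT for FOR VALUES is appropriate as
both cases are mutually exclusive.

One more thing that needs consideration is should default partitions be
partitioned further? Other databases allow default partitions to be
partitioned further. I think it's normal for users to expect the data in
default partitions to also be divided into sub partitions. So
it should be supported.
My colleague Rajkumar Raghuwanshi brought to my notice the current patch
does not handle this correctly.
I will include this in the updated patch if there is no objection.

On the other hand if sub partitions of a default partition are to be
prohibited,
an error should be thrown if PARTITION BY is specified after DEFAULT.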
On Mon, Apr 24, 2017 at 8:14 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>> Following can also be considered as it specifies more clearly that the
>>> partition holds default values.
>>>
>>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>>
>> The partition doesn't contain default values; it is itself a default.
>
> Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.
I suspect it could be done as of now, but I'm a little worried that it
might create grammar conflicts in the future as we extend the syntax
further. If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
word DEFAULT appears in the same position where we'd normally have FOR
VALUES, and so the parser will definitely be able to figure out what's
going on. When it gets to that position, it will see FOR or it will
see DEFAULT, and all is clear. OTOH, if we use CREATE TABLE ...
DEFAULT PARTITION OF ..., then we have action at a distance: whether
or not the word DEFAULT is present before PARTITION affects which
tokens are legal after the parent table name. bison isn't always very
smart about that kind of thing. No particular dangers come to mind at
the moment, but it makes me nervous anyway.
On Thu, Apr 27, 2017 at 8:49 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>I suspect it could be done as of now, but I'm a little worried that it
>>might create grammar conflicts in the future as we extend the syntax
>>further. If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>>word DEFAULT appears in the same position where we'd normally have FOR
>>VALUES, and so the parser will definitely be able to figure out what's
>>going on. When it gets to that position, it will see FOR or it will
>>see DEFAULT, and all is clear. OTOH, if we use CREATE TABLE ...
>>DEFAULT PARTITION OF ..., then we have action at a distance: whether
>>or not the word DEFAULT is present before PARTITION affects which
>>tokens are legal after the parent table name. bison isn't always very
>>smart about that kind of thing. No particular dangers come to mind at
>>the moment, but it makes me nervous anyway.
>
> +1 for CREATE TABLE..PARTITION OF...DEFAULT syntax.
> I think substituting DEFAULT for FOR VALUES is appropriate as
> both cases are mutually exclusive.
>
> One more thing that needs consideration is should default partitions be
> partitioned further? Other databases allow default partitions to be
> partitioned further. I think, its normal for users to expect the data in
> default partitions to also be divided into sub partitions. So
> it should be supported.
> My colleague Rajkumar Raghuwanshi brought to my notice the current patch
> does not handle this correctly.
> I will include this in the updated patch if there is no objection.
>
> On the other hand if sub partitions of a default partition is to be
> prohibited,
> an error should be thrown if PARTITION BY is specified after DEFAULT.

I see no reason to prohibit it. You can further partition any other
kind of partition, so there seems to be no reason to disallow it in
this one case.

Are you also working on extending this to work with range
partitioning? Because I think that would be good to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 27.04.2017 15:07, Robert Haas wrote:
> On Thu, Apr 27, 2017 at 8:49 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>> +1 for CREATE TABLE..PARTITION OF...DEFAULT syntax.
>> I think substituting DEFAULT for FOR VALUES is appropriate as
>> both cases are mutually exclusive.

Just to make it sound a little rounder:

CREATE TABLE ... PARTITION OF ... AS DEFAULT
CREATE TABLE ... PARTITION OF ... AS FALLBACK

or

CREATE TABLE ... PARTITION OF ... AS DEFAULT PARTITION
CREATE TABLE ... PARTITION OF ... AS FALLBACK PARTITION

Could any of these be feasible?

Sven
Are you also working on extending this to work with range
partitioning? Because I think that would be good to do.
On Thu, Apr 27, 2017 at 3:15 PM, Sven R. Kunze <srkunze@mail.de> wrote:
> On 27.04.2017 15:07, Robert Haas wrote:
>> On Thu, Apr 27, 2017 at 8:49 AM, Rahila Syed <rahilasyed90@gmail.com>
>> wrote:
>>>
>>> +1 for CREATE TABLE..PARTITION OF...DEFAULT syntax.
>>> I think substituting DEFAULT for FOR VALUES is appropriate as
>>> both cases are mutually exclusive.
>
> Just to make sound a little rounder:
>
> CREATE TABLE ... PARTITION OF ... AS DEFAULT
> CREATE TABLE ... PARTITION OF ... AS FALLBACK
>
> or
>
> CREATE TABLE ... PARTITION OF ... AS DEFAULT PARTITION
> CREATE TABLE ... PARTITION OF ... AS FALLBACK PARTITION
>
> Could any of these be feasible?

FALLBACK wouldn't be a good choice because it's not an existing parser
keyword. We could probably insert AS before DEFAULT and/or PARTITION
afterwards, but they sort of seem like noise words. SQL seems to have
been invented by people who didn't have any trouble remembering really
long command strings, but brevity is not without some merit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Thu, Apr 27, 2017 at 3:15 PM, Sven R. Kunze <srkunze@mail.de> wrote:
>> Just to make sound a little rounder:
>>
>> CREATE TABLE ... PARTITION OF ... AS DEFAULT
>> CREATE TABLE ... PARTITION OF ... AS FALLBACK
>>
>> or
>>
>> CREATE TABLE ... PARTITION OF ... AS DEFAULT PARTITION
>> CREATE TABLE ... PARTITION OF ... AS FALLBACK PARTITION
>>
>> Could any of these be feasible?
>
> FALLBACK wouldn't be a good choice because it's not an existing parser keyword. We could probably insert AS before DEFAULT and/or PARTITION afterwards, but they sort of seem like noise words.

You are right. I just thought it would make this variant more acceptable as people expressed concerns about understandability of the command.

> SQL seems to have been invented by people who didn't have any trouble remembering really long command strings, but brevity is not without some merit.

For me, it's exactly the thing I like about SQL. It makes for an easy learning curve.

Sven
CREATE TABLE .. PARTITION OF .. DEFAULT has got most votes on this thread.
Hello,

Thank you for reviewing.

>But that's not a good idea for several reasons. For one thing, you
>can also write FOR VALUES IN (DEFAULT, 5) or which isn't sensible.
>For another thing, this kind of syntax won't generalize to range
>partitioning, which we've talked about making this feature support.
>Maybe something like:
>CREATE TABLE .. PARTITION OF .. DEFAULT;

I agree that the syntax should be changed to also support range partitioning.
Following can also be considered as it specifies more clearly that the
partition holds default values.

CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;

>Maybe we should introduce a dedicated node type to
>represent a default-specification in the parser grammar. If not, then
>let's at least encapsulate the test a little better, e.g. by adding
>isDefaultPartitionBound() which tests not only IsA(..., DefElem) but
>also whether the name is DEFAULT as expected. BTW, we typically use
>lower-case internally, so if we stick with this representation it
>should really be "default" not "DEFAULT".

isDefaultPartitionBound() function is created in the attached patch which
checks for both node type and name.

>Why abbreviate "default" to def here? Seems pointless.

Corrected in the attached.

>Consider &&

Fixed.

>+ * default partiton for rows satisfying the new partition
>Spelling.

Fixed.

>Missing apostrophe

Fixed.

>Definitely not safe against concurrency, since AccessShareLock won't
>exclude somebody else's update. In fact, it won't even cover somebody
>else's already-in-flight transaction

Changed it to AccessExclusiveLock.

>Normally in such cases we try to give more detail using
>ExecBuildSlotValueDescription.

This function is used in execMain.c and the error is being
reported in partition.c.
Do you mean the error reporting should be moved into execMain.c
to use ExecBuildSlotValueDescription?

>This variable starts out true and is never set to any value other than
>true. Just get rid of it and, in the one place where it is currently
>used, write "true". That's shorter and clearer.

Fixed.

>There's not really a reason to cast the result of stringToNode() to
>Node * and then turn around and cast it to PartitionBoundSpec *. Just
>cast it directly to whatever it needs to be. And use the new castNode
>macro

Fixed. castNode macro takes as input Node * whereas stringToNode() takes string.
IIUC, castNode can't be used here.

>The if (def_elem) test continues
>early, but if the point is that the loop using cell3 shouldn't execute
>in that case, why not just put if (!def_elem) { foreach(cell3, ...) {
>... } } instead of reiterating the ReleaseSysCache in two places?

Fixed in the attached.

I will respond to further comments in following email.

On Thu, Apr 13, 2017 at 12:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 6, 2017 at 7:30 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Thanks a lot for testing and reporting this. Please find attached an updated
> patch with the fix. The patch also contains a fix
> regarding operator used at the time of creating expression as default
> partition constraint. This was notified offlist by Amit Langote.
I think that the syntax for this patch should probably be revised.
Right now the proposal is for:
CREATE TABLE .. PARTITION OF ... FOR VALUES IN (DEFAULT);
But that's not a good idea for several reasons. For one thing, you
can also write FOR VALUES IN (DEFAULT, 5), which isn't sensible.
For another thing, this kind of syntax won't generalize to range
partitioning, which we've talked about making this feature support.
Maybe something like:
CREATE TABLE .. PARTITION OF .. DEFAULT;
This patch makes the assumption throughout that any DefElem represents
the word DEFAULT, which is true in the patch as written but doesn't
seem very future-proof. I think the "def" in "DefElem" stands for
"definition" or "define" or something like that, so this is actually
pretty confusing. Maybe we should introduce a dedicated node type to
represent a default-specification in the parser grammar. If not, then
let's at least encapsulate the test a little better, e.g. by adding
isDefaultPartitionBound() which tests not only IsA(..., DefElem) but
also whether the name is DEFAULT as expected. BTW, we typically use
lower-case internally, so if we stick with this representation it
should really be "default" not "DEFAULT".
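The encapsulation suggested above could look roughly like the following. This is only a self-contained sketch: the `Node` struct and `NodeTag` values are simplified stand-ins invented for illustration, not PostgreSQL's real node definitions, and `is_default_partition_bound` is a hypothetical name modelled on the proposed isDefaultPartitionBound().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for parser node tags; not PostgreSQL's definitions. */
typedef enum { T_Const, T_DefElem } NodeTag;

typedef struct Node
{
    NodeTag     type;
    const char *defname;    /* only meaningful when type == T_DefElem */
} Node;

/*
 * Return true only when the node is a DefElem AND its name is the
 * lower-case string "default". Checking the tag alone would treat any
 * future DefElem in a bound list as the default marker.
 */
bool
is_default_partition_bound(const Node *node)
{
    if (node == NULL || node->type != T_DefElem)
        return false;
    return node->defname != NULL && strcmp(node->defname, "default") == 0;
}
```

Callers would then ask one question in one place, so a later switch to a dedicated node type would only touch this function.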
Useless hunk:
+ bool has_def; /* Is there a default partition? Currently false
+ * for a range partitioned table */
+ int def_index; /* Index of the default list partition. -1 for
+ * range partitioned tables */
Why abbreviate "default" to def here? Seems pointless.
+ if (found_def)
+ {
+ if (mapping[def_index] == -1)
+ mapping[def_index] = next_index++;
+ }
Consider &&
@@ -717,7 +754,6 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound)
}
}
}
-
break;
}
+ * default partiton for rows satisfying the new partition
Spelling.
+ * constraint. If found dont allow addition of a new partition.
Missing apostrophe.
+ defrel = heap_open(defid, AccessShareLock);
+ tupdesc = CreateTupleDescCopy(RelationGetDescr(defrel));
+
+ /* Build expression execution states for partition check quals */
+ partqualstate = ExecPrepareCheck(partConstraint,
+ estate);
+
+ econtext = GetPerTupleExprContext(estate);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
Definitely not safe against concurrency, since AccessShareLock won't
exclude somebody else's update. In fact, it won't even cover somebody
else's already-in-flight transaction.
+ errmsg("new default partition constraint is violated by some row")));
Normally in such cases we try to give more detail using
ExecBuildSlotValueDescription.
+ bool is_def = true;
This variable starts out true and is never set to any value other than
true. Just get rid of it and, in the one place where it is currently
used, write "true". That's shorter and clearer.
+ inhoids = find_inheritance_children(RelationGetRelid(parent), NoLock);
If it's actually safe to do this with no lock, there ought to be a
comment with a very compelling explanation of why it's safe.
+ boundspec = (Node *) stringToNode(TextDatumGetCString(datum));
+ bspec = (PartitionBoundSpec *)boundspec;
There's not really a reason to cast the result of stringToNode() to
Node * and then turn around and cast it to PartitionBoundSpec *. Just
cast it directly to whatever it needs to be. And use the new castNode
macro.
+ foreach(cell1, bspec->listdatums)
+ {
+ Node *value = lfirst(cell1);
+ if (IsA(value, DefElem))
+ {
+ def_elem = true;
+ *defid = inhrelid;
+ }
+ }
+ if (def_elem)
+ {
+ ReleaseSysCache(tuple);
+ continue;
+ }
+ foreach(cell3, bspec->listdatums)
+ {
+ Node *value = lfirst(cell3);
+ boundspecs = lappend(boundspecs, value);
+ }
+ ReleaseSysCache(tuple);
+ }
+ foreach(cell4, spec->listdatums)
+ {
+ Node *value = lfirst(cell4);
+ boundspecs = lappend(boundspecs, value);
+ }
cell1, cell2, cell3, and cell4 are not very clear variable names.
Between that and the lack of comments, this is not easy to understand.
It's sort of spaghetti logic, too. The if (def_elem) test continues
early, but if the point is that the loop using cell3 shouldn't execute
in that case, why not just put if (!def_elem) { foreach(cell3, ...) {
... } } instead of reiterating the ReleaseSysCache in two places?
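A sketch of the restructuring asked for here, under stated assumptions: the `ChildBound` struct, the `DEFAULT_MARKER` sentinel, and `release_entry()` (a counter standing in for ReleaseSysCache) are all hypothetical simplifications, not code from the patch. The point is only the control flow: the collecting loop is guarded by the "not default" test, and the release happens at exactly one place per child.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DATUMS     8
#define DEFAULT_MARKER (-1)     /* hypothetical sentinel for the DEFAULT element */

/* Hypothetical, simplified view of one child partition's list bound. */
typedef struct
{
    int id;                     /* stand-in for the child's OID; 0 means "none" */
    int ndatums;
    int datums[MAX_DATUMS];
} ChildBound;

int release_count = 0;          /* counts calls to the stand-in release function */

/* Stand-in for ReleaseSysCache(); only counts how often it is called. */
void
release_entry(const ChildBound *child)
{
    (void) child;
    release_count++;
}

bool
bound_is_default(const ChildBound *child)
{
    for (int i = 0; i < child->ndatums; i++)
        if (child->datums[i] == DEFAULT_MARKER)
            return true;
    return false;
}

/*
 * Collect the datums of every non-default child into out[] and report the
 * default child's id (0 if there is none). Returns the number collected.
 */
size_t
collect_bound_datums(const ChildBound *children, size_t nchildren,
                     int *out, size_t out_cap, int *default_id)
{
    size_t n = 0;

    *default_id = 0;
    for (size_t i = 0; i < nchildren; i++)
    {
        const ChildBound *child = &children[i];

        if (bound_is_default(child))
            *default_id = child->id;
        else
        {
            for (int j = 0; j < child->ndatums && n < out_cap; j++)
                out[n++] = child->datums[j];
        }
        release_entry(child);   /* single release point, no early continue */
    }
    return n;
}
```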
+ /* Collect bound spec nodes in a list. This is done if the partition is
+ * a default partition. In case of default partition, constraint is formed
+ * by performing <> operation over the partition constraints of the
+ * existing partitions.
+ */
I doubt that handles NULLs properly.
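To illustrate why NULLs are a concern, here is a small model of SQL's three-valued logic applied to a constraint built only from `<>` comparisons. The types and function names are hypothetical and exist only for this sketch; it does not reproduce the patch's expression building. With a NULL key every comparison is unknown, so the conjunction never becomes a definite true and the NULL case has to be decided explicitly somewhere.

```c
#include <stdbool.h>
#include <stddef.h>

/* SQL-style three-valued truth value. */
typedef enum { TV_FALSE, TV_TRUE, TV_UNKNOWN } TriBool;

/* Hypothetical nullable integer partition key. */
typedef struct
{
    bool is_null;
    int  value;
} NullableInt;

/* key <> bound: unknown when the key is NULL. */
TriBool
tv_not_equal(NullableInt key, int bound)
{
    if (key.is_null)
        return TV_UNKNOWN;
    return key.value != bound ? TV_TRUE : TV_FALSE;
}

/* Three-valued AND: false dominates, then unknown. */
TriBool
tv_and(TriBool x, TriBool y)
{
    if (x == TV_FALSE || y == TV_FALSE)
        return TV_FALSE;
    if (x == TV_UNKNOWN || y == TV_UNKNOWN)
        return TV_UNKNOWN;
    return TV_TRUE;
}

/* key <> b1 AND key <> b2 AND ... over all bounds of the other partitions. */
TriBool
default_constraint(NullableInt key, const int *bounds, size_t nbounds)
{
    TriBool result = TV_TRUE;

    for (size_t i = 0; i < nbounds; i++)
        result = tv_and(result, tv_not_equal(key, bounds[i]));
    return result;
}
```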
+ inhoids = find_inheritance_children(RelationGetRelid(parent), NoLock);
Again, no lock? Really?
The logic which follows looks largely cut-and-pasted, which makes me
think you need to do some refactoring here to make it more clear
what's going on, so that you have the relevant logic in just one
place. It seems wrong anyway to shove all of this logic specific to
the default case into get_qual_from_partbound() when the logic for the
non-default case is inside get_qual_for_list. Where there were 2
lines of code before you've now got something like 30.
+ if(get_negator(operoid) == InvalidOid)
+ elog(ERROR, "no negator found for partition operator %u",
+ operoid);
I really doubt that's OK. elog() shouldn't be reachable, but this
will be reachable if the partitioning operator does not have a
negator. And there's the NULL-handling issue I mentioned above, too.
+ if (partdesc->boundinfo->has_def && key->strategy
+ == PARTITION_STRATEGY_LIST)
+ result = parent->indexes[partdesc->boundinfo->def_index];
Testing for PARTITION_STRATEGY_LIST here seems unnecessary. If
has_def (or has_default, as it probably should be) isn't allowed for
range partitions, then it's redundant; if it is allowed, then that
case should be handled too. Also, at this point we've already set
*failed_at and *failed_slot; presumably you'd want to make this check
before you get to that point.
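The ordering issue can be sketched as follows. `ListBoundInfo` and `route_list_value()` are hypothetical, simplified stand-ins rather than the patch's data structures; the sketch only shows the fallback to the default partition being tried before any failure is reported, with no separate test for the partitioning strategy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bound information for a list-partitioned table. */
typedef struct
{
    size_t     ndatums;
    const int *datums;          /* accepted list values */
    const int *indexes;         /* partition index for datums[i] */
    bool       has_default;
    int        default_index;   /* meaningful only when has_default is true */
} ListBoundInfo;

/*
 * Return the index of the partition that accepts value, falling back to
 * the default partition when no bound matches. Returns -1 only when there
 * is no match and no default, which is the point at which a caller would
 * record the failure.
 */
int
route_list_value(const ListBoundInfo *info, int value)
{
    for (size_t i = 0; i < info->ndatums; i++)
        if (info->datums[i] == value)
            return info->indexes[i];

    if (info->has_default)
        return info->default_index;

    return -1;
}
```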
I suspect there are quite a few more problems here in addition to the
ones mentioned above, but I don't think it makes sense to spend too
much time searching for them until some of this basic stuff is cleaned
up.
On Tue, May 2, 2017 at 9:33 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Please find attached updated patch with review comments by Robert and Jeevan
> implemented.
>
Patch v8 got clean apply on latest head but server got crash at data
insert in the following test:
-- Create test table
CREATE TABLE test ( a int, b date) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF test FOR VALUES IN (DEFAULT) PARTITION BY LIST(b);
CREATE TABLE p11 PARTITION OF p1 FOR VALUES IN (DEFAULT);
-- crash
INSERT INTO test VALUES (210,'1/1/2002');
Regards,
Amul
The syntax implemented in this patch is as follows,
CREATE TABLE p11 PARTITION OF p1 DEFAULT;
create table lpd (a int, b int, c varchar) partition by list(a);
create table lpd_d partition of lpd DEFAULT;
\d+ lpd
Table "public.lpd"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
a | integer | | | | plain | |
b | integer | | | | plain | |
c | character varying | | | | extended | |
Partition key: LIST (a)
Partitions: lpd_d FOR VALUES IN (DEFAULT)
On Thu, May 4, 2017 at 5:14 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> The syntax implemented in this patch is as follows,
> CREATE TABLE p11 PARTITION OF p1 DEFAULT;

Applied v9 patches, table description still showing old pattern of default partition. Is it expected?
create table lpd (a int, b int, c varchar) partition by list(a);
create table lpd_d partition of lpd DEFAULT;
\d+ lpd
Table "public.lpd"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
a | integer | | | | plain | |
b | integer | | | | plain | |
c | character varying | | | | extended | |
Partition key: LIST (a)
Partitions: lpd_d FOR VALUES IN (DEFAULT)
Hi Rahila,
still thinking about the syntax (sorry):
On 04.05.2017 13:44, Rahila Syed wrote:
[...] The syntax implemented in this patch is as follows,
CREATE TABLE p11 PARTITION OF p1 DEFAULT;
Rewriting the following:
On Thu, May 4, 2017 at 4:02 PM, amul sul <sulamul@gmail.com> wrote:
> [...] CREATE TABLE p1 PARTITION OF test FOR VALUES IN (DEFAULT) PARTITION BY LIST(b); [...]
It yields
CREATE TABLE p1 PARTITION OF test DEFAULT PARTITION BY LIST(b);
This reads to me like "DEFAULT PARTITION".
I can imagine a lot of confusion when those queries are encountered in the wild. I know this thread is about creating a default partition, but if I were to propose a minor change in the following direction, I think confusion would be greatly avoided:
CREATE TABLE p1 PARTITION OF test AS DEFAULT PARTITIONED BY LIST(b);
I know it's a bit longer but I think those 4 characters might serve readability in the long term. It was especially confusing to see PARTITION in two positions serving two different functions.
Sven
Hi Rahila,
pg_restore is failing for default partition, dump file still storing old syntax of default partition.
create table lpd (a int, b int, c varchar) partition by list(a);
create table lpd_d partition of lpd DEFAULT;
create database bkp owner 'edb';
grant all on DATABASE bkp to edb;
--take plain dump of existing database
\! ./pg_dump -f lpd_test.sql -Fp -d postgres
--restore plain backup to new database bkp
\! ./psql -f lpd_test.sql -d bkp
psql:lpd_test.sql:63: ERROR: syntax error at or near "DEFAULT"
LINE 2: FOR VALUES IN (DEFAULT);
^
vi lpd_test.sql
--
-- Name: lpd; Type: TABLE; Schema: public; Owner: edb
--
CREATE TABLE lpd (
a integer,
b integer,
c character varying
)
PARTITION BY LIST (a);
ALTER TABLE lpd OWNER TO edb;
--
-- Name: lpd_d; Type: TABLE; Schema: public; Owner: edb
--
CREATE TABLE lpd_d PARTITION OF lpd
FOR VALUES IN (DEFAULT);
ALTER TABLE lpd_d OWNER TO edb;

Thanks,
Rajkumar
>pg_restore is failing for default partition, dump file still storing old syntax of default partition.
Hi Rahila,

I am not able to add a new partition if default partition is further partitioned
with default partition.

Consider example below:

postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7, 8);
CREATE TABLE
postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY LIST(b);
CREATE TABLE
postgres=# CREATE TABLE test_pd_pd PARTITION OF test_pd DEFAULT;
CREATE TABLE
postgres=# INSERT INTO test VALUES (20, 24, 12);
INSERT 0 1
postgres=# CREATE TABLE test_p2 PARTITION OF test FOR VALUES IN(15);
ERROR: could not open file "base/12335/16420": No such file or directory

Thanks,
Jeevan Ladhe

On Fri, May 5, 2017 at 11:55 AM, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
Hi Rahila,
pg_restore is failing for default partition, dump file still storing old syntax of default partition.
create table lpd (a int, b int, c varchar) partition by list(a);
create table lpd_d partition of lpd DEFAULT;
create database bkp owner 'edb';
grant all on DATABASE bkp to edb;
--take plain dump of existing database
\! ./pg_dump -f lpd_test.sql -Fp -d postgres
--restore plain backup to new database bkp
\! ./psql -f lpd_test.sql -d bkp
psql:lpd_test.sql:63: ERROR: syntax error at or near "DEFAULT"
LINE 2: FOR VALUES IN (DEFAULT);
^
vi lpd_test.sql
--
-- Name: lpd; Type: TABLE; Schema: public; Owner: edb
--
CREATE TABLE lpd (
a integer,
b integer,
c character varying
)
PARTITION BY LIST (a);
ALTER TABLE lpd OWNER TO edb;
--
-- Name: lpd_d; Type: TABLE; Schema: public; Owner: edb
--
CREATE TABLE lpd_d PARTITION OF lpd
FOR VALUES IN (DEFAULT);
ALTER TABLE lpd_d OWNER TO edb;

Thanks,
Rajkumar
On Thu, May 4, 2017 at 4:28 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> While reviewing the code I was trying to explore more cases, and I here
> comes an
> open question to my mind:
> should we allow the default partition table to be partitioned further?

I think yes. In general, you are allowed to partition a partition,
and I can't see any justification for restricting that for default
partitions when we allow it everywhere else.

> If we allow it(as in the current case) then observe following case, where I
> have defined a default partitioned which is further partitioned on a
> different
> column.
>
> postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
> CREATE TABLE
> postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7,
> 8);
> CREATE TABLE
> postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY
> LIST(b);
> CREATE TABLE
> postgres=# INSERT INTO test VALUES (20, 24, 12);
> ERROR: no partition of relation "test_pd" found for row
> DETAIL: Partition key of the failing row contains (b) = (24).
>
> Note, that it does not allow inserting the tuple(20, 24, 12) because though
> a=20
> would fall in default partition i.e. test_pd, table test_pd itself is
> further
> partitioned and does not have any partition satisfying b=24.

Right, that looks like correct behavior. You would have gotten the
same result if you had tried to insert into test_pd directly.

> Further if I define a default partition for table test_pd, the the tuple
> gets inserted.

That also sounds correct.

> Doesn't this sound like the whole purpose of having DEFAULT partition on
> test
> table is defeated?

Not to me. It's possible to do lots of silly things with partitioned
tables. For example, one case that we talked about before is that you
can define a range partition for, say, VALUES (0) TO (100), and then
subpartition it and give the subpartitions bounds which are outside
the range 0-100. That's obviously silly and will lead to failures
inserting tuples, but we chose not to try to prohibit it because it's
not really broken, just useless. There are lots of similar cases
involving other features. For example, you can apply an inherited
CHECK (false) constraint to a table, which makes it impossible for
that table or any of its children to ever contain any rows; that is
probably a dumb configuration. You can create two unique indexes with
exactly the same definition; unless you're creating a new one with the
intent of dropping the old one, that doesn't make sense. You can
define a trigger that always throws an ERROR and then another trigger
which runs later that modifies the tuple; the second will never be run
because the first one will always kill the transaction before we get
there. Those things are all legal, but often unuseful. Similarly
here. Defining a default list partition and then subpartitioning it
by list is not likely to be a good schema design, but it doesn't mean
we should try to disallow it. It is important to distinguish between
things that are actually *broken* (like a partitioning configuration
where the tuples that can be inserted into a partition manually differ
from the ones that are routed to it automatically) and things that are
merely *lame* (like creating a multi-level partitioning hierarchy when
a single level would have done the job just as well). The former
should be prevented by the code, while the latter is at most a
documentation issue.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 4, 2017 at 4:28 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> While reviewing the code I was trying to explore more cases, and here
> comes an open question to my mind:
> should we allow the default partition table to be partitioned further?
I think yes. In general, you are allowed to partition a partition,
and I can't see any justification for restricting that for default
partitions when we allow it everywhere else.
> If we allow it (as in the current case) then observe the following case, where I
> have defined a default partition which is further partitioned on a different
> column.
>
> postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
> CREATE TABLE
> postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7, 8);
> CREATE TABLE
> postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY LIST(b);
> CREATE TABLE
> postgres=# INSERT INTO test VALUES (20, 24, 12);
> ERROR: no partition of relation "test_pd" found for row
> DETAIL: Partition key of the failing row contains (b) = (24).
>
> Note that it does not allow inserting the tuple (20, 24, 12) because, though a=20
> would fall in the default partition, i.e. test_pd, table test_pd itself is further
> partitioned and does not have any partition satisfying b=24.
Right, that looks like correct behavior. You would have gotten the
same result if you had tried to insert into test_pd directly.
> Further, if I define a default partition for table test_pd, the tuple
> gets inserted.
That also sounds correct.
> Doesn't this sound like the whole purpose of having DEFAULT partition on
> test
> table is defeated?
Not to me. It's possible to do lots of silly things with partitioned
tables. For example, one case that we talked about before is that you
can define a range partition for, say, VALUES (0) TO (100), and then
subpartition it and give the subpartitions bounds which are outside
the range 0-100. That's obviously silly and will lead to failures
inserting tuples, but we chose not to try to prohibit it because it's
not really broken, just useless. There are lots of similar cases
involving other features. For example, you can apply an inherited
CHECK (false) constraint to a table, which makes it impossible for
that table or any of its children to ever contain any rows; that is
probably a dumb configuration. You can create two unique indexes with
exactly the same definition; unless you're creating a new one with the
intent of dropping the old one, that doesn't make sense. You can
define a trigger that always throws an ERROR and then another trigger
which runs later that modifies the tuple; the second will never be run
because the first one will always kill the transaction before we get
there. Those things are all legal, but often unuseful. Similarly
here. Defining a default list partition and then subpartitioning it
by list is not likely to be a good schema design, but it doesn't mean
we should try to disallow it. It is important to distinguish between
things that are actually *broken* (like a partitioning configuration
where the tuples that can be inserted into a partition manually differ
from the ones that are routed to it automatically) and things that are
merely *lame* (like creating a multi-level partitioning hierarchy when
a single level would have done the job just as well). The former
should be prevented by the code, while the latter is at most a
documentation issue.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
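The routing behaviour discussed in this message (route at the root first, then route again inside whichever partition was chosen) can be sketched in plain C. This is only an illustrative model and not PostgreSQL source: `PartNode`, `ListPart`, `route_one_level` and `route_tuple` are hypothetical names, keys are plain ints, and only list partitioning is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a list-partitioned hierarchy. */
typedef struct PartNode PartNode;

typedef struct ListPart {
    const int *values;      /* accepted key values; unused for a default */
    size_t nvalues;
    bool is_default;        /* true for the DEFAULT partition */
    const PartNode *child;  /* table that receives matching rows */
} ListPart;

struct PartNode {
    const char *name;
    int key_col;            /* column index of the partition key; -1 for a leaf */
    const ListPart *parts;
    size_t nparts;
};

/* Route a row one level down; NULL means no partition accepts it. */
static const PartNode *route_one_level(const PartNode *node, const int *row)
{
    const PartNode *fallback = NULL;

    for (size_t i = 0; i < node->nparts; i++) {
        const ListPart *p = &node->parts[i];

        if (p->is_default) {
            fallback = p->child;    /* used only if no explicit bound matches */
            continue;
        }
        for (size_t j = 0; j < p->nvalues; j++)
            if (p->values[j] == row[node->key_col])
                return p->child;
    }
    return fallback;
}

/* Route level by level until a leaf is reached; returns the leaf name,
 * or NULL if some level has no partition for the row. */
const char *route_tuple(const PartNode *root, const int *row)
{
    const PartNode *node = root;

    assert(root != NULL && row != NULL);
    while (node != NULL && node->key_col >= 0)
        node = route_one_level(node, row);
    return node ? node->name : NULL;
}
```

In this model a default partition that is itself partitioned by b, with no sub-partition for b=24, makes `route_tuple` return NULL for the row (20, 24, 12), which corresponds to the "no partition of relation" error in the quoted session. Giving that sub-table its own default makes the same row routable.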
On Thu, May 4, 2017 at 4:40 PM, Sven R. Kunze <srkunze@mail.de> wrote:
> It yields
>
> CREATE TABLE p1 PARTITION OF test DEFAULT PARTITION BY LIST(b);
>
> This reads to me like "DEFAULT PARTITION".
>
> I can imagine a lot of confusion when those queries are encountered in the
> wild. I know this thread is about creating a default partition but if I were to
> propose a minor change in the following direction, I think confusion would
> be greatly avoided:
>
> CREATE TABLE p1 PARTITION OF test AS DEFAULT PARTITIONED BY LIST(b);
>
> I know it's a bit longer but I think those 4 characters might serve
> readability in the long term. It was especially confusing to see PARTITION
> in two positions serving two different functions.
Well, we certainly can't make that change just for default partitions.
I mean, that would be non-orthogonal, right? You can't say that the
way to subpartition is to write "PARTITION BY strategy" when the table
is unpartitioned or is a non-default partition but "PARTITIONED BY
strategy" when it is a default partition. That would certainly not be
a good way of confusing users less, and would probably result in a
variety of special cases in places like ruleutils.c or pg_dump, plus
some weasel-wording in the documentation. We COULD do a general
change from "CREATE TABLE table_name PARTITION BY strategy" to "CREATE
TABLE table_name PARTITIONED BY strategy". I don't have any
particular arguments against that except that the current syntax is
more like Oracle, which might count for something, and maybe the fact
that we're a month after feature freeze. Still, if we want to change
that, now would be the time; but I favor leaving it alone.
I don't have a big objection to adding AS. If that's the majority
vote, fine; if not, that's OK, too. I can see it might be a bit more
clear in the case you mention, but it might also just be a noise word
that we don't really need. There don't seem to be many uses of AS
that would pose a risk of actual grammar conflicts here. I can
imagine someone wanting to use CREATE TABLE ... PARTITION BY ... AS
SELECT ... to create and populate a partition in one command, but that
wouldn't be a conflict because it'd have to go AFTER the partition
specification. In the DEFAULT case, you'd end up with something like
CREATE TABLE p1 PARTITION OF test AS DEFAULT AS <query>
...which is neither great nor horrible syntax-wise and maybe not such
a good thing to support anyway since it would have to lock the parent
to add the partition and then keep the lock on the parent while
populating the new child (ouch).
So I guess I'm still in favor of the CREATE TABLE p1 PARTITION OF test
DEFAULT syntax, but if it ends up being AS DEFAULT instead, I can live
with that.
Other opinions?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Rahila,
I am not able to add a new partition if the default partition is further partitioned
with a default partition.
Consider the example below:
postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7, 8);
CREATE TABLE
postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY LIST(b);
CREATE TABLE
postgres=# CREATE TABLE test_pd_pd PARTITION OF test_pd DEFAULT;
CREATE TABLE
postgres=# INSERT INTO test VALUES (20, 24, 12);
INSERT 0 1
postgres=# CREATE TABLE test_p2 PARTITION OF test FOR VALUES IN(15);
ERROR: could not open file "base/12335/16420": No such file or directory
Thanks,
Jeevan Ladhe
On Fri, May 5, 2017 at 11:55 AM, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
Hi Rahila,
pg_restore is failing for the default partition; the dump file still stores the old syntax of the default partition.
create table lpd (a int, b int, c varchar) partition by list(a);
create table lpd_d partition of lpd DEFAULT;
create database bkp owner 'edb';
grant all on DATABASE bkp to edb;
--take plain dump of existing database
\! ./pg_dump -f lpd_test.sql -Fp -d postgres
--restore plain backup to new database bkp
\! ./psql -f lpd_test.sql -d bkp
psql:lpd_test.sql:63: ERROR: syntax error at or near "DEFAULT"
LINE 2: FOR VALUES IN (DEFAULT);
^
vi lpd_test.sql
--
-- Name: lpd; Type: TABLE; Schema: public; Owner: edb
--
CREATE TABLE lpd (
a integer,
b integer,
c character varying
)
PARTITION BY LIST (a);
ALTER TABLE lpd OWNER TO edb;
--
-- Name: lpd_d; Type: TABLE; Schema: public; Owner: edb
--
CREATE TABLE lpd_d PARTITION OF lpd
FOR VALUES IN (DEFAULT);
ALTER TABLE lpd_d OWNER TO edb;
Thanks,
Rajkumar
On Tue, May 9, 2017 at 9:26 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>> Hi Rahila,
>>
>> I am not able add a new partition if default partition is further
>> partitioned with default partition.
>>
>> Consider example below:
>>
>> postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
>> CREATE TABLE
>> postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7, 8);
>> CREATE TABLE
>> postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY LIST(b);
>> CREATE TABLE
>> postgres=# CREATE TABLE test_pd_pd PARTITION OF test_pd DEFAULT;
>> CREATE TABLE
>> postgres=# INSERT INTO test VALUES (20, 24, 12);
>> INSERT 0 1
>> postgres=# CREATE TABLE test_p2 PARTITION OF test FOR VALUES IN(15);
>> ERROR: could not open file "base/12335/16420": No such file or directory
>
> Regarding fix for this I think we need to prohibit this case. That is
> prohibit creation of new partition after a default partition which is
> further partitioned. Currently before adding a new partition after default
> partition all the rows of default partition are scanned and if a row which
> matches the new partitions constraint exists the new partition is not added.
>
> If we allow this for default partition which is partitioned further, we will
> have to scan all the partitions of default partition for matching rows which
> can slow down execution.
I think this case should be allowed and I don't think it should
require scanning all the partitions of the default partition. This is
no different than any other case where multiple levels of partitioning
are used. First, you route the tuple at the root level; then, you
route it at the next level; and so on. It shouldn't matter whether
the routing at the top level is to that level's default partition or
not.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>It seems that adding a new partition at the same level as the default
>partition will require scanning it or its (leaf) partitions if
>partitioned. Consider that p1, pd are partitions of a list-partitioned
>table p accepting 1 and everything else, respectively, and pd is further
>partitioned. When adding p2 of p for 2, we need to scan the partitions of
>pd to check if there are any (2, ...) rows.
This is a better explanation. Maybe the following sentence was confusing:
"That is prohibit creation of new partition after a default partition which is further partitioned"
>As for fixing the reported issue whereby the partitioned default
>partition's non-existent file is being accessed, it would help to take a
>look at the code in ATExecAttachPartition() starting at the following:
Similar support should be provided in the case of adding a partition after the default partition as well.
+1
On 2017/05/10 2:09, Robert Haas wrote:
> On Tue, May 9, 2017 at 9:26 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>> Hi Rahila,
>>
>>> I am not able add a new partition if default partition is further
>>> partitioned
>>> with default partition.
>>
>>> Consider example below:
>>
>>> postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
>>> CREATE TABLE
>>> postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7,
>>> 8);
>>> CREATE TABLE
>>> postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY
>>> LIST(b);
>>> CREATE TABLE
>>> postgres=# CREATE TABLE test_pd_pd PARTITION OF test_pd DEFAULT;
>>> CREATE TABLE
>>> postgres=# INSERT INTO test VALUES (20, 24, 12);
>>> INSERT 0 1
>>> postgres=# CREATE TABLE test_p2 PARTITION OF test FOR VALUES IN(15);
>> ERROR: could not open file "base/12335/16420": No such file or directory
>>
>> Regarding fix for this I think we need to prohibit this case. That is
>> prohibit creation
>> of new partition after a default partition which is further partitioned.
>> Currently before adding a new partition after default partition all the rows
>> of default
>> partition are scanned and if a row which matches the new partitions
>> constraint exists
>> the new partition is not added.
>>
>> If we allow this for default partition which is partitioned further, we will
>> have to scan
>> all the partitions of default partition for matching rows which can slow
>> down execution.
>
> I think this case should be allowed
> and I don't think it should
> require scanning all the partitions of the default partition. This is
> no different than any other case where multiple levels of partitioning
> are used. First, you route the tuple at the root level; then, you
> route it at the next level; and so on. It shouldn't matter whether
> the routing at the top level is to that level's default partition or
> not.
It seems that adding a new partition at the same level as the default
partition will require scanning it or its (leaf) partitions if
partitioned. Consider that p1, pd are partitions of a list-partitioned
table p accepting 1 and everything else, respectively, and pd is further
partitioned. When adding p2 of p for 2, we need to scan the partitions of
pd to check if there are any (2, ...) rows.
As for fixing the reported issue whereby the partitioned default
partition's non-existent file is being accessed, it would help to take a
look at the code in ATExecAttachPartition() starting at the following:
/*
* Set up to have the table be scanned to validate the partition
* constraint (see partConstraint above). If it's a partitioned table, we
* instead schedule its leaf partitions to be scanned.
*/
if (!skip_validate)
{
Thanks,
Amit
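The validation described above (before adding p2 for the value 2, look through the leaf partitions under the default partition for rows that the new bound would claim) can be sketched as follows. This is a standalone illustration under stated assumptions, not the logic in ATExecAttachPartition(): `Leaf`, `bound_accepts` and `new_bound_conflicts_with_default` are hypothetical names, and each leaf is reduced to an array of int key values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one leaf partition, reduced to the key-column
 * values of the rows it currently holds. */
typedef struct Leaf {
    const int *keys;
    size_t nrows;
} Leaf;

/* Does the new list bound accept this key value? */
static bool bound_accepts(const int *bound, size_t nbound, int key)
{
    for (size_t i = 0; i < nbound; i++)
        if (bound[i] == key)
            return true;
    return false;
}

/* Scan every leaf under the default partition. Returns true if any
 * existing row would belong to the partition being added, in which
 * case the new partition must not be created as-is. */
bool new_bound_conflicts_with_default(const Leaf *leaves, size_t nleaves,
                                      const int *bound, size_t nbound)
{
    assert(bound != NULL || nbound == 0);
    for (size_t l = 0; l < nleaves; l++)
        for (size_t r = 0; r < leaves[l].nrows; r++)
            if (bound_accepts(bound, nbound, leaves[l].keys[r]))
                return true;
    return false;
}
```

The point of scanning leaves rather than the partitioned default itself is that a partitioned table has no storage of its own, which is consistent with the "could not open file" error reported earlier in the thread.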
+1 for AS DEFAULT syntax if it helps in improving readability, especially in the following case:
CREATE TABLE p1 PARTITION OF test AS DEFAULT PARTITION BY LIST(a);
Thank you,
Rahila Syed
On Tue, May 9, 2017 at 1:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, May 4, 2017 at 4:40 PM, Sven R. Kunze <srkunze@mail.de> wrote:
> It yields
>
> CREATE TABLE p1 PARTITION OF test DEFAULT PARTITION BY LIST(b);
>
> This reads to me like "DEFAULT PARTITION".
>
> I can imagine a lot of confusion when those queries are encountered in the
> wild. I know this thread is about creating a default partition but if I were to
> propose a minor change in the following direction, I think confusion would
> be greatly avoided:
>
> CREATE TABLE p1 PARTITION OF test AS DEFAULT PARTITIONED BY LIST(b);
>
> I know it's a bit longer but I think those 4 characters might serve
> readability in the long term. It was especially confusing to see PARTITION
> in two positions serving two different functions.
Well, we certainly can't make that change just for default partitions.
I mean, that would be non-orthogonal, right? You can't say that the
way to subpartition is to write "PARTITION BY strategy" when the table
is unpartitioned or is a non-default partition but "PARTITIONED BY
strategy" when it is a default partition. That would certainly not be
a good way of confusing users less, and would probably result in a
variety of special cases in places like ruleutils.c or pg_dump, plus
some weasel-wording in the documentation. We COULD do a general
change from "CREATE TABLE table_name PARTITION BY strategy" to "CREATE
TABLE table_name PARTITIONED BY strategy". I don't have any
particular arguments against that except that the current syntax is
more like Oracle, which might count for something, and maybe the fact
that we're a month after feature freeze. Still, if we want to change
that, now would be the time; but I favor leaving it alone.
You are definitely right. Changing it here would require changing it everywhere AND thus losing syntax parity with Oracle.
I am not in a position to judge properly whether this would be a huge problem. Personally, I don't have an issue with that. But don't count mine as the most important opinion on this.
So I guess I'm still in favor of the CREATE TABLE p1 PARTITION OF test
DEFAULT syntax, but if it ends up being AS DEFAULT instead, I can live
with that.
Is to make it optional an option?
Sven
On Wed, May 10, 2017 at 10:59 AM, Sven R. Kunze <srkunze@mail.de> wrote:
> You are definitely right. Changing it here would require to change it
> everywhere AND thus to lose syntax parity with Oracle.
Right.
> I am not in a position to judge this properly whether this would be a huge
> problem. Personally, I don't have an issue with that. But don't count me as
> most important opinion on this.
Well, I don't think it would be a HUGE problem, but I think the fact
that Amit chose to implement this with syntax similar to that of
Oracle is probably not a coincidence, but rather a goal, and I think
the readability problem that you're worrying about is really pretty
minor. I think most people aren't going to subpartition their default
partition, and I think those who do will probably find the syntax
clear enough anyway. So I don't favor changing it. Now, if there's
an outcry of support for your position then I'll stand aside but I
don't anticipate that.
>> So I guess I'm still in favor of the CREATE TABLE p1 PARTITION OF test
>> DEFAULT syntax, but if it ends up being AS DEFAULT instead, I can live
>> with that.
>
> Is to make it optional an option?
Optional keywords may not be the root of ALL evil, but they're pretty
evil. See my posting earlier on this same thread on this topic:
http://postgr.es/m/CA+TgmoZGHgd3vKZvyQ1Qx3e0L3n=voxY57mz9TTncVET-aLK2A@mail.gmail.com
The issues here are more or less the same.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I'm surprised that there is so much activity in this thread. Is this
patch being considered for pg10?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, May 10, 2017 at 12:12 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I'm surprised that there is so much activity in this thread. Is this
> patch being considered for pg10?
Of course not. Feature freeze was a month ago.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> Well, I don't think it would be a HUGE problem, but I think the fact
> that Amit chose to implement this with syntax similar to that of
> Oracle is probably not a coincidence, but rather a goal, and I think
> the readability problem that you're worrying about is really pretty
> minor. I think most people aren't going to subpartition their default
> partition, and I think those who do will probably find the syntax
> clear enough anyway.
I agree here.
> Optional keywords may not be the root of ALL evil, but they're pretty
> evil. See my posting earlier on this same thread on this topic:
> http://postgr.es/m/CA+TgmoZGHgd3vKZvyQ1Qx3e0L3n=voxY57mz9TTncVET-aLK2A@mail.gmail.com
> The issues here are more or less the same.
Ah, I see. I didn't draw the conclusion from the optionality of a keyword the other day but after re-reading your post, it's exactly the same issue.
Let's avoid optional keywords!
Sven
Hi Rahila,
I have started reviewing your latest patch, and here are my initial comments:
1.
In following block, we can just do with def_index, and we do not need found_def
flag. We can check if def_index is -1 or not to decide if default partition is
present.
@@ -166,6 +172,8 @@ RelationBuildPartitionDesc(Relation rel)
     /* List partitioning specific */
     PartitionListValue **all_values = NULL;
     bool found_null = false;
+    bool found_def = false;
+    int def_index = -1;
     int null_index = -1;
2.
In check_new_partition_bound, in case of PARTITION_STRATEGY_LIST, remove
following duplicate declaration of boundinfo, because it is confusing and after
your changes it is not needed as its not getting overridden in the if block
locally.
if (partdesc->nparts > 0)
{
    PartitionBoundInfo boundinfo = partdesc->boundinfo;
    ListCell *cell;
3.
In following function isDefaultPartitionBound, first statement "return false"
is not needed.
+ * Returns true if the partition bound is default
+ */
+bool
+isDefaultPartitionBound(Node *value)
+{
+    if (IsA(value, DefElem))
+    {
+        DefElem *defvalue = (DefElem *) value;
+        if(!strcmp(defvalue->defname, "DEFAULT"))
+            return true;
+        return false;
+    }
+    return false;
+}
4.
As mentioned in my previous set of comments, following if block inside a loop
in get_qual_for_default needs a break:
+    foreach(cell1, bspec->listdatums)
+    {
+        Node *value = lfirst(cell1);
+        if (isDefaultPartitionBound(value))
+        {
+            def_elem = true;
+            *defid = inhrelid;
+        }
+    }
5.
In the grammar the rule default_part_list is not needed:
+default_partition:
+        DEFAULT { $$ = (Node *)makeDefElem("DEFAULT", NULL, @1); }
+
+default_part_list:
+        default_partition { $$ = list_make1($1); }
+        ;
+
Instead you can simply declare default_partition as a list and write it as:
default_partition:
        DEFAULT
        {
            Node *def = (Node *)makeDefElem("DEFAULT", NULL, @1);
            $$ = list_make1(def);
        }
6.
You need to change the output of the describe command, which is currently
as below:
postgres=# \d+ test;
                                  Table "public.test"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              |
 b      | date    |           |          |         | plain   |              |
Partition key: LIST (a)
Partitions: pd FOR VALUES IN (DEFAULT),
            test_p1 FOR VALUES IN (4, 5)
What about changing the Partitions output as below:
Partitions: pd DEFAULT,
            test_p1 FOR VALUES IN (4, 5)
7.
You need to handle tab completion for DEFAULT.
e.g.
If I partially type following command:
CREATE TABLE pd PARTITION OF test DEFA
and then press tab, I get following completion:
CREATE TABLE pd PARTITION OF test FOR VALUES
I did some primary testing and did not find any problem so far.
I will review and test further and let you know my comments.
Regards,
Jeevan Ladhe
On Thu, May 4, 2017 at 6:09 PM, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
On Thu, May 4, 2017 at 5:14 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
The syntax implemented in this patch is as follows,
CREATE TABLE p11 PARTITION OF p1 DEFAULT;
Applied v9 patches, table description still showing old pattern of default partition. Is it expected?
create table lpd (a int, b int, c varchar) partition by list(a);
create table lpd_d partition of lpd DEFAULT;
\d+ lpd
                                       Table "public.lpd"
 Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer           |           |          |         | plain    |              |
 b      | integer           |           |          |         | plain    |              |
 c      | character varying |           |          |         | extended |              |
Partition key: LIST (a)
Partitions: lpd_d FOR VALUES IN (DEFAULT)
Attachment
On Thu, May 11, 2017 at 10:07 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Please find attached an updated patch with review comments and bugs reported
> till date implemented.
You haven't done anything about the repeated suggestion that this
should also cover range partitioning.
+            /*
+             * If the partition is the default partition switch
+             * back to PARTITION_STRATEGY_LIST
+             */
+            if (spec->strategy == PARTITION_DEFAULT)
+                result_spec->strategy = PARTITION_STRATEGY_LIST;
+            else
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+                         errmsg("invalid bound specification for a list partition"),
                          parser_errposition(pstate, exprLocation(bound))));
This is incredibly ugly. I don't know exactly what should be done
about it, but I think PARTITION_DEFAULT is a bad idea and has got to
go. Maybe add a separate isDefault flag to PartitionBoundSpec.
+            /*
+             * Skip if it's a partitioned table. Only RELKIND_RELATION
+             * relations (ie, leaf partitions) need to be scanned.
+             */
+            if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
What about foreign table partitions?
Doesn't it strike you as a bit strange that get_qual_for_default()
doesn't return a qual? Functions should generally have names that
describe what they do.
+    bound_datums = list_copy(spec->listdatums);
+
+    boundspecs = get_qual_for_default(parent, defid);
+
+    foreach(cell, bound_datums)
+    {
+        Node *value = lfirst(cell);
+        boundspecs = lappend(boundspecs, value);
+    }
There's an existing function that you can use to concatenate two lists
instead of open-coding it.
Also, I think that before you ask anyone to spend too much more time
and energy reviewing this, you should really add the documentation and
regression tests which you mentioned as a TODO. And run the code
through pgindent.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
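One way to read the "separate isDefault flag" suggestion is sketched below. It is a standalone illustration, not the actual PartitionBoundSpec definition: `SketchBoundSpec`, `SketchStrategy` and `check_list_bound` are hypothetical names, and the two error strings about default and empty value lists are made up for the example. The idea is that the strategy field always carries the parent's real strategy, so no code has to switch a pseudo-strategy back to list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the partition strategies. */
typedef enum {
    SKETCH_STRATEGY_LIST,
    SKETCH_STRATEGY_RANGE
} SketchStrategy;

/* Hypothetical bound spec: "default" is a flag on the bound, not a strategy. */
typedef struct SketchBoundSpec {
    SketchStrategy strategy;  /* always the parent's real strategy */
    bool is_default;          /* set for "PARTITION OF parent DEFAULT" */
    size_t ndatums;           /* number of explicit list values */
} SketchBoundSpec;

/* Validate a bound for a list-partitioned parent.
 * Returns NULL when the bound is acceptable, else an error message. */
const char *check_list_bound(const SketchBoundSpec *spec)
{
    assert(spec != NULL);
    if (spec->strategy != SKETCH_STRATEGY_LIST)
        return "invalid bound specification for a list partition";
    if (spec->is_default)
        return spec->ndatums == 0 ? NULL
                                  : "default partition cannot list values";
    return spec->ndatums > 0 ? NULL
                             : "list partition needs at least one value";
}
```

Because the flag is independent of the strategy, the same field could later mark a default partition of a range-partitioned parent without adding another special case.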
Hello,
Please find attached an updated patch with review comments and bugs reported till date implemented.
>1.
>In following block, we can just do with def_index, and we do not need found_def
>flag. We can check if def_index is -1 or not to decide if default partition is
>present.
found_def is used to set boundinfo->has_default, which is used at a couple
of other places to check if a default partition exists. The implementation is similar
to has_null.
>3.
>In following function isDefaultPartitionBound, first statement "return false"
>is not needed.
It is needed to return false if the node is not DefElem.
Todo:
Add regression tests
Documentation
Thank you,
Rahila Syed
>(1) With the new patch, we allow new partitions when there is overlapping data with default partition. The entries in default are ignored when running queries satisfying the new partition.
>This is incredibly ugly. I don't know exactly what should be done
>about it, but I think PARTITION_DEFAULT is a bad idea and has got to
>go. Maybe add a separate isDefault flag to PartitionBoundSpec
>Doesn't it strike you as a bit strange that get_qual_for_default()
>doesn't return a qual? Functions should generally have names that
>describe what they do.
>There's an existing function that you can use to concatenate two lists
>instead of open-coding it.
>you should really add the documentation and
>regression tests which you mentioned as a TODO. And run the code
>through pgindent
On Thu, May 11, 2017 at 7:37 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Hello,
> Please find attached an updated patch with review comments and bugs reported till date implemented.
Hello Rahila,
Tested on "efa2c18 Doc fix: scale(numeric) returns integer, not numeric."
(1) With the new patch, we allow new partitions when there is overlapping data with default partition. The entries in default are ignored when running queries satisfying the new partition.
DROP TABLE list1;
CREATE TABLE list1 (
a int,
b int
) PARTITION BY LIST (a);
CREATE TABLE list1_1 (LIKE list1);
ALTER TABLE list1 ATTACH PARTITION list1_1 FOR VALUES IN (1);
CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
INSERT INTO list1 SELECT generate_series(1,2),1;
-- Partition overlapping with DEF
CREATE TABLE list1_2 PARTITION OF list1 FOR VALUES IN (2);
INSERT INTO list1 SELECT generate_series(2,3),2;postgres=# SELECT * FROM list1 ORDER BY a,b;a | b---+---1 | 12 | 12 | 23 | 2(4 rows)postgres=# SELECT * FROM list1 WHERE a=2;a | b---+---2 | 2(1 row)This ignores the a=2 entries in the DEFAULT.postgres=# SELECT * FROM list1_def;a | b---+---2 | 13 | 2(2 rows)(2) I get the following warning:partition.c: In function ‘check_new_partition_bound’:partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in this function [-Wmaybe-uninitialized]&& boundinfo->has_default)^preproc.y:3250.2-8: warning: type clash on default action: <str> != <>>1.>In following block, we can just do with def_index, and we do not need found_def>flag. We can check if def_index is -1 or not to decide if default partition is>present.found_def is used to set boundinfo->has_default which is used at coupleof other places to check if default partition exists. The implementation is similarto has_null.
>3.>In following function isDefaultPartitionBound, first statement "return false">is not needed.It is needed to return false if the node is not DefElem.
Todo:
Add regression tests
DocumentationThank you,Rahila Syed
Attachment
Hello,

>(1) With the new patch, we allow new partitions when there is overlapping data with default partition. The entries in default are ignored when running queries satisfying the new partition.

This was introduced in latest version. We are not allowing adding a partition when entries with same key value exist in default partition. So this scenario should not come in picture. Please find attached an updated patch which corrects this.
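The rule described here (refuse to add a partition while the default partition still holds rows with a matching key) can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: keys are plain integers, NULLs are ignored, and the helper name default_allows_new_list is hypothetical, not a function from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if none of the keys currently stored
 * in the default partition appear in the new partition's value list.
 */
static bool
default_allows_new_list(const int *default_keys, size_t ndefault,
                        const int *new_values, size_t nnew)
{
    for (size_t i = 0; i < ndefault; i++)
        for (size_t j = 0; j < nnew; j++)
            if (default_keys[i] == new_values[j])
                return false;   /* this row would belong to the new partition */
    return true;
}
```

In the reported test case the default partition holds a row with a = 2 when list1_2 FOR VALUES IN (2) is created, so a check of this kind would reject the new partition instead of leaving the row hidden in the default.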
>(2) I get the following warning:
>partition.c: In function ‘check_new_partition_bound’:
>partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> && boundinfo->has_default)
> ^
>preproc.y:3250.2-8: warning: type clash on default action: <str> != <>

I failed to notice this warning. I will look into it.

>This is incredibly ugly. I don't know exactly what should be done
>about it, but I think PARTITION_DEFAULT is a bad idea and has got to
>go. Maybe add a separate isDefault flag to PartitionBoundSpec

Will look at other ways to do it.

>Doesn't it strike you as a bit strange that get_qual_for_default()
>doesn't return a qual? Functions should generally have names that
>describe what they do.

Will fix this.

>There's an existing function that you can use to concatenate two lists
>instead of open-coding it.

Will check this.

>you should really add the documentation and
>regression tests which you mentioned as a TODO. And run the code
>through pgindent

I will also update the next version with documentation and regression tests
and run pgindent.

Thank you,
Rahila Syed
Thank you for the updated patch. However, now I cannot create a partition after default.
CREATE TABLE list1 (
a int,
b int
) PARTITION BY LIST (a);
CREATE TABLE list1_1 (LIKE list1);
ALTER TABLE list1 ATTACH PARTITION list1_1 FOR VALUES IN (1);
CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
CREATE TABLE list1_5 PARTITION OF list1 FOR VALUES IN (3);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
Attachment
On Tue, May 16, 2017 at 8:57 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
> I have fixed the crash in attached patch.
> Also the patch needed bit of adjustments due to recent commit.
> I have re-based the patch on latest commit.

+ bool has_default; /* Is there a default partition? Currently false
+ * for a range partitioned table */
+ int default_index; /* Index of the default list partition. -1 for
+ * range partitioned tables */

Why do we need both has_default and default_index? If default_index
== -1 means that there is no default, we don't also need a separate
bool to record the same thing, do we?

get_qual_for_default() still returns a list of things that are not
quals. I think that this logic is all pretty poorly organized. The
logic to create a partitioning constraint for a list partition should
be part of get_qual_for_list(), whether or not it is a default. And
when we have range partitions, the logic to create a default range
partitioning constraint should be part of get_qual_for_range(). The
code the way it's organized today makes it look like there are three
kinds of partitions: list, range, and default. But that's not right
at all. There are two kinds: list and range. And a list partition
might or might not be a default partition, and similarly for range.

+ ereport(ERROR, (errcode(ERRCODE_CHECK_VIOLATION),
+ errmsg("DEFAULT partition cannot be used"
+ " without negator of operator %s",
+ get_opname(operoid))));

I don't think ERRCODE_CHECK_VIOLATION is the right error code here,
and we have a policy against splitting message strings like this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
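The point about the redundant flag can be shown with a minimal sketch. The struct below is a simplified, hypothetical stand-in for the patch's bound-info struct, not the actual PostgreSQL definition; only the default_index member and its -1 convention come from the quoted diff. A helper named partition_bound_has_default is mentioned later in this thread, but the body shown here is an assumption.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the patch's bound-info struct. */
typedef struct BoundInfoSketch
{
    int ndatums;        /* number of explicit bound datums */
    int default_index;  /* index of the default partition, or -1 if none */
} BoundInfoSketch;

/* With the -1 convention, no separate has_default flag is needed. */
static inline bool
partition_bound_has_default(const BoundInfoSketch *boundinfo)
{
    return boundinfo->default_index != -1;
}
```

Deriving the answer from default_index means the two pieces of state can never disagree, which is the risk of carrying both a bool and an index.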
On Tue, May 16, 2017 at 9:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 16, 2017 at 8:57 AM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>> I have fixed the crash in attached patch.
>> Also the patch needed bit of adjustments due to recent commit.
>> I have re-based the patch on latest commit.
>
> + bool has_default; /* Is there a default partition?
> Currently false
> + * for a range partitioned table */
> + int default_index; /* Index of the default list
> partition. -1 for
> + * range partitioned tables */
>

We have has_null and null_index for list partitioning. There
null_index == -1 = has_null. May be Rahila and/or Jeevan just copied
that style. Probably we should change that as well?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 2017/05/17 17:58, Ashutosh Bapat wrote:
> On Tue, May 16, 2017 at 9:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, May 16, 2017 at 8:57 AM, Jeevan Ladhe
>> <jeevan.ladhe@enterprisedb.com> wrote:
>>> I have fixed the crash in attached patch.
>>> Also the patch needed bit of adjustments due to recent commit.
>>> I have re-based the patch on latest commit.
>>
>> + bool has_default; /* Is there a default partition?
>> Currently false
>> + * for a range partitioned table */
>> + int default_index; /* Index of the default list
>> partition. -1 for
>> + * range partitioned tables */
>>
>
> We have has_null and null_index for list partitioning. There
> null_index == -1 = has_null. May be Rahila and/or Jeevan just copied
> that style. Probably we should change that as well?

Probably a good idea.

Thanks,
Amit
Attachment
Hello,

Patch for default range partition has been added. PFA the rebased v12 patch for the same.
I have not removed the has_default variable yet.

Default range partition: https://www.postgresql.org/message-id/CAOG9ApEYj34fWMcvBMBQ-YtqR9fTdXhdN82QEKG0SVZ6zeL1xg%40mail.gmail.com

--
Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
PFA.
Hi
postgres=# CREATE TEMP TABLE temp_list_part (a int) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TEMP TABLE temp_def_part (a int);
CREATE TABLE
postgres=# ALTER TABLE temp_list_part ATTACH PARTITION temp_def_part DEFAULT;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
Hi Rajkumar,

> postgres=# CREATE TEMP TABLE temp_list_part (a int) PARTITION BY LIST (a);
> CREATE TABLE
> postgres=# CREATE TEMP TABLE temp_def_part (a int);
> CREATE TABLE
> postgres=# ALTER TABLE temp_list_part ATTACH PARTITION temp_def_part DEFAULT;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>

Thanks for reporting.
PFA patch that fixes above issue.

Regards,
Jeevan Ladhe
Attachment
On Thu, May 25, 2017 at 3:03 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>
> Forgot to attach the patch.
> PFA.
>
> On Thu, May 25, 2017 at 3:02 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
>>
>> Hi Rajkumar,
>>
>>> postgres=# CREATE TEMP TABLE temp_list_part (a int) PARTITION BY LIST (a);
>>> CREATE TABLE
>>> postgres=# CREATE TEMP TABLE temp_def_part (a int);
>>> CREATE TABLE
>>> postgres=# ALTER TABLE temp_list_part ATTACH PARTITION temp_def_part DEFAULT;
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>> !>
>>
>>
>> Thanks for reporting.
>> PFA patch that fixes above issue.
>>

The existing comment is not valid
/*
* A null partition key is only acceptable if null-accepting list
* partition exists.
*/
as we allow NULL to be stored in default. It should be updated.

DROP TABLE list1;
CREATE TABLE list1 ( a int) PARTITION BY LIST (a);
CREATE TABLE list1_1 (LIKE list1);
ALTER TABLE list1 ATTACH PARTITION list1_1 FOR VALUES IN (2);
CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
INSERT INTO list1 VALUES (NULL);
SELECT * FROM list1_def;
a
---

(1 row)

--
Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
This patch needs a rebase on recent commits, and also a fix[1] that is posted for get_qual_for_list().
I am working on both of these tasks. Will update the patch once I am done with this.
Regards,
Jeevan Ladhe
> The existing comment is not valid
> /*
> * A null partition key is only acceptable if null-accepting list
> * partition exists.
> */
> as we allow NULL to be stored in default. It should be updated.

Sure Beena, as stated earlier will update this on my next version of patch.

Regards,
Jeevan Ladhe
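The routing behaviour behind this comment fix can be sketched as follows. This is a hedged illustration, not the patch's tuple-routing code: the struct ListBoundsSketch and the function route_list_key are hypothetical names, keys are plain integers, and the lookup is a linear scan for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of a list-partitioned table's bounds. */
typedef struct ListBoundsSketch
{
    const int *datums;         /* explicit list values */
    const int *indexes;        /* indexes[i] is the partition for datums[i] */
    size_t     ndatums;
    int        null_index;     /* partition accepting NULL, or -1 */
    int        default_index;  /* default partition, or -1 */
} ListBoundsSketch;

/* Returns the partition index for a key, or -1 if no partition accepts it. */
static int
route_list_key(const ListBoundsSketch *b, bool isnull, int key)
{
    if (isnull)
    {
        /* A null-accepting partition takes precedence over the default. */
        if (b->null_index != -1)
            return b->null_index;
    }
    else
    {
        for (size_t i = 0; i < b->ndatums; i++)
            if (b->datums[i] == key)
                return b->indexes[i];
    }
    /* Anything not claimed above falls through to the default, if any. */
    return b->default_index;
}
```

With the bounds from the example above (one partition for the value 2, a default partition, and no null-accepting partition), a NULL key falls through to the default partition, which matches the row seen in list1_def.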
Attachment
On Mon, May 29, 2017 at 9:33 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have rebased the patch on latest commit with few cosmetic changes.
>
> The patch fix_listdatums_get_qual_for_list_v3.patch [1] needs to be applied
> before applying this patch.
>
> [1] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg315490.html
>
This needs a rebase again.
--
Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
Hi,

I have rebased the patch on the latest commit.
PFA.

There exists one issue reported by Rajkumar[1] off-line as following, where
describing the default partition after deleting null partition, does not show
updated constraints. I am working on fixing this issue.

create table t1 (c1 int) partition by list (c1);
create table t11 partition of t1 for values in (1,2);
create table t12 partition of t1 default;
create table t13 partition of t1 for values in (10,11);
create table t14 partition of t1 for values in (null);

postgres=# \d+ t12
Table "public.t12"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
c1 | integer | | | | plain | |
Partition of: t1 DEFAULT
Partition constraint: ((c1 IS NOT NULL) AND (c1 <> ALL (ARRAY[1, 2, 10, 11])))

postgres=# alter table t1 detach partition t14;
ALTER TABLE
postgres=# \d+ t12
Table "public.t12"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
c1 | integer | | | | plain | |
Partition of: t1 DEFAULT
Partition constraint: ((c1 IS NOT NULL) AND (c1 <> ALL (ARRAY[1, 2, 10, 11])))

postgres=# insert into t1 values(null);
INSERT 0 1

Note that the parent correctly allows the nulls to be inserted.

Regards,
Jeevan Ladhe

On Tue, May 30, 2017 at 10:59 AM, Beena Emerson <memissemerson@gmail.com> wrote:
On Mon, May 29, 2017 at 9:33 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have rebased the patch on latest commit with few cosmetic changes.
>
> The patch fix_listdatums_get_qual_for_list_v3.patch [1] needs to be applied
> before applying this patch.
>
> [1] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg315490.html
>
This needs a rebase again.
--
Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
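The stale-constraint issue reported above comes from the default partition's constraint being derived from its siblings. A minimal sketch, assuming integer keys and using the hypothetical names SiblingBounds and default_constraint_allows, evaluates that derived constraint against the current sibling set rather than a cached copy:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the non-default sibling partitions. */
typedef struct SiblingBounds
{
    const int *values;         /* all values listed by non-default partitions */
    size_t     nvalues;
    bool       has_null_part;  /* does some sibling accept NULL? */
} SiblingBounds;

/*
 * Evaluate the default partition's implied constraint against the
 * current sibling set: the default accepts whatever no sibling claims.
 */
static bool
default_constraint_allows(const SiblingBounds *sib, bool isnull, int value)
{
    if (isnull)
        return !sib->has_null_part;
    for (size_t i = 0; i < sib->nvalues; i++)
        if (sib->values[i] == value)
            return false;
    return true;
}
```

In the t1 example, the sibling values are 1, 2, 10 and 11. While t14 is attached the default rejects NULL; once t14 is detached, has_null_part becomes false and NULL is accepted, so a constraint that still reads c1 IS NOT NULL would presumably be out of date.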
Attachment
On Tue, May 30, 2017 at 1:08 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have rebased the patch on the latest commit.
> PFA.
>

Thanks for rebasing the patch. Here are some review comments.

+ /*
+ * In case of default partition, just note the index, we do not
+ * add this to non_null_values list.
+ */
We may want to rephrase it like
"Note the index of the partition bound spec for the default partition. There's
no datum to add to the list of non-null datums for this partition."

/* Assign mapping index for default partition. */
"mapping index" should be "mapped index". May be we want to use "the" before
default partition everywhere, there's only one specific default partition.

Assert(default_index >= 0 &&
mapping[default_index] == -1);
Needs some explanation for asserting mapping[default_index] == -1. Since
default partition accepts any non-specified value, it should not get a mapped
index while assigning those for non-null datums.

+ * Currently range partition do not have default partition
May be rephrased as "As of now, we do not support default range partition."

+ * ArrayExpr, which would return an negated expression for default
"a negated" instead of "an negated".

+ cur_index = -1;
/*
- * A null partition key is only acceptable if null-accepting list
- * partition exists.
+ * A null partition key is acceptable if null-accepting list partition
+ * or a default partition exists. Check if there exists a null
+ * accepting partition, else this will be handled later by default
+ * partition if it exists.
*/
- cur_index = -1;
Why do we need to move assignment to cur_index before the comment?
The comment should probably change to
"Handle NULL partition key here if there's a null-accepting list partition. Else
it will be routed to a default partition if one exists."

+-- attaching default partition overlaps if a default partition already exists
+ERROR: partition "part_def2" would overlap partition "part_def1"
Saying a default partition overlaps is misleading here. A default partition is
not expected to overlap with anything. It's expected to "adjust" with the rest
of the partitions. It can "conflict" with another default partition. So the
right error message here is "a default partition "part_def1" already exists."

+CREATE TABLE part_def1 PARTITION OF list_parted DEFAULT;
+CREATE TABLE part_def2 (LIKE part_1 INCLUDING CONSTRAINTS);
+ALTER TABLE list_parted ATTACH PARTITION part_def2 DEFAULT;
May be you want to name part_def1 as def_part and part_def2 as fail_def_part to
be consistent with other names in the file. May be you want to test two
consecutive CREATE TABLE ... DEFAULT.

+ALTER TABLE list_parted2 ATTACH PARTITION part_3 FOR VALUES IN (11);
+ERROR: new default partition constraint is violated by some row
+DETAIL: Violating row contains (11, z).
The error message seems to be misleading. The default partition is not new. May
be we should say, "default partition contains rows that conflict with the
partition bounds of "part_3"". I think we should use a better word instead of
"conflict", but I am not able to find one right now.

+-- check that leaf partitons of default partition are scanned when
s/partitons/partitions/

-ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IN (5)), ALTER a
SET NOT NULL;
-ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5);
+ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IN (5, 55)), ALTER
a SET NOT NULL;
+ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5, 55);
Why do we want to change partition bounds of this one? The test is for children
of part_5 right?

+drop table part_default;
I think this is premature drop. Down the file there's a SELECT from
list_parted, which won't list the rows inserted to the default partition and we
will miss checking whether the tuples were routed to the right partition or not.

+update list_part1 set a = 'c' where a = 'a';
+ERROR: new row for relation "list_part1" violates partition constraint
+DETAIL: Failing row contains (c, 1).
Why do we need this test here? It's not dealing with the default partition and
partition row movement is not in there. So the updated row may not move to the
default partition, even if it's there.

This isn't a complete review. I will continue to review this patch further.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Hi Jeevan,

On 2017/05/30 16:38, Jeevan Ladhe wrote:
> I have rebased the patch on the latest commit.
> PFA.

Was looking at the patch and felt that the parse node representation of
default partition bound could be slightly different. Can you explain the
motivation behind implementing it without adding a new member to the
PartitionBoundSpec struct?

I would suggest instead adding a bool named is_default and be done with
it. It will help get rid of the public isDefaultPartitionBound() in the
proposed patch whose interface isn't quite clear and instead simply check
if (spec->is_default) in places where it's called by passing it (Node *)
linitial(spec->listdatums).

Further looking into the patch, I found a tiny problem in
check_default_allows_bound(). If the default partition that will be
scanned by it is a foreign table or a partitioned table with a foreign
leaf partition, you will get a failure like:

-- default partition is a foreign table
alter table p attach partition fp default;

-- adding a new partition will try to scan fp above
alter table p attach partition p12 for values in (1, 2);
ERROR: could not open file "base/13158/16456": No such file or directory

I think the foreign tables should be ignored here to avoid the error. The
fact that foreign default partition may contain data that satisfies the
new partition's constraint is something we cannot do much about. Also,
see the note in ATTACH PARTITION description regarding foreign tables [1]
and the discussion at [2].

Thanks,
Amit

[1] https://www.postgresql.org/docs/devel/static/sql-altertable.html
[2] https://www.postgresql.org/message-id/flat/8f89dcb2-bd15-d8dc-5f54-3e11dc6c9463%40lab.ntt.co.jp
On 2017/05/31 9:33, Amit Langote wrote:
> On 2017/05/30 16:38, Jeevan Ladhe wrote:
>> I have rebased the patch on the latest commit.
>> PFA.
>
> Was looking at the patch

I tried creating default partition of a range-partitioned table and got
the following error:

ERROR: invalid bound specification for a range partition

I thought it would give:

ERROR: creating default partition is not supported for range partitioned
tables

Which means transformPartitionBound() should perform this check more
carefully. As I suggested in my previous email, if there were a
is_default field in the PartitionBoundSpec, then one could add the
following block of code at the beginning of transformPartitionBound:

if (spec->is_default && spec->strategy != PARTITION_STRATEGY_LIST)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
             errmsg("creating default partition is not supported for %s partitioned tables",
                    get_partition_strategy_name(key->strategy))));

Some more comments on the patch:

+ errmsg("new default partition constraint is violated by some row"),

"new default partition constraint" may sound a bit confusing to users.
That we recompute the default partition's constraint and check the "new
constraint" against the rows it contains seems to me to be the description
of internal details. How about:

ERROR: default partition contains rows that belong to partition being created

+char *ExecBuildSlotValueDescription(Oid reloid,
+ TupleTableSlot *slot,
+ TupleDesc tupdesc,
+ Bitmapset *modifiedCols,
+ int maxfieldlen);

It seems that you made the above public to use it in
check_default_allows_bound(), which while harmless, I'm not sure if
needed. ATRewriteTable() in tablecmds.c, for example, emits the following
error messages:

errmsg("check constraint \"%s\" is violated by some row",

errmsg("partition constraint is violated by some row")));

but neither outputs the DETAIL part showing exactly what row. I think it's
fine for check_default_allows_bound() not to show the row itself and hence
no need to make ExecBuildSlotValueDescription public.

In get_rule_expr():

case PARTITION_STRATEGY_LIST:
Assert(spec->listdatums != NIL);

+ /*
+ * If the boundspec is of Default partition, it does
+ * not have list of datums, but has only one node to
+ * indicate its a default partition.
+ */
+ if (isDefaultPartitionBound(
+ (Node *) linitial(spec->listdatums)))
+ {
+ appendStringInfoString(buf, "DEFAULT");
+ break;
+ }
+

How about adding this part before the switch (key->strategy)? That way,
we won't have to come back and add this again when we add range default
partitions.

Gotta go; will provide more comments later.

Thanks,
Amit
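Both suggestions in this message (an is_default flag checked up front, and emitting DEFAULT before the per-strategy switch) can be sketched together. The code below is a self-contained illustration, not PostgreSQL source: BoundSpecSketch, validate_default and render_bound are hypothetical names, the strategy constants are local stand-ins, and the error text is hard-coded for the range case instead of using ereport().

```c
#include <stdbool.h>
#include <stdio.h>

#define STRATEGY_LIST  'l'   /* local stand-in for the list strategy */
#define STRATEGY_RANGE 'r'   /* local stand-in for the range strategy */

/* Hypothetical, trimmed-down bound spec carrying the proposed flag. */
typedef struct BoundSpecSketch
{
    char strategy;
    bool is_default;
} BoundSpecSketch;

/* Returns NULL if the spec is acceptable, else an error message. */
static const char *
validate_default(const BoundSpecSketch *spec)
{
    if (spec->is_default && spec->strategy != STRATEGY_LIST)
        return "creating default partition is not supported for range partitioned tables";
    return NULL;
}

/* Render the bound; DEFAULT is handled once, before the strategy switch. */
static void
render_bound(const BoundSpecSketch *spec, char *buf, size_t buflen)
{
    if (spec->is_default)
    {
        snprintf(buf, buflen, "DEFAULT");
        return;
    }

    switch (spec->strategy)
    {
        case STRATEGY_LIST:
            snprintf(buf, buflen, "FOR VALUES IN (...)");
            break;
        case STRATEGY_RANGE:
            snprintf(buf, buflen, "FOR VALUES FROM (...) TO (...)");
            break;
        default:
            snprintf(buf, buflen, "?");
            break;
    }
}
```

Because the flag is tested before the switch, adding default support for another strategy later would not require touching the rendering code again, which is the motivation given in the review.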
On Wed, May 31, 2017 at 8:13 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/05/31 9:33, Amit Langote wrote:
>
>
> In get_rule_expr():
>
> case PARTITION_STRATEGY_LIST:
> Assert(spec->listdatums != NIL);
>
> + /*
> + * If the boundspec is of Default partition, it does
> + * not have list of datums, but has only one node to
> + * indicate its a default partition.
> + */
> + if (isDefaultPartitionBound(
> + (Node *) linitial(spec->listdatums)))
> + {
> + appendStringInfoString(buf, "DEFAULT");
> + break;
> + }
> +
>
> How about adding this part before the switch (key->strategy)? That way,
> we won't have to come back and add this again when we add range default
> partitions.

I think it is best that we add a bool is_default to PartitionBoundSpec
and then have a general check for both list and range. Though
listdatums, upperdatums and lowerdatums are set to default for a
DEFAULT partition, it does not seem proper that we check listdatums
for range as well.

--
Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
On Thu, Jun 1, 2017 at 3:35 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Please let me know if I have missed anything and any further comments.

+ errmsg("a default partition \"%s\" already exists",

I suggest: partition \"%s\" conflicts with existing default partition \"%s\"

The point is that's more similar to the message you get when overlap
&& !spec->is_default.

+ * If the default partition exists, it's partition constraint will change

it's -> its

+ errmsg("default partition contains row(s)
that would overlap with partition being created")));

It doesn't really sound right to talk about rows overlapping with a
partition. Partitions can overlap with each other, but not rows.
Also, it's not really project style to use ambiguously plural forms
like "row(s)" in error messages. Maybe something like:

new partition constraint for default partition \"%s\" would be
violated by some row

+/*
+ * InvalidateDefaultPartitionRelcache
+ *
+ * Given a parent oid, this function checks if there exists a default partition
+ * and invalidates it's relcache if it exists.
+ */
+void
+InvalidateDefaultPartitionRelcache(Oid parentOid)
+{
+ Relation parent = heap_open(parentOid, AccessShareLock);
+ Oid default_relid =
parent->rd_partdesc->oids[DEFAULT_PARTITION_INDEX(parent)];
+
+ if (partition_bound_has_default(parent->rd_partdesc->boundinfo))
+ CacheInvalidateRelcacheByRelid(default_relid);
+
+ heap_close(parent, AccessShareLock);
+}

It does not seem like a good idea to put the heap_open() call inside
this function. One of the two callers already *has* the Relation, and
we definitely want to avoid pulling the Oid out of the Relation only
to reopen it to get the Relation back. And I think
heap_drop_with_catalog could open the parent relation instead of
calling LockRelationOid().

If DETACH PARTITION and DROP PARTITION require this, why not ATTACH
PARTITION and CREATE TABLE .. PARTITION OF?

The indentation of the changes in gram.y doesn't appear to match the
nearby code. I'd remove this comment:

+ * Currently this is supported only for LIST partition.

Since nothing here is dependent on this working only for LIST
partitions, and since this will probably change, I think it would be
more future-proof to leave this out, lest somebody forget to update it
later.

- switch (spec->strategy)
+ if (spec->is_default && (strategy == PARTITION_STRATEGY_LIST ||
+ strategy == PARTITION_STRATEGY_RANGE))

Checking strategy here appears pointless.

This is not a full review, but I'm out of time for today.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Here's some detailed review of the code. @@ -1883,6 +1883,15 @@ heap_drop_with_catalog(Oid relid) if (OidIsValid(parentOid)) { /* + * Default partition constraints are constructed run-time from the + * constraints of its siblings(basically by negating them), so any + * change in the siblings needs to rebuild the constraints of the + * default partition. So, invalidate the sibling default partition's + * relcache. + */ + InvalidateDefaultPartitionRelcache(parentOid); + Do we need a lock on the default partition for doing this? A query might be scanning the default partition directly and we will invalidate the relcache underneath it. What if two partitions are being dropped simultaneously and change default constraints simultaneously. Probably the lock on the parent helps there, but need to check it. What if the default partition cache is invalidated because partition gets added/dropped to the default partition itself. If we need a lock on the default partition, we will need to check the order in which we should be obtaining the locks so as to avoid deadlocks. This also means that we have to test PREPARED statements involving default partition. Any addition/deletion/attach/detach of other partition should invalidate those cached statements. + if (partition_bound_has_default(boundinfo)) + { + overlap = true; + with = boundinfo->default_index; + } You could possibly rewrite this as overlap = partition_bound_has_default(boundinfo); with = boundinfo->default_index; that would save one indentation and a conditional jump. + if (partdesc->nparts > 0 && partition_bound_has_default(boundinfo)) + check_default_allows_bound(parent, spec); If the table has a default partition, nparts > 0, nparts > 0 check looks redundant. The comments above should also explain that this check doesn't trigger when a default partition is added since we don't expect an existing default partition in such a case. 
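The suggested rewrite of the overlap check can be compared with the patch's form in a small standalone sketch. `BoundInfo` is a hypothetical stand-in for the real bound-info structure, and the sketch assumes `default_index` is -1 when no default partition exists and that `overlap` is still false when this code is reached. Under those assumptions the two forms agree on `overlap` and differ only in what `with` holds when there is no default, which is harmless if `with` is read only when `overlap` is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bound info; only the field needed here. */
typedef struct BoundInfo
{
    int default_index;  /* assumed: -1 when there is no default partition */
} BoundInfo;

bool
bound_has_default(const BoundInfo *b)
{
    assert(b != NULL);
    return b->default_index != -1;
}

/* Form used in the patch: assign only inside the branch. */
void
check_overlap_branchy(const BoundInfo *b, bool *overlap, int *with)
{
    if (bound_has_default(b))
    {
        *overlap = true;
        *with = b->default_index;
    }
}

/* Suggested form: unconditional assignments, no branch. */
void
check_overlap_flat(const BoundInfo *b, bool *overlap, int *with)
{
    *overlap = bound_has_default(b);
    *with = b->default_index;   /* may be -1; only meaningful if *overlap */
}
```

Note that the flat form overwrites `overlap`, so it is equivalent only where no earlier check has already set it.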
+ * Checks if there exists any row in the default partition that passes the + * check for constraints of new partition, if any reports an error. Grammar: two conflicting ifs in the same statement. You may want to rephrase this as "This function checks if there exists a row in the default partition that fits in the new partition and throws an error if it finds one." + if (new_spec->strategy != PARTITION_STRATEGY_LIST) + return; This should probably be an Assert. When default range partition is supported this function would silently return, meaning there is no row in the default partition which fits the new partition. We don't want that behavior. The code in check_default_allows_bound() to check whether the default partition has any rows that would fit the new partition looks quite similar to the code in ATExecAttachPartition() checking whether all rows in the table being attached as a partition fit the partition bounds. One thing that check_default_allows_bound() misses is, if there's already a constraint on the default partition that refutes the partition constraint on the new partition, we can skip the scan of the default partition since it cannot have rows that would fit the new partition. ATExecAttachPartition() has code to deal with a similar case, i.e. the table being attached has a constraint which implies the partition constraint. There may be more cases which check_default_allows_bound() does not handle but ATExecAttachPartition() handles. So, I am wondering whether it's better to somehow take out the common code into a function and use it. We will have to deal with a difference though. The first one would throw an error when finding a row that satisfies partition constraints whereas the second one would throw an error when it doesn't find such a row. But this difference can be handled through a flag or by negating the constraint. This would also take care of Amit Langote's complaint about foreign partitions.
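To make the check discussed above concrete, here is a minimal standalone model of it. This is not the patch's check_default_allows_bound(): rows and list bounds are plain int arrays, and `Strategy`, `list_bound_contains()` and `default_allows_bound()` are hypothetical names. It shows the scan of the default partition for a row that would fit the new partition, with the strategy check written as an assertion rather than a silent return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the partition strategy codes. */
typedef enum { STRATEGY_LIST, STRATEGY_RANGE } Strategy;

/* True if v is one of the n values of a list partition bound. */
bool
list_bound_contains(const int *bound, size_t n, int v)
{
    for (size_t i = 0; i < n; i++)
        if (bound[i] == v)
            return true;
    return false;
}

/*
 * Returns true when no row currently stored in the default partition would
 * fit the new partition, i.e. the new partition may be created.
 */
bool
default_allows_bound(Strategy strategy,
                     const int *default_rows, size_t nrows,
                     const int *new_bound, size_t nbound)
{
    /* Assert instead of "return": an unsupported strategy must not pass silently. */
    assert(strategy == STRATEGY_LIST);

    for (size_t i = 0; i < nrows; i++)
        if (list_bound_contains(new_bound, nbound, default_rows[i]))
            return false;   /* this row would have to move: report an error */

    return true;
}
```

With the default partition holding the value 9, as in the example at the top of the thread, a new partition FOR VALUES IN (4, 5) passes this check, while one that lists 9 does not.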
There's also another difference that the ATExecAttachPartition() queues the table for scan and the actual scan takes place in ATRewriteTable(), but there is no such queue while creating a table as a partition. But we should check if we can reuse the code to scan the heap for checking a constraint. In case of ATTACH PARTITION, probably we should schedule a scan of the default partition in the alter table's work queue like what ATExecAttachPartition() is doing for the table being attached. That would fit in the way alter table works. make_partition_op_expr(PartitionKey key, int keynum, - uint16 strategy, Expr *arg1, Expr *arg2) + uint16 strategy, Expr *arg1, Expr *arg2, bool is_default) Indentation. + if (is_default && + ((operoid = get_negator(operoid)) == InvalidOid)) + ereport(ERROR, (errcode(ERRCODE_RESTRICT_VIOLATION), + errmsg("DEFAULT partition cannot be used without negator of operator %s", + get_opname(operoid)))); + If the existence of the default partition depends upon the negator, shouldn't there be a dependency between the default partition and the negator? At the time of creating the default partition, we will try to construct the partition constraint for the default partition and if the negator doesn't exist at that time, it will throw an error. But in the unlikely event that the user drops the negator, the partitioned table will not be usable at all, as every time it will try to create the relcache, it will try to create the default partition constraint and will throw an error because of the missing negator. That's not a very good scenario. Have you tried this case? Apart from that, while restoring a dump, if the default partition gets restored before the negator is created, restore will fail with this error.
/* Generate the main expression, i.e., keyCol = ANY (arr) */ opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber, - keyCol, (Expr *) arr); + keyCol, (Expr *) arr, spec->is_default); /* Build leftop = ANY (rightop)*/ saopexpr = makeNode(ScalarArrayOpExpr); The comments in both the places need correction, as for default partition the expression will be keyCol <> ALL(arr). + /* + * In case of the default partition for list, the partition constraint + * is basically any value that is not equal to any of the values in + * boundinfo->datums array. So, construct a list of constants from + * boundinfo->datums to pass to function make_partition_op_expr via + * ArrayExpr, which would return a negated expression for the default + * partition. + */ This is misleading, since the actual constraint would also have NOT NULL or IS NULL in there depending upon the existence of a NULL partition. I would simply rephrase this as "For default list partition, collect lists for all the partitions. The default partition constraint should check that the partition key is equal to none of those." + ndatums = (pdesc->nparts > 0) ? boundinfo->ndatums : 0; wouldn't ndatums be simply boundinfo->ndatums? When nparts = 0, ndatums will be 0. + int ndatums = 0; This assignment looks redundant then. + if (boundinfo && partition_bound_accepts_nulls(boundinfo)) You have not checked existence of boundinfo when extracting ndatums out of it and just few lines below you check that. If the later check is required then we will get a segfault while extracting ndatums. + if ((!list_has_null && !spec->is_default) || + (list_has_null && spec->is_default)) Need a comment explaining what's going on here. The condition is no more a simple condition. 
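Since the review asks for a comment explaining the combined NULL condition, here is a small standalone model of one reading of it. It assumes that for the default partition `arr` holds the values of all sibling partitions and `list_has_null` means some sibling accepts NULL. `Key`, `in_array()` and `list_constraint_accepts()` are hypothetical names, and the keys are plain nullable ints rather than Datums. On this reading, an IS NOT NULL test is needed exactly when the two flags agree, and otherwise the constraint takes the "key IS NULL OR ..." form.

```c
#include <stdbool.h>
#include <stddef.h>

/* A nullable integer partition key (hypothetical model type). */
typedef struct Key
{
    bool is_null;
    int  value;
} Key;

bool
in_array(const int *arr, size_t n, int v)
{
    for (size_t i = 0; i < n; i++)
        if (arr[i] == v)
            return true;
    return false;
}

/* Evaluate a list partition constraint for key k under the assumptions above. */
bool
list_constraint_accepts(const int *arr, size_t n,
                        bool list_has_null, bool is_default, Key k)
{
    /*
     * An ordinary partition without NULL in its list rejects NULL keys; so
     * does a default partition when some sibling already accepts NULL.
     */
    bool need_not_null = (!list_has_null && !is_default) ||
                         (list_has_null && is_default);

    if (k.is_null)
        return !need_not_null;  /* the other form is "key IS NULL OR ..." */

    /* keyCol = ANY (arr) for an ordinary partition, keyCol <> ALL (arr) for default */
    return is_default ? !in_array(arr, n, k.value) : in_array(arr, n, k.value);
}
```

In this model a NULL key lands in the default partition only when no sibling lists NULL, which is the behaviour the condition appears to encode.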
- result = -1; - *failed_at = parent; - *failed_slot = slot; - break; + if (partition_bound_has_default(partdesc->boundinfo)) + { + result = parent->indexes[partdesc->boundinfo->default_index]; + + if (result >= 0) + break; + else + parent = pd[-result]; + } + else + { + result = -1; + *failed_at = parent; + *failed_slot = slot; + break; + } The code to handle result is duplicated here and few lines below. I think it would be better to not duplicate it by having separate condition blocks to deal with setting result and setting parent. Basically if (cur_index < 0) ... else would set the result breaking when setting result = -1 explicitly. A follow-on block would adjust the parent if result < 0 or break otherwise. Both the places where DEFAULT_PARTITION_INDEX is used, its result is used to fetch OID of the default partition. So, instead of having this macro, may be we should have macro to fetch OID of default partition. But even there I don't see much value in that. Further, the macro and code using that macro fetches rd_partdesc directly from Relation. We have RelationGetPartitionDesc() for that. Probably we should also add Asserts to check that every pointer in the long pointer chain is Non-null. InvalidateDefaultPartitionRelcache() is called in case of drop and detach. Shouldn't the constraint change when we add or attach a new partition. Shouldn't we invalidate the cache then as well? I am not able to find that code in your patch. /* + * Default partition constraints are constructed run-time from the + * constraints of its siblings(basically by negating them), so any + * change in the siblings needs to rebuild the constraints of the + * default partition. So, invalidate the sibling default partition's + * relcache. + */ May be rephrase this as "The default partition constraints depend upon the partition bounds of other partitions. Detaching a partition invalidates the default partition constraints. 
Invalidate the default partition's relcache so that the constraints are built anew and any plans dependent on those constraints are invalidated as well." + errmsg("default partition is supported only for list partitioned table"))); for "a" list partitioned table. + /* + * A default partition, that can be partition of either LIST or + * RANGE partitioned table. + * Currently this is supported only for LIST partition. + */ Keep everything in a single paragraph without a line break. } + ; Unnecessary extra line. + /* + * The default partition bound does not have any datums to be + * transformed, return the new bound. + */ Probably not needed. + if (spec->is_default && (strategy == PARTITION_STRATEGY_LIST || + strategy == PARTITION_STRATEGY_RANGE)) + { + appendStringInfoString(buf, "DEFAULT"); + break; + } + What happens if strategy is something other than RANGE or LIST? For that matter why not just LIST? Possibly you could write this as + if (spec->is_default) + { + Assert(strategy == PARTITION_STRATEGY_LIST); + appendStringInfoString(buf, "DEFAULT"); + break; + } @@ -2044,7 +2044,7 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, ""); /* Limited completion support for partition bound specification */ else if (TailMatches3("ATTACH", "PARTITION", MatchAny)) - COMPLETE_WITH_CONST("FOR VALUES"); + COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT"); else if (TailMatches2("FOR", "VALUES")) COMPLETE_WITH_LIST2("FROM (", "IN ("); @@ -2483,7 +2483,7 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables, ""); /* Limited completion support for partition bound specification */ else if (TailMatches3("PARTITION", "OF", MatchAny)) - COMPLETE_WITH_CONST("FOR VALUES"); + COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT"); Do we include psql tab completion in the main feature patch? I have not seen this earlier. But appreciate taking care of these details.
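The tuple-routing restructuring suggested earlier in this review (one block that decides the result, one follow-on block that either descends or stops) can be sketched on a simplified model. `Dispatch` and `route_value()` are hypothetical: list values are plain ints, `indexes[]` follows the convention quoted from the patch (a value >= 0 is a leaf index, a negative value means "descend to pd[-value]"), and `default_index` is assumed to be -1 when there is no default partition.

```c
#include <stddef.h>

/* Hypothetical, simplified dispatch entry for one partitioned table. */
typedef struct Dispatch
{
    int        nvalues;        /* number of list values */
    const int *values;         /* the list values */
    const int *value_part;     /* partition offset for each value */
    const int *indexes;        /* per partition: leaf index, or -(offset into pd) */
    int        default_index;  /* offset of the default partition, or -1 */
} Dispatch;

/* Returns the leaf index for value, or -1 with *failed_at set. */
int
route_value(const Dispatch *pd, int value, const Dispatch **failed_at)
{
    const Dispatch *parent = &pd[0];
    int             result;

    for (;;)
    {
        int cur_index = -1;

        for (int i = 0; i < parent->nvalues; i++)
            if (parent->values[i] == value)
                cur_index = parent->value_part[i];

        /* Block 1: settle the result; this is the only place that fails. */
        if (cur_index < 0)
            cur_index = parent->default_index;
        if (cur_index < 0)
        {
            *failed_at = parent;
            return -1;
        }
        result = parent->indexes[cur_index];

        /* Block 2: the single place that either stops or descends. */
        if (result >= 0)
            break;
        parent = &pd[-result];
    }

    return result;
}
```

Falling back to the default partition then reuses the same descend-or-stop logic as a normal match, so nothing is duplicated.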
+char *ExecBuildSlotValueDescription(Oid reloid, needs an "extern" declaration. On Fri, Jun 2, 2017 at 1:05 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote: > Hi, > > I have addressed Ashutosh's and Amit's comments in the attached patch. > > Please let me know if I have missed anything and any further comments. > > PFA. > > Regards, > Jeevan Ladhe > > On Wed, May 31, 2017 at 9:50 AM, Beena Emerson <memissemerson@gmail.com> > wrote: >> >> On Wed, May 31, 2017 at 8:13 AM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> > On 2017/05/31 9:33, Amit Langote wrote: >> > >> > >> > In get_rule_expr(): >> > >> > case PARTITION_STRATEGY_LIST: >> > Assert(spec->listdatums != NIL); >> > >> > + /* >> > + * If the boundspec is of Default partition, it >> > does >> > + * not have list of datums, but has only one >> > node to >> > + * indicate its a default partition. >> > + */ >> > + if (isDefaultPartitionBound( >> > + (Node *) >> > linitial(spec->listdatums))) >> > + { >> > + appendStringInfoString(buf, "DEFAULT"); >> > + break; >> > + } >> > + >> > >> > How about adding this part before the switch (key->strategy)? That way, >> > we won't have to come back and add this again when we add range default >> > partitions. >> >> I think it is best that we add a bool is_default to PartitionBoundSpec >> and then have a general check for both list and range. Though >> listdatums, upperdatums and lowerdatums are set to default for a >> DEFAULt partition, it does not seem proper that we check listdatums >> for range as well. >> >> >> >> >> -- >> >> Beena Emerson >> >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company > > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Hello, On Fri, Jun 2, 2017 at 1:05 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote: > Hi, > > I have addressed Ashutosh's and Amit's comments in the attached patch. > > Please let me know if I have missed anything and any further comments. > > PFA. > > Regards, > Jeevan Ladhe > What is the reason the new patch does not mention the violating rows when a new partition overlaps with the default? Is it because more than one row could be violating the condition? -- Beena Emerson EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jun 5, 2017 at 12:14 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote: > > >> >> What is the reason the new patch does not mention of violating rows >> when a new partition overlaps with default? >> Is it because more than one row could be violating the condition? > > > This is because, for reporting the violating error, I had to function > ExecBuildSlotValueDescription() public. Per Amit's comment I have > removed this change and let the overlapping error without row contains. > I think this is analogus to other functions that are throwing violation > error > but are not local to execMain.c. > ok thanks. -- Beena Emerson EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote: [...] >> >> The code in check_default_allows_bound() to check whether the default >> partition >> has any rows that would fit new partition looks quite similar to the code >> in >> ATExecAttachPartition() checking whether all rows in the table being >> attached >> as a partition fit the partition bounds. One thing that >> check_default_allows_bound() misses is, if there's already a constraint on >> the >> default partition refutes the partition constraint on the new partition, >> we can >> skip the scan of the default partition since it can not have rows that >> would >> fit the new partition. ATExecAttachPartition() has code to deal with a >> similar >> case i.e. the table being attached has a constraint which implies the >> partition >> constraint. There may be more cases which check_default_allows_bound() >> does not >> handle but ATExecAttachPartition() handles. So, I am wondering whether >> it's >> better to somehow take out the common code into a function and use it. We >> will >> have to deal with a difference through. The first one would throw an error >> when >> finding a row that satisfies partition constraints whereas the second one >> would >> throw an error when it doesn't find such a row. But this difference can be >> handled through a flag or by negating the constraint. This would also take >> care >> of Amit Langote's complaint about foreign partitions. There's also another >> difference that the ATExecAttachPartition() queues the table for scan and >> the >> actual scan takes place in ATRewriteTable(), but there is not such queue >> while >> creating a table as a partition. But we should check if we can reuse the >> code to >> scan the heap for checking a constraint. >> >> In case of ATTACH PARTITION, probably we should schedule scan of default >> partition in the alter table's work queue like what >> ATExecAttachPartition() is >> doing for the table being attached. 
That would fit in the way alter table >> works. > > > I am still working on this. > But, about your comment here: > "if there's already a constraint on the default partition refutes the > partition > constraint on the new partition, we can skip the scan": > I am so far not able to imagine such a case, since default partition > constraint > can be imagined something like "minus infinity to positive infinity with > some finite set elimination", and any new non-default partition being added > would simply be a set of finite values(at-least in case of list, but I think > range > should not also differ here). Hence one cannot imply the other here. > Possibly, > I might be missing something that you had visioned when you raised the flag, > please correct me if I am missing something. > IIUC, the default partition constraint is simply NOT IN (<values of all other sibling partitions>). If the constraint on the default partition refutes the new partition's constraints, that means we have an overlapping partition, and perhaps an error. Regards, Amul
On Wed, Jun 7, 2017 at 10:30 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote: > > >> IIUC, default partition constraints is simply NOT IN (<values of all >> other sibling partitions>). >> If constraint on the default partition refutes the new partition's >> constraints that means we have overlapping partition, and perhaps >> error. > > > You are correct Amul, but this error will be thrown before we try to > check for the default partition data. So, in such cases I think we really > do not need to have logic to check if default partition refutes the new > partition contraints. > But Ashutosh's suggestion makes sense: we might have constraints other than the partitioning constraint on the default partition. If those constraints refute the new partition's constraints, we should skip the scan. Regards, Amul
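The case described here, a user-defined constraint on the default partition that rules out every value of the new partition, can be illustrated with a toy model. It assumes the extra constraint has the form CHECK (a BETWEEN lo AND hi); `RangeCheck` and `check_refutes_bound()` are hypothetical names. In PostgreSQL itself such a proof would presumably go through the planner's predicate-proving code (for example predicate_refuted_by()) rather than anything like this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Models a user constraint CHECK (a BETWEEN lo AND hi) on the default partition. */
typedef struct RangeCheck
{
    int lo;
    int hi;
} RangeCheck;

/*
 * True if the CHECK constraint proves that no row of the default partition
 * can carry any value of the new list partition, so the scan can be skipped.
 */
bool
check_refutes_bound(RangeCheck chk, const int *new_bound, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (new_bound[i] >= chk.lo && new_bound[i] <= chk.hi)
            return false;   /* a matching row is possible: must scan */

    return true;
}
```

If the default partition carries CHECK (a BETWEEN 100 AND 200), a new partition FOR VALUES IN (4, 5) is refuted and needs no scan, while one that includes 150 still does.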
On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote: >> >> This also means that we have to test PREPARED statements involving >> default partition. Any addition/deletion/attach/detach of other partition >> should invalidate those cached statements. > > > Will add this in next version of patch. My earlier statement requires a clarification. By "PREPARED statements involving default partition." I mean PREPAREd statements with direct access to the default partition, without going through the partitioned table. > >> >> The code in check_default_allows_bound() to check whether the default >> partition >> has any rows that would fit new partition looks quite similar to the code >> in >> ATExecAttachPartition() checking whether all rows in the table being >> attached >> as a partition fit the partition bounds. One thing that >> check_default_allows_bound() misses is, if there's already a constraint on >> the >> default partition refutes the partition constraint on the new partition, >> we can >> skip the scan of the default partition since it can not have rows that >> would >> fit the new partition. ATExecAttachPartition() has code to deal with a >> similar >> case i.e. the table being attached has a constraint which implies the >> partition >> constraint. There may be more cases which check_default_allows_bound() >> does not >> handle but ATExecAttachPartition() handles. So, I am wondering whether >> it's >> better to somehow take out the common code into a function and use it. We >> will >> have to deal with a difference through. The first one would throw an error >> when >> finding a row that satisfies partition constraints whereas the second one >> would >> throw an error when it doesn't find such a row. But this difference can be >> handled through a flag or by negating the constraint. This would also take >> care >> of Amit Langote's complaint about foreign partitions. 
There's also another >> difference that the ATExecAttachPartition() queues the table for scan and >> the >> actual scan takes place in ATRewriteTable(), but there is not such queue >> while >> creating a table as a partition. But we should check if we can reuse the >> code to >> scan the heap for checking a constraint. >> >> In case of ATTACH PARTITION, probably we should schedule scan of default >> partition in the alter table's work queue like what >> ATExecAttachPartition() is >> doing for the table being attached. That would fit in the way alter table >> works. > > > I am still working on this. > But, about your comment here: > "if there's already a constraint on the default partition refutes the > partition > constraint on the new partition, we can skip the scan": > I am so far not able to imagine such a case, since default partition > constraint > can be imagined something like "minus infinity to positive infinity with > some finite set elimination", and any new non-default partition being added > would simply be a set of finite values(at-least in case of list, but I think > range > should not also differ here). Hence one cannot imply the other here. > Possibly, > I might be missing something that you had visioned when you raised the flag, > please correct me if I am missing something. I am hoping that this has been clarified in other mails in this thread between you and Amul. > >> >> /* Generate the main expression, i.e., keyCol = ANY (arr) */ >> opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber, >> - keyCol, (Expr *) arr); >> + keyCol, (Expr *) arr, >> spec->is_default); >> /* Build leftop = ANY (rightop) */ >> saopexpr = makeNode(ScalarArrayOpExpr); >> The comments in both the places need correction, as for default partition >> the >> expression will be keyCol <> ALL(arr). > > > Done. Please note that this changes, if you construct the constraint as !(keycol = ANY[]). > >> We have RelationGetPartitionDesc() for >> that. 
Probably we should also add Asserts to check that every pointer in >> the >> long pointer chain is Non-null. > > > I am sorry, but I did not understand which chain you are trying to point > here. The chain of pointers: a->b->c->d is a chain of pointers. > >> >> @@ -2044,7 +2044,7 @@ psql_completion(const char *text, int start, int >> end) >> COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, ""); >> /* Limited completion support for partition bound specification */ >> else if (TailMatches3("ATTACH", "PARTITION", MatchAny)) >> - COMPLETE_WITH_CONST("FOR VALUES"); >> + COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT"); >> else if (TailMatches2("FOR", "VALUES")) >> COMPLETE_WITH_LIST2("FROM (", "IN ("); >> >> @@ -2483,7 +2483,7 @@ psql_completion(const char *text, int start, int >> end) >> COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables, >> ""); >> /* Limited completion support for partition bound specification */ >> else if (TailMatches3("PARTITION", "OF", MatchAny)) >> - COMPLETE_WITH_CONST("FOR VALUES"); >> + COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT"); >> Do we include psql tab completion in the main feature patch? I have not >> seen >> this earlier. But appreciate taking care of these defails. > > > I am not sure about this. If needed I can submit a patch to take care of > this later, but > as of now I have not removed this from the patch. I looked at Amul's patch. He has tab completion changes for HASH partitions and those were suggested by Robert. So, keep those changes in this patch. Sorry for misunderstanding on my part. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On Sat, Jun 3, 2017 at 2:11 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > + errmsg("default partition contains row(s) > that would overlap with partition being created"))); > > It doesn't really sound right to talk about rows overlapping with a > partition. Partitions can overlap with each other, but not rows. > Also, it's not really project style to use ambiguously plural forms > like "row(s)" in error messages. Maybe something like: > > new partition constraint for default partition \"%s\" would be > violated by some row > Partition constraint is an implementation detail here. We enforce partition bounds through constraints, and we call such constraints partition constraints. But a user may not necessarily understand this term or may interpret it differently. Adding "new" adds to the confusion, as the default partition is not new. My suggestion in an earlier mail was "default partition contains rows that conflict with the partition bounds of "part_xyz"", with a note that we should use a better word than "conflict". Jeevan seems to have used "overlap" instead, which again is not correct. How about "default partition contains row/s which would fit the partition "part_xyz" being created or attached." with a hint to move those rows to the new partition's table in case of attach? I don't think the hint would be so straightforward, i.e. to create the table with SELECT INTO and then ATTACH. What do you think? Also, the error code ERRCODE_CHECK_VIOLATION, which is an "integrity constraint violation" code, seems misleading. We aren't violating any integrity here. In fact, I am not able to understand how adding an object could violate an integrity constraint. The nearest error code seems to be ERRCODE_INVALID_OBJECT_DEFINITION, which is also used for partitions with overlapping bounds. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote: > >> >> The code in check_default_allows_bound() to check whether the default >> partition >> has any rows that would fit new partition looks quite similar to the code >> in >> ATExecAttachPartition() checking whether all rows in the table being >> attached >> as a partition fit the partition bounds. One thing that >> check_default_allows_bound() misses is, if there's already a constraint on >> the >> default partition refutes the partition constraint on the new partition, >> we can >> skip the scan of the default partition since it can not have rows that >> would >> fit the new partition. ATExecAttachPartition() has code to deal with a >> similar >> case i.e. the table being attached has a constraint which implies the >> partition >> constraint. There may be more cases which check_default_allows_bound() >> does not >> handle but ATExecAttachPartition() handles. So, I am wondering whether >> it's >> better to somehow take out the common code into a function and use it. We >> will >> have to deal with a difference through. The first one would throw an error >> when >> finding a row that satisfies partition constraints whereas the second one >> would >> throw an error when it doesn't find such a row. But this difference can be >> handled through a flag or by negating the constraint. This would also take >> care >> of Amit Langote's complaint about foreign partitions. There's also another >> difference that the ATExecAttachPartition() queues the table for scan and >> the >> actual scan takes place in ATRewriteTable(), but there is not such queue >> while >> creating a table as a partition. But we should check if we can reuse the >> code to >> scan the heap for checking a constraint. >> >> In case of ATTACH PARTITION, probably we should schedule scan of default >> partition in the alter table's work queue like what >> ATExecAttachPartition() is >> doing for the table being attached. 
That would fit in the way alter table >> works. > I tried refactoring the existing code so that it can be used for default partitioning as well. The code that validates the partition constraints against the table being attached in ATExecAttachPartition() is extracted into a set of functions. For the default partition we reuse those functions to check whether it contains any row that would fit the partition being attached. While creating a new partition, the function to skip validation is reused, but the scan portion is duplicated from ATRewriteTable() since we are not in ALTER TABLE context. The names of the functions and their declarations will require some thought. There's one test failing because for ATTACH PARTITION the error comes from ATRewriteTable() instead of check_default_allows_bound(). Maybe we want to use the same message in both places, or somehow make ATRewriteTable() give a different message while validating the default partition. Please review the patch and let me know if the changes look good. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Jun 8, 2017 at 2:54 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe > <jeevan.ladhe@enterprisedb.com> wrote: > >> >>> >>> The code in check_default_allows_bound() to check whether the default >>> partition >>> has any rows that would fit new partition looks quite similar to the code >>> in >>> ATExecAttachPartition() checking whether all rows in the table being >>> attached >>> as a partition fit the partition bounds. One thing that >>> check_default_allows_bound() misses is, if there's already a constraint on >>> the >>> default partition refutes the partition constraint on the new partition, >>> we can >>> skip the scan of the default partition since it can not have rows that >>> would >>> fit the new partition. ATExecAttachPartition() has code to deal with a >>> similar >>> case i.e. the table being attached has a constraint which implies the >>> partition >>> constraint. There may be more cases which check_default_allows_bound() >>> does not >>> handle but ATExecAttachPartition() handles. So, I am wondering whether >>> it's >>> better to somehow take out the common code into a function and use it. We >>> will >>> have to deal with a difference through. The first one would throw an error >>> when >>> finding a row that satisfies partition constraints whereas the second one >>> would >>> throw an error when it doesn't find such a row. But this difference can be >>> handled through a flag or by negating the constraint. This would also take >>> care >>> of Amit Langote's complaint about foreign partitions. There's also another >>> difference that the ATExecAttachPartition() queues the table for scan and >>> the >>> actual scan takes place in ATRewriteTable(), but there is not such queue >>> while >>> creating a table as a partition. But we should check if we can reuse the >>> code to >>> scan the heap for checking a constraint. 
>>> >>> In case of ATTACH PARTITION, probably we should schedule scan of default >>> partition in the alter table's work queue like what >>> ATExecAttachPartition() is >>> doing for the table being attached. That would fit in the way alter table >>> works. >> > > I tried refactoring existing code so that it can be used for default > partitioning as well. The code to validate the partition constraints > against the table being attached in ATExecAttachPartition() is > extracted out into a set of functions. For default partition we reuse > those functions to check whether it contains any row that would fit > the partition being attached. While creating a new partition, the > function to skip validation is reused but the scan portion is > duplicated from ATRewriteTable since we are not in ALTER TABLE > context. The names of the functions, their declaration will require > some thought. > > There's one test failing because for ATTACH partition the error comes > from ATRewriteTable instead of check_default_allows_bounds(). May be > we want to use same message in both places or some make ATRewriteTable > give a different message while validating default partition. > > Please review the patch and let me know if the changes look good. From the discussion on thread [1], having a NOT NULL constraint embedded within an expression may cause a scan to be skipped when it shouldn't be. For default partitions such a case may arise. If an existing partition accepts NULL and we try to attach a default partition, it would get a NOT NULL partition constraint, but it will be buried within an expression like !(key = any(array[1, 2, 3]) OR key is null), where the existing partition/s accept values 1, 2, 3 and null. We need to check whether the refactored code handles this case correctly. The v19 patch does not have this problem since it doesn't try to skip the scan based on the constraints of the table being attached. Please try the following cases: 1. 
a default partition accepting nulls exists and we attach a partition to accept NULL values; 2. a NULL-accepting partition exists and we try to attach a table as the default partition. In both cases the default partition should be checked for rows with NULL partition keys. In both cases, if the default partition table has a NOT NULL constraint we should be able to skip the scan, and we should scan the table when such a constraint does not exist. [1] http://www.postgresql-archive.org/A-bug-in-mapping-attributes-in-ATExecAttachPartition-td5965298.html -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
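The two cases requested above could be exercised with a session along the following lines. This is only a sketch: the table names are made up for illustration, and it assumes the bare DEFAULT bound syntax (the form the tab-completion hunk quoted earlier in the thread suggests), which a given patch version may or may not accept. The outcomes noted in the comments are what the message above asks for, not output captured from a run.

```sql
-- Case 1: a default partition that accepts NULLs exists, and a partition
-- for NULL is attached afterwards. (All names here are hypothetical.)
CREATE TABLE list_parted (a int) PARTITION BY LIST (a);
CREATE TABLE part_123 PARTITION OF list_parted FOR VALUES IN (1, 2, 3);
CREATE TABLE part_def PARTITION OF list_parted DEFAULT;  -- assumed syntax
INSERT INTO list_parted VALUES (NULL);  -- no NULL partition yet, so the row goes to part_def

CREATE TABLE part_null (a int);
-- part_def holds a row with a NULL key, so this attach should be rejected
-- after the default partition is checked.
ALTER TABLE list_parted ATTACH PARTITION part_null FOR VALUES IN (NULL);

-- Once the row is gone and the column is declared NOT NULL, the scan of
-- part_def should be skippable.
DELETE FROM part_def WHERE a IS NULL;
ALTER TABLE part_def ALTER COLUMN a SET NOT NULL;
ALTER TABLE list_parted ATTACH PARTITION part_null FOR VALUES IN (NULL);

-- Case 2: a NULL-accepting partition exists, and a table containing a
-- NULL key is attached as the default partition.
CREATE TABLE list_parted2 (a int) PARTITION BY LIST (a);
CREATE TABLE part2_null PARTITION OF list_parted2 FOR VALUES IN (NULL, 1, 2, 3);
CREATE TABLE def2 (a int);
INSERT INTO def2 VALUES (NULL);
-- def2 must be scanned; the NULL row should cause this attach to fail.
ALTER TABLE list_parted2 ATTACH PARTITION def2 DEFAULT;  -- assumed syntax
```

Repeating case 2 with an empty def2 that has a NOT NULL constraint on the key column would cover the "scan can be skipped" half of that case.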
On Wed, Jun 7, 2017 at 1:59 AM, amul sul <sulamul@gmail.com> wrote: > But Ashutosh's suggestion make sense, we might have constraints other > than that partitioning constraint on default partition. If those > constraints refutes the new partition's constraints, we should skip > the scan. Right. If the user adds a constraint to the default partition that is identical to the new partition constraint, that should cause the scan to be skipped. Ideally, we could do even better. For example, if the user is creating a new partition FOR VALUES IN (7), and the default partition has CHECK (key != 7), we could perhaps deduce that the combination of the existing partition constraint (which must certainly hold) and the additional CHECK constraint (which must also hold, at least assuming it's not marked NOT VALID) are sufficient to prove the new check constraint. But I'm not sure whether predicate_refuted_by() is smart enough to figure that out. However, it should definitely be smart enough to figure out that if somebody's added the new partitioning constraint as a CHECK constraint on the default partition, we don't need to scan it. The reason somebody might want to do that, just to be clear, is that they could do this in multiple steps: first, add the new CHECK constraint as NOT VALID. Then VALIDATE CONSTRAINT. Then add the new non-default partition. This would result in holding an exclusive lock for a lesser period of time than if they did it all together as one operation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
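The multi-step approach described above might look roughly like this, reusing the tables from the example that opens the thread. It is a sketch under assumptions: the constraint name is invented, the bare DEFAULT syntax is assumed, and whether the final step really skips the scan of the default partition depends on what the constraint-proving code can deduce, which the message above leaves open.

```sql
-- Setup mirroring the thread's opening example (DEFAULT syntax assumed).
CREATE TABLE list_partitioned (a int) PARTITION BY LIST (a);
CREATE TABLE part_default PARTITION OF list_partitioned DEFAULT;
CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4, 5);

-- Step 1: add the constraint without scanning existing rows.
-- (part_default_no_7 is a hypothetical name.)
ALTER TABLE part_default
    ADD CONSTRAINT part_default_no_7 CHECK (a <> 7) NOT VALID;

-- Step 2: validate it as a separate operation, so the table scan happens
-- here rather than while the new partition is being added.
ALTER TABLE part_default VALIDATE CONSTRAINT part_default_no_7;

-- Step 3: add the new non-default partition. If the validated CHECK
-- constraint is recognized as ruling out a = 7, the default partition
-- would not need to be scanned again at this point.
CREATE TABLE part_7 PARTITION OF list_partitioned FOR VALUES IN (7);
```

The point of splitting the work this way is the one given in the message: the expensive scan is moved out of the step that needs the exclusive lock.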
On Wed, Jun 7, 2017 at 5:47 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > On Sat, Jun 3, 2017 at 2:11 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> >> + errmsg("default partition contains row(s) >> that would overlap with partition being created"))); >> >> It doesn't really sound right to talk about rows overlapping with a >> partition. Partitions can overlap with each other, but not rows. >> Also, it's not really project style to use ambiguously plural forms >> like "row(s)" in error messages. Maybe something like: >> >> new partition constraint for default partition \"%s\" would be >> violated by some row > > Partition constraint is implementation detail here. We enforce > partition bounds through constraints and we call such constraints as > partition constraints. But a user may not necessarily understand this > term or may interpret it different. Adding "new" adds to the confusion > as the default partition is not new. I see your point. We could say "updated partition constraint" instead of "new partition constraint" to address that to some degree. > My suggestion in an earlier mail > was ""default partition contains rows that conflict with the partition > bounds of "part_xyz"", with a note that we should use a better word > than "conflict". So, Jeevan seems to have used overlap, which again is > not correct. How about "default partition contains row/s which would > fit the partition "part_xyz" being created or attached." with a hint > to move those rows to the new partition's table in case of attach. I > don't think hint would be so straight forward i.e. to create the table > with SELECT INTO and then ATTACH. The problem is that none of these actually sound very good. Neither conflict nor overlap nor fit actually express the underlying idea very clearly, at least IMHO. 
I'm not opposed to using some wording along these lines if we can think of a clear way to word it, but I think my wording is better than using some unclear word for this concept. I can't immediately think of a way to adjust your wording so that it seems completely clear. > Also, the error code ERRCODE_CHECK_VIOLATION, which is an "integrity > constraint violation" code, seems misleading. We aren't violating any > integrity here. In fact I am not able to understand, how could adding > an object violate integrity constraint. The nearest errorcode seems to > be ERRCODE_INVALID_OBJECT_DEFINITION, which is also used for > partitions with overlapping bounds. I think that calling a constraint failure a check violation is not too much of a stretch, even if it's technically a partition constraint rather than a CHECK constraint. However, your proposal also seems reasonable. I'm happy to go with whatever most people like best. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
While the refactoring seems a reasonable way to re-use existing code, that may change based on the discussion in [1]. Until then, please keep the refactoring patches separate from the main patch. In the final version, I think the refactoring changes to ATExecAttachPartition() and the default partition support should be committed separately. So, please provide three different patches. That also makes review easier. On Mon, Jun 12, 2017 at 8:29 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote: > Hi Ashutosh, > > I tried to look into your refactoring code. > When applied all 3 patches, I got some regression failures, I have fixed all > of > them now in attached patches, attached the regression.diffs. > > Moving further, I have also made following changes in attached patches: > > 1. 0001-Refactor-ATExecAttachPartition.patch > > + * There is a case in which we cannot rely on just the result of the > + * proof > This comment seems to also exist in current code, and I am not able to > follow > which case this refers to. But, IIUC, this comment is for the case where we > are > handling the 'key IS NOT NULL' part separately, and if that is the case it > is > not needed here in the prologue of the function. > > attachPartCanSkipValidation > +static bool > +ATCheckValidationSkippable(Relation scanRel, List *partConstraint, > + PartitionKey key) > The function name ATCheckValidationSkippable does not sound very intuitive > to me, > and also I think prefix AT is something does not fit here as the function is > not > really directly related to alter table command, instead is an auxiliary > function. > How about changing it to "attachPartitionRequiresScan" or > "canSkipPartConstraintValidation" > > + List *existConstraint = NIL; > Needs to be moved to inside if block instead. > > + bool skip_validate; > Needs to be initialized to false, otherwise it can be returned without > initialization when scanRel_constr is NULL. 
> > + if (scanRel_constr != NULL) > instead of this may be we can simply have: > + if (scanRel_constr == NULL) > + return false; > This can prevent further indentation. > > +static void > +ATValidatePartitionConstraints(List **wqueue, Relation scanRel, > + List *partConstraint, Relation rel) > What about just validatePartitionConstraints() > > + bool skip_validate = false; > + > + /* Check if we can do away with having to scan the table being attached. > */ > + skip_validate = ATCheckValidationSkippable(scanRel, partConstraint, > key); > > First assignment is unnecessary here. > > Instead of: > /* Check if we can do away with having to scan the table being attached. */ > skip_validate = ATCheckValidationSkippable(scanRel, partConstraint, key); > > /* It's safe to skip the validation scan after all */ > if (skip_validate) > ereport(INFO, > (errmsg("partition constraint for table \"%s\" is implied by existing > constraints", > RelationGetRelationName(scanRel)))); > > Following change can prevent further indentation: > if (ATCheckValidationSkippable(scanRel, partConstraint, key)) > { > ereport(INFO, > (errmsg("partition constraint for table \"%s\" is implied by existing > constraints", > RelationGetRelationName(scanRel)))); > return; > } > This way variable skip_validate will not be needed. > > Apart from this, I see that the patch will need change depending on how the > fix > for validating partition constraints in case of embedded NOT-NULL[1] shapes > up. > > 2. 0003-Refactor-default-partitioning-patch-to-re-used-code.patch > > + * In case the new partition bound being checked itself is a DEFAULT > + * bound, this check shouldn't be triggered as there won't already exists > + * the default partition in such a case. > I think above comment in DefineRelation() is not applicable, as > check_default_allows_bound() is called unconditional, and the check for > existence > of default partition is now done inside the check_default_allows_bound() > function. 
> > * This function checks if there exists a row in the default partition that > * fits in the new partition and throws an error if it finds one. > */ > Above comment for check_default_allows_bound() needs a change now, may be > something like this: > * This function checks if a default partition already exists and if it > does > * it checks if there exists a row in the default partition that fits in > the > * new partition and throws an error if it finds one. > */ > > List *new_part_constraints = NIL; > List *def_part_constraints = NIL; > I think above initialization is not needed, as the further assignments are > unconditional. > > + if (OidIsValid(default_oid)) > + { > + Relation default_rel = heap_open(default_oid, AccessExclusiveLock); > We already have taken a lock on default and here we should be using a NoLock > instead. > > + def_part_constraints = > get_default_part_validation_constraint(new_part_constraints); > exceeds 80 columns. > > + defPartConstraint = > get_default_part_validation_constraint(partBoundConstraint); > similarly, needs indentation. > > + > +List * > +get_default_part_validation_constraint(List *new_part_constraints) > +{ > Needs some comment. What about: > /* > * get_default_part_validation_constraint > * > * Given partition constraints, this function returns *would be* default > * partition constraint. > */ > > Apart from this, I tried to address the differences in error shown in case > of > attache and create partition when rows in default partition would violate > the > updated constraints, basically I have taken a flag in AlteredTableInfo to > indicate if the relation being scanned is a default partition or a child of > default partition(which I dint like much, but I don't see a way out here). > Still > the error message does not display the default partition name in error as of > check_default_allows_bound(). 
Maybe to address this and keep the messages > exactly similar we can copy the name of parent default partition in a field > in > AlteredTableInfo structure, which looks very ugly to me. I am open to > suggestions here. > > 3. changes to default_partition_v19.patch: > > The default partition constraint is no longer built using the negator of the > operator, instead it is formed simply as NOT of the existing partitions: > e.g.: > if a null accepting partition already exists: > NOT ((keycol IS NULL) OR (keycol = ANY (arr))) > if a null accepting partition does not exist: > NOT ((keycol IS NOT NULL) AND (keycol = ANY (arr))), where arr is an array > of > datums in boundinfo->datums. > > Added tests for prepared statements. > > Renamed RelationGetDefaultPartitionOid() to get_default_partition_oid(). > > + if (partqualstate && ExecCheck(partqualstate, econtext)) > + ereport(ERROR, > + (errcode(ERRCODE_CHECK_VIOLATION), > + errmsg("new partition constraint for default partition \"%s\" would be > violated by some row", > + RelationGetRelationName(default_rel)))); > Per Ashutosh's suggestion[2], changed the error code to > ERRCODE_INVALID_OBJECT_DEFINITION. > Also, per Robert's suggestion[3], changed following message: > "new partition constraint for default partition \"%s\" would be violated by > some row" > to > "updated partition constraint for default partition \"%s\" would be violated > by some row" > > Some other cosmetic changes. > > Apart from this, I am exploring the tests in relation with NOT NULL > constraint > embedded within an expression. Will update on that shortly. 
> > [1]http://www.postgresql-archive.org/A-bug-in-mapping-attributes-in-ATExecAttachPartition-td5965298.html > [2]http://www.postgresql-archive.org/Adding-support-for-Default-partition-in-partitioning-td5946868i120.html#a5965277 > [3]http://www.postgresql-archive.org/Adding-support-for-Default-partition-in-partitioning-tp5946868p5965599.html > > Regards, > Jeevan Ladhe > > > On Thu, Jun 8, 2017 at 2:54 PM, Ashutosh Bapat > <ashutosh.bapat@enterprisedb.com> wrote: >> >> On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe >> <jeevan.ladhe@enterprisedb.com> wrote: >> >> > >> >> >> >> The code in check_default_allows_bound() to check whether the default >> >> partition >> >> has any rows that would fit new partition looks quite similar to the >> >> code >> >> in >> >> ATExecAttachPartition() checking whether all rows in the table being >> >> attached >> >> as a partition fit the partition bounds. One thing that >> >> check_default_allows_bound() misses is, if there's already a constraint >> >> on >> >> the >> >> default partition refutes the partition constraint on the new >> >> partition, >> >> we can >> >> skip the scan of the default partition since it can not have rows that >> >> would >> >> fit the new partition. ATExecAttachPartition() has code to deal with a >> >> similar >> >> case i.e. the table being attached has a constraint which implies the >> >> partition >> >> constraint. There may be more cases which check_default_allows_bound() >> >> does not >> >> handle but ATExecAttachPartition() handles. So, I am wondering whether >> >> it's >> >> better to somehow take out the common code into a function and use it. >> >> We >> >> will >> >> have to deal with a difference through. The first one would throw an >> >> error >> >> when >> >> finding a row that satisfies partition constraints whereas the second >> >> one >> >> would >> >> throw an error when it doesn't find such a row. 
But this difference can >> >> be >> >> handled through a flag or by negating the constraint. This would also >> >> take >> >> care >> >> of Amit Langote's complaint about foreign partitions. There's also >> >> another >> >> difference that the ATExecAttachPartition() queues the table for scan >> >> and >> >> the >> >> actual scan takes place in ATRewriteTable(), but there is not such >> >> queue >> >> while >> >> creating a table as a partition. But we should check if we can reuse >> >> the >> >> code to >> >> scan the heap for checking a constraint. >> >> >> >> In case of ATTACH PARTITION, probably we should schedule scan of >> >> default >> >> partition in the alter table's work queue like what >> >> ATExecAttachPartition() is >> >> doing for the table being attached. That would fit in the way alter >> >> table >> >> works. >> > >> >> I tried refactoring existing code so that it can be used for default >> partitioning as well. The code to validate the partition constraints >> against the table being attached in ATExecAttachPartition() is >> extracted out into a set of functions. For default partition we reuse >> those functions to check whether it contains any row that would fit >> the partition being attached. While creating a new partition, the >> function to skip validation is reused but the scan portion is >> duplicated from ATRewriteTable since we are not in ALTER TABLE >> context. The names of the functions, their declaration will require >> some thought. >> >> There's one test failing because for ATTACH partition the error comes >> from ATRewriteTable instead of check_default_allows_bounds(). May be >> we want to use same message in both places or some make ATRewriteTable >> give a different message while validating default partition. >> >> Please review the patch and let me know if the changes look good. > > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
While the refactoring seems a reasonable way to re-use existing code,
that may change based on the discussion in [1]. Till then please keep
the refactoring patches separate from the main patch. In the final
version, I think the refactoring changes to ATAttachPartition and the
default partition support should be committed separately. So, please
provide three different patches. That also makes review easy.
On Mon, Jun 12, 2017 at 9:39 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> While the refactoring seems a reasonable way to re-use existing code,
> that may change based on the discussion in [1]. Till then please keep
> the refactoring patches separate from the main patch. In the final
> version, I think the refactoring changes to ATAttachPartition and the
> default partition support should be committed separately. So, please
> provide three different patches. That also makes review easy.

Sure Ashutosh, PFA.
On Wed, Jun 14, 2017 at 8:02 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
> Here are the details of the patches in attached zip.
> 0001. refactoring existing ATExecAttachPartition code so that it can be
> used for default partitioning as well
> 0002. support for default partition with the restriction of preventing
> addition of any new partition after default partition.
> 0003. extend default partitioning support to allow addition of new
> partitions.
> 0004. extend default partitioning validation code to reuse the refactored
> code in patch 0001.

I think the core ideas of this patch are pretty solid now. It's come a
long way in the last month. High-level comments:

- Needs to be rebased over b08df9cab777427fdafe633ca7b8abf29817aa55.
- Still no documentation.
- Should probably be merged with the patch to add default partitioning
  for ranges.

Other stuff I noticed:

- The regression tests don't seem to check that the scan-skipping logic
  works as expected. We have regression tests for that case for attaching
  regular partitions, and it seems like it would be worth testing the
  default-partition case as well.

- check_default_allows_bound() assumes that if
  canSkipPartConstraintValidation() fails for the default partition, it
  will also fail for every subpartition of the default partition. That
  is, once we commit to scanning one child partition, we're committed to
  scanning them all. In practice, that's probably not a huge limitation,
  but if it's not too much code, we could keep the top-level check but
  also check each partition individually as we reach it, and skip the
  scan for any individual partitions for which the constraint can be
  proven. For example, suppose the top-level table is list-partitioned
  with a partition for each of the most common values, and then we
  range-partition the default partition.

- The changes to the regression test results in 0004 make the error
  messages slightly worse. The old message names the default partition,
  whereas the new one does not. Maybe that's worth avoiding.

Specific comments:

+ * Also, invalidate the parent's and a sibling default partition's relcache,
+ * so that the next rebuild will load the new partition's info into parent's
+ * partition descriptor and default partition constraints(which are dependent
+ * on other partition bounds) are built anew.

I find this a bit unclear, and it also repeats the comment further down.
Maybe something like: Also, invalidate the parent's relcache entry, so
that the next rebuild will load the new partition's info into its
partition descriptor. If there is a default partition, we must
invalidate its relcache entry as well.

+ /*
+  * The default partition constraints depend upon the partition bounds of
+  * other partitions. Adding a new(or even removing existing) partition
+  * would invalidate the default partition constraints. Invalidate the
+  * default partition's relcache so that the constraints are built anew and
+  * any plans dependent on those constraints are invalidated as well.
+  */

Here, I'd write: The partition constraint for the default partition
depends on the partition bounds of every other partition, so we must
invalidate the relcache entry for that partition every time a partition
is added or removed.

+ /*
+  * Default partition cannot be added if there already
+  * exists one.
+  */
+ if (spec->is_default)
+ {
+     overlap = partition_bound_has_default(boundinfo);
+     with = boundinfo->default_index;
+     break;
+ }

To support default partitioning for range, this is going to have to be
moved above the switch rather than done inside of it. And there's
really no downside to putting it there.

+ * constraint, by *proving* that the existing constraints of the table
+ * *imply* the given constraints. We include the table's check constraints and

Both the comma and the asterisks are unnecessary.

+ * Check whether all rows in the given table (scanRel) obey given partition

obey the given

I think the larger comment block could be tightened up a bit, like
this: Check whether all rows in the given table obey the given
partition constraint; if so, it can be attached as a partition. We do
this by scanning the table (or all of its leaf partitions) row by row,
except when the existing constraints are sufficient to prove that the
new partitioning constraint must already hold.

+ /* Check if we can do away with having to scan the table being attached. */

If possible, skip the validation scan.

+ * Set up to have the table be scanned to validate the partition
+ * constraint If it's a partitioned table, we instead schedule its leaf
+ * partitions to be scanned.

I suggest: Prepare to scan the default partition (or, if it is itself
partitioned, all of its leaf partitions).

+ int default_index; /* Index of the default partition if any; -1
+                     * if there isn't one */

"if any" is a bit redundant with "if there isn't one"; note the
phrasing of the preceding entry.

+ /*
+  * Skip if it's a partitioned table. Only RELKIND_RELATION relations
+  * (ie, leaf partitions) need to be scanned.
+  */
+ if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ||
+     part_rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)

The comment talks about what must be included in our list of things to
scan, but the code tests for the things that can be excluded. I
suspect the comment has the right idea and the code should be adjusted
to match, but anyway they should be consistent. Also, the correct way
to punctuate i.e. is like this: (i.e. leaf partitions) You should have
a period after each letter, but no following comma.

+ * The default partition must be already having an AccessExclusiveLock.

I think we should instead change DefineRelation to open (rather than
just lock) the default partition and pass the Relation as an argument
here so that we need not reopen it.

+ /* Construct const from datum */
+ val = makeConst(key->parttypid[0],
+                 key->parttypmod[0],
+                 key->parttypcoll[0],
+                 key->parttyplen[0],
+                 *boundinfo->datums[i],
+                 false, /* isnull */
+                 key->parttypbyval[0] /* byval */ );

The /* byval */ comment looks a bit redundant, but I think this could
use a comment along the lines of: /* Only single-column list
partitioning is supported, so we only need to worry about the partition
key with index 0. */ And I'd also add an Assert() verifying that the
partition key has exactly 1 column, so that this breaks a bit more
obviously if someone removes that restriction in the future.

+ * Handle NULL partition key here if there's a null-accepting list
+ * partition, else later it will be routed to the default partition if
+ * one exists.

This isn't a great update of the existing comment -- it's drifted from
explaining the code to which it is immediately attached to a more
general discussion of NULL handling. I'd just say something like: If
this is a NULL, send it to the null-accepting partition. Otherwise,
route by searching the array of partition bounds.

+ if (tab->is_default_partition)
+     ereport(ERROR,
+             (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+              errmsg("updated partition constraint for default partition would be violated by some row")));
+ else
+     ereport(ERROR,
+             (errcode(ERRCODE_CHECK_VIOLATION),

While there's room for debate about the correct error code here, it's
hard for me to believe that it's correct to use one error code for the
is_default_partition case and a different error code the rest of the
time.

+ * previously cached default partition constraints; those constraints
+ * won't stand correct after addition(or even removal) of a partition.

won't be correct after addition or removal

+ * allow any row that qualifies for this new partition. So, check if
+ * the existing data in the default partition satisfies this *would be*
+ * default partition constraint.

check that the existing data in the default partition satisfies the
constraint as it will exist after adding this partition

+ * Need to take a lock on the default partition, refer comment for locking
+ * the default partition in DefineRelation().

I'd say: We must also lock the default partition, for the same reasons
explained in DefineRelation().

And similarly in the other places that refer to that same comment.

+ /*
+  * In case of the default partition, the constraint is of the form
+  * "!(result)" i.e. one of the following two forms:
+  * 1. NOT ((keycol IS NULL) OR (keycol = ANY (arr)))
+  * 2. NOT ((keycol IS NOT NULL) AND (keycol = ANY (arr))), where arr is an
+  * array of datums in boundinfo->datums.
+  */

Does this survive pgindent? You might need to surround the comment
with dashes to preserve formatting.

I think it would be worth adding a little more text to this comment,
something like this: Note that, in general, applying NOT to a constraint
expression doesn't necessarily invert the set of rows it accepts,
because NOT NULL is NULL. However, the partition constraints we
construct here never evaluate to NULL, so applying NOT works as
intended.

+ * Check whether default partition has a row that would fit the partition
+ * being attached by negating the partition constraint derived from the
+ * bounds. Since default partition is already part of the partitioned
+ * table, we don't need to validate the constraints on the partitioned
+ * table.

Here again, I'd add to the end of the first sentence a parenthetical
note, like this: ...from the bounds (the partition constraint never
evaluates to NULL, so negating it like this is safe).

I don't understand the second sentence. It seems to contradict the
first one.

+extern List *get_default_part_validation_constraint(List *new_part_constaints);
 #endif /* PARTITION_H */

There should be a blank line after the last prototype and before the
#endif.

+-- default partition table when it is being used in cahced plan.

Typo.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
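The remark in the review above about NOT and NULL can be checked with a small stand-alone model of SQL three-valued logic. This is only an illustrative sketch, not code from the patch: `TriBool`, `Key`, the `tv_*` helpers, and the `default_qual_*` functions are hypothetical names, and an `int` stands in for a Datum. It evaluates the two constraint forms quoted in the patch comment and, for contrast, a naive `NOT (keycol = ANY (arr))`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SQL three-valued logic: an expression is TRUE, FALSE or NULL. */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/* Hypothetical stand-in for a nullable partition key value. */
typedef struct { bool isnull; int value; } Key;

static TriBool tv_not(TriBool a)
{
    if (a == TV_NULL)
        return TV_NULL;             /* NOT NULL is NULL */
    return a == TV_TRUE ? TV_FALSE : TV_TRUE;
}

static TriBool tv_and(TriBool a, TriBool b)
{
    if (a == TV_FALSE || b == TV_FALSE)
        return TV_FALSE;            /* FALSE dominates, even over NULL */
    if (a == TV_NULL || b == TV_NULL)
        return TV_NULL;
    return TV_TRUE;
}

static TriBool tv_or(TriBool a, TriBool b)
{
    if (a == TV_TRUE || b == TV_TRUE)
        return TV_TRUE;             /* TRUE dominates, even over NULL */
    if (a == TV_NULL || b == TV_NULL)
        return TV_NULL;
    return TV_FALSE;
}

static TriBool key_is_null(Key k)
{
    return k.isnull ? TV_TRUE : TV_FALSE;   /* IS NULL is never NULL */
}

/* keycol = ANY (arr), where arr contains no NULL elements. */
static TriBool key_eq_any(Key k, const int *arr, size_t n)
{
    size_t i;

    assert(arr != NULL || n == 0);
    if (n == 0)
        return TV_FALSE;            /* empty array: FALSE even for a NULL key */
    if (k.isnull)
        return TV_NULL;
    for (i = 0; i < n; i++)
        if (arr[i] == k.value)
            return TV_TRUE;
    return TV_FALSE;
}

/* Form 1: NOT ((keycol IS NULL) OR (keycol = ANY (arr))) */
static TriBool default_qual_form1(Key k, const int *arr, size_t n)
{
    return tv_not(tv_or(key_is_null(k), key_eq_any(k, arr, n)));
}

/* Form 2: NOT ((keycol IS NOT NULL) AND (keycol = ANY (arr))) */
static TriBool default_qual_form2(Key k, const int *arr, size_t n)
{
    return tv_not(tv_and(tv_not(key_is_null(k)), key_eq_any(k, arr, n)));
}

/* For contrast: NOT (keycol = ANY (arr)) yields NULL for a NULL key. */
static TriBool naive_qual(Key k, const int *arr, size_t n)
{
    return tv_not(key_eq_any(k, arr, n));
}
```

With `arr = {4, 5}`, both forms return TRUE for key 9 and FALSE for key 4; for a NULL key, form 1 returns FALSE (the row belongs to the null-accepting partition) and form 2 returns TRUE (the default partition takes it). Only the naive form returns NULL, which is the case the suggested comment text warns about.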
On 2017/06/15 4:51, Robert Haas wrote:
> On Wed, Jun 14, 2017 at 8:02 AM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>> Here are the details of the patches in attached zip.
>> 0001. refactoring existing ATExecAttachPartition code so that it can be
>> used for default partitioning as well
>> 0002. support for default partition with the restriction of preventing
>> addition of any new partition after default partition.
>> 0003. extend default partitioning support to allow addition of new
>> partitions.
>> 0004. extend default partitioning validation code to reuse the refactored
>> code in patch 0001.
>
> I think the core ideas of this patch are pretty solid now. It's come
> a long way in the last month.

+1

BTW, I noticed the following in 0002:

@@ -1322,15 +1357,59 @@ get_qual_for_list(PartitionKey key, PartitionBoundSpec *spec)

[ ... ]

+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);

I'm not sure if we need to do that. Can you explain?

Thanks,
Amit
Oops, I meant to send one more comment.

On 2017/06/15 15:48, Amit Langote wrote:
> BTW, I noticed the following in 0002

+ errmsg("there exists a default partition for table \"%s\", cannot add a new partition",

This error message style seems novel to me. I'm not sure about the best
message text here, but maybe: "cannot add new partition to table \"%s\"
with default partition"

Note that the comment applies to both DefineRelation and
ATExecAttachPartition.

Thanks,
Amit
Some more comments on the latest set of patches.

In heap_drop_with_catalog(), we heap_open() the parent table to get the
default partition OID, if any. If the relcache doesn't have an entry for
the parent, this means that the entry will be created, only to be
invalidated at the end of the function. If there is no default partition,
this all is completely unnecessary. We should avoid heap_open() in this
case. This also means that get_default_partition_oid() should not rely on
the relcache entry, but should growl through pg_inherit to find the
default partition.

In get_qual_for_list(), if the table has only default partition, it won't
have any boundinfo. In such a case the default partition's constraint
would come out as (NOT ((a IS NOT NULL) AND (a = ANY
(ARRAY[]::integer[])))). The empty array looks odd and may be we spend a
few CPU cycles executing ANY on an empty array. We have the same problem
with a partition containing only NULL value. So, may be this one is not
that bad.

Please add a testcase to test addition of default partition as the first
partition.

get_qual_for_list() allocates the constant expressions corresponding to
the datums in CacheMemoryContext while constructing constraints for a
default partition. We do not do this for other partitions. We may not be
constructing the constraints for saving in the cache. For example,
ATExecAttachPartition constructs the constraints for validation. In such
a case, this code will unnecessarily clobber the cache memory.
generate_partition_qual() copies the partition constraint in the
CacheMemoryContext.

+ if (spec->is_default)
+ {
+     result = list_make1(make_ands_explicit(result));
+     result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
+ }

If the "result" is an OR expression, calling make_ands_explicit() on it
would create AND(OR(...)) expression, with an unnecessary AND. We want to
avoid that?

+ if (cur_index < 0 && (partition_bound_has_default(partdesc->boundinfo)))
+     cur_index = partdesc->boundinfo->default_index;
+

The partition_bound_has_default() check is unnecessary since we check for
cur_index < 0 next anyway.

+ *
+ * Given the parent relation checks if it has default partition, and if it
+ * does exist returns its oid, otherwise returns InvalidOid.
+ */

May be reworded as "If the given relation has a default partition, this
function returns the OID of the default partition. Otherwise it returns
InvalidOid."

+Oid
+get_default_partition_oid(Relation parent)
+{
+    PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+
+    if (partdesc->boundinfo && partition_bound_has_default(partdesc->boundinfo))
+        return partdesc->oids[partdesc->boundinfo->default_index];
+
+    return InvalidOid;
+}

An unpartitioned table would not have partdesc set. So, this function
will segfault if we pass an unpartitioned table. Either Assert that
partdesc should exist or check for its NULL-ness.

+ defaultPartOid = get_default_partition_oid(rel);
+ if (OidIsValid(defaultPartOid))
+     ereport(ERROR,
+             (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+              errmsg("there exists a default partition for table \"%s\", cannot attach a new partition",
+                     RelationGetRelationName(rel))));
+

Should be done before heap_open on the table being attached. If we are
not going to attach the partition, there's no point in instantiating its
relcache.

The comment in heap_drop_with_catalog() should mention why we lock the
default partition before locking the table being dropped.

 extern List *preprune_pg_partitions(PlannerInfo *root, RangeTblEntry *rte,
                        Index rti, Node *quals,LOCKMODE lockmode);
-#endif /* PARTITION_H */

Unnecessary hunk.

On Thu, Jun 15, 2017 at 12:31 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Oops, I meant to send one more comment.
>
> On 2017/06/15 15:48, Amit Langote wrote:
>> BTW, I noticed the following in 0002
> + errmsg("there exists a default partition for table \"%s\", cannot
> add a new partition",
>
> This error message style seems novel to me. I'm not sure about the best
> message text here, but maybe: "cannot add new partition to table \"%s\"
> with default partition"
>
> Note that the comment applies to both DefineRelation and
> ATExecAttachPartition.
>
> Thanks,
> Amit
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
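The NULL-ness concern raised above for get_default_partition_oid() can be sketched outside the server. The types below (`MockBoundInfo`, `MockPartitionDesc`, `MockRelation`) are hypothetical, heavily simplified stand-ins for the real PostgreSQL structures, and `mock_get_default_partition_oid()` is not the patch's function; it only illustrates the "check for its NULL-ness" option under the assumption that an unpartitioned table has no partition descriptor.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical mock of the bound info: -1 means no default partition. */
typedef struct { int default_index; } MockBoundInfo;

/* Hypothetical mock of a partition descriptor. */
typedef struct
{
    int            nparts;
    Oid           *oids;
    MockBoundInfo *boundinfo;
} MockPartitionDesc;

/* Hypothetical mock relation; partdesc is NULL for an unpartitioned table. */
typedef struct { MockPartitionDesc *partdesc; } MockRelation;

/*
 * If the given relation has a default partition, return its OID.
 * Otherwise, including when the relation is not partitioned at all,
 * return InvalidOid instead of dereferencing a NULL descriptor.
 */
static Oid
mock_get_default_partition_oid(const MockRelation *parent)
{
    const MockPartitionDesc *partdesc;

    assert(parent != NULL);
    partdesc = parent->partdesc;

    if (partdesc == NULL)
        return InvalidOid;          /* unpartitioned: nothing to look up */

    if (partdesc->boundinfo != NULL && partdesc->boundinfo->default_index >= 0)
        return partdesc->oids[partdesc->boundinfo->default_index];

    return InvalidOid;
}
```

The alternative mentioned in the review, an Assert that the descriptor exists, would turn the same mistake into an immediate failure in assert-enabled builds rather than a quiet InvalidOid; which of the two is preferable depends on whether callers are ever expected to pass an unpartitioned table.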
On Thu, Jun 15, 2017 at 12:54 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> Some more comments on the latest set of patches.
>
> In heap_drop_with_catalog(), we heap_open() the parent table to get the
> default partition OID, if any. If the relcache doesn't have an entry for the
> parent, this means that the entry will be created, only to be invalidated at
> the end of the function. If there is no default partition, this all is
> completely unnecessary. We should avoid heap_open() in this case. This also
> means that get_default_partition_oid() should not rely on the relcache entry,
> but should growl through pg_inherit to find the default partition.

I am *entirely* unconvinced by this line of argument. I think we want
to open the relation the first time we touch it and pass the Relation
around thereafter. Anything else is prone to accidentally failing to
have the relation locked early enough, or looking up the OID in the
relcache multiple times.

> In get_qual_for_list(), if the table has only default partition, it won't have
> any boundinfo. In such a case the default partition's constraint would come out
> as (NOT ((a IS NOT NULL) AND (a = ANY (ARRAY[]::integer[])))). The empty array
> looks odd and may be we spend a few CPU cycles executing ANY on an empty array.
> We have the same problem with a partition containing only NULL value. So, may
> be this one is not that bad.

I think that one is probably worth fixing.

> Please add a testcase to test addition of default partition as the first
> partition.

That seems like a good idea, too.

> get_qual_for_list() allocates the constant expressions corresponding to the
> datums in CacheMemoryContext while constructing constraints for a default
> partition. We do not do this for other partitions. We may not be constructing
> the constraints for saving in the cache. For example, ATExecAttachPartition
> constructs the constraints for validation. In such a case, this code will
> unnecessarily clobber the cache memory. generate_partition_qual() copies the
> partition constraint in the CacheMemoryContext.
>
> + if (spec->is_default)
> + {
> +     result = list_make1(make_ands_explicit(result));
> +     result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
> + }

Clearly we do not want things to end up across multiple contexts. We
should ensure that anything linked from the relcache entry ends up in
CacheMemoryContext, but we must be careful not to allocate anything
else in there, because CacheMemoryContext is never reset.

> If the "result" is an OR expression, calling make_ands_explicit() on it would
> create AND(OR(...)) expression, with an unnecessary AND. We want to avoid that?

I'm not sure it's worth the trouble.

> + defaultPartOid = get_default_partition_oid(rel);
> + if (OidIsValid(defaultPartOid))
> +     ereport(ERROR,
> +             (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> +              errmsg("there exists a default partition for table
> \"%s\", cannot attach a new partition",
> +                     RelationGetRelationName(rel))));
> +
> Should be done before heap_open on the table being attached. If we are not
> going to attach the partition, there's no point in instantiating its relcache.

No, because we should take the lock before examining any properties of
the table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jun 16, 2017 at 12:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 15, 2017 at 12:54 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> Some more comments on the latest set of patches.
>>
>> In heap_drop_with_catalog(), we heap_open() the parent table to get the
>> default partition OID, if any. If the relcache doesn't have an entry for the
>> parent, this means that the entry will be created, only to be invalidated at
>> the end of the function. If there is no default partition, this all is
>> completely unnecessary. We should avoid heap_open() in this case. This also
>> means that get_default_partition_oid() should not rely on the relcache entry,
>> but should growl through pg_inherit to find the default partition.
>
> I am *entirely* unconvinced by this line of argument. I think we want
> to open the relation the first time we touch it and pass the Relation
> around thereafter.

If that is correct, why does heap_drop_with_catalog() without this patch
just lock the parent and not call heap_open()? I am missing something.

> Anything else is prone to accidentally failing to
> have the relation locked early enough,

We are locking the parent relation even without this patch, so this
isn't an issue.

> or looking up the OID in the
> relcache multiple times.

I am not able to understand this in the context of default partition.
After that nobody else is going to change its partitions and their
bounds (since both of those require heap_open on parent which would be
stuck on the lock we hold). So, we have to check only once if the
table has a default partition. If it doesn't, it's not going to
acquire one unless we release the lock on the parent i.e. at the end of
transaction. If it has one, it's not going to get dropped till the end
of the transaction for the same reason. I don't see where we are
looking up OIDs multiple times.

>
>> + defaultPartOid = get_default_partition_oid(rel);
>> + if (OidIsValid(defaultPartOid))
>> +     ereport(ERROR,
>> +             (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>> +              errmsg("there exists a default partition for table
>> \"%s\", cannot attach a new partition",
>> +                     RelationGetRelationName(rel))));
>> +
>> Should be done before heap_open on the table being attached. If we are not
>> going to attach the partition, there's no point in instantiating its relcache.
>
> No, because we should take the lock before examining any properties of
> the table.

There are three tables involved here. "rel" which is the partitioned
table. "attachrel" is the table being attached as a partition to "rel"
and defaultrel, which is the default partition table. If there exists
a default partition in "rel" we are not allowing "attachrel" to be
attached to "rel". If that's the case, we don't need to examine any
properties of "attachrel" and hence we don't need to instantiate
relcache of "attachrel". That's what the comment is about.
ATExecAttachPartition() receives "rel" as an argument which has been
already locked and opened. So, we can check the existence of default
partition right at the beginning of the function.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Hello, I'd like to review this but it doesn't fit the master, as
Robert said. In particular, the interface of predicate_implied_by is
changed by the suggested commit.

Anyway I have some comments on this patch with fresh eyes. I believe in
the basic design, so my comments below are from a rather micro
viewpoint.

At Thu, 15 Jun 2017 16:01:53 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <a1267081-6e9a-e570-f6cf-34ff801bf503@lab.ntt.co.jp>
> Oops, I meant to send one more comment.
>
> On 2017/06/15 15:48, Amit Langote wrote:
> > BTW, I noticed the following in 0002
> + errmsg("there exists a default partition for table \"%s\", cannot
> add a new partition",
>
> This error message style seems novel to me. I'm not sure about the best
> message text here, but maybe: "cannot add new partition to table \"%s\"
> with default partition"
>
> Note that the comment applies to both DefineRelation and
> ATExecAttachPartition.

- Considering how canSkipPartConstraintValidation is called, I
  *think* that RelationProvenValid() would be a better name. (But this
  would disappear by rebasing..)

- 0002 changes the interface of get_qual_for_list, but leaves
  get_qual_for_range alone. Anyway get_qual_for_range will have to
  do the similar thing soon.

- In check_new_partition_bound, "overlap" and "with" are completely
  correlated with each other. "with > -1" means "overlap = true". So
  "overlap" is not needed. ("with" would be better named "overlap_with"
  or something if we remove "overlap")

- The error message of check_default_allows_bound is below.

  "updated partition constraint for default partition \"%s\" would be violated by some row"

  This looks like an analog of validateCheckConstraint, but as I
  understand it this function is called only when a new partition is
  added. That would be difficult to recognize in that situation.

  "the default partition contains rows that should be in the new partition: \"%s\""

  or something?

- In check_default_allows_bound, the iteration over partitions is
  quite similar to what validateCheckConstraint does. Can we
  somehow share validateCheckConstraint with this function?

- In the same function, skipping RELKIND_PARTITIONED_TABLE is
  okay, but silently ignoring RELKIND_FOREIGN_TABLE doesn't seem
  good. I think at least some warning should be emitted.

  "Skipping foreign tables in the default partition. It might contain rows that should be in the new partition."

  (Needs to prevent multiple warnings in a single call, maybe)

- In the same function, the following condition seems somewhat
  strange in comparison to validateCheckConstraint.

  > if (partqualstate && ExecCheck(partqualstate, econtext))

  partqualstate won't be null as long as partition_constraint is
  valid. Anyway (I believe that) an invalid constraint
  results in an error by ExecPrepareExpr. Therefore 'if
  (partqualstate' is useless.

- In gram.y, the nonterminal for the list spec clause is still
  "ForValues". It seems somewhat strange. partition_spec or
  something would be better.

- This is not a part of this patch, but in ruleutils.c, the error
  for unknown partitioning strategy is emitted as follows.

  > elog(ERROR, "unrecognized partition strategy: %d",
  >      (int) strategy);

  The cast is added because the strategy is a char. I suppose
  this is because strategy can be unprintable. I'd like to
  see a comment if that is correct.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
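The observation above that "overlap" and "with" in check_new_partition_bound carry the same information can be illustrated with a stand-alone sketch. Everything here is hypothetical (`MockListBounds`, `find_overlap_with()`, `has_overlap()`); it is not the patch's code, only a simplified model in which list bounds are plain ints and a single index, with -1 meaning no overlap, is the only state kept.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of list partition bounds. */
typedef struct
{
    size_t     ndatums;
    const int *datums;      /* every value accepted by some partition */
    const int *indexes;     /* index of the partition owning each value */
} MockListBounds;

/*
 * Return the index of the first existing partition that shares a value
 * with the new bound, or -1 if the new bound overlaps with nothing.
 */
static int
find_overlap_with(const MockListBounds *bounds, const int *newvals, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < bounds->ndatums; j++)
            if (newvals[i] == bounds->datums[j])
                return bounds->indexes[j];
    return -1;
}

/* The boolean is fully derivable from the index, so it needs no variable. */
static bool
has_overlap(int overlap_with)
{
    return overlap_with >= 0;
}
```

A caller would report the conflicting partition only when `has_overlap(find_overlap_with(...))` is true, so the separate flag can be dropped without losing anything.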
On 2017/06/16 14:16, Ashutosh Bapat wrote:
> On Fri, Jun 16, 2017 at 12:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Jun 15, 2017 at 12:54 PM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> Some more comments on the latest set of patches.
>>>
>>> In heap_drop_with_catalog(), we heap_open() the parent table to get the
>>> default partition OID, if any. If the relcache doesn't have an entry for the
>>> parent, this means that the entry will be created, only to be invalidated at
>>> the end of the function. If there is no default partition, this all is
>>> completely unnecessary. We should avoid heap_open() in this case. This also
>>> means that get_default_partition_oid() should not rely on the relcache entry,
>>> but should crawl through pg_inherits to find the default partition.
>>
>> I am *entirely* unconvinced by this line of argument. I think we want
>> to open the relation the first time we touch it and pass the Relation
>> around thereafter.
>
> If this would be correct, why heap_drop_with_catalog() without this
> patch just locks the parent and doesn't call a heap_open(). I am
> missing something.

As of commit c1e0e7e1d790bf, we avoid creating a relcache entry for the
parent. Before that commit, drop table
partitioned_table_with_many_partitions used to take too long and consumed
quite some memory as a result of the relcache invalidation requested at the
end on the parent table for every partition.

If this patch reintroduces the heap_open() on the parent table, that's
going to bring back the problem fixed by that commit.

>> Anything else is prone to accidentally failing to
>> have the relation locked early enough,
>
> We are locking the parent relation even without this patch, so this
> isn't an issue.

Yes.

>> or looking up the OID in the
>> relcache multiple times.
>
> I am not able to understand this in the context of default partition.
> After that nobody else is going to change its partitions and their
> bounds (since both of those require heap_open on parent which would be
> stuck on the lock we hold.). So, we have to check only once if the
> table has a default partition. If it doesn't, it's not going to
> acquire one unless we release the lock on the parent i.e at the end of
> transaction. If it has one, it's not going to get dropped till the end
> of the transaction for the same reason. I don't see where we are
> looking up OIDs multiple times.

Without heap_opening the parent, the only way is to look up parentOid's
children in pg_inherits and for each child looking up its pg_class tuple
in the syscache to see if its relpartbound indicates that it's a default
partition. That seems like it won't be inexpensive either.

It would be nice if we could get that information (that is - is a given
relation being heap_drop_with_catalog'd a partition of the parent that
happens to have default partition) in fewer steps than that.
Having that information in relcache is one way, but as mentioned, that
turns out to be expensive.

Has anyone considered the idea of putting the default partition OID in the
pg_partitioned_table catalog? Looking the above information up would
amount to one syscache lookup. Default partition seems to be special
enough object to receive a place in the pg_partitioned_table tuple of the
parent. Thoughts?

>>> + defaultPartOid = get_default_partition_oid(rel);
>>> + if (OidIsValid(defaultPartOid))
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>>> + errmsg("there exists a default partition for table
>>> \"%s\", cannot attach a new partition",
>>> + RelationGetRelationName(rel))));
>>> +
>>> Should be done before heap_open on the table being attached. If we are not
>>> going to attach the partition, there's no point in instantiating its relcache.
>>
>> No, because we should take the lock before examining any properties of
>> the table.
>
> There are three tables involved here. "rel" which is the partitioned
> table. "attachrel" is the table being attached as a partition to "rel"
> and defaultrel, which is the default partition table. If there exists
> a default partition in "rel" we are not allowing "attachrel" to be
> attached to "rel". If that's the case, we don't need to examine any
> properties of "attachrel" and hence we don't need to instantiate
> relcache of "attachrel". That's what the comment is about.
> ATExecAttachPartition() receives "rel" as an argument which has been
> already locked and opened. So, we can check the existence of default
> partition right at the beginning of the function.

It seems that we are examining the properties of the parent table here
(whether it has default partition), which as Ashutosh mentions, is already
locked before we got to ATExecAttachPartition(). Another place where we
are ereporting before locking the table to be attached (actually even
before looking it up by name), based just on the properties of the parent
table, is in transformPartitionCmd():

    /* the table must be partitioned */
    if (parentRel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                 errmsg("\"%s\" is not partitioned",
                        RelationGetRelationName(parentRel))));

Thanks,
Amit
On Wed, Jun 14, 2017 at 8:02 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Here are the details of the patches in attached zip.
> 0001. refactoring existing ATExecAttachPartition code so that it can be
> used for
> default partitioning as well
> 0002. support for default partition with the restriction of preventing
> addition
> of any new partition after default partition.
> 0003. extend default partitioning support to allow addition of new
> partitions.
> 0004. extend default partitioning validation code to reuse the refactored
> code
> in patch 0001.
I think the core ideas of this patch are pretty solid now. It's come
a long way in the last month. High-level comments:
- Needs to be rebased over b08df9cab777427fdafe633ca7b8abf29817aa55.
- Still no documentation.
- Should probably be merged with the patch to add default partitioning
for ranges.
Oops, I meant to send one more comment.
On 2017/06/15 15:48, Amit Langote wrote:
> BTW, I noticed the following in 0002
+ errmsg("there exists a default partition for table \"%s\", cannot
add a new partition",
This error message style seems novel to me. I'm not sure about the best
message text here, but maybe: "cannot add new partition to table \"%s\"
with default partition"
Hello, I'd like to review this but it doesn't apply to master, as
Robert said. In particular, the interface of predicate_implied_by is
changed by the suggested commit.
Anyway I have some comments on this patch with fresh eyes. I
trust the basic design, so my comments below are from a rather
micro viewpoint.
At Thu, 15 Jun 2017 16:01:53 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <a1267081-6e9a-e570-f6cf-34ff801bf503@lab.ntt.co.jp>
> Oops, I meant to send one more comment.
>
> On 2017/06/15 15:48, Amit Langote wrote:
> > BTW, I noticed the following in 0002
> + errmsg("there exists a default partition for table \"%s\", cannot
> add a new partition",
>
> This error message style seems novel to me. I'm not sure about the best
> message text here, but maybe: "cannot add new partition to table \"%s\"
> with default partition"
>
> Note that the comment applies to both DefineRelation and
> ATExecAttachPartition.
- Considering how canSkipPartConstraintValidation is called, I
*think* that RelationProvenValid() would be better. (But this
would disappear by rebasing..)
- 0002 changes the interface of get_qual_for_list, but leaves
get_qual_for_range alone. Anyway get_qual_for_range will have
to do a similar thing soon.
- In check_new_partition_bound, "overlap" and "with" are
completely correlated with each other. "with > -1" means
"overlap = true". So overlap is useless. ("with" would be
better named "overlap_with" or something if we remove
"overlap")
- The error message of check_default_allows_bound is below.
"updated partition constraint for default partition \"%s\"
would be violated by some row"
This looks like an analog of validateCheckConstraint, but as I
understand it this function is called only when a new partition
is added. The message would be difficult to understand in that
situation.
"the default partition contains rows that should be in
the new partition: \"%s\""
or something?
- In check_default_allows_bound, the iteration over partitions is
quite similar to what validateCheckConstraint does. Can we
somehow share validateCheckConstraint with this function?
- In the same function, skipping RELKIND_PARTITIONED_TABLE is
okay, but silently ignoring RELKIND_FOREIGN_TABLE doesn't seem
good. I think at least some warning should be emitted.
"Skipping foreign tables in the default partition. It might
contain rows that should be in the new partition." (Needs
preventing multiple warnings in a single call, maybe)
- In the same function, the following condition seems somewhat
strange in comparison to validateCheckConstraint.
> if (partqualstate && ExecCheck(partqualstate, econtext))
partqualstate won't be null as long as partition_constraint is
valid. Anyway (I believe) an invalid constraint results in an
error from ExecPrepareExpr. Therefore the 'partqualstate &&'
test is useless.
- In gram.y, the nonterminal for list spec clause is still
"ForValues". It seems somewhat strange. partition_spec or
something would be better.
- This is not a part of this patch, but in ruleutils.c, the error
for an unknown partitioning strategy is emitted as follows.
> elog(ERROR, "unrecognized partition strategy: %d",
> (int) strategy);
The cast is added because the strategy is a char. I suppose
this is because strategy can be an unprintable character. I'd
like to see a comment if that is correct.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2017/06/21 21:37, Jeevan Ladhe wrote:
> Hi Amit,
>
> On Thu, Jun 15, 2017 at 12:31 PM, Amit Langote <
> Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
>> Oops, I meant to send one more comment.
>>
>> On 2017/06/15 15:48, Amit Langote wrote:
>>> BTW, I noticed the following in 0002
>> + errmsg("there exists a default
>> partition for table \"%s\", cannot
>> add a new partition",
>>
>> This error message style seems novel to me. I'm not sure about the best
>> message text here, but maybe: "cannot add new partition to table \"%s\"
>> with default partition"
>>
>
> This sounds confusing to me, what about something like:
> "\"%s\" has a default partition, cannot add a new partition."

It's the comma inside the error message that suggests to me that it's a
style that I haven't seen elsewhere in the backend code. The primary
error message here is that the new partition cannot be created. "%s has
default partition" seems to me to belong in errdetail() (see "What Goes
Where" in [1].)

Or write the sentence such that the comma is not required. Anyway, we can
leave this for the committer to decide.

> Note that this comment belongs to patch 0002, and it will go away
> in case we are going to have extended functionality i.e. patch 0003,
> as in that patch we allow user to create a new partition even in the
> cases when there exists a default partition.

Oh, that'd be great. It's always better to get rid of the error
conditions that are hard to communicate to users. :) (Although, this
one's not that ambiguous.)

Thanks,
Amit

[1] https://www.postgresql.org/docs/devel/static/error-style-guide.html
On Wed, Jun 21, 2017 at 8:47 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> It's the comma inside the error message that suggests to me that it's a
> style that I haven't seen elsewhere in the backend code.

Exactly. Avoid that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On Mon, Jun 19, 2017 at 12:34 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/06/16 14:16, Ashutosh Bapat wrote:
> On Fri, Jun 16, 2017 at 12:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Jun 15, 2017 at 12:54 PM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> Some more comments on the latest set of patches.
>> or looking up the OID in the
>> relcache multiple times.
>
> I am not able to understand this in the context of default partition.
> After that nobody else is going to change its partitions and their
> bounds (since both of those require heap_open on parent which would be
> stuck on the lock we hold.). So, we have to check only once if the
> table has a default partition. If it doesn't, it's not going to
> acquire one unless we release the lock on the parent i.e at the end of
> transaction. If it has one, it's not going to get dropped till the end
> of the transaction for the same reason. I don't see where we are
> looking up OIDs multiple times.
Without heap_opening the parent, the only way is to look up parentOid's
children in pg_inherits and for each child looking up its pg_class tuple
in the syscache to see if its relpartbound indicates that it's a default
partition. That seems like it won't be inexpensive either.
It would be nice if could get that information (that is - is a given
relation being heap_drop_with_catalog'd a partition of the parent that
happens to have default partition) in less number of steps than that.
Having that information in relcache is one way, but as mentioned, that
turns out be expensive.
Has anyone considered the idea of putting the default partition OID in the
pg_partitioned_table catalog? Looking the above information up would
amount to one syscache lookup. Default partition seems to be special
enough object to receive a place in the pg_partitioned_table tuple of the
parent. Thoughts?

I liked this suggestion. Having an entry in pg_partitioned_table would avoid
both expensive methods, i.e. 1. opening the parent or 2. lookup for
each of the children first in pg_inherits and then its corresponding entry in
pg_class.

Unless anybody has any other suggestions/comments here, I am going to
implement this suggestion.

Thanks,
Jeevan Ladhe
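To make the cost comparison above concrete, here is a minimal, self-contained sketch in C. It does not use the real catalog structs or syscache API; the Toy* types, the default_part_oid field name, and the lookup counter are all hypothetical stand-ins chosen for illustration. It only models the argument that walking pg_inherits costs one pg_class probe per child, while a default-partition OID stored in the parent's pg_partitioned_table row costs a single probe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Toy stand-ins for catalog rows; these are NOT the real catalog structs. */
typedef struct { Oid inhrelid; Oid inhparent; } ToyInheritsRow;
typedef struct { Oid oid; bool bound_is_default; } ToyClassRow;
/* default_part_oid is a hypothetical column name for the proposed field. */
typedef struct { Oid partrelid; Oid default_part_oid; } ToyPartitionedTableRow;

/* One simulated syscache probe into the toy pg_class; counts each probe. */
static const ToyClassRow *
toy_class_lookup(const ToyClassRow *rows, size_t n, Oid oid, int *nlookups)
{
    assert(nlookups != NULL);
    (*nlookups)++;
    for (size_t i = 0; i < n; i++)
        if (rows[i].oid == oid)
            return &rows[i];
    return NULL;
}

/* Without heap_open: scan pg_inherits, then probe pg_class once per child. */
static Oid
toy_default_via_inherits(const ToyInheritsRow *inh, size_t ninh,
                         const ToyClassRow *cls, size_t ncls,
                         Oid parent, int *nlookups)
{
    for (size_t i = 0; i < ninh; i++)
    {
        const ToyClassRow *child;

        if (inh[i].inhparent != parent)
            continue;
        child = toy_class_lookup(cls, ncls, inh[i].inhrelid, nlookups);
        if (child != NULL && child->bound_is_default)
            return child->oid;
    }
    return InvalidOid;
}

/* Proposed: a single probe of the parent's pg_partitioned_table row. */
static Oid
toy_default_via_partitioned_table(const ToyPartitionedTableRow *pt, size_t npt,
                                  Oid parent, int *nlookups)
{
    assert(nlookups != NULL);
    (*nlookups)++;
    for (size_t i = 0; i < npt; i++)
        if (pt[i].partrelid == parent)
            return pt[i].default_part_oid;
    return InvalidOid;
}
```

In this toy model the first function performs as many probes as there are children before it reaches the default partition, which is the cost Amit describes; the second always performs one.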
Attachment
- Needs to be rebased over b08df9cab777427fdafe633ca7b8abf29817aa55.
- Still no documentation.
- Should probably be merged with the patch to add default partitioning
for ranges.
Other stuff I noticed:
- The regression tests don't seem to check that the scan-skipping
logic works as expected. We have regression tests for that case for
attaching regular partitions, and it seems like it would be worth
testing the default-partition case as well.
- check_default_allows_bound() assumes that if
canSkipPartConstraintValidation() fails for the default partition, it
will also fail for every subpartition of the default partition. That
is, once we commit to scanning one child partition, we're committed to
scanning them all. In practice, that's probably not a huge
limitation, but if it's not too much code, we could keep the top-level
check but also check each partition individually as we reach it,
and skip the scan for any individual partitions for which the
constraint can be proven. For example, suppose the top-level table is
list-partitioned with a partition for each of the most common values,
and then we range-partition the default partition.
- The changes to the regression test results in 0004 make the error
messages slightly worse. The old message names the default partition,
whereas the new one does not. Maybe that's worth avoiding.
Specific comments:
+ * Also, invalidate the parent's and a sibling default partition's relcache,
+ * so that the next rebuild will load the new partition's info into parent's
+ * partition descriptor and default partition constraints(which are dependent
+ * on other partition bounds) are built anew.
I find this a bit unclear, and it also repeats the comment further
down. Maybe something like: Also, invalidate the parent's relcache
entry, so that the next rebuild will load the new partition's info into
its partition descriptor. If there is a default partition, we must
invalidate its relcache entry as well.
+ /*
+ * The default partition constraints depend upon the partition bounds of
+ * other partitions. Adding a new(or even removing existing) partition
+ * would invalidate the default partition constraints. Invalidate the
+ * default partition's relcache so that the constraints are built anew and
+ * any plans dependent on those constraints are invalidated as well.
+ */
Here, I'd write: The partition constraint for the default partition
depends on the partition bounds of every other partition, so we must
invalidate the relcache entry for that partition every time a
partition is added or removed.
+ /*
+ * Default partition cannot be added if there already
+ * exists one.
+ */
+ if (spec->is_default)
+ {
+ overlap = partition_bound_has_default(boundinfo);
+ with = boundinfo->default_index;
+ break;
+ }
To support default partitioning for range, this is going to have to be
moved above the switch rather than done inside of it. And there's
really no downside to putting it there.
+ * constraint, by *proving* that the existing constraints of the table
+ * *imply* the given constraints. We include the table's check constraints and
Both the comma and the asterisks are unnecessary.
+ * Check whether all rows in the given table (scanRel) obey given partition
obey the given
I think the larger comment block could be tightened up a bit, like
this: Check whether all rows in the given table obey the given
partition constraint; if so, it can be attached as a partition. We do
this by scanning the table (or all of its leaf partitions) row by row,
except when the existing constraints are sufficient to prove that the
new partitioning constraint must already hold.
+ /* Check if we can do away with having to scan the table being attached. */
If possible, skip the validation scan.
+ * Set up to have the table be scanned to validate the partition
+ * constraint If it's a partitioned table, we instead schedule its leaf
+ * partitions to be scanned.
I suggest: Prepare to scan the default partition (or, if it is itself
partitioned, all of its leaf partitions).
+ int default_index; /* Index of the default partition if any; -1
+ * if there isn't one */
"if any" is a bit redundant with "if there isn't one"; note the
phrasing of the preceding entry.
+ /*
+ * Skip if it's a partitioned table. Only RELKIND_RELATION relations
+ * (ie, leaf partitions) need to be scanned.
+ */
+ if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ||
+ part_rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
The comment talks about what must be included in our list of things to
scan, but the code tests for the things that can be excluded. I
suspect the comment has the right idea and the code should be adjusted
to match, but anyway they should be consistent. Also, the correct way
to punctuate i.e. is like this: (i.e. leaf partitions) You should have
a period after each letter, but no following comma.
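The mismatch between the comment and the code in the hunk above can be sketched as two predicates. This is a toy illustration only: the relkind letters match pg_class.relkind, but the function names and the 'x' relkind used in the note below are hypothetical. For the three kinds a partition can have today the two tests agree; they diverge only if another kind is ever introduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* relkind codes as stored in pg_class.relkind */
#define RELKIND_RELATION          'r'
#define RELKIND_PARTITIONED_TABLE 'p'
#define RELKIND_FOREIGN_TABLE     'f'

/* As coded in the quoted hunk: skip the kinds known not to need a scan. */
static bool
toy_scan_by_exclusion(char relkind)
{
    return !(relkind == RELKIND_PARTITIONED_TABLE ||
             relkind == RELKIND_FOREIGN_TABLE);
}

/* As the comment words it: scan only plain relations (i.e. leaf partitions). */
static bool
toy_scan_by_inclusion(char relkind)
{
    return relkind == RELKIND_RELATION;
}

/* Count how many of the given partitions a predicate would schedule for scanning. */
static size_t
toy_count_scanned(const char *relkinds, size_t n, bool (*pred)(char))
{
    size_t count = 0;

    assert(pred != NULL);
    for (size_t i = 0; i < n; i++)
        if (pred(relkinds[i]))
            count++;
    return count;
}
```

With a made-up relkind such as 'x', the exclusion test would schedule a scan while the inclusion test would not, which is why keeping code and comment phrased the same way matters.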
+ * The default partition must be already having an AccessExclusiveLock.
I think we should instead change DefineRelation to open (rather than
just lock) the default partition and pass the Relation as an argument
here so that we need not reopen it.
+ /* Construct const from datum */
+ val = makeConst(key->parttypid[0],
+ key->parttypmod[0],
+ key->parttypcoll[0],
+ key->parttyplen[0],
+ *boundinfo->datums[i],
+ false, /* isnull */
+ key->parttypbyval[0] /* byval */ );
The /* byval */ comment looks a bit redundant, but I think this could
use a comment along the lines of: /* Only single-column list
partitioning is supported, so we only need to worry about the
partition key with index 0. */ And I'd also add an Assert() verifying
that the partition key has exactly 1 column, so that this breaks a bit
more obviously if someone removes that restriction in the future.
+ * Handle NULL partition key here if there's a null-accepting list
+ * partition, else later it will be routed to the default partition if
+ * one exists.
This isn't a great update of the existing comment -- it's drifted from
explaining the code to which it is immediately attached to a more
general discussion of NULL handling. I'd just say something like: If
this is a NULL, send it to the null-accepting partition. Otherwise,
route by searching the array of partition bounds.
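The routing order discussed here (NULL key first, then the bound search, then the default partition as a fallback) can be sketched with a toy model. The struct and function names below are hypothetical and much simpler than the patch's PartitionBoundInfo; the sketch assumes a single-column integer list key with sorted bound values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of list-partition bounds; not the real PartitionBoundInfo. */
typedef struct
{
    int        ndatums;       /* number of list bound values */
    const int *datums;        /* bound values, sorted ascending */
    const int *indexes;       /* partition index for each bound value */
    int        null_index;    /* null-accepting partition, or -1 */
    int        default_index; /* default partition, or -1 */
} ToyListBounds;

/* Return the index of the partition that accepts the key, or -1 if none. */
static int
toy_get_partition_index(const ToyListBounds *b, bool key_isnull, int key)
{
    int cur_index = -1;

    assert(b != NULL);
    if (key_isnull)
    {
        /* A NULL key goes to the null-accepting partition, if there is one. */
        cur_index = b->null_index;
    }
    else
    {
        /* Otherwise binary-search the sorted array of bound values. */
        int lo = 0;
        int hi = b->ndatums - 1;

        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;

            if (b->datums[mid] == key)
            {
                cur_index = b->indexes[mid];
                break;
            }
            if (b->datums[mid] < key)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
    }

    /* No non-default partition matched: fall back to the default, if any. */
    if (cur_index < 0)
        cur_index = b->default_index;
    return cur_index;
}
```

With bounds (4, 5) mapped to one partition and a default partition present, a key of 9 lands in the default, matching the psql session at the top of the thread. The fallback needs no separate "has default" test because default_index is already -1 when no default exists.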
+ if (tab->is_default_partition)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("updated partition constraint for
default partition would be violated by some row")));
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_CHECK_VIOLATION),
While there's room for debate about the correct error code here, it's
hard for me to believe that it's correct to use one error code for the
is_default_partition case and a different error code the rest of the
time.
+ * previously cached default partition constraints; those constraints
+ * won't stand correct after addition(or even removal) of a partition.
won't be correct after addition or removal
+ * allow any row that qualifies for this new partition. So, check if
+ * the existing data in the default partition satisfies this *would be*
+ * default partition constraint.
check that the existing data in the default partition satisfies the
constraint as it will exist after adding this partition
+ * Need to take a lock on the default partition, refer comment for locking
+ * the default partition in DefineRelation().
I'd say: We must also lock the default partition, for the same reasons
explained in DefineRelation().
And similarly in the other places that refer to that same comment.
+ /*
+ * In case of the default partition, the constraint is of the form
+ * "!(result)" i.e. one of the following two forms:
+ * 1. NOT ((keycol IS NULL) OR (keycol = ANY (arr)))
+ * 2. NOT ((keycol IS NOT NULL) AND (keycol = ANY (arr))), where arr is an
+ * array of datums in boundinfo->datums.
+ */
Does this survive pgindent? You might need to surround the comment
with dashes to preserve formatting.
I think it would be worth adding a little more text this comment,
something like this: Note that, in general, applying NOT to a
constraint expression doesn't necessarily invert the set of rows it
accepts, because NOT NULL is NULL. However, the partition constraints
we construct here never evaluate to NULL, so applying NOT works as
intended.
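The point about NOT and NULL can be checked with a small three-valued-logic model. This is a sketch under simplifying assumptions, not PostgreSQL's expression evaluator: keys are nullable ints, the array holds no NULL elements, and all names are made up for illustration. It shows that both constraint forms from the quoted comment evaluate to TRUE or FALSE, never NULL, so wrapping them in NOT inverts the accepted set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy three-valued boolean standing in for SQL TRUE / FALSE / NULL. */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/* A nullable integer partition key. */
typedef struct { bool isnull; int value; } ToyKey;

static TriBool
tv_not(TriBool a)
{
    if (a == TV_NULL)
        return TV_NULL;         /* NOT NULL is NULL */
    return a == TV_TRUE ? TV_FALSE : TV_TRUE;
}

static TriBool
tv_or(TriBool a, TriBool b)
{
    if (a == TV_TRUE || b == TV_TRUE)
        return TV_TRUE;
    if (a == TV_NULL || b == TV_NULL)
        return TV_NULL;
    return TV_FALSE;
}

static TriBool
tv_and(TriBool a, TriBool b)
{
    if (a == TV_FALSE || b == TV_FALSE)
        return TV_FALSE;
    if (a == TV_NULL || b == TV_NULL)
        return TV_NULL;
    return TV_TRUE;
}

/* keycol = ANY (arr), assuming arr itself contains no NULL elements. */
static TriBool
toy_key_in_list(ToyKey k, const int *arr, int n)
{
    assert(n == 0 || arr != NULL);
    if (n == 0)
        return TV_FALSE;        /* empty array: no comparison can succeed */
    if (k.isnull)
        return TV_NULL;         /* NULL compared with anything is NULL */
    for (int i = 0; i < n; i++)
        if (arr[i] == k.value)
            return TV_TRUE;
    return TV_FALSE;
}

/* Form 1: (keycol IS NULL) OR (keycol = ANY (arr)) */
static TriBool
toy_form1(ToyKey k, const int *arr, int n)
{
    return tv_or(k.isnull ? TV_TRUE : TV_FALSE, toy_key_in_list(k, arr, n));
}

/* Form 2: (keycol IS NOT NULL) AND (keycol = ANY (arr)) */
static TriBool
toy_form2(ToyKey k, const int *arr, int n)
{
    return tv_and(k.isnull ? TV_FALSE : TV_TRUE, toy_key_in_list(k, arr, n));
}
```

For a NULL key, form 1 short-circuits to TRUE through the IS NULL arm and form 2 short-circuits to FALSE through the IS NOT NULL arm, so the NULL produced by the ANY comparison never reaches the top of the expression.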
+ * Check whether default partition has a row that would fit the partition
+ * being attached by negating the partition constraint derived from the
+ * bounds. Since default partition is already part of the partitioned
+ * table, we don't need to validate the constraints on the partitioned
+ * table.
Here again, I'd add to the end of the first sentence a parenthetical
note, like this: ...from the bounds (the partition constraint never
evaluates to NULL, so negating it like this is safe).
I don't understand the second sentence. It seems to contradict the first one.
+extern List *get_default_part_validation_constraint(List *new_part_constaints);
#endif /* PARTITION_H */
There should be a blank line after the last prototype and before the #endif.
+-- default partition table when it is being used in cahced plan.
Typo.
Some more comments on the latest set of patches.
In heap_drop_with_catalog(), we heap_open() the parent table to get the
default partition OID, if any. If the relcache doesn't have an entry for the
parent, this means that the entry will be created, only to be invalidated at
the end of the function. If there is no default partition, this all is
completely unnecessary. We should avoid heap_open() in this case. This also
means that get_default_partition_oid() should not rely on the relcache entry,
but should crawl through pg_inherits to find the default partition.
In get_qual_for_list(), if the table has only default partition, it won't have
any boundinfo. In such a case the default partition's constraint would come out
as (NOT ((a IS NOT NULL) AND (a = ANY (ARRAY[]::integer[])))). The empty array
looks odd and maybe we spend a few CPU cycles executing ANY on an empty array.
We have the same problem with a partition containing only the NULL value. So,
maybe this one is not that bad.
Please add a testcase to test addition of default partition as the first
partition.
get_qual_for_list() allocates the constant expressions corresponding to the
datums in CacheMemoryContext while constructing constraints for a default
partition. We do not do this for other partitions. We may not be constructing
the constraints for saving in the cache. For example, ATExecAttachPartition
constructs the constraints for validation. In such a case, this code will
unnecessarily clobber the cache memory. generate_partition_qual() copies the
partition constraint in the CacheMemoryContext.
+ if (spec->is_default)
+ {
+ result = list_make1(make_ands_explicit(result));
+ result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
+ }
If the "result" is an OR expression, calling make_ands_explicit() on it would
create AND(OR(...)) expression, with an unnecessary AND. We want to avoid that?
+ if (cur_index < 0 && (partition_bound_has_default(partdesc->boundinfo)))
+ cur_index = partdesc->boundinfo->default_index;
+
The partition_bound_has_default() check is unnecessary since we check for
cur_index < 0 next anyway.
+ *
+ * Given the parent relation checks if it has default partition, and if it
+ * does exist returns its oid, otherwise returns InvalidOid.
+ */
May be reworded as "If the given relation has a default partition, this
function returns the OID of the default partition. Otherwise it returns
InvalidOid."
+Oid
+get_default_partition_oid(Relation parent)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+
+ if (partdesc->boundinfo && partition_bound_has_default(partdesc->boundinfo))
+ return partdesc->oids[partdesc->boundinfo->default_index];
+
+ return InvalidOid;
+}
An unpartitioned table would not have partdesc set. So, this function will
segfault if we pass an unpartitioned table. Either Assert that partdesc should
exist or check for its NULL-ness.
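A minimal sketch of the second option (checking for NULL rather than asserting) follows. It uses toy structs, not the real Relation or PartitionDesc, and every name prefixed with Toy or toy_ is hypothetical. The only point is the order of the checks: the partition descriptor is tested before anything is dereferenced through it.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Toy stand-ins; the real structures carry much more state. */
typedef struct { int default_index; } ToyBoundInfo;   /* -1 if no default */
typedef struct
{
    int           nparts;
    Oid          *oids;
    ToyBoundInfo *boundinfo;   /* NULL when the table has no partitions */
} ToyPartitionDesc;
typedef struct
{
    ToyPartitionDesc *partdesc;   /* NULL for an unpartitioned table */
} ToyRelation;

/*
 * If the given relation has a default partition, return its OID.
 * Otherwise, including when the relation is not partitioned at all,
 * return InvalidOid.
 */
static Oid
toy_get_default_partition_oid(const ToyRelation *parent)
{
    const ToyPartitionDesc *partdesc;

    assert(parent != NULL);
    partdesc = parent->partdesc;

    if (partdesc == NULL)
        return InvalidOid;      /* unpartitioned: nothing to dereference */

    if (partdesc->boundinfo != NULL && partdesc->boundinfo->default_index >= 0)
        return partdesc->oids[partdesc->boundinfo->default_index];

    return InvalidOid;
}
```

Whether a silent InvalidOid or an Assert is preferable depends on whether callers are ever expected to pass an unpartitioned table; the review leaves that choice open.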
+ defaultPartOid = get_default_partition_oid(rel);
+ if (OidIsValid(defaultPartOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("there exists a default partition for table
\"%s\", cannot attach a new partition",
+ RelationGetRelationName(rel))));
+
Should be done before heap_open on the table being attached. If we are not
going to attach the partition, there's no point in instantiating its relcache.
The comment in heap_drop_with_catalog() should mention why we lock the default
partition before locking the table being dropped.
extern List *preprune_pg_partitions(PlannerInfo *root, RangeTblEntry *rte,
Index rti, Node *quals, LOCKMODE lockmode);
-
#endif /* PARTITION_H */
Unnecessary hunk.
Hello,

On Thu, Jul 13, 2017 at 1:22 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
>
>> - Should probably be merged with the patch to add default partitioning
>> for ranges.
>
> Beena is already rebasing her patch on my latest patches, so I think getting
> them merged here won't be an issue, mostly will be just like one more patch
> on top my patches.
>

I have posted the updated patch which can be applied over the v22
patches submitted here.
https://www.postgresql.org/message-id/CAOG9ApGEZxSQD-ZD3icj_CwTmprSGG7sZ_r3k9m4rmcc6ozr%3Dg%40mail.gmail.com

Thank you,
Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

I have worked further on the V21 patch set, rebased it on the latest master
commit, and addressed the comments given by Robert, Ashutosh and others.

The attached tar has a series of 7 patches.
Here is a brief of these 7 patches:

0001:
Refactoring existing ATExecAttachPartition code so that it can be used for
default partitioning as well

0002:
This patch teaches the partitioning code to handle the NIL returned by
get_qual_for_list().
This is needed because a default partition will not have any constraints in case
it is the only partition of its parent.

0003:
Support for default partition with the restriction of preventing addition of any
new partition after default partition.

0004:
Store the default partition OID in pg_partitioned_table, this will help us to
retrieve the OID of default relation when we don't have the relation cache
available. This was also suggested by Amit Langote here[1].

0005:
Extend default partitioning support to allow addition of new partitions.

0006:
Extend default partitioning validation code to reuse the refactored code in
patch 0001.

0007:
This patch introduces code to check if the scanning of default partition child
can be skipped if its constraints are proven.

TODO:
Add documentation.
Merge default range partitioning patch.
Attachment
Hi,

On Thu, Jul 13, 2017 at 1:01 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have worked further on V21 patch set, rebased it on latest master commit,
> addressed the comments given by Robert, Ashutosh and others.
>
> The attached tar has a series of 7 patches.
> Here is a brief of these 7 patches:
>
> 0001:
> Refactoring existing ATExecAttachPartition code so that it can be used for
> default partitioning as well
>
> 0002:
> This patch teaches the partitioning code to handle the NIL returned by
> get_qual_for_list().
> This is needed because a default partition will not have any constraints in case
> it is the only partition of its parent.
>
> 0003:
> Support for default partition with the restriction of preventing addition of any
> new partition after default partition.
>
> 0004:
> Store the default partition OID in pg_partitioned_table, this will help us to
> retrieve the OID of default relation when we don't have the relation cache
> available. This was also suggested by Amit Langote here[1].
>
> 0005:
> Extend default partitioning support to allow addition of new partitions.
>
> 0006:
> Extend default partitioning validation code to reuse the refactored code in
> patch 0001.
>
> 0007:
> This patch introduces code to check if the scanning of default partition child
> can be skipped if its constraints are proven.
>
> TODO:
> Add documentation.

I have added a documentation patch (patch 0008) to the existing set of patches.
PFA.

> Merge default range partitioning patch.

Beena has created a patch on top of my patches here[1].

Regards,
Jeevan Ladhe
Attachment
On Wed, Jul 26, 2017 at 5:44 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have rebased the patches on the latest commit.
>

Thanks for rebasing the patches. The patches apply and compile
cleanly. make check passes.

Here are some review comments

0001 patch
Most of this patch is same as 0002 patch posted in thread [1]. I have
extensively reviewed that patch for Amit Langote. Can you please compare these
two patches and try to address those comments OR just use patch from that
thread? For example, canSkipPartConstraintValidation() is named as
PartConstraintImpliedByRelConstraint() in that patch. OR
+ if (scanRel_constr == NULL)
+ return false;
+
is not there in that patch since returning false is wrong when partConstraint
is NULL. I think this patch needs those fixes. Also, this patch set would need
a rebase when 0001 from that thread gets committed.

0002 patch
+ if (!and_args)
+ result = NULL;
Add "NULL, if there are no partition constraints e.g. in case of default
partition as the only partition.".
This patch avoids calling
validatePartitionConstraints() and hence canSkipPartConstraintValidation() when
partConstraint is NULL, but patches in [1] introduce more callers of
canSkipPartConstraintValidation() which may pass NULL. So, it's better that we
handle that case.

0003 patch
+ parentRel = heap_open(parentOid, AccessExclusiveLock);
In [2], Amit Langote has given a reason as to why heap_drop_with_catalog()
should not heap_open() the parent relation. But this patch still calls
heap_open() without giving any counter argument. Also I don't see
get_default_partition_oid() using Relation anywhere. If you remove that
heap_open() please remove following heap_close().
+ heap_close(parentRel, NoLock);

+ /*
+ * The default partition accepts any non-specified
+ * value, hence it should not get a mapped index while
+ * assigning those for non-null datums.
+ */
Instead of "any non-specified value", you may want to use "any value not
specified in the lists of other partitions" or something like that.

+ * If this is a NULL, route it to the null-accepting partition.
+ * Otherwise, route by searching the array of partition bounds.
You may want to write it as "If this is a null partition key, ..." to clarify
what's NULL.

+ * cur_index < 0 means we could not find a non-default partition of
+ * this parent. cur_index >= 0 means we either found the leaf
+ * partition, or the next parent to find a partition of.
+ *
+ * If we couldn't find a non-default partition check if the default
+ * partition exists, if it does, get its index.
In order to avoid repeating "we couldn't find a ..."; you may want to add ",
try default partition if one exists." in the first sentence itself.

get_default_partition_oid() is defined in this patch and then redefined in
0004. Let's define it only once, mostly in or before 0003 patch.

+ * partition strategy. Assign the parent strategy to the default
s/parent/parent's/

+-- attaching default partition overlaps if the default partition already exists
+CREATE TABLE def_part PARTITION OF list_parted DEFAULT;
+CREATE TABLE fail_def_part (LIKE part_1 INCLUDING CONSTRAINTS);
+ALTER TABLE list_parted ATTACH PARTITION fail_def_part DEFAULT;
+ERROR: cannot attach a new partition to table "list_parted" having a default partition
For 0003 patch this testcase is same as the testcase in the next hunk; no new
partition can be added after default partition. Please add this testcase in
next set of patches.

+-- fail
+insert into part_default values ('aa', 2);
Maybe explain why the insert should fail. "A row, which would fit
other partition, does not fit default partition, even when inserted directly"
or something like that. I see that many of the tests in that file do not
explain why something should "fail" or be "ok", but maybe it's better to
document the reason for better readability and future reference.

+-- check in case of multi-level default partitioned table
s/in/the/ ?. Or you may want to reword it as "default partitioned partition in
multi-level partitioned table" as there is nothing like "default partitioned
table". Maybe we need a testcase where every level of a multi-level
partitioned table has a default partition.

+-- drop default, as we need to add some more partitions to test tuple routing
Should be clubbed with the actual DROP statement?

+-- Check that addition or removal of any partition is correctly dealt with by
+-- default partition table when it is being used in cached plan.
Plan of a prepared statement gets cached only after it's executed 5 times.
Before that the statement gets invalidated but there's no cached plan that
gets invalidated. The test is fine here, but in order to test the cached plan
as mentioned in the comment, you will need to execute the statement 5 times
before executing drop statement. That's probably unnecessary, so just modify
the comment to say "prepared statements instead of cached plan".

0004 patch
The patch adds another column partdefid to catalog pg_partitioned_table. The
column gives OID of the default partition for a given partitioned table. This
means that the default partition's OID is stored at two places 1. in the
default partition table's pg_class entry and in pg_partitioned_table. There is
no way to detect when these two go out of sync. Keeping those two in sync is
also a maintenance burden. Given that default partition's OID is required only
while adding/dropping a partition, which is a less frequent operation, it won't
hurt to join a few catalogs (pg_inherits and pg_class in this case) to find out
the default partition's OID. That will be occasional performance hit
worth the otherwise maintenance burden.

I haven't reviewed next two patches, but those patches depend upon some of the
comments above. So, it's better to consider these comments before looking at
those patches.

[1] https://www.postgresql.org/message-id/cee32590-68a7-8b56-5213-e07d9b8ab89e@lab.ntt.co.jp
[2] https://www.postgresql.org/message-id/35d68d49-555f-421a-99f8-185a44d085a4@lab.ntt.co.jp

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Fri, Jul 28, 2017 at 9:30 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> 0004 patch
> The patch adds another column partdefid to catalog pg_partitioned_table. The
> column gives OID of the default partition for a given partitioned table. This
> means that the default partition's OID is stored at two places 1. in the
> default partition table's pg_class entry and in pg_partitioned_table. There is
> no way to detect when these two go out of sync. Keeping those two in sync is
> also a maintenance burden. Given that default partition's OID is required only
> while adding/dropping a partition, which is a less frequent operation, it won't
> hurt to join a few catalogs (pg_inherits and pg_class in this case) to find out
> the default partition's OID. That will be occasional performance hit
> worth the otherwise maintenance burden.

Performance isn't the only consideration here. We also need to think
about locking and concurrency. I think that most operations that
involve locking the parent will also involve locking the default
partition. However, we can't safely build a relcache entry for a
relation before we've got some kind of lock on it. We can't assume
that there is no concurrent DDL going on before we take some lock. We
can't assume invalidation messages are processed before we have taken
some lock. If we read multiple catalog tuples, they may be from
different points in time. If we can figure out everything we need to
know from one or two syscache lookups, it may be easier to verify that
the code is bug-free vs. having to do something more complicated.

Now that having been said, I'm not taking the position that Jeevan's
patch (based on Amit Langote's idea) has definitely got the right
idea, just that you should think twice before shooting down the
approach.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Jul 29, 2017 at 2:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 9:30 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> 0004 patch
>> The patch adds another column partdefid to catalog pg_partitioned_table. The
>> column gives OID of the default partition for a given partitioned table. This
>> means that the default partition's OID is stored at two places 1. in the
>> default partition table's pg_class entry and in pg_partitioned_table. There is
>> no way to detect when these two go out of sync. Keeping those two in sync is
>> also a maintenance burden. Given that default partition's OID is required only
>> while adding/dropping a partition, which is a less frequent operation, it won't
>> hurt to join a few catalogs (pg_inherits and pg_class in this case) to find out
>> the default partition's OID. That will be occasional performance hit
>> worth the otherwise maintenance burden.
>
> Performance isn't the only consideration here. We also need to think
> about locking and concurrency. I think that most operations that
> involve locking the parent will also involve locking the default
> partition. However, we can't safely build a relcache entry for a
> relation before we've got some kind of lock on it. We can't assume
> that there is no concurrent DDL going on before we take some lock. We
> can't assume invalidation messages are processed before we have taken
> some lock. If we read multiple catalog tuples, they may be from
> different points in time. If we can figure out everything we need to
> know from one or two syscache lookups, it may be easier to verify that
> the code is bug-free vs. having to do something more complicated.
>

The code takes a lock on the parent relation. While that function
holds that lock nobody else would change partitions of that relation
and hence nobody changes the default partition.
heap_drop_with_catalog() has code to lock the parent. Looking up
pg_inherits catalog for its partitions followed by identifying the
partition which has default partition bounds specification while
holding the lock on the parent should be safe. Any changes to
partition bounds, or partitions would require lock on the parent. In
order to prevent any buggy code changing the default partition without
sufficient locks, we should lock the default partition after it's
found and check the default partition bound specification again. Will
that work?

> Now that having been said, I'm not taking the position that Jeevan's
> patch (based on Amit Langote's idea) has definitely got the right
> idea, just that you should think twice before shooting down the
> approach.
>

If we can avoid the problems specified by Amit Langote, I am fine with
the approach of reading the default partition OID from the Relcache as
well. But I am not able to devise a solution to those problems.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
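A rough sketch of the catalog-lookup alternative described in the message above: with the parent already locked by the caller, walk its children and pick the one whose bound is the default. This is only an illustration under stated assumptions. MockOid, MockChild and find_default_partition() are hypothetical stand-ins, not names from the patch; the real code would read pg_inherits and each child's partition bound, and would still need the lock-then-recheck step proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real code works on catalog tuples. */
typedef unsigned int MockOid;
#define MOCK_INVALID_OID ((MockOid) 0)

typedef struct MockChild
{
    MockOid oid;        /* OID of the child partition */
    bool    is_default; /* true if its bound is the DEFAULT bound */
} MockChild;

/*
 * Return the OID of the default partition among the children of a parent,
 * or MOCK_INVALID_OID if there is none.  Assumption: the caller already
 * holds a sufficient lock on the parent, so the list cannot change.
 */
MockOid
find_default_partition(const MockChild *children, size_t nchildren)
{
    assert(children != NULL || nchildren == 0);

    for (size_t i = 0; i < nchildren; i++)
    {
        if (children[i].is_default)
            return children[i].oid;
    }

    return MOCK_INVALID_OID;
}
```

The trade-off debated in the thread is that this walks every child on each call, whereas a partdefid column gives the answer from a single lookup at the price of keeping two catalog entries in sync.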
On Sun, Jul 30, 2017 at 8:07 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi Ashutosh,
>
> 0003 patch
>>
>> + parentRel = heap_open(parentOid, AccessExclusiveLock);
>> In [2], Amit Langote has given a reason as to why heap_drop_with_catalog()
>> should not heap_open() the parent relation. But this patch still calls
>> heap_open() without giving any counter argument. Also I don't see
>> get_default_partition_oid() using Relation anywhere. If you remove that
>> heap_open() please remove following heap_close().
>
>
> I think the patch 0004 exactly does what you have said here, i.e. it gets
> rid of the heap_open() and heap_close().
> The question might be why I kept the patch 0004 a separate one, and the
> answer is I wanted to make it easier for review, and also keeping it that
> way would make it bit easy to work on a different approach if needed.
>

The reviewer has to review two different set of changes to the same
portion of the code. That just doubles the work. I didn't find that
simplifying review. As I have suggested earlier, let's define
get_default_partition_oid() only once, mostly in or before 0003 patch.
Having it in a separate patch would allow you to change its
implementation if needed.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Wed, Jul 12, 2017 at 3:31 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> 0001:
> Refactoring existing ATExecAttachPartition code so that it can be used for
> default partitioning as well

Boring refactoring. Seems fine.

> 0002:
> This patch teaches the partitioning code to handle the NIL returned by
> get_qual_for_list().
> This is needed because a default partition will not have any constraints in
> case
> it is the only partition of its parent.

Perhaps it would be better to make validatePartConstraint() a no-op
when the constraint is empty rather than putting the logic in the
caller. Otherwise, every place that calls validatePartConstraint()
has to think about whether or not the constraint-is-NULL case needs to
be handled.

> 0003:
> Support for default partition with the restriction of preventing addition of
> any
> new partition after default partition.

This looks generally reasonable, but can't really be committed without
the later patches, because it might break pg_dump, which won't know
that the DEFAULT partition must be dumped last and might therefore get
the dump ordering wrong, and of course also because it reverts commit
c1e0e7e1d790bf18c913e6a452dea811e858b554.

> 0004:
> Store the default partition OID in pg_partition_table, this will help us to
> retrieve the OID of default relation when we don't have the relation cache
> available. This was also suggested by Amit Langote here[1].

I looked this over and I think this is the right approach. An
alternative way to avoid needing a relcache entry in
heap_drop_with_catalog() would be for get_default_partition_oid() to
call find_inheritance_children() here and then use a syscache lookup
to get the partition bound for each one, but that's still going to
cause some syscache bloat.

> 0005:
> Extend default partitioning support to allow addition of new partitions.

+    if (spec->is_default)
+    {
+        /* Default partition cannot be added if there already exists one. */
+        if (partdesc->nparts > 0 && partition_bound_has_default(boundinfo))
+        {
+            with = boundinfo->default_index;
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                     errmsg("partition \"%s\" conflicts with existing default partition \"%s\"",
+                            relname, get_rel_name(partdesc->oids[with])),
+                     parser_errposition(pstate, spec->location)));
+        }
+
+        return;
+    }

I generally think it's good to structure the code so as to minimize
the indentation level. In this case, if you did if (partdesc->nparts
== 0 || !partition_bound_has_default(boundinfo)) return; first, then
the rest of it could be one level less indented. Also, perhaps it
would be clearer to test boundinfo == NULL rather than
partdesc->nparts == 0, assuming they are equivalent.

- * We must also lock the default partition, for the same reasons explained
- * in heap_drop_with_catalog().
+ * We must lock the default partition, for the same reasons explained in
+ * DefineRelation().

I don't really see the point of this change. Whichever earlier patch
adds this code could include or omit the word "also" as appropriate,
and then this patch wouldn't need to change it.

> 0006:
> Extend default partitioning validation code to reuse the refactored code in
> patch 0001.

I'm having a very hard time understanding what's going on with this
patch. It certainly doesn't seem to be just refactoring things to use
the code from 0001. For example:

- if (ExecCheck(partqualstate, econtext))
+ if (!ExecCheck(partqualstate, econtext))

It seems hard to believe that refactoring things to use the code from
0001 would involve inverting the value of this test.

+ * derived from the bounds(the partition constraint never evaluates to
+ * NULL, so negating it like this is safe).

I don't see it being negated.

I think this patch needs a better explanation of what it's trying to
do, and better comments. I gather that at least part of the point
here is to skip validation scans on default partitions if the default
partition has been constrained not to contain any values that would
fall in the new partition, but neither the commit message for 0006 nor
your description here make that very clear.

> 0007:
> This patch introduces code to check if the scanning of default partition
> child
> can be skipped if it's constraints are proven.

If I understand correctly, this is actually a completely separate
feature not intrinsically related to default partitioning.

> [0008 documentation]

- attached is marked <literal>NO INHERIT</literal>, the command will fail;
- such a constraint must be recreated without the <literal>NO INHERIT</literal>
- clause.
+ target table.
+ </para>

I don't favor inserting a paragraph break here.

+ then the default partition(if it is a regular table) is scanned to check

The sort-of-trivial problem with this is that an open parenthesis
should be preceded by a space. But I think this won't be clear. I
think you should move this below the following paragraph, which
describes what happens for foreign tables, and then add a new
paragraph like this:

    When a table has a default partition, defining a new partition
    changes the partition constraint for the default partition. The
    default partition can't contain any rows that would need to be moved
    to the new partition, and will be scanned to verify that none are
    present. This scan, like the scan of the new partition, can be
    avoided if an appropriate <literal>CHECK</literal> constraint is
    present. Also like the scan of the new partition, it is always
    skipped when the default partition is a foreign table.

-) ] FOR VALUES <replaceable class="PARAMETER">partition_bound_spec</replaceable>
+) ] { DEFAULT | FOR VALUES <replaceable class="PARAMETER">partition_bound_spec</replaceable> }

I recommend writing FOR VALUES | DEFAULT both here and in the ATTACH
PARTITION syntax summary.

+ If <literal>DEFAULT</literal> is specified the table will be created as a
+ default partition of the parent table. The parent can either be a list or
+ range partitioned table. A partition key value not fitting into any other
+ partition of the given parent will be routed to the default partition.
+ There can be only one default partition for a given parent table.
+ </para>
+
+ <para>
+ If the given parent is already having a default partition then adding a
+ new partition would result in an error if the default partition contains a
+ record that would fit in the new partition being added. This check is not
+ performed if the default partition is a foreign table.
+ </para>

The indentation isn't correct here - it doesn't match the surrounding
paragraphs. The bit about list or range partitioning doesn't match
the actual behavior of the other patches, but maybe you intended this
to document both this feature and what Beena's doing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
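The restructuring suggested for 0005 in the review above can be sketched as follows. This is a self-contained illustration only: MockBoundInfo, mock_bound_has_default() and check_new_default() are hypothetical stand-ins rather than names from the patch, the -1 sentinel for "no default partition" is an assumption, and the real code would raise the error with ereport() instead of returning an index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the field this check reads. */
typedef struct MockBoundInfo
{
    int default_index;  /* assumed: -1 when there is no default partition */
} MockBoundInfo;

/* Assumed equivalent of the partition_bound_has_default() test in the hunk. */
static bool
mock_bound_has_default(const MockBoundInfo *boundinfo)
{
    return boundinfo != NULL && boundinfo->default_index != -1;
}

/*
 * Check whether a new default partition conflicts with an existing one.
 * Returns the index of the conflicting default partition, or -1 if there
 * is no conflict.
 */
int
check_new_default(int nparts, const MockBoundInfo *boundinfo)
{
    assert(nparts >= 0);

    /* Early exit first, so the error path below needs no extra nesting. */
    if (nparts == 0 || !mock_bound_has_default(boundinfo))
        return -1;

    /* The real code would ereport(ERROR, ...) at this point. */
    return boundinfo->default_index;
}
```

With the early return in place, the error-reporting code sits at the top level of the function body instead of inside a second nested if.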
On Mon, Aug 14, 2017 at 7:51 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I think even with this change there will be one level of indentation
> needed for throwing the error, as the error is to be thrown only if
> there exists a default partition.

That's true, but we don't need two levels.

>> > 0007:
>> > This patch introduces code to check if the scanning of default partition
>> > child
>> > can be skipped if it's constraints are proven.
>>
>> If I understand correctly, this is actually a completely separate
>> feature not intrinsically related to default partitioning.
>
> I don't see this as a new feature, since scanning the default partition
> will be introduced by this series of patches only, and rather than a
> feature this can be classified as a completeness of default skip
> validation logic. Your thoughts?

Currently, when a partitioned table is attached, we check whether all
the scans can be skipped but not whether scans on some partitions can
be skipped. So there are two separate things:

1. When we introduce default partitioning, we need to scan the default
partition either when (a) any partition is attached or (b) any
partition is created.

2. In any situation where scans are needed (scanning the partition
when it's attached, scanning the default partition when some other
partition is attached, scanning the default when a new partition is
created), we can run predicate_implied_by for each partition to see
whether the scan of that partition can be skipped.

Those two changes are independent. We could do (1) without doing (2)
or (2) without doing (1) or we could do both. So they are separate
features.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
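Point 2 in the message above, deciding partition by partition whether a scan can be skipped, might look roughly like the sketch below. It deliberately swaps predicate_implied_by() for plain integer-range containment, which is a much simpler test than the real one, and every type and function name here (IntRange, MockPartition, range_implies(), count_partitions_to_scan()) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Half-open integer range [lo, hi); a toy stand-in for a constraint. */
typedef struct IntRange
{
    int lo;
    int hi;
} IntRange;

typedef struct MockPartition
{
    IntRange partconstr; /* values the partition is allowed to hold */
    bool     has_check;  /* does the table carry a CHECK constraint? */
    IntRange check;      /* values the CHECK constraint allows */
} MockPartition;

/* True if every value satisfying "a" also satisfies "b". */
static bool
range_implies(IntRange a, IntRange b)
{
    return a.lo >= b.lo && a.hi <= b.hi;
}

/*
 * Count the partitions that still need a validation scan.  A partition is
 * skipped when its existing CHECK constraint already implies its partition
 * constraint; the decision is made independently for each partition.
 */
int
count_partitions_to_scan(const MockPartition *parts, size_t nparts)
{
    int to_scan = 0;

    for (size_t i = 0; i < nparts; i++)
    {
        if (parts[i].has_check)
        {
            assert(parts[i].check.lo <= parts[i].check.hi);
            if (range_implies(parts[i].check, parts[i].partconstr))
                continue;   /* constraint proven, no scan needed */
        }
        to_scan++;
    }

    return to_scan;
}
```

Nothing in this loop depends on whether any of the partitions is a default partition, which is the sense in which the two changes are independent.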
On Wed, Jul 26, 2017 at 8:14 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have rebased the patches on the latest commit.

This needs another rebase.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
Hi Ashutosh,

Please find my feedback inlined.

On Fri, Jul 28, 2017 at 7:00 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Wed, Jul 26, 2017 at 5:44 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have rebased the patches on the latest commit.
>
Thanks for rebasing the patches. The patches apply and compile
cleanly. make check passes.
Here are some review comments
0001 patch
Most of this patch is same as 0002 patch posted in thread [1]. I have
extensively reviewed that patch for Amit Langote. Can you please compare these
two patches and try to address those comments OR just use patch from that
thread? For example, canSkipPartConstraintValidation() is named as
PartConstraintImpliedByRelConstraint() in that patch. OR
+ if (scanRel_constr == NULL)
+ return false;
+
is not there in that patch since returning false is wrong when partConstraint
is NULL. I think this patch needs those fixes. Also, this patch set would need
a rebase when 0001 from that thread gets committed.

I have renamed canSkipPartConstraintValidation() to
PartConstraintImpliedByRelConstraint() and made the other applicable changes
per Amit's patch. This patch also refactors the scanning logic in
ATExecAttachPartition() and moves it into a function
ValidatePartitionConstraints(), hence I could not use Amit's patch as it is.
Please have a look at the new patch and let me know if it looks fine to you.

0002 patch
+ if (!and_args)
+ result = NULL;
Add "NULL, if there are not partition constraints e.g. in case of default
partition as the only partition.".

Added. Please check.

This patch avoids calling
validatePartitionConstraints() and hence canSkipPartConstraintValidation() when
partConstraint is NULL, but patches in [1] introduce more callers of
canSkipPartConstraintValidation() which may pass NULL. So, it's better that we
handle that case.

The following code added in patch 0001 should now take care of this.
+ num_check = (constr != NULL) ? constr->num_check : 0;

0003 patch
+ parentRel = heap_open(parentOid, AccessExclusiveLock);
In [2], Amit Langote has given a reason as to why heap_drop_with_catalog()
should not heap_open() the parent relation. But this patch still calls
heap_open() without giving any counter argument. Also I don't see
get_default_partition_oid() using Relation anywhere. If you remove that
heap_open() please remove following heap_close().
+ heap_close(parentRel, NoLock);

As clarified earlier, this was addressed in the 0004 patch of the V24 series.
In the current set of patches this is now addressed in patch 0003 itself.

+ /*
+ * The default partition accepts any non-specified
+ * value, hence it should not get a mapped index while
+ * assigning those for non-null datums.
+ */
Instead of "any non-specified value", you may want to use "any value not
specified in the lists of other partitions" or something like that.

Changed the comment.

+ * If this is a NULL, route it to the null-accepting partition.
+ * Otherwise, route by searching the array of partition bounds.
You may want to write it as "If this is a null partition key, ..." to clarify
what's NULL.

Changed the comment.
+ * cur_index < 0 means we could not find a non-default partition of
+ * this parent. cur_index >= 0 means we either found the leaf
+ * partition, or the next parent to find a partition of.
+ *
+ * If we couldn't find a non-default partition check if the default
+ * partition exists, if it does, get its index.
In order to avoid repeating "we couldn't find a ..."; you may want to add ",
try default partition if one exists." in the first sentence itself.

Sorry, but I am not really sure how this change would make the comment
more readable than the current one.

get_default_partition_oid() is defined in this patch and then redefined in
0004. Let's define it only once, mostly in or before 0003 patch.

get_default_partition_oid() is now defined only once in patch 0003.
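The fallback described in the routing comment quoted above (cur_index < 0 means no non-default partition matched, so try the default one) can be sketched as a small standalone toy. This is only an illustration under assumptions: ToyListBounds and toy_route_key() are hypothetical names, the keys are plain ints, and none of this is the patch's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a list-partition descriptor. */
typedef struct ToyListBounds
{
    const int *values;        /* values accepted by non-default partitions */
    const int *part_index;    /* part_index[i]: partition holding values[i] */
    size_t     nvalues;
    int        default_index; /* index of the default partition, or -1 */
} ToyListBounds;

/*
 * Return the index of the partition accepting "key".  If no non-default
 * partition matches, fall back to the default partition when one exists.
 * Returns -1 when there is neither a match nor a default partition.
 */
static int
toy_route_key(const ToyListBounds *bounds, int key)
{
    int    cur_index = -1;
    size_t i;

    assert(bounds != NULL);

    for (i = 0; i < bounds->nvalues; i++)
    {
        if (bounds->values[i] == key)
        {
            cur_index = bounds->part_index[i];
            break;
        }
    }

    /* No non-default partition found: try the default one, if any. */
    if (cur_index < 0)
        cur_index = bounds->default_index;

    return cur_index;
}
```

With a parent that has one partition for (4, 5) at index 0 and a default partition at index 1, key 4 routes to 0 and key 9 routes to 1; without a default partition, key 9 yields -1, which is where the real code would report that no partition was found.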
+ * partition strategy. Assign the parent strategy to the default
s/parent/parent's/

Fixed.
+-- attaching default partition overlaps if the default partition already exists
+CREATE TABLE def_part PARTITION OF list_parted DEFAULT;
+CREATE TABLE fail_def_part (LIKE part_1 INCLUDING CONSTRAINTS);
+ALTER TABLE list_parted ATTACH PARTITION fail_def_part DEFAULT;
+ERROR: cannot attach a new partition to table "list_parted" having a
default partition
For 0003 patch this testcase is same as the testcase in the next hunk; no new
partition can be added after default partition. Please add this testcase in
next set of patches.

Though the error message is the same, the purpose of testing is different:
1. There cannot be more than one default partition,
2. and the other is to test the fact that a new partition cannot be added if the
default partition exists.
The latter test needs to be removed in the next patch where we add support for
adding a new partition even if a default partition exists.

+-- fail
+insert into part_default values ('aa', 2);
May be explain why the insert should fail. "A row, which would fit
other partition, does not fit default partition, even when inserted directly"
or something like that. I see that many of the tests in that file do not
explain why something should "fail" or be "ok", but may be it's better to
document the reason for better readability and future reference.

Added a comment.

+-- check in case of multi-level default partitioned table
s/in/the/ ?. Or you may want to reword it as "default partitioned partition in
multi-level partitioned table" as there is nothing like "default partitioned
table". May be we need a testcase where every level of a multi-level
partitioned table has a default partition.

I have changed the comment as well as added a test scenario where the
partition further has a default partition.

+-- drop default, as we need to add some more partitions to test tuple routing
Should be clubbed with the actual DROP statement?

This is needed in patch 0003, as it prevents adding/creating further partitions
to parent. This is removed in patch 0004.

+-- Check that addition or removal of any partition is correctly dealt with by
+-- default partition table when it is being used in cached plan.
Plan of a prepared statement gets cached only after it's executed 5 times.
Before that the statement gets invalidated but there's no cached plan that
gets invalidated. The test is fine here, but in order to test the cached plan
as mentioned in the comment, you will need to execute the statement 5 times
before executing drop statement. That's probably unnecessary, so just modify
the comment to say "prepared statements" instead of "cached plan".

Agree. Fixed.

0004 patch
The patch adds another column partdefid to catalog pg_partitioned_table. The
column gives OID of the default partition for a given partitioned table. This
means that the default partition's OID is stored at two places 1. in the
default partition table's pg_class entry and in pg_partitioned_table. There is
no way to detect when these two go out of sync. Keeping those two in sync is
also a maintenance burden. Given that default partition's OID is required only
while adding/dropping a partition, which is a less frequent operation, it won't
hurt to join a few catalogs (pg_inherits and pg_class in this case) to find out
the default partition's OID. That will be an occasional performance hit
worth the otherwise maintenance burden.

To avoid partdefid of pg_partitioned_table going out of sync during any
future developments I have added an assert in RelationBuildPartitionDesc()
in patch 0003 in V25 series. I believe DBAs are not supposed to alter any
catalog tables, hence instead of adding an error, I added an Assert to prevent
this breaking during development cycle.

We have similar kind of duplications in other catalogs e.g. pg_opfamily,
pg_operator etc. Also, per Robert [1], the other route of searching pg_class
and pg_inherits is going to cause some syscache bloat.
> 0002:
> This patch teaches the partitioning code to handle the NIL returned by
> get_qual_for_list().
> This is needed because a default partition will not have any constraints in
> case
> it is the only partition of its parent.
Perhaps it would be better to make validatePartConstraint() a no-op
when the constraint is empty rather than putting the logic in the
caller. Otherwise, every place that calls validatePartConstraint()
has to think about whether or not the constraint-is-NULL case needs to
be handled.
> 0003:
> Support for default partition with the restriction of preventing addition of
> any
> new partition after default partition.
This looks generally reasonable, but can't really be committed without
the later patches, because it might break pg_dump, which won't know
that the DEFAULT partition must be dumped last and might therefore get
the dump ordering wrong, and of course also because it reverts commit
c1e0e7e1d790bf18c913e6a452dea811e858b554.
> 0004:
> Store the default partition OID in pg_partition_table, this will help us to
> retrieve the OID of default relation when we don't have the relation cache
> available. This was also suggested by Amit Langote here[1].
I looked this over and I think this is the right approach. An
alternative way to avoid needing a relcache entry in
heap_drop_with_catalog() would be for get_default_partition_oid() to
call find_inheritance_children() here and then use a syscache lookup
to get the partition bound for each one, but that's still going to
cause some syscache bloat.
> 0005:
> Extend default partitioning support to allow addition of new partitions.
+ if (spec->is_default)
+ {
+     /* Default partition cannot be added if there already exists one. */
+     if (partdesc->nparts > 0 && partition_bound_has_default(boundinfo))
+     {
+         with = boundinfo->default_index;
+         ereport(ERROR,
+                 (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                  errmsg("partition \"%s\" conflicts with existing default partition \"%s\"",
+                         relname, get_rel_name(partdesc->oids[with])),
+                  parser_errposition(pstate, spec->location)));
+     }
+
+     return;
+ }
I generally think it's good to structure the code so as to minimize
the indentation level. In this case, if you did if (partdesc->nparts
== 0 || !partition_bound_has_default(boundinfo)) return; first, then
the rest of it could be one level less indented. Also, perhaps it
would be clearer to test boundinfo == NULL rather than
partdesc->nparts == 0, assuming they are equivalent.
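To make the suggested restructuring concrete, here is a minimal standalone sketch of the early-return shape. It is a toy under stated assumptions, not the patch: ToyBoundInfo and toy_check_new_default() are hypothetical names, a NULL boundinfo is assumed to mean "no partitions yet" (the equivalence the paragraph above only assumes), and the error report is replaced by a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the partition bound info; not the real struct. */
typedef struct ToyBoundInfo
{
    int default_index;   /* index of the default partition, or -1 if none */
} ToyBoundInfo;

/*
 * Decide whether a new DEFAULT partition may be added.  Returns true when
 * it is allowed.  Otherwise returns false and stores the index of the
 * existing default partition in *conflict_index, which is the point where
 * real code would raise the error.
 */
static bool
toy_check_new_default(const ToyBoundInfo *boundinfo, int *conflict_index)
{
    assert(conflict_index != NULL);

    /* Early exit: no partitions yet, or none of them is the default. */
    if (boundinfo == NULL || boundinfo->default_index < 0)
        return true;

    /* Error path, now one indentation level shallower. */
    *conflict_index = boundinfo->default_index;
    return false;
}
```

Testing the "nothing to do" condition first leaves the error path at the top level of the function instead of nested inside a second if.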
> 0006:
> Extend default partitioning validation code to reuse the refactored code in
> patch 0001.
I'm having a very hard time understanding what's going on with this
patch. It certainly doesn't seem to be just refactoring things to use
the code from 0001. For example:
- if (ExecCheck(partqualstate, econtext))
+ if (!ExecCheck(partqualstate, econtext))
It seems hard to believe that refactoring things to use the code from
0001 would involve inverting the value of this test.
+ * derived from the bounds(the partition constraint never evaluates to
+ * NULL, so negating it like this is safe).
I don't see it being negated.
I think this patch needs a better explanation of what it's trying to
do, and better comments. I gather that at least part of the point
here is to skip validation scans on default partitions if the default
partition has been constrained not to contain any values that would
fall in the new partition, but neither the commit message for 0006 nor
your description here make that very clear.
> [0008 documentation]
- attached is marked <literal>NO INHERIT</literal>, the command will fail;
- such a constraint must be recreated without the <literal>NO INHERIT</literal>
- clause.
+ target table.
+ </para>
I don't favor inserting a paragraph break here.
+ then the default partition(if it is a regular table) is scanned to check
The sort-of-trivial problem with this is that an open parenthesis
should be preceded by a space. But I think this won't be clear. I
think you should move this below the following paragraph, which
describes what happens for foreign tables, and then add a new
paragraph like this:
When a table has a default partition, defining a new partition changes
the partition constraint for the default partition. The default
partition can't contain any rows that would need to be moved to the
new partition, and will be scanned to verify that none are present.
This scan, like the scan of the new partition, can be avoided if an
appropriate <literal>CHECK</literal> constraint is present. Also like
the scan of the new partition, it is always skipped when the default
partition is a foreign table.
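As a rough illustration of the check that paragraph describes, the following standalone toy scans the rows of a default partition for any value that the new list partition would accept. Everything here is an assumption made for the example: toy_default_blocks_new_partition() is a hypothetical name, rows and bounds are plain ints, and the real scan works on tuples and partition constraints rather than arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the default partition holds at least one row whose key
 * is in the new partition's value list.  In that case the rows would have
 * to be moved, so defining the new partition should fail.
 */
static bool
toy_default_blocks_new_partition(const int *default_rows, size_t nrows,
                                 const int *new_values, size_t nvalues)
{
    size_t r;
    size_t v;

    assert(default_rows != NULL || nrows == 0);
    assert(new_values != NULL || nvalues == 0);

    for (r = 0; r < nrows; r++)
    {
        for (v = 0; v < nvalues; v++)
        {
            if (default_rows[r] == new_values[v])
                return true;    /* this row belongs to the new partition */
        }
    }

    return false;               /* nothing would need to be moved */
}
```

With rows 9 and 7 in the default partition, a new partition for (4, 5) passes the check, while a new partition for (7) is rejected.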
-) ] FOR VALUES <replaceable class="PARAMETER">partition_bound_spec</replaceable>
+) ] { DEFAULT | FOR VALUES <replaceable class="PARAMETER">partition_bound_spec</replaceable> }
I recommend writing FOR VALUES | DEFAULT both here and in the ATTACH
PARTITION syntax summary.
+ If <literal>DEFAULT</literal> is specified the table will be created as a
+ default partition of the parent table. The parent can either be a list or
+ range partitioned table. A partition key value not fitting into any other
+ partition of the given parent will be routed to the default partition.
+ There can be only one default partition for a given parent table.
+ </para>
+
+ <para>
+ If the given parent is already having a default partition then adding a
+ new partition would result in an error if the default partition contains a
+ record that would fit in the new partition being added. This check is not
+ performed if the default partition is a foreign table.
+ </para>
The indentation isn't correct here - it doesn't match the surrounding
paragraphs. The bit about list or range partitioning doesn't match
the actual behavior of the other patches, but maybe you intended this
to document both this feature and what Beena's doing.
>> > 0007:
>> > This patch introduces code to check if the scanning of default partition
>> > child
>> > can be skipped if it's constraints are proven.
>>
>> If I understand correctly, this is actually a completely separate
>> feature not intrinsically related to default partitioning.
>
> I don't see this as a new feature, since scanning the default partition
> will be introduced by this series of patches only, and rather than a
> feature this can be classified as a completeness of default skip
> validation logic. Your thoughts?
Currently, when a partitioned table is attached, we check whether all
the scans can be skipped but not whether scans on some partitions can
be skipped. So there are two separate things:
1. When we introduce default partitioning, we need to scan the default
partition either when (a) any partition is attached or (b) any
partition is created.
2. In any situation where scans are needed (scanning the partition
when it's attached, scanning the default partition when some other
partition is attached, scanning the default when a new partition is
created), we can run predicate_implied_by for each partition to see
whether the scan of that partition can be skipped.
Those two changes are independent. We could do (1) without doing (2)
or (2) without doing (1) or we could do both. So they are separate
features.
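Point 2 can be pictured with a deliberately simplified toy. Assume, purely for illustration, that the default partition carries a constraint of the form CHECK (a NOT IN (...)) and the new partition is FOR VALUES IN (...). The scan of the default partition is then unnecessary exactly when every new value is already excluded. The function name toy_can_skip_default_scan() and the int-array representation are hypothetical; the real decision would go through predicate_implied_by() on expression trees, not set containment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * excluded:   values the default partition's CHECK constraint rules out
 * new_values: values the partition being added will accept
 *
 * Return true if the CHECK constraint already guarantees that no row in
 * the default partition can belong to the new partition.
 */
static bool
toy_can_skip_default_scan(const int *excluded, size_t nexcluded,
                          const int *new_values, size_t nnew)
{
    size_t n;
    size_t e;

    assert(excluded != NULL || nexcluded == 0);
    assert(new_values != NULL || nnew == 0);

    for (n = 0; n < nnew; n++)
    {
        bool found = false;

        for (e = 0; e < nexcluded; e++)
        {
            if (excluded[e] == new_values[n])
            {
                found = true;
                break;
            }
        }

        /* One value not covered by the CHECK: a scan is still required. */
        if (!found)
            return false;
    }

    return true;
}
```

The function does not depend on whether a default partition exists at all, which mirrors the argument above that (1) and (2) are separate changes.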
On 17 August 2017 at 10:59, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> On Tue, Aug 15, 2017 at 7:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Wed, Jul 26, 2017 at 8:14 AM, Jeevan Ladhe
>> <jeevan.ladhe@enterprisedb.com> wrote:
>> > I have rebased the patches on the latest commit.
>>
>> This needs another rebase.
>
>
> I have rebased the patch and addressed your and Ashutosh comments on last
> set of patches.
>
> The current set of patches contains 6 patches as below:
>
> 0001:
> Refactoring existing ATExecAttachPartition code so that it can be used for
> default partitioning as well
>
> 0002:
> This patch teaches the partitioning code to handle the NIL returned by
> get_qual_for_list().
> This is needed because a default partition will not have any constraints in
> case
> it is the only partition of its parent.
>
> 0003:
> Support for default partition with the restriction of preventing addition of
> any
> new partition after default partition. This is a merge of 0003 and 0004 in
> V24 series.
>
> 0004:
> Extend default partitioning support to allow addition of new partitions
> after
> default partition is created/attached. This patch is a merge of patches
> 0005 and 0006 in V24 series to simplify the review process. The
> commit message has more details regarding what all is included.
>
> 0005:
> This patch introduces code to check if the scanning of default partition
> child
> can be skipped if it's constraints are proven.
>
> 0006:
> Documentation.
>
>
> PFA, and let me know in case of any comments.

Thanks. Applies fine, and I've been exercising the patch and it is
doing everything it's supposed to do.

I am, however, curious to know why the planner can't optimise the following:

SELECT * FROM mystuff WHERE mystuff = (1::int,'JP'::text,'blue'::text);

This exhaustively checks all partitions, but if I change it to:

SELECT * FROM mystuff WHERE (id, country, content) = (1::int,'JP'::text,'blue'::text);

It works fine.

The former filters like so:

((mystuff_default_1.*)::mystuff = ROW(1, 'JP'::text, 'blue'::text))

Shouldn't it instead do:

((mystuff_default_1.id, mystuff_default_1.country, mystuff_default_1.content)::mystuff = ROW(1, 'JP'::text, 'blue'::text))

So it's not really to do with this patch; it's just something I
noticed while testing.

Thom
On Thu, Aug 17, 2017 at 6:24 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have addressed following comments in V25 patch[1].

Committed 0001. Since that code seems to be changing about every 10
minutes, it seems best to get this refactoring out of the way before
it changes again.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
> Hi,
>
> On Tue, Aug 15, 2017 at 7:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jul 26, 2017 at 8:14 AM, Jeevan Ladhe
>> <jeevan.ladhe@enterprisedb.com> wrote:
>> > I have rebased the patches on the latest commit.
>>
>> This needs another rebase.
>
> I have rebased the patch and addressed your and Ashutosh comments on last
> set of patches.
>
> The current set of patches contains 6 patches as below:
>
> 0001:
> Refactoring existing ATExecAttachPartition code so that it can be used for
> default partitioning as well
>
> 0002:
> This patch teaches the partitioning code to handle the NIL returned by
> get_qual_for_list().
> This is needed because a default partition will not have any constraints in case
> it is the only partition of its parent.
>
> 0003:
> Support for default partition with the restriction of preventing addition of any
> new partition after default partition. This is a merge of 0003 and 0004 in
> V24 series.
>
> 0004:
> Extend default partitioning support to allow addition of new partitions after
> default partition is created/attached. This patch is a merge of patches
> 0005 and 0006 in V24 series to simplify the review process. The
> commit message has more details regarding what all is included.
>
> 0005:
> This patch introduces code to check if the scanning of default partition child
> can be skipped if it's constraints are proven.
>
> 0006:
> Documentation.

After patch 0001 in above series got committed[1], I have rebased the patches.
The attached set of patches now looks like below:
0001:
This patch teaches the partitioning code to handle the NIL returned by
get_qual_for_list().
This is needed because a default partition will not have any constraints in case
it is the only partition of its parent.
0002:
Support for default partition with the restriction of preventing addition of any
new partition after default partition. This is a merge of 0003 and 0004 in
V24 series.
0003:
Extend default partitioning support to allow addition of new partitions after
default partition is created/attached. This patch is a merge of patches
0005 and 0006 in V24 series to simplify the review process. The
commit message has more details regarding what all is included.
0004:
This patch introduces code to check if the scanning of default partition child
can be skipped if it's constraints are proven.
0005:
Documentation.
Regards,
Attachment
On Mon, Aug 21, 2017 at 4:47 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>
> Hi,
>
> On Thu, Aug 17, 2017 at 3:29 PM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>>
>> Hi,
>>
>> On Tue, Aug 15, 2017 at 7:20 PM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>>>
>>> On Wed, Jul 26, 2017 at 8:14 AM, Jeevan Ladhe
>>> <jeevan.ladhe@enterprisedb.com> wrote:
>>> > I have rebased the patches on the latest commit.
>>>
>>> This needs another rebase.
>>
>>
>> I have rebased the patch and addressed your and Ashutosh comments on last
>> set of patches.

Thanks for the rebased patches.

>>
>> The current set of patches contains 6 patches as below:
>>
>> 0001:
>> Refactoring existing ATExecAttachPartition code so that it can be used
>> for
>> default partitioning as well

* Returns an expression tree describing the passed-in relation's partition
- * constraint.
+ * constraint. If there are no partition constraints returns NULL e.g. in case
+ * default partition is the only partition.

The first sentence uses singular constraint. The second uses plural. Given that
partition bounds together form a single constraint we should use singular
constraint in the second sentence as well.

Do we want to add a similar comment in the prologue of
generate_partition_qual(). The current wording there seems to cover this case,
but do we want to explicitly mention this case?

+ if (!and_args)
+ result = NULL;

While this is correct, I am increasingly seeing (and_args != NIL) usage.

get_partition_qual_relid() is called from pg_get_partition_constraintdef(),

constr_expr = get_partition_qual_relid(relationId);
/* Quick exit if not a partition */
if (constr_expr == NULL)
PG_RETURN_NULL();

The comment is now wrong since a default partition may have no constraints. May
be rewrite it as simply, "Quick exit if no partition constraint."

generate_partition_qual() has three callers and all of them are capable of
handling NIL partition constraint for default partition. May be it's better to
mention in the commit message that we have checked that the callers of this
function can handle NIL partition constraint.

>>
>> 0002:
>> This patch teaches the partitioning code to handle the NIL returned by
>> get_qual_for_list().
>> This is needed because a default partition will not have any constraints
>> in case
>> it is the only partition of its parent.

If the partition being dropped is the default partition,
heap_drop_with_catalog() locks default partition twice, once as the default
partition and the second time as the partition being dropped. So, it will be
counted as locked twice. There doesn't seem to be any harm in this, since the
lock will be held till the transaction ends, by when all the locks will be
released.

Same is the case with cache invalidation message. If we are dropping default
partition, the cache invalidation message on "default partition" is not
required. Again this might be harmless, but better to avoid it.

A similar problem exists with ATExecDetachPartition(): the default partition
will be locked twice if it's being detached.

+ /*
+ * If this is a default partition, pg_partitioned_table must have it's
+ * OID as value of 'partdefid' for it's parent (i.e. rel) entry.
+ */
+ if (castNode(PartitionBoundSpec, boundspec)->is_default)
+ {
+ Oid partdefid;
+
+ partdefid = get_default_partition_oid(RelationGetRelid(rel));
+ Assert(partdefid == inhrelid);
+ }

An accidental change or database corruption may change the default
partition OID in pg_partitioned_table, and an Assert won't help in such a case.
May be we should throw an error or at least report a warning. If we throw an
error, the table will become useless (or even the database will become useless
if RelationBuildPartitionDesc is called from RelationCacheInitializePhase3() on
such a corrupted table). To avoid that we may raise a warning.

I am wondering whether we could avoid call to get_default_partition_oid() in
the above block, thus avoiding a sys cache lookup. The sys cache search
shouldn't be expensive since the cache should already have that entry, but
still if we can avoid it, we save some CPU cycles. The default partition OID is
stored in pg_partitioned_table catalog, which is looked up in
RelationGetPartitionKey(), a function which precedes RelationGetPartitionDesc()
everywhere. What if that RelationGetPartitionKey() also returns the default
partition OID and the common caller passes it to RelationGetPartitionDesc()?

+ /* A partition cannot be attached if there exists a default partition */
+ defaultPartOid = get_default_partition_oid(RelationGetRelid(rel));
+ if (OidIsValid(defaultPartOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("cannot attach a new partition to table \"%s\" having a default partition",
+ RelationGetRelationName(rel))));

get_default_partition_oid() searches the catalogs, which is not needed when we
have relation descriptor of the partitioned table (to which a new partition is
being attached). You should get the default partition OID from partition
descriptor. That will be cheaper.

+ /* If there isn't any constraint, show that explicitly */
+ if (partconstraintdef[0] == '\0')
+ printfPQExpBuffer(&tmpbuf, _("No partition constraint"));

I think we need to change the way we set partconstraintdef at

if (PQnfields(result) == 3)
partconstraintdef = PQgetvalue(result, 0, 2);

Before this commit, constraints will never be NULL so this code works, but now
that the constraints could be NULL, we need to check whether 3rd value is NULL
or not using PQgetisnull() and assigning a value only when it's not NULL.

+-- test adding default partition as first partition accepts any value including

grammar, reword as "test that a default partition added as the first
partition accepts any value including".

>>
>> 0003:
>> Support for default partition with the restriction of preventing addition
>> of any
>> new partition after default partition. This is a merge of 0003 and 0004 in
>> V24 series.

The commit message of this patch has following line, which no more applies to
patch 0001. May be you want to remove this line or update the patch number.
3. This patch uses the refactored functions created in patch 0001 in this
series.
Similarly the credit line refers to patch 0001. That too needs correction.

- * Also, invalidate the parent's relcache, so that the next rebuild will load
- * the new partition's info into its partition descriptor.
+ * Also, invalidate the parent's relcache entry, so that the next rebuild will
+ * load he new partition's info into its partition descriptor. If there is a
+ * default partition, we must invalidate its relcache entry as well.

Replacing "relcache" with "relcache entry" in the first sentence may be a good
idea, but is unrelated to this patch. I would leave that change aside and just
add comment about default partition.

+ /*
+ * The partition constraint for the default partition depends on the
+ * partition bounds of every other partition, so we must invalidate the
+ * relcache entry for that partition every time a partition is added or
+ * removed.
+ */
+ defaultPartOid = get_default_partition_oid(RelationGetRelid(parent));
+ if (OidIsValid(defaultPartOid))
+ CacheInvalidateRelcacheByRelid(defaultPartOid);

Again, since we have access to the parent's relcache, we should get the default
partition OID from relcache rather than catalogs.

I haven't gone through the full patch yet, so there may be more comments here.

There is some duplication of code in check_default_allows_bound() and
ValidatePartitionConstraints() to scan the children of partition being
validated. The difference is that the first one scans the relations in the same
function and the second adds them to work queue. May be we could use
ValidatePartitionConstraints() to scan the relation or add to the queue based
on some input flag may be wqueue argument itself. But I haven't thought through
this completely. Any thoughts?

>>
>> 0004:
>> Extend default partitioning support to allow addition of new partitions
>> after
>> default partition is created/attached. This patch is a merge of patches
>> 0005 and 0006 in V24 series to simplify the review process. The
>> commit message has more details regarding what all is included.
>>
>> 0005:
>> This patch introduces code to check if the scanning of default partition
>> child
>> can be skipped if it's constraints are proven.
>>
>> 0006:
>> Documentation.
>>
>

I will get to these patches in a short while.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment
On Thu, Aug 31, 2017 at 8:53 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> 0001:
> This patch refactors RelationBuildPartitionDesc(), basically this is patch
> 0001 of default range partition[1].

I spent a while studying this; it seems to be simpler and there's no
real downside. So, committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Sep 1, 2017 at 3:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Aug 31, 2017 at 8:53 AM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>> 0001:
>> This patch refactors RelationBuildPartitionDesc(), basically this is patch
>> 0001 of default range partition[1].
>
> I spent a while studying this; it seems to be simpler and there's no
> real downside. So, committed.

BTW, the rest of this series seems to need a rebase. The changes to
insert.sql conflicted with 30833ba154e0c1106d61e3270242dc5999a3e4f3.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
>>
>> The current set of patches contains 6 patches as below:
>>
>> 0001:
>> Refactoring existing ATExecAttachPartition code so that it can be used
>> for
>> default partitioning as well
* Returns an expression tree describing the passed-in relation's partition
- * constraint.
+ * constraint. If there are no partition constraints returns NULL e.g. in case
+ * default partition is the only partition.
The first sentence uses singular constraint. The second uses plural. Given that
partition bounds together form a single constraint we should use singular
constraint in the second sentence as well.
Do we want to add a similar comment in the prologue of
generate_partition_qual(). The current wording there seems to cover this case,
but do we want to explicitly mention this case?
+ if (!and_args)
+ result = NULL;
While this is correct, I am increasingly seeing (and_args != NIL) usage.
get_partition_qual_relid() is called from pg_get_partition_constraintdef(),
constr_expr = get_partition_qual_relid(relationId);
/* Quick exit if not a partition */
if (constr_expr == NULL)
PG_RETURN_NULL();
The comment is now wrong since a default partition may have no constraints. May
be rewrite it as simply, "Quick exit if no partition constraint."
generate_partition_qual() has three callers and all of them are capable of
handling NIL partition constraint for default partition. May be it's better to
mention in the commit message that we have checked that the callers of this
function can handle NIL partition constraint.
>>
>> 0002:
>> This patch teaches the partitioning code to handle the NIL returned by
>> get_qual_for_list().
>> This is needed because a default partition will not have any constraints
>> in case
>> it is the only partition of its parent.
If the partition being dropped is the default partition,
heap_drop_with_catalog() locks default partition twice, once as the default
partition and the second time as the partition being dropped. So, it will be
counted as locked twice. There doesn't seem to be any harm in this, since the
lock will be held till the transaction ends, by when all the locks will be
released.
Same is the case with cache invalidation message. If we are dropping default
partition, the cache invalidation message on "default partition" is not
required. Again this might be harmless, but better to avoid it.
A similar problem exists with ATExecDetachPartition(): the default partition
will be locked twice if it's being detached.
+ /*
+ * If this is a default partition, pg_partitioned_table must have it's
+ * OID as value of 'partdefid' for it's parent (i.e. rel) entry.
+ */
+ if (castNode(PartitionBoundSpec, boundspec)->is_default)
+ {
+ Oid partdefid;
+
+ partdefid = get_default_partition_oid(RelationGetRelid(rel));
+ Assert(partdefid == inhrelid);
+ }
An accidental change or database corruption may change the default
partition OID in pg_partitioned_table, and an Assert won't help in such a case.
May be we should throw an error or at least report a warning. If we throw an
error, the table will become useless (or even the database will become useless
if RelationBuildPartitionDesc is called from RelationCacheInitializePhase3() on
such a corrupted table). To avoid that we may raise a warning.
I am wondering whether we could avoid call to get_default_partition_oid() in
the above block, thus avoiding a sys cache lookup. The sys cache search
shouldn't be expensive since the cache should already have that entry, but
still if we can avoid it, we save some CPU cycles. The default partition OID is
stored in pg_partitioned_table catalog, which is looked up in
RelationGetPartitionKey(), a function which precedes RelationGetPartitionDesc()
everywhere. What if that RelationGetPartitionKey() also returns the default
partition OID and the common caller passes it to RelationGetPartitionDesc()?
+ /* A partition cannot be attached if there exists a default partition */
+ defaultPartOid = get_default_partition_oid(RelationGetRelid(rel));
+ if (OidIsValid(defaultPartOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("cannot attach a new partition to table
\"%s\" having a default partition",
+ RelationGetRelationName(rel))));
get_default_partition_oid() searches the catalogs, which is not needed when we
already have the relation descriptor of the partitioned table (to which a new
partition is being attached). You should get the default partition OID from the
partition descriptor. That will be cheaper.
+ /* If there isn't any constraint, show that explicitly */
+ if (partconstraintdef[0] == '\0')
+ printfPQExpBuffer(&tmpbuf, _("No partition constraint"));
I think we need to change the way we set partconstraintdef at
if (PQnfields(result) == 3)
partconstraintdef = PQgetvalue(result, 0, 2);
Before this commit, constraints could never be NULL so this code worked, but now
that the constraints can be NULL, we need to check whether the 3rd value is NULL
using PQgetisnull() and assign a value only when it's not NULL.
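The suggested NULL check can be sketched without a database connection. The snippet below is only an illustration under stated assumptions: ResultField and get_partconstraintdef() are hypothetical stand-ins, not libpq or psql code. In psql the two tests would be PQnfields(result) == 3 and !PQgetisnull(result, 0, 2).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one field of a single-row query result. */
typedef struct
{
    bool        isnull;         /* what PQgetisnull() would report */
    const char *value;          /* what PQgetvalue() would return; "" if null */
} ResultField;

/*
 * Return the partition constraint text, or NULL when the third column is
 * absent or is SQL NULL. The value is assigned only when it is not NULL,
 * so the caller can distinguish "no constraint" from an empty string.
 */
const char *
get_partconstraintdef(const ResultField *row, int nfields)
{
    if (nfields == 3 && !row[2].isnull)
        return row[2].value;
    return NULL;
}
```

With this shape, the "No partition constraint" message would be printed when the function returns NULL rather than when the first byte of the string is '\0'.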
+-- test adding default partition as first partition accepts any value including
Grammar: reword as "test that a default partition added as the first
partition accepts any value including".
>>
>> 0003:
>> Support for default partition with the restriction of preventing addition
>> of any
>> new partition after default partition. This is a merge of 0003 and 0004 in
>> V24 series.
The commit message of this patch has the following line, which no longer applies
to patch 0001. Maybe you want to remove this line or update the patch number.
3. This patch uses the refactored functions created in patch 0001
in this series.
Similarly the credit line refers to patch 0001. That too needs correction.
- * Also, invalidate the parent's relcache, so that the next rebuild will load
- * the new partition's info into its partition descriptor.
+ * Also, invalidate the parent's relcache entry, so that the next rebuild will
+ * load he new partition's info into its partition descriptor. If there is a
+ * default partition, we must invalidate its relcache entry as well.
Replacing "relcache" with "relcache entry" in the first sentence may be a good
idea, but is unrelated to this patch. I would leave that change aside and just
add comment about default partition.
+ /*
+ * The partition constraint for the default partition depends on the
+ * partition bounds of every other partition, so we must invalidate the
+ * relcache entry for that partition every time a partition is added or
+ * removed.
+ */
+ defaultPartOid = get_default_partition_oid(RelationGetRelid(parent));
+ if (OidIsValid(defaultPartOid))
+ CacheInvalidateRelcacheByRelid(defaultPartOid);
Again, since we have access to the parent's relcache, we should get the default
partition OID from relcache rather than catalogs.
I haven't gone through the full patch yet, so there may be more
comments here. There is some duplication of code in
check_default_allows_bound() and ValidatePartitionConstraints() to
scan the children of the partition being validated. The difference is that
the first one scans the relations in the same function and the second
adds them to the work queue. Maybe we could use
ValidatePartitionConstraints() to either scan the relation or add it to the
queue based on some input flag, maybe the wqueue argument itself. But I
haven't thought through this completely. Any thoughts?
Hi,
Attached is the rebased set of patches.
Robert has committed[1] patch 0001 in V26 series, hence the patch numbering
in V27 is now decreased by 1 for each patch as compared to V26.
Hi,
I applied the v27 patches and, while testing, made the observation below.
Observation: in the partitioned table below, d1's partition constraint does not
allow NULL in column b, but I am able to insert it.
Steps to reproduce:
create table d0 (a int, b int) partition by range(a,b);
create table d1 partition of d0 for values from (0,0) to (maxvalue,maxvalue);
postgres=# insert into d0 values (0,null);
INSERT 0 1
postgres=# \d+ d1
Table "public.d1"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
a | integer | | | | plain | |
b | integer | | | | plain | |
Partition of: d0 FOR VALUES FROM (0, 0) TO (MAXVALUE, MAXVALUE)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((a > 0) OR ((a = 0) AND (b >= 0))))
postgres=# select tableoid::regclass,* from d0;
tableoid | a | b
----------+---+---
d1 | 0 |
(1 row)
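To see why this output is contradictory, the displayed partition constraint can be evaluated by hand for the row (0, NULL). The following standalone sketch is only an illustration: NullableInt and d1_constraint_allows() are hypothetical names, and the function mirrors the constraint text printed by \d+ above rather than any backend code.

```c
#include <stdbool.h>

/* Nullable integer: a hypothetical stand-in for a SQL int column value. */
typedef struct
{
    bool isnull;
    int  value;
} NullableInt;

/*
 * Evaluate d1's displayed partition constraint:
 *   (a IS NOT NULL) AND (b IS NOT NULL) AND
 *   ((a > 0) OR ((a = 0) AND (b >= 0)))
 * The IS NOT NULL tests are checked first, so the comparisons below only
 * ever see non-null values.
 */
bool
d1_constraint_allows(NullableInt a, NullableInt b)
{
    if (a.isnull || b.isnull)
        return false;
    return a.value > 0 || (a.value == 0 && b.value >= 0);
}
```

For a = 0 and b = NULL the function returns false, so a row that the constraint rejects has nevertheless been routed into d1. That mismatch is the bug being reported.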
On Wed, Sep 6, 2017 at 5:25 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
> Attached is the rebased set of patches.
> Robert has committed[1] patch 0001 in V26 series, hence the patch numbering
> in V27 is now decreased by 1 for each patch as compared to V26.
I will work on a fix and send a patch shortly.
Attachment
> I will work on a fix and send a patch shortly.
Attached is the V28 patch that fixes the issue reported by Rajkumar.
The patch series is exactly the same as the V27 series[1].
The fix is in patch 0002, and the macro partition_bound_has_default() is
again moved into 0002 from 0003, as the fix needed to use it.
Attachment
On Wed, Sep 6, 2017 at 5:50 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>
>>
>> I am wondering whether we could avoid call to get_default_partition_oid()
>> in
>> the above block, thus avoiding a sys cache lookup. The sys cache search
>> shouldn't be expensive since the cache should already have that entry, but
>> still if we can avoid it, we save some CPU cycles. The default partition
>> OID is
>> stored in pg_partition_table catalog, which is looked up in
>> RelationGetPartitionKey(), a function which precedes
>> RelationGetPartitionDesc()
>> everywhere. What if that RelationGetPartitionKey() also returns the
>> default
>> partition OID and the common caller passes it to
>> RelationGetPartitionDesc()?.
>
>
> The purpose here is to cross check the relid with partdefid stored in
> catalog
> pg_partitioned_table, though its going to be the same in the parents cache,
> I
> think its better that we retrieve it from the catalog syscache.
> Further, RelationGetPartitionKey() is a macro and not a function, so
> modifying
> the existing simple macro for this reason does not sound a good idea to me.
> Having said this I am open to ideas here.
Sorry, I meant RelationBuildPartitionKey() and
RelationBuildPartitionDesc() instead of RelationGetPartitionKey() and
RelationGetPartitionDesc() resp.
>
>>
>> + /* A partition cannot be attached if there exists a default partition
>> */
>> + defaultPartOid = get_default_partition_oid(RelationGetRelid(rel));
>> + if (OidIsValid(defaultPartOid))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>> + errmsg("cannot attach a new partition to table
>> \"%s\" having a default partition",
>> + RelationGetRelationName(rel))));
>> get_default_partition_oid() searches the catalogs, which is not needed
>> when we
>> have relation descriptor of the partitioned table (to which a new
>> partition is
>> being attached). You should get the default partition OID from partition
>> descriptor. That will be cheaper.
>
>
> Something like following can be done here:
> /* A partition cannot be attached if there exists a default partition */
> if (partition_bound_has_default(rel->partdesc->boundinfo))
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> errmsg("cannot attach a new partition to table \"%s\"
> having a default partition",
> RelationGetRelationName(rel))));
>
> But, partition_bound_has_default() is defined in partition.c and not in
> partition.h. This is done that way because boundinfo is not available in
> partition.h. Further, this piece of code is removed in next patch where we
> extend default partition support to add/attach partition even when default
> partition exists. So, to me I don’t see much of the correction issue here.
If the code is being removed, I don't think we should sweat too much
about it. Sorry for the noise.
>
> Another way to get around this is, we can define another version of
> get_default_partition_oid() something like
> get_default_partition_oid_from_parent_rel()
> in partition.c which looks around in relcache instead of catalog and returns
> the
> oid of default partition, or get_default_partition_oid() accepts both parent
> OID,
> and parent ‘Relation’ rel, if rel is not null look into relcahce and return,
> else search from catalog using OID.
I think we should define a function to return the default partition OID
from the partition descriptor and make it extern. Define a wrapper which
accepts a Relation and calls this function to get the default
partition OID from its partition descriptor. The wrapper will be called
only on an open Relation, wherever it's available.
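The function-plus-wrapper arrangement described above might look roughly like the following standalone sketch. Everything here is an assumption made for illustration: the structs are simplified stand-ins for the backend's partition descriptor and relation structures, and the names default_oid_from_partdesc() and default_oid_from_rel() are hypothetical.

```c
#include <stddef.h>

typedef unsigned int Oid;               /* stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)

/* Simplified, hypothetical stand-ins for the backend structures. */
typedef struct
{
    int default_index;                  /* index of default partition, or -1 */
} PartitionBoundInfoData;

typedef struct
{
    int                     nparts;
    Oid                    *oids;       /* partition OIDs, by index */
    PartitionBoundInfoData *boundinfo;
} PartitionDescData;

typedef struct
{
    PartitionDescData *rd_partdesc;     /* NULL if not partitioned */
} RelationData;

/* Return the default partition's OID from a partition descriptor. */
Oid
default_oid_from_partdesc(const PartitionDescData *partdesc)
{
    if (partdesc != NULL && partdesc->boundinfo != NULL &&
        partdesc->boundinfo->default_index >= 0)
        return partdesc->oids[partdesc->boundinfo->default_index];
    return InvalidOid;
}

/* Wrapper for callers that already hold an open relation. */
Oid
default_oid_from_rel(const RelationData *rel)
{
    return default_oid_from_partdesc(rel->rd_partdesc);
}
```

Both functions read only cached data, so no catalog or syscache lookup is involved. A caller that has just an OID and no open relation would still need the catalog-based get_default_partition_oid().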
>
>> I haven't gone through the full patch yet, so there may be more
>> comments here. There is some duplication of code in
>> check_default_allows_bound() and ValidatePartitionConstraints() to
>> scan the children of partition being validated. The difference is that
>> the first one scans the relations in the same function and the second
>> adds them to work queue. May be we could use
>> ValidatePartitionConstraints() to scan the relation or add to the
>> queue based on some input flag may be wqueue argument itself. But I
>> haven't thought through this completely. Any thoughts?
>
>
> check_default_allows_bound() is called only from DefineRelation(),
> and not for alter command. I am not really sure how can we use
> work queue for create command.
No, we shouldn't use the work queue for the CREATE command. We should extract
the common code into a function to be called from
check_default_allows_bound() and ValidatePartitionConstraints(). To
that function we pass a flag (or the work queue argument itself),
which decides whether to add a work queue item or scan the relation
directly.
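As a rough illustration of that proposal, the sketch below uses the work queue argument itself as the flag. It is a standalone model, not backend code: WorkQueue, validate_or_queue() and the immediate_scans counter are hypothetical, and the counter merely stands in for actually scanning the relation.

```c
#include <stddef.h>

#define MAX_QUEUE 8

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

/* Hypothetical work queue: relations to validate in a later phase. */
typedef struct
{
    int nitems;
    Oid relids[MAX_QUEUE];
} WorkQueue;

/* Counts immediate scans; stands in for actually scanning the heap. */
int immediate_scans = 0;

/*
 * Shared helper: with a work queue (the ALTER TABLE path), defer the
 * validation by queueing the relation; without one (the CREATE TABLE
 * path), scan right away. Returns 0 on success, -1 if the queue is full.
 */
int
validate_or_queue(Oid relid, WorkQueue *wqueue)
{
    if (wqueue == NULL)
    {
        immediate_scans++;      /* scan the relation directly */
        return 0;
    }
    if (wqueue->nitems >= MAX_QUEUE)
        return -1;
    wqueue->relids[wqueue->nitems++] = relid;
    return 0;
}
```

Passing NULL for the queue selects the direct scan, so the CREATE path never touches a work queue while both callers share the code that walks the children.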
Attachment
On Thu, Sep 7, 2017 at 8:13 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> The fix would be much easier if the refactoring patch 0001 by Amul in hash
> partitioning thread[2] is committed.
Done.
On Fri, Sep 8, 2017 at 10:08 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Thanks Robert for taking care of this.
> My V29 patch series[1] is based on this commit now.
Committed 0001-0003, 0005 with assorted modifications, mostly
cosmetic, but with some actual changes to describeOneTableDetails and
ATExecAttachPartition and minor additions to the regression tests.