Thread: [HACKERS] Adding support for Default partition in partitioning

[HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hello,

Currently, inserting a row into a partitioned table fails if the row does not
fit into any of the table's partitions.

The attached patch adds the ability to create a default partition for a
list-partitioned table, as follows.

postgres=# CREATE TABLE list_partitioned (              
    a int
) PARTITION BY LIST (a);
CREATE TABLE

postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT);
CREATE TABLE

postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4,5);
CREATE TABLE

postgres=# insert into list_partitioned values (9);
INSERT 0 1

postgres=# select * from part_default;
 a
---
 9
(1 row)

The attached patch is at a preliminary stage and has the following ToDos:
1. Adding pg_dump support.
2. Documentation.
3. Handling the addition of a new partition to a partitioned table
   that has a default partition.
   This will require moving tuples from the existing default partition to
   the newly created partition if they satisfy its partition bound.
4. Handling an update of the partition key in a default partition. As per the
   current design, it should throw an error if the update requires the tuple to
   be moved to another partition. But this can be changed by the following proposal.

https://www.postgresql.org/message-id/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com


I am adding it to the current commitfest with the status Waiting on Author, as I will submit an updated patch that addresses the above ToDos.
Kindly give your suggestions.

Thank you,
Rahila Syed
Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Wed, Mar 1, 2017 at 6:29 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> 3. Handling adding a new partition to a partitioned table
>    with default partition.
>    This will require moving tuples from existing default partition to
>   newly created partition if they satisfy its partition bound.

Considering that this patch was submitted at the last minute and isn't
even complete, I can't see this getting into v10.  But that doesn't
mean we can't talk about it.  I'm curious to hear other opinions on
whether we should have this feature.  On the point mentioned above, I
don't think adding a partition should move tuples, necessarily; seems
like it would be good enough - maybe better - for it to fail if there
are any that would need to be moved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
David Fetter
Date:
On Fri, Mar 03, 2017 at 08:10:52AM +0530, Robert Haas wrote:
> On Wed, Mar 1, 2017 at 6:29 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> > 3. Handling adding a new partition to a partitioned table
> >    with default partition.
> >    This will require moving tuples from existing default partition to
> >   newly created partition if they satisfy its partition bound.
> 
> Considering that this patch was submitted at the last minute and isn't
> even complete, I can't see this getting into v10.  But that doesn't
> mean we can't talk about it.  I'm curious to hear other opinions on
> whether we should have this feature.  On the point mentioned above, I
> don't think adding a partition should move tuples, necessarily; seems
> like it would be good enough - maybe better - for it to fail if there
> are any that would need to be moved.

I see this as a bug fix.

The current state of declarative partitions is such that you need way
too much foresight in order to use them.  Missed adding a partition?
Writes fail and can't be made to succeed.  This is not a failure mode
we should be forcing on people, especially as it's a massive
regression from the extant inheritance-based partitioning.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Adding support for Default partition in partitioning

From
Keith Fiske
Date:
On Thu, Mar 2, 2017 at 9:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 1, 2017 at 6:29 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> > 3. Handling adding a new partition to a partitioned table
> >    with default partition.
> >    This will require moving tuples from existing default partition to
> >   newly created partition if they satisfy its partition bound.
>
> Considering that this patch was submitted at the last minute and isn't
> even complete, I can't see this getting into v10.  But that doesn't
> mean we can't talk about it.  I'm curious to hear other opinions on
> whether we should have this feature.  On the point mentioned above, I
> don't think adding a partition should move tuples, necessarily; seems
> like it would be good enough - maybe better - for it to fail if there
> are any that would need to be moved.

I'm all for this feature and had suggested it back in the original thread to add partitioning to 10. I agree that adding a new partition should not move any data out of the default. It's easy enough to set up a monitor to watch for data existing in the default. Perhaps a column could also be added to pg_partitioned_table that contains the OID of the default partition, so that it's easier to identify from a system catalog perspective and that monitoring becomes easier.

I don't see a need for it to fail either, and I'm not quite sure how that would even work. If they can't add a necessary child due to data being in the default, how can they ever get it out? Just leave it to the user to keep an eye on the default and fix it as necessary. This is what I do in pg_partman.

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jim Nasby
Date:
On 3/7/17 10:30 AM, Keith Fiske wrote:
> I'm all for this feature and had suggested it back in the original

FWIW, I was working with a system just today that has an overflow partition.

> thread to add partitioning to 10. I agree that adding a new partition
> should not move any data out of the default. It's easy enough to set up

+1

> a monitor to watch for data existing in the default. Perhaps also adding
> a column to pg_partitioned_table that contains the oid of the default
> partition so it's easier to identify from a system catalog perspective
> and make that monitoring easier. I don't even see a need for it to fail

I agree that there should be a way to identify the default partition.

> either and not quite sure how that would even work? If they can't add a
> necessary child due to data being in the default, how can they ever get
> it out?

Yeah, was wondering that as well...
-- 
Jim Nasby, Chief Data Architect, OpenSCG
http://OpenSCG.com



Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
> I agree that adding a new partition should not move any data out of the default. It's easy enough to set up a monitor to watch for data existing in the default. Perhaps also adding a column to pg_partitioned_table that contains the oid of the default partition so it's easier to identify from a system catalog perspective and make that monitoring easier.

Won't it incur the overhead of scanning the default partition for matching rows each time a select happens on any matching partition?
This extra scan would be required for as long as rows that satisfy the newly added partition's bound remain in the default partition.

> I don't even see a need for it to fail either and not quite sure how that would even work? If they can't add a necessary child due to data being in the default, how can they ever get it out? Just leave it to the user to keep an eye on the default and fix it as necessary.

This patch intends to provide a way to insert data that does not satisfy any of the existing partitions. For this patch, we can disallow adding a new partition when a default partition with conflicting rows exists. There can be many solutions to the problem of adding a new partition. One is to move the relevant tuples from the default to the new partition; another, as you suggest, is to keep monitoring the default partition until the user moves the rows out of it.
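
The two options for adding a partition can be sketched as follows. This is only an illustration over plain integer arrays, not code from the patch; in_list(), move_matching_rows(), and can_add_without_moving() are hypothetical names, and NULL keys are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if key is one of the n values listed for the new partition. */
static bool
in_list(const int *values, size_t n, int key)
{
    size_t      i;

    for (i = 0; i < n; i++)
    {
        if (values[i] == key)
            return true;
    }
    return false;
}

/*
 * Option 1: move rows.  Rows of the default partition that match the new
 * partition's list are copied to new_rows (caller-provided, with room for
 * *ndefault entries); the remaining rows are compacted in place.
 * Returns the number of rows moved and updates *ndefault.
 */
static size_t
move_matching_rows(int *default_rows, size_t *ndefault,
                   const int *new_values, size_t nvalues,
                   int *new_rows)
{
    size_t      kept = 0;
    size_t      moved = 0;
    size_t      i;

    assert(ndefault != NULL);

    for (i = 0; i < *ndefault; i++)
    {
        if (in_list(new_values, nvalues, default_rows[i]))
            new_rows[moved++] = default_rows[i];
        else
            default_rows[kept++] = default_rows[i];
    }
    *ndefault = kept;
    return moved;
}

/*
 * Option 2: refuse.  The new partition may be added only if no row in the
 * default partition matches its list.
 */
static bool
can_add_without_moving(const int *default_rows, size_t ndefault,
                       const int *new_values, size_t nvalues)
{
    size_t      i;

    for (i = 0; i < ndefault; i++)
    {
        if (in_list(new_values, nvalues, default_rows[i]))
            return false;
    }
    return true;
}
```

With option 2 the user has to clear the conflicting rows out of the default partition before retrying; with option 1 the cost of the scan and the row movement is paid when the partition is added.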

Thank you,
Rahila Syed


Re: [HACKERS] Adding support for Default partition in partitioning

From
Peter Eisentraut
Date:
On 3/2/17 21:40, Robert Haas wrote:
> On the point mentioned above, I
> don't think adding a partition should move tuples, necessarily; seems
> like it would be good enough - maybe better - for it to fail if there
> are any that would need to be moved.

ISTM that the use cases of various combinations of adding a default
partition, adding another partition after it, removing the default
partition, clearing out the default partition in order to add more
nondefault partitions, and so on, need to be more clearly spelled out
for each partitioning type.  We also need to consider that pg_dump and
pg_upgrade need to be able to reproduce all those states.  Seems to be a
bit of work still ...

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Fri, Mar 10, 2017 at 2:17 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 3/2/17 21:40, Robert Haas wrote:
>> On the point mentioned above, I
>> don't think adding a partition should move tuples, necessarily; seems
>> like it would be good enough - maybe better - for it to fail if there
>> are any that would need to be moved.
>
> ISTM that the uses cases of various combinations of adding a default
> partition, adding another partition after it, removing the default
> partition, clearing out the default partition in order to add more
> nondefault partitions, and so on, need to be more clearly spelled out
> for each partitioning type.  We also need to consider that pg_dump and
> pg_upgrade need to be able to reproduce all those states.  Seems to be a
> bit of work still ...

This patch is only targeting list partitioning.  It is not entirely
clear that the concept makes sense for range partitioning; you can
already define a partition from the end of the last partition up to
infinity, or from minus-infinity up to the starting point of the first
partition.  The only thing a default range partition would do is let
you have a single partition cover all of the ranges that would
otherwise be unassigned, which might not even be something we want.

I don't know how complete the patch is, but the specification seems
clear enough.  If you have partitions for 1, 3, and 5, you get
partition constraints of (a = 1), (a = 3), and (a = 5).  If you add a
default partition, you get a constraint of (a != 1 and a != 3 and a !=
5).  If you then add a partition for 7, the default partition's
constraint becomes (a != 1 and a != 3 and a != 5 and a != 7).  The
partition must be revalidated at that point for conflicting rows,
which we can either try to move to the new partition, or just error
out if there are any, depending on what we decide we want to do.  I
don't think any of that requires any special handling for either
pg_dump or pg_upgrade; it all just falls out of getting the
partitioning constraints correct and consistently enforcing them, just
as for any other partition.
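
To make the specification above concrete, here is a minimal sketch of how the default partition's constraint follows from the values claimed by the other partitions. It is a standalone illustration, not code from the patch: the function names are hypothetical, the key column is assumed to be an int named a, at least one value is assumed to be claimed, and NULL handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True if key equals a value already claimed by a non-default partition. */
static bool
claimed_by_listed_partition(const int *claimed, size_t n, int key)
{
    size_t      i;

    for (i = 0; i < n; i++)
    {
        if (claimed[i] == key)
            return true;
    }
    return false;
}

/* The default partition's constraint: key matches none of the claimed values. */
static bool
default_partition_accepts(const int *claimed, size_t n, int key)
{
    return !claimed_by_listed_partition(claimed, n, key);
}

/*
 * Render the constraint as text, e.g. "(a != 1 and a != 3)".
 * Returns false if buf is too small to hold the result.
 */
static bool
format_default_constraint(const int *claimed, size_t n,
                          char *buf, size_t buflen)
{
    size_t      used;
    size_t      i;
    int         written;

    assert(buf != NULL && buflen > 0);

    written = snprintf(buf, buflen, "(");
    if (written < 0 || (size_t) written >= buflen)
        return false;
    used = (size_t) written;

    for (i = 0; i < n; i++)
    {
        written = snprintf(buf + used, buflen - used, "%sa != %d",
                           i == 0 ? "" : " and ", claimed[i]);
        if (written < 0 || (size_t) written >= buflen - used)
            return false;
        used += (size_t) written;
    }

    written = snprintf(buf + used, buflen - used, ")");
    if (written < 0 || (size_t) written >= buflen - used)
        return false;
    return true;
}
```

Adding a partition for 7 simply extends the claimed list from {1, 3, 5} to {1, 3, 5, 7}; any row with a = 7 already stored in the default partition then violates the new constraint, which is why the partition must be revalidated at that point.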

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hello,

Please find attached a rebased patch with support for pg_dump. I am working on the patch
to handle adding a new partition after a default partition by throwing an error if
conflicting rows exist in default partition and adding the partition successfully otherwise.
Will post an updated patch by tomorrow.

Thank you,
Rahila Syed


Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Rushabh Lathia
Date:
I picked this up for review and noticed that the patch does not
compile cleanly in my environment.

partition.c: In function ‘RelationBuildPartitionDesc’:
partition.c:269:6: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
      Const    *val = lfirst(c);
      ^
partition.c: In function ‘generate_partition_qual’:
partition.c:1590:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
  PartitionBoundSpec *spec = (PartitionBoundSpec *)bound;
  ^
partition.c: In function ‘get_partition_for_tuple’:
partition.c:1820:5: error: array subscript has type ‘char’ [-Werror=char-subscripts]
     result = parent->indexes[partdesc->boundinfo->def_index];
     ^
partition.c:1824:16: error: assignment makes pointer from integer without a cast [-Werror]
     *failed_at = RelationGetRelid(parent->reldesc);
                ^
cc1: all warnings being treated as errors
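
The first two errors are raised when a declaration follows a statement in the same block, and the third when an array subscript has type char. A minimal sketch of the usual remedies, using a hypothetical Bound struct and find_default_index() function that are not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bound: either the DEFAULT marker or one list value. */
typedef struct Bound
{
    bool        is_default;
    int         value;
} Bound;

/*
 * Return the position of the default bound, or -1 if there is none.
 * All declarations sit at the top of the block (C90 style), and the index
 * is a plain int, so it can be used as an array subscript without
 * tripping -Wchar-subscripts.
 */
static int
find_default_index(const Bound *bounds, int nbounds)
{
    int         def_index = -1;
    int         i;

    assert(bounds != NULL || nbounds == 0);

    for (i = 0; i < nbounds; i++)
    {
        if (bounds[i].is_default)
        {
            def_index = i;
            break;
        }
    }

    return def_index;
}
```

The fourth error is of a different kind: an integer value is being stored through a pointer whose target is itself a pointer, so the fix there depends on what the target is meant to hold.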

Apart from this, while reading the patch I had a few more comments:

1) The variables are initialized in two places.

@@ -166,6 +170,8 @@ RelationBuildPartitionDesc(Relation rel)
     /* List partitioning specific */
     PartitionListValue **all_values = NULL;
     bool        found_null = false;
+    bool        found_def = false;
+    int            def_index = -1;
     int            null_index = -1;
 
     /* Range partitioning specific */
@@ -239,6 +245,8 @@ RelationBuildPartitionDesc(Relation rel)
             i = 0;
             found_null = false;
             null_index = -1;
+            found_def = false;
+            def_index = -1;
             foreach(cell, boundspecs)
             {
                 ListCell   *c;
@@ -249,6 +257,15 @@ RelationBuildPartitionDesc(Relation rel)


2)

@@ -1558,6 +1586,19 @@ generate_partition_qual(Relation rel)
     bound = stringToNode(TextDatumGetCString(boundDatum));
     ReleaseSysCache(tuple);
 
+    /* Return if it is a default list partition */
+    PartitionBoundSpec *spec = (PartitionBoundSpec *)bound;
+    ListCell *cell;
+    foreach(cell, spec->listdatums)

The above hunk needs more comments.

Rather than adding the check for default here, I think this should be handled inside
get_qual_for_list().

3) The code is not aligned with the existing code

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 6316688..ebb7db7 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2594,6 +2594,7 @@ partbound_datum:
             Sconst            { $$ = makeStringConst($1, @1); }
             | NumericOnly    { $$ = makeAConst($1, @1); }
             | NULL_P        { $$ = makeNullAConst(@1); }
+            | DEFAULT  { $$ = (Node *)makeDefElem("DEFAULT", NULL, @1); }
         ;


4) Unnecessary hunk:

@@ -2601,7 +2602,6 @@ partbound_datum_list:
             | partbound_datum_list ',' partbound_datum
                                                 { $$ = lappend($1, $3); }
         ;
-

Note: these are just initial review comments; I have yet to do the detailed review
and testing of the patch.

Thanks.


--
Rushabh Lathia

Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hello Rushabh,

Thank you for reviewing.
I have addressed all your comments in the attached patch. The attached patch currently throws an
error if a new partition is added after a default partition.

> Rather then adding check for default here, I think this should be handle inside
> get_qual_for_list().

I have moved the check inside get_qual_for_partbound(), as some operations need to be done
before calling get_qual_for_list() for default partitions.

Thank you,
Rahila Syed


Attachment

Re: Adding support for Default partition in partitioning

From
Rushabh Lathia
Date:
I applied the patch and was trying to perform some testing, but it
ends up with a server crash with the test you shared in your first mail:

postgres=# CREATE TABLE list_partitioned (             
postgres(#     a int
postgres(# ) PARTITION BY LIST (a);
CREATE TABLE
postgres=#
postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT);
CREATE TABLE

postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4,5);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Apart from this, the patch needs more explanation of the changes for the
DEFAULT partition. I am not quite sure what exactly the latest version of the
patch supports: does it support tuple row movement, or will adding a new
partition be allowed on a partitioned table that has a DEFAULT partition?
That is quite difficult to understand from the code changes alone.

Another part missing from the patch is test coverage; proper test coverage
would also explain what is supported and what is not.

Thanks,

On Fri, Mar 24, 2017 at 3:25 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hello Rushabh,

Thank you for reviewing.
Have addressed all your comments in the attached patch. The attached patch currently throws an
error if a new partition is added after default partition.

>Rather then adding check for default here, I think this should be handle inside
>get_qual_for_list().
Have moved the check inside get_qual_for_partbound() as needed to do some operations
before calling get_qual_for_list() for default partitions.

Thank you,
Rahila Syed

On Tue, Mar 21, 2017 at 11:36 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
I picked this for review and noticed that patch is not getting
cleanly complied on my environment.

partition.c: In function ‘RelationBuildPartitionDesc’:
partition.c:269:6: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
      Const    *val = lfirst(c);
      ^
partition.c: In function ‘generate_partition_qual’:
partition.c:1590:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
  PartitionBoundSpec *spec = (PartitionBoundSpec *)bound;
  ^
partition.c: In function ‘get_partition_for_tuple’:
partition.c:1820:5: error: array subscript has type ‘char’ [-Werror=char-subscripts]
     result = parent->indexes[partdesc->boundinfo->def_index];
     ^
partition.c:1824:16: error: assignment makes pointer from integer without a cast [-Werror]
     *failed_at = RelationGetRelid(parent->reldesc);
                ^
cc1: all warnings being treated as errors
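
The first two errors above come from declaring a variable after a statement in the same block, which C90 forbids and which this build turns into an error via -Werror=declaration-after-statement. A minimal standalone sketch of the usual fix follows; the function sum_positive is hypothetical and not taken from the patch. Declarations are hoisted to the top of their block and assigned later.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: sums the positive entries
 * of an array.  Every declaration sits at the top of its block, so the
 * code compiles with -Werror=declaration-after-statement.
 */
int
sum_positive(const int *vals, size_t n)
{
    size_t      i;          /* declared before any statement */
    int         total;

    total = 0;
    for (i = 0; i < n; i++)
    {
        int         v = vals[i];    /* allowed: first thing in a new block */

        if (v > 0)
            total += v;
    }
    return total;
}
```

Writing "total = 0;" first and "size_t i;" afterwards in the same block would trigger the same diagnostic shown in the compiler output.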

Apart from this, while reading the patch I have a few more comments:

1) Variables are initialized in two places.

@@ -166,6 +170,8 @@ RelationBuildPartitionDesc(Relation rel)
     /* List partitioning specific */
     PartitionListValue **all_values = NULL;
     bool        found_null = false;
+    bool        found_def = false;
+    int            def_index = -1;
     int            null_index = -1;
 
     /* Range partitioning specific */
@@ -239,6 +245,8 @@ RelationBuildPartitionDesc(Relation rel)
             i = 0;
             found_null = false;
             null_index = -1;
+            found_def = false;
+            def_index = -1;
             foreach(cell, boundspecs)
             {
                 ListCell   *c;
@@ -249,6 +257,15 @@ RelationBuildPartitionDesc(Relation rel)


2)

@@ -1558,6 +1586,19 @@ generate_partition_qual(Relation rel)
     bound = stringToNode(TextDatumGetCString(boundDatum));
     ReleaseSysCache(tuple);
 
+    /* Return if it is a default list partition */
+    PartitionBoundSpec *spec = (PartitionBoundSpec *)bound;
+    ListCell *cell;
+    foreach(cell, spec->listdatums)

I think the above hunk needs more comments.

Rather than adding a check for default here, I think this should be handled inside
get_qual_for_list().

3) Code is not aligned with the existing code

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 6316688..ebb7db7 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2594,6 +2594,7 @@ partbound_datum:
             Sconst            { $$ = makeStringConst($1, @1); }
             | NumericOnly    { $$ = makeAConst($1, @1); }
             | NULL_P        { $$ = makeNullAConst(@1); }
+            | DEFAULT  { $$ = (Node *)makeDefElem("DEFAULT", NULL, @1); }
         ;


4) Unnecessary hunk:

@@ -2601,7 +2602,6 @@ partbound_datum_list:
             | partbound_datum_list ',' partbound_datum
                                                 { $$ = lappend($1, $3); }
         ;
-

Note: these are just initial review comments; I have yet to do a detailed review
and test the patch.

Thanks.

On Mon, Mar 20, 2017 at 9:27 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hello,

Please find attached a rebased patch with support for pg_dump. I am working on the patch
to handle adding a new partition after a default partition by throwing an error if
conflicting rows exist in default partition and adding the partition successfully otherwise.
Will post an updated patch by tomorrow.

Thank you,
Rahila Syed

On Mon, Mar 13, 2017 at 5:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 10, 2017 at 2:17 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 3/2/17 21:40, Robert Haas wrote:
>> On the point mentioned above, I
>> don't think adding a partition should move tuples, necessarily; seems
>> like it would be good enough - maybe better - for it to fail if there
>> are any that would need to be moved.
>
> ISTM that the use cases of various combinations of adding a default
> partition, adding another partition after it, removing the default
> partition, clearing out the default partition in order to add more
> nondefault partitions, and so on, need to be more clearly spelled out
> for each partitioning type.  We also need to consider that pg_dump and
> pg_upgrade need to be able to reproduce all those states.  Seems to be a
> bit of work still ...

This patch is only targeting list partitioning.   It is not entirely
clear that the concept makes sense for range partitioning; you can
already define a partition from the end of the last partition up to
infinity, or from minus-infinity up to the starting point of the first
partition.  The only thing a default range partition would do is let
you have a single partition cover all of the ranges that would
otherwise be unassigned, which might not even be something we want.

I don't know how complete the patch is, but the specification seems
clear enough.  If you have partitions for 1, 3, and 5, you get
partition constraints of (a = 1), (a = 3), and (a = 5).  If you add a
default partition, you get a constraint of (a != 1 and a != 3 and a !=
5).  If you then add a partition for 7, the default partition's
constraint becomes (a != 1 and a != 3 and a != 5 and a != 7).  The
partition must be revalidated at that point for conflicting rows,
which we can either try to move to the new partition, or just error
out if there are any, depending on what we decide we want to do.  I
don't think any of that requires any special handling for either
pg_dump or pg_upgrade; it all just falls out of getting the
partitioning constraints correct and consistently enforcing them, just
as for any other partition.
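
The constraint logic described above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not PostgreSQL code: the function names are hypothetical, keys are plain ints, and NULL handling is ignored. A value satisfies the default partition's constraint only if it differs from every value claimed by another partition, and adding a partition for a new value means checking the default partition's existing rows against that value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: the default partition accepts a value iff it differs from
 * every value claimed by a non-default list partition, i.e.
 * (a != 1 AND a != 3 AND a != 5 ...).
 */
bool
default_accepts(const int *claimed, size_t nclaimed, int a)
{
    size_t      i;

    for (i = 0; i < nclaimed; i++)
    {
        if (a == claimed[i])
            return false;
    }
    return true;
}

/*
 * Revalidation when a partition for new_value is added: report whether
 * any row already stored in the default partition would violate the
 * tightened constraint, that is, equals new_value.
 */
bool
default_has_conflict(const int *default_rows, size_t nrows, int new_value)
{
    size_t      i;

    for (i = 0; i < nrows; i++)
    {
        if (default_rows[i] == new_value)
            return true;
    }
    return false;
}
```

With partitions for 1, 3, and 5, a default partition holding the rows 2, 7, and 9 would have a conflict when a partition for 7 is added, and none when a partition for 4 is added. Whether a conflict leads to an error or to moving the row is the open design question in this thread.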

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




--
Rushabh Lathia

Re: Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Rahila,

IIUC, your default_partition_v3.patch is trying to raise an error if a new
partition is added to a table that already has a default partition.

I also ran the test and, like Rushabh, I see the server crashing with the
given test.

However, the crash does not occur if I reverse the order of creating the
partitions, i.e. if I create a list partition first and the default
partition later.

The reason is that, while defining a new relation, DefineRelation() checks for
overlapping partitions by calling check_new_partition_bound(). For list
partitions, that function assumes that ndatums is > 0, but for a default
partition it is 0.

The crash seems to come from the following assertion failing in
check_new_partition_bound():


Assert(boundinfo &&
  boundinfo->strategy == PARTITION_STRATEGY_LIST &&
  (boundinfo->ndatums > 0 || boundinfo->has_null));


So, I think the error you have added needs to be moved before this assertion:


@@ -690,6 +715,12 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound)
    boundinfo->strategy == PARTITION_STRATEGY_LIST &&
    (boundinfo->ndatums > 0 || boundinfo->has_null));
 
+ if (boundinfo->has_def)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("parent table \"%s\" has a default partition",
+ RelationGetRelationName(parent))));


If I do so, the server does not crash and instead throws an error:

postgres=# CREATE TABLE list_partitioned (               
    a int
) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT);
CREATE TABLE
postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4,5);
ERROR:  parent table "list_partitioned" has a default partition
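
The ordering issue can be shown with a small standalone model. This is a sketch under assumptions, not the patch itself: ToyBoundInfo and toy_check_new_list_bound are hypothetical stand-ins that only borrow the field names ndatums, has_null, and has_def from the discussion, and the error is modelled as a return code instead of ereport().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the bound info fields named in this thread. */
typedef struct ToyBoundInfo
{
    int         ndatums;    /* number of explicit list values */
    bool        has_null;   /* some partition accepts NULL */
    bool        has_def;    /* a default partition exists */
} ToyBoundInfo;

typedef enum
{
    CHECK_OK,
    CHECK_HAS_DEFAULT
} CheckResult;

CheckResult
toy_check_new_list_bound(const ToyBoundInfo *boundinfo)
{
    /* Report the user-facing error first ... */
    if (boundinfo->has_def)
        return CHECK_HAS_DEFAULT;

    /* ... so the invariant is only asserted when no default exists. */
    assert(boundinfo->ndatums > 0 || boundinfo->has_null);

    return CHECK_OK;
}
```

A table whose only partition is the default one has ndatums = 0 and has_null = false, so running the assertion before the has_def test would abort an assert-enabled build, which matches the crash reported above.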

Regards,
Jeevan Ladhe


Re: Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Thanks for reporting. I have identified the problem and have a fix. Currently working on allowing
adding a partition after default partition if the default partition does not have any conflicting rows.
Will update the patch with both of these.

Thank you,
Rahila Syed


Re: Adding support for Default partition in partitioning

From
David Steele
Date:
On 3/29/17 8:13 AM, Rahila Syed wrote:

> Thanks for reporting. I have identified the problem and have a fix.
> Currently working on allowing
> adding a partition after default partition if the default partition does
> not have any conflicting rows.
> Will update the patch with both of these.

The CF has been extended until April 7, but time is still growing 
short.  Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) 
or this submission will be marked "Returned with Feedback".

Thanks,
-- 
-David
david@pgmasters.net



Re: Adding support for Default partition in partitioning

From
David Steele
Date:
On 3/31/17 10:45 AM, David Steele wrote:
> On 3/29/17 8:13 AM, Rahila Syed wrote:
> 
>> Thanks for reporting. I have identified the problem and have a fix.
>> Currently working on allowing
>> adding a partition after default partition if the default partition does
>> not have any conflicting rows.
>> Will update the patch with both of these.
> 
> The CF has been extended until April 7, but time is still growing
> short.  Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12)
> or this submission will be marked "Returned with Feedback".

This submission has been marked "Returned with Feedback".  Please feel
free to resubmit to a future commitfest.

Regards,
-- 
-David
david@pgmasters.net



Re: Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hello,

Please find attached an updated patch.
Following has been accomplished in this update:

1. A new partition can be added after default partition if there are no conflicting rows in default partition.
2. Solved the crash reported earlier.

Thank you,
Rahila Syed



On Tue, Apr 4, 2017 at 6:22 PM, David Steele <david@pgmasters.net> wrote:
On 3/31/17 10:45 AM, David Steele wrote:
> On 3/29/17 8:13 AM, Rahila Syed wrote:
>
>> Thanks for reporting. I have identified the problem and have a fix.
>> Currently working on allowing
>> adding a partition after default partition if the default partition does
>> not have any conflicting rows.
>> Will update the patch with both of these.
>
> The CF has been extended until April 7, but time is still growing
> short.  Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12)
> or this submission will be marked "Returned with Feedback".

This submission has been marked "Returned with Feedback".  Please feel
free to resubmit to a future commitfest.

Regards,
--
-David
david@pgmasters.net

Attachment

Re: Adding support for Default partition in partitioning

From
Keith Fiske
Date:

On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hello,

Please find attached an updated patch.
Following has been accomplished in this update:

1. A new partition can be added after default partition if there are no conflicting rows in default partition.
2. Solved the crash reported earlier.

Thank you,
Rahila Syed




Installed and compiled against commit 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017 -0400) without any issue

However, running your original example, I'm getting this error

keith@keith=# CREATE TABLE list_partitioned (             
keith(#     a int
keith(# ) PARTITION BY LIST (a);
CREATE TABLE
Time: 4.933 ms
keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT);
CREATE TABLE
Time: 3.492 ms
keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4,5);
ERROR:  unrecognized node type: 216
Time: 0.979 ms

Also, I'm still of the opinion that denying future partitions of values in the default would be a cause of confusion. In order to move the data out of the default and into a proper child it would require first removing that data from the default, storing it elsewhere, creating the child, then moving it back. If it's only a small amount of data it may not be a big deal, but if it's a large amount, that could cause quite a lot of contention if done in a single transaction. Either that or the user would have to deal with data existing in the table, disappearing, then reappearing again.

This also makes it harder to migrate an existing table easily. Essentially no child tables for a large, existing data set could ever be created without going through one of the two situations above.

However, thinking through this, I'm not sure I can see a solution without the global index support. If this restriction is removed, there's still an issue of data duplication after the necessary child table is added. So guess it's a matter of deciding which user experience is better for the moment?

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com

Re: Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/04/05 6:22, Keith Fiske wrote:
> On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote:
>> Please find attached an updated patch.
>> Following has been accomplished in this update:
>>
>> 1. A new partition can be added after default partition if there are no
>> conflicting rows in default partition.
>> 2. Solved the crash reported earlier.
>
> Installed and compiled against commit
> 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
> -0400) without any issue
> 
> However, running your original example, I'm getting this error
> 
> keith@keith=# CREATE TABLE list_partitioned (
> keith(#     a int
> keith(# ) PARTITION BY LIST (a);
> CREATE TABLE
> Time: 4.933 ms
> keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
> VALUES IN (DEFAULT);
> CREATE TABLE
> Time: 3.492 ms
> keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES
> IN (4,5);
> ERROR:  unrecognized node type: 216

It seems like the new ExecPrepareCheck should be used in the place of
ExecPrepareExpr in the code added in check_new_partition_bound().

> Also, I'm still of the opinion that denying future partitions of values in
> the default would be a cause of confusion. In order to move the data out of
> the default and into a proper child it would require first removing that
> data from the default, storing it elsewhere, creating the child, then
> moving it back. If it's only a small amount of data it may not be a big
> deal, but if it's a large amount, that could cause quite a lot of
> contention if done in a single transaction. Either that or the user would
> have to deal with data existing in the table, disappearing, then
> reappearing again.
>
> This also makes it harder to migrate an existing table easily. Essentially
> no child tables for a large, existing data set could ever be created
> without going through one of the two situations above.

I thought of the following possible way to allow future partitions when
the default partition exists which might contain rows that belong to the
newly created partition (unfortunately, nothing that we could implement at
this point for v10):

Suppose you want to add a new partition which will accept key=3 rows.

1. If no default partition exists, we're done; no key=3 rows would have
   been accepted by any of the table's existing partitions, so no need to
   move any rows.

2. Default partition exists which might contain key=3 rows, which we'll
   need to move.  If we do this in the same transaction, as you say, it
   might result in unnecessary unavailability of table's data.  So, it's
   better to delegate that responsibility to a background process.  The
   current transaction will only add the new key=3 partition, so any
   key=3 rows will be routed to the new partition from this point on.  But
   we haven't updated the default partition's constraint yet to say that it
   no longer contains key=3 rows (constraint that the planner consumes),
   so it will continue to be scanned for queries that request key=3 rows
   (there should be some metadata to indicate that the default partition's
   constraint is invalid), along with the new partition.

3. A background process receives a "work item" requesting it to move all
   key=3 rows from the default partition heap to the new partition's heap.
   The movement occurs without causing the table to become unavailable;
   once all rows have been moved, we momentarily lock the table to update
   the default partition's constraint to mark it valid, so that it will no
   longer be accessed by queries that want to see key=3 rows.
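
The three steps above can be sketched as a toy state machine. Everything here is hypothetical and only illustrates the proposed flow: ToyTable and the toy_* functions are not PostgreSQL structures or APIs, the table has just a default partition and one new partition, and the "background process" is an ordinary function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ROWS 16

typedef struct ToyTable
{
    int         default_rows[TOY_MAX_ROWS]; /* keys in the default partition */
    size_t      ndefault;
    int         new_rows[TOY_MAX_ROWS];     /* keys in the new partition */
    size_t      nnew;
    bool        has_new;                    /* new partition attached? */
    int         new_key;                    /* key the new partition accepts */
    bool        default_constraint_valid;   /* false while rows await moving */
} ToyTable;

void
toy_init(ToyTable *t)
{
    t->ndefault = 0;
    t->nnew = 0;
    t->has_new = false;
    t->new_key = 0;
    t->default_constraint_valid = true;
}

/* Step 2: attach the new partition; only metadata changes, no rows move. */
void
toy_add_partition(ToyTable *t, int key)
{
    t->has_new = true;
    t->new_key = key;
    t->default_constraint_valid = false;
}

/* Tuple routing: matching keys go to the new partition once it exists. */
void
toy_insert(ToyTable *t, int key)
{
    if (t->has_new && key == t->new_key)
    {
        assert(t->nnew < TOY_MAX_ROWS);
        t->new_rows[t->nnew++] = key;
    }
    else
    {
        assert(t->ndefault < TOY_MAX_ROWS);
        t->default_rows[t->ndefault++] = key;
    }
}

/* Planner side: must the default partition be scanned for this key? */
bool
toy_must_scan_default(const ToyTable *t, int key)
{
    if (t->has_new && key == t->new_key)
        return !t->default_constraint_valid;
    return true;
}

/* Step 3: the work item moves matching rows, then marks the constraint valid. */
void
toy_move_rows(ToyTable *t)
{
    size_t      i;
    size_t      kept = 0;

    for (i = 0; i < t->ndefault; i++)
    {
        if (t->default_rows[i] == t->new_key)
        {
            assert(t->nnew < TOY_MAX_ROWS);
            t->new_rows[t->nnew++] = t->default_rows[i];
        }
        else
            t->default_rows[kept++] = t->default_rows[i];
    }
    t->ndefault = kept;
    t->default_constraint_valid = true;
}
```

Between toy_add_partition() and toy_move_rows(), a query for the new key has to read both partitions; afterwards only the new one. The momentary table lock mentioned in step 3 is not modelled.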
 

Regarding 2, there is a question of whether it should not be possible for
the row movement to occur in the same transaction.  Somebody may want that
to happen because they chose to run the command during a maintenance
window, when the table's becoming unavailable is not an issue.  In that
case, we need to think of the interface more carefully.

Regarding 3, I think the new autovacuum work items infrastructure added by
the following commit looks very promising:

* BRIN auto-summarization *
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7526e10224f0792201e99631567bbe44492bbde4
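
To make the three numbered steps above a bit more concrete, here is a minimal Python model of the idea. It is only a sketch, not PostgreSQL code: the class and method names (`Partition`, `ListPartitionedTable`, `pending`, `migrate`) are invented for illustration, rows are reduced to their key value, and locking is not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    rows: list = field(default_factory=list)  # each row is just its key value


@dataclass
class ListPartitionedTable:
    default: Partition
    bounds: dict = field(default_factory=dict)  # key value -> Partition
    # Keys for which the default partition's constraint is still "invalid",
    # i.e. the default may still hold rows that belong to a newer partition.
    pending: set = field(default_factory=set)

    def add_partition(self, part, keys):
        # Step 2: only register the new bound; no rows are moved here.
        for k in keys:
            self.bounds[k] = part
            self.pending.add(k)

    def insert(self, key):
        # New rows are routed to the matching partition, else to the default.
        self.bounds.get(key, self.default).rows.append(key)

    def partitions_to_scan(self, key):
        part = self.bounds.get(key)
        if part is None:
            return [self.default]
        # Until the migration is done, the default must be scanned as well.
        return [part, self.default] if key in self.pending else [part]

    def migrate(self, key):
        # Step 3: the hypothetical background "work item".
        moved = [r for r in self.default.rows if r == key]
        self.default.rows = [r for r in self.default.rows if r != key]
        self.bounds[key].rows.extend(moved)
        self.pending.discard(key)  # default's constraint is valid again
```

In this model a lookup for key=3 reads both the new partition and the default until `migrate(3)` has run, which is the window in which the default partition's constraint would have to be flagged as invalid.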

> However, thinking through this, I'm not sure I can see a solution without
> the global index support. If this restriction is removed, there's still an
> issue of data duplication after the necessary child table is added. So
> guess it's a matter of deciding which user experience is better for the
> moment?

I'm not sure about the fate of this particular patch for v10, but until we
implement a solution to move rows and design an appropriate interface for
it, we could error out if moving rows is required at all, like the patch
does.
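
The "error out if moving rows is required" behaviour can be pictured roughly as follows. This is a Python sketch under simplifying assumptions, not the patch's C code: `check_new_list_partition` is a made-up name, the default partition is represented as a plain list of key values, and the error text is invented.

```python
def check_new_list_partition(default_rows, new_values):
    """Refuse a new list partition if the default already holds matching rows.

    default_rows: key values currently stored in the default partition.
    new_values:   the values the new partition would accept.
    """
    new_values = set(new_values)
    conflicting = sorted({k for k in default_rows if k in new_values})
    if conflicting:
        # The user has to move these rows out of the default partition first.
        raise ValueError(
            f"default partition contains rows for the new partition: {conflicting}"
        )
```

With the example from the start of the thread, a default partition holding only 9 would allow a partition for (4, 5), while a default partition already holding 4 would not.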

Could you briefly elaborate why you think the lack of global index support
would be a problem in this regard?

I agree that some design is required here to implement a solution for
redistribution of rows; not only in the context of supporting the notion
of default partitions, but also to allow the feature to split/merge range
(only?) partitions.  I'd like to work on the latter for v11 for which I
would like to post a proposal soon; if anyone would like to collaborate
(ideas, code, review), I look forward to it.  (sorry for hijacking this thread.)

Thanks,
Amit





Re: Adding support for Default partition in partitioning

From
Rushabh Lathia
Date:


On Wed, Apr 5, 2017 at 10:59 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/04/05 6:22, Keith Fiske wrote:
> On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote:
>> Please find attached an updated patch.
>> Following has been accomplished in this update:
>>
>> 1. A new partition can be added after default partition if there are no
>> conflicting rows in default partition.
>> 2. Solved the crash reported earlier.
>
> Installed and compiled against commit
> 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
> -0400) without any issue
>
> However, running your original example, I'm getting this error
>
> keith@keith=# CREATE TABLE list_partitioned (
> keith(#     a int
> keith(# ) PARTITION BY LIST (a);
> CREATE TABLE
> Time: 4.933 ms
> keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
> VALUES IN (DEFAULT);
> CREATE TABLE
> Time: 3.492 ms
> keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES
> IN (4,5);
> ERROR:  unrecognized node type: 216

It seems like the new ExecPrepareCheck should be used in the place of
ExecPrepareExpr in the code added in check_new_partition_bound().

> Also, I'm still of the opinion that denying future partitions of values in
> the default would be a cause of confusion. In order to move the data out of
> the default and into a proper child it would require first removing that
> data from the default, storing it elsewhere, creating the child, then
> moving it back. If it's only a small amount of data it may not be a big
> deal, but if it's a large amount, that could cause quite a lot of
> contention if done in a single transaction. Either that or the user would
> have to deal with data existing in the table, disappearing, then
> reappearing again.
>
> This also makes it harder to migrate an existing table easily. Essentially
> no child tables for a large, existing data set could ever be created
> without going through one of the two situations above.

I thought of the following possible way to allow future partitions when
the default partition exists which might contain rows that belong to the
newly created partition (unfortunately, nothing that we could implement at
this point for v10):

Suppose you want to add a new partition which will accept key=3 rows.

1. If no default partition exists, we're done; no key=3 rows would have
   been accepted by any of the table's existing partitions, so no need to
   move any rows

2. Default partition exists which might contain key=3 rows, which we'll
   need to move.  If we do this in the same transaction, as you say, it
   might result in unnecessary unavailability of table's data.  So, it's
   better to delegate that responsibility to a background process.  The
   current transaction will only add the new key=3 partition, so any key=3
   rows will be routed to the new partition from this point on.  But we
   haven't updated the default partition's constraint yet to say that it
   no longer contains key=3 rows (constraint that the planner consumes),
   so it will continue to be scanned for queries that request key=3 rows
   (there should be some metadata to indicate that the default partition's
   constraint is invalid), along with the new partition.

3. A background process receives a "work item" requesting it to move all
   key=3 rows from the default partition heap to the new partition's heap.
   The movement occurs without causing the table to become unavailable;
   once all rows have been moved, we momentarily lock the table to update
   the default partition's constraint to mark it valid, so that it will
   no longer be accessed by queries that want to see key=3 rows.

Regarding 2, there is a question of whether it should not be possible for
the row movement to occur in the same transaction.  Somebody may want that
to happen because they chose to run the command during a maintenance
window, when the table's becoming unavailable is not an issue.  In that
case, we need to think of the interface more carefully.

Regarding 3, I think the new autovacuum work items infrastructure added by
the following commit looks very promising:

* BRIN auto-summarization *
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7526e10224f0792201e99631567bbe44492bbde4

> However, thinking through this, I'm not sure I can see a solution without
> the global index support. If this restriction is removed, there's still an
> issue of data duplication after the necessary child table is added. So
> guess it's a matter of deciding which user experience is better for the
> moment?

I'm not sure about the fate of this particular patch for v10, but until we
implement a solution to move rows and design appropriate interface for the
same, we could error out if moving rows is required at all, like the patch
does.


+1

I agree about the future plan for row movement; how it would be done I am
not quite sure at this stage.

I was thinking that creating a new partition is a DDL command, so even
if row movement happens while holding the lock on the new partition table,
that should be fine. I am not quite sure why row movement should
happen in a background process.

Of course, one idea is that if someone doesn't want the row-movement feature,
we might put it under some GUC, or maybe add it as another option to
the CREATE partition table command.

--
Rushabh Lathia

Re: Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hi,

>However, running your original example, I'm getting this error
Thank you for testing. Please find attached an updated patch which fixes the above.


Thank you,
Rahila Syed


Attachment

Re: Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/04/05 14:41, Rushabh Lathia wrote:
> I agree about the future plan about the row movement, how that is I am
> not quite sure at this stage.
> 
> I was thinking that CREATE new partition is the DDL command, so even
> if row-movement works with holding the lock on the new partition table,
> that should be fine. I am not quire sure, why row movement should be
> happen in the back-ground process.

I think to improve the availability of access to the partitioned table.

Consider that the default partition may have gotten pretty large.
Scanning it and moving rows to the newly added partition while holding an
AccessExclusiveLock on the parent will block any and all of the concurrent
activity on it until the row-movement is finished.  One may be prepared to
pay this cost, for which there should definitely be an option to perform
the row-movement in the same transaction (also possibly the default behavior).

Thanks,
Amit





Re: Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hello Amit,

>Could you briefly elaborate why you think the lack global index support
>would be a problem in this regard?
I think the following can happen if we allow rows satisfying the new partition's bound to remain in the
default partition until a background process moves them.
Consider a scenario where the partition key is a primary key and the data in the default partition has
not yet been moved into the newly added partition. If new data is now added to the new partition
with a key that also exists in the default partition, there will be data duplication. If we then
scan the partitioned table for that key (reading both the default and the new partition, as we
have not moved the rows), it will fetch both rows.
Unless we have global indexes for partitioned tables, there is a chance of data duplication between
the default partition and a child table added after it.
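
The scenario above can be reproduced with a toy model in which each partition enforces uniqueness only over its own rows, which is the situation when there is no global index. This is a Python sketch only; `LocalUniquePartition` and `scan` are names invented for illustration.

```python
class LocalUniquePartition:
    """A partition whose primary key is checked only within itself."""

    def __init__(self, name):
        self.name = name
        self.rows = {}  # primary key -> payload

    def insert(self, key, payload):
        # The uniqueness check cannot see rows stored in other partitions.
        if key in self.rows:
            raise ValueError(f"duplicate key {key} in {self.name}")
        self.rows[key] = payload


def scan(partitions, key):
    """Return every (partition name, payload) pair found for the key."""
    return [(p.name, p.rows[key]) for p in partitions if key in p.rows]


part_default = LocalUniquePartition("part_default")
part_3 = LocalUniquePartition("part_3")

part_default.insert(3, "old row, not yet moved")
part_3.insert(3, "new row")  # succeeds: no cross-partition check exists

duplicates = scan([part_3, part_default], 3)
```

Here `duplicates` ends up with two entries for the same primary key, one from each partition, even though neither partition's own uniqueness check ever failed.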

>Scanning it and moving rows to the newly added partition while holding an
>AccessExclusiveLock on the parent will block any and all of the concurrent
>activity on it until the row-movement is finished.
Can you explain why this will require AccessExclusiveLock on parent and
not just the default partition and newly added partition?

Thank you,
Rahila Syed




Re: Adding support for Default partition in partitioning

From
Amit Langote
Date:
Hi Rahila,

On 2017/04/05 18:57, Rahila Syed wrote:
> Hello Amit,
> 
>> Could you briefly elaborate why you think the lack global index support
>> would be a problem in this regard?
> I think following can happen if we allow rows satisfying the new partition
> to lie around in the
> default partition until background process moves it.
> Consider a scenario where partition key is a primary key and the data in
> the default partition is
> not yet moved into the newly added partition. If now, new data is added
> into the new partition
> which also exists(same key) in default partition there will be data
> duplication. If now
> we scan the partitioned table for that key(from both the default and new
> partition as we
> have not moved the rows) it will fetch the both rows.
> Unless we have global indexes for partitioned tables, there is chance of
> data duplication between
> child table added after default partition and the default partition.

Ah, okay.  I think I wrote that question before even reading the next
sentence in Keith's message ("there's still an issue of data duplication
after the necessary child table is added.")

Maybe we can disallow background row movement if such a global constraint
exists.

>> Scanning it and moving rows to the newly added partition while holding an
>> AccessExclusiveLock on the parent will block any and all of the concurrent
>> activity on it until the row-movement is finished.
> Can you explain why this will require AccessExclusiveLock on parent and
> not just the default partition and newly added partition?

Because we take an AccessExclusiveLock on the parent table when
adding/removing a partition in general.  We do that because concurrent
accessors of the parent table rely on its partition descriptor not
changing under them.

Thanks,
Amit





Re: Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Wed, Apr 5, 2017 at 5:57 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>Could you briefly elaborate why you think the lack global index support
>>would be a problem in this regard?
> I think following can happen if we allow rows satisfying the new partition
> to lie around in the
> default partition until background process moves it.
> Consider a scenario where partition key is a primary key and the data in the
> default partition is
> not yet moved into the newly added partition. If now, new data is added into
> the new partition
> which also exists(same key) in default partition there will be data
> duplication. If now
> we scan the partitioned table for that key(from both the default and new
> partition as we
> have not moved the rows) it will fetch the both rows.
> Unless we have global indexes for partitioned tables, there is chance of
> data duplication between
> child table added after default partition and the default partition.

Yes, I think it would be completely crazy to try to migrate the data
in the background:

- The migration might never complete because of a UNIQUE or CHECK
constraint on the partition to which rows are being migrated.

- Even if the migration eventually succeeded, such a design abandons
all hope of making INSERT .. ON CONFLICT DO NOTHING work sensibly
while the migration is in progress, unless the new partition has no
UNIQUE constraints.

- Partition-wise join and partition-wise aggregate would need to have
special case handling for the case of an unfinished migration, as
would any user code that accesses partitions directly.

- More generally, I think users expect that when a DDL command
finishes execution, it's done all of the work that there is to do (or
at the very least, that any remaining work has no user-visible
consequences, which would not be the case here).

IMV, the question of whether we have efficient ways to move data
around between partitions is somewhat separate from the question of
whether partitions can be defined in a certain way in the first place.
The problems that Keith refers to upthread already exist for
subpartitioning; you've got to detach the old partition, create a new
one, and then reinsert the data.  And for partitioning an
unpartitioned table: create a replacement table, insert all the data,
substitute it for the original table.  The fact that we have these
limitations is not good, but we're not going to rip out partitioning
entirely because we don't have clever ways of migrating the data in
those cases, and the proposed behavior here is not any worse.

Also, waiting for those problems to get fixed might be waiting for
Godot.  I'm not really all that sanguine about our chances of coming
up with a really nice way of handling these cases.  In a design
based on table inheritance, you can leave it murky where certain data
is supposed to end up and migrate it on-line and you might get away
with that, but a major point of having declarative partitioning at all
is to remove that sort of murkiness.  It's probably not that hard to
come up with a command that locks the parent and moves data around via
full table scans, but I'm not sure how far that really gets us; you
could do the same thing easily enough with a sequence of commands
generated via a script.  And being able to do this in a general way
without a full table lock looks pretty hard - it doesn't seem
fundamentally different from trying to perform a table-rewriting
operation like CLUSTER without a full table lock, which we also don't
support.  The executor is not built to cope with any aspect of the
table definition shifting under it, and that includes the set of child
tables which are partitions of the table mentioned in the query.  Maybe
the executor can be taught to survive such definitional changes at
least in limited cases, but that's a much different project than
allowing default partitions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Adding support for Default partition in partitioning

From
Keith Fiske
Date:

Confirmed that v5 patch works with examples given in the original post but segfaulted when I tried the examples I used in my blog post (taken from the documentation at the time I wrote it). https://www.keithf4.com/postgresql-10-built-in-partitioning/

keith@keith=# drop table cities;
DROP TABLE
Time: 6.055 ms
keith@keith=# CREATE TABLE cities (
    city_id         bigserial not null,        
    name         text not null,
    population   int
) PARTITION BY LIST (initcap(name));
CREATE TABLE
Time: 7.130 ms
keith@keith=# CREATE TABLE cities_west
    PARTITION OF cities (                      
    CONSTRAINT city_id_nonzero CHECK (city_id != 0)
) FOR VALUES IN ('Los Angeles', 'San Francisco');
CREATE TABLE
Time: 6.690 ms
keith@keith=# CREATE TABLE cities_default
keith-#     PARTITION OF cities FOR VALUES IN (DEFAULT);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
Failed.
Time: 387.887 ms

After reading responses, I think I'd be fine with how Rahila implemented this with disallowing the child until the data is removed from the default if this would allow it to be included in v10. As was mentioned, there just doesn't seem to be a way to easily handle the data conflicts cleanly at this time, but I think the value of the default to be able to catch accidental data vs returning an error is worth it. It also at least gives a slightly easier migration path vs having to migrate to a completely new table. Any chance this could be adapted for range partitioning as well? I'd be happy to create some pgtap tests with pg_partman for this then to make sure it works.

Only issue I see with this, and I'm not sure if it is an issue, is what happens to that default constraint clause when 1000s of partitions start getting added? From what I gather the default's constraint is built based off the cumulative opposite of all other child constraints. I don't understand the code well enough to see what it's actually doing, but if there are no gaps, is the method used smart enough to aggregate all the child constraints to make a simpler constraint that is simply outside the current min/max boundaries? If so, for serial/time range partitioning this should typically work out fine since there are rarely gaps. This actually seems more of an issue for list partitioning where each child is a distinct value or range of values that are completely arbitrary.

Won't that check and re-evaluation of the default's constraint just get worse and worse as more children are added? Is there really even a need for the default to have an opposite constraint like this? Not sure on how the planner works with partitioning now, but wouldn't it be better to first check all non-default children for a match the same as it does now without a default and, failing that, then route to the default if one is declared? The default should accept any data then so I don't see the need for the constraint unless it's required for the current implementation. If that's the case, could that be changed?
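
The two things being compared here, routing that falls back to the default and a default constraint that is the "cumulative opposite" of all other bounds, can be sketched like this for list partitioning. This is a Python model with made-up function names (`route`, `default_constraint`), not how the patch or the planner actually represents either one.

```python
def route(bounds, key, has_default):
    """Check the non-default partitions first, then fall back to the default.

    bounds: dict mapping partition name -> set of accepted key values.
    """
    for name, values in bounds.items():
        if key in values:
            return name
    if has_default:
        return "default"
    raise LookupError(f"no partition found for key {key!r}")


def default_constraint(bounds):
    """Build the default's implied constraint: NOT (key IN any other bound).

    The covered set grows with every partition that is added, which is the
    growth the question above is worried about.
    """
    covered = set().union(*bounds.values())
    return lambda key: key not in covered
```

Routing alone decides where rows inserted through the parent end up; the constraint matters for anything that touches the default partition directly, since it is the only thing stating which keys the default is guaranteed not to contain.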

Keith

Re: Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/04/06 0:19, Robert Haas wrote:
> On Wed, Apr 5, 2017 at 5:57 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>> Could you briefly elaborate why you think the lack global index support
>>> would be a problem in this regard?
>> I think following can happen if we allow rows satisfying the new partition
>> to lie around in the
>> default partition until background process moves it.
>> Consider a scenario where partition key is a primary key and the data in the
>> default partition is
>> not yet moved into the newly added partition. If now, new data is added into
>> the new partition
>> which also exists(same key) in default partition there will be data
>> duplication. If now
>> we scan the partitioned table for that key(from both the default and new
>> partition as we
>> have not moved the rows) it will fetch the both rows.
>> Unless we have global indexes for partitioned tables, there is chance of
>> data duplication between
>> child table added after default partition and the default partition.
> 
> Yes, I think it would be completely crazy to try to migrate the data
> in the background:
> 
> - The migration might never complete because of a UNIQUE or CHECK
> constraint on the partition to which rows are being migrated.
> 
> - Even if the migration eventually succeeded, such a design abandons
> all hope of making INSERT .. ON CONFLICT DO NOTHING work sensibly
> while the migration is in progress, unless the new partition has no
> UNIQUE constraints.
> 
> - Partition-wise join and partition-wise aggregate would need to have
> special case handling for the case of an unfinished migration, as
> would any user code that accesses partitions directly.
> 
> - More generally, I think users expect that when a DDL command
> finishes execution, it's done all of the work that there is to do (or
> at the very least, that any remaining work has no user-visible
> consequences, which would not be the case here).

OK, I realize the background migration was a poorly thought out idea.  And
a *first* version that will handle the row-movement should be doing that
as part of the same command anyway.

> IMV, the question of whether we have efficient ways to move data
> around between partitions is somewhat separate from the question of
> whether partitions can be defined in a certain way in the first place.
> The problems that Keith refers to upthread already exist for
> subpartitioning; you've got to detach the old partition, create a new
> one, and then reinsert the data.  And for partitioning an
> unpartitioned table: create a replacement table, insert all the data,
> substitute it for the original table.  The fact that we have these
> limitation is not good, but we're not going to rip out partitioning
> entirely because we don't have clever ways of migrating the data in
> those cases, and the proposed behavior here is not any worse.
>
> Also, waiting for those problems to get fixed might be waiting for
> Godot.  I'm not really all that sanguine about our chances of coming
> up with a really nice way of handling these cases.  In a designed
> based on table inheritance, you can leave it murky where certain data
> is supposed to end up and migrate it on-line and you might get away
> with that, but a major point of having declarative partitioning at all
> is to remove that sort of murkiness.  It's probably not that hard to
> come up with a command that locks the parent and moves data around via
> full table scans, but I'm not sure how far that really gets us; you
> could do the same thing easily enough with a sequence of commands
> generated via a script.  And being able to do this in a general way
> without a full table lock looks pretty hard - it doesn't seem
> fundamentally different from trying to perform a table-rewriting
> operation like CLUSTER without a full table lock, which we also don't
> support.  The executor is not built to cope with any aspect of the
> table definition shifting under it, and that includes the set of child
> tables with are partitions of the table mentioned in the query.  Maybe
> the executor can be taught to survive such definitional changes at
> least in limited cases, but that's a much different project than
> allowing default partitions.

Agreed.

Thanks,
Amit





Re: Adding support for Default partition in partitioning

From
Keith Fiske
Date:

On Wed, Apr 5, 2017 at 2:51 PM, Keith Fiske <keith@omniti.com> wrote:


> Only issue I see with this, and I'm not sure if it is an issue, is what happens to that default constraint clause when 1000s of partitions start getting added? From what I gather the default's constraint is built based off the cumulative opposite of all other child constraints. I don't understand the code well enough to see what it's actually doing, but if there are no gaps, is the method used smart enough to aggregate all the child constraints to make a simpler constraint that is simply outside the current min/max boundaries? If so, for serial/time range partitioning this should typically work out fine since there are rarely gaps. This actually seems more of an issue for list partitioning where each child is a distinct value or range of values that are completely arbitrary. Won't that check and re-evaluation of the default's constraint just get worse and worse as more children are added? Is there really even a need for the default to have an opposite constraint like this? Not sure on how the planner works with partitioning now, but wouldn't it be better to first check all non-default children for a match the same as it does now without a default and, failing that, then route to the default if one is declared? The default should accept any data then so I don't see the need for the constraint unless it's required for the current implementation. If that's the case, could that be changed?
>
> Keith

Actually, thinking on this more, I realized this does again come back to the lack of a global index. Without the constraint, data could be put directly into the default that could technically conflict with the partition scheme elsewhere. Perhaps, instead of the constraint, inserts directly to the default could be prevented at the user level. Writing to valid children directly certainly has its place, but I've been thinking about it, and I can't see any reason why one would ever want to write directly to the default. Its use case seems to be around being a sort of temporary storage until that data can be moved to a valid location. Would still need to allow removal of data, though.

Not sure if that's even a workable solution. Just trying to think of ways around the current limitations and still allow this feature.

Re: Adding support for Default partition in partitioning

From
Rushabh Lathia
Date:
On 2017/04/06 0:19, Robert Haas wrote:
> On Wed, Apr 5, 2017 at 5:57 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>> Could you briefly elaborate why you think the lack global index support
>>> would be a problem in this regard?
>> I think following can happen if we allow rows satisfying the new partition
>> to lie around in the
>> default partition until background process moves it.
>> Consider a scenario where partition key is a primary key and the data in the
>> default partition is
>> not yet moved into the newly added partition. If now, new data is added into
>> the new partition
>> which also exists(same key) in default partition there will be data
>> duplication. If now
>> we scan the partitioned table for that key(from both the default and new
>> partition as we
>> have not moved the rows) it will fetch the both rows.
>> Unless we have global indexes for partitioned tables, there is chance of
>> data duplication between
>> child table added after default partition and the default partition.
>
> Yes, I think it would be completely crazy to try to migrate the data
> in the background:
>
> - The migration might never complete because of a UNIQUE or CHECK
> constraint on the partition to which rows are being migrated.
>
> - Even if the migration eventually succeeded, such a design abandons
> all hope of making INSERT .. ON CONFLICT DO NOTHING work sensibly
> while the migration is in progress, unless the new partition has no
> UNIQUE constraints.
>
> - Partition-wise join and partition-wise aggregate would need to have
> special case handling for the case of an unfinished migration, as
> would any user code that accesses partitions directly.
>
> - More generally, I think users expect that when a DDL command
> finishes execution, it's done all of the work that there is to do (or
> at the very least, that any remaining work has no user-visible
> consequences, which would not be the case here).

Thanks, Robert, for this explanation. It makes clear why moving rows in the
background is not a sensible idea.
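
To make the duplication hazard concrete, here is a toy model of it. This is only a sketch: the Partition struct and both function names are made up for illustration and are not PostgreSQL code. Uniqueness is enforced per partition only, which is all we can do without global indexes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROWS 8

/* Hypothetical stand-in for a partition: just the keys it stores. */
typedef struct Partition
{
    int         keys[MAX_ROWS];
    size_t      nkeys;
} Partition;

/* Is the key already stored in this one partition? */
static bool
partition_contains(const Partition *part, int key)
{
    for (size_t i = 0; i < part->nkeys; i++)
    {
        if (part->keys[i] == key)
            return true;
    }
    return false;
}

/*
 * Insert a key, enforcing uniqueness within this partition only, the way a
 * per-partition unique index would.  Returns false if the key is already
 * present here or the partition is full.
 */
bool
insert_local_unique(Partition *part, int key)
{
    assert(part != NULL);
    if (part->nkeys == MAX_ROWS || partition_contains(part, key))
        return false;
    part->keys[part->nkeys++] = key;
    return true;
}

/* Number of rows a scan of the parent returns for the key. */
size_t
count_matches(const Partition *parts, size_t nparts, int key)
{
    size_t      matches = 0;

    for (size_t i = 0; i < nparts; i++)
    {
        if (partition_contains(&parts[i], key))
            matches++;
    }
    return matches;
}
```

If parts[0] plays the default partition and already holds key 7, and parts[1] is a newly added partition for 7 whose rows have not been moved yet, then inserting 7 into parts[1] succeeds and a scan of the parent sees the key twice.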

I like the idea of having a DEFAULT partition for range partitioning. The way
partitioning is designed, a range-partitioned table can have holes between its
ranges, and a DEFAULT partition is generally a good idea when such holes exist.
When the ranges are serial, a DEFAULT partition of course doesn't make much sense.

Regards,

Rushabh Lathia

Re: Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/04/06 13:08, Keith Fiske wrote:
> On Wed, Apr 5, 2017 at 2:51 PM, Keith Fiske wrote:
>> Only issue I see with this, and I'm not sure if it is an issue, is what
>> happens to that default constraint clause when 1000s of partitions start
>> getting added? From what I gather the default's constraint is built based
>> off the cumulative opposite of all other child constraints. I don't
>> understand the code well enough to see what it's actually doing, but if
>> there are no gaps, is the method used smart enough to aggregate all the
>> child constraints to make a simpler constraint that is simply outside the
>> current min/max boundaries? If so, for serial/time range partitioning this
>> should typically work out fine since there are rarely gaps. This actually
>> seems more of an issue for list partitioning where each child is a distinct
>> value or range of values that are completely arbitrary. Won't that check
>> and re-evaluation of the default's constraint just get worse and worse as
>> more children are added? Is there really even a need for the default to
>> have an opposite constraint like this? Not sure on how the planner works
>> with partitioning now, but wouldn't it be better to first check all
>> non-default children for a match the same as it does now without a default
>> and, failing that, then route to the default if one is declared? The
>> default should accept any data then so I don't see the need for the
>> constraint unless it's required for the current implementation. If that's
>> the case, could that be changed?

Unless I misread your last sentence, I think there might be some
confusion.  Currently, the partition constraint (think of these as you
would of user-defined check constraints) is needed for two reasons: 1. to
prevent direct insertion of rows into the default partition for which a
non-default partition exists; no two partitions should ever have duplicate
rows.  2. so that planner can use the constraint to determine if the
default partition needs to be scanned for a query using constraint
exclusion; no need, for example, to scan the default partition if the
query requests only key=3 rows and a partition for the same exists (no
other partition should have key=3 rows by definition, not even the
default).  As things stand today, planner needs to look at every partition
individually for using constraint exclusion to possibly exclude it, *even*
with declarative partitioning and that would include the default partition.

> Actually, thinking on this more, I realized this does again come back to
> the lack of a global index. Without the constraint, data could be put
> directly into the default that could technically conflict with the
> partition scheme elsewhere. Perhaps, instead of the constraint, inserts
> directly to the default could be prevented on the user level. Writing to
> valid children directly certainly has its place, but been thinking about
> it, and I can't see any reason why one would ever want to write directly to
> the default. It's use case seems to be around being a sort of temporary
> storage until that data can be moved to a valid location. Would still need
> to allow removal of data, though.

As mentioned above, the default partition will not allow directly
inserting a row whose key maps to some existing (non-default) partition.

As far as tuple-routing is concerned, it will choose the default partition
only if no other partition is found for the key.  Tuple-routing doesn't
use the partition constraints directly per se, like one of the two things
mentioned above do.  One could say that tuple-routing assigns the incoming
rows to partitions such that their individual partition constraints are
not violated.

Finally, we don't yet offer global guarantees for constraints like unique.
The only guarantee that's in place is that no two partitions can contain
the same partition key.
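
To illustrate the distinction between tuple-routing and the partition constraint, here is a deliberately naive sketch for integer list partitions. The struct and function names are hypothetical and this is not how the patch or the backend represents bounds. Routing falls back to the default only when no other partition matches, while the default's constraint is simply the negation of every other partition's bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFAULT_PARTITION (-1)  /* key is routed to the default partition */
#define NO_PARTITION      (-2)  /* no match and no default: insert fails */

/* Hypothetical model of one non-default list partition. */
typedef struct ListPartition
{
    const int  *values;         /* key values listed in FOR VALUES IN (...) */
    size_t      nvalues;
} ListPartition;

/* Hypothetical model of the partitioned parent. */
typedef struct PartitionedTable
{
    const ListPartition *parts; /* non-default partitions only */
    size_t      nparts;
    bool        has_default;
} PartitionedTable;

/* Does this partition's value list contain the key? */
static bool
partition_accepts(const ListPartition *part, int key)
{
    for (size_t i = 0; i < part->nvalues; i++)
    {
        if (part->values[i] == key)
            return true;
    }
    return false;
}

/*
 * Tuple routing: return the index of the matching non-default partition,
 * DEFAULT_PARTITION if none matches and a default exists, else NO_PARTITION.
 */
int
route_tuple(const PartitionedTable *tab, int key)
{
    assert(tab != NULL);
    for (size_t i = 0; i < tab->nparts; i++)
    {
        if (partition_accepts(&tab->parts[i], key))
            return (int) i;
    }
    return tab->has_default ? DEFAULT_PARTITION : NO_PARTITION;
}

/*
 * The default partition's constraint: the key must not be accepted by any
 * other partition.  This is what a direct insert into the default would be
 * checked against, and what constraint exclusion could reason about.
 */
bool
default_constraint_holds(const PartitionedTable *tab, int key)
{
    assert(tab != NULL);
    for (size_t i = 0; i < tab->nparts; i++)
    {
        if (partition_accepts(&tab->parts[i], key))
            return false;
    }
    return true;
}
```

With a single partition listing (4,5) plus a default, key 9 is routed to the default and satisfies its constraint, while key 5 is routed to the listed partition and would be rejected by a direct insert into the default. Note that this naive check is linear in the total number of listed values, which is the growth Keith was asking about.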

Thanks,
Amit





Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hello,

Thanks a lot for testing and reporting this. Please find attached an updated patch with the fix. The patch also contains a fix
for the operator used when building the expression for the default partition constraint. Amit Langote pointed this out off-list.

Thank you,
Rahila Syed


On Thu, Apr 6, 2017 at 12:21 AM, Keith Fiske <keith@omniti.com> wrote:

On Wed, Apr 5, 2017 at 11:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 5, 2017 at 5:57 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>Could you briefly elaborate why you think the lack global index support
>>would be a problem in this regard?
> I think following can happen if we allow rows satisfying the new partition
> to lie around in the
> default partition until background process moves it.
> Consider a scenario where partition key is a primary key and the data in the
> default partition is
> not yet moved into the newly added partition. If now, new data is added into
> the new partition
> which also exists(same key) in default partition there will be data
> duplication. If now
> we scan the partitioned table for that key(from both the default and new
> partition as we
> have not moved the rows) it will fetch the both rows.
> Unless we have global indexes for partitioned tables, there is chance of
> data duplication between
> child table added after default partition and the default partition.

Yes, I think it would be completely crazy to try to migrate the data
in the background:

- The migration might never complete because of a UNIQUE or CHECK
constraint on the partition to which rows are being migrated.

- Even if the migration eventually succeeded, such a design abandons
all hope of making INSERT .. ON CONFLICT DO NOTHING work sensibly
while the migration is in progress, unless the new partition has no
UNIQUE constraints.

- Partition-wise join and partition-wise aggregate would need to have
special case handling for the case of an unfinished migration, as
would any user code that accesses partitions directly.

- More generally, I think users expect that when a DDL command
finishes execution, it's done all of the work that there is to do (or
at the very least, that any remaining work has no user-visible
consequences, which would not be the case here).

IMV, the question of whether we have efficient ways to move data
around between partitions is somewhat separate from the question of
whether partitions can be defined in a certain way in the first place.
The problems that Keith refers to upthread already exist for
subpartitioning; you've got to detach the old partition, create a new
one, and then reinsert the data.  And for partitioning an
unpartitioned table: create a replacement table, insert all the data,
substitute it for the original table.  The fact that we have these
limitations is not good, but we're not going to rip out partitioning
entirely because we don't have clever ways of migrating the data in
those cases, and the proposed behavior here is not any worse.

Also, waiting for those problems to get fixed might be waiting for
Godot.  I'm not really all that sanguine about our chances of coming
up with a really nice way of handling these cases.  In a design
based on table inheritance, you can leave it murky where certain data
is supposed to end up and migrate it on-line and you might get away
with that, but a major point of having declarative partitioning at all
is to remove that sort of murkiness.  It's probably not that hard to
come up with a command that locks the parent and moves data around via
full table scans, but I'm not sure how far that really gets us; you
could do the same thing easily enough with a sequence of commands
generated via a script.  And being able to do this in a general way
without a full table lock looks pretty hard - it doesn't seem
fundamentally different from trying to perform a table-rewriting
operation like CLUSTER without a full table lock, which we also don't
support.  The executor is not built to cope with any aspect of the
table definition shifting under it, and that includes the set of child
tables which are partitions of the table mentioned in the query.  Maybe
the executor can be taught to survive such definitional changes at
least in limited cases, but that's a much different project than
allowing default partitions.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Confirmed that v5 patch works with examples given in the original post but segfaulted when I tried the examples I used in my blog post (taken from the documentation at the time I wrote it). https://www.keithf4.com/postgresql-10-built-in-partitioning/

keith@keith=# drop table cities;
DROP TABLE
Time: 6.055 ms
keith@keith=# CREATE TABLE cities (
    city_id         bigserial not null,        
    name         text not null,
    population   int
) PARTITION BY LIST (initcap(name));
CREATE TABLE
Time: 7.130 ms
keith@keith=# CREATE TABLE cities_west
    PARTITION OF cities (                      
    CONSTRAINT city_id_nonzero CHECK (city_id != 0)
) FOR VALUES IN ('Los Angeles', 'San Francisco');
CREATE TABLE
Time: 6.690 ms
keith@keith=# CREATE TABLE cities_default
keith-#     PARTITION OF cities FOR VALUES IN (DEFAULT);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
Failed.
Time: 387.887 ms

After reading responses, I think I'd be fine with how Rahila implemented this with disallowing the child until the data is removed from the default if this would allow it to be included in v10. As was mentioned, there just doesn't seem to be a way to easily handle the data conflicts cleanly at this time, but I think the value of the default to be able to catch accidental data vs returning an error is worth it. It also at least gives a slightly easier migration path vs having to migrate to a completely new table. Any chance this could be adapted for range partitioning as well? I'd be happy to create some pgtap tests with pg_partman for this then to make sure it works.

Only issue I see with this, and I'm not sure if it is an issue, is what happens to that default constraint clause when 1000s of partitions start getting added? From what I gather the default's constraint is built based off the cumulative opposite of all other child constraints. I don't understand the code well enough to see what it's actually doing, but if there are no gaps, is the method used smart enough to aggregate all the child constraints to make a simpler constraint that is simply outside the current min/max boundaries? If so, for serial/time range partitioning this should typically work out fine since there are rarely gaps. This actually seems more of an issue for list partitioning where each child is a distinct value or range of values that are completely arbitrary. Won't that check and re-evaluation of the default's constraint just get worse and worse as more children are added? Is there really even a need for the default to have an opposite constraint like this? Not sure on how the planner works with partitioning now, but wouldn't it be better to first check all non-default children for a match the same as it does now without a default and, failing that, then route to the default if one is declared? The default should accept any data then so I don't see the need for the constraint unless it's required for the current implementation. If that's the case, could that be changed?

Keith

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Keith Fiske
Date:

On Thu, Apr 6, 2017 at 1:18 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/04/06 13:08, Keith Fiske wrote:
> On Wed, Apr 5, 2017 at 2:51 PM, Keith Fiske wrote:
>> Only issue I see with this, and I'm not sure if it is an issue, is what
>> happens to that default constraint clause when 1000s of partitions start
>> getting added? From what I gather the default's constraint is built based
>> off the cumulative opposite of all other child constraints. I don't
>> understand the code well enough to see what it's actually doing, but if
>> there are no gaps, is the method used smart enough to aggregate all the
>> child constraints to make a simpler constraint that is simply outside the
>> current min/max boundaries? If so, for serial/time range partitioning this
>> should typically work out fine since there are rarely gaps. This actually
>> seems more of an issue for list partitioning where each child is a distinct
>> value or range of values that are completely arbitrary. Won't that check
>> and re-evaluation of the default's constraint just get worse and worse as
>> more children are added? Is there really even a need for the default to
>> have an opposite constraint like this? Not sure on how the planner works
>> with partitioning now, but wouldn't it be better to first check all
>> non-default children for a match the same as it does now without a default
>> and, failing that, then route to the default if one is declared? The
>> default should accept any data then so I don't see the need for the
>> constraint unless it's required for the current implementation. If that's
>> the case, could that be changed?

Unless I misread your last sentence, I think there might be some
confusion.  Currently, the partition constraint (think of these as you
would of user-defined check constraints) is needed for two reasons: 1. to
prevent direct insertion of rows into the default partition for which a
non-default partition exists; no two partitions should ever have duplicate
rows.  2. so that planner can use the constraint to determine if the
default partition needs to be scanned for a query using constraint
exclusion; no need, for example, to scan the default partition if the
query requests only key=3 rows and a partition for the same exists (no
other partition should have key=3 rows by definition, not even the
default).  As things stand today, planner needs to look at every partition
individually for using constraint exclusion to possibly exclude it, *even*
with declarative partitioning and that would include the default partition.

Forgot about constraint exclusion. My follow up email that you answered below was addressing the prevention of data to the default if there was no constraint on the default. I guess my main concern was with how manageable that cumulative opposite constraint of the default would be over time, especially with list partitioning. And also that it's smart enough to consolidate constraint conditions to simplify things if it's found that two or more conditions cover a continuous range.
 

> Actually, thinking on this more, I realized this does again come back to
> the lack of a global index. Without the constraint, data could be put
> directly into the default that could technically conflict with the
> partition scheme elsewhere. Perhaps, instead of the constraint, inserts
> directly to the default could be prevented on the user level. Writing to
> valid children directly certainly has its place, but been thinking about
> it, and I can't see any reason why one would ever want to write directly to
> the default. It's use case seems to be around being a sort of temporary
> storage until that data can be moved to a valid location. Would still need
> to allow removal of data, though.

As mentioned above, the default partition will not allow directly
inserting a row whose key maps to some existing (non-default) partition.

As far as tuple-routing is concerned, it will choose the default partition
only if no other partition is found for the key.  Tuple-routing doesn't
use the partition constraints directly per se, like one of the two things
mentioned above do.  One could say that tuple-routing assigns the incoming
rows to partitions such that their individual partition constraints are
not violated.
 
Finally, we don't yet offer global guarantees for constraints like unique.
 The only guarantee that's in place is that no two partitions can contain
the same partition key.

Thanks,
Amit



Re: [HACKERS] Adding support for Default partition in partitioning

From
Keith Fiske
Date:

On Thu, Apr 6, 2017 at 7:30 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hello,

Thanks a lot for testing and reporting this. Please find attached an updated patch with the fix. The patch also contains a fix
regarding operator used at the time of creating expression as default partition constraint. This was notified offlist by Amit Langote.

Thank you,
Rahila Syed


Could probably use some more extensive testing, but all examples I had on my previously mentioned blog post are now working.

Keith

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:

Hi Rahila,


With your latest patch:

Consider a case where a table is partitioned on a boolean key. Even when
separate partitions already exist for 'true' and 'false', a default
partition can still be created.

I think this should not be allowed.

Consider the following case:

postgres=# CREATE TABLE list_partitioned (
    a bool
) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN ('false');
CREATE TABLE
postgres=# CREATE TABLE part_2 PARTITION OF list_partitioned FOR VALUES IN ('true');
CREATE TABLE
postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT);
CREATE TABLE

The creation of table part_default should have failed instead.

Thanks,
Jeevan Ladhe




Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Rahila,

I tried to review the code, and here are some of my early comments:

1.
When I configure using "-Werror", I see unused variable in function DefineRelation:

tablecmds.c: In function ‘DefineRelation’:
tablecmds.c:761:17: error: unused variable ‘partdesc’ [-Werror=unused-variable]
   PartitionDesc partdesc;
                 ^

2.
Typo in comment:
+ /*
+ * When adding a list partition after default partition, scan the
+ * default partiton for rows satisfying the new partition
+ * constraint. If found dont allow addition of a new partition.
+ * Otherwise continue with the creation of new  partition.
+ */

These should read "partition" and "don't".

3.
I think instead of a period '.', it will be good if you can use semicolon ';'
in following declaration similar to the comment for 'null_index'.

+ int def_index; /* Index of the default list partition. -1 for
+ * range partitioned tables */

4.
You may want to consider 80 column alignment for changes done in function
get_qual_from_partbound, and other places as applicable.

5.
It would be good if the patch has some test coverage that explains what is
being achieved, what kind of error handling is done etc.

6.
There are some places having code like following:

+ Node *value = lfirst(c);
  Const   *val = lfirst(c);
  PartitionListValue *list_value = NULL;
 
+ if (IsA(value, DefElem))

The additional variable is not needed and you can call IsA on val itself.

7.
Also, in places like the one below, where you only check whether a node is a
DefElem, you can avoid the extra variable:

+ foreach(cell1, bspec->listdatums)
+ {
+ Node *value = lfirst(cell1);
+ if (IsA(value, DefElem))
+ {
+ def_elem = true;
+ *defid = inhrelid;
+ }

Can be written as:
+ foreach(cell1, bspec->listdatums)
+ {
+ if (IsA(lfirst(cell1), DefElem))
+ {
+ def_elem = true;
+ *defid = inhrelid;
+ }
+ }


Regards,
Jeevan Ladhe




Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Mon, Apr 10, 2017 at 8:12 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi Rahila,
>
>
> With your latest patch:
>
> Consider a case when a table is partitioned on a boolean key.
>
> Even when there are existing separate partitions for 'true' and
>
> 'false', still default partition can be created.
>
>
> I think this should not be allowed.

Well, boolean columns can have "NULL" values which will go into
default partition if no NULL partition exists. So, probably we should
add check for NULL partition there.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Ashutosh,


On Tue, Apr 11, 2017 at 6:02 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Mon, Apr 10, 2017 at 8:12 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi Rahila,
>
>
> With your latest patch:
>
> Consider a case when a table is partitioned on a boolean key.
>
> Even when there are existing separate partitions for 'true' and
>
> 'false', still default partition can be created.
>
>
> I think this should not be allowed.

Well, boolean columns can have "NULL" values which will go into
default partition if no NULL partition exists. So, probably we should
add check for NULL partition there.

I have checked for NULLs too, and the default partition can be created even when there are partitions for each TRUE, FALSE and NULL.

Consider the example below:

postgres=# CREATE TABLE list_partitioned (               
    a bool
) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN ('false');
CREATE TABLE
postgres=# CREATE TABLE part_2 PARTITION OF list_partitioned FOR VALUES IN ('true');
CREATE TABLE
postgres=# CREATE TABLE part_3 PARTITION OF list_partitioned FOR VALUES IN (null);
CREATE TABLE
postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT);
CREATE TABLE

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Tue, Apr 11, 2017 at 9:41 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have checked for NULLs too, and the default partition can be created even
> when there are partitions for each TRUE, FALSE and NULL.
>
> Consider the example below:
>
> postgres=# CREATE TABLE list_partitioned (
>     a bool
> ) PARTITION BY LIST (a);
> CREATE TABLE
> postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN
> ('false');
> CREATE TABLE
> postgres=# CREATE TABLE part_2 PARTITION OF list_partitioned FOR VALUES IN
> ('true');
> CREATE TABLE
> postgres=# CREATE TABLE part_3 PARTITION OF list_partitioned FOR VALUES IN
> (null);
> CREATE TABLE
> postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
> VALUES IN (DEFAULT);
> CREATE TABLE

In my opinion, that's absolutely fine, and it would be very strange to
try to prevent it.  The partitioning method shouldn't have specific
knowledge of the properties of individual data types.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, Apr 6, 2017 at 1:17 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> I like the idea about having DEFAULT partition for the range partition. With
> the
> way partition is designed it can have holes into range partition. I think
> DEFAULT
> for the range partition is a good idea, generally when the range having
> holes. When
> range is serial then of course DEFAULT partition doen't much sense.

Yes, I like that idea, too.  I think the DEFAULT partition should be
allowed to be created for either range or list partitioning regardless
of whether we think there are any holes, but if you create a DEFAULT
partition when there are no holes, you just won't be able to put any
data into it.  It's silly, but it's not worth the code that it would
take to try to prevent it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, Apr 6, 2017 at 7:30 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Thanks a lot for testing and reporting this. Please find attached an updated
> patch with the fix. The patch also contains a fix
> regarding operator used at the time of creating expression as default
> partition constraint. This was notified offlist by Amit Langote.

I think that the syntax for this patch should probably be revised.
Right now the proposal is for:

CREATE TABLE .. PARTITION OF ... FOR VALUES IN (DEFAULT);

But that's not a good idea for several reasons.  For one thing, you
can also write FOR VALUES IN (DEFAULT, 5), which isn't sensible.
For another thing, this kind of syntax won't generalize to range
partitioning, which we've talked about making this feature support.
Maybe something like:

CREATE TABLE .. PARTITION OF .. DEFAULT;

This patch makes the assumption throughout that any DefElem represents
the word DEFAULT, which is true in the patch as written but doesn't
seem very future-proof.  I think the "def" in "DefElem" stands for
"definition" or "define" or something like that, so this is actually
pretty confusing.  Maybe we should introduce a dedicated node type to
represent a default-specification in the parser grammar.  If not, then
let's at least encapsulate the test a little better, e.g. by adding
isDefaultPartitionBound() which tests not only IsA(..., DefElem) but
also whether the name is DEFAULT as expected.  BTW, we typically use
lower-case internally, so if we stick with this representation it
should really be "default" not "DEFAULT".
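
To make the suggested encapsulation concrete, here is a minimal, self-contained sketch. It does not use PostgreSQL's real node definitions: MockNode, MockDefElem and mock_is_default_partition_bound() are hypothetical stand-ins, and the only point is that the helper checks both the node tag and the (lower-case) name rather than the tag alone.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for PostgreSQL node types; NOT the real definitions. */
typedef enum MockNodeTag
{
    MOCK_T_Const,
    MOCK_T_DefElem
} MockNodeTag;

typedef struct MockNode
{
    MockNodeTag type;
} MockNode;

typedef struct MockDefElem
{
    MockNode    node;       /* node.type is MOCK_T_DefElem */
    const char *defname;    /* option name, lower-case by convention */
} MockDefElem;

/*
 * Return true only if the node is a DefElem *and* its name is "default".
 * Testing the tag alone would misfire as soon as any other DefElem were
 * allowed to appear in a bound list.
 */
bool
mock_is_default_partition_bound(const MockNode *node)
{
    if (node == NULL || node->type != MOCK_T_DefElem)
        return false;
    return strcmp(((const MockDefElem *) node)->defname, "default") == 0;
}
```

Callers would then use the helper everywhere instead of repeating a bare tag test, so a later change of representation touches one function only.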

Useless hunk:

+    bool        has_def;        /* Is there a default partition? Currently false
+                                 * for a range partitioned table */
+    int            def_index;        /* Index of the default list partition. -1 for
+                                 * range partitioned tables */

Why abbreviate "default" to def here?  Seems pointless.

+                    if (found_def)
+                    {
+                        if (mapping[def_index] == -1)
+                            mapping[def_index] = next_index++;
+                    }

Consider &&

@@ -717,7 +754,6 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound)
                         }
                     }
                 }
-
                 break;
             }

+     * default partiton for rows satisfying the new partition

Spelling.

+     * constraint. If found dont allow addition of a new partition.

Missing apostrophe.

+        defrel = heap_open(defid, AccessShareLock);
+        tupdesc = CreateTupleDescCopy(RelationGetDescr(defrel));
+
+        /* Build expression execution states for partition check quals */
+        partqualstate = ExecPrepareCheck(partConstraint,
+                        estate);
+
+        econtext = GetPerTupleExprContext(estate);
+        snapshot = RegisterSnapshot(GetLatestSnapshot());

Definitely not safe against concurrency, since AccessShareLock won't
exclude somebody else's update.  In fact, it won't even cover somebody
else's already-in-flight transaction.

+                errmsg("new default partition constraint is violated by some row")));

Normally in such cases we try to give more detail using
ExecBuildSlotValueDescription.

+    bool        is_def = true;

This variable starts out true and is never set to any value other than
true.  Just get rid of it and, in the one place where it is currently
used, write "true".  That's shorter and clearer.

+    inhoids = find_inheritance_children(RelationGetRelid(parent), NoLock);

If it's actually safe to do this with no lock, there ought to be a
comment with a very compelling explanation of why it's safe.

+        boundspec = (Node *) stringToNode(TextDatumGetCString(datum));
+        bspec = (PartitionBoundSpec *)boundspec;

There's not really a reason to cast the result of stringToNode() to
Node * and then turn around and cast it to PartitionBoundSpec *.  Just
cast it directly to whatever it needs to be.  And use the new castNode
macro.
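
As an illustration of the checked-cast idea, here is a small self-contained sketch. The Mock* types, mock_cast_node() and mock_string_to_node() are hypothetical stand-ins, not PostgreSQL's definitions; the sketch only assumes that the deserializer returns a generic pointer and that the cast macro is applied to that returned pointer (not to the input string), verifying the node tag before casting.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for PostgreSQL node types; NOT the real definitions. */
typedef enum MockNodeTag
{
    MOCK_T_Const,
    MOCK_T_PartitionBoundSpec
} MockNodeTag;

typedef struct MockNode
{
    MockNodeTag type;
} MockNode;

typedef struct MockPartitionBoundSpec
{
    MockNode    node;       /* node.type is MOCK_T_PartitionBoundSpec */
    char        strategy;   /* e.g. 'l' for list */
} MockPartitionBoundSpec;

/* Verify the tag, then hand the pointer back unchanged. */
static inline void *
mock_cast_node_impl(MockNodeTag expected, void *ptr)
{
    assert(ptr == NULL || ((MockNode *) ptr)->type == expected);
    return ptr;
}

/* Checked cast: one step from generic pointer to the wanted type. */
#define mock_cast_node(_type_, ptr) \
    ((Mock##_type_ *) mock_cast_node_impl(MOCK_T_##_type_, (ptr)))

/* Hypothetical deserializer: takes a string, returns a generic pointer. */
void *
mock_string_to_node(const char *str)
{
    static MockPartitionBoundSpec spec;

    spec.node.type = MOCK_T_PartitionBoundSpec;
    spec.strategy = str[0];
    return &spec;
}

/* Single checked cast, with no intermediate "Node *" variable. */
char
mock_bound_strategy(const char *str)
{
    MockPartitionBoundSpec *bspec =
        mock_cast_node(PartitionBoundSpec, mock_string_to_node(str));

    return bspec->strategy;
}
```

Compared with two plain casts, the checked version fails loudly in an assert-enabled build if the stored string ever deserializes to a different node type.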

+        foreach(cell1, bspec->listdatums)
+        {
+            Node *value = lfirst(cell1);
+            if (IsA(value, DefElem))
+            {
+                def_elem = true;
+                *defid = inhrelid;
+            }
+        }
+        if (def_elem)
+        {
+            ReleaseSysCache(tuple);
+            continue;
+        }
+        foreach(cell3, bspec->listdatums)
+        {
+            Node *value = lfirst(cell3);
+            boundspecs = lappend(boundspecs, value);
+        }
+        ReleaseSysCache(tuple);
+    }
+    foreach(cell4, spec->listdatums)
+    {
+        Node *value = lfirst(cell4);
+        boundspecs = lappend(boundspecs, value);
+    }

cell1, cell2, cell3, and cell4 are not very clear variable names.
Between that and the lack of comments, this is not easy to understand.
It's sort of spaghetti logic, too.  The if (def_elem) test continues
early, but if the point is that the loop using cell3 shouldn't execute
in that case, why not just put if (!def_elem) { foreach(cell3, ...) {
... } } instead of reiterating the ReleaseSysCache in two places?
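
A rough sketch of the restructuring suggested above, using hypothetical mock types rather than the patch's real catalog access: each partition's entry is "acquired" once, the copy loop is guarded by if (!is_default), and there is exactly one release call per iteration. MOCK_DEFAULT_MARKER is an invented sentinel standing in for the DEFAULT element, and the acquire/release counter only exists to show that the calls balance.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_DEFAULT_MARKER (-1)    /* invented sentinel for DEFAULT */

typedef struct MockPartition
{
    int         id;         /* stands in for the partition's OID */
    const int  *datums;     /* list bound values */
    size_t      ndatums;
} MockPartition;

/* Counter standing in for syscache references currently held. */
static int  mock_open_tuples = 0;

static void mock_acquire(void) { mock_open_tuples++; }
static void mock_release(void) { mock_open_tuples--; }

int
mock_open_tuple_count(void)
{
    return mock_open_tuples;
}

/*
 * Copy the bound values of every non-default partition into out[] (which
 * must be large enough) and report the default partition's id through
 * *default_id, or -1 if there is none.  Returns the number of values copied.
 */
size_t
mock_collect_bounds(const MockPartition *parts, size_t nparts,
                    int *out, int *default_id)
{
    size_t      nout = 0;

    *default_id = -1;
    for (size_t p = 0; p < nparts; p++)
    {
        bool        is_default = false;

        mock_acquire();

        /* First pass: is this the default partition? */
        for (size_t i = 0; i < parts[p].ndatums; i++)
        {
            if (parts[p].datums[i] == MOCK_DEFAULT_MARKER)
            {
                is_default = true;
                *default_id = parts[p].id;
                break;
            }
        }

        /* Second pass only for ordinary partitions. */
        if (!is_default)
        {
            for (size_t i = 0; i < parts[p].ndatums; i++)
                out[nout++] = parts[p].datums[i];
        }

        mock_release();     /* the only release point in the loop */
    }
    return nout;
}
```

With a single exit path per iteration there is no early continue to keep in sync with the cleanup code.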

+                /* Collect bound spec nodes in a list. This is done if the partition is
+                 * a default partition. In case of default partition, constraint is formed
+                 * by performing <> operation over the partition constraints of the
+                 * existing partitions.
+                 */

I doubt that handles NULLs properly.
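
To spell out why NULLs are a concern here: under SQL's three-valued logic, "key <> 4" is neither true nor false when key is NULL, so a constraint built purely from <> comparisons cannot say anything definite about a NULL key. The sketch below models that in plain C; Tri, NullableInt and both functions are illustrative names, not anything from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* SQL-style three-valued truth value. */
typedef enum Tri
{
    TRI_FALSE,
    TRI_TRUE,
    TRI_UNKNOWN
} Tri;

typedef struct NullableInt
{
    bool        is_null;
    int         value;
} NullableInt;

/* "key <> bound" as SQL evaluates it: unknown when key is NULL. */
Tri
sql_not_equal(NullableInt key, int bound)
{
    if (key.is_null)
        return TRI_UNKNOWN;
    return (key.value != bound) ? TRI_TRUE : TRI_FALSE;
}

/* SQL-style AND of "key <> bounds[i]" over all existing bounds. */
Tri
default_constraint(NullableInt key, const int *bounds, size_t nbounds)
{
    Tri         result = TRI_TRUE;

    for (size_t i = 0; i < nbounds; i++)
    {
        Tri         t = sql_not_equal(key, bounds[i]);

        if (t == TRI_FALSE)
            return TRI_FALSE;       /* FALSE dominates AND */
        if (t == TRI_UNKNOWN)
            result = TRI_UNKNOWN;   /* remember, but keep looking for FALSE */
    }
    return result;
}
```

For a NULL key the result is unknown regardless of which values the other partitions accept, so the outcome presumably cannot depend on whether some other partition was declared to accept NULL; that case would need explicit IS NULL / IS NOT NULL handling.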

+                inhoids = find_inheritance_children(RelationGetRelid(parent), NoLock);

Again, no lock?  Really?

The logic which follows looks largely cut-and-pasted, which makes me
think you need to do some refactoring here to make it more clear
what's going on, so that you have the relevant logic in just one
place.  It seems wrong anyway to shove all of this logic specific to
the default case into get_qual_from_partbound() when the logic for the
non-default case is inside get_qual_for_list.  Where there were 2
lines of code before you've now got something like 30.

+        if(get_negator(operoid) == InvalidOid)
+            elog(ERROR, "no negator found for partition operator %u",
+                 operoid);

I really doubt that's OK.  elog() shouldn't be reachable, but this
will be reachable if the partitioning operator does not have a
negator.  And there's the NULL-handling issue I mentioned above, too.

+            if (partdesc->boundinfo->has_def && key->strategy
+                == PARTITION_STRATEGY_LIST)
+                result = parent->indexes[partdesc->boundinfo->def_index];

Testing for PARTITION_STRATEGY_LIST here seems unnecessary.  If
has_def (or has_default, as it probably should be) isn't allowed for
range partitions, then it's redundant; if it is allowed, then that
case should be handled too.  Also, at this point we've already set
*failed_at and *failed_slot; presumably you'd want to make this check
before you get to that point.
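
A simplified sketch of the ordering suggested above, with hypothetical mock structures in place of the real partition descriptor: the default partition is consulted before any failure state is recorded, and the check relies on has_default alone instead of also testing the partitioning strategy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for list partition bound information. */
typedef struct MockBoundInfo
{
    const int  *values;         /* list bound values */
    const int  *indexes;        /* partition index for each value */
    size_t      nvalues;
    bool        has_default;
    int         default_index;  /* meaningful only if has_default */
} MockBoundInfo;

/*
 * Return the index of the partition that accepts key, or -1 if none does.
 * *failed is set only when routing really fails, i.e. after the default
 * partition has been considered.
 */
int
mock_route(const MockBoundInfo *bi, int key, bool *failed)
{
    *failed = false;

    for (size_t i = 0; i < bi->nvalues; i++)
    {
        if (bi->values[i] == key)
            return bi->indexes[i];
    }

    /* No explicit match: fall back to the default partition, if any. */
    if (bi->has_default)
        return bi->default_index;

    /* Only now is it correct to record a failure. */
    *failed = true;
    return -1;
}
```

If range partitions later gain a default as well, the fallback branch needs no change; only the lookup above it would differ.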

I suspect there are quite a few more problems here in addition to the
ones mentioned above, but I don't think it makes sense to spend too
much time searching for them until some of this basic stuff is cleaned
up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hello,

Thank you for reviewing.

>But that's not a good idea for several reasons.  For one thing, you
>can also write FOR VALUES IN (DEFAULT, 5), which isn't sensible.
>For another thing, this kind of syntax won't generalize to range
>partitioning, which we've talked about making this feature support.
>Maybe something like:

>CREATE TABLE .. PARTITION OF .. DEFAULT;

I agree that the syntax should be changed to also support range partitioning.

Following can also be considered as it specifies more clearly that the
partition holds default values.

CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;

>Maybe we should introduce a dedicated node type to
>represent a default-specification in the parser grammar.  If not, then
>let's at least encapsulate the test a little better, e.g. by adding
>isDefaultPartitionBound() which tests not only IsA(..., DefElem) but
>also whether the name is DEFAULT as expected.  BTW, we typically use
>lower-case internally, so if we stick with this representation it
>should really be "default" not "DEFAULT".

An isDefaultPartitionBound() function is added in the attached patch; it
checks both the node type and the name.

>Why abbreviate "default" to def here?  Seems pointless.
Corrected in the attached.

>Consider &&
Fixed.

>+     * default partiton for rows satisfying the new partition
>Spelling.
Fixed.

>Missing apostrophe
Fixed.

>Definitely not safe against concurrency, since AccessShareLock won't
>exclude somebody else's update.  In fact, it won't even cover somebody
>else's already-in-flight transaction
Changed it to AccessExclusiveLock

>Normally in such cases we try to give more detail using
>ExecBuildSlotValueDescription.
This function is used in execMain.c and the error is being
reported in partition.c.
Do you mean the error reporting should be moved into execMain.c
to use ExecBuildSlotValueDescription?

>This variable starts out true and is never set to any value other than
>true.  Just get rid of it and, in the one place where it is currently
>used, write "true".  That's shorter and clearer.
Fixed.

>There's not really a reason to cast the result of stringToNode() to
>Node * and then turn around and cast it to PartitionBoundSpec *.  Just
>cast it directly to whatever it needs to be.  And use the new castNode
>macro
Fixed. The castNode macro takes a Node * as input, whereas stringToNode() takes a string.
IIUC, castNode can't be used here.

>The if (def_elem) test continues
>early, but if the point is that the loop using cell3 shouldn't execute
>in that case, why not just put if (!def_elem) { foreach(cell3, ...) {
>... } } instead of reiterating the ReleaseSysCache in two places?
Fixed in the attached.

I will respond to the remaining comments in a following email.


On Thu, Apr 13, 2017 at 12:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Following can also be considered as it specifies more clearly that the
> partition holds default values.
>
> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;

Yes, that could be done.  But I don't think it's correct to say that
the partition holds default values.  Let's back up and ask what the
word "default" means.  The relevant definition (according to Google or
whoever they stole it from) is:

a preselected option adopted by a computer program or other mechanism
when no alternative is specified by the user or programmer.

So, a default *value* is the value that is used when no alternative is
specified by the user or programmer. We have that concept, but it's
not what we're talking about here: that's configured by applying the
DEFAULT property to a column.  Here, we're talking about the default
*partition*, or in other words the *partition* that is used when no
alternative is specified by the user or programmer.  So, that's why I
proposed the syntax I did.  The partition doesn't contain default
values; it is itself a default.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>> Following can also be considered as it specifies more clearly that the
>> partition holds default values.
>>
>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>
> Yes, that could be done.  But I don't think it's correct to say that
> the partition holds default values.  Let's back up and ask what the
> word "default" means.  The relevant definition (according to Google or
> whoever they stole it from) is:
>
> a preselected option adopted by a computer program or other mechanism
> when no alternative is specified by the user or programmer.
>
> So, a default *value* is the value that is used when no alternative is
> specified by the user or programmer. We have that concept, but it's
> not what we're talking about here: that's configured by applying the
> DEFAULT property to a column.  Here, we're talking about the default
> *partition*, or in other words the *partition* that is used when no
> alternative is specified by the user or programmer.  So, that's why I
> proposed the syntax I did.  The partition doesn't contain default
> values; it is itself a default.

Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Rahila,

I tried to go through your v7 patch, and following are my comments:

1.
With -Werror I see the following compilation failure:

parse_utilcmd.c: In function ‘transformPartitionBound’:
parse_utilcmd.c:3309:4: error: implicit declaration of function ‘isDefaultPartitionBound’ [-Werror=implicit-function-declaration]
    if (!(isDefaultPartitionBound(value)))
    ^
cc1: all warnings being treated as errors

You need to include "catalog/partition.h".

2.
Once I fixed the above, I see the following error:
tablecmds.c: In function ‘DefineRelation’:
tablecmds.c:762:17: error: unused variable ‘partdesc’ [-Werror=unused-variable]
   PartitionDesc partdesc;
                 ^
cc1: all warnings being treated as errors

3.
Please remove the extra line at the end of the function check_new_partition_bound:
+ MemoryContextSwitchTo(oldCxt);
+ heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
+ heap_close(defrel, AccessExclusiveLock);
+ ExecDropSingleTupleTableSlot(tupslot);
+ }
+
 }
 
4.
In generate_qual_for_defaultpart() you do not need 2 pointers for looping over
bound specs:
+ ListCell   *cell1;
+ ListCell   *cell3;
You can iterate twice using one pointer itself.

Same is for:
+ ListCell   *cell2;
+ ListCell   *cell4;

Similarly, in get_qual_from_partbound(), you can use one pointer below,
instead of cell1 and cell3:
+ PartitionBoundSpec *bspec;
+ ListCell *cell1;
+ ListCell *cell3;

5.
Should this have a break in if block?
+ foreach(cell1, bspec->listdatums)
+ {
+ Node *value = lfirst(cell1);
+ if (isDefaultPartitionBound(value))
+ {
+ def_elem = true;
+ *defid = inhrelid;
+ }
+ }

6.
I am wondering, isn't it possible to retrieve has_default and default_index
here to find out whether a default partition exists and, if it does, find its
OID using rd_partdesc? That would avoid the loop above (item 5) that checks
whether a partition bound is default.
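
A minimal sketch of that lookup, under the assumption that the partition descriptor exposes an array of partition OIDs indexed consistently with default_index. All Mock* names are hypothetical and only mirror the fields named in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int MockOid;
#define MOCK_INVALID_OID ((MockOid) 0)

typedef struct MockBoundInfo
{
    bool        has_default;
    int         default_index;  /* meaningful only if has_default */
} MockBoundInfo;

typedef struct MockPartitionDesc
{
    int                  nparts;
    const MockOid       *oids;       /* one OID per partition */
    const MockBoundInfo *boundinfo;
} MockPartitionDesc;

/*
 * Return the default partition's OID, or MOCK_INVALID_OID if the parent
 * has no default partition.  No scan over the children's bounds is needed.
 */
MockOid
mock_get_default_partition_oid(const MockPartitionDesc *partdesc)
{
    if (partdesc == NULL || partdesc->boundinfo == NULL)
        return MOCK_INVALID_OID;
    if (!partdesc->boundinfo->has_default)
        return MOCK_INVALID_OID;
    return partdesc->oids[partdesc->boundinfo->default_index];
}
```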

7.
The output of describe needs to be improved.
Consider following case:
postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4,5,4,4,4,6,2);
ERROR:  relation "list_partitioned" does not exist
postgres=# CREATE TABLE list_partitioned (               
    a int
) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4,5,4,4,4,6,2);
CREATE TABLE
postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT, 3, DEFAULT, 3, DEFAULT);
CREATE TABLE
postgres=# \d+ part_1;
                                  Table "public.part_1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              | 
Partition of: list_partitioned FOR VALUES IN (4, 5, 6, 2)

postgres=# \d+ part_default;
                               Table "public.part_default"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              | 
Partition of: list_partitioned FOR VALUES IN (DEFAULT3DEFAULTDEFAULT)

As you can see in the above example, part_1 has multiple entries for 4 while
creating the partition, but describe shows only one entry for 4 in the values set.
Similarly, part_default has multiple entries for 3 and DEFAULT while creating
the partition, but describe shows a weird output. Instead, we should have
just one entry saying "VALUES IN (DEFAULT, 3)":

postgres=# \d+ part_default;
                               Table "public.part_default"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              | 
Partition of: list_partitioned FOR VALUES IN (DEFAULT, 3)

8.
The following call to find_inheritance_children() in generate_qual_for_defaultpart()
is an overhead; instead we can simply use the array of OIDs in rd_partdesc.

+ spec = (PartitionBoundSpec *) bound;
+
+ inhoids = find_inheritance_children(RelationGetRelid(parent), NoLock);
+
+ foreach(cell2, inhoids)

Same is for the call in get_qual_from_partbound:

+ /* Collect bound spec nodes in a list. This is done if the partition is
+ * a default partition. In case of default partition, constraint is formed
+ * by performing <> operation over the partition constraints of the
+ * existing partitions.
+ */
+ inhoids = find_inheritance_children(RelationGetRelid(parent), NoLock);
+ foreach(cell2, inhoids)

9.
How about rephrasing following error message:
postgres=# CREATE TABLE part_2 PARTITION OF list_partitioned FOR VALUES IN (14);
ERROR:  new default partition constraint is violated by some row

To,
"ERROR: some existing row in default partition violates new default partition constraint"

10.
Additionally, I tested the sample you gave in the first post and the one
mentioned by Keith; both pass without errors.
I also did a pg_dump test, and it dumps the partitions and data correctly.
But as mentioned earlier, it would be good to have them in your patch.

I will do further review and let you know comments if any.

Regards,
Jeevan Ladhe


Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:


On Mon, Apr 24, 2017 at 5:44 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>> Following can also be considered as it specifies more clearly that the
>> partition holds default values.
>>
>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>
> Yes, that could be done.  But I don't think it's correct to say that
> the partition holds default values.  Let's back up and ask what the
> word "default" means.  The relevant definition (according to Google or
> whoever they stole it from) is:
>
> a preselected option adopted by a computer program or other mechanism
> when no alternative is specified by the user or programmer.
>
> So, a default *value* is the value that is used when no alternative is
> specified by the user or programmer. We have that concept, but it's
> not what we're talking about here: that's configured by applying the
> DEFAULT property to a column.  Here, we're talking about the default
> *partition*, or in other words the *partition* that is used when no
> alternative is specified by the user or programmer.  So, that's why I
> proposed the syntax I did.  The partition doesn't contain default
> values; it is itself a default.

Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.

+1

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Mon, Apr 24, 2017 at 8:14 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>> Following can also be considered as it specifies more clearly that the
>>> partition holds default values.
>>>
>>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>>
>> The partition doesn't contain default values; it is itself a default.
>
> Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.

I suspect it could be done as of now, but I'm a little worried that it
might create grammar conflicts in the future as we extend the syntax
further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
word DEFAULT appears in the same position where we'd normally have FOR
VALUES, and so the parser will definitely be able to figure out what's
going on.  When it gets to that position, it will see FOR or it will
see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
DEFAULT PARTITION OF ..., then we have action at a distance: whether
or not the word DEFAULT is present before PARTITION affects which
tokens are legal after the parent table name.  bison isn't always very
smart about that kind of thing.  No particular dangers come to mind at
the moment, but it makes me nervous anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/04/25 5:16, Robert Haas wrote:
> On Mon, Apr 24, 2017 at 8:14 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>>> Following can also be considered as it specifies more clearly that the
>>>> partition holds default values.
>>>>
>>>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>>>
>>> The partition doesn't contain default values; it is itself a default.
>>
>> Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.
> 
> I suspect it could be done as of now, but I'm a little worried that it
> might create grammar conflicts in the future as we extend the syntax
> further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
> word DEFAULT appears in the same position where we'd normally have FOR
> VALUES, and so the parser will definitely be able to figure out what's
> going on.  When it gets to that position, it will see FOR or it will
> see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
> DEFAULT PARTITION OF ..., then we have action at a distance: whether
> or not the word DEFAULT is present before PARTITION affects which
> tokens are legal after the parent table name.  bison isn't always very
> smart about that kind of thing.  No particular dangers come to mind at
> the moment, but it makes me nervous anyway.

+1 to CREATE TABLE .. PARTITION OF .. DEFAULT

Thanks,
Amit




Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Tue, Apr 25, 2017 at 1:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 24, 2017 at 8:14 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>>> Following can also be considered as it specifies more clearly that the
>>>> partition holds default values.
>>>>
>>>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>>>
>>> The partition doesn't contain default values; it is itself a default.
>>
>> Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.
>
> I suspect it could be done as of now, but I'm a little worried that it
> might create grammar conflicts in the future as we extend the syntax
> further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
> word DEFAULT appears in the same position where we'd normally have FOR
> VALUES, and so the parser will definitely be able to figure out what's
> going on.  When it gets to that position, it will see FOR or it will
> see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
> DEFAULT PARTITION OF ..., then we have action at a distance: whether
> or not the word DEFAULT is present before PARTITION affects which
> tokens are legal after the parent table name.

As long as we handle this at the transformation stage, it shouldn't be
a problem. The grammar would be something like
CREATE TABLE ... optDefault PARTITION OF ...

If user specifies DEFAULT PARTITION OF t1 FOR VALUES ..., parser will
allow that but in transformation stage, we will detect it and throw an
error "DEFAULT partitions cannot contain a partition bound clause" or
something like that. Also, documentation would say that DEFAULT and
partition bound specification are not allowed together.
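
A minimal sketch of the post-parse check described above. All names here (PartitionStmtSketch, check_default_partition_stmt) are hypothetical and are not taken from the patch; the real check would live in the parse-analysis code and report through ereport().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified result of parsing CREATE TABLE ... PARTITION OF. */
typedef struct PartitionStmtSketch
{
    bool        has_default_keyword;    /* DEFAULT appeared before PARTITION OF */
    bool        has_bound_clause;       /* FOR VALUES ... appeared after the parent */
} PartitionStmtSketch;

/*
 * The grammar accepts both pieces; this later step rejects the combination.
 * Returns NULL if the statement is acceptable, otherwise an error message.
 */
static const char *
check_default_partition_stmt(const PartitionStmtSketch *stmt)
{
    assert(stmt != NULL);

    if (stmt->has_default_keyword && stmt->has_bound_clause)
        return "DEFAULT partitions cannot contain a partition bound clause";
    return NULL;
}
```

The grammar stays permissive in this approach, so every invalid combination has to be caught by hand in a check like this one.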

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/04/25 14:20, Ashutosh Bapat wrote:
> On Tue, Apr 25, 2017 at 1:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Apr 24, 2017 at 8:14 AM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>>>> Following can also be considered as it specifies more clearly that the
>>>>> partition holds default values.
>>>>>
>>>>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>>>>
>>>> The partition doesn't contain default values; it is itself a default.
>>>
>>> Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.
>>
>> I suspect it could be done as of now, but I'm a little worried that it
>> might create grammar conflicts in the future as we extend the syntax
>> further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>> word DEFAULT appears in the same position where we'd normally have FOR
>> VALUES, and so the parser will definitely be able to figure out what's
>> going on.  When it gets to that position, it will see FOR or it will
>> see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
>> DEFAULT PARTITION OF ..., then we have action at a distance: whether
>> or not the word DEFAULT is present before PARTITION affects which
>> tokens are legal after the parent table name.
> 
> As long as we handle this at the transformation stage, it shouldn't be
> a problem. The grammar would be something like
> CREATE TABLE ... optDefault PARTITION OF ...
> 
> If user specifies DEFAULT PARTITION OF t1 FOR VALUES ..., parser will
> allow that but in transformation stage, we will detect it and throw an
> error "DEFAULT partitions cannot contain a partition bound clause" or
> something like that. Also, documentation would say that DEFAULT and
> partition bound specification are not allowed together.

FWIW, one point in favor of PARTITION OF .. DEFAULT is that it wouldn't
require us to do the things you mention we could do.  A point against it may
be that it might read backwards to some users, but then DEFAULT
PARTITION OF opens up all those possibilities for error-causing user input.

Thanks,
Amit




Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Tue, Apr 25, 2017 at 1:20 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> I suspect it could be done as of now, but I'm a little worried that it
>> might create grammar conflicts in the future as we extend the syntax
>> further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>> word DEFAULT appears in the same position where we'd normally have FOR
>> VALUES, and so the parser will definitely be able to figure out what's
>> going on.  When it gets to that position, it will see FOR or it will
>> see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
>> DEFAULT PARTITION OF ..., then we have action at a distance: whether
>> or not the word DEFAULT is present before PARTITION affects which
>> tokens are legal after the parent table name.
>
> As long as we handle this at the transformation stage, it shouldn't be
> a problem. The grammar would be something like
> CREATE TABLE ... optDefault PARTITION OF ...
>
> If user specifies DEFAULT PARTITION OF t1 FOR VALUES ..., parser will
> allow that but in transformation stage, we will detect it and throw an
> error "DEFAULT partitions cannot contain a partition bound clause" or
> something like that. Also, documentation would say that DEFAULT and
> partition bound specification are not allowed together.

That's not what I'm concerned about.  I'm concerned about future
syntax additions resulting in difficult-to-resolve grammar conflicts.
For an example of what I mean, consider this thread:

http://postgr.es/m/9253.1295031520@sss.pgh.pa.us

The whole thread is worth a read.  In brief, I wanted to add syntax
like LOCK VIEW xyz, and it wasn't possible to do that without breaking
backward compatibility.  In a nutshell, the problem with making that
syntax work was that LOCK VIEW NOWAIT would then potentially mean
either lock a table called VIEW with the NOWAIT option, or else it
might mean lock a view called NOWAIT.  If the NOWAIT key word were not
allowed at the end or if the TABLE keyword were mandatory, then it
would be possible to make it work, but because we already decided both
to make the TABLE keyword optional and allow an optional NOWAIT
keyword at the end, the syntax couldn't be further extended in the way
that I wanted to extend it without confusing the parser.  The problem
was basically unfixable without breaking backward compatibility, and
we gave up.  I don't want to make the same mistake with the default
partition syntax that we made with the LOCK TABLE syntax.
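
The ambiguity can be made concrete with a toy parse counter. This is only a sketch under stated assumptions: count_lock_parses and its helpers are hypothetical names, the grammar is reduced to LOCK [ TABLE | VIEW ] name [ NOWAIT ], TABLE is treated as reserved, and VIEW and NOWAIT are assumed to be usable as relation names, as the paragraph above describes. It is not how bison works internally; it only counts the readings a token sequence admits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool
word_is(const char *tok, const char *kw)
{
    return strcmp(tok, kw) == 0;
}

/* In this toy grammar only TABLE is reserved; any other word may be a name. */
static bool
can_be_name(const char *tok)
{
    return !word_is(tok, "TABLE");
}

/*
 * Count the ways tokens[0..ntokens) can be read as
 *     LOCK [ TABLE | VIEW ] name [ NOWAIT ]
 * With allow_view = false the VIEW alternative is disabled, which models
 * the syntax before the attempted extension.
 */
static int
count_lock_parses(const char *const *tokens, size_t ntokens, bool allow_view)
{
    int         parses = 0;

    assert(tokens != NULL);
    if (ntokens < 2 || !word_is(tokens[0], "LOCK"))
        return 0;

    /* kind 0: no object keyword, kind 1: TABLE, kind 2: VIEW */
    for (int kind = 0; kind < 3; kind++)
    {
        size_t      pos = 1;

        if (kind == 1)
        {
            if (!word_is(tokens[pos], "TABLE"))
                continue;
            pos++;
        }
        else if (kind == 2)
        {
            if (!allow_view || !word_is(tokens[pos], "VIEW"))
                continue;
            pos++;
        }

        if (pos >= ntokens || !can_be_name(tokens[pos]))
            continue;           /* nothing usable as the relation name */
        pos++;                  /* consume the name */

        if (pos == ntokens)
            parses++;           /* statement ends after the name */
        else if (pos + 1 == ntokens && word_is(tokens[pos], "NOWAIT"))
            parses++;           /* statement ends with the NOWAIT option */
    }
    return parses;
}
```

Under these assumptions LOCK VIEW NOWAIT has one reading when VIEW is not an object keyword and two once it is, which is the conflict that could not be resolved without breaking existing statements.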

Aside from unfixable grammar conflicts, there's another way that this
kind of syntax can become problematic, which is when you end up with
multiple optional keywords in the same part of the syntax.  For an
example of that, see
http://postgr.es/m/603c8f070905231747j2e099c23hef8eafbf26682e5f@mail.gmail.com
- that discusses the problems with EXPLAIN; we later ran into the same
problem with VACUUM.  Users can't remember whether they are supposed
to type VACUUM FULL VERBOSE or VACUUM VERBOSE FULL and trying to
support both creates parser problems and tends to involve adding too
many keywords, so we switched to a new and more extensible syntax for
future options.

Now, you may think that that's never going to happen in this case.
What optional keyword other than DEFAULT could we possibly want to add
just before PARTITION OF?  TBH, I don't know.  I can't think of
anything else we might want to put in that position right now.  But
considering that it's been less than six months since the original
syntax was committed and we've already thought of ONE thing we might
want to put there, it seems hard to rule out the possibility that we
might eventually think of more, and then we will have exactly the same
kind of problem that we've had in the past with other commands.  Let's
head the problem off at the pass and pick a syntax which isn't
vulnerable to that sort of issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Tue, Apr 25, 2017 at 11:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Apr 25, 2017 at 1:20 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>>> I suspect it could be done as of now, but I'm a little worried that it
>>> might create grammar conflicts in the future as we extend the syntax
>>> further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>>> word DEFAULT appears in the same position where we'd normally have FOR
>>> VALUES, and so the parser will definitely be able to figure out what's
>>> going on.  When it gets to that position, it will see FOR or it will
>>> see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
>>> DEFAULT PARTITION OF ..., then we have action at a distance: whether
>>> or not the word DEFAULT is present before PARTITION affects which
>>> tokens are legal after the parent table name.
>>
>> As long as we handle this at the transformation stage, it shouldn't be
>> a problem. The grammar would be something like
>> CREATE TABLE ... optDefault PARTITION OF ...
>>
>> If user specifies DEFAULT PARTITION OF t1 FOR VALUES ..., parser will
>> allow that but in transformation stage, we will detect it and throw an
>> error "DEFAULT partitions cannot contain a partition bound clause" or
>> something like that. Also, documentation would say that DEFAULT and
>> partition bound specification are not allowed together.
>
> That's not what I'm concerned about.  I'm concerned about future
> syntax additions resulting in difficult-to-resolve grammar conflicts.
> For an example of what I mean, consider this thread:
>
> http://postgr.es/m/9253.1295031520@sss.pgh.pa.us
>
> The whole thread is worth a read.  In brief, I wanted to add syntax
> like LOCK VIEW xyz, and it wasn't possible to do that without breaking
> backward compatibility.  In a nutshell, the problem with making that
> syntax work was that LOCK VIEW NOWAIT would then potentially mean
> either lock a table called VIEW with the NOWAIT option, or else it
> might mean lock a view called NOWAIT.  If the NOWAIT key word were not
> allowed at the end or if the TABLE keyword were mandatory, then it
> would be possible to make it work, but because we already decided both
> to make the TABLE keyword optional and allow an optional NOWAIT
> keyword at the end, the syntax couldn't be further extended in the way
> that I wanted to extend it without confusing the parser.  The problem
> was basically unfixable without breaking backward compatibility, and
> we gave up.  I don't want to make the same mistake with the default
> partition syntax that we made with the LOCK TABLE syntax.
>
> Aside from unfixable grammar conflicts, there's another way that this
> kind of syntax can become problematic, which is when you end up with
> multiple optional keywords in the same part of the syntax.  For an
> example of that, see
> http://postgr.es/m/603c8f070905231747j2e099c23hef8eafbf26682e5f@mail.gmail.com
> - that discusses the problems with EXPLAIN; we later ran into the same
> problem with VACUUM.  Users can't remember whether they are supposed
> to type VACUUM FULL VERBOSE or VACUUM VERBOSE FULL and trying to
> support both creates parser problems and tends to involve adding too
> many keywords, so we switched to a new and more extensible syntax for
> future options.
>

Thanks for taking the time to give a detailed explanation.

> Now, you may think that that's never going to happen in this case.
> What optional keyword other than DEFAULT could we possibly want to add
> just before PARTITION OF?

Since the grammar before PARTITION OF is shared with CREATE TABLE (),
there is a high chance that we will have an optional keyword unrelated
to partitioning there. I take back my proposal for that syntax.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hello Jeevan,

Thank you for comments.

I will include your comments in the updated patch.

>7.The output of describe needs to be improved.

The syntax for DEFAULT partitioning is still under discussion. This comment won't be
applicable if the syntax is changed.

>6.
>I am wondering, isn't it possible to retrieve the has_default and default_index
>here to find out if default partition exists and if exist then find it's oid
>using rd_partdesc, that would avoid above(7) loop to check if partition bound is
>default
The checks are used to find the default partition bound and
exclude it from the list of partition bounds to form the partition constraint.
This can't be accomplished by using the has_default flag.
isDefaultPartitionBound() is written to accomplish that.


>8.
>Following call to find_inheritance_children() in generate_qual_for_defaultpart()
>is an overhead, instead we can simply use an array of oids in rd_partdesc.
I think using find_inheritance_children() will take into account a concurrent
drop of a partition, which the value in rd_partdesc will not.

Thank you,
Rahila Syed


Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
>I suspect it could be done as of now, but I'm a little worried that it
>might create grammar conflicts in the future as we extend the syntax
>further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>word DEFAULT appears in the same position where we'd normally have FOR
>VALUES, and so the parser will definitely be able to figure out what's
>going on.  When it gets to that position, it will see FOR or it will
>see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
>DEFAULT PARTITION OF ..., then we have action at a distance: whether
>or not the word DEFAULT is present before PARTITION affects which
>tokens are legal after the parent table name.  bison isn't always very
>smart about that kind of thing.  No particular dangers come to mind at
>the moment, but it makes me nervous anyway.

+1 for CREATE TABLE..PARTITION OF...DEFAULT  syntax.
I think substituting DEFAULT for FOR VALUES is appropriate as
both cases are mutually exclusive.

One more thing that needs consideration is whether default partitions should be
partitioned further. Other databases allow default partitions to be
partitioned further. I think it's normal for users to expect the data in
default partitions to also be divided into sub-partitions, so
it should be supported.
My colleague Rajkumar Raghuwanshi brought to my notice that the current patch
does not handle this correctly.
I will include this in the updated patch if there is no objection.

On the other hand, if sub-partitions of a default partition are to be prohibited,
an error should be thrown if PARTITION BY is specified after DEFAULT.


Thank you,
Rahila Syed



On Tue, Apr 25, 2017 at 1:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Apr 24, 2017 at 8:14 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>> Following can also be considered as it specifies more clearly that the
>>> partition holds default values.
>>>
>>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
>>
>> The partition doesn't contain default values; it is itself a default.
>
> Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more natural.

I suspect it could be done as of now, but I'm a little worried that it
might create grammar conflicts in the future as we extend the syntax
further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
word DEFAULT appears in the same position where we'd normally have FOR
VALUES, and so the parser will definitely be able to figure out what's
going on.  When it gets to that position, it will see FOR or it will
see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
DEFAULT PARTITION OF ..., then we have action at a distance: whether
or not the word DEFAULT is present before PARTITION affects which
tokens are legal after the parent table name.  bison isn't always very
smart about that kind of thing.  No particular dangers come to mind at
the moment, but it makes me nervous anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, Apr 27, 2017 at 8:49 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>I suspect it could be done as of now, but I'm a little worried that it
>>might create grammar conflicts in the future as we extend the syntax
>>further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>>word DEFAULT appears in the same position where we'd normally have FOR
>>VALUES, and so the parser will definitely be able to figure out what's
>>going on.  When it gets to that position, it will see FOR or it will
>>see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
>>DEFAULT PARTITION OF ..., then we have action at a distance: whether
>>or not the word DEFAULT is present before PARTITION affects which
>>tokens are legal after the parent table name.  bison isn't always very
>>smart about that kind of thing.  No particular dangers come to mind at
>>the moment, but it makes me nervous anyway.
>
> +1 for CREATE TABLE..PARTITION OF...DEFAULT  syntax.
> I think substituting DEFAULT for FOR VALUES is appropriate as
> both cases are mutually exclusive.
>
> One more thing that needs consideration is should default partitions be
> partitioned further? Other databases allow default partitions to be
> partitioned further. I think, its normal for users to expect the data in
> default partitions to also be divided into sub partitions.  So
> it should be supported.
> My colleague Rajkumar Raghuwanshi brought to my notice the current patch
> does not handle this correctly.
> I will include this in the updated patch if there is no objection.
>
> On the other hand if sub partitions of a default partition is to be
> prohibited,
> an error should be thrown if PARTITION BY is specified after DEFAULT.

I see no reason to prohibit it.  You can further partition any other
kind of partition, so there seems to be no reason to disallow it in
this one case.

Are you also working on extending this to work with range
partitioning?  Because I think that would be good to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
"Sven R. Kunze"
Date:
On 27.04.2017 15:07, Robert Haas wrote:
> On Thu, Apr 27, 2017 at 8:49 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>> +1 for CREATE TABLE..PARTITION OF...DEFAULT syntax.
>> I think substituting DEFAULT for FOR VALUES is appropriate as
>> both cases are mutually exclusive.

Just to make it sound a little rounder:

CREATE TABLE ... PARTITION OF ... AS DEFAULT
CREATE TABLE ... PARTITION OF ... AS FALLBACK

or

CREATE TABLE ... PARTITION OF ... AS DEFAULT PARTITION
CREATE TABLE ... PARTITION OF ... AS FALLBACK PARTITION


Could any of these be feasible?


Sven



Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hi,

On Apr 27, 2017 18:37, "Robert Haas" <robertmhaas@gmail.com> wrote:
> Are you also working on extending this to work with range
> partitioning?  Because I think that would be good to do.

Currently I am working on review comments and bug fixes for the
default list partitioning patch. After that I can start with default
partition for range partitioning.
 
Thank you,
Rahila Syed

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, Apr 27, 2017 at 3:15 PM, Sven R. Kunze <srkunze@mail.de> wrote:
> On 27.04.2017 15:07, Robert Haas wrote:
>> On Thu, Apr 27, 2017 at 8:49 AM, Rahila Syed <rahilasyed90@gmail.com>
>> wrote:
>>>
>>> +1 for CREATE TABLE..PARTITION OF...DEFAULT syntax.
>>> I think substituting DEFAULT for FOR VALUES is appropriate as
>>> both cases are mutually exclusive.
>
> Just to make sound a little rounder:
>
> CREATE TABLE ... PARTITION OF ... AS DEFAULT
> CREATE TABLE ... PARTITION OF ... AS FALLBACK
>
> or
>
> CREATE TABLE ... PARTITION OF ... AS DEFAULT PARTITION
> CREATE TABLE ... PARTITION OF ... AS FALLBACK PARTITION
>
> Could any of these be feasible?

FALLBACK wouldn't be a good choice because it's not an existing parser
keyword.  We could probably insert AS before DEFAULT and/or PARTITION
afterwards, but they sort of seem like noise words.  SQL seems to have
been invented by people who didn't have any trouble remembering really
long command strings, but brevity is not without some merit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
"Sven R. Kunze"
Date:
On 27.04.2017 22:21, Robert Haas wrote:
> On Thu, Apr 27, 2017 at 3:15 PM, Sven R. Kunze <srkunze@mail.de> wrote:
>> Just to make sound a little rounder:
>>
>> CREATE TABLE ... PARTITION OF ... AS DEFAULT
>> CREATE TABLE ... PARTITION OF ... AS FALLBACK
>>
>> or
>>
>> CREATE TABLE ... PARTITION OF ... AS DEFAULT PARTITION
>> CREATE TABLE ... PARTITION OF ... AS FALLBACK PARTITION
>>
>> Could any of these be feasible?
>
> FALLBACK wouldn't be a good choice because it's not an existing parser
> keyword.  We could probably insert AS before DEFAULT and/or PARTITION
> afterwards, but they sort of seem like noise words.

You are right. I just thought it would make this variant more acceptable as people expressed concerns about understandability of the command.

> SQL seems to have
> been invented by people who didn't have any trouble remembering really
> long command strings, but brevity is not without some merit.

For me, it's exactly the thing I like about SQL. It makes for an easy learning curve.


Sven

Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Please find attached an updated patch with the review comments by Robert and Jeevan implemented.

The newly proposed syntax
CREATE TABLE .. PARTITION OF .. DEFAULT has received the most votes on this thread.

If there are no further objections, I will go ahead and include that in the patch.

Thank you,
Rahila Syed

On Mon, Apr 24, 2017 at 2:40 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hello,

Thank you for reviewing.

>But that's not a good idea for several reasons.  For one thing, you
>can also write FOR VALUES IN (DEFAULT, 5), which isn't sensible.
>For another thing, this kind of syntax won't generalize to range
>partitioning, which we've talked about making this feature support.
>Maybe something like:

>CREATE TABLE .. PARTITION OF .. DEFAULT;

I agree that the syntax should be changed to also support range partitioning.

Following can also be considered as it specifies more clearly that the
partition holds default values.

CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;

>Maybe we should introduce a dedicated node type to
>represent a default-specification in the parser grammar.  If not, then
>let's at least encapsulate the test a little better, e.g. by adding
>isDefaultPartitionBound() which tests not only IsA(..., DefElem) but
>also whether the name is DEFAULT as expected.  BTW, we typically use
>lower-case internally, so if we stick with this representation it
>should really be "default" not "DEFAULT".

isDefaultPartitionBound() function is created in the attached patch which
checks for both node type and name.

>Why abbreviate "default" to def here?  Seems pointless.
Corrected in the attached.

>Consider &&
Fixed.

>+     * default partiton for rows satisfying the new partition
>Spelling.
Fixed.

>Missing apostrophe
Fixed.

>Definitely not safe against concurrency, since AccessShareLock won't
>exclude somebody else's update.  In fact, it won't even cover somebody
>else's already-in-flight transaction
Changed it to AccessExclusiveLock

>Normally in such cases we try to give more detail using
>ExecBuildSlotValueDescription.
This function is used in execMain.c and the error is being
reported in partition.c.
Do you mean the error reporting should be moved into execMain.c
to use ExecBuildSlotValueDescription?

>This variable starts out true and is never set to any value other than
>true.  Just get rid of it and, in the one place where it is currently
>used, write "true".  That's shorter and clearer.
Fixed.

>There's not really a reason to cast the result of stringToNode() to
>Node * and then turn around and cast it to PartitionBoundSpec *.  Just
>cast it directly to whatever it needs to be.  And use the new castNode
>macro
Fixed. The castNode macro takes a Node * as input, whereas stringToNode() takes a string.
IIUC, castNode can't be used here.

>The if (def_elem) test continues
>early, but if the point is that the loop using cell3 shouldn't execute
>in that case, why not just put if (!def_elem) { foreach(cell3, ...) {
>... } } instead of reiterating the ReleaseSysCache in two places?
Fixed in the attached.

I will respond to further comments in following email.


On Thu, Apr 13, 2017 at 12:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 6, 2017 at 7:30 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Thanks a lot for testing and reporting this. Please find attached an updated
> patch with the fix. The patch also contains a fix
> regarding operator used at the time of creating expression as default
> partition constraint. This was notified offlist by Amit Langote.

I think that the syntax for this patch should probably be revised.
Right now the proposal is for:

CREATE TABLE .. PARTITION OF ... FOR VALUES IN (DEFAULT);

But that's not a good idea for several reasons.  For one thing, you
can also write FOR VALUES IN (DEFAULT, 5), which isn't sensible.
For another thing, this kind of syntax won't generalize to range
partitioning, which we've talked about making this feature support.
Maybe something like:

CREATE TABLE .. PARTITION OF .. DEFAULT;

This patch makes the assumption throughout that any DefElem represents
the word DEFAULT, which is true in the patch as written but doesn't
seem very future-proof.  I think the "def" in "DefElem" stands for
"definition" or "define" or something like that, so this is actually
pretty confusing.  Maybe we should introduce a dedicated node type to
represent a default-specification in the parser grammar.  If not, then
let's at least encapsulate the test a little better, e.g. by adding
isDefaultPartitionBound() which tests not only IsA(..., DefElem) but
also whether the name is DEFAULT as expected.  BTW, we typically use
lower-case internally, so if we stick with this representation it
should really be "default" not "DEFAULT".
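
A sketch of the encapsulated test, written against stand-in types so that it compiles on its own. SketchNode and SKETCH_T_DEFELEM are hypothetical placeholders for the backend's Node and DefElem machinery; only the shape of the check (node type plus lower-case name) comes from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the backend's node tags and DefElem. */
typedef enum SketchNodeTag
{
    SKETCH_T_CONST,
    SKETCH_T_DEFELEM
} SketchNodeTag;

typedef struct SketchNode
{
    SketchNodeTag tag;
    const char *defname;        /* only meaningful for SKETCH_T_DEFELEM */
} SketchNode;

/*
 * True only if the node is a DefElem-like node AND its name is "default".
 * Checking the name keeps the test valid if other DefElem uses are added.
 */
static bool
is_default_partition_bound_sketch(const SketchNode *node)
{
    assert(node != NULL);

    return node->tag == SKETCH_T_DEFELEM &&
        node->defname != NULL &&
        strcmp(node->defname, "default") == 0;
}
```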

Useless hunk:

+    bool        has_def;        /* Is there a default partition? Currently false
+                                 * for a range partitioned table */
+    int            def_index;        /* Index of the default list partition. -1 for
+                                 * range partitioned tables */

Why abbreviate "default" to def here?  Seems pointless.

+                    if (found_def)
+                    {
+                        if (mapping[def_index] == -1)
+                            mapping[def_index] = next_index++;
+                    }

Consider &&
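
For illustration, the nested test in the hunk above folds into a single condition. The function wrapper and parameter names are hypothetical, added only so the fragment stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Assign the next canonical index to the default partition, if there is
 * one and it has not been mapped yet.  One condition replaces the nested
 * if statements.
 */
static void
assign_default_mapping(int *mapping, int default_index,
                       bool found_default, int *next_index)
{
    assert(mapping != NULL && next_index != NULL);

    if (found_default && mapping[default_index] == -1)
        mapping[default_index] = (*next_index)++;
}
```

Because && short-circuits, mapping[default_index] is still only read when found_default is true, exactly as in the nested form.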

@@ -717,7 +754,6 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound)
                         }
                     }
                 }
-
                 break;
             }

+     * default partiton for rows satisfying the new partition

Spelling.

+     * constraint. If found dont allow addition of a new partition.

Missing apostrophe.

+        defrel = heap_open(defid, AccessShareLock);
+        tupdesc = CreateTupleDescCopy(RelationGetDescr(defrel));
+
+        /* Build expression execution states for partition check quals */
+        partqualstate = ExecPrepareCheck(partConstraint,
+                        estate);
+
+        econtext = GetPerTupleExprContext(estate);
+        snapshot = RegisterSnapshot(GetLatestSnapshot());

Definitely not safe against concurrency, since AccessShareLock won't
exclude somebody else's update.  In fact, it won't even cover somebody
else's already-in-flight transaction.

+                errmsg("new default partition constraint is violated by some row")));

Normally in such cases we try to give more detail using
ExecBuildSlotValueDescription.

+    bool        is_def = true;

This variable starts out true and is never set to any value other than
true.  Just get rid of it and, in the one place where it is currently
used, write "true".  That's shorter and clearer.

+    inhoids = find_inheritance_children(RelationGetRelid(parent), NoLock);

If it's actually safe to do this with no lock, there ought to be a
comment with a very compelling explanation of why it's safe.

+        boundspec = (Node *) stringToNode(TextDatumGetCString(datum));
+        bspec = (PartitionBoundSpec *)boundspec;

There's not really a reason to cast the result of stringToNode() to
Node * and then turn around and cast it to PartitionBoundSpec *.  Just
cast it directly to whatever it needs to be.  And use the new castNode
macro.

+        foreach(cell1, bspec->listdatums)
+        {
+            Node *value = lfirst(cell1);
+            if (IsA(value, DefElem))
+            {
+                def_elem = true;
+                *defid = inhrelid;
+            }
+        }
+        if (def_elem)
+        {
+            ReleaseSysCache(tuple);
+            continue;
+        }
+        foreach(cell3, bspec->listdatums)
+        {
+            Node *value = lfirst(cell3);
+            boundspecs = lappend(boundspecs, value);
+        }
+        ReleaseSysCache(tuple);
+    }
+    foreach(cell4, spec->listdatums)
+    {
+        Node *value = lfirst(cell4);
+        boundspecs = lappend(boundspecs, value);
+    }

cell1, cell2, cell3, and cell4 are not very clear variable names.
Between that and the lack of comments, this is not easy to understand.
It's sort of spaghetti logic, too.  The if (def_elem) test continues
early, but if the point is that the loop using cell3 shouldn't execute
in that case, why not just put if (!def_elem) { foreach(cell3, ...) {
... } } instead of reiterating the ReleaseSysCache in two places?
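
One way the loop could be restructured, sketched with plain arrays instead of catalog tuples and Lists. PartitionSketch, collect_non_default_values and the releases counter are hypothetical; the counter merely stands in for the single ReleaseSysCache() call per iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one partition's list bound. */
typedef struct PartitionSketch
{
    bool        is_default;     /* is this the default partition? */
    const int  *values;         /* list datums; unused when is_default */
    size_t      nvalues;
} PartitionSketch;

/*
 * Gather the list values of all non-default partitions into out[] and
 * remember the position of the default partition, or -1 if there is none.
 * Each iteration has exactly one release point at the bottom.
 */
static size_t
collect_non_default_values(const PartitionSketch *parts, size_t nparts,
                           int *out, size_t out_cap,
                           int *default_pos, int *releases)
{
    size_t      nout = 0;

    assert(parts != NULL && out != NULL);
    assert(default_pos != NULL && releases != NULL);

    *default_pos = -1;
    *releases = 0;

    for (size_t i = 0; i < nparts; i++)
    {
        if (parts[i].is_default)
            *default_pos = (int) i;
        else
        {
            for (size_t j = 0; j < parts[i].nvalues; j++)
            {
                assert(nout < out_cap);
                out[nout++] = parts[i].values[j];
            }
        }

        (*releases)++;          /* single release, no early continue */
    }
    return nout;
}
```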

+                /* Collect bound spec nodes in a list. This is done if the partition is
+                 * a default partition. In case of default partition, constraint is formed
+                 * by performing <> operation over the partition constraints of the
+                 * existing partitions.
+                 */

I doubt that handles NULLs properly.
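
The concern can be illustrated with SQL's three-valued logic, modelled here in a few lines of C. TriBool, NullableInt and the function names are hypothetical; the sketch only shows that a conjunction of <> comparisons yields NULL, not TRUE, for a NULL key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum TriBool
{
    TV_FALSE,
    TV_TRUE,
    TV_NULL
} TriBool;

typedef struct NullableInt
{
    bool        isnull;
    int         value;
} NullableInt;

/* key <> v: NULL if key is NULL, otherwise an ordinary comparison. */
static TriBool
tv_ne(NullableInt key, int v)
{
    if (key.isnull)
        return TV_NULL;
    return (key.value != v) ? TV_TRUE : TV_FALSE;
}

/* SQL AND: FALSE dominates, then NULL, otherwise TRUE. */
static TriBool
tv_and(TriBool a, TriBool b)
{
    if (a == TV_FALSE || b == TV_FALSE)
        return TV_FALSE;
    if (a == TV_NULL || b == TV_NULL)
        return TV_NULL;
    return TV_TRUE;
}

/* key <> vals[0] AND key <> vals[1] AND ..., as a default constraint would be. */
static TriBool
default_constraint_sketch(NullableInt key, const int *vals, size_t nvals)
{
    TriBool     result = TV_TRUE;

    assert(vals != NULL || nvals == 0);
    for (size_t i = 0; i < nvals; i++)
        result = tv_and(result, tv_ne(key, vals[i]));
    return result;
}
```

With existing values (4, 5), a key of 9 gives TRUE and a key of 4 gives FALSE, but a NULL key gives NULL, so whether a NULL row belongs in the default partition has to be decided explicitly rather than left to the <> expression.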

+                inhoids = find_inheritance_children(RelationGetRelid(parent), NoLock);

Again, no lock?  Really?

The logic which follows looks largely cut-and-pasted, which makes me
think you need to do some refactoring here to make it more clear
what's going on, so that you have the relevant logic in just one
place.  It seems wrong anyway to shove all of this logic specific to
the default case into get_qual_from_partbound() when the logic for the
non-default case is inside get_qual_for_list.  Where there were 2
lines of code before you've now got something like 30.

+        if(get_negator(operoid) == InvalidOid)
+            elog(ERROR, "no negator found for partition operator %u",
+                 operoid);

I really doubt that's OK.  elog() shouldn't be reachable, but this
will be reachable if the partitioning operator does not have a
negator.  And there's the NULL-handling issue I mentioned above, too.

+            if (partdesc->boundinfo->has_def && key->strategy
+                == PARTITION_STRATEGY_LIST)
+                result = parent->indexes[partdesc->boundinfo->def_index];

Testing for PARTITION_STRATEGY_LIST here seems unnecessary.  If
has_def (or has_default, as it probably should be) isn't allowed for
range partitions, then it's redundant; if it is allowed, then that
case should be handled too.  Also, at this point we've already set
*failed_at and *failed_slot; presumably you'd want to make this check
before you get to that point.
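
A sketch of the ordering suggested above: look for a matching list value first, fall back to the default partition if one exists, and only then report failure. ListBoundsSketch and route_list_value are hypothetical names; in the real code the failure case is where *failed_at and *failed_slot would be set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of a list-partitioned table's bounds. */
typedef struct ListBoundsSketch
{
    const int  *values;         /* accepted list values */
    const int  *indexes;        /* partition index for each value */
    size_t      nvalues;
    bool        has_default;    /* is there a default partition? */
    int         default_index;  /* its partition index, if has_default */
} ListBoundsSketch;

/*
 * Return the partition index for key, the default partition's index if no
 * value matches and a default exists, or -1 if the row cannot be routed.
 * No strategy test is needed: has_default alone decides the fallback.
 */
static int
route_list_value(const ListBoundsSketch *bounds, int key)
{
    assert(bounds != NULL);

    for (size_t i = 0; i < bounds->nvalues; i++)
    {
        if (bounds->values[i] == key)
            return bounds->indexes[i];
    }

    if (bounds->has_default)
        return bounds->default_index;

    return -1;                  /* caller records the failure only here */
}
```

Using the example from the start of the thread, with part_default at index 0 and part_1 holding (4, 5) at index 1, the value 9 routes to index 0; without a default partition it returns -1.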

I suspect there are quite a few more problems here in addition to the
ones mentioned above, but I don't think it makes sense to spend too
much time searching for them until some of this basic stuff is cleaned
up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
amul sul
Date:
On Tue, May 2, 2017 at 9:33 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Please find attached updated patch with review comments by Robert and Jeevan
> implemented.
>
Patch v8 applied cleanly on the latest head, but the server crashed on
insert in the following test:

-- Create test table
CREATE TABLE test ( a int, b date) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF test FOR VALUES IN  (DEFAULT) PARTITION BY LIST(b);
CREATE TABLE p11 PARTITION OF p1 FOR VALUES IN (DEFAULT);

-- crash
INSERT INTO test VALUES (210,'1/1/2002');

Regards,
Amul



Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hello Amul,

Thanks for reporting. Please find attached an updated patch which fixes the above.
Also, the attached patch includes changes in syntax proposed upthread.

The syntax implemented in this patch is as follows,

CREATE TABLE p11 PARTITION OF p1 DEFAULT;

Thank you,
Rahila Syed

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Rajkumar Raghuwanshi
Date:
On Thu, May 4, 2017 at 5:14 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> The syntax implemented in this patch is as follows,
>
> CREATE TABLE p11 PARTITION OF p1 DEFAULT;

I applied the v9 patches, but the table description still shows the old form of the default partition bound. Is this expected?

create table lpd (a int, b int, c varchar) partition by list(a);
create table lpd_d partition of lpd DEFAULT;

\d+ lpd
                                         Table "public.lpd"
 Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer           |           |          |         | plain    |              |
 b      | integer           |           |          |         | plain    |              |
 c      | character varying |           |          |         | extended |              |
Partition key: LIST (a)
Partitions: lpd_d FOR VALUES IN (DEFAULT)

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Rahila,

I have started reviewing your latest patch, and here are my initial comments:

1.
In the following block, def_index alone is enough and the found_def flag is
not needed. We can check whether def_index is -1 to decide whether a default
partition is present.

@@ -166,6 +172,8 @@ RelationBuildPartitionDesc(Relation rel)
  /* List partitioning specific */
  PartitionListValue **all_values = NULL;
  bool found_null = false;
+ bool found_def = false;
+ int def_index = -1;
  int null_index = -1;

2.
In check_new_partition_bound, in the PARTITION_STRATEGY_LIST case, remove the
following duplicate declaration of boundinfo. It is confusing, and after your
changes it is no longer needed because it is not being overridden locally in
the if block.

    if (partdesc->nparts > 0)
    {
        PartitionBoundInfo boundinfo = partdesc->boundinfo;
        ListCell   *cell;


3.
In the following function, isDefaultPartitionBound, the first "return false"
statement is not needed.

+ * Returns true if the partition bound is default
+ */
+bool
+isDefaultPartitionBound(Node *value)
+{
+    if (IsA(value, DefElem))
+    {
+        DefElem *defvalue = (DefElem *) value;
+        if(!strcmp(defvalue->defname, "DEFAULT"))
+            return true;
+        return false;
+    }
+    return false;
+}

4.
As mentioned in my previous set of comments, the following if block inside a
loop in get_qual_for_default needs a break:

+    foreach(cell1, bspec->listdatums)
+    {
+        Node *value = lfirst(cell1);
+        if (isDefaultPartitionBound(value))
+        {
+            def_elem = true;
+            *defid  = inhrelid;
+        }
+    }

5.
In the grammar the rule default_part_list is not needed:
  
+default_partition:
+ DEFAULT  { $$ = (Node *)makeDefElem("DEFAULT", NULL, @1); }
+
+default_part_list:
+ default_partition { $$ = list_make1($1); }
+ ;
+

Instead you can simply declare default_partition as a list and write it as:

default_partition:
    DEFAULT
        {
            Node *def = (Node *)makeDefElem("DEFAULT", NULL, @1);
            $$ = list_make1(def);
        }

6.
You need to change the output of the describe command, which is currently as below:

postgres=# \d+ test;
                                Table "public.test"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              |
 b      | date    |           |          |         | plain   |              |
Partition key: LIST (a)
Partitions: pd FOR VALUES IN (DEFAULT),
            test_p1 FOR VALUES IN (4, 5)

What about changing the Partitions output as below:

Partitions: pd DEFAULT,
            test_p1 FOR VALUES IN (4, 5)

7.
You need to handle tab completion for DEFAULT.
For example, if I partially type the following command:
CREATE TABLE pd PARTITION OF test DEFA
and then press tab, I get the following completion:
CREATE TABLE pd PARTITION OF test FOR VALUES

I did some preliminary testing and have not found any problems so far.

I will review and test further and let you know my comments.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
While reviewing the code I was exploring more cases, and an open question
came to mind:
should we allow the default partition table to be partitioned further?

If we allow it (as in the current case), then observe the following case, where I
have defined a default partition which is further partitioned on a different
column.

postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7, 8);
CREATE TABLE
postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY LIST(b);
CREATE TABLE
postgres=# INSERT INTO test VALUES (20, 24, 12);
ERROR:  no partition of relation "test_pd" found for row
DETAIL:  Partition key of the failing row contains (b) = (24).

Note that it does not allow inserting the tuple (20, 24, 12): although a=20
falls in the default partition, i.e. test_pd, table test_pd is itself further
partitioned and has no partition satisfying b=24.
Further, if I define a default partition for table test_pd, then the tuple gets inserted.

Doesn't this sound like the whole purpose of having a DEFAULT partition on the
test table is defeated?

Any views? 

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
"Sven R. Kunze"
Date:

Hi Rahila,

still thinking about the syntax (sorry):

On 04.05.2017 13:44, Rahila Syed wrote:

[...] The syntax implemented in this patch is as follows,

CREATE TABLE p11 PARTITION OF p1 DEFAULT;

Rewriting the following:

On Thu, May 4, 2017 at 4:02 PM, amul sul <sulamul@gmail.com> wrote:
[...] CREATE TABLE p1 PARTITION OF test FOR VALUES IN  (DEFAULT) PARTITION BY LIST(b); [...]

It yields

CREATE TABLE p1 PARTITION OF test DEFAULT PARTITION BY LIST(b);

This reads to me like "DEFAULT PARTITION".


I can imagine a lot of confusion when those queries are encountered in the wild. I know this thread is about creating a default partition, but if I were to propose a minor change in the following direction, I think much of that confusion would be avoided:

CREATE TABLE p1 PARTITION OF test AS DEFAULT PARTITIONED BY LIST(b);

I know it's a bit longer but I think those 4 characters might serve readability in the long term. It was especially confusing to see PARTITION in two positions serving two different functions.

Sven

Re: [HACKERS] Adding support for Default partition in partitioning

From
Rajkumar Raghuwanshi
Date:
Hi Rahila,

pg_restore is failing for the default partition; the dump file still stores the old syntax for the default partition.

create table lpd (a int, b int, c varchar) partition by list(a);
create table lpd_d partition of lpd DEFAULT;

create database bkp owner 'edb';
grant all on DATABASE bkp to edb;

--take plain dump of existing database
\! ./pg_dump -f lpd_test.sql -Fp -d postgres

--restore plain backup to new database bkp
\! ./psql -f lpd_test.sql -d bkp

psql:lpd_test.sql:63: ERROR:  syntax error at or near "DEFAULT"
LINE 2: FOR VALUES IN (DEFAULT);
                       ^


vi lpd_test.sql

--
-- Name: lpd; Type: TABLE; Schema: public; Owner: edb
--

CREATE TABLE lpd (
    a integer,
    b integer,
    c character varying
)
PARTITION BY LIST (a);


ALTER TABLE lpd OWNER TO edb;

--
-- Name: lpd_d; Type: TABLE; Schema: public; Owner: edb
--

CREATE TABLE lpd_d PARTITION OF lpd
FOR VALUES IN (DEFAULT);


ALTER TABLE lpd_d OWNER TO edb;


Thanks,
Rajkumar

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Rahila,

I am not able to add a new partition if the default partition is further
partitioned with a default partition.

Consider the example below:

postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7, 8);
CREATE TABLE
postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY LIST(b);
CREATE TABLE
postgres=# CREATE TABLE test_pd_pd PARTITION OF test_pd DEFAULT;
CREATE TABLE
postgres=# INSERT INTO test VALUES (20, 24, 12);
INSERT 0 1
postgres=# CREATE TABLE test_p2 PARTITION OF test FOR VALUES IN(15);
ERROR:  could not open file "base/12335/16420": No such file or directory


Thanks,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
>I am not able add a new partition if default partition is further partitioned
>with default partition.

Thanks for reporting. I will fix this.

>pg_restore is failing for default partition, dump file still storing old syntax of default partition.
Thanks for reporting. I will fix this once the syntax is finalized.


Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, May 4, 2017 at 4:28 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> While reviewing the code I was trying to explore more cases, and I here
> comes an
> open question to my mind:
> should we allow the default partition table to be partitioned further?

I think yes.  In general, you are allowed to partition a partition,
and I can't see any justification for restricting that for default
partitions when we allow it everywhere else.

> If we allow it(as in the current case) then observe following case, where I
> have defined a default partitioned which is further partitioned on a
> different
> column.
>
> postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
> CREATE TABLE
> postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7,
> 8);
> CREATE TABLE
> postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY
> LIST(b);
> CREATE TABLE
> postgres=# INSERT INTO test VALUES (20, 24, 12);
> ERROR:  no partition of relation "test_pd" found for row
> DETAIL:  Partition key of the failing row contains (b) = (24).
>
> Note, that it does not allow inserting the tuple(20, 24, 12) because though
> a=20
> would fall in default partition i.e. test_pd, table test_pd itself is
> further
> partitioned and does not have any partition satisfying b=24.

Right, that looks like correct behavior.  You would have gotten the
same result if you had tried to insert into test_pd directly.

> Further if I define a default partition for table test_pd, the the tuple
> gets inserted.

That also sounds correct.

> Doesn't this sound like the whole purpose of having DEFAULT partition on
> test
> table is defeated?

Not to me.  It's possible to do lots of silly things with partitioned
tables.  For example, one case that we talked about before is that you
can define a range partition for, say, VALUES (0) TO (100), and then
subpartition it and give the subpartitions bounds which are outside
the range 0-100.  That's obviously silly and will lead to failures
inserting tuples, but we chose not to try to prohibit it because it's
not really broken, just useless.  There are lots of similar cases
involving other features.  For example, you can apply an inherited
CHECK (false) constraint to a table, which makes it impossible for
that table or any of its children to ever contain any rows; that is
probably a dumb configuration.  You can create two unique indexes with
exactly the same definition; unless you're creating a new one with the
intent of dropping the old one, that doesn't make sense.  You can
define a trigger that always throws an ERROR and then another trigger
which runs later that modifies the tuple; the second will never be run
because the first one will always kill the transaction before we get
there.  Those things are all legal, but often unuseful.  Similarly
here.  Defining a default list partition and then subpartitioning it
by list is not likely to be a good schema design, but it doesn't mean
we should try to disallow it.  It is important to distinguish between
things that are actually *broken* (like a partitioning configuration
where the tuples that can be inserted into a partition manually differ
from the ones that are routed to it automatically) and things that are
merely *lame* (like creating a multi-level partitioning hierarchy when
a single level would have done the job just as well).  The former
should be prevented by the code, while the latter is at most a
documentation issue.
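
The range case mentioned above can be reproduced with a short sketch. The table names (r, r_0_100, r_0_100_sub) are hypothetical and chosen only for illustration; the sketch assumes a server with declarative range partitioning and nothing from the default-partition patch.

```sql
-- Hypothetical table names; plain range partitioning, no default partition.
CREATE TABLE r (a int) PARTITION BY RANGE (a);
CREATE TABLE r_0_100 PARTITION OF r
    FOR VALUES FROM (0) TO (100) PARTITION BY RANGE (a);

-- Accepted, even though no row routed to r_0_100 can satisfy this bound.
CREATE TABLE r_0_100_sub PARTITION OF r_0_100
    FOR VALUES FROM (200) TO (300);

-- Routed to r_0_100 at the top level, then expected to fail at the
-- second level because r_0_100 has no partition covering a = 50.
INSERT INTO r VALUES (50);
```

The configuration is legal but useless, which is the "lame, not broken" category described in the message.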

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Robert,

Thanks for your explanation.

I agree with you that it is up to the user to decide how to partition an
already partitioned table, and also that we should have a demarcation between
things to be handled by code and things to be left to common sense or the
ability to define a good schema.

I am OK with the current behavior, provided we have at least a one-liner in the
documentation alerting the user that partitioning the default partition will
limit the ability to route tuples that do not fit in any other partition.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, May 4, 2017 at 4:40 PM, Sven R. Kunze <srkunze@mail.de> wrote:
> It yields
>
> CREATE TABLE p1 PARTITION OF test DEFAULT PARTITION BY LIST(b);
>
> This reads to me like "DEFAULT PARTITION".
>
> I can imagine a lot of confusion when those queries are encountered in the
> wild. I know this thread is about creating a default partition but I were to
> propose a minor change in the following direction, I think confusion would
> be greatly avoided:
>
> CREATE TABLE p1 PARTITION OF test AS DEFAULT PARTITIONED BY LIST(b);
>
> I know it's a bit longer but I think those 4 characters might serve
> readability in the long term. It was especially confusing to see PARTITION
> in two positions serving two different functions.

Well, we certainly can't make that change just for default partitions.
I mean, that would be non-orthogonal, right?  You can't say that the
way to subpartition is to write "PARTITION BY strategy" when the table is
unpartitioned or is a non-default partition but "PARTITIONED BY
strategy" when it is a default partition.  That would certainly not be
a good way of confusing users less, and would probably result in a
variety of special cases in places like ruleutils.c or pg_dump, plus
some weasel-wording in the documentation.  We COULD do a general
change from "CREATE TABLE table_name PARTITION BY strategy" to "CREATE
TABLE table_name PARTITIONED BY strategy".  I don't have any
particular arguments against that except that the current syntax is
more like Oracle, which might count for something, and maybe the fact
that we're a month after feature freeze.  Still, if we want to change
that, now would be the time; but I favor leaving it alone.

I don't have a big objection to adding AS.  If that's the majority
vote, fine; if not, that's OK, too.  I can see it might be a bit more
clear in the case you mention, but it might also just be a noise word
that we don't really need.  There don't seem to be many uses of AS
that would pose a risk of actual grammar conflicts here.  I can
imagine someone wanting to use CREATE TABLE ... PARTITION BY ... AS
SELECT ... to create and populate a partition in one command, but that
wouldn't be a conflict because it'd have to go AFTER the partition
specification.  In the DEFAULT case, you'd end up with something like

CREATE TABLE p1 PARTITION OF test AS DEFAULT AS <query>

...which is neither great nor horrible syntax-wise and maybe not such
a good thing to support anyway since it would have to lock the parent
to add the partition and then keep the lock on the parent while
populating the new child (ouch).

So I guess I'm still in favor of the CREATE TABLE p1 PARTITION OF test
DEFAULT syntax, but if it ends up being AS DEFAULT instead, I can live
with that.

Other opinions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
+1 for the AS DEFAULT syntax if it helps improve readability, especially in the following case:

CREATE TABLE p1 PARTITION OF test AS DEFAULT PARTITION BY LIST(a);

Thank you,
Rahila Syed

Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
>Hi Rahila,

>I am not able add a new partition if default partition is further partitioned
>with default partition.

>Consider example below:

>postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
>CREATE TABLE
>postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7, 8);
>CREATE TABLE
>postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY LIST(b);
>CREATE TABLE
>postgres=# CREATE TABLE test_pd_pd PARTITION OF test_pd DEFAULT;
>CREATE TABLE
>postgres=# INSERT INTO test VALUES (20, 24, 12);
>INSERT 0 1
>postgres=# CREATE TABLE test_p2 PARTITION OF test FOR VALUES IN(15);
>ERROR:  could not open file "base/12335/16420": No such file or directory

Regarding a fix for this, I think we need to prohibit this case, that is, prohibit
creating a new partition when a default partition exists that is itself further
partitioned. Currently, before a new partition is added alongside a default
partition, all rows of the default partition are scanned, and if a row matching the
new partition's constraint exists, the new partition is not added.
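
In SQL terms, the check described above amounts to something like the following query. This is only a manual equivalent written against the tables from the quoted example (test_pd as the default partition, and the proposed test_p2 FOR VALUES IN (15)); it is not the patch's C implementation.

```sql
-- Rows already in the default partition that the proposed partition
-- test_p2 FOR VALUES IN (15) would claim. Any result means a conflict.
SELECT 1 FROM test_pd WHERE a IN (15) LIMIT 1;
```

When test_pd is itself partitioned, a query like this reads through all of its partitions.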

If we allow this for a default partition which is partitioned further, we will have
to scan all the partitions of the default partition for matching rows, which can slow
down execution.

So, to avoid hampering performance, an error should be thrown in this case and the
user should be expected to change the schema to avoid partitioning default partitions.

Kindly give your opinions.

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Tue, May 9, 2017 at 9:26 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>Hi Rahila,
>
>>I am not able add a new partition if default partition is further
>> partitioned
>>with default partition.
>
>>Consider example below:
>
>>postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
>>CREATE TABLE
>>postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7,
>> 8);
>>CREATE TABLE
>>postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY
>> LIST(b);
>>CREATE TABLE
>>postgres=# CREATE TABLE test_pd_pd PARTITION OF test_pd DEFAULT;
>>CREATE TABLE
>>postgres=# INSERT INTO test VALUES (20, 24, 12);
>>INSERT 0 1
>>postgres=# CREATE TABLE test_p2 PARTITION OF test FOR VALUES IN(15);
> ERROR:  could not open file "base/12335/16420": No such file or directory
>
> Regarding fix for this I think we need to prohibit this case. That is
> prohibit creation
> of new partition after a default partition which is further partitioned.
> Currently before adding a new partition after default partition all the rows
> of default
> partition are scanned and if a row which matches the new partitions
> constraint exists
> the new partition is not added.
>
> If we allow this for default partition which is partitioned further, we will
> have to scan
> all the partitions of default partition for matching rows which can slow
> down execution.

I think this case should be allowed and I don't think it should
require scanning all the partitions of the default partition.  This is
no different than any other case where multiple levels of partitioning
are used.  First, you route the tuple at the root level; then, you
route it at the next level; and so on.  It shouldn't matter whether
the routing at the top level is to that level's default partition or
not.
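
As a rough illustration of the level-by-level routing described above, here is a minimal standalone C sketch. The PartNode and PartBound types, both functions, and the static test_* objects are hypothetical; they only model list partitioning on integer keys and are not PostgreSQL's actual data structures or routing code. The hierarchy mirrors the example quoted earlier in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model of a list-partitioned table. */
typedef struct PartNode PartNode;

typedef struct PartBound
{
    const int      *values;     /* accepted key values; NULL marks DEFAULT */
    size_t          nvalues;
    const PartNode *child;      /* partition that receives matching tuples */
} PartBound;

struct PartNode
{
    const char      *name;
    bool             is_leaf;   /* true if the relation stores rows itself */
    int              key_col;   /* tuple index of the partition key; unused for leaves */
    const PartBound *parts;
    size_t           nparts;
};

/* Pick the child of one partitioned table for a key, or NULL if none accepts it. */
static const PartNode *
route_one_level(const PartNode *parent, int key)
{
    const PartNode *fallback = NULL;

    for (size_t i = 0; i < parent->nparts; i++)
    {
        const PartBound *b = &parent->parts[i];

        if (b->values == NULL)
        {
            fallback = b->child;    /* remember DEFAULT, keep looking for a match */
            continue;
        }
        for (size_t j = 0; j < b->nvalues; j++)
            if (b->values[j] == key)
                return b->child;
    }
    return fallback;
}

/* Route a tuple from the root to a leaf, one level at a time. */
static const PartNode *
route_tuple(const PartNode *root, const int *tuple)
{
    const PartNode *node = root;

    assert(root != NULL && tuple != NULL);
    while (node != NULL && !node->is_leaf)
        node = route_one_level(node, tuple[node->key_col]);
    return node;
}

/* The hierarchy from the example in this thread: test -> (test_p1, test_pd -> test_pd_pd). */
static const PartNode test_p1 = {"test_p1", true, -1, NULL, 0};
static const PartNode test_pd_pd = {"test_pd_pd", true, -1, NULL, 0};
static const PartBound test_pd_parts[] = {{NULL, 0, &test_pd_pd}};
static const PartNode test_pd = {"test_pd", false, 1, test_pd_parts, 1};
static const int test_p1_values[] = {4, 5, 6, 7, 8};
static const PartBound test_parts[] = {{test_p1_values, 5, &test_p1}, {NULL, 0, &test_pd}};
static const PartNode test_root = {"test", false, 0, test_parts, 2};
```

In this model the row (20, 24, 12) is first routed on column a to test_pd, the top-level default, and then on column b to test_pd_pd. The second step does not depend on whether the first step chose a default partition.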

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/05/10 2:09, Robert Haas wrote:
> On Tue, May 9, 2017 at 9:26 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>> Hi Rahila,
>>
>>> I am not able add a new partition if default partition is further
>>> partitioned
>>> with default partition.
>>
>>> Consider example below:
>>
>>> postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
>>> CREATE TABLE
>>> postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7,
>>> 8);
>>> CREATE TABLE
>>> postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY
>>> LIST(b);
>>> CREATE TABLE
>>> postgres=# CREATE TABLE test_pd_pd PARTITION OF test_pd DEFAULT;
>>> CREATE TABLE
>>> postgres=# INSERT INTO test VALUES (20, 24, 12);
>>> INSERT 0 1
>>> postgres=# CREATE TABLE test_p2 PARTITION OF test FOR VALUES IN(15);
>> ERROR:  could not open file "base/12335/16420": No such file or directory
>>
>> Regarding fix for this I think we need to prohibit this case. That is
>> prohibit creation
>> of new partition after a default partition which is further partitioned.
>> Currently before adding a new partition after default partition all the rows
>> of default
>> partition are scanned and if a row which matches the new partitions
>> constraint exists
>> the new partition is not added.
>>
>> If we allow this for default partition which is partitioned further, we will
>> have to scan
>> all the partitions of default partition for matching rows which can slow
>> down execution.
> 
> I think this case should be allowed

+1

> and I don't think it should
> require scanning all the partitions of the default partition.  This is
> no different than any other case where multiple levels of partitioning
> are used.  First, you route the tuple at the root level; then, you
> route it at the next level; and so on.  It shouldn't matter whether
> the routing at the top level is to that level's default partition or
> not.

It seems that adding a new partition at the same level as the default
partition will require scanning it or its (leaf) partitions if
partitioned.  Consider that p1, pd are partitions of a list-partitioned
table p accepting 1 and everything else, respectively, and pd is further
partitioned.  When adding p2 of p for 2, we need to scan the partitions of
pd to check if there are any (2, ...) rows.
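
The check described above can be sketched in standalone C as follows. This is only an illustration under stated assumptions: the Rel type, has_conflicting_row(), and the static test_* objects are hypothetical and do not correspond to PostgreSQL's catalog structures or to the patch. The point it shows is that a partitioned default partition has no storage of its own, so the scan has to descend to its leaf partitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a relation is either a leaf holding rows or a partitioned table. */
typedef struct Rel Rel;
struct Rel
{
    const char       *name;
    bool              is_partitioned;   /* partitioned tables have no storage of their own */
    const Rel *const *children;         /* sub-partitions, if partitioned */
    size_t            nchildren;
    const int        *keys;             /* leaf only: parent-level key value of each stored row */
    size_t            nkeys;
};

static bool
value_in_list(int value, const int *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == value)
            return true;
    return false;
}

/* Return true if any leaf below rel holds a row whose key is in new_values. */
static bool
has_conflicting_row(const Rel *rel, const int *new_values, size_t n)
{
    assert(rel != NULL);
    if (rel->is_partitioned)
    {
        /* Nothing to scan here; recurse into the sub-partitions instead. */
        for (size_t i = 0; i < rel->nchildren; i++)
            if (has_conflicting_row(rel->children[i], new_values, n))
                return true;
        return false;
    }
    for (size_t i = 0; i < rel->nkeys; i++)
        if (value_in_list(rel->keys[i], new_values, n))
            return true;
    return false;
}

/* Default partition from the reported example: test_pd -> test_pd_pd holding one row with a = 20. */
static const int test_pd_pd_keys[] = {20};
static const Rel test_pd_pd = {"test_pd_pd", false, NULL, 0, test_pd_pd_keys, 1};
static const Rel *const test_pd_children[] = {&test_pd_pd};
static const Rel test_pd = {"test_pd", true, test_pd_children, 1, NULL, 0};
```

With this data, a new partition FOR VALUES IN (15) finds no conflicting row under test_pd, while one FOR VALUES IN (20) would.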

As for fixing the reported issue whereby the partitioned default
partition's non-existent file is being accessed, it would help to take a
look at the code in ATExecAttachPartition() starting at the following:

    /*
     * Set up to have the table be scanned to validate the partition
     * constraint (see partConstraint above).  If it's a partitioned table, we
     * instead schedule its leaf partitions to be scanned.
     */
    if (!skip_validate)
    {

Thanks,
Amit




Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
>It seems that adding a new partition at the same level as the default
>partition will require scanning it or its (leaf) partitions if
>partitioned.  Consider that p1, pd are partitions of a list-partitioned
>table p accepting 1 and everything else, respectively, and pd is further
>partitioned.  When adding p2 of p for 2, we need to scan the partitions of
>pd to check if there are any (2, ...) rows.

This is a better explanation. Maybe the following sentence was confusing:
"That is prohibit creation of new partition after a default partition which is further partitioned"
What I meant here was the case where the default partition is itself partitioned further.

>As for fixing the reported issue whereby the partitioned default
>partition's non-existent file is being accessed, it would help to take a
>look at the code in ATExecAttachPartition() starting at the following:
OK. I get it now. If ATTACH PARTITION already supports scanning all the leaf partitions before attaching,
similar support should be provided when adding a partition after a default partition as well.

Thank you,
Rahila Syed



Re: [HACKERS] Adding support for Default partition in partitioning

From
"Sven R. Kunze"
Date:
On 09.05.2017 09:19, Rahila Syed wrote:
> +1 for the AS DEFAULT syntax if it helps readability, especially in the following case:
>
> CREATE TABLE p1 PARTITION OF test AS DEFAULT PARTITION BY LIST(a);
>
> Thank you,
> Rahila Syed
>
> On Tue, May 9, 2017 at 1:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, May 4, 2017 at 4:40 PM, Sven R. Kunze <srkunze@mail.de> wrote:
>>> It yields
>>>
>>> CREATE TABLE p1 PARTITION OF test DEFAULT PARTITION BY LIST(b);
>>>
>>> This reads to me like "DEFAULT PARTITION".
>>>
>>> I can imagine a lot of confusion when those queries are encountered in the
>>> wild. I know this thread is about creating a default partition but if I were to
>>> propose a minor change in the following direction, I think confusion would
>>> be greatly avoided:
>>>
>>> CREATE TABLE p1 PARTITION OF test AS DEFAULT PARTITIONED BY LIST(b);
>>>
>>> I know it's a bit longer but I think those 4 characters might serve
>>> readability in the long term. It was especially confusing to see PARTITION
>>> in two positions serving two different functions.
>>
>> Well, we certainly can't make that change just for default partitions.
>> I mean, that would be non-orthogonal, right?  You can't say that the
>> way to subpartition is to write "PARTITION BY strategy" when the table
>> is unpartitioned or is a non-default partition but "PARTITIONED BY
>> strategy" when it is a default partition.  That would certainly not be
>> a good way of confusing users less, and would probably result in a
>> variety of special cases in places like ruleutils.c or pg_dump, plus
>> some weasel-wording in the documentation.  We COULD do a general
>> change from "CREATE TABLE table_name PARTITION BY strategy" to "CREATE
>> TABLE table_name PARTITIONED BY strategy".  I don't have any
>> particular arguments against that except that the current syntax is
>> more like Oracle, which might count for something, and maybe the fact
>> that we're a month after feature freeze.  Still, if we want to change
>> that, now would be the time; but I favor leaving it alone.


You are definitely right. Changing it here would require changing it everywhere AND thus losing syntax parity with Oracle.

I am not in a position to judge properly whether this would be a huge problem. Personally, I don't have an issue with that. But don't count mine as the most important opinion on this.


>> So I guess I'm still in favor of the CREATE TABLE p1 PARTITION OF test
>> DEFAULT syntax, but if it ends up being AS DEFAULT instead, I can live
>> with that.

Is making it optional an option?

Sven

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Wed, May 10, 2017 at 10:59 AM, Sven R. Kunze <srkunze@mail.de> wrote:
> You are definitely right. Changing it here would require changing it
> everywhere AND thus losing syntax parity with Oracle.

Right.

> I am not in a position to judge properly whether this would be a huge
> problem. Personally, I don't have an issue with that. But don't count mine as
> the most important opinion on this.

Well, I don't think it would be a HUGE problem, but I think the fact
that Amit chose to implement this with syntax similar to that of
Oracle is probably not a coincidence, but rather a goal, and I think
the readability problem that you're worrying about is really pretty
minor.  I think most people aren't going to subpartition their default
partition, and I think those who do will probably find the syntax
clear enough anyway.   So I don't favor changing it.  Now, if there's
an outcry of support for your position then I'll stand aside but I
don't anticipate that.

>> So I guess I'm still in favor of the CREATE TABLE p1 PARTITION OF test
>> DEFAULT syntax, but if it ends up being AS DEFAULT instead, I can live
>> with that.
>
> Is making it optional an option?

Optional keywords may not be the root of ALL evil, but they're pretty
evil.  See my posting earlier on this same thread on this topic:

http://postgr.es/m/CA+TgmoZGHgd3vKZvyQ1Qx3e0L3n=voxY57mz9TTncVET-aLK2A@mail.gmail.com

The issues here are more or less the same.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Alvaro Herrera
Date:
I'm surprised that there is so much activity in this thread.  Is this
patch being considered for pg10?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Wed, May 10, 2017 at 12:12 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> I'm surprised that there is so much activity in this thread.  Is this
> patch being considered for pg10?

Of course not.  Feature freeze was a month ago.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
"Sven R. Kunze"
Date:
On 10.05.2017 17:59, Robert Haas wrote:
> Well, I don't think it would be a HUGE problem, but I think the fact
> that Amit chose to implement this with syntax similar to that of
> Oracle is probably not a coincidence, but rather a goal, and I think
> the readability problem that you're worrying about is really pretty
> minor.  I think most people aren't going to subpartition their default
> partition, and I think those who do will probably find the syntax
> clear enough anyway.

I agree here.

> Optional keywords may not be the root of ALL evil, but they're pretty
> evil.  See my posting earlier on this same thread on this topic:
>
> http://postgr.es/m/CA+TgmoZGHgd3vKZvyQ1Qx3e0L3n=voxY57mz9TTncVET-aLK2A@mail.gmail.com
>
> The issues here are more or less the same.

Ah, I see. I didn't draw the conclusion from the optionality of a keyword the other day but after re-reading your post, it's exactly the same issue.
Let's avoid optional keywords!

Sven

Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hello,

Please find attached an updated patch that addresses the review comments and fixes the bugs reported to date.

>1.
>In following block, we can just do with def_index, and we do not need found_def
>flag. We can check if def_index is -1 or not to decide if default partition is
>present.
found_def is used to set boundinfo->has_default, which is used in a couple
of other places to check whether a default partition exists. The implementation is similar
to has_null.

>3.
>In following function isDefaultPartitionBound, first statement "return false"
>is not needed.
It is needed to return false if the node is not DefElem.

Todo:
Add regression tests
Documentation

Thank you,
Rahila Syed



On Fri, May 5, 2017 at 1:30 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
Hi Rahila,

I have started reviewing your latest patch, and here are my initial comments:

1.
In the following block, we can make do with just def_index; the found_def
flag is not needed. We can check whether def_index is -1 to decide if a default
partition is present.

@@ -166,6 +172,8 @@ RelationBuildPartitionDesc(Relation rel)
  /* List partitioning specific */
  PartitionListValue **all_values = NULL;
  bool found_null = false;
+ bool found_def = false;
+ int def_index = -1;
  int null_index = -1;

2.
In check_new_partition_bound, in the PARTITION_STRATEGY_LIST case, remove the
following duplicate declaration of boundinfo. It is confusing, and after
your changes it is no longer needed because it's not being overridden locally
in the if block.
if (partdesc->nparts > 0)
{
    PartitionBoundInfo boundinfo = partdesc->boundinfo;
    ListCell   *cell;


3.
In following function isDefaultPartitionBound, first statement "return false"
is not needed.

+/*
+ * Returns true if the partition bound is default
+ */
+bool
+isDefaultPartitionBound(Node *value)
+{
+    if (IsA(value, DefElem))
+    {
+        DefElem *defvalue = (DefElem *) value;
+        if(!strcmp(defvalue->defname, "DEFAULT"))
+            return true;
+        return false;
+    }
+    return false;
+}

4.
As mentioned in my previous set of comments, following if block inside a loop
in get_qual_for_default needs a break:

+    foreach(cell1, bspec->listdatums)
+    {
+        Node *value = lfirst(cell1);
+        if (isDefaultPartitionBound(value))
+        {
+            def_elem = true;
+            *defid  = inhrelid;
+        }
+    }

5.
In the grammar the rule default_part_list is not needed:
  
+default_partition:
+ DEFAULT  { $$ = (Node *)makeDefElem("DEFAULT", NULL, @1); }
+
+default_part_list:
+ default_partition { $$ = list_make1($1); }
+ ;
+

Instead you can simply declare default_partition as a list and write it as:

default_partition:
    DEFAULT
        {
            Node *def = (Node *)makeDefElem("DEFAULT", NULL, @1);
            $$ = list_make1(def);
        }

6.
You need to change the output of the describe command, which is currently as below:

postgres=# \d+ test;
                                Table "public.test"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              |
 b      | date    |           |          |         | plain   |              |
Partition key: LIST (a)
Partitions: pd FOR VALUES IN (DEFAULT),
            test_p1 FOR VALUES IN (4, 5)

What about changing the Partitions output as below:

Partitions: pd DEFAULT,
            test_p1 FOR VALUES IN (4, 5)

7.
You need to handle tab completion for DEFAULT.
e.g.
If I partially type following command:
CREATE TABLE pd PARTITION OF test DEFA
and then press tab, I get following completion:
CREATE TABLE pd PARTITION OF test FOR VALUES

I did some preliminary testing and have not found any problems so far.

I will review and test further and let you know my comments.

Regards,
Jeevan Ladhe

On Thu, May 4, 2017 at 6:09 PM, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
On Thu, May 4, 2017 at 5:14 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
The syntax implemented in this patch is as follows,

CREATE TABLE p11 PARTITION OF p1 DEFAULT;

With the v9 patches applied, the table description still shows the old syntax for the default partition. Is this expected?

create table lpd (a int, b int, c varchar) partition by list(a);
create table lpd_d partition of lpd DEFAULT;

\d+ lpd
                                         Table "public.lpd"
 Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer           |           |          |         | plain    |              |
 b      | integer           |           |          |         | plain    |              |
 c      | character varying |           |          |         | extended |              |
Partition key: LIST (a)
Partitions: lpd_d FOR VALUES IN (DEFAULT)


Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, May 11, 2017 at 10:07 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Please find attached an updated patch with review comments and bugs reported
> till date implemented.

You haven't done anything about the repeated suggestion that this
should also cover range partitioning.

+            /*
+             * If the partition is the default partition switch
+             * back to PARTITION_STRATEGY_LIST
+             */
+            if (spec->strategy == PARTITION_DEFAULT)
+                result_spec->strategy = PARTITION_STRATEGY_LIST;
+            else
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+                     errmsg("invalid bound specification for a list partition"),
+                     parser_errposition(pstate, exprLocation(bound))));

This is incredibly ugly.  I don't know exactly what should be done
about it, but I think PARTITION_DEFAULT is a bad idea and has got to
go.  Maybe add a separate isDefault flag to PartitionBoundSpec.

+            /*
+             * Skip if it's a partitioned table.  Only RELKIND_RELATION
+             * relations (ie, leaf partitions) need to be scanned.
+             */
+            if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)

What about foreign table partitions?

Doesn't it strike you as a bit strange that get_qual_for_default()
doesn't return a qual?  Functions should generally have names that
describe what they do.

+    bound_datums = list_copy(spec->listdatums);
+
+    boundspecs = get_qual_for_default(parent, defid);
+
+    foreach(cell, bound_datums)
+    {
+        Node *value = lfirst(cell);
+        boundspecs = lappend(boundspecs, value);
+    }

There's an existing function that you can use to concatenate two lists
instead of open-coding it.

Also, I think that before you ask anyone to spend too much more time
and energy reviewing this, you should really add the documentation and
regression tests which you mentioned as a TODO.  And run the code
through pgindent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Rahila,

On Thu, May 11, 2017 at 7:37 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>3.
>In following function isDefaultPartitionBound, first statement "return false"
>is not needed.
It is needed to return false if the node is not DefElem.

Please have a look at following code:

+/*
+ * Returns true if the partition bound is default
+ */
+bool
+isDefaultPartitionBound(Node *value)
+{
+    if (IsA(value, DefElem))
+    {
+        DefElem *defvalue = (DefElem *) value;
+        if(!strcmp(defvalue->defname, "DEFAULT"))
+            return true;
+        return false;
+    }
+    return false;
+}

By the first "return false", I mean the return statement inside the
"if (IsA(value, DefElem))" block:

+        if(!strcmp(defvalue->defname, "DEFAULT"))
+            return true;
+        return false;

Even if this "return false" is removed, control will fall through to the
outermost return statement and return false from there.

I leave this decision to you, but this block could also be rewritten as
below, or be defined as a macro:

bool
isDefaultPartitionBound(Node *value)
{
    return (IsA(value, DefElem) &&
            !strcmp(((DefElem *) value)->defname, "DEFAULT"));
}

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Beena Emerson
Date:


On Thu, May 11, 2017 at 7:37 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hello,

Please find attached an updated patch with review comments and bugs reported till date implemented.

Hello Rahila,

Tested on "efa2c18 Doc fix: scale(numeric) returns integer, not numeric."

(1) With the new patch, a new partition can be created even when the default partition already holds rows that fall within the new partition's bounds. Those rows in the default partition are then ignored by queries that match the new partition.

DROP TABLE list1;
CREATE TABLE list1 (
    a int,
    b int
) PARTITION BY LIST (a);
CREATE TABLE list1_1 (LIKE list1);
ALTER TABLE  list1 ATTACH PARTITION list1_1 FOR VALUES IN (1);
CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
INSERT INTO list1 SELECT generate_series(1,2),1;
-- Partition overlapping with DEF
CREATE TABLE list1_2 PARTITION OF list1 FOR VALUES IN (2);
INSERT INTO list1 SELECT generate_series(2,3),2;

postgres=# SELECT * FROM list1 ORDER BY a,b;
 a | b
---+---
 1 | 1
 2 | 1
 2 | 2
 3 | 2
(4 rows)

postgres=# SELECT * FROM list1 WHERE a=2;
 a | b
---+---
 2 | 2
(1 row)

This ignores the a=2 row stored in the default partition.

postgres=# SELECT * FROM list1_def;
 a | b
---+---
 2 | 1
 3 | 2
(2 rows)


(2) I get the following warning:

partition.c: In function ‘check_new_partition_bound’:
partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   && boundinfo->has_default)
               ^
preproc.y:3250.2-8: warning: type clash on default action: <str> != <>




Re: [HACKERS] Adding support for Default partition in partitioning

From
Rahila Syed
Date:
Hello,

>(1) With the new patch, we allow new partitions when there is overlapping data with default partition. The entries in default are ignored when running queries satisfying the new partition.
This regression was introduced in the latest version. We do not allow adding a partition when rows with the same key value exist in the default partition, so this scenario should not
arise. Please find attached an updated patch which corrects this.

>(2) I get the following warning:

>partition.c: In function ‘check_new_partition_bound’:
>partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>   && boundinfo->has_default)
               ^
>preproc.y:3250.2-8: warning: type clash on default action: <str> != <>
I failed to notice this warning. I will look into it.

>This is incredibly ugly.  I don't know exactly what should be done
>about it, but I think PARTITION_DEFAULT is a bad idea and has got to
>go.  Maybe add a separate isDefault flag to PartitionBoundSpec
Will look at other ways to do it.

>Doesn't it strike you as a bit strange that get_qual_for_default()
>doesn't return a qual?  Functions should generally have names that
>describe what they do.
Will fix this.

>There's an existing function that you can use to concatenate two lists
>instead of open-coding it.
Will check this.

>you should really add the documentation and
>regression tests which you mentioned as a TODO.  And run the code
>through pgindent
The next version will also include documentation and regression tests,
and I will run pgindent on it.

Thank you,
Rahila Syed


Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Beena Emerson
Date:
Hello,


On Fri, May 12, 2017 at 5:30 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hello,

>(1) With the new patch, we allow new partitions when there is overlapping data with default partition. The entries in default are ignored when running queries satisfying the new partition.
This was introduced in latest version. We are not allowing adding a partition when entries with same key value exist in default partition. So this scenario should not
come in picture. Please find attached an updated patch which corrects this.

Thank you for the updated patch. However, now I cannot create a partition after the default partition.

CREATE TABLE list1 (
    a int,
    b int
) PARTITION BY LIST (a);

CREATE TABLE list1_1 (LIKE list1);
ALTER TABLE  list1 ATTACH PARTITION list1_1 FOR VALUES IN (1);
CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
CREATE TABLE list1_5 PARTITION OF list1 FOR VALUES IN (3);

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
 


--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
On Fri, May 12, 2017 at 7:34 PM, Beena Emerson <memissemerson@gmail.com> wrote:

Thank you for the updated patch. However, now I cannot create a partition after default.

CREATE TABLE list1 (
    a int,
    b int
) PARTITION BY LIST (a);

CREATE TABLE list1_1 (LIKE list1);
ALTER TABLE  list1 ATTACH PARTITION list1_1 FOR VALUES IN (1);
CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
CREATE TABLE list1_5 PARTITION OF list1 FOR VALUES IN (3);

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Hi,

I have fixed the crash in attached patch.
Also the patch needed bit of adjustments due to recent commit.
I have re-based the patch on latest commit.

PFA.

Regards,
Jeevan Ladhe 
Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Tue, May 16, 2017 at 8:57 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have fixed the crash in attached patch.
> Also the patch needed bit of adjustments due to recent commit.
> I have re-based the patch on latest commit.

+    bool        has_default;        /* Is there a default partition? Currently false
+                                 * for a range partitioned table */
+    int            default_index;        /* Index of the default list partition. -1 for
+                                 * range partitioned tables */

Why do we need both has_default and default_index?  If default_index
== -1 means that there is no default, we don't also need a separate
bool to record the same thing, do we?

get_qual_for_default() still returns a list of things that are not
quals.  I think that this logic is all pretty poorly organized.  The
logic to create a partitioning constraint for a list partition should
be part of get_qual_for_list(), whether or not it is a default.  And
when we have range partitions, the logic to create a default range
partitioning constraint should be part of get_qual_for_range().  The
code the way it's organized today makes it look like there are three
kinds of partitions: list, range, and default.  But that's not right
at all.  There are two kinds: list and range.  And a list partition
might or might not be a default partition, and similarly for range.

+                    ereport(ERROR, (errcode(ERRCODE_CHECK_VIOLATION),
+                                    errmsg("DEFAULT partition cannot be used"
+                                           " without negator of operator  %s",
+                                           get_opname(operoid))));

I don't think ERRCODE_CHECK_VIOLATION is the right error code here,
and we have a policy against splitting message strings like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Tue, May 16, 2017 at 9:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 16, 2017 at 8:57 AM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>> I have fixed the crash in attached patch.
>> Also the patch needed bit of adjustments due to recent commit.
>> I have re-based the patch on latest commit.
>
> +    bool        has_default;        /* Is there a default partition?
> Currently false
> +                                 * for a range partitioned table */
> +    int            default_index;        /* Index of the default list
> partition. -1 for
> +                                 * range partitioned tables */
>

We have has_null and null_index for list partitioning. There,
null_index == -1 is equivalent to !has_null. Maybe Rahila and/or Jeevan just
copied that style. Probably we should change that as well?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/05/17 17:58, Ashutosh Bapat wrote:
> On Tue, May 16, 2017 at 9:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, May 16, 2017 at 8:57 AM, Jeevan Ladhe
>> <jeevan.ladhe@enterprisedb.com> wrote:
>>> I have fixed the crash in attached patch.
>>> Also the patch needed bit of adjustments due to recent commit.
>>> I have re-based the patch on latest commit.
>>
>> +    bool        has_default;        /* Is there a default partition?
>> Currently false
>> +                                 * for a range partitioned table */
>> +    int            default_index;        /* Index of the default list
>> partition. -1 for
>> +                                 * range partitioned tables */
>>
> 
> We have has_null and null_index for list partitioning. There
> null_index == -1 = has_null. May be Rahila and/or Jeevan just copied
> that style. Probably we should change that as well?

Probably a good idea.

Thanks,
Amit




Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:


On Wed, May 17, 2017 at 2:28 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Tue, May 16, 2017 at 9:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 16, 2017 at 8:57 AM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>> I have fixed the crash in attached patch.
>> Also the patch needed bit of adjustments due to recent commit.
>> I have re-based the patch on latest commit.
>
> +    bool        has_default;        /* Is there a default partition?
> Currently false
> +                                 * for a range partitioned table */
> +    int            default_index;        /* Index of the default list
> partition. -1 for
> +                                 * range partitioned tables */
>

> We have has_null and null_index for list partitioning. There,
> null_index == -1 is equivalent to !has_null. Maybe Rahila and/or Jeevan just
> copied that style. Probably we should change that as well?


I agree with Ashutosh.
I had given a similar comment on an earlier version of the patch[1], and Rahila
replied with the above reasoning, hence did not change the logic she introduced.

Probably it's a good idea to have a separate patch that removes the has_null
logic, in a separate thread.


Regards,
Jeevan Ladhe.

Re: [HACKERS] Adding support for Default partition in partitioning

From
Beena Emerson
Date:
Hello,

Patch for default range partition has been added. PFA the rebased v12 patch for the same.
I have not removed the has_default variable yet.

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

I started looking into Rahila's default_partition_v11.patch, and reworked a
few things as below:

- I tried to cover all the review comments posted on the thread. Do let
me know if something is missing.

- Got rid of the functions get_qual_for_default() and generate_qual_for_defaultpart().
There is no need to collect the boundspecs of all the partitions for a list
partition: the list is available in boundinfo->ndatums, and an expression for
the default partition can be created from the information available in boundinfo.

- Got rid of variable has_default, and added a macro for it.

- Changed how existing rows in the default partition are checked for overlap.
The earlier version of the patch built new constraints for the default
partition and then checked whether any existing row violated those
constraints. The current version just checks whether any row in the default
partition satisfies the new partition's constraint, and fails if one exists.
This logic can also be used as is for a default partition in RANGE
partitioning.

- Simplified grammar rule.

- Got rid of PARTITION_DEFAULT since DEFAULT is not a different partition
strategy; the applicable logic is revised accordingly.

- Made a few other code adjustments: indentation, commenting, code
simplification, etc.

- Added regression tests.
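
The revised overlap check described above can be sketched roughly as follows. This is a self-contained toy over int arrays, not the patch's code: the function names are hypothetical, and the real check would scan the default partition's tuples and evaluate the new partition's constraint expression against each.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if value is one of the values the new list partition accepts. */
static bool
sketch_in_list(int value, const int *list, size_t nlist)
{
    for (size_t i = 0; i < nlist; i++)
        if (list[i] == value)
            return true;
    return false;
}

/*
 * Return the index of the first row in the default partition that would
 * belong to the new partition, or -1 if there is none.  A caller would
 * reject the new partition when the result is not -1.
 */
static int
sketch_find_conflicting_row(const int *default_rows, size_t nrows,
                            const int *new_values, size_t nvalues)
{
    assert(nrows == 0 || default_rows != NULL);

    for (size_t i = 0; i < nrows; i++)
        if (sketch_in_list(default_rows[i], new_values, nvalues))
            return (int) i;
    return -1;
}
```

For example, with the values 2 and 3 in the default partition, a new partition FOR VALUES IN (2) conflicts with the first row, while one FOR VALUES IN (5) does not conflict.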

TODO:
Documentation, I am working on it. Will update the patch soon.

PFA.

Regards,
Jeevan

On Mon, May 22, 2017 at 7:31 AM, Beena Emerson <memissemerson@gmail.com> wrote:
Hello,

Patch for default range partition has been added. PFA the rebased v12 patch for the same.
I have not removed the has_default variable yet.

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Rajkumar Raghuwanshi
Date:
On Thu, May 25, 2017 at 12:10 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
PFA.

Hi

I have applied v13 patch, got a crash when trying to attach default temp partition.

postgres=# CREATE TEMP TABLE temp_list_part (a int) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TEMP TABLE temp_def_part (a int);
CREATE TABLE
postgres=# ALTER TABLE temp_list_part ATTACH PARTITION temp_def_part DEFAULT;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
 
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Rajkumar,

> postgres=# CREATE TEMP TABLE temp_list_part (a int) PARTITION BY LIST (a);
> CREATE TABLE
> postgres=# CREATE TEMP TABLE temp_def_part (a int);
> CREATE TABLE
> postgres=# ALTER TABLE temp_list_part ATTACH PARTITION temp_def_part DEFAULT;
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>

Thanks for reporting.
PFA patch that fixes above issue.

Regards,
Jeevan Ladhe 

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Forgot to attach the patch.
PFA.


Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Beena Emerson
Date:
On Thu, May 25, 2017 at 3:03 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>
> Forgot to attach the patch.
> PFA.
>
> On Thu, May 25, 2017 at 3:02 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
>>
>> Hi Rajkumar,
>>
>>> postgres=# CREATE TEMP TABLE temp_list_part (a int) PARTITION BY LIST (a);
>>> CREATE TABLE
>>> postgres=# CREATE TEMP TABLE temp_def_part (a int);
>>> CREATE TABLE
>>> postgres=# ALTER TABLE temp_list_part ATTACH PARTITION temp_def_part DEFAULT;
>>> server closed the connection unexpectedly
>>>     This probably means the server terminated abnormally
>>>     before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>> !>
>>
>>
>> Thanks for reporting.
>> PFA patch that fixes above issue.
>>


The existing comment is not valid
            /*
             * A null partition key is only acceptable if null-accepting list
             * partition exists.
             */
as we allow NULL to be stored in default. It should be updated.

DROP TABLE list1;
CREATE TABLE list1 (    a int) PARTITION BY LIST (a);
CREATE TABLE list1_1 (LIKE list1);
ALTER TABLE  list1 ATTACH PARTITION list1_1 FOR VALUES IN (2);
CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
INSERT INTO list1 VALUES (NULL);
SELECT * FROM list1_def;
 a
---

(1 row)


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:

This patch needs a rebase on recent commits, and also a fix[1] that is posted for get_qual_for_list().

I am working on both of these tasks. Will update the patch once I am done with this.


Regards,

Jeevan Ladhe


On Mon, May 29, 2017 at 12:25 PM, Beena Emerson <memissemerson@gmail.com> wrote:
On Thu, May 25, 2017 at 3:03 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>
> Forgot to attach the patch.
> PFA.
>
> On Thu, May 25, 2017 at 3:02 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
>>
>> Hi Rajkumar,
>>
>>> postgres=# CREATE TEMP TABLE temp_list_part (a int) PARTITION BY LIST (a);
>>> CREATE TABLE
>>> postgres=# CREATE TEMP TABLE temp_def_part (a int);
>>> CREATE TABLE
>>> postgres=# ALTER TABLE temp_list_part ATTACH PARTITION temp_def_part DEFAULT;
>>> server closed the connection unexpectedly
>>>     This probably means the server terminated abnormally
>>>     before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>> !>
>>
>>
>> Thanks for reporting.
>> PFA patch that fixes above issue.
>>


The existing comment is not valid
            /*
             * A null partition key is only acceptable if null-accepting list
             * partition exists.
             */
as we allow NULL to be stored in default. It should be updated.

DROP TABLE list1;
CREATE TABLE list1 (    a int) PARTITION BY LIST (a);
CREATE TABLE list1_1 (LIKE list1);
ALTER TABLE  list1 ATTACH PARTITION list1_1 FOR VALUES IN (2);
CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
INSERT INTO list1 VALUES (NULL);
SELECT * FROM list1_def;
 a
---

(1 row)


--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:


> The existing comment is not valid
>             /*
>              * A null partition key is only acceptable if null-accepting list
>              * partition exists.
>              */
> as we allow NULL to be stored in default. It should be updated.

Sure Beena, as stated earlier will update this on my next version of patch.


Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

I have rebased the patch on latest commit with few cosmetic changes.

The patch fix_listdatums_get_qual_for_list_v3.patch [1]  needs to be applied
before applying this patch.


Regards,
Jeevan Ladhe


On Mon, May 29, 2017 at 2:28 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:


The existing comment is not valid
            /*
             * A null partition key is only acceptable if null-accepting list
             * partition exists.
             */
as we allow NULL to be stored in default. It should be updated.

Sure Beena, as stated earlier will update this on my next version of patch.


Regards,
Jeevan Ladhe

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Beena Emerson
Date:
On Mon, May 29, 2017 at 9:33 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have rebased the patch on latest commit with few cosmetic changes.
>
> The patch fix_listdatums_get_qual_for_list_v3.patch [1]  needs to be applied
> before applying this patch.
>
> [1] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg315490.html
>


This needs a rebase again.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

I have rebased the patch on the latest commit.
PFA.

There exists one issue, reported by Rajkumar[1] off-line, as follows: describing
the default partition after detaching the null partition does not show the
updated constraints. I am working on fixing this issue.

create table t1 (c1 int) partition by list (c1);
create table t11 partition of t1 for values in (1,2);
create table t12 partition of t1 default;
create table t13 partition of t1 for values in (10,11);
create table t14 partition of t1 for values in (null);

postgres=# \d+ t12
                                    Table "public.t12"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              | 
Partition of: t1 DEFAULT
Partition constraint: ((c1 IS NOT NULL) AND (c1 <> ALL (ARRAY[1, 2, 10, 11])))

postgres=# alter table t1 detach partition t14;
ALTER TABLE
postgres=# \d+ t12
                                    Table "public.t12"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              | 
Partition of: t1 DEFAULT
Partition constraint: ((c1 IS NOT NULL) AND (c1 <> ALL (ARRAY[1, 2, 10, 11])))

postgres=# insert into t1 values(null);
INSERT 0 1

Note that the parent correctly allows the nulls to be inserted.
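
To illustrate why the displayed constraint should change after the detach, here is a rough sketch of how the default partition's constraint depends on its siblings. It is a toy model with hypothetical names, not the patch's code, and it assumes (as the insert above suggests) that NULL keys fall to the default partition once no sibling accepts NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of the non-default partitions of a list-partitioned parent. */
typedef struct SketchSiblings
{
    const int  *values;             /* all values claimed by other partitions */
    size_t      nvalues;
    bool        has_null_partition; /* is there a FOR VALUES IN (NULL) sibling? */
} SketchSiblings;

/* Would the default partition's constraint accept this key? */
static bool
sketch_default_accepts(const SketchSiblings *sib, bool key_is_null, int key)
{
    assert(sib != NULL);

    /* NULL goes to the default only while no sibling accepts NULL. */
    if (key_is_null)
        return !sib->has_null_partition;

    /* Equivalent of: key <> ALL (ARRAY[...]) over the sibling values. */
    for (size_t i = 0; i < sib->nvalues; i++)
        if (sib->values[i] == key)
            return false;
    return true;
}
```

In this model, detaching t14 flips has_null_partition to false, so the "c1 IS NOT NULL" part of the constraint no longer applies and has to be recomputed.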


Regards,
Jeevan Ladhe

On Tue, May 30, 2017 at 10:59 AM, Beena Emerson <memissemerson@gmail.com> wrote:
On Mon, May 29, 2017 at 9:33 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have rebased the patch on latest commit with few cosmetic changes.
>
> The patch fix_listdatums_get_qual_for_list_v3.patch [1]  needs to be applied
> before applying this patch.
>
> [1] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg315490.html
>


This needs a rebase again.

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

I have fixed the issue related to default partition constraints not getting updated
after detaching a partition.

PFA.

Regards,
Jeevan Ladhe



Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Tue, May 30, 2017 at 1:08 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have rebased the patch on the latest commit.
> PFA.
>

Thanks for rebasing the patch. Here are some review comments.
+                /*
+                 * In case of default partition, just note the index, we do not
+                 * add this to non_null_values list.
+                 */
We may want to rephrase it like
"Note the index of the partition bound spec for the default partition. There's
no datum to add to the list of non-null datums for this partition."
                   /* Assign mapping index for default partition. */
"mapping index" should be "mapped index". Maybe we want to use "the" before
"default partition" everywhere; there's only one specific default partition.
                       Assert(default_index >= 0 &&
                              mapping[default_index] == -1);
Needs some explanation for asserting mapping[default_index] == -1. Since the
default partition accepts any non-specified value, it should not get a mapped
index while assigning those for non-null datums.

+                     * Currently range partition do not have default partition
May be rephrased as "As of now, we do not support default range partition."

+     * ArrayExpr, which would return an negated expression for default
a negated instead of an negated.

+        cur_index = -1;
         /*
-         * A null partition key is only acceptable if null-accepting list
-         * partition exists.
+         * A null partition key is acceptable if null-accepting list partition
+         * or a default partition exists. Check if there exists a null
+         * accepting partition, else this will be handled later by default
+         * partition if it exists.
          */
-        cur_index = -1;
Why do we need to move the assignment to cur_index before the comment?
The comment should probably change to "Handle NULL partition key here if
there's a null-accepting list partition. Else it will be routed to a default
partition if one exists."
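
The routing order that the suggested comment describes can be sketched like this. It is a simplified, self-contained model with hypothetical names (SketchRouting, sketch_route_key), not the patch's tuple-routing code, and it uses a linear search where the real code would search sorted bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One list bound: a value and the index of the partition that accepts it. */
typedef struct SketchListBound
{
    int         value;
    int         part_index;
} SketchListBound;

typedef struct SketchRouting
{
    const SketchListBound *bounds;
    size_t      nbounds;
    int         null_index;     /* NULL-accepting partition, or -1 */
    int         default_index;  /* default partition, or -1 */
} SketchRouting;

/* Return the index of the partition for the key, or -1 if none accepts it. */
static int
sketch_route_key(const SketchRouting *r, bool key_is_null, int key)
{
    int         cur_index = -1;

    assert(r != NULL);

    if (key_is_null)
    {
        /* Handle a NULL key here if a null-accepting partition exists. */
        cur_index = r->null_index;
    }
    else
    {
        for (size_t i = 0; i < r->nbounds; i++)
        {
            if (r->bounds[i].value == key)
            {
                cur_index = r->bounds[i].part_index;
                break;
            }
        }
    }

    /* Anything not claimed above is routed to the default, if one exists. */
    if (cur_index < 0)
        cur_index = r->default_index;

    return cur_index;
}
```

A result of -1 corresponds to the case where the insert has to fail because no partition accepts the key.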

+-- attaching default partition overlaps if a default partition already exists
+ERROR:  partition "part_def2" would overlap partition "part_def1"
Saying a default partition overlaps is misleading here. A default partition is
not expected to overlap with anything. It's expected to "adjust" with the rest
of the partitions. It can "conflict" with another default partition. So the
right error message here is "a default partition "part_def1" already exists."

+CREATE TABLE part_def1 PARTITION OF list_parted DEFAULT;
+CREATE TABLE part_def2 (LIKE part_1 INCLUDING CONSTRAINTS);
+ALTER TABLE list_parted ATTACH PARTITION part_def2 DEFAULT;
Maybe you want to name part_def1 as def_part and part_def2 as fail_def_part to
be consistent with other names in the file. Maybe you also want to test two
consecutive CREATE TABLE ... DEFAULT.

+ALTER TABLE list_parted2 ATTACH PARTITION part_3 FOR VALUES IN (11);
+ERROR:  new default partition constraint is violated by some row
+DETAIL:  Violating row contains (11, z).
The error message seems to be misleading. The default partition is not new. May
be we should say, "default partition contains rows that conflict with the
partition bounds of "part_3"". I think we should use a better word instead of
"conflict", but I am not able to find one right now.

+-- check that leaf partitons of default partition are scanned when
s/partitons/partitions/

-ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IN (5)), ALTER a SET NOT NULL;
-ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5);
+ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IN (5, 55)), ALTER a SET NOT NULL;
+ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5, 55);
Why do we want to change partition bounds of this one? The test is for children
of part_5 right?

+drop table part_default;
I think this is a premature drop. Further down the file there's a SELECT from
list_parted, which won't list the rows inserted into the default partition, and
we will miss checking whether the tuples were routed to the right partition.

+update list_part1 set a = 'c' where a = 'a';
+ERROR:  new row for relation "list_part1" violates partition constraint
+DETAIL:  Failing row contains (c, 1).
Why do we need this test here? It's not dealing with the default partition and
partition row movement is not in there. So the updated row may not move to the
default partition, even if it's there.

This isn't a complete review. I will continue to review this patch further.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Amit Langote
Date:
Hi Jeevan,

On 2017/05/30 16:38, Jeevan Ladhe wrote:
> I have rebased the patch on the latest commit.
> PFA.

Was looking at the patch and felt that the parse node representation of
default partition bound could be slightly different.  Can you explain the
motivation behind implementing it without adding a new member to the
PartitionBoundSpec struct?

I would suggest instead adding a bool named is_default and be done with
it.  It will help get rid of the public isDefaultPartitionBound() in the
proposed patch whose interface isn't quite clear and instead simply check
if (spec->is_default) in places where it's called by passing it (Node *)
linitial(spec->listdatums).

Further looking into the patch, I found a tiny problem in
check_default_allows_bound().  If the default partition that will be
scanned by it is a foreign table or a partitioned table with a foreign
leaf partition, you will get a failure like:

-- default partition is a foreign table
alter table p attach partition fp default;

-- adding a new partition will try to scan fp above
alter table p attach partition p12 for values in (1, 2);
ERROR:  could not open file "base/13158/16456": No such file or directory

I think the foreign tables should be ignored here to avoid the error.  The
fact that foreign default partition may contain data that satisfies the
new partition's constraint is something we cannot do much about.  Also,
see the note in ATTACH PARTITION description regarding foreign tables [1]
and the discussion at [2].

Thanks,
Amit

[1] https://www.postgresql.org/docs/devel/static/sql-altertable.html
[2]
https://www.postgresql.org/message-id/flat/8f89dcb2-bd15-d8dc-5f54-3e11dc6c9463%40lab.ntt.co.jp




Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Thanks Amit for your comments.

On 31-May-2017 6:03 AM, "Amit Langote" <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Hi Jeevan,
>
> On 2017/05/30 16:38, Jeevan Ladhe wrote:
>> I have rebased the patch on the latest commit.
>> PFA.
>
> Was looking at the patch and felt that the parse node representation of
> default partition bound could be slightly different.  Can you explain the
> motivation behind implementing it without adding a new member to the
> PartitionBoundSpec struct?
>
> I would suggest instead adding a bool named is_default and be done with
> it.  It will help get rid of the public isDefaultPartitionBound() in the
> proposed patch whose interface isn't quite clear and instead simply check
> if (spec->is_default) in places where it's called by passing it (Node *)
> linitial(spec->listdatums).

I thought of reusing the existing members of PartitionBoundSpec, but I agree
that having a bool could simplify the code. Will make the corresponding change.

> Further looking into the patch, I found a tiny problem in
> check_default_allows_bound().  If the default partition that will be
> scanned by it is a foreign table or a partitioned table with a foreign
> leaf partition, you will get a failure like:
>
> -- default partition is a foreign table
> alter table p attach partition fp default;
>
> -- adding a new partition will try to scan fp above
> alter table p attach partition p12 for values in (1, 2);
> ERROR:  could not open file "base/13158/16456": No such file or directory
>
> I think the foreign tables should be ignored here to avoid the error.  The
> fact that foreign default partition may contain data that satisfies the
> new partition's constraint is something we cannot do much about.  Also,
> see the note in ATTACH PARTITION description regarding foreign tables [1]
> and the discussion at [2].

Will look into this.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/05/31 9:33, Amit Langote wrote:
> On 2017/05/30 16:38, Jeevan Ladhe wrote:
>> I have rebased the patch on the latest commit.
>> PFA.
> 
> Was looking at the patch

I tried creating default partition of a range-partitioned table and got
the following error:

ERROR:  invalid bound specification for a range partition

I thought it would give:

ERROR: creating default partition is not supported for range partitioned tables

Which means transformPartitionBound() should perform this check more
carefully.  As I suggested in my previous email, if there were a
is_default field in the PartitionBoundSpec, then one could add the
following block of code at the beginning of transformPartitionBound:

    if (spec->is_default && spec->strategy != PARTITION_STRATEGY_LIST)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                 errmsg("creating default partition is not supported for %s partitioned tables",
                        get_partition_strategy_name(key->strategy))));


Some more comments on the patch:

+                         errmsg("new default partition constraint is violated by some row"),

"new default partition constraint" may sound a bit confusing to users.
That we recompute the default partition's constraint and check the "new
constraint" against the rows it contains seems to me to be the description
of internal details.  How about:

ERROR: default partition contains rows that belong to partition being created

+char *ExecBuildSlotValueDescription(Oid reloid,
+                              TupleTableSlot *slot,
+                              TupleDesc tupdesc,
+                              Bitmapset *modifiedCols,
+                              int maxfieldlen);

It seems that you made the above public to use it in
check_default_allows_bound(), which while harmless, I'm not sure if
needed.  ATRewriteTable() in tablecmds.c, for example, emits the following
error messages:

errmsg("check constraint \"%s\" is violated by some row",

errmsg("partition constraint is violated by some row")));

but neither outputs the DETAIL part showing exactly what row.  I think
it's fine for check_default_allows_bound() not to show the row itself and
hence no need to make ExecBuildSlotValueDescription public.


In get_rule_expr():
                    case PARTITION_STRATEGY_LIST:
                        Assert(spec->listdatums != NIL);

+                        /*
+                         * If the boundspec is of Default partition, it does
+                         * not have list of datums, but has only one node to
+                         * indicate its a default partition.
+                         */
+                        if (isDefaultPartitionBound(
+                                        (Node *) linitial(spec->listdatums)))
+                        {
+                            appendStringInfoString(buf, "DEFAULT");
+                            break;
+                        }
+

How about adding this part before the switch (key->strategy)?  That way,
we won't have to come back and add this again when we add range default
partitions.

Gotta go; will provide more comments later.

Thanks,
Amit




Re: [HACKERS] Adding support for Default partition in partitioning

From
Beena Emerson
Date:
On Wed, May 31, 2017 at 8:13 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/05/31 9:33, Amit Langote wrote:
>
>
> In get_rule_expr():
>
>                      case PARTITION_STRATEGY_LIST:
>                          Assert(spec->listdatums != NIL);
>
> +                        /*
> +                         * If the boundspec is of Default partition, it does
> +                         * not have list of datums, but has only one node to
> +                         * indicate its a default partition.
> +                         */
> +                        if (isDefaultPartitionBound(
> +                                        (Node *) linitial(spec->listdatums)))
> +                        {
> +                            appendStringInfoString(buf, "DEFAULT");
> +                            break;
> +                        }
> +
>
> How about adding this part before the switch (key->strategy)?  That way,
> we won't have to come back and add this again when we add range default
> partitions.

I think it is best that we add a bool is_default to PartitionBoundSpec
and then have a general check for both list and range. Though
listdatums, upperdatums and lowerdatums are set to default for a
DEFAULT partition, it does not seem proper that we check listdatums
for range as well.
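
A rough sketch of the shape being proposed here: one is_default flag on the bound spec, tested once before the per-strategy switch. The types and names below (ToyBoundSpec, toy_deparse_bound, the TOY_ constants) are toy stand-ins invented for illustration, not the actual PartitionBoundSpec or get_rule_expr() code, and the output strings are only indicative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-ins for the real strategy constants and PartitionBoundSpec. */
typedef enum { TOY_STRATEGY_LIST, TOY_STRATEGY_RANGE } ToyStrategy;

typedef struct ToyBoundSpec
{
    ToyStrategy strategy;
    bool        is_default;   /* the flag proposed in this thread */
    const char *bound_text;   /* pre-rendered bound text; unused for DEFAULT */
} ToyBoundSpec;

/*
 * Deparse a bound into buf.  The DEFAULT case is handled once, before the
 * switch, so no per-strategy branch has to know about it.
 */
void
toy_deparse_bound(const ToyBoundSpec *spec, char *buf, size_t buflen)
{
    assert(spec != NULL && buf != NULL);

    if (spec->is_default)
    {
        snprintf(buf, buflen, "DEFAULT");
        return;
    }

    switch (spec->strategy)
    {
        case TOY_STRATEGY_LIST:
            snprintf(buf, buflen, "FOR VALUES IN (%s)", spec->bound_text);
            break;
        case TOY_STRATEGY_RANGE:
            snprintf(buf, buflen, "FOR VALUES FROM %s", spec->bound_text);
            break;
    }
}
```

With this layout, adding range default partitions later would not require touching the switch at all, which is the point both Amit and Beena are making.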




-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

I have addressed Ashutosh's and Amit's comments in the attached patch.

Please let me know if I have missed anything and any further comments.

PFA.

Regards,
Jeevan Ladhe

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, Jun 1, 2017 at 3:35 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Please let me know if I have missed anything and any further comments.

+                     errmsg("a default partition \"%s\" already exists",

I suggest: partition \"%s\" conflicts with existing default partition \"%s\"

The point is that's more similar to the message you get when overlap
&& !spec->is_default.

+     * If the default partition exists, it's partition constraint will change

it's -> its

+                         errmsg("default partition contains row(s)
that would overlap with partition being created")));

It doesn't really sound right to talk about rows overlapping with a
partition.  Partitions can overlap with each other, but not rows.
Also, it's not really project style to use ambiguously plural forms
like "row(s)" in error messages.  Maybe something like:

new partition constraint for default partition \"%s\" would be
violated by some row

+/*
+ * InvalidateDefaultPartitionRelcache
+ *
+ * Given a parent oid, this function checks if there exists a default partition
+ * and invalidates it's relcache if it exists.
+ */
+void
+InvalidateDefaultPartitionRelcache(Oid parentOid)
+{
+    Relation parent = heap_open(parentOid, AccessShareLock);
+    Oid default_relid =
parent->rd_partdesc->oids[DEFAULT_PARTITION_INDEX(parent)];
+
+    if (partition_bound_has_default(parent->rd_partdesc->boundinfo))
+        CacheInvalidateRelcacheByRelid(default_relid);
+
+    heap_close(parent, AccessShareLock);
+}

It does not seem like a good idea to put the heap_open() call inside
this function.  One of the two callers already *has* the Relation, and
we definitely want to avoid pulling the Oid out of the Relation only
to reopen it to get the Relation back.  And I think
heap_drop_with_catalog could open the parent relation instead of
calling LockRelationOid().

If DETACH PARTITION and DROP PARTITION require this, why not ATTACH
PARTITION and CREATE TABLE .. PARTITION OF?

The indentation of the changes in gram.y doesn't appear to match the
nearby code.  I'd remove this comment:

+             * Currently this is supported only for LIST partition.

Since nothing here is dependent on this working only for LIST
partitions, and since this will probably change, I think it would be
more future-proof to leave this out, lest somebody forget to update it
later.

-                switch (spec->strategy)
+                if (spec->is_default && (strategy == PARTITION_STRATEGY_LIST ||
+                                         strategy == PARTITION_STRATEGY_RANGE))

Checking strategy here appears pointless.

This is not a full review, but I'm out of time for today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
Here's some detailed review of the code.

@@ -1883,6 +1883,15 @@ heap_drop_with_catalog(Oid relid)
     if (OidIsValid(parentOid))
     {
         /*
+         * Default partition constraints are constructed run-time from the
+         * constraints of its siblings(basically by negating them), so any
+         * change in the siblings needs to rebuild the constraints of the
+         * default partition. So, invalidate the sibling default partition's
+         * relcache.
+         */
+        InvalidateDefaultPartitionRelcache(parentOid);
+
Do we need a lock on the default partition for doing this? A query might be
scanning the default partition directly and we will invalidate the relcache
underneath it. What if two partitions are being dropped simultaneously and
change default constraints simultaneously. Probably the lock on the parent
helps there, but need to check it. What if the default partition cache is
invalidated because partition gets added/dropped to the default partition
itself. If we need a lock on the default partition, we will need to
check the order in which we should be obtaining the locks so as to avoid
deadlocks. This also means that we have to test PREPARED statements involving
default partition. Any addition/deletion/attach/detach of other partition
should invalidate those cached statements.

+                        if (partition_bound_has_default(boundinfo))
+                        {
+                            overlap = true;
+                            with = boundinfo->default_index;
+                        }
You could possibly rewrite this as
overlap = partition_bound_has_default(boundinfo);
with = boundinfo->default_index;
that would save one indentation and a conditional jump.

+    if (partdesc->nparts > 0 && partition_bound_has_default(boundinfo))
+        check_default_allows_bound(parent, spec);
If the table has a default partition, nparts > 0, nparts > 0 check looks
redundant. The comments above should also explain that this check doesn't
trigger when a default partition is added since we don't expect an existing
default partition in such a case.

+ * Checks if there exists any row in the default partition that passes the
+ * check for constraints of new partition, if any reports an error.
Grammar: there are two conflicting ifs in the same statement. You may want to
rephrase this as "This function checks if there exists a row in the default
partition that fits in the new partition and throws an error if it finds one."

+    if (new_spec->strategy != PARTITION_STRATEGY_LIST)
+        return;
This should probably be an Assert. When default range partition is supported
this function would silently return, meaning there is no row in the default
partition which fits the new partition. We don't want that behavior.

The code in check_default_allows_bound() to check whether the default partition
has any rows that would fit new partition looks quite similar to the code in
ATExecAttachPartition() checking whether all rows in the table being attached
as a partition fit the partition bounds. One thing that
check_default_allows_bound() misses is, if there's already a constraint on the
default partition refutes the partition constraint on the new partition, we can
skip the scan of the default partition since it can not have rows that would
fit the new partition. ATExecAttachPartition() has code to deal with a similar
case i.e. the table being attached has a constraint which implies the partition
constraint. There may be more cases which check_default_allows_bound() does not
handle but ATExecAttachPartition() handles. So, I am wondering whether it's
better to somehow take out the common code into a function and use it. We will
have to deal with a difference though. The first one would throw an error when
finding a row that satisfies partition constraints whereas the second one would
throw an error when it doesn't find such a row. But this difference can be
handled through a flag or by negating the constraint. This would also take care
of Amit Langote's complaint about foreign partitions. There's also another
difference that the ATExecAttachPartition() queues the table for scan and the
actual scan takes place in ATRewriteTable(), but there is no such queue while
creating a table as a partition. But we should check if we can reuse the code to
scan the heap for checking a constraint.
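
To make the "flag or negated constraint" idea concrete, here is a minimal sketch of one scan routine serving both checks. Everything in it is hypothetical (toy_find_offending_row, ToyConstraint, plain int keys instead of tuples); it only illustrates how the two error conditions differ by a single boolean, not how the real heap scan would be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical row predicate: does this key satisfy the partition constraint? */
typedef bool (*ToyConstraint) (int key);

/*
 * Scan keys[0..nkeys) and return the index of the first offending row,
 * or -1 if there is none.
 *
 * error_on_match = false: ATTACH-style check; a row offends when it does
 *                         NOT satisfy the constraint.
 * error_on_match = true:  default-partition check; a row offends when it
 *                         DOES satisfy the new partition's constraint.
 */
int
toy_find_offending_row(const int *keys, size_t nkeys,
                       ToyConstraint constraint, bool error_on_match)
{
    assert(constraint != NULL);

    for (size_t i = 0; i < nkeys; i++)
    {
        if (constraint(keys[i]) == error_on_match)
            return (int) i;
    }
    return -1;
}

/* Example constraint for a list partition FOR VALUES IN (1, 2). */
bool
toy_in_1_2(int key)
{
    return key == 1 || key == 2;
}
```

Scanning a table being attached with error_on_match = false reports the first row outside (1, 2); scanning the default partition with error_on_match = true reports the first row that should move to the new partition.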

In case of ATTACH PARTITION, probably we should schedule scan of default
partition in the alter table's work queue like what ATExecAttachPartition() is
doing for the table being attached. That would fit in the way alter table
works.

make_partition_op_expr(PartitionKey key, int keynum,
-                       uint16 strategy, Expr *arg1, Expr *arg2)
+                    uint16 strategy, Expr *arg1, Expr *arg2, bool is_default)
Indentation

+                if (is_default &&
+                    ((operoid = get_negator(operoid)) == InvalidOid))
+                    ereport(ERROR, (errcode(ERRCODE_RESTRICT_VIOLATION),
+                                    errmsg("DEFAULT partition cannot
be used without negator of operator  %s",
+                                           get_opname(operoid))));
+
If the existence of default partition depends upon the negator, shouldn't there
be a dependency between the default partition and the negator. At the time of
creating the default partition, we will try to construct the partition
constraint for the default partition and if the negator doesn't exist that
time, it will throw an error. But in an unlikely event when the user drops the
negator, the partitioned table will not be usable at all, as every time it will
try to create the relcache, it will try to create default partition constraint
and will throw error because of missing negator. That's not a very good
scenario. Have you tried this case? Apart from that, while restoring a dump, if
the default partition gets restored before the negator is created, restore will
fail with this error.

     /* Generate the main expression, i.e., keyCol = ANY (arr) */
     opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber,
-                                    keyCol, (Expr *) arr);
+                                    keyCol, (Expr *) arr, spec->is_default);
                 /* Build leftop = ANY (rightop) */
                 saopexpr = makeNode(ScalarArrayOpExpr);
 
The comments in both the places need correction, as for default partition the
expression will be keyCol <> ALL(arr).

+    /*
+     * In case of the default partition for list, the partition constraint
+     * is basically any value that is not equal to any of the values in
+     * boundinfo->datums array. So, construct a list of constants from
+     * boundinfo->datums to pass to function make_partition_op_expr via
+     * ArrayExpr, which would return a negated expression for the default
+     * partition.
+     */
This is misleading, since the actual constraint would also have NOT NULL or IS
NULL in there depending upon the existence of a NULL partition.
I would simply rephrase this as "For default list partition, collect lists for
all the partitions. The default partition constraint should check that the
partition key is equal to none of those."
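
As a hedged illustration of the constraint being described (keyCol <> ALL (arr), combined with the NULL handling mentioned above), the following toy function decides whether a default list partition would accept a key. The name toy_default_accepts and the int-only keys are assumptions made for the sketch; the real code builds an expression tree rather than evaluating values directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Would the default list partition accept this key?
 *
 * datums/ndatums: all values claimed by the non-default partitions.
 * null_claimed:   true if some non-default partition accepts NULL.
 */
bool
toy_default_accepts(bool key_is_null, int key,
                    const int *datums, size_t ndatums, bool null_claimed)
{
    assert(datums != NULL || ndatums == 0);

    if (key_is_null)
        return !null_claimed;   /* NULL lands in default only if unclaimed */

    for (size_t i = 0; i < ndatums; i++)
    {
        if (datums[i] == key)
            return false;       /* key = ANY (arr): belongs to a sibling */
    }
    return true;                /* key <> ALL (arr) */
}
```

Note that with no sibling partitions at all (ndatums = 0) every non-NULL key is accepted, which matches the observation below that ndatums is simply 0 in that case.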

+        ndatums = (pdesc->nparts > 0) ? boundinfo->ndatums : 0;
wouldn't ndatums be simply boundinfo->ndatums? When nparts = 0, ndatums will be
0.
+        int         ndatums = 0;
This assignment looks redundant then.

+        if (boundinfo && partition_bound_accepts_nulls(boundinfo))
You have not checked existence of boundinfo when extracting ndatums out of it
and just few lines below you check that. If the later check is required then we
will get a segfault while extracting ndatums.

+    if ((!list_has_null && !spec->is_default) ||
+        (list_has_null && spec->is_default))
Need a comment explaining what's going on here. The condition is no more a
simple condition.
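
One way to make the condition easier to comment is to notice that it is true exactly when the two flags agree. The sketch below only demonstrates that boolean equivalence; the function names are made up, and it says nothing about what the guarded branch should do.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition exactly as written in the patch hunk. */
bool
toy_cond_as_written(bool list_has_null, bool is_default)
{
    return (!list_has_null && !is_default) ||
           (list_has_null && is_default);
}

/* Equivalent form: true when both flags have the same value. */
bool
toy_cond_simplified(bool list_has_null, bool is_default)
{
    return list_has_null == is_default;
}

/* Exhaustively check all four input combinations. */
void
toy_check_equivalence(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            assert(toy_cond_as_written(a != 0, b != 0) ==
                   toy_cond_simplified(a != 0, b != 0));
}
```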

-            result = -1;
-            *failed_at = parent;
-            *failed_slot = slot;
-            break;
+            if (partition_bound_has_default(partdesc->boundinfo))
+            {
+                result = parent->indexes[partdesc->boundinfo->default_index];
+
+                if (result >= 0)
+                    break;
+                else
+                    parent = pd[-result];
+            }
+            else
+            {
+                result = -1;
+                *failed_at = parent;
+                *failed_slot = slot;
+                break;
+            }
The code to handle result is duplicated here and few lines below. I think it
would be better to not duplicate it by having separate condition blocks to deal
with setting result and setting parent. Basically if (cur_index < 0) ... else
would set the result breaking when setting result = -1 explicitly. A follow-on
block would adjust the parent if result < 0 or break otherwise.
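
A minimal sketch of the restructured loop, under heavy simplification: ToyDispatch, toy_lookup and toy_route are invented for illustration and only mimic the convention visible in the hunk (a non-negative index is a leaf, a negative one points at a sub-partitioned child in pd[]). The first block settles the index, falling back to the default partition; the follow-on block either returns the leaf or descends.

```c
#include <assert.h>
#include <stddef.h>

/* Toy dispatch node; values[i] routes to indexes[i]. */
typedef struct ToyDispatch
{
    size_t      nvalues;        /* number of explicit list values */
    const int  *values;         /* list values owned by this level */
    int         default_index;  /* slot in indexes[] for DEFAULT, or -1 */
    const int  *indexes;        /* >= 0: leaf id; < 0: child is pd[-indexes[i]] */
} ToyDispatch;

/* Return the slot whose value equals key, or -1 if no bound matches. */
int
toy_lookup(const ToyDispatch *node, int key)
{
    for (size_t i = 0; i < node->nvalues; i++)
    {
        if (node->values[i] == key)
            return (int) i;
    }
    return -1;
}

/* Route key through pd[]; on failure return -1 and report the failing level. */
int
toy_route(const ToyDispatch *pd, int key, const ToyDispatch **failed_at)
{
    const ToyDispatch *parent = &pd[0];

    assert(failed_at != NULL);

    for (;;)
    {
        int         cur_index = toy_lookup(parent, key);
        int         result;

        /* Block 1: settle on an index, falling back to the default. */
        if (cur_index < 0)
            cur_index = parent->default_index;
        if (cur_index < 0)
        {
            *failed_at = parent;
            return -1;
        }
        result = parent->indexes[cur_index];

        /* Block 2: a leaf ends the search, otherwise descend. */
        if (result >= 0)
            return result;
        parent = &pd[-result];
    }
}
```

The leaf-or-descend handling appears only once here, whether the index came from a matching bound or from the default partition.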

Both the places where DEFAULT_PARTITION_INDEX is used, its result is used to
fetch OID of the default partition. So, instead of having this macro, may be we
should have macro to fetch OID of default partition. But even there I don't see
much value in that. Further, the macro and code using that macro fetches
rd_partdesc directly from Relation. We have RelationGetPartitionDesc() for
that. Probably we should also add Asserts to check that every pointer in the
long pointer chain is Non-null.

InvalidateDefaultPartitionRelcache() is called in case of drop and detach.
Shouldn't the constraint change when we add or attach a new partition.
Shouldn't we invalidate the cache then as well? I am not able to find that
code in your patch.

    /*
+     * Default partition constraints are constructed run-time from the
+     * constraints of its siblings(basically by negating them), so any
+     * change in the siblings needs to rebuild the constraints of the
+     * default partition. So, invalidate the sibling default partition's
+     * relcache.
+     */
May be rephrase this as "The default partition constraints depend upon the
partition bounds of other partitions. Detaching a partition invalidates the
default partition constraints. Invalidate the default partition's relcache so
that the constraints are built anew and any plans dependent on those
constraints are invalidated as well."

+                     errmsg("default partition is supported only for
list partitioned table")));
for "a" list partitioned table.

+            /*
+             * A default partition, that can be partition of either LIST or
+             * RANGE partitioned table.
+             * Currently this is supported only for LIST partition.
+             */
Keep everything in a single paragraph without a line break.

                }
+        ;
unnecessary extra line.

+        /*
+         * The default partition bound does not have any datums to be
+         * transformed, return the new bound.
+         */
Probably not needed.

+                if (spec->is_default && (strategy == PARTITION_STRATEGY_LIST ||
+                                         strategy == PARTITION_STRATEGY_RANGE))
+                {
+                    appendStringInfoString(buf, "DEFAULT");
+                    break;
+                }
+
What happens if strategy is something other than RANGE or LIST. For that matter
why not just LIST? Possibly you could write this as
+                if (spec->is_default)
+                {
+                    Assert(strategy == PARTITION_STRATEGY_LIST);
+                    appendStringInfoString(buf, "DEFAULT");
+                    break;
+                }

@@ -2044,7 +2044,7 @@ psql_completion(const char *text, int start, int end)
         COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables,"");
     /* Limited completion support for partition bound specification */
     else if (TailMatches3("ATTACH", "PARTITION", MatchAny))
-        COMPLETE_WITH_CONST("FOR VALUES");
+        COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT");
     else if (TailMatches2("FOR", "VALUES"))
         COMPLETE_WITH_LIST2("FROM(", "IN (");

@@ -2483,7 +2483,7 @@ psql_completion(const char *text, int start, int end)
         COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables,"");
     /* Limited completion support for partition bound specification */
     else if (TailMatches3("PARTITION", "OF", MatchAny))
-        COMPLETE_WITH_CONST("FOR VALUES");
+        COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT");
Do we include psql tab completion in the main feature patch? I have not seen
this earlier. But I appreciate your taking care of these details.

+char *ExecBuildSlotValueDescription(Oid reloid,
needs an "extern" declaration.



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Robert,

Thanks for your comments:


If DETACH PARTITION and DROP PARTITION require this, why not ATTACH
PARTITION and CREATE TABLE .. PARTITION OF?


For CREATE and ATTACH PARTITION, the invalidation of the default relation is taken
care of by the following clean-up part in check_default_allows_bound():

+ ResetExprContext(econtext);
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ CacheInvalidateRelcache(part_rel);
+ MemoryContextSwitchTo(oldCxt);

However, after your comment I looked carefully at the code I wrote here, and I
see that the ATTACH and CREATE commands still need explicit cache invalidation,
because the above invalidation call will not happen when the default partition
is itself further partitioned. Also, I think the call to
CacheInvalidateRelcache() in check_default_allows_bound() can be removed
completely.

This code will be rearranged anyway, as I plan to address one of Ashutosh's
comments by writing a function for the code common to ATExecAttachPartition()
and check_default_allows_bound().

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Beena Emerson
Date:
Hello,

On Fri, Jun 2, 2017 at 1:05 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have addressed Ashutosh's and Amit's comments in the attached patch.
>
> Please let me know if I have missed anything and any further comments.
>
> PFA.
>
> Regards,
> Jeevan Ladhe
>

What is the reason the new patch does not mention of violating rows
when a new partition overlaps with default?
Is it because more than one row could be violating the condition?

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:



What is the reason the new patch does not mention of violating rows
when a new partition overlaps with default?
Is it because more than one row could be violating the condition?

This is because, to report the violating row, I had to make the function
ExecBuildSlotValueDescription() public. Per Amit's comment I have
removed this change and left the overlap error without the row contents.
I think this is analogous to other functions that throw a violation error
but are not local to execMain.c.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Beena Emerson
Date:
On Mon, Jun 5, 2017 at 12:14 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>
>
>>
>> What is the reason the new patch does not mention of violating rows
>> when a new partition overlaps with default?
>> Is it because more than one row could be violating the condition?
>
>
> This is because, for reporting the violating error, I had to function
> ExecBuildSlotValueDescription() public. Per Amit's comment I have
> removed this change and let the overlapping error without row contains.
> I think this is analogus to other functions that are throwing violation
> error
> but are not local to execMain.c.
>

ok thanks.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Ashutosh,

Thanks for the detailed review.

Also, please find my feedback on your comments in-lined, I also addressed
the comments given by Robert in attached patch:

On Sat, Jun 3, 2017 at 5:13 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
Here's some detailed review of the code.

@@ -1883,6 +1883,15 @@ heap_drop_with_catalog(Oid relid)
     if (OidIsValid(parentOid))
     {
         /*
+         * Default partition constraints are constructed run-time from the
+         * constraints of its siblings(basically by negating them), so any
+         * change in the siblings needs to rebuild the constraints of the
+         * default partition. So, invalidate the sibling default partition's
+         * relcache.
+         */
+        InvalidateDefaultPartitionRelcache(parentOid);
+
Do we need a lock on the default partition for doing this? A query might be
scanning the default partition directly and we will invalidate the relcache
underneath it. What if two partitions are being dropped simultaneously and
change default constraints simultaneously. Probably the lock on the parent
helps there, but need to check it. What if the default partition cache is
invalidated because partition gets added/dropped to the default partition
itself. If we need a lock on the default partition, we will need to
check the order in which we should be obtaining the locks so as to avoid
deadlocks.
 
Done. I have taken a lock on the default partition after acquiring a lock on the
parent relation wherever applicable.
 
This also means that we have to test PREPARED statements involving
default partition. Any addition/deletion/attach/detach of other partition
should invalidate those cached statements.

Will add this in next version of patch.
 
+                        if (partition_bound_has_default(boundinfo))
+                        {
+                            overlap = true;
+                            with = boundinfo->default_index;
+                        }
You could possibly rewrite this as
overlap = partition_bound_has_default(boundinfo);
with = boundinfo->default_index;
that would save one indentation and a conditional jump.

Done
 
+    if (partdesc->nparts > 0 && partition_bound_has_default(boundinfo))
+        check_default_allows_bound(parent, spec);
If the table has a default partition, nparts > 0, nparts > 0 check looks
redundant. The comments above should also explain that this check doesn't
trigger when a default partition is added since we don't expect an existing
default partition in such a case.

The check nparts > 0, is needed to make sure that the boundinfo is non-null,
i.e. to confirm that there exists at least one partition so that
partition_bound_has_default() does not result in segmentation fault.
I have changed the condition as below to make it more intuitive:
if (boundinfo && partition_bound_has_default(boundinfo))
Also, I have updated the comment.
 
+ * Checks if there exists any row in the default partition that passes the
+ * check for constraints of new partition, if any reports an error.
grammar two conflicting ifs in the same statement. You may want to rephrase
this as "This function checks if there exists a row in the default
partition that fits in the new
partition and throws an error if it finds one."

Done
 
+    if (new_spec->strategy != PARTITION_STRATEGY_LIST)
+        return;
This should probably be an Assert. When default range partition is supported
this function would silently return, meaning there is no row in the default
partition which fits the new partition. We don't want that behavior.

Agreed, changed.
 
The code in check_default_allows_bound() to check whether the default partition
has any rows that would fit new partition looks quite similar to the code in
ATExecAttachPartition() checking whether all rows in the table being attached
as a partition fit the partition bounds. One thing that
check_default_allows_bound() misses is, if there's already a constraint on the
default partition refutes the partition constraint on the new partition, we can
skip the scan of the default partition since it can not have rows that would
fit the new partition. ATExecAttachPartition() has code to deal with a similar
case i.e. the table being attached has a constraint which implies the partition
constraint. There may be more cases which check_default_allows_bound() does not
handle but ATExecAttachPartition() handles. So, I am wondering whether it's
better to somehow take out the common code into a function and use it. We will
have to deal with a difference though. The first one would throw an error when
finding a row that satisfies partition constraints whereas the second one would
throw an error when it doesn't find such a row. But this difference can be
handled through a flag or by negating the constraint. This would also take care
of Amit Langote's complaint about foreign partitions. There's also another
difference that the ATExecAttachPartition() queues the table for scan and the
actual scan takes place in ATRewriteTable(), but there is no such queue while
creating a table as a partition. But we should check if we can reuse the code to
scan the heap for checking a constraint.

In case of ATTACH PARTITION, probably we should schedule scan of default
partition in the alter table's work queue like what ATExecAttachPartition() is
doing for the table being attached. That would fit in the way alter table
works.

I am still working on this.
But, about your comment here:
"if there's already a constraint on the default partition refutes the partition
constraint on the new partition, we can skip the scan":
So far I am not able to imagine such a case, since the default partition constraint
can be thought of as "minus infinity to positive infinity, with some finite set
of values eliminated", and any new non-default partition being added
would simply be a finite set of values (at least in the case of list, but I think
range should not differ here). Hence one cannot imply the other. Possibly
I am missing something that you had envisioned when you raised this;
please correct me if so.
 
 make_partition_op_expr(PartitionKey key, int keynum,
-                       uint16 strategy, Expr *arg1, Expr *arg2)
+                    uint16 strategy, Expr *arg1, Expr *arg2, bool is_default)
Indentation

Done.
 

+                if (is_default &&
+                    ((operoid = get_negator(operoid)) == InvalidOid))
+                    ereport(ERROR, (errcode(ERRCODE_RESTRICT_VIOLATION),
+                                    errmsg("DEFAULT partition cannot
be used without negator of operator  %s",
+                                           get_opname(operoid))));
+
If the existence of default partition depends upon the negator, shouldn't there
be a dependency between the default partition and the negator. At the time of
creating the default partition, we will try to construct the partition
constraint for the default partition and if the negator doesn't exist that
time, it will throw an error. But in an unlikely event when the user drops the
negator, the partitioned table will not be usable at all, as every time it will
try to create the relcache, it will try to create default partition constraint
and will throw error because of missing negator. That's not a very good
scenario. Have you tried this case? Apart from that, while restoring a dump, if
the default partition gets restored before the negator is created, restore will
fail with this error.

I am looking into this.
  
     /* Generate the main expression, i.e., keyCol = ANY (arr) */
     opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber,
-                                    keyCol, (Expr *) arr);
+                                    keyCol, (Expr *) arr, spec->is_default);
                 /* Build leftop = ANY (rightop) */
                 saopexpr = makeNode(ScalarArrayOpExpr);
The comments in both the places need correction, as for default partition the
expression will be keyCol <> ALL(arr).

Done.

+    /*
+     * In case of the default partition for list, the partition constraint
+     * is basically any value that is not equal to any of the values in
+     * boundinfo->datums array. So, construct a list of constants from
+     * boundinfo->datums to pass to function make_partition_op_expr via
+     * ArrayExpr, which would return a negated expression for the default
+     * partition.
+     */
This is misleading, since the actual constraint would also have NOT NULL or IS
NULL in there depending upon the existence of a NULL partition.
I would simply rephrase this as "For default list partition, collect lists for
all the partitions. The default partition constraint should check that the
partition key is equal to none of those."

Done.

+        ndatums = (pdesc->nparts > 0) ? boundinfo->ndatums : 0;
wouldn't ndatums be simply boundinfo->ndatums? When nparts = 0, ndatums will be
0.

Yes, but in case the default partition is the first partition to be added then
boundinfo will be null and the access to ndatums within it will result in
segmentation fault.
Simplified code to make this more readable.
 
+        int         ndatums = 0;
This assignment looks redundant then.

Per change made for above comment, this is now needed.
 
+        if (boundinfo && partition_bound_accepts_nulls(boundinfo))
You have not checked the existence of boundinfo when extracting ndatums out of it,
yet just a few lines below you check it. If the latter check is required then we
will get a segfault while extracting ndatums.

The code to extract ndatums has changed and now has a check for boundinfo,
but it would not have resulted in a segmentation fault in its earlier state either,
because there was a check to avoid this, i.e. (pdesc->nparts > 0) ?:...
 
+    if ((!list_has_null && !spec->is_default) ||
+        (list_has_null && spec->is_default))
Need a comment explaining what's going on here. The condition is no more a
simple condition.

-            result = -1;
-            *failed_at = parent;
-            *failed_slot = slot;
-            break;
+            if (partition_bound_has_default(partdesc->boundinfo))
+            {
+                result = parent->indexes[partdesc->boundinfo->default_index];
+
+                if (result >= 0)
+                    break;
+                else
+                    parent = pd[-result];
+            }
+            else
+            {
+                result = -1;
+                *failed_at = parent;
+                *failed_slot = slot;
+                break;
+            }
The code to handle result is duplicated here and few lines below. I think it
would be better to not duplicate it by having separate condition blocks to deal
with setting result and setting parent. Basically if (cur_index < 0) ... else
would set the result breaking when setting result = -1 explicitly. A follow-on
block would adjust the parent if result < 0 or break otherwise.
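
One way to read the restructuring suggested above, as a simplified standalone
sketch and not the patch's code. PartNode, lookup_slot() and route_tuple() are
hypothetical stand-ins for the dispatch structures and the bound search; only
the sign convention for indexes[] (non-negative means a leaf, negative means
descend into pd[-index]) is taken from the hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one level of the partition tree. */
typedef struct PartNode
{
    int         nvalues;        /* number of list bound values */
    const int  *values;         /* keys accepted by non-default partitions */
    const int  *value_slot;     /* slot in indexes[] for each values[] entry */
    const int  *indexes;        /* >= 0: leaf; < 0: descend to pd[-indexes[i]] */
    int         default_index;  /* slot of the default partition, -1 if none */
} PartNode;

/* Linear search standing in for the real bound lookup; -1 means no match. */
static int
lookup_slot(const PartNode *node, int key)
{
    for (int i = 0; i < node->nvalues; i++)
    {
        if (node->values[i] == key)
            return node->value_slot[i];
    }
    return -1;
}

/* Returns the leaf partition index, or -1 with *failed_at set. */
int
route_tuple(const PartNode *pd, int key, const PartNode **failed_at)
{
    const PartNode *parent;

    assert(pd != NULL && failed_at != NULL);
    parent = &pd[0];

    for (;;)
    {
        int         cur_index = lookup_slot(parent, key);
        int         result;

        /* First block: settle the result, using the default as fallback. */
        if (cur_index < 0)
            cur_index = parent->default_index;
        if (cur_index < 0)
        {
            *failed_at = parent;
            return -1;
        }
        result = parent->indexes[cur_index];

        /* Follow-on block: stop at a leaf, otherwise adjust the parent. */
        if (result >= 0)
            return result;
        parent = &pd[-result];
    }
}
```

The failure case returns directly, so a "no partition found" -1 cannot be
confused with a negative indexes[] entry that means "descend".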

I have tried to simplify it in the attached patch; please let me know if that change
looks any better.
 
Both the places where DEFAULT_PARTITION_INDEX is used, its result is used to
fetch OID of the default partition. So, instead of having this macro, may be we
should have macro to fetch OID of default partition. But even there I don't see
much value in that.
 
Removed the macro, and did this in place at both the places.

Further, the macro and code using that macro fetches
rd_partdesc directly from Relation.
 
Done this wherever applicable.

We have RelationGetPartitionDesc() for
that. Probably we should also add Asserts to check that every pointer in the
long pointer chain is Non-null.
 
I am sorry, but I did not understand which chain you are pointing to here.

InvalidateDefaultPartitionRelcache() is called in case of drop and detach.
Shouldn't the constraint change when we add or attach a new partition.
Shouldn't we invalidate the cache then as well? I am not able to find that
code in your patch.
 
In case of CREATE/ATTACH this was taken care of by a call to
CacheInvalidateRelcache(part_rel) in check_default_allows_bound(), which wasn't
the correct place anyway, and it had a flaw: the invalidation would not
happen in case the default partition is further partitioned.
Now, the relcache for the default partition is invalidated for
CREATE/DROP/ALTER commands.
 
     /*
+     * Default partition constraints are constructed run-time from the
+     * constraints of its siblings(basically by negating them), so any
+     * change in the siblings needs to rebuild the constraints of the
+     * default partition. So, invalidate the sibling default partition's
+     * relcache.
+     */
May be rephrase this as "The default partition constraints depend upon the
partition bounds of other partitions. Detaching a partition invalidates the
default partition constraints. Invalidate the default partition's relcache so
that the constraints are built anew and any plans dependent on those
constraints are invalidated as well."
 
Done!
 
+                     errmsg("default partition is supported only for
list partitioned table")));
for "a" list partitioned table.
 
Done.
 

+            /*
+             * A default partition, that can be partition of either LIST or
+             * RANGE partitioned table.
+             * Currently this is supported only for LIST partition.
+             */
Keep everything in single paragraph without line break.
 
Not applicable now, as I removed the latter part of the comment.

                 }
+
         ;
unnecessary extra line.

Removed.
 
+        /*
+         * The default partition bound does not have any datums to be
+         * transformed, return the new bound.
+         */
Probably not needed.

Removed.
 

+                if (spec->is_default && (strategy == PARTITION_STRATEGY_LIST ||
+                                         strategy == PARTITION_STRATEGY_RANGE))
+                {
+                    appendStringInfoString(buf, "DEFAULT");
+                    break;
+                }
+
What happens if strategy is something other than RANGE or LIST. For that matter
why not just LIST? Possibly you could write this as
+                if (spec->is_default)
+                {
+                    Assert(strategy == PARTITION_STRATEGY_LIST);
+                    appendStringInfoString(buf, "DEFAULT");
+                    break;
+                }

Done.
 
@@ -2044,7 +2044,7 @@ psql_completion(const char *text, int start, int end)
         COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, "");
     /* Limited completion support for partition bound specification */
     else if (TailMatches3("ATTACH", "PARTITION", MatchAny))
-        COMPLETE_WITH_CONST("FOR VALUES");
+        COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT");
     else if (TailMatches2("FOR", "VALUES"))
         COMPLETE_WITH_LIST2("FROM (", "IN (");

@@ -2483,7 +2483,7 @@ psql_completion(const char *text, int start, int end)
         COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables, "");
     /* Limited completion support for partition bound specification */
     else if (TailMatches3("PARTITION", "OF", MatchAny))
-        COMPLETE_WITH_CONST("FOR VALUES");
+        COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT");
Do we include psql tab completion in the main feature patch? I have not seen
this earlier. But I appreciate you taking care of these details.

I am not sure about this. If needed I can submit a patch to take care of this later, but
as of now I have not removed this from the patch.

+char *ExecBuildSlotValueDescription(Oid reloid,
needs an "extern" declaration.

Per one of the comments [1] given by Amit Langote, I have removed the call to
ExecBuildSlotValueDescription(); this was a leftover, which I have now cleaned up.


Regards,
Jeevan Ladhe
Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
amul sul
Date:
On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
[...]
>>
>> The code in check_default_allows_bound() to check whether the default
>> partition
>> has any rows that would fit new partition looks quite similar to the code
>> in
>> ATExecAttachPartition() checking whether all rows in the table being
>> attached
>> as a partition fit the partition bounds. One thing that
>> check_default_allows_bound() misses is, if there's already a constraint on
>> the
>> default partition refutes the partition constraint on the new partition,
>> we can
>> skip the scan of the default partition since it can not have rows that
>> would
>> fit the new partition. ATExecAttachPartition() has code to deal with a
>> similar
>> case i.e. the table being attached has a constraint which implies the
>> partition
>> constraint. There may be more cases which check_default_allows_bound()
>> does not
>> handle but ATExecAttachPartition() handles. So, I am wondering whether
>> it's
>> better to somehow take out the common code into a function and use it. We
>> will
>> have to deal with a difference through. The first one would throw an error
>> when
>> finding a row that satisfies partition constraints whereas the second one
>> would
>> throw an error when it doesn't find such a row. But this difference can be
>> handled through a flag or by negating the constraint. This would also take
>> care
>> of Amit Langote's complaint about foreign partitions. There's also another
>> difference that the ATExecAttachPartition() queues the table for scan and
>> the
>> actual scan takes place in ATRewriteTable(), but there is not such queue
>> while
>> creating a table as a partition. But we should check if we can reuse the
>> code to
>> scan the heap for checking a constraint.
>>
>> In case of ATTACH PARTITION, probably we should schedule scan of default
>> partition in the alter table's work queue like what
>> ATExecAttachPartition() is
>> doing for the table being attached. That would fit in the way alter table
>> works.
>
>
> I am still working on this.
> But, about your comment here:
> "if there's already a constraint on the default partition refutes the
> partition
> constraint on the new partition, we can skip the scan":
> I am so far not able to imagine such a case, since default partition
> constraint
> can be imagined something like "minus infinity to positive infinity with
> some finite set elimination", and any new non-default partition being added
> would simply be a set of finite values(at-least in case of list, but I think
> range
> should not also differ here). Hence one cannot imply the other here.
> Possibly,
> I might be missing something that you had visioned when you raised the flag,
> please correct me if I am missing something.
>

IIUC, the default partition constraint is simply NOT IN (<values of all
other sibling partitions>).
If the constraint on the default partition refutes the new partition's
constraint, that means we have overlapping partitions, and perhaps
an error.


Regards,
Amul



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:


IIUC, default partition constraints is simply NOT IN (<values of all
other sibling partitions>).
If constraint on the default partition refutes the new partition's
constraints that means we have overlapping partition, and perhaps
error.

You are correct, Amul, but this error will be thrown before we try to
check the default partition's data. So, in such cases I think we really
do not need logic to check whether the default partition refutes the new
partition's constraints.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
amul sul
Date:
On Wed, Jun 7, 2017 at 10:30 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>
>
>> IIUC, default partition constraints is simply NOT IN (<values of all
>> other sibling partitions>).
>> If constraint on the default partition refutes the new partition's
>> constraints that means we have overlapping partition, and perhaps
>> error.
>
>
> You are correct Amul, but this error will be thrown before we try to
> check for the default partition data. So, in such cases I think we really
> do not need to have logic to check if default partition refutes the new
> partition contraints.
>

But Ashutosh's suggestion makes sense: we might have constraints other
than the partition constraint on the default partition.  If those
constraints refute the new partition's constraint, we should skip
the scan.
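
To illustrate the case described above, here is a toy model, not server code.
It assumes the default partition carries an extra user CHECK constraint of the
form lo <= key AND key <= hi and that the key is NOT NULL; RangeCheck and
default_scan_can_be_skipped() are hypothetical names. In the server the proof
would be done on expression trees and not on a hard-coded range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a user CHECK constraint on the default partition:
 * lo <= key AND key <= hi. The key is assumed to be NOT NULL.
 */
typedef struct RangeCheck
{
    int         lo;
    int         hi;
} RangeCheck;

/*
 * Returns true when no value accepted by the new list partition can satisfy
 * the default partition's CHECK constraint. In that case the default
 * partition cannot hold a row that belongs in the new partition, so the
 * scan is unnecessary.
 */
bool
default_scan_can_be_skipped(const RangeCheck *check,
                            const int *new_values, size_t nvalues)
{
    assert(new_values != NULL || nvalues == 0);

    if (check == NULL)
        return false;           /* no extra constraint, so we must scan */

    for (size_t i = 0; i < nvalues; i++)
    {
        if (new_values[i] >= check->lo && new_values[i] <= check->hi)
            return false;       /* a conflicting row could exist */
    }
    return true;
}
```

With a check of 100..200 on the default partition, adding a partition for
(4, 5) needs no scan, while adding one for (4, 150) does.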

Regards,
Amul



Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>>
>> This also means that we have to test PREPARED statements involving
>> default partition. Any addition/deletion/attach/detach of other partition
>> should invalidate those cached statements.
>
>
> Will add this in next version of patch.

My earlier statement requires a clarification. By "PREPARED statements
involving default partition" I mean PREPAREd statements with direct
access to the default partition, without going through the partitioned
table.

>
>>
>> The code in check_default_allows_bound() to check whether the default
>> partition
>> has any rows that would fit new partition looks quite similar to the code
>> in
>> ATExecAttachPartition() checking whether all rows in the table being
>> attached
>> as a partition fit the partition bounds. One thing that
>> check_default_allows_bound() misses is, if there's already a constraint on
>> the
>> default partition refutes the partition constraint on the new partition,
>> we can
>> skip the scan of the default partition since it can not have rows that
>> would
>> fit the new partition. ATExecAttachPartition() has code to deal with a
>> similar
>> case i.e. the table being attached has a constraint which implies the
>> partition
>> constraint. There may be more cases which check_default_allows_bound()
>> does not
>> handle but ATExecAttachPartition() handles. So, I am wondering whether
>> it's
>> better to somehow take out the common code into a function and use it. We
>> will
>> have to deal with a difference through. The first one would throw an error
>> when
>> finding a row that satisfies partition constraints whereas the second one
>> would
>> throw an error when it doesn't find such a row. But this difference can be
>> handled through a flag or by negating the constraint. This would also take
>> care
>> of Amit Langote's complaint about foreign partitions. There's also another
>> difference that the ATExecAttachPartition() queues the table for scan and
>> the
>> actual scan takes place in ATRewriteTable(), but there is not such queue
>> while
>> creating a table as a partition. But we should check if we can reuse the
>> code to
>> scan the heap for checking a constraint.
>>
>> In case of ATTACH PARTITION, probably we should schedule scan of default
>> partition in the alter table's work queue like what
>> ATExecAttachPartition() is
>> doing for the table being attached. That would fit in the way alter table
>> works.
>
>
> I am still working on this.
> But, about your comment here:
> "if there's already a constraint on the default partition refutes the
> partition
> constraint on the new partition, we can skip the scan":
> I am so far not able to imagine such a case, since default partition
> constraint
> can be imagined something like "minus infinity to positive infinity with
> some finite set elimination", and any new non-default partition being added
> would simply be a set of finite values(at-least in case of list, but I think
> range
> should not also differ here). Hence one cannot imply the other here.
> Possibly,
> I might be missing something that you had visioned when you raised the flag,
> please correct me if I am missing something.

I am hoping that this has been clarified in other mails in this thread
between you and Amul.

>
>>
>>      /* Generate the main expression, i.e., keyCol = ANY (arr) */
>>      opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber,
>> -                                    keyCol, (Expr *) arr);
>> +                                    keyCol, (Expr *) arr,
>> spec->is_default);
>>                  /* Build leftop = ANY (rightop) */
>>                  saopexpr = makeNode(ScalarArrayOpExpr);
>> The comments in both the places need correction, as for default partition
>> the
>> expression will be keyCol <> ALL(arr).
>
>
> Done.

Please note that this changes, if you construct the constraint as
!(keycol = ANY[]).

>
>> We have RelationGetPartitionDesc() for
>> that. Probably we should also add Asserts to check that every pointer in
>> the
>> long pointer chain is Non-null.
>
>
> I am sorry, but I did not understand which chain you are trying to point
> here.

I mean a chain of pointer dereferences, such as a->b->c->d.
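
A small standalone sketch of what asserting each link could look like. The
struct shapes (Rel, PartDesc, BoundInfo) and default_partition_oid() are
hypothetical and heavily trimmed; the server code would use Assert() and
RelationGetPartitionDesc() rather than what is shown here. A missing
boundinfo is treated as a legitimate state, since it is null when the
partitioned table has no partitions yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down shapes of the structs in the chain. */
typedef struct BoundInfo
{
    int         default_index;  /* slot of the default partition, -1 if none */
} BoundInfo;

typedef struct PartDesc
{
    int         nparts;
    const unsigned *oids;       /* partition OIDs, one per slot */
    const BoundInfo *boundinfo; /* NULL when there are no partitions */
} PartDesc;

typedef struct Rel
{
    const PartDesc *rd_partdesc;
} Rel;

/* Returns the default partition's OID, or 0 (modeling InvalidOid). */
unsigned
default_partition_oid(const Rel *rel)
{
    const PartDesc *partdesc;
    const BoundInfo *boundinfo;

    /* Check each link before dereferencing the next one. */
    assert(rel != NULL);
    partdesc = rel->rd_partdesc;
    assert(partdesc != NULL);

    boundinfo = partdesc->boundinfo;
    if (boundinfo == NULL || boundinfo->default_index < 0)
        return 0;

    assert(boundinfo->default_index < partdesc->nparts);
    assert(partdesc->oids != NULL);
    return partdesc->oids[boundinfo->default_index];
}
```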

>
>>
>> @@ -2044,7 +2044,7 @@ psql_completion(const char *text, int start, int
>> end)
>>          COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, "");
>>      /* Limited completion support for partition bound specification */
>>      else if (TailMatches3("ATTACH", "PARTITION", MatchAny))
>> -        COMPLETE_WITH_CONST("FOR VALUES");
>> +        COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT");
>>      else if (TailMatches2("FOR", "VALUES"))
>>          COMPLETE_WITH_LIST2("FROM (", "IN (");
>>
>> @@ -2483,7 +2483,7 @@ psql_completion(const char *text, int start, int
>> end)
>>          COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables,
>> "");
>>      /* Limited completion support for partition bound specification */
>>      else if (TailMatches3("PARTITION", "OF", MatchAny))
>> -        COMPLETE_WITH_CONST("FOR VALUES");
>> +        COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT");
>> Do we include psql tab completion in the main feature patch? I have not
>> seen
>> this earlier. But appreciate taking care of these defails.
>
>
> I am not sure about this. If needed I can submit a patch to take care of
> this later, but
> as of now I have not removed this from the patch.

I looked at Amul's patch. He has tab completion changes for HASH
partitions and those were suggested by Robert. So, keep those changes
in this patch. Sorry for the misunderstanding on my part.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Sat, Jun 3, 2017 at 2:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> +                         errmsg("default partition contains row(s)
> that would overlap with partition being created")));
>
> It doesn't really sound right to talk about rows overlapping with a
> partition.  Partitions can overlap with each other, but not rows.
> Also, it's not really project style to use ambiguously plural forms
> like "row(s)" in error messages.  Maybe something like:
>
> new partition constraint for default partition \"%s\" would be
> violated by some row
>

Partition constraint is an implementation detail here. We enforce
partition bounds through constraints and we call such constraints
partition constraints. But a user may not necessarily understand this
term or may interpret it differently. Adding "new" adds to the confusion
as the default partition is not new. My suggestion in an earlier mail
was 'default partition contains rows that conflict with the partition
bounds of "part_xyz"', with a note that we should use a better word
than "conflict". So, Jeevan seems to have used overlap, which again is
not correct. How about 'default partition contains row/s which would
fit the partition "part_xyz" being created or attached', with a hint
to move those rows to the new partition's table in case of attach? I
don't think the hint would be so straightforward, i.e. to create the table
with SELECT INTO and then ATTACH.

What do you think?

Also, the error code ERRCODE_CHECK_VIOLATION, which is an "integrity
constraint violation" code, seems misleading. We aren't violating any
integrity constraint here. In fact, I am not able to understand how adding
an object could violate an integrity constraint. The nearest error code seems to
be ERRCODE_INVALID_OBJECT_DEFINITION, which is also used for
partitions with overlapping bounds.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:

>
>>
>> The code in check_default_allows_bound() to check whether the default
>> partition
>> has any rows that would fit new partition looks quite similar to the code
>> in
>> ATExecAttachPartition() checking whether all rows in the table being
>> attached
>> as a partition fit the partition bounds. One thing that
>> check_default_allows_bound() misses is, if there's already a constraint on
>> the
>> default partition refutes the partition constraint on the new partition,
>> we can
>> skip the scan of the default partition since it can not have rows that
>> would
>> fit the new partition. ATExecAttachPartition() has code to deal with a
>> similar
>> case i.e. the table being attached has a constraint which implies the
>> partition
>> constraint. There may be more cases which check_default_allows_bound()
>> does not
>> handle but ATExecAttachPartition() handles. So, I am wondering whether
>> it's
>> better to somehow take out the common code into a function and use it. We
>> will
>> have to deal with a difference through. The first one would throw an error
>> when
>> finding a row that satisfies partition constraints whereas the second one
>> would
>> throw an error when it doesn't find such a row. But this difference can be
>> handled through a flag or by negating the constraint. This would also take
>> care
>> of Amit Langote's complaint about foreign partitions. There's also another
>> difference that the ATExecAttachPartition() queues the table for scan and
>> the
>> actual scan takes place in ATRewriteTable(), but there is not such queue
>> while
>> creating a table as a partition. But we should check if we can reuse the
>> code to
>> scan the heap for checking a constraint.
>>
>> In case of ATTACH PARTITION, probably we should schedule scan of default
>> partition in the alter table's work queue like what
>> ATExecAttachPartition() is
>> doing for the table being attached. That would fit in the way alter table
>> works.
>

I tried refactoring the existing code so that it can be used for default
partitioning as well. The code to validate the partition constraints
against the table being attached in ATExecAttachPartition() is
extracted into a set of functions. For the default partition we reuse
those functions to check whether it contains any row that would fit
the partition being attached. While creating a new partition, the
function to skip validation is reused but the scan portion is
duplicated from ATRewriteTable() since we are not in ALTER TABLE
context. The names of the functions and their declarations will require
some thought.

There's one test failing because for ATTACH PARTITION the error comes
from ATRewriteTable() instead of check_default_allows_bound(). Maybe
we want to use the same message in both places or somehow make ATRewriteTable()
give a different message while validating the default partition.

Please review the patch and let me know if the changes look good.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Thu, Jun 8, 2017 at 2:54 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>
>>
>>>
>>> The code in check_default_allows_bound() to check whether the default
>>> partition
>>> has any rows that would fit new partition looks quite similar to the code
>>> in
>>> ATExecAttachPartition() checking whether all rows in the table being
>>> attached
>>> as a partition fit the partition bounds. One thing that
>>> check_default_allows_bound() misses is, if there's already a constraint on
>>> the
>>> default partition refutes the partition constraint on the new partition,
>>> we can
>>> skip the scan of the default partition since it can not have rows that
>>> would
>>> fit the new partition. ATExecAttachPartition() has code to deal with a
>>> similar
>>> case i.e. the table being attached has a constraint which implies the
>>> partition
>>> constraint. There may be more cases which check_default_allows_bound()
>>> does not
>>> handle but ATExecAttachPartition() handles. So, I am wondering whether
>>> it's
>>> better to somehow take out the common code into a function and use it. We
>>> will
>>> have to deal with a difference through. The first one would throw an error
>>> when
>>> finding a row that satisfies partition constraints whereas the second one
>>> would
>>> throw an error when it doesn't find such a row. But this difference can be
>>> handled through a flag or by negating the constraint. This would also take
>>> care
>>> of Amit Langote's complaint about foreign partitions. There's also another
>>> difference that the ATExecAttachPartition() queues the table for scan and
>>> the
>>> actual scan takes place in ATRewriteTable(), but there is not such queue
>>> while
>>> creating a table as a partition. But we should check if we can reuse the
>>> code to
>>> scan the heap for checking a constraint.
>>>
>>> In case of ATTACH PARTITION, probably we should schedule scan of default
>>> partition in the alter table's work queue like what
>>> ATExecAttachPartition() is
>>> doing for the table being attached. That would fit in the way alter table
>>> works.
>>
>
> I tried refactoring existing code so that it can be used for default
> partitioning as well. The code to validate the partition constraints
> against the table being attached in ATExecAttachPartition() is
> extracted out into a set of functions. For default partition we reuse
> those functions to check whether it contains any row that would fit
> the partition being attached. While creating a new partition, the
> function to skip validation is reused but the scan portion is
> duplicated from ATRewriteTable since we are not in ALTER TABLE
> context. The names of the functions, their declaration will require
> some thought.
>
> There's one test failing because for ATTACH partition the error comes
> from ATRewriteTable instead of check_default_allows_bounds(). May be
> we want to use same message in both places or some make ATRewriteTable
> give a different message while validating default partition.
>
> Please review the patch and let me know if the changes look good.

From the discussion on thread [1], it appears that having a NOT NULL
constraint embedded within an expression may cause a scan to be skipped
when it shouldn't be. For default partitions such a case may arise. If an
existing partition accepts NULL and we try to attach a default
partition, it would get a NOT NULL partition constraint, but it will be
buried within an expression like !(key = any(array[1, 2, 3]) OR key is
null) where the existing partition/s accept values 1, 2, 3 and null.
We need to check whether the refactored code handles this case
correctly. The v19 patch does not have this problem since it doesn't try
to skip the scan based on the constraints of the table being attached.
Please try the following cases:
1. a default partition accepting nulls exists and we attach a partition
   to accept NULL values;
2. a NULL accepting partition exists and we try to attach a table as the
   default partition.
In both cases the default partition should be checked for rows with
NULL partition keys. In both cases, if the default partition table has
a NOT NULL constraint we should be able to skip the scan, and we should
scan the table when such a constraint does not exist.
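
The buried NOT NULL can be seen by evaluating the expression under SQL's
three-valued logic. The following is a standalone model written for this
note, not server code; TriBool, Key and the function names are hypothetical,
and the array is assumed to contain no NULL elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SQL boolean with the third "unknown" state. */
typedef enum
{
    TV_FALSE,
    TV_TRUE,
    TV_NULL
} TriBool;

/* Hypothetical nullable integer partition key. */
typedef struct
{
    bool        isnull;
    int         value;
} Key;

static TriBool
tv_not(TriBool a)
{
    if (a == TV_NULL)
        return TV_NULL;
    return (a == TV_TRUE) ? TV_FALSE : TV_TRUE;
}

static TriBool
tv_or(TriBool a, TriBool b)
{
    if (a == TV_TRUE || b == TV_TRUE)
        return TV_TRUE;
    if (a == TV_NULL || b == TV_NULL)
        return TV_NULL;
    return TV_FALSE;
}

/* key = ANY(arr), where arr holds no NULL elements. */
static TriBool
key_eq_any(Key key, const int *arr, size_t n)
{
    assert(arr != NULL || n == 0);

    if (n == 0)
        return TV_FALSE;        /* ANY over an empty array is false */
    if (key.isnull)
        return TV_NULL;         /* NULL = x is unknown */
    for (size_t i = 0; i < n; i++)
    {
        if (arr[i] == key.value)
            return TV_TRUE;
    }
    return TV_FALSE;
}

/* NOT (key = ANY(arr) OR key IS NULL) */
TriBool
default_constraint(Key key, const int *arr, size_t n)
{
    TriBool     is_null = key.isnull ? TV_TRUE : TV_FALSE;

    return tv_not(tv_or(key_eq_any(key, arr, n), is_null));
}
```

For a NULL key the comparison alone is unknown, but the OR with "key is null"
makes the inner expression true and the whole constraint false. So the
expression rejects NULL keys even though no top-level NOT NULL appears in it,
which is why code that looks only for a plain NOT NULL constraint could miss it.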

[1] http://www.postgresql-archive.org/A-bug-in-mapping-attributes-in-ATExecAttachPartition-td5965298.html

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Thanks Ashutosh,

On Thu, Jun 8, 2017 at 4:04 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Thu, Jun 8, 2017 at 2:54 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>
>>
>>>
>>> The code in check_default_allows_bound() to check whether the default
>>> partition
>>> has any rows that would fit new partition looks quite similar to the code
>>> in
>>> ATExecAttachPartition() checking whether all rows in the table being
>>> attached
>>> as a partition fit the partition bounds. One thing that
>>> check_default_allows_bound() misses is, if there's already a constraint on
>>> the
>>> default partition refutes the partition constraint on the new partition,
>>> we can
>>> skip the scan of the default partition since it can not have rows that
>>> would
>>> fit the new partition. ATExecAttachPartition() has code to deal with a
>>> similar
>>> case i.e. the table being attached has a constraint which implies the
>>> partition
>>> constraint. There may be more cases which check_default_allows_bound()
>>> does not
>>> handle but ATExecAttachPartition() handles. So, I am wondering whether
>>> it's
>>> better to somehow take out the common code into a function and use it. We
>>> will
>>> have to deal with a difference through. The first one would throw an error
>>> when
>>> finding a row that satisfies partition constraints whereas the second one
>>> would
>>> throw an error when it doesn't find such a row. But this difference can be
>>> handled through a flag or by negating the constraint. This would also take
>>> care
>>> of Amit Langote's complaint about foreign partitions. There's also another
>>> difference that the ATExecAttachPartition() queues the table for scan and
>>> the
>>> actual scan takes place in ATRewriteTable(), but there is not such queue
>>> while
>>> creating a table as a partition. But we should check if we can reuse the
>>> code to
>>> scan the heap for checking a constraint.
>>>
>>> In case of ATTACH PARTITION, probably we should schedule scan of default
>>> partition in the alter table's work queue like what
>>> ATExecAttachPartition() is
>>> doing for the table being attached. That would fit in the way alter table
>>> works.
>>
>
> I tried refactoring existing code so that it can be used for default
> partitioning as well. The code to validate the partition constraints
> against the table being attached in ATExecAttachPartition() is
> extracted out into a set of functions. For default partition we reuse
> those functions to check whether it contains any row that would fit
> the partition being attached. While creating a new partition, the
> function to skip validation is reused but the scan portion is
> duplicated from ATRewriteTable since we are not in ALTER TABLE
> context. The names of the functions, their declaration will require
> some thought.
>
> There's one test failing because for ATTACH partition the error comes
> from ATRewriteTable instead of check_default_allows_bounds(). May be
> we want to use same message in both places or some make ATRewriteTable
> give a different message while validating default partition.
>
> Please review the patch and let me know if the changes look good.

From the discussion on thread [1], it appears that having a NOT NULL
constraint embedded within an expression may cause a scan to be skipped
when it shouldn't be. Such a case may arise for default partitions. If an
existing partition accepts NULL and we try to attach a default
partition, it would get a NOT NULL partition constraint, but that
constraint will be buried within an expression like
!(key = any(array[1, 2, 3]) OR key is null), where the existing
partitions accept the values 1, 2, 3 and null.
We need to check whether the refactored code handles this case
correctly. The v19 patch does not have this problem since it doesn't try
to skip the scan based on the constraints of the table being attached.
Please try the following cases:
1. a default partition accepting nulls exists and we attach a partition
to accept NULL values;
2. a NULL-accepting partition exists and we try to attach a table as the
default partition.
In both cases the default partition should be checked for
rows with NULL partition keys. In both cases, if the default
partition table has a NOT NULL constraint we should be able to skip
the scan, and we should scan the table when no such constraint
exists.

I will review your refactoring patch as well as test the above cases.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Wed, Jun 7, 2017 at 1:59 AM, amul sul <sulamul@gmail.com> wrote:
> But Ashutosh's suggestion make sense, we might have constraints other
> than that partitioning constraint on default partition.  If those
> constraints refutes the new partition's constraints, we should skip
> the scan.

Right.  If the user adds a constraint to the default partition that is
identical to the new partition constraint, that should cause the scan
to be skipped.

Ideally, we could do even better.  For example, if the user is
creating a new partition FOR VALUES IN (7), and the default partition
has CHECK (key != 7), we could perhaps deduce that the combination of
the existing partition constraint (which must certainly hold) and the
additional CHECK constraint (which must also hold, at least assuming
it's not marked NOT VALID) are sufficient to prove the new check
constraint.  But I'm not sure whether predicate_refuted_by() is smart
enough to figure that out.  However, it should definitely be smart
enough to figure out that if somebody's added the new partitioning
constraint as a CHECK constraint on the default partition, we don't
need to scan it.

The reason somebody might want to do that, just to be clear, is that
they could do this in multiple steps: first, add the new CHECK
constraint as NOT VALID.  Then VALIDATE CONSTRAINT.  Then add the new
non-default partition.  This would result in holding an exclusive lock
for a lesser period of time than if they did it all together as one
operation.
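
A minimal sketch of the scan-skipping decision behind this multi-step
approach, assuming a toy model in which constraints are compared as
normalized strings. The CheckConstraint type and can_skip_default_scan()
are hypothetical names for illustration only; the actual patch works on
expression trees and relies on the planner's predicate-proving code, not
on string comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a CHECK constraint on the default partition. */
typedef struct
{
    const char *expr;   /* normalized constraint expression text */
    bool        valid;  /* false while the constraint is still NOT VALID */
} CheckConstraint;

/*
 * Return true if the scan of the default partition can be skipped, i.e. some
 * already-validated CHECK constraint is identical to the constraint the
 * default partition would have after the new partition is added.
 * A NOT VALID constraint proves nothing about existing rows, so it is ignored.
 */
bool
can_skip_default_scan(const CheckConstraint *cons, size_t ncons,
                      const char *new_default_constraint)
{
    assert(new_default_constraint != NULL);

    for (size_t i = 0; i < ncons; i++)
    {
        if (!cons[i].valid)
            continue;           /* step 1 only: added as NOT VALID */
        if (strcmp(cons[i].expr, new_default_constraint) == 0)
            return true;        /* step 2 done: VALIDATE CONSTRAINT passed */
    }
    return false;               /* nothing proves it; a scan is needed */
}
```

In this model the constraint added as NOT VALID does not allow the scan
to be skipped; only once it has been validated (which does its own scan,
separately from adding the partition) does the final step avoid scanning
the default partition.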

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Wed, Jun 7, 2017 at 5:47 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Sat, Jun 3, 2017 at 2:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> +                         errmsg("default partition contains row(s)
>> that would overlap with partition being created")));
>>
>> It doesn't really sound right to talk about rows overlapping with a
>> partition.  Partitions can overlap with each other, but not rows.
>> Also, it's not really project style to use ambiguously plural forms
>> like "row(s)" in error messages.  Maybe something like:
>>
>> new partition constraint for default partition \"%s\" would be
>> violated by some row
>
> Partition constraint is implementation detail here. We enforce
> partition bounds through constraints and we call such constraints as
> partition constraints. But a user may not necessarily understand this
> term or may interpret it different. Adding "new" adds to the confusion
> as the default partition is not new.

I see your point.  We could say "updated partition constraint" instead
of "new partition constraint" to address that to some degree.

> My suggestion in an earlier mail
> was ""default partition contains rows that conflict with the partition
> bounds of "part_xyz"", with a note that we should use a better word
> than "conflict". So, Jeevan seems to have used overlap, which again is
> not correct. How about "default partition contains row/s which would
> fit the partition "part_xyz" being created or attached." with a hint
> to move those rows to the new partition's table in case of attach. I
> don't think hint would be so straight forward i.e. to create the table
> with SELECT INTO and then ATTACH.

The problem is that none of these actually sounds very good.  Neither
"conflict" nor "overlap" nor "fit" actually expresses the underlying idea very
clearly, at least IMHO.  I'm not opposed to using some wording along
these lines if we can think of a clear way to word it, but I think my
wording is better than using some unclear word for this concept.  I
can't immediately think of a way to adjust your wording so that it
seems completely clear.

> Also, the error code ERRCODE_CHECK_VIOLATION, which is an "integrity
> constraint violation" code, seems misleading. We aren't violating any
> integrity here. In fact I am not able to understand, how could adding
> an object violate integrity constraint. The nearest errorcode seems to
> be ERRCODE_INVALID_OBJECT_DEFINITION, which is also used for
> partitions with overlapping bounds.

I think that calling a constraint failure a check violation is not too
much of a stretch, even if it's technically a partition constraint
rather than a CHECK constraint.  However, your proposal also seems
reasonable.  I'm happy to go with whatever most people like best.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Ashutosh,

I tried to look into your refactoring code.
When I applied all 3 patches, I got some regression failures. I have fixed all of
them in the attached patches, and have also attached the regression.diffs.

Moving further, I have also made the following changes in the attached patches:

1. 0001-Refactor-ATExecAttachPartition.patch

+ * There is a case in which we cannot rely on just the result of the
+ * proof
This comment also seems to exist in the current code, and I am not able to follow
which case it refers to. But, IIUC, this comment is for the case where we are
handling the 'key IS NOT NULL' part separately, and if that is the case it is
not needed here in the prologue of the function.

attachPartCanSkipValidation
+static bool
+ATCheckValidationSkippable(Relation scanRel, List *partConstraint,
+                          PartitionKey key)
The function name ATCheckValidationSkippable does not sound very intuitive to me,
and I also think the AT prefix does not fit here, as the function is not
directly related to the ALTER TABLE command but is instead an auxiliary function.
How about changing it to "attachPartitionRequiresScan" or
"canSkipPartConstraintValidation"?

+   List       *existConstraint = NIL;
Needs to be moved inside the if block instead.

+   bool        skip_validate;
Needs to be initialized to false, otherwise it can be returned without
initialization when scanRel_constr is NULL.

+   if (scanRel_constr != NULL)
Instead of this, maybe we can simply have:
+   if (scanRel_constr == NULL)
+       return false;
This can prevent further indentation.

+static void
+ATValidatePartitionConstraints(List **wqueue, Relation scanRel,
+                              List *partConstraint, Relation rel)
What about just validatePartitionConstraints()?

+   bool        skip_validate = false;
+
+   /* Check if we can do away with having to scan the table being attached. */
+   skip_validate = ATCheckValidationSkippable(scanRel, partConstraint, key);

The first assignment is unnecessary here.

Instead of:
    /* Check if we can do away with having to scan the table being attached. */
    skip_validate = ATCheckValidationSkippable(scanRel, partConstraint, key);

    /* It's safe to skip the validation scan after all */
    if (skip_validate)
        ereport(INFO,
                (errmsg("partition constraint for table \"%s\" is implied by existing constraints",
                        RelationGetRelationName(scanRel))));

The following change can prevent further indentation:
    if (ATCheckValidationSkippable(scanRel, partConstraint, key))
    {
        ereport(INFO,
                (errmsg("partition constraint for table \"%s\" is implied by existing constraints",
                        RelationGetRelationName(scanRel))));
        return;
    }
This way the variable skip_validate will not be needed.

Apart from this, I see that the patch will need changes depending on how the fix
for validating partition constraints in the case of an embedded NOT NULL [1] shapes up.

2. 0003-Refactor-default-partitioning-patch-to-re-used-code.patch

+ * In case the new partition bound being checked itself is a DEFAULT
+ * bound, this check shouldn't be triggered as there won't already exists
+ * the default partition in such a case.
I think the above comment in DefineRelation() is no longer applicable, as
check_default_allows_bound() is called unconditionally, and the check for the existence
of a default partition is now done inside the check_default_allows_bound() function.

  * This function checks if there exists a row in the default partition that
  * fits in the new partition and throws an error if it finds one.
  */
The above comment for check_default_allows_bound() needs a change now, maybe something like this:
  * This function checks if a default partition already exists and if it does
  * it checks if there exists a row in the default partition that fits in the
  * new partition and throws an error if it finds one.
  */
  
List   *new_part_constraints = NIL;
List   *def_part_constraints = NIL;
I think the above initializations are not needed, as the subsequent assignments are
unconditional.

+   if (OidIsValid(default_oid))
+   {
+       Relation default_rel = heap_open(default_oid, AccessExclusiveLock);
We have already taken a lock on the default partition, so here we should be using
NoLock instead.

+ def_part_constraints = get_default_part_validation_constraint(new_part_constraints);
exceeds 80 columns.

+ defPartConstraint = get_default_part_validation_constraint(partBoundConstraint);
similarly, needs indentation.

+
+List *
+get_default_part_validation_constraint(List *new_part_constraints)
+{
Needs some comment. What about:
/*
 * get_default_part_validation_constraint
 *
 * Given partition constraints, this function returns *would be* default
 * partition constraint.
 */
 
Apart from this, I tried to address the difference in the errors shown for
ATTACH and CREATE partition when rows in the default partition would violate the
updated constraints. Basically, I have added a flag in AlteredTableInfo to
indicate whether the relation being scanned is a default partition or a child of
a default partition (which I didn't like much, but I don't see a way out here). Still,
the error message does not display the default partition name the way the one from
check_default_allows_bound() does. Maybe, to address this and keep the messages
exactly the same, we can copy the name of the parent default partition into a field in
the AlteredTableInfo structure, which looks very ugly to me. I am open to
suggestions here.

3. changes to default_partition_v19.patch:

The default partition constraint is no longer built using the negator of the
operator; instead, it is formed simply as the NOT of the existing partitions' constraints,
e.g.:
if a null-accepting partition already exists:
NOT ((keycol IS NULL) OR (keycol = ANY (arr)))
if a null-accepting partition does not exist:
NOT ((keycol IS NOT NULL) AND (keycol = ANY (arr))), where arr is an array of
datums in boundinfo->datums.
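
To see how the two forms treat a NULL key, here is a small self-contained
sketch of SQL three-valued logic. It is only a model under stated
assumptions: TriBool, Key and default_part_constraint() are hypothetical
names, the key is a single nullable integer, and arr is assumed to contain
no NULL elements. It does not use any PostgreSQL internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SQL three-valued logic: false, true, or unknown (NULL). */
typedef enum { TV_FALSE, TV_TRUE, TV_UNKNOWN } TriBool;

/* Hypothetical nullable integer partition key. */
typedef struct { bool is_null; int value; } Key;

TriBool tv_not(TriBool a)
{
    if (a == TV_UNKNOWN)
        return TV_UNKNOWN;
    return (a == TV_TRUE) ? TV_FALSE : TV_TRUE;
}

TriBool tv_or(TriBool a, TriBool b)
{
    if (a == TV_TRUE || b == TV_TRUE)
        return TV_TRUE;
    if (a == TV_UNKNOWN || b == TV_UNKNOWN)
        return TV_UNKNOWN;
    return TV_FALSE;
}

TriBool tv_and(TriBool a, TriBool b)
{
    if (a == TV_FALSE || b == TV_FALSE)
        return TV_FALSE;
    if (a == TV_UNKNOWN || b == TV_UNKNOWN)
        return TV_UNKNOWN;
    return TV_TRUE;
}

/* keycol = ANY (arr), assuming arr has no NULL elements. */
TriBool key_eq_any(Key k, const int *arr, size_t n)
{
    if (n == 0)
        return TV_FALSE;        /* operator is never evaluated */
    if (k.is_null)
        return TV_UNKNOWN;      /* NULL = x is unknown */
    for (size_t i = 0; i < n; i++)
        if (arr[i] == k.value)
            return TV_TRUE;
    return TV_FALSE;
}

/* Evaluate the default partition constraint in either of the two forms. */
TriBool default_part_constraint(Key k, const int *arr, size_t n,
                                bool null_part_exists)
{
    assert(arr != NULL || n == 0);

    TriBool is_null = k.is_null ? TV_TRUE : TV_FALSE;
    TriBool any = key_eq_any(k, arr, n);

    if (null_part_exists)       /* NOT (IS NULL OR = ANY) */
        return tv_not(tv_or(is_null, any));
    /* NOT (IS NOT NULL AND = ANY) */
    return tv_not(tv_and(tv_not(is_null), any));
}
```

With arr = {1, 2, 3}, a NULL key evaluates to false in the first form (the
default partition rejects it, because another partition accepts NULL) and
to true in the second form (the default partition accepts it). Non-NULL
keys behave the same in both forms: 2 is rejected and 9 is accepted.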

Added tests for prepared statements.

Renamed RelationGetDefaultPartitionOid() to get_default_partition_oid().

+   if (partqualstate && ExecCheck(partqualstate, econtext))
+       ereport(ERROR,
+               (errcode(ERRCODE_CHECK_VIOLATION),
+                errmsg("new partition constraint for default partition \"%s\" would be violated by some row",
+                       RelationGetRelationName(default_rel))));
Per Ashutosh's suggestion[2], changed the error code to ERRCODE_INVALID_OBJECT_DEFINITION.
Also, per Robert's suggestion[3], changed following message:
"new partition constraint for default partition \"%s\" would be violated by some row"
to
"updated partition constraint for default partition \"%s\" would be violated by some row"

Some other cosmetic changes.

Apart from this, I am exploring the tests related to a NOT NULL constraint
embedded within an expression. I will update on that shortly.


Regards,
Jeevan Ladhe


Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
While the refactoring seems a reasonable way to re-use existing code,
that may change based on the discussion in [1]. Until then please keep
the refactoring patches separate from the main patch. In the final
version, I think the refactoring changes to ATExecAttachPartition and the
default partition support should be committed separately. So, please
provide three different patches. That also makes review easier.

On Mon, Jun 12, 2017 at 8:29 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi Ashutosh,
>
> I tried to look into your refactoring code.
> When applied all 3 patches, I got some regression failures, I have fixed all
> of
> them now in attached patches, attached the regression.diffs.
>
> Moving further, I have also made following changes in attached patches:
>
> 1. 0001-Refactor-ATExecAttachPartition.patch
>
> + * There is a case in which we cannot rely on just the result of the
> + * proof
> This comment seems to also exist in current code, and I am not able to
> follow
> which case this refers to. But, IIUC, this comment is for the case where we
> are
> handling the 'key IS NOT NULL' part separately, and if that is the case it
> is
> not needed here in the prologue of the function.
>
> attachPartCanSkipValidation
> +static bool
> +ATCheckValidationSkippable(Relation scanRel, List *partConstraint,
> +                          PartitionKey key)
> The function name ATCheckValidationSkippable does not sound very intuitive
> to me,
> and also I think prefix AT is something does not fit here as the function is
> not
> really directly related to alter table command, instead is an auxiliary
> function.
> How about changing it to "attachPartitionRequiresScan" or
> "canSkipPartConstraintValidation"
>
> +   List       *existConstraint = NIL;
> Needs to be moved to inside if block instead.
>
> +   bool        skip_validate;
> Needs to be initialized to false, otherwise it can be returned without
> initialization when scanRel_constr is NULL.
>
> +   if (scanRel_constr != NULL)
> instead of this may be we can simply have:
> +   if (scanRel_constr == NULL)
> + return false;
> This can prevent further indentation.
>
> +static void
> +ATValidatePartitionConstraints(List **wqueue, Relation scanRel,
> +                              List *partConstraint, Relation rel)
> What about just validatePartitionConstraints()
>
> +   bool        skip_validate = false;
> +
> +   /* Check if we can do away with having to scan the table being attached.
> */
> +   skip_validate = ATCheckValidationSkippable(scanRel, partConstraint,
> key);
>
> First assignment is unnecessary here.
>
> Instead of:
> /* Check if we can do away with having to scan the table being attached. */
> skip_validate = ATCheckValidationSkippable(scanRel, partConstraint, key);
>
> /* It's safe to skip the validation scan after all */
> if (skip_validate)
> ereport(INFO,
> (errmsg("partition constraint for table \"%s\" is implied by existing
> constraints",
> RelationGetRelationName(scanRel))));
>
> Following change can prevent further indentation:
> if (ATCheckValidationSkippable(scanRel, partConstraint, key))
> {
> ereport(INFO,
> (errmsg("partition constraint for table \"%s\" is implied by existing
> constraints",
> RelationGetRelationName(scanRel))));
> return;
> }
> This way variable skip_validate will not be needed.
>
> Apart from this, I see that the patch will need change depending on how the
> fix
> for validating partition constraints in case of embedded NOT-NULL[1] shapes
> up.
>
> 2. 0003-Refactor-default-partitioning-patch-to-re-used-code.patch
>
> + * In case the new partition bound being checked itself is a DEFAULT
> + * bound, this check shouldn't be triggered as there won't already exists
> + * the default partition in such a case.
> I think above comment in DefineRelation() is not applicable, as
> check_default_allows_bound() is called unconditional, and the check for
> existence
> of default partition is now done inside the check_default_allows_bound()
> function.
>
>   * This function checks if there exists a row in the default partition that
>   * fits in the new partition and throws an error if it finds one.
>   */
> Above comment for check_default_allows_bound() needs a change now, may be
> something like this:
>   * This function checks if a default partition already exists and if it
> does
>   * it checks if there exists a row in the default partition that fits in
> the
>   * new partition and throws an error if it finds one.
>   */
>
> List   *new_part_constraints = NIL;
> List   *def_part_constraints = NIL;
> I think above initialization is not needed, as the further assignments are
> unconditional.
>
> + if (OidIsValid(default_oid))
> + {
> + Relation default_rel = heap_open(default_oid, AccessExclusiveLock);
> We already have taken a lock on default and here we should be using a NoLock
> instead.
>
> + def_part_constraints =
> get_default_part_validation_constraint(new_part_constraints);
> exceeds 80 columns.
>
> + defPartConstraint =
> get_default_part_validation_constraint(partBoundConstraint);
> similarly, needs indentation.
>
> +
> +List *
> +get_default_part_validation_constraint(List *new_part_constraints)
> +{
> Needs some comment. What about:
> /*
>  * get_default_part_validation_constraint
>  *
>  * Given partition constraints, this function returns *would be* default
>  * partition constraint.
>  */
>
> Apart from this, I tried to address the differences in error shown in case
> of
> attache and create partition when rows in default partition would violate
> the
> updated constraints, basically I have taken a flag in AlteredTableInfo to
> indicate if the relation being scanned is a default partition or a child of
> default partition(which I dint like much, but I don't see a way out here).
> Still
> the error message does not display the default partition name in error as of
> check_default_allows_bound(). May be to address this and keep the messages
> exactly similar we can copy the name of parent default partition in a field
> in
> AlteredTableInfo structure, which looks very ugly to me. I am open to
> suggestions here.
>
> 3. changes to default_partition_v19.patch:
>
> The default partition constraint are no more built using the negator of the
> operator, instead it is formed simply as NOT of the existing partitions:
> e.g.:
> if a null accepting partition already exists:
> NOT ((keycol IS NULL) OR (keycol = ANY (arr)))
> if a null accepting partition does not exists:
> NOT ((keycol IS NOT NULL) AND (keycol = ANY (arr))), where arr is an array
> of
> datums in boundinfo->datums.
>
> Added tests for prepared statment.
>
> Renamed RelationGetDefaultPartitionOid() to get_default_partition_oid().
>
> + if (partqualstate && ExecCheck(partqualstate, econtext))
> + ereport(ERROR,
> + (errcode(ERRCODE_CHECK_VIOLATION),
> + errmsg("new partition constraint for default partition \"%s\" would be
> violated by some row",
> +   RelationGetRelationName(default_rel))));
> Per Ashutosh's suggestion[2], changed the error code to
> ERRCODE_INVALID_OBJECT_DEFINITION.
> Also, per Robert's suggestion[3], changed following message:
> "new partition constraint for default partition \"%s\" would be violated by
> some row"
> to
> "updated partition constraint for default partition \"%s\" would be violated
> by some row"
>
> Some other cosmetic changes.
>
> Apart from this, I am exploring the tests in relation with NOT NULL
> constraint
> embedded within an expression. Will update on that shortly.
>
> [1] http://www.postgresql-archive.org/A-bug-in-mapping-attributes-in-ATExecAttachPartition-td5965298.html
> [2] http://www.postgresql-archive.org/Adding-support-for-Default-partition-in-partitioning-td5946868i120.html#a5965277
> [3] http://www.postgresql-archive.org/Adding-support-for-Default-partition-in-partitioning-tp5946868p5965599.html
>
> Regards,
> Jeevan Ladhe
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:

On Mon, Jun 12, 2017 at 9:39 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
While the refactoring seems a reasonable way to re-use existing code,
that may change based on the discussion in [1]. Till then please keep
the refactoring patches separate from the main patch. In the final
version, I think the refactoring changes to ATAttachPartition and the
default partition support should be committed separately. So, please
provide three different patches. That also makes review easy.

Sure Ashutosh,

PFA. 
Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
While rebasing the current set of patches onto the latest source, I realized
that it might be a good idea to split the default partitioning support patch further
into two incremental patches. The first patch for default partition
support would prevent the addition of any new partition if a default
partition exists. An incremental patch on top of it would then allow a
new partition to be created or attached even if a default partition exists,
provided the default partition does not have any rows satisfying the bounds
of the new partition being added. This would be easier to review.

Here are the details of the patches in the attached zip.
0001. Refactor the existing ATExecAttachPartition code so that it can be used for
default partitioning as well.
0002. Support for default partition, with the restriction that no new partition
can be added after the default partition.
0003. Extend default partitioning support to allow addition of new partitions.
0004. Extend default partitioning validation code to reuse the refactored code
in patch 0001.

PFA

Regards,
Jeevan Ladhe

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Wed, Jun 14, 2017 at 8:02 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Here are the details of the patches in attached zip.
> 0001. refactoring existing ATExecAttachPartition  code so that it can be
> used for
> default partitioning as well
> 0002. support for default partition with the restriction of preventing
> addition
> of any new partition after default partition.
> 0003. extend default partitioning support to allow addition of new
> partitions.
> 0004. extend default partitioning validation code to reuse the refactored
> code
> in patch 0001.

I think the core ideas of this patch are pretty solid now.  It's come
a long way in the last month.  High-level comments:

- Needs to be rebased over b08df9cab777427fdafe633ca7b8abf29817aa55.
- Still no documentation.
- Should probably be merged with the patch to add default partitioning
for ranges.

Other stuff I noticed:

- The regression tests don't seem to check that the scan-skipping
logic works as expected.  We have regression tests for that case for
attaching regular partitions, and it seems like it would be worth
testing the default-partition case as well.

- check_default_allows_bound() assumes that if
canSkipPartConstraintValidation() fails for the default partition, it
will also fail for every subpartition of the default partition.  That
is, once we commit to scanning one child partition, we're committed to
scanning them all.  In practice, that's probably not a huge
limitation, but if it's not too much code, we could keep the top-level
check but also check each partition individually as we reach it,
and skip the scan for any individual partitions for which the
constraint can be proven.  For example, suppose the top-level table is
list-partitioned with a partition for each of the most common values,
and then we range-partition the default partition.

- The changes to the regression test results in 0004 make the error
messages slightly worse.  The old message names the default partition,
whereas the new one does not.  Maybe that's worth avoiding.

Specific comments:

+ * Also, invalidate the parent's and a sibling default partition's relcache,
+ * so that the next rebuild will load the new partition's info into parent's
+ * partition descriptor and default partition constraints(which are dependent
+ * on other partition bounds) are built anew.

I find this a bit unclear, and it also repeats the comment further
down.  Maybe something like: Also, invalidate the parent's relcache
entry, so that the next rebuild will load the new partition's info into
its partition descriptor.  If there is a default partition, we must
invalidate its relcache entry as well.

+    /*
+     * The default partition constraints depend upon the partition bounds of
+     * other partitions. Adding a new(or even removing existing) partition
+     * would invalidate the default partition constraints. Invalidate the
+     * default partition's relcache so that the constraints are built anew and
+     * any plans dependent on those constraints are invalidated as well.
+     */

Here, I'd write: The partition constraint for the default partition
depends on the partition bounds of every other partition, so we must
invalidate the relcache entry for that partition every time a
partition is added or removed.

+                    /*
+                     * Default partition cannot be added if there already
+                     * exists one.
+                     */
+                    if (spec->is_default)
+                    {
+                        overlap = partition_bound_has_default(boundinfo);
+                        with = boundinfo->default_index;
+                        break;
+                    }

To support default partitioning for range, this is going to have to be
moved above the switch rather than done inside of it.  And there's
really no downside to putting it there.

+ * constraint, by *proving* that the existing constraints of the table
+ * *imply* the given constraints.  We include the table's check constraints and

Both the comma and the asterisks are unnecessary.

+ * Check whether all rows in the given table (scanRel) obey given partition

obey the given

I think the larger comment block could be tightened up a bit, like
this:  Check whether all rows in the given table obey the given
partition constraint; if so, it can be attached as a partition.  We do
this by scanning the table (or all of its leaf partitions) row by row,
except when the existing constraints are sufficient to prove that the
new partitioning constraint must already hold.

+    /* Check if we can do away with having to scan the table being attached. */

If possible, skip the validation scan.

+     * Set up to have the table be scanned to validate the partition
+     * constraint If it's a partitioned table, we instead schedule its leaf
+     * partitions to be scanned.

I suggest: Prepare to scan the default partition (or, if it is itself
partitioned, all of its leaf partitions).

+    int         default_index;  /* Index of the default partition if any; -1
+                                 * if there isn't one */

"if any" is a bit redundant with "if there isn't one"; note the
phrasing of the preceding entry.

+        /*
+         * Skip if it's a partitioned table. Only RELKIND_RELATION relations
+         * (ie, leaf partitions) need to be scanned.
+         */
+        if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ||
+            part_rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)

The comment talks about what must be included in our list of things to
scan, but the code tests for the things that can be excluded.  I
suspect the comment has the right idea and the code should be adjusted
to match, but anyway they should be consistent.  Also, the correct way
to punctuate i.e. is like this: (i.e. leaf partitions).  You should have
a period after each letter, but no following comma.

+     * The default partition must be already having an AccessExclusiveLock.

I think we should instead change DefineRelation to open (rather than
just lock) the default partition and pass the Relation as an argument
here so that we need not reopen it.

+            /* Construct const from datum */
+            val = makeConst(key->parttypid[0],
+                            key->parttypmod[0],
+                            key->parttypcoll[0],
+                            key->parttyplen[0],
+                            *boundinfo->datums[i],
+                            false,      /* isnull */
+                            key->parttypbyval[0] /* byval */ );

The /* byval */ comment looks a bit redundant, but I think this could
use a comment along the lines of: /* Only single-column list
partitioning is supported, so we only need to worry about the
partition key with index 0. */  And I'd also add an Assert() verifying
that the partition key has exactly 1 column, so that this breaks a bit
more obviously if someone removes that restriction in the future.

+         * Handle NULL partition key here if there's a null-accepting list
+         * partition, else later it will be routed to the default partition if
+         * one exists.

This isn't a great update of the existing comment -- it's drifted from
explaining the code to which it is immediately attached to a more
general discussion of NULL handling.  I'd just say something like: If
this is a NULL, send it to the null-accepting partition.  Otherwise,
route by searching the array of partition bounds.

+                if (tab->is_default_partition)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                             errmsg("updated partition constraint for
default partition would be violated by some row")));
+                else
+                    ereport(ERROR,
+                            (errcode(ERRCODE_CHECK_VIOLATION),

While there's room for debate about the correct error code here, it's
hard for me to believe that it's correct to use one error code for the
is_default_partition case and a different error code the rest of the
time.

+         * previously cached default partition constraints; those constraints
+         * won't stand correct after addition(or even removal) of a partition.

won't be correct after addition or removal

+         * allow any row that qualifies for this new partition. So, check if
+         * the existing data in the default partition satisfies this *would be*
+         * default partition constraint.

check that the existing data in the default partition satisfies the
constraint as it will exist after adding this partition

+     * Need to take a lock on the default partition, refer comment for locking
+     * the default partition in DefineRelation().

I'd say: We must also lock the default partition, for the same reasons
explained in DefineRelation().

And similarly in the other places that refer to that same comment.

+    /*
+     * In case of the default partition, the constraint is of the form
+     * "!(result)" i.e. one of the following two forms:
+     * 1. NOT ((keycol IS NULL) OR (keycol = ANY (arr)))
+     * 2. NOT ((keycol IS NOT NULL) AND (keycol = ANY (arr))), where arr is an
+     * array of datums in boundinfo->datums.
+     */

Does this survive pgindent?  You might need to surround the comment
with dashes to preserve formatting.

I think it would be worth adding a little more text to this comment,
something like this: Note that, in general, applying NOT to a
constraint expression doesn't necessarily invert the set of rows it
accepts, because NOT NULL is NULL.  However, the partition constraints
we construct here never evaluate to NULL, so applying NOT works as
intended.
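
A minimal sketch of the point about NOT and NULL, written as a toy model
of SQL three-valued logic rather than as PostgreSQL code.  The names
TriBool, NullableInt, list_constraint and default_constraint are
hypothetical and exist only for this illustration.  It models the two
constraint forms from the comment above and shows that neither evaluates
to NULL, so negating them for the default partition behaves as expected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy SQL truth value; not PostgreSQL's internal representation. */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/* A nullable integer partition key value (hypothetical helper type). */
typedef struct
{
    bool isnull;
    int  value;
} NullableInt;

NullableInt null_key(void) { NullableInt k = { true, 0 }; return k; }
NullableInt int_key(int v) { NullableInt k = { false, v }; return k; }

/* NOT NULL is NULL: this is why blindly negating a constraint is risky. */
TriBool tv_not(TriBool a)
{
    if (a == TV_NULL)
        return TV_NULL;
    return (a == TV_TRUE) ? TV_FALSE : TV_TRUE;
}

TriBool tv_or(TriBool a, TriBool b)
{
    if (a == TV_TRUE || b == TV_TRUE)
        return TV_TRUE;
    if (a == TV_NULL || b == TV_NULL)
        return TV_NULL;
    return TV_FALSE;
}

TriBool tv_and(TriBool a, TriBool b)
{
    if (a == TV_FALSE || b == TV_FALSE)
        return TV_FALSE;
    if (a == TV_NULL || b == TV_NULL)
        return TV_NULL;
    return TV_TRUE;
}

/* keycol = ANY (arr), where arr itself contains no NULLs. */
TriBool tv_eq_any(NullableInt key, const int *arr, size_t n)
{
    if (n == 0)
        return TV_FALSE;        /* an empty array yields false */
    if (key.isnull)
        return TV_NULL;         /* NULL = anything is NULL */
    for (size_t i = 0; i < n; i++)
        if (arr[i] == key.value)
            return TV_TRUE;
    return TV_FALSE;
}

/*
 * Form 1 (accepts_null):  (keycol IS NULL) OR (keycol = ANY (arr))
 * Form 2 (otherwise):     (keycol IS NOT NULL) AND (keycol = ANY (arr))
 * The IS [NOT] NULL arm is never NULL and absorbs a NULL from the ANY arm.
 */
TriBool list_constraint(NullableInt key, const int *arr, size_t n,
                        bool accepts_null)
{
    TriBool in_list = tv_eq_any(key, arr, n);

    if (accepts_null)
        return tv_or(key.isnull ? TV_TRUE : TV_FALSE, in_list);
    return tv_and(key.isnull ? TV_FALSE : TV_TRUE, in_list);
}

/* The default partition's constraint is the negation of the above. */
TriBool default_constraint(NullableInt key, const int *arr, size_t n,
                           bool accepts_null)
{
    return tv_not(list_constraint(key, arr, n, accepts_null));
}
```

With bounds {4, 5} and no null-accepting partition, a NULL key makes the
list constraint FALSE (not NULL), so the negated constraint is TRUE and
the row belongs to the default partition.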

+     * Check whether default partition has a row that would fit the partition
+     * being attached by negating the partition constraint derived from the
+     * bounds. Since default partition is already part of the partitioned
+     * table, we don't need to validate the constraints on the partitioned
+     * table.

Here again, I'd add to the end of the first sentence a parenthetical
note, like this: ...from the bounds (the partition constraint never
evaluates to NULL, so negating it like this is safe).

I don't understand the second sentence.  It seems to contradict the first one.

+extern List *get_default_part_validation_constraint(List *new_part_constaints);
 #endif   /* PARTITION_H */

There should be a blank line after the last prototype and before the #endif.

+-- default partition table when it is being used in cahced plan.

Typo.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/06/15 4:51, Robert Haas wrote:
> On Wed, Jun 14, 2017 at 8:02 AM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>> Here are the details of the patches in attached zip.
>> 0001. refactoring existing ATExecAttachPartition  code so that it can be
>> used for
>> default partitioning as well
>> 0002. support for default partition with the restriction of preventing
>> addition
>> of any new partition after default partition.
>> 0003. extend default partitioning support to allow addition of new
>> partitions.
>> 0004. extend default partitioning validation code to reuse the refactored
>> code
>> in patch 0001.
> 
> I think the core ideas of this patch are pretty solid now.  It's come
> a long way in the last month.

+1


BTW, I noticed the following in 0002:

@@ -1322,15 +1357,59 @@ get_qual_for_list(PartitionKey key,
PartitionBoundSpec *spec)

[ ... ]

+        oldcxt = MemoryContextSwitchTo(CacheMemoryContext);

I'm not sure if we need to do that.  Can you explain?

Thanks,
Amit




Re: [HACKERS] Adding support for Default partition in partitioning

From
Amit Langote
Date:
Oops, I meant to send one more comment.

On 2017/06/15 15:48, Amit Langote wrote:
> BTW, I noticed the following in 0002
+                     errmsg("there exists a default partition for table \"%s\", cannot
add a new partition",

This error message style seems novel to me.  I'm not sure about the best
message text here, but maybe: "cannot add new partition to table \"%s\"
with default partition"

Note that the comment applies to both DefineRelation and
ATExecAttachPartition.

Thanks,
Amit




Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
Some more comments on the latest set of patches.

In heap_drop_with_catalog(), we heap_open() the parent table to get the
default partition OID, if any. If the relcache doesn't have an entry for the
parent, this means that the entry will be created, only to be invalidated at
the end of the function. If there is no default partition, this all is
completely unnecessary. We should avoid heap_open() in this case. This also
means that get_default_partition_oid() should not rely on the relcache entry,
but should crawl through pg_inherits to find the default partition.

In get_qual_for_list(), if the table has only a default partition, it won't have
any boundinfo. In such a case the default partition's constraint would come out
as (NOT ((a IS NOT NULL) AND (a = ANY (ARRAY[]::integer[])))). The empty array
looks odd, and maybe we spend a few CPU cycles executing ANY on an empty array.
We have the same problem with a partition containing only the NULL value. So,
maybe this one is not that bad.

Please add a testcase to test addition of default partition as the first
partition.

get_qual_for_list() allocates the constant expressions corresponding to the
datums in CacheMemoryContext while constructing constraints for a default
partition. We do not do this for other partitions. We may not be constructing
the constraints for saving in the cache. For example, ATExecAttachPartition
constructs the constraints for validation. In such a case, this code will
unnecessarily clobber the cache memory. generate_partition_qual() copies the
partition constraint in the CacheMemoryContext.

+   if (spec->is_default)
+   {
+       result = list_make1(make_ands_explicit(result));
+       result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
+   }

If the "result" is an OR expression, calling make_ands_explicit() on it would
create AND(OR(...)) expression, with an unnecessary AND. We want to avoid that?

+       if (cur_index < 0 && (partition_bound_has_default(partdesc->boundinfo)))
+           cur_index = partdesc->boundinfo->default_index;
+
The partition_bound_has_default() check is unnecessary since we check for
cur_index < 0 next anyway.

+ *
+ * Given the parent relation checks if it has default partition, and if it
+ * does exist returns its oid, otherwise returns InvalidOid.
+ */
May be reworded as "If the given relation has a default partition, this
function returns the OID of the default partition. Otherwise it returns
InvalidOid."

+Oid
+get_default_partition_oid(Relation parent)
+{
+   PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+
+   if (partdesc->boundinfo && partition_bound_has_default(partdesc->boundinfo))
+       return partdesc->oids[partdesc->boundinfo->default_index];
+
+   return InvalidOid;
+}
An unpartitioned table would not have partdesc set. So, this function will
segfault if we pass an unpartitioned table. Either Assert that partdesc should
exist or check for its NULL-ness.
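
A minimal sketch of the NULL-check variant, using simplified stand-in
types instead of the real Relation and PartitionDesc structures.  Every
type and function name ending in "Sketch" or "_sketch" is hypothetical,
and the sketch assumes that an unpartitioned table is represented by a
NULL partition descriptor.

```c
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Stand-in for the bound info: index of the default partition, or -1. */
typedef struct
{
    int default_index;
} BoundInfoSketch;

/* Stand-in for the partition descriptor. */
typedef struct
{
    int              nparts;
    Oid             *oids;
    BoundInfoSketch *boundinfo;   /* NULL when there are no partitions */
} PartDescSketch;

/* Stand-in for a relation; partdesc is NULL for an unpartitioned table. */
typedef struct
{
    PartDescSketch *partdesc;
} RelationSketch;

/*
 * If the given relation has a default partition, return its OID.
 * Otherwise (including for an unpartitioned table) return InvalidOid.
 */
Oid
get_default_partition_oid_sketch(const RelationSketch *parent)
{
    const PartDescSketch *partdesc = parent->partdesc;

    /* Guard against unpartitioned tables instead of dereferencing NULL. */
    if (partdesc == NULL)
        return InvalidOid;

    if (partdesc->boundinfo != NULL &&
        partdesc->boundinfo->default_index != -1)
        return partdesc->oids[partdesc->boundinfo->default_index];

    return InvalidOid;
}
```

The alternative mentioned above, an Assert that partdesc exists, would
instead make passing an unpartitioned table a caller bug that fails
loudly in assert-enabled builds.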


+    defaultPartOid = get_default_partition_oid(rel);
+    if (OidIsValid(defaultPartOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                 errmsg("there exists a default partition for table
\"%s\", cannot attach a new partition",
+                        RelationGetRelationName(rel))));
+
Should be done before heap_open on the table being attached. If we are not
going to attach the partition, there's no point in instantiating its relcache.

The comment in heap_drop_with_catalog() should mention why we lock the default
partition before locking the table being dropped.

extern List *preprune_pg_partitions(PlannerInfo *root, RangeTblEntry *rte,
                       Index rti, Node *quals, LOCKMODE lockmode);
 
-#endif   /* PARTITION_H */
Unnecessary hunk.

On Thu, Jun 15, 2017 at 12:31 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Oops, I meant to send one more comment.
>
> On 2017/06/15 15:48, Amit Langote wrote:
>> BTW, I noticed the following in 0002
> +                                        errmsg("there exists a default partition for table \"%s\", cannot
> add a new partition",
>
> This error message style seems novel to me.  I'm not sure about the best
> message text here, but maybe: "cannot add new partition to table \"%s\"
> with default partition"
>
> Note that the comment applies to both DefineRelation and
> ATExecAttachPartition.
>
> Thanks,
> Amit
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, Jun 15, 2017 at 12:54 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> Some more comments on the latest set of patches.
>
> In heap_drop_with_catalog(), we heap_open() the parent table to get the
> default partition OID, if any. If the relcache doesn't have an entry for the
> parent, this means that the entry will be created, only to be invalidated at
> the end of the function. If there is no default partition, this all is
> completely unnecessary. We should avoid heap_open() in this case. This also
> means that get_default_partition_oid() should not rely on the relcache entry,
> but should crawl through pg_inherits to find the default partition.

I am *entirely* unconvinced by this line of argument.  I think we want
to open the relation the first time we touch it and pass the Relation
around thereafter.  Anything else is prone to accidentally failing to
have the relation locked early enough, or looking up the OID in the
relcache multiple times.

> In get_qual_for_list(), if the table has only a default partition, it won't have
> any boundinfo. In such a case the default partition's constraint would come out
> as (NOT ((a IS NOT NULL) AND (a = ANY (ARRAY[]::integer[])))). The empty array
> looks odd, and maybe we spend a few CPU cycles executing ANY on an empty array.
> We have the same problem with a partition containing only the NULL value. So,
> maybe this one is not that bad.

I think that one is probably worth fixing.

> Please add a testcase to test addition of default partition as the first
> partition.

That seems like a good idea, too.

> get_qual_for_list() allocates the constant expressions corresponding to the
> datums in CacheMemoryContext while constructing constraints for a default
> partition. We do not do this for other partitions. We may not be constructing
> the constraints for saving in the cache. For example, ATExecAttachPartition
> constructs the constraints for validation. In such a case, this code will
> unnecessarily clobber the cache memory. generate_partition_qual() copies the
> partition constraint in the CacheMemoryContext.
>
> +   if (spec->is_default)
> +   {
> +       result = list_make1(make_ands_explicit(result));
> +       result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
> +   }

Clearly we do not want things to end up across multiple contexts.  We
should ensure that anything linked from the relcache entry ends up in
CacheMemoryContext, but we must be careful not to allocate anything
else in there, because CacheMemoryContext is never reset.

> If the "result" is an OR expression, calling make_ands_explicit() on it would
> create AND(OR(...)) expression, with an unnecessary AND. We want to avoid that?

I'm not sure it's worth the trouble.

> +    defaultPartOid = get_default_partition_oid(rel);
> +    if (OidIsValid(defaultPartOid))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> +                 errmsg("there exists a default partition for table
> \"%s\", cannot attach a new partition",
> +                        RelationGetRelationName(rel))));
> +
> Should be done before heap_open on the table being attached. If we are not
> going to attach the partition, there's no point in instantiating its relcache.

No, because we should take the lock before examining any properties of
the table.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Fri, Jun 16, 2017 at 12:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 15, 2017 at 12:54 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> Some more comments on the latest set of patches.
>>
>> In heap_drop_with_catalog(), we heap_open() the parent table to get the
>> default partition OID, if any. If the relcache doesn't have an entry for the
>> parent, this means that the entry will be created, only to be invalidated at
>> the end of the function. If there is no default partition, this all is
>> completely unnecessary. We should avoid heap_open() in this case. This also
>> means that get_default_partition_oid() should not rely on the relcache entry,
>> but should crawl through pg_inherits to find the default partition.
>
> I am *entirely* unconvinced by this line of argument.  I think we want
> to open the relation the first time we touch it and pass the Relation
> around thereafter.

If that were correct, why does heap_drop_with_catalog() without this
patch just lock the parent and not call heap_open()? I am
missing something.

> Anything else is prone to accidentally failing to
> have the relation locked early enough,

We are locking the parent relation even without this patch, so this
isn't an issue.

> or looking up the OID in the
> relcache multiple times.

I am not able to understand this in the context of default partition.
After that nobody else is going to change its partitions and their
bounds (since both of those require heap_open on parent which would be
stuck on the lock we hold.). So, we have to check only once if the
table has a default partition. If it doesn't, it's not going to
acquire one unless we release the lock on the parent, i.e. at the end of
transaction. If it has one, it's not going to get dropped till the end
of the transaction for the same reason. I don't see where we are
looking up OIDs multiple times.


>
>> +    defaultPartOid = get_default_partition_oid(rel);
>> +    if (OidIsValid(defaultPartOid))
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>> +                 errmsg("there exists a default partition for table
>> \"%s\", cannot attach a new partition",
>> +                        RelationGetRelationName(rel))));
>> +
>> Should be done before heap_open on the table being attached. If we are not
>> going to attach the partition, there's no point in instantiating its relcache.
>
> No, because we should take the lock before examining any properties of
> the table.

There are three tables involved here. "rel" which is the partitioned
table. "attachrel" is the table being attached as a partition to "rel"
and defaultrel, which is the default partition table. If there exists
a default partition in "rel" we are not allowing "attachrel" to be
attached to "rel". If that's the case, we don't need to examine any
properties of "attachrel" and hence we don't need to instantiate
relcache of "attachrel". That's what the comment is about.
ATExecAttachPartition() receives "rel" as an argument which has been
already locked and opened. So, we can check the existence of default
partition right at the beginning of the function.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Kyotaro HORIGUCHI
Date:
Hello, I'd like to review this, but it doesn't apply to master, as
Robert said. In particular, the interface of predicate_implied_by was
changed by the suggested commit.

Anyway, I have some comments on this patch with fresh eyes.  I
trust the basic design, so my comments below are from a rather
micro viewpoint.

At Thu, 15 Jun 2017 16:01:53 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in
<a1267081-6e9a-e570-f6cf-34ff801bf503@lab.ntt.co.jp>
> Oops, I meant to send one more comment.
> 
> On 2017/06/15 15:48, Amit Langote wrote:
> > BTW, I noticed the following in 0002
> +                     errmsg("there exists a default partition for table \"%s\", cannot
> add a new partition",
> 
> This error message style seems novel to me.  I'm not sure about the best
> message text here, but maybe: "cannot add new partition to table \"%s\"
> with default partition"
> 
> Note that the comment applies to both DefineRelation and
> ATExecAttachPartition.

- Considering how canSkipPartConstraintValidation is called, I *think*
  RelationProvenValid() would be a better name. (But this would disappear
  after rebasing.)

- 0002 changes the interface of get_qual_for_list but leaves
  get_qual_for_range alone.  Anyway, get_qual_for_range will have to do a
  similar thing soon.

- In check_new_partition_bound, "overlap" and "with" are completely
  correlated with each other: "with > -1" means "overlap = true". So
  "overlap" is redundant. ("with" would be better named "overlap_with" or
  something if we remove "overlap".)

- The error message of check_default_allows_bound is below.

 "updated partition constraint for default partition \"%s\" would be violated by some row"

 This looks like an analog of validateCheckConstraint, but as I understand it
 this function is called only when a new partition is added. The message
 would be difficult to interpret in that situation. How about

 "the default partition contains rows that should be in the new partition: \"%s\""

 or something?

- In check_default_allows_bound, the iteration over partitions is quite
  similar to what validateCheckConstraint does. Can we somehow share
  validateCheckConstraint with this function?

- In the same function, skipping RELKIND_PARTITIONED_TABLE is okay, but
  silently ignoring RELKIND_FOREIGN_TABLE doesn't seem good. I think at
  least a warning should be emitted.

 "Skipping foreign tables in the default partition. It might contain rows that should be in the new partition."

 (We may need to prevent multiple warnings in a single call.)

- In the same function, the following condition seems somewhat strange in
  comparison to validateCheckConstraint.

> if (partqualstate && ExecCheck(partqualstate, econtext))

 partqualstate won't be null as long as partition_constraint is valid.
 Anyway, (I believe that) an invalid constraint results in an error from
 ExecPrepareExpr. Therefore the 'partqualstate &&' test is useless.

- In gram.y, the nonterminal for the list spec clause is still "ForValues".
  That seems somewhat strange; partition_spec or something would be better.

- This is not a part of this patch, but in ruleutils.c, the error for an
  unknown partitioning strategy is emitted as follows.

>   elog(ERROR, "unrecognized partition strategy: %d",
>        (int) strategy);

 The cast is added because the strategy is a char. I suppose this is
 because strategy can be an unprintable character. I'd like to see a
 comment if that is correct.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/06/16 14:16, Ashutosh Bapat wrote:
> On Fri, Jun 16, 2017 at 12:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Jun 15, 2017 at 12:54 PM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> Some more comments on the latest set of patches.
>>>
>>> In heap_drop_with_catalog(), we heap_open() the parent table to get the
>>> default partition OID, if any. If the relcache doesn't have an entry for the
>>> parent, this means that the entry will be created, only to be invalidated at
>>> the end of the function. If there is no default partition, this all is
>>> completely unnecessary. We should avoid heap_open() in this case. This also
>>> means that get_default_partition_oid() should not rely on the relcache entry,
>>> but should crawl through pg_inherits to find the default partition.
>>
>> I am *entirely* unconvinced by this line of argument.  I think we want
>> to open the relation the first time we touch it and pass the Relation
>> around thereafter.
> 
> If that were correct, why does heap_drop_with_catalog() without this
> patch just lock the parent and not call heap_open()? I am
> missing something.

As of commit c1e0e7e1d790bf, we avoid creating relcache entry for the
parent.  Before that commit, drop table
partitioned_table_with_many_partitions used to take too long and consumed
quite some memory as result of relcache invalidation requested at the end
on the parent table for every partition.

If this patch reintroduces the heap_open() on the parent table, that's
going to bring back the problem fixed by that commit.

>> Anything else is prone to accidentally failing to
>> have the relation locked early enough,
> 
> We are locking the parent relation even without this patch, so this
> isn't an issue.

Yes.

>> or looking up the OID in the
>> relcache multiple times.
> 
> I am not able to understand this in the context of default partition.
> After that nobody else is going to change its partitions and their
> bounds (since both of those require heap_open on parent which would be
> stuck on the lock we hold.). So, we have to check only once if the
> table has a default partition. If it doesn't, it's not going to
> acquire one unless we release the lock on the parent, i.e. at the end of
> transaction. If it has one, it's not going to get dropped till the end
> of the transaction for the same reason. I don't see where we are
> looking up OIDs multiple times.

Without heap_opening the parent, the only way is to look up parentOid's
children in pg_inherits and for each child looking up its pg_class tuple
in the syscache to see if its relpartbound indicates that it's a default
partition.  That seems like it won't be inexpensive either.

It would be nice if we could get that information (that is, whether a given
relation being heap_drop_with_catalog'd is a partition of a parent that
happens to have a default partition) in fewer steps than that.
Having that information in the relcache is one way, but as mentioned, that
turns out to be expensive.

Has anyone considered the idea of putting the default partition OID in the
pg_partitioned_table catalog?  Looking the above information up would
amount to one syscache lookup.  Default partition seems to be special
enough object to receive a place in the pg_partitioned_table tuple of the
parent.  Thoughts?
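
A rough sketch of the difference between the two lookups, using toy
in-memory structures rather than real catalog access.  ChildSketch,
PartitionedTableRowSketch and the field default_part_oid are hypothetical
names chosen for this illustration; the actual column name and lookup
code would be decided by the patch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* One child of the parent plus what its bound says (toy model). */
typedef struct
{
    Oid  relid;
    bool is_default;    /* would come from inspecting the child's bound */
} ChildSketch;

/* Toy model of the parent's row with the proposed extra column. */
typedef struct
{
    Oid partrelid;
    Oid default_part_oid;   /* hypothetical; InvalidOid if none */
} PartitionedTableRowSketch;

/* Without the column: visit every child and inspect its bound. */
Oid
find_default_by_scanning_children(const ChildSketch *children,
                                  size_t nchildren)
{
    for (size_t i = 0; i < nchildren; i++)
    {
        if (children[i].is_default)
            return children[i].relid;
    }
    return InvalidOid;
}

/* With the column: a single lookup of the parent's row is enough. */
Oid
find_default_from_parent_row(const PartitionedTableRowSketch *row)
{
    return row->default_part_oid;
}
```

The first function costs one step per child, while the second is a
constant-time read once the parent's row is in hand, which is the
saving the proposal is after.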

>>> +    defaultPartOid = get_default_partition_oid(rel);
>>> +    if (OidIsValid(defaultPartOid))
>>> +        ereport(ERROR,
>>> +                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>>> +                 errmsg("there exists a default partition for table
>>> \"%s\", cannot attach a new partition",
>>> +                        RelationGetRelationName(rel))));
>>> +
>>> Should be done before heap_open on the table being attached. If we are not
>>> going to attach the partition, there's no point in instantiating its relcache.
>>
>> No, because we should take the lock before examining any properties of
>> the table.
> 
> There are three tables involved here. "rel" which is the partitioned
> table. "attachrel" is the table being attached as a partition to "rel"
> and defaultrel, which is the default partition table. If there exists
> a default partition in "rel" we are not allowing "attachrel" to be
> attached to "rel". If that's the case, we don't need to examine any
> properties of "attachrel" and hence we don't need to instantiate
> relcache of "attachrel". That's what the comment is about.
> ATExecAttachPartition() receives "rel" as an argument which has been
> already locked and opened. So, we can check the existence of default
> partition right at the beginning of the function.

It seems that we are examining the properties of the parent table here
(whether it has default partition), which as Ashutosh mentions, is already
locked before we got to ATExecAttachPartition().  Another place where we
are ereporting before locking the table to be attached (actually even
before looking it up by name), based just on the properties of the parent
table, is in transformPartitionCmd():

    /* the table must be partitioned */
    if (parentRel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                 errmsg("\"%s\" is not partitioned",
                        RelationGetRelationName(parentRel))));

Thanks,
Amit




Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

Sorry for being away from here.
I had some issues with my laptop, and I have resumed working on this.

On Thu, Jun 15, 2017 at 1:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jun 14, 2017 at 8:02 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Here are the details of the patches in attached zip.
> 0001. refactoring existing ATExecAttachPartition  code so that it can be
> used for
> default partitioning as well
> 0002. support for default partition with the restriction of preventing
> addition
> of any new partition after default partition.
> 0003. extend default partitioning support to allow addition of new
> partitions.
> 0004. extend default partitioning validation code to reuse the refactored
> code
> in patch 0001.

I think the core ideas of this patch are pretty solid now.  It's come
a long way in the last month.  High-level comments:
 
Thanks Robert for looking into this.
 
- Needs to be rebased over b08df9cab777427fdafe633ca7b8abf29817aa55.

Will rebase.
 
- Still no documentation.
- Should probably be merged with the patch to add default partitioning
for ranges.
Will try to get this soon.

Regards,
Jeevan Ladhe 

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Amit,

On Thu, Jun 15, 2017 at 12:31 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Oops, I meant to send one more comment.

On 2017/06/15 15:48, Amit Langote wrote:
> BTW, I noticed the following in 0002
+                                        errmsg("there exists a default partition for table \"%s\", cannot
add a new partition",

This error message style seems novel to me.  I'm not sure about the best
message text here, but maybe: "cannot add new partition to table \"%s\"
with default partition"

This sounds confusing to me, what about something like:
"\"%s\" has a default partition, cannot add a new partition."

Note that this comment belongs to patch 0002, and it will go away
in case we are going to have extended functionality i.e. patch 0003,
as in that patch we allow user to create a new partition even in the
cases when there exists a default partition.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Thanks Ashutosh and Kyotaro for reviewing further.
I shall address your comments in next version of my patch.

Regards,
Jeevan Ladhe

On Fri, Jun 16, 2017 at 1:46 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello, I'd like to review this but it doesn't apply to master, as
Robert said. In particular, the interface of predicate_implied_by was
changed by the suggested commit.

Anyway, I have some comments on this patch with fresh eyes.  I
trust the basic design, so my comments below are from a rather
micro viewpoint.

At Thu, 15 Jun 2017 16:01:53 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <a1267081-6e9a-e570-f6cf-34ff801bf503@lab.ntt.co.jp>
> Oops, I meant to send one more comment.
>
> On 2017/06/15 15:48, Amit Langote wrote:
> > BTW, I noticed the following in 0002
> +                                      errmsg("there exists a default partition for table \"%s\", cannot
> add a new partition",
>
> This error message style seems novel to me.  I'm not sure about the best
> message text here, but maybe: "cannot add new partition to table \"%s\"
> with default partition"
>
> Note that the comment applies to both DefineRelation and
> ATExecAttachPartition.

- Considering how canSkipPartConstraintValidation is called, I
  *think* that RelationProvenValid() would be a better name.  (But this
  would disappear by rebasing.)

- 0002 changes the interface of get_qual_for_list, but leaves
  get_qual_for_range alone.  Anyway, get_qual_for_range will have
  to do a similar thing soon.

- In check_new_partition_bound, "overlap" and "with" are
  completely correlated with each other. "with > -1" means
  "overlap = true". So "overlap" is redundant. ("with" would be
  better named "overlap_with" or something if we remove
  "overlap")

- The error message of check_default_allows_bound is below.

  "updated partition constraint for default partition \"%s\"
   would be violated by some row"

  This looks like an analog of validateCheckConstraint, but as I
  understand it this function is called only when a new partition
  is added. The message would be difficult to interpret in that
  situation.

  "the default partition contains rows that should be in
   the new partition: \"%s\""

  or something?

- In check_default_allows_bound, the iteration over partitions is
  quite similar to what validateCheckConstraint does. Can we
  somehow share validateCheckConstraint with this function?

- In the same function, skipping RELKIND_PARTITIONED_TABLE is
  okay, but silently ignoring RELKIND_FOREIGN_TABLE doesn't seem
  good. I think at least some warning should be emitted.

  "Skipping foreign tables in the default partition. It might
   contain rows that should be in the new partition."  (Needs
   to prevent multiple warnings in a single call, maybe)

- In the same function, the following condition seems somewhat
  strange in comparison to validateCheckConstraint.

> if (partqualstate && ExecCheck(partqualstate, econtext))

  partqualstate won't be null as long as partition_constraint is
  valid. Anyway (I believe) an invalid constraint results in an
  error from ExecPrepareExpr. Therefore the 'if (partqualstate'
  test is useless.

- In gram.y, the nonterminal for list spec clause is still
  "ForValues". It seems somewhat strange. partition_spec or
  something would be better.

- This is not a part of this patch, but in ruleutils.c, the error
  for an unknown partitioning strategy is emitted as follows.

>   elog(ERROR, "unrecognized partition strategy: %d",
>        (int) strategy);

  The cast is added because the strategy is a char. I suppose
  this is because strategy can be an unprintable character. I'd
  like to see a comment if that is correct.


regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


Re: [HACKERS] Adding support for Default partition in partitioning

From
Amit Langote
Date:
On 2017/06/21 21:37, Jeevan Ladhe wrote:
> Hi Amit,
> 
> On Thu, Jun 15, 2017 at 12:31 PM, Amit Langote <
> Langote_Amit_f8@lab.ntt.co.jp> wrote:
> 
>> Oops, I meant to send one more comment.
>>
>> On 2017/06/15 15:48, Amit Langote wrote:
>>> BTW, I noticed the following in 0002
>> +                                        errmsg("there exists a default
>> partition for table \"%s\", cannot
>> add a new partition",
>>
>> This error message style seems novel to me.  I'm not sure about the best
>> message text here, but maybe: "cannot add new partition to table \"%s\"
>> with default partition"
>>
> 
> This sounds confusing to me, what about something like:
> "\"%s\" has a default partition, cannot add a new partition."

It's the comma inside the error message that suggests to me that it's a
style that I haven't seen elsewhere in the backend code.  The primary
error message here is that the new partition cannot be created.  "%s has
default partition" seems to me to belong in errdetail() (see "What Goes
Where" in [1].)

Or write the sentence such that the comma is not required.  Anyway, we can
leave this for the committer to decide.
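
The split between primary message and detail described above can be sketched in a self-contained way. This is only a model, not backend code: ErrorReport and report_cannot_add_partition() are hypothetical stand-ins for what errmsg() and errdetail() would carry, and the message wording is one possible phrasing rather than anything the patch settled on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two fields an ereport() call would fill. */
typedef struct ErrorReport
{
    char primary[160];  /* errmsg(): what could not be done, no comma needed */
    char detail[160];   /* errdetail(): why, written as a full sentence */
} ErrorReport;

/*
 * Build the report for "a new partition cannot be added because the parent
 * has a default partition".  The wording is an assumption for illustration.
 */
void
report_cannot_add_partition(ErrorReport *report, const char *parent_name)
{
    assert(report != NULL && parent_name != NULL);

    snprintf(report->primary, sizeof(report->primary),
             "cannot add new partition to table \"%s\"", parent_name);
    snprintf(report->detail, sizeof(report->detail),
             "Table \"%s\" has a default partition.", parent_name);
}
```

Keeping the reason in the detail field lets the primary message stay a single clause, which avoids the comma-joined style questioned in this thread.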

> Note that this comment belongs to patch 0002, and it will go away
> in case we are going to have extended functionality i.e. patch 0003,
> as in that patch we allow user to create a new partition even in the
> cases when there exists a default partition.

Oh, that'd be great.  It's always better to get rid of the error
conditions that are hard to communicate to users. :)  (Although, this
one's not that ambiguous.)

Thanks,
Amit

[1] https://www.postgresql.org/docs/devel/static/error-style-guide.html




Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Wed, Jun 21, 2017 at 8:47 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> It's the comma inside the error message that suggests to me that it's a
> style that I haven't seen elsewhere in the backend code.

Exactly.  Avoid that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

On Mon, Jun 19, 2017 at 12:34 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/06/16 14:16, Ashutosh Bapat wrote:
> On Fri, Jun 16, 2017 at 12:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Jun 15, 2017 at 12:54 PM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> Some more comments on the latest set of patches.

>> or looking up the OID in the
>> relcache multiple times.
>
> I am not able to understand this in the context of default partition.
> After that nobody else is going to change its partitions and their
> bounds (since both of those require heap_open on parent which would be
> stuck on the lock we hold.). So, we have to check only once if the
> table has a default partition. If it doesn't, it's not going to
> acquire one unless we release the lock on the parent i.e at the end of
> transaction. If it has one, it's not going to get dropped till the end
> of the transaction for the same reason. I don't see where we are
> looking up OIDs multiple times.

Without heap_opening the parent, the only way is to look up parentOid's
children in pg_inherits and for each child looking up its pg_class tuple
in the syscache to see if its relpartbound indicates that it's a default
partition.  That seems like it won't be inexpensive either.

It would be nice if we could get that information (that is - is a given
relation being heap_drop_with_catalog'd a partition of the parent that
happens to have default partition) in fewer steps than that.
Having that information in relcache is one way, but as mentioned, that
turns out to be expensive.

Has anyone considered the idea of putting the default partition OID in the
pg_partitioned_table catalog?  Looking the above information up would
amount to one syscache lookup.  Default partition seems to be special
enough object to receive a place in the pg_partitioned_table tuple of the
parent.  Thoughts?
 
I liked this suggestion. Having an entry in pg_partitioned_table would avoid
both expensive methods, i.e. 1. opening the parent or 2. lookup for
each of the children first in pg_inherits and then its corresponding entry in
pg_class.
Unless anybody has any other suggestions/comments here, I am going to
implement this suggestion.
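
A rough, self-contained model of why the catalog column helps: with the default partition's OID stored on the parent's row, finding it is one keyed lookup instead of a walk over every child. The struct, the default_partid field name, and the linear search standing in for a syscache lookup are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical, simplified view of one pg_partitioned_table row. */
typedef struct PartitionedTableRow
{
    Oid partrelid;      /* OID of the partitioned (parent) table */
    Oid default_partid; /* assumed new column: default partition, or InvalidOid */
} PartitionedTableRow;

/*
 * Return the default partition of "parent", or InvalidOid if it has none
 * (or is not partitioned at all).  The loop stands in for a single keyed
 * catalog lookup; no child table is opened or inspected.
 */
Oid
lookup_default_partition_oid(const PartitionedTableRow *rows, size_t nrows,
                             Oid parent)
{
    size_t i;

    assert(rows != NULL || nrows == 0);

    for (i = 0; i < nrows; i++)
    {
        if (rows[i].partrelid == parent)
            return rows[i].default_partid;
    }
    return InvalidOid;
}
```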

Thanks,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

I have worked further on V21 patch set, rebased it on latest master commit,
addressed the comments given by Robert, Ashutosh and others.

The attached tar has a series of 7 patches.
Here is a brief of these 7 patches:

0001:
Refactoring existing ATExecAttachPartition  code so that it can be used for
default partitioning as well

0002:
This patch teaches the partitioning code to handle the NIL returned by
get_qual_for_list().
This is needed because a default partition will not have any constraints in case
it is the only partition of its parent.

0003:
Support for default partition with the restriction of preventing addition of any
new partition after default partition.

0004:
Store the default partition OID in pg_partitioned_table; this will help us to
retrieve the OID of the default relation when we don't have the relation cache
available. This was also suggested by Amit Langote here[1].

0005:
Extend default partitioning support to allow addition of new partitions.

0006:
Extend default partitioning validation code to reuse the refactored code in
patch 0001. 

0007:
This patch introduces code to check if the scanning of a default partition child
can be skipped if its constraints are proven.

TODO:
Add documentation.
Merge default range partitioning patch.

Regards,
Jeevan Ladhe

On Fri, Jun 30, 2017 at 5:48 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
[...]

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Robert,

I have tried to address your comments in the V22 set of patches[1].
Please find my feedback inlined on your comments.


On Thu, Jun 15, 2017 at 1:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
- Needs to be rebased over b08df9cab777427fdafe633ca7b8abf29817aa55.

Rebased on master latest commit: ca793c59a51e94cedf8cbea5c29668bf8fa298f3
 
- Still no documentation.
Yes, this is long pending, and I will make this a priority to get it included
in the next set of my patches.

- Should probably be merged with the patch to add default partitioning
for ranges.

Beena is already rebasing her patch on my latest patches, so I think getting
them merged here won't be an issue; it will mostly be just one more patch
on top of my patches.
  
Other stuff I noticed:

- The regression tests don't seem to check that the scan-skipping
logic works as expected.  We have regression tests for that case for
attaching regular partitions, and it seems like it would be worth
testing the default-partition case as well.

Added a test case for default in alter_table.sql.

- check_default_allows_bound() assumes that if
canSkipPartConstraintValidation() fails for the default partition, it
will also fail for every subpartition of the default partition.  That
is, once we commit to scanning one child partition, we're committed to
scanning them all.  In practice, that's probably not a huge
limitation, but if it's not too much code, we could keep the top-level
check but also check each partition individually as we reach it,
and skip the scan for any individual partitions for which the
constraint can be proven.  For example, suppose the top-level table is
list-partitioned with a partition for each of the most common values,
and then we range-partition the default partition.
 
I have tried to address this in patch 0007, please let me know your views on
that patch.

- The changes to the regression test results in 0004 make the error
messages slightly worse.  The old message names the default partition,
whereas the new one does not.  Maybe that's worth avoiding.
 
The only way I can think of to achieve this is to store the name of
the default relation in AlteredTableInfo. Currently I am using a flag to
tell whether the scanned table is a default partition so that a specific error
can be thrown. But IMO storing a string in AlteredTableInfo just for error
purposes might be overkill. Your suggestions?
 
Specific comments:

+ * Also, invalidate the parent's and a sibling default partition's relcache,
+ * so that the next rebuild will load the new partition's info into parent's
+ * partition descriptor and default partition constraints(which are dependent
+ * on other partition bounds) are built anew.

I find this a bit unclear, and it also repeats the comment further
down.  Maybe something like: Also, invalidate the parent's relcache
entry, so that the next rebuild will load the new partition's info into
its partition descriptor.  If there is a default partition, we must
invalidate its relcache entry as well.

Done.
 
+    /*
+     * The default partition constraints depend upon the partition bounds of
+     * other partitions. Adding a new(or even removing existing) partition
+     * would invalidate the default partition constraints. Invalidate the
+     * default partition's relcache so that the constraints are built anew and
+     * any plans dependent on those constraints are invalidated as well.
+     */

Here, I'd write: The partition constraint for the default partition
depends on the partition bounds of every other partition, so we must
invalidate the relcache entry for that partition every time a
partition is added or removed.

Done.
 
+                    /*
+                     * Default partition cannot be added if there already
+                     * exists one.
+                     */
+                    if (spec->is_default)
+                    {
+                        overlap = partition_bound_has_default(boundinfo);
+                        with = boundinfo->default_index;
+                        break;
+                    }

To support default partitioning for range, this is going to have to be
moved above the switch rather than done inside of it.  And there's
really no downside to putting it there.

Done.
 
+ * constraint, by *proving* that the existing constraints of the table
+ * *imply* the given constraints.  We include the table's check constraints and

Both the comma and the asterisks are unnecessary.

Done.
 
+ * Check whether all rows in the given table (scanRel) obey given partition

obey the given

I think the larger comment block could be tightened up a bit, like
this:  Check whether all rows in the given table obey the given
partition constraint; if so, it can be attached as a partition.  We do
this by scanning the table (or all of its leaf partitions) row by row,
except when the existing constraints are sufficient to prove that the
new partitioning constraint must already hold.

Done.
 
+    /* Check if we can do away with having to scan the table being attached. */

If possible, skip the validation scan.

Fixed.
 
+     * Set up to have the table be scanned to validate the partition
+     * constraint If it's a partitioned table, we instead schedule its leaf
+     * partitions to be scanned.

I suggest: Prepare to scan the default partition (or, if it is itself
partitioned, all of its leaf partitions).

Done.
 
+    int         default_index;  /* Index of the default partition if any; -1
+                                 * if there isn't one */

"if any" is a bit redundant with "if there isn't one"; note the
phrasing of the preceding entry.

Done.
 
+        /*
+         * Skip if it's a partitioned table. Only RELKIND_RELATION relations
+         * (ie, leaf partitions) need to be scanned.
+         */
+        if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ||
+            part_rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)

The comment talks about what must be included in our list of things to
scan, but the code tests for the things that can be excluded.  I
suspect the comment has the right idea and the code should be adjusted
to match, but anyway they should be consistent.  Also, the correct way
to punctuate i.e. is like this: (i.e. leaf partitions) You should have
a period after each letter, but no following comma.

Done.
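
A minimal sketch of the "test for what must be included" form of this check, so that code and comment say the same thing. The relkind letters match the pg_class values; needs_validation_scan() is a hypothetical helper name, and whether foreign tables deserve a warning is left out of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* relkind letters as used in pg_class */
#define RELKIND_RELATION          'r'
#define RELKIND_PARTITIONED_TABLE 'p'
#define RELKIND_FOREIGN_TABLE     'f'

/*
 * Only RELKIND_RELATION relations (i.e. leaf partitions) need to be scanned.
 * Everything else is skipped, without having to list each excluded kind.
 */
bool
needs_validation_scan(char relkind)
{
    return relkind == RELKIND_RELATION;
}
```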
 
+     * The default partition must be already having an AccessExclusiveLock.

I think we should instead change DefineRelation to open (rather than
just lock) the default partition and pass the Relation as an argument
here so that we need not reopen it.

I have fixed this as a part of patch 0006.
 
+            /* Construct const from datum */
+            val = makeConst(key->parttypid[0],
+                            key->parttypmod[0],
+                            key->parttypcoll[0],
+                            key->parttyplen[0],
+                            *boundinfo->datums[i],
+                            false,      /* isnull */
+                            key->parttypbyval[0] /* byval */ );

The /* byval */ comment looks a bit redundant, but I think this could
use a comment along the lines of: /* Only single-column list
partitioning is supported, so we only need to worry about the
partition key with index 0. */  And I'd also add an Assert() verifying
that the partition key has exactly 1 column, so that this breaks a bit
more obviously if someone removes that restriction in the future.
 
Removed the /* byval */ comment.
The assert is taken care of as part of commit 5efccc1cb43005a832776ed9158d2704fd976f8f.


+         * Handle NULL partition key here if there's a null-accepting list
+         * partition, else later it will be routed to the default partition if
+         * one exists.

This isn't a great update of the existing comment -- it's drifted from
explaining the code to which it is immediately attached to a more
general discussion of NULL handling.  I'd just say something like: If
this is a NULL, send it to the null-accepting partition.  Otherwise,
route by searching the array of partition bounds.

Done.
 
+                if (tab->is_default_partition)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                             errmsg("updated partition constraint for
default partition would be violated by some row")));
+                else
+                    ereport(ERROR,
+                            (errcode(ERRCODE_CHECK_VIOLATION),

While there's room for debate about the correct error code here, it's
hard for me to believe that it's correct to use one error code for the
is_default_partition case and a different error code the rest of the
time.

Per the discussion here[2], I had changed this error code, but for now I have
restored it to ERRCODE_CHECK_VIOLATION to be consistent with the error raised
when a non-default partition being attached has an existing row that violates
the partition constraint. Similarly, for consistency I have changed this in
check_default_allows_bound() too.
I agree that there is still room for debate here after this change, and
this change also reverts the suggestion by Ashutosh.

+         * previously cached default partition constraints; those constraints
+         * won't stand correct after addition(or even removal) of a partition.

won't be correct after addition or removal

Done.
 

+         * allow any row that qualifies for this new partition. So, check if
+         * the existing data in the default partition satisfies this *would be*
+         * default partition constraint.

check that the existing data in the default partition satisfies the
constraint as it will exist after adding this partition

Done.
 

+     * Need to take a lock on the default partition, refer comment for locking
+     * the default partition in DefineRelation().

I'd say: We must also lock the default partition, for the same reasons
explained in DefineRelation().

And similarly in the other places that refer to that same comment.

Done.
 

+    /*
+     * In case of the default partition, the constraint is of the form
+     * "!(result)" i.e. one of the following two forms:
+     * 1. NOT ((keycol IS NULL) OR (keycol = ANY (arr)))
+     * 2. NOT ((keycol IS NOT NULL) AND (keycol = ANY (arr))), where arr is an
+     * array of datums in boundinfo->datums.
+     */

Does this survive pgindent?  You might need to surround the comment
with dashes to preserve formatting.

Yes, this didn't survive pgindent, but even adding dashes '--' did not do
the trick (maybe I misunderstood the workaround), so I have instead added
a blank line between the bullets.

I think it would be worth adding a little more text this comment,
something like this: Note that, in general, applying NOT to a
constraint expression doesn't necessarily invert the set of rows it
accepts, because NOT NULL is NULL.  However, the partition constraints
we construct here never evaluate to NULL, so applying NOT works as
intended.

Added.
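
The NULL-handling argument can be checked with a small model of SQL three-valued logic. This is a sketch under assumptions: TriBool, NullableInt and the function names are invented for illustration, the key is a single integer column, and the bound array is assumed to hold no NULL elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SQL-style three-valued truth value. */
typedef enum TriBool { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/* A nullable integer partition key (hypothetical helper type). */
typedef struct NullableInt
{
    bool isnull;
    int  value;
} NullableInt;

TriBool
tv_not(TriBool a)
{
    if (a == TV_NULL)
        return TV_NULL;         /* NOT NULL is NULL */
    return (a == TV_TRUE) ? TV_FALSE : TV_TRUE;
}

TriBool
tv_and(TriBool a, TriBool b)
{
    if (a == TV_FALSE || b == TV_FALSE)
        return TV_FALSE;
    if (a == TV_NULL || b == TV_NULL)
        return TV_NULL;
    return TV_TRUE;
}

TriBool
tv_or(TriBool a, TriBool b)
{
    if (a == TV_TRUE || b == TV_TRUE)
        return TV_TRUE;
    if (a == TV_NULL || b == TV_NULL)
        return TV_NULL;
    return TV_FALSE;
}

/* keycol = ANY (arr), where arr holds no NULL elements. */
TriBool
key_eq_any(NullableInt key, const int *arr, size_t n)
{
    size_t i;

    if (n == 0)
        return TV_FALSE;        /* nothing to compare against */
    if (key.isnull)
        return TV_NULL;         /* NULL = x is NULL for every element */
    for (i = 0; i < n; i++)
    {
        if (arr[i] == key.value)
            return TV_TRUE;
    }
    return TV_FALSE;
}

/* Form 1: NOT ((keycol IS NULL) OR (keycol = ANY (arr))) */
TriBool
default_constraint_form1(NullableInt key, const int *arr, size_t n)
{
    TriBool is_null = key.isnull ? TV_TRUE : TV_FALSE;

    return tv_not(tv_or(is_null, key_eq_any(key, arr, n)));
}

/* Form 2: NOT ((keycol IS NOT NULL) AND (keycol = ANY (arr))) */
TriBool
default_constraint_form2(NullableInt key, const int *arr, size_t n)
{
    TriBool is_not_null = key.isnull ? TV_FALSE : TV_TRUE;

    return tv_not(tv_and(is_not_null, key_eq_any(key, arr, n)));
}
```

In both forms the IS NULL / IS NOT NULL term absorbs the NULL produced by the comparison, so the inner expression is always TRUE or FALSE and the outer NOT really does invert it. A bare comparison such as keycol = ANY (arr) with a NULL key stays NULL under NOT.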
 
+     * Check whether default partition has a row that would fit the partition
+     * being attached by negating the partition constraint derived from the
+     * bounds. Since default partition is already part of the partitioned
+     * table, we don't need to validate the constraints on the partitioned
+     * table.

Here again, I'd add to the end of the first sentence a parenthetical
note, like this: ...from the bounds (the partition constraint never
evaluates to NULL, so negating it like this is safe).

Done.
 
I don't understand the second sentence.  It seems to contradict the first one.

Fixed, I removed the second sentence.
 
+extern List *get_default_part_validation_constraint(List *new_part_constaints);
 #endif   /* PARTITION_H */

There should be a blank line after the last prototype and before the #endif.

+-- default partition table when it is being used in cahced plan.

Typo.
 
Fixed.

Thanks,
Jeevan Ladhe
 

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Ashutosh,

I have tried to address your comments in the V22 set of patches[1].
Please find my feedback inlined on your comments.

On Thu, Jun 15, 2017 at 10:24 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
Some more comments on the latest set of patches.

In heap_drop_with_catalog(), we heap_open() the parent table to get the
default partition OID, if any. If the relcache doesn't have an entry for the
parent, this means that the entry will be created, only to be invalidated at
the end of the function. If there is no default partition, this all is
completely unnecessary. We should avoid heap_open() in this case. This also
means that get_default_partition_oid() should not rely on the relcache entry,
but should crawl through pg_inherits to find the default partition.
 
Instead of reading the defaultOid from the cache, as suggested by Amit here[2], I
have now stored it in pg_partitioned_table, and read it from there.


In get_qual_for_list(), if the table has only default partition, it won't have
any boundinfo. In such a case the default partition's constraint would come out
as (NOT ((a IS NOT NULL) AND (a = ANY (ARRAY[]::integer[])))). The empty array
looks odd and maybe we spend a few CPU cycles executing ANY on an empty array.
We have the same problem with a partition containing only NULL value. So, maybe
this one is not that bad.

Fixed.
 
Please add a testcase to test addition of default partition as the first
partition.

Added this in insert.sql rather than create_table.sql, as the purpose here
is to test if default being a first partition accepts any values for the key
including null.
 
get_qual_for_list() allocates the constant expressions corresponding to the
datums in CacheMemoryContext while constructing constraints for a default
partition. We do not do this for other partitions. We may not be constructing
the constraints for saving in the cache. For example, ATExecAttachPartition
constructs the constraints for validation. In such a case, this code will
unnecessarily clobber the cache memory. generate_partition_qual() copies the
partition constraint in the CacheMemoryContext.

Removed CacheMemoryContext.
I thought that once the partition qual is generated, it should remain in
that memory context, but when that is needed, it is indirectly taken care of by
generate_partition_qual() in the following code:

/* Save a copy in the relcache */
oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
rel->rd_partcheck = copyObject(result);
MemoryContextSwitchTo(oldcxt);
 

+   if (spec->is_default)
+   {
+       result = list_make1(make_ands_explicit(result));
+       result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
+   }

If the "result" is an OR expression, calling make_ands_explicit() on it would
create AND(OR(...)) expression, with an unnecessary AND. We want to avoid that?


Actually the OR expression here is generated using a call to makeBoolExpr(),
which returns a single expression node. When this is passed to
make_ands_explicit(), it checks whether the list length is one and, if so,
returns the initial node itself; hence an AND(OR(...)) kind of expression is
not generated here.
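
The behaviour relied on here can be modelled without the backend. The sketch is an assumption-laden stand-in: Expr, ExprKind and make_ands_explicit_model() are invented names, and the caller supplies scratch storage so that no allocator is needed. It mirrors the three cases (empty list, one clause, several clauses).

```c
#include <assert.h>
#include <stddef.h>

typedef enum ExprKind
{
    EXPR_CONST_TRUE,
    EXPR_LEAF,
    EXPR_AND,
    EXPR_OR
} ExprKind;

/* Minimal expression node: a kind plus an array of argument pointers. */
typedef struct Expr
{
    ExprKind      kind;
    struct Expr **args;
    size_t        nargs;
} Expr;

/*
 * Model of turning an implicitly-ANDed clause list into one expression:
 *   - empty list   -> constant TRUE
 *   - one clause   -> that clause itself, with no AND wrapper
 *   - more clauses -> an explicit AND node over all of them
 * "scratch" is caller-provided storage used only when a new node is needed.
 */
Expr *
make_ands_explicit_model(Expr **clauses, size_t nclauses, Expr *scratch)
{
    assert(scratch != NULL);

    if (nclauses == 0)
    {
        scratch->kind = EXPR_CONST_TRUE;
        scratch->args = NULL;
        scratch->nargs = 0;
        return scratch;
    }
    if (nclauses == 1)
        return clauses[0];

    scratch->kind = EXPR_AND;
    scratch->args = clauses;
    scratch->nargs = nclauses;
    return scratch;
}
```

With a one-element list holding an OR node, the function hands back that same OR node, so wrapping the result in NOT gives NOT(OR(...)) rather than NOT(AND(OR(...))).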
 
+       if (cur_index < 0 && (partition_bound_has_default(partdesc->boundinfo)))
+           cur_index = partdesc->boundinfo->default_index;
+
The partition_bound_has_default() check is unnecessary since we check for
cur_index < 0 next anyway.

Done.
 
+ *
+ * Given the parent relation checks if it has default partition, and if it
+ * does exist returns its oid, otherwise returns InvalidOid.
+ */
May be reworded as "If the given relation has a default partition, this
function returns the OID of the default partition. Otherwise it returns
InvalidOid."

I have reworded this to:
"If the given relation has a default partition return the OID of the default
partition, otherwise return InvalidOid."
 
+Oid
+get_default_partition_oid(Relation parent)
+{
+   PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+
+   if (partdesc->boundinfo && partition_bound_has_default(partdesc->boundinfo))
+       return partdesc->oids[partdesc->boundinfo->default_index];
+
+   return InvalidOid;
+}
An unpartitioned table would not have partdesc set. So, this function will
segfault if we pass an unpartitioned table. Either Assert that partdesc should
exist or check for its NULL-ness.

Fixed.
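
One way the NULL-ness check could look, as a self-contained model rather than the actual fix (which is not shown in this message). The *Model struct names are hypothetical simplifications of the relcache structures, and -1 is used for "no default partition" as in the default_index comment quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical, stripped-down versions of the relcache structures. */
typedef struct BoundInfoModel
{
    int default_index;          /* index of the default partition; -1 if none */
} BoundInfoModel;

typedef struct PartitionDescModel
{
    int             nparts;
    Oid            *oids;
    BoundInfoModel *boundinfo;  /* NULL when there are no partitions */
} PartitionDescModel;

typedef struct RelationModel
{
    PartitionDescModel *partdesc;   /* NULL for an unpartitioned table */
} RelationModel;

/*
 * If the given relation has a default partition return the OID of the
 * default partition, otherwise return InvalidOid.
 */
Oid
get_default_partition_oid_model(const RelationModel *parent)
{
    const PartitionDescModel *partdesc;

    assert(parent != NULL);
    partdesc = parent->partdesc;

    /* Unpartitioned table, or partitioned table without any bounds yet. */
    if (partdesc == NULL || partdesc->boundinfo == NULL)
        return InvalidOid;

    if (partdesc->boundinfo->default_index < 0)
        return InvalidOid;

    return partdesc->oids[partdesc->boundinfo->default_index];
}
```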
 


+    defaultPartOid = get_default_partition_oid(rel);
+    if (OidIsValid(defaultPartOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                 errmsg("there exists a default partition for table
\"%s\", cannot attach a new partition",
+                        RelationGetRelationName(rel))));
+
Should be done before heap_open on the table being attached. If we are not
going to attach the partition, there's no point in instantiating its relcache.

Fixed.
 

The comment in heap_drop_with_catalog() should mention why we lock the default
partition before locking the table being dropped.

 extern List *preprune_pg_partitions(PlannerInfo *root, RangeTblEntry *rte,
                        Index rti, Node *quals, LOCKMODE lockmode);
-
 #endif   /* PARTITION_H */
Unnecessary hunk.

I could not locate this hunk.

Regards,
Jeevan Ladhe

Refs:

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

I have tried to address your comments in the V22 set of patches[1].
Please find my feedback inlined on your comments.

On Fri, Jun 16, 2017 at 1:46 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello, I'd like to review this but it doesn't fit the master, as
Robert said. Especially the interface of predicate_implied_by is
changed by the suggested commit.

Anyway I have some comments on this patch with fresh eyes.  I
believe in the basic design, so my comments below are from a rather
micro viewpoint.

- Considering how canSkipPartConstraintValidation is called, I
  *think* that RelationProvenValid() would be better.  (But this
  would disappear with rebasing.)

I think RelationProvenValid() is a bit confusing, in the sense that it does not
convey that some constraint is being checked.
 
- 0002 changes the interface of get_qual_for_list, but leaves
  get_qual_for_range alone.  Anyway get_qual_for_range will have
  to do a similar thing soon.

Yes, this will be taken care of with the default partition for range.
 
- In check_new_partition_bound, "overlap" and "with" are
  completely correlated with each other. "with > -1" means
  "overlap = true". So overlap is not useless. ("with" would be
  better named "overlap_with" or something if we remove
  "overlap")

Agreed; this can probably be taken up as a separate refactoring patch. Currently,
in the default case, I have got rid of "overlap" and now use only "with", and
that too just for code simplification.
 
- The error message of check_default_allows_bound is below.

  "updated partition constraint for default partition \"%s\"
   would be violated by some row"

  This looks like an analog of validateCheckConstraint, but as I
  understand it this function is called only when a new partition
  is added. That would be difficult to recognize from this
  message.

  "the default partition contains rows that should be in
   the new partition: \"%s\""

  or something?

I think the current error message is clearer. I agree there might be some
confusion about whether it is due to adding or attaching a partition, but the
message is already quite long. I am open to suggestions here.
 
- In check_default_allows_bound, the iteration over partitions is
  quite similar to what validateCheckConstraint does. Can we
  somehow share validateCheckConstraint with this function?

Maybe we can, but again I think this can be categorized as a refactoring
patch and done later. Your thoughts?
 
- In the same function, skipping RELKIND_PARTITIONED_TABLE is
  okay, but silently ignoring RELKIND_FOREIGN_TABLE doesn't seem
  good. I think at least some warning should be emitted.

  "Skipping foreign tables in the default partition. It might
   contain rows that should be in the new partition."  (Needs
   preventing multiple warnings in a single call, maybe)

Currently we do not emit any warning when attaching a foreign table as a
non-default partition having rows that do not match its partition constraints,
and we still allow the partition to be attached.
But I agree that we should emit such a warning; I have added code to do this.
 
- In the same function, the following condition seems somewhat
  strange in comparison to validateCheckConstraint.

> if (partqualstate && ExecCheck(partqualstate, econtext))

  partqualstate won't be null as long as partition_constraint is
  valid. Anyway (I believe that) an invalid constraint
  results in an error from ExecPrepareExpr. Therefore 'if
  (partqualstate' is useless.

Removed the check for partqualstate.
 
- In gram.y, the nonterminal for list spec clause is still
  "ForValues". It seems somewhat strange. partition_spec or
  something would be better.

Done.
Thanks for catching this, I agree with you.
I have changed the name to PartitionBoundSpec.
 
- This is not a part of this patch, but in ruleutils.c, the error
  for unknown partitioning strategy is emitted as follows.

>   elog(ERROR, "unrecognized partition strategy: %d",
>        (int) strategy);

  The cast is added because the strategy is a char. I suppose
  this is because strategy can be unprintable. I'd like to see
  a comment if that is correct.

I think this should be taken separately. 

Thanks,
Jeevan Ladhe

Refs:

Re: [HACKERS] Adding support for Default partition in partitioning

From
Beena Emerson
Date:
Hello,

On Thu, Jul 13, 2017 at 1:22 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>
>> - Should probably be merged with the patch to add default partitioning
>> for ranges.
>
>
> Beena is already rebasing her patch on my latest patches, so I think getting
> them merged here won't be an issue, mostly will be just like one more patch
> on top of my patches.
>

I have posted the updated patch which can be applied over the v22
patches submitted here.
https://www.postgresql.org/message-id/CAOG9ApGEZxSQD-ZD3icj_CwTmprSGG7sZ_r3k9m4rmcc6ozr%3Dg%40mail.gmail.com

Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

On Thu, Jul 13, 2017 at 1:01 AM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
Hi,

I have worked further on V21 patch set, rebased it on latest master commit,
addressed the comments given by Robert, Ashutosh and others.

The attached tar has a series of 7 patches.
Here is a brief of these 7 patches:

0001:
Refactoring existing ATExecAttachPartition code so that it can be used for
default partitioning as well

0002:
This patch teaches the partitioning code to handle the NIL returned by
get_qual_for_list().
This is needed because a default partition will not have any constraints in case
it is the only partition of its parent.

0003:
Support for default partition with the restriction of preventing addition of any
new partition after default partition.

0004:
Store the default partition OID in pg_partitioned_table; this will help us to
retrieve the OID of default relation when we don't have the relation cache
available. This was also suggested by Amit Langote here[1].

0005:
Extend default partitioning support to allow addition of new partitions.

0006:
Extend default partitioning validation code to reuse the refactored code in
patch 0001. 

0007:
This patch introduces code to check if the scanning of default partition child
can be skipped if its constraints are proven.

TODO:
Add documentation.

I have added a documentation patch (patch 0008) to the existing set of patches.
PFA.
 
Merge default range partitioning patch.
 
Beena has created a patch on top of my patches here[1].


Regards,
Jeevan Ladhe
Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

I have rebased the patches on the latest commit.

PFA.

Regards,
Jeevan Ladhe

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Wed, Jul 26, 2017 at 5:44 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have rebased the patches on the latest commit.
>

Thanks for rebasing the patches. The patches apply and compile
cleanly. make check passes.

Here are some review comments
0001 patch
Most of this patch is the same as the 0002 patch posted in thread [1]. I have
extensively reviewed that patch for Amit Langote. Can you please compare these
two patches and try to address those comments OR just use patch from that
thread? For example, canSkipPartConstraintValidation() is named as
PartConstraintImpliedByRelConstraint() in that patch. OR
+    if (scanRel_constr == NULL)
+        return false;
+
is not there in that patch since returning false is wrong when partConstraint
is NULL. I think this patch needs those fixes. Also, this patch set would need
a rebase when 0001 from that thread gets committed.

0002 patch
+        if (!and_args)
+            result = NULL;
Add "NULL, if there are no partition constraints e.g. in case of default
partition as the only partition.". This patch avoids calling
validatePartitionConstraints() and hence canSkipPartConstraintValidation() when
partConstraint is NULL, but patches in [1] introduce more callers of
canSkipPartConstraintValidation() which may pass NULL. So, it's better that we
handle that case.

0003 patch
+        parentRel = heap_open(parentOid, AccessExclusiveLock);
In [2], Amit Langote has given a reason as to why heap_drop_with_catalog()
should not heap_open() the parent relation. But this patch still calls
heap_open() without giving any counter argument. Also I don't see
get_default_partition_oid() using Relation anywhere. If you remove that
heap_open(), please also remove the following heap_close().
+        heap_close(parentRel, NoLock);

+                        /*
+                         * The default partition accepts any non-specified
+                         * value, hence it should not get a mapped index while
+                         * assigning those for non-null datums.
+                         */
Instead of "any non-specified value", you may want to use "any value not
specified in the lists of other partitions" or something like that.

+         * If this is a NULL, route it to the null-accepting partition.
+         * Otherwise, route by searching the array of partition bounds.
You may want to write it as "If this is a null partition key, ..." to clarify
what's NULL.

+         * cur_index < 0 means we could not find a non-default partition of
+         * this parent. cur_index >= 0 means we either found the leaf
+         * partition, or the next parent to find a partition of.
+         *
+         * If we couldn't find a non-default partition check if the default
+         * partition exists, if it does, get its index.
In order to avoid repeating "we couldn't find a ..."; you may want to add ",
try default partition if one exists." in the first sentence itself.

get_default_partition_oid() is defined in this patch and then redefined in
0004. Let's define it only once, mostly in or before 0003 patch.

+         * partition strategy. Assign the parent strategy to the default
s/parent/parent's/

+-- attaching default partition overlaps if the default partition already exists
+CREATE TABLE def_part PARTITION OF list_parted DEFAULT;
+CREATE TABLE fail_def_part (LIKE part_1 INCLUDING CONSTRAINTS);
+ALTER TABLE list_parted ATTACH PARTITION fail_def_part DEFAULT;
+ERROR:  cannot attach a new partition to table "list_parted" having a
default partition
For 0003 patch this testcase is same as the testcase in the next hunk; no new
partition can be added after default partition. Please add this testcase in
next set of patches.

+-- fail
+insert into part_default values ('aa', 2);
Maybe explain why the insert should fail. "A row, which would fit
other partition, does not fit default partition, even when inserted directly"
or something like that. I see that many of the tests in that file do not
explain why something should "fail" or be "ok", but maybe it's better to
document the reason for better readability and future reference.

+-- check in case of multi-level default partitioned table
s/in/the/ ?. Or you may want to reword it as "default partitioned partition in
multi-level partitioned table" as there is nothing like "default partitioned
table". Maybe we need a testcase where every level of a multi-level
partitioned table has a default partition.

+-- drop default, as we need to add some more partitions to test tuple routing
Should be clubbed with the actual DROP statement?

+-- Check that addition or removal of any partition is correctly dealt with by
+-- default partition table when it is being used in cached plan.
The plan of a prepared statement gets cached only after it's executed 5 times.
Before that the statement gets invalidated but there's no cached plan that
gets invalidated. The test is fine here, but in order to test the cached plan
as mentioned in the comment, you will need to execute the statement 5 times
before executing the drop statement. That's probably unnecessary, so just modify
the comment to say "prepared statements" instead of "cached plan".

0004 patch
The patch adds another column partdefid to catalog pg_partitioned_table. The
column gives the OID of the default partition for a given partitioned table. This
means that the default partition's OID is stored in two places: 1. in the
default partition table's pg_class entry and 2. in pg_partitioned_table. There is
no way to detect when these two go out of sync. Keeping those two in sync is
also a maintenance burden. Given that the default partition's OID is required only
while adding/dropping a partition, which is a less frequent operation, it won't
hurt to join a few catalogs (pg_inherits and pg_class in this case) to find out
the default partition's OID. That will be an occasional performance hit,
but one worth paying to avoid the maintenance burden.

I haven't reviewed the next two patches, but those patches depend upon
some of the comments above. So, it's better to consider these comments
before looking at those patches.

[1] https://www.postgresql.org/message-id/cee32590-68a7-8b56-5213-e07d9b8ab89e@lab.ntt.co.jp
[2] https://www.postgresql.org/message-id/35d68d49-555f-421a-99f8-185a44d085a4@lab.ntt.co.jp



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Fri, Jul 28, 2017 at 9:30 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> 0004 patch
> The patch adds another column partdefid to catalog pg_partitioned_table. The
> column gives OID of the default partition for a given partitioned table. This
> means that the default partition's OID is stored at two places 1. in the
> default partition table's pg_class entry and in pg_partitioned_table. There is
> no way to detect when these two go out of sync. Keeping those two in sync is
> also a maintenance burden. Given that default partition's OID is required only
> while adding/dropping a partition, which is a less frequent operation, it won't
> hurt to join a few catalogs (pg_inherits and pg_class in this case) to find out
> the default partition's OID. That will be occasional performance hit
> worth the otherwise maintenance burden.

Performance isn't the only consideration here.  We also need to think
about locking and concurrency.  I think that most operations that
involve locking the parent will also involve locking the default
partition.  However, we can't safely build a relcache entry for a
relation before we've got some kind of lock on it.  We can't assume
that there is no concurrent DDL going on before we take some lock.  We
can't assume invalidation messages are processed before we have taken
some lock.  If we read multiple catalog tuples, they may be from
different points in time.  If we can figure out everything we need to
know from one or two syscache lookups, it may be easier to verify that
the code is bug-free vs. having to do something more complicated.

Now that having been said, I'm not taking the position that Jeevan's
patch (based on Amit Langote's idea) has definitely got the right
idea, just that you should think twice before shooting down the
approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Ashutosh,

0003 patch
+        parentRel = heap_open(parentOid, AccessExclusiveLock);
In [2], Amit Langote has given a reason as to why heap_drop_with_catalog()
should not heap_open() the parent relation. But this patch still calls
heap_open() without giving any counter argument. Also I don't see
get_default_partition_oid() using Relation anywhere. If you remove that
heap_open() please remove following heap_close().

I think patch 0004 does exactly what you have said here, i.e. it gets
rid of the heap_open() and heap_close().
The question might be why I kept patch 0004 as a separate one, and the
answer is that I wanted to make it easier to review; keeping it that
way would also make it a bit easier to work on a different approach if needed.

About this: "Also I don't see get_default_partition_oid() using Relation anywhere."
The get_default_partition_oid() uses parent relation to retrieve PartitionDesc
from parent.

Kindly let me know if you think I am still missing anything.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Sat, Jul 29, 2017 at 2:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 9:30 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> 0004 patch
>> The patch adds another column partdefid to catalog pg_partitioned_table. The
>> column gives OID of the default partition for a given partitioned table. This
>> means that the default partition's OID is stored at two places 1. in the
>> default partition table's pg_class entry and in pg_partitioned_table. There is
>> no way to detect when these two go out of sync. Keeping those two in sync is
>> also a maintenance burden. Given that default partition's OID is required only
>> while adding/dropping a partition, which is a less frequent operation, it won't
>> hurt to join a few catalogs (pg_inherits and pg_class in this case) to find out
>> the default partition's OID. That will be occasional performance hit
>> worth the otherwise maintenance burden.
>
> Performance isn't the only consideration here.  We also need to think
> about locking and concurrency.  I think that most operations that
> involve locking the parent will also involve locking the default
> partition.  However, we can't safely build a relcache entry for a
> relation before we've got some kind of lock on it.  We can't assume
> that there is no concurrent DDL going on before we take some lock.  We
> can't assume invalidation messages are processed before we have taken
> some lock.  If we read multiple catalog tuples, they may be from
> different points in time.  If we can figure out everything we need to
> know from one or two syscache lookups, it may be easier to verify that
> the code is bug-free vs. having to do something more complicated.
>

The code takes a lock on the parent relation. While that function
holds that lock nobody else would change partitions of that relation
and hence nobody changes the default partition.
heap_drop_with_catalog() has code to lock the parent. Looking up
pg_inherits catalog for its partitions followed by identifying the
partition which has default partition bounds specification while
holding the lock on the parent should be safe. Any changes to
partition bounds, or partitions would require lock on the parent. In
order to prevent any buggy code changing the default partition without
sufficient locks, we should lock the default partition after it's
found and check the default partition bound specification again. Will
that work?

> Now that having been said, I'm not taking the position that Jeevan's
> patch (based on Amit Langote's idea) has definitely got the right
> idea, just that you should think twice before shooting down the
> approach.
>

If we can avoid the problems specified by Amit Langote, I am fine with
the approach of reading the default partition OID from the Relcache as
well. But I am not able to devise a solution to those problems.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Sun, Jul 30, 2017 at 8:07 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi Ashutosh,
>
> 0003 patch
>>
>> +        parentRel = heap_open(parentOid, AccessExclusiveLock);
>> In [2], Amit Langote has given a reason as to why heap_drop_with_catalog()
>> should not heap_open() the parent relation. But this patch still calls
>> heap_open() without giving any counter argument. Also I don't see
>> get_default_partition_oid() using Relation anywhere. If you remove that
>> heap_open() please remove following heap_close().
>
>
> I think the patch 0004 exactly does what you have said here, i.e. it gets
> rid of the heap_open() and heap_close().
> The question might be why I kept the patch 0004 a separate one, and the
> answer is I wanted to make it easier for review, and also keeping it that
> way would make it bit easy to work on a different approach if needed.
>

The reviewer has to review two different sets of changes to the same
portion of the code. That just doubles the work. I didn't find that
simplifying review. As I have suggested earlier, let's define
get_default_partition_oid() only once, mostly in or before 0003 patch.
Having it in a separate patch would allow you to change its
implementation if needed.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Wed, Jul 12, 2017 at 3:31 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> 0001:
> Refactoring existing ATExecAttachPartition  code so that it can be used for
> default partitioning as well

Boring refactoring.  Seems fine.

> 0002:
> This patch teaches the partitioning code to handle the NIL returned by
> get_qual_for_list().
> This is needed because a default partition will not have any constraints in
> case
> it is the only partition of its parent.

Perhaps it would be better to make validatePartConstraint() a no-op
when the constraint is empty rather than putting the logic in the
caller.  Otherwise, every place that calls validatePartConstraint()
has to think about whether or not the constraint-is-NULL case needs to
be handled.
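
The idea of pushing the empty-constraint case into the validation routine itself can be sketched as follows. This is a simplified model under stated assumptions: MockConstraint, rows_scanned and mock_validate_part_constraint() are hypothetical names, a constraint is reduced to a list of accepted integer values, and a NULL pointer plays the role of the NIL returned by get_qual_for_list().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a partition constraint is a list of accepted values. */
typedef struct MockConstraint
{
    const int  *allowed;    /* values accepted by the partition */
    int         nallowed;
} MockConstraint;

/* Rows examined by the most recent call; kept only to illustrate the no-op. */
int rows_scanned = 0;

/*
 * Return true if every row satisfies the constraint.  A NULL constraint
 * (the only partition is the default one) makes this a no-op, so callers
 * do not have to special-case it.
 */
bool
mock_validate_part_constraint(const int *rows, int nrows,
                              const MockConstraint *constraint)
{
    assert(rows != NULL || nrows == 0);
    rows_scanned = 0;

    /* Empty constraint: nothing to validate, skip the scan entirely. */
    if (constraint == NULL)
        return true;

    for (int i = 0; i < nrows; i++)
    {
        bool        ok = false;

        rows_scanned++;
        for (int j = 0; j < constraint->nallowed; j++)
        {
            if (rows[i] == constraint->allowed[j])
            {
                ok = true;
                break;
            }
        }
        if (!ok)
            return false;   /* some row violates the constraint */
    }
    return true;
}
```

With the check inside the callee, any new caller gets the NULL handling for free, which is the point made above about not having every call site think about the constraint-is-NULL case.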

> 0003:
> Support for default partition with the restriction of preventing addition of
> any
> new partition after default partition.

This looks generally reasonable, but can't really be committed without
the later patches, because it might break pg_dump, which won't know
that the DEFAULT partition must be dumped last and might therefore get
the dump ordering wrong, and of course also because it reverts commit
c1e0e7e1d790bf18c913e6a452dea811e858b554.

> 0004:
> Store the default partition OID in pg_partitioned_table, this will help us to
> retrieve the OID of default relation when we don't have the relation cache
> available. This was also suggested by Amit Langote here[1].

I looked this over and I think this is the right approach.  An
alternative way to avoid needing a relcache entry in
heap_drop_with_catalog() would be for get_default_partition_oid() to
call find_inheritance_children() here and then use a syscache lookup
to get the partition bound for each one, but that's still going to
cause some syscache bloat.

> 0005:
> Extend default partitioning support to allow addition of new partitions.

+       if (spec->is_default)
+       {
+               /* Default partition cannot be added if there already exists one. */
+               if (partdesc->nparts > 0 && partition_bound_has_default(boundinfo))
+               {
+                       with = boundinfo->default_index;
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                                        errmsg("partition \"%s\" conflicts with existing default partition \"%s\"",
+                                                       relname, get_rel_name(partdesc->oids[with])),
+                                        parser_errposition(pstate, spec->location)));
+               }
+
+               return;
+       }

I generally think it's good to structure the code so as to minimize
the indentation level.  In this case, if you did if (partdesc->nparts
== 0 || !partition_bound_has_default(boundinfo)) return; first, then
the rest of it could be one level less indented.  Also, perhaps it
would be clearer to test boundinfo == NULL rather than
partdesc->nparts == 0, assuming they are equivalent.

-        * We must also lock the default partition, for the same reasons explained
-        * in heap_drop_with_catalog().
+        * We must lock the default partition, for the same reasons explained in
+        * DefineRelation().

I don't really see the point of this change.  Whichever earlier patch
adds this code could include or omit the word "also" as appropriate,
and then this patch wouldn't need to change it.

> 0006:
> Extend default partitioning validation code to reuse the refactored code in
> patch 0001.

I'm having a very hard time understanding what's going on with this
patch.  It certainly doesn't seem to be just refactoring things to use
the code from 0001.  For example:

-                       if (ExecCheck(partqualstate, econtext))
+                       if (!ExecCheck(partqualstate, econtext))

It seems hard to believe that refactoring things to use the code from
0001 would involve inverting the value of this test.

+                * derived from the bounds(the partition constraint never evaluates to
+                * NULL, so negating it like this is safe).

I don't see it being negated.

I think this patch needs a better explanation of what it's trying to
do, and better comments.  I gather that at least part of the point
here is to skip validation scans on default partitions if the default
partition has been constrained not to contain any values that would
fall in the new partition, but neither the commit message for 0006 nor
your description here make that very clear.

> 0007:
> This patch introduces code to check if the scanning of default partition
> child
> can be skipped if its constraints are proven.

If I understand correctly, this is actually a completely separate
feature not intrinsically related to default partitioning.

> [0008 documentation]

-      attached is marked <literal>NO INHERIT</literal>, the command will fail;
-      such a constraint must be recreated without the <literal>NO INHERIT</literal>
-      clause.
+      target table.
+     </para>

I don't favor inserting a paragraph break here.

+      then the default partition(if it is a regular table) is scanned to check

The sort-of-trivial problem with this is that an open parenthesis
should be preceded by a space.  But I think this won't be clear.  I
think you should move this below the following paragraph, which
describes what happens for foreign tables, and then add a new
paragraph like this:

When a table has a default partition, defining a new partition changes
the partition constraint for the default partition.  The default
partition can't contain any rows that would need to be moved to the
new partition, and will be scanned to verify that none are present.
This scan, like the scan of the new partition, can be avoided if an
appropriate <literal>CHECK</literal> constraint is present.  Also like
the scan of the new partition, it is always skipped when the default
partition is a foreign table.

-) ] FOR VALUES <replaceable class="PARAMETER">partition_bound_spec</replaceable>
+) ] { DEFAULT | FOR VALUES <replaceable class="PARAMETER">partition_bound_spec</replaceable> }

I recommend writing FOR VALUES | DEFAULT both here and in the ATTACH
PARTITION syntax summary.

+     If <literal>DEFAULT</literal> is specified the table will be created as a
+     default partition of the parent table. The parent can either be a list or
+     range partitioned table. A partition key value not fitting into any other
+     partition of the given parent will be routed to the default partition.
+     There can be only one default partition for a given parent table.
+     </para>
+
+     <para>
+     If the given parent is already having a default partition then adding a
+     new partition would result in an error if the default partition contains a
+     record that would fit in the new partition being added. This check is not
+     performed if the default partition is a foreign table.
+     </para>

The indentation isn't correct here - it doesn't match the surrounding
paragraphs.  The bit about list or range partitioning doesn't match
the actual behavior of the other patches, but maybe you intended this
to document both this feature and what Beena's doing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Robert,


> 0005:
> Extend default partitioning support to allow addition of new partitions.

+       if (spec->is_default)
+       {
+               /* Default partition cannot be added if there already exists one. */
+               if (partdesc->nparts > 0 && partition_bound_has_default(boundinfo))
+               {
+                       with = boundinfo->default_index;
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                                        errmsg("partition \"%s\" conflicts with existing default partition \"%s\"",
+                                                       relname, get_rel_name(partdesc->oids[with])),
+                                        parser_errposition(pstate, spec->location)));
+               }
+
+               return;
+       }

I generally think it's good to structure the code so as to minimize
the indentation level.  In this case, if you did if (partdesc->nparts
== 0 || !partition_bound_has_default(boundinfo)) return; first, then
the rest of it could be one level less indented.  Also, perhaps it
would be clearer to test boundinfo == NULL rather than
partdesc->nparts == 0, assuming they are equivalent.

I think even with this change there will be one level of indentation
needed for throwing the error, as the error is to be thrown only if
there exists a default partition.
 
 
-        * We must also lock the default partition, for the same reasons explained
-        * in heap_drop_with_catalog().
+        * We must lock the default partition, for the same reasons explained in
+        * DefineRelation().

I don't really see the point of this change.  Whichever earlier patch
adds this code could include or omit the word "also" as appropriate,
and then this patch wouldn't need to change it.


Actually the change is made because of the difference in the function name.
I will remove ‘also’ from the first patch itself.
 
> 0007:
> This patch introduces code to check if the scanning of default partition
> child
> can be skipped if its constraints are proven.

If I understand correctly, this is actually a completely separate
feature not intrinsically related to default partitioning.

I don't see this as a new feature, since scanning the default partition
is only introduced by this series of patches; rather than a separate
feature, it can be classified as completing the skip-validation logic for
the default partition. Your thoughts?

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Mon, Aug 14, 2017 at 7:51 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I think even with this change there will be one level of indentation
> needed for throwing the error, as the error is to be thrown only if
> there exists a default partition.

That's true, but we don't need two levels.
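
One possible reading of this suggestion is sketched below: return early when no default partition exists, so the error path is nested only inside the is_default test. This is a standalone model, not the patch code. BoundInfo, bound_has_default(), check_new_partition() and the integer return value (standing in for ereport(ERROR, ...)) are hypothetical, and it assumes that a NULL boundinfo is equivalent to nparts == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the partition bound info; -1 means no default. */
typedef struct BoundInfo
{
    int default_index;
} BoundInfo;

static bool
bound_has_default(const BoundInfo *boundinfo)
{
    return boundinfo->default_index != -1;
}

/*
 * Model of the check made when a new partition is added.  Returns -1 when
 * the new partition is acceptable, otherwise the index of the existing
 * default partition it conflicts with (the real code would raise an error).
 */
static int
check_new_partition(bool is_default, const BoundInfo *boundinfo)
{
    if (is_default)
    {
        /* Early exit: no partitions yet, or none of them is the default. */
        if (boundinfo == NULL || !bound_has_default(boundinfo))
            return -1;

        /* Only the error path is left, one level deep. */
        assert(boundinfo->default_index >= 0);
        return boundinfo->default_index;
    }

    /* Overlap checks for non-default bounds are omitted from this sketch. */
    return -1;
}
```

With the early exit in place, the conflict report no longer needs its own enclosing if block.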

>> > 0007:
>> > This patch introduces code to check if the scanning of default partition
>> > child
>> > can be skipped if it's constraints are proven.
>>
>> If I understand correctly, this is actually a completely separate
>> feature not intrinsically related to default partitioning.
>
> I don't see this as a new feature, since scanning the default partition
> will be introduced by this series of patches only, and rather than a
> feature this can be classified as a completeness of default skip
> validation logic. Your thoughts?

Currently, when a partitioned table is attached, we check whether all
the scans can be skipped but not whether scans on some partitions can
be skipped.  So there are two separate things:

1. When we introduce default partitioning, we need to scan the default
partition either when (a) any partition is attached or (b) any
partition is created.

2. In any situation where scans are needed (scanning the partition
when it's attached, scanning the default partition when some other
partition is attached, scanning the default when a new partition is
created), we can run predicate_implied_by for each partition to see
whether the scan of that partition can be skipped.

Those two changes are independent. We could do (1) without doing (2)
or (2) without doing (1) or we could do both.  So they are separate
features.
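
A toy model may help show why the two changes are independent. It is only a sketch under stated assumptions: RangeCheck is a hypothetical stand-in for a CHECK constraint of the form lo <= a AND a <= hi, and the simple range test below stands in for what predicate_implied_by would prove in the real code. The skip_logic_enabled flag models having (1) with or without (2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a CHECK constraint: lo <= a AND a <= hi. */
typedef struct RangeCheck
{
    bool present;   /* false if the table has no such constraint */
    int  lo;
    int  hi;
} RangeCheck;

/*
 * Change (2): does an existing constraint already prove that the default
 * partition holds no row belonging to the new list partition?
 */
static bool
default_scan_can_be_skipped(const RangeCheck *check,
                            const int *new_values, size_t nvalues)
{
    assert(check != NULL);

    if (!check->present)
        return false;           /* nothing to reason from, must scan */

    for (size_t i = 0; i < nvalues; i++)
    {
        if (new_values[i] >= check->lo && new_values[i] <= check->hi)
            return false;       /* a conflicting row is still possible */
    }
    return true;                /* the constraint excludes every new value */
}

/*
 * Change (1): adding a partition requires validating the default partition,
 * if one exists.  Change (2) is applied only when skip_logic_enabled is set.
 */
static bool
default_needs_scan(bool has_default, bool skip_logic_enabled,
                   const RangeCheck *check,
                   const int *new_values, size_t nvalues)
{
    if (!has_default)
        return false;
    if (skip_logic_enabled &&
        default_scan_can_be_skipped(check, new_values, nvalues))
        return false;
    return true;
}
```

With skip_logic_enabled set to false the model still behaves correctly; it just scans more often than necessary.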

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Wed, Jul 26, 2017 at 8:14 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have rebased the patches on the latest commit.

This needs another rebase.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

On Tue, Aug 15, 2017 at 7:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 26, 2017 at 8:14 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have rebased the patches on the latest commit.

This needs another rebase.

I have rebased the patches and addressed your and Ashutosh's comments on the last set of patches.

The current set of patches contains 6 patches as below:

0001:
Refactoring of the existing ATExecAttachPartition() code so that it can be used for
default partitioning as well.

0002:
This patch teaches the partitioning code to handle the NIL returned by
get_qual_for_list().
This is needed because a default partition will not have any constraints in case
it is the only partition of its parent.

0003:
Support for default partition with the restriction of preventing addition of any
new partition after default partition. This is a merge of 0003 and 0004 in
V24 series.

0004:
Extend default partitioning support to allow addition of new partitions after
default partition is created/attached. This patch is a merge of patches
0005 and 0006 in V24 series to simplify the review process. The
commit message has more details regarding what all is included.

0005:
This patch introduces code to check if the scanning of default partition child
can be skipped if its constraints are proven.

0006:
Documentation.


PFA, and let me know in case of any comments.

Regards,
Jeevan Ladhe
Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Ashutosh,

Please find my feedback inlined.

On Fri, Jul 28, 2017 at 7:00 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Wed, Jul 26, 2017 at 5:44 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> I have rebased the patches on the latest commit.
>

Thanks for rebasing the patches. The patches apply and compile
cleanly. make check passes.

Here are some review comments
0001 patch
Most of this patch is same as 0002 patch posted in thread [1]. I have
extensively reviewed that patch for Amit Langote. Can you please compare these
two patches and try to address those comments OR just use patch from that
thread? For example, canSkipPartConstraintValidation() is named as
PartConstraintImpliedByRelConstraint() in that patch. OR
+    if (scanRel_constr == NULL)
+        return false;
+
is not there in that patch since returning false is wrong when partConstraint
is NULL. I think this patch needs those fixes. Also, this patch set would need
a rebase when 0001 from that thread gets committed.


I have renamed the canSkipPartConstraintValidation() to
PartConstraintImpliedByRelConstraint() and made other changes applicable per
Amit’s patch. This patch also refactors the scanning logic in ATExecAttachPartition()
and adds it into a function ValidatePartitionConstraints(), hence I could not use
Amit’s patch as it is. Please have a look into the new patch and let me know if it
looks fine to you.
 
0002 patch
+        if (!and_args)
+            result = NULL;
Add "NULL, if there are no partition constraints, e.g. in case of default
partition as the only partition.".

Added. Please check.
 
This patch avoids calling
validatePartitionConstraints() and hence canSkipPartConstraintValidation() when
partConstraint is NULL, but patches in [1] introduce more callers of
canSkipPartConstraintValidation() which may pass NULL. So, it's better that we
handle that case.

Following code added in patch 0001 now should take care of this.
+   num_check = (constr != NULL) ? constr->num_check : 0; 
 
0003 patch
+        parentRel = heap_open(parentOid, AccessExclusiveLock);
In [2], Amit Langote has given a reason as to why heap_drop_with_catalog()
should not heap_open() the parent relation. But this patch still calls
heap_open() without giving any counter argument. Also I don't see
get_default_partition_oid() using Relation anywhere. If you remove that
heap_open() please remove following heap_close().
+        heap_close(parentRel, NoLock);

As clarified earlier this was addressed in 0004 patch of V24 series. In
current set of patches this is now addressed in patch 0003 itself.
 

+                        /*
+                         * The default partition accepts any non-specified
+                         * value, hence it should not get a mapped index while
+                         * assigning those for non-null datums.
+                         */
Instead of "any non-specified value", you may want to use "any value not
specified in the lists of other partitions" or something like that.

Changed the comment.
 

+         * If this is a NULL, route it to the null-accepting partition.
+         * Otherwise, route by searching the array of partition bounds.
You may want to write it as "If this is a null partition key, ..." to clarify
what's NULL.

Changed the comment.
 

+         * cur_index < 0 means we could not find a non-default partition of
+         * this parent. cur_index >= 0 means we either found the leaf
+         * partition, or the next parent to find a partition of.
+         *
+         * If we couldn't find a non-default partition check if the default
+         * partition exists, if it does, get its index.
In order to avoid repeating "we couldn't find a ..."; you may want to add ",
try default partition if one exists." in the first sentence itself.
 
Sorry, but I am not really sure how this change would make the comment
more readable than the current one.
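
For reference, the fallback described in the code comment under discussion can be modelled as below. This is a simplified sketch, not the patch's routing code: ListBounds and route_value() are hypothetical names, the linear search replaces the real lookup over partition bounds, and null handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of list partition bounds: datums[i] maps to indexes[i]. */
typedef struct ListBounds
{
    const int *datums;
    const int *indexes;
    size_t     ndatums;
    int        default_index;   /* -1 if there is no default partition */
} ListBounds;

/*
 * Return the index of the partition that accepts "value", or -1 if no
 * partition (including the default) accepts it.
 */
static int
route_value(const ListBounds *bounds, int value)
{
    int cur_index = -1;

    assert(bounds != NULL);

    for (size_t i = 0; i < bounds->ndatums; i++)
    {
        if (bounds->datums[i] == value)
        {
            cur_index = bounds->indexes[i];
            break;
        }
    }

    /* No non-default partition accepts the value: try the default, if any. */
    if (cur_index < 0)
        cur_index = bounds->default_index;

    return cur_index;
}
```

Using the values from the start of the thread, a partition FOR VALUES IN (4, 5) plus a default partition would send 9 to the default, while the same bounds without a default would find no partition for it.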
 
get_default_partition_oid() is defined in this patch and then redefined in
0004. Let's define it only once, mostly in or before 0003 patch.
 
get_default_partition_oid() is now defined only once in patch 0003.
 

+         * partition strategy. Assign the parent strategy to the default
s/parent/parent's/

Fixed.
 

+-- attaching default partition overlaps if the default partition already exists
+CREATE TABLE def_part PARTITION OF list_parted DEFAULT;
+CREATE TABLE fail_def_part (LIKE part_1 INCLUDING CONSTRAINTS);
+ALTER TABLE list_parted ATTACH PARTITION fail_def_part DEFAULT;
+ERROR:  cannot attach a new partition to table "list_parted" having a default partition
For 0003 patch this testcase is same as the testcase in the next hunk; no new
partition can be added after default partition. Please add this testcase in
next set of patches.
 
Though the error message is the same, the purpose of each test is different:
1. one tests that there cannot be more than one default partition,
2. and the other tests that a new partition cannot be added if the
default partition exists.
The latter test needs to be removed in the next patch, where we add support for
adding a new partition even if a default partition exists.
 
+-- fail
+insert into part_default values ('aa', 2);
Maybe explain why the insert should fail: "A row which would fit
another partition does not fit the default partition, even when inserted directly"
or something like that. I see that many of the tests in that file do not
explain why something should "fail" or be "ok", but maybe it's better to
document the reason for better readability and future reference.

Added a comment. 

+-- check in case of multi-level default partitioned table
s/in/the/ ?. Or you may want to reword it as "default partitioned partition in
multi-level partitioned table" as there is nothing like "default partitioned
table". Maybe we need a testcase where every level of a multi-level
partitioned table has a default partition.

I have changed the comment as well as added a test scenario where the
partition further has a default partition.
 
+-- drop default, as we need to add some more partitions to test tuple routing
Should be clubbed with the actual DROP statement?

This is needed in patch 0003, as it prevents adding/creating further partitions
to parent. This is removed in patch 0004.
 
+-- Check that addition or removal of any partition is correctly dealt with by
+-- default partition table when it is being used in cached plan.
The plan of a prepared statement gets cached only after it has been executed 5 times.
Before that, the statement gets invalidated, but there is no cached plan to
invalidate. The test is fine here, but in order to test the cached plan
as mentioned in the comment, you will need to execute the statement 5 times
before executing the drop statement. That's probably unnecessary, so just modify
the comment to say "prepared statements" instead of "cached plan".

Agree. Fixed.
 
0004 patch
The patch adds another column, partdefid, to the catalog pg_partitioned_table. The
column gives the OID of the default partition for a given partitioned table. This
means that the default partition's OID is stored in two places: 1. in the
default partition table's pg_class entry and 2. in pg_partitioned_table. There is
no way to detect when these two go out of sync. Keeping those two in sync is
also a maintenance burden. Given that the default partition's OID is required only
while adding/dropping a partition, which is a less frequent operation, it won't
hurt to join a few catalogs (pg_inherits and pg_class in this case) to find out
the default partition's OID. That will be an occasional performance hit,
which is worth it to avoid the maintenance burden.
 
To avoid partdefid of pg_partitioned_table going out of sync during any
future development, I have added an Assert in RelationBuildPartitionDesc()
in patch 0003 of the V25 series. I believe DBAs are not supposed to alter any
catalog tables, hence instead of adding an error, I added an Assert to prevent
this from breaking during the development cycle.
We have similar kinds of duplication in other catalogs, e.g. pg_opfamily,
pg_operator, etc. Also, per Robert [1], the other route of searching pg_class
and pg_inherits is going to cause some syscache bloat.


Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Ashutosh,

On Thu, Aug 17, 2017 at 3:41 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:

You can see your comments addressed as above in patch series v25 here[1].


Regards,
Jeevan Ladhe 

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Robert,

Please find my feedback inlined.
I have addressed following comments in V25 patch[1].


> 0002:
> This patch teaches the partitioning code to handle the NIL returned by
> get_qual_for_list().
> This is needed because a default partition will not have any constraints in
> case
> it is the only partition of its parent.

Perhaps it would be better to make validatePartConstraint() a no-op
when the constraint is empty rather than putting the logic in the
caller.  Otherwise, every place that calls validatePartConstraint()
has to think about whether or not the constraint-is-NULL case needs to
be handled.

I have now added a check in ValidatePartConstraint(). This change is made
in 0001 patch.

 
> 0003:
> Support for default partition with the restriction of preventing addition of
> any
> new partition after default partition.

This looks generally reasonable, but can't really be committed without
the later patches, because it might break pg_dump, which won't know
that the DEFAULT partition must be dumped last and might therefore get
the dump ordering wrong, and of course also because it reverts commit
c1e0e7e1d790bf18c913e6a452dea811e858b554.


Thanks Robert for looking into this. The patches are split only to ease the
review process; the basic patch introducing default partition support and the
one extending it to allow addition of new partitions should go in together.
 
> 0004:
> Store the default partition OID in pg_partition_table, this will help us to
> retrieve the OID of default relation when we don't have the relation cache
> available. This was also suggested by Amit Langote here[1].

I looked this over and I think this is the right approach.  An
alternative way to avoid needing a relcache entry in
heap_drop_with_catalog() would be for get_default_partition_oid() to
call find_inheritance_children() here and then use a syscache lookup
to get the partition bound for each one, but that's still going to
cause some syscache bloat.

To safeguard future development from missing this and leaving it out of sync, I
have added an Assert in RelationBuildPartitionDesc() to cross check the
partdefid.
 

> 0005:
> Extend default partitioning support to allow addition of new partitions.

+       if (spec->is_default)
+       {
+               /* Default partition cannot be added if there already exists one. */
+               if (partdesc->nparts > 0 && partition_bound_has_default(boundinfo))
+               {
+                       with = boundinfo->default_index;
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                                        errmsg("partition \"%s\" conflicts with existing default partition \"%s\"",
+                                                       relname, get_rel_name(partdesc->oids[with])),
+                                        parser_errposition(pstate, spec->location)));
+               }
+
+               return;
+       }

I generally think it's good to structure the code so as to minimize
the indentation level.  In this case, if you did if (partdesc->nparts
== 0 || !partition_bound_has_default(boundinfo)) return; first, then
the rest of it could be one level less indented.  Also, perhaps it
would be clearer to test boundinfo == NULL rather than
partdesc->nparts == 0, assuming they are equivalent.

Fixed.

> 0006:
> Extend default partitioning validation code to reuse the refactored code in
> patch 0001.

I'm having a very hard time understanding what's going on with this
patch.  It certainly doesn't seem to be just refactoring things to use
the code from 0001.  For example:

-                       if (ExecCheck(partqualstate, econtext))
+                       if (!ExecCheck(partqualstate, econtext))

It seems hard to believe that refactoring things to use the code from
0001 would involve inverting the value of this test.

+                * derived from the bounds(the partition constraint never evaluates to
+                * NULL, so negating it like this is safe).

I don't see it being negated.

I think this patch needs a better explanation of what it's trying to
do, and better comments.  I gather that at least part of the point
here is to skip validation scans on default partitions if the default
partition has been constrained not to contain any values that would
fall in the new partition, but neither the commit message for 0006 nor
your description here make that very clear.

In the V25 series, this is now part of patch 0004, which should avoid the
confusion above. I have added a more detailed explanation to the commit
message and improved the code comments in this area.

> [0008 documentation]

-      attached is marked <literal>NO INHERIT</literal>, the command will fail;
-      such a constraint must be recreated without the <literal>NO INHERIT</literal>
-      clause.
+      target table.
+     </para>

I don't favor inserting a paragraph break here.

Fixed.
 
+      then the default partition(if it is a regular table) is scanned to check

The sort-of-trivial problem with this is that an open parenthesis
should be preceded by a space.  But I think this won't be clear.  I
think you should move this below the following paragraph, which
describes what happens for foreign tables, and then add a new
paragraph like this:

When a table has a default partition, defining a new partition changes
the partition constraint for the default partition.  The default
partition can't contain any rows that would need to be moved to the
new partition, and will be scanned to verify that none are present.
This scan, like the scan of the new partition, can be avoided if an
appropriate <literal>CHECK</literal> constraint is present.  Also like
the scan of the new partition, it is always skipped when the default
partition is a foreign table.

I have made the change as suggested.
 
-) ] FOR VALUES <replaceable class="PARAMETER">partition_bound_spec</replaceable>
+) ] { DEFAULT | FOR VALUES <replaceable class="PARAMETER">partition_bound_spec</replaceable> }

I recommend writing FOR VALUES | DEFAULT both here and in the ATTACH
PARTITION syntax summary.

Changed.
 
+     If <literal>DEFAULT</literal> is specified the table will be created as a
+     default partition of the parent table. The parent can either be a list or
+     range partitioned table. A partition key value not fitting into any other
+     partition of the given parent will be routed to the default partition.
+     There can be only one default partition for a given parent table.
+     </para>
+
+     <para>
+     If the given parent is already having a default partition then adding a
+     new partition would result in an error if the default partition contains a
+     record that would fit in the new partition being added. This check is not
+     performed if the default partition is a foreign table.
+     </para>

The indentation isn't correct here - it doesn't match the surrounding
paragraphs.  The bit about list or range partitioning doesn't match
the actual behavior of the other patches, but maybe you intended this
to document both this feature and what Beena's doing.

I have tried to fix this now.


[1]

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Robert,

>> > 0007:
>> > This patch introduces code to check if the scanning of default partition
>> > child
>> > can be skipped if it's constraints are proven.
>>
>> If I understand correctly, this is actually a completely separate
>> feature not intrinsically related to default partitioning.
>
> I don't see this as a new feature, since scanning the default partition
> will be introduced by this series of patches only, and rather than a
> feature this can be classified as a completeness of default skip
> validation logic. Your thoughts?

Currently, when a partitioned table is attached, we check whether all
the scans can be skipped but not whether scans on some partitions can
be skipped.  So there are two separate things:

1. When we introduce default partitioning, we need to scan the default
partition either when (a) any partition is attached or (b) any
partition is created.

2. In any situation where scans are needed (scanning the partition
when it's attached, scanning the default partition when some other
partition is attached, scanning the default when a new partition is
created), we can run predicate_implied_by for each partition to see
whether the scan of that partition can be skipped.

Those two changes are independent. We could do (1) without doing (2)
or (2) without doing (1) or we could do both.  So they are separate
features.


In my V25 series (patch 0005) I have only addressed half of (2) above,
i.e. code to check whether the scan of a partition of the default partition
can be skipped when another partition is being added. Amit Langote
has submitted[1] a separate patch (0003) to address skipping the scan
of the children of a relation when it's being attached as a partition.
 
[1] https://www.postgresql.org/message-id/4cd13b03-846d-dc65-89de-1fd9743a3869%40lab.ntt.co.jp

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Thom Brown
Date:
On 17 August 2017 at 10:59, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
> Hi,
>
> On Tue, Aug 15, 2017 at 7:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Wed, Jul 26, 2017 at 8:14 AM, Jeevan Ladhe
>> <jeevan.ladhe@enterprisedb.com> wrote:
>> > I have rebased the patches on the latest commit.
>>
>> This needs another rebase.
>
>
> I have rebased the patch and addressed your and Ashutosh comments on last
> set of patches.
>
> The current set of patches contains 6 patches as below:
>
> 0001:
> Refactoring existing ATExecAttachPartition  code so that it can be used for
> default partitioning as well
>
> 0002:
> This patch teaches the partitioning code to handle the NIL returned by
> get_qual_for_list().
> This is needed because a default partition will not have any constraints in
> case
> it is the only partition of its parent.
>
> 0003:
> Support for default partition with the restriction of preventing addition of
> any
> new partition after default partition. This is a merge of 0003 and 0004 in
> V24 series.
>
> 0004:
> Extend default partitioning support to allow addition of new partitions
> after
> default partition is created/attached. This patch is a merge of patches
> 0005 and 0006 in V24 series to simplify the review process. The
> commit message has more details regarding what all is included.
>
> 0005:
> This patch introduces code to check if the scanning of default partition
> child
> can be skipped if it's constraints are proven.
>
> 0006:
> Documentation.
>
>
> PFA, and let me know in case of any comments.

Thanks.  Applies fine, and I've been exercising the patch and it is
doing everything it's supposed to do.

I am, however, curious to know why the planner can't optimise the following:

SELECT * FROM mystuff WHERE mystuff = (1::int,'JP'::text,'blue'::text);

This exhaustively checks all partitions, but if I change it to:

SELECT * FROM mystuff WHERE (id, country, content) =
(1::int,'JP'::text,'blue'::text);

It works fine.

The former filters like so: ((mystuff_default_1.*)::mystuff = ROW(1,
'JP'::text, 'blue'::text))

Shouldn't it instead do:

((mystuff_default_1.id, mystuff_default_1.country,
mystuff_default_1.content)::mystuff = ROW(1, 'JP'::text,
'blue'::text))

So it's not really to do with this patch; it's just something I
noticed while testing.

Thom



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, Aug 17, 2017 at 6:24 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have addressed following comments in V25 patch[1].

Committed 0001.  Since that code seems to be changing about every 10
minutes, it seems best to get this refactoring out of the way before
it changes again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:


On Fri, Aug 18, 2017 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Aug 17, 2017 at 6:24 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have addressed following comments in V25 patch[1].

Committed 0001.  Since that code seems to be changing about every 10
minutes, it seems best to get this refactoring out of the way before
it changes again.

Thanks Robert for taking care of this.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:

Hi,

On Thu, Aug 17, 2017 at 3:29 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
Hi,

On Tue, Aug 15, 2017 at 7:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 26, 2017 at 8:14 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have rebased the patches on the latest commit.

This needs another rebase.

I have rebased the patch and addressed your and Ashutosh comments on last set of patches.

The current set of patches contains 6 patches as below:

0001:
Refactoring existing ATExecAttachPartition  code so that it can be used for
default partitioning as well

0002:
This patch teaches the partitioning code to handle the NIL returned by
get_qual_for_list().
This is needed because a default partition will not have any constraints in case
it is the only partition of its parent.

0003:
Support for default partition with the restriction of preventing addition of any
new partition after default partition. This is a merge of 0003 and 0004 in
V24 series.

0004:
Extend default partitioning support to allow addition of new partitions after
default partition is created/attached. This patch is a merge of patches
0005 and 0006 in V24 series to simplify the review process. The
commit message has more details regarding what all is included.

0005:
This patch introduces code to check if the scanning of default partition child
can be skipped if it's constraints are proven.

0006:
Documentation.

 

After patch 0001 in above series got committed[1], I have rebased the patches.


The attached set of patches now looks like below:


0001:
This patch teaches the partitioning code to handle the NIL returned by
get_qual_for_list().
This is needed because a default partition will not have any constraints in case
it is the only partition of its parent.

0002:
Support for default partition with the restriction of preventing addition of any
new partition after default partition. This is a merge of 0003 and 0004 in
V24 series.

0003:
Extend default partitioning support to allow addition of new partitions after
default partition is created/attached. This patch is a merge of patches
0005 and 0006 in V24 series to simplify the review process. The
commit message has more details regarding what all is included.

0004:
This patch introduces code to check if the scanning of default partition child
can be skipped if it's constraints are proven.

0005:
Documentation.


[1] https://www.postgresql.org/message-id/CA%2BTgmoYp-QePjTGEC6W%2BRfuh%3DTMZ4Hj8t2fX2o8cbhto6zS9DA%40mail.gmail.com


Regards,

Jeevan Ladhe 
Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Mon, Aug 21, 2017 at 4:47 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>
> Hi,
>
> On Thu, Aug 17, 2017 at 3:29 PM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>>
>> Hi,
>>
>> On Tue, Aug 15, 2017 at 7:20 PM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>>>
>>> On Wed, Jul 26, 2017 at 8:14 AM, Jeevan Ladhe
>>> <jeevan.ladhe@enterprisedb.com> wrote:
>>> > I have rebased the patches on the latest commit.
>>>
>>> This needs another rebase.
>>
>>
>> I have rebased the patch and addressed your and Ashutosh comments on last
>> set of patches.

Thanks for the rebased patches.

>>
>> The current set of patches contains 6 patches as below:
>>
>> 0001:
>> Refactoring existing ATExecAttachPartition  code so that it can be used
>> for
>> default partitioning as well
 * Returns an expression tree describing the passed-in relation's partition
- * constraint.
+ * constraint. If there are no partition constraints returns NULL e.g. in case
+ * default partition is the only partition.
The first sentence uses singular constraint. The second uses plural. Given that
partition bounds together form a single constraint we should use singular
constraint in the second sentence as well.

Do we want to add a similar comment in the prologue of
generate_partition_qual(). The current wording there seems to cover this case,
but do we want to explicitly mention this case?

+        if (!and_args)
+            result = NULL;
While this is correct, I am increasingly seeing (and_args != NIL) usage.

get_partition_qual_relid() is called from pg_get_partition_constraintdef(),
    constr_expr = get_partition_qual_relid(relationId);

    /* Quick exit if not a partition */
    if (constr_expr == NULL)
        PG_RETURN_NULL();
The comment is now wrong since a default partition may have no constraints. May
be rewrite it as simply, "Quick exit if no partition constraint."

generate_partition_qual() has three callers and all of them are capable of
handling NIL partition constraint for default partition. Maybe it's better to
mention in the commit message that we have checked that the callers of
this function can handle NIL partition constraint.

>>
>> 0002:
>> This patch teaches the partitioning code to handle the NIL returned by
>> get_qual_for_list().
>> This is needed because a default partition will not have any constraints
>> in case
>> it is the only partition of its parent.

If the partition being dropped is the default partition,
heap_drop_with_catalog() locks default partition twice, once as the default
partition and the second time as the partition being dropped. So, it will be
counted as locked twice. There doesn't seem to be any harm in this, since the
lock will be held till the transaction ends, by which time all the locks will be
released.

Same is the case with cache invalidation message. If we are dropping default
partition, the cache invalidation message on "default partition" is not
required. Again this might be harmless, but better to avoid it.

A similar problem exists with ATExecDetachPartition(): the default partition
will be locked twice if it's being detached.
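
To make the double-locking point concrete, here is a minimal standalone sketch of the guard being asked for. It is a toy model, not the patch's code: the lock manager is a simple counter, lock_for_drop() and times_locked() are hypothetical names, and Oid/InvalidOid are redefined locally so the example compiles on its own.

```c
#include <assert.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical stand-in for a lock manager: counts acquisitions per OID. */
#define MAX_OID 16
static int lock_count[MAX_OID];

static void lock_relation(Oid relid)
{
    assert(relid != InvalidOid && relid < MAX_OID);
    lock_count[relid]++;
}

/* How many times the toy lock manager has locked relid. */
int times_locked(Oid relid)
{
    assert(relid < MAX_OID);
    return lock_count[relid];
}

/*
 * Take the locks needed to drop partition relid, whose parent has
 * default partition default_oid (InvalidOid if there is none).
 */
void lock_for_drop(Oid relid, Oid default_oid)
{
    /* Skip the separate default-partition lock when it is the same table. */
    if (default_oid != InvalidOid && default_oid != relid)
        lock_relation(default_oid);

    lock_relation(relid);
}
```

With this guard, dropping the default partition itself (relid equal to default_oid) takes one lock instead of two, while dropping any other partition still locks both tables.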

+        /*
+         * If this is a default partition, pg_partitioned_table must have it's
+         * OID as value of 'partdefid' for it's parent (i.e. rel) entry.
+         */
+        if (castNode(PartitionBoundSpec, boundspec)->is_default)
+        {
+            Oid            partdefid;
+
+            partdefid = get_default_partition_oid(RelationGetRelid(rel));
+            Assert(partdefid == inhrelid);
+        }
An accidental change or database corruption may change the default
partition OID in pg_partitioned_table, and an Assert won't help in such a case.
Maybe we should throw an error or at least report a warning. If we throw an
error, the table will become useless (or even the database will become useless
if RelationBuildPartitionDesc is called from RelationCacheInitializePhase3() on
such a corrupted table). To avoid that we may raise a warning.

I am wondering whether we could avoid the call to get_default_partition_oid() in
the above block, thus avoiding a sys cache lookup. The sys cache search
shouldn't be expensive since the cache should already have that entry, but
still if we can avoid it, we save some CPU cycles. The default partition OID is
stored in pg_partitioned_table catalog, which is looked up in
RelationGetPartitionKey(), a function which precedes RelationGetPartitionDesc()
everywhere. What if that RelationGetPartitionKey() also returns the default
partition OID and the common caller passes it to RelationGetPartitionDesc()?

+    /* A partition cannot be attached if there exists a default partition */
+    defaultPartOid = get_default_partition_oid(RelationGetRelid(rel));
+    if (OidIsValid(defaultPartOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                 errmsg("cannot attach a new partition to table \"%s\" having a default partition",
+                        RelationGetRelationName(rel))));
get_default_partition_oid() searches the catalogs, which is not needed when we
have relation descriptor of the partitioned table (to which a new partition is
being attached). You should get the default partition OID from partition
descriptor. That will be cheaper.

+                /* If there isn't any constraint, show that explicitly */
+                if (partconstraintdef[0] == '\0')
+                    printfPQExpBuffer(&tmpbuf, _("No partition constraint"));
I think we need to change the way we set partconstraintdef at
            if (PQnfields(result) == 3)
                partconstraintdef = PQgetvalue(result, 0, 2);
Before this commit, the constraint could never be NULL, so this code worked; now
that the constraint can be NULL, we need to check whether the 3rd value is NULL
using PQgetisnull() and assign a value only when it's not NULL.
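
For context on the PQgetisnull() point: in libpq, PQgetvalue() returns an empty string for a NULL field, so only PQgetisnull() can tell NULL apart from an empty value. A minimal standalone sketch of the suggested check follows. ResultField and constraint_to_show() are hypothetical stand-ins so the example compiles without libpq; this is not the psql code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one field of a query result row. */
typedef struct ResultField
{
    const char *value;    /* field text; "" when is_null is set */
    int         is_null;  /* nonzero if the field is SQL NULL */
} ResultField;

/*
 * Return the partition constraint text to display, or NULL when the row
 * has no third column or that column is SQL NULL (no constraint at all).
 */
const char *constraint_to_show(const ResultField *fields, int nfields)
{
    assert(fields != NULL || nfields == 0);

    /* Assign a value only when the third field exists and is not NULL. */
    if (nfields == 3 && !fields[2].is_null)
        return fields[2].value;

    return NULL;
}
```

A caller can then print something like "No partition constraint" when the function returns NULL, rather than inferring the absence of a constraint from an empty string.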

+-- test adding default partition as first partition accepts any value including
grammar, reword as "test that a default partition added as the first
partition accepts any value including".

>>
>> 0003:
>> Support for default partition with the restriction of preventing addition
>> of any
>> new partition after default partition. This is a merge of 0003 and 0004 in
>> V24 series.

The commit message of this patch has the following line, which no longer applies
to patch 0001. Maybe you want to remove this line or update the patch number.
3. This patch uses the refactored functions created in patch 0001
in this series.
Similarly the credit line refers to patch 0001. That too needs correction.

- * Also, invalidate the parent's relcache, so that the next rebuild will load
- * the new partition's info into its partition descriptor.
+ * Also, invalidate the parent's relcache entry, so that the next rebuild will
+ * load he new partition's info into its partition descriptor.  If there is a
+ * default partition, we must invalidate its relcache entry as well.
Replacing "relcache" with "relcache entry" in the first sentence  may be a good
idea, but is unrelated to this patch. I would leave that change aside and just
add comment about default partition.

+    /*
+     * The partition constraint for the default partition depends on the
+     * partition bounds of every other partition, so we must invalidate the
+     * relcache entry for that partition every time a partition is added or
+     * removed.
+     */
+    defaultPartOid = get_default_partition_oid(RelationGetRelid(parent));
+    if (OidIsValid(defaultPartOid))
+        CacheInvalidateRelcacheByRelid(defaultPartOid);
Again, since we have access to the parent's relcache, we should get the default
partition OID from relcache rather than catalogs.
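
The comment in the hunk above says the default partition's constraint depends on the bounds of every other partition. A minimal standalone sketch of that dependency for list partitioning is given here. It is a toy model under simplifying assumptions (integer keys, no NULL handling), and ListPartition and default_accepts() are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one non-default list partition: the values it accepts. */
typedef struct ListPartition
{
    const int *values;
    size_t     nvalues;
} ListPartition;

/* True if value appears in the bound of partition p. */
static bool partition_accepts(const ListPartition *p, int value)
{
    for (size_t i = 0; i < p->nvalues; i++)
        if (p->values[i] == value)
            return true;
    return false;
}

/*
 * The default partition accepts a value only if no sibling accepts it.
 * With no siblings (nparts == 0) there is nothing to negate, which
 * corresponds to the "no partition constraint" case.
 */
bool default_accepts(const ListPartition *parts, size_t nparts, int value)
{
    assert(parts != NULL || nparts == 0);

    for (size_t i = 0; i < nparts; i++)
        if (partition_accepts(&parts[i], value))
            return false;
    return true;
}
```

In this model, adding a sibling partition for a value changes what default_accepts() returns for that value, which is why any cached form of the default partition's constraint has to be invalidated whenever a partition is added or removed.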

I haven't gone through the full patch yet, so there may be more
comments here. There is some duplication of code in
check_default_allows_bound() and ValidatePartitionConstraints() to
scan the children of partition being validated. The difference is that
the first one scans the relations in the same function and the second
adds them to work queue. Maybe we could use
ValidatePartitionConstraints() either to scan the relation or to add it
to the queue, based on some input flag, perhaps the wqueue argument
itself. But I haven't thought through this completely. Any thoughts?

>>
>> 0004:
>> Extend default partitioning support to allow addition of new partitions
>> after
>> default partition is created/attached. This patch is a merge of patches
>> 0005 and 0006 in V24 series to simplify the review process. The
>> commit message has more details regarding what all is included.
>>
>> 0005:
>> This patch introduces code to check if the scanning of default partition
>> child
>> can be skipped if it's constraints are proven.
>>
>> 0006:
>> Documentation.
>>
>

I will get to these patches in a short while.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

I have merged the default partition for range[1] patches in attached V26
series  of patches.

Here is how the patches now look:

0001:
This patch refactors RelationBuildPartitionDesc(), basically this is patch
0001 of default range partition[1].

0002:
This patch teaches the partitioning code to handle the NIL returned by
get_qual_for_list().
This is needed because a default partition will not have any constraints in case
it is the only partition of its parent.

0003:
Support for default partition with the restriction of preventing addition of any
new partition after default partition. This patch now has support for both
default partition for list and range.

0004:
Extend default partitioning support to allow addition of new partitions after
default partition is created/attached. This patch is a merge of patches
0005 and 0006 in V24 series to simplify the review process. The
commit message has more details regarding what all is included.

0005:
This patch introduces code to check if the scanning of default partition child
can be skipped if it's constraints are proven.

0006:
Documentation.


Regards,
Jeevan Ladhe

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, Aug 31, 2017 at 8:53 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> 0001:
> This patch refactors RelationBuildPartitionDesc(), basically this is patch
> 0001 of default range partition[1].

I spent a while studying this; it seems to be simpler and there's no
real downside.  So, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Fri, Sep 1, 2017 at 3:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Aug 31, 2017 at 8:53 AM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>> 0001:
>> This patch refactors RelationBuildPartitionDesc(), basically this is patch
>> 0001 of default range partition[1].
>
> I spent a while studying this; it seems to be simpler and there's no
> real downside.  So, committed.

BTW, the rest of this series seems to need a rebase.  The changes to
insert.sql conflicted with 30833ba154e0c1106d61e3270242dc5999a3e4f3.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
On Sat, Sep 2, 2017 at 7:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 1, 2017 at 3:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Aug 31, 2017 at 8:53 AM, Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>> 0001:
>> This patch refactors RelationBuildPartitionDesc(), basically this is patch
>> 0001 of default range partition[1].
>
> I spent a while studying this; it seems to be simpler and there's no
> real downside.  So, committed.


Thanks Robert for taking care of this.
 
BTW, the rest of this series seems to need a rebase.  The changes to
insert.sql conflicted with 30833ba154e0c1106d61e3270242dc5999a3e4f3.

Will rebase the patches.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

Attached is the rebased set of patches.
Robert has committed[1] patch 0001 in V26 series, hence the patch numbering
in V27 is now decreased by 1 for each patch as compared to V26.

This set of patches also addresses comments[2] given by Ashutosh.

Here is the description of the patches:

0001:
This patch teaches the partitioning code to handle the NIL returned by
get_qual_for_list(). This is needed because a default partition will not have
any constraints in case it is the only partition of its parent.

0002:
Support for default partition with the restriction of preventing addition of any
new partition after default partition. This patch has support for default
partition for both list and range.
In addition to V26 patch 0003, the following changes are included here:
1. Some changes in range default partition comments given by Beena offline.
2. I have shifted definition of macro partition_bound_has_default to next patch
as it wasn't used in this patch at all.

0003:
Extend default partitioning support to allow addition of new partitions after
default partition is created/attached.

0004:
This patch introduces code to check if the scanning of default partition child
can be skipped if it's constraints are proven.

0005:
Documentation.



Regards,
Jeevan Ladhe

Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi Ashutosh,

I have tried to address your comments in V27 patch series[1].
Please find my comments inlined.


>>
>> The current set of patches contains 6 patches as below:
>>
>> 0001:
>> Refactoring existing ATExecAttachPartition  code so that it can be used
>> for
>> default partitioning as well
 
If I understand correctly these comments refer to patch 0001 of V25_rebase
series, which is related to "Fix assumptions that get_qual_from_partbound()"
and not refactoring. This is patch 0001 in V27 now.

  * Returns an expression tree describing the passed-in relation's partition
- * constraint.
+ * constraint. If there are no partition constraints returns NULL e.g. in case
+ * default partition is the only partition.
The first sentence uses singular constraint. The second uses plural. Given that
partition bounds together form a single constraint we should use singular
constraint in the second sentence as well.

I have changed the wording now.
 

Do we want to add a similar comment in the prologue of
generate_partition_qual(). The current wording there seems to cover this case,
but do we want to explicitly mention this case?

I have added a comment there.
 

+        if (!and_args)
+            result = NULL;
While this is correct, I am increasingly seeing (and_args != NIL) usage.

Changed this to:
+       if (and_args == NIL)
+           result = NULL; 


get_partition_qual_relid() is called from pg_get_partition_constraintdef(),
    constr_expr = get_partition_qual_relid(relationId);

    /* Quick exit if not a partition */
    if (constr_expr == NULL)
        PG_RETURN_NULL();
The comment is now wrong since a default partition may have no constraints. May
be rewrite it as simply, "Quick exit if no partition constraint."

Fixed.
 

generate_partition_qual() has three callers and all of them are capable of
handling NIL partition constraint for default partition. Maybe it's better to
mention in the commit message that we have checked that the callers of
this function can handle NIL partition constraint.

Added in commit message.
 
>>
>> 0002:
>> This patch teaches the partitioning code to handle the NIL returned by
>> get_qual_for_list().
>> This is needed because a default partition will not have any constraints
>> in case
>> it is the only partition of its parent.

Comments below refer to patch 0002 in V25_rebase(0003 in V25), which
adds basic support for default partition, which is now 0002 in V27.
 
If the partition being dropped is the default partition,
heap_drop_with_catalog() locks default partition twice, once as the default
partition and the second time as the partition being dropped. So, it will be
counted as locked twice. There doesn't seem to be any harm in this, since the
lock will be held till the transaction ends, by which time all the locks will be
released.

 Fixed.


Same is the case with cache invalidation message. If we are dropping default
partition, the cache invalidation message on "default partition" is not
required. Again this might be harmless, but better to avoid it.
 
Fixed.
 
A similar problem exists with ATExecDetachPartition(): the default partition
will be locked twice if it's being detached.

In ATExecDetachPartition() we do not have the OID of the relation being
detached available at the time we lock the default partition. Moreover, here
we are taking an exclusive lock on the default partition OID and a share lock
on the partition being detached. As you correctly said in your earlier comment,
it will be counted as locked twice, which to me also seems harmless. As these
locks are held till the transaction commits, nobody else is supposed to
release them in between. I am not able to visualize a problem here, but I have
still tried to avoid locking the default partition table twice; please review
the changes and let me know your thoughts.
 
+        /*
+         * If this is a default partition, pg_partitioned_table must have it's
+         * OID as value of 'partdefid' for it's parent (i.e. rel) entry.
+         */
+        if (castNode(PartitionBoundSpec, boundspec)->is_default)
+        {
+            Oid            partdefid;
+
+            partdefid = get_default_partition_oid(RelationGetRelid(rel));
+            Assert(partdefid == inhrelid);
+        }
An accidental change or database corruption may change the default
partition OID in pg_partitioned_table, and an Assert won't help in such a case.
Maybe we should throw an error or at least report a warning. If we throw an
error, the table will become useless (or even the database will become useless
if RelationBuildPartitionDesc is called from RelationCacheInitializePhase3() on
such a corrupted table). To avoid that we may raise a warning.

I have added a warning here instead of Assert.
 
I am wondering whether we could avoid the call to get_default_partition_oid() in
the above block, thus avoiding a sys cache lookup. The sys cache search
shouldn't be expensive since the cache should already have that entry, but
still if we can avoid it, we save some CPU cycles. The default partition OID is
stored in pg_partitioned_table catalog, which is looked up in
RelationGetPartitionKey(), a function which precedes RelationGetPartitionDesc()
everywhere. What if that RelationGetPartitionKey() also returns the default
partition OID and the common caller passes it to RelationGetPartitionDesc()?

The purpose here is to cross-check the relid against the partdefid stored in
the pg_partitioned_table catalog. Though it's going to be the same in the
parent's cache, I think it's better that we retrieve it from the catalog
syscache. Further, RelationGetPartitionKey() is a macro and not a function, so
modifying the existing simple macro for this reason does not sound like a good
idea to me. Having said this, I am open to ideas here.


+    /* A partition cannot be attached if there exists a default partition */
+    defaultPartOid = get_default_partition_oid(RelationGetRelid(rel));
+    if (OidIsValid(defaultPartOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                 errmsg("cannot attach a new partition to table \"%s\" having a default partition",
+                        RelationGetRelationName(rel))));
get_default_partition_oid() searches the catalogs, which is not needed when we
have relation descriptor of the partitioned table (to which a new partition is
being attached). You should get the default partition OID from partition
descriptor. That will be cheaper.
 
Something like following can be done here:
    /* A partition cannot be attached if there exists a default partition */
    if (partition_bound_has_default(rel->partdesc->boundinfo))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                 errmsg("cannot attach a new partition to table \"%s\" having a default partition",
                        RelationGetRelationName(rel))));

But, partition_bound_has_default() is defined in partition.c and not in
partition.h. This is done that way because boundinfo is not available in
partition.h. Further, this piece of code is removed in the next patch, where we
extend default partition support to add/attach a partition even when a default
partition exists. So I don't see much need for a correction here.

Another way to get around this: we can define another version of
get_default_partition_oid(), something like get_default_partition_oid_from_parent_rel(),
in partition.c, which looks in the relcache instead of the catalog and returns
the OID of the default partition. Or get_default_partition_oid() could accept
both the parent OID and the parent 'Relation' rel; if rel is not null, look in
the relcache and return, else search the catalog using the OID.
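
The second alternative could be sketched roughly as below. This is a standalone toy model under stated assumptions, not the patch's code: ParentRel, CatalogRow, the in-memory catalog array, and default_partition_oid() are all hypothetical, and Oid/InvalidOid are redefined locally so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical cached descriptor of a partitioned table. */
typedef struct ParentRel
{
    Oid relid;
    Oid cached_default_oid;  /* InvalidOid if there is no default partition */
} ParentRel;

/* Toy "catalog": parent OID -> default partition OID, plus a lookup counter. */
typedef struct CatalogRow
{
    Oid parent;
    Oid partdefid;
} CatalogRow;

static const CatalogRow catalog[] = { {100, 101}, {200, InvalidOid} };
static int catalog_lookups;

static Oid catalog_default_oid(Oid parent_oid)
{
    catalog_lookups++;
    for (size_t i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++)
        if (catalog[i].parent == parent_oid)
            return catalog[i].partdefid;
    return InvalidOid;
}

/* Number of catalog searches performed so far. */
int catalog_lookup_count(void)
{
    return catalog_lookups;
}

/*
 * Return the default partition OID of a parent. If the caller already holds
 * a cached descriptor (rel != NULL), answer from it; otherwise search the
 * catalog by OID.
 */
Oid default_partition_oid(Oid parent_oid, const ParentRel *rel)
{
    if (rel != NULL)
    {
        assert(rel->relid == parent_oid);
        return rel->cached_default_oid;
    }
    return catalog_default_oid(parent_oid);
}
```

Callers that already have the parent open pass its descriptor and avoid the catalog search; callers that only know the OID pass NULL for rel.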
 

+                /* If there isn't any constraint, show that explicitly */
+                if (partconstraintdef[0] == '\0')
+                    printfPQExpBuffer(&tmpbuf, _("No partition constraint"));
I think we need to change the way we set partconstraintdef at
            if (PQnfields(result) == 3)
                partconstraintdef = PQgetvalue(result, 0, 2);
Before this commit, the constraint could never be NULL, so this code worked; now
that the constraint can be NULL, we need to check whether the 3rd value is NULL
using PQgetisnull() and assign a value only when it's not NULL.
 
I have changed this to:
-                       if (PQnfields(result) == 3)
+                       if (PQnfields(result) == 3 && !PQgetisnull(result, 0, 2))
                                partconstraintdef = PQgetvalue(result, 0, 2);

Please let me know if the change looks good to you.
 
+-- test adding default partition as first partition accepts any value including
grammar, reword as "test that a default partition added as the first
partition accepts any value including".

changed the wording in the comment as suggested.
 

>>
>> 0003:
>> Support for default partition with the restriction of preventing addition
>> of any
>> new partition after default partition. This is a merge of 0003 and 0004 in
>> V24 series.
Comments below rather seem to be for the patch that extends default partition
such that new partition can be added even when default partition exists. This
is 0003 patch in V27.
 

The commit message of this patch has the following line, which no longer applies
to patch 0001. Maybe you want to remove this line or update the patch number.
3. This patch uses the refactored functions created in patch 0001
in this series.
Similarly the credit line refers to patch 0001. That too needs correction.

Fixed commit message.
 

- * Also, invalidate the parent's relcache, so that the next rebuild will load
- * the new partition's info into its partition descriptor.
+ * Also, invalidate the parent's relcache entry, so that the next rebuild will
+ * load he new partition's info into its partition descriptor.  If there is a
+ * default partition, we must invalidate its relcache entry as well.
Replacing "relcache" with "relcache entry" in the first sentence  may be a good
idea, but is unrelated to this patch. I would leave that change aside and just
add comment about default partition.

Agree. Fixed. 


+    /*
+     * The partition constraint for the default partition depends on the
+     * partition bounds of every other partition, so we must invalidate the
+     * relcache entry for that partition every time a partition is added or
+     * removed.
+     */
+    defaultPartOid = get_default_partition_oid(RelationGetRelid(parent));
+    if (OidIsValid(defaultPartOid))
+        CacheInvalidateRelcacheByRelid(defaultPartOid);
Again, since we have access to the parent's relcache, we should get the default
partition OID from relcache rather than catalogs.


This change is in heap.c, as I said above we would need to have a
different version of get_default_partition_oid() to address this.
Your thoughts?

I haven't gone through the full patch yet, so there may be more
comments here. There is some duplication of code in
check_default_allows_bound() and ValidatePartitionConstraints() to
scan the children of partition being validated. The difference is that
the first one scans the relations in the same function and the second
adds them to work queue. Maybe we could use
ValidatePartitionConstraints() either to scan the relation or to add it
to the queue, based on some input flag, perhaps the wqueue argument
itself. But I haven't thought through this completely. Any thoughts?

check_default_allows_bound() is called only from DefineRelation(),
and not for the ALTER command. I am not really sure how we can use the
work queue for the CREATE command.



Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Rajkumar Raghuwanshi
Date:
On Wed, Sep 6, 2017 at 5:25 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
Hi,

Attached is the rebased set of patches.
Robert has committed[1] patch 0001 in V26 series, hence the patch numbering
in V27 is now decreased by 1 for each patch as compared to V26.

Hi,

I have applied the v27 patches and noticed the following while testing.

Observation: in the partitioned table below, d1's partition constraint does not
allow NULL in column b, yet I am able to insert it.

steps to reproduce:
create table d0 (a int, b int) partition by range(a,b);
create table d1 partition of d0 for values from (0,0) to (maxvalue,maxvalue);

postgres=# insert into d0 values (0,null);
INSERT 0 1
postgres=# \d+ d1
                                    Table "public.d1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              |
 b      | integer |           |          |         | plain   |              |
Partition of: d0 FOR VALUES FROM (0, 0) TO (MAXVALUE, MAXVALUE)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((a > 0) OR ((a = 0) AND (b >= 0))))

postgres=# select tableoid::regclass,* from d0;
 tableoid | a | b
----------+---+---
 d1       | 0 | 
(1 row)
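
To spell out why the row (0, NULL) should have been rejected, here is a small standalone C sketch of the constraint that \d+ prints for d1. It is only a toy model: NullableInt and d1_constraint_holds() are hypothetical names made up for illustration and are not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a nullable integer column value. */
typedef struct NullableInt
{
    bool isnull;
    int  value;     /* only meaningful when isnull is false */
} NullableInt;

/*
 * Toy evaluation of the partition constraint shown by \d+ d1:
 *   (a IS NOT NULL) AND (b IS NOT NULL)
 *   AND ((a > 0) OR ((a = 0) AND (b >= 0)))
 * Returns true only when the row satisfies the constraint.
 */
bool
d1_constraint_holds(NullableInt a, NullableInt b)
{
    /* The two IS NOT NULL terms reject any NULL key column up front. */
    if (a.isnull || b.isnull)
        return false;

    /* Row-wise lower bound (0, 0); the upper bound is MAXVALUE. */
    return a.value > 0 || (a.value == 0 && b.value >= 0);
}
```

Calling d1_constraint_holds() with a = 0 and b marked NULL returns false, which is what the constraint text says, so tuple routing and the partition constraint disagree in the session above.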


Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
On Thu, Sep 7, 2017 at 3:15 PM, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
On Wed, Sep 6, 2017 at 5:25 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
Hi,

Attached is the rebased set of patches.
Robert has committed[1] patch 0001 in V26 series, hence the patch numbering
in V27 is now decreased by 1 for each patch as compared to V26.

Hi,

I have applied the v27 patches and, while testing, made the observation below.

Observation: in the partitioned table below, d1's partition constraint does not allow NULL in column b,
but I am able to insert it.

steps to reproduce:
create table d0 (a int, b int) partition by range(a,b);
create table d1 partition of d0 for values from (0,0) to (maxvalue,maxvalue);

postgres=# insert into d0 values (0,null);
INSERT 0 1
postgres=# \d+ d1
                                    Table "public.d1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              |
 b      | integer |           |          |         | plain   |              |
Partition of: d0 FOR VALUES FROM (0, 0) TO (MAXVALUE, MAXVALUE)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((a > 0) OR ((a = 0) AND (b >= 0))))

postgres=# select tableoid::regclass,* from d0;
 tableoid | a | b
----------+---+---
 d1       | 0 | 
(1 row)

Good catch. Thanks Rajkumar.
This seems to be happening because of the following changes made in
get_partition_for_tuple() for default range partition support as part of patch 0002.

@@ -1971,27 +2204,10 @@ get_partition_for_tuple(PartitionDispatch *pd,
        ecxt->ecxt_scantuple = slot;
        FormPartitionKeyDatum(parent, slot, estate, values, isnull);

-       if (key->strategy == PARTITION_STRATEGY_RANGE)
-       {
-           /*
-            * Since we cannot route tuples with NULL partition keys through a
-            * range-partitioned table, simply return that no partition exists
-            */
-           for (i = 0; i < key->partnatts; i++)
-           {
-               if (isnull[i])
-               {
-                   *failed_at = parent;
-                   *failed_slot = slot;
-                   result = -1;
-                   goto error_exit;
-               }
-           }
-       }

Instead of getting rid of this check entirely, we should keep it for the case
where there is no default partition: then we still do not have any range
partition to route a null partition key to, and the routing should fail.

I will work on a fix and send a patch shortly.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:

I will work on a fix and send a patch shortly.


Attached is the V28 patch that fixes the issue reported by Rajkumar.
The patch series is otherwise exactly the same as the V27 series[1].
The fix is in patch 0002, and the macro partition_bound_has_default() is
moved back into 0002 from 0003, as the fix needed to use it.

The fix is basically in get_partition_for_tuple() as below:

@@ -1973,30 +2209,46 @@ get_partition_for_tuple(PartitionDispatch *pd,

        if (key->strategy == PARTITION_STRATEGY_RANGE)
        {
-           /*
-            * Since we cannot route tuples with NULL partition keys through a
-            * range-partitioned table, simply return that no partition exists
-            */ 
            for (i = 0; i < key->partnatts; i++)
            {
                if (isnull[i])
                {
-                   *failed_at = parent;
-                   *failed_slot = slot;
-                   result = -1;
-                   goto error_exit;
+                   /*
+                    * We cannot route tuples with NULL partition keys through
+                    * a range-partitioned table if it does not have a default
+                    * partition. In such case simply return that no partition
+                    * exists for routing null partition key.
+                    */
+                   if (!partition_bound_has_default(partdesc->boundinfo))
+                   {
+                       *failed_at = parent;
+                       *failed_slot = slot;
+                       result = -1;
+                       goto error_exit;
+                   }
+                   else
+                   {
+                       /*
+                        * If there is any null partition key, it would be
+                        * routed to the default partition.
+                        */
+                       range_partkey_has_null = true;
+                       break;
+                   }
                }
            }
        }

        /*
-        * A null partition key is only acceptable if null-accepting list
-        * partition exists.
+        * If partition strategy is LIST and this is a null partition key,
+        * route it to the null-accepting partition. Otherwise, route by
+        * searching the array of partition bounds.
         */
        cur_index = -1;
-       if (isnull[0] && partition_bound_accepts_nulls(partdesc->boundinfo))
+       if (key->strategy == PARTITION_STRATEGY_LIST && isnull[0] &&
+           partition_bound_accepts_nulls(partdesc->boundinfo))
            cur_index = partdesc->boundinfo->null_index;
-       else if (!isnull[0])
+       else if (!range_partkey_has_null && !isnull[0])
        {


The fix would be much easier if the refactoring patch 0001 by Amul in hash
partitioning thread[2] is committed.
The current code mixes the routing for list and range partitioning, which makes
it difficult to understand and to fix any issues that come up. I believe it would
be a good idea to keep the logic separate for the two partitioning strategies.
Thoughts, views?
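
For anyone who wants the routing rule without reading the hunk, the NULL handling can be modelled in a few lines of standalone C. This is a sketch under stated assumptions, not the patch itself: ToyBoundInfo, toy_route_null_key() and the TOY_* constants are hypothetical names, and the LIST fallback to the default partition is my assumption since the quoted hunk does not show that part.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_STRATEGY_LIST, TOY_STRATEGY_RANGE } ToyStrategy;

/* Hypothetical, simplified stand-in for the partition bound info. */
typedef struct ToyBoundInfo
{
    ToyStrategy strategy;
    int         null_index;     /* LIST: null-accepting partition, -1 if none */
    int         default_index;  /* default partition, -1 if none */
} ToyBoundInfo;

#define TOY_ROUTE_FAILED   (-1)  /* no partition can take the tuple */
#define TOY_SEARCH_BOUNDS  (-2)  /* no NULL involved: search the bounds array */

/*
 * Decide where a tuple goes based only on NULLs in its partition key.
 * Returns a partition index, TOY_ROUTE_FAILED, or TOY_SEARCH_BOUNDS.
 */
int
toy_route_null_key(const ToyBoundInfo *bi, const bool *isnull, int natts)
{
    int i;

    assert(natts > 0);

    if (bi->strategy == TOY_STRATEGY_RANGE)
    {
        /* Any NULL key column: default partition if present, else fail. */
        for (i = 0; i < natts; i++)
        {
            if (isnull[i])
                return bi->default_index >= 0 ? bi->default_index
                                              : TOY_ROUTE_FAILED;
        }
        return TOY_SEARCH_BOUNDS;
    }

    /* LIST partitioning has a single key column. */
    if (isnull[0])
    {
        if (bi->null_index >= 0)
            return bi->null_index;
        /* Assumption: fall back to the default partition when it exists. */
        return bi->default_index >= 0 ? bi->default_index : TOY_ROUTE_FAILED;
    }
    return TOY_SEARCH_BOUNDS;
}
```

With a range-partitioned table that has no default partition, a key such as (0, NULL) yields TOY_ROUTE_FAILED here, which matches the behaviour the fix is meant to restore for Rajkumar's test case.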


Regards,
Jeevan Ladhe
Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

On Thu, Sep 7, 2017 at 5:43 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:

I will work on a fix and send a patch shortly.


Attached is the V28 patch that fixes the issue reported by Rajkumar.
The patch series is otherwise exactly the same as the V27 series[1].
The fix is in patch 0002, and the macro partition_bound_has_default() is
moved back into 0002 from 0003, as the fix needed to use it.

 
Somehow only 3 patches are there in the tar.
Please find the correct tar attached.

Regards,
Jeevan Ladhe
Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Ashutosh Bapat
Date:
On Wed, Sep 6, 2017 at 5:50 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>
>>
>> I am wondering whether we could avoid call to get_default_partition_oid()
>> in
>> the above block, thus avoiding a sys cache lookup. The sys cache search
>> shouldn't be expensive since the cache should already have that entry, but
>> still if we can avoid it, we save some CPU cycles. The default partition
>> OID is
>> stored in pg_partitioned_table catalog, which is looked up in
>> RelationGetPartitionKey(), a function which precedes
>> RelationGetPartitionDesc()
>> everywhere. What if that RelationGetPartitionKey() also returns the
>> default
>> partition OID and the common caller passes it to
>> RelationGetPartitionDesc()?.
>
>
> The purpose here is to cross check the relid with partdefid stored in
> catalog
> pg_partitioned_table, though its going to be the same in the parents cache,
> I
> think its better that we retrieve it from the catalog syscache.
> Further, RelationGetPartitionKey() is a macro and not a function, so
> modifying
> the existing simple macro for this reason does not sound a good idea to me.
> Having said this I am open to ideas here.

Sorry, I meant RelationBuildPartitionKey() and
RelationBuildPartitionDesc() instead of RelationGetPartitionKey() and
RelationGetPartitionDesc() resp.

>
>>
>> +    /* A partition cannot be attached if there exists a default partition
>> */
>> +    defaultPartOid = get_default_partition_oid(RelationGetRelid(rel));
>> +    if (OidIsValid(defaultPartOid))
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>> +                 errmsg("cannot attach a new partition to table
>> \"%s\" having a default partition",
>> +                        RelationGetRelationName(rel))));
>> get_default_partition_oid() searches the catalogs, which is not needed
>> when we
>> have relation descriptor of the partitioned table (to which a new
>> partition is
>> being attached). You should get the default partition OID from partition
>> descriptor. That will be cheaper.
>
>
> Something like following can be done here:
>     /* A partition cannot be attached if there exists a default partition */
>     if (partition_bound_has_default(rel->partdesc->boundinfo))
>         ereport(ERROR,
>                 (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>                  errmsg("cannot attach a new partition to table \"%s\"
> having a default partition",
>                         RelationGetRelationName(rel))));
>
> But, partition_bound_has_default() is defined in partition.c and not in
> partition.h. This is done that way because boundinfo is not available in
> partition.h. Further, this piece of code is removed in next patch where we
> extend default partition support to add/attach partition even when default
> partition exists. So, to me I don’t see much of the correction issue here.

If the code is being removed, I don't think we should sweat too much
about it. Sorry for the noise.

>
> Another way to get around this is, we can define another version of
> get_default_partition_oid() something like
> get_default_partition_oid_from_parent_rel()
> in partition.c which looks around in relcache instead of catalog and returns
> the
> oid of default partition, or get_default_partition_oid() accepts both parent
> OID,
> and parent ‘Relation’ rel, if rel is not null look into relcache and return,
> else search from catalog using OID.

I think we should define a function to return default partition OID
from partition descriptor and make it extern. Define a wrapper which
accepts Relation and calls this function to get default
partition OID from partition descriptor. The wrapper will be called
only on an open Relation, wherever it's available.


>
>> I haven't gone through the full patch yet, so there may be more
>> comments here. There is some duplication of code in
>> check_default_allows_bound() and ValidatePartitionConstraints() to
>> scan the children of partition being validated. The difference is that
>> the first one scans the relations in the same function and the second
>> adds them to work queue. May be we could use
>> ValidatePartitionConstraints() to scan the relation or add to the
>> queue based on some input flag may be wqueue argument itself. But I
>> haven't thought through this completely. Any thoughts?
>
>
> check_default_allows_bound() is called only from DefineRelation(),
> and not for alter command. I am not really sure how can we use
> work queue for create command.


No, we shouldn't use work queue for CREATE command. We should extract
the common code into a function to be called from
check_default_allows_bound() and ValidatePartitionConstraints(). To
that function we pass a flag (or the work queue argument itself),
which decides whether to add a work queue item or scan the relation
directly.
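
A rough standalone C sketch of that shape is below. It is a toy model under my own assumptions: ToyWorkQueue, ToyScanLog and toy_validate_children() are hypothetical names, relations are plain integers, and "scanning" is modelled as recording the relid in a log.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ITEMS 16

/* Hypothetical stand-in for the ALTER TABLE work queue. */
typedef struct ToyWorkQueue
{
    int relids[TOY_MAX_ITEMS];
    int len;
} ToyWorkQueue;

/* Records which relations were scanned immediately. */
typedef struct ToyScanLog
{
    int scanned[TOY_MAX_ITEMS];
    int len;
} ToyScanLog;

/*
 * Common code for both callers: walk the children once and either
 * queue each one (wqueue != NULL, the ALTER path) or scan it right
 * away (wqueue == NULL, the CREATE path).
 */
void
toy_validate_children(const int *child_relids, int nchildren,
                      ToyWorkQueue *wqueue, ToyScanLog *scan_log)
{
    int i;

    for (i = 0; i < nchildren; i++)
    {
        if (wqueue != NULL)
        {
            /* Defer the scan: just remember the relation. */
            assert(wqueue->len < TOY_MAX_ITEMS);
            wqueue->relids[wqueue->len++] = child_relids[i];
        }
        else
        {
            /* No queue available: validate immediately. */
            assert(scan_log->len < TOY_MAX_ITEMS);
            scan_log->scanned[scan_log->len++] = child_relids[i];
        }
    }
}
```

Using the queue pointer itself as the flag keeps the signature small; a separate boolean would work as well if a caller ever needs to hold a queue but still scan directly.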

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Thu, Sep 7, 2017 at 8:13 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> The fix would be much easier if the refactoring patch 0001 by Amul in hash
> partitioning thread[2] is committed.

Done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
Hi,

On Thu, Sep 7, 2017 at 6:27 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Wed, Sep 6, 2017 at 5:50 PM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>
>>
>> I am wondering whether we could avoid call to get_default_partition_oid()
>> in
>> the above block, thus avoiding a sys cache lookup. The sys cache search
>> shouldn't be expensive since the cache should already have that entry, but
>> still if we can avoid it, we save some CPU cycles. The default partition
>> OID is
>> stored in pg_partitioned_table catalog, which is looked up in
>> RelationGetPartitionKey(), a function which precedes
>> RelationGetPartitionDesc()
>> everywhere. What if that RelationGetPartitionKey() also returns the
>> default
>> partition OID and the common caller passes it to
>> RelationGetPartitionDesc()?.
>
>
> The purpose here is to cross check the relid with partdefid stored in
> catalog
> pg_partitioned_table, though its going to be the same in the parents cache,
> I
> think its better that we retrieve it from the catalog syscache.
> Further, RelationGetPartitionKey() is a macro and not a function, so
> modifying
> the existing simple macro for this reason does not sound a good idea to me.
> Having said this I am open to ideas here.

Sorry, I meant RelationBuildPartitionKey() and
RelationBuildPartitionDesc() instead of RelationGetPartitionKey() and
RelationGetPartitionDesc() resp.


I get your concern here that we are scanning the pg_partitioned_table syscache
twice when we are building a partition descriptor; first in
RelationBuildPartitionKey() and next in RelationBuildPartitionDesc() when we
call get_default_partition_oid().

To avoid this, I can think of the following three solutions:
1.
Introduce a default partition OID field in PartitionKey structure, and store the
partdefid while we scan pg_partitioned_table syscache in function
RelationBuildPartitionKey(). RelationBuildPartitionDesc() can later retrieve
this field from PartitionKey.

2.
Return the default OID from RelationBuildPartitionKey(), and pass that as a parameter to
RelationBuildPartitionDesc().

3.
Introduce an OID out parameter to function RelationBuildPartitionKey() which would store
the partdefid, and pass that as a parameter to RelationBuildPartitionDesc().

I really do not think any of the above solutions is very neat, organized, or
intuitive. While I understand that the syscache would be scanned twice if we
don’t fix this, we are not building a new cache here for pg_partitioned_table,
we are just scanning it. Moreover, if there is heavy OLTP activity on this
partitioned table, we can expect its relcache entry to be present most of the
time, so RelationBuildPartitionDesc() won’t run often for the same table.

I guess it would be worth getting opinions from others (besides me and Ashutosh)
here as well.
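
To make the trade-off concrete, here is a standalone C toy of option 3 next to the current double lookup. Everything in it is hypothetical and only for illustration (ToyOid, toy_catalog, toy_lookup_count and the toy_build_* functions); it just counts how many times the "catalog" is consulted.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int ToyOid;
#define TOY_INVALID_OID ((ToyOid) 0)

/* Hypothetical stand-in for a pg_partitioned_table row. */
typedef struct ToyCatalogRow
{
    ToyOid partrelid;
    ToyOid partdefid;
} ToyCatalogRow;

static const ToyCatalogRow toy_catalog[] = {
    {100, 105},
    {200, TOY_INVALID_OID}
};

int toy_lookup_count = 0;   /* number of catalog lookups so far */

const ToyCatalogRow *
toy_catalog_lookup(ToyOid relid)
{
    size_t i;

    toy_lookup_count++;
    for (i = 0; i < sizeof(toy_catalog) / sizeof(toy_catalog[0]); i++)
    {
        if (toy_catalog[i].partrelid == relid)
            return &toy_catalog[i];
    }
    return NULL;
}

/* Option 3: the key builder hands back partdefid via an out parameter. */
void
toy_build_partition_key(ToyOid relid, ToyOid *defid_out)
{
    const ToyCatalogRow *row = toy_catalog_lookup(relid);

    assert(row != NULL);
    *defid_out = row->partdefid;
}

/* Descriptor builder that trusts the caller-supplied OID: no lookup. */
ToyOid
toy_build_partition_desc(ToyOid defid)
{
    return defid;
}

/* Current approach: the descriptor builder looks the OID up again. */
ToyOid
toy_build_partition_desc_with_lookup(ToyOid relid)
{
    const ToyCatalogRow *row = toy_catalog_lookup(relid);

    return row != NULL ? row->partdefid : TOY_INVALID_OID;
}
```

In this model the out-parameter variant costs one lookup per build and the current approach costs two, at the price of threading an extra value through both builders.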
 
>
>>
>> +    /* A partition cannot be attached if there exists a default partition
>> */
>> +    defaultPartOid = get_default_partition_oid(RelationGetRelid(rel));
>> +    if (OidIsValid(defaultPartOid))
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>> +                 errmsg("cannot attach a new partition to table
>> \"%s\" having a default partition",
>> +                        RelationGetRelationName(rel))));
>> get_default_partition_oid() searches the catalogs, which is not needed
>> when we
>> have relation descriptor of the partitioned table (to which a new
>> partition is
>> being attached). You should get the default partition OID from partition
>> descriptor. That will be cheaper.
>
>
> Something like following can be done here:
>     /* A partition cannot be attached if there exists a default partition */
>     if (partition_bound_has_default(rel->partdesc->boundinfo))
>         ereport(ERROR,
>                 (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>                  errmsg("cannot attach a new partition to table \"%s\"
> having a default partition",
>                         RelationGetRelationName(rel))));
>
> But, partition_bound_has_default() is defined in partition.c and not in
> partition.h. This is done that way because boundinfo is not available in
> partition.h. Further, this piece of code is removed in next patch where we
> extend default partition support to add/attach partition even when default
> partition exists. So, to me I don’t see much of the correction issue here.

If the code is being removed, I don't think we should sweat too much
about it. Sorry for the noise.

>
> Another way to get around this is, we can define another version of
> get_default_partition_oid() something like
> get_default_partition_oid_from_parent_rel()
> in partition.c which looks around in relcache instead of catalog and returns
> the
> oid of default partition, or get_default_partition_oid() accepts both parent
> OID,
> and parent ‘Relation’ rel, if rel is not null look into relcache and return,
> else search from catalog using OID.

I think we should define a function to return default partition OID
from partition descriptor and make it extern. Define a wrapper which
accepts Relation and calls this function to get default
partition OID from partition descriptor. The wrapper will be called
only on an open Relation, wherever it's available.


I have introduced a new function partdesc_get_defpart_oid() to
retrieve the default partition OID from the partition descriptor, and used it
wherever we have the relation's partition descriptor available.
Also, I have renamed the existing function get_default_partition_oid()
to partition_catalog_get_defpart_oid().
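
The idea of reading the default partition OID from the descriptor, plus a thin wrapper for an open relation, can be sketched in standalone C as below. This is an illustration with hypothetical toy types (ToyBoundInfo, ToyPartitionDesc, ToyRelation) and toy function names; it is not the code in the attached patch.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int ToyOid;
#define TOY_INVALID_OID ((ToyOid) 0)

/* Hypothetical, simplified stand-ins for the relcache structures. */
typedef struct ToyBoundInfo
{
    int default_index;          /* index of default partition, -1 if none */
} ToyBoundInfo;

typedef struct ToyPartitionDesc
{
    int           nparts;
    ToyOid       *oids;         /* partition OIDs, in bound order */
    ToyBoundInfo *boundinfo;
} ToyPartitionDesc;

typedef struct ToyRelation
{
    ToyPartitionDesc *partdesc; /* NULL if not partitioned */
} ToyRelation;

/* Read the default partition OID straight from the descriptor. */
ToyOid
toy_partdesc_get_defpart_oid(const ToyPartitionDesc *partdesc)
{
    if (partdesc != NULL && partdesc->boundinfo != NULL &&
        partdesc->boundinfo->default_index >= 0)
        return partdesc->oids[partdesc->boundinfo->default_index];

    return TOY_INVALID_OID;
}

/* Wrapper for callers that already hold an open relation. */
ToyOid
toy_rel_get_defpart_oid(const ToyRelation *rel)
{
    assert(rel != NULL);
    return toy_partdesc_get_defpart_oid(rel->partdesc);
}
```

Callers that only have the parent OID would still go through the catalog-based function; the descriptor-based one needs no catalog access at all.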


>
>> I haven't gone through the full patch yet, so there may be more
>> comments here. There is some duplication of code in
>> check_default_allows_bound() and ValidatePartitionConstraints() to
>> scan the children of partition being validated. The difference is that
>> the first one scans the relations in the same function and the second
>> adds them to work queue. May be we could use
>> ValidatePartitionConstraints() to scan the relation or add to the
>> queue based on some input flag may be wqueue argument itself. But I
>> haven't thought through this completely. Any thoughts?
>
>
> check_default_allows_bound() is called only from DefineRelation(),
> and not for alter command. I am not really sure how can we use
> work queue for create command.


No, we shouldn't use work queue for CREATE command. We should extract
the common code into a function to be called from
check_default_allows_bound() and ValidatePartitionConstraints(). To
that function we pass a flag (or the work queue argument itself),
which decides whether to add a work queue item or scan the relation
directly.
 
I still need to look into this.

Regards,
Jeevan Ladhe 
Attachment

Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:
On Fri, Sep 8, 2017 at 6:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 7, 2017 at 8:13 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> The fix would be much easier if the refactoring patch 0001 by Amul in hash
> partitioning thread[2] is committed.

Done.

Thanks Robert for taking care of this.
My V29 patch series[1] is based on this commit now.


Regards,
Jeevan Ladhe 

Re: [HACKERS] Adding support for Default partition in partitioning

From
Robert Haas
Date:
On Fri, Sep 8, 2017 at 10:08 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Thanks Robert for taking care of this.
> My V29 patch series[1] is based on this commit now.

Committed 0001-0003, 0005 with assorted modifications, mostly
cosmetic, but with some actual changes to describeOneTableDetails and
ATExecAttachPartition and minor additions to the regression tests.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding support for Default partition in partitioning

From
Jeevan Ladhe
Date:

On Sat, Sep 9, 2017 at 3:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 8, 2017 at 10:08 AM, Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Thanks Robert for taking care of this.
> My V29 patch series[1] is based on this commit now.

Committed 0001-0003, 0005 with assorted modifications, mostly
cosmetic, but with some actual changes to describeOneTableDetails and
ATExecAttachPartition and minor additions to the regression tests.


Thanks Robert!!