Re: [HACKERS] Adding support for Default partition in partitioning - Mailing list pgsql-hackers

From Jeevan Ladhe
Subject Re: [HACKERS] Adding support for Default partition in partitioning
Date
Msg-id CAOgcT0Ph=MJZTOM+e46QCzVtsrq3B_e6vyPtUJWJg49+kQABRQ@mail.gmail.com
Whole thread Raw
In response to [HACKERS] Adding support for Default partition in partitioning  (Rahila Syed <rahilasyed90@gmail.com>)
Responses Re: [HACKERS] Adding support for Default partition in partitioning
Re: [HACKERS] Adding support for Default partition in partitioning
Re: [HACKERS] Adding support for Default partition in partitioning
List pgsql-hackers
Hi Ashutosh,

Thanks for the detailed review.

Also, please find my feedback on your comments in-lined, I also addressed
the comments given by Robert in attached patch:

On Sat, Jun 3, 2017 at 5:13 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
Here's some detailed review of the code.

@@ -1883,6 +1883,15 @@ heap_drop_with_catalog(Oid relid)
     if (OidIsValid(parentOid))
     {
         /*
+         * Default partition constraints are constructed run-time from the
+         * constraints of its siblings(basically by negating them), so any
+         * change in the siblings needs to rebuild the constraints of the
+         * default partition. So, invalidate the sibling default partition's
+         * relcache.
+         */
+        InvalidateDefaultPartitionRelcache(parentOid);
+
Do we need a lock on the default partition for doing this? A query might be
scanning the default partition directly and we will invalidate the relcache
underneath it. What if two partitions are being dropped simultaneously and
change default constraints simultaneously. Probably the lock on the parent
helps there, but need to check it. What if the default partition cache is
invalidated because partition gets added/dropped to the default partition
itself. If we need a lock on the default partition, we will need to
check the order in which we should be obtaining the locks so as to avoid
deadlocks.
 
Done. I have taken a lock on default partition after acquiring a lock on parent
relation where ever applicable.
 
This also means that we have to test PREPARED statements involving
default partition. Any addition/deletion/attach/detach of other partition
should invalidate those cached statements.

Will add this in next version of patch.
 
+                        if (partition_bound_has_default(boundinfo))
+                        {
+                            overlap = true;
+                            with = boundinfo->default_index;
+                        }
You could possibly rewrite this as
overlap = partition_bound_has_default(boundinfo);
with = boundinfo->default_index;
that would save one indentation and a conditional jump.

Done
 
+    if (partdesc->nparts > 0 && partition_bound_has_default(boundinfo))
+        check_default_allows_bound(parent, spec);
If the table has a default partition, nparts > 0, nparts > 0 check looks
redundant. The comments above should also explain that this check doesn't
trigger when a default partition is added since we don't expect an existing
default partition in such a case.

The check nparts > 0, is needed to make sure that the boundinfo is non-null,
i.e. to confirm that there exists at least one partition so that
partition_bound_has_default() does not result in segmentation fault.
I have changed the condition as below to make it more intuitive:
if (boundinfo && partition_bound_has_default(boundinfo))
Also, I have updated the comment.
 
+ * Checks if there exists any row in the default partition that passes the
+ * check for constraints of new partition, if any reports an error.
grammar two conflicting ifs in the same statement. You may want to rephrase
this as "This function checks if there exists a row in the default
partition that fits in the new
partition and throws an error if it finds one."

Done
 
+    if (new_spec->strategy != PARTITION_STRATEGY_LIST)
+        return;
This should probably be an Assert. When default range partition is supported
this function would silently return, meaning there is no row in the default
partition which fits the new partition. We don't want that behavior.

Agreed, changed.
 
The code in check_default_allows_bound() to check whether the default partition
has any rows that would fit new partition looks quite similar to the code in
ATExecAttachPartition() checking whether all rows in the table being attached
as a partition fit the partition bounds. One thing that
check_default_allows_bound() misses is, if there's already a constraint on the
default partition refutes the partition constraint on the new partition, we can
skip the scan of the default partition since it can not have rows that would
fit the new partition. ATExecAttachPartition() has code to deal with a similar
case i.e. the table being attached has a constraint which implies the partition
constraint. There may be more cases which check_default_allows_bound() does not
handle but ATExecAttachPartition() handles. So, I am wondering whether it's
better to somehow take out the common code into a function and use it. We will
have to deal with a difference through. The first one would throw an error when
finding a row that satisfies partition constraints whereas the second one would
throw an error when it doesn't find such a row. But this difference can be
handled through a flag or by negating the constraint. This would also take care
of Amit Langote's complaint about foreign partitions. There's also another
difference that the ATExecAttachPartition() queues the table for scan and the
actual scan takes place in ATRewriteTable(), but there is not such queue while
creating a table as a partition. But we should check if we can reuse the code to
scan the heap for checking a constraint.

In case of ATTACH PARTITION, probably we should schedule scan of default
partition in the alter table's work queue like what ATExecAttachPartition() is
doing for the table being attached. That would fit in the way alter table
works.

I am still working on this.
But, about your comment here:
"if there's already a constraint on the default partition refutes the partition
constraint on the new partition, we can skip the scan":
I am so far not able to imagine such a case, since default partition constraint
can be imagined something like "minus infinity to positive infinity with
some finite set elimination", and any new non-default partition being added
would simply be a set of finite values(at-least in case of list, but I think range
should not also differ here). Hence one cannot imply the other here. Possibly,
I might be missing something that you had visioned when you raised the flag,
please correct me if I am missing something.
 
 make_partition_op_expr(PartitionKey key, int keynum,
-                       uint16 strategy, Expr *arg1, Expr *arg2)
+                    uint16 strategy, Expr *arg1, Expr *arg2, bool is_default)
Indentation

Done.
 

+                if (is_default &&
+                    ((operoid = get_negator(operoid)) == InvalidOid))
+                    ereport(ERROR, (errcode(ERRCODE_RESTRICT_VIOLATION),
+                                    errmsg("DEFAULT partition cannot
be used without negator of operator  %s",
+                                           get_opname(operoid))));
+
If the existence of default partition depends upon the negator, shouldn't there
be a dependency between the default partition and the negator. At the time of
creating the default partition, we will try to constuct the partition
constraint for the default partition and if the negator doesn't exist that
time, it will throw an error. But in an unlikely event when the user drops the
negator, the partitioned table will not be usable at all, as every time it will
try to create the relcache, it will try to create default partition constraint
and will throw error because of missing negator. That's not a very good
scenario. Have you tried this case? Apart from that, while restoring a dump, if
the default partition gets restored before the negator is created, restore will
fail with this error.

I am looking into this.
  
     /* Generate the main expression, i.e., keyCol = ANY (arr) */
     opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber,
-                                    keyCol, (Expr *) arr);
+                                    keyCol, (Expr *) arr, spec->is_default);
                 /* Build leftop = ANY (rightop) */
                 saopexpr = makeNode(ScalarArrayOpExpr);
The comments in both the places need correction, as for default partition the
expression will be keyCol <> ALL(arr).

Done.

+    /*
+     * In case of the default partition for list, the partition constraint
+     * is basically any value that is not equal to any of the values in
+     * boundinfo->datums array. So, construct a list of constants from
+     * boundinfo->datums to pass to function make_partition_op_expr via
+     * ArrayExpr, which would return a negated expression for the default
+     * partition.
+     */
This is misleading, since the actual constraint would also have NOT NULL or IS
NULL in there depending upon the existence of a NULL partition.
I would simply rephrase this as "For default list partition, collect lists for
all the partitions. The default partition constraint should check that the
partition key is equal to none of those."

Done.

+        ndatums = (pdesc->nparts > 0) ? boundinfo->ndatums : 0;
wouldn't ndatums be simply boundinfo->ndatums? When nparts = 0, ndatums will be
0.

Yes, but in case the default partition is the first partition to be added then
boundinfo will be null and the access to ndatums within it will result in
segmentation fault.
Simplified code to make this more readable.
 
+        int         ndatums = 0;
This assignment looks redundant then.

Per change made for above comment, this is now needed.
 
+        if (boundinfo && partition_bound_accepts_nulls(boundinfo))
You have not checked existence of boundinfo when extracting ndatums out of it
and just few lines below you check that. If the later check is required then we
will get a segfault while extracting ndatums.

The code to extract ndatums is changed and now has a check now for boundinfo,
but it would not have resulted in segmentation fault in its earlier state also,
because there was a check for avoiding this i.e. (pdesc->nparts > 0) ?:...
 
+    if ((!list_has_null && !spec->is_default) ||
+        (list_has_null && spec->is_default))
Need a comment explaining what's going on here. The condition is no more a
simple condition.

-            result = -1;
-            *failed_at = parent;
-            *failed_slot = slot;
-            break;
+            if (partition_bound_has_default(partdesc->boundinfo))
+            {
+                result = parent->indexes[partdesc->boundinfo->default_index];
+
+                if (result >= 0)
+                    break;
+                else
+                    parent = pd[-result];
+            }
+            else
+            {
+                result = -1;
+                *failed_at = parent;
+                *failed_slot = slot;
+                break;
+            }
The code to handle result is duplicated here and few lines below. I think it
would be better to not duplicate it by having separate condition blocks to deal
with setting result and setting parent. Basically if (cur_index < 0) ... else
would set the result breaking when setting result = -1 explicitly. A follow-on
block would adjust the parent if result < 0 or break otherwise.

I have tried to simplified it in attached patch, please let me know if that change
looks any better.
 
Both the places where DEFAULT_PARTITION_INDEX is used, its result is used to
fetch OID of the default partition. So, instead of having this macro, may be we
should have macro to fetch OID of default partition. But even there I don't see
much value in that.
 
Removed the macro, and did this in place at both the places.

Further, the macro and code using that macro fetches
rd_partdesc directly from Relation.
 
Done this where ever applicable. 

We have RelationGetPartitionDesc() for
that. Probably we should also add Asserts to check that every pointer in the
long pointer chain is Non-null.
 
I am sorry, but I did not understand which chain you are trying to point here.

InvalidateDefaultPartitionRelcache() is called in case of drop and detach.
Shouldn't the constraint change when we add or attach a new partition.
Shouldn't we invalidate the cache then as well? I am not able to find that
code in your patch.
 
In case of CREATE/ATTACH this was taken care by a call to
CacheInvalidateRelcache(part_rel) in check_default_allows_bound(), which wasn't
the correct place anyway, and this had a flaw that the invalidation would not
happen in case the default partition is further partitioned.
Now, the relcache for default partition is getting invalidated for
CREATE/DROP/ALTER commands.
 
     /*
+     * Default partition constraints are constructed run-time from the
+     * constraints of its siblings(basically by negating them), so any
+     * change in the siblings needs to rebuild the constraints of the
+     * default partition. So, invalidate the sibling default partition's
+     * relcache.
+     */
May be rephrase this as "The default partition constraints depend upon the
partition bounds of other partitions. Detaching a partition invalidates the
default partition constraints. Invalidate the default partition's relcache so
that the constraints are built anew and any plans dependent on those
constraints are invalidated as well."
 
Done!
 
+                     errmsg("default partition is supported only for
list partitioned table")));
for "a" list partitioned table.
 
Done.
 

+            /*
+             * A default partition, that can be partition of either LIST or
+             * RANGE partitioned table.
+             * Currently this is supported only for LIST partition.
+             */
Keep everything in single paragraph without line break.
 
Not applicable now, as I removed the later part of the comment.

                 }
+
         ;
unnecessary extra line.

Removed.
 
+        /*
+         * The default partition bound does not have any datums to be
+         * transformed, return the new bound.
+         */
Probably not needed.

Removed.
 

+                if (spec->is_default && (strategy == PARTITION_STRATEGY_LIST ||
+                                         strategy == PARTITION_STRATEGY_RANGE))
+                {
+                    appendStringInfoString(buf, "DEFAULT");
+                    break;
+                }
+
What happens if strategy is something other than RANGE or LIST. For that matter
why not just LIST? Possibly you could write this as
+                if (spec->is_default)
+                {
+                    Assert(strategy == PARTITION_STRATEGY_LIST);
+                    appendStringInfoString(buf, "DEFAULT");
+                    break;
+                }

Done.
 
@@ -2044,7 +2044,7 @@ psql_completion(const char *text, int start, int end)
         COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, "");
     /* Limited completion support for partition bound specification */
     else if (TailMatches3("ATTACH", "PARTITION", MatchAny))
-        COMPLETE_WITH_CONST("FOR VALUES");
+        COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT");
     else if (TailMatches2("FOR", "VALUES"))
         COMPLETE_WITH_LIST2("FROM (", "IN (");

@@ -2483,7 +2483,7 @@ psql_completion(const char *text, int start, int end)
         COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables, "");
     /* Limited completion support for partition bound specification */
     else if (TailMatches3("PARTITION", "OF", MatchAny))
-        COMPLETE_WITH_CONST("FOR VALUES");
+        COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT");
Do we include psql tab completion in the main feature patch? I have not seen
this earlier. But appreciate taking care of these defails.

I am not sure about this. If needed I can submit a patch to take care of this later, but
as of now I have not removed this from the patch.

+char *ExecBuildSlotValueDescription(Oid reloid,
needs an "extern" declaration.

Per one of the comment[1] given by Amit Langote, I have removed a call to
ExecBuildSlotValueDescription(), and this was a leftover, I cleaned it up.


Regards,
Jeevan Ladhe
Attachment

pgsql-hackers by date:

Previous
From: Sergey Burladyan
Date:
Subject: Re: [HACKERS] Broken hint bits (freeze)
Next
From: Marko Tiikkaja
Date:
Subject: Re: [HACKERS] PG10 transition tables, wCTEs and multiple operationson the same table