Re: [HACKERS] Adding support for Default partition in partitioning - Mailing list pgsql-hackers

From Rahila Syed
Subject Re: [HACKERS] Adding support for Default partition in partitioning
Date
Msg-id CAH2L28u1gnddgSTFG=_XR9KBZ4pvT-jaqsxhsQNwV48cys0ZaA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Adding support for Default partition in partitioning  (Rahila Syed <rahilasyed90@gmail.com>)
Responses Re: [HACKERS] Adding support for Default partition in partitioning  (amul sul <sulamul@gmail.com>)
List pgsql-hackers
Please find attached updated patch with review comments by Robert and Jeevan implemented.

The newly proposed syntax
CREATE TABLE .. PARTITION OF .. DEFAULT has got most votes on this thread.

If there is no more objection, I will go ahead and include that in the patch.

Thank you,
Rahila Syed

On Mon, Apr 24, 2017 at 2:40 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hello,

Thank you for reviewing.

>But that's not a good idea for several reasons.  For one thing, you
>can also write FOR VALUES IN (DEFAULT, 5) or which isn't sensible.
>For another thing, this kind of syntax won't generalize to range
>partitioning, which we've talked about making this feature support.
>Maybe something like:

>CREATE TABLE .. PARTITION OF .. DEFAULT;

I agree that the syntax should be changed to also support range partitioning.

Following can also be considered as it specifies more clearly that the
partition holds default values.

CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;

>Maybe we should introduce a dedicated node type to
>represent a default-specification in the parser grammar.  If not, then
>let's at least encapsulate the test a little better, e.g. by adding
>isDefaultPartitionBound() which tests not only IsA(..., DefElem) but
>also whether the name is DEFAULT as expected.  BTW, we typically use
>lower-case internally, so if we stick with this representation it
>should really be "default" not "DEFAULT".

isDefaultPartitionBound() function is created in the attached patch which
checks for both node type and name.

>Why abbreviate "default" to def here?  Seems pointless.
Corrected in the attached.

>Consider &&
Fixed.

>+     * default partiton for rows satisfying the new partition
>Spelling.
Fixed.

>Missing apostrophe
Fixed.

>Definitely not safe against concurrency, since AccessShareLock won't
>exclude somebody else's update.  In fact, it won't even cover somebody
>else's already-in-flight transaction
Changed it to AccessExclusiveLock

>Normally in such cases we try to give more detail using
>ExecBuildSlotValueDescription.
This function is used in execMain.c and the error is being
reported in partition.c.
Do you mean the error reporting should be moved into execMain.c
to use ExecBuildSlotValueDescription?

>This variable starts out true and is never set to any value other than
>true.  Just get rid of it and, in the one place where it is currently
>used, write "true".  That's shorter and clearer.
Fixed.

>There's not really a reason to cast the result of stringToNode() to
>Node * and then turn around and cast it to PartitionBoundSpec *.  Just
>cast it directly to whatever it needs to be.  And use the new castNode
>macro
Fixed. castNode macro takes as input Node * whereas stringToNode() takes string.
IIUC, castNode cant be used here.

>The if (def_elem) test continues
>early, but if the point is that the loop using cell3 shouldn't execute
>in that case, why not just put if (!def_elem) { foreach(cell3, ...) {
>... } } instead of reiterating the ReleaseSysCache in two places?
Fixed in the attached.

I will respond to further comments in following email.


On Thu, Apr 13, 2017 at 12:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 6, 2017 at 7:30 AM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Thanks a lot for testing and reporting this. Please find attached an updated
> patch with the fix. The patch also contains a fix
> regarding operator used at the time of creating expression as default
> partition constraint. This was notified offlist by Amit Langote.

I think that the syntax for this patch should probably be revised.
Right now the proposal is for:

CREATE TABLE .. PARTITION OF ... FOR VALUES IN (DEFAULT);

But that's not a good idea for several reasons.  For one thing, you
can also write FOR VALUES IN (DEFAULT, 5) or which isn't sensible.
For another thing, this kind of syntax won't generalize to range
partitioning, which we've talked about making this feature support.
Maybe something like:

CREATE TABLE .. PARTITION OF .. DEFAULT;

This patch makes the assumption throughout that any DefElem represents
the word DEFAULT, which is true in the patch as written but doesn't
seem very future-proof.  I think the "def" in "DefElem" stands for
"definition" or "define" or something like that, so this is actually
pretty confusing.  Maybe we should introduce a dedicated node type to
represent a default-specification in the parser grammar.  If not, then
let's at least encapsulate the test a little better, e.g. by adding
isDefaultPartitionBound() which tests not only IsA(..., DefElem) but
also whether the name is DEFAULT as expected.  BTW, we typically use
lower-case internally, so if we stick with this representation it
should really be "default" not "DEFAULT".

Useless hunk:

+    bool        has_def;        /* Is there a default partition?
Currently false
+                                 * for a range partitioned table */
+    int            def_index;        /* Index of the default list
partition. -1 for
+                                 * range partitioned tables */

Why abbreviate "default" to def here?  Seems pointless.

+                    if (found_def)
+                    {
+                        if (mapping[def_index] == -1)
+                            mapping[def_index] = next_index++;
+                    }

Consider &&

@@ -717,7 +754,6 @@ check_new_partition_bound(char *relname, Relation
parent, Node *bound)
                         }
                     }
                 }
-
                 break;
             }

+     * default partiton for rows satisfying the new partition

Spelling.

+     * constraint. If found dont allow addition of a new partition.

Missing apostrophe.

+        defrel = heap_open(defid, AccessShareLock);
+        tupdesc = CreateTupleDescCopy(RelationGetDescr(defrel));
+
+        /* Build expression execution states for partition check quals */
+        partqualstate = ExecPrepareCheck(partConstraint,
+                        estate);
+
+        econtext = GetPerTupleExprContext(estate);
+        snapshot = RegisterSnapshot(GetLatestSnapshot());

Definitely not safe against concurrency, since AccessShareLock won't
exclude somebody else's update.  In fact, it won't even cover somebody
else's already-in-flight transaction.

+                errmsg("new default partition constraint is violated
by some row")));

Normally in such cases we try to give more detail using
ExecBuildSlotValueDescription.

+    bool        is_def = true;

This variable starts out true and is never set to any value other than
true.  Just get rid of it and, in the one place where it is currently
used, write "true".  That's shorter and clearer.

+    inhoids = find_inheritance_children(RelationGetRelid(parent), NoLock);

If it's actually safe to do this with no lock, there ought to be a
comment with a very compelling explanation of why it's safe.

+        boundspec = (Node *) stringToNode(TextDatumGetCString(datum));
+        bspec = (PartitionBoundSpec *)boundspec;

There's not really a reason to cast the result of stringToNode() to
Node * and then turn around and cast it to PartitionBoundSpec *.  Just
cast it directly to whatever it needs to be.  And use the new castNode
macro.

+        foreach(cell1, bspec->listdatums)
+        {
+            Node *value = lfirst(cell1);
+            if (IsA(value, DefElem))
+            {
+                def_elem = true;
+                *defid = inhrelid;
+            }
+        }
+        if (def_elem)
+        {
+            ReleaseSysCache(tuple);
+            continue;
+        }
+        foreach(cell3, bspec->listdatums)
+        {
+            Node *value = lfirst(cell3);
+            boundspecs = lappend(boundspecs, value);
+        }
+        ReleaseSysCache(tuple);
+    }
+    foreach(cell4, spec->listdatums)
+    {
+        Node *value = lfirst(cell4);
+        boundspecs = lappend(boundspecs, value);
+    }

cell1, cell2, cell3, and cell4 are not very clear variable names.
Between that and the lack of comments, this is not easy to understand.
It's sort of spaghetti logic, too.  The if (def_elem) test continues
early, but if the point is that the loop using cell3 shouldn't execute
in that case, why not just put if (!def_elem) { foreach(cell3, ...) {
... } } instead of reiterating the ReleaseSysCache in two places?

+                /* Collect bound spec nodes in a list. This is done
if the partition is
+                 * a default partition. In case of default partition,
constraint is formed
+                 * by performing <> operation over the partition
constraints of the
+                 * existing partitions.
+                 */

I doubt that handles NULLs properly.

+                inhoids =
find_inheritance_children(RelationGetRelid(parent), NoLock);

Again, no lock?  Really?

The logic which follows looks largely cut-and-pasted, which makes me
think you need to do some refactoring here to make it more clear
what's going on, so that you have the relevant logic in just one
place.  It seems wrong anyway to shove all of this logic specific to
the default case into get_qual_from_partbound() when the logic for the
non-default case is inside get_qual_for_list.  Where there were 2
lines of code before you've now got something like 30.

+        if(get_negator(operoid) == InvalidOid)
+            elog(ERROR, "no negator found for partition operator %u",
+                 operoid);

I really doubt that's OK.  elog() shouldn't be reachable, but this
will be reachable if the partitioning operator does not have a
negator.  And there's the NULL-handling issue I mentioned above, too.

+            if (partdesc->boundinfo->has_def && key->strategy
+                == PARTITION_STRATEGY_LIST)
+                result = parent->indexes[partdesc->boundinfo->def_index];

Testing for PARTITION_STRATEGY_LIST here seems unnecessary.  If
has_def (or has_default, as it probably should be) isn't allowed for
range partitions, then it's redundant; if it is allowed, then that
case should be handled too.  Also, at this point we've already set
*failed_at and *failed_slot; presumably you'd want to make this check
before you get to that point.

I suspect there are quite a few more problems here in addition to the
ones mentioned above, but I don't think it makes sense to spend too
much time searching for them until some of this basic stuff is cleaned
up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Attachment

pgsql-hackers by date:

Previous
From: Thomas Kellerer
Date:
Subject: Re: [HACKERS] CTE inlining
Next
From: Andres Freund
Date:
Subject: [HACKERS] Re: Potential hot-standby bug around xacts committed but inxl_running_xacts