Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1 - Mailing list pgsql-hackers

From Nikhil Sontakke
Subject Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date
Msg-id a301bfd90812152235j7af3bfabh5265c1e3ec813fd9@mail.gmail.com
Whole thread Raw
In response to Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1  (Emmanuel Cecchet <manu@frogthinker.org>)
Responses Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1  (Emmanuel Cecchet <manu@frogthinker.org>)
List pgsql-hackers
Hi,


I will be working on a roadmap for the partitioning features. I think that there are different needs and that we will not be able to address them all in 8.5 or even 8.6.
The goal will be to get things done step by step but possibly with a design that will not require major refactoring to support new features. I will try to setup the new wiki page tomorrow.

+1.
 

In the meantime, I have made some more tests with the trigger in C (see attached patch). To prevent duplicating too much code, it requires the ExecRelCheck method to be exported (that would be nice to have this function exported in 8.4 so that we can start experimenting in 8.4 and don't have to wait another year for 8.5). If there is locality in the inserts (which might be the case if you COPY sorted data), the performance remains constant regardless the number of child tables.

A similar DELETE trigger should be pretty easy to write up in C. I think the main challenge is with UPDATE triggers especially if the new row will fall into another child table - but we can always throw an error for such a case initially.
 

Right now if the row cannot be inserted in any child table, it is inserted in the parent. If you want to fail, we can add a 'fail trigger' (after all other triggers) that generates an error if previous triggers did not capture the row. If you want to create a new partition, you can have another trigger to handle that.

One of the work items related to partitioning eventually is to avoid having to APPEND the parent in all queries involving children. Maybe having an overflow child table might help to catch failed triggers for those cases?

Regards,
Nikhils
 

So I think that this trigger approach is pretty flexible like people used AOP in J2EE servers to process requests. It has also the advantage of allowing fast prototyping. It should also be easy to push that functionality down in the core as needed.

Is it ok if I move Simon's requirement document under a more generic 'Table partitioning' page on the Wiki?

Thanks for your feedback,
manu


Jaime Casanova wrote:
On Mon, Dec 15, 2008 at 1:46 PM, Gregory Stark <stark@enterprisedb.com> wrote:
 
Josh Berkus <josh@agliodbs.com> writes:

   
Hackers,

We don't yet seem to have a clear specification for this feature, and the Other
Open Source DB has shown us how problematic it is to get auto-partitioning
wrong.

Should we defer auto-partitioning to 8.5?
     
If we're serious about having a "next generation" partitioning with a concept
of partition keys then it seems to me to make more sense to do that first and
then add on a feature like this.

   

+1

 
This is still very useful. I haven't looked at the actual patch, does it
require core changes or can it be stashed in a pgfoundry or contrib module?

   

what i don't like about this one is that it creates partitions at
create table time and to manually add all new partitions (inherit
tables and modify the trigger)... and what i want to see is an
automatic creation when it's needed...


 


--
Emmanuel Cecchet
FTO @ Frog Thinker Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu@frogthinker.org
Skype: emmanuel_cecchet


### Eclipse Workspace Patch 1.0
#P Postgres-HEAD
Index: src/test/regress/regress.c
===================================================================
RCS file: /root/cvsrepo/pgsql/src/test/regress/regress.c,v
retrieving revision 1.71
diff -u -r1.71 regress.c
--- src/test/regress/regress.c  25 Mar 2008 22:42:46 -0000      1.71
+++ src/test/regress/regress.c  16 Dec 2008 01:42:50 -0000
@@ -10,6 +10,9 @@
 #include "utils/geo_decls.h"   /* includes <math.h> */
 #include "executor/executor.h" /* For GetAttributeByName */
 #include "commands/sequence.h" /* for nextval() */
+#include "catalog/namespace.h"
+#include "executor/executor.h"
+#include "executor/tuptable.h"

 #define P_MAXDIG 12
 #define LDELIM                 '('
@@ -732,3 +735,141 @@
       *--walk = '\0';
       PG_RETURN_CSTRING(result);
 }
+
+
+/*
+ * Partition trigger test
+ *
+ * The trigger should be used this way:
+ * CREATE TRIGGER trigger_name
+    BEFORE INSERT ON master_table
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(child_table_name, column_number, min_val, max_val);
+ */
+
+extern Datum partition_insert_trigger(PG_FUNCTION_ARGS);
+
+static Datum
+check_constraints_and_insert_tuple(Relation child_table_relation, TriggerData *trigdata ,HeapTuple trigtuple)
+{ // Check the constraints
+       ResultRelInfo   *resultRelInfo;
+       TupleTableSlot  *slot;
+       EState                  *estate = CreateExecutorState();
+       Datum                   result;
+
+       result = PointerGetDatum(trigdata->tg_trigtuple);
+
+       resultRelInfo = makeNode(ResultRelInfo);
+       resultRelInfo->ri_RangeTableIndex = 1;          /* dummy */
+       resultRelInfo->ri_RelationDesc = child_table_relation;
+
+       estate->es_result_relations = resultRelInfo;
+       estate->es_num_result_relations = 1;
+       estate->es_result_relation_info = resultRelInfo;
+
+       /* Set up a tuple slot too */
+       slot = MakeSingleTupleTableSlot(trigdata->tg_relation->rd_att);
+       ExecStoreTuple(trigtuple, slot, InvalidBuffer, false);
+
+       if (ExecRelCheck(resultRelInfo, slot, estate) == NULL)
+       { // Constraints satisfied, insert the row in the child table
+               bool use_wal = true;
+               bool use_fsm=true;
+
+               /* BEFORE ROW INSERT Triggers */
+               if (resultRelInfo->ri_TrigDesc &&
+                               resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0)
+               {
+                       HeapTuple       newtuple;
+                       newtuple = ExecBRInsertTriggers(estate, resultRelInfo, trigtuple);
+
+                       if (newtuple != trigtuple) /* modified by Trigger(s) */
+                       {
+                               heap_freetuple(trigtuple);
+                               trigtuple = newtuple;
+                       }
+               }
+
+               /* Perform the insert
+                * TODO: Check that we detect constraint violation if before row insert does something bad */
+               heap_insert(child_table_relation, trigtuple, GetCurrentCommandId(true), use_wal, use_fsm);
+
+               /* Update indices */
+               ExecOpenIndices(resultRelInfo);
+               if (resultRelInfo->ri_NumIndices > 0)
+                       ExecInsertIndexTuples(slot, &(trigtuple->t_self), estate, false);
+
+               /* AFTER ROW INSERT Triggers */
+               ExecARInsertTriggers(estate, resultRelInfo, trigtuple);
+
+               result = PointerGetDatum(NULL);
+       }
+       // Free resources
+       FreeExecutorState(estate);
+       ExecDropSingleTupleTableSlot(slot);
+
+       return result;
+}
+
+
+PG_FUNCTION_INFO_V1(partition_insert_trigger);
+
+static Relation        last_inserted_relation;
+
+Datum
+partition_insert_trigger(PG_FUNCTION_ARGS)
+{
+    TriggerData *trigdata = (TriggerData *) fcinfo->context;
+    HeapTuple    trigtuple= trigdata->tg_trigtuple;
+    TupleConstr *constr;
+    char           *child_table_name;
+       Relation        child_table_relation;
+       Oid                     relation_id;
+       Datum           result;
+
+       // Try to exploit locality for bulk inserts
+       // We expect consecutive insert to go to the same child table
+       if (last_inserted_relation != NULL)
+       {   // Try the last table we used
+           result = check_constraints_and_insert_tuple(last_inserted_relation, trigdata, trigtuple);
+           if (result == PointerGetDatum(NULL))
+               return result;
+           // We got a miss
+           last_inserted_relation = NULL;
+       }
+
+    /* make sure it's called as a trigger at all */
+    if (!CALLED_AS_TRIGGER(fcinfo))
+        elog(ERROR, "partition_insert_trigger: not called by trigger manager");
+
+    /* Sanity checks */
+    if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event) || !TRIGGER_FIRED_BEFORE(trigdata->tg_event))
+        elog(ERROR, "partition_insert_trigger: not called on insert before");
+
+    // Child table name is either given as the unique parameter or it is the name of the trigger
+    if (trigdata->tg_trigger->tgnargs == 1)
+       child_table_name = trigdata->tg_trigger->tgargs[0];
+    else
+       child_table_name = trigdata->tg_trigger->tgname;
+
+    // Lookup the child relation
+    relation_id = RelnameGetRelid(child_table_name);
+    if (relation_id == InvalidOid)
+       elog(ERROR, "partition_insert_trigger: Invalid child table %s", child_table_name);
+    child_table_relation = RelationIdGetRelation(relation_id);
+    if (child_table_relation == NULL)
+       elog(ERROR, "partition_insert_trigger: Failed to locate relation for child table %s", child_table_name);
+
+       constr = child_table_relation->rd_att->constr;
+       if (constr->num_check == 0)
+               elog(ERROR, "partition_insert_trigger: No constraint found for child table %s", child_table_name);
+
+       result = check_constraints_and_insert_tuple(child_table_relation, trigdata, trigtuple);
+
+    if (result == PointerGetDatum(NULL))
+       last_inserted_relation = child_table_relation;
+
+    RelationClose(child_table_relation);
+
+    return result;
+}
+
Index: src/backend/executor/execMain.c
===================================================================
RCS file: /root/cvsrepo/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.314
diff -u -r1.314 execMain.c
--- src/backend/executor/execMain.c     31 Oct 2008 21:07:54 -0000      1.314
+++ src/backend/executor/execMain.c     16 Dec 2008 01:42:50 -0000
@@ -1947,7 +1947,7 @@
 /*
 * ExecRelCheck --- check that tuple meets constraints for result relation
 */
-static const char *
+const char *
 ExecRelCheck(ResultRelInfo *resultRelInfo,
                        TupleTableSlot *slot, EState *estate)
 {
Index: src/test/regress/output/create_function_1.source
===================================================================
RCS file: /root/cvsrepo/pgsql/src/test/regress/output/create_function_1.source,v
retrieving revision 1.34
diff -u -r1.34 create_function_1.source
--- src/test/regress/output/create_function_1.source    31 Oct 2008 19:37:56 -0000      1.34
+++ src/test/regress/output/create_function_1.source    16 Dec 2008 01:42:50 -0000
@@ -47,6 +47,10 @@
        RETURNS int4
        AS '@libdir@/regress@DLSUFFIX@'
        LANGUAGE C STRICT;
+CREATE FUNCTION partition_insert_trigger ()
+        RETURNS trigger
+        AS '@libdir@/regress@DLSUFFIX@'
+        LANGUAGE C STRICT;
 -- Things that shouldn't work:
 CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
    AS 'SELECT ''not an integer'';';
@@ -80,3 +84,70 @@
 CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal
    AS 'nosuch';
 ERROR:  there is no built-in function named "nosuch"
+-- Partitioning trigger test
+CREATE TABLE master (
+  id     int not null,
+  date   date not null,
+  value  int
+);
+CREATE TABLE child_y2008m01 (
+    CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
+) INHERITS (master);
+CREATE TABLE child_y2008m02 (
+    CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
+) INHERITS (master);
+CREATE TABLE child_y2008m03 (
+    CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
+) INHERITS (master);
+CREATE TRIGGER insert_child_y2008m01
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger('child_y2008m01');
+CREATE TRIGGER child_y2008m02
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+CREATE TRIGGER child_y2008m03
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+INSERT INTO master VALUES (1, '2008-01-15', 1);
+INSERT INTO master VALUES (2, '2008-02-15', 2);
+INSERT INTO master VALUES (3, '2008-03-15', 3);
+INSERT INTO master VALUES (4, '2008-04-15', 4);
+COPY master FROM '/root/pg_partitions/copy_input.txt';
+select * from master;
+ id |    date    | value
+----+------------+-------
+  4 | 04-15-2008 |     4
+  1 | 01-15-2008 |     1
+ 11 | 01-10-2008 |    11
+  2 | 02-15-2008 |     2
+ 12 | 02-15-2008 |    12
+  3 | 03-15-2008 |     3
+ 13 | 03-15-2008 |    13
+(7 rows)
+
+ select * from child_y2008m01;
+ id |    date    | value
+----+------------+-------
+  1 | 01-15-2008 |     1
+ 11 | 01-10-2008 |    11
+(2 rows)
+
+  select * from child_y2008m02;
+ id |    date    | value
+----+------------+-------
+  2 | 02-15-2008 |     2
+ 12 | 02-15-2008 |    12
+(2 rows)
+
+  select * from child_y2008m03;
+ id |    date    | value
+----+------------+-------
+  3 | 03-15-2008 |     3
+ 13 | 03-15-2008 |    13
+(2 rows)
+
+DROP TABLE master CASCADE;
+NOTICE:  drop cascades to 3 other objects
+DETAIL:  drop cascades to table child_y2008m01
+drop cascades to table child_y2008m02
+drop cascades to table child_y2008m03
Index: src/include/executor/executor.h
===================================================================
RCS file: /root/cvsrepo/pgsql/src/include/executor/executor.h,v
retrieving revision 1.152
diff -u -r1.152 executor.h
--- src/include/executor/executor.h     31 Oct 2008 21:07:55 -0000      1.152
+++ src/include/executor/executor.h     16 Dec 2008 01:42:50 -0000
@@ -155,6 +155,8 @@
 extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
                               TupleTableSlot *slot, EState *estate);
+extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo,
+                        TupleTableSlot *slot, EState *estate);
 extern TupleTableSlot *EvalPlanQual(EState *estate, Index rti,
                        ItemPointer tid, TransactionId priorXmax);
 extern PlanState *ExecGetActivePlanTree(QueryDesc *queryDesc);
Index: src/test/regress/input/create_function_1.source
===================================================================
RCS file: /root/cvsrepo/pgsql/src/test/regress/input/create_function_1.source,v
retrieving revision 1.19
diff -u -r1.19 create_function_1.source
--- src/test/regress/input/create_function_1.source     1 Oct 2008 22:38:57 -0000       1.19
+++ src/test/regress/input/create_function_1.source     16 Dec 2008 01:42:50 -0000
@@ -52,6 +52,12 @@
        AS '@libdir@/regress@DLSUFFIX@'
        LANGUAGE C STRICT;

+CREATE FUNCTION partition_insert_trigger ()
+        RETURNS trigger
+        AS '@libdir@/regress@DLSUFFIX@'
+        LANGUAGE C STRICT;
+
+
 -- Things that shouldn't work:

 CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
@@ -77,3 +83,48 @@

 CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal
    AS 'nosuch';
+
+-- Partitioning trigger test
+
+CREATE TABLE master (
+  id     int not null,
+  date   date not null,
+  value  int
+);
+
+CREATE TABLE child_y2008m01 (
+    CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
+) INHERITS (master);
+
+CREATE TABLE child_y2008m02 (
+    CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
+) INHERITS (master);
+
+CREATE TABLE child_y2008m03 (
+    CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
+) INHERITS (master);
+
+CREATE TRIGGER insert_child_y2008m01
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger('child_y2008m01');
+
+CREATE TRIGGER child_y2008m02
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+
+CREATE TRIGGER child_y2008m03
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+
+INSERT INTO master VALUES (1, '2008-01-15', 1);
+INSERT INTO master VALUES (2, '2008-02-15', 2);
+INSERT INTO master VALUES (3, '2008-03-15', 3);
+INSERT INTO master VALUES (4, '2008-04-15', 4);
+COPY master FROM '/root/pg_partitions/copy_input.txt';
+
+select * from master;
+select * from child_y2008m01;
+select * from child_y2008m02;
+select * from child_y2008m03;
+
+DROP TABLE master CASCADE;




--
http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: "Rushabh Lathia"
Date:
Subject: Re: Function with defval returns error
Next
From: "Pavel Stehule"
Date:
Subject: Re: Function with defval returns error