Thread: Partitioning option for COPY
Hi all, I have extracted the partitioning option for COPY (removed the error logging part) from the previous patch. The documentation and test suite sample are provided as well. More details are on the wiki page at http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY. Ignore the error logging related comments that do not apply here. Looking forward to your feedback Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com Index: src/test/regress/parallel_schedule =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/parallel_schedule,v retrieving revision 1.57 diff -c -r1.57 parallel_schedule *** src/test/regress/parallel_schedule 24 Aug 2009 03:10:16 -0000 1.57 --- src/test/regress/parallel_schedule 11 Nov 2009 03:17:48 -0000 *************** *** 47,53 **** # execute two copy tests parallel, to check that copy itself # is concurrent safe. # ---------- ! test: copy copyselect # ---------- # Another group of parallel tests --- 47,53 ---- # execute two copy tests parallel, to check that copy itself # is concurrent safe. # ---------- ! test: copy copyselect copy_partitioning # ---------- # Another group of parallel tests Index: src/backend/utils/adt/ruleutils.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/utils/adt/ruleutils.c,v retrieving revision 1.314 diff -c -r1.314 ruleutils.c *** src/backend/utils/adt/ruleutils.c 5 Nov 2009 23:24:25 -0000 1.314 --- src/backend/utils/adt/ruleutils.c 11 Nov 2009 03:17:48 -0000 *************** *** 218,224 **** static Node *processIndirection(Node *node, deparse_context *context, bool printit); static void printSubscripts(ArrayRef *aref, deparse_context *context); ! static char *generate_relation_name(Oid relid, List *namespaces); static char *generate_function_name(Oid funcid, int nargs, List *argnames, Oid *argtypes, bool *is_variadic); static char *generate_operator_name(Oid operid, Oid arg1, Oid arg2); --- 218,224 ---- static Node *processIndirection(Node *node, deparse_context *context, bool printit); static void printSubscripts(ArrayRef *aref, deparse_context *context); ! char *generate_relation_name(Oid relid, List *namespaces); static char *generate_function_name(Oid funcid, int nargs, List *argnames, Oid *argtypes, bool *is_variadic); static char *generate_operator_name(Oid operid, Oid arg1, Oid arg2); *************** *** 6347,6353 **** * We will forcibly qualify the relation name if it equals any CTE name * visible in the namespace list. */ ! static char * generate_relation_name(Oid relid, List *namespaces) { HeapTuple tp; --- 6347,6353 ---- * We will forcibly qualify the relation name if it equals any CTE name * visible in the namespace list. */ ! char * generate_relation_name(Oid relid, List *namespaces) { HeapTuple tp; Index: doc/src/sgml/ref/copy.sgml =================================================================== RCS file: /home/manu/cvsrepo/pgsql/doc/src/sgml/ref/copy.sgml,v retrieving revision 1.92 diff -c -r1.92 copy.sgml *** doc/src/sgml/ref/copy.sgml 21 Sep 2009 20:10:21 -0000 1.92 --- doc/src/sgml/ref/copy.sgml 11 Nov 2009 03:17:48 -0000 *************** *** 41,46 **** --- 41,47 ---- ESCAPE '<replaceable class="parameter">escape_character</replaceable>' FORCE_QUOTE { ( <replaceable class="parameter">column</replaceable> [, ...] ) | * } FORCE_NOT_NULL ( <replaceable class="parameter">column</replaceable> [, ...] 
) + PARTITIONING [ <replaceable class="parameter">boolean</replaceable> ]
 </synopsis>
 </refsynopsisdiv>
***************
*** 282,287 ****
--- 283,301 ----
  </listitem>
  </varlistentry>
+ <varlistentry>
+  <term><literal>PARTITIONING</></term>
+  <listitem>
+   <para>
+    In <literal>PARTITIONING</> mode, <command>COPY FROM</> into a parent
+    table will automatically route each row to the child table whose
+    constraints it matches. This feature can be used with
+    <literal>ERROR_LOGGING</> to capture rows that do not match any
+    constraint in the table hierarchy. See the notes below for the
+    limitations.
+   </para>
+  </listitem>
+ </varlistentry>
  </variablelist>
 </refsect1>
***************
*** 384,389 ****
--- 398,421 ----
  <command>VACUUM</command> to recover the wasted space.
  </para>
+
+ <para>
+  <literal>PARTITIONING</> mode scans the constraints of each child table
+  in the hierarchy to find a match. As an optimization, a cache of the
+  child tables to which tuples were most recently routed is kept and tried
+  first. The size of the cache is set by the
+  <literal>copy_partitioning_cache_size</literal> session variable. If the
+  size is set to 0, the cache is disabled; otherwise, at most the indicated
+  number of child tables is kept in the cache.
+ </para>
+
+ <para>
+  <literal>PARTITIONING</> mode assumes that every child table has at least
+  one constraint defined; otherwise an error is raised. If child tables have
+  overlapping constraints, the row is inserted into the first child table
+  found (be it a cached table or the first table to appear in the lookup).
+  ROW and STATEMENT triggers that modify the tuple value after routing has
+  been performed will lead to unpredictable errors.
+ </para>
+
 </refsect1>
 <refsect1>
***************
*** 828,833 ****
--- 860,1003 ----
  0000200 M B A B W E 377 377 377 377 377 377
  </programlisting>
  </para>
+
+ <para>
+  Multiple options are separated by commas, as in:
+ <programlisting>
+ COPY (SELECT t FROM foo WHERE id = 1) TO STDOUT (FORMAT CSV, HEADER, FORCE_QUOTE (t));
+ </programlisting>
+ </para>
+
+ <refsect2>
+  <title>Partitioning examples</title>
+  <para>
+   Here is an example of how to use partitioning.
Let's first create a parent + table and 3 child tables as follows: + <programlisting> + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int + ); + + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + </programlisting> + We prepare the following data file (1 row for each child table): + copy_input.data content: + <programlisting> + 11 '2008-01-10' 11 + 12 '2008-02-15' 12 + 13 '2008-03-15' 13 + 21 '2008-01-10' 11 + 31 '2008-01-10' 11 + 41 '2008-01-10' 11 + 22 '2008-02-15' 12 + 23 '2008-03-15' 13 + 32 '2008-02-15' 12 + 33 '2008-03-15' 13 + 42 '2008-02-15' 12 + 43 '2008-03-15' 13 + </programlisting> + If we COPY the data in the parent table without partitioning enabled, all + rows are inserted in the master table as in this example: + <programlisting> + COPY y2008 FROM 'copy_input.data'; + + SELECT COUNT(*) FROM y2008; + count + ------- + 12 + (1 row) + + SELECT COUNT(*) FROM jan2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM feb2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM mar2008; + count + ------- + 0 + (1 row) + + DELETE FROM y2008; + </programlisting> + If we execute COPY with partitioning enabled, rows are loaded in the + appropriate child table automatically as in this example: + <programlisting> + COPY y2008 FROM 'copy_input.data' (PARTITIONING); + + SELECT * FROM y2008; + id | date | value + ----+------------+------- + 11 | 01-10-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008; + id | date | value + ----+------------+------- + 11 | 01-10-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM feb2008; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + </programlisting> + The cache size can be tuned using: + <programlisting> + set copy_partitioning_cache_size = 3; + </programlisting> + Repeating the COPY command will now be faster: + <programlisting> + COPY y2008 FROM 'copy_input.data' (PARTITIONING); + </programlisting> + </para> + </refsect2> </refsect1> <refsect1> Index: src/include/utils/guc_tables.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/utils/guc_tables.h,v retrieving revision 1.46 diff -c -r1.46 guc_tables.h *** src/include/utils/guc_tables.h 11 Jun 2009 14:49:13 -0000 1.46 --- src/include/utils/guc_tables.h 11 Nov 2009 03:17:48 -0000 *************** *** 76,82 **** COMPAT_OPTIONS_CLIENT, PRESET_OPTIONS, CUSTOM_OPTIONS, ! DEVELOPER_OPTIONS }; /* --- 76,83 ---- COMPAT_OPTIONS_CLIENT, PRESET_OPTIONS, CUSTOM_OPTIONS, ! DEVELOPER_OPTIONS, ! 
COPY_OPTIONS }; /* Index: src/include/utils/builtins.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/utils/builtins.h,v retrieving revision 1.341 diff -c -r1.341 builtins.h *** src/include/utils/builtins.h 21 Oct 2009 20:38:58 -0000 1.341 --- src/include/utils/builtins.h 11 Nov 2009 03:17:48 -0000 *************** *** 609,614 **** --- 609,615 ---- extern const char *quote_identifier(const char *ident); extern char *quote_qualified_identifier(const char *qualifier, const char *ident); + extern char *generate_relation_name(Oid relid, List *namespaces); /* tid.c */ extern Datum tidin(PG_FUNCTION_ARGS); Index: src/backend/executor/execMain.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/executor/execMain.c,v retrieving revision 1.334 diff -c -r1.334 execMain.c *** src/backend/executor/execMain.c 26 Oct 2009 02:26:29 -0000 1.334 --- src/backend/executor/execMain.c 11 Nov 2009 03:17:48 -0000 *************** *** 1235,1241 **** /* * ExecRelCheck --- check that tuple meets constraints for result relation */ ! static const char * ExecRelCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate) { --- 1235,1241 ---- /* * ExecRelCheck --- check that tuple meets constraints for result relation */ ! const char * ExecRelCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate) { Index: src/backend/commands/copy.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/commands/copy.c,v retrieving revision 1.317 diff -c -r1.317 copy.c *** src/backend/commands/copy.c 21 Sep 2009 20:10:21 -0000 1.317 --- src/backend/commands/copy.c 11 Nov 2009 03:17:48 -0000 *************** *** 43,48 **** --- 43,56 ---- #include "utils/memutils.h" #include "utils/snapmgr.h" + /* For tuple routing */ + #include "catalog/pg_inherits.h" + #include "catalog/pg_inherits_fn.h" + #include "nodes/makefuncs.h" + #include "nodes/pg_list.h" + #include "utils/fmgroids.h" + #include "utils/relcache.h" + #include "utils/tqual.h" #define ISOCTAL(c) (((c) >= '0') && ((c) <= '7')) #define OCTVALUE(c) ((c) - '0') *************** *** 117,122 **** --- 125,131 ---- char *escape; /* CSV escape char (must be 1 byte) */ bool *force_quote_flags; /* per-column CSV FQ flags */ bool *force_notnull_flags; /* per-column CSV FNN flags */ + bool partitioning; /* tuple routing in table hierarchy */ /* these are just for error messages, see copy_in_error_callback */ const char *cur_relname; /* table name for error messages */ *************** *** 173,178 **** --- 182,208 ---- } DR_copy; + /** + * Size of the LRU list of relations to keep in cache for routing + */ + int partitioningCacheSize; + + typedef struct OidCell OidCell; + + typedef struct OidLinkedList + { + int length; + OidCell *head; + } OidLinkedList; + + struct OidCell + { + Oid oid_value; + OidCell *next; + }; + + OidLinkedList *child_table_lru = NULL; + /* * These macros centralize code used to process line_buf and raw_buf buffers. 
* They are macros because they often do continue/break control and to avoid *************** *** 839,844 **** --- 869,882 ---- errmsg("argument to option \"%s\" must be a list of column names", defel->defname))); } + else if (strcmp(defel->defname, "partitioning") == 0) + { + if (cstate->partitioning) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + cstate->partitioning = defGetBoolean(defel); + } else ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), *************** *** 1662,1667 **** --- 1700,1984 ---- return res; } + /** + * Check that the given tuple matches the constraints of the given child table + * and performs an insert if the constraints are matched. insert_tuple specifies + * if the tuple must be inserted in the table if the constraint is satisfied. + * The method returns true if the constraint is satisfied (and insert was + * performed if insert_tuple is true), false otherwise (constraints not + * satisfied for this tuple on this child table). + */ + static bool + check_tuple_constraints(Relation child_table_relation, HeapTuple tuple, + bool insert_tuple, int hi_options) + { + /* Check the constraints */ + ResultRelInfo *resultRelInfo; + TupleTableSlot *slot; + EState *estate = CreateExecutorState(); + bool result = false; + + resultRelInfo = makeNode(ResultRelInfo); + resultRelInfo->ri_RangeTableIndex = 1; /* dummy */ + resultRelInfo->ri_RelationDesc = child_table_relation; + + estate->es_result_relations = resultRelInfo; + estate->es_num_result_relations = 1; + estate->es_result_relation_info = resultRelInfo; + + /* Set up a tuple slot too */ + slot = MakeSingleTupleTableSlot(child_table_relation->rd_att); + ExecStoreTuple(tuple, slot, InvalidBuffer, false); + + if (ExecRelCheck(resultRelInfo, slot, estate) == NULL) + { + /* Constraints satisfied */ + if (insert_tuple) + { + /* Insert the row in the child table */ + List *recheckIndexes = NIL; + + /* BEFORE ROW INSERT Triggers */ + if (resultRelInfo->ri_TrigDesc && + resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0) + { + HeapTuple newtuple; + newtuple = ExecBRInsertTriggers(estate, resultRelInfo, tuple); + + if (newtuple != tuple) + { + /* modified by Trigger(s) */ + heap_freetuple(tuple); + tuple = newtuple; + } + } + + /* Perform the insert + * TODO: Check that we detect constraint violation if before row + * insert does something bad + */ + /* OK, store the tuple and create index entries for it */ + heap_insert(child_table_relation, tuple, GetCurrentCommandId(true), + hi_options, NULL); + + /* Update indices */ + if (resultRelInfo->ri_NumIndices > 0) + recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), + estate, false); + + /* AFTER ROW INSERT Triggers */ + ExecARInsertTriggers(estate, resultRelInfo, tuple, + recheckIndexes); + } + result = true; + } + + /* Free resources */ + FreeExecutorState(estate); + ExecDropSingleTupleTableSlot(slot); + + return result; + } + + + /** + * Route a tuple into a child table that matches the constraints of the tuple + * to be inserted. 
+ * @param parent_relation_id Oid of the parent relation + * @param tuple the tuple to be routed + */ + static bool route_tuple_to_child(Relation parent_relation, HeapTuple tuple, int hi_options) + { + Relation child_table_relation; + bool result = false; + Relation catalog_relation; + HeapTuple inherits_tuple; + HeapScanDesc scan; + ScanKeyData key[1]; + + /* Try to exploit locality for bulk inserts + * We expect consecutive insert to go to the same child table */ + if (partitioningCacheSize > 0 && child_table_lru != NULL) + { + /* Try the child table LRU */ + OidCell *child_oid_cell; + OidCell *previous_cell = NULL; + Oid child_relation_id; + + for (child_oid_cell = child_table_lru->head ; child_oid_cell != NULL ; + child_oid_cell = child_oid_cell->next) + { + child_relation_id = child_oid_cell->oid_value; + child_table_relation = try_relation_open(child_relation_id, + RowExclusiveLock); + + if (child_table_relation == NULL) + { + /* Child table does not exist anymore, purge cache entry */ + if (previous_cell == NULL) + { + child_table_lru->head = child_oid_cell->next; + } + else + { + previous_cell->next = child_oid_cell->next; + } + + pfree(child_oid_cell); + child_table_lru->length--; + continue; + } + + if (check_tuple_constraints(child_table_relation, tuple, true, hi_options)) + { + /* Hit, move in front if not already the head + * Close the relation but keep the lock until the end of + * the transaction */ + relation_close(child_table_relation, NoLock); + + if (previous_cell != NULL) + { + previous_cell->next = child_oid_cell->next; + child_oid_cell->next = child_table_lru->head; + child_table_lru->head = child_oid_cell; + } + return true; + } + relation_close(child_table_relation, RowExclusiveLock); + previous_cell = child_oid_cell; + } + /* We got a miss */ + } + + /* Looking up child tables */ + ScanKeyInit(&key[0], + Anum_pg_inherits_inhparent, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(parent_relation->rd_id)); + catalog_relation = heap_open(InheritsRelationId, AccessShareLock); + scan = heap_beginscan(catalog_relation, SnapshotNow, 1, key); + while ((inherits_tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + { + TupleConstr *constr; + Form_pg_inherits inh = (Form_pg_inherits) GETSTRUCT(inherits_tuple); + Oid child_relation_id = inh->inhrelid; + + /* Check if the child table satisfy the constraints, if the relation + * cannot be opened this throws an exception */ + child_table_relation = (Relation) relation_open(child_relation_id, + RowExclusiveLock); + + constr = child_table_relation->rd_att->constr; + if (constr->num_check == 0) + { + ereport(ERROR, ( + errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("partition routing found no constraint for relation %s", + generate_relation_name(child_relation_id, NIL)) + )); + } + + if (has_subclass(child_table_relation->rd_id)) + { + /* This is a parent table, check its constraints first */ + if (check_tuple_constraints(child_table_relation, tuple, false, hi_options)) + { + /* Constraint satisfied, explore the child tables */ + result = route_tuple_to_child(child_table_relation, tuple, hi_options); + if (result) + { + /* Success, one of our child tables matched. 
+ * Release the lock on this parent relation, we did not use it */ + relation_close(child_table_relation, RowExclusiveLock); + break; + } + else + { + ereport(ERROR, ( + errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("tuple matched constraints of relation %s but none of " + "its children", + generate_relation_name(child_relation_id, NIL)) + )); + } + } + } + else + { + /* Child table, try it */ + result = check_tuple_constraints(child_table_relation, tuple, true, hi_options); + } + + if (result) + { + /* We found the one, update the LRU and exit the loop! + * + * Close the relation but keep the lock until the end of + * the transaction */ + relation_close(child_table_relation, NoLock); + + if (partitioningCacheSize > 0) + { + OidCell *new_head; + + if (child_table_lru == NULL) + { + /* Create the list if it does not exist */ + child_table_lru = (OidLinkedList *)MemoryContextAlloc( + CacheMemoryContext, sizeof(OidLinkedList)); + child_table_lru->length = 0; + child_table_lru->head = NULL; + } + + /* Add the new entry in head of the list */ + new_head = (OidCell *) MemoryContextAlloc( + CacheMemoryContext, sizeof(OidCell)); + new_head->oid_value = child_relation_id; + new_head->next = child_table_lru->head; + child_table_lru->head = new_head; + child_table_lru->length++; + + /* Adjust list size if needed */ + if (child_table_lru->length > partitioningCacheSize) + { + OidCell *child_oid_cell; + OidCell *previous_cell = NULL; + int length = 1; + + for (child_oid_cell = child_table_lru->head ; + child_oid_cell != NULL ; child_oid_cell = child_oid_cell->next) + { + /* Note that partitioningCacheSize is at least 1 so we don't + * have to worry about the head. */ + if (length > partitioningCacheSize) + { + /* Remove entry */ + previous_cell->next = child_oid_cell->next; + pfree(child_oid_cell); + child_oid_cell = previous_cell; + } + else + { + previous_cell = child_oid_cell; + } + length++; + } + child_table_lru->length = partitioningCacheSize; + } + } + break; + } + else + { + /* Release the lock on that relation, we did not use it */ + relation_close(child_table_relation, RowExclusiveLock); + } + } + heap_endscan(scan); + heap_close(catalog_relation, AccessShareLock); + return result; + } + /* * Copy FROM file to relation. */ *************** *** 2149,2178 **** { List *recheckIndexes = NIL; ! /* Place tuple in tuple slot */ ! ExecStoreTuple(tuple, slot, InvalidBuffer, false); ! ! /* Check the constraints of the tuple */ ! if (cstate->rel->rd_att->constr) ! ExecConstraints(resultRelInfo, slot, estate); ! ! /* OK, store the tuple and create index entries for it */ ! heap_insert(cstate->rel, tuple, mycid, hi_options, bistate); ! if (resultRelInfo->ri_NumIndices > 0) ! recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), ! estate, false); ! /* AFTER ROW INSERT Triggers */ ! ExecARInsertTriggers(estate, resultRelInfo, tuple, ! recheckIndexes); ! /* ! * We count only tuples not suppressed by a BEFORE INSERT trigger; ! * this is the same definition used by execMain.c for counting ! * tuples inserted by an INSERT command. ! */ ! cstate->processed++; } } --- 2466,2518 ---- { List *recheckIndexes = NIL; ! /* If routing is enabled and table has child tables, let's try routing */ ! if (cstate->partitioning && has_subclass(cstate->rel->rd_id)) ! { ! if (route_tuple_to_child(cstate->rel, tuple, hi_options)) ! { ! /* increase the counter so that we return how many ! * tuples got copied into all tables in total */ ! cstate->processed++; ! } ! else ! { ! ereport(ERROR, ( ! 
errcode(ERRCODE_BAD_COPY_FILE_FORMAT), ! errmsg("tuple does not satisfy any child table constraint") ! )); ! } ! } ! else ! { ! /* No partitioning, prepare the tuple and ! * check the constraints */ ! /* Place tuple in tuple slot */ ! ExecStoreTuple(tuple, slot, InvalidBuffer, false); ! /* Check the constraints of the tuple */ ! if (cstate->rel->rd_att->constr) ! ExecConstraints(resultRelInfo, slot, estate); ! ! /* OK, store the tuple and create index entries for it */ ! heap_insert(cstate->rel, tuple, mycid, hi_options, bistate); ! ! if (resultRelInfo->ri_NumIndices > 0) ! recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), ! estate, false); ! ! /* AFTER ROW INSERT Triggers */ ! ExecARInsertTriggers(estate, resultRelInfo, tuple, ! recheckIndexes); ! /* ! * We count only tuples not suppressed by a BEFORE INSERT trigger; ! * this is the same definition used by execMain.c for counting ! * tuples inserted by an INSERT command. ! */ ! cstate->processed++; ! } } } Index: src/include/executor/executor.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/executor/executor.h,v retrieving revision 1.163 diff -c -r1.163 executor.h *** src/include/executor/executor.h 26 Oct 2009 02:26:41 -0000 1.163 --- src/include/executor/executor.h 11 Nov 2009 03:17:48 -0000 *************** *** 166,171 **** --- 166,173 ---- extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids); extern void ExecConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate); + extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo, + TupleTableSlot *slot, EState *estate); extern TupleTableSlot *EvalPlanQual(EState *estate, EPQState *epqstate, Relation relation, Index rti, ItemPointer tid, TransactionId priorXmax); Index: src/include/commands/copy.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/commands/copy.h,v retrieving revision 1.32 diff -c -r1.32 copy.h *** src/include/commands/copy.h 1 Jan 2009 17:23:58 -0000 1.32 --- src/include/commands/copy.h 11 Nov 2009 03:17:48 -0000 *************** *** 17,25 **** #include "nodes/parsenodes.h" #include "tcop/dest.h" - extern uint64 DoCopy(const CopyStmt *stmt, const char *queryString); extern DestReceiver *CreateCopyDestReceiver(void); #endif /* COPY_H */ --- 17,29 ---- #include "nodes/parsenodes.h" #include "tcop/dest.h" extern uint64 DoCopy(const CopyStmt *stmt, const char *queryString); extern DestReceiver *CreateCopyDestReceiver(void); + /** + * Size of the LRU list of relations to keep in cache for partitioning in COPY + */ + extern int partitioningCacheSize; + #endif /* COPY_H */ Index: src/backend/utils/misc/guc.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/utils/misc/guc.c,v retrieving revision 1.523 diff -c -r1.523 guc.c *** src/backend/utils/misc/guc.c 21 Oct 2009 20:38:58 -0000 1.523 --- src/backend/utils/misc/guc.c 11 Nov 2009 03:17:48 -0000 *************** *** 32,37 **** --- 32,38 ---- #include "access/xact.h" #include "catalog/namespace.h" #include "commands/async.h" + #include "commands/copy.h" #include "commands/prepare.h" #include "commands/vacuum.h" #include "commands/variable.h" *************** *** 534,539 **** --- 535,542 ---- gettext_noop("Customized Options"), /* DEVELOPER_OPTIONS */ gettext_noop("Developer Options"), + /* COPY_OPTIONS */ + gettext_noop("Copy Options"), /* help_config wants this array to be 
null-terminated */ NULL }; *************** *** 1955,1960 **** --- 1958,2019 ---- 1024, 100, 102400, NULL, NULL }, + { + { + /* variable name */ + "copy_partitioning_cache_size", + + /* context, we want the user to set it */ + PGC_USERSET, + + /* category for this configuration variable */ + COPY_OPTIONS, + + /* short description */ + gettext_noop("Size of the LRU list of child tables to keep in cache " + " when partitioning tuples in COPY."), + + /* long description */ + gettext_noop("When tuples are automatically routed in COPY, all " + "tables are scanned until the constraints are matched. When " + "a large number of child tables are present the scanning " + "overhead can be large. To reduce that overhead, the routing " + "mechanism keeps a cache of the last child tables in which " + "tuples where inserted and try these tables first before " + "performing a full scan. This variable defines the cache size " + "with 0 meaning no caching, 1 keep the last matching child table" + ", x keep the last x child tables in which tuples were inserted." + " Note that the list is managed with an LRU policy."), + + + /* flags: this option is not in the postgresql.conf.sample + * file and should not be allowed in the config. + * NOTE: this is not currently enforced. + */ + GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE + }, + + /* pointer to the variable, this one is present in + * src/backend/commands/copy.c + */ + &partitioningCacheSize, + + /* default value */ + 2, + + /* min value */ + 0, + + /* max value */ + INT_MAX, + + /* assign hook function */ + NULL, + + /* show hook function */ + NULL + }, + /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL Index: src/test/regress/input/copy_partitioning.source =================================================================== RCS file: src/test/regress/input/copy_partitioning.source diff -N src/test/regress/input/copy_partitioning.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/input/copy_partitioning.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,108 ---- + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int + ); + + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + + CREATE TABLE jan2008half1 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-01-15' ) + ) INHERITS (jan2008); + + CREATE TABLE jan2008half2 ( + CHECK ( date >= DATE '2008-01-16' AND date < DATE '2008-01-31' ) + ) INHERITS (jan2008); + + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data'; + + SELECT COUNT(*) FROM y2008; + SELECT COUNT(*) FROM jan2008; + SELECT COUNT(*) FROM jan2008half1; + SELECT COUNT(*) FROM jan2008half2; + SELECT COUNT(*) FROM feb2008; + SELECT COUNT(*) FROM mar2008; + + DELETE FROM y2008; + + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + 
SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 3; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + DROP TABLE y2008 CASCADE; Index: src/test/regress/output/copy_partitioning.source =================================================================== RCS file: src/test/regress/output/copy_partitioning.source diff -N src/test/regress/output/copy_partitioning.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/output/copy_partitioning.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,492 ---- + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int + ); + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + CREATE TABLE jan2008half1 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-01-15' ) + ) INHERITS (jan2008); + CREATE TABLE jan2008half2 ( + CHECK ( date >= DATE '2008-01-16' AND date < DATE '2008-01-31' ) + ) INHERITS (jan2008); + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data'; + SELECT COUNT(*) FROM y2008; + count + ------- + 12 + (1 row) + + SELECT COUNT(*) FROM jan2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM jan2008half1; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM jan2008half2; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM feb2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM mar2008; + 
count + ------- + 0 + (1 row) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + 
----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 3; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + 
----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + DROP TABLE y2008 CASCADE; + NOTICE: drop cascades to 5 other objects + DETAIL: drop cascades to table jan2008 + drop cascades to table jan2008half1 + drop cascades to table jan2008half2 + drop cascades to table feb2008 + drop cascades to table mar2008 Index: src/test/regress/data/copy_input.data =================================================================== RCS file: src/test/regress/data/copy_input.data diff -N src/test/regress/data/copy_input.data *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/data/copy_input.data 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,12 ---- + 11 '2008-01-19' 11 + 12 '2008-02-15' 12 + 13 '2008-03-15' 13 + 21 '2008-01-10' 11 + 31 '2008-01-10' 11 + 41 '2008-01-10' 11 + 22 '2008-02-15' 12 + 23 '2008-03-15' 13 + 32 '2008-02-15' 12 + 33 '2008-03-15' 13 + 42 '2008-02-15' 12 + 43 '2008-03-15' 13
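To illustrate the overlapping-constraints limitation noted in the patch documentation above, here is a minimal sketch (table names are hypothetical; both children accept i = 5, so the destination depends on the child-table scan order and on whatever happens to sit in the routing cache):

CREATE TABLE p (i int);
CREATE TABLE c_low  (CHECK (i >= 0 AND i <= 5)) INHERITS (p);
CREATE TABLE c_high (CHECK (i >= 5 AND i <= 9)) INHERITS (p);

COPY p FROM stdin WITH (PARTITIONING);
5
\.

-- The row lands in exactly one child, but which one is not guaranteed:
SELECT 'c_low' AS child, count(*) FROM c_low
UNION ALL
SELECT 'c_high', count(*) FROM c_high;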
Emmanuel Cecchet <manu@asterdata.com> wrote:
> I have extracted the partitioning option for COPY (removed the error
> logging part) from the previous patch.

We can use an INSERT trigger to route tuples into partitions even now. Why do you need an additional router for COPY? Also, it would be nicer if the router worked not only in COPY but also in INSERT.

BTW, I'm working on metadata for partitioning now. Your "partitioning" option in COPY could be replaced with the catalog.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Hi,

>> I have extracted the partitioning option for COPY (removed the error
>> logging part) from the previous patch.
>
> We can use an INSERT trigger to route tuples into partitions even now.
> Why do you need an additional router for COPY?

Tom has already explained on the list why using a trigger was a bad idea (and I know we can use a trigger, since I am the one who wrote it). If you look at the code, you will see that you can do optimizations in the COPY code that you cannot do in a trigger.

> Also, it would be nicer if the router worked not only in COPY but also
> in INSERT.

As 8.5 will at best provide a syntactic hack on top of the existing constraint implementation, I think it will not hurt to have routing in COPY, since we will not have it anywhere else otherwise.

> BTW, I'm working on metadata for partitioning now. Your "partitioning"
> option in COPY could be replaced with the catalog.

This implementation is only for the current 8.5, and it will not be needed anymore once we get fully functional partitioning in Postgres, which seems to be slated for a future version.

Best regards,
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com
Emmanuel Cecchet <manu@asterdata.com> wrote:
> If you look at the code, you will see that you can do optimizations in
> the COPY code that you cannot do in a trigger.

Since the optimizations are nice, I hope they will work not only in COPY but also in INSERT. One idea is to move the partitioning cache into the relation cache, and to move the routing routines into heap_insert(). My concern is only about where the code is placed; I don't think you need to change your logic itself.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
> We can use an INSERT trigger to route tuples into partitions even now.
> Why do you need an additional router for COPY? Also, it would be nicer
> if the router worked not only in COPY but also in INSERT.

Yeah, but the performance of an insert trigger is impractical for large volumes of data.

--Josh Berkus
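A rough way to see the difference on a given data set is to time both paths with the same input file. This is only a sketch: the trigger is the one shown earlier in the thread, the file path is hypothetical, and no figures are implied.

\timing

-- Path 1: plain COPY into the parent; each row is routed by the
-- BEFORE ROW trigger installed on y2008.
COPY y2008 FROM '/tmp/copy_input.data';
DELETE FROM y2008;

-- Path 2: the patch's COPY-level routing. Drop the trigger first so
-- the two runs are comparable.
DROP TRIGGER route_y2008 ON y2008;
COPY y2008 FROM '/tmp/copy_input.data' (PARTITIONING);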
Emmanuel Cecchet wrote:
> Hi all,

Hi!,

> partitioning option for COPY

Here's the review:

== Submission ==

The patch is contextual, applies cleanly to current HEAD, and compiles fine. The docs build cleanly.

== Docs ==

They're reasonably clear, although they still mention ERROR_LOGGING, which was taken out of this patch. They could use some wordsmithing, but I didn't go into details, as there were more severe issues with the patch.

One thing that made me cautious was the mention that triggers modifying tuples will make random errors appear. As is demonstrated later, triggers are a big issue.

== Regression tests ==

They ran fine; there's one additional regression test that exercises the new option.

== Style/nitpicks ==

Minor gripes include:
o instead of using an ad-hoc data structure for the LRU cache list, I'd
  suggest an OidList from pg_list.h.
o some mentions of "method" in comments should be changed to "function"
o trailing whitespace in the patch (it's not the end of the world, of
  course)

== Issues ==

Attached are 3 files that demonstrate problems the patch has.
o test1.sql always segfaults for me; poking around with gdb suggests
  it's a case of an uninitialised cache list (another reason to use the
  built-in one).
o test2.sql demonstrates that indices on child tables are not being
  updated, probably because the resultRelInfo created in
  check_tuple_constraints() never has ri_NumIndices set, and so the code
  that was supposed to take care of indices is never called. Looks like
  a copy-paste error.
o test3.sql demonstrates that some triggers that I would expect to be
  fired are in fact not fired. I guess it's the same reason as mentioned:
  ri_TrigDesc never gets set, so the code that calls triggers is dead.

I stopped there, because unfortunately, apart from all that, there's one fundamental problem with this patch, namely "we probably don't want it".

As it stands it's more of a proof of concept than a really usable solution; it feels like it was built from spare parts (copied from around copy.c). IMHO it's much too narrow for a general partitioning solution, even if the design it's based upon were accepted. It assumes a lot of things about the presence of child tables (with proper constraints), the absence of triggers, and so on.

Granted, it solves a particular problem (bulk loading into a partitioned table, with no extra features like triggers and with a standard inheritance/exclusive-check-constraints setup), but that's not good enough in my opinion, even if all the other issues were addressed.

Now, I'm not a real Postgres user; it's been a while since I worked in a PG shop (or a DB shop, for that matter), but from what I understand from following this community for a while, a patch like that doesn't have a lot of chances of being committed. That said, my puny experience with real PG installations and their needs must be taken into account here.

I'll mark this patch as "Waiting on Author", but I have little doubt that even after fixing those probably trivial segfaults etc. the patch would be promptly rejected by a committer. I suggest withdrawing it from this commitfest and trying to work out a more complete design first, one that would address the needs of a bigger variety of users, or joining some of the already underway efforts to bring full-featured partitioning into Postgres.

Best,
Jan
Jan Urbański wrote:
> Emmanuel Cecchet wrote:
>> Hi all,
>
> Hi!,
>
>> partitioning option for COPY
>
> Attached are 3 files that demonstrate problems the patch has.

And the click-before-you-think prize winner is... me. Test cases attached, see the comments for expected/actual results.

Jan

-- test1.sql
-- segfaults, probably uninitialised cache oid list
-- disabling cache fixes it
-- set copy_partitioning_cache_size = 0;
drop table parent cascade;
create table parent(i int);
create table c1 (check (i > 0 and i <= 1)) inherits (parent);
copy parent from stdin with (partitioning);
1
\.
drop table parent cascade;
create table parent(i int);
create table c1 (check (i > 0 and i <= 1)) inherits (parent);
copy parent from stdin with (partitioning);
1
\.

-- test2.sql
set copy_partitioning_cache_size = 0;
drop table parent cascade;
create table parent(i int, j int);
create table c1 (check (i > 0 and i <= 1)) inherits (parent);
create table c2 (check (i > 1 and i <= 2)) inherits (parent);
create table c3 (check (i > 2 and i <= 3)) inherits (parent);
create index c1_idx on c1(j);
copy (select i % 3 + 1, i from generate_series(1, 1000) s(i)) to '/tmp/parent';
copy parent from '/tmp/parent' with (partitioning);
analyse;
set enable_seqscan to false;
-- no rows, index was not updated
select * from c1 where j = 3;
set enable_seqscan to true;
set enable_indexscan to false;
-- some rows
select * from c1 where j = 3;

-- test3.sql
set copy_partitioning_cache_size = 0;
drop table parent cascade;
drop table audit cascade;
drop function audit();
create table parent(i int);
create table c1 (check (i > 0 and i <= 1)) inherits (parent);
create table c2 (check (i > 1 and i <= 2)) inherits (parent);
create table c3 (check (i > 2 and i <= 3)) inherits (parent);
create table audit(i int);
create function audit() returns trigger as $$
begin
  insert into audit(i) values (new.i);
  return new;
end;
$$ language plpgsql;
create trigger parent_a after insert on parent for each row execute procedure audit();
-- the before trigger on the parent would get fired
-- create trigger parent_a2 before insert on parent for each row execute procedure audit();
create trigger c1_a before insert on c1 for each row execute procedure audit();
create trigger c1_a2 after insert on c1 for each row execute procedure audit();
copy parent from stdin with (partitioning);
1
2
3
\.
-- no rows
select * from audit;
Jan,

Here is a new version of the patch. Find my responses to your comments embedded below.

>> partitioning option for COPY
>
> Here's the review:
>
> == Submission ==
> The patch is contextual, applies cleanly to current HEAD, and compiles
> fine. The docs build cleanly.
>
> == Docs ==
> They're reasonably clear, although they still mention ERROR_LOGGING,
> which was taken out of this patch. They could use some wordsmithing, but
> I didn't go into details, as there were more severe issues with the
> patch.

Removed the text related to ERROR_LOGGING.

> One thing that made me cautious was the mention that triggers modifying
> tuples will make random errors appear. As is demonstrated later,
> triggers are a big issue.

Whichever way routing is implemented, we will have to decide what we want to do with triggers. We can decide to fire them or not (there was already a debate about whether COPY is an insert statement and should fire the statement-level insert trigger). This is not a design problem with this patch; we just have to choose what we want to do with triggers when partitioning is involved. IMHO we should disable them altogether, but there are scenarios where one could argue that they are still useful.

> == Regression tests ==
> They ran fine; there's one additional regression test that exercises the
> new option.
>
> == Style/nitpicks ==
> Minor gripes include:
> o instead of using an ad-hoc data structure for the LRU cache list, I'd
>   suggest an OidList from pg_list.h.

Will do if we decide to go further with this patch.

> o some mentions of "method" in comments should be changed to "function"
> o trailing whitespace in the patch (it's not the end of the world, of
>   course)

I guess the committer will run pgindent anyway, so I'm not too worried about spaces.

> == Issues ==
> Attached are 3 files that demonstrate problems the patch has.
> o test1.sql always segfaults for me; poking around with gdb suggests
>   it's a case of an uninitialised cache list (another reason to use the
>   built-in one).

I was never able to reproduce that problem. I don't know where this comes from.

> o test2.sql demonstrates that indices on child tables are not being
>   updated, probably because the resultRelInfo created in
>   check_tuple_constraints() never has ri_NumIndices set, and so the code
>   that was supposed to take care of indices is never called. Looks like
>   a copy-paste error.

Fixed; there was actually a relcache leak for the index.

> o test3.sql demonstrates that some triggers that I would expect to be
>   fired are in fact not fired. I guess it's the same reason as
>   mentioned: ri_TrigDesc never gets set, so the code that calls triggers
>   is dead.

There is a problem with after-row triggers that I did not completely figure out. For some reason, if I use the regular mechanism, calling ExecARInsertTriggers, which defers execution of the trigger until the after-row event is triggered, the child relation is not closed and there is a leak in the relcache. I forced the after-row triggers to execute synchronously after inserting in the child table to work around the problem. If someone has an explanation, I am willing to do a cleaner implementation!

> I stopped there, because unfortunately, apart from all that, there's one
> fundamental problem with this patch, namely "we probably don't want it".
>
> As it stands it's more of a proof of concept than a really usable
> solution; it feels like it was built from spare parts (copied from
> around copy.c).
> IMHO it's much too narrow for a general partitioning solution, even if
> the design it's based upon were accepted. It assumes a lot of things
> about the presence of child tables (with proper constraints), the
> absence of triggers, and so on.
>
> Granted, it solves a particular problem (bulk loading into a partitioned
> table, with no extra features like triggers and with a standard
> inheritance/exclusive-check-constraints setup), but that's not good
> enough in my opinion, even if all the other issues were addressed.

Well, as Postgres does not have any support for real partitioning besides inheritance, and so far it is unlikely that another implementation will happen in the 8.5 timeframe, this feature fills the need for people doing data warehouses. This is a scenario used with every single Aster customer. Now, if the Postgres community does not think that the Aster use case is general enough, or of enough interest to be integrated into the code base, that is a different issue, and I won't spend time arguing about a philosophical/political question. Note that the new patch works with triggers, but you can easily generate corrupt data if your triggers modify the data on which the routing decision is based.

> Now, I'm not a real Postgres user; it's been a while since I worked in a
> PG shop (or a DB shop, for that matter), but from what I understand from
> following this community for a while, a patch like that doesn't have a
> lot of chances of being committed. That said, my puny experience with
> real PG installations and their needs must be taken into account here.

I don't really understand why a new COPY option should have to solve a general problem. It's an option, and like every option, it exists to solve a particular use case. I don't see what is wrong with that.

> I'll mark this patch as "Waiting on Author", but I have little doubt
> that even after fixing those probably trivial segfaults etc. the patch
> would be promptly rejected by a committer. I suggest withdrawing it from
> this commitfest and trying to work out a more complete design first, one
> that would address the needs of a bigger variety of users, or joining
> some of the already underway efforts to bring full-featured partitioning
> into Postgres.

I have integrated your tests in the regression test suite, and I was never able to reproduce the segfault you mentioned. What platform are you using?
Thanks for your valuable feedback Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com Index: src/backend/commands/trigger.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/commands/trigger.c,v retrieving revision 1.256 diff -c -r1.256 trigger.c *** src/backend/commands/trigger.c 27 Oct 2009 20:14:27 -0000 1.256 --- src/backend/commands/trigger.c 15 Nov 2009 23:12:50 -0000 *************** *** 1756,1761 **** --- 1756,1802 ---- return newtuple; } + HeapTuple + ExecARInsertTriggersNow(EState *estate, ResultRelInfo *relinfo, + HeapTuple trigtuple) + { + TriggerDesc *trigdesc = relinfo->ri_TrigDesc; + int ntrigs = trigdesc->n_after_row[TRIGGER_EVENT_INSERT]; + int *tgindx = trigdesc->tg_after_row[TRIGGER_EVENT_INSERT]; + HeapTuple newtuple = trigtuple; + HeapTuple oldtuple; + TriggerData LocTriggerData; + int i; + + LocTriggerData.type = T_TriggerData; + LocTriggerData.tg_event = TRIGGER_EVENT_INSERT | + TRIGGER_EVENT_ROW; + LocTriggerData.tg_relation = relinfo->ri_RelationDesc; + LocTriggerData.tg_newtuple = NULL; + LocTriggerData.tg_newtuplebuf = InvalidBuffer; + for (i = 0; i < ntrigs; i++) + { + Trigger *trigger = &trigdesc->triggers[tgindx[i]]; + + if (!TriggerEnabled(trigger, LocTriggerData.tg_event, NULL)) + continue; + + LocTriggerData.tg_trigtuple = oldtuple = newtuple; + LocTriggerData.tg_trigtuplebuf = InvalidBuffer; + LocTriggerData.tg_trigger = trigger; + newtuple = ExecCallTriggerFunc(&LocTriggerData, + tgindx[i], + relinfo->ri_TrigFunctions, + relinfo->ri_TrigInstrument, + GetPerTupleMemoryContext(estate)); + if (oldtuple != newtuple && oldtuple != trigtuple) + heap_freetuple(oldtuple); + if (newtuple == NULL) + break; + } + return newtuple; + } + void ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo, HeapTuple trigtuple, List *recheckIndexes) Index: src/backend/commands/copy.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/commands/copy.c,v retrieving revision 1.317 diff -c -r1.317 copy.c *** src/backend/commands/copy.c 21 Sep 2009 20:10:21 -0000 1.317 --- src/backend/commands/copy.c 15 Nov 2009 23:12:50 -0000 *************** *** 43,48 **** --- 43,56 ---- #include "utils/memutils.h" #include "utils/snapmgr.h" + /* For tuple routing */ + #include "catalog/pg_inherits.h" + #include "catalog/pg_inherits_fn.h" + #include "nodes/makefuncs.h" + #include "nodes/pg_list.h" + #include "utils/fmgroids.h" + #include "utils/relcache.h" + #include "utils/tqual.h" #define ISOCTAL(c) (((c) >= '0') && ((c) <= '7')) #define OCTVALUE(c) ((c) - '0') *************** *** 117,122 **** --- 125,131 ---- char *escape; /* CSV escape char (must be 1 byte) */ bool *force_quote_flags; /* per-column CSV FQ flags */ bool *force_notnull_flags; /* per-column CSV FNN flags */ + bool partitioning; /* tuple routing in table hierarchy */ /* these are just for error messages, see copy_in_error_callback */ const char *cur_relname; /* table name for error messages */ *************** *** 173,178 **** --- 182,208 ---- } DR_copy; + /** + * Size of the LRU list of relations to keep in cache for routing + */ + int partitioningCacheSize; + + typedef struct OidCell OidCell; + + typedef struct OidLinkedList + { + int length; + OidCell *head; + } OidLinkedList; + + struct OidCell + { + Oid oid_value; + OidCell *next; + }; + + OidLinkedList *child_table_lru = NULL; + /* * These macros centralize code used to process line_buf and raw_buf buffers. 
* They are macros because they often do continue/break control and to avoid *************** *** 839,844 **** --- 869,882 ---- errmsg("argument to option \"%s\" must be a list of column names", defel->defname))); } + else if (strcmp(defel->defname, "partitioning") == 0) + { + if (cstate->partitioning) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + cstate->partitioning = defGetBoolean(defel); + } else ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), *************** *** 1662,1667 **** --- 1700,1992 ---- return res; } + /** + * Check that the given tuple matches the constraints of the given child table + * and performs an insert if the constraints are matched. insert_tuple specifies + * if the tuple must be inserted in the table if the constraint is satisfied. + * The method returns true if the constraint is satisfied (and insert was + * performed if insert_tuple is true), false otherwise (constraints not + * satisfied for this tuple on this child table). + */ + static bool + check_tuple_constraints(Relation child_table_relation, HeapTuple tuple, + bool insert_tuple, int hi_options, ResultRelInfo *parentResultRelInfo) + { + /* Check the constraints */ + ResultRelInfo *resultRelInfo; + TupleTableSlot *slot; + EState *estate = CreateExecutorState(); + bool result = false; + + resultRelInfo = makeNode(ResultRelInfo); + resultRelInfo->ri_RangeTableIndex = 1; /* dummy */ + resultRelInfo->ri_RelationDesc = child_table_relation; + resultRelInfo->ri_TrigDesc = CopyTriggerDesc(child_table_relation->trigdesc); + if (resultRelInfo->ri_TrigDesc) + resultRelInfo->ri_TrigFunctions = (FmgrInfo *) + palloc0(resultRelInfo->ri_TrigDesc->numtriggers * sizeof(FmgrInfo)); + resultRelInfo->ri_TrigInstrument = NULL; + + ExecOpenIndices(resultRelInfo); + + estate->es_result_relations = resultRelInfo; + estate->es_num_result_relations = 1; + estate->es_result_relation_info = resultRelInfo; + + /* Set up a tuple slot too */ + slot = MakeSingleTupleTableSlot(child_table_relation->rd_att); + ExecStoreTuple(tuple, slot, InvalidBuffer, false); + + if (ExecRelCheck(resultRelInfo, slot, estate) == NULL) + { + /* Constraints satisfied */ + if (insert_tuple) + { + /* Insert the row in the child table */ + List *recheckIndexes = NIL; + + /* BEFORE ROW INSERT Triggers */ + if (resultRelInfo->ri_TrigDesc && + resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0) + { + HeapTuple newtuple; + newtuple = ExecBRInsertTriggers(estate, resultRelInfo, tuple); + + if (newtuple != tuple) + { + /* modified by Trigger(s) */ + heap_freetuple(tuple); + tuple = newtuple; + } + } + + /* Perform the insert + * TODO: Check that we detect constraint violation if before row + * insert does something bad + */ + /* OK, store the tuple and create index entries for it */ + heap_insert(child_table_relation, tuple, GetCurrentCommandId(true), + hi_options, NULL); + + /* Update indices */ + if (resultRelInfo->ri_NumIndices > 0) + recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), + estate, false); + + /* AFTER ROW INSERT Triggers */ + + if (resultRelInfo->ri_TrigDesc && + resultRelInfo->ri_TrigDesc->n_after_row[TRIGGER_EVENT_INSERT] > 0) + ExecARInsertTriggersNow(estate, resultRelInfo, tuple); + } + result = true; + } + + /* Free resources */ + FreeExecutorState(estate); + ExecDropSingleTupleTableSlot(slot); + ExecCloseIndices(resultRelInfo); + + return result; + } + + + /** + * Route a tuple into a child table that matches the constraints of the tuple + * to be inserted. 
+ * @param parent_relation_id Oid of the parent relation + * @param tuple the tuple to be routed + */ + static bool route_tuple_to_child(Relation parent_relation, HeapTuple tuple, int hi_options, ResultRelInfo *parentResultRelInfo) + { + Relation child_table_relation; + bool result = false; + Relation catalog_relation; + HeapTuple inherits_tuple; + HeapScanDesc scan; + ScanKeyData key[1]; + + /* Try to exploit locality for bulk inserts + * We expect consecutive insert to go to the same child table */ + if (partitioningCacheSize > 0 && child_table_lru != NULL) + { + /* Try the child table LRU */ + OidCell *child_oid_cell; + OidCell *previous_cell = NULL; + Oid child_relation_id; + + for (child_oid_cell = child_table_lru->head ; child_oid_cell != NULL ; + child_oid_cell = child_oid_cell->next) + { + child_relation_id = child_oid_cell->oid_value; + child_table_relation = try_relation_open(child_relation_id, + RowExclusiveLock); + + if (child_table_relation == NULL) + { + /* Child table does not exist anymore, purge cache entry */ + if (previous_cell == NULL) + { + child_table_lru->head = child_oid_cell->next; + } + else + { + previous_cell->next = child_oid_cell->next; + } + + pfree(child_oid_cell); + child_table_lru->length--; + continue; + } + + if (check_tuple_constraints(child_table_relation, tuple, true, hi_options, parentResultRelInfo)) + { + /* Hit, move in front if not already the head + * Close the relation but keep the lock until the end of + * the transaction */ + relation_close(child_table_relation, NoLock); + + if (previous_cell != NULL) + { + previous_cell->next = child_oid_cell->next; + child_oid_cell->next = child_table_lru->head; + child_table_lru->head = child_oid_cell; + } + return true; + } + relation_close(child_table_relation, RowExclusiveLock); + previous_cell = child_oid_cell; + } + /* We got a miss */ + } + + /* Looking up child tables */ + ScanKeyInit(&key[0], + Anum_pg_inherits_inhparent, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(parent_relation->rd_id)); + catalog_relation = heap_open(InheritsRelationId, AccessShareLock); + scan = heap_beginscan(catalog_relation, SnapshotNow, 1, key); + while ((inherits_tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + { + TupleConstr *constr; + Form_pg_inherits inh = (Form_pg_inherits) GETSTRUCT(inherits_tuple); + Oid child_relation_id = inh->inhrelid; + + /* Check if the child table satisfy the constraints, if the relation + * cannot be opened this throws an exception */ + child_table_relation = (Relation) relation_open(child_relation_id, + RowExclusiveLock); + + constr = child_table_relation->rd_att->constr; + if (constr->num_check == 0) + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("partition routing found no constraint for relation \"%s\"", + RelationGetRelationName(child_table_relation)))); + } + + if (has_subclass(child_table_relation->rd_id)) + { + /* This is a parent table, check its constraints first */ + if (check_tuple_constraints(child_table_relation, tuple, false, hi_options, parentResultRelInfo)) + { + /* Constraint satisfied, explore the child tables */ + result = route_tuple_to_child(child_table_relation, tuple, hi_options, parentResultRelInfo); + if (result) + { + /* Success, one of our child tables matched. 
+ * Release the lock on this parent relation, we did not use it */ + relation_close(child_table_relation, RowExclusiveLock); + break; + } + else + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("tuple matched constraints of relation \"%s\" but none of " + "its children", + RelationGetRelationName(child_table_relation)))); + } + } + } + else + { + /* Child table, try it */ + result = check_tuple_constraints(child_table_relation, tuple, true, hi_options, parentResultRelInfo); + } + + if (result) + { + /* We found the one, update the LRU and exit the loop! + * + * Close the relation but keep the lock until the end of + * the transaction */ + relation_close(child_table_relation, NoLock); + + if (partitioningCacheSize > 0) + { + OidCell *new_head; + + if (child_table_lru == NULL) + { + /* Create the list if it does not exist */ + child_table_lru = (OidLinkedList *)MemoryContextAlloc( + CacheMemoryContext, sizeof(OidLinkedList)); + child_table_lru->length = 0; + child_table_lru->head = NULL; + } + + /* Add the new entry in head of the list */ + new_head = (OidCell *) MemoryContextAlloc( + CacheMemoryContext, sizeof(OidCell)); + new_head->oid_value = child_relation_id; + new_head->next = child_table_lru->head; + child_table_lru->head = new_head; + child_table_lru->length++; + + /* Adjust list size if needed */ + if (child_table_lru->length > partitioningCacheSize) + { + OidCell *child_oid_cell; + OidCell *previous_cell = NULL; + int length = 1; + + for (child_oid_cell = child_table_lru->head ; + child_oid_cell != NULL ; child_oid_cell = child_oid_cell->next) + { + /* Note that partitioningCacheSize is at least 1 so we don't + * have to worry about the head. */ + if (length > partitioningCacheSize) + { + /* Remove entry */ + previous_cell->next = child_oid_cell->next; + pfree(child_oid_cell); + child_oid_cell = previous_cell; + } + else + { + previous_cell = child_oid_cell; + } + length++; + } + child_table_lru->length = partitioningCacheSize; + } + } + break; + } + else + { + /* Release the lock on that relation, we did not use it */ + relation_close(child_table_relation, RowExclusiveLock); + } + } + heap_endscan(scan); + heap_close(catalog_relation, AccessShareLock); + return result; + } + /* * Copy FROM file to relation. */ *************** *** 2149,2178 **** { List *recheckIndexes = NIL; ! /* Place tuple in tuple slot */ ! ExecStoreTuple(tuple, slot, InvalidBuffer, false); ! ! /* Check the constraints of the tuple */ ! if (cstate->rel->rd_att->constr) ! ExecConstraints(resultRelInfo, slot, estate); ! ! /* OK, store the tuple and create index entries for it */ ! heap_insert(cstate->rel, tuple, mycid, hi_options, bistate); ! if (resultRelInfo->ri_NumIndices > 0) ! recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), ! estate, false); ! /* AFTER ROW INSERT Triggers */ ! ExecARInsertTriggers(estate, resultRelInfo, tuple, ! recheckIndexes); ! /* ! * We count only tuples not suppressed by a BEFORE INSERT trigger; ! * this is the same definition used by execMain.c for counting ! * tuples inserted by an INSERT command. ! */ ! cstate->processed++; } } --- 2474,2526 ---- { List *recheckIndexes = NIL; ! /* If routing is enabled and table has child tables, let's try routing */ ! if (cstate->partitioning && has_subclass(cstate->rel->rd_id)) ! { ! if (route_tuple_to_child(cstate->rel, tuple, hi_options, resultRelInfo)) ! { ! /* increase the counter so that we return how many ! * tuples got copied into all tables in total */ ! cstate->processed++; ! } ! else ! { ! 
ereport(ERROR, ( ! errcode(ERRCODE_BAD_COPY_FILE_FORMAT), ! errmsg("tuple does not satisfy any child table constraint") ! )); ! } ! } ! else ! { ! /* No partitioning, prepare the tuple and ! * check the constraints */ ! /* Place tuple in tuple slot */ ! ExecStoreTuple(tuple, slot, InvalidBuffer, false); ! /* Check the constraints of the tuple */ ! if (cstate->rel->rd_att->constr) ! ExecConstraints(resultRelInfo, slot, estate); ! ! /* OK, store the tuple and create index entries for it */ ! heap_insert(cstate->rel, tuple, mycid, hi_options, bistate); ! ! if (resultRelInfo->ri_NumIndices > 0) ! recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), ! estate, false); ! ! /* AFTER ROW INSERT Triggers */ ! ExecARInsertTriggers(estate, resultRelInfo, tuple, ! recheckIndexes); ! /* ! * We count only tuples not suppressed by a BEFORE INSERT trigger; ! * this is the same definition used by execMain.c for counting ! * tuples inserted by an INSERT command. ! */ ! cstate->processed++; ! } } } Index: src/include/commands/trigger.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/commands/trigger.h,v retrieving revision 1.77 diff -c -r1.77 trigger.h *** src/include/commands/trigger.h 26 Oct 2009 02:26:41 -0000 1.77 --- src/include/commands/trigger.h 15 Nov 2009 23:12:50 -0000 *************** *** 130,135 **** --- 130,138 ---- extern HeapTuple ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo, HeapTuple trigtuple); + extern HeapTuple ExecARInsertTriggersNow(EState *estate, + ResultRelInfo *relinfo, + HeapTuple trigtuple); extern void ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo, HeapTuple trigtuple, Index: src/include/commands/copy.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/commands/copy.h,v retrieving revision 1.32 diff -c -r1.32 copy.h *** src/include/commands/copy.h 1 Jan 2009 17:23:58 -0000 1.32 --- src/include/commands/copy.h 15 Nov 2009 23:12:50 -0000 *************** *** 22,25 **** --- 22,30 ---- extern DestReceiver *CreateCopyDestReceiver(void); + /** + * Size of the LRU list of relations to keep in cache for partitioning in COPY + */ + extern int partitioningCacheSize; + #endif /* COPY_H */ Index: src/include/executor/executor.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/executor/executor.h,v retrieving revision 1.163 diff -c -r1.163 executor.h *** src/include/executor/executor.h 26 Oct 2009 02:26:41 -0000 1.163 --- src/include/executor/executor.h 15 Nov 2009 23:12:50 -0000 *************** *** 166,171 **** --- 166,173 ---- extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids); extern void ExecConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate); + extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo, + TupleTableSlot *slot, EState *estate); extern TupleTableSlot *EvalPlanQual(EState *estate, EPQState *epqstate, Relation relation, Index rti, ItemPointer tid, TransactionId priorXmax); Index: src/backend/executor/execMain.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/executor/execMain.c,v retrieving revision 1.334 diff -c -r1.334 execMain.c *** src/backend/executor/execMain.c 26 Oct 2009 02:26:29 -0000 1.334 --- src/backend/executor/execMain.c 15 Nov 2009 23:12:50 -0000 *************** *** 1235,1241 **** /* * ExecRelCheck --- check that 
tuple meets constraints for result relation */ ! static const char * ExecRelCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate) { --- 1235,1241 ---- /* * ExecRelCheck --- check that tuple meets constraints for result relation */ ! const char * ExecRelCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate) { Index: src/backend/utils/misc/guc.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/utils/misc/guc.c,v retrieving revision 1.523 diff -c -r1.523 guc.c *** src/backend/utils/misc/guc.c 21 Oct 2009 20:38:58 -0000 1.523 --- src/backend/utils/misc/guc.c 15 Nov 2009 23:12:50 -0000 *************** *** 32,37 **** --- 32,38 ---- #include "access/xact.h" #include "catalog/namespace.h" #include "commands/async.h" + #include "commands/copy.h" #include "commands/prepare.h" #include "commands/vacuum.h" #include "commands/variable.h" *************** *** 534,539 **** --- 535,542 ---- gettext_noop("Customized Options"), /* DEVELOPER_OPTIONS */ gettext_noop("Developer Options"), + /* COPY_OPTIONS */ + gettext_noop("Copy Options"), /* help_config wants this array to be null-terminated */ NULL }; *************** *** 1955,1960 **** --- 1958,2019 ---- 1024, 100, 102400, NULL, NULL }, + { + { + /* variable name */ + "copy_partitioning_cache_size", + + /* context, we want the user to set it */ + PGC_USERSET, + + /* category for this configuration variable */ + COPY_OPTIONS, + + /* short description */ + gettext_noop("Size of the LRU list of child tables to keep in cache " + "when partitioning tuples in COPY."), + + /* long description */ + gettext_noop("When tuples are automatically routed in COPY, all " + "tables are scanned until the constraints are matched. When " + "a large number of child tables are present, the scanning " + "overhead can be large. To reduce that overhead, the routing " + "mechanism keeps a cache of the last child tables in which " + "tuples were inserted and tries these tables first before " + "performing a full scan. This variable defines the cache size " + "with 0 meaning no caching, 1 keeping the last matching child table" + ", and x keeping the last x child tables in which tuples were inserted." + " Note that the list is managed with an LRU policy."), + + + /* flags: this option is not in the postgresql.conf.sample + * file and should not be allowed in the config. + * NOTE: this is not currently enforced. + */ + GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE + }, + + /* pointer to the variable, this one is present in + * src/backend/commands/copy.c + */ + &partitioningCacheSize, + + /* default value */ + 2, + + /* min value */ + 0, + + /* max value */ + INT_MAX, + + /* assign hook function */ + NULL, + + /* show hook function */ + NULL + }, + /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL Index: src/test/regress/parallel_schedule =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/parallel_schedule,v retrieving revision 1.57 diff -c -r1.57 parallel_schedule *** src/test/regress/parallel_schedule 24 Aug 2009 03:10:16 -0000 1.57 --- src/test/regress/parallel_schedule 15 Nov 2009 23:12:50 -0000 *************** *** 47,53 **** # execute two copy tests parallel, to check that copy itself # is concurrent safe. # ---------- ! test: copy copyselect # ---------- # Another group of parallel tests --- 47,53 ---- # execute two copy tests parallel, to check that copy itself # is concurrent safe. # ---------- ! 
test: copy copyselect copy_partitioning # ---------- # Another group of parallel tests Index: doc/src/sgml/ref/copy.sgml =================================================================== RCS file: /home/manu/cvsrepo/pgsql/doc/src/sgml/ref/copy.sgml,v retrieving revision 1.92 diff -c -r1.92 copy.sgml *** doc/src/sgml/ref/copy.sgml 21 Sep 2009 20:10:21 -0000 1.92 --- doc/src/sgml/ref/copy.sgml 15 Nov 2009 23:12:50 -0000 *************** *** 41,46 **** --- 41,47 ---- ESCAPE '<replaceable class="parameter">escape_character</replaceable>' FORCE_QUOTE { ( <replaceable class="parameter">column</replaceable> [, ...] ) | * } FORCE_NOT_NULL ( <replaceable class="parameter">column</replaceable> [, ...] ) + PARTITIONING [ <replaceable class="parameter">boolean</replaceable> ] </synopsis> </refsynopsisdiv> *************** *** 282,287 **** --- 283,298 ---- </listitem> </varlistentry> + <varlistentry> + <term><literal>PARTITIONING</></term> + <listitem> + <para> + In <literal>PARTITIONING</> mode, a <command>COPY FROM</> into a parent + table will automatically route each row to the child table that + has the matching constraints. + </para> + </listitem> + </varlistentry> </variablelist> </refsect1> *************** *** 384,389 **** --- 395,418 ---- <command>VACUUM</command> to recover the wasted space. </para> + <para> + <literal>PARTITIONING</> mode scans the constraints of each child table in the + hierarchy to find a match. As an optimization, a cache of the last child + tables where tuples have been routed is kept and tried first. The size + of the cache is set by the <literal>copy_partitioning_cache_size</literal> + session variable. If the size is set to 0, the cache is disabled; otherwise, + at most the indicated number of child tables is kept in the cache. + </para> + + <para> + <literal>PARTITIONING</> mode assumes that every child table has at least + one constraint defined; otherwise, an error is raised. If child tables have + overlapping constraints, the row is inserted into the first child table found + (be it a cached table or the first table to appear in the lookup). + ROW and STATEMENT triggers that modify the tuple value after routing has + been performed will lead to unpredictable errors. + </para> + </refsect1> <refsect1> *************** *** 828,833 **** --- 857,1000 ---- 0000200 M B A B W E 377 377 377 377 377 377 </programlisting> </para> + + <para> + Multiple options are separated by commas, as in: + <programlisting> + COPY (SELECT t FROM foo WHERE id = 1) TO STDOUT (FORMAT CSV, HEADER, FORCE_QUOTE (t)); + </programlisting> + </para> + + <refsect2> + <title>Partitioning examples</title> + <para> + Here is an example of how to use partitioning. 
Let's first create a parent + table and 3 child tables as follows: + <programlisting> + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int + ); + + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + </programlisting> + We prepare the following data file (12 rows, four for each child table): + copy_input.data contents: + <programlisting> + 11 '2008-01-10' 11 + 12 '2008-02-15' 12 + 13 '2008-03-15' 13 + 21 '2008-01-10' 11 + 31 '2008-01-10' 11 + 41 '2008-01-10' 11 + 22 '2008-02-15' 12 + 23 '2008-03-15' 13 + 32 '2008-02-15' 12 + 33 '2008-03-15' 13 + 42 '2008-02-15' 12 + 43 '2008-03-15' 13 + </programlisting> + If we COPY the data into the parent table without partitioning enabled, all + rows are inserted into the parent table itself, as in this example: + <programlisting> + COPY y2008 FROM 'copy_input.data'; + + SELECT COUNT(*) FROM y2008; + count + ------- + 12 + (1 row) + + SELECT COUNT(*) FROM jan2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM feb2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM mar2008; + count + ------- + 0 + (1 row) + + DELETE FROM y2008; + </programlisting> + If we execute COPY with partitioning enabled, rows are automatically loaded into the + appropriate child table, as in this example: + <programlisting> + COPY y2008 FROM 'copy_input.data' (PARTITIONING); + + SELECT * FROM y2008; + id | date | value + ----+------------+------- + 11 | 01-10-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008; + id | date | value + ----+------------+------- + 11 | 01-10-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM feb2008; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + </programlisting> + The cache size can be tuned using: + <programlisting> + set copy_partitioning_cache_size = 3; + </programlisting> + Repeating the COPY command should now be faster: + <programlisting> + COPY y2008 FROM 'copy_input.data' (PARTITIONING); + </programlisting> + </para> + </refsect2> </refsect1> <refsect1> Index: src/include/utils/guc_tables.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/utils/guc_tables.h,v retrieving revision 1.46 diff -c -r1.46 guc_tables.h *** src/include/utils/guc_tables.h 11 Jun 2009 14:49:13 -0000 1.46 --- src/include/utils/guc_tables.h 15 Nov 2009 23:12:50 -0000 *************** *** 76,82 **** COMPAT_OPTIONS_CLIENT, PRESET_OPTIONS, CUSTOM_OPTIONS, ! DEVELOPER_OPTIONS }; /* --- 76,83 ---- COMPAT_OPTIONS_CLIENT, PRESET_OPTIONS, CUSTOM_OPTIONS, ! DEVELOPER_OPTIONS, ! 
COPY_OPTIONS }; /* Index: src/test/regress/input/copy_partitioning.source =================================================================== RCS file: src/test/regress/input/copy_partitioning.source diff -N src/test/regress/input/copy_partitioning.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/input/copy_partitioning.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,180 ---- + -- test 1 + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + 1 + \. + + drop table parent cascade; + + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + 1 + \. + + drop table parent cascade; + + -- test 2 + set copy_partitioning_cache_size = 0; + create table parent(i int, j int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + create table c2 (check (i > 1 and i <= 2)) inherits (parent); + create table c3 (check (i > 2 and i <= 3)) inherits (parent); + + create index c1_idx on c1(j); + copy (select i % 3 + 1, i from generate_series(1, 1000) s(i)) to '/tmp/parent'; + copy parent from '/tmp/parent' with (partitioning); + analyse; + + set enable_seqscan to false; + -- no rows if index was not updated + select * from c1 where j = 3; + + set enable_seqscan to true; + set enable_indexscan to false; + -- 1 row + select * from c1 where j = 3; + drop table parent cascade; + + -- test 3 + set copy_partitioning_cache_size = 0; + + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + create table c2 (check (i > 1 and i <= 2)) inherits (parent); + create table c3 (check (i > 2 and i <= 3)) inherits (parent); + + create table audit(i int); + + create function audit() returns trigger as $$ begin insert into audit(i) values (new.i); return new; end; $$ language plpgsql; + + create trigger parent_a after insert on parent for each row execute procedure audit(); + -- the before trigger on the parent would get fired + -- create trigger parent_a2 before insert on parent for each row execute procedure audit(); + create trigger c1_a before insert on c1 for each row execute procedure audit(); + create trigger c1_a2 after insert on c1 for each row execute procedure audit(); + + copy parent from stdin with (partitioning); + 1 + 2 + 3 + \. 
+ + -- no rows if trigger does not work + select * from audit; + + drop table parent cascade; + drop table audit cascade; + drop function audit(); + + -- test cache size + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int, + primary key(id) + ); + + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + + CREATE TABLE jan2008half1 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-01-15' ) + ) INHERITS (jan2008); + + CREATE TABLE jan2008half2 ( + CHECK ( date >= DATE '2008-01-16' AND date < DATE '2008-01-31' ) + ) INHERITS (jan2008); + + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data'; + + SELECT COUNT(*) FROM y2008; + SELECT COUNT(*) FROM jan2008; + SELECT COUNT(*) FROM jan2008half1; + SELECT COUNT(*) FROM jan2008half2; + SELECT COUNT(*) FROM feb2008; + SELECT COUNT(*) FROM mar2008; + + DELETE FROM y2008; + + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 3; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + 
SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + DROP TABLE y2008 CASCADE; Index: src/test/regress/output/copy_partitioning.source =================================================================== RCS file: src/test/regress/output/copy_partitioning.source diff -N src/test/regress/output/copy_partitioning.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/output/copy_partitioning.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,567 ---- + -- test 1 + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + drop table parent cascade; + NOTICE: drop cascades to table c1 + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + drop table parent cascade; + NOTICE: drop cascades to table c1 + -- test 2 + set copy_partitioning_cache_size = 0; + create table parent(i int, j int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + create table c2 (check (i > 1 and i <= 2)) inherits (parent); + create table c3 (check (i > 2 and i <= 3)) inherits (parent); + create index c1_idx on c1(j); + copy (select i % 3 + 1, i from generate_series(1, 1000) s(i)) to '/tmp/parent'; + copy parent from '/tmp/parent' with (partitioning); + analyse; + set enable_seqscan to false; + -- no rows if index was not updated + select * from c1 where j = 3; + i | j + ---+--- + 1 | 3 + (1 row) + + set enable_seqscan to true; + set enable_indexscan to false; + -- 1 row + select * from c1 where j = 3; + i | j + ---+--- + 1 | 3 + (1 row) + + drop table parent cascade; + NOTICE: drop cascades to 3 other objects + DETAIL: drop cascades to table c1 + drop cascades to table c2 + drop cascades to table c3 + -- test 3 + set copy_partitioning_cache_size = 0; + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + create table c2 (check (i > 1 and i <= 2)) inherits (parent); + create table c3 (check (i > 2 and i <= 3)) inherits (parent); + create table audit(i int); + create function audit() returns trigger as $$ begin insert into audit(i) values (new.i); return new; end; $$ language plpgsql; + create trigger parent_a after insert on parent for each row execute procedure audit(); + -- the before trigger on the parent would get fired + -- create trigger parent_a2 before insert on parent for each row execute procedure audit(); + create trigger c1_a before insert on c1 for each row execute procedure audit(); + create trigger c1_a2 after insert on c1 for each row execute procedure audit(); + copy parent from stdin with (partitioning); + -- no rows if trigger does not work + select * from audit; + i + --- + 1 + 1 + (2 rows) + + drop table parent cascade; + NOTICE: drop cascades to 3 other objects + DETAIL: drop cascades to table c1 + drop cascades to table c2 + drop cascades to table c3 + drop table audit cascade; + drop function audit(); + -- test cache size + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int, + primary key(id) + ); + NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "y2008_pkey" for table "y2008" + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + CREATE TABLE jan2008half1 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-01-15' ) + ) INHERITS (jan2008); + CREATE TABLE 
jan2008half2 ( + CHECK ( date >= DATE '2008-01-16' AND date < DATE '2008-01-31' ) + ) INHERITS (jan2008); + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data'; + SELECT COUNT(*) FROM y2008; + count + ------- + 12 + (1 row) + + SELECT COUNT(*) FROM jan2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM jan2008half1; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM jan2008half2; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM feb2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM mar2008; + count + ------- + 0 + (1 row) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM 
y2008; + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 3; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 
11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + DROP TABLE y2008 CASCADE; + NOTICE: drop cascades to 5 other objects + DETAIL: drop cascades to table jan2008 + drop cascades to table jan2008half1 + drop cascades to table jan2008half2 + drop cascades to table feb2008 + drop cascades to table 
mar2008
Hi, I'll hopefully look at the next version of the patch tomorrow. Emmanuel Cecchet wrote: >> o test1.sql always segfaults for me, poking around with gdb suggests >> it's a case of an uninitialised cache list (another reason to use the >> builtin one). >> > I was never able to reproduce that problem. I don't know where this > comes from. > I have integrated your tests in the regression test suite and I was > never able to reproduce the segfault you mentioned. What platform are > you using? In the meantime I tried the test1.sql file again and it still segfaulted for me. I'm using 32bit Linux, PG compiled with: $ ./configure CFLAGS=-O0 --enable-cassert --enable-debug --without-perl --without-python --without-openssl --without-tcl and then I start postmaster, fire up psql, attach gdb to the backend, do \i test1.sql and get: Program received signal SIGSEGV, Segmentation fault. 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821 1821 child_relation_id = child_oid_cell->oid_value; (gdb) bt #0 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821 #1 0x081950e3 in CopyFrom (cstate=0x871e0dc) at copy.c:2480 #2 0x08192532 in DoCopy (stmt=0x86fb144, queryString=0x86fa73c "copy parent from stdin with (partitioning);") at copy.c:1227 (gdb) p child_oid_cell $1 = (OidCell *) 0x7f7f7f7f (gdb) p child_oid_cell->oid_value Cannot access memory at address 0x7f7f7f7f That 0x7f7f7f7f looks like clobbered memory, the memory management funcs do that when cassert is enabled, IIRC. Cheers, Jan -- Jan Urbanski GPG key ID: E583D7D2 ouden estin
Jan Urbański <wulczer@wulczer.org> writes: > Program received signal SIGSEGV, Segmentation fault. > 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, > tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821 > 1821 child_relation_id = > child_oid_cell->oid_value; > (gdb) p child_oid_cell > $1 = (OidCell *) 0x7f7f7f7f This looks like the patch is trying to create a data structure in a memory context that's not sufficiently long-lived for the use of the structure. If you do this in a non-cassert build, it will seem to work, some of the time, if the memory in question happens to not get reallocated to something else. A good rule of thumb is to never do code development in a non-cassert build. You're just setting yourself up for failure. regards, tom lane
Tom Lane wrote: > A good rule of thumb is to never do code development in a non-cassert > build. And the same rule goes for review, too; I'll update the review guidelines to spell that out more clearly. Basically, if you're doing any work on new code, you should have cassert turned on, *except* if you're doing performance testing. The asserts slow things down enough (particularly with large shared_buffers values) to skew performance tests, but in all other coding situations you should have them enabled. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
Tom Lane wrote: > Jan Urbański <wulczer@wulczer.org> writes: > >> Program received signal SIGSEGV, Segmentation fault. >> 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, >> tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821 >> 1821 child_relation_id = >> child_oid_cell->oid_value; >> (gdb) p child_oid_cell >> $1 = (OidCell *) 0x7f7f7f7f >> > > This looks like the patch is trying to create a data structure in a > memory context that's not sufficiently long-lived for the use of the > structure. If you do this in a non-cassert build, it will seem to > work, some of the time, if the memory in question happens to not > get reallocated to something else. > I was using the CacheMemoryContext. Could someone tell me why this is wrong and what should have been the appropriate context to use? Thanks Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
Emmanuel Cecchet <manu@asterdata.com> writes: > Tom Lane wrote: >> This looks like the patch is trying to create a data structure in a >> memory context that's not sufficiently long-lived for the use of the >> structure. If you do this in a non-cassert build, it will seem to >> work, some of the time, if the memory in question happens to not >> get reallocated to something else. >> > I was using the CacheMemoryContext. Could someone tell me why this is > wrong and what should have been the appropriate context to use? Well, (a) I doubt you really were creating the list in CacheMemoryContext, else it'd have not gotten clobbered; (b) creating statement-local data structures in CacheMemoryContext is entirely unacceptable anyway, because then they represent a permanent memory leak. The right context for statement-lifetime data structures is generally the CurrentMemoryContext the statement code is called with. regards, tom lane
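As a minimal sketch of that pattern (illustrative names only, this is not code from the patch), a statement-lifetime list needs no explicit context management at all:

#include "postgres.h"
#include "nodes/pg_list.h"

/*
 * Illustrative sketch: lappend_oid pallocs its cells in
 * CurrentMemoryContext, i.e. the per-statement context the COPY code
 * is called with, so the whole list is released automatically when
 * that context is reset at the end of the statement -- no explicit
 * pfree and no long-lived context needed.
 */
static List *
remember_child(List *cache, Oid child_oid)
{
    if (!list_member_oid(cache, child_oid))
        cache = lappend_oid(cache, child_oid);
    return cache;
}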
Tom Lane wrote: > Emmanuel Cecchet <manu@asterdata.com> writes: > >> Tom Lane wrote: >> >>> This looks like the patch is trying to create a data structure in a >>> memory context that's not sufficiently long-lived for the use of the >>> structure. If you do this in a non-cassert build, it will seem to >>> work, some of the time, if the memory in question happens to not >>> get reallocated to something else. >>> >>> >> I was using the CacheMemoryContext. Could someone tell me why this is >> wrong and what should have been the appropriate context to use? >> > > Well, (a) I doubt you really were creating the list in > CacheMemoryContext, else it'd have not gotten clobbered; (b) creating > statement-local data structures in CacheMemoryContext is entirely > unacceptable anyway, because then they represent a permanent memory > leak. > Well I thought that this code would do it: child_table_lru = (OidLinkedList *)MemoryContextAlloc( + CacheMemoryContext, sizeof(OidLinkedList)); ... + /* Add the new entry in head of the list */ + new_head = (OidCell *) MemoryContextAlloc( + CacheMemoryContext, sizeof(OidCell)); > The right context for statement-lifetime data structures is generally > the CurrentMemoryContext the statement code is called with. > Actually the list is supposed to stay around between statement executions. You don't want to restart with a cold cache at every statement so I really want this structure to stay in memory at a more global level. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
Emmanuel Cecchet <manu@asterdata.com> writes: > Actually the list is supposed to stay around between statement > executions. You don't want to restart with a cold cache at every > statement so I really want this structure to stay in memory at a more > global level. Cache? Why do you need a cache for COPY? Repeated bulk loads into the same table within a single session don't seem to me to be a case that is common enough to justify a cache. (BTW, the quoted code seems to be busily reinventing OID Lists. Don't do that.) regards, tom lane
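For reference, a sketch (with made-up names, not code from the patch) of what the same LRU looks like on top of the stock Oid list routines in pg_list.h:

#include "postgres.h"
#include "nodes/pg_list.h"

/*
 * Illustrative sketch of an LRU of child-table OIDs using the
 * built-in List API instead of a hand-rolled linked list.
 */
static List *
lru_remember(List *lru, Oid child_oid, int max_size)
{
    /* on a hit, pull the entry out; either way, put it in front */
    lru = list_delete_oid(lru, child_oid);
    lru = lcons_oid(child_oid, lru);

    /* trim the tail to the configured size */
    if (list_length(lru) > max_size)
        lru = list_truncate(lru, max_size);
    return lru;
}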
Tom Lane wrote: > Emmanuel Cecchet <manu@asterdata.com> writes: > >> Actually the list is supposed to stay around between statement >> executions. You don't want to restart with a cold cache at every >> statement so I really want this structure to stay in memory at a more >> global level. >> > > Cache? Why do you need a cache for COPY? Repeated bulk loads into the > same table within a single session doesn't seem to me to be a case that > is common enough to justify a cache. > Actually the cache is only activated if you use the partitioning option. It is just a list of oids of child tables where tuples were inserted. It is common to have multiple COPY operations in the same session when you are doing bulk loading in a warehouse. > (BTW, the quoted code seems to be busily reinventing OID Lists. Don't > do that.) > Yes, I understood that I should use an OidList instead. But I was trying to understand what I did wrong here (besides reinventing the oid list ;-)). Why do I get this segfault if I use memory from CacheMemoryContext? Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
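For illustration, such a session could look like this (the data file names are made up; the option and the session variable are the ones from the patch):

set copy_partitioning_cache_size = 3;
copy y2008 from '/data/jan2008.dat' with (partitioning);
copy y2008 from '/data/feb2008.dat' with (partitioning);
copy y2008 from '/data/mar2008.dat' with (partitioning);

The second and third COPY start with the child tables touched by the previous ones already in the cache, which is why the list has to survive individual statements.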
Emmanuel Cecchet <manu@asterdata.com> writes: > Tom Lane wrote: >> Cache? Why do you need a cache for COPY? > Actually the cache is only activated if you use the partitioning option. > It is just a list of oids of child tables where tuples were inserted. Umm ... why is that useful enough to be cached? > Why do I get this segfault if I use memory from CacheMemoryContext? Well, CacheMemoryContext will never be reset, so either you freed the data structure yourself or there's something wrong with the pointer you think is pointing at the data structure ... regards, tom lane
Hi Jan, Here is a new version of the patch with the following modifications: - used the oid list from pg_list.h - properly handles triggers and generates an error if needed (updated doc as well) - added your test cases + extra bad trigger cases Emmanuel > Hi, > > I'll hopefully look at the next version of the patch tomorrow. > > Emmanuel Cecchet wrote: > >>> o test1.sql always segfaults for me, poking around with gdb suggests >>> it's a case of an uninitialised cache list (another reason to use the >>> builtin one). >>> >>> >> I was never able to reproduce that problem. I don't know where this >> comes from. >> > > >> I have integrated your tests in the regression test suite and I was >> never able to reproduce the segfault you mentioned. What platform are >> you using? >> > > In the meantime I tried the test1.sql file again and it still segfaulted > for me. > I'm using 32bit Linux, PG compiled with: > > $ ./configure CFLAGS=-O0 --enable-cassert --enable-debug --without-perl > --without-python --without-openssl --without-tcl > > and then I start postmaster, fire up psql, attach gdb to the backend, do > \i test1.sql and get: > > Program received signal SIGSEGV, Segmentation fault. > 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, > tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821 > 1821 child_relation_id = > child_oid_cell->oid_value; > (gdb) bt > #0 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, > tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821 > #1 0x081950e3 in CopyFrom (cstate=0x871e0dc) at copy.c:2480 > #2 0x08192532 in DoCopy (stmt=0x86fb144, queryString=0x86fa73c "copy > parent from stdin with (partitioning);") at copy.c:1227 > > (gdb) p child_oid_cell > $1 = (OidCell *) 0x7f7f7f7f > > (gdb) p child_oid_cell->oid_value > Cannot access memory at address 0x7f7f7f7f > > > That 0x7f7f7f7f looks like clobbered memory, the memory management funcs > do that when cassert is enabled, IIRC. 
> > Cheers, > Jan > > -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: manu@frogthinker.org Skype: emmanuel_cecchet Index: src/backend/commands/trigger.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/commands/trigger.c,v retrieving revision 1.256 diff -c -r1.256 trigger.c *** src/backend/commands/trigger.c 27 Oct 2009 20:14:27 -0000 1.256 --- src/backend/commands/trigger.c 19 Nov 2009 21:19:09 -0000 *************** *** 1756,1761 **** --- 1756,1802 ---- return newtuple; } + HeapTuple + ExecARInsertTriggersNow(EState *estate, ResultRelInfo *relinfo, + HeapTuple trigtuple) + { + TriggerDesc *trigdesc = relinfo->ri_TrigDesc; + int ntrigs = trigdesc->n_after_row[TRIGGER_EVENT_INSERT]; + int *tgindx = trigdesc->tg_after_row[TRIGGER_EVENT_INSERT]; + HeapTuple newtuple = trigtuple; + HeapTuple oldtuple; + TriggerData LocTriggerData; + int i; + + LocTriggerData.type = T_TriggerData; + LocTriggerData.tg_event = TRIGGER_EVENT_INSERT | + TRIGGER_EVENT_ROW; + LocTriggerData.tg_relation = relinfo->ri_RelationDesc; + LocTriggerData.tg_newtuple = NULL; + LocTriggerData.tg_newtuplebuf = InvalidBuffer; + for (i = 0; i < ntrigs; i++) + { + Trigger *trigger = &trigdesc->triggers[tgindx[i]]; + + if (!TriggerEnabled(trigger, LocTriggerData.tg_event, NULL)) + continue; + + LocTriggerData.tg_trigtuple = oldtuple = newtuple; + LocTriggerData.tg_trigtuplebuf = InvalidBuffer; + LocTriggerData.tg_trigger = trigger; + newtuple = ExecCallTriggerFunc(&LocTriggerData, + tgindx[i], + relinfo->ri_TrigFunctions, + relinfo->ri_TrigInstrument, + GetPerTupleMemoryContext(estate)); + if (oldtuple != newtuple && oldtuple != trigtuple) + heap_freetuple(oldtuple); + if (newtuple == NULL) + break; + } + return newtuple; + } + void ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo, HeapTuple trigtuple, List *recheckIndexes) Index: src/backend/commands/copy.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/commands/copy.c,v retrieving revision 1.317 diff -c -r1.317 copy.c *** src/backend/commands/copy.c 21 Sep 2009 20:10:21 -0000 1.317 --- src/backend/commands/copy.c 19 Nov 2009 21:19:09 -0000 *************** *** 43,48 **** --- 43,56 ---- #include "utils/memutils.h" #include "utils/snapmgr.h" + /* For tuple routing */ + #include "catalog/pg_inherits.h" + #include "catalog/pg_inherits_fn.h" + #include "nodes/makefuncs.h" + #include "nodes/pg_list.h" + #include "utils/fmgroids.h" + #include "utils/relcache.h" + #include "utils/tqual.h" #define ISOCTAL(c) (((c) >= '0') && ((c) <= '7')) #define OCTVALUE(c) ((c) - '0') *************** *** 117,122 **** --- 125,131 ---- char *escape; /* CSV escape char (must be 1 byte) */ bool *force_quote_flags; /* per-column CSV FQ flags */ bool *force_notnull_flags; /* per-column CSV FNN flags */ + bool partitioning; /* tuple routing in table hierarchy */ /* these are just for error messages, see copy_in_error_callback */ const char *cur_relname; /* table name for error messages */ *************** *** 173,178 **** --- 182,194 ---- } DR_copy; + /** + * Size of the LRU list of relations to keep in cache for routing + */ + int partitioningCacheSize; + + List *child_table_lru = NULL; + /* * These macros centralize code used to process line_buf and raw_buf buffers. 
* They are macros because they often do continue/break control and to avoid *************** *** 839,844 **** --- 855,868 ---- errmsg("argument to option \"%s\" must be a list of column names", defel->defname))); } + else if (strcmp(defel->defname, "partitioning") == 0) + { + if (cstate->partitioning) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + cstate->partitioning = defGetBoolean(defel); + } else ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), *************** *** 1662,1667 **** --- 1686,1960 ---- return res; } + /** + * Check that the given tuple matches the constraints of the given child table + * and performs an insert if the constraints are matched. insert_tuple specifies + * if the tuple must be inserted in the table if the constraint is satisfied. + * The method returns true if the constraint is satisfied (and insert was + * performed if insert_tuple is true), false otherwise (constraints not + * satisfied for this tuple on this child table). + */ + static bool + check_tuple_constraints(Relation child_table_relation, HeapTuple tuple, + bool insert_tuple, int hi_options, ResultRelInfo *parentResultRelInfo) + { + /* Check the constraints */ + ResultRelInfo *resultRelInfo; + TupleTableSlot *slot; + EState *estate = CreateExecutorState(); + bool result = false; + + resultRelInfo = makeNode(ResultRelInfo); + resultRelInfo->ri_RangeTableIndex = 1; /* dummy */ + resultRelInfo->ri_RelationDesc = child_table_relation; + resultRelInfo->ri_TrigDesc = CopyTriggerDesc(child_table_relation->trigdesc); + if (resultRelInfo->ri_TrigDesc) + resultRelInfo->ri_TrigFunctions = (FmgrInfo *) + palloc0(resultRelInfo->ri_TrigDesc->numtriggers * sizeof(FmgrInfo)); + resultRelInfo->ri_TrigInstrument = NULL; + + ExecOpenIndices(resultRelInfo); + + estate->es_result_relations = resultRelInfo; + estate->es_num_result_relations = 1; + estate->es_result_relation_info = resultRelInfo; + + /* Set up a tuple slot too */ + slot = MakeSingleTupleTableSlot(child_table_relation->rd_att); + ExecStoreTuple(tuple, slot, InvalidBuffer, false); + + if (ExecRelCheck(resultRelInfo, slot, estate) == NULL) + { + /* Constraints satisfied */ + if (insert_tuple) + { + /* Insert the row in the child table */ + List *recheckIndexes = NIL; + + /* BEFORE ROW INSERT Triggers */ + if (resultRelInfo->ri_TrigDesc && + resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0) + { + HeapTuple newtuple; + newtuple = ExecBRInsertTriggers(estate, resultRelInfo, tuple); + + if (newtuple != tuple) + { + /* tuple modified by Trigger(s), check that the constraint is still valid */ + heap_freetuple(tuple); + tuple = newtuple; + ExecStoreTuple(tuple, slot, InvalidBuffer, false); + if (ExecRelCheck(resultRelInfo, slot, estate) != NULL) + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("Before row insert trigger on table \"%s\" modified partitioning routing decision.Aborting insert.", + RelationGetRelationName(child_table_relation)))); + } + } + } + + /* OK, store the tuple and create index entries for it */ + heap_insert(child_table_relation, tuple, GetCurrentCommandId(true), + hi_options, NULL); + + /* Update indices */ + if (resultRelInfo->ri_NumIndices > 0) + recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), + estate, false); + + /* AFTER ROW INSERT Triggers */ + if (resultRelInfo->ri_TrigDesc && + resultRelInfo->ri_TrigDesc->n_after_row[TRIGGER_EVENT_INSERT] > 0) + { + HeapTuple newtuple; + newtuple = ExecARInsertTriggersNow(estate, resultRelInfo, 
tuple); + if (newtuple != tuple) + { + /* tuple modified by Trigger(s), check that the constraint is still valid */ + heap_freetuple(tuple); + tuple = newtuple; + ExecStoreTuple(tuple, slot, InvalidBuffer, false); + if (ExecRelCheck(resultRelInfo, slot, estate) != NULL) + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("After row insert trigger on table \"%s\" modified partitioning routingdecision. Aborting insert.", + RelationGetRelationName(child_table_relation)))); + } + } + } + } + result = true; + } + + /* Free resources */ + FreeExecutorState(estate); + ExecDropSingleTupleTableSlot(slot); + ExecCloseIndices(resultRelInfo); + + return result; + } + + + /** + * Route a tuple into a child table that matches the constraints of the tuple + * to be inserted. + * @param parent_relation_id Oid of the parent relation + * @param tuple the tuple to be routed + */ + static bool route_tuple_to_child(Relation parent_relation, HeapTuple tuple, int hi_options, ResultRelInfo *parentResultRelInfo) + { + Relation child_table_relation; + bool result = false; + Relation catalog_relation; + HeapTuple inherits_tuple; + HeapScanDesc scan; + ScanKeyData key[1]; + + /* Try to exploit locality for bulk inserts + * We expect consecutive insert to go to the same child table */ + if (partitioningCacheSize > 0 && child_table_lru != NULL) + { + /* Try the child table LRU */ + ListCell *child_oid_cell; + Oid child_relation_id; + + foreach(child_oid_cell, child_table_lru) + { + child_relation_id = lfirst_oid(child_oid_cell); + child_table_relation = try_relation_open(child_relation_id, + RowExclusiveLock); + + if (child_table_relation == NULL) + { + /* Child table does not exist anymore, purge cache entry */ + child_table_lru = list_delete_oid(child_table_lru, child_relation_id); + if (list_length(child_table_lru) == 0) + break; /* Cache is now empty */ + else + { /* Restart scanning */ + child_oid_cell = list_head(child_table_lru); + continue; + } + } + + if (check_tuple_constraints(child_table_relation, tuple, true, hi_options, parentResultRelInfo)) + { + /* Hit, move in front if not already the head */ + if (lfirst_oid(list_head(child_table_lru)) != child_relation_id) + { + /* The partitioning cache is in the CacheMemoryContext) */ + MemoryContext currentContext = MemoryContextSwitchTo(CacheMemoryContext); + child_table_lru = list_delete_oid(child_table_lru, child_relation_id); + child_table_lru = lcons_oid(child_relation_id, child_table_lru); + MemoryContextSwitchTo(currentContext); + } + + /* Close the relation but keep the lock until the end of + * the transaction */ + relation_close(child_table_relation, NoLock); + + return true; + } + relation_close(child_table_relation, RowExclusiveLock); + } + /* We got a miss */ + } + + /* Looking up child tables */ + ScanKeyInit(&key[0], + Anum_pg_inherits_inhparent, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(parent_relation->rd_id)); + catalog_relation = heap_open(InheritsRelationId, AccessShareLock); + scan = heap_beginscan(catalog_relation, SnapshotNow, 1, key); + while ((inherits_tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + { + TupleConstr *constr; + Form_pg_inherits inh = (Form_pg_inherits) GETSTRUCT(inherits_tuple); + Oid child_relation_id = inh->inhrelid; + + /* Check if the child table satisfy the constraints, if the relation + * cannot be opened this throws an exception */ + child_table_relation = (Relation) relation_open(child_relation_id, + RowExclusiveLock); + + constr = child_table_relation->rd_att->constr; 
+ if (constr->num_check == 0) + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("partition routing found no constraint for relation \"%s\"", + RelationGetRelationName(child_table_relation)))); + } + + if (has_subclass(child_table_relation->rd_id)) + { + /* This is a parent table, check its constraints first */ + if (check_tuple_constraints(child_table_relation, tuple, false, hi_options, parentResultRelInfo)) + { + /* Constraint satisfied, explore the child tables */ + result = route_tuple_to_child(child_table_relation, tuple, hi_options, parentResultRelInfo); + if (result) + { + /* Success, one of our child tables matched. + * Release the lock on this parent relation, we did not use it */ + relation_close(child_table_relation, RowExclusiveLock); + break; + } + else + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("tuple matched constraints of relation \"%s\" but none of " + "its children", + RelationGetRelationName(child_table_relation)))); + } + } + } + else + { + /* Child table, try it */ + result = check_tuple_constraints(child_table_relation, tuple, true, hi_options, parentResultRelInfo); + } + + if (result) + { + /* We found the one, update the LRU and exit the loop! + * + * Close the relation but keep the lock until the end of + * the transaction */ + relation_close(child_table_relation, NoLock); + + if (partitioningCacheSize > 0) + { + /* The partitioning cache is in the CacheMemoryContext) */ + MemoryContext currentContext; + currentContext = MemoryContextSwitchTo(CacheMemoryContext); + + /* Add the new entry in head of the list (also builds the list if needed) */ + child_table_lru = lcons_oid(child_relation_id, child_table_lru); + + /* Adjust list size if needed */ + child_table_lru = list_truncate(child_table_lru, partitioningCacheSize); + + /* Restore memory context */ + MemoryContextSwitchTo(currentContext); + } + break; + } + else + { + /* Release the lock on that relation, we did not use it */ + relation_close(child_table_relation, RowExclusiveLock); + } + } + heap_endscan(scan); + heap_close(catalog_relation, AccessShareLock); + return result; + } + /* * Copy FROM file to relation. */ *************** *** 2149,2178 **** { List *recheckIndexes = NIL; ! /* Place tuple in tuple slot */ ! ExecStoreTuple(tuple, slot, InvalidBuffer, false); ! ! /* Check the constraints of the tuple */ ! if (cstate->rel->rd_att->constr) ! ExecConstraints(resultRelInfo, slot, estate); ! ! /* OK, store the tuple and create index entries for it */ ! heap_insert(cstate->rel, tuple, mycid, hi_options, bistate); ! if (resultRelInfo->ri_NumIndices > 0) ! recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), ! estate, false); ! /* AFTER ROW INSERT Triggers */ ! ExecARInsertTriggers(estate, resultRelInfo, tuple, ! recheckIndexes); ! /* ! * We count only tuples not suppressed by a BEFORE INSERT trigger; ! * this is the same definition used by execMain.c for counting ! * tuples inserted by an INSERT command. ! */ ! cstate->processed++; } } --- 2442,2494 ---- { List *recheckIndexes = NIL; ! /* If routing is enabled and table has child tables, let's try routing */ ! if (cstate->partitioning && has_subclass(cstate->rel->rd_id)) ! { ! if (route_tuple_to_child(cstate->rel, tuple, hi_options, resultRelInfo)) ! { ! /* increase the counter so that we return how many ! * tuples got copied into all tables in total */ ! cstate->processed++; ! } ! else ! { ! ereport(ERROR, ( ! errcode(ERRCODE_BAD_COPY_FILE_FORMAT), ! 
errmsg("tuple does not satisfy any child table constraint") ! )); ! } ! } ! else ! { ! /* No partitioning, prepare the tuple and ! * check the constraints */ ! /* Place tuple in tuple slot */ ! ExecStoreTuple(tuple, slot, InvalidBuffer, false); ! /* Check the constraints of the tuple */ ! if (cstate->rel->rd_att->constr) ! ExecConstraints(resultRelInfo, slot, estate); ! ! /* OK, store the tuple and create index entries for it */ ! heap_insert(cstate->rel, tuple, mycid, hi_options, bistate); ! ! if (resultRelInfo->ri_NumIndices > 0) ! recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), ! estate, false); ! ! /* AFTER ROW INSERT Triggers */ ! ExecARInsertTriggers(estate, resultRelInfo, tuple, ! recheckIndexes); ! /* ! * We count only tuples not suppressed by a BEFORE INSERT trigger; ! * this is the same definition used by execMain.c for counting ! * tuples inserted by an INSERT command. ! */ ! cstate->processed++; ! } } } Index: src/include/commands/trigger.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/commands/trigger.h,v retrieving revision 1.77 diff -c -r1.77 trigger.h *** src/include/commands/trigger.h 26 Oct 2009 02:26:41 -0000 1.77 --- src/include/commands/trigger.h 19 Nov 2009 21:19:09 -0000 *************** *** 130,135 **** --- 130,138 ---- extern HeapTuple ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo, HeapTuple trigtuple); + extern HeapTuple ExecARInsertTriggersNow(EState *estate, + ResultRelInfo *relinfo, + HeapTuple trigtuple); extern void ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo, HeapTuple trigtuple, Index: src/include/commands/copy.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/commands/copy.h,v retrieving revision 1.32 diff -c -r1.32 copy.h *** src/include/commands/copy.h 1 Jan 2009 17:23:58 -0000 1.32 --- src/include/commands/copy.h 19 Nov 2009 21:19:09 -0000 *************** *** 22,25 **** --- 22,30 ---- extern DestReceiver *CreateCopyDestReceiver(void); + /** + * Size of the LRU list of relations to keep in cache for partitioning in COPY + */ + extern int partitioningCacheSize; + #endif /* COPY_H */ Index: src/include/executor/executor.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/executor/executor.h,v retrieving revision 1.163 diff -c -r1.163 executor.h *** src/include/executor/executor.h 26 Oct 2009 02:26:41 -0000 1.163 --- src/include/executor/executor.h 19 Nov 2009 21:19:09 -0000 *************** *** 166,171 **** --- 166,173 ---- extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids); extern void ExecConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate); + extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo, + TupleTableSlot *slot, EState *estate); extern TupleTableSlot *EvalPlanQual(EState *estate, EPQState *epqstate, Relation relation, Index rti, ItemPointer tid, TransactionId priorXmax); Index: src/backend/executor/execMain.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/executor/execMain.c,v retrieving revision 1.334 diff -c -r1.334 execMain.c *** src/backend/executor/execMain.c 26 Oct 2009 02:26:29 -0000 1.334 --- src/backend/executor/execMain.c 19 Nov 2009 21:19:09 -0000 *************** *** 1235,1241 **** /* * ExecRelCheck --- check that tuple meets constraints for result relation */ ! 
static const char * ExecRelCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate) { --- 1235,1241 ---- /* * ExecRelCheck --- check that tuple meets constraints for result relation */ ! const char * ExecRelCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate) { Index: src/backend/utils/misc/guc.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/utils/misc/guc.c,v retrieving revision 1.523 diff -c -r1.523 guc.c *** src/backend/utils/misc/guc.c 21 Oct 2009 20:38:58 -0000 1.523 --- src/backend/utils/misc/guc.c 19 Nov 2009 21:19:09 -0000 *************** *** 32,37 **** --- 32,38 ---- #include "access/xact.h" #include "catalog/namespace.h" #include "commands/async.h" + #include "commands/copy.h" #include "commands/prepare.h" #include "commands/vacuum.h" #include "commands/variable.h" *************** *** 534,539 **** --- 535,542 ---- gettext_noop("Customized Options"), /* DEVELOPER_OPTIONS */ gettext_noop("Developer Options"), + /* COPY_OPTIONS */ + gettext_noop("Copy Options"), /* help_config wants this array to be null-terminated */ NULL }; *************** *** 1955,1960 **** --- 1958,2019 ---- 1024, 100, 102400, NULL, NULL }, + { + { + /* variable name */ + "copy_partitioning_cache_size", + + /* context, we want the user to set it */ + PGC_USERSET, + + /* category for this configuration variable */ + COPY_OPTIONS, + + /* short description */ + gettext_noop("Size of the LRU list of child tables to keep in cache " + " when partitioning tuples in COPY."), + + /* long description */ + gettext_noop("When tuples are automatically routed in COPY, all " + "tables are scanned until the constraints are matched. When " + "a large number of child tables are present the scanning " + "overhead can be large. To reduce that overhead, the routing " + "mechanism keeps a cache of the last child tables in which " + "tuples where inserted and try these tables first before " + "performing a full scan. This variable defines the cache size " + "with 0 meaning no caching, 1 keep the last matching child table" + ", x keep the last x child tables in which tuples were inserted." + " Note that the list is managed with an LRU policy."), + + + /* flags: this option is not in the postgresql.conf.sample + * file and should not be allowed in the config. + * NOTE: this is not currently enforced. + */ + GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE + }, + + /* pointer to the variable, this one is present in + * src/backend/commands/copy.c + */ + &partitioningCacheSize, + + /* default value */ + 2, + + /* min value */ + 0, + + /* max value */ + INT_MAX, + + /* assign hook function */ + NULL, + + /* show hook function */ + NULL + }, + /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL Index: src/test/regress/parallel_schedule =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/parallel_schedule,v retrieving revision 1.57 diff -c -r1.57 parallel_schedule *** src/test/regress/parallel_schedule 24 Aug 2009 03:10:16 -0000 1.57 --- src/test/regress/parallel_schedule 19 Nov 2009 21:19:09 -0000 *************** *** 47,53 **** # execute two copy tests parallel, to check that copy itself # is concurrent safe. # ---------- ! test: copy copyselect # ---------- # Another group of parallel tests --- 47,55 ---- # execute two copy tests parallel, to check that copy itself # is concurrent safe. # ---------- ! test: copy copyselect ! 
test: copy_partitioning ! test: copy_partitioning_trigger # ---------- # Another group of parallel tests Index: doc/src/sgml/ref/copy.sgml =================================================================== RCS file: /home/manu/cvsrepo/pgsql/doc/src/sgml/ref/copy.sgml,v retrieving revision 1.92 diff -c -r1.92 copy.sgml *** doc/src/sgml/ref/copy.sgml 21 Sep 2009 20:10:21 -0000 1.92 --- doc/src/sgml/ref/copy.sgml 19 Nov 2009 21:19:09 -0000 *************** *** 41,46 **** --- 41,47 ---- ESCAPE '<replaceable class="parameter">escape_character</replaceable>' FORCE_QUOTE { ( <replaceable class="parameter">column</replaceable> [, ...] ) | * } FORCE_NOT_NULL ( <replaceable class="parameter">column</replaceable> [, ...] ) + PARTITIONING [ <replaceable class="parameter">boolean</replaceable> ] </synopsis> </refsynopsisdiv> *************** *** 282,287 **** --- 283,298 ---- </listitem> </varlistentry> + <varlistentry> + <term><literal>PARTITIONING</></term> + <listitem> + <para> + In <literal>PARTITIONING</> mode, <command>COPY TO</> a parent + table will automatically move each row to the child table that + has the matching constraints. + </para> + </listitem> + </varlistentry> </variablelist> </refsect1> *************** *** 384,389 **** --- 395,419 ---- <command>VACUUM</command> to recover the wasted space. </para> + <para> + <literal>PARTITIONING</> mode scans for each child table constraint in the + hierarchy to find a match. As an optimization, a cache of the last child + tables where tuples have been routed is kept and tried first. The size + of the cache is set by the <literal>copy_partitioning_cache_size</literal> + session variable. It the size is set to 0, the cache is disabled otherwise + the indicated number of child tables is kept in the cache (at most). + </para> + + <para> + <literal>PARTITIONING</> mode assumes that every child table has at least + one constraint defined otherwise an error is thrown. If child tables have + overlapping constraints, the row is inserted in the first child table found + (be it a cached table or the first table to appear in the lookup). + Before of after ROW triggers will generate an error and abort the COPY operation + if they modify the tuple value in a way that violates the constraints of the child + table where the tuple has been routed. + </para> + </refsect1> <refsect1> *************** *** 828,833 **** --- 858,1001 ---- 0000200 M B A B W E 377 377 377 377 377 377 </programlisting> </para> + + <para> + Multiple options are separated by a comma like: + <programlisting> + COPY (SELECT t FROM foo WHERE id = 1) TO STDOUT (FORMAT CSV, HEADER, FORCE_QUOTE (t)); + </programlisting> + </para> + + <refsect2> + <title>Partitioning examples</title> + <para> + Here is an example on how to use partitioning. 
Let's first create a parent + table and 3 child tables as follows: + <programlisting> + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int + ); + + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + </programlisting> + We prepare the following data file (1 row for each child table): + copy_input.data content: + <programlisting> + 11 '2008-01-10' 11 + 12 '2008-02-15' 12 + 13 '2008-03-15' 13 + 21 '2008-01-10' 11 + 31 '2008-01-10' 11 + 41 '2008-01-10' 11 + 22 '2008-02-15' 12 + 23 '2008-03-15' 13 + 32 '2008-02-15' 12 + 33 '2008-03-15' 13 + 42 '2008-02-15' 12 + 43 '2008-03-15' 13 + </programlisting> + If we COPY the data in the parent table without partitioning enabled, all + rows are inserted in the master table as in this example: + <programlisting> + COPY y2008 FROM 'copy_input.data'; + + SELECT COUNT(*) FROM y2008; + count + ------- + 12 + (1 row) + + SELECT COUNT(*) FROM jan2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM feb2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM mar2008; + count + ------- + 0 + (1 row) + + DELETE FROM y2008; + </programlisting> + If we execute COPY with partitioning enabled, rows are loaded in the + appropriate child table automatically as in this example: + <programlisting> + COPY y2008 FROM 'copy_input.data' (PARTITIONING); + + SELECT * FROM y2008; + id | date | value + ----+------------+------- + 11 | 01-10-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008; + id | date | value + ----+------------+------- + 11 | 01-10-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM feb2008; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + </programlisting> + The cache size can be tuned using: + <programlisting> + set copy_partitioning_cache_size = 3; + </programlisting> + Repeating the COPY command will now be faster: + <programlisting> + COPY y2008 FROM 'copy_input.data' (PARTITIONING); + </programlisting> + </para> + </refsect2> </refsect1> <refsect1> Index: src/include/utils/guc_tables.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/utils/guc_tables.h,v retrieving revision 1.46 diff -c -r1.46 guc_tables.h *** src/include/utils/guc_tables.h 11 Jun 2009 14:49:13 -0000 1.46 --- src/include/utils/guc_tables.h 19 Nov 2009 21:19:09 -0000 *************** *** 76,82 **** COMPAT_OPTIONS_CLIENT, PRESET_OPTIONS, CUSTOM_OPTIONS, ! DEVELOPER_OPTIONS }; /* --- 76,83 ---- COMPAT_OPTIONS_CLIENT, PRESET_OPTIONS, CUSTOM_OPTIONS, ! DEVELOPER_OPTIONS, ! 
COPY_OPTIONS }; /* Index: src/test/regress/input/copy_partitioning_trigger.source =================================================================== RCS file: src/test/regress/input/copy_partitioning_trigger.source diff -N src/test/regress/input/copy_partitioning_trigger.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/input/copy_partitioning_trigger.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,62 ---- + -- Test triggers with partitioning + set copy_partitioning_cache_size = 0; + + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + create table t3 (check (i > 2 and i <= 3)) inherits (t); + + create table audit(i int); + + create function audit() returns trigger as $$ begin insert into audit(i) values (new.i); return new; end; $$ language plpgsql; + + create trigger t_a after insert on t for each row execute procedure audit(); + -- the before trigger on the t would get fired + -- create trigger t_a2 before insert on t for each row execute procedure audit(); + create trigger t1_a before insert on t1 for each row execute procedure audit(); + create trigger t1_a2 after insert on t1 for each row execute procedure audit(); + + copy t from stdin with (partitioning); + 1 + 2 + 3 + \. + + -- no rows if trigger does not work + select * from audit; + + drop table t cascade; + drop table audit cascade; + drop function audit(); + + -- Test bad before row trigger + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + + create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql; + create trigger t1_before before insert on t1 for each row execute procedure i2(); + + -- COPY should fail + copy t from stdin with (partitioning); + 1 + \. + + drop table t cascade; + drop function i2(); + + -- Test bad after row trigger + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + + create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql; + create trigger t1_after after insert on t1 for each row execute procedure i2(); + + -- COPY should fail + copy t from stdin with (partitioning); + 1 + \. 
+ + drop table t cascade; + drop function i2(); Index: src/test/regress/data/copy_input.data =================================================================== RCS file: src/test/regress/data/copy_input.data diff -N src/test/regress/data/copy_input.data *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/data/copy_input.data 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,12 ---- + 11 '2008-01-19' 11 + 12 '2008-02-15' 12 + 13 '2008-03-15' 13 + 21 '2008-01-10' 11 + 31 '2008-01-10' 11 + 41 '2008-01-10' 11 + 22 '2008-02-15' 12 + 23 '2008-03-15' 13 + 32 '2008-02-15' 12 + 33 '2008-03-15' 13 + 42 '2008-02-15' 12 + 43 '2008-03-15' 13 Index: src/test/regress/input/copy_partitioning.source =================================================================== RCS file: src/test/regress/input/copy_partitioning.source diff -N src/test/regress/input/copy_partitioning.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/input/copy_partitioning.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,149 ---- + -- test 1 + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + 1 + \. + + drop table parent cascade; + + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + 1 + \. + + drop table parent cascade; + + -- test 2 + set copy_partitioning_cache_size = 0; + create table parent(i int, j int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + create table c2 (check (i > 1 and i <= 2)) inherits (parent); + create table c3 (check (i > 2 and i <= 3)) inherits (parent); + + create index c1_idx on c1(j); + copy (select i % 3 + 1, i from generate_series(1, 1000) s(i)) to '/tmp/parent'; + copy parent from '/tmp/parent' with (partitioning); + analyse; + + set enable_seqscan to false; + -- no rows if index was not updated + select * from c1 where j = 3; + + set enable_seqscan to true; + set enable_indexscan to false; + -- 1 row + select * from c1 where j = 3; + drop table parent cascade; + + -- test cache size + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int, + primary key(id) + ); + + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + + CREATE TABLE jan2008half1 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-01-15' ) + ) INHERITS (jan2008); + + CREATE TABLE jan2008half2 ( + CHECK ( date >= DATE '2008-01-16' AND date < DATE '2008-01-31' ) + ) INHERITS (jan2008); + + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data'; + + SELECT COUNT(*) FROM y2008; + SELECT COUNT(*) FROM jan2008; + SELECT COUNT(*) FROM jan2008half1; + SELECT COUNT(*) FROM jan2008half2; + SELECT COUNT(*) FROM feb2008; + SELECT COUNT(*) FROM mar2008; + + DELETE FROM y2008; + + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 1; + COPY y2008 FROM 
'@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 3; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + DROP TABLE y2008 CASCADE; Index: src/test/regress/output/copy_partitioning.source =================================================================== RCS file: src/test/regress/output/copy_partitioning.source diff -N src/test/regress/output/copy_partitioning.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/output/copy_partitioning.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,538 ---- + -- test 1 + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + drop table parent cascade; + NOTICE: drop cascades to table c1 + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + drop table parent cascade; + NOTICE: drop cascades to table c1 + -- test 2 + set copy_partitioning_cache_size = 0; + create table parent(i int, j int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + create table c2 (check (i > 1 and i <= 2)) inherits (parent); + create table c3 (check (i > 2 and i <= 3)) inherits (parent); + create index c1_idx on c1(j); + copy (select i % 3 + 1, i from generate_series(1, 1000) s(i)) to '/tmp/parent'; + copy parent from '/tmp/parent' with (partitioning); + analyse; + set enable_seqscan to false; + -- no rows if index was not updated + select * from c1 where j = 3; + i | j + ---+--- + 1 
| 3 + (1 row) + + set enable_seqscan to true; + set enable_indexscan to false; + -- 1 row + select * from c1 where j = 3; + i | j + ---+--- + 1 | 3 + (1 row) + + drop table parent cascade; + NOTICE: drop cascades to 3 other objects + DETAIL: drop cascades to table c1 + drop cascades to table c2 + drop cascades to table c3 + -- test cache size + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int, + primary key(id) + ); + NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "y2008_pkey" for table "y2008" + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + CREATE TABLE jan2008half1 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-01-15' ) + ) INHERITS (jan2008); + CREATE TABLE jan2008half2 ( + CHECK ( date >= DATE '2008-01-16' AND date < DATE '2008-01-31' ) + ) INHERITS (jan2008); + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data'; + SELECT COUNT(*) FROM y2008; + count + ------- + 12 + (1 row) + + SELECT COUNT(*) FROM jan2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM jan2008half1; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM jan2008half2; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM feb2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM mar2008; + count + ------- + 0 + (1 row) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 
01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 3; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set 
copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 
01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + DROP TABLE y2008 CASCADE; + NOTICE: drop cascades to 5 other objects + DETAIL: drop cascades to table jan2008 + drop cascades to table jan2008half1 + drop cascades to table jan2008half2 + drop cascades to table feb2008 + drop cascades to table mar2008 Index: src/test/regress/output/copy_partitioning_trigger.source =================================================================== RCS file: src/test/regress/output/copy_partitioning_trigger.source diff -N src/test/regress/output/copy_partitioning_trigger.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/output/copy_partitioning_trigger.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,59 ---- + -- Test triggers with partitioning + set copy_partitioning_cache_size = 0; + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + create table t3 (check (i > 2 and i <= 3)) inherits (t); + create table audit(i int); + create function audit() returns trigger as $$ begin insert into audit(i) values (new.i); return new; end; $$ language plpgsql; + create trigger t_a after insert on t for each row execute procedure audit(); + -- the before trigger on the t would get fired + -- create trigger t_a2 before insert on t for each row execute procedure audit(); + create trigger t1_a before insert on t1 for each row execute procedure audit(); + create trigger t1_a2 after insert on t1 for each row execute procedure audit(); + copy t from stdin with (partitioning); + -- no rows if trigger does not work + select * from audit; + i + --- + 1 + 1 + (2 rows) + + drop table t cascade; + NOTICE: drop cascades to 3 other objects + DETAIL: drop cascades to table t1 + drop cascades to table t2 + drop cascades to table t3 + drop table audit cascade; + drop function audit(); + -- Test bad before row trigger + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql; + create trigger t1_before before insert on t1 for each row execute procedure i2(); + -- COPY should fail + copy t from stdin with (partitioning); + ERROR: Before row insert trigger on table "t1" modified partitioning routing decision. Aborting insert. 
+ CONTEXT: COPY t, line 1: "1" + drop table t cascade; + NOTICE: drop cascades to 2 other objects + DETAIL: drop cascades to table t1 + drop cascades to table t2 + drop function i2(); + -- Test bad after row trigger + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql; + create trigger t1_after after insert on t1 for each row execute procedure i2(); + -- COPY should fail + copy t from stdin with (partitioning); + ERROR: After row insert trigger on table "t1" modified partitioning routing decision. Aborting insert. + CONTEXT: COPY t, line 1: "1" + drop table t cascade; + NOTICE: drop cascades to 2 other objects + DETAIL: drop cascades to table t1 + drop cascades to table t2 + drop function i2();
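To summarize the control flow of the patch just posted: route_tuple_to_child() first tries the LRU cache, then scans pg_inherits for the parent's direct children, recursing into any child that is itself a parent. The following is only a condensed paraphrase of that function, with the cache, locking details and error reporting elided; tuple_matches() and check_and_insert() are hypothetical stand-ins for the constraint-check/insert helpers.

/* headers as included by the patch, plus access/heapam.h */
static bool
route_tuple(Relation parent, HeapTuple tuple)
{
    Relation     catalog;
    HeapScanDesc scan;
    HeapTuple    inh;
    ScanKeyData  key;
    bool         routed = false;

    /* find the direct children of 'parent' in pg_inherits */
    ScanKeyInit(&key, Anum_pg_inherits_inhparent,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(RelationGetRelid(parent)));
    catalog = heap_open(InheritsRelationId, AccessShareLock);
    scan = heap_beginscan(catalog, SnapshotNow, 1, &key);

    while (!routed &&
           (inh = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        Oid      childid = ((Form_pg_inherits) GETSTRUCT(inh))->inhrelid;
        Relation child = relation_open(childid, RowExclusiveLock);

        if (has_subclass(childid))
        {
            /* intermediate parent: check its constraints, then recurse */
            if (tuple_matches(child, tuple))
                routed = route_tuple(child, tuple);
        }
        else
        {
            /* leaf child: check constraints and insert on success */
            routed = check_and_insert(child, tuple);
        }
        /* keep the lock on the table we actually inserted into */
        relation_close(child, routed ? NoLock : RowExclusiveLock);
    }
    heap_endscan(scan);
    heap_close(catalog, AccessShareLock);
    return routed;
}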
Emmanuel Cecchet wrote:
> Hi Jan,
>
> Here is a new version of the patch with the following modifications:
> - used the OID list from pg_list.h
> - properly handles triggers and generates an error if needed (updated
> the doc as well)
> - added your test cases + extra bad trigger cases

Hi,

that got broken by the WHEN triggers patch
(c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
TriggerEnabled function signature; the code currently does not compile.

I'll continue reading, in the meantime could you send an updated patch?

Thanks,
Jan
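The breakage is easy to see by comparing the two versions of the patch in this thread: the WHEN triggers commit extended TriggerEnabled() so it can evaluate WHEN conditions, which requires the executor state and the tuples.

/* As written in the earlier patch (no longer compiles): */
if (!TriggerEnabled(trigger, LocTriggerData.tg_event, NULL))
    continue;

/* As required after commit c6e0a36 (see the updated patch below): */
if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
                    NULL, NULL, newtuple))
    continue;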
Jan Urbański <wulczer@wulczer.org> writes:
> that got broken by the WHEN triggers patch
> (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
> TriggerEnabled function signature; the code currently does not compile.

[ squint... ] What is that patch doing touching the innards of
trigger.c in the first place? I can't see any reason for trigger.c
to be associated with partitioning.

regards, tom lane
Tom Lane wrote:
> Jan Urbański <wulczer@wulczer.org> writes:
>> that got broken by the WHEN triggers patch
>> (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
>> TriggerEnabled function signature; the code currently does not compile.
>
> [ squint... ] What is that patch doing touching the innards of
> trigger.c in the first place? I can't see any reason for trigger.c
> to be associated with partitioning.

The problem I had is that if I used the standard trigger mechanism for
after row inserts on a child table, where the trigger is called
asynchronously, I had a relcache leak on the child table. I tried to ask
for help on that earlier but it got lost in other discussions on the
patch. So I tried to call the after trigger synchronously on the child
table and it worked. The patch therefore just adds a synchronous call to
the after row insert triggers that is used when the tuple is moved to a
child table (this also allows detecting triggers that are messing with
the routing). I would be happy to follow any recommendation for a more
elegant solution to the problem.

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com
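Concretely, the two call styles Emmanuel contrasts both appear in the patch: the stock executor queues AFTER ROW events and fires them at end of statement, while the patch's helper fires them immediately so the routing code can look at the tuple a trigger returns.

/* Standard path: the AFTER ROW INSERT event is queued and fired at
 * the end of the statement. */
ExecARInsertTriggers(estate, resultRelInfo, tuple, recheckIndexes);

/* Patch's path: fire the triggers right away and get the (possibly
 * modified) tuple back, so the child table's constraints can be
 * re-checked before the routing decision becomes final. */
newtuple = ExecARInsertTriggersNow(estate, resultRelInfo, tuple);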
Hi Jan, Here is the updated patch. Note that the new code in trigger is a copy/paste of the before row insert trigger code modified to use the pointers of the after row trigger functions. Emmanuel > Emmanuel Cecchet wrote: > >> Hi Jan, >> >> Here is a new version of the patch with the following modifications: >> - used oid list from pg_list.h >> - properly handles triggers and generate an error if needed (updated doc >> as well) >> - added your test cases + extra bad trigger cases >> > > Hi, > > that got broken by the WHEN triggers patch > (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the > TriggerEnabled function signature, the code currently does not compile. > > I'll continue reading, in the meantime could you send a updated patch? > > Thanks, > Jan > > -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com Index: src/backend/commands/trigger.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/commands/trigger.c,v retrieving revision 1.257 diff -c -r1.257 trigger.c *** src/backend/commands/trigger.c 20 Nov 2009 20:38:10 -0000 1.257 --- src/backend/commands/trigger.c 21 Nov 2009 03:56:41 -0000 *************** *** 1921,1926 **** --- 1921,1968 ---- return newtuple; } + HeapTuple + ExecARInsertTriggersNow(EState *estate, ResultRelInfo *relinfo, + HeapTuple trigtuple) + { + TriggerDesc *trigdesc = relinfo->ri_TrigDesc; + int ntrigs = trigdesc->n_after_row[TRIGGER_EVENT_INSERT]; + int *tgindx = trigdesc->tg_after_row[TRIGGER_EVENT_INSERT]; + HeapTuple newtuple = trigtuple; + HeapTuple oldtuple; + TriggerData LocTriggerData; + int i; + + LocTriggerData.type = T_TriggerData; + LocTriggerData.tg_event = TRIGGER_EVENT_INSERT | + TRIGGER_EVENT_ROW; + LocTriggerData.tg_relation = relinfo->ri_RelationDesc; + LocTriggerData.tg_newtuple = NULL; + LocTriggerData.tg_newtuplebuf = InvalidBuffer; + for (i = 0; i < ntrigs; i++) + { + Trigger *trigger = &trigdesc->triggers[tgindx[i]]; + + if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event, + NULL, NULL, newtuple)) + continue; + + LocTriggerData.tg_trigtuple = oldtuple = newtuple; + LocTriggerData.tg_trigtuplebuf = InvalidBuffer; + LocTriggerData.tg_trigger = trigger; + newtuple = ExecCallTriggerFunc(&LocTriggerData, + tgindx[i], + relinfo->ri_TrigFunctions, + relinfo->ri_TrigInstrument, + GetPerTupleMemoryContext(estate)); + if (oldtuple != newtuple && oldtuple != trigtuple) + heap_freetuple(oldtuple); + if (newtuple == NULL) + break; + } + return newtuple; + } + void ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo, HeapTuple trigtuple, List *recheckIndexes) Index: src/backend/commands/copy.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/commands/copy.c,v retrieving revision 1.318 diff -c -r1.318 copy.c *** src/backend/commands/copy.c 20 Nov 2009 20:38:10 -0000 1.318 --- src/backend/commands/copy.c 21 Nov 2009 03:56:41 -0000 *************** *** 43,48 **** --- 43,56 ---- #include "utils/memutils.h" #include "utils/snapmgr.h" + /* For tuple routing */ + #include "catalog/pg_inherits.h" + #include "catalog/pg_inherits_fn.h" + #include "nodes/makefuncs.h" + #include "nodes/pg_list.h" + #include "utils/fmgroids.h" + #include "utils/relcache.h" + #include "utils/tqual.h" #define ISOCTAL(c) (((c) >= '0') && ((c) <= '7')) #define OCTVALUE(c) ((c) - '0') *************** *** 117,122 **** --- 125,131 ---- char *escape; /* CSV escape char (must be 1 byte) */ bool *force_quote_flags; /* 
per-column CSV FQ flags */ bool *force_notnull_flags; /* per-column CSV FNN flags */ + bool partitioning; /* tuple routing in table hierarchy */ /* these are just for error messages, see copy_in_error_callback */ const char *cur_relname; /* table name for error messages */ *************** *** 173,178 **** --- 182,194 ---- } DR_copy; + /** + * Size of the LRU list of relations to keep in cache for routing + */ + int partitioningCacheSize; + + List *child_table_lru = NULL; + /* * These macros centralize code used to process line_buf and raw_buf buffers. * They are macros because they often do continue/break control and to avoid *************** *** 839,844 **** --- 855,868 ---- errmsg("argument to option \"%s\" must be a list of column names", defel->defname))); } + else if (strcmp(defel->defname, "partitioning") == 0) + { + if (cstate->partitioning) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + cstate->partitioning = defGetBoolean(defel); + } else ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), *************** *** 1662,1667 **** --- 1686,1960 ---- return res; } + /** + * Check that the given tuple matches the constraints of the given child table + * and performs an insert if the constraints are matched. insert_tuple specifies + * if the tuple must be inserted in the table if the constraint is satisfied. + * The method returns true if the constraint is satisfied (and insert was + * performed if insert_tuple is true), false otherwise (constraints not + * satisfied for this tuple on this child table). + */ + static bool + check_tuple_constraints(Relation child_table_relation, HeapTuple tuple, + bool insert_tuple, int hi_options, ResultRelInfo *parentResultRelInfo) + { + /* Check the constraints */ + ResultRelInfo *resultRelInfo; + TupleTableSlot *slot; + EState *estate = CreateExecutorState(); + bool result = false; + + resultRelInfo = makeNode(ResultRelInfo); + resultRelInfo->ri_RangeTableIndex = 1; /* dummy */ + resultRelInfo->ri_RelationDesc = child_table_relation; + resultRelInfo->ri_TrigDesc = CopyTriggerDesc(child_table_relation->trigdesc); + if (resultRelInfo->ri_TrigDesc) + resultRelInfo->ri_TrigFunctions = (FmgrInfo *) + palloc0(resultRelInfo->ri_TrigDesc->numtriggers * sizeof(FmgrInfo)); + resultRelInfo->ri_TrigInstrument = NULL; + + ExecOpenIndices(resultRelInfo); + + estate->es_result_relations = resultRelInfo; + estate->es_num_result_relations = 1; + estate->es_result_relation_info = resultRelInfo; + + /* Set up a tuple slot too */ + slot = MakeSingleTupleTableSlot(child_table_relation->rd_att); + ExecStoreTuple(tuple, slot, InvalidBuffer, false); + + if (ExecRelCheck(resultRelInfo, slot, estate) == NULL) + { + /* Constraints satisfied */ + if (insert_tuple) + { + /* Insert the row in the child table */ + List *recheckIndexes = NIL; + + /* BEFORE ROW INSERT Triggers */ + if (resultRelInfo->ri_TrigDesc && + resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0) + { + HeapTuple newtuple; + newtuple = ExecBRInsertTriggers(estate, resultRelInfo, tuple); + + if (newtuple != tuple) + { + /* tuple modified by Trigger(s), check that the constraint is still valid */ + heap_freetuple(tuple); + tuple = newtuple; + ExecStoreTuple(tuple, slot, InvalidBuffer, false); + if (ExecRelCheck(resultRelInfo, slot, estate) != NULL) + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("Before row insert trigger on table \"%s\" modified partitioning routing decision.Aborting insert.", + 
RelationGetRelationName(child_table_relation)))); + } + } + } + + /* OK, store the tuple and create index entries for it */ + heap_insert(child_table_relation, tuple, GetCurrentCommandId(true), + hi_options, NULL); + + /* Update indices */ + if (resultRelInfo->ri_NumIndices > 0) + recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), + estate, false); + + /* AFTER ROW INSERT Triggers */ + if (resultRelInfo->ri_TrigDesc && + resultRelInfo->ri_TrigDesc->n_after_row[TRIGGER_EVENT_INSERT] > 0) + { + HeapTuple newtuple; + newtuple = ExecARInsertTriggersNow(estate, resultRelInfo, tuple); + if (newtuple != tuple) + { + /* tuple modified by Trigger(s), check that the constraint is still valid */ + heap_freetuple(tuple); + tuple = newtuple; + ExecStoreTuple(tuple, slot, InvalidBuffer, false); + if (ExecRelCheck(resultRelInfo, slot, estate) != NULL) + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("After row insert trigger on table \"%s\" modified partitioning routingdecision. Aborting insert.", + RelationGetRelationName(child_table_relation)))); + } + } + } + } + result = true; + } + + /* Free resources */ + FreeExecutorState(estate); + ExecDropSingleTupleTableSlot(slot); + ExecCloseIndices(resultRelInfo); + + return result; + } + + + /** + * Route a tuple into a child table that matches the constraints of the tuple + * to be inserted. + * @param parent_relation_id Oid of the parent relation + * @param tuple the tuple to be routed + */ + static bool route_tuple_to_child(Relation parent_relation, HeapTuple tuple, int hi_options, ResultRelInfo *parentResultRelInfo) + { + Relation child_table_relation; + bool result = false; + Relation catalog_relation; + HeapTuple inherits_tuple; + HeapScanDesc scan; + ScanKeyData key[1]; + + /* Try to exploit locality for bulk inserts + * We expect consecutive insert to go to the same child table */ + if (partitioningCacheSize > 0 && child_table_lru != NULL) + { + /* Try the child table LRU */ + ListCell *child_oid_cell; + Oid child_relation_id; + + foreach(child_oid_cell, child_table_lru) + { + child_relation_id = lfirst_oid(child_oid_cell); + child_table_relation = try_relation_open(child_relation_id, + RowExclusiveLock); + + if (child_table_relation == NULL) + { + /* Child table does not exist anymore, purge cache entry */ + child_table_lru = list_delete_oid(child_table_lru, child_relation_id); + if (list_length(child_table_lru) == 0) + break; /* Cache is now empty */ + else + { /* Restart scanning */ + child_oid_cell = list_head(child_table_lru); + continue; + } + } + + if (check_tuple_constraints(child_table_relation, tuple, true, hi_options, parentResultRelInfo)) + { + /* Hit, move in front if not already the head */ + if (lfirst_oid(list_head(child_table_lru)) != child_relation_id) + { + /* The partitioning cache is in the CacheMemoryContext) */ + MemoryContext currentContext = MemoryContextSwitchTo(CacheMemoryContext); + child_table_lru = list_delete_oid(child_table_lru, child_relation_id); + child_table_lru = lcons_oid(child_relation_id, child_table_lru); + MemoryContextSwitchTo(currentContext); + } + + /* Close the relation but keep the lock until the end of + * the transaction */ + relation_close(child_table_relation, NoLock); + + return true; + } + relation_close(child_table_relation, RowExclusiveLock); + } + /* We got a miss */ + } + + /* Looking up child tables */ + ScanKeyInit(&key[0], + Anum_pg_inherits_inhparent, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(parent_relation->rd_id)); + catalog_relation = 
heap_open(InheritsRelationId, AccessShareLock); + scan = heap_beginscan(catalog_relation, SnapshotNow, 1, key); + while ((inherits_tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + { + TupleConstr *constr; + Form_pg_inherits inh = (Form_pg_inherits) GETSTRUCT(inherits_tuple); + Oid child_relation_id = inh->inhrelid; + + /* Check if the child table satisfy the constraints, if the relation + * cannot be opened this throws an exception */ + child_table_relation = (Relation) relation_open(child_relation_id, + RowExclusiveLock); + + constr = child_table_relation->rd_att->constr; + if (constr->num_check == 0) + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("partition routing found no constraint for relation \"%s\"", + RelationGetRelationName(child_table_relation)))); + } + + if (has_subclass(child_table_relation->rd_id)) + { + /* This is a parent table, check its constraints first */ + if (check_tuple_constraints(child_table_relation, tuple, false, hi_options, parentResultRelInfo)) + { + /* Constraint satisfied, explore the child tables */ + result = route_tuple_to_child(child_table_relation, tuple, hi_options, parentResultRelInfo); + if (result) + { + /* Success, one of our child tables matched. + * Release the lock on this parent relation, we did not use it */ + relation_close(child_table_relation, RowExclusiveLock); + break; + } + else + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("tuple matched constraints of relation \"%s\" but none of " + "its children", + RelationGetRelationName(child_table_relation)))); + } + } + } + else + { + /* Child table, try it */ + result = check_tuple_constraints(child_table_relation, tuple, true, hi_options, parentResultRelInfo); + } + + if (result) + { + /* We found the one, update the LRU and exit the loop! + * + * Close the relation but keep the lock until the end of + * the transaction */ + relation_close(child_table_relation, NoLock); + + if (partitioningCacheSize > 0) + { + /* The partitioning cache is in the CacheMemoryContext) */ + MemoryContext currentContext; + currentContext = MemoryContextSwitchTo(CacheMemoryContext); + + /* Add the new entry in head of the list (also builds the list if needed) */ + child_table_lru = lcons_oid(child_relation_id, child_table_lru); + + /* Adjust list size if needed */ + child_table_lru = list_truncate(child_table_lru, partitioningCacheSize); + + /* Restore memory context */ + MemoryContextSwitchTo(currentContext); + } + break; + } + else + { + /* Release the lock on that relation, we did not use it */ + relation_close(child_table_relation, RowExclusiveLock); + } + } + heap_endscan(scan); + heap_close(catalog_relation, AccessShareLock); + return result; + } + /* * Copy FROM file to relation. */ *************** *** 2154,2183 **** { List *recheckIndexes = NIL; ! /* Place tuple in tuple slot */ ! ExecStoreTuple(tuple, slot, InvalidBuffer, false); ! ! /* Check the constraints of the tuple */ ! if (cstate->rel->rd_att->constr) ! ExecConstraints(resultRelInfo, slot, estate); ! ! /* OK, store the tuple and create index entries for it */ ! heap_insert(cstate->rel, tuple, mycid, hi_options, bistate); ! if (resultRelInfo->ri_NumIndices > 0) ! recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), ! estate, false); ! /* AFTER ROW INSERT Triggers */ ! ExecARInsertTriggers(estate, resultRelInfo, tuple, ! recheckIndexes); ! /* ! * We count only tuples not suppressed by a BEFORE INSERT trigger; ! * this is the same definition used by execMain.c for counting ! 
* tuples inserted by an INSERT command. ! */ ! cstate->processed++; } } --- 2447,2499 ---- { List *recheckIndexes = NIL; ! /* If routing is enabled and table has child tables, let's try routing */ ! if (cstate->partitioning && has_subclass(cstate->rel->rd_id)) ! { ! if (route_tuple_to_child(cstate->rel, tuple, hi_options, resultRelInfo)) ! { ! /* increase the counter so that we return how many ! * tuples got copied into all tables in total */ ! cstate->processed++; ! } ! else ! { ! ereport(ERROR, ( ! errcode(ERRCODE_BAD_COPY_FILE_FORMAT), ! errmsg("tuple does not satisfy any child table constraint") ! )); ! } ! } ! else ! { ! /* No partitioning, prepare the tuple and ! * check the constraints */ ! /* Place tuple in tuple slot */ ! ExecStoreTuple(tuple, slot, InvalidBuffer, false); ! /* Check the constraints of the tuple */ ! if (cstate->rel->rd_att->constr) ! ExecConstraints(resultRelInfo, slot, estate); ! ! /* OK, store the tuple and create index entries for it */ ! heap_insert(cstate->rel, tuple, mycid, hi_options, bistate); ! ! if (resultRelInfo->ri_NumIndices > 0) ! recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), ! estate, false); ! ! /* AFTER ROW INSERT Triggers */ ! ExecARInsertTriggers(estate, resultRelInfo, tuple, ! recheckIndexes); ! /* ! * We count only tuples not suppressed by a BEFORE INSERT trigger; ! * this is the same definition used by execMain.c for counting ! * tuples inserted by an INSERT command. ! */ ! cstate->processed++; ! } } } Index: src/include/commands/trigger.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/commands/trigger.h,v retrieving revision 1.78 diff -c -r1.78 trigger.h *** src/include/commands/trigger.h 20 Nov 2009 20:38:11 -0000 1.78 --- src/include/commands/trigger.h 21 Nov 2009 03:56:41 -0000 *************** *** 130,135 **** --- 130,138 ---- extern HeapTuple ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo, HeapTuple trigtuple); + extern HeapTuple ExecARInsertTriggersNow(EState *estate, + ResultRelInfo *relinfo, + HeapTuple trigtuple); extern void ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo, HeapTuple trigtuple, Index: src/include/commands/copy.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/commands/copy.h,v retrieving revision 1.32 diff -c -r1.32 copy.h *** src/include/commands/copy.h 1 Jan 2009 17:23:58 -0000 1.32 --- src/include/commands/copy.h 21 Nov 2009 03:56:41 -0000 *************** *** 22,25 **** --- 22,30 ---- extern DestReceiver *CreateCopyDestReceiver(void); + /** + * Size of the LRU list of relations to keep in cache for partitioning in COPY + */ + extern int partitioningCacheSize; + #endif /* COPY_H */ Index: src/include/executor/executor.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/executor/executor.h,v retrieving revision 1.163 diff -c -r1.163 executor.h *** src/include/executor/executor.h 26 Oct 2009 02:26:41 -0000 1.163 --- src/include/executor/executor.h 21 Nov 2009 03:56:41 -0000 *************** *** 166,171 **** --- 166,173 ---- extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids); extern void ExecConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate); + extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo, + TupleTableSlot *slot, EState *estate); extern TupleTableSlot *EvalPlanQual(EState *estate, EPQState *epqstate, Relation 
relation, Index rti, ItemPointer tid, TransactionId priorXmax); Index: src/backend/executor/execMain.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/executor/execMain.c,v retrieving revision 1.335 diff -c -r1.335 execMain.c *** src/backend/executor/execMain.c 20 Nov 2009 20:38:10 -0000 1.335 --- src/backend/executor/execMain.c 21 Nov 2009 03:56:41 -0000 *************** *** 1239,1245 **** /* * ExecRelCheck --- check that tuple meets constraints for result relation */ ! static const char * ExecRelCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate) { --- 1239,1245 ---- /* * ExecRelCheck --- check that tuple meets constraints for result relation */ ! const char * ExecRelCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate) { Index: src/backend/utils/misc/guc.c =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/backend/utils/misc/guc.c,v retrieving revision 1.523 diff -c -r1.523 guc.c *** src/backend/utils/misc/guc.c 21 Oct 2009 20:38:58 -0000 1.523 --- src/backend/utils/misc/guc.c 21 Nov 2009 03:56:41 -0000 *************** *** 32,37 **** --- 32,38 ---- #include "access/xact.h" #include "catalog/namespace.h" #include "commands/async.h" + #include "commands/copy.h" #include "commands/prepare.h" #include "commands/vacuum.h" #include "commands/variable.h" *************** *** 534,539 **** --- 535,542 ---- gettext_noop("Customized Options"), /* DEVELOPER_OPTIONS */ gettext_noop("Developer Options"), + /* COPY_OPTIONS */ + gettext_noop("Copy Options"), /* help_config wants this array to be null-terminated */ NULL }; *************** *** 1955,1960 **** --- 1958,2019 ---- 1024, 100, 102400, NULL, NULL }, + { + { + /* variable name */ + "copy_partitioning_cache_size", + + /* context, we want the user to set it */ + PGC_USERSET, + + /* category for this configuration variable */ + COPY_OPTIONS, + + /* short description */ + gettext_noop("Size of the LRU list of child tables to keep in cache " + " when partitioning tuples in COPY."), + + /* long description */ + gettext_noop("When tuples are automatically routed in COPY, all " + "tables are scanned until the constraints are matched. When " + "a large number of child tables are present the scanning " + "overhead can be large. To reduce that overhead, the routing " + "mechanism keeps a cache of the last child tables in which " + "tuples where inserted and try these tables first before " + "performing a full scan. This variable defines the cache size " + "with 0 meaning no caching, 1 keep the last matching child table" + ", x keep the last x child tables in which tuples were inserted." + " Note that the list is managed with an LRU policy."), + + + /* flags: this option is not in the postgresql.conf.sample + * file and should not be allowed in the config. + * NOTE: this is not currently enforced. 
+ */ + GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE + }, + + /* pointer to the variable, this one is present in + * src/backend/commands/copy.c + */ + &partitioningCacheSize, + + /* default value */ + 2, + + /* min value */ + 0, + + /* max value */ + INT_MAX, + + /* assign hook function */ + NULL, + + /* show hook function */ + NULL + }, + /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL Index: src/test/regress/parallel_schedule =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/parallel_schedule,v retrieving revision 1.57 diff -c -r1.57 parallel_schedule *** src/test/regress/parallel_schedule 24 Aug 2009 03:10:16 -0000 1.57 --- src/test/regress/parallel_schedule 21 Nov 2009 03:56:41 -0000 *************** *** 47,53 **** # execute two copy tests parallel, to check that copy itself # is concurrent safe. # ---------- ! test: copy copyselect # ---------- # Another group of parallel tests --- 47,55 ---- # execute two copy tests parallel, to check that copy itself # is concurrent safe. # ---------- ! test: copy copyselect ! test: copy_partitioning ! test: copy_partitioning_trigger # ---------- # Another group of parallel tests Index: doc/src/sgml/ref/copy.sgml =================================================================== RCS file: /home/manu/cvsrepo/pgsql/doc/src/sgml/ref/copy.sgml,v retrieving revision 1.92 diff -c -r1.92 copy.sgml *** doc/src/sgml/ref/copy.sgml 21 Sep 2009 20:10:21 -0000 1.92 --- doc/src/sgml/ref/copy.sgml 21 Nov 2009 03:56:41 -0000 *************** *** 41,46 **** --- 41,47 ---- ESCAPE '<replaceable class="parameter">escape_character</replaceable>' FORCE_QUOTE { ( <replaceable class="parameter">column</replaceable> [, ...] ) | * } FORCE_NOT_NULL ( <replaceable class="parameter">column</replaceable> [, ...] ) + PARTITIONING [ <replaceable class="parameter">boolean</replaceable> ] </synopsis> </refsynopsisdiv> *************** *** 282,287 **** --- 283,298 ---- </listitem> </varlistentry> + <varlistentry> + <term><literal>PARTITIONING</></term> + <listitem> + <para> + In <literal>PARTITIONING</> mode, <command>COPY TO</> a parent + table will automatically move each row to the child table that + has the matching constraints. + </para> + </listitem> + </varlistentry> </variablelist> </refsect1> *************** *** 384,389 **** --- 395,419 ---- <command>VACUUM</command> to recover the wasted space. </para> + <para> + <literal>PARTITIONING</> mode scans for each child table constraint in the + hierarchy to find a match. As an optimization, a cache of the last child + tables where tuples have been routed is kept and tried first. The size + of the cache is set by the <literal>copy_partitioning_cache_size</literal> + session variable. It the size is set to 0, the cache is disabled otherwise + the indicated number of child tables is kept in the cache (at most). + </para> + + <para> + <literal>PARTITIONING</> mode assumes that every child table has at least + one constraint defined otherwise an error is thrown. If child tables have + overlapping constraints, the row is inserted in the first child table found + (be it a cached table or the first table to appear in the lookup). + Before of after ROW triggers will generate an error and abort the COPY operation + if they modify the tuple value in a way that violates the constraints of the child + table where the tuple has been routed. 
+ </para> + </refsect1> <refsect1> *************** *** 828,833 **** --- 858,1001 ---- 0000200 M B A B W E 377 377 377 377 377 377 </programlisting> </para> + + <para> + Multiple options are separated by a comma like: + <programlisting> + COPY (SELECT t FROM foo WHERE id = 1) TO STDOUT (FORMAT CSV, HEADER, FORCE_QUOTE (t)); + </programlisting> + </para> + + <refsect2> + <title>Partitioning examples</title> + <para> + Here is an example on how to use partitioning. Let's first create a parent + table and 3 child tables as follows: + <programlisting> + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int + ); + + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + </programlisting> + We prepare the following data file (1 row for each child table): + copy_input.data content: + <programlisting> + 11 '2008-01-10' 11 + 12 '2008-02-15' 12 + 13 '2008-03-15' 13 + 21 '2008-01-10' 11 + 31 '2008-01-10' 11 + 41 '2008-01-10' 11 + 22 '2008-02-15' 12 + 23 '2008-03-15' 13 + 32 '2008-02-15' 12 + 33 '2008-03-15' 13 + 42 '2008-02-15' 12 + 43 '2008-03-15' 13 + </programlisting> + If we COPY the data in the parent table without partitioning enabled, all + rows are inserted in the master table as in this example: + <programlisting> + COPY y2008 FROM 'copy_input.data'; + + SELECT COUNT(*) FROM y2008; + count + ------- + 12 + (1 row) + + SELECT COUNT(*) FROM jan2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM feb2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM mar2008; + count + ------- + 0 + (1 row) + + DELETE FROM y2008; + </programlisting> + If we execute COPY with partitioning enabled, rows are loaded in the + appropriate child table automatically as in this example: + <programlisting> + COPY y2008 FROM 'copy_input.data' (PARTITIONING); + + SELECT * FROM y2008; + id | date | value + ----+------------+------- + 11 | 01-10-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008; + id | date | value + ----+------------+------- + 11 | 01-10-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM feb2008; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + </programlisting> + The cache size can be tuned using: + <programlisting> + set copy_partitioning_cache_size = 3; + </programlisting> + Repeating the COPY command will now be faster: + <programlisting> + COPY y2008 FROM 'copy_input.data' (PARTITIONING); + </programlisting> + </para> + </refsect2> </refsect1> <refsect1> Index: src/include/utils/guc_tables.h =================================================================== RCS file: /home/manu/cvsrepo/pgsql/src/include/utils/guc_tables.h,v retrieving revision 1.46 diff -c -r1.46 guc_tables.h *** 
src/include/utils/guc_tables.h 11 Jun 2009 14:49:13 -0000 1.46 --- src/include/utils/guc_tables.h 21 Nov 2009 03:56:41 -0000 *************** *** 76,82 **** COMPAT_OPTIONS_CLIENT, PRESET_OPTIONS, CUSTOM_OPTIONS, ! DEVELOPER_OPTIONS }; /* --- 76,83 ---- COMPAT_OPTIONS_CLIENT, PRESET_OPTIONS, CUSTOM_OPTIONS, ! DEVELOPER_OPTIONS, ! COPY_OPTIONS }; /* Index: src/test/regress/input/copy_partitioning_trigger.source =================================================================== RCS file: src/test/regress/input/copy_partitioning_trigger.source diff -N src/test/regress/input/copy_partitioning_trigger.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/input/copy_partitioning_trigger.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,62 ---- + -- Test triggers with partitioning + set copy_partitioning_cache_size = 0; + + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + create table t3 (check (i > 2 and i <= 3)) inherits (t); + + create table audit(i int); + + create function audit() returns trigger as $$ begin insert into audit(i) values (new.i); return new; end; $$ language plpgsql; + + create trigger t_a after insert on t for each row execute procedure audit(); + -- the before trigger on the t would get fired + -- create trigger t_a2 before insert on t for each row execute procedure audit(); + create trigger t1_a before insert on t1 for each row execute procedure audit(); + create trigger t1_a2 after insert on t1 for each row execute procedure audit(); + + copy t from stdin with (partitioning); + 1 + 2 + 3 + \. + + -- no rows if trigger does not work + select * from audit; + + drop table t cascade; + drop table audit cascade; + drop function audit(); + + -- Test bad before row trigger + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + + create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql; + create trigger t1_before before insert on t1 for each row execute procedure i2(); + + -- COPY should fail + copy t from stdin with (partitioning); + 1 + \. + + drop table t cascade; + drop function i2(); + + -- Test bad after row trigger + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + + create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql; + create trigger t1_after after insert on t1 for each row execute procedure i2(); + + -- COPY should fail + copy t from stdin with (partitioning); + 1 + \. 
+ + drop table t cascade; + drop function i2(); Index: src/test/regress/data/copy_input.data =================================================================== RCS file: src/test/regress/data/copy_input.data diff -N src/test/regress/data/copy_input.data *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/data/copy_input.data 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,12 ---- + 11 '2008-01-19' 11 + 12 '2008-02-15' 12 + 13 '2008-03-15' 13 + 21 '2008-01-10' 11 + 31 '2008-01-10' 11 + 41 '2008-01-10' 11 + 22 '2008-02-15' 12 + 23 '2008-03-15' 13 + 32 '2008-02-15' 12 + 33 '2008-03-15' 13 + 42 '2008-02-15' 12 + 43 '2008-03-15' 13 Index: src/test/regress/input/copy_partitioning.source =================================================================== RCS file: src/test/regress/input/copy_partitioning.source diff -N src/test/regress/input/copy_partitioning.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/input/copy_partitioning.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,149 ---- + -- test 1 + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + 1 + \. + + drop table parent cascade; + + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + 1 + \. + + drop table parent cascade; + + -- test 2 + set copy_partitioning_cache_size = 0; + create table parent(i int, j int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + create table c2 (check (i > 1 and i <= 2)) inherits (parent); + create table c3 (check (i > 2 and i <= 3)) inherits (parent); + + create index c1_idx on c1(j); + copy (select i % 3 + 1, i from generate_series(1, 1000) s(i)) to '/tmp/parent'; + copy parent from '/tmp/parent' with (partitioning); + analyse; + + set enable_seqscan to false; + -- no rows if index was not updated + select * from c1 where j = 3; + + set enable_seqscan to true; + set enable_indexscan to false; + -- 1 row + select * from c1 where j = 3; + drop table parent cascade; + + -- test cache size + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int, + primary key(id) + ); + + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + + CREATE TABLE jan2008half1 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-01-15' ) + ) INHERITS (jan2008); + + CREATE TABLE jan2008half2 ( + CHECK ( date >= DATE '2008-01-16' AND date < DATE '2008-01-31' ) + ) INHERITS (jan2008); + + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data'; + + SELECT COUNT(*) FROM y2008; + SELECT COUNT(*) FROM jan2008; + SELECT COUNT(*) FROM jan2008half1; + SELECT COUNT(*) FROM jan2008half2; + SELECT COUNT(*) FROM feb2008; + SELECT COUNT(*) FROM mar2008; + + DELETE FROM y2008; + + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 1; + COPY y2008 FROM 
'@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 3; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + SELECT * FROM jan2008 ORDER BY id; + SELECT * FROM jan2008half1 ORDER BY id; + SELECT * FROM jan2008half2 ORDER BY id; + SELECT * FROM feb2008 ORDER BY id; + SELECT * FROM mar2008 ORDER BY id; + DELETE FROM y2008; + + DROP TABLE y2008 CASCADE; Index: src/test/regress/output/copy_partitioning.source =================================================================== RCS file: src/test/regress/output/copy_partitioning.source diff -N src/test/regress/output/copy_partitioning.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/output/copy_partitioning.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,538 ---- + -- test 1 + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + drop table parent cascade; + NOTICE: drop cascades to table c1 + create table parent(i int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + copy parent from stdin with (partitioning); + drop table parent cascade; + NOTICE: drop cascades to table c1 + -- test 2 + set copy_partitioning_cache_size = 0; + create table parent(i int, j int); + create table c1 (check (i > 0 and i <= 1)) inherits (parent); + create table c2 (check (i > 1 and i <= 2)) inherits (parent); + create table c3 (check (i > 2 and i <= 3)) inherits (parent); + create index c1_idx on c1(j); + copy (select i % 3 + 1, i from generate_series(1, 1000) s(i)) to '/tmp/parent'; + copy parent from '/tmp/parent' with (partitioning); + analyse; + set enable_seqscan to false; + -- no rows if index was not updated + select * from c1 where j = 3; + i | j + ---+--- + 1 
| 3 + (1 row) + + set enable_seqscan to true; + set enable_indexscan to false; + -- 1 row + select * from c1 where j = 3; + i | j + ---+--- + 1 | 3 + (1 row) + + drop table parent cascade; + NOTICE: drop cascades to 3 other objects + DETAIL: drop cascades to table c1 + drop cascades to table c2 + drop cascades to table c3 + -- test cache size + CREATE TABLE y2008 ( + id int not null, + date date not null, + value int, + primary key(id) + ); + NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "y2008_pkey" for table "y2008" + CREATE TABLE jan2008 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) + ) INHERITS (y2008); + CREATE TABLE jan2008half1 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-01-15' ) + ) INHERITS (jan2008); + CREATE TABLE jan2008half2 ( + CHECK ( date >= DATE '2008-01-16' AND date < DATE '2008-01-31' ) + ) INHERITS (jan2008); + CREATE TABLE feb2008 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) + ) INHERITS (y2008); + CREATE TABLE mar2008 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) + ) INHERITS (y2008); + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data'; + SELECT COUNT(*) FROM y2008; + count + ------- + 12 + (1 row) + + SELECT COUNT(*) FROM jan2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM jan2008half1; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM jan2008half2; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM feb2008; + count + ------- + 0 + (1 row) + + SELECT COUNT(*) FROM mar2008; + count + ------- + 0 + (1 row) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 
01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 3; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set 
copy_partitioning_cache_size = 2; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 1; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + set copy_partitioning_cache_size = 0; + COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING); + SELECT * FROM y2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 12 | 02-15-2008 | 12 + 13 | 03-15-2008 | 13 + 21 | 01-10-2008 | 11 + 22 | 02-15-2008 | 12 + 23 | 03-15-2008 | 13 + 31 | 01-10-2008 | 11 + 32 | 02-15-2008 | 12 + 33 | 03-15-2008 | 13 + 41 | 01-10-2008 | 11 + 42 | 02-15-2008 | 12 + 43 | 03-15-2008 | 13 + (12 rows) + + SELECT * FROM jan2008 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 01-10-2008 | 11 + (4 rows) + + SELECT * FROM jan2008half1 ORDER BY id; + id | date | value + ----+------------+------- + 21 | 01-10-2008 | 11 + 31 | 01-10-2008 | 11 + 41 | 
01-10-2008 | 11 + (3 rows) + + SELECT * FROM jan2008half2 ORDER BY id; + id | date | value + ----+------------+------- + 11 | 01-19-2008 | 11 + (1 row) + + SELECT * FROM feb2008 ORDER BY id; + id | date | value + ----+------------+------- + 12 | 02-15-2008 | 12 + 22 | 02-15-2008 | 12 + 32 | 02-15-2008 | 12 + 42 | 02-15-2008 | 12 + (4 rows) + + SELECT * FROM mar2008 ORDER BY id; + id | date | value + ----+------------+------- + 13 | 03-15-2008 | 13 + 23 | 03-15-2008 | 13 + 33 | 03-15-2008 | 13 + 43 | 03-15-2008 | 13 + (4 rows) + + DELETE FROM y2008; + DROP TABLE y2008 CASCADE; + NOTICE: drop cascades to 5 other objects + DETAIL: drop cascades to table jan2008 + drop cascades to table jan2008half1 + drop cascades to table jan2008half2 + drop cascades to table feb2008 + drop cascades to table mar2008 Index: src/test/regress/output/copy_partitioning_trigger.source =================================================================== RCS file: src/test/regress/output/copy_partitioning_trigger.source diff -N src/test/regress/output/copy_partitioning_trigger.source *** /dev/null 1 Jan 1970 00:00:00 -0000 --- src/test/regress/output/copy_partitioning_trigger.source 1 Jan 1970 00:00:00 -0000 *************** *** 0 **** --- 1,59 ---- + -- Test triggers with partitioning + set copy_partitioning_cache_size = 0; + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + create table t3 (check (i > 2 and i <= 3)) inherits (t); + create table audit(i int); + create function audit() returns trigger as $$ begin insert into audit(i) values (new.i); return new; end; $$ language plpgsql; + create trigger t_a after insert on t for each row execute procedure audit(); + -- the before trigger on the t would get fired + -- create trigger t_a2 before insert on t for each row execute procedure audit(); + create trigger t1_a before insert on t1 for each row execute procedure audit(); + create trigger t1_a2 after insert on t1 for each row execute procedure audit(); + copy t from stdin with (partitioning); + -- no rows if trigger does not work + select * from audit; + i + --- + 1 + 1 + (2 rows) + + drop table t cascade; + NOTICE: drop cascades to 3 other objects + DETAIL: drop cascades to table t1 + drop cascades to table t2 + drop cascades to table t3 + drop table audit cascade; + drop function audit(); + -- Test bad before row trigger + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql; + create trigger t1_before before insert on t1 for each row execute procedure i2(); + -- COPY should fail + copy t from stdin with (partitioning); + ERROR: Before row insert trigger on table "t1" modified partitioning routing decision. Aborting insert. 
+ CONTEXT: COPY t, line 1: "1" + drop table t cascade; + NOTICE: drop cascades to 2 other objects + DETAIL: drop cascades to table t1 + drop cascades to table t2 + drop function i2(); + -- Test bad after row trigger + create table t(i int); + create table t1 (check (i > 0 and i <= 1)) inherits (t); + create table t2 (check (i > 1 and i <= 2)) inherits (t); + create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql; + create trigger t1_after after insert on t1 for each row execute procedure i2(); + -- COPY should fail + copy t from stdin with (partitioning); + ERROR: After row insert trigger on table "t1" modified partitioning routing decision. Aborting insert. + CONTEXT: COPY t, line 1: "1" + drop table t cascade; + NOTICE: drop cascades to 2 other objects + DETAIL: drop cascades to table t1 + drop cascades to table t2 + drop function i2();
Emmanuel Cecchet wrote:
> Hi Jan,
>
> Here is the updated patch.
> Note that the new code in trigger is a copy/paste of the before row
> insert trigger code modified to use the pointers of the after row
> trigger functions.

Hi,

ok, this version applied, compiled and ran the regression tests fine. I tried a few things and was not able to break it this time.

A couple of nitpicks first:

o) the route_tuple_to_child recurses to child tables of child tables, which is undocumented and requires a check_stack_depth() call if it's really desirable

o) the error messages given when a trigger modifies the tuple should be one sentence, I suggest dropping the "Aborting insert" part

o) there are two places with "Close the relation but keep the lock" comments. Why is it necessary to keep the locks? I confess I don't know why *wouldn't* it be necessary, but maybe the comment could explain that? Or is it just my lack of understanding and it should be obvious that the lock needs to be kept?

o) the result of relation_open is explicitly cast to Relation, the result of try_relation_open is not (a minor gripe)

And a couple of more important things:

o) the code added in trigger.c (ExecARInsertTriggersNow) is copy/pasted from just above. I guess there was a reason why you needed that code, but I also suspect that's a strong indication that something's wrong with the abstractions in your patch. Again, I don't really know how else you could achieve what you want. It just looks fishy if you need to modify trigger.c to add an option to COPY.

o) publicizing ExecRelCheck might also indicate a problem, but I guess that can be defended, as the patch is basically based on using that function for each incoming tuple

o) the LRU OID cache is a separate optimisation that could be separated from the patch. I didn't do any performance tests, and I trust that a cache like that helps with some workloads, but I think we could do a better effort than a simplistic cache like that. Also, I'm not 100% sure it's OK to just stick it into CacheMemoryContext... Maybe it could go into the COPY statement context? You said you don't want to start with a cold cache always, but OTOH if you're loading into different tables in the same backend, the cache will actually hurt...

[thinks of something really bad... types up a quick test...]

Oh, actually, the cache is outright *wrong*, as the attached test6.sql shows. Ugh, let's just forget about that LRU cache for now.

o) the patch could use some more docs, especially about descending into child tables.

o) my main concern is still valid: the design was never agreed upon. The approach of using inheritance info for automatic partitioning is, at least IMHO, too restricted. Regular INSERTs won't get routed to child tables. Data from writable CTEs won't get routed. People wanting to do partitioning on something other than constraints are stuck.

I strongly suspect the patch will get rejected on the grounds of lack of community agreement on partitioning, but I'd hate to see your work wasted. It's not too late to open a discussion on how automatic partitioning could work (or start working out a common proposal with the people discussing in the "Syntax for partitioning" thread).

Marking as Waiting on Author, although I'd really like to see a solid design being agreed upon, and then the code.
Cheers,
Jan

test6.sql:

drop table parent cascade;
drop table parent2 cascade;

create table parent(i int);
create table c1 (check (i > 0 and i <= 1)) inherits (parent);

create table parent2(i int);
create table c12 (check (i > 0 and i <= 1)) inherits (parent2);

set copy_partitioning_cache_size = 1;

copy parent from stdin with (partitioning);
1
\.

copy parent2 from stdin with (partitioning);
1
\.

-- all tuples went to parent!
select * from parent;
-- is empty
select * from parent2;
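Two of the review points above have fairly mechanical answers. check_stack_depth(), declared in miscadmin.h, raises ERROR itself when the stack limit is exceeded, so the recursion in route_tuple_to_child only needs the call at function entry and nothing special at the call sites. And the cross-hierarchy misrouting that test6.sql demonstrates disappears once the routing cache is scoped to the COPY statement rather than kept in a backend-global list. Below is a minimal sketch of the latter, assuming a hypothetical child_table_lru field added to CopyStateData and the list helpers from nodes/pg_list.h; it is an illustration, not code from the patch.

    /* Record that a tuple was routed to child table 'childoid'.  The list
     * lives in the per-COPY memory context, so it dies with the statement
     * and a later COPY on an unrelated hierarchy starts with a cold cache
     * instead of a poisoned one.  'cstate->child_table_lru' is a
     * hypothetical field, not one the patch defines. */
    static void
    remember_routed_child(CopyState cstate, Oid childoid, int cache_size)
    {
        if (cache_size <= 0)
            return;                 /* caching disabled */

        /* Move (or insert) the OID to the head of the LRU list. */
        cstate->child_table_lru = list_delete_oid(cstate->child_table_lru,
                                                  childoid);
        cstate->child_table_lru = lcons_oid(childoid, cstate->child_table_lru);

        /* Evict least-recently-used entries beyond the configured size. */
        cstate->child_table_lru = list_truncate(cstate->child_table_lru,
                                                cache_size);
    }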
Jan Urbański wrote:
> o) my main concern is still valid: the design was never agreed upon.
> The approach of using inheritance info for automatic partitioning is, at
> least IMHO, too restricted. Regular INSERTs won't get routed to child
> tables. Data from writable CTEs won't get routed. People wanting to do
> partitioning on something other than constraints are stuck.

Whether or not the other paths to load data are supported, COPY is the one you have to get right before this sort of feature is useful to the use cases that need partitioning the most. While your concerns are valid, I hope Emmanuel doesn't take your feedback the wrong way. I'll take a working prototype that needs improvement over a paper design with no implementation anytime. He's coming at this bottom-up, starting with the little details that need to be right; the partitioning syntax patch is starting at the top and working downward; there seems to be clear progress being made toward the eventual goal somewhere in the middle of all that. Considering how long this area has been bogged down in discussion without action, I'm rather glad we're seeing working proof-of-concept bits rather than just talking about things.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.com
Jan Urbański wrote:
> o) my main concern is still valid: the design was never agreed upon.
> The approach of using inheritance info for automatic partitioning is, at
> least IMHO, too restricted. Regular INSERTs won't get routed to child
> tables. Data from writable CTEs won't get routed. People wanting to do
> partitioning on something other than constraints are stuck.

Well, this patch does not claim to implement partitioning for Postgres; it just offers partitioning as an option for COPY (and COPY only), based on the existing mechanism in Postgres. I have already participated in lengthy and relatively sterile discussions on how to implement full-blown partitioning, but we never reached the beginning of an agreement, and it was decided that a step-by-step approach would be better. I will propose another implementation of partitioning in COPY once Postgres has a representation for partitioning other than constraints on child tables.

> I strongly suspect the patch will get rejected on the grounds of lack of
> community agreement on partitioning, but I'd hate to see your work
> wasted. It's not too late to open a discussion on how automatic
> partitioning could work (or start working out a common proposal with the
> people discussing in the "Syntax for partitioning" thread).

This is not my call. Right now the syntax for partitioning does not change anything in Postgres; it just adds syntactic sugar on top of the existing implementation. It will not route anything or answer any of the needs you mentioned in your previous point.

> Marking as Waiting on Author, although I'd really like to see a solid
> design being agreed upon, and then the code.

You are asking the wrong person if you want me to lead the partitioning design discussions. I already tried once and was unsuccessful. As nothing has changed, I don't see why I would be more successful this time.

Emmanuel
--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com
Jan,

> A couple of nitpicks first:
>
> o) the route_tuple_to_child recurses to child tables of child tables,
> which is undocumented and requires a check_stack_depth() call if it's
> really desirable

The recursive call is only as deep as the inheritance hierarchy. I am not sure what we are supposed to do if check_stack_depth() fails.

> o) the error messages given when a trigger modifies the tuple should be
> one sentence, I suggest dropping the "Aborting insert" part

Where are those rules about error messages specified?

> o) there are two places with "Close the relation but keep the lock"
> comments. Why is it necessary to keep the locks? I confess I don't know
> why *wouldn't* it be necessary, but maybe the comment could explain
> that? Or is it just my lack of understanding and it should be obvious
> that the lock needs to be kept?

As we did write to the table, we must keep the lock on it until the operation or transaction is complete.

> o) the result of relation_open is explicitly cast to Relation, the
> result of try_relation_open is not (a minor gripe)

The first cast was unnecessary; I removed it.

> And a couple of more important things:
>
> o) the code added in trigger.c (ExecARInsertTriggersNow) is copy/pasted
> from just above, I guess there was a reason why you needed that code,
> but I also suspect that's a strong indication that something's wrong
> with the abstractions in your patch. Again I don't really know how else
> you could achieve what you want. It just looks fishy if you need to
> modify trigger.c to add an option to COPY.

As I explained to Tom, if the after row trigger is called asynchronously, I get a relcache leak on the child table at the end of the copy operation. If the trigger is called synchronously (like a before row trigger), it works fine. Calling the after row trigger synchronously also allows me to detect any potential conflict between the actions of the trigger and the routing decision. I am open to any suggestion for a more elegant solution.

> o) publicizing ExecRelCheck might also indicate a problem, but I guess
> that can be defended, as the patch is basically based on using that
> function for each incoming tuple

The only exposed method for checking constraints (ExecConstraints) goes directly to an error (ereport) if the constraint check fails. Another option would be to add a new parameter to ExecConstraints telling it whether to generate an ereport or not, but that would impact all callers of that method.

> o) the LRU OID cache is a separate optimisation that could be
> separated from the patch. I didn't do any performance tests, and I trust
> that a cache like that helps with some workloads, but I think we could
> do a better effort than a simplistic cache like that. Also, I'm not 100%
> sure it's OK to just stick it into CacheMemoryContext... Maybe it could
> go into the COPY statement context? You said you don't want to start
> with a cold cache always, but OTOH if you're loading into different
> tables in the same backend, the cache will actually hurt...
>
> [thinks of something really bad... types up a quick test...]
>
> Oh, actually, the cache is outright *wrong*, as the attached test6.sql
> shows. Ugh, let's just forget about that LRU cache for now.

Point taken. I have removed the cache from the GUC variables and it is now only used for the duration of the COPY operation.

> o) the patch could use some more docs, especially about descending into
> child tables.

Do you mean an overall comment explaining the design?
Otherwise there is a comment for every single 'if' and block of code in the patch. Be more specific if you have a particular location where you think comments are missing or too vague.

> o) my main concern is still valid: the design was never agreed upon.
> The approach of using inheritance info for automatic partitioning is, at
> least IMHO, too restricted. Regular INSERTs won't get routed to child
> tables. Data from writable CTEs won't get routed. People wanting to do
> partitioning on something else than constraints are stuffed.
>
> I strongly suspect the patch will get rejected on the grounds of lack of
> community agreement on partitioning, but I'd hate to see your work
> wasted. It's not too late to open a discussion on how automatic
> partitioning could work (or start working out a common proposal with the
> people discussing in the "Syntax for partitioning" thread).
>
> Marking as Waiting on Author, although I'd really like to see a solid
> design being agreed upon, and then the code.
>
I already commented on that part in another message; this is not related to this patch but to the politics of implementing partitioning in Postgres. Now, if the rejection of the patch is based on political stances rather than technical ones, I can understand that too.

Please find the new patch attached.

Emmanuel
--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

Index: src/backend/commands/trigger.c
===================================================================
RCS file: /home/manu/cvsrepo/pgsql/src/backend/commands/trigger.c,v
retrieving revision 1.257
diff -c -r1.257 trigger.c
*** src/backend/commands/trigger.c	20 Nov 2009 20:38:10 -0000	1.257
--- src/backend/commands/trigger.c	22 Nov 2009 16:52:29 -0000
***************
*** 1921,1926 ****
--- 1921,1968 ----
  	return newtuple;
  }

+ HeapTuple
+ ExecARInsertTriggersNow(EState *estate, ResultRelInfo *relinfo,
+ 						HeapTuple trigtuple)
+ {
+ 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
+ 	int			ntrigs = trigdesc->n_after_row[TRIGGER_EVENT_INSERT];
+ 	int		   *tgindx = trigdesc->tg_after_row[TRIGGER_EVENT_INSERT];
+ 	HeapTuple	newtuple = trigtuple;
+ 	HeapTuple	oldtuple;
+ 	TriggerData LocTriggerData;
+ 	int			i;
+
+ 	LocTriggerData.type = T_TriggerData;
+ 	LocTriggerData.tg_event = TRIGGER_EVENT_INSERT |
+ 		TRIGGER_EVENT_ROW;
+ 	LocTriggerData.tg_relation = relinfo->ri_RelationDesc;
+ 	LocTriggerData.tg_newtuple = NULL;
+ 	LocTriggerData.tg_newtuplebuf = InvalidBuffer;
+ 	for (i = 0; i < ntrigs; i++)
+ 	{
+ 		Trigger    *trigger = &trigdesc->triggers[tgindx[i]];
+
+ 		if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
+ 							NULL, NULL, newtuple))
+ 			continue;
+
+ 		LocTriggerData.tg_trigtuple = oldtuple = newtuple;
+ 		LocTriggerData.tg_trigtuplebuf = InvalidBuffer;
+ 		LocTriggerData.tg_trigger = trigger;
+ 		newtuple = ExecCallTriggerFunc(&LocTriggerData,
+ 									   tgindx[i],
+ 									   relinfo->ri_TrigFunctions,
+ 									   relinfo->ri_TrigInstrument,
+ 									   GetPerTupleMemoryContext(estate));
+ 		if (oldtuple != newtuple && oldtuple != trigtuple)
+ 			heap_freetuple(oldtuple);
+ 		if (newtuple == NULL)
+ 			break;
+ 	}
+ 	return newtuple;
+ }
+
  void
  ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo,
  					 HeapTuple trigtuple, List *recheckIndexes)
Index: src/backend/commands/copy.c
===================================================================
RCS file: /home/manu/cvsrepo/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.318
diff -c -r1.318 copy.c
*** src/backend/commands/copy.c	20 Nov 2009 20:38:10 -0000	1.318
--- src/backend/commands/copy.c	22 Nov 2009 16:52:29 -0000
***************
*** 43,48 ****
--- 43,56 ----
  #include "utils/memutils.h"
  #include "utils/snapmgr.h"

+ /* For tuple routing */
+ #include "catalog/pg_inherits.h"
+ #include "catalog/pg_inherits_fn.h"
+ #include "nodes/makefuncs.h"
+ #include "nodes/pg_list.h"
+ #include "utils/fmgroids.h"
+ #include "utils/relcache.h"
+ #include "utils/tqual.h"

  #define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
  #define OCTVALUE(c) ((c) - '0')
***************
*** 117,122 ****
--- 125,131 ----
  	char	   *escape;			/* CSV escape char (must be 1 byte) */
  	bool	   *force_quote_flags;		/* per-column CSV FQ flags */
  	bool	   *force_notnull_flags;	/* per-column CSV FNN flags */
+ 	bool		partitioning;	/* tuple routing in table hierarchy */

  	/* these are just for error messages, see copy_in_error_callback */
  	const char *cur_relname;	/* table name for error messages */
***************
*** 173,178 ****
--- 182,190 ----
  } DR_copy;

+ /* List of child tables where tuples were routed (for partitioning option) */
+ List *child_table_lru = NULL;
+
  /*
   * These macros centralize code used to process line_buf and raw_buf buffers.
   * They are macros because they often do continue/break control and to avoid
***************
*** 839,844 ****
--- 851,864 ----
  						 errmsg("argument to option \"%s\" must be a list of column names",
  								defel->defname)));
  		}
+ 		else if (strcmp(defel->defname, "partitioning") == 0)
+ 		{
+ 			if (cstate->partitioning)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_SYNTAX_ERROR),
+ 						 errmsg("conflicting or redundant options")));
+ 			cstate->partitioning = defGetBoolean(defel);
+ 		}
  		else
  			ereport(ERROR,
  					(errcode(ERRCODE_SYNTAX_ERROR),
***************
*** 1662,1667 ****
--- 1682,1949 ----
  	return res;
  }

+ /**
+  * Check that the given tuple matches the constraints of the given child table
+  * and perform an insert if the constraints are matched. insert_tuple specifies
+  * whether the tuple must be inserted in the table if the constraint is
+  * satisfied. The method returns true if the constraint is satisfied (and the
+  * insert was performed if insert_tuple is true), false otherwise (constraints
+  * not satisfied for this tuple on this child table).
+  */
+ static bool
+ check_tuple_constraints(Relation child_table_relation, HeapTuple tuple,
+ 	bool insert_tuple, int hi_options, ResultRelInfo *parentResultRelInfo)
+ {
+ 	/* Check the constraints */
+ 	ResultRelInfo *resultRelInfo;
+ 	TupleTableSlot *slot;
+ 	EState	   *estate = CreateExecutorState();
+ 	bool		result = false;
+
+ 	resultRelInfo = makeNode(ResultRelInfo);
+ 	resultRelInfo->ri_RangeTableIndex = 1;	/* dummy */
+ 	resultRelInfo->ri_RelationDesc = child_table_relation;
+ 	resultRelInfo->ri_TrigDesc = CopyTriggerDesc(child_table_relation->trigdesc);
+ 	if (resultRelInfo->ri_TrigDesc)
+ 		resultRelInfo->ri_TrigFunctions = (FmgrInfo *)
+ 			palloc0(resultRelInfo->ri_TrigDesc->numtriggers * sizeof(FmgrInfo));
+ 	resultRelInfo->ri_TrigInstrument = NULL;
+
+ 	ExecOpenIndices(resultRelInfo);
+
+ 	estate->es_result_relations = resultRelInfo;
+ 	estate->es_num_result_relations = 1;
+ 	estate->es_result_relation_info = resultRelInfo;
+
+ 	/* Set up a tuple slot too */
+ 	slot = MakeSingleTupleTableSlot(child_table_relation->rd_att);
+ 	ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+
+ 	if (ExecRelCheck(resultRelInfo, slot, estate) == NULL)
+ 	{
+ 		/* Constraints satisfied */
+ 		if (insert_tuple)
+ 		{
+ 			/* Insert the row in the child table */
+ 			List	   *recheckIndexes = NIL;
+
+ 			/* BEFORE ROW INSERT Triggers */
+ 			if (resultRelInfo->ri_TrigDesc &&
+ 				resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0)
+ 			{
+ 				HeapTuple	newtuple;
+
+ 				newtuple = ExecBRInsertTriggers(estate, resultRelInfo, tuple);
+
+ 				if (newtuple != tuple)
+ 				{
+ 					/* tuple modified by Trigger(s), check that the constraint is still valid */
+ 					heap_freetuple(tuple);
+ 					tuple = newtuple;
+ 					ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+ 					if (ExecRelCheck(resultRelInfo, slot, estate) != NULL)
+ 					{
+ 						ereport(ERROR,
+ 								(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ 								 errmsg("Before row insert trigger on table \"%s\" modified partitioning routing decision. Aborting insert.",
+ 										RelationGetRelationName(child_table_relation))));
+ 					}
+ 				}
+ 			}
+
+ 			/* OK, store the tuple and create index entries for it */
+ 			heap_insert(child_table_relation, tuple, GetCurrentCommandId(true),
+ 						hi_options, NULL);
+
+ 			/* Update indices */
+ 			if (resultRelInfo->ri_NumIndices > 0)
+ 				recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
+ 													   estate, false);
+
+ 			/* AFTER ROW INSERT Triggers */
+ 			if (resultRelInfo->ri_TrigDesc &&
+ 				resultRelInfo->ri_TrigDesc->n_after_row[TRIGGER_EVENT_INSERT] > 0)
+ 			{
+ 				HeapTuple	newtuple;
+
+ 				newtuple = ExecARInsertTriggersNow(estate, resultRelInfo, tuple);
+ 				if (newtuple != tuple)
+ 				{
+ 					/* tuple modified by Trigger(s), check that the constraint is still valid */
+ 					heap_freetuple(tuple);
+ 					tuple = newtuple;
+ 					ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+ 					if (ExecRelCheck(resultRelInfo, slot, estate) != NULL)
+ 					{
+ 						ereport(ERROR,
+ 								(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ 								 errmsg("After row insert trigger on table \"%s\" modified partitioning routing decision. Aborting insert.",
+ 										RelationGetRelationName(child_table_relation))));
+ 					}
+ 				}
+ 			}
+ 		}
+ 		result = true;
+ 	}
+
+ 	/* Free resources */
+ 	FreeExecutorState(estate);
+ 	ExecDropSingleTupleTableSlot(slot);
+ 	ExecCloseIndices(resultRelInfo);
+
+ 	return result;
+ }
+
+ /**
+  * Route a tuple into a child table that matches the constraints of the tuple
+  * to be inserted.
+  * @param parent_relation the parent relation
+  * @param tuple the tuple to be routed
+  */
+ static bool
+ route_tuple_to_child(Relation parent_relation, HeapTuple tuple,
+ 	int hi_options, ResultRelInfo *parentResultRelInfo)
+ {
+ 	Relation	child_table_relation;
+ 	bool		result = false;
+ 	Relation	catalog_relation;
+ 	HeapTuple	inherits_tuple;
+ 	HeapScanDesc scan;
+ 	ScanKeyData key[1];
+
+ 	/* Try to exploit locality for bulk inserts.
+ 	 * We expect consecutive inserts to go to the same child table */
+ 	if (child_table_lru != NULL)
+ 	{
+ 		/* Try the child table LRU */
+ 		ListCell   *child_oid_cell;
+ 		Oid			child_relation_id;
+
+ 		foreach(child_oid_cell, child_table_lru)
+ 		{
+ 			child_relation_id = lfirst_oid(child_oid_cell);
+ 			child_table_relation = try_relation_open(child_relation_id,
+ 													 RowExclusiveLock);
+
+ 			if (child_table_relation == NULL)
+ 			{
+ 				/* Child table does not exist anymore, purge cache entry */
+ 				child_table_lru = list_delete_oid(child_table_lru, child_relation_id);
+ 				if (list_length(child_table_lru) == 0)
+ 					break;		/* Cache is now empty */
+ 				else
+ 				{				/* Restart scanning */
+ 					child_oid_cell = list_head(child_table_lru);
+ 					continue;
+ 				}
+ 			}
+
+ 			if (check_tuple_constraints(child_table_relation, tuple, true, hi_options, parentResultRelInfo))
+ 			{
+ 				/* Hit, move in front if not already the head */
+ 				if (lfirst_oid(list_head(child_table_lru)) != child_relation_id)
+ 				{
+ 					/* The partitioning cache is in the CurTransactionContext */
+ 					MemoryContext currentContext = MemoryContextSwitchTo(CurTransactionContext);
+
+ 					child_table_lru = list_delete_oid(child_table_lru, child_relation_id);
+ 					child_table_lru = lcons_oid(child_relation_id, child_table_lru);
+ 					MemoryContextSwitchTo(currentContext);
+ 				}
+
+ 				/* Close the relation but keep the lock until the end of
+ 				 * the transaction */
+ 				relation_close(child_table_relation, NoLock);
+
+ 				return true;
+ 			}
+ 			relation_close(child_table_relation, RowExclusiveLock);
+ 		}
+ 		/* We got a miss */
+ 	}
+
+ 	/* Looking up child tables */
+ 	ScanKeyInit(&key[0],
+ 				Anum_pg_inherits_inhparent,
+ 				BTEqualStrategyNumber, F_OIDEQ,
+ 				ObjectIdGetDatum(parent_relation->rd_id));
+ 	catalog_relation = heap_open(InheritsRelationId, AccessShareLock);
+ 	scan = heap_beginscan(catalog_relation, SnapshotNow, 1, key);
+ 	while ((inherits_tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+ 	{
+ 		TupleConstr *constr;
+ 		Form_pg_inherits inh = (Form_pg_inherits) GETSTRUCT(inherits_tuple);
+ 		Oid			child_relation_id = inh->inhrelid;
+
+ 		/* Check if the child table satisfies the constraints; if the relation
+ 		 * cannot be opened this throws an exception */
+ 		child_table_relation = relation_open(child_relation_id, RowExclusiveLock);
+
+ 		constr = child_table_relation->rd_att->constr;
+ 		if (constr->num_check == 0)
+ 		{
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ 					 errmsg("partition routing found no constraint for relation \"%s\"",
+ 							RelationGetRelationName(child_table_relation))));
+ 		}
+
+ 		if (has_subclass(child_table_relation->rd_id))
+ 		{
+ 			/* This is a parent table, check its constraints first */
+ 			if (check_tuple_constraints(child_table_relation, tuple, false, hi_options, parentResultRelInfo))
+ 			{
+ 				/* Constraint satisfied, explore the child tables */
+ 				result = route_tuple_to_child(child_table_relation, tuple, hi_options, parentResultRelInfo);
+ 				if (result)
+ 				{
+ 					/* Success, one of our child tables matched.
+ 					 * Release the lock on this parent relation, we did not use it */
+ 					relation_close(child_table_relation, RowExclusiveLock);
+ 					break;
+ 				}
+ 				else
+ 				{
+ 					ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ 							 errmsg("tuple matched constraints of relation \"%s\" but none of "
+ 									"its children",
+ 									RelationGetRelationName(child_table_relation))));
+ 				}
+ 			}
+ 		}
+ 		else
+ 		{
+ 			/* Child table, try it */
+ 			result = check_tuple_constraints(child_table_relation, tuple, true, hi_options, parentResultRelInfo);
+ 		}
+
+ 		if (result)
+ 		{
+ 			MemoryContext currentContext;
+
+ 			/* We found the one, update the LRU and exit the loop!
+ 			 *
+ 			 * Close the relation but keep the lock until the end of
+ 			 * the transaction */
+ 			relation_close(child_table_relation, NoLock);
+
+ 			/* The partitioning cache is in the CurTransactionContext */
+ 			currentContext = MemoryContextSwitchTo(CurTransactionContext);
+
+ 			/* Add the new entry in head of the list (also builds the list if needed) */
+ 			child_table_lru = lcons_oid(child_relation_id, child_table_lru);
+
+ 			/* Restore memory context */
+ 			MemoryContextSwitchTo(currentContext);
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			/* Release the lock on that relation, we did not use it */
+ 			relation_close(child_table_relation, RowExclusiveLock);
+ 		}
+ 	}
+ 	heap_endscan(scan);
+ 	heap_close(catalog_relation, AccessShareLock);
+ 	return result;
+ }
+
  /*
   * Copy FROM file to relation.
   */
***************
*** 2154,2189 ****
  		{
  			List	   *recheckIndexes = NIL;

! 			/* Place tuple in tuple slot */
! 			ExecStoreTuple(tuple, slot, InvalidBuffer, false);
!
! 			/* Check the constraints of the tuple */
! 			if (cstate->rel->rd_att->constr)
! 				ExecConstraints(resultRelInfo, slot, estate);
!
! 			/* OK, store the tuple and create index entries for it */
! 			heap_insert(cstate->rel, tuple, mycid, hi_options, bistate);
!
! 			if (resultRelInfo->ri_NumIndices > 0)
! 				recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
! 													   estate, false);
!
! 			/* AFTER ROW INSERT Triggers */
! 			ExecARInsertTriggers(estate, resultRelInfo, tuple,
! 								 recheckIndexes);
!
! 			/*
! 			 * We count only tuples not suppressed by a BEFORE INSERT trigger;
! 			 * this is the same definition used by execMain.c for counting
! 			 * tuples inserted by an INSERT command.
! 			 */
! 			cstate->processed++;
  		}
  	}

  	/* Done, clean up */
  	error_context_stack = errcontext.previous;

  	FreeBulkInsertState(bistate);

  	MemoryContextSwitchTo(oldcontext);
--- 2436,2503 ----
  		{
  			List	   *recheckIndexes = NIL;

! 			/* If routing is enabled and the table has child tables, let's try routing */
! 			if (cstate->partitioning && has_subclass(cstate->rel->rd_id))
! 			{
! 				if (route_tuple_to_child(cstate->rel, tuple, hi_options, resultRelInfo))
! 				{
! 					/* increase the counter so that we return how many
! 					 * tuples got copied into all tables in total */
! 					cstate->processed++;
! 				}
! 				else
! 				{
! 					ereport(ERROR, (
! 							errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
! 							errmsg("tuple does not satisfy any child table constraint")
! 							));
! 				}
! 			}
! 			else
! 			{
! 				/* No partitioning, prepare the tuple and
! 				 * check the constraints */
!
! 				/* Place tuple in tuple slot */
! 				ExecStoreTuple(tuple, slot, InvalidBuffer, false);
!
! 				/* Check the constraints of the tuple */
! 				if (cstate->rel->rd_att->constr)
! 					ExecConstraints(resultRelInfo, slot, estate);
!
! 				/* OK, store the tuple and create index entries for it */
! 				heap_insert(cstate->rel, tuple, mycid, hi_options, bistate);
!
! 				if (resultRelInfo->ri_NumIndices > 0)
! 					recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
! 														   estate, false);
!
! 				/* AFTER ROW INSERT Triggers */
! 				ExecARInsertTriggers(estate, resultRelInfo, tuple,
! 									 recheckIndexes);
!
! 				/*
! 				 * We count only tuples not suppressed by a BEFORE INSERT trigger;
! 				 * this is the same definition used by execMain.c for counting
! 				 * tuples inserted by an INSERT command.
! 				 */
! 				cstate->processed++;
! 			}
  		}
  	}

  	/* Done, clean up */
  	error_context_stack = errcontext.previous;

+ 	/* Free the partitioning LRU list if any */
+ 	if (child_table_lru != NULL)
+ 	{
+ 		MemoryContext currentContext = MemoryContextSwitchTo(CurTransactionContext);
+
+ 		list_free(child_table_lru);
+ 		child_table_lru = NULL;
+ 		MemoryContextSwitchTo(currentContext);
+ 	}
+
  	FreeBulkInsertState(bistate);

  	MemoryContextSwitchTo(oldcontext);
Index: src/include/commands/trigger.h
===================================================================
RCS file: /home/manu/cvsrepo/pgsql/src/include/commands/trigger.h,v
retrieving revision 1.78
diff -c -r1.78 trigger.h
*** src/include/commands/trigger.h	20 Nov 2009 20:38:11 -0000	1.78
--- src/include/commands/trigger.h	22 Nov 2009 16:52:29 -0000
***************
*** 130,135 ****
--- 130,138 ----
  extern HeapTuple ExecBRInsertTriggers(EState *estate,
  					 ResultRelInfo *relinfo,
  					 HeapTuple trigtuple);
+ extern HeapTuple ExecARInsertTriggersNow(EState *estate,
+ 					 ResultRelInfo *relinfo,
+ 					 HeapTuple trigtuple);
  extern void ExecARInsertTriggers(EState *estate,
  					 ResultRelInfo *relinfo,
  					 HeapTuple trigtuple,
Index: src/include/executor/executor.h
===================================================================
RCS file: /home/manu/cvsrepo/pgsql/src/include/executor/executor.h,v
retrieving revision 1.163
diff -c -r1.163 executor.h
*** src/include/executor/executor.h	26 Oct 2009 02:26:41 -0000	1.163
--- src/include/executor/executor.h	22 Nov 2009 16:52:29 -0000
***************
*** 166,171 ****
--- 166,173 ----
  extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids);
  extern void ExecConstraints(ResultRelInfo *resultRelInfo,
  				TupleTableSlot *slot, EState *estate);
+ extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo,
+ 				TupleTableSlot *slot, EState *estate);
  extern TupleTableSlot *EvalPlanQual(EState *estate, EPQState *epqstate,
  			 Relation relation, Index rti,
  			 ItemPointer tid, TransactionId priorXmax);
Index: src/backend/executor/execMain.c
===================================================================
RCS file: /home/manu/cvsrepo/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.335
diff -c -r1.335 execMain.c
*** src/backend/executor/execMain.c	20 Nov 2009 20:38:10 -0000	1.335
--- src/backend/executor/execMain.c	22 Nov 2009 16:52:29 -0000
***************
*** 1239,1245 ****
  /*
   * ExecRelCheck --- check that tuple meets constraints for result relation
   */
! static const char *
  ExecRelCheck(ResultRelInfo *resultRelInfo,
  			 TupleTableSlot *slot, EState *estate)
  {
--- 1239,1245 ----
  /*
   * ExecRelCheck --- check that tuple meets constraints for result relation
   */
! const char *
  ExecRelCheck(ResultRelInfo *resultRelInfo,
  			 TupleTableSlot *slot, EState *estate)
  {
Index: src/test/regress/parallel_schedule
===================================================================
RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/parallel_schedule,v
retrieving revision 1.57
diff -c -r1.57 parallel_schedule
*** src/test/regress/parallel_schedule	24 Aug 2009 03:10:16 -0000	1.57
--- src/test/regress/parallel_schedule	22 Nov 2009 16:52:29 -0000
***************
*** 47,53 ****
  # execute two copy tests parallel, to check that copy itself
  # is concurrent safe.
  # ----------
! test: copy copyselect
  # ----------
  # Another group of parallel tests
--- 47,55 ----
  # execute two copy tests parallel, to check that copy itself
  # is concurrent safe.
  # ----------
! test: copy copyselect
! test: copy_partitioning
! test: copy_partitioning_trigger
  # ----------
  # Another group of parallel tests
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /home/manu/cvsrepo/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.92
diff -c -r1.92 copy.sgml
*** doc/src/sgml/ref/copy.sgml	21 Sep 2009 20:10:21 -0000	1.92
--- doc/src/sgml/ref/copy.sgml	22 Nov 2009 16:52:29 -0000
***************
*** 41,46 ****
--- 41,47 ----
      ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
      FORCE_QUOTE { ( <replaceable class="parameter">column</replaceable> [, ...] ) | * }
      FORCE_NOT_NULL ( <replaceable class="parameter">column</replaceable> [, ...] )
+     PARTITIONING [ <replaceable class="parameter">boolean</replaceable> ]
  </synopsis>
  </refsynopsisdiv>
***************
*** 282,287 ****
--- 283,298 ----
      </listitem>
     </varlistentry>

+    <varlistentry>
+     <term><literal>PARTITIONING</></term>
+     <listitem>
+      <para>
+       In <literal>PARTITIONING</> mode, <command>COPY FROM</> into a parent
+       table will automatically move each row to the child table that
+       has the matching constraints.
+      </para>
+     </listitem>
+    </varlistentry>
    </variablelist>
  </refsect1>
***************
*** 384,389 ****
--- 395,411 ----
     <command>VACUUM</command> to recover the wasted space.
    </para>

+   <para>
+    <literal>PARTITIONING</> mode scans the child table constraints in the
+    hierarchy to find a match for each row. <literal>PARTITIONING</> assumes
+    that every child table has at least one constraint defined; otherwise an
+    error is thrown. If child tables have overlapping constraints, the row is
+    inserted in the first child table found (be it a cached table or the first
+    table to appear in the lookup). Before or after ROW triggers will generate
+    an error and abort the COPY operation if they modify the tuple value in a
+    way that violates the constraints of the child table where the tuple has
+    been routed.
+   </para>
+
  </refsect1>

  <refsect1>
***************
*** 828,833 ****
--- 850,993 ----
  0000200 M B A B W E 377 377 377 377 377 377
  </programlisting>
  </para>
+
+ <para>
+  Multiple options are separated by a comma like:
+ <programlisting>
+ COPY (SELECT t FROM foo WHERE id = 1) TO STDOUT (FORMAT CSV, HEADER, FORCE_QUOTE (t));
+ </programlisting>
+ </para>
+
+ <refsect2>
+ <title>Partitioning examples</title>
+ <para>
+  Here is an example of how to use partitioning. Let's first create a parent
+  table and 3 child tables as follows:
+ <programlisting>
+ CREATE TABLE y2008 (
+   id int not null,
+   date date not null,
+   value int
+ );
+
+ CREATE TABLE jan2008 (
+   CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
+ ) INHERITS (y2008);
+
+ CREATE TABLE feb2008 (
+   CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
+ ) INHERITS (y2008);
+
+ CREATE TABLE mar2008 (
+   CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
+ ) INHERITS (y2008);
+ </programlisting>
+  We prepare the following data file (1 row for each child table):
+  copy_input.data content:
+ <programlisting>
+ 11	'2008-01-10'	11
+ 12	'2008-02-15'	12
+ 13	'2008-03-15'	13
+ 21	'2008-01-10'	11
+ 31	'2008-01-10'	11
+ 41	'2008-01-10'	11
+ 22	'2008-02-15'	12
+ 23	'2008-03-15'	13
+ 32	'2008-02-15'	12
+ 33	'2008-03-15'	13
+ 42	'2008-02-15'	12
+ 43	'2008-03-15'	13
+ </programlisting>
+  If we COPY the data into the parent table without partitioning enabled, all
+  rows are inserted in the parent table as in this example:
+ <programlisting>
+ COPY y2008 FROM 'copy_input.data';
+
+ SELECT COUNT(*) FROM y2008;
+  count
+ -------
+     12
+ (1 row)
+
+ SELECT COUNT(*) FROM jan2008;
+  count
+ -------
+      0
+ (1 row)
+
+ SELECT COUNT(*) FROM feb2008;
+  count
+ -------
+      0
+ (1 row)
+
+ SELECT COUNT(*) FROM mar2008;
+  count
+ -------
+      0
+ (1 row)
+
+ DELETE FROM y2008;
+ </programlisting>
+  If we execute COPY with partitioning enabled, rows are loaded in the
+  appropriate child table automatically as in this example:
+ <programlisting>
+ COPY y2008 FROM 'copy_input.data' (PARTITIONING);
+
+ SELECT * FROM y2008;
+  id |    date    | value
+ ----+------------+-------
+  11 | 01-10-2008 |    11
+  21 | 01-10-2008 |    11
+  31 | 01-10-2008 |    11
+  41 | 01-10-2008 |    11
+  12 | 02-15-2008 |    12
+  22 | 02-15-2008 |    12
+  32 | 02-15-2008 |    12
+  42 | 02-15-2008 |    12
+  13 | 03-15-2008 |    13
+  23 | 03-15-2008 |    13
+  33 | 03-15-2008 |    13
+  43 | 03-15-2008 |    13
+ (12 rows)
+
+ SELECT * FROM jan2008;
+  id |    date    | value
+ ----+------------+-------
+  11 | 01-10-2008 |    11
+  21 | 01-10-2008 |    11
+  31 | 01-10-2008 |    11
+  41 | 01-10-2008 |    11
+ (4 rows)
+
+ SELECT * FROM feb2008;
+  id |    date    | value
+ ----+------------+-------
+  12 | 02-15-2008 |    12
+  22 | 02-15-2008 |    12
+  32 | 02-15-2008 |    12
+  42 | 02-15-2008 |    12
+ (4 rows)
+
+ SELECT * FROM mar2008;
+  id |    date    | value
+ ----+------------+-------
+  13 | 03-15-2008 |    13
+  23 | 03-15-2008 |    13
+  33 | 03-15-2008 |    13
+  43 | 03-15-2008 |    13
+ (4 rows)
+ </programlisting>
+  The cache size can be tuned using:
+ <programlisting>
+ set copy_partitioning_cache_size = 3;
+ </programlisting>
+  Repeating the COPY command will now be faster:
+ <programlisting>
+ COPY y2008 FROM 'copy_input.data' (PARTITIONING);
+ </programlisting>
+ </para>
+ </refsect2>

  </refsect1>

  <refsect1>
Index: src/test/regress/input/copy_partitioning_trigger.source
===================================================================
RCS file: src/test/regress/input/copy_partitioning_trigger.source
diff -N src/test/regress/input/copy_partitioning_trigger.source
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- src/test/regress/input/copy_partitioning_trigger.source	1 Jan 1970 00:00:00 -0000
***************
*** 0 ****
--- 1,60 ----
+ -- Test triggers with partitioning
+ create table t(i int);
+ create table t1 (check (i > 0 and i <= 1)) inherits (t);
+ create table t2 (check (i > 1 and i <= 2)) inherits (t);
+ create table t3 (check (i > 2 and i <= 3)) inherits (t);
+
+ create table audit(i int);
+
+ create function audit() returns trigger as $$ begin insert into audit(i) values (new.i); return new; end; $$ language plpgsql;
+
+ create trigger t_a after insert on t for each row execute procedure audit();
+ -- the before trigger on t would get fired
+ -- create trigger t_a2 before insert on t for each row execute procedure audit();
+ create trigger t1_a before insert on t1 for each row execute procedure audit();
+ create trigger t1_a2 after insert on t1 for each row execute procedure audit();
+
+ copy t from stdin with (partitioning);
+ 1
+ 2
+ 3
+ \.
+
+ -- no rows if trigger does not work
+ select * from audit;
+
+ drop table t cascade;
+ drop table audit cascade;
+ drop function audit();
+
+ -- Test bad before row trigger
+ create table t(i int);
+ create table t1 (check (i > 0 and i <= 1)) inherits (t);
+ create table t2 (check (i > 1 and i <= 2)) inherits (t);
+
+ create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql;
+ create trigger t1_before before insert on t1 for each row execute procedure i2();
+
+ -- COPY should fail
+ copy t from stdin with (partitioning);
+ 1
+ \.
+
+ drop table t cascade;
+ drop function i2();
+
+ -- Test bad after row trigger
+ create table t(i int);
+ create table t1 (check (i > 0 and i <= 1)) inherits (t);
+ create table t2 (check (i > 1 and i <= 2)) inherits (t);
+
+ create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql;
+ create trigger t1_after after insert on t1 for each row execute procedure i2();
+
+ -- COPY should fail
+ copy t from stdin with (partitioning);
+ 1
+ \.
+
+ drop table t cascade;
+ drop function i2();
Index: src/test/regress/data/copy_input.data
===================================================================
RCS file: src/test/regress/data/copy_input.data
diff -N src/test/regress/data/copy_input.data
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- src/test/regress/data/copy_input.data	1 Jan 1970 00:00:00 -0000
***************
*** 0 ****
--- 1,12 ----
+ 11	'2008-01-19'	11
+ 12	'2008-02-15'	12
+ 13	'2008-03-15'	13
+ 21	'2008-01-10'	11
+ 31	'2008-01-10'	11
+ 41	'2008-01-10'	11
+ 22	'2008-02-15'	12
+ 23	'2008-03-15'	13
+ 32	'2008-02-15'	12
+ 33	'2008-03-15'	13
+ 42	'2008-02-15'	12
+ 43	'2008-03-15'	13
Index: src/test/regress/input/copy_partitioning.source
===================================================================
RCS file: src/test/regress/input/copy_partitioning.source
diff -N src/test/regress/input/copy_partitioning.source
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- src/test/regress/input/copy_partitioning.source	1 Jan 1970 00:00:00 -0000
***************
*** 0 ****
--- 1,105 ----
+ -- test 1
+ create table parent(i int);
+ create table c1 (check (i > 0 and i <= 1)) inherits (parent);
+ copy parent from stdin with (partitioning);
+ 1
+ \.
+
+ drop table parent cascade;
+
+ create table parent(i int);
+ create table c1 (check (i > 0 and i <= 1)) inherits (parent);
+ copy parent from stdin with (partitioning);
+ 1
+ \.
+
+ drop table parent cascade;
+
+ -- test 2 (index update check)
+ create table parent(i int, j int);
+ create table c1 (check (i > 0 and i <= 1)) inherits (parent);
+ create table c2 (check (i > 1 and i <= 2)) inherits (parent);
+ create table c3 (check (i > 2 and i <= 3)) inherits (parent);
+
+ create index c1_idx on c1(j);
+ copy (select i % 3 + 1, i from generate_series(1, 1000) s(i)) to '/tmp/parent';
+ copy parent from '/tmp/parent' with (partitioning);
+ analyse;
+
+ set enable_seqscan to false;
+ -- no rows if index was not updated
+ select * from c1 where j = 3;
+
+ set enable_seqscan to true;
+ set enable_indexscan to false;
+ -- 1 row
+ select * from c1 where j = 3;
+ drop table parent cascade;
+
+ -- test 3
+ CREATE TABLE y2008 (
+   id int not null,
+   date date not null,
+   value int,
+   primary key(id)
+ );
+
+ CREATE TABLE jan2008 (
+   CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
+ ) INHERITS (y2008);
+
+ CREATE TABLE jan2008half1 (
+   CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-01-15' )
+ ) INHERITS (jan2008);
+
+ CREATE TABLE jan2008half2 (
+   CHECK ( date >= DATE '2008-01-16' AND date < DATE '2008-01-31' )
+ ) INHERITS (jan2008);
+
+ CREATE TABLE feb2008 (
+   CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
+ ) INHERITS (y2008);
+
+ CREATE TABLE mar2008 (
+   CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
+ ) INHERITS (y2008);
+
+ COPY y2008 FROM '@abs_srcdir@/data/copy_input.data';
+
+ SELECT COUNT(*) FROM y2008;
+ SELECT COUNT(*) FROM jan2008;
+ SELECT COUNT(*) FROM jan2008half1;
+ SELECT COUNT(*) FROM jan2008half2;
+ SELECT COUNT(*) FROM feb2008;
+ SELECT COUNT(*) FROM mar2008;
+
+ DELETE FROM y2008;
+ COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING);
+ SELECT * FROM y2008 ORDER BY id;
+ SELECT * FROM jan2008 ORDER BY id;
+ SELECT * FROM jan2008half1 ORDER BY id;
+ SELECT * FROM jan2008half2 ORDER BY id;
+ SELECT * FROM feb2008 ORDER BY id;
+ SELECT * FROM mar2008 ORDER BY id;
+ DELETE FROM y2008;
+
+ DROP TABLE y2008 CASCADE;
+
+ -- test 4 (cache testing)
+ create table parent1(i int);
+ create table c1 (check (i > 0 and i <= 1)) inherits (parent1);
+ create table parent2(i int);
+ create table c2 (check (i > 0 and i <= 1)) inherits (parent2);
+ copy parent1 from stdin with (partitioning);
+ 1
+ \.
+
+ copy parent2 from stdin with (partitioning);
+ 1
+ \.
+
+ -- If the caching does not work all tuples will go to parent1
+ select * from parent1;
+ select * from parent2;
+ drop table parent1 cascade;
+ drop table parent2 cascade;
Index: src/test/regress/output/copy_partitioning.source
===================================================================
RCS file: src/test/regress/output/copy_partitioning.source
diff -N src/test/regress/output/copy_partitioning.source
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- src/test/regress/output/copy_partitioning.source	1 Jan 1970 00:00:00 -0000
***************
*** 0 ****
--- 1,194 ----
+ -- test 1
+ create table parent(i int);
+ create table c1 (check (i > 0 and i <= 1)) inherits (parent);
+ copy parent from stdin with (partitioning);
+ drop table parent cascade;
+ NOTICE: drop cascades to table c1
+ create table parent(i int);
+ create table c1 (check (i > 0 and i <= 1)) inherits (parent);
+ copy parent from stdin with (partitioning);
+ drop table parent cascade;
+ NOTICE: drop cascades to table c1
+ -- test 2 (index update check)
+ create table parent(i int, j int);
+ create table c1 (check (i > 0 and i <= 1)) inherits (parent);
+ create table c2 (check (i > 1 and i <= 2)) inherits (parent);
+ create table c3 (check (i > 2 and i <= 3)) inherits (parent);
+ create index c1_idx on c1(j);
+ copy (select i % 3 + 1, i from generate_series(1, 1000) s(i)) to '/tmp/parent';
+ copy parent from '/tmp/parent' with (partitioning);
+ analyse;
+ set enable_seqscan to false;
+ -- no rows if index was not updated
+ select * from c1 where j = 3;
+  i | j
+ ---+---
+  1 | 3
+ (1 row)
+
+ set enable_seqscan to true;
+ set enable_indexscan to false;
+ -- 1 row
+ select * from c1 where j = 3;
+  i | j
+ ---+---
+  1 | 3
+ (1 row)
+
+ drop table parent cascade;
+ NOTICE: drop cascades to 3 other objects
+ DETAIL: drop cascades to table c1
+ drop cascades to table c2
+ drop cascades to table c3
+ -- test 3
+ CREATE TABLE y2008 (
+   id int not null,
+   date date not null,
+   value int,
+   primary key(id)
+ );
+ NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "y2008_pkey" for table "y2008"
+ CREATE TABLE jan2008 (
+   CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
+ ) INHERITS (y2008);
+ CREATE TABLE jan2008half1 (
+   CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-01-15' )
+ ) INHERITS (jan2008);
+ CREATE TABLE jan2008half2 (
+   CHECK ( date >= DATE '2008-01-16' AND date < DATE '2008-01-31' )
+ ) INHERITS (jan2008);
+ CREATE TABLE feb2008 (
+   CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
+ ) INHERITS (y2008);
+ CREATE TABLE mar2008 (
+   CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
+ ) INHERITS (y2008);
+ COPY y2008 FROM '@abs_srcdir@/data/copy_input.data';
+ SELECT COUNT(*) FROM y2008;
+  count
+ -------
+     12
+ (1 row)
+
+ SELECT COUNT(*) FROM jan2008;
+  count
+ -------
+      0
+ (1 row)
+
+ SELECT COUNT(*) FROM jan2008half1;
+  count
+ -------
+      0
+ (1 row)
+
+ SELECT COUNT(*) FROM jan2008half2;
+  count
+ -------
+      0
+ (1 row)
+
+ SELECT COUNT(*) FROM feb2008;
+  count
+ -------
+      0
+ (1 row)
+
+ SELECT COUNT(*) FROM mar2008;
+  count
+ -------
+      0
+ (1 row)
+
+ DELETE FROM y2008;
+ COPY y2008 FROM '@abs_srcdir@/data/copy_input.data' (PARTITIONING);
+ SELECT * FROM y2008 ORDER BY id;
+  id |    date    | value
+ ----+------------+-------
+  11 | 01-19-2008 |    11
+  12 | 02-15-2008 |    12
+  13 | 03-15-2008 |    13
+  21 | 01-10-2008 |    11
+  22 | 02-15-2008 |    12
+  23 | 03-15-2008 |    13
+  31 | 01-10-2008 |    11
+  32 | 02-15-2008 |    12
+  33 | 03-15-2008 |    13
+  41 | 01-10-2008 |    11
+  42 | 02-15-2008 |    12
+  43 | 03-15-2008 |    13
+ (12 rows)
+
+ SELECT * FROM jan2008 ORDER BY id;
+  id |    date    | value
+ ----+------------+-------
+  11 | 01-19-2008 |    11
+  21 | 01-10-2008 |    11
+  31 | 01-10-2008 |    11
+  41 | 01-10-2008 |    11
+ (4 rows)
+
+ SELECT * FROM jan2008half1 ORDER BY id;
+  id |    date    | value
+ ----+------------+-------
+  21 | 01-10-2008 |    11
+  31 | 01-10-2008 |    11
+  41 | 01-10-2008 |    11
+ (3 rows)
+
+ SELECT * FROM jan2008half2 ORDER BY id;
+  id |    date    | value
+ ----+------------+-------
+  11 | 01-19-2008 |    11
+ (1 row)
+
+ SELECT * FROM feb2008 ORDER BY id;
+  id |    date    | value
+ ----+------------+-------
+  12 | 02-15-2008 |    12
+  22 | 02-15-2008 |    12
+  32 | 02-15-2008 |    12
+  42 | 02-15-2008 |    12
+ (4 rows)
+
+ SELECT * FROM mar2008 ORDER BY id;
+  id |    date    | value
+ ----+------------+-------
+  13 | 03-15-2008 |    13
+  23 | 03-15-2008 |    13
+  33 | 03-15-2008 |    13
+  43 | 03-15-2008 |    13
+ (4 rows)
+
+ DELETE FROM y2008;
+ DROP TABLE y2008 CASCADE;
+ NOTICE: drop cascades to 5 other objects
+ DETAIL: drop cascades to table jan2008
+ drop cascades to table jan2008half1
+ drop cascades to table jan2008half2
+ drop cascades to table feb2008
+ drop cascades to table mar2008
+ -- test 4 (cache testing)
+ create table parent1(i int);
+ create table c1 (check (i > 0 and i <= 1)) inherits (parent1);
+ create table parent2(i int);
+ create table c2 (check (i > 0 and i <= 1)) inherits (parent2);
+ copy parent1 from stdin with (partitioning);
+ copy parent2 from stdin with (partitioning);
+ -- If the caching does not work all tuples will go to parent1
+ select * from parent1;
+  i
+ ---
+  1
+ (1 row)
+
+ select * from parent2;
+  i
+ ---
+  1
+ (1 row)
+
+ drop table parent1 cascade;
+ NOTICE: drop cascades to table c1
+ drop table parent2 cascade;
+ NOTICE: drop cascades to table c2
Index: src/test/regress/output/copy_partitioning_trigger.source
===================================================================
RCS file: src/test/regress/output/copy_partitioning_trigger.source
diff -N src/test/regress/output/copy_partitioning_trigger.source
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- src/test/regress/output/copy_partitioning_trigger.source	1 Jan 1970 00:00:00 -0000
***************
*** 0 ****
--- 1,58 ----
+ -- Test triggers with partitioning
+ create table t(i int);
+ create table t1 (check (i > 0 and i <= 1)) inherits (t);
+ create table t2 (check (i > 1 and i <= 2)) inherits (t);
+ create table t3 (check (i > 2 and i <= 3)) inherits (t);
+ create table audit(i int);
+ create function audit() returns trigger as $$ begin insert into audit(i) values (new.i); return new; end; $$ language plpgsql;
+ create trigger t_a after insert on t for each row execute procedure audit();
+ -- the before trigger on t would get fired
+ -- create trigger t_a2 before insert on t for each row execute procedure audit();
+ create trigger t1_a before insert on t1 for each row execute procedure audit();
+ create trigger t1_a2 after insert on t1 for each row execute procedure audit();
+ copy t from stdin with (partitioning);
+ -- no rows if trigger does not work
+ select * from audit;
+  i
+ ---
+  1
+  1
+ (2 rows)
+
+ drop table t cascade;
+ NOTICE: drop cascades to 3 other objects
+ DETAIL: drop cascades to table t1
+ drop cascades to table t2
+ drop cascades to table t3
+ drop table audit cascade;
+ drop function audit();
+ -- Test bad before row trigger
+ create table t(i int);
+ create table t1 (check (i > 0 and i <= 1)) inherits (t);
+ create table t2 (check (i > 1 and i <= 2)) inherits (t);
+ create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql;
+ create trigger t1_before before insert on t1 for each row execute procedure i2();
+ -- COPY should fail
+ copy t from stdin with (partitioning);
+ ERROR: Before row insert trigger on table "t1" modified partitioning routing decision. Aborting insert.
+ CONTEXT: COPY t, line 1: "1"
+ drop table t cascade;
+ NOTICE: drop cascades to 2 other objects
+ DETAIL: drop cascades to table t1
+ drop cascades to table t2
+ drop function i2();
+ -- Test bad after row trigger
+ create table t(i int);
+ create table t1 (check (i > 0 and i <= 1)) inherits (t);
+ create table t2 (check (i > 1 and i <= 2)) inherits (t);
+ create function i2() returns trigger as $$ begin NEW.i := 2; return NEW; end; $$ language plpgsql;
+ create trigger t1_after after insert on t1 for each row execute procedure i2();
+ -- COPY should fail
+ copy t from stdin with (partitioning);
+ ERROR: After row insert trigger on table "t1" modified partitioning routing decision. Aborting insert.
+ CONTEXT: COPY t, line 1: "1"
+ drop table t cascade;
+ NOTICE: drop cascades to 2 other objects
+ DETAIL: drop cascades to table t1
+ drop cascades to table t2
+ drop function i2();
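As an aside on the ExecRelCheck point raised in the review above: the alternative Emmanuel mentions, a flag on ExecConstraints instead of exporting ExecRelCheck, might look roughly like the sketch below. The function name and flag are hypothetical, and for brevity the sketch covers only CHECK constraints, not the NOT NULL checks the real ExecConstraints also performs.

/* Hypothetical variant of ExecConstraints (sketch only, not in the patch).
 * Returns NULL if all CHECK constraints pass; otherwise returns the name
 * of the failing constraint, or ereports if emit_error is true. */
const char *
ExecConstraintsExt(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
				   EState *estate, bool emit_error)
{
	const char *failed = ExecRelCheck(resultRelInfo, slot, estate);

	if (failed != NULL && emit_error)
		ereport(ERROR,
				(errcode(ERRCODE_CHECK_VIOLATION),
				 errmsg("new row for relation \"%s\" violates check constraint \"%s\"",
						RelationGetRelationName(resultRelInfo->ri_RelationDesc),
						failed)));

	return failed;
}

Existing callers would pass true and keep today's behavior, while the routing code would pass false and treat a non-NULL result as "try the next child table".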
Emmanuel Cecchet wrote:
> Jan,
>
>> A couple of nitpicks first:
>>
>> o) the route_tuple_to_child recurses to child tables of child tables,
>> which is undocumented and requires a check_stack_depth() call if it's
>> really desirable
>>
> The recursive call is as deep as the inheritance hierarchy. I am not
> sure what we are supposed to do if check_stack_depth() fails.

I think that check_stack_depth() just throws an elog(ERROR) when the stack depth is exceeded, so you can just add a check_stack_depth() call somewhere at the beginning of route_tuple_to_child and that's it.

>> o) the error messages given when a trigger modifies the tuple should be
>> one sentence, I suggest dropping the "Aborting insert" part
>>
> Where are those rules about error messages specified?

http://www.postgresql.org/docs/current/static/error-style-guide.html

Dropping "Aborting insert" is just a suggestion; it's possible the error message will sound OK to a native English speaker.

>> o) there are two places with "Close the relation but keep the lock"
>> comments. Why is it necessary to keep the locks? I confess I don't know
>> why *wouldn't* it be necessary, but maybe the comment could explain
>> that? Or is it just my lack of understanding and it should be obvious
>> that the lock needs to be kept?
>>
> As we did write to the table, we must maintain the lock on it until the
> operation or transaction is complete.

OK, understood.

>> o) the result of relation_open is explicitly cast to Relation, the
>> result of try_relation_open is not (a minor gripe)
>>
> The first cast was unnecessary, I removed it.

OK.

>> o) the code added in trigger.c (ExecARInsertTriggersNow) is copy/pasted
>> from just above, I guess there was a reason why you needed that code,
>> but I also suspect that's a strong indication that something's wrong
>> with the abstractions in your patch.
> As I explained to Tom, if the after row trigger is called asynchronously
> I get a relcache leak on the child table at the end of the copy
> operation. If the trigger is called synchronously (like a before row
> trigger) it works fine. Also calling the after row trigger synchronously
> allows me to detect any potential problem between the actions of the
> trigger and the routing decision. I am open to any suggestion for a more
> elegant solution.

OK, my competence ends here :-) Someone with a better knowledge of the code should comment on that; I certainly don't have a better proposal.

>> Oh, actually, the cache is outright *wrong*, as the attached test6.sql
>> shows. Ugh, let's just forget about that LRU cache for now.
>>
> Point taken, I have removed the cache from the GUC variables and it is
> now only used for the duration of the COPY operation.

OK, that looks better.

>> o) the patch could use some more docs, especially about descending into
>> child tables.
>>
> Do you mean an overall comment explaining the design? Otherwise there is
> a comment for every single 'if' and block of code in the patch. Be more
> specific if you have a particular location where you think comments are
> missing or too vague.

I was thinking more about SGML docs. They could mention that BEFORE triggers are fired both for the parent table and for the child table, while AFTER triggers will only be called on the target table. I'd add a sentence or two explaining what happens if you have a three-level inheritance hierarchy (that the tuple will be inserted in the bottommost table of the hierarchy).

>> o) my main concern is still valid: the design was never agreed upon.
> I already commented on that part in another message and this is not
> related to this patch but to the politics of implementing partitioning
> in Postgres. Now if the rejection of the patch is based on political
> stances rather than technical ones, I can understand that too.

Sure, sorry if it sounded harsh. As I said, I have virtually no field experience with PG, so I might have a wrong perspective. I also don't feel particularly eligible to judge your approach to handling automatic partitioning design-wise.

Except for the really minor things like checking stack depth and adding a few sentences to the SGML docs, I think it's time someone more qualified looked at the patch. If you'd like to send a new version, I'll wait for it and mark it as ready for committer review.

Thanks for persisting with the patch, and sorry for nitpicking so much :-)

Cheers,
Jan
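In code terms, the stack-depth suggestion above amounts to a one-line guard at the top of the routing function; a minimal sketch against the patch as posted (check_stack_depth() raises its own ERROR when the limit is exceeded, so nothing else is required):

static bool
route_tuple_to_child(Relation parent_relation, HeapTuple tuple,
					 int hi_options, ResultRelInfo *parentResultRelInfo)
{
	/* guard the recursion into child tables of child tables */
	check_stack_depth();

	/* ... routing logic exactly as in the patch ... */
}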
On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:

> As I explained to Tom, if the after row trigger is called asynchronously
> I get a relcache leak on the child table at the end of the copy
> operation. If the trigger is called synchronously (like a before row
> trigger) it works fine. Also calling the after row trigger synchronously
> allows me to detect any potential problem between the actions of the
> trigger and the routing decision. I am open to any suggestion for a more
> elegant solution.

Well, I think there are still some issues there that at least need to be better documented.

For example,

create or replace function fi() returns trigger as '
begin
 if (NEW.p is not null) then
  if (select count(*) from i where i.i = NEW.p) = 0 then
   raise exception ''No parent'';
  end if;
 end if;
 return NEW;
end;
' language 'plpgsql';

create or replace function fc() returns trigger as '
begin
 if (NEW.p is not null) then
  if (select count(*) from c where c.i = NEW.p) = 0 then
   raise exception ''No parent'';
  end if;
 end if;
 return NEW;
end;
' language 'plpgsql';

create or replace function fp() returns trigger as '
begin
 if (NEW.p is not null) then
  if (select count(*) from p where p.i = NEW.p) = 0 then
   raise exception ''No parent'';
  end if;
 end if;
 return NEW;
end;
' language 'plpgsql';

drop table i;
drop table c;
drop table p cascade;

create table i(i int, p int);
create trigger tri after insert on i for each row execute procedure fi();

create table c(i int, p int);
create trigger trc after insert on c for each row execute procedure fc();

create table p(i int, p int);
create table p1 (check (i > 0 and i <= 10)) inherits (p);
create table p2 (check (i > 10 and i <= 20)) inherits (p);
create table p3 (check (i > 20 and i <= 30)) inherits (p);
create trigger trp1 after insert on p1 for each row execute procedure fp();
create trigger trp2 after insert on p2 for each row execute procedure fp();
create trigger trp3 after insert on p3 for each row execute procedure fp();

insert into i values (1,3),(2,1),(3,NULL);
copy c from stdin;
1	3
2	1
3	\N
\.
copy p from stdin with (partitioning);
1	3
2	1
3	\N
\.

gives me a successful load into i and c, but not into p with the current patch AFAICS, while a load where the 3 row is first does load.
Stephan Szabo wrote:
> On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:
>
>> As I explained to Tom, if the after row trigger is called asynchronously
>> I get a relcache leak on the child table at the end of the copy
>> operation. If the trigger is called synchronously (like a before row
>> trigger) it works fine. Also calling the after row trigger synchronously
>> allows me to detect any potential problem between the actions of the
>> trigger and the routing decision. I am open to any suggestion for a more
>> elegant solution.
>
> Well, I think there are still some issues there that at least need to be
> better documented.
>
> For example,
> create or replace function fi() returns trigger as '
> begin
>  if (NEW.p is not null) then
>   if (select count(*) from i where i.i = NEW.p) = 0 then
>    raise exception ''No parent'';
>   end if;
>  end if;
>  return NEW;
> end;
> ' language 'plpgsql';
>
> create or replace function fc() returns trigger as '
> begin
>  if (NEW.p is not null) then
>   if (select count(*) from c where c.i = NEW.p) = 0 then
>    raise exception ''No parent'';
>   end if;
>  end if;
>  return NEW;
> end;
> ' language 'plpgsql';
>
> create or replace function fp() returns trigger as '
> begin
>  if (NEW.p is not null) then
>   if (select count(*) from p where p.i = NEW.p) = 0 then
>    raise exception ''No parent'';
>   end if;
>  end if;
>  return NEW;
> end;
> ' language 'plpgsql';
>
> drop table i;
> drop table c;
> drop table p cascade;
>
> create table i(i int, p int);
> create trigger tri after insert on i for each row execute procedure fi();
>
> create table c(i int, p int);
> create trigger trc after insert on c for each row execute procedure fc();
>
> create table p(i int, p int);
> create table p1 (check (i > 0 and i <= 10)) inherits (p);
> create table p2 (check (i > 10 and i <= 20)) inherits (p);
> create table p3 (check (i > 20 and i <= 30)) inherits (p);
> create trigger trp1 after insert on p1 for each row execute procedure fp();
> create trigger trp2 after insert on p2 for each row execute procedure fp();
> create trigger trp3 after insert on p3 for each row execute procedure fp();
>
> insert into i values (1,3),(2,1),(3,NULL);
> copy c from stdin;
> 1	3
> 2	1
> 3	\N
> \.
> copy p from stdin with (partitioning);
> 1	3
> 2	1
> 3	\N
> \.
>
> gives me a successful load into i and c, but not into p with the current
> patch AFAICS while a load where the 3 row is first does load.
>
Well, if you don't insert anything in p (the table; try to avoid using the same name for the table and the column in an example), copy will insert (1,3) in p1 and then the trigger will evaluate

select count(*) from p where p.i = NEW.p

=> NEW.p is 3 and the only p.i available is 1. This should return 0 rows and raise the exception. This seems normal to me.

The only reason it works for i is because you inserted the values before the copy.

Am I missing something?

Emmanuel
--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com
On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:

> Stephan Szabo wrote:
> > On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:
> >
> >> As I explained to Tom, if the after row trigger is called asynchronously
> >> I get a relcache leak on the child table at the end of the copy
> >> operation. If the trigger is called synchronously (like a before row
> >> trigger) it works fine. Also calling the after row trigger synchronously
> >> allows me to detect any potential problem between the actions of the
> >> trigger and the routing decision. I am open to any suggestion for a more
> >> elegant solution.
> >
> > Well, I think there are still some issues there that at least need to be
> > better documented.
> >
> > For example,
> > create or replace function fi() returns trigger as '
> > begin
> >  if (NEW.p is not null) then
> >   if (select count(*) from i where i.i = NEW.p) = 0 then
> >    raise exception ''No parent'';
> >   end if;
> >  end if;
> >  return NEW;
> > end;
> > ' language 'plpgsql';
> >
> > create or replace function fc() returns trigger as '
> > begin
> >  if (NEW.p is not null) then
> >   if (select count(*) from c where c.i = NEW.p) = 0 then
> >    raise exception ''No parent'';
> >   end if;
> >  end if;
> >  return NEW;
> > end;
> > ' language 'plpgsql';
> >
> > create or replace function fp() returns trigger as '
> > begin
> >  if (NEW.p is not null) then
> >   if (select count(*) from p where p.i = NEW.p) = 0 then
> >    raise exception ''No parent'';
> >   end if;
> >  end if;
> >  return NEW;
> > end;
> > ' language 'plpgsql';
> >
> > drop table i;
> > drop table c;
> > drop table p cascade;
> >
> > create table i(i int, p int);
> > create trigger tri after insert on i for each row execute procedure fi();
> >
> > create table c(i int, p int);
> > create trigger trc after insert on c for each row execute procedure fc();
> >
> > create table p(i int, p int);
> > create table p1 (check (i > 0 and i <= 10)) inherits (p);
> > create table p2 (check (i > 10 and i <= 20)) inherits (p);
> > create table p3 (check (i > 20 and i <= 30)) inherits (p);
> > create trigger trp1 after insert on p1 for each row execute procedure fp();
> > create trigger trp2 after insert on p2 for each row execute procedure fp();
> > create trigger trp3 after insert on p3 for each row execute procedure fp();
> >
> > insert into i values (1,3),(2,1),(3,NULL);
> > copy c from stdin;
> > 1	3
> > 2	1
> > 3	\N
> > \.
> > copy p from stdin with (partitioning);
> > 1	3
> > 2	1
> > 3	\N
> > \.
> >
> > gives me a successful load into i and c, but not into p with the current
> > patch AFAICS while a load where the 3 row is first does load.
> >
> Well, if you don't insert anything in p (the table, try to avoid using
> the same name for the table and the column in an example), copy will
> insert (1,3) in p1 and then the trigger will evaluate
>
> select count(*) from p where p.i = NEW.p => NEW.p is 3 and the only p.i available is 1.
>
> This should return 0 rows and raise the exception. This seems normal to me.
>
> The only reason it works for i is because you inserted the values before
> the copy.
>
> Am I missing something?

I believe so, unless I am. There are three separate cases being run for comparison purposes: a multi-row insert on i where an after trigger on i checks the parents within i, a copy on c where an after trigger on c checks the parents within c, and a copy on p (with inheritance) where an after trigger on p* checks the parents within the p hierarchy.
So, in the case of the multi-row insert, it's inserting (1,3), but it doesn't immediately check; it inserts (2,1) and (3,NULL) before running the checks. The same seems to happen for the base copy. Copy with inheritance seems to be working differently. That may or may not be okay, but if it's different it needs to be prominently mentioned in the documentation.
On Sun, Nov 22, 2009 at 12:35 PM, Jan Urbański <wulczer@wulczer.org> wrote:
> I was thinking more about SGML docs. They could mention that BEFORE
> triggers are fired both for the parent table and for the child table,
> while AFTER triggers will only be called on the target table. I'd add a
> sentence or two explaining what happens if you have a three-level
> inheritance hierarchy (that the tuple will be inserted in the bottommost
> table of the hierarchy).

I have a hard time believing this is OK, even with documentation. While it might be OK in some (many?) particular use cases to fire triggers in this way, making COPY with the partitioning option fire different triggers at different times than COPY without the partitioning option - and in fact every other method of getting data into a table, all of which are consistent with each other and with COPY without the partitioning option - seems like a bad idea to me. I don't think the behavior described above is OK, and I also don't think that the changes in the timing of AFTER-trigger firing are OK. I understand that without that change there was a relcache leak, but I think that just means that bug needs to be found and fixed.

I would also like to see some more discussion of the basic mechanism of this patch. Essentially, what it's trying to do is traverse the inheritance hierarchy looking for a table whose constraints match the current tuple, and then insert the tuple there. First - is this a good idea? It embeds some assumptions about how inheritance hierarchies are set up which don't seem totally unreasonable, but even so I'm not sure we want to go that route. Second - in lieu of accepting this approach, do we want to wait for Itagaki Takahiro's partitioning syntax patch to go in (as I am hoping that it will) and then do something more structured based on the notation introduced there?

One thing that biases me toward thinking that maybe we should wait is the fact that this patch relies on an MRU list to determine into which child table a particular tuple should be inserted. If the constraints on the child tables are not mutually exclusive, the tuple routing won't be deterministic, which seems undesirable to me. On the other hand, if we got rid of the MRU cache and made the order deterministic (say, alphabetical by partition name) then I'm guessing this would be quite slow for large numbers of partitions when most of the tuples need to go into the later partitions.

...Robert
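For illustration, making the probe order deterministic could be as simple as sorting the child OIDs by relation name before scanning them. This is a rough sketch only (the array name is invented, and get_rel_name() can return NULL for a dropped relation, which a real implementation would have to handle):

/* Hypothetical comparator: order child-table OIDs alphabetically by name */
static int
child_name_cmp(const void *a, const void *b)
{
	return strcmp(get_rel_name(*(const Oid *) a),
				  get_rel_name(*(const Oid *) b));
}

/* after collecting the child OIDs from pg_inherits into an array: */
qsort(child_oids, nchildren, sizeof(Oid), child_name_cmp);

/* probing the children in this fixed order makes routing deterministic
 * even when child constraints overlap, at the cost Robert describes when
 * most tuples belong to partitions late in the order */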
On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote: > Hi, > >> I have extracted the partitioning option for COPY (removed the error > >> logging part) from the previous patch. > >> > > > > We can use an INSERT trigger to route tuples into partitions even now. > > Why do you need an additional router for COPY? > Tom has already explained on the list why using a trigger was a bad idea > (and I know we can use a trigger since I am the one who wrote it). > If you look at the code you will see that you can do optimizations in > the COPY code that you cannot do in the trigger. > > > Also, it would be nicer > > that the router can works not only in COPY but also in INSERT. > > > As 8.5 will at best provide a syntactic hack on top of the existing > constraint implementation, I think that it will not hurt to have routing > in COPY since we will not have it anywhere otherwise. > > BTW, I'm working on meta data of partitioning now. Your "partitioning" > > option in COPY could be replaced with the catalog. > > > This implementation is only for the current 8.5 and it will not be > needed anymore once we get a fully functional partitioning in Postgres > which seems to be for a future version. Yes, the trigger way of doing this is a bad way. I regret to say that the way proposed here isn't much better, AFAICS. Let me explain why I think that, but -1 to anyone applying this patch. This patch proposes keeping a cache of last visited partitions to reduce the overhead of data routing. What I've requested is that partitioning work by using a data structure held in relcache for inheritance parents. This differs in 3 ways from this patch a) it has a clearly defined location for the cached metadata, with clearly identified and well worked out mechanisms for cache invalidation b) the cache can be built once when it is first needed, not slowly grown as parts of the metadata are used c) it would be available for all parts of the server, not just COPY. The easiest way to build that metadata is when structured partitioning info is available. i.e. the best next action is to complete and commit Itagaki's partitioning syntax patch. Then we can easily build the metadata for partitioning, which can then be used in COPY for data routing. Anyway, I want data routing, as is the intention of this patch. I just don't think this patch is a useful way to do it. It is too narrow in its scope and potentially buggy in its approach to developing a cache and using trigger-like stuff. ISTM that with the right metadata in the right place, a cleaner and easier solution is still possible for 8.5. The code within COPY should really just reduce to a small piece of code to derive the correct relation for the desired row and then use that during heap_insert(). I have just discussed partitioning with Itagaki-san at JPUG, so I know his plans. Itagaki-san and Manu, please can you work together to make this work for 8.5? --- A more detailed explanation of Partitioning Metadata: Partitioning Metadata is information held on the relcache for a table that has child partitions. Currently, a table does not cache info about its children, which prevents various optimisations. We would have an extra pointer on the Relation struct that points to a PartitioningMetadata struct. We can fill in this information when we construct the relcache for a relation, or we can populate it on demand the first time we attempt to use that information (if it exists). We want to hold an array of partition boundary values. 
This will then allow us to use bsearch to find the partition that a specific value applies to. Thus it can be used for routing data from INSERTs or COPY, and for identifying which partitions need to be included/excluded from an APPEND node. Using this will be O(logN) rather than O(N), allowing us to have a much larger number of partitions when required. Note that it can also be used within the executor to perform dynamic partition elimination, thus allowing us to easily implement partition-aware joins etc. To construct the array we must sort the partition boundary values and prove that the partition definitions do not overlap. That is much easier to do when the partitions are explicitly defined. (Plus, there is no requirement to have, or mechanism to specify, unique partitions currently, although most users assume this in their usage). I imagine we would have an API called something like RelationIdentifyPartition() where we provide value(s) for the PartitioningKey column(s) and it then returns the Oid of the partition that holds that value. That function would build the metadata, if not already cached, then bsearch it to provide the Oid. -- Simon Riggs www.2ndQuadrant.com
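As a hedged illustration of the prerequisite (names invented): boundary values can only be sorted into an array and binary-searched when the ranges are explicit, half-open, and provably non-overlapping, for example:

    -- Illustration only; names are invented. Sorting the upper bounds
    -- yields an array suitable for bsearch:
    --   '2009-04-01' -> sales_2009q1, '2009-07-01' -> sales_2009q2
    CREATE TABLE sales (sale_date date, amount numeric);
    CREATE TABLE sales_2009q1
        (CHECK (sale_date >= DATE '2009-01-01' AND sale_date < DATE '2009-04-01'))
        INHERITS (sales);
    CREATE TABLE sales_2009q2
        (CHECK (sale_date >= DATE '2009-04-01' AND sale_date < DATE '2009-07-01'))
        INHERITS (sales);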
Simon, I think you should read the thread and the patch before making any false statements like you did in your email. 1. The patch does not use any trigger for routing. 2. This is just an option for COPY that is useful for loading operations in the datawarehouse world. It is not meant to implement full partitioning, as explained many times already in this thread. 3. This patch elaborates on existing mechanisms and cannot rely on a meta-data representation of partitions which does not exist yet and will probably not exist in 8.5. You should justify your statements when you say 'potentially buggy in its approach to developing a cache and using trigger-like stuff'. I understand that you don't like it because this is not what you want but this is not my fault. This is not an implementation of partitioning, just like COPY does not do update/delete/alter/... And yes, the use case is 'narrow', like any option in COPY. It is like complaining that the CSV option is not useful because you want to load binary dumps. If Itagaki gets the support of the community to get his implementation accepted, I will gladly use it. Contributing? If Aster is willing to contribute a code monkey to implement your specs, why not, but you will have to convince them. You should really think twice about the style of your emails that cast a detestable tone to discussions on pg-hackers. Emmanuel > On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote: > >> Hi, >> >>>> I have extracted the partitioning option for COPY (removed the error >>>> logging part) from the previous patch. >>>> >>>> >>> We can use an INSERT trigger to route tuples into partitions even now. >>> Why do you need an additional router for COPY? >>> >> Tom has already explained on the list why using a trigger was a bad idea >> (and I know we can use a trigger since I am the one who wrote it). >> If you look at the code you will see that you can do optimizations in >> the COPY code that you cannot do in the trigger. >> >> >>> Also, it would be nicer >>> that the router can works not only in COPY but also in INSERT. >>> >>> >> As 8.5 will at best provide a syntactic hack on top of the existing >> constraint implementation, I think that it will not hurt to have routing >> in COPY since we will not have it anywhere otherwise. >> >>> BTW, I'm working on meta data of partitioning now. Your "partitioning" >>> option in COPY could be replaced with the catalog. >>> >>> >> This implementation is only for the current 8.5 and it will not be >> needed anymore once we get a fully functional partitioning in Postgres >> which seems to be for a future version. >> > > Yes, the trigger way of doing this is a bad way. > > I regret to say that the way proposed here isn't much better, AFAICS. > Let me explain why I think that, but -1 to anyone applying this patch. > > This patch proposes keeping a cache of last visited partitions to reduce > the overhead of data routing. > > What I've requested is that partitioning work by using a data structure > held in relcache for inheritance parents. This differs in 3 ways from > this patch > a) it has a clearly defined location for the cached metadata, with > clearly identified and well worked out mechanisms for cache invalidation > b) the cache can be built once when it is first needed, not slowly grown > as parts of the metadata are used > c) it would be available for all parts of the server, not just COPY. > > The easiest way to build that metadata is when structured partitioning > info is available. i.e.
the best next action is to complete and commit > Itagaki's partitioning syntax patch. Then we can easily build the > metadata for partitioning, which can then be used in COPY for data > routing. > > Anyway, I want data routing, as is the intention of this patch. I just > don't think this patch is a useful way to do it. It is too narrow in its > scope and potentially buggy in its approach to developing a cache and > using trigger-like stuff. > > ISTM that with the right metadata in the right place, a cleaner and > easier solution is still possible for 8.5. The code within COPY should > really just reduce to a small piece of code to derive the correct > relation for the desired row and then use that during heap_insert(). > > I have just discussed partitioning with Itagaki-san at JPUG, so I know > his plans. Itagaki-san and Manu, please can you work together to make > this work for 8.5? > > --- > > A more detailed explanation of Partitioning Metadata: > > Partitioning Metadata is information held on the relcache for a table > that has child partitions. Currently, a table does not cache info about > its children, which prevents various optimisations. > > We would have an extra pointer on the Relation struct that points to a > PartitioningMetadata struct. We can fill in this information when we > construct the relcache for a relation, or we can populate it on demand > the first time we attempt to use that information (if it exists). > > We want to hold an array of partition boundary values. This will then > allow us to use bsearch to find the partition that a specific value > applies to. Thus it can be used for routing data from INSERTs or COPY, > and for identifying which partitions need to be > included/excluded from an APPEND node. Using this will be O(logN) rather > than O(N), allowing us to have a much larger number of partitions when > required. Note that it can also be used within the executor to perform > dynamic partition elimination, thus allowing us to easily implement > partition-aware joins etc. > > To construct the array we must sort the partition boundary values and > prove that the partition definitions do not overlap. That is much easier > to do when the partitions are explicitly defined. (Plus, there is no > requirement to have, or mechanism to specify, unique partitions > currently, although most users assume this in their usage). > > I imagine we would have an API called something like > RelationIdentifyPartition() where we provide value(s) for the > PartitioningKey column(s) and it then returns the Oid of the partition > that holds that value. That function would build the metadata, if not > already cached, then bsearch it to provide the Oid. > > -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet <manu@asterdata.com> wrote: > I think you should read the thread and the patch before making any false > statements like you did in your email. > > 1. The patch does not use any trigger for routing. Whoa, whoa! I don't think that Simon said that it did. But even if I am wrong and he did... > You should really think twice about the style of your emails that cast a > detestable tone to discussions on pg-hackers. ...I certainly don't think this comment is justified. This is a technical discussion about the best way of solving a certain problem, and I don't believe that any of the discussion up to this point has been anything other than civil. I can tell that you are frustrated that your patch is not getting the support you would like to see it get, but launching ad hominem attacks on Simon or anyone else is not going to help. ...Robert
Robert Haas wrote: > On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet <manu@asterdata.com> wrote: > >> I think you should read the thread and the patch before making any false >> statements like you did in your email. >> >> 1. The patch does not use any trigger for routing. >> > > Whoa, whoa! I don't think that Simon said that it did. But even if I > am wrong and he did... > Quote from Simon's email: "It is too narrow in its scope and potentially buggy in its approach to developing a cache and using trigger-like stuff." >> You should really think twice about the style of your emails that cast a >> detestable tone to discussions on pg-hackers. >> > ...I certainly don't think this comment is justified. This is a > technical discussion about the best way of solving a certain problem, > and I don't believe that any of the discussion up to this point has > been anything other than civil. I can tell that you are frustrated > that your patch is not getting the support you would like to see it > get, but launching ad hominem attacks on Simon or anyone else is not > going to help. We certainly don't live in the same civilization then. I am not frustrated if my patch does not get in because of technical considerations and I am happy so far with Jan's feedback that helped a lot. I think there is a misunderstanding between what Simon wants ('Anyway, I want data routing, as is the intention of this patch.') and what this patch is about. This patch is just supposed to load tuples in a hierarchy of tables as this is a recurrent use case in datawarehouse scenarios. It is not supposed to solve data routing in general (otherwise that would be integrated as standard in COPY and not as an option). But it looks like it is a waste of everybody's time to continue this discussion further. Just move the patch to the rejected patches and let's wait for Itagaki's implementation. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
On Mon, 2009-11-23 at 09:39 -0500, Emmanuel Cecchet wrote: > I think you should read the thread and the patch I did read the thread and patch in full before posting. My opinions are given to help you and the community towards a desirable common goal. I was unaware you were developing these ideas and so was unable to provide comments until now. My review of Kedar's patch in July did lay out in general terms a specific implementation route for future work on partitioning. I had thought I might not have made those comments clearly enough, so gave a more specific description of what I consider to be a more workable and general solution for caching and using partitioning metadata. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > I was unaware you were developing these ideas and so was unable to > provide comments until now. The first patch was published to this list on September 10 (almost 2.5 months ago) along with the wiki page describing the problem and the solution. What should I have done to raise awareness further? /E -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
On Mon, 2009-11-23 at 10:24 -0500, Emmanuel Cecchet wrote: > I think there is a misunderstanding between what Simon wants > ('Anyway, I want data routing, as is the intention of this patch.') and > what this patch is about. This patch is just supposed to load tuples in > a hierarchy of tables as this is a recurrent use case in datawarehouse > scenarios. It is not supposed to solve data routing in general > (otherwise that would be integrated as standard in COPY and not as an option). I have not misunderstood. You wish to solve a very specific problem, with very specific code. I've done that myself on occasion. My opinion is that we should solve many of the partitioning problems with one set of central, common code. If we do not do this we will need 3-4 times as much code, most of which will be similar and yet must be exactly the same. That alone is enough to block the patch's proposed method (IMHO). > But it looks like it is a waste of everybody's time to continue this > discussion further. Just move the patch to the rejected patches and > let's wait for Itagaki's implementation. The lack of discussion and design in this area has held back the last few patches, by various authors; we should learn from that. Also, working in isolation on narrow problems will not move us forwards as fast as if we all work together on pieces of the whole vision for partitioning. My piece was to think through how to link each of the different aspects of partitioning and to propose a solution. Please join with Itagaki to move this forwards - your further contributions will be valuable. -- Simon Riggs www.2ndQuadrant.com
On Mon, 2009-11-23 at 10:43 -0500, Emmanuel Cecchet wrote: > Simon Riggs wrote: > > I was unaware you were developing these ideas and so was unable to > > provide comments until now. > The first patch was published to this list on September 10 (almost 2.5 > months ago) along with the wiki page describing the problem and the > solution. > What should I have done to raise awareness further? ...Read my detailed comments in response to Kedar's patch and post comments on that thread to say you didn't agree with that proposal and that you were thinking of another way entirely. That was around 14 July, more than 4 months ago. ...Contact me personally when you saw that I hadn't responded to your later posts, knowing that I have a recent track record as a reviewer of partitioning patches. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs <simon@2ndQuadrant.com> writes: > Anyway, I want data routing, as is the intention of this patch. I just > don't think this patch is a useful way to do it. It is too narrow in its > scope and potentially buggy in its approach to developing a cache and > using trigger-like stuff. FWIW, I agree --- there are two really fundamental problems with this patch: * It only applies to COPY. You'd certainly want routing for INSERT as well. And it shouldn't be necessary to specify an option. * Building this type of infrastructure on top of independent, not guaranteed consistent table constraints is just throwing more work into a dead end. The patch is already full of special-case errors for possible inconsistency of the constraints, and I don't think it's bulletproof even so (what if someone is altering the constraints concurrently? What if there's more than one legal destination?) And the performance necessarily sucks. What we need first is an explicit representation of partitioning, and then to build routing code on top of that. I haven't looked at Itagaki-san's syntax patch at all, but I think it's at least starting in a sensible place. regards, tom lane
Simon Riggs wrote: > ...Read my detailed comments in response to Kedar's patch and post > comments on that thread to say you didn't agree with that proposal and > that you were thinking of another way entirely. Useful background here is: http://wiki.postgresql.org/wiki/Table_partitioning http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php http://archives.postgresql.org/message-id/bd8134a40906080702s96c90a9q3bbb581b9bd0d5d7@mail.gmail.com http://archives.postgresql.org/message-id/1247564358.11347.1308.camel@ebony.2ndQuadrant The basic problem here is that Emmanuel and Aster developed a useful answer to one of the more pressing implementation details needed here, but did so without being involved in the much larger discussion of how to implement general, more automated partitioning in PostgreSQL that (as you can see from the date of the first links there) has been going on for years already. What we did wrong as a community was not to tell Emmanuel the above more explicitly when he first submitted code a few months ago, before he'd invested more time on a subset implementation that was unlikely to be committed. As I already commented upthread, I was just happy to see coding progress being made on part of the design that nobody had hacked on before to my knowledge; I didn't consider then how Emmanuel was going to be disappointed by the slow rate at which that code would be assimilated into the design work going on in this area. What would probably be helpful here is to take the mess of raw data above and turn it into a simpler partitioning roadmap. There's a stack of useful patches here, multiple contributors who have gotten familiar with the implementation details required, and enough time left that it's possible to pull something together in time for 8.5--but only if everyone is clear on exactly what direction to push toward. I'm going to reread the history here myself and see if I can write something helpful. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
On Mon, 2009-11-23 at 12:23 -0500, Greg Smith wrote: > What would probably be helpful here is to take the mess of raw data > above and turn it into a simpler partitioning roadmap. Thanks for summarising. I briefly tried to do that on the thread for Itagaki-san's patch. That's a first stab at things, at least. -- Simon Riggs www.2ndQuadrant.com
Tom Lane <tgl@sss.pgh.pa.us> wrote: > What we need first is an explicit representation of partitioning, and > then to build routing code on top of that. I haven't looked at > Itagaki-san's syntax patch at all, but I think it's at least starting > in a sensible place. I have the following development plan for partitioning. I'll continue to use inherits-based partitioning... at least in 8.5. 8.5 Alpha 3: Syntax and catalog changes (on-disk structure). I think pg_dump is the biggest stopper in this phase. 8.5 Alpha 4: Internal representation (in-memory structure), that will replace insert-triggers first, and also replace CHECK constraints if possible (but probably non-INSERT optimizations will slide to 8.6). The internal representation of RANGE partitions will be an array of pairs of { upper-value, partition-relid } for each parent table. The insert target partition is determined using a binary search on insert. It will be faster than sequential checks of CHECK constraints, especially with a large number of child tables. The array will be kept in CacheMemoryContext or query context to reduce access to the system catalog. RelationData or TupleDesc will have an additional field for it. > * It only applies to COPY. You'd certainly want routing for INSERT as > well. And it shouldn't be necessary to specify an option. Sure. We need the routing in both INSERT and COPY. Even if Emmanuel-san's patch were committed in Alpha 3, the code would be discarded in Alpha 4. > * Building this type of infrastructure on top of independent, not > guaranteed consistent table constraints is just throwing more work > into a dead end. I think the current approach is not necessarily wrong for CHECK-based partitioning, but I'd like to have more specialized or generalized functionality for the replacement of triggers. If we take the specialized approach, triggers will be replaced with a built-in feature. We can only use RANGE and LIST partitions. On the other hand, it might be interesting to take some generalized approach; for example, splitting BEFORE INSERT triggers into 3 phases: 1. Can cancel the insert and modify the new tuple. 2. Can cancel the insert, but cannot modify the tuple. 3. Can neither cancel nor modify. We call triggers in numbered order. INSERT triggers would be implemented in the 2nd phase, so we're not afraid of modifying partition keys. (The 3rd phase would be used for replication triggers.) However, I think the generalized one is overkill; a specialized approach would be enough. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
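For contrast, this is the kind of hand-written routing trigger (a sketch with invented table names, following the pattern in the manual's partitioning documentation) that the proposed built-in representation would replace; the linear IF/ELSIF chain is exactly what the sorted { upper-value, partition-relid } array would turn into a binary search:

    -- Illustration only; table names are invented.
    CREATE FUNCTION measurement_insert_trigger() RETURNS trigger AS $$
    BEGIN
        IF NEW.logdate < DATE '2009-07-01' THEN
            INSERT INTO measurement_2009h1 VALUES (NEW.*);
        ELSIF NEW.logdate < DATE '2010-01-01' THEN
            INSERT INTO measurement_2009h2 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'logdate out of range';
        END IF;
        RETURN NULL;  -- suppress the insert into the parent itself
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER insert_measurement_trigger
        BEFORE INSERT ON measurement
        FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();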
Hi, >> What would probably be helpful here is to take the mess of raw data >> above and turn it into a simpler partitioning roadmap. > > Thanks for summarising. > Yeah, excellent summary, Greg. As you rightly pointed out, partitioning needs a broad roadmap so that the community can contribute in unison. That way we can avoid a repeat of decent efforts like Manu's bearing no fruit because of the prevailing confusion today. > I briefly tried to do that on the thread for Itagaki-san's patch. That's > a first stab at things, at least. +1. Itagaki-san's patch seems like a firm foot forward. Regards, Nikhils -- http://www.enterprisedb.com
On Mon, 2009-11-23 at 10:24 -0500, Emmanuel Cecchet wrote: > But it looks like it is a waste of everybody's time to continue this > discussion further. Just move the patch to the rejected patches and > let's wait for Itagaki's implementation. Emmanuel, please try to work together with Itagaki-san on getting the bigger vision implemented, as this is a thing that can benefit a lot from more people who have taken the time to learn about the parts of the code involved. Even though this patch will not get in, most of the effort in developing it is not actual coding, but familiarizing yourself with the other code involved. Coding actual patches should be easy once you know the code _and_ the desired result. You probably already know a lot of what is required to help us toward the common goal of a clean implementation of partitioning. > -- > Emmanuel Cecchet > Aster Data > Web: http://www.asterdata.com > > -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Hannu Krosing <hannu@2ndQuadrant.com> wrote: > Even though this patch will not get in, most of the effort in developing > it is not actual coding, but familiarizing yourself with the other code > involved. I just edited a wiki page for this discussion. I hope it can be a help. http://wiki.postgresql.org/wiki/Table_partitioning Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
On Tue, 2009-11-24 at 17:30 +0900, Itagaki Takahiro wrote: > Hannu Krosing <hannu@2ndQuadrant.com> wrote: > > > Even though this patch will not get in, most of the effort in developing > > it is not actual coding, but familiarizing yourself with the other code > > involved. > > I just edited a wiki page for this discussion. > I hope it can be a help. > http://wiki.postgresql.org/wiki/Table_partitioning > Good job. Looks like a clear path forwards to me. I've made a couple of minor clarifications. -- Simon Riggs www.2ndQuadrant.com
Itagaki Takahiro wrote: > I just edited a wiki page for this discussion. > I hope it can be a help. > http://wiki.postgresql.org/wiki/Table_partitioning > I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing. In the case of COPY, an after statement trigger could change all the routing decisions taken for each row. I am not sure what the semantics should be if you have triggers defined on the parent and child tables. Which triggers do you fire if the insert is on the parent table but the tuple ends up in a child table? If the new implementation hides the child tables, it might be safer to not allow triggers on child tables altogether and use the parent table as the single point of entry to access the partition (and define triggers). With the current proposed implementation, would it be possible to define a view using child tables? Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
Emmanuel Cecchet <manu@asterdata.com> wrote: > I guess the problem of handling user triggers is still open. > If we allow triggers on partitions, badly written logic could lead to > infinite loops in routing. Infinite loops are not a partition-related problem, no? We can also find infinite loops in user defined functions, recursive queries, etc. I think the only thing we can do for it is to *stop* loops instead of prevention, like max_stack_depth. > With the current proposed implementation, would it be > possible to define a view using child tables? No, if you mean using a partition-view. I'm thinking we are moving our implementation of partitioning from view-based to a built-in feature. Do you have any use-cases that require view-based partitioning? Was the inheritance-based partitioning not enough for it? Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Itagaki Takahiro wrote: > Emmanuel Cecchet <manu@asterdata.com> wrote: > > >> I guess the problem of handling user triggers is still open. >> If we allow triggers on partitions, badly written logic could lead to >> infinite loops in routing. >> > Infinite loops are not a partition-related problem, no? > We can also find infinite loops in user defined functions, > recursive queries, etc. I think the only thing we can do for it > is to *stop* loops instead of prevention, like max_stack_depth. > I was thinking of a trigger on child1 updating the partition key, forcing the tuple to move to child2, and then a trigger on child2 updating the key again to move the tuple back to child1. You end up with an infinite loop. >> With the current proposed implementation, would it be >> possible to define a view using child tables? >> > > No, if you mean using a partition-view. I'm thinking we are moving > our implementation of partitioning from view-based to a built-in feature. > Do you have any use-cases that require view-based partitioning? > Was the inheritance-based partitioning not enough for it? > Never mind, I was thinking about the implications of materialized views but Postgres does not have materialized views! I have other questions related to CREATE TABLE but I will post them in the 'syntax for partitioning' thread. Thanks Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
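A hedged sketch (all names invented) of the loop described above: each child's BEFORE ROW trigger rewrites the partition key into the other child's range, so any router that re-evaluates the constraints after the trigger fires would bounce the tuple between child1 and child2 indefinitely:

    -- Illustration only; names are invented.
    CREATE TABLE parent (key int);
    CREATE TABLE child1 (CHECK (key < 100))  INHERITS (parent);
    CREATE TABLE child2 (CHECK (key >= 100)) INHERITS (parent);

    CREATE FUNCTION bounce_up() RETURNS trigger AS $$
    BEGIN NEW.key := 150; RETURN NEW; END;   -- now belongs in child2
    $$ LANGUAGE plpgsql;

    CREATE FUNCTION bounce_down() RETURNS trigger AS $$
    BEGIN NEW.key := 50; RETURN NEW; END;    -- now belongs in child1
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER t1 BEFORE INSERT ON child1
        FOR EACH ROW EXECUTE PROCEDURE bounce_up();
    CREATE TRIGGER t2 BEFORE INSERT ON child2
        FOR EACH ROW EXECUTE PROCEDURE bounce_down();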
On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote: > Itagaki Takahiro wrote: > > I just edited a wiki page for this discussion. > > I hope it can be a help. > > http://wiki.postgresql.org/wiki/Table_partitioning > > > I guess the problem of handling user triggers is still open. > If we allow triggers on partitions, badly written logic could lead to > infinite loops in routing. In the case of COPY, an after statement > trigger could change all the routing decisions taken for each row. A simple update to the row can cause it to move between partitions, no? > I am not sure what the semantics should be if you have triggers defined on the > parent and child tables. Which triggers do you fire if the insert is on > the parent table but the tuple ends up in a child table? I'd propose that triggers on both parent table and selected child are executed. 1. first you execute before triggers on parent table, which may change which partition the row belongs to 2. then you execute before triggers on selected child table 2.1 if this changes the child table selection, repeat from 2. 3. save the tuple in child table 4. execute after triggers of the final selected child table 5. execute after triggers of parent table order of 4. and 5. is selected arbitrarily, others are determined by flow. > If the new implementation hides the child tables, If you hide child tables, you suddenly need a lot of new syntax to "unhide" them, so that partitions can be manipulated. Currently it is easy to do it with INHERIT / NO INHERIT. > it might be safer to > not allow triggers on child tables altogether and use the parent table > as the single point of entry to access the partition (and define > triggers). With the current proposed implementation, would it be > possible to define a view using child tables? the child tables are there, and they _are_ defined, either implicitly (using constraints, which "constraint exclusion" resolves to a set of child tables) or explicitly, using child table names directly. -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Hannu Krosing wrote: > On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote: > >> Itagaki Takahiro wrote: >> >>> I just edited a wiki page for this discussion. >>> I hope it can be a help. >>> http://wiki.postgresql.org/wiki/Table_partitioning >>> >>> >> I guess the problem of handling user triggers is still open. >> If we allow triggers on partitions, badly written logic could lead to >> infinite loops in routing. In the case of COPY, an after statement >> trigger could change all the routing decisions taken for each row. >> > > A simple update to the row can cause it to move between partitions, no? > Yes. >> I am not sure what the semantics should be if you have triggers defined on the >> parent and child tables. Which triggers do you fire if the insert is on >> the parent table but the tuple ends up in a child table? >> > > I'd propose that triggers on both parent table and selected child are > executed. > > 1. first you execute before triggers on parent table, which may > change which partition the row belongs to > > 2. then you execute before triggers on selected child table > > 2.1 if this changes the child table selection, repeat from 2. > > 3. save the tuple in child table > > 4. execute after triggers of the final selected child table > What if that trigger changes again the child table selection? > 5. execute after triggers of parent table > Same here, what if the trigger changes the child table selection? Do we re-execute triggers on the new child table? Also it is debatable whether we should execute an after trigger on a table where nothing was really inserted. > order of 4. and 5. is selected arbitrarily, others are determined by > flow. > Also the description omits the execution of before and after statement triggers. While those can apply to the parent table (but the same question about what happens if the after statement modifies routing decision still applies), what does it mean in the case of COPY to have statement triggers on the child tables? You cannot know in advance where the tuples are going to go and fire the before statement triggers. If you had to fire after statement triggers, in which order would you fire them? >> If the new implementation hides the child tables, >> > > If you hide child tables, you suddenly need a lot of new syntax to > "unhide" them, so that partitions can be manipulated. Currently it is > easy to do it with INHERIT / NO INHERIT. > Agreed, but I think that we will discover some restrictions that will apply to child tables. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > I'd propose that triggers on both parent table and selected child are > executed. I was thinking we should make the partitioning decision FIRST, before any triggers are fired, and then fire only those triggers relevant to the selected partition. If the BEFORE triggers on the partition modify the tuple in a way that makes it incompatible with the table constraints on that partition, the insert (or update) fails. Firing triggers on more than one table is pretty substantially incompatible with what we do elsewhere and I'm not clear what we get in exchange. What is the use case for this? ...Robert
Robert Haas wrote: > On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > >> I'd propose that triggers on both parent table and selected child are >> executed. >> > > I was thinking we should make the partitioning decision FIRST, before > any triggers are fired, and then fire only those triggers relevant to > the selected partition. If the BEFORE triggers on the partition > modify the tuple in a way that makes it incompatible with the table > constraints on that partition, the insert (or update) fails. > > Firing triggers on more than one table is pretty substantially > incompatible with what we do elsewhere and I'm not clear what we get > in exchange. What is the use case for this? > I don't have a use case for this but I was puzzled with that when I had to implement trigger support in COPY with partitioning. I came to the same conclusion as you and made the operation fail if the trigger was trying to move the tuple to another partition. However, I had a problem with after row triggers that I had to call synchronously to be able to detect the change. We will need something to tell us that an after row trigger did not mess with the routing decision. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
On Wed, Nov 25, 2009 at 9:21 AM, Emmanuel Cecchet <manu@asterdata.com> wrote: > Robert Haas wrote: >> On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing <hannu@2ndquadrant.com> >> wrote: >>> >>> I'd propose that triggers on both parent table and selected child are >>> executed. >> >> I was thinking we should make the partitioning decision FIRST, before >> any triggers are fired, and then fire only those triggers relevant to >> the selected partition. If the BEFORE triggers on the partition >> modify the tuple in a way that makes it incompatible with the table >> constraints on that partition, the insert (or update) fails. >> >> Firing triggers on more than one table is pretty substantially >> incompatible with what we do elsewhere and I'm not clear what we get >> in exchange. What is the use case for this? > > I don't have a use case for this but I was puzzled with that when I had to > implement trigger support in COPY with partitioning. > I came to the same conclusion as you and made the operation fail if the > trigger was trying to move the tuple to another partition. However, I had a > problem with after row triggers that I had to call synchronously to be able > to detect the change. We will need something to tell us that an after row > trigger did not mess with the routing decision. *scratches head* I'm confused. Only a BEFORE ROW trigger can possibly change anything... the return value of an AFTER ROW trigger is ignored. ...Robert
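To illustrate the distinction (names invented): a BEFORE ROW trigger can substitute a modified tuple, and thereby affect routing, by returning a changed NEW, whereas whatever an AFTER ROW trigger returns is discarded:

    -- Illustration only; names are invented.
    CREATE TABLE parent (part_key int);

    CREATE FUNCTION clamp_key() RETURNS trigger AS $$
    BEGIN
        NEW.part_key := least(NEW.part_key, 100);  -- may change the target partition
        RETURN NEW;  -- the replaced tuple is what gets stored and routed
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER clamp_before BEFORE INSERT ON parent
        FOR EACH ROW EXECUTE PROCEDURE clamp_key();

    -- An AFTER ROW trigger sees the row only after it has been inserted;
    -- its return value is ignored, so it cannot redirect the tuple.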
It seems like the easiest way to resolve this without weird corner cases is to say that we fire triggers belonging to the parent table. The individual partition child tables either shouldn't have triggers at all, or we should restrict the cases in which those are considered applicable. As an example, what are you going to do with statement-level triggers? Fire them for *every* child whether it receives a row or not? Doesn't seem like the right thing. Again, this solution presupposes an explicit concept of partitioned tables within the system... regards, tom lane
On Wed, 2009-11-25 at 08:39 -0500, Emmanuel Cecchet wrote: > Hannu Krosing wrote: > > On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote: > > > >> Itagaki Takahiro wrote: > >> > >>> I just edited a wiki page for this discussion. > >>> I hope it can be a help. > >>> http://wiki.postgresql.org/wiki/Table_partitioning > >>> > >>> > >> I guess the problem of handling user triggers is still open. > >> If we allow triggers on partitions, badly written logic could lead to > >> infinite loops in routing. In the case of COPY, an after statement > >> trigger could change all the routing decisions taken for each row. > >> > > > > A simple update to the row can cause it to move between partitions, no? > > > Yes. > >> I am not sure what the semantics should be if you have triggers defined on the > >> parent and child tables. Which triggers do you fire if the insert is on > >> the parent table but the tuple ends up in a child table? > >> > > > > I'd propose that triggers on both parent table and selected child are > > executed. > > > > 1. first you execute before triggers on parent table, which may > > change which partition the row belongs to > > > > 2. then you execute before triggers on selected child table > > > > 2.1 if this changes the child table selection, repeat from 2. > > > > 3. save the tuple in child table > > > > 4. execute after triggers of the final selected child table > > > What if that trigger changes again the child table selection? > > 5. execute after triggers of parent table > > > Same here, what if the trigger changes the child table selection? Do we > re-execute triggers on the new child table? After triggers can't change the tuple, thus can't change routing. > Also it is debatable whether we should execute an after trigger on a > table where nothing was really inserted. > > order of 4. and 5. is selected arbitrarily, others are determined by > > flow. > > > Also the description omits the execution of before and after statement > triggers. While those can apply to the parent table (but the same > question about what happens if the after statement modifies routing > decision still applies), what does it mean in the case of COPY to have > statement triggers on the child tables? What statement triggers do you mean? I don't think we have ON COPY triggers? > You cannot know in advance where > the tuples are going to go and fire the before statement triggers. If > you had to fire after statement triggers, in which order would you fire > them? > >> If the new implementation hides the child tables, > >> > > > > If you hide child tables, you suddenly need a lot of new syntax to > > "unhide" them, so that partitions can be manipulated. Currently it is > > easy to do it with INHERIT / NO INHERIT. > > > Agreed, but I think that we will discover some restrictions that will > apply to child tables. I think we should keep the possibility to populate partitions offline and then plug them into the table as partitions (current INHERIT) and also to extract a partition into a separate table (NO INHERIT). -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
On Wed, 2009-11-25 at 11:30 -0500, Tom Lane wrote: > It seems like the easiest way to resolve this without weird corner > cases is to say that we fire triggers belonging to the parent table. > The individual partition child tables either shouldn't have triggers > at all, or we should restrict the cases in which those are considered > applicable. Agreed. Maybe allow only ROW-level AFTER triggers (for logging late arrivals and updates on tables partitioned on time, for example). > As an example, what are you going to do with statement-level triggers? > Fire them for *every* child whether it receives a row or not? Doesn't > seem like the right thing. > > Again, this solution presupposes an explicit concept of partitioned > tables within the system... For explicitly partitioned tables with hidden partitions it is of course best to not add extra effort for allowing triggers to be defined on those (hidden) partitions. If the partition tables are visible, some trigger support would be good. > regards, tom lane -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
On Wed, Nov 25, 2009 at 11:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It seems like the easiest way to resolve this without weird corner > cases is to say that we fire triggers belonging to the parent table. > The individual partition child tables either shouldn't have triggers > at all, or we should restrict the cases in which those are considered > applicable. > > As an example, what are you going to do with statement-level triggers? > Fire them for *every* child whether it receives a row or not? Doesn't > seem like the right thing. Just the tables that get a row? I don't know, your way may be best, but it seems like triggers on individual partitions might be useful in some situations. > Again, this solution presupposes an explicit concept of partitioned > tables within the system... ...Robert
Hannu Krosing wrote: > After triggers can't change the tuple, thus can't change routing. An after trigger can always issue an update of the tuple, but that should be trapped by the regular mechanism that will deal with updates (when we have it available). >> Also the description omits the execution of before and after statement triggers. While those can apply to the parent table (but the same question about what happens if the after statement modifies routing decision still applies), what does it mean in the case of COPY to have statement triggers on the child tables? > What statement triggers do you mean? I don't think we have ON COPY triggers? I mean CREATE TRIGGER name { BEFORE | AFTER } event ON table FOR EACH STATEMENT. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com