Thread: Subscription tests fail under CLOBBER_CACHE_ALWAYS
I discovered $SUBJECT after wondering why hyrax hadn't reported
in recently, and trying to run check-world under CCA to see if
anything got stuck. Indeed it did --- although this doesn't
explain the radio silence from hyrax, because that animal doesn't
run any TAP tests. (Neither does avocet, which I think is the
only other active CCA critter. So this could have been broken
for a very long time.)

I count three distinct bugs that were exposed by this attempt:

1. In the part of 013_partition.pl that tests firing AFTER
triggers on partitioned tables, we have a case of continuing
to access a relcache entry that's already been closed.
(I'm not quite sure why prion's -DRELCACHE_FORCE_RELEASE
hasn't exposed this.) It looks to me like instead we had
a relcache reference leak before f3b141c48, but now, the
only relcache reference count on a partition child table
is dropped by ExecCleanupTupleRouting, which logical/worker.c
invokes before it fires triggers on that table. Kaboom.
This might go away if worker.c weren't so creatively different
from the other code paths concerned with executor shutdown.

2. Said bug causes a segfault in the apply worker process.
This causes the parent postmaster to give up and die.
I don't understand why we don't treat that like a crash
in a regular backend, considering that an apply worker
is running largely user-defined code.

3. Once the subscriber1 postmaster has exited, the TAP
test will eventually time out, and then this happens:

timed out waiting for catchup at t/013_partition.pl line 219.
### Stopping node "publisher" using mode immediate
# Running: pg_ctl -D /Users/tgl/pgsql/src/test/subscription/tmp_check/t_013_partition_publisher_data/pgdata -m immediate stop
waiting for server to shut down.... done
server stopped
# No postmaster PID for node "publisher"
### Stopping node "subscriber1" using mode immediate
# Running: pg_ctl -D /Users/tgl/pgsql/src/test/subscription/tmp_check/t_013_partition_subscriber1_data/pgdata -m immediate stop
pg_ctl: PID file "/Users/tgl/pgsql/src/test/subscription/tmp_check/t_013_partition_subscriber1_data/pgdata/postmaster.pid" does not exist
Is server running?
Bail out! system pg_ctl failed

That is, because we failed to shut down subscriber1, the
test script neglects to shut down subscriber2, and now
things just sit indefinitely. So that's a robustness
problem in the TAP infrastructure, rather than a bug in
PG proper; but I still say it's a bug that needs fixing.

			regards, tom lane
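To make the ordering in bug #1 concrete, the call sequence for a change
applied to a partitioned table currently looks roughly like this (a condensed
sketch, not the literal worker.c code; only the function names are real,
everything else is elided):

    AfterTriggerBeginQuery();                 /* in create_estate_for_relation() */

    /* route the tuple and apply the change to the partition child;
     * any AFTER ROW triggers on that child are only queued here */

    ExecCleanupTupleRouting(mtstate, proute); /* at the end of
                                               * apply_handle_tuple_routing():
                                               * closes the child rel, dropping
                                               * the only relcache refcount on it */

    AfterTriggerEndQuery(estate);             /* in finish_estate(): fires the
                                               * queued AFTER triggers, which go
                                               * on to access the already-closed
                                               * child relation */

The fixes discussed downthread all amount to making AfterTriggerEndQuery()
run before ExecCleanupTupleRouting().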
On Tue, May 18, 2021 at 07:42:08PM -0400, Tom Lane wrote:
> I count three distinct bugs that were exposed by this attempt:
>
> 1. In the part of 013_partition.pl that tests firing AFTER
> triggers on partitioned tables, we have a case of continuing
> to access a relcache entry that's already been closed.
> (I'm not quite sure why prion's -DRELCACHE_FORCE_RELEASE
> hasn't exposed this.) It looks to me like instead we had
> a relcache reference leak before f3b141c48, but now, the
> only relcache reference count on a partition child table
> is dropped by ExecCleanupTupleRouting, which logical/worker.c
> invokes before it fires triggers on that table. Kaboom.
> This might go away if worker.c weren't so creatively different
> from the other code paths concerned with executor shutdown.

The tuple routing has made the whole worker logic messier by a larger
degree compared to when this stuff was only able to apply DML changes
on the partition leaves. I know that it is not that great to be more
creative here, but we need to make sure that AfterTriggerEndQuery() is
moved before ExecCleanupTupleRouting(). We could either keep the
ExecCleanupTupleRouting() calls as they are now, and call
AfterTriggerEndQuery() in more code paths. Or we could have one
PartitionTupleRouting and one ModifyTableState created beforehand
in create_estate_for_relation() if applying the change on a
partitioned table but that means manipulating more structures across
more layers of this code. Something like the attached fixes the
problem for me, but honestly it does not help in clarifying this code
more. I was not patient enough to wait for CLOBBER_CACHE_ALWAYS to
initialize the nodes in the TAP tests, so I have tested that with a
setup initialized with a non-clobber build, and reproduced the problem
with CLOBBER_CACHE_ALWAYS builds on those same nodes.

You are right that this is not a problem of 14~. I can reproduce the
problem on 13 as well, and we have no coverage for tuple routing with
triggers on this branch, so this would never have been stressed in the
buildfarm. There is a good argument to be made here in cherry-picking
2ecfeda3 to REL_13_STABLE.

> 2. Said bug causes a segfault in the apply worker process.
> This causes the parent postmaster to give up and die.
> I don't understand why we don't treat that like a crash
> in a regular backend, considering that an apply worker
> is running largely user-defined code.

CleanupBackgroundWorker() and CleanupBackend() have a lot in common.
Are you referring to an inconsistent behavior with restart_after_crash
that gets ignored for bgworkers? We disable it by default in the TAP
tests.

> 3. Once the subscriber1 postmaster has exited, the TAP
> test will eventually time out, and then this happens:
>
> [.. logs ..]
>
> That is, because we failed to shut down subscriber1, the
> test script neglects to shut down subscriber2, and now
> things just sit indefinitely. So that's a robustness
> problem in the TAP infrastructure, rather than a bug in
> PG proper; but I still say it's a bug that needs fixing.

This one comes down to teardown_node() using system_or_bail(), which
leaves things unfinished. I guess that we could be more aggressive in
the END block of PostgresNode.pm and ignore a failure with a non-zero
error code when not all the tests have passed.
--
Michael
On Wed, May 19, 2021 at 12:04 PM Michael Paquier <michael@paquier.xyz> wrote: > On Tue, May 18, 2021 at 07:42:08PM -0400, Tom Lane wrote: > > I count three distinct bugs that were exposed by this attempt: > > > > 1. In the part of 013_partition.pl that tests firing AFTER > > triggers on partitioned tables, we have a case of continuing > > to access a relcache entry that's already been closed. > > (I'm not quite sure why prion's -DRELCACHE_FORCE_RELEASE > > hasn't exposed this.) It looks to me like instead we had > > a relcache reference leak before f3b141c48, but now, the > > only relcache reference count on a partition child table > > is dropped by ExecCleanupTupleRouting, which logical/worker.c > > invokes before it fires triggers on that table. Kaboom. Oops. > > This might go away if worker.c weren't so creatively different > > from the other code paths concerned with executor shutdown. > > The tuple routing has made the whole worker logic messier by a larger > degree compared to when this stuff was only able to apply DMLs changes > on the partition leaves. I know that it is not that great to be more > creative here, but we need to make sure that AfterTriggerEndQuery() is > moved before ExecCleanupTupleRouting(). We could either keep the > ExecCleanupTupleRouting() calls as they are now, and call > AfterTriggerEndQuery() in more code paths. Yeah, that's what I thought to propose doing too. Your patch looks enough in that regard. Thanks for writing it. > Or we could have one > PartitionTupleRouting and one ModifyTableState created beforehand > in create_estate_for_relation() if applying the change on a > partitioned table but that means manipulating more structures across > more layers of this code. Yeah, that seems like too much change to me too. > Something like the attached fixes the > problem for me, but honestly it does not help in clarifying this code > more. I was not patient enough to wait for CLOBBER_CACHE_ALWAYS to > initialize the nodes in the TAP tests, so I have tested that with a > setup initialized with a non-clobber build, and reproduced the problem > with CLOBBER_CACHE_ALWAYS builds on those same nodes. I have checked the fix works with a CLOBBER_CACHE_ALWAYS build. > You are right that this is not a problem of 14~. I can reproduce the > problem on 13 as well, and we have no coverage for tuple routing with > triggers on this branch, so this would never have been stressed in the > buildfarm. There is a good argument to be made here in cherry-picking > 2ecfeda3 to REL_13_STABLE. +1 -- Amit Langote EDB: http://www.enterprisedb.com
Michael Paquier <michael@paquier.xyz> writes: > On Tue, May 18, 2021 at 07:42:08PM -0400, Tom Lane wrote: >> This might go away if worker.c weren't so creatively different >> from the other code paths concerned with executor shutdown. > The tuple routing has made the whole worker logic messier by a larger > degree compared to when this stuff was only able to apply DMLs changes > on the partition leaves. I know that it is not that great to be more > creative here, but we need to make sure that AfterTriggerEndQuery() is > moved before ExecCleanupTupleRouting(). We could either keep the > ExecCleanupTupleRouting() calls as they are now, and call > AfterTriggerEndQuery() in more code paths. Or we could have one > PartitionTupleRouting and one ModifyTableState created beforehand > in create_estate_for_relation() if applying the change on a > partitioned table but that means manipulating more structures across > more layers of this code. Something like the attached fixes the > problem for me, but honestly it does not help in clarifying this code > more. I was wondering if we could move the ExecCleanupTupleRouting call into finish_estate. copyfrom.c, for example, does that during its shutdown function. Compare also the worker.c changes proposed in https://www.postgresql.org/message-id/3362608.1621367104%40sss.pgh.pa.us which are because I discovered it's unsafe to pop the snapshot before running AfterTriggerEndQuery. regards, tom lane
On Tue, May 18, 2021 at 11:46:25PM -0400, Tom Lane wrote: > I was wondering if we could move the ExecCleanupTupleRouting call > into finish_estate. copyfrom.c, for example, does that during > its shutdown function. Compare also the worker.c changes proposed > in Yeah, the first patch I wrote for this thread was pushing out PopActiveSnapshot() into the finish() routine, but I really found the creation of the ModifyTableState stuff needed for a partitioned table done in create_estate_for_relation() to make the code more confusing, as that's only a piece needed for the tuple routing path. Saying that, minimizing calls to PopActiveSnapshot() and PushActiveSnapshot() is an improvement. That's why I settled into more calls to AfterTriggerEndQuery() in the 4 code paths of any apply (tuple routing + 3 DMLs). > https://www.postgresql.org/message-id/3362608.1621367104%40sss.pgh.pa.us > > which are because I discovered it's unsafe to pop the snapshot > before running AfterTriggerEndQuery. Didn't remember this one. This reminds me of something similar I did a couple of weeks ago for the worker code, similar to what you have here. Moving the snapshot push to be earlier, as your other patch is doing, was bringing a bit more sanity when it came to opening the indexes of the relation on which a change is applied as we need an active snapshot for predicates and expressions (aka ExecOpenIndices and ExecCloseIndices). -- Michael
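To spell out the snapshot scoping being discussed: the idea is to bracket the
whole per-change apply with a single push/pop, so that input functions, index
opening/closing (ExecOpenIndices/ExecCloseIndices) and any queued AFTER
triggers all run with an active snapshot. A rough sketch of that shape (a
simplification; the patches later in this thread implement it inside the
create/finish helpers for real):

    PushActiveSnapshot(GetTransactionSnapshot());  /* once, while setting up
                                                    * the per-change executor
                                                    * state */
    AfterTriggerBeginQuery();

    /* ... open indexes, apply the INSERT/UPDATE/DELETE, close indexes ... */

    AfterTriggerEndQuery(estate);                  /* may fire AFTER triggers,
                                                    * which also need the
                                                    * active snapshot */
    PopActiveSnapshot();                           /* only after the triggers
                                                    * have run */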
On Wed, May 19, 2021 at 9:54 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, May 18, 2021 at 11:46:25PM -0400, Tom Lane wrote: > > I was wondering if we could move the ExecCleanupTupleRouting call > > into finish_estate. copyfrom.c, for example, does that during > > its shutdown function. Compare also the worker.c changes proposed > > in > > Yeah, the first patch I wrote for this thread was pushing out > PopActiveSnapshot() into the finish() routine, but I really found the > creation of the ModifyTableState stuff needed for a partitioned table > done in create_estate_for_relation() to make the code more confusing, > as that's only a piece needed for the tuple routing path. > How about moving AfterTriggerEndQuery() to apply_handle_*_internal calls? That way, we might not even need to change Push/Pop calls. -- With Regards, Amit Kapila.
On Wed, May 19, 2021 at 10:26:28AM +0530, Amit Kapila wrote: > How about moving AfterTriggerEndQuery() to apply_handle_*_internal > calls? That way, we might not even need to change Push/Pop calls. Isn't that going to be a problem when a tuple is moved to a new partition in the tuple routing? This does a DELETE followed by an INSERT, but the operation is an UPDATE. -- Michael
On Wed, May 19, 2021 at 10:35 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, May 19, 2021 at 10:26:28AM +0530, Amit Kapila wrote: > > How about moving AfterTriggerEndQuery() to apply_handle_*_internal > > calls? That way, we might not even need to change Push/Pop calls. > > Isn't that going to be a problem when a tuple is moved to a new > partition in the tuple routing? > Right, it won't work. -- With Regards, Amit Kapila.
On Wed, May 19, 2021 at 2:05 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, May 19, 2021 at 10:26:28AM +0530, Amit Kapila wrote:
> > How about moving AfterTriggerEndQuery() to apply_handle_*_internal
> > calls? That way, we might not even need to change Push/Pop calls.
>
> Isn't that going to be a problem when a tuple is moved to a new
> partition in the tuple routing? This does a DELETE followed by an
> INSERT, but the operation is an UPDATE.

That indeed doesn't work. Once AfterTriggerEndQuery() has been called
for the DELETE from apply_handle_delete_internal(), the AFTER triggers
of the subsequent INSERT can't be processed, instead causing:

ERROR: AfterTriggerSaveEvent() called outside of query

IOW, the patch you posted earlier seems like the way to go.

--
Amit Langote
EDB: http://www.enterprisedb.com
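The constraint is that a replicated UPDATE whose new tuple lands in a
different partition is applied as a DELETE on the old partition followed by
an INSERT into the new one, all inside the single AfterTriggerBeginQuery() /
AfterTriggerEndQuery() pair set up for that change. Roughly (a simplified
sketch of the CMD_UPDATE branch of apply_handle_tuple_routing(), not the
literal code):

    /* the updated tuple belongs to a different partition */
    apply_handle_delete_internal(...);  /* queues AFTER DELETE ROW triggers
                                         * on the old partition */
    apply_handle_insert_internal(...);  /* queues AFTER INSERT ROW triggers
                                         * on the new partition */

    /* ... later, once per apply operation ... */
    AfterTriggerEndQuery(estate);       /* fires everything queued above */

If the DELETE helper had already called AfterTriggerEndQuery() itself, the
query-level trigger state would be gone by the time the INSERT tries to
queue its events, which is exactly the "AfterTriggerSaveEvent() called
outside of query" error shown above.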
On 5/19/21 1:42 AM, Tom Lane wrote: > I discovered $SUBJECT after wondering why hyrax hadn't reported > in recently, and trying to run check-world under CCA to see if > anything got stuck. Indeed it did --- although this doesn't > explain the radio silence from hyrax, because that animal doesn't > run any TAP tests. (Neither does avocet, which I think is the > only other active CCA critter. So this could have been broken > for a very long time.) > There are three CCA animals on the same box (avocet, husky and trilobite) with different compilers, running in a round-robin manner. One cycle took about 14 days, but about a month ago the machine got stuck, requiring a hard reboot about a week ago (no idea why it got stuck). It has more CPU power now (8 cores instead of 2), so it should take less time to run one test cycle. avocet already ran all the tests, husky is running HEAD at the moment and then it'll be trilobite's turn ... AFAICS none of those runs seems to have failed or got stuck so far. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 5/18/21 11:03 PM, Michael Paquier wrote:
>
>> 3. Once the subscriber1 postmaster has exited, the TAP
>> test will eventually time out, and then this happens:
>>
>> [.. logs ..]
>>
>> That is, because we failed to shut down subscriber1, the
>> test script neglects to shut down subscriber2, and now
>> things just sit indefinitely. So that's a robustness
>> problem in the TAP infrastructure, rather than a bug in
>> PG proper; but I still say it's a bug that needs fixing.
> This one comes down to teardown_node() that uses system_or_bail(),
> leaving things unfinished. I guess that we could be more aggressive
> and ignore failures if we have a non-zero error code and that not all
> the tests have passed within the END block of PostgresNode.pm.

Yeah, this area needs substantial improvement. I have seen similar sorts
of nasty hangs, where the script is waiting forever for some process
that hasn't got the shutdown message. At least we probably need some way
of making sure the END handler doesn't abort early. Maybe
PostgresNode::stop() needs a mode that handles failure more gracefully.
Maybe it needs to try shutting down all the nodes and only calling
BAIL_OUT after trying all of them and getting a failure. But that might
still leave us work to do on failures occurring pre-END.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes: > IOW, the patch you posted earlier seems like the way to go. I really dislike that patch. I think it's doubling down on the messy, unstructured coding patterns that got us into this situation to begin with. I'd prefer to expend a little effort on refactoring so that the ExecCleanupTupleRouting call can be moved to the cleanup function where it belongs. So, I propose the attached, which invents a new struct to carry the stuff we've discovered to be necessary. This makes the APIs noticeably cleaner IMHO. I did not touch the APIs of the apply_XXX_internal functions, as it didn't really seem to offer any notational advantage. We can't simply collapse them to take an "edata" as I did for apply_handle_tuple_routing, because the ResultRelInfo they're supposed to operate on could be different from the original one. I considered a couple of alternatives: * Replace their estate arguments with edata, but keep the separate ResultRelInfo arguments. This might be worth doing in future, if we add more fields to ApplyExecutionData. Right now it'd save nothing, and it'd create a risk of confusion about when to use the ResultRelInfo argument vs. edata->resultRelInfo. * Allow apply_handle_tuple_routing to overwrite edata->resultRelInfo with the partition child's RRI, then simplify the apply_XXX_internal functions to take just edata instead of separate estate and resultRelInfo args. I think this would work right now, but it seems grotty, and it might cause problems in future. * Replace the edata->resultRelInfo field with two fields, one for the original parent and one for the actual/current target. Perhaps this is worth doing, not sure. Thoughts? regards, tom lane diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 1432554d5a..c51a83f797 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -178,6 +178,14 @@ static HTAB *xidhash = NULL; /* BufFile handle of the current streaming file */ static BufFile *stream_fd = NULL; +typedef struct ApplyExecutionData +{ + EState *estate; /* always needed */ + ResultRelInfo *resultRelInfo; /* always needed */ + ModifyTableState *mtstate; /* used for partitioned target rel */ + PartitionTupleRouting *proute; /* used for partitioned target rel */ +} ApplyExecutionData; + typedef struct SubXactInfo { TransactionId xid; /* XID of the subxact */ @@ -239,8 +247,7 @@ static bool FindReplTupleInLocalRel(EState *estate, Relation localrel, LogicalRepRelation *remoterel, TupleTableSlot *remoteslot, TupleTableSlot **localslot); -static void apply_handle_tuple_routing(ResultRelInfo *relinfo, - EState *estate, +static void apply_handle_tuple_routing(ApplyExecutionData *edata, TupleTableSlot *remoteslot, LogicalRepTupleData *newtup, LogicalRepRelMapEntry *relmapentry, @@ -336,20 +343,21 @@ handle_streamed_transaction(LogicalRepMsgType action, StringInfo s) /* * Executor state preparation for evaluation of constraint expressions, - * indexes and triggers. + * indexes and triggers for the specified relation. * - * resultRelInfo is a ResultRelInfo for the relation to be passed to the - * executor routines. The caller must open and close any indexes to be - * updated independently of the relation registered here. + * Note that the caller must open and close any indexes to be updated. 
*/ -static EState * -create_estate_for_relation(LogicalRepRelMapEntry *rel, - ResultRelInfo **resultRelInfo) +static ApplyExecutionData * +create_edata_for_relation(LogicalRepRelMapEntry *rel) { + ApplyExecutionData *edata; EState *estate; RangeTblEntry *rte; + ResultRelInfo *resultRelInfo; - estate = CreateExecutorState(); + edata = (ApplyExecutionData *) palloc0(sizeof(ApplyExecutionData)); + + edata->estate = estate = CreateExecutorState(); rte = makeNode(RangeTblEntry); rte->rtekind = RTE_RELATION; @@ -358,13 +366,13 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel, rte->rellockmode = AccessShareLock; ExecInitRangeTable(estate, list_make1(rte)); - *resultRelInfo = makeNode(ResultRelInfo); + edata->resultRelInfo = resultRelInfo = makeNode(ResultRelInfo); /* * Use Relation opened by logicalrep_rel_open() instead of opening it * again. */ - InitResultRelInfo(*resultRelInfo, rel->localrel, 1, NULL, 0); + InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0); /* * We put the ResultRelInfo in the es_opened_result_relations list, even @@ -377,29 +385,38 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel, * an apply operation being responsible for that. */ estate->es_opened_result_relations = - lappend(estate->es_opened_result_relations, *resultRelInfo); + lappend(estate->es_opened_result_relations, resultRelInfo); estate->es_output_cid = GetCurrentCommandId(true); /* Prepare to catch AFTER triggers. */ AfterTriggerBeginQuery(); - return estate; + /* other fields of edata remain NULL for now */ + + return edata; } /* * Finish any operations related to the executor state created by - * create_estate_for_relation(). + * create_edata_for_relation(). */ static void -finish_estate(EState *estate) +finish_edata(ApplyExecutionData *edata) { + EState *estate = edata->estate; + /* Handle any queued AFTER triggers. */ AfterTriggerEndQuery(estate); + /* Shut down tuple routing, if any was done. */ + if (edata->proute) + ExecCleanupTupleRouting(edata->mtstate, edata->proute); + /* Cleanup. */ ExecResetTupleTable(estate->es_tupleTable, false); FreeExecutorState(estate); + pfree(edata); } /* @@ -1181,10 +1198,10 @@ GetRelationIdentityOrPK(Relation rel) static void apply_handle_insert(StringInfo s) { - ResultRelInfo *resultRelInfo; LogicalRepRelMapEntry *rel; LogicalRepTupleData newtup; LogicalRepRelId relid; + ApplyExecutionData *edata; EState *estate; TupleTableSlot *remoteslot; MemoryContext oldctx; @@ -1207,7 +1224,8 @@ apply_handle_insert(StringInfo s) } /* Initialize the executor state. */ - estate = create_estate_for_relation(rel, &resultRelInfo); + edata = create_edata_for_relation(rel); + estate = edata->estate; remoteslot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel->localrel), &TTSOpsVirtual); @@ -1223,15 +1241,15 @@ apply_handle_insert(StringInfo s) /* For a partitioned table, insert the tuple into a partition. 
*/ if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - apply_handle_tuple_routing(resultRelInfo, estate, + apply_handle_tuple_routing(edata, remoteslot, NULL, rel, CMD_INSERT); else - apply_handle_insert_internal(resultRelInfo, estate, + apply_handle_insert_internal(edata->resultRelInfo, estate, remoteslot); PopActiveSnapshot(); - finish_estate(estate); + finish_edata(edata); logicalrep_rel_close(rel, NoLock); @@ -1293,9 +1311,9 @@ check_relation_updatable(LogicalRepRelMapEntry *rel) static void apply_handle_update(StringInfo s) { - ResultRelInfo *resultRelInfo; LogicalRepRelMapEntry *rel; LogicalRepRelId relid; + ApplyExecutionData *edata; EState *estate; LogicalRepTupleData oldtup; LogicalRepTupleData newtup; @@ -1326,7 +1344,8 @@ apply_handle_update(StringInfo s) check_relation_updatable(rel); /* Initialize the executor state. */ - estate = create_estate_for_relation(rel, &resultRelInfo); + edata = create_edata_for_relation(rel); + estate = edata->estate; remoteslot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel->localrel), &TTSOpsVirtual); @@ -1368,15 +1387,15 @@ apply_handle_update(StringInfo s) /* For a partitioned table, apply update to correct partition. */ if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - apply_handle_tuple_routing(resultRelInfo, estate, + apply_handle_tuple_routing(edata, remoteslot, &newtup, rel, CMD_UPDATE); else - apply_handle_update_internal(resultRelInfo, estate, + apply_handle_update_internal(edata->resultRelInfo, estate, remoteslot, &newtup, rel); PopActiveSnapshot(); - finish_estate(estate); + finish_edata(edata); logicalrep_rel_close(rel, NoLock); @@ -1448,10 +1467,10 @@ apply_handle_update_internal(ResultRelInfo *relinfo, static void apply_handle_delete(StringInfo s) { - ResultRelInfo *resultRelInfo; LogicalRepRelMapEntry *rel; LogicalRepTupleData oldtup; LogicalRepRelId relid; + ApplyExecutionData *edata; EState *estate; TupleTableSlot *remoteslot; MemoryContext oldctx; @@ -1477,7 +1496,8 @@ apply_handle_delete(StringInfo s) check_relation_updatable(rel); /* Initialize the executor state. */ - estate = create_estate_for_relation(rel, &resultRelInfo); + edata = create_edata_for_relation(rel); + estate = edata->estate; remoteslot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel->localrel), &TTSOpsVirtual); @@ -1491,15 +1511,15 @@ apply_handle_delete(StringInfo s) /* For a partitioned table, apply delete to correct partition. */ if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - apply_handle_tuple_routing(resultRelInfo, estate, + apply_handle_tuple_routing(edata, remoteslot, NULL, rel, CMD_DELETE); else - apply_handle_delete_internal(resultRelInfo, estate, + apply_handle_delete_internal(edata->resultRelInfo, estate, remoteslot, &rel->remoterel); PopActiveSnapshot(); - finish_estate(estate); + finish_edata(edata); logicalrep_rel_close(rel, NoLock); @@ -1582,16 +1602,17 @@ FindReplTupleInLocalRel(EState *estate, Relation localrel, * This handles insert, update, delete on a partitioned table. 
*/ static void -apply_handle_tuple_routing(ResultRelInfo *relinfo, - EState *estate, +apply_handle_tuple_routing(ApplyExecutionData *edata, TupleTableSlot *remoteslot, LogicalRepTupleData *newtup, LogicalRepRelMapEntry *relmapentry, CmdType operation) { + EState *estate = edata->estate; + ResultRelInfo *relinfo = edata->resultRelInfo; Relation parentrel = relinfo->ri_RelationDesc; - ModifyTableState *mtstate = NULL; - PartitionTupleRouting *proute = NULL; + ModifyTableState *mtstate; + PartitionTupleRouting *proute; ResultRelInfo *partrelinfo; Relation partrel; TupleTableSlot *remoteslot_part; @@ -1599,12 +1620,14 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo, MemoryContext oldctx; /* ModifyTableState is needed for ExecFindPartition(). */ - mtstate = makeNode(ModifyTableState); + edata->mtstate = mtstate = makeNode(ModifyTableState); mtstate->ps.plan = NULL; mtstate->ps.state = estate; mtstate->operation = operation; mtstate->resultRelInfo = relinfo; - proute = ExecSetupPartitionTupleRouting(estate, parentrel); + + /* ... as is PartitionTupleRouting. */ + edata->proute = proute = ExecSetupPartitionTupleRouting(estate, parentrel); /* * Find the partition to which the "search tuple" belongs. @@ -1797,8 +1820,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo, elog(ERROR, "unrecognized CmdType: %d", (int) operation); break; } - - ExecCleanupTupleRouting(mtstate, proute); } /*
On Wed, May 19, 2021 at 04:23:55PM -0400, Tom Lane wrote: > I really dislike that patch. I think it's doubling down on the messy, > unstructured coding patterns that got us into this situation to begin > with. I'd prefer to expend a little effort on refactoring so that > the ExecCleanupTupleRouting call can be moved to the cleanup function > where it belongs. Okay. > I did not touch the APIs of the apply_XXX_internal functions, > as it didn't really seem to offer any notational advantage. > We can't simply collapse them to take an "edata" as I did for > apply_handle_tuple_routing, because the ResultRelInfo they're > supposed to operate on could be different from the original one. > I considered a couple of alternatives: > > * Replace their estate arguments with edata, but keep the separate > ResultRelInfo arguments. This might be worth doing in future, if we > add more fields to ApplyExecutionData. Right now it'd save nothing, > and it'd create a risk of confusion about when to use the > ResultRelInfo argument vs. edata->resultRelInfo. Not sure about this one. It may be better to wait until this gets more expanded, if it gets expanded. > * Allow apply_handle_tuple_routing to overwrite edata->resultRelInfo > with the partition child's RRI, then simplify the apply_XXX_internal > functions to take just edata instead of separate estate and > resultRelInfo args. I think this would work right now, but it seems > grotty, and it might cause problems in future. Agreed that it does not seem like a good idea to blindly overwrite resultRelInfo with the partition targetted for the apply. > * Replace the edata->resultRelInfo field with two fields, one for > the original parent and one for the actual/current target. Perhaps > this is worth doing, not sure. This one sounds more natural to me, though. > Thoughts? May I ask why you are not moving the snapshot pop and push into the finish() and create() routines for this patch? Also, any thoughts about adding the trigger tests from 013_partition.pl to REL_13_STABLE? -- Michael
On Wed, May 19, 2021 at 02:36:03PM -0400, Andrew Dunstan wrote: > Yeah, this area needs substantial improvement. I have seen similar sorts > of nasty hangs, where the script is waiting forever for some process > that hasn't got the shutdown message. At least we probably need some way > of making sure the END handler doesn't abort early. Maybe > PostgresNode::stop() needs a mode that handles failure more gracefully. > Maybe it needs to try shutting down all the nodes and only calling > BAIL_OUT after trying all of them and getting a failure. But that might > still leave us work to do on failures occuring pre-END. For that, we could just make the END block called run_log() directly as well, as this catches stderr and an error code. What about making the shutdown a two-phase logic by the way? Trigger an immediate stop, and if it fails fallback to an extra kill9() to be on the safe side. Have you seen this being a problem even in cases where the tests all passed? If yes, it may be worth using the more aggressive flow even in the case where the tests pass. -- Michael
On Thu, May 20, 2021 at 5:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Langote <amitlangote09@gmail.com> writes: > > IOW, the patch you posted earlier seems like the way to go. > > I really dislike that patch. I think it's doubling down on the messy, > unstructured coding patterns that got us into this situation to begin > with. I'd prefer to expend a little effort on refactoring so that > the ExecCleanupTupleRouting call can be moved to the cleanup function > where it belongs. > > So, I propose the attached, which invents a new struct to carry > the stuff we've discovered to be necessary. This makes the APIs > noticeably cleaner IMHO. Larger footprint, but definitely cleaner. Thanks. > I did not touch the APIs of the apply_XXX_internal functions, > as it didn't really seem to offer any notational advantage. > We can't simply collapse them to take an "edata" as I did for > apply_handle_tuple_routing, because the ResultRelInfo they're > supposed to operate on could be different from the original one. > I considered a couple of alternatives: > > * Replace their estate arguments with edata, but keep the separate > ResultRelInfo arguments. This might be worth doing in future, if we > add more fields to ApplyExecutionData. Right now it'd save nothing, > and it'd create a risk of confusion about when to use the > ResultRelInfo argument vs. edata->resultRelInfo. > > * Allow apply_handle_tuple_routing to overwrite edata->resultRelInfo > with the partition child's RRI, then simplify the apply_XXX_internal > functions to take just edata instead of separate estate and > resultRelInfo args. I think this would work right now, but it seems > grotty, and it might cause problems in future. > > * Replace the edata->resultRelInfo field with two fields, one for > the original parent and one for the actual/current target. Perhaps > this is worth doing, not sure. > > Thoughts? IMHO, it would be better to keep the lowest-level apply_handle_XXX_internal() out of this, because presumably we're only cleaning up the mess in higher-level callers. Somewhat related, one of the intentions behind a04daa97a43, which removed es_result_relation_info in favor of passing the ResultRelInfo explicitly to the executor's lower-level functions, was to avoid bugs caused by failing to set/reset that global field correctly in higher-level callers. Now worker.c is pretty small compared with the executor, but still it seems like a good idea to follow the same principle here. -- Amit Langote EDB: http://www.enterprisedb.com
Michael Paquier <michael@paquier.xyz> writes: > On Wed, May 19, 2021 at 04:23:55PM -0400, Tom Lane wrote: >> * Replace the edata->resultRelInfo field with two fields, one for >> the original parent and one for the actual/current target. Perhaps >> this is worth doing, not sure. > This one sounds more natural to me, though. OK, I'll give that a try tomorrow. > May I ask why you are not moving the snapshot pop and push into the > finish() and create() routines for this patch? That does need to happen, but I figured I'd leave it to the other patch, since there are other things to change too for that snapshot problem. Obviously, whichever patch goes in second will need trivial adjustments, but I think it's logically clearer that way. > Also, any thoughts > about adding the trigger tests from 013_partition.pl to REL_13_STABLE? Yeah, if this is a pre-existing problem then we should back-port the tests that revealed it. regards, tom lane
I wrote: > Michael Paquier <michael@paquier.xyz> writes: >> On Wed, May 19, 2021 at 04:23:55PM -0400, Tom Lane wrote: >>> * Replace the edata->resultRelInfo field with two fields, one for >>> the original parent and one for the actual/current target. Perhaps >>> this is worth doing, not sure. >> This one sounds more natural to me, though. > OK, I'll give that a try tomorrow. Here's a version that does it like that. I'm not entirely convinced whether this is better or not. regards, tom lane diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 1432554d5a..d19269ebce 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -178,6 +178,18 @@ static HTAB *xidhash = NULL; /* BufFile handle of the current streaming file */ static BufFile *stream_fd = NULL; +typedef struct ApplyExecutionData +{ + EState *estate; /* executor state, used to track resources */ + ResultRelInfo *origRelInfo; /* the originally-named target relation */ + ResultRelInfo *activeRelInfo; /* the actual target relation */ + /* activeRelInfo can equal origRelInfo, or be for a partition of it */ + + /* These fields are used when the target relation is partitioned: */ + ModifyTableState *mtstate; /* dummy ModifyTable state */ + PartitionTupleRouting *proute; /* partition routing info */ +} ApplyExecutionData; + typedef struct SubXactInfo { TransactionId xid; /* XID of the subxact */ @@ -226,21 +238,20 @@ static void apply_dispatch(StringInfo s); static void apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data); -static void apply_handle_insert_internal(ResultRelInfo *relinfo, - EState *estate, TupleTableSlot *remoteslot); -static void apply_handle_update_internal(ResultRelInfo *relinfo, - EState *estate, TupleTableSlot *remoteslot, +static void apply_handle_insert_internal(ApplyExecutionData *edata, + TupleTableSlot *remoteslot); +static void apply_handle_update_internal(ApplyExecutionData *edata, + TupleTableSlot *remoteslot, LogicalRepTupleData *newtup, LogicalRepRelMapEntry *relmapentry); -static void apply_handle_delete_internal(ResultRelInfo *relinfo, EState *estate, +static void apply_handle_delete_internal(ApplyExecutionData *edata, TupleTableSlot *remoteslot, LogicalRepRelation *remoterel); static bool FindReplTupleInLocalRel(EState *estate, Relation localrel, LogicalRepRelation *remoterel, TupleTableSlot *remoteslot, TupleTableSlot **localslot); -static void apply_handle_tuple_routing(ResultRelInfo *relinfo, - EState *estate, +static void apply_handle_tuple_routing(ApplyExecutionData *edata, TupleTableSlot *remoteslot, LogicalRepTupleData *newtup, LogicalRepRelMapEntry *relmapentry, @@ -336,20 +347,21 @@ handle_streamed_transaction(LogicalRepMsgType action, StringInfo s) /* * Executor state preparation for evaluation of constraint expressions, - * indexes and triggers. + * indexes and triggers for the specified relation. * - * resultRelInfo is a ResultRelInfo for the relation to be passed to the - * executor routines. The caller must open and close any indexes to be - * updated independently of the relation registered here. + * Note that the caller must open and close any indexes to be updated. 
*/ -static EState * -create_estate_for_relation(LogicalRepRelMapEntry *rel, - ResultRelInfo **resultRelInfo) +static ApplyExecutionData * +create_edata_for_relation(LogicalRepRelMapEntry *rel) { + ApplyExecutionData *edata; EState *estate; RangeTblEntry *rte; + ResultRelInfo *resultRelInfo; - estate = CreateExecutorState(); + edata = (ApplyExecutionData *) palloc0(sizeof(ApplyExecutionData)); + + edata->estate = estate = CreateExecutorState(); rte = makeNode(RangeTblEntry); rte->rtekind = RTE_RELATION; @@ -358,13 +370,16 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel, rte->rellockmode = AccessShareLock; ExecInitRangeTable(estate, list_make1(rte)); - *resultRelInfo = makeNode(ResultRelInfo); + edata->origRelInfo = resultRelInfo = makeNode(ResultRelInfo); + + /* Initially, set activeRelInfo to be the named rel */ + edata->activeRelInfo = resultRelInfo; /* * Use Relation opened by logicalrep_rel_open() instead of opening it * again. */ - InitResultRelInfo(*resultRelInfo, rel->localrel, 1, NULL, 0); + InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0); /* * We put the ResultRelInfo in the es_opened_result_relations list, even @@ -377,29 +392,38 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel, * an apply operation being responsible for that. */ estate->es_opened_result_relations = - lappend(estate->es_opened_result_relations, *resultRelInfo); + lappend(estate->es_opened_result_relations, resultRelInfo); estate->es_output_cid = GetCurrentCommandId(true); /* Prepare to catch AFTER triggers. */ AfterTriggerBeginQuery(); - return estate; + /* other fields of edata remain NULL for now */ + + return edata; } /* * Finish any operations related to the executor state created by - * create_estate_for_relation(). + * create_edata_for_relation(). */ static void -finish_estate(EState *estate) +finish_edata(ApplyExecutionData *edata) { + EState *estate = edata->estate; + /* Handle any queued AFTER triggers. */ AfterTriggerEndQuery(estate); + /* Shut down tuple routing, if any was done. */ + if (edata->proute) + ExecCleanupTupleRouting(edata->mtstate, edata->proute); + /* Cleanup. */ ExecResetTupleTable(estate->es_tupleTable, false); FreeExecutorState(estate); + pfree(edata); } /* @@ -1181,10 +1205,10 @@ GetRelationIdentityOrPK(Relation rel) static void apply_handle_insert(StringInfo s) { - ResultRelInfo *resultRelInfo; LogicalRepRelMapEntry *rel; LogicalRepTupleData newtup; LogicalRepRelId relid; + ApplyExecutionData *edata; EState *estate; TupleTableSlot *remoteslot; MemoryContext oldctx; @@ -1207,7 +1231,8 @@ apply_handle_insert(StringInfo s) } /* Initialize the executor state. */ - estate = create_estate_for_relation(rel, &resultRelInfo); + edata = create_edata_for_relation(rel); + estate = edata->estate; remoteslot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel->localrel), &TTSOpsVirtual); @@ -1223,15 +1248,14 @@ apply_handle_insert(StringInfo s) /* For a partitioned table, insert the tuple into a partition. 
*/ if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - apply_handle_tuple_routing(resultRelInfo, estate, + apply_handle_tuple_routing(edata, remoteslot, NULL, rel, CMD_INSERT); else - apply_handle_insert_internal(resultRelInfo, estate, - remoteslot); + apply_handle_insert_internal(edata, remoteslot); PopActiveSnapshot(); - finish_estate(estate); + finish_edata(edata); logicalrep_rel_close(rel, NoLock); @@ -1240,9 +1264,13 @@ apply_handle_insert(StringInfo s) /* Workhorse for apply_handle_insert() */ static void -apply_handle_insert_internal(ResultRelInfo *relinfo, - EState *estate, TupleTableSlot *remoteslot) +apply_handle_insert_internal(ApplyExecutionData *edata, + TupleTableSlot *remoteslot) { + EState *estate = edata->estate; + ResultRelInfo *relinfo = edata->activeRelInfo; + + /* We must open indexes here. */ ExecOpenIndices(relinfo, false); /* Do the insert. */ @@ -1293,9 +1321,9 @@ check_relation_updatable(LogicalRepRelMapEntry *rel) static void apply_handle_update(StringInfo s) { - ResultRelInfo *resultRelInfo; LogicalRepRelMapEntry *rel; LogicalRepRelId relid; + ApplyExecutionData *edata; EState *estate; LogicalRepTupleData oldtup; LogicalRepTupleData newtup; @@ -1326,7 +1354,8 @@ apply_handle_update(StringInfo s) check_relation_updatable(rel); /* Initialize the executor state. */ - estate = create_estate_for_relation(rel, &resultRelInfo); + edata = create_edata_for_relation(rel); + estate = edata->estate; remoteslot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel->localrel), &TTSOpsVirtual); @@ -1368,15 +1397,15 @@ apply_handle_update(StringInfo s) /* For a partitioned table, apply update to correct partition. */ if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - apply_handle_tuple_routing(resultRelInfo, estate, + apply_handle_tuple_routing(edata, remoteslot, &newtup, rel, CMD_UPDATE); else - apply_handle_update_internal(resultRelInfo, estate, + apply_handle_update_internal(edata, remoteslot, &newtup, rel); PopActiveSnapshot(); - finish_estate(estate); + finish_edata(edata); logicalrep_rel_close(rel, NoLock); @@ -1385,11 +1414,13 @@ apply_handle_update(StringInfo s) /* Workhorse for apply_handle_update() */ static void -apply_handle_update_internal(ResultRelInfo *relinfo, - EState *estate, TupleTableSlot *remoteslot, +apply_handle_update_internal(ApplyExecutionData *edata, + TupleTableSlot *remoteslot, LogicalRepTupleData *newtup, LogicalRepRelMapEntry *relmapentry) { + EState *estate = edata->estate; + ResultRelInfo *relinfo = edata->activeRelInfo; Relation localrel = relinfo->ri_RelationDesc; EPQState epqstate; TupleTableSlot *localslot; @@ -1448,10 +1479,10 @@ apply_handle_update_internal(ResultRelInfo *relinfo, static void apply_handle_delete(StringInfo s) { - ResultRelInfo *resultRelInfo; LogicalRepRelMapEntry *rel; LogicalRepTupleData oldtup; LogicalRepRelId relid; + ApplyExecutionData *edata; EState *estate; TupleTableSlot *remoteslot; MemoryContext oldctx; @@ -1477,7 +1508,8 @@ apply_handle_delete(StringInfo s) check_relation_updatable(rel); /* Initialize the executor state. */ - estate = create_estate_for_relation(rel, &resultRelInfo); + edata = create_edata_for_relation(rel); + estate = edata->estate; remoteslot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel->localrel), &TTSOpsVirtual); @@ -1491,15 +1523,14 @@ apply_handle_delete(StringInfo s) /* For a partitioned table, apply delete to correct partition. 
*/ if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - apply_handle_tuple_routing(resultRelInfo, estate, + apply_handle_tuple_routing(edata, remoteslot, NULL, rel, CMD_DELETE); else - apply_handle_delete_internal(resultRelInfo, estate, - remoteslot, &rel->remoterel); + apply_handle_delete_internal(edata, remoteslot, &rel->remoterel); PopActiveSnapshot(); - finish_estate(estate); + finish_edata(edata); logicalrep_rel_close(rel, NoLock); @@ -1508,10 +1539,12 @@ apply_handle_delete(StringInfo s) /* Workhorse for apply_handle_delete() */ static void -apply_handle_delete_internal(ResultRelInfo *relinfo, EState *estate, +apply_handle_delete_internal(ApplyExecutionData *edata, TupleTableSlot *remoteslot, LogicalRepRelation *remoterel) { + EState *estate = edata->estate; + ResultRelInfo *relinfo = edata->activeRelInfo; Relation localrel = relinfo->ri_RelationDesc; EPQState epqstate; TupleTableSlot *localslot; @@ -1582,16 +1615,17 @@ FindReplTupleInLocalRel(EState *estate, Relation localrel, * This handles insert, update, delete on a partitioned table. */ static void -apply_handle_tuple_routing(ResultRelInfo *relinfo, - EState *estate, +apply_handle_tuple_routing(ApplyExecutionData *edata, TupleTableSlot *remoteslot, LogicalRepTupleData *newtup, LogicalRepRelMapEntry *relmapentry, CmdType operation) { + EState *estate = edata->estate; + ResultRelInfo *relinfo = edata->origRelInfo; Relation parentrel = relinfo->ri_RelationDesc; - ModifyTableState *mtstate = NULL; - PartitionTupleRouting *proute = NULL; + ModifyTableState *mtstate; + PartitionTupleRouting *proute; ResultRelInfo *partrelinfo; Relation partrel; TupleTableSlot *remoteslot_part; @@ -1599,12 +1633,14 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo, MemoryContext oldctx; /* ModifyTableState is needed for ExecFindPartition(). */ - mtstate = makeNode(ModifyTableState); + edata->mtstate = mtstate = makeNode(ModifyTableState); mtstate->ps.plan = NULL; mtstate->ps.state = estate; mtstate->operation = operation; mtstate->resultRelInfo = relinfo; - proute = ExecSetupPartitionTupleRouting(estate, parentrel); + + /* ... as is PartitionTupleRouting. */ + edata->proute = proute = ExecSetupPartitionTupleRouting(estate, parentrel); /* * Find the partition to which the "search tuple" belongs. @@ -1616,6 +1652,9 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo, Assert(partrelinfo != NULL); partrel = partrelinfo->ri_RelationDesc; + /* Make that partition the active target rel. */ + edata->activeRelInfo = partrelinfo; + /* * To perform any of the operations below, the tuple must match the * partition's rowtype. Convert if needed or just copy, using a dedicated @@ -1638,12 +1677,12 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo, switch (operation) { case CMD_INSERT: - apply_handle_insert_internal(partrelinfo, estate, + apply_handle_insert_internal(edata, remoteslot_part); break; case CMD_DELETE: - apply_handle_delete_internal(partrelinfo, estate, + apply_handle_delete_internal(edata, remoteslot_part, &relmapentry->remoterel); break; @@ -1757,11 +1796,12 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo, Assert(partrelinfo_new != partrelinfo); /* DELETE old tuple found in the old partition. */ - apply_handle_delete_internal(partrelinfo, estate, + apply_handle_delete_internal(edata, localslot, &relmapentry->remoterel); /* INSERT new tuple into the new partition. 
*/ + edata->activeRelInfo = partrelinfo_new; /* * Convert the replacement tuple to match the destination @@ -1787,7 +1827,7 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo, slot_getallattrs(remoteslot); } MemoryContextSwitchTo(oldctx); - apply_handle_insert_internal(partrelinfo_new, estate, + apply_handle_insert_internal(edata, remoteslot_part); } } @@ -1797,8 +1837,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo, elog(ERROR, "unrecognized CmdType: %d", (int) operation); break; } - - ExecCleanupTupleRouting(mtstate, proute); } /*
On Thu, May 20, 2021 at 02:57:40PM -0400, Tom Lane wrote: > Here's a version that does it like that. I'm not entirely convinced > whether this is better or not. Hmm. I think that this is better. This makes the code easier to follow, and the extra information is useful for debugging. The change looks good to me. -- Michael
On Fri, May 21, 2021 at 6:29 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, May 20, 2021 at 02:57:40PM -0400, Tom Lane wrote: > > Here's a version that does it like that. I'm not entirely convinced > > whether this is better or not. > > Hmm. I think that this is better. This makes the code easier to > follow, and the extra information is useful for debugging. > > The change looks good to me. > Yeah, the change looks good to me as well but I think we should consider Amit L's point that maintaining this extra activeRelInfo might be prone to bugs if the partitioning logic needs to be extended at other places in the worker.c. As the code stands today, it doesn't seem problematic so we can go with the second patch if both Tom and you feel that is a better option. -- With Regards, Amit Kapila.
Amit Langote <amitlangote09@gmail.com> writes: > IMHO, it would be better to keep the lowest-level > apply_handle_XXX_internal() out of this, because presumably we're only > cleaning up the mess in higher-level callers. Somewhat related, one > of the intentions behind a04daa97a43, which removed > es_result_relation_info in favor of passing the ResultRelInfo > explicitly to the executor's lower-level functions, was to avoid bugs > caused by failing to set/reset that global field correctly in > higher-level callers. Yeah, that's a fair point, and after some reflection I think that repeatedly changing the "active" field of the struct is exactly what was bothering me about the v2 patch. So in the attached v3, I went back to passing that as an explicit argument. The state struct now has no fields that need to change after first being set. I did notice that we could remove some other random arguments by adding the LogicalRepRelMapEntry* to the state struct, so this also does that. regards, tom lane diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 452e5f3df7..6a30662f37 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -178,6 +178,18 @@ static HTAB *xidhash = NULL; /* BufFile handle of the current streaming file */ static BufFile *stream_fd = NULL; +typedef struct ApplyExecutionData +{ + EState *estate; /* executor state, used to track resources */ + + LogicalRepRelMapEntry *targetRel; /* replication target rel */ + ResultRelInfo *targetRelInfo; /* ResultRelInfo for same */ + + /* These fields are used when the target relation is partitioned: */ + ModifyTableState *mtstate; /* dummy ModifyTable state */ + PartitionTupleRouting *proute; /* partition routing info */ +} ApplyExecutionData; + typedef struct SubXactInfo { TransactionId xid; /* XID of the subxact */ @@ -226,24 +238,23 @@ static void apply_dispatch(StringInfo s); static void apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data); -static void apply_handle_insert_internal(ResultRelInfo *relinfo, - EState *estate, TupleTableSlot *remoteslot); -static void apply_handle_update_internal(ResultRelInfo *relinfo, - EState *estate, TupleTableSlot *remoteslot, - LogicalRepTupleData *newtup, - LogicalRepRelMapEntry *relmapentry); -static void apply_handle_delete_internal(ResultRelInfo *relinfo, EState *estate, +static void apply_handle_insert_internal(ApplyExecutionData *edata, + ResultRelInfo *relinfo, + TupleTableSlot *remoteslot); +static void apply_handle_update_internal(ApplyExecutionData *edata, + ResultRelInfo *relinfo, TupleTableSlot *remoteslot, - LogicalRepRelation *remoterel); + LogicalRepTupleData *newtup); +static void apply_handle_delete_internal(ApplyExecutionData *edata, + ResultRelInfo *relinfo, + TupleTableSlot *remoteslot); static bool FindReplTupleInLocalRel(EState *estate, Relation localrel, LogicalRepRelation *remoterel, TupleTableSlot *remoteslot, TupleTableSlot **localslot); -static void apply_handle_tuple_routing(ResultRelInfo *relinfo, - EState *estate, +static void apply_handle_tuple_routing(ApplyExecutionData *edata, TupleTableSlot *remoteslot, LogicalRepTupleData *newtup, - LogicalRepRelMapEntry *relmapentry, CmdType operation); /* @@ -336,18 +347,17 @@ handle_streamed_transaction(LogicalRepMsgType action, StringInfo s) /* * Executor state preparation for evaluation of constraint expressions, - * indexes and triggers. + * indexes and triggers for the specified relation. 
  *
- * resultRelInfo is a ResultRelInfo for the relation to be passed to the
- * executor routines.  The caller must open and close any indexes to be
- * updated independently of the relation registered here.
+ * Note that the caller must open and close any indexes to be updated.
  */
-static EState *
-create_estate_for_relation(LogicalRepRelMapEntry *rel,
-                           ResultRelInfo **resultRelInfo)
+static ApplyExecutionData *
+create_edata_for_relation(LogicalRepRelMapEntry *rel)
 {
+    ApplyExecutionData *edata;
     EState     *estate;
     RangeTblEntry *rte;
+    ResultRelInfo *resultRelInfo;
 
     /*
      * Input functions may need an active snapshot, as may AFTER triggers
@@ -356,7 +366,10 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel,
      */
     PushActiveSnapshot(GetTransactionSnapshot());
 
-    estate = CreateExecutorState();
+    edata = (ApplyExecutionData *) palloc0(sizeof(ApplyExecutionData));
+    edata->targetRel = rel;
+
+    edata->estate = estate = CreateExecutorState();
 
     rte = makeNode(RangeTblEntry);
     rte->rtekind = RTE_RELATION;
@@ -365,13 +378,13 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel,
     rte->rellockmode = AccessShareLock;
     ExecInitRangeTable(estate, list_make1(rte));
 
-    *resultRelInfo = makeNode(ResultRelInfo);
+    edata->targetRelInfo = resultRelInfo = makeNode(ResultRelInfo);
 
     /*
      * Use Relation opened by logicalrep_rel_open() instead of opening it
      * again.
      */
-    InitResultRelInfo(*resultRelInfo, rel->localrel, 1, NULL, 0);
+    InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
 
     /*
      * We put the ResultRelInfo in the es_opened_result_relations list, even
@@ -384,29 +397,39 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel,
      * an apply operation being responsible for that.
      */
     estate->es_opened_result_relations =
-        lappend(estate->es_opened_result_relations, *resultRelInfo);
+        lappend(estate->es_opened_result_relations, resultRelInfo);
 
     estate->es_output_cid = GetCurrentCommandId(true);
 
     /* Prepare to catch AFTER triggers. */
     AfterTriggerBeginQuery();
 
-    return estate;
+    /* other fields of edata remain NULL for now */
+
+    return edata;
 }
 
 /*
  * Finish any operations related to the executor state created by
- * create_estate_for_relation().
+ * create_edata_for_relation().
  */
 static void
-finish_estate(EState *estate)
+finish_edata(ApplyExecutionData *edata)
 {
+    EState     *estate = edata->estate;
+
     /* Handle any queued AFTER triggers. */
     AfterTriggerEndQuery(estate);
 
+    /* Shut down tuple routing, if any was done. */
+    if (edata->proute)
+        ExecCleanupTupleRouting(edata->mtstate, edata->proute);
+
     /* Cleanup. */
     ExecResetTupleTable(estate->es_tupleTable, false);
     FreeExecutorState(estate);
+    pfree(edata);
+
     PopActiveSnapshot();
 }
 
@@ -1189,10 +1212,10 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
-    ResultRelInfo *resultRelInfo;
     LogicalRepRelMapEntry *rel;
     LogicalRepTupleData newtup;
     LogicalRepRelId relid;
+    ApplyExecutionData *edata;
     EState     *estate;
     TupleTableSlot *remoteslot;
     MemoryContext oldctx;
@@ -1215,7 +1238,8 @@ apply_handle_insert(StringInfo s)
     }
 
     /* Initialize the executor state. */
-    estate = create_estate_for_relation(rel, &resultRelInfo);
+    edata = create_edata_for_relation(rel);
+    estate = edata->estate;
     remoteslot = ExecInitExtraTupleSlot(estate,
                                         RelationGetDescr(rel->localrel),
                                         &TTSOpsVirtual);
@@ -1228,24 +1252,32 @@ apply_handle_insert(StringInfo s)
     /* For a partitioned table, insert the tuple into a partition. */
     if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-        apply_handle_tuple_routing(resultRelInfo, estate,
-                                   remoteslot, NULL, rel, CMD_INSERT);
+        apply_handle_tuple_routing(edata,
+                                   remoteslot, NULL, CMD_INSERT);
     else
-        apply_handle_insert_internal(resultRelInfo, estate,
+        apply_handle_insert_internal(edata, edata->targetRelInfo,
                                      remoteslot);
 
-    finish_estate(estate);
+    finish_edata(edata);
 
     logicalrep_rel_close(rel, NoLock);
 
     CommandCounterIncrement();
 }
 
-/* Workhorse for apply_handle_insert() */
+/*
+ * Workhorse for apply_handle_insert()
+ * relinfo is for the relation we're actually inserting into
+ * (could be a child partition of edata->targetRelInfo)
+ */
 static void
-apply_handle_insert_internal(ResultRelInfo *relinfo,
-                             EState *estate, TupleTableSlot *remoteslot)
+apply_handle_insert_internal(ApplyExecutionData *edata,
+                             ResultRelInfo *relinfo,
+                             TupleTableSlot *remoteslot)
 {
+    EState     *estate = edata->estate;
+
+    /* We must open indexes here. */
     ExecOpenIndices(relinfo, false);
 
     /* Do the insert. */
@@ -1296,9 +1328,9 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
-    ResultRelInfo *resultRelInfo;
     LogicalRepRelMapEntry *rel;
     LogicalRepRelId relid;
+    ApplyExecutionData *edata;
     EState     *estate;
     LogicalRepTupleData oldtup;
     LogicalRepTupleData newtup;
@@ -1329,7 +1361,8 @@ apply_handle_update(StringInfo s)
     check_relation_updatable(rel);
 
     /* Initialize the executor state. */
-    estate = create_estate_for_relation(rel, &resultRelInfo);
+    edata = create_edata_for_relation(rel);
+    estate = edata->estate;
     remoteslot = ExecInitExtraTupleSlot(estate,
                                         RelationGetDescr(rel->localrel),
                                         &TTSOpsVirtual);
@@ -1369,26 +1402,32 @@ apply_handle_update(StringInfo s)
 
     /* For a partitioned table, apply update to correct partition. */
     if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-        apply_handle_tuple_routing(resultRelInfo, estate,
-                                   remoteslot, &newtup, rel, CMD_UPDATE);
+        apply_handle_tuple_routing(edata,
+                                   remoteslot, &newtup, CMD_UPDATE);
     else
-        apply_handle_update_internal(resultRelInfo, estate,
-                                     remoteslot, &newtup, rel);
+        apply_handle_update_internal(edata, edata->targetRelInfo,
+                                     remoteslot, &newtup);
 
-    finish_estate(estate);
+    finish_edata(edata);
 
     logicalrep_rel_close(rel, NoLock);
 
     CommandCounterIncrement();
 }
 
-/* Workhorse for apply_handle_update() */
+/*
+ * Workhorse for apply_handle_update()
+ * relinfo is for the relation we're actually updating in
+ * (could be a child partition of edata->targetRelInfo)
+ */
 static void
-apply_handle_update_internal(ResultRelInfo *relinfo,
-                             EState *estate, TupleTableSlot *remoteslot,
-                             LogicalRepTupleData *newtup,
-                             LogicalRepRelMapEntry *relmapentry)
+apply_handle_update_internal(ApplyExecutionData *edata,
+                             ResultRelInfo *relinfo,
+                             TupleTableSlot *remoteslot,
+                             LogicalRepTupleData *newtup)
 {
+    EState     *estate = edata->estate;
+    LogicalRepRelMapEntry *relmapentry = edata->targetRel;
     Relation    localrel = relinfo->ri_RelationDesc;
     EPQState    epqstate;
     TupleTableSlot *localslot;
@@ -1447,10 +1486,10 @@ apply_handle_update_internal(ResultRelInfo *relinfo,
 static void
 apply_handle_delete(StringInfo s)
 {
-    ResultRelInfo *resultRelInfo;
     LogicalRepRelMapEntry *rel;
     LogicalRepTupleData oldtup;
     LogicalRepRelId relid;
+    ApplyExecutionData *edata;
     EState     *estate;
     TupleTableSlot *remoteslot;
     MemoryContext oldctx;
@@ -1476,7 +1515,8 @@ apply_handle_delete(StringInfo s)
     check_relation_updatable(rel);
 
     /* Initialize the executor state. */
-    estate = create_estate_for_relation(rel, &resultRelInfo);
+    edata = create_edata_for_relation(rel);
+    estate = edata->estate;
     remoteslot = ExecInitExtraTupleSlot(estate,
                                         RelationGetDescr(rel->localrel),
                                         &TTSOpsVirtual);
@@ -1488,26 +1528,32 @@ apply_handle_delete(StringInfo s)
 
     /* For a partitioned table, apply delete to correct partition. */
     if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-        apply_handle_tuple_routing(resultRelInfo, estate,
-                                   remoteslot, NULL, rel, CMD_DELETE);
+        apply_handle_tuple_routing(edata,
+                                   remoteslot, NULL, CMD_DELETE);
     else
-        apply_handle_delete_internal(resultRelInfo, estate,
-                                     remoteslot, &rel->remoterel);
+        apply_handle_delete_internal(edata, edata->targetRelInfo,
+                                     remoteslot);
 
-    finish_estate(estate);
+    finish_edata(edata);
 
     logicalrep_rel_close(rel, NoLock);
 
     CommandCounterIncrement();
 }
 
-/* Workhorse for apply_handle_delete() */
+/*
+ * Workhorse for apply_handle_delete()
+ * relinfo is for the relation we're actually deleting from
+ * (could be a child partition of edata->targetRelInfo)
+ */
 static void
-apply_handle_delete_internal(ResultRelInfo *relinfo, EState *estate,
-                             TupleTableSlot *remoteslot,
-                             LogicalRepRelation *remoterel)
+apply_handle_delete_internal(ApplyExecutionData *edata,
+                             ResultRelInfo *relinfo,
+                             TupleTableSlot *remoteslot)
 {
+    EState     *estate = edata->estate;
     Relation    localrel = relinfo->ri_RelationDesc;
+    LogicalRepRelation *remoterel = &edata->targetRel->remoterel;
     EPQState    epqstate;
     TupleTableSlot *localslot;
     bool        found;
@@ -1577,16 +1623,17 @@ FindReplTupleInLocalRel(EState *estate, Relation localrel,
  * This handles insert, update, delete on a partitioned table.
  */
 static void
-apply_handle_tuple_routing(ResultRelInfo *relinfo,
-                           EState *estate,
+apply_handle_tuple_routing(ApplyExecutionData *edata,
                            TupleTableSlot *remoteslot,
                            LogicalRepTupleData *newtup,
-                           LogicalRepRelMapEntry *relmapentry,
                            CmdType operation)
 {
+    EState     *estate = edata->estate;
+    LogicalRepRelMapEntry *relmapentry = edata->targetRel;
+    ResultRelInfo *relinfo = edata->targetRelInfo;
     Relation    parentrel = relinfo->ri_RelationDesc;
-    ModifyTableState *mtstate = NULL;
-    PartitionTupleRouting *proute = NULL;
+    ModifyTableState *mtstate;
+    PartitionTupleRouting *proute;
     ResultRelInfo *partrelinfo;
     Relation    partrel;
     TupleTableSlot *remoteslot_part;
@@ -1594,12 +1641,14 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
     MemoryContext oldctx;
 
     /* ModifyTableState is needed for ExecFindPartition(). */
-    mtstate = makeNode(ModifyTableState);
+    edata->mtstate = mtstate = makeNode(ModifyTableState);
     mtstate->ps.plan = NULL;
     mtstate->ps.state = estate;
     mtstate->operation = operation;
     mtstate->resultRelInfo = relinfo;
-    proute = ExecSetupPartitionTupleRouting(estate, parentrel);
+
+    /* ... as is PartitionTupleRouting. */
+    edata->proute = proute = ExecSetupPartitionTupleRouting(estate, parentrel);
 
     /*
      * Find the partition to which the "search tuple" belongs.
@@ -1633,14 +1682,13 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
     switch (operation)
     {
         case CMD_INSERT:
-            apply_handle_insert_internal(partrelinfo, estate,
+            apply_handle_insert_internal(edata, partrelinfo,
                                          remoteslot_part);
             break;
 
         case CMD_DELETE:
-            apply_handle_delete_internal(partrelinfo, estate,
-                                         remoteslot_part,
-                                         &relmapentry->remoterel);
+            apply_handle_delete_internal(edata, partrelinfo,
+                                         remoteslot_part);
             break;
 
         case CMD_UPDATE:
@@ -1752,9 +1800,8 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
                 Assert(partrelinfo_new != partrelinfo);
 
                 /* DELETE old tuple found in the old partition. */
-                apply_handle_delete_internal(partrelinfo, estate,
-                                             localslot,
-                                             &relmapentry->remoterel);
+                apply_handle_delete_internal(edata, partrelinfo,
+                                             localslot);
 
                 /* INSERT new tuple into the new partition. */
@@ -1782,7 +1829,7 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
                     slot_getallattrs(remoteslot);
                 }
                 MemoryContextSwitchTo(oldctx);
-                apply_handle_insert_internal(partrelinfo_new, estate,
+                apply_handle_insert_internal(edata, partrelinfo_new,
                                              remoteslot_part);
             }
         }
@@ -1792,8 +1839,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
             elog(ERROR, "unrecognized CmdType: %d", (int) operation);
             break;
     }
-
-    ExecCleanupTupleRouting(mtstate, proute);
 }
 
 /*
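For reference while reading the diff: the ApplyExecutionData struct that the patch threads through these functions is not shown in this excerpt, but its shape can be inferred from the fields used above (estate, targetRel, targetRelInfo, mtstate, proute). A rough sketch follows; the layout and comments are an approximation, not the committed definition in worker.c.

    /* Per-operation apply state (approximate reconstruction from the diff) */
    typedef struct ApplyExecutionData
    {
        EState     *estate;              /* executor state, used to track resources */

        LogicalRepRelMapEntry *targetRel;   /* replication target rel */
        ResultRelInfo *targetRelInfo;       /* ResultRelInfo for same */

        /* These fields are set only when the target relation is partitioned: */
        ModifyTableState *mtstate;          /* dummy ModifyTable state */
        PartitionTupleRouting *proute;      /* partition routing info */
    } ApplyExecutionData;

Keeping all of this in one struct is what lets finish_edata() run AfterTriggerEndQuery() first and only afterwards call ExecCleanupTupleRouting() when proute was set up, which is the ordering change at the heart of the patch.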
I wrote:
> I count three distinct bugs that were exposed by this attempt:
> ...
> 2. Said bug causes a segfault in the apply worker process.
> This causes the parent postmaster to give up and die.
> I don't understand why we don't treat that like a crash
> in a regular backend, considering that an apply worker
> is running largely user-defined code.

Figured that one out: we *do* treat it like a crash in a regular backend, which explains the lack of field complaints.  What's contributing to the TAP test getting stuck is that PostgresNode.pm does this:

    open my $conf, '>>', "$pgdata/postgresql.conf";
    print $conf "\n# Added by PostgresNode.pm\n";
    ...
    print $conf "restart_after_crash = off\n";

So that'd be fine, if only the TAP tests were a bit more robust about postmasters disappearing unexpectedly.

BTW, I wonder whether it wouldn't be a good idea for the postmaster to log something along the lines of "stopping because restart_after_crash is off".  The present behavior can be quite mysterious otherwise (it certainly confused me).

			regards, tom lane
I wrote:
> BTW, I wonder whether it wouldn't be a good idea for the
> postmaster to log something along the lines of "stopping
> because restart_after_crash is off".  The present behavior
> can be quite mysterious otherwise (it certainly confused me).

Concretely, I suggest the attached.

While checking the other ExitPostmaster calls to see if any of them lacked suitable log messages, I noticed that there's one after a call to AuxiliaryProcessMain, which is marked pg_attribute_noreturn().  So that's dead code, and if it weren't dead it'd be wrong, because we shouldn't use ExitPostmaster to exit a child process.

			regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5327859472..5a050898fe 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3973,7 +3973,11 @@ PostmasterStateMachine(void)
             if (ReachedNormalRunning)
                 CancelBackup();
 
-            /* Normal exit from the postmaster is here */
+            /*
+             * Normal exit from the postmaster is here.  We don't need to log
+             * anything here, since the UnlinkLockFiles proc_exit callback
+             * will do so, and that should be the last user-visible action.
+             */
             ExitPostmaster(0);
         }
     }
@@ -3985,9 +3989,21 @@ PostmasterStateMachine(void)
      * startup process fails, because more than likely it will just fail again
      * and we will keep trying forever.
      */
-    if (pmState == PM_NO_CHILDREN &&
-        (StartupStatus == STARTUP_CRASHED || !restart_after_crash))
-        ExitPostmaster(1);
+    if (pmState == PM_NO_CHILDREN)
+    {
+        if (StartupStatus == STARTUP_CRASHED)
+        {
+            ereport(LOG,
+                    (errmsg("shutting down due to startup process failure")));
+            ExitPostmaster(1);
+        }
+        if (!restart_after_crash)
+        {
+            ereport(LOG,
+                    (errmsg("shutting down because restart_after_crash is off")));
+            ExitPostmaster(1);
+        }
+    }
 
     /*
      * If we need to recover from a crash, wait for all non-syslogger children
@@ -5439,8 +5455,7 @@ StartChildProcess(AuxProcType type)
         MemoryContextDelete(PostmasterContext);
         PostmasterContext = NULL;
 
-        AuxiliaryProcessMain(ac, av);
-        ExitPostmaster(0);
+        AuxiliaryProcessMain(ac, av);   /* does not return */
     }
 #endif                          /* EXEC_BACKEND */
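A small aside on the pg_attribute_noreturn() point above: a noreturn attribute promises the compiler that the function never hands control back to its caller, so any statement after such a call is unreachable. A minimal, generic illustration (plain C, not PostgreSQL code; the name fatal_exit is invented for the example, and pg_attribute_noreturn() is, roughly, PostgreSQL's portability wrapper for this kind of attribute):

    #include <stdlib.h>

    /* Declare the helper as never returning (GCC/Clang attribute syntax). */
    static void fatal_exit(int code) __attribute__((noreturn));

    static void
    fatal_exit(int code)
    {
        exit(code);             /* exit() itself never returns */
    }

    int
    main(void)
    {
        fatal_exit(1);
        return 0;               /* unreachable, like the removed ExitPostmaster(0) call */
    }

That is why removing the ExitPostmaster(0) call after AuxiliaryProcessMain() loses nothing.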
On Sat, May 22, 2021 at 6:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > IMHO, it would be better to keep the lowest-level
> > apply_handle_XXX_internal() out of this, because presumably we're only
> > cleaning up the mess in higher-level callers.  Somewhat related, one
> > of the intentions behind a04daa97a43, which removed
> > es_result_relation_info in favor of passing the ResultRelInfo
> > explicitly to the executor's lower-level functions, was to avoid bugs
> > caused by failing to set/reset that global field correctly in
> > higher-level callers.
>
> Yeah, that's a fair point, and after some reflection I think that
> repeatedly changing the "active" field of the struct is exactly
> what was bothering me about the v2 patch.  So in the attached v3,
> I went back to passing that as an explicit argument.  The state
> struct now has no fields that need to change after first being set.

Thanks, that looks good to me.

> I did notice that we could remove some other random arguments
> by adding the LogicalRepRelMapEntry* to the state struct,
> so this also does that.

That seems fine.

BTW, I think we'd need to cherry-pick f3b141c4825 (or maybe parts of it) into v13 branch for back-patching this.

--
Amit Langote
EDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes:
> BTW, I think we'd need to cherry-pick f3b141c4825 (or maybe parts of
> it) into v13 branch for back-patching this.

I already did a fair amount of that yesterday, cf 84f5c2908 et al.

But that does raise the question of how far we need to back-patch this.  I gather that the whole issue might've started with 1375422c, so maybe we don't really need a back-patch at all?  But I'm sort of inclined to back-patch to v11 as I did with 84f5c2908, mainly to keep the worker.c code looking more alike in those branches.

			regards, tom lane
I wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
>> BTW, I think we'd need to cherry-pick f3b141c4825 (or maybe parts of
>> it) into v13 branch for back-patching this.
> I already did a fair amount of that yesterday, cf 84f5c2908 et al.
> But that does raise the question of how far we need to back-patch this.
> I gather that the whole issue might've started with 1375422c, so maybe
> we don't really need a back-patch at all?

... wrong.  Running v13 branch tip under CLOBBER_CACHE_ALWAYS provokes a core dump in 013_partition.pl, so 1375422c is not to blame.  Now I'm wondering how far back there's a live issue.

			regards, tom lane
I wrote:
> ... wrong.  Running v13 branch tip under CLOBBER_CACHE_ALWAYS provokes
> a core dump in 013_partition.pl, so 1375422c is not to blame.  Now
> I'm wondering how far back there's a live issue.

Oh, of course, it's directly the fault of the patch that added support for partitioned target tables.

I concluded that a verbatim backpatch wasn't too suitable because a04daa97a had changed a lot of the APIs here.  So I left the APIs for the xxx_internal() functions alone.  Otherwise the patch pretty much works as-is in v13.

			regards, tom lane
On Sun, May 23, 2021 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > ... wrong.  Running v13 branch tip under CLOBBER_CACHE_ALWAYS provokes
> > a core dump in 013_partition.pl, so 1375422c is not to blame.  Now
> > I'm wondering how far back there's a live issue.
>
> Oh, of course, it's directly the fault of the patch that added support
> for partitioned target tables.

Yeah, the problem seems to affect only partition child tables, so yeah this problem started with f1ac27bfda6.

> I concluded that a verbatim backpatch wasn't too suitable because
> a04daa97a had changed a lot of the APIs here.  So I left the APIs
> for the xxx_internal() functions alone.  Otherwise the patch
> pretty much works as-is in v13.

That looks reasonable.  Thanks.

--
Amit Langote
EDB: http://www.enterprisedb.com
On Sun, May 23, 2021 at 02:05:59PM +0900, Amit Langote wrote:
> On Sun, May 23, 2021 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wrote:
>> > ... wrong.  Running v13 branch tip under CLOBBER_CACHE_ALWAYS provokes
>> > a core dump in 013_partition.pl, so 1375422c is not to blame.  Now
>> > I'm wondering how far back there's a live issue.
>>
>> Oh, of course, it's directly the fault of the patch that added support
>> for partitioned target tables.
>
> Yeah, the problem seems to affect only partition child tables, so yeah
> this problem started with f1ac27bfda6.

Yep.

>> I concluded that a verbatim backpatch wasn't too suitable because
>> a04daa97a had changed a lot of the APIs here.

Thanks for the backpatch of the partition tests via d18ee6f.

--
Michael