Re: Table refer leak in logical replication - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Table refer leak in logical replication |
Date | |
Msg-id | YH0eaWl6k7syS52W@paquier.xyz Whole thread Raw |
In response to | Re: Table refer leak in logical replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Table refer leak in logical replication
|
List | pgsql-hackers |
On Sat, Apr 17, 2021 at 07:02:00PM +0530, Amit Kapila wrote: > Hmm, I am not sure if it is a good idea to open indexes needlessly > especially when it is not done in the previous code. Studying the history of this code, I think that f1ac27b is to blame here for making the code of the apply worker much messier than it was before. Before that, we were at a point where we had to rely on one single ResultRelInfo with all its indexes opened and closed before doing the DML. After f1ac27b, the code becomes shaped so as the original ResultRelInfo may or may not be used depending on if this code is working on a partitioned table or not. With an UPDATE, not one but *two* ResultRelInfo may be used if a tuple is moved to a different partition. I think that in the long term, and if we want to make use of ExecInitResultRelation() in this area, we are going to need to split the apply code in two parts, roughly (easier to say in words than actually doing it, still): - Find out first which relations it is necessary to work on, and create a set of ResultRelInfo assigned to an executor state by ExecInitResultRelation(), doing all the relation openings that are necessary. The gets funky when attempting to do an update across partitions. - Do the actual DML, with all the relations already opened and ready for use. On top of that, I am really getting scared by the following, done in not one but now two places: /* * The tuple to be updated could not be found. * * TODO what to do here, change the log level to LOG perhaps? */ elog(DEBUG1, "logical replication did not find row for update " "in replication target relation \"%s\"", RelationGetRelationName(localrel)); This already existed in once place before f1ac27b, but this got duplicated in a second area when applying the first update to a partition table. The refactoring change done in 1375422c in worker.c without the tuple routing pieces would be a piece of cake in terms of relations that require to be opened and closed, including the timings of each call because they could be unified in single code paths, and I also guess that we would avoid leak issues really easily. If the tuple routing code actually does not consider the case of moving a tuple across partitions, the level of difficulty to do an integration with ExecInitResultRelation() is much more reduced, though it makes the feature much less appealing as it becomes much more difficult to do some data redistribution across a different set of partitions with logical changes. > I am not sure if it is a good idea to do the refactoring related to > indexes or other things to fix a minor bug in commit 1375422c. It > might be better to add a simple fix like what Hou-San has originally > proposed [1] because even using ExecInitResultRelation might not be > the best thing as it is again trying to open a range table which we > have already opened in logicalrep_rel_open. OTOH, using > ExecInitResultRelation might encapsulate the assignment we are doing > outside. Yeah, that would be nice to just rely on that. copyfrom.c does basically what I guess we should try to copy a maximum here. With a proper cleanup of the executor state using ExecCloseResultRelations() once we are done with the tuple apply. > In general, it seems doing bigger refactoring or changes > might lead to some bugs or unintended consequences, so if possible, we > can try such refactoring as a separate patch. One argument against the > proposed refactoring could be that with the previous code, we were > trying to open the indexes just before it is required but with the new > patch in some cases, they will be opened during the initial phase and > for other cases, they are opened just before they are required. It > might not necessarily be a bad idea to rearrange code like that but > maybe there is a better way to do that. > > [1] - https://www.postgresql.org/message-id/OS0PR01MB571686F75FBDC219FF3DFF0D94769%40OS0PR01MB5716.jpnprd01.prod.outlook.com This feels like piling one extra hack on top of what looks like an abuse of the executor calls to me, and the apply code is already full of it. True that we do that in ExecuteTruncateGuts() for allow triggers to be fired, but I think that it would be better to avoid spread that to consolidate the trigger and execution code. FWIW, I would be tempted to send back f1ac27b to the blackboard, then refactor the code of the apply worker to use ExecInitResultRelation() so as we get more consistency with resource releases, simplifying the business with indexes. Once the code is in a cleaner state, we could come back into making an integration with partitioned tables into this code. -- Michael
Attachment
pgsql-hackers by date: