Thread: Table refer leak in logical replication
Hi,

I met a problem with triggers in logical replication. I created a trigger to fire after inserting data at the subscriber, but there is a warning in the subscriber's log when the trigger fires:

WARNING: relcache reference leak: relation "xxx" not closed

Example of the procedure:

------publisher------
create table test (a int primary key);
create publication pub for table test;

------subscriber------
create table test (a int primary key);
create subscription sub connection 'dbname=postgres' publication pub;
create function funcA() returns trigger as $$ begin return null; end; $$ language plpgsql;
create trigger my_trig after insert or update or delete on test for each row execute procedure funcA();
alter table test enable replica trigger my_trig;

------publisher------
insert into test values (6);

It seems to be a reference leak issue. Can anyone fix this?

Regards,
Tang
> WARNING: relcache reference leak: relation "xxx" not closed.
>
> Example of the procedure:
> ------publisher------
> create table test (a int primary key);
> create publication pub for table test;
>
> ------subscriber------
> create table test (a int primary key);
> create subscription sub connection 'dbname=postgres' publication pub;
> create function funcA() returns trigger as $$ begin return null; end; $$ language plpgsql;
> create trigger my_trig after insert or update or delete on test for each row execute procedure funcA();
> alter table test enable replica trigger my_trig;
>
> ------publisher------
> insert into test values (6);
>
> It seems an issue about reference leak. Anyone can fix this?

It seems ExecGetTriggerResultRel will reopen the target table because it cannot find an existing one. Storing the opened table in estate->es_opened_result_relations seems to solve the problem. Attaching a patch that fixes this.

BTW, it seems better to add a test case for this?

Best regards,
houzj
Attachment
> BTW, it seems better to add a testcase for this ?

I think the test for it can be added in src/test/subscription/t/003_constraints.pl, as done in my patch.

Regards,
Shi yu
Attachment
On Tue, Apr 6, 2021 at 10:15 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > WARNING: relcache reference leak: relation "xxx" not closed.
> >
> > [...]
> >
> > It seems an issue about reference leak. Anyone can fix this?
>
> It seems ExecGetTriggerResultRel will reopen the target table because it cannot find an existing one.
> Storing the opened table in estate->es_opened_result_relations seems to solve the problem.

It seems like commit 1375422c is related to this bug. The commit introduced a new function ExecInitResultRelation() that sets both estate->es_result_relations and estate->es_opened_result_relations. I think it's better to use ExecInitResultRelation() rather than directly setting estate->es_opened_result_relations. It might be better to do that in create_estate_for_relation(), though. Please find an attached patch.

Since this issue happens only on HEAD and seems to be an oversight of commit 1375422c, I don't think regression tests for this are essential.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachment
> > > insert into test values (6);
> > >
> > > It seems an issue about reference leak. Anyone can fix this?
> >
> > It seems ExecGetTriggerResultRel will reopen the target table because it cannot find an existing one.
> > Storing the opened table in estate->es_opened_result_relations seems to solve the problem.
>
> It seems like commit 1375422c is related to this bug. The commit introduced a
> new function ExecInitResultRelation() that sets both
> estate->es_result_relations and estate->es_opened_result_relations. I
> think it's better to use ExecInitResultRelation() rather than directly setting
> estate->es_opened_result_relations. It might be better to do that in
> create_estate_for_relation() though. Please find an attached patch.
>
> Since this issue happens on only HEAD and it seems an oversight of commit
> 1375422c, I don't think regression tests for this are essential.

It seems we cannot just use ExecInitResultRelation. In ExecInitResultRelation, it will use ExecGetRangeTableRelation, which will also open the target table and store the rel in estate->es_relations. We should call ExecCloseRangeTableRelations at the end of apply_handle_xxx to close the rel in estate->es_relations. Attaching the patch with this change.

Best regards,
houzj
Attachment
On Tue, Apr 6, 2021 at 1:01 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > It seems like commit 1375422c is related to this bug.

Right, thanks for pointing this out.

> > The commit introduced a new function ExecInitResultRelation() that sets both
> > estate->es_result_relations and estate->es_opened_result_relations. I
> > think it's better to use ExecInitResultRelation() rather than directly setting
> > estate->es_opened_result_relations. It might be better to do that in
> > create_estate_for_relation() though. Please find an attached patch.

Agree that ExecInitResultRelation() would be better.

> > Since this issue happens on only HEAD and it seems an oversight of commit
> > 1375422c, I don't think regression tests for this are essential.
>
> It seems we can not only use ExecInitResultRelation.
> In function ExecInitResultRelation, it will use ExecGetRangeTableRelation which
> will also open the target table and store the rel in "Estate->es_relations".
> We should call ExecCloseRangeTableRelations at the end of apply_handle_xxx to
> close the rel in "Estate->es_relations".

Right, ExecCloseRangeTableRelations() was missing.

I think it may be better to create a sibling function to create_estate_for_relation(), say, close_estate(EState *), that performs the cleanup actions, including the firing of any AFTER triggers. See attached updated patch to see what I mean.

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachment
On Tue, Apr 6, 2021 at 1:15 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Apr 6, 2021 at 1:01 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > [...]
> > We should call ExecCloseRangeTableRelations at the end of apply_handle_xxx to
> > close the rel in "Estate->es_relations".
>
> Right, ExecCloseRangeTableRelations() was missing.

Yeah, I had missed it. Thank you for pointing it out.

> I think it may be better to create a sibling function to
> create_estate_for_relation(), say, close_estate(EState *), that
> performs the cleanup actions, including the firing of any AFTER
> triggers. See attached updated patch to see what I mean.

Looks good to me.
BTW I found the following comment in create_estate_for_relation():

/*
 * Executor state preparation for evaluation of constraint expressions,
 * indexes and triggers.
 *
 * This is based on similar code in copy.c
 */
static EState *
create_estate_for_relation(LogicalRepRelMapEntry *rel)

It seems like the comment meant the code around CopyFrom() and DoCopy(), but that is no longer true since copy.c has been split into several files, and I don't find similar code in copy.c. I think it's better to remove the sentence rather than update the file name, as this comment isn't really informative and it's hard to track such updates. What do you think?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Apr 6, 2021 at 1:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Apr 6, 2021 at 1:15 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > [...]
> > I think it may be better to create a sibling function to
> > create_estate_for_relation(), say, close_estate(EState *), that
> > performs the cleanup actions, including the firing of any AFTER
> > triggers. See attached updated patch to see what I mean.
>
> Looks good to me.
>
> BTW I found the following comments in create_estate_for_relation():
>
> /*
>  * Executor state preparation for evaluation of constraint expressions,
>  * indexes and triggers.
>  *
>  * This is based on similar code in copy.c
>  */
> static EState *
> create_estate_for_relation(LogicalRepRelMapEntry *rel)
>
> It seems like the comments meant the code around CopyFrom() and
> DoCopy() but it would no longer be true since copy.c has been split
> into some files and I don't find similar code in copy.c. I think it's
> better to remove the sentence rather than update the file name as this
> comment doesn't really informative and hard to track the updates. What
> do you think?

Yeah, agree with simply removing that comment.

While updating the patch to do so, it occurred to me that maybe we could move the ExecInitResultRelation() call into create_estate_for_relation() too, in the spirit of removing duplicative code. See attached updated patch. Actually, I remember proposing that as part of the commit you shared in your earlier email, but for some reason it didn't end up in the commit. I now think maybe we should do that after all.

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachment
On Tuesday, April 6, 2021 2:25 PM Amit Langote <amitlangote09@gmail.com> wrote:
> While updating the patch to do so, it occurred to me that maybe we
> could move the ExecInitResultRelation() call into
> create_estate_for_relation() too, in the spirit of removing
> duplicative code. See attached updated patch.

Thanks for the fix. The code LGTM. I have just confirmed that the problem is solved after applying your patch.

Regards,
Tang
I added this as an Open Item.

https://wiki.postgresql.org/index.php?title=PostgreSQL_14_Open_Items&type=revision&diff=35895&oldid=35890
https://www.postgresql.org/message-id/flat/OS0PR01MB6113BA0A760C9964A4A0C5C2FB769%40OS0PR01MB6113.jpnprd01.prod.outlook.com#2fc410dff5cd27eea357ffc17fc174f2

On Tue, Apr 06, 2021 at 02:25:05PM +0900, Amit Langote wrote:
> [...]
> While updating the patch to do so, it occurred to me that maybe we
> could move the ExecInitResultRelation() call into
> create_estate_for_relation() too, in the spirit of removing
> duplicative code. See attached updated patch. Actually I remember
> proposing that as part of the commit you shared in your earlier email,
> but for some reason it didn't end up in the commit. I now think maybe
> we should do that after all.
On Sat, Apr 10, 2021 at 10:39 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I added this as an Open Item.

Thanks, Justin.

--
Amit Langote
EDB: http://www.enterprisedb.com
On Tue, Apr 06, 2021 at 02:25:05PM +0900, Amit Langote wrote:
> While updating the patch to do so, it occurred to me that maybe we
> could move the ExecInitResultRelation() call into
> create_estate_for_relation() too, in the spirit of removing
> duplicative code. See attached updated patch. Actually I remember
> proposing that as part of the commit you shared in your earlier email,
> but for some reason it didn't end up in the commit. I now think maybe
> we should do that after all.

So, I have been studying 1375422c and this thread. Using ExecCloseRangeTableRelations() when cleaning up the executor state is reasonable, given the equivalent call to ExecInitRangeTable(). Now, there is something that itched me quite a lot while reviewing the proposed patch: ExecCloseResultRelations() is missing, but other code paths make an effort to mirror any calls of ExecInitRangeTable() with it.

Looking closer, I think that the worker code is actually confused with the handling of the opening and closing of the indexes needed by a result relation, because some code paths related to tuple routing for partitions may, or may not, need to do that. However, once the code integrates with ExecInitRangeTable() and ExecCloseResultRelations(), the index handling could get much simpler to understand, as the internal apply routines for INSERT/UPDATE/DELETE have no need to think about opening or closing them. Well, mostly, to be honest. There are two exceptions when it comes to tuple routing for partitioned tables: one for INSERT/DELETE, as the result relation found at the top of apply_handle_tuple_routing() can be used, and a second for the UPDATE case, where the tuple may be re-routed to a new partition, making it necessary to open and close the indexes of the partition relation the tuple is sent to.
I think that there is a lot of room for a much better integration of this stuff with the executor in terms of estate handling, but that would be too invasive for v14 post feature freeze, and I am not sure what a good design would be. Related to that, I found it confusing that the patch was passing down a result relation from create_estate_for_relation() for something that's just stored in the executor state. Having a "close" routine that maps to the "create" routine gives a good vibe, though "close" is usually used in parallel with "open" in the PG code, and instead of "free" I have found "finish" a better term.

Another thing, and I think that it is a good change, is that it is necessary to push a snapshot in the worker process before creating the executor state, as any index predicates of the result relation are going to need that when opened.

My impression of the code of worker.c is that the amount of code duplication is quite high between the three DML code paths, with the update tuple routing logic being a space for improvement on its own, and that it could gain in clarity with more refactoring work around the executor states, but I am fine to leave that as future work. That's too late for v14.

Attached is v5 that I am finishing with. Much more could be done, but I don't want to do something too invasive at this stage of the game. There are a couple of extra relations opened for a partitioned table within create_estate_for_relation() when redirecting to the tuple routing, but my guess is that this would be better in the long term. We could bypass doing that when working on a partitioned table, but I really don't think that this code should be made more confusing, and we had better apply ExecCloseResultRelations() for all the relations involved. That's simpler to reason about IMO.

--
Michael
Attachment
On Fri, Apr 16, 2021 at 11:55 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Attached is v5 that I am finishing with. Much more could be done but
> I don't want to do something too invasive at this stage of the game.
> There are a couple of extra relations in terms of relations opened for
> a partitioned table within create_estate_for_relation() when
> redirecting to the tuple routing, but my guess is that this would be
> better in the long-term.

Hmm, I am not sure if it is a good idea to open indexes needlessly, especially when it is not done in the previous code.

@@ -1766,8 +1771,11 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
             slot_getallattrs(remoteslot);
         }
         MemoryContextSwitchTo(oldctx);
+
+        ExecOpenIndices(partrelinfo_new, false);
         apply_handle_insert_internal(partrelinfo_new, estate,
                                      remoteslot_part);
+        ExecCloseIndices(partrelinfo_new);
     }

It seems you forgot to open indexes before apply_handle_delete_internal.

I am not sure if it is a good idea to do refactoring related to indexes or other things to fix a minor bug in commit 1375422c. It might be better to add a simple fix like what Hou-San has originally proposed [1], because even using ExecInitResultRelation might not be the best thing, as it is again trying to open a range table relation which we have already opened in logicalrep_rel_open. OTOH, using ExecInitResultRelation might encapsulate the assignment we are doing outside.

In general, it seems doing bigger refactoring or changes might lead to some bugs or unintended consequences, so if possible, we can try such refactoring as a separate patch. One argument against the proposed refactoring could be that with the previous code, we were trying to open the indexes just before they are required, but with the new patch in some cases they will be opened during the initial phase and in other cases just before they are required.
It might not necessarily be a bad idea to rearrange code like that, but maybe there is a better way to do it.

[1] - https://www.postgresql.org/message-id/OS0PR01MB571686F75FBDC219FF3DFF0D94769%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.
On Sat, Apr 17, 2021 at 07:02:00PM +0530, Amit Kapila wrote:
> Hmm, I am not sure if it is a good idea to open indexes needlessly
> especially when it is not done in the previous code.

Studying the history of this code, I think that f1ac27b is to blame here for making the code of the apply worker much messier than it was before. Before that, we were at a point where we had to rely on one single ResultRelInfo, with all its indexes opened and closed before doing the DML. After f1ac27b, the code is shaped so that the original ResultRelInfo may or may not be used depending on whether this code is working on a partitioned table or not. With an UPDATE, not one but *two* ResultRelInfo may be used if a tuple is moved to a different partition.

I think that in the long term, if we want to make use of ExecInitResultRelation() in this area, we are going to need to split the apply code in two parts, roughly (easier to say in words than actually doing it, still):
- Find out first which relations it is necessary to work on, and create a set of ResultRelInfo assigned to an executor state by ExecInitResultRelation(), doing all the relation openings that are necessary. This gets funky when attempting to do an update across partitions.
- Do the actual DML, with all the relations already opened and ready for use.

On top of that, I am really getting scared by the following, done in not one but now two places:

/*
 * The tuple to be updated could not be found.
 *
 * TODO what to do here, change the log level to LOG perhaps?
 */
elog(DEBUG1,
     "logical replication did not find row for update "
     "in replication target relation \"%s\"",
     RelationGetRelationName(localrel));

This already existed in one place before f1ac27b, but this got duplicated in a second area when applying the first update to a partition table.
The refactoring change done in 1375422c in worker.c, without the tuple routing pieces, would be a piece of cake in terms of which relations require to be opened and closed, including the timing of each call, because they could be unified in single code paths, and I also guess that we would avoid leak issues really easily. If the tuple routing code did not consider the case of moving a tuple across partitions, the level of difficulty of an integration with ExecInitResultRelation() would be much reduced, though it would make the feature much less appealing, as it becomes much more difficult to redistribute data across a different set of partitions with logical changes.

> I am not sure if it is a good idea to do the refactoring related to
> indexes or other things to fix a minor bug in commit 1375422c. It
> might be better to add a simple fix like what Hou-San has originally
> proposed [1] because even using ExecInitResultRelation might not be
> the best thing as it is again trying to open a range table which we
> have already opened in logicalrep_rel_open. OTOH, using
> ExecInitResultRelation might encapsulate the assignment we are doing
> outside.

Yeah, it would be nice to just rely on that. copyfrom.c basically does what I guess we should try to mirror as much as possible here, with a proper cleanup of the executor state using ExecCloseResultRelations() once we are done with the tuple apply.

> In general, it seems doing bigger refactoring or changes
> might lead to some bugs or unintended consequences, so if possible, we
> can try such refactoring as a separate patch. One argument against the
> proposed refactoring could be that with the previous code, we were
> trying to open the indexes just before it is required but with the new
> patch in some cases, they will be opened during the initial phase and
> for other cases, they are opened just before they are required.
> It might not necessarily be a bad idea to rearrange code like that but
> maybe there is a better way to do that.
>
> [1] - https://www.postgresql.org/message-id/OS0PR01MB571686F75FBDC219FF3DFF0D94769%40OS0PR01MB5716.jpnprd01.prod.outlook.com

This feels like piling one extra hack on top of what looks like an abuse of the executor calls to me, and the apply code is already full of it. True, we do that in ExecuteTruncateGuts() to allow triggers to be fired, but I think that it would be better to avoid spreading that, so as to consolidate the trigger and execution code.

FWIW, I would be tempted to send f1ac27b back to the drawing board, then refactor the code of the apply worker to use ExecInitResultRelation() so as to get more consistency with resource releases, simplifying the business with indexes. Once the code is in a cleaner state, we could come back to making an integration with partitioned tables into this code.

--
Michael
Attachment
On Mon, Apr 19, 2021 at 03:08:41PM +0900, Michael Paquier wrote:
> FWIW, I would be tempted to send back f1ac27b to the blackboard, then refactor
> the code of the apply worker to use ExecInitResultRelation() so as we
> get more consistency with resource releases, simplifying the business
> with indexes. Once the code is in a cleaner state, we could come back
> into making an integration with partitioned tables into this code.

But you cannot do that either, as f1ac27bf got into 13.

--
Michael
Attachment
On Sat, Apr 17, 2021 at 10:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Apr 16, 2021 at 11:55 AM Michael Paquier <michael@paquier.xyz> wrote:
> > [...]
>
> Hmm, I am not sure if it is a good idea to open indexes needlessly
> especially when it is not done in the previous code.
>
> [...]
>
> I am not sure if it is a good idea to do the refactoring related to
> indexes or other things to fix a minor bug in commit 1375422c. It
> might be better to add a simple fix like what Hou-San has originally
> proposed [1] because even using ExecInitResultRelation might not be
> the best thing as it is again trying to open a range table which we
> have already opened in logicalrep_rel_open.

FWIW, I agree with fixing this bug of 1375422c in the least scary manner possible. Hou-san proposed that the ResultRelInfo that apply_handle_{insert|update|delete} initialize themselves be added to es_opened_result_relations. I would prefer that only ExecInitResultRelation() add anything to es_opened_result_relations, to avoid future maintenance problems.

Instead, a fix as simple as Hou-san's proposed one would be to add an ExecCloseResultRelations() call at the end of each of apply_handle_{insert|update|delete}. That would fix the originally reported leak, because ExecCloseResultRelations() has this:

/* Close any relations that have been opened by ExecGetTriggerResultRel(). */
foreach(l, estate->es_trig_target_relations)
{

We do end up with the reality, though, that trigger.c now opens the replication target relation on its own (adding it to es_trig_target_relations) to fire its triggers.

I am also not opposed to reviewing the changes of 1375422c in light of these findings while we still have time. For example, it might be nice for ExecInitResultRelation to accept a Relation pointer that the callers from copyfrom.c and worker.c can use to pass their locally opened Relation. In that case, ExecInitResultRelation() would perform InitResultRelInfo() with that Relation pointer, instead of getting one using ExecGetRangeTableRelation(), which is what causes the Relation to be opened again.

--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Apr 16, 2021 at 3:24 PM Michael Paquier <michael@paquier.xyz> wrote:
> So, I have been studying 1375422c and this thread.
> [...]
> However, once the code integrates with ExecInitRangeTable() and
> ExecCloseResultRelations(), the index handlings could get much simpler
> to understand as the internal apply routines for INSERT/UPDATE/DELETE
> have no need to think about the opening or closing them. Well,
> mostly, to be honest.

To bring up another point, if we were to follow the spirit of the recent c5b7ba4e67a, whereby we moved ExecOpenIndices() from ExecInitModifyTable() into ExecInsert() and ExecUpdate(), that is, from the initialization phase of an INSERT/UPDATE to its actual execution, we could even consider performing Exec{Open|Close}Indices() for a replication target relation in ExecSimpleRelation{Insert|Update}.
The ExecOpenIndices() performed in worker.c is pointless in some cases, for example, when ExecSimpleRelation{Insert|Update} end up skipping the insert/update of a tuple due to BR triggers. -- Amit Langote EDB: http://www.enterprisedb.com
On Mon, Apr 19, 2021 at 1:51 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Fri, Apr 16, 2021 at 3:24 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Apr 06, 2021 at 02:25:05PM +0900, Amit Langote wrote: > > > While updating the patch to do so, it occurred to me that maybe we > > > could move the ExecInitResultRelation() call into > > > create_estate_for_relation() too, in the spirit of removing > > > duplicative code. See attached updated patch. Actually I remember > > > proposing that as part of the commit you shared in your earlier email, > > > but for some reason it didn't end up in the commit. I now think maybe > > > we should do that after all. > > > > So, I have been studying 1375422c and this thread. Using > > ExecCloseRangeTableRelations() when cleaning up the executor state is > > reasonable as of the equivalent call to ExecInitRangeTable(). Now, > > there is something that itched me quite a lot while reviewing the > > proposed patch: ExecCloseResultRelations() is missing, but other > > code paths make an effort to mirror any calls of ExecInitRangeTable() > > with it. Looking closer, I think that the worker code is actually > > confused with the handling of the opening and closing of the indexes > > needed by a result relation once you use that, because some code paths > > related to tuple routing for partitions may, or may not, need to do > > that. However, once the code integrates with ExecInitRangeTable() and > > ExecCloseResultRelations(), the index handlings could get much simpler > > to understand as the internal apply routines for INSERT/UPDATE/DELETE > > have no need to think about the opening or closing them. Well, > > mostly, to be honest. 
> > To bring up another point, if we were to follow the spirit of the > recent c5b7ba4e67a, whereby we moved ExecOpenIndices() from > ExecInitModifyTable() into ExecInsert() and ExecUpdate(), that is, > from during the initialization phase of an INSERT/UPDATE to its actual > execution, we could even consider performing Exec{Open|Close}Indices() > for a replication target relation in > ExecSimpleRelation{Insert|Update}. The ExecOpenIndices() performed in > worker.c is pointless in some cases, for example, when > ExecSimpleRelation{Insert|Update} end up skipping the insert/update of > a tuple due to BR triggers. > Yeah, that is also worth considering and sounds like a good idea. But, as I mentioned before it might be better to consider this separately. -- With Regards, Amit Kapila.
On Mon, Apr 19, 2021 at 12:32 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Sat, Apr 17, 2021 at 10:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Apr 16, 2021 at 11:55 AM Michael Paquier <michael@paquier.xyz> wrote:
> > > On Tue, Apr 06, 2021 at 02:25:05PM +0900, Amit Langote wrote:
> > >
> > > Attached is v5 that I am finishing with.  Much more could be done but
> > > I don't want to do something too invasive at this stage of the game.
> > > There are a couple of extra relations in terms of relations opened for
> > > a partitioned table within create_estate_for_relation() when
> > > redirecting to the tuple routing, but my guess is that this would be
> > > better in the long-term.
> > >
> >
> > Hmm, I am not sure if it is a good idea to open indexes needlessly
> > especially when it is not done in the previous code.
> >
> > @@ -1766,8 +1771,11 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
> >              slot_getallattrs(remoteslot);
> >          }
> >          MemoryContextSwitchTo(oldctx);
> > +
> > +        ExecOpenIndices(partrelinfo_new, false);
> >          apply_handle_insert_internal(partrelinfo_new, estate,
> >                                       remoteslot_part);
> > +        ExecCloseIndices(partrelinfo_new);
> >      }
> >
> > It seems you forgot to call open indexes before apply_handle_delete_internal.
> >
> > I am not sure if it is a good idea to do the refactoring related to
> > indexes or other things to fix a minor bug in commit 1375422c. It
> > might be better to add a simple fix like what Hou-San has originally
> > proposed [1] because even using ExecInitResultRelation might not be
> > the best thing as it is again trying to open a range table which we
> > have already opened in logicalrep_rel_open.
>
> FWIW, I agree with fixing this bug of 1375422c in as least scary
> manner as possible.  Hou-san proposed that we add the ResultRelInfo
> that apply_handle_{insert|update|delete} initialize themselves to
> es_opened_result_relations.
> I would prefer that only
> ExecInitResultRelation() add anything to es_opened_result_relations()
> to avoid future maintenance problems.  Instead, a fix as simple as the
> Hou-san's proposed fix would be to add a ExecCloseResultRelations()
> call at the end of each of apply_handle_{insert|update|delete}.
>

Yeah, that will work too but might look a bit strange.  BTW, how is
that taken care of for ExecuteTruncateGuts?  I mean, we do add rels
there like Hou-San's patch without calling ExecCloseResultRelations;
the rels are probably closed when we close the relation in worker.c,
but what about the memory for the list?
--
With Regards,
Amit Kapila.
On Mon, Apr 19, 2021 at 6:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Apr 19, 2021 at 12:32 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Sat, Apr 17, 2021 at 10:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Fri, Apr 16, 2021 at 11:55 AM Michael Paquier <michael@paquier.xyz> wrote:
> > > > Attached is v5 that I am finishing with.  Much more could be done but
> > > > I don't want to do something too invasive at this stage of the game.
> > > > There are a couple of extra relations in terms of relations opened for
> > > > a partitioned table within create_estate_for_relation() when
> > > > redirecting to the tuple routing, but my guess is that this would be
> > > > better in the long-term.
> > > >
> > >
> > > Hmm, I am not sure if it is a good idea to open indexes needlessly
> > > especially when it is not done in the previous code.
> > >
> > > @@ -1766,8 +1771,11 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
> > >              slot_getallattrs(remoteslot);
> > >          }
> > >          MemoryContextSwitchTo(oldctx);
> > > +
> > > +        ExecOpenIndices(partrelinfo_new, false);
> > >          apply_handle_insert_internal(partrelinfo_new, estate,
> > >                                       remoteslot_part);
> > > +        ExecCloseIndices(partrelinfo_new);
> > >      }
> > >
> > > It seems you forgot to call open indexes before apply_handle_delete_internal.
> > >
> > > I am not sure if it is a good idea to do the refactoring related to
> > > indexes or other things to fix a minor bug in commit 1375422c. It
> > > might be better to add a simple fix like what Hou-San has originally
> > > proposed [1] because even using ExecInitResultRelation might not be
> > > the best thing as it is again trying to open a range table which we
> > > have already opened in logicalrep_rel_open.
> >
> > FWIW, I agree with fixing this bug of 1375422c in as least scary
> > manner as possible.
> > Hou-san proposed that we add the ResultRelInfo
> > that apply_handle_{insert|update|delete} initialize themselves to
> > es_opened_result_relations.  I would prefer that only
> > ExecInitResultRelation() add anything to es_opened_result_relations()
> > to avoid future maintenance problems.  Instead, a fix as simple as the
> > Hou-san's proposed fix would be to add a ExecCloseResultRelations()
> > call at the end of each of apply_handle_{insert|update|delete}.
>
> Yeah, that will work too but might look a bit strange. BTW, how that
> is taken care of for ExecuteTruncateGuts? I mean we do add rels there
> like Hou-San's patch without calling ExecCloseResultRelations, the
> rels are probably closed when we close the relation in worker.c but
> what about memory for the list?

It seems I had forgotten the code I had written myself.  The following
is from ExecuteTruncateGuts():

    /*
     * To fire triggers, we'll need an EState as well as a ResultRelInfo for
     * each relation.  We don't need to call ExecOpenIndices, though.
     *
     * We put the ResultRelInfos in the es_opened_result_relations list, even
     * though we don't have a range table and don't populate the
     * es_result_relations array.  That's a bit bogus, but it's enough to make
     * ExecGetTriggerResultRel() find them.
     */
    estate = CreateExecutorState();
    resultRelInfos = (ResultRelInfo *)
        palloc(list_length(rels) * sizeof(ResultRelInfo));
    resultRelInfo = resultRelInfos;
    foreach(cell, rels)
    {
        Relation    rel = (Relation) lfirst(cell);

        InitResultRelInfo(resultRelInfo,
                          rel,
                          0,    /* dummy rangetable index */
                          NULL,
                          0);
        estate->es_opened_result_relations =
            lappend(estate->es_opened_result_relations, resultRelInfo);
        resultRelInfo++;
    }

So, that is exactly what Hou-san's patch did.  Although, the comment
does admit that doing this is a bit bogus and was maybe written (by
Heikki IIRC) as a caution against repeating the pattern.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Mon, Apr 19, 2021 at 02:33:10PM +0530, Amit Kapila wrote: > On Mon, Apr 19, 2021 at 12:32 PM Amit Langote <amitlangote09@gmail.com> wrote: >> FWIW, I agree with fixing this bug of 1375422c in as least scary >> manner as possible. Hou-san proposed that we add the ResultRelInfo >> that apply_handle_{insert|update|delete} initialize themselves to >> es_opened_result_relations. I would prefer that only >> ExecInitResultRelation() add anything to es_opened_result_relations() >> to avoid future maintenance problems. Instead, a fix as simple as the >> Hou-san's proposed fix would be to add a ExecCloseResultRelations() >> call at the end of each of apply_handle_{insert|update|delete}. > > Yeah, that will work too but might look a bit strange. BTW, how that > is taken care of for ExecuteTruncateGuts? I mean we do add rels there > like Hou-San's patch without calling ExecCloseResultRelations, the > rels are probably closed when we close the relation in worker.c but > what about memory for the list? TRUNCATE relies on FreeExecutorState() for that, no? FWIW, I'd rather agree to use what has been proposed with es_opened_result_relations like TRUNCATE does rather than attempt to use ExecInitResultRelation() combined with potentially asymmetric calls to ExecCloseResultRelations(). -- Michael
On Mon, Apr 19, 2021 at 3:02 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Apr 19, 2021 at 02:33:10PM +0530, Amit Kapila wrote: > > On Mon, Apr 19, 2021 at 12:32 PM Amit Langote <amitlangote09@gmail.com> wrote: > >> FWIW, I agree with fixing this bug of 1375422c in as least scary > >> manner as possible. Hou-san proposed that we add the ResultRelInfo > >> that apply_handle_{insert|update|delete} initialize themselves to > >> es_opened_result_relations. I would prefer that only > >> ExecInitResultRelation() add anything to es_opened_result_relations() > >> to avoid future maintenance problems. Instead, a fix as simple as the > >> Hou-san's proposed fix would be to add a ExecCloseResultRelations() > >> call at the end of each of apply_handle_{insert|update|delete}. > > > > Yeah, that will work too but might look a bit strange. BTW, how that > > is taken care of for ExecuteTruncateGuts? I mean we do add rels there > > like Hou-San's patch without calling ExecCloseResultRelations, the > > rels are probably closed when we close the relation in worker.c but > > what about memory for the list? > > TRUNCATE relies on FreeExecutorState() for that, no? > I am not sure about that because it doesn't seem to be allocated in es_query_cxt. Note, we switch to oldcontext in the CreateExecutorState. -- With Regards, Amit Kapila.
On Mon, Apr 19, 2021 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 19, 2021 at 3:02 PM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Mon, Apr 19, 2021 at 02:33:10PM +0530, Amit Kapila wrote: > > > On Mon, Apr 19, 2021 at 12:32 PM Amit Langote <amitlangote09@gmail.com> wrote: > > >> FWIW, I agree with fixing this bug of 1375422c in as least scary > > >> manner as possible. Hou-san proposed that we add the ResultRelInfo > > >> that apply_handle_{insert|update|delete} initialize themselves to > > >> es_opened_result_relations. I would prefer that only > > >> ExecInitResultRelation() add anything to es_opened_result_relations() > > >> to avoid future maintenance problems. Instead, a fix as simple as the > > >> Hou-san's proposed fix would be to add a ExecCloseResultRelations() > > >> call at the end of each of apply_handle_{insert|update|delete}. > > > > > > Yeah, that will work too but might look a bit strange. BTW, how that > > > is taken care of for ExecuteTruncateGuts? I mean we do add rels there > > > like Hou-San's patch without calling ExecCloseResultRelations, the > > > rels are probably closed when we close the relation in worker.c but > > > what about memory for the list? > > > > TRUNCATE relies on FreeExecutorState() for that, no? > > > > I am not sure about that because it doesn't seem to be allocated in > es_query_cxt. Note, we switch to oldcontext in the > CreateExecutorState. > I have just checked that the memory for the list is allocated in ApplyMessageContext. So, it appears a memory leak to me unless I am missing something. -- With Regards, Amit Kapila.
On Mon, Apr 19, 2021 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 19, 2021 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Apr 19, 2021 at 3:02 PM Michael Paquier <michael@paquier.xyz> wrote: > > > > > > On Mon, Apr 19, 2021 at 02:33:10PM +0530, Amit Kapila wrote: > > > > On Mon, Apr 19, 2021 at 12:32 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > >> FWIW, I agree with fixing this bug of 1375422c in as least scary > > > >> manner as possible. Hou-san proposed that we add the ResultRelInfo > > > >> that apply_handle_{insert|update|delete} initialize themselves to > > > >> es_opened_result_relations. I would prefer that only > > > >> ExecInitResultRelation() add anything to es_opened_result_relations() > > > >> to avoid future maintenance problems. Instead, a fix as simple as the > > > >> Hou-san's proposed fix would be to add a ExecCloseResultRelations() > > > >> call at the end of each of apply_handle_{insert|update|delete}. > > > > > > > > Yeah, that will work too but might look a bit strange. BTW, how that > > > > is taken care of for ExecuteTruncateGuts? I mean we do add rels there > > > > like Hou-San's patch without calling ExecCloseResultRelations, the > > > > rels are probably closed when we close the relation in worker.c but > > > > what about memory for the list? > > > > > > TRUNCATE relies on FreeExecutorState() for that, no? > > > > > > > I am not sure about that because it doesn't seem to be allocated in > > es_query_cxt. Note, we switch to oldcontext in the > > CreateExecutorState. > > > > I have just checked that the memory for the list is allocated in > ApplyMessageContext. So, it appears a memory leak to me unless I am > missing something. > It seems like the memory will be freed after we apply the truncate because we reset the ApplyMessageContext after applying each message, so maybe we don't need to bother about it. -- With Regards, Amit Kapila.
On Mon, Apr 19, 2021 at 7:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Apr 19, 2021 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 19, 2021 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > On Mon, Apr 19, 2021 at 3:02 PM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Mon, Apr 19, 2021 at 02:33:10PM +0530, Amit Kapila wrote: > > > > > On Mon, Apr 19, 2021 at 12:32 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > >> FWIW, I agree with fixing this bug of 1375422c in as least scary > > > > >> manner as possible. Hou-san proposed that we add the ResultRelInfo > > > > >> that apply_handle_{insert|update|delete} initialize themselves to > > > > >> es_opened_result_relations. I would prefer that only > > > > >> ExecInitResultRelation() add anything to es_opened_result_relations() > > > > >> to avoid future maintenance problems. Instead, a fix as simple as the > > > > >> Hou-san's proposed fix would be to add a ExecCloseResultRelations() > > > > >> call at the end of each of apply_handle_{insert|update|delete}. > > > > > > > > > > Yeah, that will work too but might look a bit strange. BTW, how that > > > > > is taken care of for ExecuteTruncateGuts? I mean we do add rels there > > > > > like Hou-San's patch without calling ExecCloseResultRelations, the > > > > > rels are probably closed when we close the relation in worker.c but > > > > > what about memory for the list? > > > > > > > > TRUNCATE relies on FreeExecutorState() for that, no? > > > > > > > > > > I am not sure about that because it doesn't seem to be allocated in > > > es_query_cxt. Note, we switch to oldcontext in the > > > CreateExecutorState. > > > > > > > I have just checked that the memory for the list is allocated in > > ApplyMessageContext. So, it appears a memory leak to me unless I am > > missing something. 
> > > > It seems like the memory will be freed after we apply the truncate > because we reset the ApplyMessageContext after applying each message, > so maybe we don't need to bother about it. Yes, ApplyMessageContext seems short-lived enough for this not to matter. In the case of ExecuteTruncateGuts(), it's PortalContext, but we don't seem to bother about leaking into that one too. -- Amit Langote EDB: http://www.enterprisedb.com
On Mon, Apr 19, 2021 at 08:09:05PM +0900, Amit Langote wrote:
> On Mon, Apr 19, 2021 at 7:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> It seems like the memory will be freed after we apply the truncate
>> because we reset the ApplyMessageContext after applying each message,
>> so maybe we don't need to bother about it.
>
> Yes, ApplyMessageContext seems short-lived enough for this not to
> matter.  In the case of ExecuteTruncateGuts(), it's PortalContext, but
> we don't seem to bother about leaking into that one too.

Sorry for the dumb question, because I have not studied this part of
the code in any extensive way, but how many changes at maximum can be
applied within a single ApplyMessageContext?  I am wondering if we
could run into problems depending on the number of relations touched
within a single message, or if there are any patches that could run
into problems because of this limitation, meaning that we may want to
add a proper set of comments within this area to document the
limitations attached to a DML operation applied.
--
Michael
On Mon, Apr 19, 2021 at 5:20 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Apr 19, 2021 at 08:09:05PM +0900, Amit Langote wrote:
> > On Mon, Apr 19, 2021 at 7:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> It seems like the memory will be freed after we apply the truncate
> >> because we reset the ApplyMessageContext after applying each message,
> >> so maybe we don't need to bother about it.
> >
> > Yes, ApplyMessageContext seems short-lived enough for this not to
> > matter.  In the case of ExecuteTruncateGuts(), it's PortalContext, but
> > we don't seem to bother about leaking into that one too.
>
> Sorry for the dump question because I have not studied this part of
> the code in any extensive way, but how many changes at maximum can be
> applied within a single ApplyMessageContext?
>

It is one change per message for Insert/Update/Delete.  See
apply_dispatch() for the different change messages.
--
With Regards,
Amit Kapila.
On Mon, Apr 19, 2021 at 6:32 PM Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Apr 19, 2021 at 02:33:10PM +0530, Amit Kapila wrote: > > On Mon, Apr 19, 2021 at 12:32 PM Amit Langote <amitlangote09@gmail.com> wrote: > >> FWIW, I agree with fixing this bug of 1375422c in as least scary > >> manner as possible. Hou-san proposed that we add the ResultRelInfo > >> that apply_handle_{insert|update|delete} initialize themselves to > >> es_opened_result_relations. I would prefer that only > >> ExecInitResultRelation() add anything to es_opened_result_relations() > >> to avoid future maintenance problems. Instead, a fix as simple as the > >> Hou-san's proposed fix would be to add a ExecCloseResultRelations() > >> call at the end of each of apply_handle_{insert|update|delete}. > > > > Yeah, that will work too but might look a bit strange. BTW, how that > > is taken care of for ExecuteTruncateGuts? I mean we do add rels there > > like Hou-San's patch without calling ExecCloseResultRelations, the > > rels are probably closed when we close the relation in worker.c but > > what about memory for the list? > > ... FWIW, I'd rather > agree to use what has been proposed with es_opened_result_relations > like TRUNCATE does rather than attempt to use ExecInitResultRelation() > combined with potentially asymmetric calls to > ExecCloseResultRelations(). Okay, how about the attached then? I decided to go with just finish_estate(), because we no longer have to do anything relation specific there. -- Amit Langote EDB: http://www.enterprisedb.com
On Mon, Apr 19, 2021 at 09:44:05PM +0900, Amit Langote wrote:
> Okay, how about the attached then?

create_estate_for_relation() returns an extra resultRelInfo that's
also saved within es_opened_result_relations.  Wouldn't it be simpler
to take the first element from es_opened_result_relations instead?
Okay, that's a nit and you are documenting things in a sufficient way,
but that just seemed duplicated to me.

> I decided to go with just
> finish_estate(), because we no longer have to do anything relation
> specific there.

Fine by me.
--
Michael
Thanks for taking a look. On Tue, Apr 20, 2021 at 2:09 PM Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Apr 19, 2021 at 09:44:05PM +0900, Amit Langote wrote: > > Okay, how about the attached then? > > create_estate_for_relation() returns an extra resultRelInfo that's > also saved within es_opened_result_relations. Wouldn't is be simpler > to take the first element from es_opened_result_relations instead? > Okay, that's a nit and you are documenting things in a sufficient way, > but that just seemed duplicated to me. Manipulating the contents of es_opened_result_relations directly in worker.c is admittedly a "hack", which I am reluctant to have other places participating in. As originally designed, that list is to speed up ExecCloseResultRelations(), not as a place to access result relations from. The result relations targeted over the course of execution of a query (update/delete) or a (possibly multi-tuple in the future) replication apply operation will not be guaranteed to be added to the list in any particular order, so assuming where a result relation of interest can be found in the list is bound to be unstable. -- Amit Langote EDB: http://www.enterprisedb.com
On Tue, Apr 20, 2021 at 02:48:35PM +0900, Amit Langote wrote: > Manipulating the contents of es_opened_result_relations directly in > worker.c is admittedly a "hack", which I am reluctant to have other > places participating in. As originally designed, that list is to > speed up ExecCloseResultRelations(), not as a place to access result > relations from. The result relations targeted over the course of > execution of a query (update/delete) or a (possibly multi-tuple in the > future) replication apply operation will not be guaranteed to be added > to the list in any particular order, so assuming where a result > relation of interest can be found in the list is bound to be unstable. I really hope that this code gets heavily reorganized before considering more features or more manipulations of dependencies within the relations used for any apply operations. From what I can understand, I think that it would be really nicer and less bug prone to have a logic like COPY FROM, where we'd rely on a set of ExecInitResultRelation() with one final ExecCloseResultRelations(), and as bonus it should be possible to not have to do any business with ExecOpenIndices() or ExecCloseIndices() as part of worker.c. Anyway, I also understand that we do with what we have now in this code, so I am fine to live with this patch as of 14. -- Michael
On Tue, Apr 20, 2021 at 4:22 PM Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Apr 20, 2021 at 02:48:35PM +0900, Amit Langote wrote: > > Manipulating the contents of es_opened_result_relations directly in > > worker.c is admittedly a "hack", which I am reluctant to have other > > places participating in. As originally designed, that list is to > > speed up ExecCloseResultRelations(), not as a place to access result > > relations from. The result relations targeted over the course of > > execution of a query (update/delete) or a (possibly multi-tuple in the > > future) replication apply operation will not be guaranteed to be added > > to the list in any particular order, so assuming where a result > > relation of interest can be found in the list is bound to be unstable. > > I really hope that this code gets heavily reorganized before > considering more features or more manipulations of dependencies within > the relations used for any apply operations. From what I can > understand, I think that it would be really nicer and less bug prone > to have a logic like COPY FROM, where we'd rely on a set of > ExecInitResultRelation() with one final ExecCloseResultRelations(), > and as bonus it should be possible to not have to do any business with > ExecOpenIndices() or ExecCloseIndices() as part of worker.c. As pointed out by Amit K, a problem with using ExecInitResultRelation() in both copyfrom.c and worker.c is that it effectively ignores the Relation pointer that's already been acquired by other parts of the code. Upthread [1], I proposed that we add a Relation pointer argument to ExecInitResultRelation() so that the callers that are not interested in setting up es_range_table, but only es_result_relations, can do so. BTW, I tend to agree that ExecCloseIndices() is better only done in ExecCloseResultRelations(), but... > Anyway, > I also understand that we do with what we have now in this code, so I > am fine to live with this patch as of 14. 
...IIUC, Amit K prefers starting another thread for other improvements on top of 1375422c. -- Amit Langote EDB: http://www.enterprisedb.com [1] https://www.postgresql.org/message-id/CA%2BHiwqF%2Bq3MyGqLvGdC%2BJk5Xx%3DJpwpR-m5moXN%2Baf-LC-RMvdw%40mail.gmail.com
> > ... FWIW, I'd rather
> > agree to use what has been proposed with es_opened_result_relations
> > like TRUNCATE does rather than attempt to use ExecInitResultRelation()
> > combined with potentially asymmetric calls to
> > ExecCloseResultRelations().
>
> Okay, how about the attached then? I decided to go with just finish_estate(),
> because we no longer have to do anything relation specific there.
>

I think the patch looks good.  But I noticed that there seems to be no
test case covering after triggers on the subscriber when using logical
replication.  As we seem to have planned some further refactoring here
in the future, would it be better to add a test case to cover this code?

Best regards,
houzj
On Tue, Apr 20, 2021 at 4:59 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > > > ... FWIW, I'd rather > > > agree to use what has been proposed with es_opened_result_relations > > > like TRUNCATE does rather than attempt to use ExecInitResultRelation() > > > combined with potentially asymmetric calls to > > > ExecCloseResultRelations(). > > > > Okay, how about the attached then? I decided to go with just finish_estate(), > > because we no longer have to do anything relation specific there. > > > > I think the patch looks good. > But I noticed that there seems no testcase to test the [aftertrigger in subscriber] when using logical replication. > As we seems planned to do some further refactor in the future, Is it better to add one testcase to cover this code ? > +1. I think it makes sense to add a test case especially because we don't have any existing test in this area. -- With Regards, Amit Kapila.
On Tue, Apr 20, 2021 at 06:20:03PM +0530, Amit Kapila wrote:
> +1. I think it makes sense to add a test case especially because we
> don't have any existing test in this area.

Yes, let's add something into 013_partition.pl within both
subscriber1 and subscriber2.  This will not catch the relation
leak, but it is better to make sure that the trigger is fired as we'd
expect.  This will become helpful if this code gets refactored
or changed in the future.  What about adding an extra table inserted
into by the trigger itself?  If I were to design that, I would insert
the following information that gets checked by a simple psql call once
the changes are applied in the subscriber: relation name, TG_WHEN,
TG_OP and TG_LEVEL.  So such a table would need at least 4 columns.
--
Michael
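For reference, a sketch of the kind of audit table and trigger described above (the object names here are made up for illustration; the actual tests ended up in the patches posted on this thread):

```sql
-- On the subscriber: a table recording each trigger firing.
CREATE TABLE sub_trigger_log (tgtab text, tgwhen text, tgop text, tglevel text);

-- One function serves all operations, using the TG_* variables.
CREATE FUNCTION log_trigger_firing() RETURNS trigger AS $$
BEGIN
    INSERT INTO sub_trigger_log
        VALUES (TG_TABLE_NAME, TG_WHEN, TG_OP, TG_LEVEL);
    RETURN NULL;  -- return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_trg AFTER INSERT OR UPDATE OR DELETE ON tab1
    FOR EACH ROW EXECUTE PROCEDURE log_trigger_firing();

-- Apply workers run with session_replication_role = 'replica', so the
-- trigger must be marked REPLICA (or ALWAYS) to fire during apply.
ALTER TABLE tab1 ENABLE REPLICA TRIGGER log_trg;
```

A TAP test can then apply some changes on the publisher, wait for the subscriber to catch up, and check the contents of sub_trigger_log with a single psql call.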
On Wed, Apr 21, 2021 at 9:31 AM Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Apr 20, 2021 at 06:20:03PM +0530, Amit Kapila wrote: > > +1. I think it makes sense to add a test case especially because we > > don't have any existing test in this area. > > Yes, let's add add something into 013_partition.pl within both > subscriber1 and subscriber2. This will not catch up the relation > leak, but it is better to make sure that the trigger is fired as we'd > like to expect. This will become helpful if this code gets refactored > or changed in the future. What about adding an extra table inserted > into by the trigger itself? If I were to design that, I would insert > the following information that gets checked by a simple psql call once > the changes are applied in the subscriber: relation name, TG_WHEN, > TG_OP and TG_LEVEL. So such a table would need at least 4 columns. Agree about adding tests along these lines. Will post in a bit. -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Apr 21, 2021 at 11:13:06AM +0900, Amit Langote wrote: > Agree about adding tests along these lines. Will post in a bit. Thanks! -- Michael
On Wed, Apr 21, 2021 at 11:13 AM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Apr 21, 2021 at 9:31 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Apr 20, 2021 at 06:20:03PM +0530, Amit Kapila wrote: > > > +1. I think it makes sense to add a test case especially because we > > > don't have any existing test in this area. > > > > Yes, let's add add something into 013_partition.pl within both > > subscriber1 and subscriber2. This will not catch up the relation > > leak, but it is better to make sure that the trigger is fired as we'd > > like to expect. This will become helpful if this code gets refactored > > or changed in the future. What about adding an extra table inserted > > into by the trigger itself? If I were to design that, I would insert > > the following information that gets checked by a simple psql call once > > the changes are applied in the subscriber: relation name, TG_WHEN, > > TG_OP and TG_LEVEL. So such a table would need at least 4 columns. > > Agree about adding tests along these lines. Will post in a bit. Here you go. So I had started last night by adding some tests for this in 003_constraints.pl because there are already some replica BR trigger tests there. I like your suggestion to have some tests around partitions, so added some in 013_partition.pl too. Let me know what you think. -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Apr 21, 2021 at 04:21:52PM +0900, Amit Langote wrote:
> So I had started last night by adding some tests for this in
> 003_constraints.pl because there are already some replica BR trigger
> tests there.  I like your suggestion to have some tests around
> partitions, so added some in 013_partition.pl too.  Let me know what
> you think.

Thanks, cool!

+    IF (NEW.bid = 4 AND NEW.id = OLD.id) THEN
+        RETURN NEW;
+    ELSE
+        RETURN NULL;
+    END IF;
Nit: the indentation is a bit off here.

+CREATE FUNCTION log_tab_fk_ref_upd() RETURNS TRIGGER AS \$\$
+BEGIN
+    CREATE TABLE IF NOT EXISTS public.tab_fk_ref_op_log (tgtab text, tgop text, tgwhen text, tglevel text, oldbid int, newbid int);
+    INSERT INTO public.tab_fk_ref_op_log SELECT TG_RELNAME, TG_OP, TG_WHEN, TG_LEVEL, OLD.bid, NEW.bid;
+    RETURN NULL;
+END;
Let's use only one function here, then you can just do something like
that and use NEW and OLD as you wish conditionally:

IF (TG_OP = 'INSERT') THEN
    INSERT INTO tab_fk_ref_op_log blah;
ELSIF (TG_OP = 'UPDATE') THEN
    INSERT INTO tab_fk_ref_op_log blah_;
END IF;

The same remark applies to the two files where the tests have been
introduced.  Why don't you create the table beforehand rather than
making it part of the trigger function?

+CREATE TRIGGER tab_fk_ref_log_ins_after_trg
[...]
+CREATE TRIGGER tab_fk_ref_log_upd_after_trg
No need for two triggers either once there is only one function.

+    "SELECT * FROM tab_fk_ref_op_log ORDER BY tgop, newbid;");
I would add tgtab and tgwhen to the ORDER BY here, just to be on the
safe side, and apply the same rule everywhere.  Your patch is already
consistent regarding that and helps future debugging, that's good.
--
Michael
Attachment
On Wed, Apr 21, 2021 at 7:38 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Apr 21, 2021 at 04:21:52PM +0900, Amit Langote wrote:
> > So I had started last night by adding some tests for this in
> > 003_constraints.pl because there are already some replica BR trigger
> > tests there. I like your suggestion to have some tests around
> > partitions, so added some in 013_partition.pl too. Let me know what
> > you think.
>
> Thanks, cool!

Thanks for looking.

> +    IF (NEW.bid = 4 AND NEW.id = OLD.id) THEN
> +        RETURN NEW;
> +    ELSE
> +        RETURN NULL;
> +    END IF;
> Nit: the indentation is a bit off here.

Hmm, I checked that I used 4 spaces for indenting, but maybe you're
concerned that the whole thing is indented unnecessarily relative to
the parent ELSIF block?

> +CREATE FUNCTION log_tab_fk_ref_upd() RETURNS TRIGGER AS \$\$
> +BEGIN
> +  CREATE TABLE IF NOT EXISTS public.tab_fk_ref_op_log (tgtab text,
> tgop text, tgwhen text, tglevel text, oldbid int, newbid int);
> +  INSERT INTO public.tab_fk_ref_op_log SELECT TG_RELNAME, TG_OP,
> TG_WHEN, TG_LEVEL, OLD.bid, NEW.bid;
> +  RETURN NULL;
> +END;
> Let's use only one function here; then you can just do something like
> this and use NEW and OLD as you wish conditionally:
> IF (TG_OP = 'INSERT') THEN
>     INSERT INTO tab_fk_ref_op_log blah;
> ELSIF (TG_OP = 'UPDATE') THEN
>     INSERT INTO tab_fk_ref_op_log blah_;
> END IF;
>
> The same remark applies to the two files where the tests have been
> introduced.

That's certainly better with fewer lines.

> Why don't you create the table beforehand rather than making it part
> of the trigger function?

Makes sense too.

> +CREATE TRIGGER tab_fk_ref_log_ins_after_trg
> [...]
> +CREATE TRIGGER tab_fk_ref_log_upd_after_trg
> No need for two triggers either once there is only one function.

Right.

> +    "SELECT * FROM tab_fk_ref_op_log ORDER BY tgop, newbid;");
> I would add tgtab and tgwhen to the ORDER BY here, just to be on the
> safe side, and apply the same rule everywhere. Your patch is already
> consistent regarding that and helps future debugging, which is good.

Okay, done.

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachment
On Wed, Apr 21, 2021 at 09:58:10PM +0900, Amit Langote wrote:
> Okay, done.

So, I have been working on that today, and tried to apply the full set
before realizing, while writing the commit message, that this was a
set of bullet points and too much for a single commit. The tests are a
nice thing to have to improve the coverage related to tuple routing,
but they are not really mandatory for the sake of the fix discussed
here. So for now I have applied the main fix as of f3b141c to close
the open item.

Now, coming back to the tests.

-    RETURN NULL;
+    IF (NEW.bid = 4 AND NEW.id = OLD.id) THEN
+        RETURN NEW;
+    ELSE
+        RETURN NULL;
+    END IF
This part added in test 003 is subtle. It is a tweak to make sure that
the second trigger, the AFTER trigger added in this patch, which fires
after the already-existing BEFORE entry, gets its hands on the NEW
tuple values. I think that this makes the test more confusing than it
should be, and could cause unnecessary pain for a future reader trying
to understand what's going on here. Anyway, what coverage do we
actually gain with this extra trigger in 003? The tests on HEAD
already make sure whether a trigger fires or not, so that seems
sufficient in itself. I guess that we could replace the existing
BEFORE trigger with something like what's proposed in this set to
track precisely which operation happens, and when, on a relation, with
a NEW and/or OLD set of tuples saved into this extra table, but the
interest looks limited for single relations.

On the other hand, the tests for partitions have much more value IMO,
but looking closely I think that we can do better:
- Create triggers on more relations of the partition tree,
  particularly to also check when a trigger does not fire.
- Use a more generic name for tab1_2_op_log and its function
  log_tab1_2_op(), say subs{1,2}_log_trigger_activity.
- Create some extra BEFORE triggers perhaps?

By the way, I had an idea of a trick we could use to check that
relations do not leak: we could scan the logs for this pattern,
similarly to what issues_sql_like() or connect_{fails,ok}() do
already, but that would mean increasing the log level, and we don't do
that to ease the load on the nodes.

--
Michael
Attachment
On Thu, Apr 22, 2021 at 1:45 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Apr 21, 2021 at 09:58:10PM +0900, Amit Langote wrote:
> > Okay, done.
>
> So, I have been working on that today, and tried to apply the full set
> before realizing, while writing the commit message, that this was a
> set of bullet points and too much for a single commit. The tests are a
> nice thing to have to improve the coverage related to tuple routing,
> but they are not really mandatory for the sake of the fix discussed
> here. So for now I have applied the main fix as of f3b141c to close
> the open item.

Thanks for that.

> Now, coming back to the tests.
>
> -    RETURN NULL;
> +    IF (NEW.bid = 4 AND NEW.id = OLD.id) THEN
> +        RETURN NEW;
> +    ELSE
> +        RETURN NULL;
> +    END IF
> This part added in test 003 is subtle. [...] Anyway, what coverage do
> we actually gain with this extra trigger in 003?

Not much, maybe. I am fine with dropping the changes made to 003 if
they are confusing, which I agree they can be.

> On the other hand, the tests for partitions have much more value IMO,
> but looking closely I think that we can do better:
> - Create triggers on more relations of the partition tree,
>   particularly to also check when a trigger does not fire.

It seems you're suggesting to adopt 003's trigger firing tests for
partitions in 013, but would we gain much by that?

> - Use a more generic name for tab1_2_op_log and its function
>   log_tab1_2_op(), say subs{1,2}_log_trigger_activity.

Sure, done.

> - Create some extra BEFORE triggers perhaps?

Again, that seems separate from what we're trying to do here. AIUI,
our aim here is to expand coverage for AFTER triggers, and not really
that of the trigger functionality proper, because nothing seems broken
about it, but rather that of the trigger target relation
opening/closing. ISTM that's what you're talking about below...

> By the way, I had an idea of a trick we could use to check that
> relations do not leak: we could scan the logs for this pattern,

It would be interesting to be able to do something like that, but...

> similarly to what issues_sql_like() or connect_{fails,ok}() do
> already, but that would mean increasing the log level, and we don't do
> that to ease the load on the nodes.

...sorry, I am not very familiar with our Perl testing infra. Is there
some script that already does something like this? For example,
checking in the logs generated by a "node" that no instance of a
certain WARNING is logged?

Meanwhile, attached is an updated version containing some of the
changes mentioned above.

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachment
On Fri, Apr 23, 2021 at 09:38:01PM +0900, Amit Langote wrote:
> On Thu, Apr 22, 2021 at 1:45 PM Michael Paquier <michael@paquier.xyz> wrote:
> > On the other hand, the tests for partitions have much more value IMO,
> > but looking closely I think that we can do better:
> > - Create triggers on more relations of the partition tree,
> >   particularly to also check when a trigger does not fire.
>
> It seems you're suggesting to adopt 003's trigger firing tests for
> partitions in 013, but would we gain much by that?

I was suggesting the opposite, aka to apply the trigger design that
you are introducing in 013 within 003. But that may not be necessary
either :)

> > - Use a more generic name for tab1_2_op_log and its function
> >   log_tab1_2_op(), say subs{1,2}_log_trigger_activity.
>
> Sure, done.

In the end I have used something simpler, as of
sub{1,2}_trigger_activity and sub{1,2}_trigger_activity_func.

> > - Create some extra BEFORE triggers perhaps?
>
> Again, that seems separate from what we're trying to do here. AIUI,
> our aim here is to expand coverage for AFTER triggers, and not really
> that of the trigger functionality proper, because nothing seems broken
> about it, but rather that of the trigger target relation
> opening/closing. ISTM that's what you're talking about below...

Exactly. My review of the worker code makes me feel that reworking
this code could easily lead to some incorrect behavior, so I'd rather
be careful with a couple of extra triggers created across the
partition tree, down to the partitions on which the triggers actually
fire.

> > similarly to what issues_sql_like() or connect_{fails,ok}() do
> > already, but that would mean increasing the log level, and we don't do
> > that to ease the load on the nodes.
>
> ...sorry, I am not very familiar with our Perl testing infra. Is there
> some script that already does something like this? For example,
> checking in the logs generated by a "node" that no instance of a
> certain WARNING is logged?

See for example how we test for SQL patterns with the backend logs
using issues_sql_like(), or the more recent connect_ok() and
connect_fails(). These functions match regexps against the backend
logs. I am not sure if that's worth the extra cycles though. I also
feel that we may want a more centralized place to check such things,
with more matching patterns, like at the end of one run on a set of
log files?

So, after a set of edits related to the format of the SQL queries, the
object names and some indentation (including a perltidy run), I have
applied this patch to close the loop.

--
Michael
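[Editor's note: for reference, the committed test design presumably reduces to something like the sketch below on one subscriber. The table and function names follow the renaming mentioned above, while the target partition tab1_2 is assumed from the earlier object names in this thread; the committed SQL may differ in detail.]

```sql
CREATE TABLE sub1_trigger_activity
    (tgtab text, tgop text, tgwhen text, tglevel text);

CREATE FUNCTION sub1_trigger_activity_func() RETURNS TRIGGER AS $$
BEGIN
    -- Record which relation fired the trigger and how, so the test can
    -- verify firing (and non-firing) across the partition tree.
    INSERT INTO public.sub1_trigger_activity
        SELECT TG_RELNAME, TG_OP, TG_WHEN, TG_LEVEL;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Attached to a leaf partition; ENABLE REPLICA is required for the
-- trigger to fire in the apply worker.
CREATE TRIGGER sub1_trigger
    AFTER INSERT OR UPDATE ON tab1_2
    FOR EACH ROW EXECUTE PROCEDURE sub1_trigger_activity_func();
ALTER TABLE tab1_2 ENABLE REPLICA TRIGGER sub1_trigger;
```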
Attachment
On Mon, Apr 26, 2021 at 3:27 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Apr 23, 2021 at 09:38:01PM +0900, Amit Langote wrote:
> > It seems you're suggesting to adopt 003's trigger firing tests for
> > partitions in 013, but would we gain much by that?
>
> I was suggesting the opposite, aka to apply the trigger design that
> you are introducing in 013 within 003. But that may not be necessary
> either :)

You mentioned adding "triggers on more relations of the partition
tree", so I thought you were talking about 013; 003 doesn't test
partitioning at all at the moment.

> > - Create some extra BEFORE triggers perhaps?
> >
> > Again, that seems separate from what we're trying to do here.
>
> Exactly. My review of the worker code makes me feel that reworking
> this code could easily lead to some incorrect behavior, so I'd rather
> be careful with a couple of extra triggers created across the
> partition tree, down to the partitions on which the triggers actually
> fire.

Ah, okay. You are talking about improving the coverage in general, NOT
in the context of the fix committed in f3b141c482552. However, note
that BEFORE triggers work the same no matter whether the target
relation is directly mentioned in the apply message or found as a
result of tuple routing. That's because the routines in
execReplication.c (like ExecSimpleRelationInsert) and in
nodeModifyTable.c (like ExecInsert) pass the ResultRelInfo *directly*
to the BR trigger.c routines. So there's no need for the complexity of
the code and data structures for looking up trigger target relations,
such as what AFTER triggers need -- ExecGetTriggerResultRel(). Given
that, it's understandable to have more coverage for the AFTER trigger
case, like that added by the patch you just committed.

> > ...sorry, I am not very familiar with our Perl testing infra. Is there
> > some script that already does something like this?
>
> See for example how we test for SQL patterns with the backend logs
> using issues_sql_like(), or the more recent connect_ok() and
> connect_fails(). These functions match regexps against the backend
> logs.

So I assume those pattern-matching functions would catch, for example,
relation leak warnings in case they get introduced later, right? If
so, I can see the merit of trying the idea.

> I am not sure if that's worth the extra cycles though. I also feel
> that we may want a more centralized place to check such things, with
> more matching patterns, like at the end of one run on a set of log
> files?

I guess that makes sense.

> So, after a set of edits related to the format of the SQL queries, the
> object names and some indentation (including a perltidy run), I have
> applied this patch to close the loop.

Thanks a lot.

--
Amit Langote
EDB: http://www.enterprisedb.com