RE: Logical Replica ReorderBuffer Size Accounting Issues - Mailing list pgsql-bugs

From wangw.fnst@fujitsu.com
Subject RE: Logical Replica ReorderBuffer Size Accounting Issues
Date
Msg-id OS3PR01MB6275A699D18DEF195D8A81729EFD9@OS3PR01MB6275.jpnprd01.prod.outlook.com
In response to Re: Logical Replica ReorderBuffer Size Accounting Issues  (Alex Richman <alexrichman@onesignal.com>)
Responses Re: Logical Replica ReorderBuffer Size Accounting Issues  (Alex Richman <alexrichman@onesignal.com>)
List pgsql-bugs
On Wed, Jan 11, 2023 at 23:42 PM Alex Richman <alexrichman@onesignal.com> wrote:
>

Thanks for the details and analysis you shared.

> On Tue, 10 Jan 2023 at 11:22, wangw.fnst@fujitsu.com
> <wangw.fnst@fujitsu.com> wrote:
> > In summary, with the commit c6e0fe1f2a in master, the additional space
> > allocated in the context is reduced. But I think this size difference seems to
> > be inconsistent with what you meet. So I think the issues you meet seems not to
> > be caused by the problem improved by this commit on master. How do you think?
> Agreed - I see a few different places where rb->size can disagree with the
> allocation size, but nothing that would produce a delta of 200KB vs 7GiB.  I think
> the issue lies somewhere within the allocator itself (more below).
> 
> > If possible, could you please share which version of PG the issue occurs on,
> > and could you please also try to reproduce the problem on master?
> We run 15.1-1 in prod, I have been trying to replicate the issue on that also.
>
> So far I have a partial replication of the issue by populating a table of schema (id
> UUID PRIMARY KEY, data JSONB) with some millions of rows, then doing some
> updates on them (I ran 16 of these concurrently each acting on 1/16th of the
> rows):
> UPDATE test SET data = data || '{"test_0": "1", "test_1": "1", "test_2": "1",
> "test_3": "1", "test_4": "1", "test_5": "1", "test_6": "1", "test_7": "1", "test_8":
> "1", "test_9": "1", "test_a": "1", "test_b": "1", "test_c": "1", "test_d": "1",
> "test_e": "1", "test_f": "1"}' @- '{test_0}';
> This does cause the walsender memory to grow to ~1GiB, even with a
> configured logical_decoding_work_mem of 256MB.  However it is not a perfect
> replication of the issue we see in prod, because rb->size does eventually grow
> to 256MB and start streaming transactions so the walsender memory does not
> grow up to the level we see in prod.

I think the result you described this time (the memory used in rb->tup_context
reaching 1GB) reproduces the same type of problem.

I don't think parallelism affects this problem, because a walsender always
reads the WAL serially, in order. Please let me know if I'm missing something.

I also tried the table structure and UPDATE statement you described, but
unfortunately I did not see 1GB or otherwise unexpected usage (I mean usage far
beyond 256MB) in rb->tup_context. Could you please help me confirm my test?
Here are my test details:
```
[publisher-side]
    create table tab(id UUID PRIMARY KEY, data JSONB);
    create publication pub for table tab;

[subscriber-side]
    create table tab(id UUID PRIMARY KEY, data JSONB);
    CREATE SUBSCRIPTION sub CONNECTION 'xxx' PUBLICATION pub;

[Initial data in publisher-side]
    INSERT INTO tab SELECT gen_random_uuid(), '{"key1":"values1"}'::jsonb FROM generate_series(1, 2000000) s(i);
```

BTW, I'm not sure what the operator '@-' at the end of your UPDATE statement
is. Do you mean '#-'? I don't think JSONB has an '@-' operator, so I used '#-'
instead of '@-' when testing.
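
With that substitution, the statement would look roughly like the sketch below.
It assumes the table name tab from the test setup above, and it shows no WHERE
clause because the way the rows were split across the 16 sessions was not
given.

```sql
-- Sketch only: the table name "tab" comes from the test setup above, and the
-- WHERE clause used to split the rows across sessions is unknown, so none is shown.
-- The || operator merges the new keys into data; #- then removes the key at
-- the path '{test_0}' from the merged result.
UPDATE tab SET data = data || '{"test_0": "1", "test_1": "1", "test_2": "1",
"test_3": "1", "test_4": "1", "test_5": "1", "test_6": "1", "test_7": "1", "test_8":
"1", "test_9": "1", "test_a": "1", "test_b": "1", "test_c": "1", "test_d": "1",
"test_e": "1", "test_f": "1"}' #- '{test_0}';
```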

Just before the data is serialized to disk in the function
ReorderBufferCheckMemoryLimit (I mean before this line is executed: [1]), I
print rb->size and call MemoryContextStats(rb->context) to check the context
usage.
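
A comparable context dump can also be requested without patching the server.
As a minimal sketch, assuming PostgreSQL 14 or later and a superuser session on
the publisher, pg_log_backend_memory_contexts() asks a process to write its
memory context statistics to the server log. It is not equivalent to the
instrumentation described above: it logs all contexts of the process rather
than only rb->context, it does not print rb->size, and the dump happens when
the process next checks for interrupts, not at a chosen line of code.

```sql
-- Sketch only: ask each walsender listed in pg_stat_replication to log its
-- memory contexts. The output goes to the server log, not to this session.
SELECT pid, pg_log_backend_memory_contexts(pid)
FROM pg_stat_replication;
```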

In addition, I think that in the function ReorderBufferIterTXNNext the usage in
rb->tup_context may exceed rb->size by a lot. But because of the
max_changes_in_memory limit in the function ReorderBufferRestoreChanges, and
because the tuples we are discussing are not very large, I don't think the
usage in rb->tup_context would reach 1GB here.

Could you share one thing with me: which line of code is being executed when
you print rb->size and call MemoryContextStats(rb->context)?

Regards,
Wang Wei

[1] -
https://github.com/postgres/postgres/blob/c5dc80c1bc216f0e21a2f79f5e0415c2d4cfb35d/src/backend/replication/logical/reorderbuffer.c#L3497
