
From Amit Kapila
Subject Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Date
Msg-id CAA4eK1LATyRWPNbLSR2BM+Hn5t-xeyMT+C9+sUrkgYvm5+QLoQ@mail.gmail.com
In response to Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Jan 10, 2020 at 6:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> >           ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 chunks); 223295504 used
>
> > The test case is only inserting 50K fairly-short rows, so this seems
> > like an unreasonable amount of memory to be consuming for that; and
> > even if you think it's reasonable, it clearly isn't going to scale
> > to large production transactions.
>
> > Now, the good news is that v11 and later get through
> > 006_logical_decoding.pl just fine under the same restriction.
> > So we did something in v11 to fix this excessive memory consumption.
> > However, unless we're willing to back-port whatever that was, this
> > test case is clearly consuming excessive resources for the v10 branch.
>
> I dug around a little in the git history for backend/replication/logical/,
> and while I find several commit messages mentioning memory leaks and
> faulty spill logic, they all claim to have been back-patched as far
> as 9.4.
>
> It seems reasonably likely to me that this result is telling us about
> an actual bug, ie, faulty back-patching of one or more of those fixes
> into v10 and perhaps earlier branches.
>

I think it would be good to narrow down this problem, but it seems we
can do that separately.  To avoid forgetting about it, can we track it
somewhere as an open issue (in the Older Bugs section of the PostgreSQL
12 Open Items page, or some other place)?

It seems to me that this test has found a problem in the back-branches,
so we might want to keep it after removing the max_files_per_process
restriction.  However, unless we narrow down this memory leak, it is
not a good idea to keep it, at least not in v10.  So, we have the
following options:
(a) Remove this test entirely from all branches and, once we have found
the memory-leak problem in the back-branches, consider adding it again
without the max_files_per_process restriction.
(b) Keep this test, without the max_files_per_process restriction, in
v11 and later, and back-patch it to v10 as well once the memory-leak
issue in v10 is found.

Suggestions?

>  If I have to do so to prove my point, I will set up a buildfarm member
>  that uses USE_NAMED_POSIX_SEMAPHORES, and then insist that the patch
>  cope with that.
>

Shall we document that under USE_NAMED_POSIX_SEMAPHORES we consume
additional file descriptors?  I thought about this because the minimum
value of max_files_per_process is 25, and the system won't even start
at such a low setting on a platform where USE_NAMED_POSIX_SEMAPHORES is
enabled.  Also, if this had been explicitly documented, I think this
test wouldn't have been so optimistic about max_files_per_process.
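
For anyone who wants to check their own platform, here is a rough,
standalone probe (not PostgreSQL code; the semaphore name and the fd
scan range are arbitrary) that counts open descriptors before and after
a sem_open() call.  Whether a named POSIX semaphore keeps a descriptor
pinned is platform-dependent (glibc, for instance, closes it again
after mmap'ing), which is exactly why the extra consumption is easy to
overlook:

/*
 * Hypothetical probe, not PostgreSQL code: does sem_open() leave an
 * extra file descriptor open on this platform?
 * Build with something like: cc sem_fd_probe.c -o sem_fd_probe -pthread
 */
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count currently open descriptors by probing a small range with fcntl(). */
static int
count_open_fds(void)
{
    int     count = 0;

    for (int fd = 0; fd < 256; fd++)
        if (fcntl(fd, F_GETFD) != -1)
            count++;
    return count;
}

int
main(void)
{
    int     before = count_open_fds();
    sem_t  *sem = sem_open("/sem_fd_probe", O_CREAT, 0600, 1);

    if (sem == SEM_FAILED)
    {
        perror("sem_open");
        return 1;
    }
    printf("open fds before: %d, after sem_open: %d\n",
           before, count_open_fds());

    sem_close(sem);
    sem_unlink("/sem_fd_probe");
    return 0;
}

On a platform where the "after" number goes up, the semaphores'
descriptors come out of the same per-process budget as ordinary files,
which is presumably why a near-minimum max_files_per_process setting
leaves so little headroom there.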


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


