Re: logical decoding : exceeded maxAllocatedDescs for .spill files - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Date
Msg-id CAJ3gD9c9s6m1KYfM9w6a52NE6kh=pWyMoEv9Dqnjt+DRTqD24A@mail.gmail.com
In response to Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Sat, 7 Dec 2019 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 6, 2019 at 5:00 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> >
> > On Fri, 6 Dec 2019 at 15:40, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > Few comments:
> > > ----------------------
> > >
> > > 1.
> > > + /* Now that the state fields are initialized, it is safe to return it. */
> > > + *iter_state = state;
> > > +
> > >   /* allocate heap */
> > >   state->heap =
> > > binaryheap_allocate(state->nr_txns,
> > >     ReorderBufferIterCompare,
> > >
> > > Is there a reason for not initializing iter_state after
> > > binaryheap_allocate?  If we do so, then we don't need additional check
> > > you have added in ReorderBufferIterTXNFinish.
> >
> > If iter_state is initialized *after* binaryheap_allocate, then we
> > won't be able to close the vfds if binaryheap_allocate() ereports().
> >
>
> Is it possible to have vfds opened before binaryheap_allocate(), if so how?
No, it does not look possible for the vfds to be opened before
binaryheap_allocate(). But the idea behind assigning iter_state where
I did is that we should return the iter_state at the *earliest* point
in the code where it is safe to return it.
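
To make that concrete, here is a rough caller-side sketch (not the actual
patch; the out-parameter shape follows the quoted hunk, and the PG_TRY()
wrapper is only an illustration of the cleanup path being discussed).
Because *iter_state is assigned before binaryheap_allocate() can
ereport(), the error path still has a valid iterator state and can close
the vfds it tracks:

    /* Illustrative sketch only, not the actual patch. */
    ReorderBufferIterTXNState *volatile iterstate = NULL;

    PG_TRY();
    {
        /* may ereport() after having set *iter_state */
        ReorderBufferIterTXNInit(rb, txn, &iterstate);

        /* ... iterate over the changes ... */
    }
    PG_CATCH();
    {
        /* close any .spill vfds tracked in the iterator state */
        if (iterstate)
            ReorderBufferIterTXNFinish(rb, iterstate);
        PG_RE_THROW();
    }
    PG_END_TRY();

    if (iterstate)
        ReorderBufferIterTXNFinish(rb, iterstate);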


> > I couldn't reproduce the original problem (on HEAD) reported with the
> > test case in the patch.  So, I can't verify the fix.  I think it is
> > because of recent commits cec2edfa7859279f36d2374770ca920c59c73dd8 and
> > 9290ad198b15d6b986b855d2a58d087a54777e87.  It seems you need to either
> > change the value of logical_decoding_work_mem or change the test in
> > some way so that the original problem can be reproduced.
Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> Yeah, it does seem like the commit cec2edfa78592 must have caused the
> test to not reproduce on your env, although the test does fail for me
> still. Setting logical_decoding_work_mem to a low value does sound
> like a good idea. Will work on it.

I checked that setting logical_decoding_work_mem to its minimum value
(64kB) causes early serialization. So I think if you set this, you
should be able to reproduce the problem with the spill.sql test that
has the new testcase. I will anyway set this value in the test. Also
see below ...
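
(As an aside, here is a simplified paraphrase of why the 64kB minimum
triggers early spilling. This is a sketch of the memory-accounting logic
added by commit cec2edfa78 as I remember it, not an exact excerpt from
reorderbuffer.c, and the function names may not match exactly:)

    /*
     * Simplified sketch, not an exact excerpt: once the decoded changes
     * kept in memory exceed logical_decoding_work_mem, the largest
     * transaction is serialized to .spill files.  With the GUC at its
     * 64kB minimum, that happens after only a handful of changes.
     */
    if (rb->size >= logical_decoding_work_mem * 1024L)
    {
        /* pick the transaction using the most memory (name approximate) */
        ReorderBufferTXN *txn = ReorderBufferLargestTXN(rb);

        /* write its changes out as .spill files and free the memory */
        ReorderBufferSerializeTXN(rb, txn);
    }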

>> 4. I think we should also check how much the test_decoding regression
>> test time will increase with the test added by this patch.
Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> Yeah, it currently takes noticeably longer compared to the others.
> Let's see if setting logical_decoding_work_mem to its minimum value
> allows us to reproduce the problem with a much smaller number of inserts.

The test in the patch takes around 20 seconds, compared to the maximum
of 2 seconds that any of the other tests in that suite takes.

But if we set max_files_per_process to a very low value (say 26), as
against the default 1000, we can reproduce the issue with as few as 20
sub-transactions, as against the 600 that I used in the spill.sql test.
And with this, the test runs in around 4 seconds, so that is good. But
the problem is: max_files_per_process requires a server restart. So
either we have to shift this test to src/test/recovery into one of the
logical decoding tests, or retain it in contrib/test_decoding and let
it run for 20 seconds. Let me know if you can think of any other
approach.


>
> > >
> > > 5. One naive question about the usage of PathNameOpenFile().  When it
> > > reaches the max limit, it will automatically close one of the files,
> > > but how will that be reflected in the data structure (TXNEntryFile)
> > > you are managing.  Basically, when PathNameOpenFile closes some file,
> > > how will the corresponding vfd in TXNEntryFile be changed. Because if
> > > it is not changed, then won't it start pointing to some wrong
> > > filehandle?
> >
> > In PathNameOpenFile(), excess kernel fds could be closed
> > (ReleaseLruFiles). But with that, the vfds themselves don't get
> > invalidated. Only the underlying kernel fd gets closed, and the
> > vfd->fd is marked VFD_CLOSED. The vfd array element remains valid (a
> > non-null vfd->fileName means the vfd slot is valid; check
> > FileIsValid). So later, when FileRead(vfd1) is called and that vfd1
> > happens to be the one that had got its kernel fd closed, it gets
> > opened again through FileAccess().
> >
>
> I was under the impression that once the fd is closed due to excess
> kernel fds being opened, the slot in the VfdCache array could be
> reused by someone else, but on closer inspection that is not true.  It
> will only be available for reuse after we explicitly call FileClose, right?

Yes, that's right.
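
As a side note, here is a minimal backend-context sketch of that
behaviour (the path, buffer size and offset are made up for illustration;
the calls are the fd.c API as I understand it): a File returned by
PathNameOpenFile() stays usable even if ReleaseLruFiles() has since
closed its kernel fd, because FileRead() goes through FileAccess(),
which re-opens it transparently.

    #include "postgres.h"

    #include <fcntl.h>

    #include "pgstat.h"
    #include "storage/fd.h"

    /* Hypothetical illustration only; 'path' is an arbitrary example. */
    static void
    read_spill_file(const char *path)
    {
        File    vfd;
        char    buf[8192];
        int     nread;

        vfd = PathNameOpenFile(path, O_RDONLY | PG_BINARY);
        if (vfd < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not open file \"%s\": %m", path)));

        /*
         * Even if ReleaseLruFiles() has meanwhile closed the kernel fd
         * behind 'vfd' to stay under max_files_per_process, the vfd slot
         * itself is still valid (its fileName stays non-NULL, see
         * FileIsValid); FileRead() calls FileAccess(), which re-opens the
         * file transparently.
         */
        nread = FileRead(vfd, buf, sizeof(buf), 0,
                         WAIT_EVENT_REORDER_BUFFER_READ);
        if (nread < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not read file \"%s\": %m", path)));

        FileClose(vfd);         /* only now is the vfd slot freed for reuse */
    }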


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


