Re: logical decoding : exceeded maxAllocatedDescs for .spill files - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Date
Msg-id CAJ3gD9cL7-UxM8+v+EE1V8PHruc1sh4ztYKCaOjcSJXBRQfZQg@mail.gmail.com
Whole thread Raw
In response to Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: logical decoding : exceeded maxAllocatedDescs for .spill files
List pgsql-hackers
On Thu, 12 Dec 2019 at 09:49, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 11, 2019 at 4:17 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> >
> > On Sat, 7 Dec 2019 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Dec 6, 2019 at 5:00 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> > > >
> > > > On Fri, 6 Dec 2019 at 15:40, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > 1.
> > > > > + /* Now that the state fields are initialized, it is safe to return it. */
> > > > > + *iter_state = state;
> > > > > +
> > > > >   /* allocate heap */
> > > > >   state->heap =
> > > > > binaryheap_allocate(state->nr_txns,
> > > > >     ReorderBufferIterCompare,
> > > > >
> > > > > Is there a reason for not initializing iter_state after
> > > > > binaryheap_allocate?  If we do so, then we don't need additional check
> > > > > you have added in ReorderBufferIterTXNFinish.
> > > >
> > > > If iter_state is initialized *after* binaryheap_allocate, then we
> > > > won't be able to close the vfds if binaryheap_allocate() ereports().
> > > >
> > >
> > > Is it possible to have vfds opened before binaryheap_allocate(), if so how?
> > No it does not look possible for the vfds to be opened before
> > binaryheap_allocate(). But actually, the idea behind placing the
> > iter_state at the place where I put is that, we should return back the
> > iter_state at the *earliest* place in the code where it is safe to
> > return.
> >
>
> Sure, I get that point, but it seems it is equally good to do this
> after binaryheap_allocate().  It will be sligthly better because if
> there is any error in binaryheap_allocate, then we don't need to even
> call ReorderBufferIterTXNFinish().

All right. WIll do that.

>
> >
> > >> 4. I think we should also check how much time increase will happen for
> > >> test_decoding regression test after the test added by this patch?
> > Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> > > Yeah, it currently takes noticeably longer compared to the others.
> > > Let's see if setting logical_decoding_work_mem to a min value allows
> > > us to reproduce the test with much lesser number of inserts.
> >
> > The test in the patch takes around 20 seconds, as compared to the max
> > time of 2 seconds any of the other tests take in that test suite.
> >
> > But if we set the max_files_per_process to a very low value (say 26)
> > as against the default 1000, we can reproduce the issue with as low as
> > 20 sub-transactions as against the 600 that I used in spill.sql test.
> > And with this, the test runs in around 4 seconds, so this is good.
> >
>
> Do you get 4s even after setting the minimum value of
> logical_decoding_work_mem?  I think 4s is also too much for this test
> which is going to test one extreme scenario. Can you try with some
> bigger rows or something else to reduce this time?  I think if we can
> get it close to 2s or whatever maximum time taken by any other logical
> decoding tests, then good, otherwise, it doesn't seem like a good idea
> to add this test.

>
> > But
> > the problem is : max_files_per_process needs server restart. So either
> > we have to shift this test to src/test/recovery in one of the
> > logical_decoding test, or retain it in contrib/test_decoding and let
> > it run for 20 seconds. Let me know if you figure out any other
> > approach.
> >
>
> I don't think 20s for one test is acceptable.  So, we should just give
> up that path, you can try what I suggested above, if we can reduce the
> test time, then good, otherwise, I suggest to drop the test from the
> patch.  Having said that, we should try our best to reduce the test
> time as it will be good if we can have such a test.

I tried; it is actually roughly 3.4 seconds. Note that reducing
logical_decoding_work_mem does not reduce the test time. It causes the
serialization to start early, and so increases the chance of
reproducing the problem. During restore of the serialized data, we
still use max_changes_in_memory. So max_changes_in_memory is the one
that allows us to reduce the number of transactions required, so we
can cut down on the outer loop iterations and make the test finish
much earlier.

But also note that, we can't use the test suite in
contrib/test_decoding, because max_changes_in_memory needs server
restart. So we need to shift this test to src/test/recovery. And
there, I guess it is not that critical for the testcase to be very
quick because the tests in general are much slower than the ones in
contrib/test_decoding, although it would be nice to make it fast. What
I propose is to modify  max_changes_in_memory, do a server restart
(which takes hardly a sec), run the testcase (3.5 sec) and then
restart after resetting the guc. So totally it will be around 4-5
seconds.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes
Next
From: Thomas Munro
Date:
Subject: Re: Collation versioning