Re: logical decoding : exceeded maxAllocatedDescs for .spill files - Mailing list pgsql-hackers
From | Amit Khandekar |
---|---|
Subject | Re: logical decoding : exceeded maxAllocatedDescs for .spill files |
Date | |
Msg-id | CAJ3gD9cL7-UxM8+v+EE1V8PHruc1sh4ztYKCaOjcSJXBRQfZQg@mail.gmail.com Whole thread Raw |
In response to | Re: logical decoding : exceeded maxAllocatedDescs for .spill files (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
|
List | pgsql-hackers |
On Thu, 12 Dec 2019 at 09:49, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 11, 2019 at 4:17 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > On Sat, 7 Dec 2019 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Dec 6, 2019 at 5:00 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > > > > > On Fri, 6 Dec 2019 at 15:40, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > 1. > > > > > + /* Now that the state fields are initialized, it is safe to return it. */ > > > > > + *iter_state = state; > > > > > + > > > > > /* allocate heap */ > > > > > state->heap = > > > > > binaryheap_allocate(state->nr_txns, > > > > > ReorderBufferIterCompare, > > > > > > > > > > Is there a reason for not initializing iter_state after > > > > > binaryheap_allocate? If we do so, then we don't need additional check > > > > > you have added in ReorderBufferIterTXNFinish. > > > > > > > > If iter_state is initialized *after* binaryheap_allocate, then we > > > > won't be able to close the vfds if binaryheap_allocate() ereports(). > > > > > > > > > > Is it possible to have vfds opened before binaryheap_allocate(), if so how? > > No it does not look possible for the vfds to be opened before > > binaryheap_allocate(). But actually, the idea behind placing the > > iter_state at the place where I put is that, we should return back the > > iter_state at the *earliest* place in the code where it is safe to > > return. > > > > Sure, I get that point, but it seems it is equally good to do this > after binaryheap_allocate(). It will be sligthly better because if > there is any error in binaryheap_allocate, then we don't need to even > call ReorderBufferIterTXNFinish(). All right. WIll do that. > > > > > >> 4. I think we should also check how much time increase will happen for > > >> test_decoding regression test after the test added by this patch? > > Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > Yeah, it currently takes noticeably longer compared to the others. > > > Let's see if setting logical_decoding_work_mem to a min value allows > > > us to reproduce the test with much lesser number of inserts. > > > > The test in the patch takes around 20 seconds, as compared to the max > > time of 2 seconds any of the other tests take in that test suite. > > > > But if we set the max_files_per_process to a very low value (say 26) > > as against the default 1000, we can reproduce the issue with as low as > > 20 sub-transactions as against the 600 that I used in spill.sql test. > > And with this, the test runs in around 4 seconds, so this is good. > > > > Do you get 4s even after setting the minimum value of > logical_decoding_work_mem? I think 4s is also too much for this test > which is going to test one extreme scenario. Can you try with some > bigger rows or something else to reduce this time? I think if we can > get it close to 2s or whatever maximum time taken by any other logical > decoding tests, then good, otherwise, it doesn't seem like a good idea > to add this test. > > > But > > the problem is : max_files_per_process needs server restart. So either > > we have to shift this test to src/test/recovery in one of the > > logical_decoding test, or retain it in contrib/test_decoding and let > > it run for 20 seconds. Let me know if you figure out any other > > approach. > > > > I don't think 20s for one test is acceptable. So, we should just give > up that path, you can try what I suggested above, if we can reduce the > test time, then good, otherwise, I suggest to drop the test from the > patch. Having said that, we should try our best to reduce the test > time as it will be good if we can have such a test. I tried; it is actually roughly 3.4 seconds. Note that reducing logical_decoding_work_mem does not reduce the test time. It causes the serialization to start early, and so increases the chance of reproducing the problem. During restore of the serialized data, we still use max_changes_in_memory. So max_changes_in_memory is the one that allows us to reduce the number of transactions required, so we can cut down on the outer loop iterations and make the test finish much earlier. But also note that, we can't use the test suite in contrib/test_decoding, because max_changes_in_memory needs server restart. So we need to shift this test to src/test/recovery. And there, I guess it is not that critical for the testcase to be very quick because the tests in general are much slower than the ones in contrib/test_decoding, although it would be nice to make it fast. What I propose is to modify max_changes_in_memory, do a server restart (which takes hardly a sec), run the testcase (3.5 sec) and then restart after resetting the guc. So totally it will be around 4-5 seconds. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: