Re: logical decoding : exceeded maxAllocatedDescs for .spill files - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Date
Msg-id 20190912125755.GA24047@alvherre.pgsql
Whole thread Raw
In response to Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 2019-Sep-12, Tomas Vondra wrote:

> On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:
> > On 2019-Sep-11, Amit Khandekar wrote:

> > I think doing this all the time would make restore very slow -- there's a
> > reason we keep the files open, after all.
> 
> How much slower? It certainly will have a hit, but maybe it's negligible
> compared to all the other stuff happening in this code?

I dunno -- that's a half-assed guess based on there being many more
syscalls, and on the fact that the API is how it is in the first place.
(Andres did a lot of perf benchmarking and tweaking back when he was
writing this ... I just point out that I have a colleague that had to
invent *a lot* of new MemCxt infrastructure in order to make some of
Andres' perf optimizations cleaner, just as a semi-related data point.
Anyway, I digress.)

Anyway, such a fix would pessimize all cases, including every single
case that works today (which evidently is almost every single usage of
this feature, since we hadn't heard of this problem until yesterday), in
order to solve a problem that you can only find in very rare ones.

Another point of view is that we should make it work first, then make it
fast.  But the point remains that it works fine and fast for 99.99% of
cases.

> > It would be better if we can keep the descriptors open as much as
> > possible, and only close them if there's trouble.  I was under the
> > impression that using OpenTransientFile was already taking care of that,
> > but that's evidently not the case.
> 
> I don't see how the current API could do that transparently - it does
> track the files, but the user only gets a file descriptor. With just a
> file descriptor, how could the code know to do reopen/seek when it's going
> just through the regular fopen/fclose?

Yeah, I don't know what was in Amit's mind, but it seemed obvious to me
that such a fix required changing that API so that a seekpos is kept
together with the fd.  ReorderBufferRestoreChange is static in
reorderbuffer.c so it's not a problem to change its ABI.

I agree with trying to get a reorderbuffer-localized back-patchable fix
first, then we how to improve from that.

> As a sidenote - in the other thread about streaming, one of the patches
> does change how we log subxact assignments. In the end, this allows using
> just a single file for the top-level transaction, instead of having one
> file per subxact. That would also solve this.

:-(  While I would love to get that patch done and get rid of the issue,
it's not a backpatchable fix either.  ... however, it does mean we maybe
can get away with the reorderbuffer.c-local fix, and then just use your
streaming stuff to get rid of the perf problem forever?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: SegFault on 9.6.14
Next
From: Robert Haas
Date:
Subject: Re: tableam vs. TOAST