Re: DSA_ALLOC_NO_OOM doesn't work - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: DSA_ALLOC_NO_OOM doesn't work
Date
Msg-id cf35fccb-957e-48de-8ea0-a083e9d7843d@iki.fi
Whole thread Raw
In response to Re: DSA_ALLOC_NO_OOM doesn't work  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: DSA_ALLOC_NO_OOM doesn't work
Re: DSA_ALLOC_NO_OOM doesn't work
List pgsql-hackers
On 14/02/2024 09:23, Robert Haas wrote:
> On Tue, Feb 13, 2024 at 7:53 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> However, I must say that the dsm_impl_op() interface is absolutely
>> insane. It's like someone looked at ioctl() and thought, "hey that's a
>> great idea!".
> 
> As the person who wrote that code, this made me laugh.
> 
> I agree it's not the prettiest interface, but I thought that was OK
> considering that it should only ever have a very limited number of
> callers. I believe I did it this way in the interest of code
> compactness. Since there are four DSM implementations, I wanted the
> implementation-specific code to be short and all in one place, and
> jamming it all into one function served that purpose. Also, there's a
> bunch of logic that is shared by multiple operations - detach and
> destroy tend to be similar, and so do create and attach, and there are
> even things that are shared across all operations, like the snprintf
> at the top of dsm_impl_posix() or the slightly larger amount of
> boilerplate at the top of dsm_impl_sysv().
> 
> I'm not particularly opposed to refactoring this to make it nicer, but
> my judgement was that splitting it up into one function per operation
> per implementation, say, would have involved a lot of duplication of
> small bits of code that might then get out of sync with each other
> over time. By doing it this way, the logic is a bit tangled -- or
> maybe more than a bit -- but there's very little duplication because
> each implementation gets jammed into the smallest possible box. I'm OK
> with somebody deciding that I got the trade-offs wrong there, but I
> will be interested to see the number of lines added vs. removed in any
> future refactoring patch.

That's fair, I can see those reasons. Nevertheless, I do think it was a 
bad tradeoff. A little bit of repetition would be better here, or we can 
extract the common parts to smaller functions.

I came up with the attached:

   25 files changed, 1710 insertions(+), 1113 deletions(-)

So yeah, it's more code, and there's some repetition, but I think this 
is more readable. Some of that is extra boilerplate because I split the 
implementations to separate files, and I also added tests.

I'm not 100% wedded on all of the decisions here, but I think this is 
the right direction overall. For example, I decided to separate 
dsm_handle used by the high-level interface in dsm.c and the 
dsm_impl_handle used by the low-level interace in dsm_impl_*.c (more on 
that below). That feels slightly better to me, but could be left out if 
we don't want that.

Notable changes:

- Split the single multiplexed dsm_impl_op() function into multiple 
functions for different operations. This allows more natural function 
signatures for the different operations.

- The create() function is now responsible for generating the handle, 
instead of having the caller generate it. Those implementations that 
need to generate a random handle and retry if it's already in use, now 
do that retry within the implementation.

- The destroy() function no longer detaches the segment; you must call 
detach() first if the segment is still attached. This avoids having to 
pass "junk" values when destroying a segment that's not attached, and in 
case of error, makes it more clear what failed.

- Separate dsm_handle, used by backend code to interact with the high 
level interface in dsm.c, from dsm_impl_handle, which is used to 
interact with the low-level functions in dsm_impl.c. This gets rid of 
the convention in dsm.c of reserving odd numbers for DSM segments stored 
in the main shmem area. There is now an explicit flag for that the 
control slot. For generating dsm_handles, we now use the same scheme we 
used to use for main-area shm segments for all DSM segments, which 
includes the slot number in the dsm_handle. The implementations use 
their own mechanisms for generating the low-level dsm_impl_handles (all 
but the SysV implementation generate a random handle and retry on 
collision).

- Use IPC_PRIVATE in the SysV implementation to have the OS create a 
unique identifier for us. Use the shmid directly as the (low-level) 
handle, so that we don't need to use shmget() to convert a key to shmid, 
and don't need the "cache" for that.

- create() no longer returns the mapped_size. The old Windows 
implementation had some code to read the actual mapped size after 
creating the mapping, and returned that in *mapped_size. Others just 
returned the requested size. In principle reading the actual size might 
be useful; the caller might be able to make use of the whole mapped size 
when it's larger than requested. In practice, the callers didn't do 
that. Also, POSIX shmem on FreeBSD has similar round-up-to-page-size 
behavior but the implementation did not query the actual mapped size 
after creating the segment, so you could not rely on it.

- Added a test that exercises basic create, detach, attach functionality 
using all the different implementations supported on the current platform.

- Change datatype of the opaque types in dsm_impl.c from "void *" to 
typedefs over uintptr_t. It's easy to make mistakes with "void *", as 
you can pass any pointer without getting warnings from the compiler. 
Dedicated typedefs give a bit more type checking. (This is in the first 
commit, all the other changes are bundled together in the second commit.)

Overall, I don't think this is backpatchable. The handle changes and use 
of IPC_PRIVATE in particular: they could lead to failure to clean up old 
segments if you upgraded the binary without a clean shutdown. A slightly 
different version of this possibly would be, but I'd like to focus on 
what's best for master for now.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
Attachment

pgsql-hackers by date:

Previous
From: "Amonson, Paul D"
Date:
Subject: RE: Popcount optimization using AVX512
Next
From: Jeff Davis
Date:
Subject: Re: LogwrtResult contended spinlock