Re: RFC: seccomp-bpf support - Mailing list pgsql-hackers

From Tom Lane
Subject Re: RFC: seccomp-bpf support
Msg-id 28344.1567019959@sss.pgh.pa.us
In response to Re: RFC: seccomp-bpf support  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers

Andres Freund <andres@anarazel.de> writes:
> On 2019-08-28 14:47:04 -0400, Joshua Brindle wrote:
>> A prime example is madvise() which was a catastrophic failure that 1)
>> isn't preventable by any LSM including SELinux, 2) isn't used by PG
>> and is therefore a good candidate for a kill list, and 3) a clear win
>> in the dont-let-PG-be-a-vector-for-kernel-compromise arena.

> IIRC it's used by glibc as part of its malloc implementation (also
> threading etc) - but not necessarily hit during the most common
> paths. That's *precisely* my problem with this approach.

I think Andres is right here.  There are madvise calls in glibc:

glibc-2.28/malloc/malloc.c:                    __madvise (paligned_mem, size & ~psm1, MADV_DONTNEED);
glibc-2.28/malloc/arena.c:    __madvise ((char *) h + new_size, diff, MADV_DONTNEED);

It appears that the first is only reachable from __malloc_trim which
we don't use, but the second is reachable from free().  However,
strace'ing says that it's never called during our standard regression
tests, confirming Andres' thought that it's in seldom-reached paths.
(I did not go through the free() logic in any detail, but it looks
like it is only reached when dealing with quite-large allocations,
which'd make sense.)

So this makes a perfect example for Peter's point that testing is
going to be a very fallible way of finding the set of syscalls that
need to be allowed.  Even if we had 100.00% code coverage of PG
proper, we would not necessarily find calls like this.
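
As a rough illustration of that gap, the following Python sketch compares
the syscalls seen in strace output against a list of syscalls known from
reading library source to be reachable.  The function names and the regex
are hypothetical helpers written for this example, not part of PG or of
any proposed patch, and the sketch assumes the usual strace line layout
(optional PID column, then "name(args) = result").

```python
import re

# Matches the syscall name at the start of an strace line.  An optional
# PID column is allowed, either "[pid 1234] " or "1234 " (assumed layouts
# for `strace -f` with and without `-o`).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def observed_syscalls(lines):
    """Return the set of syscall names found in an iterable of strace lines."""
    seen = set()
    for line in lines:
        m = SYSCALL_RE.match(line)
        if m:  # signal and exit lines do not match and are skipped
            seen.add(m.group(1))
    return seen


def missing_from_trace(lines, reachable):
    """Return syscalls believed reachable that never appeared in the trace."""
    return set(reachable) - observed_syscalls(lines)
```

Feeding it a trace of a test run together with {"madvise"} would report
madvise as missing whenever the run never hit that free() path, which is
exactly the case an allowlist built only from traces would get wrong.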

            regards, tom lane