Thread: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Aleksander Alekseev
Date:
Hello I was playing with CLang sanitizers[1][2][3][4] recently and discovered something disturbing regarding how PostgreSQL works. Here is an example. Lets create a breakpoint right before filling a CheckPoint structure: (gdb) b xlog.c:4772 Breakpoint 1 at 0x7ffbad0556d4: file xlog.c, line 4772. (gdb) c Continuing. Fill checkpoint with some random data: (gdb) p memset(&checkPoint, 0xEA, sizeof(checkPoint)) $1 = 1110817376 (gdb) x/80xb &checkPoint 0x7ffc4235ba60: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0x7ffc4235ba68: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0x7ffc4235ba70: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0x7ffc4235ba78: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0x7ffc4235ba80: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0x7ffc4235ba88: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0x7ffc4235ba90: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0x7ffc4235ba98: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0x7ffc4235baa0: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0x7ffc4235baa8: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea Wait until checkPoint will be initialized: 4805 checkPoint.redo = XLogSegSize + SizeOfXLogLongPHD; (gdb) 4806 checkPoint.ThisTimeLineID = ThisTimeLineID; (gdb) 4807 checkPoint.PrevTimeLineID = ThisTimeLineID; (gdb) ... Lets see what's in memory: (gdb) x/80xb &checkPoint 0x7ffc4235ba60: 0x28 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x7ffc4235ba68: 0x01 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x7ffc4235ba70: 0x01 0xea 0xea 0xea 0x00 0x00 0x00 0x00 0x7ffc4235ba78: 0x03 0x00 0x00 0x00 0x10 0x27 0x00 0x00 0x7ffc4235ba80: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x7ffc4235ba88: 0x03 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x7ffc4235ba90: 0x01 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x7ffc4235ba98: 0x3d 0x0a 0xec 0x56 0x00 0x00 0x00 0x00 0x7ffc4235baa0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x7ffc4235baa8: 0x00 0x00 0x00 0x00 0xea 0xea 0xea 0xea As you may see there are "holes" that were in fact not filled. Under normal conditions they will be filled with data previously stored on stack which could be anything including passwords and other private data. Afterwards this structure is written to disk where potentially someone who not supposed to see this data may see it. I realize this is not a big problem in practice. But from security nerd's point o few this is a pretty severe security issue. I think we should fix this and all other cases where MemorySanitizer reports "use-of-uninitialized-value" error. Also I suggest we make PostgreSQL pass _all_ sanitizers checks. They are able to find quite nasty bugs including but not limited to memory leaks, array out of bound accesses, data races, etc. Imagine we could find all these bugs during regression tests run on buildfarms. What do you think? [1] http://clang.llvm.org/docs/MemorySanitizer.html [2] http://clang.llvm.org/docs/AddressSanitizer.html [3] http://clang.llvm.org/docs/LeakSanitizer.html [4] http://clang.llvm.org/docs/ThreadSanitizer.html -- Best regards, Aleksander Alekseev http://eax.me/
Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Chapman Flack
Date:
On 03/21/2016 06:08 AM, Aleksander Alekseev wrote: > As you may see there are "holes" that were in fact not filled. Under > normal conditions they will be filled with data previously stored on > stack which could be anything including passwords and other private > data. Afterwards this structure is written to disk where potentially > someone who not supposed to see this data may see it. > > I realize this is not a big problem in practice. Well, the documentation already says to avoid it: http://www.postgresql.org/docs/current/static/xfunc-c.html Another important point is to avoid leaving any uninitialized bits within data type values; for example, take care tozero out any alignment padding bytes that might be present in structs. so I don't think what you're suggesting would be controversial at all; it looks like what you've done is found a(t least one) bug where the documented practice wasn't followed, and it's good to find any such places. -Chap
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Aleksander Alekseev
Date:
> Well, the documentation already says to avoid it: > > http://www.postgresql.org/docs/current/static/xfunc-c.html > > Another important point is to avoid leaving any uninitialized > bits within data type values; for example, take care to zero out > any alignment padding bytes that might be present in structs. > > so I don't think what you're suggesting would be controversial > at all; it looks like what you've done is found a(t least one) > bug where the documented practice wasn't followed, and it's good > to find any such places. Well in this case here is a patch that fixes "use of uninitialized value" reports by MemorySanitizer I managed to catch so far. -- Best regards, Aleksander Alekseev http://eax.me/
Attachment
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Chapman Flack
Date:
On 03/21/2016 10:21 AM, Aleksander Alekseev wrote: > Well in this case here is a patch that fixes "use of uninitialized > value" reports by MemorySanitizer I managed to catch so far. I'm new here so someone more experienced would have to weigh in, but I would wonder a couple of things: a. whether a braced struct assignment is supported in every C compiler that PostgreSQL still intends to support b. whether such a struct assignment is guaranteed to initialize padding spaces as well as declared fields (in all supported C versions/compilers). It's possible that memset() would be more convincing. -Chap
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Aleksander Alekseev
Date:
> I'm new here so someone more experienced would have to weigh in, > but I would wonder a couple of things: > > a. whether a braced struct assignment is supported in every > C compiler that PostgreSQL still intends to support > > b. whether such a struct assignment is guaranteed to initialize > padding spaces as well as declared fields (in all supported > C versions/compilers). > > It's possible that memset() would be more convincing. Frankly I'm not sure regarding all supported C versions/compilers. But it seems to be a valid ANSI C. Here is a test program: ``` #include <stdio.h> typedef struct { int i; char c; long l; short s; } MyStruct; int main() { int i, sum = 0; char *c; MyStruct s = {0}; s.i = 11; s.c = 22; s.l = 33; s.s = 44; c = (char*)&s; for(i = 0; i < sizeof(s); i++) { sum += *c; c++; } printf("Sum: %d\n", sum); return 0; } ``` I compiled it with various versions of GCC and CLang with different optimization flags: clang38 -O3 -ansi -g t.c -o t gcc -O0 -ansi -g t.c -o t In all cases running a program under debugger shows that structure is properly initialized: (gdb) b main Breakpoint 1 at 0x4007ae: file t.c, line 12. (gdb) r Starting program: /usr/home/eax/temp/t Breakpoint 1, main () at t.c:12 12 int i, sum = 0; (gdb) p memset(&s, 0xEA, sizeof(MyStruct)) $1 = -5376 (gdb) x/24xb &s 0x7fffffffeb00: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0x7fffffffeb08: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0x7fffffffeb10: 0xea 0xea 0xea 0xea 0xea 0xea 0xea 0xea (gdb) n 14 MyStruct s = {0}; (gdb) 16 s.i = 11; (gdb) x/24xb &s 0x7fffffffeb00: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x7fffffffeb08: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x7fffffffeb10: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 (gdb) quit Naturally we could use memset() as well. But I personally find it a bit less readable. And in theory it doesn't prevent some _very_ "smart" C compiler from not cleaning the whole structure anyway. -- Best regards, Aleksander Alekseev http://eax.me/
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Tom Lane
Date:
Chapman Flack <chap@anastigmatix.net> writes: > On 03/21/2016 10:21 AM, Aleksander Alekseev wrote: >> Well in this case here is a patch that fixes "use of uninitialized >> value" reports by MemorySanitizer I managed to catch so far. > I'm new here so someone more experienced would have to weigh in, > but I would wonder a couple of things: > a. whether a braced struct assignment is supported in every > C compiler that PostgreSQL still intends to support We rely on struct assignment to work already; although I'm not sure we should expect it to be efficient, so we might not want to use it in performance-critical places. > b. whether such a struct assignment is guaranteed to initialize > padding spaces as well as declared fields (in all supported > C versions/compilers). I think this is a valid concern; my recollection is that the C standard defines struct assignment as "assign each member". > It's possible that memset() would be more convincing. +1 regards, tom lane
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Aleksander Alekseev
Date:
> > It's possible that memset() would be more convincing. > > +1 OK, here is corrected patch. -- Best regards, Aleksander Alekseev http://eax.me/
Attachment
Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Heikki Linnakangas
Date:
On 03/22/2016 03:27 PM, Aleksander Alekseev wrote: > diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c > index 1ff5728..a10c078 100644 > --- a/src/backend/commands/tablespace.c > +++ b/src/backend/commands/tablespace.c > @@ -669,6 +669,11 @@ destroy_tablespace_directories(Oid tablespaceoid, bool redo) > char *subfile; > struct stat st; > > + /* > + * Prevents MemorySanitizer's "use-of-uninitialized-value" warning > + */ > + memset(&st, 0, sizeof(st)); > + > linkloc_with_version_dir = psprintf("pg_tblspc/%u/%s", tablespaceoid, > TABLESPACE_VERSION_DIRECTORY); Why does MemorySanitizer complain about that? Calling stat(2) is supposed to fill in all the fields we look at, right? > diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c > index 924bebb..498e7bd 100644 > --- a/src/backend/utils/cache/inval.c > +++ b/src/backend/utils/cache/inval.c > @@ -330,6 +330,7 @@ AddCatcacheInvalidationMessage(InvalidationListHeader *hdr, > SharedInvalidationMessage msg; > > Assert(id < CHAR_MAX); > + memset(&msg, 0, sizeof(msg)); > msg.cc.id = (int8) id; > msg.cc.dbId = dbId; > msg.cc.hashValue = hashValue; Right after this, we have: > /* > * Define padding bytes in SharedInvalidationMessage structs to be > * defined. Otherwise the sinvaladt.c ringbuffer, which is accessed by > * multiple processes, will cause spurious valgrind warnings about > * undefined memory being used. That's because valgrind remembers the > * undefined bytes from the last local process's store, not realizing that > * another process has written since, filling the previously uninitialized > * bytes > */ > VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg)); Do we need both? > diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c > index d26991e..46ab8a2 100644 > --- a/src/backend/utils/mmgr/aset.c > +++ b/src/backend/utils/mmgr/aset.c > @@ -850,7 +850,7 @@ AllocSetAlloc(MemoryContext context, Size size) > blksize <<= 1; > > /* Try to allocate it */ > - block = (AllocBlock) malloc(blksize); > + block = (AllocBlock) calloc(1, blksize); > > /* > * We could be asking for pretty big blocks here, so cope if malloc > @@ -861,7 +861,7 @@ AllocSetAlloc(MemoryContext context, Size size) > blksize >>= 1; > if (blksize < required_size) > break; > - block = (AllocBlock) malloc(blksize); > + block = (AllocBlock) calloc(1, blksize); > } > > if (block == NULL) I think this goes too far. You're zeroing all palloc'd memory, even if it's going to be passed to palloc0(), and zeroed there. It might even silence legitimate warnings, if there's code somewhere that does palloc(), and accesses some of it before initializing. Plus it's a performance hit. > diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c > index e7826a4..4bbd4d2 100644 > --- a/src/test/regress/regress.c > +++ b/src/test/regress/regress.c > @@ -1022,6 +1022,11 @@ test_atomic_uint64(void) > uint64 expected; > int i; > > + /* > + * Prevents MemorySanitizer's "use-of-uninitialized-value" warning > + */ > + memset(&var, 0, sizeof(var)); > + > pg_atomic_init_u64(&var, 0); > > if (pg_atomic_read_u64(&var) != 0) What's going on here? Surely pg_atomic_init_u64() should initialize the value? - Heikki
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Peter Eisentraut
Date:
On 3/22/16 9:27 AM, Aleksander Alekseev wrote: > diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c > index 28f9fb5..45aa802 100644 > --- a/src/backend/libpq/hba.c > +++ b/src/backend/libpq/hba.c > @@ -1008,14 +1008,9 @@ parse_hba_line(List *line, int line_num, char *raw_line) > *cidr_slash = '\0'; > > /* Get the IP address either way */ > + memset(&hints, 0, sizeof(hints)); > hints.ai_flags = AI_NUMERICHOST; > hints.ai_family = AF_UNSPEC; > - hints.ai_socktype = 0; > - hints.ai_protocol = 0; > - hints.ai_addrlen = 0; > - hints.ai_canonname = NULL; > - hints.ai_addr = NULL; > - hints.ai_next = NULL; > > ret = pg_getaddrinfo_all(str, NULL, &hints, &gai_result); > if (ret == 0 && gai_result) In addition to what Heikki wrote, I think the above is not necessary. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Aleksander Alekseev
Date:
Heikki, Peter, thanks a lot for code review! > What's going on here? Surely pg_atomic_init_u64() should initialize > the value? It's because of how pg_atomic_exchange_u64_impl is implemented: ``` while (true) { old = ptr->value; /* <-- reading of uninitialized value! */ if (pg_atomic_compare_exchange_u64_impl(ptr, &old, xchg_)) break; } ``` Currently pg_atomic_init_u64 works like this: pg_atomic_init_u64 `- pg_atomic_init_u64_impl `- pg_atomic_write_u64_impl `- pg_atomic_exchange_u64_impl I suspect there is actually no need to make an atomic exchange during initialization of an atomic variable. Regular `mov` should be enough (IIRC there is no need to do `lock mov` since `mov` is already atomic). Anyway I don't feel brave enough right now to mess with atomic operations since it involves all sort of portability issues. So I removed this change for now. > Why does MemorySanitizer complain about that? Calling stat(2) is > supposed to fill in all the fields we look at, right? > In addition to what Heikki wrote, I think the above is not necessary. It's been my observation that MemorySanitizer often complains on passing structures with uninitialized padding bytes to system calls. I'm not sure whether libc really reads/copies structures in these cases or MemorySanitizer doesn't like the idea in general. Although it's true that maybe MemorySanitizer it too pedantic in these cases, in some respect it's right. Since operating system is a black box from our perspective (not mentioning that there are _many_ OS'es that change all the time) in general case passing a structure with uninitialized padding bytes to a system call can in theory cause a non-repeatable behavior. For instance if there are caching and hash calculation involved. Also, as Chapman noted previously [1], according to PostgreSQL documentation using structures with uninitialized padding bytes should be avoided anyway. I strongly believe that benefits of passing all MemorySanitizer checks (possibility of discovering many complicated bugs automatically in this case) many times outweighs drawbacks of tiny memset's overhead in these concrete cases. > I think this goes too far. You're zeroing all palloc'd memory, even > if it's going to be passed to palloc0(), and zeroed there. It might > even silence legitimate warnings, if there's code somewhere that > does palloc(), and accesses some of it before initializing. Plus > it's a performance hit. Completely agree, my bad. I removed these changes. > Right after this, we have: > VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg)); > Do we need both? Apparently we don't. I removed VALGRIND_MAKE_MEM_DEFINED here since now there are no uninitialized padding bytes in &msg. Corrected patch is attached. If you have any other ideas how it could be improved please let me know! [1] https://www.postgresql.org/message-id/56EFF347.20500%40anastigmatix.net -- Best regards, Aleksander Alekseev
Attachment
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Andres Freund
Date:
On August 19, 2016 2:50:30 AM PDT, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: >Heikki, Peter, thanks a lot for code review! > >> What's going on here? Surely pg_atomic_init_u64() should initialize >> the value? > >It's because of how pg_atomic_exchange_u64_impl is implemented: > >``` >while (true) >{ > old = ptr->value; /* <-- reading of uninitialized value! */ > if (pg_atomic_compare_exchange_u64_impl(ptr, &old, xchg_)) > break; >} >``` > >Currently pg_atomic_init_u64 works like this: > >pg_atomic_init_u64 >`- pg_atomic_init_u64_impl > `- pg_atomic_write_u64_impl > `- pg_atomic_exchange_u64_impl > >I suspect there is actually no need to make an atomic exchange during >initialization of an atomic variable. Regular `mov` should be enough >(IIRC there is no need to do `lock mov` since `mov` is already atomic). >Anyway I don't feel brave enough right now to mess with atomic >operations since it involves all sort of portability issues. So I >removed this change for now. There's platforms with atomic 8 byte compare exchange, without atomic 8 byte regular stores. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Piotr Stefaniak
Date:
On 2016-08-18 20:02, Heikki Linnakangas wrote: > On 03/22/2016 03:27 PM, Aleksander Alekseev wrote: >> diff --git a/src/backend/utils/mmgr/aset.c >> b/src/backend/utils/mmgr/aset.c >> index d26991e..46ab8a2 100644 >> --- a/src/backend/utils/mmgr/aset.c >> +++ b/src/backend/utils/mmgr/aset.c >> @@ -850,7 +850,7 @@ AllocSetAlloc(MemoryContext context, Size size) >> blksize <<= 1; >> >> /* Try to allocate it */ >> - block = (AllocBlock) malloc(blksize); >> + block = (AllocBlock) calloc(1, blksize); >> >> /* >> * We could be asking for pretty big blocks here, so cope if >> malloc > > I think this goes too far. You're zeroing all palloc'd memory, even if > it's going to be passed to palloc0(), and zeroed there. It might even > silence legitimate warnings, if there's code somewhere that does > palloc(), and accesses some of it before initializing. Plus it's a > performance hit. I just did a test where I 1. memset() that block to 0xAC (aset.c:853) 2. compile and run bin/initdb, then bin/postgres 2. run an SQL file, shut down bin/postgres 3. make a copy of the transaction log file 4. change the memset() to 0x0C, repeat steps 2-3 5. compare the two transaction log files with a combination of hexdump(1), cut(1), and diff(1). At the end of the output I can see: -0f34 0010 0000 f5ff ac02 000a 0000 0000 +0f34 0010 0000 f5ff 0c02 000a 0000 0000 So it looks like the MSan complaint might be a true positive. The SQL file is just this snippet from bit.sql: CREATE TABLE varbit_table (a BIT VARYING(16), b BIT VARYING(16)); COPY varbit_table FROM stdin; X0F X10 X1F X11 X2F X12 X3F X13 X8F X04 X000F X0010 X0123 XFFFF X2468 X2468 XFA50 X05AF X1234 XFFF5 \. I realize given information might be a little bit scarce, but I didn't know what else might be interesting to you that you wouldn't be able to reproduce.
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Tom Lane
Date:
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes: > On 2016-08-18 20:02, Heikki Linnakangas wrote: >>> - block = (AllocBlock) malloc(blksize); >>> + block = (AllocBlock) calloc(1, blksize); >> I think this goes too far. You're zeroing all palloc'd memory, even if >> it's going to be passed to palloc0(), and zeroed there. It might even >> silence legitimate warnings, if there's code somewhere that does >> palloc(), and accesses some of it before initializing. Plus it's a >> performance hit. > I just did a test where I [ this n that ] I think you are failing to understand Heikki's point. There is no way we are committing the change depicted above, because (1) it will mask more bugs than it fixes; (2) it's an enormously expensive way to fix anything; and (3) it will effectively disable valgrind testing for missed initializations. The right thing to do is find out what downstream bit of code is missing initializing a block of memory it's working on, and fix that localized oversight. It'd be useful also to figure out why our existing valgrind testing has not caught this already. The example you give looks like it surely ought to be replicated well enough in the standard regression tests. regards, tom lane
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Piotr Stefaniak
Date:
On 2016-08-19 23:55, Tom Lane wrote: > I think you are failing to understand Heikki's point. There is no way > we are committing the change depicted above, because (1) it will mask more > bugs than it fixes; (2) it's an enormously expensive way to fix anything; > and (3) it will effectively disable valgrind testing for missed > initializations. I wasn't disagreeing with Heikki, I was trying to show that while Aleksander's patch may be useless and perhaps harmful if committed, it is not useless in a larger perspective as it has made people look into an issue. And I did that simply because I have more changes of that kind that may end up being deemed as useless for committing, but I want to share them with -hackers anyway.
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Andres Freund
Date:
On 2016-08-19 17:55:25 -0400, Tom Lane wrote: > It'd be useful also to figure out why our existing valgrind testing has > not caught this already. The example you give looks like it surely > ought to be replicated well enough in the standard regression tests. The valgrind suppression file explicitly hides a bunch of padding issues.
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: > On 2016-08-19 17:55:25 -0400, Tom Lane wrote: >> It'd be useful also to figure out why our existing valgrind testing has >> not caught this already. The example you give looks like it surely >> ought to be replicated well enough in the standard regression tests. > The valgrind suppression file explicitly hides a bunch of padding > issues. So maybe we ought to work towards taking those out? regards, tom lane
Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Noah Misch
Date:
On Fri, Aug 19, 2016 at 07:22:02PM -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2016-08-19 17:55:25 -0400, Tom Lane wrote: > >> It'd be useful also to figure out why our existing valgrind testing has > >> not caught this already. The example you give looks like it surely > >> ought to be replicated well enough in the standard regression tests. > > > The valgrind suppression file explicitly hides a bunch of padding > > issues. > > So maybe we ought to work towards taking those out? Maybe. It's a policy question boiling down to our willingness to spend cycles zeroing memory in order to limit trust in the confidentiality of storage backing the data directory. Think of "INSERT INTO t VALUES(my_encrypt('key', 'cleartext'))", subsequent to which bits of the key or cleartext leak to disk by way of WAL padding bytes. A reasonable person might expect that not to happen; GNU Privacy Guard and a few like-minded programs prevent it. I'm on the fence regarding whether PostgreSQL should target this level of vigilance. An RDBMS is mainly a tool for managing persistent data, and PostgreSQL will never be a superior tool for data that _must not_ persist. Having said that, the runtime cost of zeroing memory and the development cost of reviewing the patches to do so is fairly low.
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes: > On Fri, Aug 19, 2016 at 07:22:02PM -0400, Tom Lane wrote: >> So maybe we ought to work towards taking those out? > Maybe. It's a policy question boiling down to our willingness to spend cycles > zeroing memory in order to limit trust in the confidentiality of storage > backing the data directory. Think of "INSERT INTO t VALUES(my_encrypt('key', > 'cleartext'))", subsequent to which bits of the key or cleartext leak to disk > by way of WAL padding bytes. A reasonable person might expect that not to > happen; GNU Privacy Guard and a few like-minded programs prevent it. I'm on > the fence regarding whether PostgreSQL should target this level of vigilance. > An RDBMS is mainly a tool for managing persistent data, and PostgreSQL will > never be a superior tool for data that _must not_ persist. Having said that, > the runtime cost of zeroing memory and the development cost of reviewing the > patches to do so is fairly low. [ after thinking some more about this... ] FWIW, I put pretty much zero credence in the proposition that junk left in padding bytes in WAL or data files represents a meaningful security issue. An attacker who has access to those files will probably find much more that is of interest in the non-pad data. My only interest here is in making the code sanitizer-clean, which seems like it is useful for debugging purposes independently of any security arguments. So to me, it seems like the core of this complaint boils down to "my sanitizer doesn't understand the valgrind exclusion patterns that have been created for Postgres". We can address that to some extent by trying to reduce the number of valgrind exclusions we need, but it's unlikely to be practical to get that to zero, and it's not very clear that adding runtime cycles is a good tradeoff for it either. So maybe we need to push back on the assumption that people should expect their sanitizers to produce zero warnings without having made some effort to adapt the valgrind rules. regards, tom lane
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Robert Haas
Date:
On Sat, Aug 20, 2016 at 3:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Noah Misch <noah@leadboat.com> writes: >> On Fri, Aug 19, 2016 at 07:22:02PM -0400, Tom Lane wrote: >>> So maybe we ought to work towards taking those out? > >> Maybe. It's a policy question boiling down to our willingness to spend cycles >> zeroing memory in order to limit trust in the confidentiality of storage >> backing the data directory. Think of "INSERT INTO t VALUES(my_encrypt('key', >> 'cleartext'))", subsequent to which bits of the key or cleartext leak to disk >> by way of WAL padding bytes. A reasonable person might expect that not to >> happen; GNU Privacy Guard and a few like-minded programs prevent it. I'm on >> the fence regarding whether PostgreSQL should target this level of vigilance. >> An RDBMS is mainly a tool for managing persistent data, and PostgreSQL will >> never be a superior tool for data that _must not_ persist. Having said that, >> the runtime cost of zeroing memory and the development cost of reviewing the >> patches to do so is fairly low. > > [ after thinking some more about this... ] > > FWIW, I put pretty much zero credence in the proposition that junk left in > padding bytes in WAL or data files represents a meaningful security issue. > An attacker who has access to those files will probably find much more > that is of interest in the non-pad data. My only interest here is in > making the code sanitizer-clean, which seems like it is useful for > debugging purposes independently of any security arguments. > > So to me, it seems like the core of this complaint boils down to "my > sanitizer doesn't understand the valgrind exclusion patterns that have > been created for Postgres". We can address that to some extent by trying > to reduce the number of valgrind exclusions we need, but it's unlikely to > be practical to get that to zero, and it's not very clear that adding > runtime cycles is a good tradeoff for it either. So maybe we need to push > back on the assumption that people should expect their sanitizers to > produce zero warnings without having made some effort to adapt the > valgrind rules. One idea is to add protect additional memory-clearing operations with #ifdef SANITIZER_CLEAN or something of that sort. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Andres Freund
Date:
On 2016-08-22 13:16:34 -0400, Robert Haas wrote: > On Sat, Aug 20, 2016 at 3:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > So to me, it seems like the core of this complaint boils down to "my > > sanitizer doesn't understand the valgrind exclusion patterns that have > > been created for Postgres". We can address that to some extent by trying > > to reduce the number of valgrind exclusions we need, but it's unlikely to > > be practical to get that to zero, and it's not very clear that adding > > runtime cycles is a good tradeoff for it either. So maybe we need to push > > back on the assumption that people should expect their sanitizers to > > produce zero warnings without having made some effort to adapt the > > valgrind rules. I don't think the runtime overhead is likely to be all that high - if you look at valgrind.supp the peformancecritical parts basically are: - pgstat_send - the context switching is going to drown out some zeroing - xlog insertions - making the crc computation more predictable would actually be nice - reorderbuffer serialization - zeroing won't be a material part of the cost The rest is mostly bootstrap or python related. There might be cases where we *don't* unconditionally do the zeroing - e.g. I'm doubtful about the sinval stuff where we currently only conditionally clear - but the stuff in valgrind.supp seems fine. Andres
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Robert Haas
Date:
On Mon, Aug 22, 2016 at 1:46 PM, Andres Freund <andres@anarazel.de> wrote: > I don't think the runtime overhead is likely to be all that high - if > you look at valgrind.supp the peformancecritical parts basically are: > - pgstat_send - the context switching is going to drown out some zeroing > - xlog insertions - making the crc computation more predictable would > actually be nice > - reorderbuffer serialization - zeroing won't be a material part of the > cost > > The rest is mostly bootstrap or python related. > > There might be cases where we *don't* unconditionally do the zeroing - > e.g. I'm doubtful about the sinval stuff where we currently only > conditionally clear - but the stuff in valgrind.supp seems fine. Naturally you'll be wanting to conclusively demonstrate this with benchmarks on multiple workloads, platforms, and concurrency levels. Right? :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Andres Freund
Date:
On 2016-08-22 13:49:47 -0400, Robert Haas wrote: > On Mon, Aug 22, 2016 at 1:46 PM, Andres Freund <andres@anarazel.de> wrote: > > I don't think the runtime overhead is likely to be all that high - if > > you look at valgrind.supp the peformancecritical parts basically are: > > - pgstat_send - the context switching is going to drown out some zeroing > > - xlog insertions - making the crc computation more predictable would > > actually be nice > > - reorderbuffer serialization - zeroing won't be a material part of the > > cost > > > > The rest is mostly bootstrap or python related. > > > > There might be cases where we *don't* unconditionally do the zeroing - > > e.g. I'm doubtful about the sinval stuff where we currently only > > conditionally clear - but the stuff in valgrind.supp seems fine. > > Naturally you'll be wanting to conclusively demonstrate this with > benchmarks on multiple workloads, platforms, and concurrency levels. > Right? :-) Pah ;) I do think some micro-benchmarks aiming at the individual costs make sense, we're only taking about ~three places in the code - don't think concurrency plays a large role though ;)
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Heikki Linnakangas
Date:
On 08/20/2016 10:46 PM, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: >> On Fri, Aug 19, 2016 at 07:22:02PM -0400, Tom Lane wrote: >>> So maybe we ought to work towards taking those out? > >> Maybe. It's a policy question boiling down to our willingness to spend cycles >> zeroing memory in order to limit trust in the confidentiality of storage >> backing the data directory. Think of "INSERT INTO t VALUES(my_encrypt('key', >> 'cleartext'))", subsequent to which bits of the key or cleartext leak to disk >> by way of WAL padding bytes. A reasonable person might expect that not to >> happen; GNU Privacy Guard and a few like-minded programs prevent it. I'm on >> the fence regarding whether PostgreSQL should target this level of vigilance. >> An RDBMS is mainly a tool for managing persistent data, and PostgreSQL will >> never be a superior tool for data that _must not_ persist. Having said that, >> the runtime cost of zeroing memory and the development cost of reviewing the >> patches to do so is fairly low. > > [ after thinking some more about this... ] > > FWIW, I put pretty much zero credence in the proposition that junk left in > padding bytes in WAL or data files represents a meaningful security issue. > An attacker who has access to those files will probably find much more > that is of interest in the non-pad data. My only interest here is in > making the code sanitizer-clean, which seems like it is useful for > debugging purposes independently of any security arguments. Yeah, that's how I view these, too. > So to me, it seems like the core of this complaint boils down to "my > sanitizer doesn't understand the valgrind exclusion patterns that have > been created for Postgres". We can address that to some extent by trying > to reduce the number of valgrind exclusions we need, but it's unlikely to > be practical to get that to zero, and it's not very clear that adding > runtime cycles is a good tradeoff for it either. So maybe we need to push > back on the assumption that people should expect their sanitizers to > produce zero warnings without having made some effort to adapt the > valgrind rules. I'll mark this as "returned with feedback". I'd be happy to take a patch that helps to reduce sanitizer complaints, but this seems to need some work. Aleksander, how did you run the sanitizer? I tried to build with clang 4.0, with the -fsanitize=memory option, and ran "make installcheck-parallel", but I didn't get any sanitizer errors out of it. I did get some errors, from failing to load "regress.so", though: ERROR: could not load library "/home/heikki/git-sandbox-pgsql/master/src/test/regress/regress.so": /home/heikki/git-sandbox-pgsql/master/src/test/regress/regress.so: undefined symbol: __msan_va_arg_overflow_size_tls How did you do it? - Heikki
Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
From
Aleksander Alekseev
Date:
> I'll mark this as "returned with feedback". I'd be happy to take a patch > that helps to reduce sanitizer complaints, but this seems to need some work. > > Aleksander, how did you run the sanitizer? I tried to build with clang > 4.0, with the -fsanitize=memory option, and ran "make > installcheck-parallel", but I didn't get any sanitizer errors out of it. > I did get some errors, from failing to load "regress.so", though: > > ERROR: could not load library > "/home/heikki/git-sandbox-pgsql/master/src/test/regress/regress.so": > /home/heikki/git-sandbox-pgsql/master/src/test/regress/regress.so: > undefined symbol: __msan_va_arg_overflow_size_tls > > How did you do it? It's quite simple actually [1][2]. I've just re-checked on Ubuntu 16.04 and Clang 3.8: ``` sudo apt-get install clang git make flex bison libreadline-dev \ zlib1g-dev jade git clone http://git.postgresql.org/git/postgresql.git cd postgresql CC=/usr/bin/clang CFLAGS="-fsanitize=memory -fPIE -pie" ./configure make -j4 -s MSAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-3.8 make check ``` Stacktraces are written to src/test/regress/log/initdb.log. You can add `printf("%d\n", getpid())` and `sleep(1000)` calls somewhere in main() procedure. It will give you some time to connect using debugger. IIRC it's what I did. [1] http://clang.llvm.org/docs/MemorySanitizer.html [2] https://github.com/google/sanitizers/wiki/MemorySanitizer -- Best regards, Aleksander Alekseev