Thread: UBSan pointer overflow in xlogreader.c
Hi, xlogreader.c has a pointer overflow bug, as revealed by the combination of -fsanitize=undefined -m32, the new 039_end_of_wal.pl test and Robert's incremental backup patch[1]. The bad code tests whether an object could fit using something like base + size <= end, which can be converted to something like size <= end - base to avoid the overflow. See experimental fix patch, attached. I took a while to follow up because I wanted to understand exactly why it doesn't break in practice despite being that way since v15. I think I have it now: 1. In the case of a bad/garbage size at end-of-WAL, the following-page checks will fail first before anything bad happens as a result of the overflow. 2. In the case of a real oversized record on current 64 bit architectures including amd64, aarch64, power and riscv64, the pointer can't really overflow anyway because the virtual address space is < 64 bit, typically around 48, and record lengths are 32 bit. 3. In the case of the 32 bit kernels I looked at including Linux, FreeBSD and cousins, Solaris and Windows the top 1GB plus a bit more of virtual address space is reserved for system use*, so I think a real oversized record shouldn't be able to overflow the pointer there either. A 64 bit kernel running a 32 bit process could run into trouble, though :-(. Those don't need to block out that high memory segment. You'd need to have the WAL buffer in that address range and decode large enough real WAL records and then things could break badly. I guess 32/64 configurations must be rare these days outside developers testing 32 bit code, and that is what happened here (ie CI); and with some minor tweaks to the test it can be reached without Robert's patch too. There may of course be other more exotic systems that could break, but I don't know specifically what. TLDR; this is a horrible bug, but all damage seems to be averted on "normal" systems. The best thing I can say about all this is that the new test found a bug, and the fix seems straightforward. I will study and test this some more, but wanted to share what I have so far. (*I think the old 32 bit macOS kernels might have been an exception to this pattern but 32 bit kernels and even processes are history there.) [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGLCuW_CE9nDAbUNV40G2FkpY_kcPZkaORyVBVive8FQHQ%40mail.gmail.com#d0d00ca5cc3f756656466adc9f2ec186
Attachment
On Wed, Dec 06, 2023 at 12:03:53AM +1300, Thomas Munro wrote: > xlogreader.c has a pointer overflow bug, as revealed by the > combination of -fsanitize=undefined -m32, the new 039_end_of_wal.pl > test and Robert's incremental backup patch[1]. The bad code tests > whether an object could fit using something like base + size <= end, > which can be converted to something like size <= end - base to avoid > the overflow. See experimental fix patch, attached. The patch LGTM. I wonder if it might be worth creating some special pointer arithmetic routines (perhaps using the stuff in common/int.h) to help prevent this sort of thing in the future. But that'd require you to realize that your code is at risk of overflow, at which point it's probably just as easy to restructure the logic like you've done here. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Tue, Dec 5, 2023 at 1:04 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > On Wed, Dec 06, 2023 at 12:03:53AM +1300, Thomas Munro wrote: > > xlogreader.c has a pointer overflow bug, as revealed by the > > combination of -fsanitize=undefined -m32, the new 039_end_of_wal.pl > > test and Robert's incremental backup patch[1]. The bad code tests > > whether an object could fit using something like base + size <= end, > > which can be converted to something like size <= end - base to avoid > > the overflow. See experimental fix patch, attached. > > The patch LGTM. I wonder if it might be worth creating some special > pointer arithmetic routines (perhaps using the stuff in common/int.h) to > help prevent this sort of thing in the future. But that'd require you to > realize that your code is at risk of overflow, at which point it's probably > just as easy to restructure the logic like you've done here. The patch LGTM, too. Thanks for investigating and writing the code. The part about how the reserved kernel memory prevents the bug from appearing on 32-bit systems but not 64-bit systems running in 32-bit mode is pretty interesting -- I don't want to think about how long it would have taken me to figure that out. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Dec 05, 2023 at 03:48:33PM -0500, Robert Haas wrote: > The patch LGTM, too. Thanks for investigating and writing the code. > The part about how the reserved kernel memory prevents the bug from > appearing on 32-bit systems but not 64-bit systems running in 32-bit > mode is pretty interesting -- I don't want to think about how long it > would have taken me to figure that out. +1 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Tue, Dec 5, 2023 at 4:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > +1 So, Thomas ... any chance you could commit this? So that my patch stops making cfbot sad? -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Dec 8, 2023 at 3:57 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Dec 5, 2023 at 4:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > +1 > > So, Thomas ... any chance you could commit this? So that my patch > stops making cfbot sad? Done. Thanks both for the reviews.
On Thu, Dec 7, 2023 at 10:18 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Fri, Dec 8, 2023 at 3:57 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Dec 5, 2023 at 4:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > > +1 > > > > So, Thomas ... any chance you could commit this? So that my patch > > stops making cfbot sad? > > Done. Thanks both for the reviews. Thank you! -- Robert Haas EDB: http://www.enterprisedb.com