Thread: BUG #16694: Server hangs in 100% CPU loop when decompressing a specific TOAST Postgis linestring
BUG #16694: Server hangs in 100% CPU loop when decompressing a specific TOAST Postgis linestring
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16694 Logged by: Tom Vijlbrief Email address: tvijlbrief@gmail.com PostgreSQL version: 13.0 Operating system: Ubuntu 18.04 Description: See https://trac.osgeo.org/postgis/ticket/4776 for a full description Summary: st_numpoints() fetches the whole TOAST object and works fine but st_geometrytype() just requests a slice at the front of the TOAST object. This looks to me like a low level issue with Postgres13 and TOAST objects of a specific size and or compression behavior. Could be related to this Postgresql rel_13 commit: https://github.com/postgres/postgres/commit/11a078cf87ffb611d19c7dec6df68b41084ad9c9 Disabling compression elimates the hang: alter table demo alter column wkb_geometry set storage external; Top of stack trace when hanging: #0 pglz_decompress (source=source@entry=0x55584d6c39b0 "", slen=<optimized out>, dest=dest@entry=0x55584d6c3a3c "", rawsize=rawsize@entry=52, check_complete=check_complete@entry=false) at ./build/../src/common/pg_lzcompress.c:767 #1 0x000055584c5906cb in toast_decompress_datum_slice (slicelength=52, attr=0x55584d6c39a8) at ./build/../src/backend/access/common/detoast.c:483 #2 detoast_attr_slice (attr=<optimized out>, sliceoffset=sliceoffset@entry=0, slicelength=52) at ./build/../src/backend/access/common/detoast.c:274 #3 0x000055584c9e6f2a in pg_detoast_datum_slice (datum=<optimized out>, first=first@entry=0, count=<optimized out>) at ./build/../src/backend/utils/fmgr/fmgr.c:1754 #4 0x00007f0bc257a7c1 in geometry_geometrytype (fcinfo=0x55584d6bea60) at lwgeom_ogc.c:199 #5 0x000055584c740c9e in ExecInterpExpr (state=0x55584d6be578, econtext=0x55584d6be260, isnull=<optimized out>) at ./build/../src/backend/executor/execExprInterp.c:699 #6 0x000055584c74ec62 in ExecEvalExprSwitchContext (isNull=0x7ffec8369497, econtext=0x55584d6be260, state=0x55584d6be578) at ./build/../src/include/executor/executor.h:313 #7 ExecProject (projInfo=0x55584d6be570) at ./build/../src/include/executor/executor.h:347 #8 ExecScan (node=<optimized out>, accessMtd=0x55584c773130 <SeqNext>, recheckMtd=0x55584c7731c0 <SeqRecheck>) at ./build/../src/backend/executor/execScan.c:238 #9 0x000055584c744bcd in ExecProcNode (node=0x55584d6be150) at ./build/../src/include/executor/executor.h:245 #10 ExecutePlan (execute_once=<optimized out>, dest=0x55584d6b8640, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55584d6be150, estate=0x55584d6bded8) at ./build/../src/backend/executor/execMain.c:1646
Re: BUG #16694: Server hangs in 100% CPU loop when decompressing a specific TOAST Postgis linestring
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > st_numpoints() fetches the whole TOAST object and works fine but > st_geometrytype() just requests a slice at the front of the TOAST object. > This looks to me like a low level issue with Postgres13 and TOAST objects of > a specific size and or compression behavior. > Could be related to this Postgresql rel_13 commit: > https://github.com/postgres/postgres/commit/11a078cf87ffb611d19c7dec6df68b41084ad9c9 Hm. I think that pglz_maximum_compressed_size is off a little bit: it fails to consider that if N decompressed bytes are wanted, we could have compressed data containing N-1 literal bytes and then a match tag, which requires 2 or 3 bytes, so that the calculation possibly underestimates the amount we need by 1 or 2 bytes. You'd need quite a bit of bad luck to make that happen, though, especially if the requested length is not small (and 52 bytes isn't that small). I suspect the actual bug is in c60e520f6 (Use memcpy instead of a byte loop in pglz_decompress), which looks fishy to me, although I can't quite identify what's wrong with it. I wonder if it's possible for "off" to be zero? Doesn't seem like it should be in normal use, but then maybe that's what triggers the infinite loop (rather than just garbage results) if the other bug causes us to read garbage data. I think this code is in general too trusting of the input not to be corrupt --- it doesn't notice a match tag that extends past the end of the input, either. regards, tom lane
Re: BUG #16694: Server hangs in 100% CPU loop when decompressing a specific TOAST Postgis linestring
From
Tom Lane
Date:
> PG Bug reporting form <noreply@postgresql.org> writes: >> This looks to me like a low level issue with Postgres13 and TOAST objects of >> a specific size and or compression behavior. After looking at it some more, I'm pretty sure that these issues could explain your problem, though it's possible there's an additional contributing issue. If you're in a position to apply a patch and see whether it fixes the problem, please try https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=2330f4d3a87ac43b6ecd31bfd698384888ed03cb regards, tom lane
Re: BUG #16694: Server hangs in 100% CPU loop when decompressing a specific TOAST Postgis linestring
From
Andrey Borodin
Date:
> 2 нояб. 2020 г., в 04:45, Tom Lane <tgl@sss.pgh.pa.us> написал(а): > >> PG Bug reporting form <noreply@postgresql.org> writes: >>> This looks to me like a low level issue with Postgres13 and TOAST objects of >>> a specific size and or compression behavior. > > After looking at it some more, I'm pretty sure that these issues could > explain your problem, though it's possible there's an additional > contributing issue. If you're in a position to apply a patch and > see whether it fixes the problem, please try > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=2330f4d3a87ac43b6ecd31bfd698384888ed03cb Thanks for fixing this, Tom! 1 or 2 extra bytes of match header at the end of sequence of literals is a bug for sure. And the input sequence does notneed to be small. I'm not sure protection from corrupt input is complete within pglz. We still do not protect from matches with offsets beforesource data. This can SegFault or lead to security leaks. I suspect there may be other go-wild input sequences. Thanks! Best regards, Andrey Borodin.
Re: BUG #16694: Server hangs in 100% CPU loop when decompressing a specific TOAST Postgis linestring
From
Tom Lane
Date:
Andrey Borodin <x4mmm@yandex-team.ru> writes: > I'm not sure protection from corrupt input is complete within pglz. We > still do not protect from matches with offsets before source data. Yeah, I was wondering about that. Not quite sure it's worth adding cycles to defend against though. I don't buy the "security" aspect, since there's no plausible route for an attacker to inject corrupted compressed data unless they already have full access to the database. The "maybe core dump" argument is a bit stronger, but not very much so. regards, tom lane
Re: BUG #16694: Server hangs in 100% CPU loop when decompressing a specific TOAST Postgis linestring
From
Andrey Borodin
Date:
> 2 нояб. 2020 г., в 10:16, Tom Lane <tgl@sss.pgh.pa.us> написал(а): > > Andrey Borodin <x4mmm@yandex-team.ru> writes: >> I'm not sure protection from corrupt input is complete within pglz. We >> still do not protect from matches with offsets before source data. > > Yeah, I was wondering about that. Not quite sure it's worth adding > cycles to defend against though. I think so too. That's why I was not protecting from corruption at all when proposing c60e520. Here we can have core dump too if *(srcend+1) produce SegFault: len = (sp[0] & 0x0f) + 3; off = ((sp[0] & 0xf0) << 4) | sp[1]; sp += 2; if (len == 18) len += *sp++; if (sp > srcend || off == 0) break; And proper protection would cost nonzero time for branches. I think we should not protect against this. I hope to see lz4 (or something else) in toast and other places one day :) Best regards, Andrey Borodin.
Re: BUG #16694: Server hangs in 100% CPU loop when decompressing a specific TOAST Postgis linestring
From
Tom Lane
Date:
Andrey Borodin <x4mmm@yandex-team.ru> writes: >> 2 нояб. 2020 г., в 10:16, Tom Lane <tgl@sss.pgh.pa.us> написал(а): >> Yeah, I was wondering about that. Not quite sure it's worth adding >> cycles to defend against though. > I think so too. That's why I was not protecting from corruption at all when proposing c60e520. > Here we can have core dump too if *(srcend+1) produce SegFault: Right. The "sp > srcend" check that I added doesn't help if we were unlucky enough to be right up against the end of memory. I judged it was cheap enough to be worth including, but I'd agree that's a judgment call. Conversely, checking for "off" being too far back would require tracking the distance back to the start of the output buffer, which is a value the loop does not keep ATM. So that would require adding more cycles to track (in addition to the actual comparison), which is why it seemed not quite worth it. regards, tom lane
Re: BUG #16694: Server hangs in 100% CPU loop when decompressing a specific TOAST Postgis linestring
From
Tom Vijlbrief
Date:
Op ma 2 nov. 2020 om 00:45 schreef Tom Lane <tgl@sss.pgh.pa.us>:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> This looks to me like a low level issue with Postgres13 and TOAST objects of
>> a specific size and or compression behavior.
After looking at it some more, I'm pretty sure that these issues could
explain your problem, though it's possible there's an additional
contributing issue. If you're in a position to apply a patch and
see whether it fixes the problem, please try
https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=2330f4d3a87ac43b6ecd31bfd698384888ed03cb
Yes, this patch fixes the problem.
Thanks,
Tom
Re: BUG #16694: Server hangs in 100% CPU loop when decompressing a specific TOAST Postgis linestring
From
Tom Lane
Date:
Tom Vijlbrief <tvijlbrief@gmail.com> writes: > Op ma 2 nov. 2020 om 00:45 schreef Tom Lane <tgl@sss.pgh.pa.us>: >> After looking at it some more, I'm pretty sure that these issues could >> explain your problem, though it's possible there's an additional >> contributing issue. If you're in a position to apply a patch and >> see whether it fixes the problem, please try >> https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=2330f4d3a87ac43b6ecd31bfd698384888ed03cb > Yes, this patch fixes the problem. Great, thanks for testing! regards, tom lane