Thread: BUG #16694: Server hangs in 100% CPU loop when decompressing a specific TOAST PostGIS linestring

The following bug has been logged on the website:

Bug reference:      16694
Logged by:          Tom Vijlbrief
Email address:      tvijlbrief@gmail.com
PostgreSQL version: 13.0
Operating system:   Ubuntu 18.04
Description:

See https://trac.osgeo.org/postgis/ticket/4776 for a full description

Summary:

st_numpoints() fetches the whole TOAST object and works fine, but
st_geometrytype() just requests a slice at the front of the TOAST object.
This looks to me like a low-level issue with PostgreSQL 13 and TOAST objects
of a specific size and/or compression behavior.

Could be related to this PostgreSQL REL_13 commit:

https://github.com/postgres/postgres/commit/11a078cf87ffb611d19c7dec6df68b41084ad9c9

Disabling compression eliminates the hang:

 alter table demo alter column wkb_geometry set storage external; 

Top of stack trace when hanging:

#0  pglz_decompress (source=source@entry=0x55584d6c39b0 "", slen=<optimized out>, dest=dest@entry=0x55584d6c3a3c "", rawsize=rawsize@entry=52, check_complete=check_complete@entry=false) at ./build/../src/common/pg_lzcompress.c:767
#1  0x000055584c5906cb in toast_decompress_datum_slice (slicelength=52, attr=0x55584d6c39a8) at ./build/../src/backend/access/common/detoast.c:483
#2  detoast_attr_slice (attr=<optimized out>, sliceoffset=sliceoffset@entry=0, slicelength=52) at ./build/../src/backend/access/common/detoast.c:274
#3  0x000055584c9e6f2a in pg_detoast_datum_slice (datum=<optimized out>, first=first@entry=0, count=<optimized out>) at ./build/../src/backend/utils/fmgr/fmgr.c:1754
#4  0x00007f0bc257a7c1 in geometry_geometrytype (fcinfo=0x55584d6bea60) at lwgeom_ogc.c:199
#5  0x000055584c740c9e in ExecInterpExpr (state=0x55584d6be578, econtext=0x55584d6be260, isnull=<optimized out>) at ./build/../src/backend/executor/execExprInterp.c:699
#6  0x000055584c74ec62 in ExecEvalExprSwitchContext (isNull=0x7ffec8369497, econtext=0x55584d6be260, state=0x55584d6be578) at ./build/../src/include/executor/executor.h:313
#7  ExecProject (projInfo=0x55584d6be570) at ./build/../src/include/executor/executor.h:347
#8  ExecScan (node=<optimized out>, accessMtd=0x55584c773130 <SeqNext>, recheckMtd=0x55584c7731c0 <SeqRecheck>) at ./build/../src/backend/executor/execScan.c:238
#9  0x000055584c744bcd in ExecProcNode (node=0x55584d6be150) at ./build/../src/include/executor/executor.h:245
#10 ExecutePlan (execute_once=<optimized out>, dest=0x55584d6b8640, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55584d6be150, estate=0x55584d6bded8) at ./build/../src/backend/executor/execMain.c:1646


PG Bug reporting form <noreply@postgresql.org> writes:
> st_numpoints() fetches the whole TOAST object and works fine, but
> st_geometrytype() just requests a slice at the front of the TOAST object.
> This looks to me like a low-level issue with PostgreSQL 13 and TOAST objects of
> a specific size and/or compression behavior.
> Could be related to this PostgreSQL REL_13 commit:
> https://github.com/postgres/postgres/commit/11a078cf87ffb611d19c7dec6df68b41084ad9c9

Hm.  I think that pglz_maximum_compressed_size is off a little bit:
it fails to consider that if N decompressed bytes are wanted, we could
have compressed data containing N-1 literal bytes and then a match tag,
which requires 2 or 3 bytes, so that the calculation possibly
underestimates the amount we need by 1 or 2 bytes.

You'd need quite a bit of bad luck to make that happen, though,
especially if the requested length is not small (and 52 bytes
isn't that small).

I suspect the actual bug is in c60e520f6 (Use memcpy instead of a
byte loop in pglz_decompress), which looks fishy to me, although
I can't quite identify what's wrong with it.  I wonder if it's
possible for "off" to be zero?  Doesn't seem like it should be
in normal use, but then maybe that's what triggers the infinite
loop (rather than just garbage results) if the other bug causes
us to read garbage data.

I think this code is in general too trusting of the input not to
be corrupt --- it doesn't notice a match tag that extends past
the end of the input, either.

            regards, tom lane



> PG Bug reporting form <noreply@postgresql.org> writes:
>> This looks to me like a low-level issue with PostgreSQL 13 and TOAST objects of
>> a specific size and/or compression behavior.

After looking at it some more, I'm pretty sure that these issues could
explain your problem, though it's possible there's an additional
contributing issue.  If you're in a position to apply a patch and
see whether it fixes the problem, please try

https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=2330f4d3a87ac43b6ecd31bfd698384888ed03cb

            regards, tom lane



> On 2 Nov 2020, at 04:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> PG Bug reporting form <noreply@postgresql.org> writes:
>>> This looks to me like a low-level issue with PostgreSQL 13 and TOAST objects of
>>> a specific size and/or compression behavior.
>
> After looking at it some more, I'm pretty sure that these issues could
> explain your problem, though it's possible there's an additional
> contributing issue.  If you're in a position to apply a patch and
> see whether it fixes the problem, please try
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=2330f4d3a87ac43b6ecd31bfd698384888ed03cb

Thanks for fixing this, Tom!
1 or 2 extra bytes of match header at the end of a sequence of literals is a bug for sure. And the input sequence does
not need to be small.

I'm not sure the protection from corrupt input is complete within pglz. We still do not protect from matches with offsets
before the source data. This can SegFault or lead to security leaks. I suspect there may be other go-wild input sequences.

Thanks! Best regards, Andrey Borodin.


Andrey Borodin <x4mmm@yandex-team.ru> writes:
> I'm not sure protection from corrupt input is complete within pglz. We
> still do not protect from matches with offsets before source data.

Yeah, I was wondering about that.  Not quite sure it's worth adding
cycles to defend against though.  I don't buy the "security" aspect,
since there's no plausible route for an attacker to inject corrupted
compressed data unless they already have full access to the database.
The "maybe core dump" argument is a bit stronger, but not very much so.

            regards, tom lane




> On 2 Nov 2020, at 10:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
>> I'm not sure protection from corrupt input is complete within pglz. We
>> still do not protect from matches with offsets before source data.
>
> Yeah, I was wondering about that.  Not quite sure it's worth adding
> cycles to defend against though.

I think so too. That's why I did not protect against corruption at all when proposing c60e520.

Here we can have a core dump too, if reading *(srcend + 1) produces a SegFault:

    len = (sp[0] & 0x0f) + 3;
    off = ((sp[0] & 0xf0) << 4) | sp[1];
    sp += 2;
    if (len == 18)
        len += *sp++;
    if (sp > srcend || off == 0)
        break;

And proper protection would cost nonzero time for the extra branches. I think we should not protect against this.
I hope to see lz4 (or something else) in toast and other places one day :)

Best regards, Andrey Borodin.


Andrey Borodin <x4mmm@yandex-team.ru> writes:
>> On 2 Nov 2020, at 10:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, I was wondering about that.  Not quite sure it's worth adding
>> cycles to defend against though.

> I think so too. That's why I was not protecting from corruption at all when proposing c60e520.
> Here we can have core dump too if *(srcend+1) produce SegFault:

Right.  The "sp > srcend" check that I added doesn't help if we
were unlucky enough to be right up against the end of memory.
I judged it was cheap enough to be worth including, but I'd
agree that's a judgment call.  Conversely, checking for "off"
being too far back would require tracking the distance back to
the start of the output buffer, which is a value the loop does
not keep ATM.  So that would require adding more cycles to track
(in addition to the actual comparison), which is why it seemed
not quite worth it.

            regards, tom lane




On Mon, 2 Nov 2020 at 00:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> This looks to me like a low-level issue with PostgreSQL 13 and TOAST objects of
>> a specific size and/or compression behavior.

After looking at it some more, I'm pretty sure that these issues could
explain your problem, though it's possible there's an additional
contributing issue.  If you're in a position to apply a patch and
see whether it fixes the problem, please try

https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=2330f4d3a87ac43b6ecd31bfd698384888ed03cb


Yes, this patch fixes the problem.

Thanks,
Tom


Tom Vijlbrief <tvijlbrief@gmail.com> writes:
> On Mon, 2 Nov 2020 at 00:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> After looking at it some more, I'm pretty sure that these issues could
>> explain your problem, though it's possible there's an additional
>> contributing issue.  If you're in a position to apply a patch and
>> see whether it fixes the problem, please try
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=2330f4d3a87ac43b6ecd31bfd698384888ed03cb

> Yes, this patch fixes the problem.

Great, thanks for testing!

            regards, tom lane