Thread: Bug: Reading from single byte character column type may cause out of bounds memory reads.


From: Spyridon Dimitrios Agathos
Hi hackers,

While writing a test for PostgreSQL, I ran into a weird scenario: depending on how I built PostgreSQL (with or without debug options), I saw different results for the following query.

Steps to reproduce:
- OS: Ubuntu 20.04
- PostgreSQL version: 14.4

CREATE TABLE test (single_byte_col "char");
INSERT INTO test (single_byte_col) VALUES ('🀆');

When the following query runs on a debug build, I get this result:
SELECT left(single_byte_col, 1) l FROM test;
      l
-----------
 ~\x7F\x7F
(1 row)

With debug mode turned off, I get:
SELECT left(single_byte_col, 1) l FROM test;
 l
---

(1 row)

That prompted me to run Valgrind, which reported the following error:

==00:00:03:22.867 1171== VALGRINDERROR-BEGIN
==00:00:03:22.867 1171== Invalid read of size 1
==00:00:03:22.867 1171== at 0x4C41209: memmove (vg_replace_strmem.c:1382)
==00:00:03:22.868 1171== by 0xC8D8D0: text_substring (../src/backend/utils/adt/varlena.c:1050)
==00:00:03:22.868 1171== by 0xC9AC78: text_left (../src/backend/utils/adt/varlena.c:5614)
==00:00:03:22.868 1171== by 0x7FBB75: ExecInterpExpr (../src/backend/executor/execExprInterp.c:749)
==00:00:03:22.868 1171== by 0x7FADA6: ExecInterpExprStillValid (../src/backend/executor/execExprInterp.c:1824)
==00:00:03:22.868 1171== by 0x81C0AA: ExecEvalExprSwitchContext (../src/include/executor/executor.h:339)
==00:00:03:22.868 1171== by 0x81BD96: ExecProject (../src/include/executor/executor.h:373)
==00:00:03:22.868 1171== by 0x81B8BE: ExecScan (../src/backend/executor/execScan.c:238)
==00:00:03:22.868 1171== by 0x8638BA: ExecSeqScan (../src/backend/executor/nodeSeqscan.c:112)
==00:00:03:22.868 1171== by 0x8170D4: ExecProcNodeFirst (../src/backend/executor/execProcnode.c:465)
==00:00:03:22.868 1171== by 0x80F291: ExecProcNode (../src/include/executor/executor.h:257)
==00:00:03:22.868 1171== by 0x80A8F0: ExecutePlan (../src/backend/executor/execMain.c:1551)
==00:00:03:22.868 1171== by 0x80A7A3: standard_ExecutorRun (../src/backend/executor/execMain.c:361)
==00:00:03:22.868 1171== by 0x136231BC: pgss_ExecutorRun (../contrib/pg_stat_statements/pg_stat_statements.c:1001)
==00:00:03:22.868 1171== by 0x80A506: ExecutorRun (../src/backend/executor/execMain.c:303)
==00:00:03:22.868 1171== by 0xAD71B0: PortalRunSelect (../src/backend/tcop/pquery.c:921)
==00:00:03:22.868 1171== by 0xAD6B75: PortalRun (../src/backend/tcop/pquery.c:765)
==00:00:03:22.868 1171== by 0xAD1B3C: exec_simple_query (../src/backend/tcop/postgres.c:1214)
==00:00:03:22.868 1171== by 0xAD0D7F: PostgresMain (../src/backend/tcop/postgres.c:4496)
==00:00:03:22.868 1171== by 0x9D6C79: BackendRun (../src/backend/postmaster/postmaster.c:4530)
==00:00:03:22.868 1171== by 0x9D61A3: BackendStartup (../src/backend/postmaster/postmaster.c:4252)
==00:00:03:22.868 1171== by 0x9D4F56: ServerLoop (../src/backend/postmaster/postmaster.c:1745)
==00:00:03:22.868 1171== by 0x9D23A4: PostmasterMain (../src/backend/postmaster/postmaster.c:1417)
==00:00:03:22.868 1171== by 0x8AA0D2: main (../src/backend/main/main.c:209)
==00:00:03:22.868 1171== Address 0x3ed1f3c5 is 293 bytes inside a block of size 8,192 alloc'd
==00:00:03:22.868 1171== at 0x4C37135: malloc (vg_replace_malloc.c:381)
==00:00:03:22.868 1171== by 0xD1C3C0: AllocSetContextCreateInternal (../src/backend/utils/mmgr/aset.c:469)
==00:00:03:22.868 1171== by 0x821CB4: CreateExprContextInternal (../src/backend/executor/execUtils.c:253)
==00:00:03:22.868 1171== by 0x821BE2: CreateExprContext (../src/backend/executor/execUtils.c:303)
==00:00:03:22.868 1171== by 0x822078: ExecAssignExprContext (../src/backend/executor/execUtils.c:482)
==00:00:03:22.868 1171== by 0x8637D6: ExecInitSeqScan (../src/backend/executor/nodeSeqscan.c:147)
==00:00:03:22.868 1171== by 0x816B2D: ExecInitNode (../src/backend/executor/execProcnode.c:211)
==00:00:03:22.868 1171== by 0x80A336: InitPlan (../src/backend/executor/execMain.c:936)
==00:00:03:22.868 1171== by 0x809BE7: standard_ExecutorStart (../src/backend/executor/execMain.c:263)
==00:00:03:22.868 1171== by 0x1362303C: pgss_ExecutorStart (../contrib/pg_stat_statements/pg_stat_statements.c:963)
==00:00:03:22.868 1171== by 0x809881: ExecutorStart (../src/backend/executor/execMain.c:141)
==00:00:03:22.868 1171== by 0xAD636A: PortalStart (../src/backend/tcop/pquery.c:514)
==00:00:03:22.868 1171== by 0xAD1A31: exec_simple_query (../src/backend/tcop/postgres.c:1175)
==00:00:03:22.868 1171== by 0xAD0D7F: PostgresMain (../src/backend/tcop/postgres.c:4496)
==00:00:03:22.868 1171== by 0x9D6C79: BackendRun (../src/backend/postmaster/postmaster.c:4530)
==00:00:03:22.868 1171== by 0x9D61A3: BackendStartup (../src/backend/postmaster/postmaster.c:4252)
==00:00:03:22.868 1171== by 0x9D4F56: ServerLoop (../src/backend/postmaster/postmaster.c:1745)
==00:00:03:22.868 1171== by 0x9D23A4: PostmasterMain (../src/backend/postmaster/postmaster.c:1417)
==00:00:03:22.868 1171== by 0x8AA0D2: main (../src/backend/main/main.c:209)
==00:00:03:22.868 1171==
==00:00:03:22.868 1171== VALGRINDERROR-END

Then I attached a debugger to inspect the variables involved in the error reported by Valgrind for file /src/backend/utils/adt/varlena.c:

1045 for (i = S1; i < E1; i++)
1046 p += pg_mblen(p);
1047
1048 ret = (text *) palloc(VARHDRSZ + (p - s));
1049 SET_VARSIZE(ret, VARHDRSZ + (p - s));
1050 memcpy(VARDATA(ret), s, (p - s));

The column "single_byte_col" is supposed to store only 1 byte. Nevertheless, the INSERT command implicitly casts the '🀆' text into "char". This means that only the first byte of '🀆' ends up stored in the column.
gdb reports that "pg_mblen(p) = 4" (line 1046), which is expected, since pg_mblen('🀆') is indeed 4. Later, at line 1050, the memcpy copies 4 bytes instead of 1, hence an out-of-bounds memory read happens through pointer 's', which effectively copies random bytes.

--
Spiros
(ServiceNow)

Hi Spyridon,

> The column "single_byte_col" is supposed to store only 1 byte. Nevertheless, the INSERT command implicitly casts the
> '🀆' text into "char". This means that only the first byte of '🀆' ends up stored in the column.
> gdb reports that "pg_mblen(p) = 4" (line 1046), which is expected since the pg_mblen('🀆') is indeed 4. Later at line
> 1050, the memcpy will copy 4 bytes instead of 1, hence an out of bounds memory read happens for pointer 's', which
> effectively copies random bytes.

Many thanks for reporting this!

> - OS: Ubuntu 20.04
> - PostgreSQL version: 14.4

I can confirm the bug exists in the `master` branch as well and
doesn't depend on the platform.

Although the bug is easy to fix for this particular case (see the
patch) I'm not sure if this solution is general enough. E.g. is there
something that generally prevents pg_mblen() from doing out of bound
reading in cases similar to this one? Should we prevent such an INSERT
from happening instead?

--
Best regards,
Aleksander Alekseev

Aleksander Alekseev <aleksander@timescale.com> writes:
> Although the bug is easy to fix for this particular case (see the
> patch) I'm not sure if this solution is general enough. E.g. is there
> something that generally prevents pg_mblen() from doing out of bound
> reading in cases similar to this one? Should we prevent such an INSERT
> from happening instead?

This is ultimately down to char_text() generating a string that's alleged
to be a valid "text" type value, but it contains illegally-encoded data.
Where we need to fix it is there: if we try to make every single
text-using function be 100% bulletproof against wrongly-encoded data,
we'll still be fixing bugs at the heat death of the universe.

I complained about this in [1], but that thread died off before reaching a
clear consensus about exactly what to do.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/2318797.1638558730%40sss.pgh.pa.us



On Wed, 13 Jul 2022 at 09:15, Aleksander Alekseev <aleksander@timescale.com> wrote:

> I can confirm the bug exists in the `master` branch as well and
> doesn't depend on the platform.
>
> Although the bug is easy to fix for this particular case (see the
> patch) I'm not sure if this solution is general enough. E.g. is there
> something that generally prevents pg_mblen() from doing out of bound
> reading in cases similar to this one? Should we prevent such an INSERT
> from happening instead?

Not just INSERTs, I would think: the implicit cast is already invalid, since the "char" type can only hold characters that can be represented in 1 byte. A comparable example in the numeric types might be:

odyssey=> select (2.0 ^ 80)::double precision::integer;
ERROR:  integer out of range

By comparison:

odyssey=> select '🀆'::"char";
 char
──────
 
(1 row)

I think this should give an error, perhaps 'ERROR: "char" out of range'.

Incidentally, if I apply ascii() to the result, I sometimes get 0 and sometimes 90112, neither of which should be a possible value for ascii() of a "char" value, and neither of which is 126982, the actual value of that character.

odyssey=> select ascii ('🀆'::"char");
 ascii
───────
 90112
(1 row)

odyssey=> select ascii ('🀆'::"char");
 ascii
───────
     0
(1 row)

odyssey=> select ascii ('🀆');
 ascii  
────────
 126982
(1 row)

On 2022-07-13 We 11:11, Tom Lane wrote:
> Aleksander Alekseev <aleksander@timescale.com> writes:
>> Although the bug is easy to fix for this particular case (see the
>> patch) I'm not sure if this solution is general enough. E.g. is there
>> something that generally prevents pg_mblen() from doing out of bound
>> reading in cases similar to this one? Should we prevent such an INSERT
>> from happening instead?
> This is ultimately down to char_text() generating a string that's alleged
> to be a valid "text" type value, but it contains illegally-encoded data.
> Where we need to fix it is there: if we try to make every single
> text-using function be 100% bulletproof against wrongly-encoded data,
> we'll still be fixing bugs at the heat death of the universe.
>
> I complained about this in [1], but that thread died off before reaching a
> clear consensus about exactly what to do.
>
>             regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/2318797.1638558730%40sss.pgh.pa.us
>
>


Looks like the main controversy was about the output format. Make an
executive decision and pick one.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Andrew Dunstan <andrew@dunslane.net> writes:
> On 2022-07-13 We 11:11, Tom Lane wrote:
>> I complained about this in [1], but that thread died off before reaching a
>> clear consensus about exactly what to do.
>> [1] https://www.postgresql.org/message-id/flat/2318797.1638558730%40sss.pgh.pa.us

> Looks like the main controversy was about the output format. Make an
> executive decision and pick one.

Done, see other thread.

            regards, tom lane



In a message of Wednesday, July 13, 2022, 16:14:39 MSK, Aleksander
Alekseev wrote:

Hi! Let me join the review process. Postgres data types are a field of
expertise I am interested in.

0. Though it looks like a stable bug, I can't reproduce it, neither with
Valgrind nor with ASan (the address sanitizer should catch out-of-bounds
reads too). I am running Debian Bullseye, and tried building both
PostgreSQL 14.4 and the current master.

Nevertheless, I dug into this issue, starting with the parts that are not
covered by the patch but seem important to me.

1. The type name "char" (with quotes) is very confusing. It is described
in the documentation, but you need to be a Postgres expert, or a careful
documentation reader, to notice the important difference between "char"
and char. What is the history of the "char" type? Is it specified by some
standard? Maybe it would be a good idea to create a more understandable
alias, like byte_char or ascii_char, for practical use, and keep "char"
for backward compatibility only.

2. I fully agree with Tom Lane and Isaac Morland that the problem should
also be fixed on the type-conversion side. There is a whole big thread
about that; I guess we should come to some conclusion there.

3. Fixing out-of-bounds reads for broken Unicode is also important, though
for now I am not quite sure it is possible.


> -            p += pg_mblen(p);
> +        {
> +            int t = pg_mblen(p);
> +            p += t;
> +            max_copy_bytes -= t;
> +        }

Minor issue: here I would change the variable name from "t" to "char_len"
or something similar, to make the code easier to understand.

Major issue: is pg_mblen() safe to call on broken encoding at the end of
the buffer? What if the last byte of the buffer is 0xF0 and you call
pg_mblen() on it?


>+        copy_bytes = p - s;
>+        if(copy_bytes > max_copy_bytes)
>+            copy_bytes = max_copy_bytes;

Here I would suggest adding a comment about the broken UTF-8 encoding
case, explaining why we might end up in a situation where we try to copy
more bytes than we have.

I would also suggest issuing a warning here. I guess a Postgres user would
prefer to know that they managed to store a string with broken UTF-8
encoding before it leads to some terrible consequences.

> Hi Spyridon,
>
> > The column "single_byte_col" is supposed to store only 1 byte.
> > Nevertheless, the INSERT command implicitly casts the '🀆' text into
> > "char". This means that only the first byte of '🀆' ends up stored in the
> > column. gdb reports that "pg_mblen(p) = 4" (line 1046), which is expected
> > since the pg_mblen('🀆') is indeed 4. Later at line 1050, the memcpy will
> > copy 4 bytes instead of 1, hence an out of bounds memory read happens for
> > pointer 's', which effectively copies random bytes.
> Many thanks for reporting this!
>
> > - OS: Ubuntu 20.04
> > - PostgreSQL version: 14.4
>
> I can confirm the bug exists in the `master` branch as well and
> doesn't depend on the platform.
>
> Although the bug is easy to fix for this particular case (see the
> patch) I'm not sure if this solution is general enough. E.g. is there
> something that generally prevents pg_mblen() from doing out of bound
> reading in cases similar to this one? Should we prevent such an INSERT
> from happening instead?


--
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su


Re: Bug: Reading from single byte character column type may cause out of bounds memory reads.

From: Spyridon Dimitrios Agathos
Hi all,

this is to verify that the patch proposed here:

https://www.postgresql.org/message-id/flat/2318797.1638558730%40sss.pgh.pa.us

fixes the issue. I applied the patch and:
1) The build type no longer affects the query result.
2) Valgrind doesn't complain about out-of-bounds reads.
3) The output of the "faulty" insertion is shown in \ooo format.

Looking forward to the next steps.

--
Spiros
(ServiceNow)


On Thu, 14 Jul 2022 at 2:08 PM, Nikolay Shaplov <dhyan@nataraj.su> wrote:

Spyridon Dimitrios Agathos <spyridon.dimitrios.agathos@gmail.com> writes:
> this is to verify that the .patch proposed here:
> https://www.postgresql.org/message-id/flat/2318797.1638558730%40sss.pgh.pa.us
> fixes the issue.

> Looking forward to the next steps.

That's been committed into HEAD and v15, without pushback so far.
So the complained-of case is no longer reachable in those branches.

I think we should reject Aleksander's patch, on the grounds that
it's now unnecessary --- or if you want to argue that it's still
necessary, then it's woefully inadequate, because there are surely
a bunch of other text-processing functions that will also misbehave
on wrongly-encoded data.  But our general policy for years has been
that we check incoming text for encoding validity and then presume
that it is valid in manipulation operations.

What remains to be debated is whether to push ec62ce55a into the
stable branches.  While we've not had pushback about the change
in 15beta3, that hasn't been out very long, so I don't know how
much faith to put in the lack of complaints.  Should we wait
longer before deciding?

I'm leaning to the idea that we should not back-patch, because
this issue has been there for years with few complaints; it's
not clear that closing the hole is worth creating a compatibility
hazard in minor releases.  On the other hand, you could argue
that we should back-patch so that back-branch charin() will
understand the strings that can now be emitted by v15 charout().
Failing to do so will result in a different sort of compatibility
problem.

            regards, tom lane



On Thu, Sep 01, 2022 at 03:35:52PM -0400, Tom Lane wrote:
> Spyridon Dimitrios Agathos <spyridon.dimitrios.agathos@gmail.com> writes:
> > this is to verify that the .patch proposed here:
> > https://www.postgresql.org/message-id/flat/2318797.1638558730%40sss.pgh.pa.us
> > fixes the issue.
> 
> > Looking forward to the next steps.
> 
> That's been committed into HEAD and v15, without pushback so far.
> So the complained-of case is no longer reachable in those branches.
> 
> I think we should reject Aleksander's patch, on the grounds that
> it's now unnecessary --- or if you want to argue that it's still
> necessary, then it's woefully inadequate, because there are surely
> a bunch of other text-processing functions that will also misbehave
> on wrongly-encoded data.  But our general policy for years has been
> that we check incoming text for encoding validity and then presume
> that it is valid in manipulation operations.

pg_upgrade carries forward invalid text.  A presumption of encoding validity
won't be justified any sooner than a presumption of not finding HEAP_MOVED_OFF
flags.  Hence, I think there should exist another policy that text-processing
functions prevent severe misbehavior when processing invalid text.
Out-of-bounds memory access qualifies as severe.

> I'm leaning to the idea that we should not back-patch, because
> this issue has been there for years with few complaints; it's
> not clear that closing the hole is worth creating a compatibility
> hazard in minor releases.

I would not back-patch.

> On the other hand, you could argue
> that we should back-patch so that back-branch charin() will
> understand the strings that can now be emitted by v15 charout().
> Failing to do so will result in a different sort of compatibility
> problem.

If concerned, I'd back-patch enough of the read side only, not the output
side.  I wouldn't bother, though.



Noah Misch <noah@leadboat.com> writes:
> On Thu, Sep 01, 2022 at 03:35:52PM -0400, Tom Lane wrote:
>> I think we should reject Aleksander's patch, on the grounds that
>> it's now unnecessary --- or if you want to argue that it's still
>> necessary, then it's woefully inadequate, because there are surely
>> a bunch of other text-processing functions that will also misbehave
>> on wrongly-encoded data.  But our general policy for years has been
>> that we check incoming text for encoding validity and then presume
>> that it is valid in manipulation operations.

> pg_upgrade carries forward invalid text.  A presumption of encoding validity
> won't be justified any sooner than a presumption of not finding HEAP_MOVED_OFF
> flags.  Hence, I think there should exist another policy that text-processing
> functions prevent severe misbehavior when processing invalid text.
> Out-of-bounds memory access qualifies as severe.

Well ... that sounds great in the abstract, but it's not clear to me
that the problem justifies either the amount of developer effort it'd
take to close all the holes, or the performance hits we'd likely take.
In any case, changing only text_substring() isn't going to move the
ball very far at all.

>> I'm leaning to the idea that we should not back-patch, because
>> this issue has been there for years with few complaints; it's
>> not clear that closing the hole is worth creating a compatibility
>> hazard in minor releases.

> I would not back-patch.

OK.  Let's close out this CF item as RWF, then.

            regards, tom lane